A million-row CSV editor that never copies the data

June 11, 2026 · 9 min read · from the csvtodashboard codebase

csvtodashboard is a static site. Every tool on it runs in the browser, on your machine, with no server behind it. That is a feature right up until someone drops a 180 MB export with a million rows on /csv-editor and expects to fix one column and download the result.

The editor handles that file. It is one React component — EditorShell in tools5.jsx, about 900 lines — which also powers /csv-viewer, /excel-viewer, /parquet-viewer and /open-large-csv. These are the four design decisions that make a million rows editable in a tab, plus two bugs worth writing down.

A naive editable table is thirty million DOM nodes

The arithmetic rules out the obvious approach. Our million-row demo file has seven columns: seven million cells. Render that as an editable <table> the naive way — a <td>, a wrapper, and an <input> per cell, plus a million <tr>s — and you are asking the browser for something within sight of thirty million DOM nodes. Tabs stop painting, then stop responding, somewhere in the low millions.

Memory is the quieter constraint. The parsed file is the largest object the tab will ever hold — uploads are capped at 250 MB, past which the error message points you at /csv-split instead. Whatever the editing model is, it cannot involve copying the rows: not per edit, not per undo step, not per sort.

That leaves three rules the rest of this post unpacks: parse once and never mutate or copy; let the DOM hold only the rows you can see; and run anything O(file) off the main thread, or exactly once, on demand.

Parsing is the first O(file) job, so it is the first thing off the thread. CSVTool.parseCSVAsync hands any file over 2 MB to a Web Worker, which loads the same parser.js the page uses through importScripts(), so the two engines can never drift. The worker posts progress messages back: the parse maps to 0–85% of the bar, column profiling to the remaining 15%. The page keeps painting, and the progress bar actually moves.
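A minimal sketch of how two phases might share one progress bar. The message shape {phase, fraction} and the name overallProgress are assumptions for illustration; the post does not show what the worker actually posts.

```javascript
// Hypothetical message shape: { phase: "parse" | "profile", fraction: 0..1 }.
// Parsing fills the first 85% of the bar; column profiling fills the last 15%.
const PARSE_SHARE = 0.85;

function overallProgress(msg) {
  const f = Math.min(1, Math.max(0, msg.fraction)); // clamp out-of-range input
  if (msg.phase === "parse") return f * PARSE_SHARE;
  if (msg.phase === "profile") return PARSE_SHARE + f * (1 - PARSE_SHARE);
  throw new Error("unknown phase: " + msg.phase);
}

// In the page, this would run inside worker.onmessage to set the bar width.
console.log(overallProgress({ phase: "parse", fraction: 0.5 }));  // 0.425
console.log(overallProgress({ phase: "profile", fraction: 1 }));  // 1
```

Keeping the mapping in one pure function means the bar can never run backwards when the worker switches phases.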

Edits are an overlay; the parsed rows are never touched

After parsing, prof.rows is frozen in all but name. Nothing ever writes to it. Every change you make lives in a small set of side structures: edits, a Map of cell overrides; deleted, a Set of row keys; added, an array of keys for new rows whose cells live entirely in the overlay; and columns, an array of specs shaped {id, name, src, type}.

The overlay key is the row's identity concatenated with the column's id, and every cell read in the component goes through one function:

// tools5.jsx:21, 247
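// A separator keeps (1, "23") and (12, "3") from producing the same key.
// The exact character is an assumption; any value that cannot occur in an id works.
function edCellKey(rowKey, colId) { return rowKey + "\u0000" + colId; }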

const cellVal = (rowKey, col, row) => {
  const k = edCellKey(rowKey, col.id);
  if (edits.has(k)) return edits.get(k);
  return col.src && row ? (row[col.src] != null ? row[col.src] : "") : "";
};

The grid renders through cellVal, the filter matches through it, sort compares through it, export writes through it. The overlay wins if it has an entry; otherwise the value comes off the parsed row via col.src, the original header that column reads from.

Row identity is what makes this safe. A row's key is its original index in the parsed file — a plain number — or "a0", "a1", … for rows added this session. Sorting and filtering produce permutations of {key, row} pairs for display; they never reassign keys. So when you sort by revenue, scroll, and edit a cell, the edit is recorded against the row's permanent key, not against "the 14th row currently on screen". Re-sort, clear the filter — the edit follows its row, because it was never attached to a position.
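To make that concrete, here is a small self-contained sketch with made-up data. The names read, sortBy and cellKey are illustrative, the "|" separator is an assumption, and field names stand in for column ids to keep it short.

```javascript
// Made-up data; the {key, row} pair shape and permanent keys follow the post.
const rows = [
  { city: "Osaka", revenue: "120" },
  { city: "Lyon", revenue: "80" },
  { city: "Quito", revenue: "300" },
];
const edits = new Map();
const cellKey = (rowKey, colId) => rowKey + "|" + colId; // "|" is illustrative

// Single read path: overlay first, parsed row second.
const read = (key, field, row) =>
  edits.has(cellKey(key, field)) ? edits.get(cellKey(key, field)) : row[field];

// Pair each parsed row with its permanent key (its original index).
const pairs = rows.map((row, i) => ({ key: i, row }));

// Sorting returns a new permutation of the pairs; keys and rows are untouched.
const sortBy = (field) =>
  pairs.slice().sort(
    (a, b) => Number(read(b.key, field, b.row)) - Number(read(a.key, field, a.row))
  );

// The user sorts by revenue (descending) and edits the first row on screen.
const onScreen = sortBy("revenue")[0]; // Quito, whose permanent key is 2
edits.set(cellKey(onScreen.key, "city"), "Cuenca");

// Back in file order, the edit is still attached to key 2.
const cities = pairs.map((p) => read(p.key, "city", p.row));
console.log(cities.join(", ")); // Osaka, Lyon, Cuenca
```

The parsed rows still say "Quito"; only the overlay knows about "Cuenca", and it knows it by key rather than by screen position.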

Columns get the same treatment. A header rename does not touch a million cells; it changes name on one spec object while src keeps pointing at the original data. A new column is a spec with src: null, so every read falls through to the overlay — empty until you type. Deleting a column removes one spec.
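A sketch of those three column operations as pure functions over the spec array. The helper names and sample ids are hypothetical; only the {id, name, src, type} shape comes from the editor.

```javascript
let columns = [
  { id: "c0", name: "city", src: "city", type: "text" },
  { id: "c1", name: "rev", src: "rev", type: "number" },
];
const row = { city: "Osaka", rev: "120" }; // one parsed row, never modified

// Rename: replace one spec; src still points at the original header.
const renameCol = (cols, id, name) =>
  cols.map((c) => (c.id === id ? { ...c, name } : c));

// Add: a spec with no source, so reads can only come from the overlay.
const addCol = (cols, id, name) =>
  cols.concat([{ id, name, src: null, type: "text" }]);

// Delete: drop one spec; the parsed data keeps the field, it is just unread.
const delCol = (cols, id) => cols.filter((c) => c.id !== id);

columns = renameCol(columns, "c1", "Revenue");
columns = addCol(columns, "c2", "Notes");
columns = delCol(columns, "c0");

console.log(columns.map((c) => c.name).join(", ")); // Revenue, Notes
console.log(row[columns[0].src]);                   // 120
```

Each operation costs O(columns), regardless of how many rows the file has.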

One detail keeps the bookkeeping honest: editing a cell back to its original value deletes the overlay key instead of storing an override. The modified flag and the edited-cell count are derived from the overlay's size, so they stay truthful.
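That normalization might look like the following sketch. withCellEdit is a hypothetical helper, and the "0|city" key is an illustrative string rather than the editor's real key format.

```javascript
// Returns a new overlay Map with one cell change applied.
// `original` is the value the parsed row holds for that cell ("" for added cells).
function withCellEdit(edits, key, value, original) {
  const next = new Map(edits);
  if (value === original) next.delete(key); // back to the source value: no override
  else next.set(key, value);
  return next;
}

let edits = new Map();
edits = withCellEdit(edits, "0|city", "Tokyo", "Osaka");
console.log(edits.size, edits.size > 0); // 1 true

// Typing the original value back removes the entry instead of storing "Osaka".
edits = withCellEdit(edits, "0|city", "Osaka", "Osaka");
console.log(edits.size, edits.size > 0); // 0 false
```

Without the delete branch, the overlay would hold an entry equal to the source value and the file would still be flagged as modified.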

Undo is a log of operations, not snapshots

With the overlay in place, undo gets cheap. A snapshot model — copy the state, push it on a stack — costs O(whatever you copy) per keystroke. Snapshot the parsed rows and one undo step costs more than the file. Snapshot just the overlay Map and it starts out harmless, then turns brutal the moment a find-and-replace puts 50,000 entries in it.

So the history is a log of operations, where each op is a small object that knows how to run in both directions:

// tools5.jsx:325
function applyOp(op, forward) {
  if (op.t === "cells") {
    setEdits(m => {
      const n = new Map(m);
      for (const it of op.items) {
        const has = forward ? it.nextHas : it.prevHas;
        const val = forward ? it.next : it.prev;
        if (has) n.set(it.ek, val); else n.delete(it.ek);
      }
      return n;
    });
  } // … plus delRows, addRow, addCol, delCol, renameCol
}

A "cells" op carries items of {ek, prevHas, prev, nextHas, next} — enough to apply or revert each cell without consulting anything else. The other shapes are just as small. A delete-rows op stores keys, not row data, because the parsed rows were never destroyed in the first place — undoing a deletion just removes keys from the deleted Set.
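A sketch of how one such item could be built and then replayed in both directions. makeItem and applyCells are hypothetical names; applyCells mirrors the "cells" branch above but works on a plain Map instead of React state, and the "0|city" keys are illustrative.

```javascript
// Build one {ek, prevHas, prev, nextHas, next} item from the current overlay.
function makeItem(edits, ek, value, original) {
  const nextHas = value !== original; // writing the source value means "no override"
  return {
    ek,
    prevHas: edits.has(ek),
    prev: edits.get(ek),
    nextHas,
    next: nextHas ? value : undefined,
  };
}

// Pure version of the "cells" branch: returns a new Map, forward or backward.
function applyCells(edits, op, forward) {
  const n = new Map(edits);
  for (const it of op.items) {
    const has = forward ? it.nextHas : it.prevHas;
    const val = forward ? it.next : it.prev;
    if (has) n.set(it.ek, val); else n.delete(it.ek);
  }
  return n;
}

const before = new Map([["0|city", "Tokyo"]]);
const op = {
  t: "cells",
  items: [
    makeItem(before, "0|city", "Kyoto", "Osaka"), // existing override replaced
    makeItem(before, "1|city", "Paris", "Lyon"),  // brand-new override
  ],
};

const after = applyCells(before, op, true);   // 0|city = Kyoto, 1|city = Paris
const undone = applyCells(after, op, false);  // back to 0|city = Tokyo only
console.log(after.size, undone.size); // 2 1
```

The prevHas and nextHas flags are what let an op distinguish "restore the old override" from "there was no override here at all".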

// tools5.jsx:355
function pushOp(op) {
  applyOp(op, true);
  setUndoStack(s => s.length >= 500 ? s.slice(1).concat([op]) : s.concat([op]));
  setRedoStack([]);
}

The stack caps at 500 ops, dropping the oldest, and any new op clears redo — the standard rule.
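The same rule, sketched as pure functions over a {undo, redo} pair so the cap is easy to see. record, stepBack and stepForward are hypothetical names; the caller would run the returned op through applyOp in the matching direction.

```javascript
const emptyHistory = () => ({ undo: [], redo: [] });

// Push a new op, dropping the oldest once the cap is reached, and clear redo.
function record(h, op, cap = 500) {
  const undo = h.undo.length >= cap ? h.undo.slice(1).concat([op]) : h.undo.concat([op]);
  return { undo, redo: [] };
}

// Move the newest op from undo to redo; the caller applies it backward.
function stepBack(h) {
  if (h.undo.length === 0) return { history: h, op: null };
  const op = h.undo[h.undo.length - 1];
  return { history: { undo: h.undo.slice(0, -1), redo: h.redo.concat([op]) }, op };
}

// Move the newest op from redo to undo; the caller applies it forward.
function stepForward(h) {
  if (h.redo.length === 0) return { history: h, op: null };
  const op = h.redo[h.redo.length - 1];
  return { history: { undo: h.undo.concat([op]), redo: h.redo.slice(0, -1) }, op };
}

// With a cap of 3, the fourth op pushes the oldest one out.
let h = emptyHistory();
for (const name of ["a", "b", "c", "d"]) h = record(h, { t: name }, 3);
console.log(h.undo.map((op) => op.t).join("")); // bcd
```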

The payoff is grouping. Find & replace runs every match through the same cell-change path and pushes the entire result as one op: a 50,000-cell replace-all is a single entry on the stack and a single Ctrl+Z to take back (the flash message says so: "Ctrl+Z undoes all of it"). Fill-down and range paste group the same way. A guard at 200,000 replacements tells you to narrow the scope rather than letting the overlay quietly eat the tab.
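A sketch of how a replace-all could collect every match into one op. buildReplaceOp is a hypothetical helper, the "|" key separator is illustrative, and plain substring matching stands in for whatever matching options the real dialog offers.

```javascript
const MAX_REPLACEMENTS = 200000; // guard value from the post

// Scans one column and returns a single "cells" op covering every match.
// rows: parsed rows; col: {id, src}; edits: overlay Map keyed by rowKey + "|" + col.id.
function buildReplaceOp(rows, col, edits, find, replacement, limit = MAX_REPLACEMENTS) {
  const items = [];
  if (find === "") return { t: "cells", items }; // nothing to search for
  for (let i = 0; i < rows.length; i++) {
    const ek = i + "|" + col.id;
    const original = rows[i][col.src] != null ? String(rows[i][col.src]) : "";
    const current = edits.has(ek) ? String(edits.get(ek)) : original;
    if (!current.includes(find)) continue;
    if (items.length >= limit) {
      throw new Error("More than " + limit + " replacements; narrow the scope");
    }
    const value = current.split(find).join(replacement);
    const nextHas = value !== original; // landing on the source value clears the override
    items.push({
      ek,
      prevHas: edits.has(ek),
      prev: edits.get(ek),
      nextHas,
      next: nextHas ? value : undefined,
    });
  }
  return { t: "cells", items };
}

const rows = [{ city: "New York" }, { city: "Lyon" }, { city: "New Delhi" }];
const op = buildReplaceOp(rows, { id: "c0", src: "city" }, new Map(), "New ", "");
console.log(op.items.length); // 2
```

However many items the op carries, it is pushed once, so one undo reverts all of them.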

Only a few dozen rows exist

The DOM never sees the file. The grid is a real <table>, but its <tbody> holds the visible slice of rows and nothing else:

// tools5.jsx:298
const OVERSCAN = 8;
const visibleCount = Math.max(1, Math.ceil(viewportH / rowH)) + OVERSCAN * 2;
const winStart = Math.max(0, Math.floor(scrollTop / rowH) - OVERSCAN);
const winEnd = Math.min(totalRows, winStart + visibleCount);
const windowRows = shown.slice(winStart, winEnd);
const padTop = winStart * rowH;
const padBottom = Math.max(0, (totalRows - winEnd) * rowH);

scrollTop is React state, set by the scroll handler. rowH and viewportH are measured, not assumed: a layout effect reads the first rendered row's offsetHeight and the scroll container's clientHeight on every pass, so the math self-corrects across themes, fonts and zoom levels. With a ~480px viewport and 38px rows, that is 13 visible rows plus 8 of overscan on each side — about 30 <tr>s in the document, whether the file has four rows or a million.
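The same arithmetic as a standalone function, run with the numbers above. computeWindow is a name chosen for this sketch; the formulas are the ones from the listing.

```javascript
function computeWindow({ scrollTop, viewportH, rowH, totalRows, overscan = 8 }) {
  const visibleCount = Math.max(1, Math.ceil(viewportH / rowH)) + overscan * 2;
  const winStart = Math.max(0, Math.floor(scrollTop / rowH) - overscan);
  const winEnd = Math.min(totalRows, winStart + visibleCount);
  return {
    winStart,
    winEnd,
    padTop: winStart * rowH,                              // spacer above the window
    padBottom: Math.max(0, (totalRows - winEnd) * rowH),  // spacer below the window
  };
}

// A million rows, scrolled so that row index 700,000 sits at the top of the viewport.
const mid = computeWindow({
  scrollTop: 700000 * 38, viewportH: 480, rowH: 38, totalRows: 1000000,
});
console.log(mid.winStart, mid.winEnd, mid.winEnd - mid.winStart); // 699992 700021 29

// A four-row file renders all four rows and needs no spacers.
const tiny = computeWindow({ scrollTop: 0, viewportH: 480, rowH: 38, totalRows: 4 });
console.log(tiny.winStart, tiny.winEnd, tiny.padTop, tiny.padBottom); // 0 4 0 0
```

The two spacers plus the rendered rows always add up to totalRows * rowH, which is why the scrollbar thumb behaves as if every row existed.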

The space the missing rows would occupy is faked by two spacer rows — one above the window at padTop pixels, one below at padBottom — so the scrollbar stays honest and scroll position maps linearly to row index. After a first auto-layout pass, the measured column widths are pinned in a <colgroup> and the table flips to table-layout: fixed, so the window slice cannot reflow column widths as new rows scroll in.

Row identity matters here too. Each <tr> is keyed by item.key — the same permanent row key the overlay uses — not by its index in the window. As the window shifts, React keeps DOM nodes attached to logical rows instead of rewriting every row in place, which is what stops an open cell editor (a focused <input>) from being torn out from under you by a two-pixel scroll.

The million-row Playwright check pins this down: after loading the demo it asserts the <tbody> holds fewer than 120 row elements, then scrolls to 70% depth and asserts the row numbers on screen are above 500,000.

Export rebuilds the file, in the original order

Sort and filter are display permutations, so export deliberately ignores them. Download walks the source rows in their original order, skips deleted keys, applies the overlay through the same cellVal read path, and appends added rows at the end:

// tools5.jsx:688
function materializeCSV() {
  const lines = [columns.map(c => edEscCSV(c.name)).join(",")];
  const pushRow = (key, row) => {
    const cells = new Array(columns.length);
    for (let j = 0; j < columns.length; j++)
      cells[j] = edEscCSV(cellVal(key, columns[j], row));
    lines.push(cells.join(","));
  };
  for (let i = 0; i < prof.rows.length; i++) if (!deleted.has(i)) pushRow(i, prof.rows[i]);
  for (const k of added) if (!deleted.has(k)) pushRow(k, null);
  return lines.join("\n");
}

The header line comes from the column specs, which is where renames finally materialize. Note what is not here: no diffing, no patch application, no special case for edited-versus-clean rows. The overlay design makes export a single O(file) pass — and it is the only O(file) pass an editing session ever pays for on the main thread.
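The listing relies on edEscCSV, which the post does not show. The sketch below assumes conventional RFC 4180 quoting and runs the whole pass on a tiny made-up dataset; escCSV, materialize and the "|" key separator are illustrative, not the editor's actual code.

```javascript
// Assumed escaping rule: quote a field only if it contains a comma, a quote,
// or a line break, and double any embedded quotes.
function escCSV(value) {
  const s = value == null ? "" : String(value);
  if (!/[",\r\n]/.test(s)) return s;
  return '"' + s.replace(/"/g, '""') + '"';
}

function materialize({ rows, columns, edits, deleted, added }) {
  // Same read order as the editor: overlay first, parsed row second.
  const read = (key, col, row) => {
    const k = key + "|" + col.id;
    if (edits.has(k)) return edits.get(k);
    return col.src && row && row[col.src] != null ? row[col.src] : "";
  };
  const lines = [columns.map((c) => escCSV(c.name)).join(",")];
  const pushRow = (key, row) =>
    lines.push(columns.map((c) => escCSV(read(key, c, row))).join(","));
  rows.forEach((row, i) => { if (!deleted.has(i)) pushRow(i, row); }); // file order
  for (const k of added) if (!deleted.has(k)) pushRow(k, null);        // new rows last
  return lines.join("\n");
}

const csv = materialize({
  rows: [
    { city: "Osaka", note: "a,b" },
    { city: "Lyon", note: "" },
    { city: "Quito", note: 'say "hi"' },
  ],
  columns: [
    { id: "c0", name: "City", src: "city" }, // renamed header
    { id: "c1", name: "Note", src: "note" },
  ],
  edits: new Map([["0|c0", "Tokyo"], ["a0|c0", "Lima"]]),
  deleted: new Set([1]),
  added: ["a0"],
});
console.log(csv);
// City,Note
// Tokyo,"a,b"
// Quito,"say ""hi"""
// Lima,
```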

Everything that leaves the editor goes through this function: the CSV download, the .xlsx export, copy-to-clipboard, and the send-to-dashboard handoff. If you changed anything at all — a cell, a deleted row, even just a header rename — the download is named yourfile-edited.csv, so it never silently shadows the original in your downloads folder.

Two bugs worth writing down

The double-click that landed on the wrong row. The grid focuses its scroll container on mousedown so keyboard shortcuts — arrows, Ctrl+Z, type-to-edit — work without an extra click. But plain element.focus() scrolls the focused element into view. The first click of a double-click triggered focus, the page scrolled, the table shifted under the unmoving cursor — and the second click landed on a different row. The symptom: double-click-to-edit opened an editor one or two rows away from where you aimed, but only sometimes, depending on scroll position. The fix is one option, annotated in the source so nobody simplifies it away:

// tools5.jsx:845 — the cell mousedown handler
onMouseDown={(e) => {
  if (isEditingCell) return;
  e.preventDefault();
  if (editing) commitEdit();
  setDragging(true);
  pickCell(ri, ci, e.shiftKey);
  // preventScroll is load-bearing: focusing the scroll container
  // mid-double-click otherwise scrolls the page, the table shifts
  // under the cursor, and the second click lands on a different row.
  scrollRef.current && scrollRef.current.focus({ preventScroll: true });
}}

If your grid manages focus from mouse handlers, focus({ preventScroll: true }) is almost certainly what you want everywhere — the editor uses it on commit and cancel too.

Playwright's fill() versus React's onChange. The test suite drives the editor like a user. The first version used page.fill() to type into the cell editor, and the edit never committed: fill() assigns the input's value through the native value setter rather than sending keystrokes, and in this controlled-input setup React's onChange never saw the new value — so committing wrote back the stale state. The fix, documented in the suite itself, is to edit the way a human does:

// tests/verify-editor.mjs:54
// 1) double-click edit a cell (select-all + type, like a real user — fill()
//    sets the DOM value via the prototype setter and React's onChange misses it)
await page.dblclick('td[data-edcell="0-1"]');
await page.waitForSelector(".ed-input", { timeout: 5000 });
await page.keyboard.press("Control+a");
await page.keyboard.type("Tokyo");
await page.keyboard.press("Enter");

keyboard.type() sends real key events, onChange fires per keystroke, and the controlled value is current when Enter commits. If you are testing a React grid — or any controlled input where state, not the DOM, is the source of truth — type, don't fill.

It holds up at a million

The /open-large-csv page exists to prove the claim rather than state it. One button synthesizes a 1,000,000-row file locally — deterministically, from a Knuth multiplicative hash of the row index, built in 100,000-row chunks with setTimeout yields so the progress bar paints — then parses it in the worker with live progress, and renders it into those ~30 DOM rows. No upload, because there is nothing to upload to.
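A sketch of that generation scheme. The multiplier 2654435761 is the usual constant for Knuth's multiplicative hash; the row shape, the CITIES list and the names demoRow and generate are made up for illustration (the real demo file has seven columns that the post does not list).

```javascript
// Knuth's multiplicative hash: multiply by 2654435761 and keep the low 32 bits.
const knuth = (i) => Math.imul(i, 2654435761) >>> 0;

// Hypothetical row shape, derived only from the row index so every run matches.
const CITIES = ["Osaka", "Lyon", "Quito", "Perth"];
function demoRow(i) {
  const h = knuth(i + 1);
  return { id: i + 1, city: CITIES[h % CITIES.length], revenue: (h >>> 8) % 10000 };
}

// Build rows in chunks, yielding to the event loop between chunks so the
// browser gets a chance to paint the progress bar.
function generate(total, chunkSize, onProgress) {
  return new Promise((resolve) => {
    const rows = new Array(total);
    let done = 0;
    function step() {
      const end = Math.min(total, done + chunkSize);
      for (; done < end; done++) rows[done] = demoRow(done);
      onProgress(total === 0 ? 1 : done / total);
      if (done < total) setTimeout(step, 0); else resolve(rows);
    }
    step();
  });
}

// The post's numbers would be generate(1000000, 100000, updateBar).
generate(10, 4, (p) => console.log("progress", p)).then((rows) => {
  console.log(rows.length, rows[0].id, rows[9].id); // 10 1 10
});
```

Because each row depends only on its index, the file is reproducible, which is what lets a test assert on specific rows after scrolling.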

Behind it sits a 53-check Playwright suite: cell edits, type-to-edit, undo/redo, grouped find-and-replace, fill-down, range paste, header renames, a real two-sheet .xlsx, a real .parquet, and — most relevant here — a mid-file edit on a virtualized grid, verified to land on exactly the right line of the exported CSV.

Open the editor — drop a huge file, or generate the 1M-row demo and scroll to row 700,000. Everything stays on your machine.