CSV Fuzzy Dedupe — Remove Near-Duplicate Rows by Key Column

01 · How it works

Three steps, then done.

Exact dedupe misses rows that differ by a stray capital, a double space, or a trailing period. Fuzzy dedupe normalizes the key column first (case and whitespace in both strengths, punctuation in loose only), then drops later rows whose normalized key was already seen, so the first occurrence of each group wins.

1

Pick the key column

Choose the column that identifies a row — an email, a company name, a SKU. Two rows are near-duplicates when this column matches after normalization, regardless of the other columns.

2

Choose match strength

Strict ignores letter case and collapses runs of whitespace. Loose does all that and also strips punctuation, so "O'Brien" and "OBrien" or "Acme, Inc." and "Acme Inc" merge.

3

Keep the first, drop the rest

The first row in each near-duplicate group is kept in its original form; later matches are removed. You get a count of rows dropped and rows kept.
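
The three steps above can be sketched in plain JavaScript. This is a minimal illustration, not the tool's actual source: the names `normalizeKey`, `findKeyIndex`, and `fuzzyDedupe` are hypothetical, rows are assumed to be already parsed into arrays of strings, and trimming leading and trailing whitespace is an assumption the page does not state.

```javascript
// Hypothetical helpers; the tool's real implementation is not shown on this page.

// Step 2: build the comparison key for the chosen strength.
function normalizeKey(value, strength = "strict") {
  let key = String(value).toLowerCase(); // both strengths ignore case
  if (strength === "loose") {
    key = key.replace(/\p{P}/gu, ""); // loose also strips punctuation
  }
  // Collapse whitespace runs; trimming the ends is an assumption.
  return key.replace(/\s+/g, " ").trim();
}

// Step 1: locate the key column by its header name.
function findKeyIndex(header, columnName) {
  const index = header.indexOf(columnName);
  if (index === -1) throw new Error(`Column not found: ${columnName}`);
  return index;
}

// Step 3: keep the first row per normalized key, drop later matches.
function fuzzyDedupe(rows, keyIndex, strength = "strict") {
  const seen = new Set();
  const kept = [];
  for (const row of rows) {
    const key = normalizeKey(row[keyIndex] ?? "", strength);
    if (seen.has(key)) continue; // later near-duplicate: dropped
    seen.add(key);
    kept.push(row); // first occurrence: kept unmodified
  }
  return { kept, keptCount: kept.length, droppedCount: rows.length - kept.length };
}

const header = ["company", "city"];
const rows = [
  ["Acme, Inc.", "New York"],
  ["acme inc", "Boston"],
  ["ACME Inc.", "Chicago"],
  ["Globex", "Springfield"],
];

const result = fuzzyDedupe(rows, findKeyIndex(header, "company"), "loose");
console.log(result.keptCount, result.droppedCount); // 2 2
console.log(result.kept[0][0]); // Acme, Inc.
```

With `"strict"` instead of `"loose"`, the same sample keeps all four rows, because the commas and periods still count as differences.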

02 · Why ours

Why fuzzy beats exact

Real-world keys are messy. The same customer, vendor, or product shows up spelled three slightly different ways across exports. Exact-match dedupe leaves all three; fuzzy dedupe folds them into one.

01
Catches human entry drift
Hand-typed data picks up stray capitals, double spaces, and trailing periods. Normalizing the key before comparing means those cosmetic differences stop counting as distinct rows: strict handles case and spacing, and loose also handles punctuation.
02
Two strengths, your call
Strict is conservative — it only ignores case and spacing. Loose is aggressive — it also drops punctuation. Pick the level that matches how dirty your key really is.
03
First occurrence wins
Rows are processed top to bottom, so the earliest version of each key survives untouched. Sort your file first if you want a particular row to be the keeper.
04
Private by construction
Everything runs in your browser with plain JavaScript. No upload, no account, no server round-trip — close the tab and the data is gone.

"'Acme, Inc.', 'acme inc', and 'ACME Inc.' are three spellings of one company. With loose matching, fuzzy dedupe keeps the first and drops the other two."

Why near-duplicate keys slip past exact matching

03 · FAQ

Fuzzy dedupe questions.

What counts as a near-duplicate?

Two rows whose key column matches after normalization. Strict strength lowercases the key and collapses whitespace; loose strength additionally removes punctuation. Only the key column is compared — the other columns can differ freely.

Which row gets kept?

The first one. Rows are scanned top to bottom and the earliest occurrence of each normalized key is kept in its original, unmodified form. Every later match is dropped. If you need a specific row to win, sort your file before running the tool.
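
As an illustration of sorting before deduping, the sketch below orders rows newest-first so the most recent record becomes the first occurrence of its key. The column layout (`email`, `updatedAt`) and the sample data are invented for the example, and a strict-style normalization is inlined to keep it self-contained.

```javascript
// Hypothetical rows: [email, updatedAt]. ISO dates sort correctly as plain strings.
const rows = [
  ["ann@example.com", "2023-01-05"],
  ["Ann@Example.com", "2024-03-10"],
  ["bob@example.com", "2022-11-30"],
];

// Sort a copy newest-first so the latest row comes first within each group.
const sorted = [...rows].sort((a, b) => (a[1] < b[1] ? 1 : a[1] > b[1] ? -1 : 0));

// First-occurrence-wins pass using strict-style keys (case and whitespace only).
const seen = new Set();
const kept = sorted.filter((row) => {
  const key = row[0].toLowerCase().replace(/\s+/g, " ").trim();
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});

console.log(kept.length); // 2
console.log(kept[0]); // [ 'Ann@Example.com', '2024-03-10' ]
```

Without the sort, the 2023 row would have survived instead, since it appears first in the original order.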

Does it change the values in my kept rows?

No. Normalization is only used internally to decide which rows match. The rows that survive are written out exactly as they came in — original case, spacing, and punctuation intact.

What's the difference between strict and loose?

Strict ignores case and collapses runs of whitespace, so "Acme Inc" and "acme inc" match. Loose does that and also strips punctuation, so "Acme, Inc." matches "Acme Inc" too. Use loose when your key has inconsistent commas, periods, or apostrophes.
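
To make the difference concrete, here is a small sketch of the two strengths in plain JavaScript. The function name `normalizeKey` is hypothetical, and the exact punctuation set the tool strips is not documented on this page; the sketch assumes the Unicode punctuation class, which covers commas, periods, and apostrophes but not symbols such as `$` or `+`.

```javascript
// Hypothetical normalizer; an assumption about how the two strengths could work.
function normalizeKey(value, strength = "strict") {
  let key = String(value).toLowerCase(); // both strengths ignore case
  if (strength === "loose") {
    key = key.replace(/\p{P}/gu, ""); // loose only: strip punctuation
  }
  // Collapse whitespace runs; trimming the ends is an assumption.
  return key.replace(/\s+/g, " ").trim();
}

// True when two raw keys would be treated as the same row.
const same = (a, b, strength) =>
  normalizeKey(a, strength) === normalizeKey(b, strength);

console.log(same("Acme Inc", "acme   inc", "strict")); // true
console.log(same("Acme, Inc.", "Acme Inc", "strict")); // false
console.log(same("Acme, Inc.", "Acme Inc", "loose")); // true
console.log(same("O'Brien", "OBrien", "loose")); // true
```

One consequence of stripping punctuation is that loose can also merge keys that are genuinely distinct, such as "A.B. Co" and "AB Co", so strict is the safer default when punctuation carries meaning.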

Is my data uploaded anywhere?

No. The entire transform runs client-side in your browser. Your CSV is parsed, deduped, and rebuilt locally — it never touches a server, and there's no account or tracking.
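
For a sense of what parsing and rebuilding a CSV locally involves, here is a minimal sketch in plain JavaScript. It is not the tool's parser: `parseCsv` and `toCsv` are hypothetical names, and the sketch only assumes comma delimiters, double-quoted fields, and doubled quotes as escapes. In a browser, the input text would typically come from a user-selected file, for example via `await file.text()`, which reads the file without sending it anywhere.

```javascript
// Minimal CSV parser sketch: quoted fields, doubled quotes, LF or CRLF line ends.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Rebuild CSV text, quoting only the fields that need it.
function toCsv(rows) {
  const quote = (v) => (/[",\r\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v);
  return rows.map((r) => r.map(quote).join(",")).join("\n");
}

const table = parseCsv('company,city\n"Acme, Inc.",New York\nacme inc,Boston\n');
console.log(table[1][0]); // Acme, Inc.
console.log(toCsv(table));
```

Because quoting is reapplied only where needed, a field like `Acme, Inc.` comes back out wrapped in quotes, while plain fields are written as they were.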
