CSV, defined.
The CSV terms that actually cause trouble — delimiters, quoting, encodings, the BOM, line endings — in plain English, with the fix and the tool for each. Everything here runs in your browser; nothing you drop is ever uploaded.
A plain-text format where each line is a record and fields are separated by a delimiter — classically a comma. There is no single official standard; RFC 4180 describes the common conventions, but real-world files bend them constantly.
Because a CSV is just text, almost any tool can read it. That universality is also why two files that look identical can parse differently — they may use different dialects.
Tool: Drop a CSV → dashboard →
The character that separates fields. Comma is the classic choice, but Tab (TSV), semicolon, and pipe (|) are all common. Spreadsheets in many European locales default to a semicolon, because the comma is already the decimal separator there.
A CSV variant that uses a Tab between fields instead of a comma. It is popular precisely because data rarely contains tabs, so you sidestep most quoting headaches when your values themselves contain commas.
Tool: Re-delimit a file →
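As a rough sketch of re-delimiting, the snippet below reads comma-separated text with Python's csv module and writes it back out with tabs. The helper name redelimit and the sample data are made up for illustration.

```python
import csv
import io

def redelimit(text: str, src: str = ",", dst: str = "\t") -> str:
    """Rewrite delimited text with a different delimiter (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=src)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=dst, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

tsv = redelimit('name,address\nAda,"12 Main St, London"\n')
print(repr(tsv))
```

Because the address contains a comma but no tab, it needed quotes in the CSV and needs none in the TSV.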
The first line of the file, naming each column. Most tools treat row one as headers and use those names everywhere downstream. A file without a header forces you to refer to columns by position (column 1, column 2…), which is brittle.
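To illustrate the difference, here is a small sketch using Python's csv module: DictReader treats row one as headers, while the plain reader leaves you counting positions. The sample data is invented.

```python
import csv
import io

text = "id,name\n1,Ada\n2,Linus\n"

# With a header row: refer to columns by name.
by_name = [row["name"] for row in csv.DictReader(io.StringIO(text))]

# Without using the header: refer to columns by position, skipping row one by hand.
rows = list(csv.reader(io.StringIO(text)))
by_position = [row[1] for row in rows[1:]]

print(by_name, by_position)
```

If someone later inserts a column before name, the first version keeps working and the second silently reads the wrong field.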
Per RFC 4180, a field is wrapped in double quotes whenever it contains the delimiter, a double quote, or a line break. A value like Smith, John is therefore written "Smith, John", so that the comma inside it is not mistaken for a field separator.
Tool: Validate quoting →
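A minimal sketch of that rule in practice, using Python's csv module with its default minimal quoting. The values are only examples.

```python
import csv
import io

out = io.StringIO()
# The default quoting mode wraps a field in quotes only when it needs them.
csv.writer(out, lineterminator="\n").writerow(["Smith, John", "Engineer"])
line = out.getvalue()
print(line)

# Reading the line back recovers the original two fields.
parsed = next(csv.reader([line]))
print(parsed)
```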
Inside a quoted field, a literal double quote is escaped by doubling it. A value like 6" nail becomes "6"" nail". Note that backslash escaping (\") is a JSON convention, not a CSV one — assuming it is a frequent source of broken parsing.
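The doubling rule can be seen by letting Python's csv module write and then re-read such a value. This is a sketch with made-up data, relying on the module's default settings.

```python
import csv
import io

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(['6" nail', "box"])
encoded = out.getvalue()
print(encoded)

# The reader collapses the doubled quote back into a single one.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)
```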
The 2005 memo that codified the most common CSV conventions: CRLF line endings, double-quote quoting, and doubled quotes for escaping. It is a de facto reference, not a mandatory standard, so plenty of valid-looking CSV does not strictly follow it.
The bytes that end each row: \r\n (CRLF, the Windows convention RFC 4180 specifies) or \n (LF, the Unix/macOS convention). Mismatched endings are behind a lot of mystery bugs — phantom blank rows, or an entire file read as a single line. Most modern parsers accept either.
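As an illustration of a parser that accepts either ending, the sketch below feeds the same two rows to Python's csv module with CRLF and with LF. The parse helper is hypothetical.

```python
import csv
import io

def parse(text: str) -> list[list[str]]:
    # newline="" passes line endings through untouched so the csv module
    # can interpret them itself.
    return list(csv.reader(io.StringIO(text, newline="")))

crlf_rows = parse("a,b\r\n1,2\r\n")
lf_rows = parse("a,b\n1,2\n")
print(crlf_rows == lf_rows)

# On output, csv.writer ends rows with \r\n unless lineterminator is changed.
out = io.StringIO(newline="")
csv.writer(out).writerow(["a", "b"])
print(repr(out.getvalue()))
```

When reading or writing real files in Python, opening them with newline="" serves the same purpose.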
How characters map to bytes on disk. UTF-8 is the safe modern default; older files are often Windows-1252 or Latin-1. Guess the encoding wrong and accented or non-Latin characters turn to garbage — see mojibake.
Tool: Check a file →
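One common heuristic, sketched below, is to try UTF-8 first and fall back to Windows-1252 only if the bytes are not valid UTF-8. The helper name and the choice of fallback are assumptions, and this is not a general encoding detector.

```python
def decode_csv_bytes(data: bytes) -> tuple[str, str]:
    """Return (text, encoding_used). Hypothetical helper, not a full detector."""
    try:
        # Non-ASCII text in a legacy encoding is rarely valid UTF-8 by accident.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback; bytes undefined in cp1252 become U+FFFD.
        return data.decode("cp1252", errors="replace"), "cp1252"

print(decode_csv_bytes("café".encode("utf-8")))
print(decode_csv_bytes("café".encode("cp1252")))
```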
The dominant text encoding: variable-width, backward-compatible with ASCII, and able to represent every Unicode character. Unless a downstream tool specifically demands otherwise, save your CSV as UTF-8.
An invisible three-byte prefix (EF BB BF) that some applications, Excel most notoriously, prepend to UTF-8 files. It can surface as ï»¿ stuck to your first header, or trip a strict parser into mislabeling column one. Good parsers detect and strip it.
Tool: Detect a BOM →
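In Python, for example, the "utf-8-sig" codec strips a leading BOM if one is present and otherwise behaves like plain UTF-8. The bytes below are a made-up sample.

```python
import codecs
import csv
import io

raw = b"\xef\xbb\xbfid,name\r\n1,Ada\r\n"
has_bom = raw.startswith(codecs.BOM_UTF8)

# Plain UTF-8 keeps the BOM as an invisible U+FEFF glued to the first header.
header_plain = next(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))
# utf-8-sig removes it.
header_clean = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"), newline="")))

print(has_bom, repr(header_plain[0]), repr(header_clean[0]))
```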
Garbled text, such as café rendered as cafÃ©, caused by decoding bytes with the wrong encoding. It is almost always a UTF-8 file read as Latin-1, or the reverse. The fix is to re-open the file declaring the correct encoding, not to hand-edit the broken characters.
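The effect is easy to reproduce. In this small sketch the same bytes are decoded twice, once with the wrong encoding and once with the right one.

```python
data = "café".encode("utf-8")   # the bytes as they sit on disk

wrong = data.decode("latin-1")  # each byte of the two-byte é becomes its own character
right = data.decode("utf-8")    # declaring the correct encoding fixes it

print(wrong, right)
```

The bytes never changed; only the declared encoding did.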
A line break inside a single field — legal only when that field is quoted. Splitting a file naively on newlines breaks on these, scattering one record across several rows. A proper CSV parser keeps the quoted field intact.
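A short sketch of the difference, comparing a naive split on newlines with Python's csv module. The sample text is invented.

```python
import csv
import io

text = 'id,note\n1,"line one\nline two"\n2,ok\n'

# Naive approach: the quoted note is torn across two "rows".
naive = text.splitlines()

# A real parser tracks quotes and keeps the field in one piece.
proper = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive), len(proper))
```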
CSV has no real concept of null. An empty field is simply two delimiters with nothing between them (a,,c). Whether that means "empty string", "zero", or "missing" is left to interpretation — and some exports write a literal NULL or \N instead, which then has to be handled explicitly.
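Handling this explicitly might look like the sketch below. Which tokens count as "missing" is a decision for each dataset, so the NULL_TOKENS set here is only an assumption.

```python
import csv
import io

# Assumption: in this dataset an empty field, NULL, and \N all mean "missing".
NULL_TOKENS = {"", "NULL", r"\N"}

def to_none(value: str):
    return None if value in NULL_TOKENS else value

text = "a,,c\nNULL,\\N,x\n"
rows = [[to_none(v) for v in row] for row in csv.reader(io.StringIO(text))]
print(rows)
```

If an empty string is a meaningful value in your data, drop "" from the set.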
Examining a column's values to decide whether it is a number, date, boolean, or text — so 42 is treated as a number, not the string "42". Because CSV stores everything as text, good tools infer types on import; that is what keeps a dashboard from summing your dates.
Tool: Typed JSON output →
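A very reduced sketch of column-level inference: try a few converters in order and keep the first one that accepts every value. The function names are hypothetical, and real tools handle many more cases (booleans, locale-specific dates, thousands separators).

```python
from datetime import date

def looks_like_id(text: str) -> bool:
    # A leading zero followed by a digit ("00123") suggests a code, not a number.
    return len(text) > 1 and text[0] == "0" and text[1].isdigit()

def infer_column(values: list[str]) -> list:
    """Convert a whole column, or return it unchanged as text."""
    if any(looks_like_id(v) for v in values):
        return list(values)
    for convert in (int, float, date.fromisoformat):
        try:
            return [convert(v) for v in values]
        except ValueError:
            continue  # at least one value did not fit; try the next type
    return list(values)

print(infer_column(["42", "7"]))
print(infer_column(["2024-01-31", "2024-02-01"]))
print(infer_column(["00123", "42"]))
```

Deciding per column rather than per value keeps a stray "n/a" from producing a column that is half numbers and half text.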
The specific combination of delimiter, quote character, escape style, and line ending a particular CSV uses. "CSV" is really a family of dialects, which is exactly why two files that look the same can parse differently.
Guessing a file's delimiter and quote style by scanning the first rows — counting candidate delimiters per line and choosing the one that yields a consistent column count. It is convenient, but genuinely ambiguous files (a single column, or mixed delimiters) can fool it.
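The counting approach described above can be sketched in a few lines. This is a simplified illustration with a hypothetical function name; it ignores quoting entirely. Python's standard library ships a more thorough csv.Sniffer.

```python
from typing import Optional

def sniff_delimiter(sample: str, candidates: str = ",;\t|") -> Optional[str]:
    """Pick the candidate that occurs the same non-zero number of times on every line."""
    lines = [line for line in sample.splitlines() if line][:10]
    best, best_count = None, 0
    for cand in candidates:
        counts = {line.count(cand) for line in lines}
        if len(counts) == 1:  # consistent column count across lines
            count = counts.pop()
            if count > best_count:
                best, best_count = cand, count
    return best

print(repr(sniff_delimiter("a;b;c\n1;2;3\n")))
print(repr(sniff_delimiter("single\ncolumn\n")))
print(repr(sniff_delimiter("1,5;2,7\n3,1;4,2\n")))
```

The last call shows how it can be fooled: a semicolon-delimited file of decimal-comma numbers has a perfectly consistent comma count, so this sketch picks the comma.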
Excel opens CSV happily but applies its own rules: it converts long numbers to scientific notation (and keeps only 15 significant digits), drops leading zeros (00123 → 123), reinterprets dates by your locale, and writes a BOM when you save as "CSV UTF-8". Round-tripping data through Excel can quietly change it, so prefer a converter when fidelity matters.
Tool: CSV ↔ Excel →
A delimiter-free layout where every field occupies a fixed range of character positions, padded with spaces. It is common in legacy and mainframe exports and needs the column positions — not a delimiter — to parse correctly.
Tool: Fixed-width → CSV →
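Parsing by position might look like the sketch below. The LAYOUT table (column names and character ranges) is entirely hypothetical; a real file's positions come from its specification.

```python
import csv
import io

# Hypothetical layout: (column name, start, end), with end exclusive.
LAYOUT = [("id", 0, 5), ("name", 5, 15), ("city", 15, 25)]

def fixed_width_to_csv(text: str) -> str:
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in LAYOUT])
    for line in text.splitlines():
        # Slice each field out by position, then drop the space padding.
        writer.writerow([line[start:end].strip() for _, start, end in LAYOUT])
    return out.getvalue()

sample = (
    f"{'00042':<5}{'Ada':<10}{'London':<10}\n"
    f"{'00043':<5}{'Linus':<10}{'Helsinki':<10}\n"
)
result = fixed_width_to_csv(sample)
print(result)
```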
Newline-delimited JSON: one self-contained JSON object per line. It is the streaming-friendly cousin of CSV for records that have nested structure, and it appends cleanly — you can add a line without rewriting the file.
Tool: CSV → NDJSON →
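A minimal sketch of the conversion using only Python's standard library. The function name is made up, and values stay as strings because no type inference is applied here.

```python
import csv
import io
import json

def csv_to_ndjson(text: str) -> str:
    """Emit one JSON object per CSV record, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in reader)

ndjson = csv_to_ndjson("id,name\n1,Ada\n2,Linus\n")
print(ndjson)

# Each line parses on its own, which is what makes the format streamable.
records = [json.loads(line) for line in ndjson.splitlines()]
```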
Collapsing nested data (objects and arrays) into flat columns using dot notation — user.address.city. It is how nested JSON, YAML, or XML becomes a CSV, and it is lossy: the original hierarchy cannot be reconstructed from the flat columns alone, which is why structure-preserving conversions are a different problem.
Tool: JSON → CSV →
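Dot-notation flattening can be sketched as a short recursive function. Using the list index as a path segment (tags.0, tags.1) is one convention among several, and the flatten name is hypothetical.

```python
def flatten(value, prefix: str = "") -> dict:
    """Collapse nested dicts and lists into a single level of dotted keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}  # a leaf: emit one column
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat

record = {"user": {"name": "Ada", "address": {"city": "London"}}, "tags": ["a", "b"]}
columns = flatten(record)
print(columns)
```

The lossiness is visible here too: flatten({"a.b": 1}) and flatten({"a": {"b": 1}}) produce the same single column, so the flat form cannot tell them apart.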
When a field's own text contains the delimiter — a comma inside an address, say. Correct quoting resolves it; an unquoted collision is the number-one cause of "my columns are shifted by one".
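To see the shift, compare the same record with and without quotes around the address. This is an illustrative sketch with invented data.

```python
import csv

header = next(csv.reader(["name,address,city"]))

unquoted = next(csv.reader(["Ada,12 Main St, Apt 4,London"]))
quoted = next(csv.reader(['Ada,"12 Main St, Apt 4",London']))

# A field count that differs from the header is the telltale sign of a collision.
print(len(header), len(unquoted), len(quoted))
```

In the unquoted row, London lands in a fourth column that the header does not have.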
Run into one of these in the wild? Validate the file, convert it with types intact, or drop it for a dashboard.
Troubleshooting guides: why CSVs open wrong in Excel · fixing garbled encoding · delimiters explained.