Quoting Rules
Fields containing comma, quote, or newline MUST be wrapped in double quotes
Double quotes inside quoted fields escape as "" (two quotes), not backslash
Unquoted fields with leading/trailing spaces: some parsers trim, some don't; quote to preserve
Empty field (,,) vs empty string (,"",): semantically different; be explicit
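A minimal sketch of these quoting rules using Python's standard csv module. The sample rows are made up for illustration, and QUOTE_ALL is chosen here only because it is the simplest way to keep padded fields intact across parsers; other quoting modes may suit a given consumer better.

```python
import csv
import io

# Illustrative data only: each row exercises one of the quoting rules.
rows = [
    ["id", "note"],
    ["1", 'She said "hi", then left'],  # comma and embedded double quotes
    ["2", "line one\nline two"],        # newline inside a field
    ["3", "  padded  "],                # leading/trailing spaces to preserve
]

# QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
text = buf.getvalue()

# Reading the text back should reproduce the original rows.
round_trip = list(csv.reader(io.StringIO(text, newline="")))
print(text)
```

The embedded quotes come out doubled ("") rather than backslash-escaped, and the padded field keeps its spaces because it is quoted.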
Delimiters
CSV isn't always comma-delimited: also detect ; (European Excel), \t (TSV), and | (legacy systems)
Excel exports use system locale delimiter; semicolon common in non-US regions
Sniff delimiter from first line but verify—header might not contain special chars
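One way to "sniff but verify" is to try each candidate delimiter on several lines and accept only one that yields the same multi-column width on every row. The sketch below is a hand-rolled heuristic, not a standard algorithm; detect_delimiter, its candidate list, and the comma fallback are all assumptions made for illustration.

```python
import csv
import io

def detect_delimiter(sample, candidates=(",", ";", "\t", "|"), default=","):
    """Return the candidate that splits every sampled row into the same
    number of columns (more than one); fall back to `default` otherwise."""
    best = None  # (delimiter, column_count)
    for delim in candidates:
        reader = csv.reader(io.StringIO(sample, newline=""), delimiter=delim)
        rows = [row for row in reader if row]  # skip blank lines
        if not rows:
            continue
        widths = {len(row) for row in rows}
        if len(widths) != 1:
            continue  # inconsistent column count: not the real delimiter
        width = widths.pop()
        if width > 1 and (best is None or width > best[1]):
            best = (delim, width)
    return best[0] if best else default

sample = "name;city;amount\nAnna;Berlin;1,5\nBob;Paris;2,25\n"
print(repr(detect_delimiter(sample)))
```

Checking more than the header matters here: the decimal commas in the data rows would mislead a detector that only counted characters, while the header alone contains no comma at all.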
Encoding
UTF-8 BOM (0xEF 0xBB 0xBF) breaks naive parsers but Excel needs it for UTF-8 detection
When generating for Excel on Windows: add BOM; for programmatic use: omit BOM
Latin-1 vs UTF-8 ambiguity—explicitly declare or detect encoding before parsing
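In Python, the utf-8-sig codec covers both BOM cases: it writes the BOM on output and strips it, if present, on input. The sketch below also shows a crude UTF-8-then-Latin-1 fallback; write_for_excel, read_rows, and decode_csv_bytes are hypothetical helper names, and the fallback is a heuristic that can mislabel other single-byte encodings.

```python
import csv
import os
import tempfile

def write_for_excel(path, rows):
    # utf-8-sig prepends the BOM (EF BB BF) that Excel uses to detect UTF-8.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

def read_rows(path):
    # utf-8-sig drops a leading BOM if present and reads plain UTF-8 otherwise.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f))

def decode_csv_bytes(raw, fallback="latin-1"):
    # Heuristic: strict UTF-8 first; on failure assume the fallback encoding.
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode(fallback)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.csv")
    write_for_excel(path, [["name"], ["José"]])
    with open(path, "rb") as f:
        starts_with_bom = f.read(3) == b"\xef\xbb\xbf"
    rows_back = read_rows(path)
```

Reading the same file with plain utf-8 instead would leave the BOM character (U+FEFF) glued to the first header name, which is the typical "naive parser" failure.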
Common Parsing Failures
Newlines inside quoted fields are valid: don't split on \n before parsing
Unescaped quote in middle of field corrupts rest of file: validate early
Trailing newline at EOF—some parsers create empty last row; strip or handle
Inconsistent column count per row—validate all rows match header count
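The checks above can be sketched with Python's csv module, which handles newlines inside quoted fields when given the whole stream rather than pre-split lines. validate_rows is a hypothetical helper; strict=True is used so that a stray quote after a closing quote raises csv.Error instead of being silently absorbed.

```python
import csv
import io

def validate_rows(text):
    """Parse CSV text and report rows whose width differs from the header.

    Returns (header, rows, problems), where problems is a list of
    (source_line_number, column_count) tuples.
    """
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    header = next(reader)
    rows, problems = [], []
    for row in reader:
        if not row:
            continue  # blank line, e.g. an extra newline at EOF
        if len(row) != len(header):
            # line_num is the last physical line consumed for this record
            problems.append((reader.line_num, len(row)))
        rows.append(row)
    return header, rows, problems

text = 'id,note\n1,"two\nlines"\n2,a,b\n\n'
header, rows, problems = validate_rows(text)
print(problems)
```

Here the second record spans two physical lines but counts as one row, the trailing blank line is skipped, and the third record is flagged for having three columns against a two-column header.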
Numbers & Dates
1,234.56 vs 1.234,56: locale-dependent; standardize or document format
Dates: ISO 8601 (2024-01-15) is the only unambiguous format; 01/02/24 is chaos (January 2, February 1, or 2001-02-24, depending on the reader)
Leading zeros in numeric fields (007): quote to preserve or document as string
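A small sketch of handling these cases on the reading side, assuming the number format is known from documentation rather than guessed. parse_decimal is a hypothetical helper, and its separator parameters are assumptions the caller must supply.

```python
from datetime import date
from decimal import Decimal

def parse_decimal(text, thousands=",", decimal_point="."):
    """Convert a formatted number string to Decimal using declared separators."""
    cleaned = text.replace(thousands, "").replace(decimal_point, ".")
    return Decimal(cleaned)

us_value = parse_decimal("1,234.56")
eu_value = parse_decimal("1.234,56", thousands=".", decimal_point=",")

# ISO 8601 dates parse and serialize without any locale knowledge.
parsed = date.fromisoformat("2024-01-15")
serialized = parsed.isoformat()

# Identifiers with leading zeros stay strings; converting to int drops the zeros.
agent_code = "007"
as_number = int(agent_code)
```

Both number strings resolve to the same Decimal value once the separators are declared, which is the point of standardizing or documenting the format up front.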
Excel Quirks
Formula injection: fields starting with =, +, -, @ execute as formulas; prefix with ' or tab
Long numbers (>15 digits) lose precision: quote and format as text
Scientific notation triggered by E in numbers: quote if literal text needed
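The formula-injection mitigation can be sketched as a small pre-write filter. neutralize_formula and FORMULA_TRIGGERS are hypothetical names; the apostrophe prefix follows the rule above, and the sketch deliberately treats every leading trigger character as risky, which also affects legitimate values such as negative numbers.

```python
import csv
import io

FORMULA_TRIGGERS = ("=", "+", "-", "@")

def neutralize_formula(value):
    """Prefix values that a spreadsheet would evaluate as formulas."""
    if value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value

untrusted = ["=SUM(A1:A9)", "@cmd", "plain text", "-5", ""]
safe = [neutralize_formula(v) for v in untrusted]

# Write the sanitized row as usual.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(safe)
print(buf.getvalue())
```

Because "-5" is also prefixed, a filter like this is best applied only to free-text columns from untrusted sources, with numeric columns validated separately.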