
CSV Format Specification & Converter Guide

CSV (Comma-Separated Values) is a plain-text format for representing tabular data. Each record normally occupies one line, and values within a record are separated by a delimiter, typically a comma. The first row often contains column headers. Despite its simplicity, CSV remains one of the most widely used formats because of its near-universal compatibility and human readability.

Introduction

CSV has been the backbone of data interchange for decades. It is supported by virtually all spreadsheet applications (Excel, Google Sheets, LibreOffice), databases, and programming languages. The format's simplicity makes it ideal for exporting, importing, and manually editing tabular data.

Key Features

  • Universal Compatibility: Supported by all major spreadsheet and database applications.
  • Human-Readable: Plain text format that can be edited with any text editor.
  • Lightweight: Minimal overhead, efficient for large datasets.
  • Streamable: Can be processed record by record without loading the entire file.
  • Simple Structure: Easy to parse and generate programmatically.
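
As a rough illustration of the streaming point above, the sketch below reads one record at a time with Python's standard csv module. The function name average_age and the inline sample data are assumptions made for this example. Reading is done per record rather than per physical line, because a quoted field may contain a line break.

```python
import csv
import io

def average_age(lines):
    """Average the 'age' column while holding only one record in memory."""
    total = 0
    count = 0
    # DictReader pulls from the underlying iterable lazily, one record per step.
    for record in csv.DictReader(lines):
        total += int(record["age"])
        count += 1
    return total / count if count else None

# Illustrative in-memory data; a real file would be opened with
# open(path, newline="", encoding="utf-8") and passed in the same way.
sample = io.StringIO(
    "name,age,city\nAlice,30,New York\nBob,25,London\nCharlie,35,Tokyo\n"
)
print(average_age(sample))  # prints 30.0
```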

Syntax

CSV uses a simple row-column structure with values separated by delimiters.

Basic Structure

name,age,city
Alice,30,New York
Bob,25,London
Charlie,35,Tokyo

RFC 4180 Standard

RFC 4180 is an informational document that describes the most common CSV conventions:

  • Files may optionally include a header row
  • Records are separated by line breaks (CRLF in RFC 4180; many parsers also accept a bare LF)
  • Fields containing delimiters, quotes, or line breaks must be quoted
  • Quotes within quoted fields must be escaped by doubling ("")

Quoting and Escaping

Fields containing special characters must be properly quoted:

name,message,date
Alice,"Hello, world!",2024-01-15
Bob,"She said ""yes""",2024-01-16
Charlie,"Line 1
Line 2",2024-01-17
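
To see how a parser interprets these rules, here is a minimal sketch using Python's standard csv module on the sample above. The helper name parse_rows is an assumption for this example; the csv.reader defaults (comma delimiter, double-quote quoting, doubled quotes as escapes) match the conventions described here.

```python
import csv
import io

# The sample from the text: a quoted comma, a doubled quote, and an embedded line break.
SAMPLE = (
    'name,message,date\r\n'
    'Alice,"Hello, world!",2024-01-15\r\n'
    'Bob,"She said ""yes""",2024-01-16\r\n'
    'Charlie,"Line 1\nLine 2",2024-01-17\r\n'
)

def parse_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # newline="" leaves line endings untouched so the csv module can handle
    # line breaks inside quoted fields itself.
    return list(csv.reader(io.StringIO(text, newline="")))

rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

The quotes are removed during parsing: the second field of Bob's row comes back as She said "yes", and Charlie's message is a single field containing a line break, so the four physical data lines yield three data records.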

Delimiter Variations

While named "comma-separated," CSV files can use different delimiters:

Delimiter        Usage
, (comma)        Most common; the delimiter described in RFC 4180
; (semicolon)    Common in regions where the comma is the decimal separator
\t (tab)         TSV (Tab-Separated Values) variant
| (pipe)         Less common, useful when data contains commas
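
The sketch below shows why the delimiter has to be known before parsing. It reads the same semicolon-delimited text twice with Python's csv module, once with the right delimiter and once with the default comma. The helper read_delimited and the sample prices are made up for illustration.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Parse delimited text into rows using an explicitly chosen delimiter."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

# Semicolon-delimited data with decimal commas, as exported in some locales.
semicolon_text = "name;price\nWidget;3,50\nGadget;12,00\n"

correct = read_delimited(semicolon_text, delimiter=";")
wrong = read_delimited(semicolon_text)  # default comma splits the prices instead

print(correct)  # [['name', 'price'], ['Widget', '3,50'], ['Gadget', '12,00']]
print(wrong)    # [['name;price'], ['Widget;3', '50'], ['Gadget;12', '00']]
```

The standard library also offers csv.Sniffer for guessing a dialect from a sample, but it is heuristic, so stating the delimiter explicitly is usually the safer choice.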

Common Challenges

No Type Preservation

CSV is schema-less and stores everything as text. Type interpretation is left to the consuming application:

  • Numbers stored as strings: "30" not 30
  • Booleans as strings: "true" not true
  • No native null values (often represented as empty strings)
  • Dates stored as formatted strings (no standard format)
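
Because types are not stored in the file, a consumer that wants them has to guess. The following is one possible sketch of such a guess in Python. The function infer_value and its rules (empty string to None, true/false to booleans, then int, then float) are assumptions for illustration, not a standard.

```python
def infer_value(text):
    """Best-effort conversion of one CSV field to a Python value."""
    if text == "":
        return None  # assumption: an empty field means "missing"
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text  # anything else (names, dates, ...) stays a string

row = ["Alice", "30", "1.75", "true", "", "2024-01-15"]
print([infer_value(field) for field in row])
# ['Alice', 30, 1.75, True, None, '2024-01-15']
```

Guessing like this can lose information. For example, int("007") yields 7, which destroys identifiers such as postal codes, so an explicit per-column schema is often the better option.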

Missing Values

CSV has no standard representation for null or missing values. Common conventions include:

  • Empty string: ,,
  • Literal NULL or null
  • Special placeholder like N/A or -

Nested Data Limitations

CSV is inherently flat and cannot natively represent nested structures. Workarounds include:

  • Flatten nested objects with dot notation (user.address.city)
  • Use multiple CSV files with foreign keys
  • Store nested data as escaped JSON strings (anti-pattern)
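
The dot-notation workaround can be sketched as a short recursive function. The name flatten and the sample record are illustrative assumptions. The sketch handles nested objects only, and it assumes keys do not themselves contain dots.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path built so far as the prefix.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

record = {"user": {"name": "Alice", "address": {"city": "Tokyo"}}, "active": True}
print(flatten(record))
# {'user.name': 'Alice', 'user.address.city': 'Tokyo', 'active': True}
```

Note that an empty nested object contributes no columns in this sketch, so it cannot be reconstructed from the flattened form.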

CSV in This Converter

Parsing (CSV → JSON)

Our converter transforms CSV into a JSON array of objects, using the header row for the keys and keeping every value as a string:

name,age,city
Alice,30,New York
Bob,25,London

Becomes:

[
  {
    "name": "Alice",
    "age": "30",
    "city": "New York"
  },
  {
    "name": "Bob",
    "age": "25",
    "city": "London"
  }
]
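
For readers who want to reproduce this mapping locally, the following Python sketch produces the same output for the sample above. It is not the converter's actual implementation; the function name csv_to_json is an assumption, and only standard-library modules are used.

```python
import csv
import io
import json

def csv_to_json(text):
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the first row as keys; every value stays a string.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return json.dumps(list(reader), indent=2)

sample = "name,age,city\nAlice,30,New York\nBob,25,London\n"
print(csv_to_json(sample))
```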

Serialization (JSON → CSV)

Converting JSON to CSV involves flattening nested structures:

  • Flat objects are converted directly to rows
  • Nested objects are flattened using dot notation
  • Arrays are converted to multiple rows or serialized as strings
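
One way the reverse direction might look is sketched below in Python. It is an illustration under stated assumptions, not the converter's own code: objects are assumed to be flat already (or flattened to dot-notation keys beforehand), arrays are serialized as JSON strings rather than expanded into multiple rows, and the header is the union of all keys in first-seen order. The name json_to_csv is hypothetical.

```python
import csv
import io
import json

def json_to_csv(records):
    """Serialize a list of flat dicts to CSV text; lists become JSON strings."""
    # Build the header from every key seen, preserving first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO(newline="")
    # restval fills in columns that a given record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for record in records:
        writer.writerow({
            key: json.dumps(value) if isinstance(value, list) else value
            for key, value in record.items()
        })
    return out.getvalue()

records = [
    {"name": "Alice", "tags": ["a", "b"]},
    {"name": "Bob", "city": "London"},
]
print(json_to_csv(records))
```

The serialized array contains commas and quotes, so the writer wraps it in quotes and doubles the inner quotes, following the escaping rules described earlier.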

Best Practices

  • Always include headers - Makes data self-documenting
  • Use UTF-8 encoding - Ensures international character support
  • Quote consistently - Quote all text fields or only when necessary
  • Validate data - Check for proper escaping and delimiter consistency
  • Document your dialect - Specify delimiter, encoding, and escaping rules
  • Handle nulls explicitly - Define how missing values are represented
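
Several of these practices can be combined in a small writer. The sketch below uses Python's csv module. The function write_csv, its parameters, and the sample rows are assumptions for illustration. It always writes a header, makes the dialect explicit (comma delimiter, CRLF line endings, a chosen quoting mode), replaces None with a documented marker, and returns UTF-8 bytes.

```python
import csv
import io

def write_csv(header, rows, quote_all=False, null_marker=""):
    """Write rows as CSV with an explicit dialect and return UTF-8 bytes."""
    out = io.StringIO(newline="")
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(out, delimiter=",", quoting=quoting, lineterminator="\r\n")
    writer.writerow(header)
    for row in rows:
        # Missing values are written using one explicit, documented marker.
        writer.writerow([null_marker if value is None else value for value in row])
    return out.getvalue().encode("utf-8")

header = ["name", "note"]
rows = [["Zoë", None], ["Bob", "a,b"]]
minimal = write_csv(header, rows)
quoted = write_csv(header, rows, quote_all=True)
print(minimal.decode("utf-8"))
print(quoted.decode("utf-8"))
```

With minimal quoting only the field containing a comma is quoted, while quote_all=True quotes every field. Either is a consistent choice as long as it is applied to the whole file.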

When to Use CSV

Good For:

  • Simple tabular data (spreadsheets, database exports)
  • Data exchange between different systems
  • Large datasets with consistent structure
  • Human-readable data needing manual editing
  • Universal compatibility requirements

Avoid For:

  • Complex nested or hierarchical data
  • Data requiring type preservation
  • Binary data
  • Data with inconsistent schemas
  • Performance-critical parsing scenarios

Related Formats

Compare CSV with other data formats:

  • JSON - Better for nested data and type preservation
  • TOON (Token-Oriented Object Notation) - Optimized for LLMs to save tokens, particularly efficient for tabular data
  • YAML - Often used for configuration files where human readability is paramount
  • XML - A markup language with a strict syntax, often used in enterprise systems
  • TOML - A minimal configuration file format
