'Comedy Limited' with SQL Server

,

The comma-separated value (CSV) file format is the standard way of transferring data between databases, even different RDBMSs, and between applications and databases. When done properly, it is the best, probably the only, convention for representing tabular data in a text file. It works.

SQL Server does many things with great elegance, but it can't do CSV properly. A standard for CSV has been published since 2005, RFC 4180. Other RDBMSs can do it well. PostgreSQL's CSV support implements the RFC specification as well as a host of dialects and variants. If it detects the slightest problem, it abandons the import with a helpful error message. MongoDB imports CSV straight into binary JSON (BSON) without even blushing. I've never hit a problem with either.

MS SQL Server can neither bulk-import nor bulk-export CSV. As the documentation says:

"Comma-separated value (CSV) files are not supported by SQL Server bulk-import operations. However, in some cases you can use a CSV file as the data file for a bulk import of data into SQL Server. Note that the field terminator of a CSV file does not have to be a comma."

Yes. SQL Server has its own dialect of CSV that is vaguely described here. It is known by many DBAs and Database Developers as the 'Comedy-Limited' format. It can result in data being quietly corrupted in ways that are difficult to detect (a proper importer should 'fail fast' and volubly). Many database people waste a great deal of time writing PowerShell or Python routines to correct the CSV that SQL Server produces, or converting proper standard CSV into SQL Server 'Comedy-Limited' format.

SQL Server's failure to support CSV properly has allowed a whole third-party tool industry to thrive. There are organizations dedicated to providing tools, such as CSVLint, to validate your CSV and promote the use of schemas (ironically, in JSON format).

Just don't get me started about the tool in SSMS called the import flat file wizard (Tasks…| Import flat file …). I get a strange jolt of surprise when it even manages to detect CSV properly. So often, one reflexively reaches for PowerShell just to do the job properly, and for the purposes of anger-management.

I realise that the SQL Server team has problems with text-based bulk import. As well as its own backward-compatibility problems with BCP and other bulk import and export methods, it must be compatible with MS Office too. However, this is a fairly trivial problem, at least compared with hosting SQL Server on Linux, for example, and there must be enough bad-tempered veterans like me who would help out with an open source project. After all, we've all had plenty of experience fixing recalcitrant data files.

Phil Factor.

Rate

5 (5)

Share

Share

Rate

5 (5)