• N_Muller (10/1/2015)


    I have a CSV file with roughly 6 million rows. The file is unstructured; that is, some rows have 5 fields, others have 15, and there are as many as 50 fields in one row.

    ...

    A standard CSV file typically contains the column headers in the first record, and all records have the same number of comma-separated fields. However, if each record in this particular file contains anywhere from 5 to 50 fields, then it's not clear how you would know which specific fields are populated or how they line up with a column.

    For example, do the records look similar to this, with some records simply having empty fields?

    a,b,c,d,e,,, ...

    a,,,d,e,,, ...

    Or do they look more like this with some records having missing fields?

    a,b,c,d,e,,, ...

    a,d,e ...
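
One way to tell the two cases apart is to count how many fields each record has. If every row has the same count, the fields are merely empty; if the counts vary, fields are actually missing. Below is a minimal Python sketch of that check, not a definitive implementation. The function name `field_count_histogram` and the three sample rows are made up for illustration (the rows are modeled on the examples above, without the trailing "..."), and it assumes a plain comma delimiter with standard quoting.

```python
import csv
import io
from collections import Counter


def field_count_histogram(lines):
    """Map number-of-fields -> number-of-rows for an iterable of CSV lines."""
    counts = Counter()
    # csv.reader accepts any iterable of strings and handles quoted fields.
    for record in csv.reader(lines):
        counts[len(record)] += 1
    return counts


# Hypothetical sample: two rows padded with empty fields, one row with fields missing.
sample = io.StringIO("a,b,c,d,e,,,\na,,,d,e,,,\na,d,e\n")
hist = field_count_histogram(sample)

for n_fields, n_rows in sorted(hist.items()):
    print(f"{n_fields} fields: {n_rows} row(s)")
```

For the real file, you would pass an open file object (opened with `newline=""`) instead of the `StringIO` sample; the reader streams one record at a time, so the 6 million rows do not need to fit in memory.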
