TSV

  • Comments posted to this topic are about the item TSV

  • I prefer pipe-delimited over tab out of habit, but the prevalence of commas inside data fields has made me utterly despise csv. Excel's poor handling of csv files certainly has not helped in that regard.


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • It's a funny thing  how people crab about the way Excel handles data that has a comma in a "field".  It actually works exactly the way the CSV standards say such a thing should work.
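    For anyone who wants to see the quoting rule in action, here is a minimal sketch using Python's standard `csv` module. It assumes the "CSV standards" in question are the RFC 4180 style rules (a field containing a comma or a double quote is wrapped in double quotes, and embedded quotes are doubled). The sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: one field contains a comma, another contains quotes.
rows = [["id", "name", "note"], ["1", "Smith, John", 'said "hi"']]

# Write the rows as CSV text; the default dialect quotes only the fields that need it.
buf = io.StringIO()
csv.writer(buf, lineterminator="\r\n").writerows(rows)
text = buf.getvalue()

# Read the text back; the quoted comma stays inside its field.
parsed = list(csv.reader(io.StringIO(text)))
print(text)
print(parsed == rows)
```

    The embedded comma survives the round trip because the writer wrapped that field in double quotes, which is the same convention Excel follows when it opens the file.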

    CSV, TSV, Pipe delimited... they're actually all faulty, although I prefer TSV out of those 3.  I actually prefer ASCII characters 28 thru 31 because they're really hard for users to mistakenly get into a "field" and they offer much more control in a file without going nuts.   Characters 1 thru 4 also came in real handy, as well.  With that, I'll say...

    And don't forget character 7 to alert the operator when things are all done. 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @JeffModen - that would be great, except most don't even know what character 7 is, let alone 28 through 31, or how they were used.  Or why we used CR (character 13) and then LF (character 10) when printing to our line printers.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    @JeffModen - that would be great, except most don't even know what character 7 is, let alone 28 through 31, or how they were used.  Or why we used CR (character 13) and then LF (character 10) when printing to our line printers.

    That's because we've not yet made "ASCII GREAT AGAIN".  Most should learn it.  I'm hoping people might take the time because it's actually quite powerful in what it can do in those first 32 characters (0 thru 31).

    And (sigh), since I know people won't take the time to search for it, here's a good quick reference just to start with.

    https://www.asciitable.com/

    Here's a whole lot more on the subject.  ASCII is also the basis for Unicode: the first 128 Unicode code points are the ASCII characters.  In a 2-byte encoding such as UTF-16, an ASCII character is stored as a "00" byte plus a byte holding the ASCII code (the "00" is not a country code), while UTF-8 stores it in a single byte, exactly as ASCII does.

    https://en.wikipedia.org/wiki/ASCII

    Character 31 (Unit Separator) would be used instead of a comma or tab separator and character 30 (Record Separator) would be used instead of carriage return (character 13), line feed (character 10), or the combination.  Character 29 (Group Separator) and character 28 (File Separator) sit above those for larger groupings.
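    As a rough illustration of that layout, here is a minimal sketch in Python. It uses character 31 (Unit Separator) between fields and character 30, which the ASCII standard names Record Separator, between records. The helper names `dumps` and `loads` and the sample rows are hypothetical, and the sketch assumes no field ever contains either control character.

```python
US = "\x1f"  # character 31, Unit Separator: goes between fields
RS = "\x1e"  # character 30, Record Separator: goes between records


def dumps(rows):
    """Join fields with US and records with RS (no quoting or escaping needed)."""
    return RS.join(US.join(fields) for fields in rows)


def loads(text):
    """Split on RS to get records, then on US to get fields."""
    return [record.split(US) for record in text.split(RS)]


# Hypothetical rows holding the characters that usually break CSV, TSV, and pipe files.
rows = [
    ["1", "Smith, John", "line one\r\nline two"],
    ["2", "tab\there", "a|b"],
]

encoded = dumps(rows)
decoded = loads(encoded)
print(decoded == rows)
```

    Commas, tabs, pipes, and even CR/LF pass through untouched because none of them are delimiters here.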

    The normal ASCII characters take only 7 bits and so we used to use the 8th bit as a parity bit (a one-bit checksum) for each byte.
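    A minimal sketch of that 8th-bit check, assuming even parity (the convention varied by system, so treat that choice as an assumption). The function names are hypothetical.

```python
def add_even_parity(code: int) -> int:
    """Return the 7-bit ASCII code with bit 7 set so the total count of 1 bits is even."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit ASCII code")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def check_even_parity(byte: int) -> bool:
    """True when the 8-bit value has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' has three 1 bits, so the parity bit gets set
damaged = sent ^ 0b00000100        # flip a single bit "in transit"
print(hex(sent), check_even_parity(sent), check_even_parity(damaged))
```

    A single parity bit only detects an odd number of flipped bits; it cannot say which bit went bad.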

    Some of the folks I worked with took it to the next level with "Hamming Codes", which can actually locate and correct a bad bit.  See the following very well done, super interesting YouTube video on the subject.  It's amazing that "self correction" can be so simple and that it can actually be done with a hardware-only solution... which can be nasty fast.

    https://www.youtube.com/watch?v=X8jsijhllIA

    Of course, there are more modern algorithms, but knowing what Hamming Codes were able to do way-back-when is amazing, and some modems had this type of thing (self-correction) built in.  The really cool part is that the bigger the block, the more efficient it is.

    As a bit of a spoiler to whet your appetite, the check can be reduced to a single line of incredibly simple code.  That is taught in "part 2" and that's also where they get into bigger blocks.
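    For the curious, here is a sketch of one well-known one-line formulation of the Hamming check: XOR together the positions of all the 1 bits, and the result is the position of a single flipped bit (0 means nothing was detected). It may not match the video's code exactly. The 16-bit block layout (parity bits at positions 1, 2, 4, and 8, overall parity at position 0) and the names `error_position` and `encode` are assumptions made for illustration.

```python
from functools import reduce
from operator import xor


def error_position(bits):
    """The one-line check: XOR of the indexes of every 1 bit in the block."""
    return reduce(xor, (i for i, bit in enumerate(bits) if bit), 0)


def encode(data_bits):
    """Place 11 data bits into a 16-bit block and set the parity bits."""
    block = [0] * 16
    data_positions = [i for i in range(16) if i not in (0, 1, 2, 4, 8)]
    for pos, bit in zip(data_positions, data_bits):
        block[pos] = bit
    syndrome = error_position(block)
    for p in (1, 2, 4, 8):
        if syndrome & p:
            block[p] = 1           # turning this bit on cancels p out of the XOR
    block[0] = sum(block) % 2      # overall parity; not used by the one-line check
    return block


block = encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0])
clean = error_position(block)      # 0 for an undamaged block

block[10] ^= 1                     # simulate one bit flipped in transit
found = error_position(block)      # index of the flipped bit
block[found] ^= 1                  # flip it back to repair the block
print(clean, found, error_position(block))
```

    The one-liner alone cannot see a flip at position 0 and cannot tell one error from two; that is the job of the overall parity bit stored there.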

    --Jeff Moden


