• GSquared (7/20/2016)


    I built something very similar, but it uses XML instead of EAV, and NULLs out empty columns. Used less space and XML parsing was faster than EAV reconstitution. (I got the idea from Sparse Columns tables, which use XML for their actual storage.)

    I did build a parser for XML with the same technology, and there it worked far more efficiently: within XML, each '<' you encounter starts something, and within the 'fields' you will never encounter a '<', because it is escaped as the '&lt;' entity. So splitting up the XML is a one-step process. Also, the end of line is just whitespace and does not signify anything.

    The code for this was relatively easy to build in Transact-SQL without using RBAR.
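
    As a rough, minimal sketch of that idea (not the actual parser code; @xml and the #Fragments table are invented for the example), a tally-based split on '<' in T-SQL could look something like this:

    -- Sketch only: split an XML string on '<' without a loop (no RBAR).
    DECLARE @xml varchar(max) =
        '<root><row><name>Smith &lt; Jones</name><qty>3</qty></row></root>';

    WITH Tally AS
    (   -- numbers 1..LEN(@xml), generated set-based
        SELECT TOP (LEN(@xml))
               n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b
    )
    SELECT  FragmentNo = ROW_NUMBER() OVER (ORDER BY t.n),
            Fragment   = SUBSTRING(@xml, t.n + 1,
                           ISNULL(NULLIF(CHARINDEX('<', @xml, t.n + 1), 0),
                                  LEN(@xml) + 1) - t.n - 1)
    INTO    #Fragments
    FROM    Tally t
    WHERE   SUBSTRING(@xml, t.n, 1) = '<';   -- every '<' starts a fragment

    SELECT * FROM #Fragments ORDER BY FragmentNo;

    Each fragment then begins with a tag name or '/', possibly followed by the element's text, which is exactly the one-step split described above.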

    One advantage was that, with hardly any coding after (partly) parsing the XML, it was very easy to create output that was 'beautified' with correct indents etc.
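
    Continuing with the #Fragments table from the sketch above (again only an illustration, ignoring declarations, comments and self-closing tags), the indentation can be derived from a running count of opening and closing tags:

    -- Sketch only: re-emit the fragments with an indent per nesting level.
    SELECT  Pretty = REPLICATE('  ',
                SUM(CASE WHEN Fragment LIKE '/%' THEN -1 ELSE 1 END)
                    OVER (ORDER BY FragmentNo ROWS UNBOUNDED PRECEDING)
              - CASE WHEN Fragment LIKE '/%' THEN 0 ELSE 1 END)
            + '<' + Fragment
    FROM    #Fragments
    ORDER BY FragmentNo;

    DROP TABLE #Fragments;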

    My solution keeps a record of the rows received, and assigns a hash value to each, since many of the files have very low delta rates and send the exact same rows day after day after day. Dramatically reduced the amount of processing needed for subsequent files after the first one.

    But a hash is only a guess that something is the same. How did you determine that it was actually the same?
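
    One way this is often handled (just a sketch, I do not know whether this is what you did; the tables and columns below are invented): use the hash only as a cheap filter and, on a hash match, compare the full row before treating it as a duplicate.

    -- Sketch only: hash as a fast filter, full comparison as the final proof.
    CREATE TABLE #KnownRows  (RowHash varbinary(32) NOT NULL, RawLine varchar(8000) NOT NULL);
    CREATE TABLE #StagedRows (RawLine varchar(8000) NOT NULL);

    INSERT #KnownRows  (RowHash, RawLine)
    VALUES (HASHBYTES('SHA2_256', 'A;B;C'), 'A;B;C');

    INSERT #StagedRows (RawLine)
    VALUES ('A;B;C'), ('A;B;D');

    SELECT  s.RawLine,
            IsDuplicate = CASE WHEN k.RawLine IS NOT NULL THEN 1 ELSE 0 END
    FROM    #StagedRows s
    LEFT JOIN #KnownRows k
           ON k.RowHash = HASHBYTES('SHA2_256', s.RawLine)  -- cheap filter
          AND k.RawLine = s.RawLine;                        -- exact confirmation

    DROP TABLE #StagedRows, #KnownRows;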

    But mine's a specialized solution for a specific datastream.

    Most generic solutions start off as a specialized solution. In my case six specialized solutions are required, and I have only seen two of them, so I thought building a generalized solution would be a good start for all of them.

    (Within my XML solution I also created and removed temporary and static tables.)

    Thanks for sharing your information.

    Ben

    (Edit: Indents instead of indexes for the beautifier)