• This is great timing; we were just discussing this in regard to using Hadoop or SQL Server to do some very large, close-to-terabyte, imports and transformations.

    I have a couple of questions, and I realize you didn't want to include code, but a little start on it would help if possible:

    • How do you physically split up the large files and track them in each thread?

    • How is the master process receiving its messages from the child processes?

    • In a SQL Server implementation, would you most likely use CLR code to break up the files and a bulk load to import them?

    • Any suggestions on foreign keys and indexes?
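    To make the first two questions concrete, here is the sort of pattern I am imagining, as a minimal Python sketch (all file names and sizes here are made up, and counting lines stands in for the real transformation): the master computes byte ranges aligned on line boundaries, hands each range to a worker thread, and collects each worker's summary from a shared queue.

```python
import os
import queue
import tempfile
import threading

def chunk_offsets(path, n_chunks):
    """Compute (start, end) byte ranges aligned to newline boundaries,
    so every worker sees only whole lines."""
    size = os.path.getsize(path)
    approx = max(1, size // n_chunks)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * approx)
            f.readline()                 # skip ahead to the next line start
            offsets.append(f.tell())
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))

def worker(path, start, end, results):
    """Process one byte range; report (range, line count) to the master."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            if not f.readline():
                break
            count += 1                   # real parse/transform work goes here
    results.put((start, end, count))

# Demo setup: a small stand-in for the near-terabyte file.
path = os.path.join(tempfile.gettempdir(), "demo_input.txt")
with open(path, "w") as f:
    for i in range(1000):
        f.write(f"row {i}\n")

results = queue.Queue()
threads = [threading.Thread(target=worker, args=(path, s, e, results))
           for s, e in chunk_offsets(path, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results.get()[2] for _ in threads)
print(total)  # 1000: every line processed exactly once, no overlaps
```

    In the real pipeline each worker would presumably transform its chunk and feed a bulk-load step instead of just counting lines, and the same queue is a natural channel for workers to report per-chunk errors back to the master.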

    This is a great topic; it may give us a lead on how to proceed with our new data project.

    Very much appreciated!

    Skål - jh