This is great timing; we were just discussing this in regard to using Hadoop or SQL Server for some very large, close-to-terabyte, imports and transformations.
I have a couple of questions, and I realize you didn't want to include code, but a little start on it would help if possible:
- How do you physically split up the large files and track them in each thread?
- How does the master process receive its messages from the child processes?
- In a SQL Server implementation, would you most likely use CLR code to break up the files and a bulk load to import them?
- Any suggestions on foreign keys and indexes?
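For context on the first two questions, here's roughly what I had in mind — a minimal Python sketch (my own guess at an approach, not anything from your post): split the file at line boundaries by byte offset so no record straddles two chunks, give each worker thread one chunk, and have the children report back to the master through a shared queue.

```python
import os
import queue
import threading

def chunk_offsets(path, n_chunks):
    """Compute (start, end) byte offsets that split the file into roughly
    equal chunks, snapping each boundary forward to the next newline so
    no line is split across two chunks."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(i * size // n_chunks)
            f.readline()              # advance to the next line boundary
            bounds.append(f.tell())
    bounds.append(size)
    # drop duplicate boundaries (can happen with tiny files / long lines)
    return [(bounds[i], bounds[i + 1])
            for i in range(len(bounds) - 1)
            if bounds[i] < bounds[i + 1]]

def worker(path, start, end, results):
    """Child: process one chunk, then report to the master via the queue."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            count += 1                # stand-in for the real transform/load
    results.put((start, end, count)) # message back to the master

def run(path, n_threads=4):
    """Master: fan out one thread per chunk, then drain the result queue."""
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(path, s, e, results))
               for s, e in chunk_offsets(path, n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = 0
    while not results.empty():
        _start, _end, count = results.get()
        total += count
    return total
```

Is offset-based chunking like this basically what you do, or do you physically write the pieces out as separate files first?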
This is a great topic; it may give us a lead on how to proceed with our new data project.
Very much appreciated!
Skål - jh