Loading a large number of files into a single table

  • I've been asked to load a large number (100K+) of smallish files (around 500 lines each) into a single destination table as fast as possible. What makes the challenge a little more interesting is that the file name and line number must also be stored in the destination table.

    I've had a quick search of SSC, but have not found any similar problems or solutions.

    I tried creating a file list and then running n copies (one per processor) of an SP that performs a BULK INSERT into a temp table and then an INSERT INTO the destination, adding the file name, but that caused locking on the destination table and performance was awful (as was a straight linear, one-file-at-a-time process). I've also considered SSIS, but the Foreach Loop container processes files sequentially.
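
    For context, the SP pattern looked something along these lines when driven from C# (dbo.DestTable and the column names are placeholders, and this is a from-memory sketch rather than the production code). One caveat worth flagging: BULK INSERT gives no guaranteed row ordering, so the ROW_NUMBER() trick below is not a safe way to preserve file line numbers; an IDENTITY staging column plus a format file is the reliable route.

    // Sketch of the abandoned per-file pattern: bulk-load one file into a
    // temp table, then copy into the destination with the file name added.
    using System.Data;
    using System.Data.SqlClient;

    static void LoadOneFile(string connectionString, string filePath)
    {
        const string sql = @"
            CREATE TABLE #staging (LineText varchar(max));

            DECLARE @bulk nvarchar(max) =
                N'BULK INSERT #staging FROM ''' + REPLACE(@path, '''', '''''')
                + N''' WITH (ROWTERMINATOR = ''\n'')';
            EXEC (@bulk);

            INSERT INTO dbo.DestTable (FileName, LineNumber, LineText)
            SELECT @path,
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), -- see caveat above
                   LineText
            FROM   #staging;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.Add("@path", SqlDbType.NVarChar, 260).Value = filePath;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }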

    The best option so far (provided by the client) was a C# routine that read each line of each file into a DataTable, and when the number of rows in the DataTable reached 1M, performed a bulk copy to flush the data to SQL Server (read and pump). This almost met the test requirement of 250K files in under 60 minutes; almost, but still a fail.
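
    In outline, that read-and-pump routine looks something like the sketch below (destination table and column names are again placeholders):

    // Buffer rows in a DataTable, flushing to the destination via
    // SqlBulkCopy whenever the buffer reaches 1M rows.
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    static class ReadAndPump
    {
        const int FlushAt = 1000000;

        public static void Run(string connectionString, IEnumerable<string> files)
        {
            var buffer = NewBuffer();
            foreach (var file in files)
            {
                var lineNumber = 0;
                foreach (var line in File.ReadLines(file))
                {
                    buffer.Rows.Add(file, ++lineNumber, line);
                    if (buffer.Rows.Count >= FlushAt)
                        Flush(connectionString, buffer);
                }
            }
            if (buffer.Rows.Count > 0)
                Flush(connectionString, buffer); // final partial batch
        }

        static DataTable NewBuffer()
        {
            var t = new DataTable();
            t.Columns.Add("FileName", typeof(string));
            t.Columns.Add("LineNumber", typeof(int));
            t.Columns.Add("LineText", typeof(string));
            return t;
        }

        static void Flush(string connectionString, DataTable buffer)
        {
            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.DestTable";
                bulk.ColumnMappings.Add("FileName", "FileName");
                bulk.ColumnMappings.Add("LineNumber", "LineNumber");
                bulk.ColumnMappings.Add("LineText", "LineText");
                bulk.WriteToServer(buffer);
            }
            buffer.Clear();
        }
    }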

    My next option is to multithread the read-and-pump approach, accepting that the bulk copies may overlap and lock each other. But before I go down that rabbit hole, I was wondering if anyone else has any suggestions.
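
    The shape I have in mind is sketched below: a shared queue of file names feeding one worker per core, with each worker owning its own DataTable buffer and its own SqlBulkCopy stream. One thing that may help with the locking: if the destination is a heap with no indexes and each stream specifies TableLock, the concurrent bulk loads take compatible BU locks and should not block one another.

    // Hypothetical multithreaded read-and-pump (same placeholder names as above).
    using System;
    using System.Collections.Concurrent;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;
    using System.Threading.Tasks;

    static class ParallelReadAndPump
    {
        public static void Run(string connectionString, string[] files)
        {
            var queue = new ConcurrentQueue<string>(files);
            var workers = new Task[Environment.ProcessorCount];
            for (var i = 0; i < workers.Length; i++)
                workers[i] = Task.Run(() => Drain(connectionString, queue));
            Task.WaitAll(workers);
        }

        static void Drain(string connectionString, ConcurrentQueue<string> queue)
        {
            var buffer = new DataTable();
            buffer.Columns.Add("FileName", typeof(string));
            buffer.Columns.Add("LineNumber", typeof(int));
            buffer.Columns.Add("LineText", typeof(string));

            string file;
            while (queue.TryDequeue(out file))
            {
                var lineNumber = 0;
                foreach (var line in File.ReadLines(file))
                {
                    buffer.Rows.Add(file, ++lineNumber, line);
                    if (buffer.Rows.Count >= 250000)
                        Flush(connectionString, buffer);
                }
            }
            if (buffer.Rows.Count > 0)
                Flush(connectionString, buffer);
        }

        static void Flush(string connectionString, DataTable buffer)
        {
            // TableLock takes a bulk update (BU) lock; on a heap with no
            // indexes, concurrent BU locks do not block each other.
            using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
            {
                bulk.DestinationTableName = "dbo.DestTable";
                bulk.ColumnMappings.Add("FileName", "FileName");
                bulk.ColumnMappings.Add("LineNumber", "LineNumber");
                bulk.ColumnMappings.Add("LineText", "LineText");
                bulk.WriteToServer(buffer);
            }
            buffer.Clear();
        }
    }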

    pcd
