Suggest the best choice regarding importing data & indexing

  • Hi Friends,

    I have a CSV file that I am importing into SQL Server using an SSIS package, through a Flat File Source.

    (I am new to this company, and to SSIS as well, so I need to tune the existing package so it shines.)

    A few points about the data and how it's handled:

    1) Inside the procedure, they drop the index, populate the table, and then re-create the same index.

    2) The data volume is huge (millions of rows).

    My question:

    Which is the best way to import the data?

    1) Just insert the data without dropping the index.

    2) Drop the index, populate the table, re-create the index (the way they do it right now).

    Thanks in advance,

    Prabhu

  • Prabhu

    It depends. You can only find out by testing. Test it with the index drop and without, for all sizes of data you're likely to encounter.
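
    For example, from SSMS you can wrap each variant with SET STATISTICS TIME and compare the elapsed and CPU times in the Messages tab. A minimal sketch; the table names here are placeholders, not from your package:

        SET STATISTICS TIME ON;   -- per-statement elapsed/CPU time appears in the Messages tab

        -- Variant 1: load with the index left in place
        INSERT INTO dbo.HistoryTable (KeyCol, Col1)
        SELECT KeyCol, Col1
        FROM   dbo.StagingTable;

        SET STATISTICS TIME OFF;
        -- then repeat the same load after dropping the index and compare the numbers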

    John

  • Yeah, I agree with John, test it.

    It's a pretty standard approach, when you're doing large-scale inserts, to drop the indexes. This is because maintaining indexes across all the inserts can be quite costly. But if you're doing targeted deletes (deletes with a WHERE clause instead of just a truncate) or updates, having the indexes in place can be extremely useful to the process. If you have a mixed process of inserts, updates, and deletes, you're back to testing to see which will work best. I wouldn't be surprised at all if you found that the current approach works better.
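
    If it helps, the drop-and-re-create pattern in its simplest form looks roughly like this; all of the object names below are made up for illustration:

        -- 1) Drop the nonclustered index before the big load
        DROP INDEX IX_History_BusinessKey ON dbo.HistoryTable;

        -- 2) Bulk load the data (in your case the SSIS data flow;
        --    BULK INSERT shown here just as a stand-in)
        BULK INSERT dbo.HistoryTable
        FROM 'C:\import\data.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

        -- 3) Re-create the same index once the data is in
        CREATE NONCLUSTERED INDEX IX_History_BusinessKey
            ON dbo.HistoryTable (BusinessKey);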

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks, John. Thanks, Grant.

    I did what you both suggested. Earlier, the execution plan showed a clustered index scan to get the key of the particular row. I added a nonclustered index where one was missing, and now the procedure executes much better; the plan shows an index seek instead.
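
    For reference, the index I added looks something like this (table and column names changed here):

        -- nonclustered index on the lookup key, so the plan can seek instead of scan
        CREATE NONCLUSTERED INDEX IX_History_LookupKey
            ON dbo.HistoryTable (LookupKey);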

    This is exactly what the procedure does (a simplified T-SQL sketch follows the steps):

    Step1:

    Compares the staging data (staging table) with the history table and deletes the matching rows from the history table.

    Step2:

    Inserts the staging data into the history table.

    Step3:

    Inserts the staging rows (selected columns only) that are not already in the buffer table into a buffer table that has a process_bit (boolean) column, which helps to process the unprocessed data row by row (quite out of the ordinary for this scenario).

    Step4:

    Updates the buffer table by comparing its existing records with the staging table and marks the matches as "to process" (process_bit = 0).
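
    Roughly, in simplified T-SQL (table, column, and key names here are made up, not the real ones):

        -- Step 1: delete history rows that match the staging data
        DELETE h
        FROM   dbo.History AS h
               JOIN dbo.Staging AS s ON s.KeyCol = h.KeyCol;

        -- Step 2: insert the staging data into the history table
        INSERT INTO dbo.History (KeyCol, Col1, Col2)
        SELECT KeyCol, Col1, Col2
        FROM   dbo.Staging;

        -- Step 3: insert staging rows (selected columns) not yet in the buffer table
        INSERT INTO dbo.Buffer (KeyCol, Col1, process_bit)
        SELECT s.KeyCol, s.Col1, 0
        FROM   dbo.Staging AS s
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.Buffer AS b WHERE b.KeyCol = s.KeyCol);

        -- Step 4: mark existing buffer rows that match staging as "to process"
        UPDATE b
        SET    b.process_bit = 0
        FROM   dbo.Buffer AS b
               JOIN dbo.Staging AS s ON s.KeyCol = b.KeyCol;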

    For just these 4 main steps, it was taking around 15-30 minutes.

    Now the execution time has come down to 3-5 minutes.

    Thanks a lot for all your ideas.

    Thanks,

    Prabhu
