Loading to table with clustered PK / Order by in source

  • Here's a question which I feel like I should know the answer to.

    If I'm using SSIS to load a table with a clustered PK, will the data load faster if the source has been sorted to match the target's clustered PK?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (3/25/2015)


    Here's a question which I feel like I should know the answer to.

    If I'm using SSIS to load a table with a clustered PK, will the data load faster if the source has been sorted to match the target's clustered PK?

    I don't think it will necessarily load faster, because (imo) the query engine will still include a sort operation in the execution plan of the insert. I do think it could help reduce index fragmentation though.

    Would be an interesting scenario to put to the test...

  • Bulk loads are quickest for tables which are either empty or have no indexes.

    If you are inserting a lot of data into an already large table then rebuilding the indexes and PK after a load might not be a good use of CPU resources.

    I assume from the nature of your question that the new data your loading is appended to the clustered PK and wouldn't include any data that falls in between existing PK values.

    If this is the case it might be worthwhile investigating partitioning strategies. With partitioning you can quickly bulk load into a empty table and then swap the newly populated table in for the next empty partition.

  • Samuel Vella (3/27/2015)


    Bulk loads are quickest for tables which are either empty or have no indexes.

    If you are inserting a lot of data into an already large table then rebuilding the indexes and PK after a load might not be a good use of CPU resources.

    I assume from the nature of your question that the new data your loading is appended to the clustered PK and wouldn't include any data that falls in between existing PK values.

    If this is the case it might be worthwhile investigating partitioning strategies. With partitioning you can quickly bulk load into a empty table and then swap the newly populated table in for the next empty partition.

    I am loading empty work tables in a DW from a staging database. As all of the data is within SQL Server, sorting the source data is trivial. After this load is complete, I MERGE the data into the target tables. I need the PK to be in place to assist the MERGE.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply