big table insert advice

  • Hi, I need to refresh the data in a table from another table. There are about 14 million records that need to be inserted.

    (I was planning to use the Import Data task; there are no dependencies except for a view.)

    The destination table (identical to the source table) has a primary key and clustered index. It also has a few non-clustered indexes.

    Do I drop the indexes and the primary key, insert the data, and then add them back?

    Or do I just copy the data directly without dropping the indexes? It seems to me this would negatively impact performance. But please let me know.

  • sqlsurfer11 (12/23/2012)


    Do I drop the indexes and the primary key and then insert data and add them back?

    I am not in favour of disabling the indexes (both clustered and non-clustered). As you run the insert, the data will be sorted according to the clustered index, and the non-clustered indexes will be maintained accordingly. One thing you do need to ensure, though, is that you have some mechanism for populating the table in batches (a "batch insert approach"); that way you manage the load on resources, particularly the transaction log. 🙂
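    As an illustration, a batched load along those lines might look like the sketch below. All object names are placeholders, and it assumes both tables share a key column Id:

    ```sql
    -- Sketch of a "batch insert approach"; each loop iteration is its
    -- own autocommitted transaction, which keeps individual t-log
    -- transactions small.
    DECLARE @batch INT = 100000,
            @rows  INT = 1;

    WHILE @rows > 0
    BEGIN
        INSERT INTO dbo.TargetTable (Id, Col1, Col2)
        SELECT TOP (@batch) s.Id, s.Col1, s.Col2
        FROM dbo.SourceTable AS s
        WHERE NOT EXISTS (SELECT 1
                          FROM dbo.TargetTable AS t
                          WHERE t.Id = s.Id)
        ORDER BY s.Id;

        SET @rows = @@ROWCOUNT;  -- stop when nothing is left to copy
    END;
    ```

    Under the SIMPLE recovery model, the log space used by one batch can be reused before the next one runs.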

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • sqlsurfer11 (12/23/2012)


    Hi, I need to refresh the data in a table from another table. There are about 14 million records that need to be inserted.

    (I was planning to use the Import Data task; there are no dependencies except for a view.)

    The destination table (identical to the source table) has a primary key and clustered index. It also has a few non-clustered indexes.

    Do I drop the indexes and the primary key, insert the data, and then add them back?

    Or do I just copy the data directly without dropping the indexes? It seems to me this would negatively impact performance. But please let me know.

    When bulk inserting into a large table, one potential performance hit is index fragmentation, in both the target table's clustered key and perhaps also the non-clustered indexes. It depends on whether the clustered key in the target table is sequentially incremented, like a date/time or identity. If the keys are inserted sequentially, fragmentation may be insignificant. Run a query to check index fragmentation before and after an insert. If the indexes are getting heavily fragmented, then it might make sense to drop them and re-create. It also depends on how often this table gets bulk inserted.
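    For example, something along these lines shows fragmentation per index (the table name is a placeholder):

    ```sql
    -- Fragmentation per index on the target table; run before and
    -- after the load and compare.
    SELECT i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.TargetTable'),
             NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id  = ips.index_id;
    ```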

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Hi,

    a while ago I had to create an environment copy for a large database at a client site.

    I tried several methods to copy the large tables, but the one that provided the fastest copy was to drop the table and indexes, "SELECT INTO" the data, then add back all the indexes and stats.

    A couple of caveats:

    1) This was a copy to development!

    2) The database recovery model was SIMPLE

    3) I generated a script to run through the indexes and recreate them
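    A rough sketch of how such a script might be generated for the non-clustered indexes (the table name is a placeholder; scripting the full CREATE INDEX definitions would additionally need sys.index_columns):

    ```sql
    -- Generate DROP statements for the non-clustered indexes on one
    -- table, so they can be removed before the load and recreated after.
    SELECT 'DROP INDEX ' + QUOTENAME(i.name)
           + ' ON dbo.BigTable;' AS drop_stmt
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID('dbo.BigTable')
      AND i.type_desc = 'NONCLUSTERED'
      AND i.is_primary_key = 0;
    ```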

    I hope this gives you food for thought.

    Cheers

    Peter

    I work for 1st Consulting, a small ISV based in the UK. We build web and desktop applications using the MSFT stack. Find us at www.1stconsulting.biz

  • Peter Bannister (12/24/2012)


    Hi,

    a while ago I had to create an environment copy for a large database at a client site.

    I tried several methods to copy the large tables, but the one that provided the fastest copy was to drop the table and indexes, "SELECT INTO" the data, then add back all the indexes and stats.

    A couple of caveats:

    1) This was a copy to development!

    2) The database recovery model was SIMPLE

    3) I generated a script to run through the indexes and recreate them

    I hope this gives you food for thought.

    Cheers

    Peter

    The reason why SELECT INTO <new table> is faster than INSERT INTO <existing table> is because SELECT INTO doesn't transaction log the data pages being inserted. In the event of rollback, SQL Server simply has to discard the table. However, when inserting into an existing table, SQL Server has to keep track of what pages are being modified to support a rollback.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (12/25/2012)


    The reason why SELECT INTO <new table> is faster than INSERT INTO <existing table> is because SELECT INTO doesn't transaction log the data pages being inserted. In the event of rollback, SQL Server simply has to discard the table.

    Any article/reference for this?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • There are two ways (apart from BULK INSERT) that generate minimal transaction logging:

    1) SELECT INTO

    2) INSERT using the TABLOCK hint
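    Sketches of both, assuming the database is in SIMPLE or BULK_LOGGED recovery (object names are placeholders):

    ```sql
    -- 1) SELECT INTO creates dbo.TargetCopy as a brand-new table
    SELECT *
    INTO dbo.TargetCopy
    FROM dbo.SourceTable;

    -- 2) INSERT ... WITH (TABLOCK); minimal logging applies when the
    --    target qualifies (e.g. a heap or an empty table, depending on
    --    version and settings)
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
    SELECT * FROM dbo.SourceTable;
    ```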

    ~ Lokesh Vij


    Guidelines for quicker answers on T-SQL questions
    Guidelines for answers on Performance questions

    Link to my Blog Post --> www.SQLPathy.com

    Follow me @Twitter

  • Bhuvnesh (12/25/2012)


    Eric M Russell (12/25/2012)


    The reason why SELECT INTO <new table> is faster than INSERT INTO <existing table> is because SELECT INTO doesn't transaction log the data pages being inserted. In the event of rollback, SQL Server simply has to discard the table.

    any article/reference for this ??

    Here is the link Bhuvnesh:

    http://msdn.microsoft.com/en-us/library/ms191244(v=sql.105).aspx

    ~ Lokesh Vij


    Guidelines for quicker answers on T-SQL questions
    Guidelines for answers on Performance questions

    Link to my Blog Post --> www.SQLPathy.com

    Follow me @Twitter

  • Eric M Russell (12/25/2012)


    The reason why SELECT INTO <new table> is faster than INSERT INTO <existing table> is because SELECT INTO doesn't transaction log the data pages being inserted. In the event of rollback, SQL Server simply has to discard the table. However, when inserting into an existing table, SQL Server has to keep track of what pages are being modified to support a rollback.

    Just a small correction: there is minimal logging involved, so it is not correct to say that no transaction log is created. You can use the query below to see the log records that are generated.

    SELECT TOP 10 *
    FROM fn_dblog(NULL, NULL)
    WHERE [AllocUnitName] LIKE 'dbo.Your_table_name%'
    ORDER BY [Log Record Length] DESC;

    ~ Lokesh Vij


    Guidelines for quicker answers on T-SQL questions
    Guidelines for answers on Performance questions

    Link to my Blog Post --> www.SQLPathy.com

    Follow me @Twitter
