Slow insert

  • I have a rather simple insert script that inserts about 400,000 records into a table. It takes 15 minutes which seems too long, especially since I have several other similar queries.

    INSERT INTO db1.dbo.Patient_Data

    (22 fields….)

    SELECT

    L.idNew,

    P.(other 21 fields…)

    FROM db2.dbo.Patient_Data AS P

    JOIN Lookup_PatientData AS L ON P.PtID = L.idOld

    The PatientData table has a primary key on the PtID field, which is a varchar(32) and populated with newid().

    I have played around with indexes on my lookup table, but no matter what I’ve tried it still takes 15 minutes.

    CREATE TABLE dbo.Lookup_PatientData

    (idOld varchar(32) UNIQUE CLUSTERED,

    idNew varchar(32))

    Is it because the primary key is a varchar(32)? What am I missing? Any ideas on how to speed it up?

  • Hi Stef

    I've noticed before with large varchar columns as the primary key the inserts takes very long. The chance of each new primary key value to be inserted to be in sequence is very slim, so each new row may be scattered all over the size of the table. So when that table gets large then the inserts slows down.

    If you have to use the GUID as the primary key, make sure your fill factor is very low, maybe around 30% or less.

    Experiment with the insert by replacing the GUID with an IDENTITY column, and make the identity the primary key, so you know all the data will go in sequencial order. If that works faster add the GUID as a UNIQUE column

  • Things to consider:

    Are there any triggers on the table being inserted ?

    Is there a clustered index on the inserted table and does the clustered index include the GUID column ?

    How many other non-clustered indexes are on the table ?

    Is the database in simple or full logging mode, and if full, have you checked that the log file isn't continually auto-growing during the insert ?

  • Which table do you mean when you "inserted table"? The table I'm inserting into, or inserting from?

    The tables i'm inserting into and from are identical in schema. Both have a primary key on PtId (varchar(32)) populated w/ newid().

    Both have a clustered index on HospId and 4 non-clustered indexes, along with 6 constraints.

    There is an insert trigger but I disable that prior to the insert.

  • >>Which table do you mean when you "inserted table"? The table I'm inserting into, or inserting from?

    The table being inserted to.

    Can you remove the indexes and primary key constraint prior to the insert, then recreate them ?

  • Possibly, but the table has 17 million records. Might the dropping and recreating of indexes and constraints take long enough to defeat the purpose of trying to speed up a 15 minute insert statement?

  • Ahh, it wasn't clear that it was 400K records into an existing very large table. I thought it was creation of a new table with an initial 400K records.

    OK, so dropping indexes/constraints is out.

    The problem is likely (as already suggested) excessive index page splitting on the table being inserted to. What is the nature of the data in the clustered index (column HospID) ?

  • Hosp_id is a varchar(20) and typically has 4 char values in it (ie '0528'). Ugh, i know.

    Maybe it's not going to get much better than 15 minutes, or at least not without a disproportionate cost/savings.

  • Are any of the non-clustered indexes perhaps better candidates for being clustered ?

    Do any of them have some naturally ascending type of data in them, eg dates.

  • No, it doesn't look like it.

  • Both have a primary key on PtId (varchar(32)) populated w/ newid().

    Bad choice for an index seek 🙁

    You should try the NEWSEQUENTIALID instead or another form of ID that is serchable 🙂


    * Noel

  • Agreed. Unfortunately I am unable to make changes to the db design, and the whole app/db is slated to be re-written in 08 so the problems will be going away.

    BTW, I did reindex the table w/ a lower fill factor, and that cut the insert time down to half (8 minutes)! BUT... the reindex took 34 minutes 🙂

    I only have to do this import for 2 sites so I guess the fact that it's slow isn't that critical. At least I can justify why it's slow.

    Thanks to all who replied for the great input.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply