Insert to Clustered Index

  • Hello, I have a stored procedure that consists of a relatively simple INSERT INTO ... SELECT; the target table has a clustered index. The table is truncated before roughly 100k rows are inserted into it.

    The procedure sometimes completes in 3-5 seconds and sometimes takes around 37 minutes; most recently it has been the latter. I have ruled out blocking.

    I have attached an estimated query plan that I got from sp_whoisactive while the query was running (anonymized). The issue appears to be the INSERT part of the query. Can anyone help with this intermittent performance issue?
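
    For context, the procedure is roughly this shape (the table and column names below are made up for illustration, not the real ones):

    -- simplified shape of the procedure; real object names are anonymized
    TRUNCATE TABLE dbo.FactTarget;          -- target table, which has the clustered index

    INSERT INTO dbo.FactTarget (Col1, Col2 /* , ... */)
    SELECT s1.Col1, s2.Col2 /* , ... */
    FROM dbo.Staging1 AS s1
    LEFT JOIN dbo.Staging2 AS s2
           ON s2.KeyCol = s1.KeyCol;
    -- ...plus several more LEFT JOINs to other staging tables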

  • I have ruled out blocking.


    How specifically?  That's a lot of LEFT JOINs.  Is it safe to use NOLOCK on those tables?  If so, have you tried that?  (Let the storm against it begin!)

    Also, not likely an issue here, but just in case it helps:
    What is the recovery model of the db?  FULL or not?
    If not FULL, be sure to specify "WITH (TABLOCK)" or "WITH (TABLOCKX)" on the INSERTed table.  That at least would allow SQL to do minimal logging if possible.

    INSERT INTO dbo.table1 WITH (TABLOCKX) (...)
    SELECT ...
    FROM ...

    Given that the table was trunc'd before it was loaded, no one would need to be able to read anything from it in the interim anyway.
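
    If you're not sure which recovery model the db is using, it's quick to check:

    -- check the recovery model of the current database
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = DB_NAME();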

  • ScottPletcher - Tuesday, March 26, 2019 8:08 AM

    I observed the results of sp_whoisactive whilst the query was running and could see that there were no processes causing a block. However, there were quite a lot of CXPACKET waits.
    The recovery model of the database is SIMPLE, so not FULL.
    Interesting point regarding TABLOCK; I have never considered it during an INSERT, but I will do some reading.
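
    For reference, this is roughly how I checked for blocking (the exact parameters depend on the version of sp_whoisactive installed):

    -- run while the insert is executing
    EXEC dbo.sp_WhoIsActive
         @find_block_leaders = 1,  -- flag sessions at the head of any blocking chain
         @get_plans = 1;           -- include the query plan for each active request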

  • Jim-S - Tuesday, March 26, 2019 8:17 AM

    CXPACKET is parallelism related.  Have you changed the default for the 'cost threshold for parallelism'?  The default from MS is too low.

    And in SIMPLE model, you'd definitely want to use WITH (NOLOCK) so that minimal logging is possible.
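
    To check and raise it (it's an instance-wide setting; 50 is only a common starting point, so tune it for your workload):

    -- view and raise 'cost threshold for parallelism'
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism';        -- show the current value
    EXEC sp_configure 'cost threshold for parallelism', 50;    -- raise from the default of 5
    RECONFIGURE;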

  • ScottPletcher - Tuesday, March 26, 2019 10:19 AM

    I'm not aware of the 'cost threshold for parallelism' being changed.

    Thanks for your help. I will try the NOLOCK hint and let you know if it works.

  • I think he meant the TABLOCK hint, like in his original response:
    INSERT INTO dbo.table1 WITH (TABLOCKX)
    That is what will allow minimal logging.

    Since all we are looking at is an estimated plan, it's difficult to tell exactly where the bottleneck is; it says 97.8% of the estimated cost is in the INSERT.  Have you tried just running the query itself and discarding the results to see how long it takes without doing the INSERT?
    https://blog.sqlauthority.com/2012/07/01/sql-server-discard-results-after-query-execution-ssms/
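
    Something like this, with the procedure's real SELECT pasted in, would show where the time and I/O go without the INSERT:

    -- time just the SELECT, without the INSERT
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT ...   -- paste the SELECT portion of the procedure here
    FROM ...;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;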

  • Chris Harshman - Wednesday, March 27, 2019 12:45 PM

    Sorry, I meant TABLOCK, not NOLOCK.

    I have tried running the query on its own and it completes in a few seconds.

  • btw... the execution plan isn't showing "100K" rows... it's showing 1.2 MILLION rows!  Are you sure that you don't have an accidental many-to-many join going on here in the form of multiple small triangular joins in the final join?

    Also, you have a shedload of joins to what appear to be HEAPs, and they are mostly resulting in "Table Scans" according to the execution plan you attached.  Consider some proper indexing.  You also have an "index seek" that's being executed in a RBAR fashion with an estimated 109K executions.

    Of course, you only included the estimated execution plan, which can be and frequently is quite different from the actual execution plan.  Consider posting the latter in future posts.
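
    A quick sanity check for that kind of fan-out is to count rows per join key in each of the joined tables (the names below are placeholders; substitute your real table and column):

    -- placeholder names: find join-key values that appear more than once
    SELECT JoinKeyColumn, COUNT(*) AS RowsPerKey
    FROM dbo.SomeStagingTable
    GROUP BY JoinKeyColumn
    HAVING COUNT(*) > 1
    ORDER BY RowsPerKey DESC;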

  • Jeff Moden - Wednesday, March 27, 2019 1:28 PM

    Hi Jeff, Thanks for your reply.

    Firstly, regarding your point about the 'shedload of joins to what appear to be HEAPs'... the tables I am selecting from are staging tables, and they are all heaps by design. They are populating a fact table in a DW, which may explain the many joins.

    I don't have permission to run INSERT statements on the production server, so I recreated the table and procedure as a global temp table and global temp procedure and executed that; the actual plan from this run is attached in case you are interested. Although it completed in a few seconds, the estimated row count still shows over 1 MILLION. I'm sorry for the anonymized plan; I know they are a pain to read.

    I observed something strange regarding the estimated rows: by disabling the join (LEFT JOIN Database2.Schema1.Object10 Object2 ON Object2.Column35 = Object7.Column36) the estimated rows reduced to ~120k, so I guess this is the culprit. Database2.Schema1.Object10 is a non-clustered index with only a single column. When I change the join to (LEFT JOIN Database2.Schema1.Object10 Object2 ON Object2.Column35 = Object7.Column36 AND Object7.Column36 IS NOT NULL), the estimated rows reduce to ~750k, as shown in the sketch below. I found an article on hash joins to nullable columns that may explain this: https://sqlperformance.com/2015/11/sql-plan/hash-joins-on-nullable-columns. The column in question is nullable but will probably never contain any NULLs; I know this is poor design, but the column in the CRM system is nullable and our staging database mirrors the schema of the source systems.
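
    For clarity, the shape of the rewritten join is below (everything else in the query is omitted):

    SELECT ...
    FROM ...  -- the other staging tables and joins
    LEFT JOIN Database2.Schema1.Object10 Object2
           ON Object2.Column35 = Object7.Column36
          AND Object7.Column36 IS NOT NULL  -- added predicate; estimated rows drop from ~1.2M to ~750k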

    I also managed to rewrite another part of the query that reduced the logical reads on one table from ~600k to ~30k.

    I will deploy the three changes that we discussed here (TABLOCK, adding IS NOT NULL to the join criteria, and the rewrite above that reduces logical reads) as a starting point, as they should offer some performance gain.

    Thank you to everyone who contributed, I appreciate your help.

  • Jim-S - Thursday, March 28, 2019 4:13 AM

    Even though they are staging tables, if you're going to JOIN to them more than once, you should go ahead and index them.  Simplest is just to cluster on the lookup key(s).  There are times when a non-clustered index would work better, but those are rare enough that it's probably not worth worrying about unless you still have performance problems after indexing the tables.
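
    For example (the table and key names here are placeholders):

    -- placeholder names: cluster a staging table on its lookup key
    CREATE CLUSTERED INDEX CL_Staging1
        ON dbo.Staging1 (LookupKeyCol);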

  • I finally managed to get my code into production.

    Jeff and Scott, you both hit the nail on the head when you mentioned estimated vs actual rows and joining to heaps. I added a unique clustered index to one of my heaps, which also fixed the issue with the estimated rows. The query now runs in a few seconds.

    Thanks again for your help.
