LATCH_EX when creating a clustered index

  • Hi

     

    I have a package that runs weekly

    It has an intermittant problem

    There are a number of tables that have a reasonable amount of data in

    Once built indexes are added. This includes clustered

    Around 1 in every 3 times, one specific table takes around 8-12 hours to add the clustered index

    A typical build would be around 15 mins

    There's only 80M rows and the index is narrow

    On investigation, LATCH_EX is showing as the wait type for the duration

    Is there a better approach when you see LATCH_EX as a wait type

    I'm thinking leave the CIX on (this is something i've been looking at else where as I've been advised to test - although that was a seperate issue and did not have LATCH_EX waits)

    OR pre-order by the clustered index i.e. load in that order then apply the cluster

    OR something else?

    I haven't really seen this when adding a clustered index before

     

    Thanks

    • This topic was modified 2 years, 11 months ago by  DamianC.
    • This topic was modified 2 years, 11 months ago by  DamianC.
    • This topic was modified 2 years, 11 months ago by  DamianC.

    - Damian

  • Create the clustered index only before loading the table.  Create the non-clustered indexes after loading the table.

    If this is a permanent table, seriously consider using data compression.  Although it takes CPU and elapsed time to compress the data, often the I/O saved makes the table still load faster and you still get the benefits of much faster reads from the table from then on.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Great, thanks

    I will give that a go

    - Damian

  • A quick note on why creating a clustered index first and then loading the data is usually the best approach (testing is always your best friend). The clustered index defines the data storage. It's not simple an index. A clustered index is the table. So, when you create a table, without a clustered index, you're defining what is called a heap. It stores the data one way. When you then tell that table, now you have a clustered index, you're not "adding" an index. Instead, you're telling that table, now you ARE a clustered index. This means that the data has to be moved. It's simply adding a thing on the side. It's literally moving the data out of the heap and into the cluster. So, you're going to see lots and lots of IO, and possibly lots and lots of contention for researches, making that process really painful.

    Hope that helps a little.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ummm.... ok... I want to remind you that NOTHING is a panacea!  And, with that,   I have a couple of questions...

    1. What is the source of the data that you're creating the table from?
    2. What is the recovery model of the database that table lives in during population?
    3. Can the BULK LOGGED Recovery Model be used on this particular database or do you have some log file dependencies that will prevent that?
    4. Is the data in this table ever updated after the massive load?
    5. Are you merging data or are doing a wholesale replacement of ALL the data during the load?
    6. Can you post the complete CREATE TABLE statement with all of the indexes and keys that the final table will have?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks

     

    In answer to your question:

     

    1.       The source is another table in another database and I am using a dataflow task to move it from one table to another. The source is a Stored procedure

    So, within SSIS, it’s split in 3. One task to drop the index, one task to extract the data and one task to create the index

    It’s the index creation that seems to be the issue here

    2.       Full recovery model

    3.        Not really

    4.       The data is not updated; it’s read for the next phase

    5.       Wholesale replacement

    6.

    CREATE TABLE [InboundExp].[CreditAccountHeader](
    [CreditAccountHeaderID] [bigint] NOT NULL,
    [PersonID] [bigint] NOT NULL,
    [AccountTypeID] [smallint] NOT NULL,
    [CompanyTypeID] [smallint] NOT NULL,
    [SpecialInstructionID] [smallint] NOT NULL,
    [SourceCode] [int] NULL
    ) ON [PRIMARY]
    GO


    CREATE CLUSTERED INDEX [IXC_CreditAccountHeader] ON [InboundExp].[CreditAccountHeader]
    (
    [PersonID] ASC,
    [CreditAccountHeaderID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

     

    - Damian

  • K.  Thanks.  One of your answers begs another question.  Is that "other" database on the same instance?

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, same instance

     

    Thanks

    - Damian

  • I would recommend the following:

    1. Don't drop the indexes - disable the NC indexes prior to load
    2. Make sure you set an appropriate batch/commit size and use tablock on the OLEDB destination.  Don't use the default values for batch/commit sizes.
    3. After the data is loaded - rebuild ALL indexes.

    The load will cause the clustered index to fragment - but since you will be rebuilding all indexes after the load that shouldn't cause any issues.  But - you could output the data from the source system in clustered index order and that may help.  Only way to be sure would be to test, and compare the increase in how long it takes to extract the data sorted - as it takes to rebuild.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • This was removed by the editor as SPAM

  • I'll try to get back to this later today but... the load should not actually cause the Clustered Index to fragment... if I'm reading things right, it's a wholesale load/replacement.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks

    Yes, it's wholesale

    Fragmentation post population does not appear to be an issue, so doesn't need rebuilding (99.4%)

    If I leave the CIX in place (i.e. don not drop and recreate) and load using my normal SP, it takes forever (cancelled after one hour at around 20% through)

    If I leave the CIX in place and tweak my normal SP to order by the CIX order, it takes around 10% longer (no great deal - 16 mins vs 18 mins); This comparison is against the original drop , populate, recreate method (when it works fine)

    If I leave as is, it sometimes sits waiting (LATCH_EX) for hours - initial problem

    One further issue - I cannot reproduce this on my development server!

    So, do I take the marginal increase as a reliability expense against no waits or am I missing something else here?

     

    Thanks

    - Damian

  • 1.       The source is another table in another database and I am using a dataflow task to move it from one table to another. The source is a Stored procedure

    There must be some blocking / other delay in reading the data from that source table.  Look for some activity on that table that is slowing things down.

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I thought that initially Scott

    but, there are 2 seperate tasks both indepdant and both sequential; 1 to populate the table then a second to create the index

    It's waiting on the index create task by which time any interaction with anoth DB has taken place

    - Damian

  • Hmm, interesting.  For the nonclustered index(es)?  For a clus index create, I could see delays as SQL would have to sort the entire contents of the table.

    What is the 'cost threshold for parallelism' setting?  Is it at least 40?  Sometimes SQL has huge waits if it tries to parallel a task that shouldn't be.  80M rows presumably could gain from parallelism, but not necessarily, and frankly SQL has some quirks in this area.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply