LATCH_EX when creating a clustered index

Question

LATCH_EX when creating a clustered index

DamianC

SSCrazy Eights

Points: 8023
More actions
May 10, 2021 at 3:42 pm

#3881521
Hi

I have a package that runs weekly
It has an intermittant problem
There are a number of tables that have a reasonable amount of data in
Once built indexes are added. This includes clustered
Around 1 in every 3 times, one specific table takes around 8-12 hours to add the clustered index
A typical build would be around 15 mins
There's only 80M rows and the index is narrow
On investigation, LATCH_EX is showing as the wait type for the duration
Is there a better approach when you see LATCH_EX as a wait type
I'm thinking leave the CIX on (this is something i've been looking at else where as I've been advised to test - although that was a seperate issue and did not have LATCH_EX waits)
OR pre-order by the clustered index i.e. load in that order then apply the cluster
OR something else?
I haven't really seen this when adding a clustered index before

Thanks
- This topic was modified 4 years, 10 months ago by DamianC.
- This topic was modified 4 years, 10 months ago by DamianC.
- This topic was modified 4 years, 10 months ago by DamianC.
- Damian

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101238 More actions · Answer 1

Create the clustered index only before loading the table. Create the non-clustered indexes after loading the table.

If this is a permanent table, seriously consider using data compression. Although it takes CPU and elapsed time to compress the data, often the I/O saved makes the table still load faster and you still get the benefits of much faster reads from the table from then on.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

DamianC SSCrazy Eights Points: 8023 More actions · Answer 2

DamianC

SSCrazy Eights

Points: 8023

May 10, 2021 at 3:53 pm

#3881528

Great, thanks

I will give that a go

- Damian

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 3

A quick note on why creating a clustered index first and then loading the data is usually the best approach (testing is always your best friend). The clustered index defines the data storage. It's not simple an index. A clustered index is the table. So, when you create a table, without a clustered index, you're defining what is called a heap. It stores the data one way. When you then tell that table, now you have a clustered index, you're not "adding" an index. Instead, you're telling that table, now you ARE a clustered index. This means that the data has to be moved. It's simply adding a thing on the side. It's literally moving the data out of the heap and into the cluster. So, you're going to see lots and lots of IO, and possibly lots and lots of contention for researches, making that process really painful.

Hope that helps a little.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

Ummm.... ok... I want to remind you that NOTHING is a panacea! And, with that, I have a couple of questions...

What is the source of the data that you're creating the table from?
What is the recovery model of the database that table lives in during population?
Can the BULK LOGGED Recovery Model be used on this particular database or do you have some log file dependencies that will prevent that?
Is the data in this table ever updated after the massive load?
Are you merging data or are doing a wholesale replacement of ALL the data during the load?
Can you post the complete CREATE TABLE statement with all of the indexes and keys that the final table will have?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DamianC SSCrazy Eights Points: 8023 More actions · Answer 5

Thanks

In answer to your question:

1. The source is another table in another database and I am using a dataflow task to move it from one table to another. The source is a Stored procedure

So, within SSIS, it’s split in 3. One task to drop the index, one task to extract the data and one task to create the index

It’s the index creation that seems to be the issue here

2. Full recovery model

3. Not really

4. The data is not updated; it’s read for the next phase

5. Wholesale replacement

6.

CREATE TABLE [InboundExp].[CreditAccountHeader](
[CreditAccountHeaderID] [bigint] NOT NULL,
[PersonID] [bigint] NOT NULL,
[AccountTypeID] [smallint] NOT NULL,
[CompanyTypeID] [smallint] NOT NULL,
[SpecialInstructionID] [smallint] NOT NULL,
[SourceCode] [int] NULL
) ON [PRIMARY]
GO


CREATE CLUSTERED INDEX [IXC_CreditAccountHeader] ON [InboundExp].[CreditAccountHeader]
(
[PersonID] ASC,
[CreditAccountHeaderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

- Damian

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 6

K. Thanks. One of your answers begs another question. Is that "other" database on the same instance?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DamianC SSCrazy Eights Points: 8023 More actions · Answer 7

DamianC

SSCrazy Eights

Points: 8023

May 12, 2021 at 6:43 pm

#3882467

Yes, same instance

Thanks

- Damian

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 8

I would recommend the following:

Don't drop the indexes - disable the NC indexes prior to load
Make sure you set an appropriate batch/commit size and use tablock on the OLEDB destination. Don't use the default values for batch/commit sizes.
After the data is loaded - rebuild ALL indexes.

The load will cause the clustered index to fragment - but since you will be rebuilding all indexes after the load that shouldn't cause any issues. But - you could output the data from the source system in clustered index order and that may help. Only way to be sure would be to test, and compare the increase in how long it takes to extract the data sorted - as it takes to rebuild.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

I'll try to get back to this later today but... the load should not actually cause the Clustered Index to fragment... if I'm reading things right, it's a wholesale load/replacement.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DamianC SSCrazy Eights Points: 8023 More actions · Answer 10

Thanks

Yes, it's wholesale

Fragmentation post population does not appear to be an issue, so doesn't need rebuilding (99.4%)

If I leave the CIX in place (i.e. don not drop and recreate) and load using my normal SP, it takes forever (cancelled after one hour at around 20% through)

If I leave the CIX in place and tweak my normal SP to order by the CIX order, it takes around 10% longer (no great deal - 16 mins vs 18 mins); This comparison is against the original drop , populate, recreate method (when it works fine)

If I leave as is, it sometimes sits waiting (LATCH_EX) for hours - initial problem

One further issue - I cannot reproduce this on my development server!

So, do I take the marginal increase as a reliability expense against no waits or am I missing something else here?

Thanks

- Damian

ScottPletcher SSC Guru Points: 101238 More actions · Answer 11

1. The source is another table in another database and I am using a dataflow task to move it from one table to another. The source is a Stored procedure

There must be some blocking / other delay in reading the data from that source table. Look for some activity on that table that is slowing things down.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

DamianC SSCrazy Eights Points: 8023 More actions · Answer 12

I thought that initially Scott

but, there are 2 seperate tasks both indepdant and both sequential; 1 to populate the table then a second to create the index

It's waiting on the index create task by which time any interaction with anoth DB has taken place

- Damian

ScottPletcher SSC Guru Points: 101238 More actions · Answer 13

Hmm, interesting. For the nonclustered index(es)? For a clus index create, I could see delays as SQL would have to sort the entire contents of the table.

What is the 'cost threshold for parallelism' setting? Is it at least 40? Sometimes SQL has huge waits if it tries to parallel a task that shouldn't be. 80M rows presumably could gain from parallelism, but not necessarily, and frankly SQL has some quirks in this area.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

DamianC SSCrazy Eights Points: 8023 More actions · Answer 14

Thanks Scott

Cost threshold is 75

There are no non-clustered, just the clustered

- Damian