Temp Table's Primary Key Default

Question

Temp Table's Primary Key Default

rchantler

SSCrazy

Points: 2128
More actions
September 25, 2015 at 3:06 pm

#308204

Is a primary key on a temp table implemented as a clustered index by default, or as a unique index by default?

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 1

rchantler (9/25/2015)
Is a primary key on a temp table implemented as a clustered index by default, or as a unique index by default?

Both. As on a normal table, if a clustered index does not already exist, assigning a column as the PK will force a unique clustered index to be formed.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

rchantler SSCrazy Points: 2128 More actions · Answer 2

Thanks Jeff,

The reason for this question is a perplexing performance problem. I am going to take some time to try to put it into compact, reproducible form but it boils down to what looks like 2 functionally identical definitions causing a 100 times slowdown in an update statement. With no other differences in the code def.1 is 100 times faster than def.2

Definition 1:

CREATE TABLE #hidvals(hid INT NOT NULL,

extel2code CHAR(2),

start SMALLDATETIME,

finish SMALLDATETIME,

value1 FLOAT,

value2 FLOAT)

CREATE CLUSTERED INDEX IDX_hidvals ON #hidvals(hid)

Definition 2:

CREATE TABLE #hidvals(hid INT NOT NULL,

extel2code CHAR(2),

start SMALLDATETIME,

finish SMALLDATETIME,

value1 FLOAT,

value2 FLOAT,

PRIMARY KEY (hid)) -- same as PRIMARY KEY CLUSTERED

Here's the UPDATE statement.

update H

set value1 = X1.[1MPM], value2 = X2.[1MPM]

from #hidvals H

left join HD01.prod.USPROD_Calculations X1

on H.hid = X1.hid and H.start = X1.adate

left join HD01.prod.USPROD_Calculations X2

on H.hid = X2.hid and H.finish = X2.adate

Eric M Russell SSC Guru Points: 125519 More actions · Answer 3

For each temporary tables currently allocated, the following script will return the table name, index type(s) (heap, clustered, non-clustered), reserved mb, and row counts. Using this, confirm the allocation of temp table for both code samples are equivalent.

Also take a look at the execution plan to confirm it looks equivalent for both.

use tempdb;

select o.type_desc

, substring(o.name,1,charindex('__',o.name)+1)+right(o.name,12) table_shortname

, si.name index_name

, case si.index_id when 0 then 'HEAP' when 1 then 'CLUSTERED' else 'NONCLUSTERED' end index_type

, row_count

, ((reserved_page_count * 8024) / 1024 / 1024)reserved_mb

, create_date

from sys.dm_db_partition_stats ps

left join sys.objects o on ps.object_id = o.object_id

left join sys.indexes si on si.object_id = o.object_id and si.index_id = ps.index_id

where is_ms_shipped = 0

order by reserved_page_count desc;

One more thing to do when comparing runtime of two variations of scripts, you need to flush the page buffer cache between each execution to insure they both start on equal footing.

DBCC DROPCLEANBUFFERS;

DBCC FREEPROCCACHE;

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

rchantler SSCrazy Points: 2128 More actions · Answer 4

Thanks Eric,

Thanks for the script, I'll see what the results it gives.

The execution plans are totally different. That's the point really. Why should the execution plans be vastly different if the two versions of the temp table are functionally equivalent?

Eric M Russell SSC Guru Points: 125519 More actions · Answer 5

rchantler (9/28/2015)
Thanks Eric,
Thanks for the script, I'll see what the results it gives.
The execution plans are totally different. That's the point really. Why should the execution plans be vastly different if the two versions of the temp table are functionally equivalent?

You said "I'll see what results it gives", but have you confirmed both versions of temp table have same index type, number of rows, and allocated storage?

If both temp tables are equivalent, then reply back with attachment of both execution plans.

http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Kristen-173977 SSCrazy Eights Points: 8687 More actions · Answer 6

rchantler (9/28/2015)
CREATE CLUSTERED INDEX IDX_hidvals ON #hidvals(hid)

That is not declared as UNIQUE, so not quite the same thing as a PRIMARY KEY's Clustered Index (SQL will add a tie-break value which will make the key size wider).

Also, Primary Key will not allow NULLs which might effect how the optimiser treats the index - although you have declared [hid] as NOT NULL so maybe that is as broad as it is long ...

rchantler SSCrazy Points: 2128 More actions · Answer 7

Thanks again Eric,

Yes, they have identical indexes; CLUSTERED with the same # of rows.

I am out of the office today; will get the very different execution plans to you when I get back.

Why do I get the feeling I am missing something obvious?

rchantler SSCrazy Points: 2128 More actions · Answer 8

Thanks Kristen,

Yes, not precisely the same but in my mind (perhaps where the problem lies) functionally the same as the hid field is always unique, even if not declared so. This table version (with potentially non-unique rows) is the fast one, which also runs counter to my expectation.

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 9

Ostensibly, the Temp Table has only the required data in it and every row will be used/updated in some fashion. Try NOT having any indexes on the Temp Table and see what happens.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eric M Russell SSC Guru Points: 125519 More actions · Answer 10

rchantler (9/29/2015)
Thanks Kristen,
Yes, not precisely the same but in my mind (perhaps where the problem lies) functionally the same as the hid field is always unique, even if not declared so. This table version (with potentially non-unique rows) is the fast one, which also runs counter to my expectation.

I understand you as saying that sample #1, with the non-unique clustered key, performs better than sample #2, which has a primary key. They should both be clustered, since primary keys are clustered by default. The mystery, however, is why sample #2 is not the better performer, because generally speaking a primary or unique constraint on an index will help tip SQL Server's execution plan optimizer.

One thing I notice is that in sample #1 you are creating the clustered index in a separate operation. Is this index added before or after the table is inserted? If the table is indexed after insertion(s), then index fragmentation is minimal.

In contrast, sample #2 creates it's index inline. This means that the index pages are allocated as insertion(s) take place, so there is more potential for fragmentation.

Using the script below, compare the allocated size in MB for both temp tables, which indicates how organized the internal page structure is. This could play a role in table statistics and ultimately the chosen plan.

I have modified my script to add a handful of additional columns indicating the primary and unique properties of each index and also a column listing what columns are in each index.

use tempdb;

select o.type_desc

, substring(o.name,1,charindex('__',o.name)+1)+right(o.name,12) table_shortname

, si.name index_name

, case si.index_id when 0 then 'HEAP' when 1 then 'CLUSTERED' else 'NONCLUSTERED' end index_type

, (

select c.name + ', '

from sys.index_columns ic

left join sys.columns c on c.object_id = ic.object_id and c.column_id = ic.column_id

where ic.object_id = si.object_id and ic.index_id = si.index_id

order by ic.key_ordinal

for xml path('')

) index_cols

, si.is_unique

, si.is_primary_key

, si.is_unique_constraint

, si.fill_factor

, si.is_padded

, row_count

, ((reserved_page_count * 8024) / 1024 / 1024) reserved_mb

, create_date

from sys.dm_db_partition_stats ps

left join sys.objects o on ps.object_id = o.object_id

left join sys.indexes si on si.object_id = o.object_id and si.index_id = ps.index_id

where is_ms_shipped = 0

-- order by reserved_page_count desc;

order by table_shortname;

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

rchantler SSCrazy Points: 2128 More actions · Answer 11

Thanks Jeff,

If I leave the temp table as a heap it performs slightly better than with the clustered index, i.e. est. subtree cost 15.59 versus 17.65 on actual runs. That is instructive, thanks.

Still, the mystery is why the primary key definition leads to a different execution plan with an ext. subtree cost of 2,548.

I tried another version where I created a primary key unclustered and then added the clustered index. This muddies the waters further, as it has the very slow execution plan. It seems as if the existence of the primary key sets the optimizer down the wrong track!

One further point (clue?). If i switch the query to do joins instead of left joins the performance difference evaporates.

rchantler SSCrazy Points: 2128 More actions · Answer 12

Thanks Eric,

I collected the stats with your expanded script. The 2 indexes are both reported as CLUSTERED with hid as the only column, 4318 rows and 0 reserved_mb.

As expected, the primary key version is also reported as a primary key and unique.

I should also mention I'm running on 2008 R2.

rchantler SSCrazy Points: 2128 More actions · Answer 13

Hi again Eric,

Re the fragmentation - I've just tested this; it makes no difference performance-wise in the clustered index example whether the data is inserted before or after the index is created.

rchantler SSCrazy Points: 2128 More actions · Answer 14

rchantler

SSCrazy

Points: 2128

September 30, 2015 at 2:16 pm

#1830585

I've attached the 2 execution plans.