insert lots of data into a table from scratch: why so many reads on the target table?

  • Dear all,

    I have a simple piece of code where I try to pump some data into a newly created temp table, and I am surprised to see ten times more page reads on my temp table than I had on the original table...

    The original table is a Data Warehouse (small) fact table.

    I extract ~230,000 rows (out of a few million, with no relevant index), and this costs me ~70,000 page reads.

    Fair enough...

    However, I noticed that it also did ~700,000 reads on my temp table just for the insert ???

    My temp table had a clustered primary key on all the dimension keys (12 columns).

    I tried removing the PK constraint, leaving the temp table as a heap. The reads on my temp table went down to ~250,000.

    Building the original clustered index after the data is loaded only takes ~2000 page reads...

    I thought maybe it did not like a clustered index with that many columns, so I tried adding an IDENTITY column as the primary key with a nonclustered constraint.

    Same again: back to ~700,000 reads.

    Why??? :crazy:

    PS: I have no other constraints on the target table (no referential integrity)
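
    To give an idea of the shape of the code (the table, column and filter names below are made up, only the pattern matters), this is roughly the test, with STATISTICS IO on to count the reads:

    SET STATISTICS IO ON;

    -- temp table with a wide clustered PK (the real one has 12 dimension keys)
    CREATE TABLE #Extract
    (
        DimKey1 INT NOT NULL,
        DimKey2 INT NOT NULL,
        -- ... more dimension keys ...
        Amount  DECIMAL(18, 2) NOT NULL,
        PRIMARY KEY CLUSTERED (DimKey1, DimKey2)
    );

    -- extract ~230,000 rows from the fact table (source name and filter are placeholders)
    INSERT INTO #Extract (DimKey1, DimKey2, Amount)
    SELECT DimKey1, DimKey2, SUM(Amount)
    FROM dbo.FactTable
    WHERE SnapshotDate >= '20090101'
    GROUP BY DimKey1, DimKey2;

    SET STATISTICS IO OFF;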

  • I have discovered that it's quite easy to reproduce, and it seems to be due to a "group by" I had in my query.

    The sample below does something similar:

    CREATE TABLE #MyTable(What INT, Ever BIGINT)

    INSERT INTO #MyTable(What, Ever)
    SELECT T1.object_id, SUM(CAST(T2.object_id AS BIGINT))
    FROM sys.objects T1
    CROSS JOIN sys.objects T2
    GROUP BY T1.object_id

    Table '#MyTable__ ... _000000000019'. Scan count 0, logical reads 209, physical reads 0, read-ahead reads 0...
    Table 'sysschobjs'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0...

    (209 row(s) affected)

    I was "upset" about the reads on the #MyTable which seemed to match the number of rows retrieved but I have now noticed that these reads disappear if I don't group by.

    My original code is supposed to have millions of rows, hence I would rather avoid this...

    I suppose it is understandable to do these reads to do the Group By, never mind...
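
    For what it's worth, something like this, with the GROUP BY taken out of the sample above (roughly the same number of rows goes in), no longer shows those per-row reads on #MyTable in STATISTICS IO:

    -- same #MyTable as above, but a plain insert with no GROUP BY
    INSERT INTO #MyTable(What, Ever)
    SELECT object_id, CAST(object_id AS BIGINT)
    FROM sys.objects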

    Eric

  • Yes, a GROUP BY could well be the reason for the heavy reads.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • If there's a unique index on the destination table, SQL has to read it to ensure that you aren't trying to insert duplicates.

    If there's a non-unique clustered index, SQL has to read it to assign uniquifier values for the clustering key.

    That'll explain why it had fewer reads when there was no clustered index and no PK.
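
    A rough way to see the difference is to run the same insert against a heap, a unique clustered PK and a non-unique clustered index, and compare what STATISTICS IO reports for each destination table (the exact numbers will vary with version and data, but the pattern should hold):

    SET STATISTICS IO ON;

    CREATE TABLE #Heap      (What INT, Ever BIGINT);                        -- no index at all
    CREATE TABLE #UniquePK  (What INT PRIMARY KEY CLUSTERED, Ever BIGINT);  -- unique clustered index
    CREATE TABLE #NonUnique (What INT, Ever BIGINT);
    CREATE CLUSTERED INDEX CIX_NonUnique ON #NonUnique (What);              -- non-unique: uniquifiers needed

    INSERT INTO #Heap      (What, Ever) SELECT object_id, object_id FROM sys.objects;
    INSERT INTO #UniquePK  (What, Ever) SELECT object_id, object_id FROM sys.objects;
    INSERT INTO #NonUnique (What, Ever) SELECT object_id, object_id FROM sys.objects;

    SET STATISTICS IO OFF;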

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • For reasons like this, I modified my table to be just a heap.

    No constraint, no index.
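
    Roughly what the destination looks like now (names are made up); if the clustered index is still wanted, it can be built once after the load, which in my case only cost ~2,000 page reads:

    -- plain heap: no PK, no constraints, no indexes
    CREATE TABLE #ExtractHeap
    (
        DimKey1 INT NOT NULL,
        DimKey2 INT NOT NULL,
        -- ... the rest of the 12 dimension keys ...
        Amount  DECIMAL(18, 2) NOT NULL
    );

    -- ... load the rows here ...

    -- optional: build the clustered index once, after the load
    -- CREATE CLUSTERED INDEX CIX_ExtractHeap ON #ExtractHeap (DimKey1, DimKey2 /* , ... */);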

    This is really frustrating me because I tried to reproduce it simply, and it seemed to behave "badly" everywhere I tried (SQL 2008, 2005 SP3, 2005 SP2). As I was about to give up, I realised I had misread one of the test results and it had actually been working fine (without these extra reads), but I could not remember how I did it, or in which environment!!!

    Anyway, I have not given up yet...

    I realise it will be difficult to comment until I can reproduce it both ways: working properly, and then misbehaving...

    I'll be back!
