September 23, 2010 at 10:20 am
Dear all,
I have a simple piece of code where I pump some data into a newly created temp table, and I am surprised to see ten times more page reads on my temp table than I had on the original table...
The original table is a Data Warehouse (small) fact table.
I extract ~230,000 rows (out of a few million, with no relevant index) and this costs me ~70,000 page reads.
Fair enough...
However, I noticed that it also did ~700,000 reads on my temp table just for the insert ???
My temp table had a clustered primary key on all the dimension keys (12 columns).
I tried removing the PK constraint and leaving the temp table as a heap. The reads on my temp table went down to ~250,000.
Building the original clustered index after the data is loaded only takes ~2000 page reads...
I thought maybe it does not like a clustered index with that many columns, so I tried adding an identity column as the primary key with a nonclustered constraint.
Same again, back to ~700,000 reads.
Why??? :crazy:
PS: I have no other constraints on the target table (no referential integrity)
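To give an idea of what I tried (a simplified sketch only; the real table has 12 dimension key columns and the names here are made up):

-- Variant 1: clustered PK on all the dimension keys
CREATE TABLE #Fact_ClusteredPK
(
    DimKey1 INT NOT NULL,
    DimKey2 INT NOT NULL,
    DimKey3 INT NOT NULL,   -- ...the real table has 12 of these
    Amount  MONEY NULL,
    PRIMARY KEY CLUSTERED (DimKey1, DimKey2, DimKey3)
);

-- Variant 2: plain heap, no constraint, no index
CREATE TABLE #Fact_Heap
(
    DimKey1 INT NOT NULL,
    DimKey2 INT NOT NULL,
    DimKey3 INT NOT NULL,
    Amount  MONEY NULL
);

-- Variant 3: identity column as a nonclustered primary key
CREATE TABLE #Fact_IdentityPK
(
    Id      INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    DimKey1 INT NOT NULL,
    DimKey2 INT NOT NULL,
    DimKey3 INT NOT NULL,
    Amount  MONEY NULL
);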
September 24, 2010 at 2:33 am
I have discovered that it's quite easy to reproduce, and it seems to be due to a "group by" I had in my query.
The sample below does something similar
SET STATISTICS IO ON;

CREATE TABLE #MyTable(What INT, Ever BIGINT);

-- The GROUP BY is what triggers the extra reads on #MyTable
INSERT INTO #MyTable(What, Ever)
SELECT T1.object_id, SUM(CAST(T2.object_id AS BIGINT))
FROM sys.objects T1
CROSS JOIN sys.objects T2
GROUP BY T1.object_id;
Table '#MyTable__ ... _000000000019'. Scan count 0, logical reads 209, physical reads 0, read-ahead reads 0...
Table 'sysschobjs'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0...
(209 row(s) affected)
I was "upset" about the reads on the #MyTable which seemed to match the number of rows retrieved but I have now noticed that these reads disappear if I don't group by.
My original code is supposed to have millions of rows, hence I would rather avoid this...
I suppose it is understandable to do these reads to do the Group By, never mind...
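In case it helps, the "without the group by" version I compared against is just the raw join, something along these lines (it inserts far more rows, but the per-row reads on #MyTable no longer appear):

-- Same insert without the GROUP BY (and therefore without the SUM):
-- the extra per-row reads on #MyTable do not show up
INSERT INTO #MyTable(What, Ever)
SELECT T1.object_id, CAST(T2.object_id AS BIGINT)
FROM sys.objects T1
CROSS JOIN sys.objects T2;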
Eric
September 24, 2010 at 3:29 am
Yes, the GROUP BY could be the reason for the heavy reads.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
September 24, 2010 at 4:03 am
If there's a unique index on the destination table, SQL has to read it to ensure that you aren't trying to insert duplicates.
If there's a non-unique clustered index SQL has to read it to assign uniquifier values for the clustering key.
That'll explain why you saw lower reads when there was no clustered index and no PK.
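A quick way to see this side by side, based on Eric's repro (a rough sketch; the table names are just for illustration):

SET STATISTICS IO ON;

-- Heap: no constraint, no index
CREATE TABLE #Heap (What INT, Ever BIGINT);

-- Unique clustered PK: SQL has to read the index on each insert to check for duplicates
CREATE TABLE #ClusteredPK (What INT PRIMARY KEY CLUSTERED, Ever BIGINT);

INSERT INTO #Heap(What, Ever)
SELECT T1.object_id, SUM(CAST(T2.object_id AS BIGINT))
FROM sys.objects T1
CROSS JOIN sys.objects T2
GROUP BY T1.object_id;

INSERT INTO #ClusteredPK(What, Ever)
SELECT T1.object_id, SUM(CAST(T2.object_id AS BIGINT))
FROM sys.objects T1
CROSS JOIN sys.objects T2
GROUP BY T1.object_id;

-- Compare the 'logical reads' reported for #Heap and #ClusteredPK in the Messages tab

DROP TABLE #Heap;
DROP TABLE #ClusteredPK;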
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 24, 2010 at 4:16 am
For reasons like this, I modified my table to be just a heap.
No constraint, no index.
This is really frustrating me because I tried to reproduce it simply, and it seemed to behave "badly" everywhere I tried (SQL 2008, 2005 SP3, 2005 SP2). As I was about to give up, I realised I had misread one of the test results and it had actually been working fine (without these extra reads), but I could not remember how I had done it, or in which environment!!!
Anyway, I have not given up yet...
I realise it will be difficult to comment until I can reproduce it both ways, first doing it right and then wrong...
I'll be back!