insert lots of data into a table from scratch: why so many reads on the target table?

  • Dear all,

    I have a simple piece of code where I try to pump some data into a newly created temp table, and I am surprised to see ten times more page reads on my temp table than I had on the original table...

    The original table is a Data Warehouse (small) fact table.

    I extract ~230,000 rows (out of a few million, with no relevant index), and this costs me ~70,000 page reads.

    Fair enough...

    However, I noticed that it also did ~700,000 reads on my temp table just for the insert ???

    My temp table had a clustered primary key on all the dimension keys (12 columns).

    I tried removing the PK constraint, leaving the temp table as a heap. The reads on my temp table went down to ~250,000.

    Building the original clustered index after the data is loaded only takes ~2000 page reads...

    I thought maybe it did not like a clustered index with that many columns, so I tried adding an IDENTITY column as the primary key with a nonclustered constraint.

    Same again: back to ~700,000 reads.

    Why??? :crazy:

    PS: I have no other constraints on the target table (no referential integrity)
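
    To give an idea of the shape of the code (the table, column and filter names below are made up, only the pattern matters), this is roughly the test, with STATISTICS IO on to count the reads:

    SET STATISTICS IO ON;

    -- temp table with a wide clustered PK (the real one has 12 dimension keys)
    CREATE TABLE #Extract
    (
        DimKey1 INT NOT NULL,
        DimKey2 INT NOT NULL,
        -- ... more dimension keys ...
        Amount  DECIMAL(18, 2) NOT NULL,
        PRIMARY KEY CLUSTERED (DimKey1, DimKey2)
    );

    -- extract ~230,000 rows from the fact table (source name and filter are placeholders)
    INSERT INTO #Extract (DimKey1, DimKey2, Amount)
    SELECT DimKey1, DimKey2, SUM(Amount)
    FROM dbo.FactTable
    WHERE SnapshotDate >= '20090101'
    GROUP BY DimKey1, DimKey2;

    SET STATISTICS IO OFF;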

  • I have discovered that it's quite easy to reproduce, and it seems to be due to a "group by" I had in my query.

    The sample below does something similar:

    CREATE TABLE #MyTable(What INT, Ever BIGINT)

    INSERT INTO #MyTable(What, Ever)
    SELECT T1.object_id, SUM(CAST(T2.object_id AS BIGINT))
    FROM sys.objects T1
    CROSS JOIN sys.objects T2
    GROUP BY T1.object_id

    Table '#MyTable__ ... _000000000019'. Scan count 0, logical reads 209, physical reads 0, read-ahead reads 0...
    Table 'sysschobjs'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0...

    (209 row(s) affected)

    I was "upset" about the reads on the #MyTable which seemed to match the number of rows retrieved but I have now noticed that these reads disappear if I don't group by.

    My original code is supposed to have millions of rows, hence I would rather avoid this...

    I suppose it is understandable to do these reads to do the Group By, never mind...
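
    For what it's worth, something like this, with the GROUP BY taken out of the sample above (roughly the same number of rows goes in), no longer shows those per-row reads on #MyTable in STATISTICS IO:

    -- same #MyTable as above, but a plain insert with no GROUP BY
    INSERT INTO #MyTable(What, Ever)
    SELECT object_id, CAST(object_id AS BIGINT)
    FROM sys.objects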

    Eric

  • Yes, a GROUP BY could well be the reason for the heavy reads.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • If there's a unique index on the destination table, SQL has to read it to ensure that you aren't trying to insert duplicates.

    If there's a non-unique clustered index, SQL has to read it to assign uniquifier values for the clustering key.

    That'll explain why it had fewer reads when there was no clustered index and no PK.
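
    A rough way to see the difference is to run the same insert against a heap, a unique clustered PK and a non-unique clustered index, and compare what STATISTICS IO reports for each destination table (the exact numbers will vary with version and data, but the pattern should hold):

    SET STATISTICS IO ON;

    CREATE TABLE #Heap      (What INT, Ever BIGINT);                        -- no index at all
    CREATE TABLE #UniquePK  (What INT PRIMARY KEY CLUSTERED, Ever BIGINT);  -- unique clustered index
    CREATE TABLE #NonUnique (What INT, Ever BIGINT);
    CREATE CLUSTERED INDEX CIX_NonUnique ON #NonUnique (What);              -- non-unique: uniquifiers needed

    INSERT INTO #Heap      (What, Ever) SELECT object_id, object_id FROM sys.objects;
    INSERT INTO #UniquePK  (What, Ever) SELECT object_id, object_id FROM sys.objects;
    INSERT INTO #NonUnique (What, Ever) SELECT object_id, object_id FROM sys.objects;

    SET STATISTICS IO OFF;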

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • For reasons like this, I modified my table to be just a heap.

    No constraint, no index.
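
    Roughly what the destination looks like now (names are made up); if the clustered index is still wanted, it can be built once after the load, which in my case only cost ~2,000 page reads:

    -- plain heap: no PK, no constraints, no indexes
    CREATE TABLE #ExtractHeap
    (
        DimKey1 INT NOT NULL,
        DimKey2 INT NOT NULL,
        -- ... the rest of the 12 dimension keys ...
        Amount  DECIMAL(18, 2) NOT NULL
    );

    -- ... load the rows here ...

    -- optional: build the clustered index once, after the load
    -- CREATE CLUSTERED INDEX CIX_ExtractHeap ON #ExtractHeap (DimKey1, DimKey2 /* , ... */);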

    This is really frustrating me because I tried to reproduce it simply, and it seemed to behave "badly" everywhere I tried (SQL 2008, 2005 SP3, 2005 SP2). As I was about to give up, I realised I had misread one of the test results and it had actually been working fine (without these extra reads), but I could not remember how I did it, or in which environment!!!

    Anyway, I have not given up yet...

    I realise it will be difficult to comment until I can reproduce it both ways: working properly, and then misbehaving...

    I'll be back!
