Indexes, Querys and sparse data ?

  • Trying to trawl through BOL to figure out if the SS (2000 or 2k5) db engine takes the spares-ness or selectivity of the data in a column in an index into account when it choose the plan for a query. and then how SS determines the query plan for different syntax's of the same logical query.

    This is as close as I got to an answer. Tried to do some tests to see if it effects the engine in SS 2K.

    [Code]

    SET NOCOUNT ON

    -- create the tables and indexes

    CREATE table A(k integer primary key clustered)

    CREATE table B(k integer primary key clustered, fk integer foreign key references A(K) NULL)

    CREATE index B_FK_A on B(fk)

    -- populate a with the first 10,000 integer values

    Insert A

    SELECT 10000 * V.I + 1000 * W.I + 100 * X.I + 10 * Y.I + Z.I k FROM

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) V,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) W,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) X,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Y,

    (SELECT 0 I UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Z

    -- populate B with the same 10,000 integers

    INSERT B SELECT K,NULL FROM A

    -- update 0.1% of B to relate to A

    UPDATE B SET fk = k WHERE k < 1000

    -- force update of stats

    UPDATE STATISTICS B

    -- Compare different syntax that should produce the same plan

    GO

    SET SHOWPLAN_TEXT ON

    GO

    SET STATISTICS TIME OFF SET STATISTICS IO OFF

    GO

    DECLARE @a INT

    SELECT @a = A.k FROM A JOIN B ON A.K = B.fk

    SELECT @a = A.k FROM A, B WHERE A.K = B.fk

    SELECT @a = A.k FROM A JOIN (SELECT fk FROM B) C ON C.fk = A.k

    SELECT @a = A.k FROM A WHERE A.K IN (SELECT fk FROM B)

    SELECT @a = A.k FROM A WHERE exists (SELECT 1 FROM B WHERE A.K = B.fk)

    SELECT @a = k FROM (SELECT DISTINCT(B.fk) k FROM B WHERE B.fk IS NOT NULL ) B

    -- and again to get stats

    GO

    SET SHOWPLAN_TEXT OFF

    GO

    SET STATISTICS TIME ON SET STATISTICS IO ON

    GO

    DECLARE @a INT

    SELECT @a = A.k FROM A JOIN B ON A.K = B.fk

    SELECT @a = A.k FROM A, B WHERE A.K = B.fk

    SELECT @a = A.k FROM A JOIN (SELECT fk FROM B) C ON C.fk = A.k

    SELECT @a = A.k FROM A WHERE A.K IN (SELECT fk FROM B)

    SELECT @a = A.k FROM A WHERE exists (SELECT 1 FROM B WHERE A.K = B.fk)

    SELECT @a = k FROM (SELECT DISTINCT(B.fk) k FROM B WHERE B.fk IS NOT NULL ) B

    --clean up

    GO

    DROP TABLE B DROP TABLE A

    GO

    SET NOCOUNT OFF

    [/Code]

    Produced some concerning differences in the number of pages scanned etc ....

    Do I need to update statistics after insert in order to produce a a valid test ?

    I had to up the number of rows in b relating to A to about 500 to see any execution time difference.

    Am I missing something obvious or does the data in column have no effect on the query execution plan used ?

    P.S. I was always taught to do existence tests if not materialising the data for return does this mean this is a bad practice in 2K or 2K5 ?

  • OK so only the first column in the index is the only one that statistics are calculated on. Which answers my original question

    And Adding an UPDATE STATISTICS B.B_FK_A had no change on the pages read /cpu cost.

    results I got were not expected ie all the different plans I guess I cant assume that it is smarter than the person coding the select.

    [Code]

    SELECT @a = A.k FROM A JOIN B ON A.K = B.fk

    |--Merge Join(Inner Join, MERGE : ( [A].[k])=(.[fk]), RESIDUAL : ( .[fk]=[A].[k]))

    |--Clustered Index Scan(OBJECT : ( [SandBox].[dbo].[A].[PK__A__745AE5AF]), ORDERED FORWARD)

    |--Index Scan(OBJECT : ( [SandBox].[dbo]..[B_FK_A]), ORDERED FORWARD)

    CPU time = 15 ms, elapsed time = 20 ms.

    Table 'B'. Scan count 1, logical reads 177, physical reads 0, read-ahead reads 0.

    Table 'A'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.

    SELECT @a = A.k FROM A, B WHERE A.K = B.fk

    |--Merge Join(Inner Join, MERGE : ( [A].[k])=(.[fk]), RESIDUAL : ( .[fk]=[A].[k]))

    |--Clustered Index Scan(OBJECT : ( [SandBox].[dbo].[A].[PK__A__745AE5AF]), ORDERED FORWARD)

    |--Index Scan(OBJECT : ( [SandBox].[dbo]..[B_FK_A]), ORDERED FORWARD)

    CPU time = 22 ms, elapsed time = 22 ms.

    Table 'B'. Scan count 1, logical reads 177, physical reads 0, read-ahead reads 0.

    Table 'A'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.

    SELECT @a = A.k FROM A JOIN (SELECT fk FROM B) C ON C.fk = A.k

    |--Merge Join(Inner Join, MERGE : ( [A].[k])=(.[fk]), RESIDUAL : ( [A].[k]=.[fk]))

    |--Clustered Index Scan(OBJECT : ( [SandBox].[dbo].[A].[PK__A__745AE5AF]), ORDERED FORWARD)

    |--Index Scan(OBJECT : ( [SandBox].[dbo]..[B_FK_A]), ORDERED FORWARD)

    CPU time = 15 ms, elapsed time = 22 ms.

    Table 'B'. Scan count 1, logical reads 177, physical reads 0, read-ahead reads 0.

    Table 'A'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.

    SELECT @a = A.k FROM A WHERE A.K IN (SELECT fk FROM B)

    |--Merge Join(Left Semi Join, MERGE : ( [A].[k])=(.[fk]), RESIDUAL : ( [A].[k]=.[fk]))

    |--Clustered Index Scan(OBJECT : ( [SandBox].[dbo].[A].[PK__A__745AE5AF]), ORDERED FORWARD)

    |--Index Scan(OBJECT : ( [SandBox].[dbo]..[B_FK_A]), ORDERED FORWARD)

    CPU time = 32 ms, elapsed time = 269 ms.

    Table 'B'. Scan count 1, logical reads 177, physical reads 0, read-ahead reads 0.

    Table 'A'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.

    SELECT @a = A.k FROM A WHERE exists (SELECT 1 FROM B WHERE A.K = B.fk)

    |--Merge Join(Left Semi Join, MERGE : ( [A].[k])=(.[fk]), RESIDUAL : ( [A].[k]=.[fk]))

    |--Clustered Index Scan(OBJECT : ( [SandBox].[dbo].[A].[PK__A__745AE5AF]), ORDERED FORWARD)

    |--Index Scan(OBJECT : ( [SandBox].[dbo]..[B_FK_A]), ORDERED FORWARD)

    CPU time = 27 ms, elapsed time = 27 ms.

    Table 'B'. Scan count 1, logical reads 177, physical reads 0, read-ahead reads 0.

    Table 'A'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.

    SELECT @a = k FROM (SELECT DISTINCT(B.fk) k FROM B WHERE B.fk IS NOT NULL ) B

    |--Stream Aggregate(GROUP BY : ( .[fk]))

    |--Index Seek(OBJECT : ( [SandBox].[dbo]..[B_FK_A]), SEEK : ( .[fk] IsNotNull) ORDERED FORWARD)

    CPU time = 25 ms, elapsed time = 25 ms.

    Table 'B'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0.

    [/Code]

    I would have expected the last one to be the plan that was used and to be the best performance but the join seemed to much higher number of logical reads but lower effort.

    2K5 is worse weirder still in that the first two like 2K have the same plan and scans but different execution times in this case.

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'B'. Scan count 1, logical reads 352, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'A'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    CPU time = 16 ms, elapsed time = 21 ms.

    Table 'B'. Scan count 1, logical reads 352, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'A'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Should I be trying to normalise out caching effects as the results are symptomatic of query plan or reuse ?

    or does it really mean that two different queries producing the same result with the same execution plan can vary in their cpu time ? (SQLserver is dedicated to test and CPU is not thrashing)

  • Two different queries producing the same results with the same execution plan can very definitely take different amounts of CPU time.

    Heck, the same query run two or more times can take more or less CPU time for each run. When running performance tests, I usually run each query five or more times and average the CPU times to get a better feel for how it really goes.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply