Indexes

  • Comments posted to this topic are about the item Indexes

  • Data type "bool"? What are you running?

    Also, your statement is only true for the small number of rows in your example. It's a different story with real-world volumes. Remember, SQL Server will choose what it feels is an optimal execution plan based on statistics.

    Run the code below and compare the execution plans for your example against the plans for a table holding even 100 copies of the same rows (700 rows in all). You'll see what I mean.

    -- sample data: seven rows, each with a different id
    create table #t (id int,ch char,na varchar(20),flag char(1))
    insert into #t values (2,'A','jack','Y')
    insert into #t values (5,'b','amy','N')
    insert into #t values (1,'$','adams','N')
    insert into #t values (3,'*','anna','Y')
    insert into #t values (7,'@','rose','N')
    insert into #t values (4,'&','smith','Y')
    insert into #t values (6,'!','sue','Y')

    -- flag is not part of the index, so query 4 has to go to the table to evaluate it
    create nonclustered index nc_t on #t (id,ch,na)

    -- query 1 (ch is not the leading column of the index)
    select na from #t where ch = '!'
    -- query 2
    select na from #t where id = 6 and ch = '!'
    -- query 3
    select na from #t where ch = '!' and id = 6
    -- query 4
    select na from #t where flag = 'Y' and id = 6 and ch = '!'

    -- #bigT: 100 copies of the seven rows in #t (700 rows), with ID running from 1 to 100;
    -- master..syscolumns is just a convenient source of at least 100 rows for the tally
    ;with tally (N) as (select row_number() over(order by id) from master..syscolumns)
    select N as ID, ch, na, flag
    into #bigT
    from tally
    cross join #t t
    where N <=100

    create nonclustered index nc_bigT on #bigT (id,ch,na)

    -- the same four queries against #bigT
    -- query 1
    select na from #bigT where ch = '!'
    -- query 2
    select na from #bigT where id = 6 and ch = '!'
    -- query 3
    select na from #bigT where ch = '!' and id = 6
    -- query 4
    select na from #bigT where flag = 'Y' and id = 6 and ch = '!'

    drop table #t
    drop table #bigT
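    To compare the plans as text, and to see how many pages each query actually reads, something like the following can be run in the same session after nc_bigT has been created and before the drop table statements. SET STATISTICS PROFILE and SET STATISTICS IO are standard SQL Server session options; this is only a sketch, and the operators and read counts you see will depend on your version and data.

```sql
-- Run after nc_bigT exists and before the DROP TABLE statements.
-- STATISTICS PROFILE returns each statement's actual plan as an extra result set;
-- STATISTICS IO reports logical reads per table in the Messages output.
set statistics profile on;
set statistics io on;

-- query 1: ch is not the leading index column, so no seek on nc_bigT is possible
select na from #bigT where ch = '!';

-- query 4: flag is not in nc_bigT, so a lookup or a scan is needed to read it
select na from #bigT where flag = 'Y' and id = 6 and ch = '!';

set statistics io off;
set statistics profile off;
```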

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Table scan? I think it will be an index scan... and query 1 will give an index scan.

    Regards,
    Sqlfrenzy

  • Can you explain what you mean with "Also, your statement is only true for the small number of rows..."?

    I ran both your example and the one from the question. In every case the execution plans look about the same: query 4 scans the table, while the others scan or seek the index, both with a dozen rows and with 250+ rows in the table.

    Since three columns (id, ch, na) are part of the index, no table scan should occur unless the fourth column is referenced in the WHERE clause.

  • Hmmm... I played with this a little and the number of rows certainly makes a difference.

    I added 10,000 rows (actually 10,001 - as below) and this time query 1 gave the only table scan and query 4 went for nested loops joining an Index Seek and a record id lookup, which is what I had expected it to do when I saw the question.

    For the question as asked, I accept I got it wrong (I said Q1 as I thought all the others would perform an index seek but Q1 could not), but it is interesting how the question is not a simple one of the structure leading to a deterministic result, but the optimiser may take very different routes in the same database structures depending on other factors such as data volumes.

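    -- Assumes the question's table t already exists with the same columns as #t above:
    -- (id int, ch char, na varchar(20), flag char(1)).
    -- The recursive CTE generates the numbers 0 to 10000, i.e. 10,001 rows.
    WITH cte (Num) AS
    (
        SELECT 0 Num
        UNION ALL
        SELECT Num + 1
        FROM cte
        WHERE Num < 10000
    )
    INSERT INTO t (id, ch, na, flag)
    SELECT Num,
           CHAR(Num % 128 + 50),                          -- cycles through character codes 50 to 177
           CAST(Num AS VARCHAR(20)),                      -- the number as text
           CASE WHEN Num % 2 = 1 THEN 'Y' ELSE 'N' END    -- alternating Y/N flag
    FROM cte
    OPTION (MAXRECURSION 10000)  -- the default limit of 100 would stop the statement with an error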

    Please note that the data of this table fits on one data page, so a table scan is nothing more than reading that one page. Doing row-ID lookups is definitely more expensive than a single page read.

    When you add more rows, the picture gets different.

    If you expect only one row to be returned from a table of, say, 70,000 rows, then an index seek plus a row-ID lookup for that one row is much less expensive than a table scan, which would read every data page holding those 70,000 rows (assume approx. 357 pages).

    Best Regards,

    Chris Büttner
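    The page counts behind this argument can be checked directly. Roughly, a seek reads about index_depth pages of the nonclustered index plus one data page per RID lookup, while a table scan reads every data page of the heap. The 357-page figure above is an assumption about row size; the sketch below reports the real numbers, assuming the question's table is dbo.t in the current database (a hypothetical name for illustration).

```sql
-- Assumes the question's table is dbo.t (hypothetical name for this sketch).
-- If dbo.t does not exist, OBJECT_ID returns NULL and the DMV reports every table instead.
-- DETAILED mode reads every level of every index, so record_count is populated;
-- index_level = 0 keeps only the heap's data pages and each index's leaf level.
select i.name as index_name,        -- NULL for the heap itself
       ps.index_type_desc,          -- HEAP or NONCLUSTERED INDEX
       ps.index_depth,              -- levels a seek has to walk
       ps.page_count,               -- pages a full scan of this structure reads
       ps.record_count
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.t'), null, null, 'DETAILED') as ps
join sys.indexes as i
  on i.object_id = ps.object_id
 and i.index_id  = ps.index_id
where ps.index_level = 0;
```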

  • Now that's interesting. I filled "t" with your script. Including my initial data, there are 10288 rows. I ran the four queries and still get the only table scan on #4. This is on SS2008 Developer Ed.

  • ma (9/7/2009)


    Now that's interesting. I filled "t" with your script. Including my initial data, there are 10288 rows. I ran the four queries and still get the only table scan on #4. This is on SS2008 Developer Ed.

    Did you rebuild the index?

    What I found was:

    Initial conditions:

    1) Index Scan

    4) Table Scan

    After adding 10001 rows:

    1) Index Scan

    4) Index Seek & RowID lookup

    After rebuilding index

    1) Table Scan

    4) Index Seek & RowID lookup
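    One plausible reason the plan for query 1 changed only after the rebuild: rebuilding an index also refreshes that index's statistics with the equivalent of a full scan. A sketch for checking and refreshing statistics, assuming the question's table is dbo.t (a hypothetical name); ALTER INDEX ... REBUILD, UPDATE STATISTICS and STATS_DATE are standard in SQL Server 2005 and later.

```sql
-- Assumes the question's table is dbo.t (hypothetical name for this sketch).
-- When were the statistics on dbo.t last updated?
select s.name as stats_name,
       stats_date(s.object_id, s.stats_id) as last_updated
from sys.stats as s
where s.object_id = object_id('dbo.t');

-- Option 1: rebuild every index on dbo.t (on a heap, that means its nonclustered indexes);
-- this also updates each rebuilt index's statistics with a full scan.
alter index all on dbo.t rebuild;

-- Option 2: refresh the statistics only, without rebuilding anything.
update statistics dbo.t with fullscan;
```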

  • After rebuilding the index I also get a table scan on query 1.

  • dave.farmer (9/7/2009)


    Hmmm... I played with this a little and the number of rows certainly makes a difference.

    I added 10,000 rows (actually 10,001 - as below) and this time query 1 gave the only table scan and query 4 went for nested loops joining an Index Seek and a record id lookup, which is what I had expected it to do when I saw the question.

    For the question as asked, I accept I got it wrong (I said Q1 as I thought all the others would perform an index seek but Q1 could not), but it is interesting how the question is not a simple one of the structure leading to a deterministic result, but the optimiser may take very different routes in the same database structures depending on other factors such as data volumes.

    I agree with you! Only Q1 may lead to a table scan. Q4 has three of its columns in the index, so the optimizer will use the index.

  • Nice question - really gets you thinking...

  • the question is not a simple one of the structure leading to a deterministic result, but the optimiser may take very different routes in the same database structures depending on other factors such as data volumes.

    Very well stated.

    This is a fundamental concept of SQL. The optimizer will make decisions based on the statistics available to it in order to reach an efficient execution plan. It makes decisions based not only on the existence and characteristics of any indexes, but on the volume and distribution of values within those indexes.

    Plans which are quite efficient for small amounts of data, or for certain values, are often less efficient at larger volumes or for different values. (And plans which are efficient for larger amounts of data may be inefficient for smaller volumes.)

    Beyond that, plans can change when a constant gets changed to a @variable within the query, and can change with different search operators. You should always look at the execution plan as it is. Never assume.
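    The constant-versus-variable point can be tried against the #bigT table from the earlier script, in the same session (after nc_bigT exists, before the drop table statements). With a literal the optimizer can look the value up in the statistics histogram; with a local variable the value is not known when the batch is compiled, so it generally falls back on the index's average density. Compare the estimated row counts of the three plans. The variable name @id is just for illustration, and DECLARE and SET are kept separate so the sketch also runs on SQL Server 2005.

```sql
-- Run against #bigT after nc_bigT exists and before the DROP TABLE statements.
-- 1) Literal: the value 6 is visible to the optimizer at compile time.
select na from #bigT where id = 6 and ch = '!';

-- 2) Local variable: @id has no known value when the batch is compiled,
--    so the estimate is based on average density rather than on the value 6.
declare @id int;
set @id = 6;
select na from #bigT where id = @id and ch = '!';

-- 3) OPTION (RECOMPILE) compiles this statement when it executes,
--    which typically lets the optimizer use the current value of @id.
select na from #bigT where id = @id and ch = '!' option (recompile);
```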

    The reliance on statistics is why "parameter sniffing" can cause performance problems for stored procedures, and why temp tables often outperform table variables (table variables have no column statistics). It is also why the most common answer to questions regarding performance and execution plans is "It depends." 😉
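    A minimal sketch of parameter sniffing, assuming the question's table dbo.t exists; the procedure name dbo.GetNamesByCh is hypothetical. The first execution compiles and caches a plan built for the value it was called with, and later calls reuse it. Note that GO is the batch separator recognized by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical procedure against the question's table dbo.t.
create procedure dbo.GetNamesByCh
    @ch char(1)
as
    select na from dbo.t where ch = @ch;
go

-- The first call compiles a plan using the sniffed value '!' and caches it.
exec dbo.GetNamesByCh @ch = '!';

-- Later calls reuse the cached plan, even if a different plan would suit 'A' better.
exec dbo.GetNamesByCh @ch = 'A';

-- WITH RECOMPILE builds a fresh plan for this call only and does not cache it.
exec dbo.GetNamesByCh @ch = 'A' with recompile;
```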

  • I see that different people are seeing different results.

    On my machine, with #bigT at 700 rows (100 copies of the seven sample rows), the 4th query produces an index seek and an RID lookup, joined by a nested loop. If anyone is seeing anything different, please cut and paste the exact code you are running to create the #temp table and populate it. You might also add the results from this query:

    select ID,count(*) from #bigt group by ID

    In my #bigT example, the underlying table is a heap without a clustered index. Each ID value appears seven times (once for each row of #t), but only one row has ID = 6 and ch = '!'.

    Ma, the results you are seeing may come from your initial data differing from the data my script generates for #bigT. Remember that the optimizer considers the distribution of data within an index. If you only have a couple of different ID values repeated many times, the execution plan for your data might well be different.
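    To check the distribution being described, this can be run in the same session after the script above has built #bigT (and before the drop table). Because #bigT is 100 copies of the seven rows in #t, it should report 700 rows, 100 distinct IDs, seven rows with ID = 6, and a single row with ID = 6 and ch = '!'.

```sql
-- Run after #bigT has been built and before the DROP TABLE statements.
select count(*)                                              as total_rows,
       count(distinct ID)                                    as distinct_ids,
       sum(case when ID = 6 then 1 else 0 end)               as rows_with_id_6,
       sum(case when ID = 6 and ch = '!' then 1 else 0 end)  as rows_with_id_6_and_ch
from #bigT;
```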

  • I chose 1 as well.

    When faced with a problem like this, my brain automatically went into "performance problem" mode, and since these generally occur with big datasets, I selected the answer that would be most likely to happen in these circumstances.

    I would have to admit that for these data, 4 is the only right answer - I must learn to be a bit slower with my answers.
