SQL server scans table even if TOP 1 * is used?

  • Hi,

    I have a 20 GB table with 6 nonclustered and 1 clustered index (PK) on it. The below query uses one of the index in where clause, but SQL chooses table scan, the index has been rebuilded recently an has fragmentation 0.3%.

    SELECT TOP 1 *

    FROM tbl1 with (nolock)

    where col1>='2010-07-01' and col1<'2010-07-06'

    --Takes 2 minutes with table scan

    SELECT TOP 1 *

    FROM [SMS].[dbo].[tblSMSArsiv] with (nolock,index=index_3)

    where SentTarih>='2010-07-01' and SentTarih<'2010-07-06'

    --Takes 0 seconds

    What is the reason of this behavior, does optimizer neglects TOP clause?

    Thanks

  • By the way index_3 is on col1 ascending

  • Yeah... but you have two different tables here. Which table is the index actually on?

    If you want help with a performance problem, you might want to take a look at the second link in my signature line below. Otherwise, we just don't have enough information to be able to help on things like this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh, thats my fault, they is same tables, I forgot to replace the names of second..

    SELECT TOP 1 *

    FROM tbl1 with (nolock)

    where col1>='2010-07-01' and col1<'2010-07-06'

    --Takes 2 minutes with table scan

    SELECT TOP 1 *

    FROM tbl1 with (nolock,index=index_3)

    where col1>='2010-07-01' and col1<'2010-07-06'

    --Takes 0 seconds

    What is the reason of this behavior, does optimizer neglects TOP clause?

    Thanks

  • - Can you post the actual ddl of as well table and _all_ indexes ? (mybe even statistics)

    - can you post the actual generated plan for both queries ?

    (you can attach the .sqlplan file in your reply)

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • The simple answer is it is because the optimiser estimates that a table scan would be cheaper.

    Try the same query just SELECTing col1, instead of all columns (*), removing the NOLOCK hint, and adding an ORDER BY clause to properly qualify the TOP, and (perhaps) adding OPTION (RECOMPILE).

    By asking for all columns, you are increasing the chances of switching to a table scan. Depending on the number of rows that your date range returns, the optimiser might conclude that a table scan would be faster than seeking on the index and then performing a lookup on the clustered index on each row to fetch the extra columns. A table scan is modelled as using sequential I/O whereas bookmark lookups are modelled at random I/O.

    The NOLOCK hint (without an ORDER BY) means the Storage Engine might choose an allocation-order scan driven by IAM pages.

    It is always bad to specify TOP without an ORDER BY to define how 'top' should be determined.

    OPTION (RECOMPILE) will help to ensure you get a plan optimised for the parameters in the query, rather than a cached version that might have been created with different parameter values.

    Paul

  • DDL of the table is, sql plans is attached, without index hint it took absolutely 00:06:15 scanning entirely whole 29 GB table!

    /****** Object: Table [dbo].[tblSMSArsiv] Script Date: 07/29/2010 09:41:42 ******/

    CREATE TABLE [dbo].[tblSMSArsiv](

    [ID] [bigint] NOT NULL,

    [Oncelik] [char](1) NULL,

    [Sender] [char](5) NOT NULL,

    [SenderRef] [varchar](30) NOT NULL,

    [Tarih] [datetime] NULL,

    [StartDate] [char](14) NULL,

    [EndDate] [char](14) NULL,

    [Baslik] [char](10) NULL,

    [Phone] [char](13) NULL,

    [Text] [char](160) NULL,

    [Sent] [char](1) NULL,

    [GrupNo] [char](40) NULL,

    [GrupSiraNo] [int] NULL,

    [SentTarih] [datetime] NULL,

    [Status] [char](1) NULL,

    [StatusBilgi] [char](40) NULL,

    [GonderimKanal] [char](1) NULL,

    [CID] [bigint] NOT NULL,

    [AktarimTarih] [datetime] NULL,

    CONSTRAINT [PK_tblSMSArsiv] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv] ON [dbo].[tblSMSArsiv]

    (

    [Phone] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_1] ON [dbo].[tblSMSArsiv]

    (

    [SenderRef] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_2] ON [dbo].[tblSMSArsiv]

    (

    [GrupNo] ASC,

    [GrupSiraNo] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_3] ON [dbo].[tblSMSArsiv]

    (

    [SentTarih] ASC

    )

    INCLUDE ( [Status],

    [GonderimKanal]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_4] ON [dbo].[tblSMSArsiv]

    (

    [Status] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_ETL] ON [dbo].[tblSMSArsiv]

    (

    [AktarimTarih] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'2009-01-16 10:46:25.000 itibaren' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'tblSMSArsiv', @level2type=N'COLUMN',@level2name=N'ID'

    GO

    /****** Object: Default [DF_tblSMSArsiv_GonderimKanal] Script Date: 07/29/2010 09:41:42 ******/

    ALTER TABLE [dbo].[tblSMSArsiv] ADD CONSTRAINT [DF_tblSMSArsiv_GonderimKanal] DEFAULT ('T') FOR [GonderimKanal]

    GO

    /****** Object: Default [DF_tblSMSArsiv_CID] Script Date: 07/29/2010 09:41:42 ******/

    ALTER TABLE [dbo].[tblSMSArsiv] ADD CONSTRAINT [DF_tblSMSArsiv_CID] DEFAULT ((0)) FOR [CID]

    GO

  • Paul you are right, by using order by, optimizer chooses to use the index, why is that behavior, but it is still interesting scanning table for TOP 100 in a table having 75M rows?

    SELECT TOP 1 *

    FROM [SMS].[dbo].[tblSMSArsiv]

    where SentTarih>='2010-07-01' and SentTarih<'2010-07-06'

    order by SentTarih

  • Can you post the execution plan for the query Without the index hint ?



    Clear Sky SQL
    My Blog[/url]

  • sporoy (7/29/2010)


    Paul you are right, by using order by, optimizer chooses to use the index, why is that behaviuor?

    Using ORDER BY, if the query optimiser chose to scan the table, it would have to do a sort. That sort, on 75,861,200 rows, is expensive enough to make the seek + bookmark lookup on the non-clustered index a much cheaper option. Does that make sense to you?

    but it is still interesting scanning table for TOP 100 in a table having 75M rows?

    It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.

    The optimiser uses a number of assumptions and heuristics which don't always hold true for every possible situation, I'm afraid. There's not much you can do, except to make sure your statistics are up-to-date so you give the optimiser accurate information.

    A sure sign that a bad plan was caused by duff information is where an 'actual' execution plan shows wildly different information from the estimates used by the optimiser to find a good plan.

    Paul

  • Just to clarify about the ORDER BY: the index on SentTarih is already in sorted order (by definition) so no explicit sort is required.

    Your clustered index is in a different order, so SQL Server would need to re-sort by SentTarih to find the TOP (1) row.

    I hope that makes it clearer.

  • Execution plan without index hint:

  • I understand your point and you are right Paul,

    It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.

    But the hinted version runs a lot faster, and queryplans says me:

    Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26

    Estimated CPU Cost w/o hint: 83, w/hint:1,56

    Operator Cost w/o hint: 0.0051, w/hint: 0.01

    So optimizer considers only operator cost? Do you know why is that so?

  • I understand your point and you are right Paul,

    It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.

    But the hinted version runs a lot faster, and queryplans says me:

    Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26

    Estimated CPU Cost w/o hint: 83, w/hint:1,56

    Operator Cost w/o hint: 0.0051, w/hint: 0.01

    So optimizer considers only operator cost? Do you know why is that so?

  • sporoy (7/29/2010)


    Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26

    Estimated CPU Cost w/o hint: 83, w/hint:1,56

    Operator Cost w/o hint: 0.0051, w/hint: 0.01

    So optimizer considers only operator cost? Do you know why is that so?

    The estimated CPU and I/O costs are the costs to produce all of the rows in the table.

    The optimiser knows that only one row is ultimately required because of the TOP (1).

    It knows that it probably won't have to run the whole scan before finding a row that matches the conditions. It uses statistics to estimate how many rows it will need to scan until one matches.

    The operator cost is adjusted to reflect this expectation.

    Paul

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply