SQL server scans table even if TOP 1 * is used?

Question

Post reply

SQL server scans table even if TOP 1 * is used?

sporoy

Hall of Fame

Points: 3114
More actions
July 28, 2010 at 8:51 am

#238142

Hi,
I have a 20 GB table with 6 nonclustered and 1 clustered index (PK) on it. The below query uses one of the index in where clause, but SQL chooses table scan, the index has been rebuilded recently an has fragmentation 0.3%.
SELECT TOP 1 *
FROM tbl1 with (nolock)
where col1>='2010-07-01' and col1<'2010-07-06'
--Takes 2 minutes with table scan
SELECT TOP 1 *
FROM [SMS].[dbo].[tblSMSArsiv] with (nolock,index=index_3)
where SentTarih>='2010-07-01' and SentTarih<'2010-07-06'
--Takes 0 seconds
What is the reason of this behavior, does optimizer neglects TOP clause?
Thanks

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

sporoy Hall of Fame Points: 3114 More actions · Answer 1

sporoy

Hall of Fame

Points: 3114

July 28, 2010 at 8:52 am

#1199272

By the way index_3 is on col1 ascending

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 2

Yeah... but you have two different tables here. Which table is the index actually on?

If you want help with a performance problem, you might want to take a look at the second link in my signature line below. Otherwise, we just don't have enough information to be able to help on things like this.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sporoy Hall of Fame Points: 3114 More actions · Answer 3

Oh, thats my fault, they is same tables, I forgot to replace the names of second..

SELECT TOP 1 *

FROM tbl1 with (nolock)

where col1>='2010-07-01' and col1<'2010-07-06'

--Takes 2 minutes with table scan

SELECT TOP 1 *

FROM tbl1 with (nolock,index=index_3)

where col1>='2010-07-01' and col1<'2010-07-06'

--Takes 0 seconds

What is the reason of this behavior, does optimizer neglects TOP clause?

Thanks

Johan Bijnens SSC Guru Points: 135602 More actions · Answer 4

- Can you post the actual ddl of as well table and _all_ indexes ? (mybe even statistics)

- can you post the actual generated plan for both queries ?

(you can attach the .sqlplan file in your reply)

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Who am I ? Sometimes this is me but most of the time this is me

Paul White SSC Guru Points: 150468 More actions · Answer 5

The simple answer is it is because the optimiser estimates that a table scan would be cheaper.

Try the same query just SELECTing col1, instead of all columns (*), removing the NOLOCK hint, and adding an ORDER BY clause to properly qualify the TOP, and (perhaps) adding OPTION (RECOMPILE).

By asking for all columns, you are increasing the chances of switching to a table scan. Depending on the number of rows that your date range returns, the optimiser might conclude that a table scan would be faster than seeking on the index and then performing a lookup on the clustered index on each row to fetch the extra columns. A table scan is modelled as using sequential I/O whereas bookmark lookups are modelled at random I/O.

The NOLOCK hint (without an ORDER BY) means the Storage Engine might choose an allocation-order scan driven by IAM pages.

It is always bad to specify TOP without an ORDER BY to define how 'top' should be determined.

OPTION (RECOMPILE) will help to ensure you get a plan optimised for the parameters in the query, rather than a cached version that might have been created with different parameter values.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

sporoy Hall of Fame Points: 3114 More actions · Answer 6

DDL of the table is, sql plans is attached, without index hint it took absolutely 00:06:15 scanning entirely whole 29 GB table!

/****** Object: Table [dbo].[tblSMSArsiv] Script Date: 07/29/2010 09:41:42 ******/

CREATE TABLE [dbo].[tblSMSArsiv](

[ID] [bigint] NOT NULL,

[Oncelik] [char](1) NULL,

[Sender] [char](5) NOT NULL,

[SenderRef] [varchar](30) NOT NULL,

[Tarih] [datetime] NULL,

[StartDate] [char](14) NULL,

[EndDate] [char](14) NULL,

[Baslik] [char](10) NULL,

[Phone] [char](13) NULL,

[Text] [char](160) NULL,

[Sent] [char](1) NULL,

[GrupNo] [char](40) NULL,

[GrupSiraNo] [int] NULL,

[SentTarih] [datetime] NULL,

[Status] [char](1) NULL,

[StatusBilgi] [char](40) NULL,

[GonderimKanal] [char](1) NULL,

[CID] [bigint] NOT NULL,

[AktarimTarih] [datetime] NULL,

CONSTRAINT [PK_tblSMSArsiv] PRIMARY KEY CLUSTERED

(

[ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv] ON [dbo].[tblSMSArsiv]

(

[Phone] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_1] ON [dbo].[tblSMSArsiv]

(

[SenderRef] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_2] ON [dbo].[tblSMSArsiv]

(

[GrupNo] ASC,

[GrupSiraNo] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_3] ON [dbo].[tblSMSArsiv]

(

[SentTarih] ASC

)

INCLUDE ( [Status],

[GonderimKanal]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_4] ON [dbo].[tblSMSArsiv]

(

[Status] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_ETL] ON [dbo].[tblSMSArsiv]

(

[AktarimTarih] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'2009-01-16 10:46:25.000 itibaren' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'tblSMSArsiv', @level2type=N'COLUMN',@level2name=N'ID'

GO

/****** Object: Default [DF_tblSMSArsiv_GonderimKanal] Script Date: 07/29/2010 09:41:42 ******/

ALTER TABLE [dbo].[tblSMSArsiv] ADD CONSTRAINT [DF_tblSMSArsiv_GonderimKanal] DEFAULT ('T') FOR [GonderimKanal]

GO

/****** Object: Default [DF_tblSMSArsiv_CID] Script Date: 07/29/2010 09:41:42 ******/

ALTER TABLE [dbo].[tblSMSArsiv] ADD CONSTRAINT [DF_tblSMSArsiv_CID] DEFAULT ((0)) FOR [CID]

GO

sporoy Hall of Fame Points: 3114 More actions · Answer 7

Paul you are right, by using order by, optimizer chooses to use the index, why is that behavior, but it is still interesting scanning table for TOP 100 in a table having 75M rows?

SELECT TOP 1 *

FROM [SMS].[dbo].[tblSMSArsiv]

where SentTarih>='2010-07-01' and SentTarih<'2010-07-06'

order by SentTarih

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 8

Can you post the execution plan for the query Without the index hint ?

Clear Sky SQL
My Blog[/url]

Paul White SSC Guru Points: 150468 More actions · Answer 9

sporoy (7/29/2010)
Paul you are right, by using order by, optimizer chooses to use the index, why is that behaviuor?

Using ORDER BY, if the query optimiser chose to scan the table, it would have to do a sort. That sort, on 75,861,200 rows, is expensive enough to make the seek + bookmark lookup on the non-clustered index a much cheaper option. Does that make sense to you?

but it is still interesting scanning table for TOP 100 in a table having 75M rows?

It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.

The optimiser uses a number of assumptions and heuristics which don't always hold true for every possible situation, I'm afraid. There's not much you can do, except to make sure your statistics are up-to-date so you give the optimiser accurate information.

A sure sign that a bad plan was caused by duff information is where an 'actual' execution plan shows wildly different information from the estimates used by the optimiser to find a good plan.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 10

Just to clarify about the ORDER BY: the index on SentTarih is already in sorted order (by definition) so no explicit sort is required.

Your clustered index is in a different order, so SQL Server would need to re-sort by SentTarih to find the TOP (1) row.

I hope that makes it clearer.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

sporoy Hall of Fame Points: 3114 More actions · Answer 11

sporoy

Hall of Fame

Points: 3114

July 29, 2010 at 1:41 am

#1199640

Execution plan without index hint:

sporoy Hall of Fame Points: 3114 More actions · Answer 12

I understand your point and you are right Paul,

It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.

But the hinted version runs a lot faster, and queryplans says me:

Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26

Estimated CPU Cost w/o hint: 83, w/hint:1,56

Operator Cost w/o hint: 0.0051, w/hint: 0.01

So optimizer considers only operator cost? Do you know why is that so?

sporoy Hall of Fame Points: 3114 More actions · Answer 13

I understand your point and you are right Paul,

It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.

But the hinted version runs a lot faster, and queryplans says me:

Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26

Estimated CPU Cost w/o hint: 83, w/hint:1,56

Operator Cost w/o hint: 0.0051, w/hint: 0.01

So optimizer considers only operator cost? Do you know why is that so?

Paul White SSC Guru Points: 150468 More actions · Answer 14

sporoy (7/29/2010)
Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26
Estimated CPU Cost w/o hint: 83, w/hint:1,56
Operator Cost w/o hint: 0.0051, w/hint: 0.01
So optimizer considers only operator cost? Do you know why is that so?

The estimated CPU and I/O costs are the costs to produce all of the rows in the table.

The optimiser knows that only one row is ultimately required because of the TOP (1).

It knows that it probably won't have to run the whole scan before finding a row that matches the conditions. It uses statistics to estimate how many rows it will need to scan until one matches.

The operator cost is adjusted to reflect this expectation.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi