July 28, 2010 at 8:51 am
Hi,
I have a 20 GB table with 6 nonclustered and 1 clustered index (PK) on it. The below query uses one of the index in where clause, but SQL chooses table scan, the index has been rebuilded recently an has fragmentation 0.3%.
SELECT TOP 1 *
FROM tbl1 with (nolock)
where col1>='2010-07-01' and col1<'2010-07-06'
--Takes 2 minutes with table scan
SELECT TOP 1 *
FROM [SMS].[dbo].[tblSMSArsiv] with (nolock,index=index_3)
where SentTarih>='2010-07-01' and SentTarih<'2010-07-06'
--Takes 0 seconds
What is the reason of this behavior, does optimizer neglects TOP clause?
Thanks
July 28, 2010 at 8:52 am
By the way index_3 is on col1 ascending
July 28, 2010 at 11:30 pm
Yeah... but you have two different tables here. Which table is the index actually on?
If you want help with a performance problem, you might want to take a look at the second link in my signature line below. Otherwise, we just don't have enough information to be able to help on things like this.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 29, 2010 at 12:09 am
Oh, thats my fault, they is same tables, I forgot to replace the names of second..
SELECT TOP 1 *
FROM tbl1 with (nolock)
where col1>='2010-07-01' and col1<'2010-07-06'
--Takes 2 minutes with table scan
SELECT TOP 1 *
FROM tbl1 with (nolock,index=index_3)
where col1>='2010-07-01' and col1<'2010-07-06'
--Takes 0 seconds
What is the reason of this behavior, does optimizer neglects TOP clause?
Thanks
July 29, 2010 at 12:35 am
- Can you post the actual ddl of as well table and _all_ indexes ? (mybe even statistics)
- can you post the actual generated plan for both queries ?
(you can attach the .sqlplan file in your reply)
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
July 29, 2010 at 12:39 am
The simple answer is it is because the optimiser estimates that a table scan would be cheaper.
Try the same query just SELECTing col1, instead of all columns (*), removing the NOLOCK hint, and adding an ORDER BY clause to properly qualify the TOP, and (perhaps) adding OPTION (RECOMPILE).
By asking for all columns, you are increasing the chances of switching to a table scan. Depending on the number of rows that your date range returns, the optimiser might conclude that a table scan would be faster than seeking on the index and then performing a lookup on the clustered index on each row to fetch the extra columns. A table scan is modelled as using sequential I/O whereas bookmark lookups are modelled at random I/O.
The NOLOCK hint (without an ORDER BY) means the Storage Engine might choose an allocation-order scan driven by IAM pages.
It is always bad to specify TOP without an ORDER BY to define how 'top' should be determined.
OPTION (RECOMPILE) will help to ensure you get a plan optimised for the parameters in the query, rather than a cached version that might have been created with different parameter values.
Paul
July 29, 2010 at 12:53 am
DDL of the table is, sql plans is attached, without index hint it took absolutely 00:06:15 scanning entirely whole 29 GB table!
/****** Object: Table [dbo].[tblSMSArsiv] Script Date: 07/29/2010 09:41:42 ******/
CREATE TABLE [dbo].[tblSMSArsiv](
[ID] [bigint] NOT NULL,
[Oncelik] [char](1) NULL,
[Sender] [char](5) NOT NULL,
[SenderRef] [varchar](30) NOT NULL,
[Tarih] [datetime] NULL,
[StartDate] [char](14) NULL,
[EndDate] [char](14) NULL,
[Baslik] [char](10) NULL,
[Phone] [char](13) NULL,
[Text] [char](160) NULL,
[Sent] [char](1) NULL,
[GrupNo] [char](40) NULL,
[GrupSiraNo] [int] NULL,
[SentTarih] [datetime] NULL,
[Status] [char](1) NULL,
[StatusBilgi] [char](40) NULL,
[GonderimKanal] [char](1) NULL,
[CID] [bigint] NOT NULL,
[AktarimTarih] [datetime] NULL,
CONSTRAINT [PK_tblSMSArsiv] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv] ON [dbo].[tblSMSArsiv]
(
[Phone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_1] ON [dbo].[tblSMSArsiv]
(
[SenderRef] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_2] ON [dbo].[tblSMSArsiv]
(
[GrupNo] ASC,
[GrupSiraNo] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_3] ON [dbo].[tblSMSArsiv]
(
[SentTarih] ASC
)
INCLUDE ( [Status],
[GonderimKanal]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_4] ON [dbo].[tblSMSArsiv]
(
[Status] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSArsiv_ETL] ON [dbo].[tblSMSArsiv]
(
[AktarimTarih] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'2009-01-16 10:46:25.000 itibaren' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'tblSMSArsiv', @level2type=N'COLUMN',@level2name=N'ID'
GO
/****** Object: Default [DF_tblSMSArsiv_GonderimKanal] Script Date: 07/29/2010 09:41:42 ******/
ALTER TABLE [dbo].[tblSMSArsiv] ADD CONSTRAINT [DF_tblSMSArsiv_GonderimKanal] DEFAULT ('T') FOR [GonderimKanal]
GO
/****** Object: Default [DF_tblSMSArsiv_CID] Script Date: 07/29/2010 09:41:42 ******/
ALTER TABLE [dbo].[tblSMSArsiv] ADD CONSTRAINT [DF_tblSMSArsiv_CID] DEFAULT ((0)) FOR [CID]
GO
July 29, 2010 at 1:08 am
Paul you are right, by using order by, optimizer chooses to use the index, why is that behavior, but it is still interesting scanning table for TOP 100 in a table having 75M rows?
SELECT TOP 1 *
FROM [SMS].[dbo].[tblSMSArsiv]
where SentTarih>='2010-07-01' and SentTarih<'2010-07-06'
order by SentTarih
July 29, 2010 at 1:19 am
Can you post the execution plan for the query Without the index hint ?
July 29, 2010 at 1:26 am
sporoy (7/29/2010)
Paul you are right, by using order by, optimizer chooses to use the index, why is that behaviuor?
Using ORDER BY, if the query optimiser chose to scan the table, it would have to do a sort. That sort, on 75,861,200 rows, is expensive enough to make the seek + bookmark lookup on the non-clustered index a much cheaper option. Does that make sense to you?
but it is still interesting scanning table for TOP 100 in a table having 75M rows?
It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.
The optimiser uses a number of assumptions and heuristics which don't always hold true for every possible situation, I'm afraid. There's not much you can do, except to make sure your statistics are up-to-date so you give the optimiser accurate information.
A sure sign that a bad plan was caused by duff information is where an 'actual' execution plan shows wildly different information from the estimates used by the optimiser to find a good plan.
Paul
July 29, 2010 at 1:38 am
Just to clarify about the ORDER BY: the index on SentTarih is already in sorted order (by definition) so no explicit sort is required.
Your clustered index is in a different order, so SQL Server would need to re-sort by SentTarih to find the TOP (1) row.
I hope that makes it clearer.
July 29, 2010 at 1:41 am
Execution plan without index hint:
July 29, 2010 at 2:26 am
I understand your point and you are right Paul,
It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.
But the hinted version runs a lot faster, and queryplans says me:
Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26
Estimated CPU Cost w/o hint: 83, w/hint:1,56
Operator Cost w/o hint: 0.0051, w/hint: 0.01
So optimizer considers only operator cost? Do you know why is that so?
July 29, 2010 at 2:27 am
I understand your point and you are right Paul,
It's all about estimated costs and assumptions made by the optimiser. Try a TOP (100) query hinting the non-clustered index and hinting a table scan (INDEX(0)). Compare the estimated cost shown in the execution plans: you should find that the table scan is estimated to be cheaper.
But the hinted version runs a lot faster, and queryplans says me:
Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26
Estimated CPU Cost w/o hint: 83, w/hint:1,56
Operator Cost w/o hint: 0.0051, w/hint: 0.01
So optimizer considers only operator cost? Do you know why is that so?
July 29, 2010 at 2:58 am
sporoy (7/29/2010)
Estimated I/O Cost w/o hint: 2636, w/ hint: 3,26Estimated CPU Cost w/o hint: 83, w/hint:1,56
Operator Cost w/o hint: 0.0051, w/hint: 0.01
So optimizer considers only operator cost? Do you know why is that so?
The estimated CPU and I/O costs are the costs to produce all of the rows in the table.
The optimiser knows that only one row is ultimately required because of the TOP (1).
It knows that it probably won't have to run the whole scan before finding a row that matches the conditions. It uses statistics to estimate how many rows it will need to scan until one matches.
The operator cost is adjusted to reflect this expectation.
Paul
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply