Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««1234»»»

Index Creation Guidelines Expand / Collapse
Author
Message
Posted Thursday, June 17, 2004 9:21 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, July 23, 2014 8:16 AM
Points: 1,035, Visits: 410

Generally a good article, but the author fell prey to one of the most common errors about clustered indexes.  He states:

"Do not try to place a clustered index on a column that can be updated or in the table were the INSERT is a frequently issued command and clustered index column is randomly defined.  It will force the engine to physically rearrange pages allocated for the table, which will slow the system down."

Not quite true...  Yes, inserts in in that situation can cause page splits, but this is not the same as "physically rearranging pages".  The data pages do not need to be physically arranged (on disk) in any particular order, and in fact they are not.  The various pages are linked together via a "double linked list" mechanisim where each page contains a pointer to the next page (logically NOT physically) and to the previous page.  It is this linkage that must be updated and maintained when pages are split.  Maintaining the physical order of pages would indeed be prohibitively expensive and it is not done for that reason.

 




/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek



*****************/
Post #121639
Posted Thursday, June 17, 2004 11:04 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, January 16, 2012 12:55 PM
Points: 34, Visits: 688

Excellent article.

One of the few I have seen which mentioned the selectivity guidelines for non clustered indexes.

Yes it is a good reason to drop an index if a single value returns more than 10-15% of the table.

David Branscome




Post #121674
Posted Wednesday, July 21, 2004 8:42 PM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Monday, October 13, 2014 8:08 PM
Points: 4,573, Visits: 8,353

What's wrong with Datetime fields? Why indexes on those fields are not recommended?

I use to work with big tables containing records about some events. And main part of information is Start Date, End Date, Date Scheduled, etc.

Typical query: select Cols from TABLE where Date between @StartDate and @EndDate.

It takes ages to run this query unless you create clustered index on Datetime column. Non clustered index helps but not really much. But clustered index dramatically improves performance, especially if you are interested in short period between @StartDate and @EndDate - e.g. one month from 30 years log.

That’s why I really surprised by suggestion not to use indexes on Datetime columns. If there is any reasonable explanation to this?

 

And another notice for INSERT, DELETE and UPDATE. If you choose right indexes policy this operations may not cause significant delays because of indexes. For example, if you delete solid range of rows arranged by clustered index (delete from TABLE where INDEXEDCOL between A and B) it will take the same time as delete up to 10 rows depending on fragmentation level of your table. Because it will delete rows only from 2 pages (where A and B values are stored) and then delete pointers to PAGES.
But if you have 10 other indexes on the same table it won’t really help. Those indexes will eat all your time.

Post #127529
Posted Monday, August 9, 2004 2:09 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, September 7, 2007 1:13 AM
Points: 30, Visits: 1

I agree with Sergiy on the indexes on datetime data types.  These are just represented by integers anyway, aren't they?  Is there a large additional overhead because of using datetime over, say, integer?

I have a table that logs requests from users and will be frequently queried on the request date (either one day or a date range).  I have used a non-clustered index on an IDENTITY PK and a clustered index on the request date (smalldatetime).  The likelihood of two requests being made during the same minute is quite small so the request date column is almost unique.  Also the request date inserted will always be greater than or equal to the greatest request date.

The table will start at 0 and will only increase by about 2-3000 rows each year so I doubt the index options would make much difference. 

Does anyone have some more info on the use of datetime/smalldatetime as indexes, clustered or non-clustered?

 

Post #130877
Posted Wednesday, August 11, 2004 5:51 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 7:25 AM
Points: 138, Visits: 272

Based on my experience I do believe that clustered index should be for every table. If table don't have a good candidate for the clustered index for whatever reason then it may be a case to add fake column as identity (Identity column is always good candidate for clustered index) It will not slow performance because record always will be stored at the end but will give great benefit for the table maintainace and database shrinking if database has often data loads and/or delete rows.




Post #131160
Posted Thursday, June 16, 2005 6:55 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Friday, December 13, 2013 8:04 PM
Points: 545, Visits: 255

Excellent Article.

I like the thinking here ,as in all situation DBA wan't find enough time to check lot many queries in QA . Your Skill as DBA is reflected when you can trace something important without going into long and short of analysis.

 

Thanks,

Sameer Raval

 

 



Kindest Regards,

Sameer Raval [Sql Server DBA]
Geico Insurance
RavalSameer@hotmail.com

Post #191230
Posted Thursday, June 16, 2005 12:25 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 7:25 AM
Points: 138, Visits: 272

Thanks , your comment is true.




Post #191445
Posted Friday, October 21, 2005 4:48 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, April 14, 2011 5:55 AM
Points: 16, Visits: 76

I already posted the comment that was here once I checked, but dissapeared in the meantime.

Anyway, I like the article, it is pretty good beginners guide to index design.

However, I dissagree with next point:

  • If a column in a table is not at least 80 - 85% unique values, then most likely the SQL Server Query Optimizer will not use a non-clustered index based on that column.

For example: index on column gender with values ‘M’, ‘F’, and NULL is not a subject for the index creation. The same rule can be applied to the flag and status columns with 5-6 different values.

Index usefulness doesn't depend on number of possible values. Instead data distribution and query usage pattern should be considered. For example:

If military database is frequently searched for females (let's say lest then 5% soldiers are females) index on gender column may be used. Also, condition Status = "in process" in order processing system would benefit from index on Status column even if there are only a few possible statuses.

Post #231087
Posted Friday, October 21, 2005 7:02 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, September 26, 2014 6:24 PM
Points: 31, Visits: 82

Can you explain a little more about not needing indexes on date columns?

The majority of my searches use a date parameter and look at small portions of the data based on date.

As far as i can tell the oprimizer is using the index

Post #231128
Posted Friday, October 21, 2005 7:20 AM
SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: Moderators
Last Login: 2 days ago @ 1:24 PM
Points: 8,370, Visits: 743

Actually not sure about the 80% value but there is a threshold in which statistically the index will not be a value. But in your example you mention M, F, null and a military database. No matter the percentage of F in the system the statistics are still based on the overall uniqueness of the data in the index. So if you have said index and it has a million rows then the statistical uniqueness is .0003 percent. Due to this, more often than not it will choose a table scan or clustered index scan no matter how many times you run it. That said there are some conditional situations where things will behave differently than the statement leads to believe.

Ex. -- Try building an querying against this table.

CREATE TABLE dbo.MilPer (idx int identity(1,1) not null primary key,
sex char(1) null
)
go

CREATE NONCLUSTERED INDEX IX_MilPer ON dbo.MilPer
 (
 sex
 
GO

set nocount on

--Insert 1,000,000 records all M
declare @x int

set @x = 0

while @x < 1000000
begin
 insert dbo.MilPer (sex) values ('M')
 set @x = @x + 1
end
go

--50000 updated to F
UPDATE
 M
SET
 sex = 'F'
FROM
 dbo.MilPer M
INNER JOIN
 (SELECT TOP 50000 X.idx FROM dbo.MilPer X ORDER BY NewId()) Y
ON
 M.idx = Y.idx
go

--20000 updated to null
UPDATE
 M
SET
 sex = Null
FROM
 dbo.MilPer M
INNER JOIN
 (SELECT TOP 20000 X.idx FROM dbo.MilPer X WHERE sex != 'F' ORDER BY NewId()) Y
ON
 M.idx = Y.idx
go

--Verify counts
select sex, count(*) from dbo.MilPer  group by sex
GO

-----Do your testing here.--------

--Now change the table to this.

ALTER TABLE dbo.MilPer ADD
 [alt] char(1) NOT NULL CONSTRAINT DF_MilPer_at DEFAULT 'A'
GO

-----Do your testing here.--------

You'll find that when sex and idx were the only columns it will use the non-clustered index. This is because it also happens that the non-clustered index will work as a covering index for SELECT * in that case because the idx column is tucked inside the non-clustered index because of the way clustered indexes relate to non-clustered.

However once you add the alt column you will find it will do a clustered index seek. This is because the non-clustered index no longer will work as a covering index and the statistic density (based on uniqueness) isn't going to pose high enough to have the system consdier it. If you instead do SELECT idx, sex you will get the non-clustered seek again. That said however neither method executes faster than the other because the data itself in the case isn't dense enough for thorough testing.




Post #231137
« Prev Topic | Next Topic »

Add to briefcase ««1234»»»

Permissions Expand / Collapse