Speeding Up Database Access Part 3 - Fixing Missing Indexes

  • Comments posted to this topic are about the item Speeding Up Database Access Part 3 - Fixing Missing Indexes

  • Some corrections...

    The article implies (thought doesn't outright say) that the rows in a table are physically sorted by the clustered index. This is not necessarily true, the order that the rows appear within pages and within the file can differ from the clustered index order. The clustered index enforces a logical sort order and SQL tries as much as possible to make the logical and physical sort order the same when the cluster is created or rebuilt, but it is not guaranteed.

    If there is no clustered index, or if it is not unique, then non-clustered index records do have the physical address.

    Only if the base table is a heap do the nonclustered index records get the RID (the row identifier). If the table has a clustered index then the nonclustered indexes get the clustering key. If the clustered index is not unique SQL adds a uiniquifier (a 4 byte int) which will also appear in the nonclustered index records.

    Putting a Primary Key on a column has the effect of giving it a clustered index.

    By default only. It's possible (and in some cases a good idea) to create a nonclustered primary key

    When to use an index

    Consider columns that have a UNIQUE constraint. Having an index on the column makes it easier for SQL Server to check whether a new value would not be unique.

    No need, because unique constraints are enforced by unique indexes. Hence if you add an index to a column that has a unique constraint you're adding a duplicate index, wasting space and wasting performance.

    Look at putting an index on at least one column involved in every JOIN. If you join two tables with 500 rows each, this potentially creates a set of 500 * 500 = 250000 rows - so an index on a JOIN can make a big difference.

    It only creates such a large resultset if you're doing a cross join, and indexes won't help that. Indexes are recommended on foreign key columns, but not for the reason given here.

    Low Specificity

    Even if there is an index on a column, the query optimizer won't always use it.

    If the index is covering for the query, SQL will use it even if there are only 2 values in a million rows.

    The query optimizer is unlikely to use a non-clustered index for a price whose specificity is below 85%.

    The figure's closer to 99%, assuming the index in question is not covering.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gila answered one of my questions about the unique constraint/unique index already.

    I do still have a question about the when to use clustered indexes section though. The sentence below makes sense to me.

    Most often you'll make the primary key an IDENTITY column, so each new record is assigned a unique, ever increasing number. This means that if you put the clustered index on the primary key, new records are always added at the end of the table without page splits.

    My question though is what if you have a table named Customers with a CustomerID primary key. CustomerID is is a uniqueidentifier. I have another table named PhoneNumber with a PhoneNumberID int identity and CustomerID is a FK that references Customers. I know I could create a clustered primary key on the PhoneNumberID integer and a nonclustered index on CustomerID. But, none of my queries/stored procedures reference the PhoneNumberID. They all do something similiar to "WHERE CustomerID = @CustomerID". Would it make more sense to make the PhoneNumberID field a nonclustered primary key to guarantee uniqueness in the table and put the clustered index on the CustomerID even though it's a uniqueidentifier?

    Some small testing that I did seems to suggest that the latter seems to perform better than the former but I want to make sure I'm not missing anything. I'm also trying to figure out how it will perform once my application starts to scale out and we go from say a couple hundred customers to a couple hundred thousand customers and more.

  • Personally I prefer clustered index to organise the table and nonclustered indexes for data access.

    Shameless plugging of my own article... http://www.sqlservercentral.com/articles/Indexing/68563/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the response. I will give it a read!!

    🙂

  • Well, it looks like Gail already covered my concerns and then some in regards to this article.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Gail,

    Just to clarify- if physical order for clustered index in not always the same as logical order (and I read this before) why in this case page split happens while updating/inserting new records? Or am I missing something? Thanks,

  • The clustered index enforces which rows of the cluster belong together on the index pages. If a page has the values 1,3, 4 and 5 on it and value 2 is inserted, that value MUST go on that page. It doesn't have to go in any specific location on the page (it can go at the end after 5) but it must be on that page. If there isn't space on that page, the page must be split.

    What is not enforced is the order of the index pages within the file (the page containing {10, 11, 12, 13} can appear earlier in the file than the page that contains {1,2,3,4,5} nor the order of rows on the page.

    Yes, I need a blog post on this...

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Make sense.

    Thanks a lot.

  • Hi, reading your article I have stumbled across a few more points:

    Under "Selecting columns to give an index" you say

    Putting a Primary Key on a column has the effect of giving it a clustered index.

    A primary is not necessarily a clustered key by its nature.

    1 line down:

    Putting an index on a table column affects all queries that use that table.

    Not necessarily. It is true though that modifications (DML INSERT, DELETE, UPDATE) will be impacted by the indexes on a table.

    few lines further down:

    A WHERE clause that applies a function to the column value can't use an index on that column

    .

    It can indeed, however, it will do a scan but no seek. Which means it has to go through all index values for that column to see if the result of the function satisfies the (WHERE) constraint. And it is not guaranteed that such index scan will be the fastest approach to execute the query.

    Cheers, Frank

  • GilaMonster (8/2/2011)


    Personally I prefer clustered index to organise the table and nonclustered indexes for data access.

    Shameless plugging of my own article... http://www.sqlservercentral.com/articles/Indexing/68563/

    No shame in pointing out the truth when something less than that is offered.

    Thank you Gail!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply