Introduction to Indexes: Part 2 – The clustered index

  • Ok,

    So... here is a question we have debated in our shop for quite a while.

    The question is...

    Given a table where they is no good candidate for a clustered index, for example we use client generated GUIDs for primary keys and most of the data does not meet the for criteria you outline above... Which approach should we take:

    1. Keep the table as a heap.

    2. Create an identity field on the table which is used for no other reason that to create a clustered index upon it.

    I believe Kim Tripp recommended number 2 in a podcast iirc. however, we have a dev here who is adamant that this is a mistake.

    I'm am scared that the answer will be "it depends".

    HELP!

    Thanks,

    BOb

  • Some considerations:

    1.Size of table and growth rate.

    2.Primary needed for speed of replication?

    3.OLTP vs. Data Warehouse like

    4.3rd normal form vs. Demoralization?

    5.Contention with other servers and process?

    Sometimes you may want to ask:

    Why and when do I need to swing this thing?

    Instead of

    How do I swing this thing?

  • You really do need to know your data access patterns in order to be able to answer the question you pose. One big question is how many forwarded records are there. If lots, then a surrogate CI could be helpful.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for another very clear article Gail. I am looking forward to part 3!

    Cheers,

    Nicole Bowman

    Nothing is forever.

  • hlam-1032421 (11/11/2009)


    Some considerations:

    1.Size of table and growth rate.

    2.Primary needed for speed of replication?

    3.OLTP vs. Data Warehouse like

    4.3rd normal form vs. Demoralization?

    5.Contention with other servers and process?

    Sometimes you may want to ask:

    Why and when do I need to swing this thing?

    Instead of

    How do I swing this thing?

    So, in my case we are talking about a large table which is constantly growing. For example the GL transaction table in a financial app. Insert perf is very important, but so is select perf. This is an OLTP app with no replication. It is mostly normalized (till it hurts) and denormalized (till it works).

    I agree, asking WHY and When to be able to apply to certain situations.

  • You made a very important piece of information public there: GL transactions. In my experience with financial data GL rows are almost never updated/deleted. That means record forwarding is not an issue. Also most of the access is point lookups so NC indexes against a heap are fine. The one possibility could be if you do extensive reporting by date period in which case you may consider transaction date as the clustered index - but I bet this would not be best unless you do a lot of reporting by date range. And if you do you may want a datamart for reporting anyway (if you don't have one already).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • <a href="http://www.atoncer.com/trading-cards/ccg/yugioh.htm"> yugioh cards </a>

    <a href="http://www.atoncer.com/trading-cards/ccg/ccg.htm"> ccg collecting </a>

    <a href="http://www.atoncer.com/comics/platinum-age/main.htm"> platinum age comics </a>

    <a href="http://www.atoncer.com/comics/magazines/main.htm"> comic magazines </a>

    <a href="http://www.atoncer.com/comics/comic-art/main.htm"> comic art collecting </a>

  • Very well explained. Very few articles I found that explained the basics so well. This articles really make this site unique...

  • hlam-1032421 (11/11/2009)


    With comments like this I run for the exit:

    … good clustering. One school says to put the clustered index on the column or set of columns that would be most useful for queries, either frequently run queries or ones doing queries of large ranges of data. The other school says to use the clustered index primarily to organise the table and leave data access to the nonclustered indexes.I hold to the second school…

    Cheers. Nice meeting you. 😉

    The difference between hierarchy database and relational database should tell you why no one should be in that school #2.

    Are you referring to the relational requirement that the user doesn't need to concern themselves with storage details? That may well be fine in theory, but unfortunately reality isn't that clean. There is a physical layout to a table and there are good and bad ways to do that.

    Since multiple nonclustered indexes can be created, I'd rather use those (mostly) for queries and select the cluster so that my nonclustered indexes aren't wider than they need to be, that rows aren't moving around, that pages aren't splitting all over the place.

    If you hold to the other view (that the cluster should be selected for data access), why don't you write an article discussing (and contrasting) the two views? It would be an interesting read.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TheSQLGuru (11/11/2009)


    You made a very important piece of information public there: GL transactions. In my experience with financial data GL rows are almost never updated/deleted. That means record forwarding is not an issue.

    If that is the case, the heap may well be fine. What I'd suggest (if you have the time and space) is to mock up this table (and related tables) on a dev box, do some tests of insert and select performance with it as a heap (representative of the workload in your production environment), then add the identity column, make it the cluster and repeat said tests.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TheSQLGuru

    I am not quite sure about many Clustered Indexes here. Since the concept tells us that they are part of the table and determine the Ordering of data,then with 4 or 5 of them how physically data will be aligned?

    2. We all know that a clustered index and table strictly reside in the same file and filegroup, some data "clouding" may be found in one FileGroup.

    Gail help me if this should be possible. TheSQLGuru, maybe you can shed some light there for me.

    Otherwise Gail the article is very fantastic, Easy to follow especially on the issue os SARGable.

  • INNOCENT GUMBO (11/12/2009)


    I am not quite sure about many Clustered Indexes here. Since the concept tells us that they are part of the table and determine the Ordering of data,then with 4 or 5 of them how physically data will be aligned?

    Precisely why there can be only one

    2. We all know that a clustered index and table strictly reside in the same file and filegroup, some data "clouding" may be found in one FileGroup.

    Well, the clustered index is the table. Whichever filegroup the clustered index is created on is where the table is.

    Clouding?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you Gail for that because I read somewhere where TheSQLGuru was "asking" for more Clustered Indexes on a single table

  • Great article, thanx.

    Assuming the data to load is sorted, can anyone tell me which is quicker when loading a table with a clustered index; 1) With the index defined and in place or 2) Create the clustered index after the data is loaded?

    Thanx,

  • I routinely work with tables where the entire table is a PK, name and address and five other fields. these are just mailing lists, but HUGE - to me anyway - 50 million names is typical.

    The tables are address validated monthly, other than that they are stable, no insertions or deletions ever, only the monthly update. The monthly address validation updates about 1 to 2% of the records monthly.

    I had been (up until reading this article) creating a clustered index that was also a cover index, i.e. the index had every single field. Obviously a WIDE index, however I do not need any other index in this case. Because there is never an insertion or deletion, that discussion goes away.

    So is there any reason (in THIS case) NOT to do things this way?

    Also, given the lack of future inserts, is there any reason NOT to create the index with the Zip5 / Zip4 as the leading fields in the index. I assume that would make the data physically sort on zip instead of physically sort on PK. Again I join on PK but use where and order by on zip.

    Or should I still create the clustered index on PK (Unique) and then add a cover index of every field (they are all used in various processing).

    Gail, I have to agree - "run for the doors" comments are unhelpful and in poor taste. "Expound in detail or write your own article about why".

    jwcolby54

Viewing 15 posts - 31 through 45 (of 122 total)

You must be logged in to reply to this topic. Login to reply