Huge tables with no appropriate column for creating clustered index !

  • Hello,

    I have come across a stored procedure which needs to be tuned. It is using some tables that are too huge (100+ million records, datasize 5GB).

    The SELECT operations on these tables are taking most of the time. There is no unique column present in these tables that can be either made primary key or clustered index or can be considered as basis of horizontally partitioning the tables. There is one non-clustered index present for each of these tables on a non-unique integer column but this index is not helping much as the performance is real slow.

    Will adding a new unique column (identity column) will help to increase the performance as then these tables could be partitioned and I can create clustered index on that unique column? Looking for experts' advice on this.

    Regards.

  • You're suffering under the misunderstanding that a clustered index needs to be unique. It doesn't. The Primary Key does, and there should be some key there to uniquely identify a row, but it doesn't need to be the clustered indexing.

    The clustered index needs to be balanced against the churn the data pages will suffer, as well as that it will be carried to all non-clustered indexes (including an internal Row ID that SQL will generate as a unique identifier for non-unique clustered indexes).

    100 million rows isn't exhaustive, especially if you're working against a data warehouse. If you post the tables' DDL, existing indexing, and query you're trying to optimize, I'm sure you'll get some pretty solid assistance. By preference, also post the existing .sqlplan for the query. See my sig for the 3rd link if you need assistance with any of that.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Hi,

    Queries which are taking time are of type,

    SELECT......

    FROM SomeTable WITH (NOLOCK)

    WHERE SomeColumn BETWEEN 47300 AND 47315

    There are 4 tables which are similar to SomeTable and involved in similar query as mentioned above. Each of this query takes 5 to 6 minutes. The SomeColumn column is similar in all of these tables. This SomeColumn is a non unique integer column and currently SomeTable has only one index which is a non clustered index and involvs only one column that is SomeColumn. The execution plan (ExecutionPlan1.sqlplan) shows a table scan

    When I remove this non clustered index and create a clustered index on the SomeColumn column (which is non unique), I get better performance and results come in well under 2 minutes. The execution plan (ExecutionPlan2.sqlplan) shows Clustered index seek.

    So, creating a clustered index on non unique column is fine, right? However, I will definitely need a unique column if I would have to partition this table.

    Regards.

  • Sachin Vaidya (11/24/2010)


    So, creating a clustered index on non unique column is fine, right? However, I will definitely need a unique column if I would have to partition this table.

    Yes, a clustered index on a non-unique is appropriate as long as it's relatively tight (since it goes to all non-clustereds as well) and is your main access method to the table.

    You do not need a unique column to partition against, and I'm not sure you need one at all. This white paper may help you more:

    http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Queries which are taking time are of type,

    SELECT ......

    FROM SomeTable WITH (NOLOCK)

    WHERE SomeColumn BETWEEN 47300 AND 47315

    There are 4 tables which are similar to SomeTable and involved in similar query as mentioned above. Each of this query takes 5 to 6 minutes. The SomeColumn column is similar in all of these tables. This SomeColumn is a non unique integer column and currently SomeTable has only one index which is a non clustered index and involves only one column that is SomeColumn. The execution plan (ExecutionPlan1.sqlplan) shows a table scan

    The reason for the table scan is the ..... in the SELECT

    It doesn't do much good to find which rows are needed if you then have to go find them all in a heap to get the other columns.

    Try adding the columns selected to the index. This is called a "covering" index and the table is not touched at all.

    If there are further where clauses or sorts make those columns part of the index, otherwise just include them.

    When I remove this non clustered index and create a clustered index on the SomeColumn column (which is non unique), I get better performance and results come in well under 2 minutes. The execution plan (ExecutionPlan2.sqlplan) shows Clustered index seek.

    Making it a clustered index is about the same as including every possible column in the index.

    Which makes the index bigger, fewer entries per page, etc.

    So if your select list is MOST or all of the columns then the PK is the way to go. If it is only a few (and or smaller) columns then a covering index would be better. In fact, if you create both PK and covering the query plan will tell you which is better.

    So, creating a clustered index on non unique column is fine, right?

    Yes

    However, I will definitely need a unique column if I would have to partition this table.

    I don't think you quite understand partitioning.

    Think of the conference registration table:

    A-G H-K L-O P-Z

    For example, you would partition an address table by state, not zip+4

  • May I spice this one?

    Subjet of this thread is...

    Sachin Vaidya (11/23/2010)


    Huge tables with no appropriate column for creating clustered index!

    ...and I say "so what?" 😀

    There is the common missunderstanding that all tables have to have a clustered index. I do agree clustered indexes are good in certain situations but that does not mean you must have one.

    A heap table with a proper indexing strategy will perform well in most situations.

    I would check the indexing strategy and match it to business requirements, a clustered index may or may be not needed or even appropriate.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (12/3/2010)


    May I spice this one?

    Subjet of this thread is...

    Sachin Vaidya (11/23/2010)


    Huge tables with no appropriate column for creating clustered index!

    ...and I say "so what?" 😀

    There is the common missunderstanding that all tables have to have a clustered index. I do agree clustered indexes are good in certain situations but that does not mean you must have one.

    A heap table with a proper indexing strategy will perform well in most situations.

    I would check the indexing strategy and match it to business requirements, a clustered index may or may be not needed or even appropriate.

    As you know, a clustered index is essential to being able to rebuild (for the sake of defragmentation) the non-clustered indexes. Yes, you could create the clustered index, reorg or rebuild the non-clustered indexes, and then drop the clustered index but I believe that it would take too long on a table of this size to do it that way to be practical. Besides, a clustered index takes less room than a non-clustered index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Another issue might be the table design by itself.

    If you don't have a column that uniquely identifies a row, then your table structure isn't normalized.

    The downside of it: you'll need to copy all columns that'll identify a row for each and every row, therewith increasing the row size and the table size.

    You might be better off splitting your table in two (or even more) separate tables where you have a single (narrow) column to identify a single row. This might also reduce the size of the tables by eliminating duplicate information.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Jeff Moden (12/3/2010)


    PaulB-TheOneAndOnly (12/3/2010)


    May I spice this one?

    Subjet of this thread is...

    Sachin Vaidya (11/23/2010)


    Huge tables with no appropriate column for creating clustered index!

    ...and I say "so what?" 😀

    There is the common missunderstanding that all tables have to have a clustered index. I do agree clustered indexes are good in certain situations but that does not mean you must have one.

    A heap table with a proper indexing strategy will perform well in most situations.

    I would check the indexing strategy and match it to business requirements, a clustered index may or may be not needed or even appropriate.

    As you know, a clustered index is essential to being able to rebuild (for the sake of defragmentation) the non-clustered indexes. Yes, you could create the clustered index, reorg or rebuild the non-clustered indexes, and then drop the clustered index but I believe that it would take too long on a table of this size to do it that way to be practical. Besides, a clustered index takes less room than a non-clustered index.

    Jeff - I'm not saying you should not use a clustered index, I'm saying you should use it when it brings some value.

    I've seen many tables where an extra column is added just to build a clustered index - how that saves space?

    In regards to reorg, you can always reorg the target table and rebuild indexes - it takes a little more scripting but it works just fine.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (12/4/2010)


    Jeff - I'm not saying you should not use a clustered index, I'm saying you should use it when it brings some value.

    I've seen many tables where an extra column is added just to build a clustered index - how that saves space?

    A clustered index saves space (compared to a non-clustered index) because is built essentially as part of the data and doesn't require nearly as much space as a non-clustered index.

    In regards to reorg, you can always reorg the target table and rebuild indexes - it takes a little more scripting but it works just fine.

    Before I reply to that (I don't want to assume), what is the sequence of events that you would use to accomplish that on a 200GB table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/4/2010)


    PaulB-TheOneAndOnly (12/4/2010)


    Jeff - I'm not saying you should not use a clustered index, I'm saying you should use it when it brings some value.

    I've seen many tables where an extra column is added just to build a clustered index - how that saves space?

    A clustered index saves space (compared to a non-clustered index) because is built essentially as part of the data and doesn't require nearly as much space as a non-clustered index.

    hey man, gimme some credit here - I know what a clustered index is 😀

    I think you didn't read what I wrote, if you are adding an extra column just to have your clustered index, and on top of it you have to have all the indexes your indexing strategy requires then ... tell me how are you saving space by adding the overhead of the clustered index?

    Jeff Moden (12/4/2010)


    PaulB-TheOneAndOnly (12/4/2010)


    In regards to reorg, you can always reorg the target table and rebuild indexes - it takes a little more scripting but it works just fine.

    Before I reply to that (I don't want to assume), what is the sequence of events that you would use to accomplish that on a 200GB table?

    Okay... you keep going back to the specific case of the 200GB table when I made it clear from post #1 I was talking in general.

    Having said that I would certainly have my 200GB table and indexes partitioned using the same function then I'll do whatever maintenance I have to do in a partition by partition basis. You can switch your target partition out, do whatever you want to do then switch it in back to the original monster table.

    Hope this clarifies.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (12/4/2010)


    Jeff Moden (12/4/2010)


    PaulB-TheOneAndOnly (12/4/2010)


    Jeff - I'm not saying you should not use a clustered index, I'm saying you should use it when it brings some value.

    I've seen many tables where an extra column is added just to build a clustered index - how that saves space?

    A clustered index saves space (compared to a non-clustered index) because is built essentially as part of the data and doesn't require nearly as much space as a non-clustered index.

    hey man, gimme some credit here - I know what a clustered index is 😀

    I think you didn't read what I wrote, if you are adding an extra column just to have your clustered index, and on top of it you have to have all the indexes your indexing strategy requires then ... tell me how are you saving space by adding the overhead of the clustered index?

    Jeff Moden (12/4/2010)


    PaulB-TheOneAndOnly (12/4/2010)


    In regards to reorg, you can always reorg the target table and rebuild indexes - it takes a little more scripting but it works just fine.

    Before I reply to that (I don't want to assume), what is the sequence of events that you would use to accomplish that on a 200GB table?

    Okay... you keep going back to the specific case of the 200GB table when I made it clear from post #1 I was talking in general.

    Having said that I would certainly have my 200GB table and indexes partitioned using the same function then I'll do whatever maintenance I have to do in a partition by partition basis. You can switch your target partition out, do whatever you want to do then switch it in back to the original monster table.

    Hope this clarifies.

    Heh... the whole problem is that you made it sound like (IMHO) you need to add a column to have a clustered index. I know you know otherwise and I also know that's not what you meant, but you made it sound that way and I want to make sure people know they don't need to add a (especially, unique) column to create a clustered index.

    And, yeah... I know you were talking in general but the OP was about a very large table... not in general. 😀

    You also made it sound like it's a bad idea to have a clustered index. You and I agreed a long time ago that about the only place you shouldn't have a clustered index is on a staging table. 😉 I just want to make sure that people know that having a clustered index on a table, especially of any table of size, is in fact, a good idea to have especially if you ever want to defrag non-clustered indexes or recover space from the table due to deletes. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/4/2010)


    PaulB-TheOneAndOnly (12/4/2010)


    Jeff Moden (12/4/2010)


    PaulB-TheOneAndOnly (12/4/2010)


    Jeff - I'm not saying you should not use a clustered index, I'm saying you should use it when it brings some value.

    I've seen many tables where an extra column is added just to build a clustered index - how that saves space?

    A clustered index saves space (compared to a non-clustered index) because is built essentially as part of the data and doesn't require nearly as much space as a non-clustered index.

    hey man, gimme some credit here - I know what a clustered index is 😀

    I think you didn't read what I wrote, if you are adding an extra column just to have your clustered index, and on top of it you have to have all the indexes your indexing strategy requires then ... tell me how are you saving space by adding the overhead of the clustered index?

    Jeff Moden (12/4/2010)


    PaulB-TheOneAndOnly (12/4/2010)


    In regards to reorg, you can always reorg the target table and rebuild indexes - it takes a little more scripting but it works just fine.

    Before I reply to that (I don't want to assume), what is the sequence of events that you would use to accomplish that on a 200GB table?

    Okay... you keep going back to the specific case of the 200GB table when I made it clear from post #1 I was talking in general.

    Having said that I would certainly have my 200GB table and indexes partitioned using the same function then I'll do whatever maintenance I have to do in a partition by partition basis. You can switch your target partition out, do whatever you want to do then switch it in back to the original monster table.

    Hope this clarifies.

    Heh... the whole problem is that you made it sound like (IMHO) you need to add a column to have a clustered index. I know you know otherwise and I also know that's not what you meant, but you made it sound that way and I want to make sure people know they don't need to add a (especially, unique) column to create a clustered index.

    And, yeah... I know you were talking in general but the OP was about a very large table... not in general. 😀

    You also made it sound like it's a bad idea to have a clustered index. You and I agreed a long time ago that about the only place you shouldn't have a clustered index is on a staging table. 😉 I just want to make sure that people know that having a clustered index on a table, especially of any table of size, is in fact, a good idea to have especially if you ever want to defrag non-clustered indexes or recover space from the table due to deletes. :hehe:

    I'm really sorry I sound that way - I thought I did everything I could to make it clear I'm not against clustered indexes but I do not dismiss heap tables either. Looks like I do have to work on my communication skils 🙂

    Interesting you mention that - the way I sound - because I wrote my first post in this thread when I thought everybody was saying that it was kind of mandatory to have a clustered index in any b&^%%y table you create 😀

    I do agree with you a staging table is a brilliant example for a heap table, kind of a no brainer but I would certainly stop short of saying that staging tables are the only scenario where a heap table is applicable.

    It has always called my attention how in this SQL Server dominated universe heap tables are dismissed, persecuted and confined to live a misserable life in Hospital Island. I know for a fact that in close parallel universes - non SQL Server dominated - heap tables are the standard nevertheless other kind of tables (like index organized tables - clustered index based) are treated with upmost respect and considered full citizens.

    Let me try to summarize. My intention was to tell the community that you are neither going to be kick out of the Kirk nor your soul will burn in hell forever if you have a compelling reason to use a heap table. There is a growing number of us gathering in the summer solstice chanting "heap tables are in accordance to the book of Codd... heap tables are in accordance to the book of Codd..." then we drink our secret beberage made out of malt, hops and yeast and go back home. 😀

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Heh... no problem, Paul. I just misread you. Thanks for the friendly feedback.

    BTW... when's the next meeting of your "heap table" group? I might not agree 100% with the premise, but the drinks sound pretty good. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Now that the religious(ish) discussion of heap vs. clustered has been resolved let's circle back to the original post.

    IMHO it was a teachable moment for us to reach many newbies on several topics, including the concept of partitioning and covering indexes.

    There is no unique column present in these tables that can be ... can be considered as basis of horizontally partitioning the tables. ...

    Will adding a new unique column (identity column) will help to increase the performance as then these tables could be partitioned ... ? Looking for experts' advice on this.

    I am at work and cannot spend serious time on this beyond the two cents I put in earlier, but are there other threads covering these topics that someone could supply a link to?

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply