Repetitive Indexes

  • I'm currently transferring some tables from a large database on SQL Server 2000 to a new server, their own database and SQL Server 2008 R2. We don't have the resources to redesign these tables at the moment, so I'm moving them "as is".

    However, that doesn't mean I can't clean up waste while I'm there, especially in indexes which are unlikely to be specifically references from calling programs or queries (the reason I can't redesign anything else).

    On one of the tables, there are a Primary Key and 3 other indexes. The Primary Key is just [id]. I don't like the column names, but what can you do?

    The indexes though, are interesting and I think I can dispose of 2 of them, but I wanted to ask here in case I'm missing something or (likely) not full understanding how indexes are used.

    The other 3 indexes have the following columns as part of the index. There are no included columns as this comes from a 2000 db.

    Index 1: CafeNumber, BusinessDate

    Index 2: BusinessDate, CafeNumber, DField

    Index 3: id, BusinessDate, CafeNumber, PosNumber

    PosNumber is a subset of CafeNumber and is tightly linked to it.

    I'm thinking I can just make one Index with the fields

    BusinessDate, CafeNumber, PosNumber, DField

    and be done with it.

    Is that right or am I missing something?

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Stefan Krzywicki (12/30/2010)


    I'm currently transferring some tables from a large database on SQL Server 2000 to a new server, their own database and SQL Server 2008 R2. We don't have the resources to redesign these tables at the moment, so I'm moving them "as is".

    However, that doesn't mean I can't clean up waste while I'm there, especially in indexes which are unlikely to be specifically references from calling programs or queries (the reason I can't redesign anything else).

    On one of the tables, there are a Primary Key and 3 other indexes. The Primary Key is just [id]. I don't like the column names, but what can you do?

    The indexes though, are interesting and I think I can dispose of 2 of them, but I wanted to ask here in case I'm missing something or (likely) not full understanding how indexes are used.

    The other 3 indexes have the following columns as part of the index. There are no included columns as this comes from a 2000 db.

    Index 1: CafeNumber, BusinessDate

    Index 2: BusinessDate, CafeNumber, DField

    Index 3: id, BusinessDate, CafeNumber, PosNumber

    PosNumber is a subset of CafeNumber and is tightly linked to it.

    I'm thinking I can just make one Index with the fields

    BusinessDate, CafeNumber, PosNumber, DField

    and be done with it.

    Is that right or am I missing something?

    Indexes have different leading columns therefore I assume are designed to serve queries looking for a range based on leading-column plus whatever columns come behind.

    I wouldn't touch it on production - why don't you try suggested change on Test environment?

    Search for queries hitting that table, set a performance baseline for them, alter indexing strategy, test queries again, compare performance.

    That's the way to do it safely.

    As a personal note I'll add that I usually try to stay away for fixing what it's not broken 🙂

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Stefan Krzywicki (12/30/2010)


    PosNumber is a subset of CafeNumber and is tightly linked to it.

    I'm thinking I can just make one Index with the fields

    BusinessDate, CafeNumber, PosNumber, DField

    and be done with it.

    You could if you really want to, but that one index is not a replacement for the three original ones. Remember that column order is important in an index. http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

    Let's take the first index that's currently on the table.

    Index 1: CafeNumber, BusinessDate

    That is seekable for queries of the form

    WHERE Cafenumber = @Var1

    WHERE Cafenumber = @Var1 AND BusinessDate = @Var2

    Your replacement index is seekable for queries of the form

    WHERE BusinessDate = @Var1

    WHERE BusinessDate = @Var1 AND Cafenumber = @Var2

    Note the difference.

    Now if you know that CafeNumber is never in the where clause without BusinessDate, maybe that's OK. Maybe not. You know the queries I don't.

    The third index you can probably safely drop, if something is seeking on ID, it can use the index associated with the pk.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • PaulB-TheOneAndOnly (12/30/2010)


    Stefan Krzywicki (12/30/2010)


    I'm currently transferring some tables from a large database on SQL Server 2000 to a new server, their own database and SQL Server 2008 R2. We don't have the resources to redesign these tables at the moment, so I'm moving them "as is".

    However, that doesn't mean I can't clean up waste while I'm there, especially in indexes which are unlikely to be specifically references from calling programs or queries (the reason I can't redesign anything else).

    On one of the tables, there are a Primary Key and 3 other indexes. The Primary Key is just [id]. I don't like the column names, but what can you do?

    The indexes though, are interesting and I think I can dispose of 2 of them, but I wanted to ask here in case I'm missing something or (likely) not full understanding how indexes are used.

    The other 3 indexes have the following columns as part of the index. There are no included columns as this comes from a 2000 db.

    Index 1: CafeNumber, BusinessDate

    Index 2: BusinessDate, CafeNumber, DField

    Index 3: id, BusinessDate, CafeNumber, PosNumber

    PosNumber is a subset of CafeNumber and is tightly linked to it.

    I'm thinking I can just make one Index with the fields

    BusinessDate, CafeNumber, PosNumber, DField

    and be done with it.

    Is that right or am I missing something?

    Indexes have different leading columns therefore I assume are designed to serve queries looking for a range based on leading-column plus whatever columns come behind.

    I wouldn't touch it on production - why don't you try suggested change on Test environment?

    Search for queries hitting that table, set a performance baseline for them, alter indexing strategy, test queries again, compare performance.

    That's the way to do it safely.

    As a personal note I'll add that I usually try to stay away for fixing what it's not broken 🙂

    Ha, "test environment", that's a good one.

    While your advice is sound, I don't have the access or resources to do as you suggest. I doubt these indexes were placed with that kind of design, which is why I was hoping I might be able to get rid of a few.

    And sure, don't fix what isn't broken except I don't know if it is broken or not. I don't have access to the underlying data that would tell me if it is or isn't and won't until we've completely transferred off the existing server at which time it is likely that my contract will end.

    Fun, huh? : -)

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • GilaMonster (12/30/2010)


    Stefan Krzywicki (12/30/2010)


    PosNumber is a subset of CafeNumber and is tightly linked to it.

    I'm thinking I can just make one Index with the fields

    BusinessDate, CafeNumber, PosNumber, DField

    and be done with it.

    You could if you really want to, but that one index is not a replacement for the three original ones. Remember that column order is important in an index. http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

    Let's take the first index that's currently on the table.

    Index 1: CafeNumber, BusinessDate

    That is seekable for queries of the form

    WHERE Cafenumber = @Var1

    WHERE Cafenumber = @Var1 AND BusinessDate = @Var2

    Your replacement index is seekable for queries of the form

    WHERE BusinessDate = @Var1

    WHERE BusinessDate = @Var1 AND Cafenumber = @Var2

    Note the difference.

    Now if you know that CafeNumber is never in the where clause without BusinessDate, maybe that's OK. Maybe not. You know the queries I don't.

    The third index you can probably safely drop, if something is seeking on ID, it can use the index associated with the pk.

    Oh, I wish I knew the queries. I don't have access to them all.

    Thanks for the link. I'm going to go read it now.

    I was afraid of that about the column order. I didn't realize it was quite as drastic as that, where the second column isn't used if alone. Much as I would like to improve the system, I may just have to copy as-is and save changing the indexes for when (if) I can redo the whole system.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Run a profiler/trace session against that database...that will give you the queries hitting the database.

    Hope that this helps...thanks.

    Chris

  • Thank you for the clarifications.

    I'll say that... if you do not have a Test environment and you do not know the Queries and you do want to keep your job... do nothing. The "thing" is working, you are in the process of a migration, avoid self inflicted wounds. 🙂

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • CGSJohnson (12/30/2010)


    Run a profiler/trace session against that database...that will give you the queries hitting the database.

    Hope that this helps...thanks.

    Chris

    Would if I had access. As it is, I have to wait until the transfer is done. Thanks though!

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • PaulB-TheOneAndOnly (12/30/2010)


    Thank you for the clarifications.

    I'll say that... if you do not have a Test environment and you do not know the Queries and you do want to keep your job... do nothing. The "thing" is working, you are in the process of a migration, avoid self inflicted wounds. 🙂

    Hope this helps.

    Heh, thanks. We are testing the new environment and I'd hope to catch any problems there, but then again I'm not the one doing the testing so leaving things as-is is likely best.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Stefan Krzywicki (12/30/2010)


    I was afraid of that about the column order. I didn't realize it was quite as drastic as that, where the second column isn't used if alone.

    If a query filters on a set of columns that is not a left-based subset of the index key, that index is not seekable for that query. It may still be usable as a scan, but not a seek.

    It would be like me giving you a telephone directory of Gauteng and asking you to look up all the Marys who lives in Benoni. Without her surname, you'd be forced to read the entire directory start to end to find them all. As opposed to finding M. Smits

    Maybe also have a read through these: http://www.sqlservercentral.com/articles/Indexing/68439/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (12/30/2010)


    Stefan Krzywicki (12/30/2010)


    PosNumber is a subset of CafeNumber and is tightly linked to it.

    I'm thinking I can just make one Index with the fields

    BusinessDate, CafeNumber, PosNumber, DField

    and be done with it.

    You could if you really want to, but that one index is not a replacement for the three original ones. Remember that column order is important in an index. http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

    Let's take the first index that's currently on the table.

    Index 1: CafeNumber, BusinessDate

    That is seekable for queries of the form

    WHERE Cafenumber = @Var1

    WHERE Cafenumber = @Var1 AND BusinessDate = @Var2

    Your replacement index is seekable for queries of the form

    WHERE BusinessDate = @Var1

    WHERE BusinessDate = @Var1 AND Cafenumber = @Var2

    Note the difference.

    Now if you know that CafeNumber is never in the where clause without BusinessDate, maybe that's OK. Maybe not. You know the queries I don't.

    The third index you can probably safely drop, if something is seeking on ID, it can use the index associated with the pk.

    Does this only apply to WHERE or does it also apply to ON?

    If I have two tables joined thusly

    ON A.CafeNumber = B.CafeNumber

    AND A.PosNumber = B.PosNumber

    WHERE A.FP = 201101

    How do I order the columns? Is it

    CafeNumber, PosNumber, FP

    or

    FP, CafeNumber, PosNumber

    or do they need to be separate indexes? or does the index not affect the join at all? The on-line examples I've seen about this only talk about the WHERE statement. I'm sure the JOINS are discussed somewhere, but I haven't found it yet.

    If there's a reference book you can suggest that references all of this instead of you having to go over all of it, I'd be fine going out to buy it. Of course if you've already covered it in your wonderful blog I'd love to read it there. Reading the link you provided cleared up column order with WHERE statement for me.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • GilaMonster (12/30/2010)


    Stefan Krzywicki (12/30/2010)


    I was afraid of that about the column order. I didn't realize it was quite as drastic as that, where the second column isn't used if alone.

    If a query filters on a set of columns that is not a left-based subset of the index key, that index is not seekable for that query. It may still be usable as a scan, but not a seek.

    It would be like me giving you a telephone directory of Gauteng and asking you to look up all the Marys who lives in Benoni. Without her surname, you'd be forced to read the entire directory start to end to find them all. As opposed to finding M. Smits

    Maybe also have a read through these: http://www.sqlservercentral.com/articles/Indexing/68439/

    Heh, I'd started with those, but I likely missed the phone-book example because I was looking to answer a specific question at the time. Your responses here, those articles and the blog post have me more up to speed than before, but there's a lot to take into consideration. Practice will help. : -)

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Stefan Krzywicki (12/30/2010)


    Does this only apply to WHERE or does it also apply to ON?

    Applies to both (kinda).

    If I have two tables joined thusly

    ON A.CafeNumber = B.CafeNumber

    AND A.PosNumber = B.PosNumber

    WHERE A.FP = 201101

    How do I order the columns? Is it

    CafeNumber, PosNumber, FP

    or

    FP, CafeNumber, PosNumber

    I would go for the second option, because then I know that SQL will filter first and then join with the result. Rule of indexes - let SQL reduce the rows in consideration as fast and as early as possible. But test both ways and see which the optimiser prefers.

    Will depend what kind of join is used. An index can be used for a nested loop join, it can assist with a merge join (by providing order) and is of no use for a hash join.

    or do they need to be separate indexes?

    http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

    If there's a reference book you can suggest that references all of this instead of you having to go over all of it, I'd be fine going out to buy it.

    Haven't written it yet.

    Of course if you've already covered it in your wonderful blog I'd love to read it there. Reading the link you provided cleared up column order with WHERE statement for me.

    Have a read through everything in the indexes category there. It's a bit hodgepodge, the basics aren't there (see the SSC articles), but there's a fair bit covered (and a lot missing still).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Index 1: CafeNumber, BusinessDate

    Index 2: BusinessDate, CafeNumber, DField

    Index 3: id, BusinessDate, CafeNumber, PosNumber

    There is a reason for multiple indexes. For example, if a reporting type query only searches on BusinessDate, and none of the indexes have BusinessDate as the leading column, then that will most likely result in a full table scan. Some of these composite indexes may have originally been created in an attempt to support a specific covered query scenario, so I wouldn't drop them just because they seem odd.

    If there is already a clustered primary key on [id], then Index 3 may be redundant and perhaps not used. That's the one that stands out without knowing more about your database.

    SQL Server 2005/2008 provides much better options for evanualting things like index usage than v2000. For example, once you have this database ported over to QA, you can use the Database Tuning Advisor (DTA) while smoke testing the application and running sample procedure calls.

    Also, there are some DMV queries that will reveal which indexes are used and how frequently. For example, it will give you the actual number of times an index was utilized in a query and even the cost of an index to maintain in terms of how frequently it's updated. However, the result of these queries are based on SQL Server's internal usage statistics, so you first need to perform a sample workload of the application on the server before you get back useful information.

    I'll dump the queries I typically use below:

    --Unused indexes and tables

    SELECT

    '[' + object_name(i.object_id) + '].[' + i.name + ']' as TableIndexName,

    i.index_id

    FROMsys.indexes i

    INNER JOIN sys.objects o

    ONo.object_id = i.object_id

    LEFT JOIN sys.dm_db_index_usage_stats s

    ONi.object_id=s.object_id

    ANDi.index_id=s.index_id

    ANDdatabase_id = DB_ID()

    WHEREobjectproperty(o.object_id,'IsUserTable') = 1

    ANDs.index_id IS NULL

    ORDER BY TableIndexName ASC

    --Index usage. Least used appear first.

    SELECTobject_name(s.object_id) AS ObjectName

    , s.object_id

    , i.name as IndexName

    , i.index_id

    , user_seeks

    , user_scans

    , user_lookups

    , user_updates

    FROMsys.dm_db_index_usage_stats s

    INNER JOIN sys.indexes i

    ONi.object_id = s.object_id

    AND i.index_id = s.index_id

    WHEREdatabase_id = db_id ()

    ANDobjectproperty(s.object_id,'IsUserTable') = 1

    ORDER BY (user_seeks + user_scans + user_lookups + user_updates) ASC

    --Index scan appear more than index seeks

    SELECTobject_name(s.object_id) AS ObjectName

    , s.object_id

    , i.name as IndexName

    , i.index_id

    , user_seeks

    , user_scans

    , user_lookups

    , user_updates

    FROMsys.dm_db_index_usage_stats s

    INNER JOIN sys.indexes i

    ONi.object_id = s.object_id

    AND i.index_id = s.index_id

    WHEREdatabase_id = db_id ()

    ANDobjectproperty(s.object_id,'IsUserTable') = 1

    ANDuser_scans > user_seeks

    ORDER BY user_scans DESC

    --Index updated more than it is used

    SELECTobject_name(s.object_id) AS ObjectName

    , s.object_id

    , i.name as IndexName

    , i.index_id

    , user_seeks

    , user_scans

    , user_lookups

    , user_updates

    FROMsys.dm_db_index_usage_stats s

    INNER JOIN sys.indexes i

    ONi.object_id = s.object_id

    AND i.index_id = s.index_id

    WHEREdatabase_id = db_id ()

    ANDobjectproperty(s.object_id,'IsUserTable') = 1

    ANDuser_updates > (user_scans + user_seeks)

    ANDs.index_id > 1

    ORDER BY user_updates DESC

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • GilaMonster (12/30/2010)


    Stefan Krzywicki (12/30/2010)


    Does this only apply to WHERE or does it also apply to ON?

    Applies to both (kinda).

    If I have two tables joined thusly

    ON A.CafeNumber = B.CafeNumber

    AND A.PosNumber = B.PosNumber

    WHERE A.FP = 201101

    How do I order the columns? Is it

    CafeNumber, PosNumber, FP

    or

    FP, CafeNumber, PosNumber

    I would go for the second option, because then I know that SQL will filter first and then join with the result. Rule of indexes - let SQL reduce the rows in consideration as fast and as early as possible. But test both ways and see which the optimiser prefers.

    Will depend what kind of join is used. An index can be used for a nested loop join, it can assist with a merge join (by providing order) and is of no use for a hash join.

    or do they need to be separate indexes?

    http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

    If there's a reference book you can suggest that references all of this instead of you having to go over all of it, I'd be fine going out to buy it.

    Haven't written it yet.

    Of course if you've already covered it in your wonderful blog I'd love to read it there. Reading the link you provided cleared up column order with WHERE statement for me.

    Have a read through everything in the indexes category there. It's a bit hodgepodge, the basics aren't there (see the SSC articles), but there's a fair bit covered (and a lot missing still).

    Excellent, thanks Gail! I'll be reading your blog over the long weekend. : -)

    Have a great New Years.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply