Index design question

  • Hi,

    I've built a table, and am trying to build indexes (prior to data being added) that I believe will be useful given how the data is planned to be used. The table has many columns of various types of data. In it there are LastName (varchar), FirstName (varchar), and DateOfBirth (datetime). I expect that the most often used query will be something similar to:

    SELECT * FROM Tbl WHERE LastName Like '<data>%' AND FirstName Like '<data>%' AND DateOfBirth = '<data>'

    There may also be some variations of this query depending upon whether the user enters FirstName and/or DateOfBirth. LastName would always be required though.

    1) I wasn't quite sure how LIKE interacts with indexes to begin with.

    2) I was considering creating a clustered index on the combination of LastName, FirstName, and DateOfBirth. I wasn't sure though if this would be useful when FirstName was not specified, and/or DateOfBirth was not specified within the query. Let's say that the user only entered LastName of "Smith" and a DateOfBirth of "01/01/2012", would it be able to utilize the DateOfBirth with the index?

    Thanks,

    Larry

  • Like can use an index providing the wildcard is at the end.

    The combo of surname, firstname and dateofbirth is a very poor choice for clustered index. A clustered index should be narrow, uniquem unchanging and ever-increasing (values inserted are always higher than existing values). That combo is neither narrow nor ever-increasing.

    Maybe have a read through this series:

    http://www.sqlservercentral.com/articles/Indexing/68439/

    http://www.sqlservercentral.com/articles/Indexing/68563/

    http://www.sqlservercentral.com/articles/Indexing/68636/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail.

    Okay, so what about a combined non-clustered index on LastName, FirstName, and DateOfBirth? Based upon the situation of having, or not having, the FirstName and/or the DateOfBirth, would it be better to have the combined index, or have separate indexes on each of these three columns?

    What happens in the combined index when FirstName isn't included in the query, but DateOfBirth is, OR FirstName is included, but DateOfBirth isn't?

  • As a nonclustered index that would certainly be better than a clustered index. Maybe take a read through those articles, they do cover the question of columns and column ordering.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Okay Gail, I've read your series. It did provide me with a little better understanding of indexes than I had before, BUT, I'm now wondering, given the situation that I'm describing above, if it would be beneficial to create multiple non-clustered indexes just on the individual columns (LastName, FirstName, and DateOfBirth), or should I maybe create two composite indexes such as:

    Index1: LastName, FirstName, DateOfBirth

    Index2: LastName, DateOfBirth

    I thought maybe two composite indexes might provide a better chance of getting an index hit. Would using LIKE '<data>%' have any affect on Index1 when LastName and FirstName are both queried in this manner?

    Is there any advantage (or disadvantage) of using these two composite indexes over using three single-column indexes?

    Thanks!

  • Re multiple indexes each on their own column: http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

    In the scenario you describe I would go for 2 indexes:

    LastName, FirstName, DoB

    DoB, LastName (just so that we have a different leading column)

    Re the wildcards, not great but if you need them then there's no choice. See if you can write the app/code so that there's only a wildcard if part of the surname was specified

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks again Gail. I think I will create the two indexes as you described. I have read several of your articles, and I do think I have a better understanding of how indexes really work now. In order to make the search flexible enough for the users, I don't think I can avoid using LIKE in the first name since there can be first names like Michael or Mike.

    One other question though. In one of your articles you indicate that there are two schools of thought about creating the proper clustered index. One is to create it on the field(s) that are used most often as "key" fields to the table, even if they aren't entirely unique. The other is to create them on a "narrow", purely "unique" key, where the value increases (such as on an Identity column).

    I have an identity column in my table, but I also have a unique column (varchar(30)) that will be the key to other sub-tables. It is a combined value column where the first part of the value is a string representation of a date down to milliseconds (YYYYMMDDHHmmSSxxx) and that's concatenated together with a string literal value representing an organization name acronym. I'm contemplating making this concatenated column my clustered index rather than the identity column since that column wouldn't be used much for queries. I know it's not very narrow, but it should be unique, based upon how the data will be inserted into the table, and with the first part being a date/time value, it should be increasing in value.

    What do you think?

    Thanks,

    Larry

  • lhowe (2/27/2012)


    I don't think I can avoid using LIKE in the first name since there can be first names like Michael or Mike.

    Like's not going to help there, unless you're searching for LIKE 'Mi%'

    I have an identity column in my table, but I also have a unique column (varchar(30)) that will be the key to other sub-tables. It is a combined value column where the first part of the value is a string representation of a date down to milliseconds (YYYYMMDDHHmmSSxxx) and that's concatenated together with a string literal value representing an organization name acronym. I'm contemplating making this concatenated column my clustered index rather than the identity column since that column wouldn't be used much for queries. I know it's not very narrow, but it should be unique, based upon how the data will be inserted into the table, and with the first part being a date/time value, it should be increasing in value.

    I would probably consider that varchar as the cluster, just note that having that as the cluster automatically makes nonclustered indexes up to 30 characters wider, so take that into account.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply