Non Clustered Index on Phone column

  • Dear All,
    I have a Lead table with below fields.
    LeadId int (Not a primary key)
    Telephone
    ...
    ...Table have 4 millions records
    Client always search the data with the Telephone number.
    I have created only Non clustered index on Telephone Number field.
    Should I have also added the primary key constraint on leadId?
    If I add the primary key on LeadId column, performance will be increase or not?

  • sunilkmr284 - Saturday, August 18, 2018 3:48 AM

    Dear All,
    I have a Lead table with below fields.
    LeadId int (Not a primary key)
    Telephone
    ...
    ...Table have 4 millions records
    Client always search the data with the Telephone number.
    I have created only Non clustered index on Telephone Number field.
    Should I have also added the primary key constraint on leadId?
    If I add the primary key on LeadId column, performance will be increase or not?

    Can you post the full DDL (create table) script and information on what of the data you want to bring back and how the users will do the search?
    😎

    There is a big difference between handling small result sets and large result sets produced by very generic queries such as searching for anything starting with one or two digits.

  • Eirikur Eiriksson - Saturday, August 18, 2018 5:14 AM

    sunilkmr284 - Saturday, August 18, 2018 3:48 AM

    Dear All,
    I have a Lead table with below fields.
    LeadId int (Not a primary key)
    Telephone
    ...
    ...Table have 4 millions records
    Client always search the data with the Telephone number.
    I have created only Non clustered index on Telephone Number field.
    Should I have also added the primary key constraint on leadId?
    If I add the primary key on LeadId column, performance will be increase or not?

    Can you post the full DDL (create table) script and information on what of the data you want to bring back and how the users will do the search?
    😎

    There is a big difference between handling small result sets and large result sets produced by very generic queries such as searching for anything starting with one or two digits.

    USE ContactDB
    GO

    /****** Object: Table [dbo].[tbl_Lead]  Script Date: 08/18/2018 16:56:57 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    SET ANSI_PADDING ON
    GO

    CREATE TABLE [dbo].[tbl_Lead](
        [LeadID] [int] IDENTITY(1,1) NOT NULL,
        [Company Name] [varchar](100) NULL,
        [Title] [varchar](50) NULL,
        [First Name] [varchar](50) NULL,
        [Last Name] [varchar](50) NULL,
        [Address1] [varchar](100) NULL,
        [Address2] [varchar](100) NULL,
        [Address3] [varchar](100) NULL,
        [Town] [varchar](50) NULL,
        [County] [varchar](50) NULL,
        [PostCode] [varchar](20) NULL,
        [Telephone] [varchar](20) NULL,
        [Email] [varchar](150) NULL
    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF
    GO

    Searching 
    CREATE PROCEDURE [dbo].[Lead_Get]
        @PhoneNo varchar(50)
    AS
    BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;

      select * from tbl_lead where Telephone=@PhoneNo
    end

  • sunilkmr284 - Saturday, August 18, 2018 5:30 AM

    Eirikur Eiriksson - Saturday, August 18, 2018 5:14 AM

    sunilkmr284 - Saturday, August 18, 2018 3:48 AM

    Dear All,
    I have a Lead table with below fields.
    LeadId int (Not a primary key)
    Telephone
    ...
    ...Table have 4 millions records
    Client always search the data with the Telephone number.
    I have created only Non clustered index on Telephone Number field.
    Should I have also added the primary key constraint on leadId?
    If I add the primary key on LeadId column, performance will be increase or not?

    Can you post the full DDL (create table) script and information on what of the data you want to bring back and how the users will do the search?
    😎

    There is a big difference between handling small result sets and large result sets produced by very generic queries such as searching for anything starting with one or two digits.

    USE ContactDB
    GO

    /****** Object: Table [dbo].[tbl_Lead]  Script Date: 08/18/2018 16:56:57 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    SET ANSI_PADDING ON
    GO

    CREATE TABLE [dbo].[tbl_Lead](
        [LeadID] [int] IDENTITY(1,1) NOT NULL,
        [Company Name] [varchar](100) NULL,
        [Title] [varchar](50) NULL,
        [First Name] [varchar](50) NULL,
        [Last Name] [varchar](50) NULL,
        [Address1] [varchar](100) NULL,
        [Address2] [varchar](100) NULL,
        [Address3] [varchar](100) NULL,
        [Town] [varchar](50) NULL,
        [County] [varchar](50) NULL,
        [PostCode] [varchar](20) NULL,
        [Telephone] [varchar](20) NULL,
        [Email] [varchar](150) NULL
    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF
    GO

    Searching 
    CREATE PROCEDURE [dbo].[Lead_Get]
        @PhoneNo varchar(50)
    AS
    BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;

      select * from tbl_lead where Telephone=@PhoneNo
    end

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

  • Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

    If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
    😎

  • Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM

    Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

    If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
    😎

    I did say if it was unique.
    If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

  • Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM

    Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM

    Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

    If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
    😎

    I did say if it was unique.
    If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

    An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case.  That's all indexes really are is a copy of the data sorted in a different order.

    We also don't know what the op means by improving the performance.  Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number.  Even a HEAP will do fine in this area.  If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sunilkmr284 - Saturday, August 18, 2018 3:48 AM

    Dear All,
    I have a Lead table with below fields.
    LeadId int (Not a primary key)
    Telephone
    ...
    ...Table have 4 millions records
    Client always search the data with the Telephone number.
    I have created only Non clustered index on Telephone Number field.
    Should I have also added the primary key constraint on leadId?
    If I add the primary key on LeadId column, performance will be increase or not?

    Performance of WHAT???  How is the table being used and what are the current issues with performance?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, August 19, 2018 1:36 PM

    Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM

    Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM

    Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

    If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
    😎

    I did say if it was unique.
    If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

    An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case.  That's all indexes really are is a copy of the data sorted in a different order.

    We also don't know what the op means by improving the performance.  Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number.  Even a HEAP will do fine in this area.  If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".

    The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
    If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.

  • Jonathan AC Roberts - Sunday, August 19, 2018 1:59 PM

    Jeff Moden - Sunday, August 19, 2018 1:36 PM

    Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM

    Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM

    Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

    If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
    😎

    I did say if it was unique.
    If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

    An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case.  That's all indexes really are is a copy of the data sorted in a different order.

    We also don't know what the op means by improving the performance.  Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number.  Even a HEAP will do fine in this area.  If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".

    The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
    If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.

    Possibly the fastest - but is really the best option? without knowing the amount of updates/inserts there are on the table we don't even know if having the clustered index being the one based on the telephone isn't the better (which would give the same performance as a index with all columns - and just one tiny index to contain the leadid (which we weren't even told if it is used at all).

    Having a index with all the columns is in most of the cases I've seen an overkill. And on this case with 4million rows on a table it does add a lot to the size. 

    We also don't know what other indexes, if any, exist on the table - the DDL posted is from a "create script" from ssms which by default will not script these, neither the primary key (which from what the OP said is not the leadid - may note have one though)

  • frederico_fonseca - Sunday, August 19, 2018 2:38 PM

    Jonathan AC Roberts - Sunday, August 19, 2018 1:59 PM

    Jeff Moden - Sunday, August 19, 2018 1:36 PM

    Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM

    Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM

    Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

    If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
    😎

    I did say if it was unique.
    If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

    An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case.  That's all indexes really are is a copy of the data sorted in a different order.

    We also don't know what the op means by improving the performance.  Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number.  Even a HEAP will do fine in this area.  If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".

    The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
    If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.

    Possibly the fastest - but is really the best option? without knowing the amount of updates/inserts there are on the table we don't even know if having the clustered index being the one based on the telephone isn't the better (which would give the same performance as a index with all columns - and just one tiny index to contain the leadid (which we weren't even told if it is used at all).

    Having a index with all the columns is in most of the cases I've seen an overkill. And on this case with 4million rows on a table it does add a lot to the size. 

    We also don't know what other indexes, if any, exist on the table - the DDL posted is from a "create script" from ssms which by default will not script these, neither the primary key (which from what the OP said is not the leadid - may note have one though)

    In my initial answer I did say "If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.". So that covers the amount updates/inserts. A clustered index on phone number would get fragmented but on an identity column it would be less prone to fragmentation. And I did say "I would try this index", I didn't say this is definitely the best option.

  • Thank you all for suggestion me.

  • I did notice that you have a field of the Telephone as varchar(20) and your lookup @PhoneNo is varchar(50). I would recommend that you have the same type and size so the optimizer will use the intended index correctly all the time. It sounds like this would not make a difference but I have had plenty of cases where that is all it took to speed up the query. Just make sure there is no margin for error as you have here. 

    Cody L.

  • Jonathan AC Roberts - Sunday, August 19, 2018 3:09 PM

    frederico_fonseca - Sunday, August 19, 2018 2:38 PM

    Jonathan AC Roberts - Sunday, August 19, 2018 1:59 PM

    Jeff Moden - Sunday, August 19, 2018 1:36 PM

    Jonathan AC Roberts - Sunday, August 19, 2018 4:58 AM

    Eirikur Eiriksson - Sunday, August 19, 2018 3:57 AM

    Jonathan AC Roberts - Saturday, August 18, 2018 6:46 PM

    I would try a clustered primary key on [LeadID] and a unique (if it is unique) index on Telephone with an INCLUDE of all the other columns.

    If all the searches are on an exact phone number, literally duplicating the data does not make too much sense. Therefore, I would suggest a non-clustered index on the Phone number column with the inclusion of the clustered index, searches will then find the matches and bring the full results back via key lookup. Because the phone numbers can be shared, it is futile to use a unique constraint on the index unless the business rules limit the entries to unique numbers, something that we do not know.
    😎

    I did say if it was unique.
    If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.

    An INCLUDE of all the columns necessary would be a totally unnecessary duplication of data, in this case.  That's all indexes really are is a copy of the data sorted in a different order.

    We also don't know what the op means by improving the performance.  Seems to me that a "lead" table would require singleton lookups by phone number and that just about won't ever cause a performance problem if there's an index on the telephone number.  Even a HEAP will do fine in this area.  If they're doing something in batches, the OP needs to identify that before we can make recommendations on "performance".

    The query sunilkmr284 gave as the one he wanted to improve the performance of is select * from tbl_lead where Telephone=@PhoneNo
    If it is just this query and he wants to make it as fast as possible then an index on PhoneNo with all the other columns included will give the fastest results, hence my comment.

    Possibly the fastest - but is really the best option? without knowing the amount of updates/inserts there are on the table we don't even know if having the clustered index being the one based on the telephone isn't the better (which would give the same performance as a index with all columns - and just one tiny index to contain the leadid (which we weren't even told if it is used at all).

    Having a index with all the columns is in most of the cases I've seen an overkill. And on this case with 4million rows on a table it does add a lot to the size. 

    We also don't know what other indexes, if any, exist on the table - the DDL posted is from a "create script" from ssms which by default will not script these, neither the primary key (which from what the OP said is not the leadid - may note have one though)

    In my initial answer I did say "If this is the query that's most common, and performance is the issues, and the data doesn't have massive amounts of updates; an include all columns will allow a seek without a lookup which will be slightly faster than a lookup.". So that covers the amount updates/inserts. A clustered index on phone number would get fragmented but on an identity column it would be less prone to fragmentation. And I did say "I would try this index", I didn't say this is definitely the best option.

    Okay, IF I was querying the table by the phone number most frequently and pulling back all the data each time then I would cluster the table on the phone number with the LeadId column as the second column of the index to make it unique.

  • @sunilkmr284,
    If you are joining to other tables frequently by LeadID then I would create a clustered PK on that. Clustering on the phone number may lead to fragmentation and affect performance negatively. As Jeff already stated, a non-clustered index should perform well for singleton searches. What is the cardinality? Run this query and let us know the results:

    SELECT Count(*) TotalCount, Count(DISTINCT Telephone) PhoneCount
    ROM dbo.tbl_Lead;

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply