Index and Search help

  • Hi,

    The belwo is my sample table structure and I will be buildinga sample website to test the search feature. Basically I have a text box and I can enter  Name or Description or cleintbrand or clientcategory. Assume I have 1 million records in my table and can grow up to 20 million over next 3 years.

    Create table ClientProducts(IdClientProduct bigint identity(1,1),Name varchar(50),[Description] varchar(max),ClientBrand varchar(50),
    ClientProductCode nvarchar(20),ClientCategory int,ActiveStatus bit default 1);

    1.  Please help me on creating index on these columns.
    Should I go with multi column index or index on each column? Also, is it good to index on varchar/varchar(max) column? Can u please show me some sample  how to create index for the below table sttrucure.

    2.   For the search functionnality , will the or condition gives best and fastest search? My sample try below,
    declare @Searchstring varchar(30);

    select * from ClientProducts where (Name like '%@Searchstring%' or [Description] like '%@Searchstring%' or
    ClientBrand like '%@Searchstring%' or ClientProductCode like '%@Searchstring%') and ActiveStatus = 1;

    please help me in this.

  • (1) Get rid of the ActiveStatus column.  It's useless, and it can cause headaches for the optimizer.  That type of column really is a throwback to tape-based data.

    (2) At least allow the user to specify specify the type of data they are searching for, such as Name / Description / Product Code / etc.  Then only search for those column(s) that were actually specified.

    (3) No, it's not good to index on varchar(max).  For a table like that, you might as well just search the table itself.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hi Scott, Thank you for the reply and i need more input from you.  Actually i need ActiveStatus because the table will have Inactive clientproducts as well. So, if you have this table structure, how would you create index for better search results. pleas use this structure and give me indexes.

  • can anyone please assist me

  • KGJ-Dev - Monday, November 19, 2018 12:37 PM

    Hi,

    The belwo is my sample table structure and I will be buildinga sample website to test the search feature. Basically I have a text box and I can enter  Name or Description or cleintbrand or clientcategory. Assume I have 1 million records in my table and can grow up to 20 million over next 3 years.

    Create table ClientProducts(IdClientProduct bigint identity(1,1),Name varchar(50),[Description] varchar(max),ClientBrand varchar(50),
    ClientProductCode nvarchar(20),ClientCategory int,ActiveStatus bit default 1);

    1.  Please help me on creating index on these columns.
    Should I go with multi column index or index on each column? Also, is it good to index on varchar/varchar(max) column? Can u please show me some sample  how to create index for the below table sttrucure.

    2.   For the search functionnality , will the or condition gives best and fastest search? My sample try below,
    declare @Searchstring varchar(30);

    select * from ClientProducts where (Name like '%@Searchstring%' or [Description] like '%@Searchstring%' or
    ClientBrand like '%@Searchstring%' or ClientProductCode like '%@Searchstring%') and ActiveStatus = 1;

    please help me in this.

    You have a description column that's defined as VARCHAR(MAX)??? Are you putting video or something in there???

  • Hi pietlinden,
    It's a ClientProductdescription and the record might go more than 4000 character. that's why kept as max.

  • In order to keep the searches for the other columns nasty fast, you should set the table option (using sp_tableoption) to store the LOBs (description column) out of row.

    As for searches on the description, I'd recommend coming up with a keyword column because, as Scott mentioned, there's not much you're actually going to be able to do with an index on such a column.  First, indexes are limited to "only" 900 bytes and, second, I suspect you're going to end up using "mid-string" searches, which will cause a table scan, period.  On 20 million rows, that will lead to quite the duration using conventional methods.

    One possible solution for the searches on the description column would be the use of Full-Text Indexing.  Do understand that such indexing is a duplication of data that will require extra hard-disk space and, perhaps, some memory.  Considering your requirement to search on the description column, it'll be worth it.

    Also, read the following article because you're going to end up with what is known as a "Catch All Query" and doing it right is absolutely essential to both performance and security (avoidance of SQL Injection).  I consider the article (and there's a follow-up to the article I listed below, as well) to be the definitive article on the subject.

    https://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
    https://sqlinthewild.co.za/index.php/2018/03/13/revisiting-catch-all-queries/

    As for indexes, you're probably going to need one per non-LOB column, especially when you hit the 20 million mark.  Again, that's more than a duplication of data and so you need to plan on disk size and memory size accordingly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, November 20, 2018 9:00 PM

    In order to keep the searches for the other columns nasty fast, you should set the table option (using sp_tableoption) to store the LOBs (description column) out of row.

    As for searches on the description, I'd recommend coming up with a keyword column because, as Scott mentioned, there's not much you're actually going to be able to do with an index on such a column.  First, indexes are limited to "only" 900 bytes and, second, I suspect you're going to end up using "mid-string" searches, which will cause a table scan, period.  On 20 million rows, that will lead to quite the duration using conventional methods.

    One possible solution for the searches on the description column would be the use of Full-Text Indexing.  Do understand that such indexing is a duplication of data that will require extra hard-disk space and, perhaps, some memory.  Considering your requirement to search on the description column, it'll be worth it.

    Also, read the following article because you're going to end up with what is known as a "Catch All Query" and doing it right is absolutely essential to both performance and security (avoidance of SQL Injection).  I consider the article (and there's a follow-up to the article I listed below, as well) to be the definitive article on the subject.

    https://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
    https://sqlinthewild.co.za/index.php/2018/03/13/revisiting-catch-all-queries/

    As for indexes, you're probably going to need one per non-LOB column, especially when you hit the 20 million mark.  Again, that's more than a duplication of data and so you need to plan on disk size and memory size accordingly.

    I don't think it's the same type of "catch all query" as the OP is searching all columns for the same string. So there's little dynamic SQL can do to help that.
    Also, the OP has expressions incorrectly expressed:  like '%@Searchstring%'
    They should be  like '%' + @Searchstring + '%'

  • Jonathan AC Roberts - Wednesday, November 21, 2018 6:32 AM

    Jeff Moden - Tuesday, November 20, 2018 9:00 PM

    In order to keep the searches for the other columns nasty fast, you should set the table option (using sp_tableoption) to store the LOBs (description column) out of row.

    As for searches on the description, I'd recommend coming up with a keyword column because, as Scott mentioned, there's not much you're actually going to be able to do with an index on such a column.  First, indexes are limited to "only" 900 bytes and, second, I suspect you're going to end up using "mid-string" searches, which will cause a table scan, period.  On 20 million rows, that will lead to quite the duration using conventional methods.

    One possible solution for the searches on the description column would be the use of Full-Text Indexing.  Do understand that such indexing is a duplication of data that will require extra hard-disk space and, perhaps, some memory.  Considering your requirement to search on the description column, it'll be worth it.

    Also, read the following article because you're going to end up with what is known as a "Catch All Query" and doing it right is absolutely essential to both performance and security (avoidance of SQL Injection).  I consider the article (and there's a follow-up to the article I listed below, as well) to be the definitive article on the subject.

    https://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
    https://sqlinthewild.co.za/index.php/2018/03/13/revisiting-catch-all-queries/

    As for indexes, you're probably going to need one per non-LOB column, especially when you hit the 20 million mark.  Again, that's more than a duplication of data and so you need to plan on disk size and memory size accordingly.

    I don't think it's the same type of "catch all query" as the OP is searching all columns for the same string. So there's little dynamic SQL can do to help that.
    Also, the OP has expressions incorrectly expressed:  like '%@Searchstring%'
    They should be  like '%' + @Searchstring + '%'

    You're right... For a moment, I couldn't figure out what you were talking about and so went back to re-read the original post and there it was...

    Basically I have A text box and I can enter Name or Description or cleintbrand or clientcategory.


    Definitely not a classic "Catch All" query.  More like a "Google" query especially if "fuzzy" lookups (partial words, etc) and mixed bags (e.g. enter both a clientbrand and clientcategory with a partial description using a partial word) are required.

    While it could be done in SQL Server, it would take a shedload of drive space and some very careful pre-parsing (N-Grams, Noise Word filtering, etc) of the data (Especially for that Definition column).  It's not often that I'll say such a thing but SQL Server probably isn't the correct tool for this type of thing. At least not on its own.  Some other tool should come into play.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher - Monday, November 19, 2018 12:53 PM

    (1) Get rid of the ActiveStatus column.  It's useless, and it can cause headaches for the optimizer.  That type of column really is a throwback to tape-based data.

    (2) At least allow the user to specify specify the type of data they are searching for, such as Name / Description / Product Code / etc.  Then only search for those column(s) that were actually specified.

    (3) No, it's not good to index on varchar(max).  For a table like that, you might as well just search the table itself.

    Hi Scott,
    I'm just wondering what your objections are to bit type columns?
    Also, I don't think it's even possible to index a varchar(MAX) columns, as Jeff points out the maximum length of an index is 900 bytes.

  • Jonathan AC Roberts - Wednesday, November 21, 2018 8:32 AM

    ScottPletcher - Monday, November 19, 2018 12:53 PM

    (1) Get rid of the ActiveStatus column.  It's useless, and it can cause headaches for the optimizer.  That type of column really is a throwback to tape-based data.

    (2) At least allow the user to specify specify the type of data they are searching for, such as Name / Description / Product Code / etc.  Then only search for those column(s) that were actually specified.

    (3) No, it's not good to index on varchar(max).  For a table like that, you might as well just search the table itself.

    Hi Scott,
    I'm just wondering what your objections are to bit type columns?
    Also, I don't think it's even possible to index a varchar(MAX) columns, as Jeff points out the maximum length of an index is 900 bytes.

    I don't object to bit columns per se.  In fact, I like them because they save space.
    I object to the "is_deleted" bit column, especially since it's almost always as a "just in case".  [Nearly] Every query uses "is_deleted = 0" anyway, and it can play havoc with the optimizer.  If you genuinely need deleted rows for certain uses, create a separate table of deleted rows.  Then, when you need to see deleted rows also, use a view that UNIONs the current and deleted tables.

    As to MAX, true, you can't index the entire value.  Using a computed column, you can the first ~1650 bytes or so if on SQL 2016 (as we are) or 900 bytes for older versions, as you point out.  However, yes, adding the computed column to index on could add other performance implications.  I think we all agree it's not a great idea to index on very long char columns.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Wednesday, November 21, 2018 10:05 AM

    Jonathan AC Roberts - Wednesday, November 21, 2018 8:32 AM

    ScottPletcher - Monday, November 19, 2018 12:53 PM

    (1) Get rid of the ActiveStatus column.  It's useless, and it can cause headaches for the optimizer.  That type of column really is a throwback to tape-based data.

    (2) At least allow the user to specify specify the type of data they are searching for, such as Name / Description / Product Code / etc.  Then only search for those column(s) that were actually specified.

    (3) No, it's not good to index on varchar(max).  For a table like that, you might as well just search the table itself.

    Hi Scott,
    I'm just wondering what your objections are to bit type columns?
    Also, I don't think it's even possible to index a varchar(MAX) columns, as Jeff points out the maximum length of an index is 900 bytes.

    I don't object to bit columns per se.  In fact, I like them because they save space.
    I object to the "is_deleted" bit column, especially since it's almost always as a "just in case".  [Nearly] Every query uses "is_deleted = 0" anyway, and it can play havoc with the optimizer.  If you genuinely need deleted rows for certain uses, create a separate table of deleted rows.  Then, when you need to see deleted rows also, use a view that UNIONs the current and deleted tables.

    As to MAX, true, you can't index the entire value.  Using a computed column, you can the first ~1650 bytes or so if on SQL 2016 (as we are) or 900 bytes for older versions, as you point out.  However, yes, adding the computed column to index on could add other performance implications.  I think we all agree it's not a great idea to index on very long char columns.

    Spot on.  What really gets my goat is when folks not only have an "Is_Deleted" column but they also have a "Deleted_On" date column.  There's also the question of why folks end up backing up data that will never again change as well as the problem that "soft deleted" data usually ends up being interleaved with "active" data and that takes extra reads and memory and, etc, etc, etc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • KGJ-Dev - Monday, November 19, 2018 12:37 PM

    Hi,

    The belwo is my sample table structure and I will be buildinga sample website to test the search feature. Basically I have a text box and I can enter  Name or Description or cleintbrand or clientcategory. Assume I have 1 million records in my table and can grow up to 20 million over next 3 years.

    Create table ClientProducts(IdClientProduct bigint identity(1,1),Name varchar(50),[Description] varchar(max),ClientBrand varchar(50),
    ClientProductCode nvarchar(20),ClientCategory int,ActiveStatus bit default 1);

    1.  Please help me on creating index on these columns.
    Should I go with multi column index or index on each column? Also, is it good to index on varchar/varchar(max) column? Can u please show me some sample  how to create index for the below table sttrucure.

    2.   For the search functionnality , will the or condition gives best and fastest search? My sample try below,
    declare @Searchstring varchar(30);

    select * from ClientProducts where (Name like '%@Searchstring%' or [Description] like '%@Searchstring%' or
    ClientBrand like '%@Searchstring%' or ClientProductCode like '%@Searchstring%') and ActiveStatus = 1;

    please help me in this.

    One other thing to add for consideration.....Full-text search query can sometimes be faster in situations like this with like predicates and varchar(max) columns.

    Sue

  • thanks a lot guys for making me to understand the logic behind.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply