What is the proper way of indexing a table which contains a large amount of data?

  • Hi experts,

    I have to optimize a database. I have almost done everything (modified stored procedures, implemented indexes), but I am not satisfied & am a bit confused at this moment about a few areas.

    There are a few tables (containing millions of rows) which take part in a public search. These tables also take part in INSERT & UPDATE operations.

    I have performed the maximum possible normalization & cannot break my tables down any further.

    There is a main table & several related tables (so-called child tables). The main table has 18 columns.

    10 columns are numeric (int, bigint & numeric(18,2)), 4 are varchar(128), 2 are datetime and 2 are bit. Some columns are also nullable.

    The search is based on different conditions & the user can select one or more search conditions.

    We are using dynamic queries in our stored procedures. Any column can take part in the WHERE clause.

    Now my questions are:

    1. Should I create an index on every column? If yes, there will be 18 indexes (1 clustered & 17 non-clustered). Is this the correct approach?

    2. Should I make every non-clustered index a covering index, or is it sufficient to create one covering index (on the column which always takes part in the WHERE clause)?

    Please suggest the proper approach.

    Thanks in advance !

  • This is a gigantic topic.

    I'm not going to be able to give you specific guidance based on a generic problem as you've outlined. Sorry.

    It sounds like you're creating a "report on anything" page and you're finding, as everyone does, that it's very hard to optimize those. The fact is, in most cases, there are patterns to what the users are going to access. Seldom is it true that they will search on anything. Generally there are related columns that will always be used for specific types of searches. You need to identify these patterns, then index accordingly.

    Putting a single index on each column is an extremely poor choice. It is likely that the majority of those indexes will never be used by any search criteria. You will still have to maintain them through all inserts, updates & deletes, adding significantly to your overhead.

    Dynamic T-SQL is the way to go for this type of query. Just make sure you're using sp_executesql with parameters, not merely building a SQL string and executing it. The first method ensures at least some plan reuse; the second ensures none.
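    As a minimal sketch of the difference (table and parameter names are invented for illustration):

```sql
-- BAD: the value is concatenated into the string.
-- Every distinct value produces a distinct SQL text (no plan reuse),
-- and a string value could inject SQL.
SET @sql = N'SELECT * FROM dbo.Orders WHERE CustomerID = '
         + CONVERT(nvarchar(20), @CustomerID);
EXEC (@sql);

-- BETTER: a parameter marker stays in the text, the value travels separately.
-- The SQL text is identical for every value, so the cached plan is reused.
SET @sql = N'SELECT * FROM dbo.Orders WHERE CustomerID = @CustomerID';
EXEC sp_executesql @sql, N'@CustomerID int', @CustomerID = @CustomerID;
```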

    Other than that, I'd say recognize the access patterns and then build indexes to support them. In most circumstances, compound keys and included columns work much better than single column indexes, but it really, really depends on exactly what the T-SQL code looks like.
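    For example, a compound index with included columns supporting one identified pattern might look like this (a sketch; the table and column names are invented):

```sql
-- One compound (multi-column) index covering a common search pattern:
-- WHERE LastName = ... AND FirstName = ..., returning Email and Phone.
CREATE NONCLUSTERED INDEX IX_Customers_LastName_FirstName
    ON dbo.Customers (LastName, FirstName)   -- key columns: used for seeking
    INCLUDE (Email, Phone);                  -- included columns: avoid key lookups
```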

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (2/5/2013)


    Other than that, I'd say recognize the access patterns and then build indexes to support them. In most circumstances, compound keys and included columns work much better than single column indexes, but it really, really depends on exactly what the T-SQL code looks like.

    This.

    You analyse common patterns and index to support those.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail, Fritchey,

    Thanks for your reply.

    Fritchey, thank you very much. I have found the first thing I have to do.

    We are not using sp_executesql, so that is my first step; I am going to change the process. I am worried, though, because the developers have built all the SPs by concatenating SQL strings. We haven't used parameters.

    "Other than that, I'd say recognize the access patterns and then build indexes to support them. In most circumstances, compound keys and included columns work much better than single column indexes, but it really, really depends on exactly what the T-SQL code looks like."

    Can you please let me know how I can go about this approach?

  • You've got two options there. Well, actually, you have to do both. First, talk to the business people to understand what they're actually going to do, not what they're asking for. Second, capture the queries as they come in and match that with what you know about the structure and their access methods. But the main thing is talking to the business to understand what they're doing.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • http://sqlinthewild.co.za/index.php/2011/11/11/sql-university-advanced-indexing-indexing-strategies/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Fritchey,

    We have built the SQL like this:

    IF @inbCityID IS NOT NULL AND @inbCityID > 0
    BEGIN
        SET @SqlCondition = @SqlCondition + ' AND vaa.CityID = ' + CONVERT(VARCHAR(50), @inbCityID);
    END

    (this is only one block of the code)

    & finally we use the code below:

    EXECUTE sp_executesql @FinalSql;

    Should I change the above structure to a parameterized structure & then pass the parameters to sp_executesql?

  • Yes, absolutely yes.

    As you have it, it's a security risk (SQL injection) and subject to poor plan caching and reuse.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail, thanks for the link,

  • Hi Gail,

    I have changed one of my stored procedures. The code block I posted earlier now looks like this:

    IF @inbCityID IS NOT NULL AND @inbCityID > 0
    BEGIN
        SET @SqlCondition = @SqlCondition + N' AND vaa.CityID = @inbCityID';
    END

    I have just one question. When I call sp_executesql & pass the parameters, can I pass all of them (approx. 14) regardless of the query?

    That is, in the code block above, if @inbCityID is 0 then that condition will not be included in our final SQL string. Can I still pass all my parameters (including @inbCityID), since we don't know in advance whether each condition will be true or false?

    I have checked by executing the procedure (passing all parameters) when only one condition was true, and the SP runs without any error.

    I am asking only because of performance considerations: is there any performance impact if I pass more parameters than the query uses? I have included all parameters in the params definition. I googled but didn't find an answer, which is why I am asking.

  • Yes you can. No there isn't.
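    Declare every parameter once in the definition string and pass them all, whether or not the built-up WHERE clause references them. A sketch (the parameter names and types here are assumed for illustration, based on the earlier posts):

```sql
-- The full call passes every parameter, used or not.
EXECUTE sp_executesql
    @FinalSql,
    N'@inbCityID bigint, @invFirstName varchar(128), @indFromDate datetime',
    @inbCityID    = @inbCityID,
    @invFirstName = @invFirstName,
    @indFromDate  = @indFromDate;
-- Parameters not referenced by @FinalSql are simply ignored;
-- passing them has no meaningful overhead.
```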

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,

    I read all the articles listed in your link. This is really very good information.

    All the requirements & best practices are good, but I still have a question in my mind.

    In my case we have more than 10 columns.

    I think we cannot define one specific column sequence for a non-clustered index which will best suit our dynamic query.

    E.g: If we have:

    FirstName, LastName, Address, City, State, Zip, Phone, Email

    eight columns which take part in our dynamic query, how can we decide the exact key order? If I create the index (LastName, FirstName, ..........), it will only help those queries whose WHERE clause looks like this:

    LastName = 'text' and firstname = 'text' and ...............

    but what about those queries which do not contain LastName? In that case, this non-clustered index becomes useless. Will I have to create 8 × 8 = 64 indexes (or whatever the right count is; sorry, I am not too good at permutations/combinations)?

    Thanks to Microsoft for giving us the ability to create 999 non-clustered indexes.

    I just want to raise a question: how can we say that one specific sequence of columns will be best suited when all columns have equal priority? Sometimes I will not pass LastName, & sometimes I will pass City or postal code/zip instead.

    & this situation will only get worse as the number of columns increases.

  • As Grant said earlier

    Other than that, I'd say recognize the access patterns and then build indexes to support them.

    No, you absolutely do not want to create indexes for every theoretical access pattern. Create indexes for the access patterns that are used heavily, just a couple. Then monitor, look for queries that run often and aren't efficient enough, and consider another index or two.
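    One way to do that monitoring (a sketch against the standard index-usage DMV, run in the target database; the statistics reset on instance restart) is to compare how often each index is read versus how often it is written:

```sql
-- Indexes that are maintained on every write but seldom read
-- are candidates for dropping.
SELECT  OBJECT_NAME(s.object_id)                     AS table_name,
        i.name                                       AS index_name,
        s.user_seeks + s.user_scans + s.user_lookups AS reads,
        s.user_updates                               AS writes
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
        ON  i.object_id = s.object_id
        AND i.index_id  = s.index_id
WHERE   s.database_id = DB_ID()
ORDER BY writes DESC, reads ASC;
```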

    http://sqlinthewild.co.za/index.php/2011/11/11/sql-university-advanced-indexing-indexing-strategies/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • thanks, it sounds great. 🙂

    but now I am afraid because now I will have to ENJOY my weekend in the office again. 🙁 🙁

    again, thanks a ton mate.

    this time, I have learnt a lot.

  • I was just reading the article:

    http://www.sql-server-performance.com/2007/t-sql-where/2/

    There is a recommendation given below:

    "If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written. This assumes that no parenthesis have been used to change the order of execution. Because of this, you may want to consider one of the following when using AND:

    Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time.

    If both parts of an AND expression are equally likely being false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.

    You may want to consider using Query Analyzer or Management Studio to look at the execution plans of your queries to see which is best for your situation. [2000, 2005, 2008] Updated 2-4-2009 "

    Note the first recommendation above (it is shown in bold in the article).

    So, if we put the least-likely-true column first (left-most in the WHERE clause), that is the opposite of what you advised.

    You advised that we should put the "most selective" column first.

    Now I am in a dilemma: what to do, what not to do?

Viewing 15 posts - 1 through 15 (of 26 total)
