Index guidance needed

  • Hi everyone.

    I recently came across a dev database in our organization that had no clustered indexes and circa 100 million row.....:crazy:

    In order to rectify I need some advice on where to create the clustered index. The table seems to be queried very often , but nothing seems to be removed.

    Table tracks network activity and is designed as below.

    Col

    [SrcIP] - varchar(25)

    [SrcDNS] - varchar(MAX)

    [DstIP] - varchar(25)

    [DstDNS] - varchar(MAX)

    [DstPort] - varchar(MAX)

    [Protocol] - varchar(MAX)

    [Date] - datetime

    [ConversationCount] - int

    I am keen to learn how to improve the performance of this table and queries that are run against it.

    What should I be looking at in terms of creating a clustered index , and where should it be created ?

    Any advice on improvements gratefully received.

    thanks everyone

  • What's the table frequently filtered by?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • MickyD (9/15/2016)


    Hi everyone.

    I recently came across a dev database in our organization that had no clustered indexes and circa 100 million row.....:crazy:

    In order to rectify I need some advice on where to create the clustered index. The table seems to be queried very often , but nothing seems to be removed.

    Table tracks network activity and is designed as below.

    Col

    [SrcIP] - varchar(25)

    [SrcDNS] - varchar(MAX)

    [DstIP] - varchar(25)

    [DstDNS] - varchar(MAX)

    [DstPort] - varchar(MAX)

    [Protocol] - varchar(MAX)

    [Date] - datetime

    [ConversationCount] - int

    I am keen to learn how to improve the performance of this table and queries that are run against it.

    What should I be looking at in terms of creating a clustered index , and where should it be created ?

    Any advice on improvements gratefully received.

    thanks everyone

    Although I'm similarly appalled at the condition of the table, let's first ask the question of "What makes you think there's a performance issue"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks everyone for the prompt replies.

    I have been given a selection of queries from the users.

    As an example a very basic select returns (669484 row(s) affected) in around 5 mins.

    My aim was to see if I could suggest any improvements to the table which could improve and baseline this standard select.

    I have created my own copy of the database so I was keen to improve the copy I have and test and prove before applying to the (what I will refer to as live)database.

    I appreciate I have not really supplied enough detail to aid any proper performance improvement at this stage.

    I was thinking about going back to devs with questions such as ,

    Do you need 100 million rows in the table ?

    Can we remove X amount of rows ?

    How are they achieved \ if at all ?

    Has any future proofing been thought about ?

    As the table in question has no PK , or any form of index I was wondering if any quick wins could be suggested that I could test. This would then form the basis on what I do next to the live copy.

    I think any advice on what I should be asking the devs or proposing as an improvement suggestion would be appreciated

    thanks everyone.

  • MickyD (9/15/2016)


    As an example a very basic select returns (669484 row(s) affected) in around 5 mins.

    Hold the phone a minute... WHERE are those rows being returned to? What does that query actually look like?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • MickyD (9/15/2016)


    Thanks everyone for the prompt replies.

    I have been given a selection of queries from the users.

    As an example a very basic select returns (669484 row(s) affected) in around 5 mins.

    My aim was to see if I could suggest any improvements to the table which could improve and baseline this standard select.

    I have created my own copy of the database so I was keen to improve the copy I have and test and prove before applying to the (what I will refer to as live)database.

    I appreciate I have not really supplied enough detail to aid any proper performance improvement at this stage.

    I was thinking about going back to devs with questions such as ,

    Do you need 100 million rows in the table ?

    Can we remove X amount of rows ?

    How are they achieved \ if at all ?

    Has any future proofing been thought about ?

    As the table in question has no PK , or any form of index I was wondering if any quick wins could be suggested that I could test. This would then form the basis on what I do next to the live copy.

    I think any advice on what I should be asking the devs or proposing as an improvement suggestion would be appreciated

    thanks everyone.

    You should also ask why they use VARCHAR(MAX), which can contain up to 2 Billion (binary 2 billion) characters for things that don't look like they come close to even ever holding more than 8K.

    And, the questions remain... where are the nearly 700K rows being returned to and what does that query look like? Attaching an Actual Execution Plan would also be a big help in figuring out what gives.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks all some great topics for me to continue comms with the Devs there.

    I will extract execution plan on the simple select I have been sent.

    thanks again

  • MickyD (9/15/2016)


    I have been given a selection of queries from the users.

    Good start

    What's the most common predicate used against that table?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • MickyD (9/16/2016)


    Thanks all some great topics for me to continue comms with the Devs there.

    I will extract execution plan on the simple select I have been sent.

    thanks again

    Please could you answer the questions posted by Jeff? Thanks.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • MickyD (9/15/2016)


    As an example a very basic select returns (669484 row(s) affected) in around 5 mins.

    My aim was to see if I could suggest any improvements to the table which could improve and baseline this standard select.

    Create some "very basic standard" indexes and do some "very basic standard" maintenance and you will get some improvements.


    Alex Suprun

  • GilaMonster (9/15/2016)


    What's the table frequently filtered by?

    GilaMonster (9/16/2016)


    Good start

    What's the most common predicate used against that table?

    I'm new to index/performance tuning. Could you please explain how to tell what the table is most frequently filtered by or how to tell the most common predicate used against the table? Is that found in a particular DMV or is that just knowledge of your tables and how they are being queried?

  • CferMN (10/4/2016)


    Could you please explain how to tell what the table is most frequently filtered by or how to tell the most common predicate used against the table?

    You go and look at the application and workload and the queries in it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have to second Jeff's questioning of the use of varchar(max)...

    Before indexing it would seem valuable, even crucial, to review the data types.

    Can DNS entries or Protocol really ever be 2 billion characters?

    Is Protocol just one of the standard network protocol names ? Could it be a reference to the IANA Assigned Internet Protocol Numbers (tinyint)?

    Is DstPort really a string, or just a port number (int)?

  • RAThor (10/5/2016)


    I have to second Jeff's questioning of the use of varchar(max)...

    Before indexing it would seem valuable, even crucial, to review the data types.

    Can DNS entries or Protocol really ever be 2 billion characters?

    Is Protocol just one of the standard network protocol names ? Could it be a reference to the IANA Assigned Internet Protocol Numbers (tinyint)?

    Is DstPort really a string, or just a port number (int)?

    I don't believe DNS names can be longer that 253 characters so varchar(max) is vast overkill. And yes, port numbers should be able to be stored in an int.

  • GilaMonster (10/4/2016)


    CferMN (10/4/2016)


    Could you please explain how to tell what the table is most frequently filtered by or how to tell the most common predicate used against the table?

    You go and look at the application and workload and the queries in it.

    Another way is to run a trace to see how the table is being queried. Gail wrote a nice article on this. https://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server-part-1/

    ----------------------------------------------------

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply