Max function and indexes

  • Craig Smith

    SSChasing Mays

    Points: 600

    We often need to pull the maximum values for date fields.  I thought we would need to index those fields, but without indexes the queries perform relatively well.  A couple examples below; note that these are narrow tables, and not having to output all the rows certainly helps, but I'm still surprised at the performance.

    10 million rows: < 1 second

    300 million rows: 20 seconds

    • This topic was modified 2 months, 2 weeks ago by  Craig Smith.
  • Lynn Pettis

    SSC Guru

    Points: 442284

    So, is this just an observation or do you actually have a question here?

     

  • Craig Smith

    SSChasing Mays

    Points: 600

    Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.

  • Jeff Moden

    SSC Guru

    Points: 996014

    Craig Smith wrote:

    Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.

    Are you able to share the actual execution plan (not just a graphic but the actual saved file).  It's the only way we might be able to explain what's happening.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Craig Smith

    SSChasing Mays

    Points: 600

    Execution plan is attached.

    Attachments:
    You must be logged in to view attached files.
  • Chris Harshman

    SSC-Forever

    Points: 42020

    I suppose it depends on your table definitions and the query you are using to "pull maximum values of date fields".  That could mean a number of different things.

    If for example, you have an OrderStatus table with a row for each OrderID for each time the status changes, the speed will depend on how many rows per OrderID you have.

  • Chris Harshman

    SSC-Forever

    Points: 42020

    looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.

  • Jeffrey Williams

    SSC Guru

    Points: 88336

    Chris Harshman wrote:

    looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.

    It also is parallel across 8 CPUs and took a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.

    Adding a clustered index (doesn't appear this table has a clustered index) would allow for a clustered index scan and should reduce the query to quite a bit less - for even much larger tables.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • ScottPletcher

    SSC Guru

    Points: 98398

    Jeffrey Williams wrote:

    Chris Harshman wrote:

    looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.

    It also is parallel across 8 CPUs and took a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.

    Adding a clustered index (doesn't appear this table has a clustered index) would allow for a clustered index scan and should reduce the query to quite a bit less - for even much larger tables.

    Create the clustered index on that column, which reduces the I/O to a single seek, as fast as it gets.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • Craig Smith

    SSChasing Mays

    Points: 600

    Thanks everyone.  I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different.  When I do the 300 million row table I'll do a before and after test.

    So the actual row size is about 185 bytes.  I read that fixed length columns (such as this date) are stored at the beginning of the row however; does it only have to retrieve the first part of the row?

    Edit -- posts crossed in the mail -- That column is not in the primary key.  Sure, any index on that column would help, but it can't be the clustering key so I'm trying to see if it's really necessary.  Doesn't appear to be on tables this size, we'll see about the larger one.

    • This reply was modified 2 months, 2 weeks ago by  Craig Smith.
  • Lynn Pettis

    SSC Guru

    Points: 442284

    Craig Smith wrote:

    Thanks everyone.  I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different.  When I do the 300 million row table I'll do a before and after test.

    So the actual row size is about 185 bytes.  I read that fixed length columns (such as this date) are stored at the beginning of the row however; does it only have to retrieve the first part of the row?

    Edit -- posts crossed in the mail -- That column is not in the primary key.  Sure, any index on that column would help, but it can't be the clustering key so I'm trying to see if it's really necessary.  Doesn't appear to be on tables this size, we'll see about the larger one.

    Clustered index doesn't have to be the primary key and sometimes it is better that way.  Really depends on the access patterns of the tables.

     

  • Jeff Moden

    SSC Guru

    Points: 996014

    Adding to Lynn's post, I will say that having a UNIQUE Clustered Index is usually better because it gets used in all Non-Clustered indexes, as well.

    To wit, I do have tables that I've assigned the Clustered Index to a DATETIME column but I've added the IDENTITY column as a secondary column in the definition of the Clustered Index as a "uniquifier/temporal tie breaker" to be able to force uniqueness.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher

    SSC Guru

    Points: 98398

    Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.

    But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • Jeff Moden

    SSC Guru

    Points: 996014

    That being said, I'd love to see the schema of the table and what the real use of finding the max date would be for this table.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 996014

    ScottPletcher wrote:

    Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.

    But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.

    Also, for single row inserts, it will certainly "help" cut down down on dup key values but it will not eliminate them.  Also, if you use something even like SYSDATETIME ( ), ALL of the rows inserted in a single batch are guaranteed to have exactly the same date and time.

    If you need the clustered index to be unique, just having a single temporal column isn't going to do it... not even DATETIME2(7).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply