MAX function and indexes

  • We often need to pull the maximum values for date fields.  I thought we would need to index those fields, but without indexes the queries perform relatively well.  A couple of examples are below; note that these are narrow tables, and not having to output all the rows certainly helps, but I'm still surprised at the performance.

    10 million rows: < 1 second

    300 million rows: 20 seconds
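
    For illustration, the queries are of this general shape (the table and column names here are placeholders, not the real schema):

    -- Narrow heap with no indexes at all
    CREATE TABLE dbo.EventLog
    (
        EventID   int      IDENTITY(1,1) NOT NULL,
        EventDate datetime NOT NULL
        -- ... a handful of other narrow columns
    );

    -- Latest date; with no index on EventDate the plan is a full table scan
    -- feeding a stream aggregate, but only one value comes back to the client
    SELECT MAX(EventDate) AS LatestEvent
    FROM dbo.EventLog;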

  • So, is this just an observation or do you actually have a question here?

     

  • Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.

  • Craig Smith wrote:

    Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.

    Are you able to share the actual execution plan (not just a graphic, but the actual saved file)?  It's the only way we might be able to explain what's happening.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Execution plan is attached.

  • I suppose it depends on your table definitions and the query you are using to "pull maximum values of date fields".  That could mean a number of different things.

    If, for example, you have an OrderStatus table with a row for each OrderID for each time the status changes, the speed will depend on how many rows per OrderID you have.
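
    Something like this, for instance (a hypothetical table, just to illustrate):

    -- One row per order per status change
    CREATE TABLE dbo.OrderStatus
    (
        OrderID    int      NOT NULL,
        StatusCode tinyint  NOT NULL,
        StatusDate datetime NOT NULL
    );

    -- Latest status date per order; with no index on (OrderID, StatusDate)
    -- the whole table is scanned and then aggregated per OrderID
    SELECT OrderID, MAX(StatusDate) AS LastStatusDate
    FROM dbo.OrderStatus
    GROUP BY OrderID;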

  • Looking at your execution plan, 9 million rows at 15 bytes per row (roughly 130 MB) is not very large, so SQL Server was able to read the entire table with a table scan.

  • Chris Harshman wrote:

    Looking at your execution plan, 9 million rows at 15 bytes per row (roughly 130 MB) is not very large, so SQL Server was able to read the entire table with a table scan.

    It also runs in parallel across 8 CPUs, taking a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.

    Adding a clustered index (it doesn't appear this table has one) would allow a clustered index scan and should reduce the query time quite a bit, even for much larger tables.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    Chris Harshman wrote:

    Looking at your execution plan, 9 million rows at 15 bytes per row (roughly 130 MB) is not very large, so SQL Server was able to read the entire table with a table scan.

    It also runs in parallel across 8 CPUs, taking a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.

    Adding a clustered index (it doesn't appear this table has one) would allow a clustered index scan and should reduce the query time quite a bit, even for much larger tables.

    Create the clustered index on that column; that reduces the I/O to a single seek, which is as fast as it gets.

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.
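
    A sketch of that suggestion (placeholder object names, since the real schema isn't posted):

    -- Cluster the table on the date column (a non-unique clustered index is allowed)
    CREATE CLUSTERED INDEX CIX_EventLog_EventDate
        ON dbo.EventLog (EventDate);

    -- MAX(EventDate) can now be answered by seeking to the end of the
    -- clustered index instead of scanning the whole table
    SELECT MAX(EventDate) AS LatestEvent
    FROM dbo.EventLog;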

  • Thanks everyone.  I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different.  When I do the 300 million row table I'll do a before and after test.

    So the actual row size is about 185 bytes.  I read that fixed-length columns (such as this date) are stored at the beginning of the row, though; does it only have to retrieve the first part of each row?

    Edit -- posts crossed in the mail -- that column is not in the primary key.  Sure, any index on that column would help, but it can't be the clustering key, so I'm trying to see if it's really necessary.  It doesn't appear to be on tables this size; we'll see about the larger one.
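
    If the date column can't be the clustering key, a plain nonclustered index on it gives the same effect (placeholder names again):

    -- Narrow nonclustered index on just the date column
    CREATE NONCLUSTERED INDEX IX_EventLog_EventDate
        ON dbo.EventLog (EventDate);

    -- MAX(EventDate) can be answered from the last key of this index alone,
    -- without touching the ~185-byte base rows
    SELECT MAX(EventDate) AS LatestEvent
    FROM dbo.EventLog;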

  • Craig Smith wrote:

    Thanks everyone.  I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different.  When I do the 300 million row table I'll do a before and after test.

    So the actual row size is about 185 bytes.  I read that fixed-length columns (such as this date) are stored at the beginning of the row, though; does it only have to retrieve the first part of each row?

    Edit -- posts crossed in the mail -- that column is not in the primary key.  Sure, any index on that column would help, but it can't be the clustering key, so I'm trying to see if it's really necessary.  It doesn't appear to be on tables this size; we'll see about the larger one.

    The clustered index doesn't have to be the primary key, and sometimes it is better that way.  It really depends on the access patterns of the tables.
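
    For example (placeholder names), the primary key can stay nonclustered while the date column takes the clustered index:

    CREATE TABLE dbo.EventLog
    (
        EventID   int      IDENTITY(1,1) NOT NULL,
        EventDate datetime NOT NULL,
        CONSTRAINT PK_EventLog PRIMARY KEY NONCLUSTERED (EventID)
    );

    CREATE CLUSTERED INDEX CIX_EventLog_EventDate
        ON dbo.EventLog (EventDate);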

     

  • Adding to Lynn's post, I will say that having a UNIQUE Clustered Index is usually better because its key is carried in all Non-Clustered indexes as well.

    To wit, I do have tables where I've assigned the Clustered Index to a DATETIME column but added the IDENTITY column as a secondary key column in the Clustered Index definition, as a "uniquifier/temporal tie breaker", to force uniqueness.

    --Jeff Moden


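
    A sketch of that pattern (assuming the table already has an IDENTITY column, here called EventID):

    -- EventDate leads the key; EventID breaks ties so the index can be UNIQUE
    CREATE UNIQUE CLUSTERED INDEX CIX_EventLog_EventDate_EventID
        ON dbo.EventLog (EventDate, EventID);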

  • Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.

    But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • That being said, I'd love to see the schema of the table and what the real use of finding the max date would be for this table.

     

    --Jeff Moden



  • ScottPletcher wrote:

    Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.

    But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.

    Also, for single-row inserts, it will certainly "help" cut down on dup key values, but it will not eliminate them.  And if you use even something like SYSDATETIME(), ALL of the rows inserted in a single batch are guaranteed to have exactly the same date and time.

    If you need the clustered index to be unique, just having a single temporal column isn't going to do it... not even DATETIME2(7).

    --Jeff Moden


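
    A quick way to see the "same timestamp for the whole batch" behaviour (SQL Server evaluates SYSDATETIME() once per statement):

    DECLARE @t TABLE (d datetime2(7) NOT NULL);

    -- One multi-row INSERT...SELECT: every inserted row gets the same value
    INSERT INTO @t (d)
    SELECT SYSDATETIME()
    FROM sys.all_objects;

    SELECT COUNT(*)          AS RowsInserted,
           COUNT(DISTINCT d) AS DistinctTimestamps   -- comes back as 1
    FROM @t;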
