Want to make ''Get Latest Status'' function more efficient

  • Hey folks, I'm looking at making the following query more efficient potentially using the ranking functions and I'd like some advice from the gurus.  The purpose of the following is to have a status for a person, and also have a historical background as to what they've done, status wise.  This was the best way I could come up with to do this a few years back, but I'm wondering if there's a better way with Sql 2005.

    Here's a toned down structure and my query.  Any help/critique would be greatly appreciated.

    CREATE TABLE #Status( 

     StatusID int NOT NULL,

     StatusName VARCHAR(50) NOT NULL,

     StatusCategoryID int NOT NULL

    ) ON [PRIMARY]

    CREATE TABLE #RegStatus(

     [RegistrationID] [uniqueidentifier] NOT NULL,

     [StatusID] [int] NOT NULL,

     [StatusTimeStamp] [datetime] NOT NULL,

     [UniqueRowID] [int] IDENTITY(1,1) NOT NULL

    ) ON [PRIMARY]

    SET NOCOUNT on

    INSERT INTO #Status VALUES(200, 'StatusA', 1)

    INSERT INTO #Status VALUES(210, 'StatusB', 1)

    INSERT INTO #Status VALUES(115, 'StatusC', 1)

    INSERT INTO #Status VALUES(112, 'StatusD', 1)

    INSERT INTO #Status VALUES(314, 'StatusE', 1)

    INSERT INTO #Status VALUES(15, 'StatusF', 1)

    INSERT INTO #Status VALUES(22, 'StatusG', 1)

    INSERT INTO #Status VALUES(300, 'StatusX', 2)

    INSERT INTO #Status VALUES(310, 'StatusY', 2)

    INSERT INTO #Status VALUES(320, 'StatusZ', 2)

    INSERT INTO #RegStatus VALUES('7A6058D0-06CB-4E83-A8C4-B1AFC74B11F0', 200, GETDATE())

    INSERT INTO #RegStatus VALUES('7A6058D0-06CB-4E83-A8C4-B1AFC74B11F0', 210, GETDATE())

    INSERT INTO #RegStatus VALUES('7A6058D0-06CB-4E83-A8C4-B1AFC74B11F0', 115, GETDATE())

    INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 112, GETDATE())

    INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 314, GETDATE())

    INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 200, GETDATE())

    INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 22, GETDATE())

    INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 15, GETDATE())

    INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 115, GETDATE())

    INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 200, GETDATE())

    INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 115, GETDATE())

    SET NOCOUNT Off

    /*

    This is a query from within a function that I use to not only get the latest

    status for one registrant, but I can use it to get the latest status for everyone as well.

    */

    DECLARE @RegStatusCatID int,

    @RegID UNIQUEIDENTIFIER

    SET @RegStatusCatID = 1

    SET @RegID = null

    select LS.*, S.StatusName, S.StatusCategoryID

    from #Status S

    join(

     select RS.RegistrationID, RS.StatusID, RS.StatusTimeStamp

     from #RegStatus RS

     join

      (

      SELECT RS.RegistrationID , max(RS.UniqueRowID) UniqueRowID

      FROM #RegStatus RS

      join #Status S

       on RS.StatusID = S.StatusID

       and S.StatusCategoryID = @RegStatusCatID

       and (

        @RegID is null

        or (@RegID is not null

         and RS.RegistrationID = @RegID)

       &nbsp

      group by RS.RegistrationID

    &nbsp LS

      on RS.UniqueRowID = LS.UniqueRowID 

    ) LS

     on S.StatusID = LS.StatusID

    --SELECT * FROM #RegStatus

    DROP TABLE #RegStatus

    DROP TABLE #Status

  • Mark

    Two things:

    (1) Using a Common Table Expression instead of a subquery would make your query more readable, even if it didn't help the performance.

    (2) At the beginning you set @RegID to be null, so why does your query test for NULLness further down?  Even if that test were necessary, I think you could simplify it to ...@RegID IS NULL OR RS.RegistrationID = @RegID...

    John

  • John, I appreciate the prompt reply.

    (1) I'm currently looking into CTEs and cannot figure out how they'll help the situation.  Currently I have this code in a UDF (not explained very well earlier) and I think if I used a CTE, I'd have to create a temp table to return my values.  I would guess this extra step would slow things down a little.

    (2) 

        @RegID is null

        or (@RegID is not null

         and RS.RegistrationID = @RegID)

    this code acts as a case statement.  If I pass in a RegID to the UDF, it'll return the status of only one RegistrationID.  This one UDF serves 2 purposes.  It'd PROBABLY be best of me to leave the RegID business out of this all together and make 2 UDF's, one for a single RegistrationIDs, and one for all RegistrationIDs, but for maintainability, I've put both into one.

    Mark

  • Mark

    CTEs can be used in UDFs without temporary tables.  I have rewritten your query as below.  I'm not sure about improving performance, but it makes it more readable.  You mentioned ranking functions in your first post, but I can't see how those would help... but then I'm obviously not as familiar with your data as you are.

    I still think you can shorten the @RegID part without loss of generality.  Certainly for the sample data you have provided, the same result set is returned.

    By the way, why do you have an identity column and a unique identifier in one of your tables?

    I've made the following changes:

    (a) Changed one of the aliases (actually the name of the CTE) to LS1 since your use of the same alias twice was confusing

    (b) Got rid of the .* construct and specified the columns indivdually

    (c) Moved the parts of the join predicate that only refer to one side of the join into the WHERE clause: with an INNER JOIN this doesn't make any difference but I think it's clearer to do it like that and it's safer in case you ever change it to an OUTER JOIN.

    John

    DECLARE

    @RegStatusCatID int,

            @RegID UNIQUEIDENTIFIER

    SET     @RegStatusCatID = 1

    SET     @RegID = NULL;

    WITH LS1 AS

    (

    SELECT  RS.RegistrationID, RS.StatusID, RS.StatusTimeStamp

    FROM    #RegStatus RS

    JOIN    (

            SELECT RS.RegistrationID , MAX(RS.UniqueRowID) UniqueRowID

            FROM   #RegStatus RS

            JOIN   #Status S

            ON     RS.StatusID = S.StatusID

            WHERE  S.StatusCategoryID = @RegStatusCatID

            AND    (@RegID IS NULL

                   OR RS.RegistrationID = @RegID)

            GROUP BY RS.RegistrationID

            ) LS

    ON      RS.UniqueRowID = LS.UniqueRowID

    )

    SELECT

      LS1.RegistrationID,

            LS1.StatusID,

            LS1.StatusTimeStamp,

            S.StatusName,

            S.StatusCategoryID

    FROM    #Status S

    JOIN    LS1

    ON      S.StatusID = LS1.StatusID

  • John, thanks for the detailed explanations.  I really appreciate your perspective.  I didn't realize that I can still use a CTE in a function without making a temp table.  I'll play with that today.  I had some suggestions from the same post in comp.databases.ms-sqlserver that lead me to using ROW_NUMBER() in the inner query.  Here's the sample:

    --Common Table Expression that Ranks the status' for each registrant

    --based on the order of the UniqueRowID

    WITH CurStatus AS

    (

     SELECT RegistrationID,

        StatusID,

        StatusTimeStamp,

       [UniqueRowID],

        ROW_NUMBER()

        OVER(

         PARTITION BY RegistrationID

         ORDER BY [UniqueRowID] DESC

       &nbsp AS rn

     FROM #RegStatus RS

     where (

       @RegID is null

       or (@RegID is not null

        and RS.RegistrationID = @RegID)

      &nbsp

    )

    SELECT S.StatusID, S.StatusName, S.StatusCategoryID, CS.RegistrationID, CS.StatusTimeStamp

    FROM #Status S

    JOIN CurStatus CS

     ON S.StatusID = CS.StatusID

     AND CS.RN = 1

     AND S.StatusCategoryID = @RegStatusCatID

  • Mark

    I see what you've done there, using the ROW_NUMBER rank instead of the MAX aggregate function.  I'd be interested to find out - have you compared the execution plans for this with what you had before?  Which is more efficient?  I've got an inkling that the original should perform better, especially if you have a large amount of data, since the result set of the CTE should be much smaller using MAX than using ROW_NUMBER.

    John

  • Currently I've only tested it with that sample data and it performs much better according to the execution plans (from how I know how to read them).  Next step is mass amounts of data.

    Mark

  • Mark

    OK, please let us know when you've done that.  My inkling may have been wrong, since the query optimizer may not totally materialise the CTE anyway.  If you SET STATISTICS IO ON, that'll tell you how much juice each query is using... but don't forget to clear the cache before each test (not on a production server)!  What indexes do you have on your tables?

    Thanks

    John

  • John, I'm not sure how to clear my cache, but here's my results with simply 1500 rows in the RegStatus Table.  I'd like some more data in that table to replicate what's going to happen in a live environment, but that'll happen in the coming weeks.

    Using ROW_Number() and the CTE (Subtree cost 0.0584093)

    (32 row(s) affected)

    Table 'Status'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'RegStatus'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

     

    Previous example, no changes  (Subtree cost 0.115096)

    (32 row(s) affected)

    Table 'Status'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'RegStatus'. Scan count 2, logical reads 44, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I'm not OVERLY familiar with reading the Execution plans, but in SQLSMS I clicked the Actual Execution Plan button and ran the queries.  Relative to the batch the Ranked query was 33% of the batch, and the Old was 66%.  The subtree costs of the select statements were listed above.

  • Mark

    Yes, a similar thing happens with your original sample data as well.  Now we just need to see what happens when you get hold of all the extra data.  What indexes do you have on the tables?

    By the way, you can use DBCC DROPCLEANBUFFERS to clear the data cache.

    John

  • Assuming a clustered index on the UniqueRowId (or a combination of the other columns used in the aggregate query), in a similar situation we've found that using TOP(1) with an order by on the cluster to perform faster than the MAX. Also, in situations where you're returning only the latest value, using CROSS APPLY with the TOP(1) query ran much faster.

    SELECT ...

    FROM dbo.BoundBookedPolicy p

    CROSS APPLY (SELECT TOP(1) v2.BoundBookedPolicyId,v2.BoundBookedVersionId,v2.EndsmtNum,v2.EndsmtId

    FROM dbo.BoundBookedVersion v2

    WHERE v2.BoundBookedPolicyId = p.BoundBookedPolicyId

    ORDER BY v2.BoundBookedPolicyId DESC, v2.BoundbookedVersionId DESC) AS v

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply