Want to make ''Get Latest Status'' function more efficient

Question

Post reply

Want to make ''Get Latest Status'' function more efficient

Mark Rullo

SSC Rookie

Points: 27
More actions
May 14, 2007 at 8:53 am

#178512

Hey folks, I'm looking at making the following query more efficient potentially using the ranking functions and I'd like some advice from the gurus. The purpose of the following is to have a status for a person, and also have a historical background as to what they've done, status wise. This was the best way I could come up with to do this a few years back, but I'm wondering if there's a better way with Sql 2005.
Here's a toned down structure and my query. Any help/critique would be greatly appreciated.
CREATE TABLE #Status(
StatusID int NOT NULL,
StatusName VARCHAR(50) NOT NULL,
StatusCategoryID int NOT NULL
) ON [PRIMARY]
CREATE TABLE #RegStatus(
[RegistrationID] [uniqueidentifier] NOT NULL,
[StatusID] [int] NOT NULL,
[StatusTimeStamp] [datetime] NOT NULL,
[UniqueRowID] [int] IDENTITY(1,1) NOT NULL
) ON [PRIMARY]
SET NOCOUNT on
INSERT INTO #Status VALUES(200, 'StatusA', 1)
INSERT INTO #Status VALUES(210, 'StatusB', 1)
INSERT INTO #Status VALUES(115, 'StatusC', 1)
INSERT INTO #Status VALUES(112, 'StatusD', 1)
INSERT INTO #Status VALUES(314, 'StatusE', 1)
INSERT INTO #Status VALUES(15, 'StatusF', 1)
INSERT INTO #Status VALUES(22, 'StatusG', 1)
INSERT INTO #Status VALUES(300, 'StatusX', 2)
INSERT INTO #Status VALUES(310, 'StatusY', 2)
INSERT INTO #Status VALUES(320, 'StatusZ', 2)
INSERT INTO #RegStatus VALUES('7A6058D0-06CB-4E83-A8C4-B1AFC74B11F0', 200, GETDATE())
INSERT INTO #RegStatus VALUES('7A6058D0-06CB-4E83-A8C4-B1AFC74B11F0', 210, GETDATE())
INSERT INTO #RegStatus VALUES('7A6058D0-06CB-4E83-A8C4-B1AFC74B11F0', 115, GETDATE())
INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 112, GETDATE())
INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 314, GETDATE())
INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 200, GETDATE())
INSERT INTO #RegStatus VALUES('8B94A666-A3DD-4CB1-89A0-9910047AE7A0', 22, GETDATE())
INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 15, GETDATE())
INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 115, GETDATE())
INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 200, GETDATE())
INSERT INTO #RegStatus VALUES('58AF0FC6-C900-4BD0-B3F7-F9D62701F021', 115, GETDATE())
SET NOCOUNT Off
/*
This is a query from within a function that I use to not only get the latest
status for one registrant, but I can use it to get the latest status for everyone as well.
*/
DECLARE @RegStatusCatID int,
@RegID UNIQUEIDENTIFIER
SET @RegStatusCatID = 1
SET @RegID = null
select LS.*, S.StatusName, S.StatusCategoryID
from #Status S
join(
select RS.RegistrationID, RS.StatusID, RS.StatusTimeStamp
from #RegStatus RS
join
  (
  SELECT RS.RegistrationID , max(RS.UniqueRowID) UniqueRowID
  FROM #RegStatus RS
  join #Status S
   on RS.StatusID = S.StatusID
   and S.StatusCategoryID = @RegStatusCatID
   and (
    @RegID is null
    or (@RegID is not null
     and RS.RegistrationID = @RegID)
   &nbsp
  group by RS.RegistrationID
&nbsp LS
  on RS.UniqueRowID = LS.UniqueRowID
) LS
on S.StatusID = LS.StatusID
--SELECT * FROM #RegStatus
DROP TABLE #RegStatus
DROP TABLE #Status

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Mark

Two things:

(1) Using a Common Table Expression instead of a subquery would make your query more readable, even if it didn't help the performance.

(2) At the beginning you set @RegID to be null, so why does your query test for NULLness further down? Even if that test were necessary, I think you could simplify it to ...@RegID IS NULL OR RS.RegistrationID = @RegID...

John

Mark Rullo SSC Rookie Points: 27 More actions · Answer 2

John, I appreciate the prompt reply.

(1) I'm currently looking into CTEs and cannot figure out how they'll help the situation. Currently I have this code in a UDF (not explained very well earlier) and I think if I used a CTE, I'd have to create a temp table to return my values. I would guess this extra step would slow things down a little.

(2)

@RegID is null

or (@RegID is not null

and RS.RegistrationID = @RegID)

this code acts as a case statement. If I pass in a RegID to the UDF, it'll return the status of only one RegistrationID. This one UDF serves 2 purposes. It'd PROBABLY be best of me to leave the RegID business out of this all together and make 2 UDF's, one for a single RegistrationIDs, and one for all RegistrationIDs, but for maintainability, I've put both into one.

Mark

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 3

Mark

CTEs can be used in UDFs without temporary tables. I have rewritten your query as below. I'm not sure about improving performance, but it makes it more readable. You mentioned ranking functions in your first post, but I can't see how those would help... but then I'm obviously not as familiar with your data as you are.

I still think you can shorten the @RegID part without loss of generality. Certainly for the sample data you have provided, the same result set is returned.

By the way, why do you have an identity column and a unique identifier in one of your tables?

I've made the following changes:

(a) Changed one of the aliases (actually the name of the CTE) to LS1 since your use of the same alias twice was confusing

(b) Got rid of the .* construct and specified the columns indivdually

(c) Moved the parts of the join predicate that only refer to one side of the join into the WHERE clause: with an INNER JOIN this doesn't make any difference but I think it's clearer to do it like that and it's safer in case you ever change it to an OUTER JOIN.

John

DECLARE

@RegStatusCatID int,

@RegID UNIQUEIDENTIFIER

SET @RegStatusCatID = 1

SET @RegID = NULL;

WITH LS1 AS

(

SELECT RS.RegistrationID, RS.StatusID, RS.StatusTimeStamp

FROM #RegStatus RS

JOIN (

SELECT RS.RegistrationID , MAX(RS.UniqueRowID) UniqueRowID

FROM #RegStatus RS

JOIN #Status S

ON RS.StatusID = S.StatusID

WHERE S.StatusCategoryID = @RegStatusCatID

AND (@RegID IS NULL

OR RS.RegistrationID = @RegID)

GROUP BY RS.RegistrationID

) LS

ON RS.UniqueRowID = LS.UniqueRowID

)

SELECT

LS1.RegistrationID,

LS1.StatusID,

LS1.StatusTimeStamp,

S.StatusName,

S.StatusCategoryID

FROM #Status S

JOIN LS1

ON S.StatusID = LS1.StatusID

Mark Rullo SSC Rookie Points: 27 More actions · Answer 4

John, thanks for the detailed explanations. I really appreciate your perspective. I didn't realize that I can still use a CTE in a function without making a temp table. I'll play with that today. I had some suggestions from the same post in comp.databases.ms-sqlserver that lead me to using ROW_NUMBER() in the inner query. Here's the sample:

--Common Table Expression that Ranks the status' for each registrant

--based on the order of the UniqueRowID

WITH CurStatus AS

(

SELECT RegistrationID,

StatusID,

StatusTimeStamp,

[UniqueRowID],

ROW_NUMBER()

OVER(

PARTITION BY RegistrationID

ORDER BY [UniqueRowID] DESC

&nbsp AS rn

FROM #RegStatus RS

where (

@RegID is null

or (@RegID is not null

and RS.RegistrationID = @RegID)

&nbsp

)

SELECT S.StatusID, S.StatusName, S.StatusCategoryID, CS.RegistrationID, CS.StatusTimeStamp

FROM #Status S

JOIN CurStatus CS

ON S.StatusID = CS.StatusID

AND CS.RN = 1

AND S.StatusCategoryID = @RegStatusCatID

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 5

Mark

I see what you've done there, using the ROW_NUMBER rank instead of the MAX aggregate function. I'd be interested to find out - have you compared the execution plans for this with what you had before? Which is more efficient? I've got an inkling that the original should perform better, especially if you have a large amount of data, since the result set of the CTE should be much smaller using MAX than using ROW_NUMBER.

John

Mark Rullo SSC Rookie Points: 27 More actions · Answer 6

Currently I've only tested it with that sample data and it performs much better according to the execution plans (from how I know how to read them). Next step is mass amounts of data.

Mark

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 7

Mark

OK, please let us know when you've done that. My inkling may have been wrong, since the query optimizer may not totally materialise the CTE anyway. If you SET STATISTICS IO ON, that'll tell you how much juice each query is using... but don't forget to clear the cache before each test (not on a production server)! What indexes do you have on your tables?

Thanks

John

Mark Rullo SSC Rookie Points: 27 More actions · Answer 8

John, I'm not sure how to clear my cache, but here's my results with simply 1500 rows in the RegStatus Table. I'd like some more data in that table to replicate what's going to happen in a live environment, but that'll happen in the coming weeks.

Using ROW_Number() and the CTE (Subtree cost 0.0584093)

(32 row(s) affected)

Table 'Status'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'RegStatus'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Previous example, no changes (Subtree cost 0.115096)

(32 row(s) affected)

Table 'Status'. Scan count 1, logical reads 66, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'RegStatus'. Scan count 2, logical reads 44, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

I'm not OVERLY familiar with reading the Execution plans, but in SQLSMS I clicked the Actual Execution Plan button and ran the queries. Relative to the batch the Ranked query was 33% of the batch, and the Old was 66%. The subtree costs of the select statements were listed above.

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 9

Mark

Yes, a similar thing happens with your original sample data as well. Now we just need to see what happens when you get hold of all the extra data. What indexes do you have on the tables?

By the way, you can use DBCC DROPCLEANBUFFERS to clear the data cache.

John

Grant Fritchey SSC Guru Points: 399331 More actions · Answer 10

Assuming a clustered index on the UniqueRowId (or a combination of the other columns used in the aggregate query), in a similar situation we've found that using TOP(1) with an order by on the cluster to perform faster than the MAX. Also, in situations where you're returning only the latest value, using CROSS APPLY with the TOP(1) query ran much faster.

SELECT ...

FROM dbo.BoundBookedPolicy p

CROSS APPLY (SELECT TOP(1) v2.BoundBookedPolicyId,v2.BoundBookedVersionId,v2.EndsmtNum,v2.EndsmtId

FROM dbo.BoundBookedVersion v2

WHERE v2.BoundBookedPolicyId = p.BoundBookedPolicyId

ORDER BY v2.BoundBookedPolicyId DESC, v2.BoundbookedVersionId DESC) AS v

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning