July 18, 2008 at 9:34 am
I very recently stumbled on the "Include Client Statistics" feature, and was delighted to find another tool to help analyze my query performance. (Although I was dismayed that the phrase "Client Statistics" does not appear in BOL, and I can find no documentation for it.)
Today I am writing a query that is based on a bunch of Common Table Expressions, and I was using Client Statistics to incrementally test parts of it.
Simplified, here is the part of the query I was testing:
WITH CPMIS AS -- Get Personnel changes from CPMIS
(
    SELECT FR.*
    FROM [HR_Joined].dbo.[PERS133Titles Current and History] AS FR
    LEFT OUTER JOIN [HR_Joined].dbo.[PERS133Titles Current and History] AS TTO
        ON  FR.[SOURCE] = TTO.[SOURCE]
        AND FR.[SSN]    = TTO.[SSN]
        AND FR.EOD      = TTO.EOD
        AND FR.[X] - 1  = TTO.[X]
    WHERE FR.[BEGIN D H] >= FR.EOD -- Restrict to actions after EOD
    -- returns 252,061 Rows
)
, PERSCHG AS
(
    SELECT FR.*
    FROM [HR_Joined].dbo.[NFCFRPOS] AS FR
    -- returns 367,766 Rows
)
, SSNS AS
(
    SELECT CPMIS.SSN
    FROM CPMIS
    UNION
    SELECT PERSCHG.SSNO
    FROM PERSCHG
    -- returns 120,497 Rows
)
SELECT * FROM SSNS
-- returns 120,497 Rows
In summary, it includes three SELECTs in three CTEs, and (for testing) a final SELECT to return the output of the last CTE.
What confused me were the client statistics returned from two identical runs of the query: the first run reports 2 SELECT statements, and the second (just a repeat of the first execution) reports only 1 SELECT statement. There may have been an earlier execution prior to Trial 1.
                                       Trial 2      Trial 1        Average
---------------------------------------------------------------------------
Number of SELECT statements                  1            2         1.5000
Rows returned by SELECT statements      120497       240994    180745.5000
Number of transactions                       0            0         0.0000
Network Statistics
Number of server roundtrips                  4           18        11.0000
TDS packets sent from client                 5           22        13.5000
TDS packets received from server           623         1256       939.5000
Bytes sent from client                    5142        20428     12785.0000
Bytes received from server             2535669      5071735   3803702.0000
Time Statistics
Client processing time                     469          734       601.5000
Total execution time                      1109         2093      1601.0000
Wait time on server replies                640         1359       999.5000
I wonder if the SQL engine is reusing some hidden intermediate result tables in tempdb that were left behind by prior runs? Any insights would be appreciated!
July 18, 2008 at 1:31 pm
I'd say that it had realized after the first run that the only column you're actually using from each of the first two CTEs is the SSN column, and that the third CTE is really the outer query, so it cut out a bunch of junk between the two executions. (Okay, "realized" is the wrong word, but it should make the concept clear enough.)
Which, of course, begs the question, why do "SELECT FR.*" in the first CTE, if the only column you're going to use is the SSN? Same for the second CTE. And why have the third CTE at all, when it really is the outer query for the first two CTEs?
And that, of course, raises the question of why even do this as CTEs, when it can be done as a simple UNION select, as sketched below. Not that that is likely to have any effect on the query; this last one is just curiosity for me, while the points in the prior paragraph actually matter in the database.
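Roughly what I mean, as an untested sketch (I've dropped the left join to TTO, since nothing is selected from it and the UNION de-duplicates the SSNs anyway):
SELECT FR.SSN
FROM [HR_Joined].dbo.[PERS133Titles Current and History] AS FR
WHERE FR.[BEGIN D H] >= FR.EOD
UNION
SELECT FR.SSNO
FROM [HR_Joined].dbo.[NFCFRPOS] AS FR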
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 22, 2008 at 4:44 am
Thanks for your insights, GSquared. Perhaps things got distorted because I was testing only the initial part of a more complex query. All columns of the first two CTEs are ultimately used, as is the list of distinct SSNs encountered by the first two CTEs. I was trying to audit the row counts to be sure I am not losing rows (an inner join with no matches) or gaining spurious rows (an outer join encountering duplicates in either the left or right source). I wish there were an easier way to spot those anomalies, but it does not look like Client Statistics provides the answer. Just another reason why I should have (but can't, for other considerations) better constraints on my source tables.
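For the spurious-row case, one manual check is to look for duplicates on the join keys in the right-hand source; for the query above that would be something along these lines (a rough, untested sketch):
SELECT [SOURCE], [SSN], EOD, [X], COUNT(*) AS DupCount
FROM [HR_Joined].dbo.[PERS133Titles Current and History]
GROUP BY [SOURCE], [SSN], EOD, [X]
HAVING COUNT(*) > 1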
(Actually, regarding the SELECT *'s: before posting, I simplified my actual queries to remove the lists of selected columns, just to make the sample smaller.)
July 22, 2008 at 7:06 am
This may be drifting off topic; if I get no replies here I might just post the question as a new topic.
I am still stuck trying to find a way to get row counts from my (multiple) CTEs. I'd like to capture @@ROWCOUNT, but I can't find a strategy that does not run afoul of the restrictions on CTE syntax. So far I have unsuccessfully tried establishing local variables (I can't define variables ahead of the CTEs), and declaring the locals and doing a COUNT(*) after the main query (only one main query is allowed after the CTEs).
I guess my next attempt will be to replace the CTEs with #temp tables, so I will be able to capture the row counts right after I populate each one; roughly what I have in mind is sketched below.
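An untested sketch for the first CTE (the others would follow the same pattern):
DECLARE @CPMISRows INT;

SELECT FR.*
INTO #CPMIS
FROM [HR_Joined].dbo.[PERS133Titles Current and History] AS FR
LEFT OUTER JOIN [HR_Joined].dbo.[PERS133Titles Current and History] AS TTO
    ON  FR.[SOURCE] = TTO.[SOURCE]
    AND FR.[SSN]    = TTO.[SSN]
    AND FR.EOD      = TTO.EOD
    AND FR.[X] - 1  = TTO.[X]
WHERE FR.[BEGIN D H] >= FR.EOD;

SET @CPMISRows = @@ROWCOUNT; -- capture the count immediately, before any other statement runs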
Anybody have a better approach?
July 22, 2008 at 11:57 am
You can add a column to the final select with an inline count of the CTE's rows. But if you're doing anything that complex, temp tables are probably a better idea.
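Something along these lines as the final select (sketch only; the scalar subqueries can make the engine evaluate the CTEs again, which is part of why temp tables win once things get complex):
SELECT
    SSNS.SSN,
    (SELECT COUNT(*) FROM CPMIS)   AS CPMISRowCount,
    (SELECT COUNT(*) FROM PERSCHG) AS PERSCHGRowCount,
    COUNT(*) OVER ()               AS SSNSRowCount
FROM SSNS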
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon