Mastering SQL Server Profiler - Part 5: Data Columns

  • Comments posted to this topic are about the item Mastering SQL Server Profiler - Part 5: Data Columns

    Brad M. McGehee
    DBA

Great series, Brad. I've used Profiler a fair bit for a while, but I'll certainly be utilising it a lot more after watching your series and applying what I've learnt.

    One thing I find frustrating is trying to "profile" events where the application uses a dedicated SQL login and the users are in a terminal server environment.

    This (as far as I can see) leaves the DBA almost blind as to who and where.

    The login name is shared among all users and the host is always the terminal server name.

    Do you have any ideas on how to dig deeper into the data columns to get past this abstraction?

I have an odd situation in which the total duration for a period of time exceeds the length of that period. For example, I run Profiler for 3 hours, but the sum of the Duration column for that period is 8 hours. I would expect it to be less than 3 hours. How is that possible? Does it have to do with parallel processing (MAXDOP), or do I have a bad Profiler capture?

That is certainly curious. I have seen this happen for the CPU time, but have never noticed it for the Duration time. Have you done a second trace to preclude any potential problems?

    Brad M. McGehee
    DBA

  • We ran a long test overnight, so I'll check it when I get in and report back. I didn't do the original Profile capture, so it might be a bad capture. It was on a server where the tester had automated scripts to change the time, but I don't know if that happened during the test. I'll go through the original Profile (68 million rows!) and look for a time change. That's the only thing that would make sense. It's the only Profile capture I've run across where this has happened.

  • I was able to reproduce the issue. The total duration below is correct, but the sum of the Duration column is way off.

    Total Duration in Minutes: 273 (4.5 hours)

    Sum(Duration) in Minutes: 3476 (57.0 hours)

    Check my query to make sure I'm not doing something stupid (it happens from time to time).

select ((datediff(MS, min(starttime), max(endtime))/1000)/60) as "Total Duration in Minutes",
       (sum(Duration)/1000000)/60 as "SQL Duration in Minutes"
from temp_trc

    I'm at a loss, and haven't been able to find anything explaining the discrepancy.

  • Just throwing ideas in there...

    are you using multiple threads in the application?

are you using parallelism?

Your query basically uses this equation: endtime of the last query minus starttime of the first query.

I think a better equation would be sum(endtime - starttime). That eliminates any errant query before or after the real work skewing the results.

That being said, it still doesn't explain the roughly 15-fold difference between the numbers; if anything, it widens the gap.

One thing that could help explain why this is happening: if the server is running at 100% for the whole time of the process, and assuming the server uses all 8 available processors and each query uses only 1 processor, you have just multiplied your available processing time by 8... but not 15, obviously. This means that if you have 8 queries each running for exactly 1 second, they could all be processed in 1 second of REAL TIME while adding up to 8 seconds total.

One thing I first noticed back in SQL 2000 (not sure if it still applies in 2005/08) is that I often had queries that ran in very little time (0 ms), but every 4 or 5 runs the server showed 13 ms. I'm assuming there's a rounding error in there, or just a weird way of calculating things.

Even if a 3 ms error is small for each individual query, compounded over 68 million queries it makes for almost 57 hours of time.

Can you check the average run time of each query? If I assume 68 million queries over 5 hours, that's only about 265 microseconds on AVERAGE. I think this could explain a great deal in your situation.

Also, may I suggest that you try a real ETL process here if possible: do a single (even if parallel) load into a single [set of] table(s), and then do your thing with that data. You should be able to bring that processing time down to a couple of minutes, or at worst 1 hour.

  • Ninja's_RGR'us (3/27/2010)


    Just throwing ideas in there...

Not having direct access to the data, I still find it difficult to offer a good explanation, but Ninja's_RGR'us's suggestion that the Duration timing is inaccurate may certainly account for much of the discrepancy. If I were faced with this issue, I would probably not worry about it, as the goal is to maximize overall performance, not to chase some arbitrary statistics. If you are happy with the current performance, then the stats you see aren't important. If you are not happy with the current performance, then I would start experimenting with alternatives to see what I could do to boost performance and reduce the overall time it takes to accomplish the task. While tools such as Profiler can be handy, they can sometimes also lead us astray and distract us from our ultimate goal.

    Brad M. McGehee
    DBA

Thank you for your reply. This is driving me nuts, and I'm sure I'm not the only one out there who has run across this issue.

    Ninja's_RGR'us (3/27/2010)


    are you using multiple threads in the application?

Yes. How would that affect the results?


are you using parallelism?

    Yes. I have not changed any of the default server settings, and the application is not passing query hints to set MAXDOP.


Your query basically uses this equation: endtime of the last query minus starttime of the first query.

I think a better equation would be sum(endtime - starttime). That eliminates any errant query before or after the real work skewing the results.

I'm not sure what you mean here. To get the total duration in milliseconds, I take the difference between when it started and when it ended, using the datediff function (DATEDIFF ( datepart , startdate , enddate )).


One thing that could help explain why this is happening: if the server is running at 100% for the whole time of the process, and assuming the server uses all 8 available processors and each query uses only 1 processor, you have just multiplied your available processing time by 8... but not 15, obviously. This means that if you have 8 queries each running for exactly 1 second, they could all be processed in 1 second of REAL TIME while adding up to 8 seconds total.

This would make sense. I'll monitor CPU usage next time and see. Would parallelism also have the same effect? On smaller, less intensive tests, the duration time and SQL time seem to fall into place (SQL time being less than total duration time).


One thing I first noticed back in SQL 2000 (not sure if it still applies in 2005/08) is that I often had queries that ran in very little time (0 ms), but every 4 or 5 runs the server showed 13 ms. I'm assuming there's a rounding error in there, or just a weird way of calculating things.

Even if a 3 ms error is small for each individual query, compounded over 68 million queries it makes for almost 57 hours of time.

    On SQL 2005/2008, the duration column is captured in microseconds instead of milliseconds when saved to a table (which I'm doing). So what used to show as 0ms might show as 1000 microseconds, allowing you to get a more accurate time.
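The unit difference can be illustrated with a small Python sketch (the Duration values here are made up for illustration):

```python
# In SQL Server 2000, Profiler reported Duration in milliseconds. In 2005/2008,
# Duration is stored in microseconds when the trace is saved to a table or file.
duration_us = 52003               # hypothetical Duration value from a 2005+ trace table
duration_ms = duration_us / 1000  # the same event expressed in milliseconds

print(duration_ms)  # 52.003

# A query that 2000 would have rounded down to 0 ms is still distinguishable:
fast_query_us = 400
print(fast_query_us / 1000)  # 0.4 ms rather than a flat 0
```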


Can you check the average run time of each query? If I assume 68 million queries over 5 hours, that's only about 265 microseconds on AVERAGE. I think this could explain a great deal in your situation.

    I'm still combing through the latest test, so I'll do further analysis later today.

min(starttime), max(endtime)...

    I don't have a copy of the trace but if you have open connections for a few minutes before starting the "real" work or a few minutes after the end or anything else unexpected, you'll get a longer duration than what actually happened during the real test.

That's why I prefer to use start/stop times in a script, or just a stopwatch. My idea was simply to do a sum of endtime - starttime over all 68 million queries; this figure "should" match sum(Duration) / 1000.
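That consistency check can be sketched on made-up rows (a hypothetical Python example; the trace values are invented):

```python
from datetime import datetime, timedelta

# Hypothetical trace rows: (starttime, endtime, Duration in microseconds)
rows = [
    (datetime(2009, 12, 1, 10, 0, 0, 0), datetime(2009, 12, 1, 10, 0, 0, 52000), 52000),
    (datetime(2009, 12, 1, 10, 0, 1, 0), datetime(2009, 12, 1, 10, 0, 1, 10000), 10000),
]

# sum(endtime - starttime), expressed in milliseconds
sum_diff_ms = sum((e - s) / timedelta(milliseconds=1) for s, e, _ in rows)

# sum(Duration) / 1000 converts microseconds to milliseconds
sum_duration_ms = sum(d for _, _, d in rows) / 1000

print(sum_diff_ms, sum_duration_ms)  # 62.0 62.0 - the two figures should agree
```

On clean data the two figures agree; a large gap between them would point at bad rows in the capture rather than at the arithmetic.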

If parallelism is off... or not used for some reason, each additional processor multiplies the available minutes of processing power.

As for my rounding theory, I have no idea whether it affects traces in 2005 and 2008, but from my experience with 2000 I have a feeling it may.

I think that if you add up all the possible causes, you have a probable explanation: 8 cores meaning 8 seconds of potential processing per "real time" second, plus even a 0.1 ms rounding error in the data, could account for 100% of the discrepancy you're experiencing.

As annoying as those figures are, I agree with Brad's advice that you should just optimize the most resource- and time-hungry queries and let go of this issue, if only for your mental health! :w00t:

  • I've come to the realization that I won't be able to rely on the sum of the duration column for reporting the accumulated time it takes SQL Server to execute the queries. It seems to only be reliable in reporting the duration of individual transactions. Maybe that's by design, and that's fine, but I wish Microsoft would expand on that in Books Online.

    Thanks again for your replies.

  • oralinque (3/29/2010)


    I've come to the realization that I won't be able to rely on the sum of the duration column for reporting the accumulated time it takes SQL Server to execute the queries. It seems to only be reliable in reporting the duration of individual transactions. Maybe that's by design, and that's fine, but I wish Microsoft would expand on that in Books Online.

    Thanks again for your replies.

    I'm not sure I understand your conclusion. If you assume you can trust the duration of 1 transaction, then you should also be able to trust the 67 999 999 other transactions.

    I think the real problem is that we don't have all the figures here to make a real conclusion.

  • You probably have lots of overlapping queries. If your application generates 20 simultaneous queries that all start at 00:00:00 and then all finish at 00:00:02 the elapsed time would be 2 seconds, but the sum of all durations would be 40 seconds.
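That overlap arithmetic can be sketched quickly (a hypothetical Python example, not tied to any real trace):

```python
from datetime import datetime, timedelta

# 20 hypothetical queries that all run concurrently from 00:00:00 to 00:00:02
start = datetime(2010, 1, 1, 0, 0, 0)
end = start + timedelta(seconds=2)
queries = [(start, end)] * 20

# Elapsed (wall-clock) time: latest endtime minus earliest starttime
elapsed = max(e for _, e in queries) - min(s for s, _ in queries)

# Sum of the individual durations
total = sum((e - s for s, e in queries), timedelta())

print(elapsed.total_seconds())  # 2.0
print(total.total_seconds())    # 40.0
```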

    It is also possible that you have some queries that take a very long time to complete - possibly because of locking.

Note that the number of overlapping queries has nothing to do with the number of CPUs in the machine. Even with a single-CPU server it is possible for the clients to generate lots of overlapping queries - especially if there are multiple clients involved.

    So, in short: It is perfectly normal for the sum of durations to be longer than the elapsed time.

    /SG

  • Ninja's_RGR'us (3/29/2010)


    I'm not sure I understand your conclusion. If you assume you can trust the duration of 1 transaction, then you should also be able to trust the 67 999 999 other transactions.

    That's my whole point. In smaller tests, this is true. In a 45 minute test, the elapsed time is greater than the sum of the duration. However, on longer tests (4.5 hours) the elapsed time is correct, but the sum of the duration far exceeds the elapsed time.

As Stefan_G suggests, if the problem is that transaction 1 starts at 00:03:00.000 and ends at 00:03:01.000, and transaction 2 starts and ends at the exact same times, the total duration is 1 second, but the sum of the Duration column would be 2 seconds. If it were this simple, I could figure out a way to determine the true duration time. However, it's unlikely that they would start and end at exactly the same time.

  • Stefan_G (3/30/2010)


    You probably have lots of overlapping queries. If your application generates 20 simultaneous queries that all start at 00:00:00 and then all finish at 00:00:02 the elapsed time would be 2 seconds, but the sum of all durations would be 40 seconds.

    That's it. Here is a small sample of three overlapping queries.

Event               ClientProcessID  SPID  StartTime               EndTime                 Duration  TextData
SQL:BatchCompleted  5740             74    12/1/2009 10:13:01.660  12/1/2009 10:13:01.713  52003     Query1
SQL:BatchCompleted  5596             68    12/1/2009 10:13:01.660  12/1/2009 10:13:01.713  52003     Query2
SQL:BatchCompleted  812              73    12/1/2009 10:13:01.660  12/1/2009 10:13:01.713  53003     Query3

    Thanks, Stefan_G.

    So in the end, it looks like you should not rely on the sum of the duration column if you want to compare the total SQL duration time with the total elapsed time.
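Using the three sample rows above, the effect can be reproduced numerically (a Python sketch; the Duration column is in microseconds):

```python
from datetime import datetime, timedelta

# The three overlapping batches from the trace sample above
start = datetime(2009, 12, 1, 10, 13, 1, 660000)
end = datetime(2009, 12, 1, 10, 13, 1, 713000)
durations_us = [52003, 52003, 53003]  # Duration column (microseconds)

elapsed_ms = (end - start) / timedelta(milliseconds=1)  # wall-clock window
sum_ms = sum(durations_us) / 1000                       # summed per-query time

print(elapsed_ms)  # 53.0
print(sum_ms)      # 157.009 - about three times the wall-clock window
```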
