Finding Problem Code

  • Eric M Russell (5/1/2015)


    djackson 22568 (5/1/2015)


    Commenting mainly to benefit from what others share...

    We are implementing a major product that solves this issue by having a shadow of live, and all reports are run against that system. Production is configured to actually prevent most reports from running!

    What's a "shadow of live" ?

    In this case InterSystems Caché uses the term shadow. All production transactions are done against "live" or "prod". Prod then "shadows" or "mirrors" all transactions to a second server.

    All reports are written against the shadow server unless there is a real, real good reason to require up-to-the-second data. Generally speaking there is an expected delay of less than 15 seconds on the shadow. That is a small enough window to have close to zero impact on reporting.

    Dave

  • Steve Jones - SSC Editor (5/1/2015)


    david.gugg (5/1/2015)


    We run a daily trace that captures SQL running longer than X seconds. This gives us a good idea of what SQL can be tuned to provide the biggest improvement to our system.

    The trace is started via a SQL Agent job that also aggregates data into tables for monitoring blocks, deadlocks, and long running SQL's historical performance.

    Want to describe this in an article for others?

    We have articles about capturing database operational stats to tables and then using them to perform baseline analysis, etc.

    5 Reasons You Must Start Capturing Baseline Data

    By Erin Stellato

    http://www.sqlservercentral.com/articles/baselines/94657/

    Baseline SQL Server with SQL Sentry Performance Advisor

    http://blogs.sqlservercentral.com/blogs/sqlserver365/2013/06/24/baseline-sql-server-with-sql-sentry-performance-advisor/

    After reading the above, read this rather funny blog post by Brent Ozar describing one experience he had with a DBA who did this the wrong way.

    http://www.brentozar.com/archive/2014/03/the-worst-database-user/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/1/2015)


    Steve Jones - SSC Editor (5/1/2015)


    david.gugg (5/1/2015)


    We run a daily trace that captures SQL running longer than X seconds. This gives us a good idea of what SQL can be tuned to provide the biggest improvement to our system.

    The trace is started via a SQL Agent job that also aggregates data into tables for monitoring blocks, deadlocks, and long running SQL's historical performance.

    Want to describe this in an article for others?

    We have articles about capturing database operational stats to tables and then using them to perform baseline analysis, etc.

    5 Reasons You Must Start Capturing Baseline Data

    By Erin Stellato

    http://www.sqlservercentral.com/articles/baselines/94657/

    Baseline SQL Server with SQL Sentry Performance Advisor

    http://blogs.sqlservercentral.com/blogs/sqlserver365/2013/06/24/baseline-sql-server-with-sql-sentry-performance-advisor/

    After reading the above, read this rather funny blog post by Brent Ozar describing one experience he had with a DBA who did this the wrong way.

    http://www.brentozar.com/archive/2014/03/the-worst-database-user/

    Those posts don't describe setting up the trace and moving that over. They imply it, but don't show it.
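
    For anyone who wants a concrete starting point, here is one minimal way such a capture could look. It polls sys.dm_exec_requests from an Agent job step rather than running a server-side trace, but it records the same kind of information; the table name and threshold below are assumptions for illustration, not the actual setup described above.

    -- Hypothetical capture table; an Agent job scheduled every minute or so
    -- could run the INSERT below as a job step.
    IF OBJECT_ID('dbo.LongRunningSql', 'U') IS NULL
    BEGIN
        CREATE TABLE dbo.LongRunningSql
        (
            capture_time   DATETIME2(0)  NOT NULL DEFAULT SYSDATETIME(),
            session_id     INT           NOT NULL,
            database_name  SYSNAME       NULL,
            duration_sec   INT           NOT NULL,
            wait_type      NVARCHAR(60)  NULL,
            blocking_spid  INT           NULL,
            sql_text       NVARCHAR(MAX) NULL
        );
    END;

    DECLARE @ThresholdSec INT = 30;  -- the "X seconds"; pick a value per system

    -- Snapshot anything that has already been running longer than the threshold.
    INSERT dbo.LongRunningSql (session_id, database_name, duration_sec,
                               wait_type, blocking_spid, sql_text)
    SELECT r.session_id,
           DB_NAME(r.database_id),
           DATEDIFF(SECOND, r.start_time, GETDATE()),
           r.wait_type,
           NULLIF(r.blocking_session_id, 0),
           t.text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.session_id <> @@SPID
      AND DATEDIFF(SECOND, r.start_time, GETDATE()) >= @ThresholdSec;

    The aggregation into history tables for blocking, deadlocks, and long-running statements would then be further job steps reading from tables like this one.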

  • david.gugg (5/1/2015)


    We run a daily trace that captures SQL running longer than X seconds. This gives us a good idea of what SQL can be tuned to provide the biggest improvement to our system.

    The trace is started via a SQL Agent job that also aggregates data into tables for monitoring blocks, deadlocks, and long running SQL's historical performance.

    An awful lot of technical people believe "the longest running queries (or sections of code) are the biggest issue". That simply isn't true overall. I wish I could remember who was the first to teach me this but I don't. Total processing cycles is what you are looking for. A piece of code that uses 20% of your overall processor time per month might only take a second to run.

    Imagine you have a query that runs for 1/10th of a second, but runs a million times a day. Imagine you have a query that runs for 30 minutes once a month. Shaving 10% off of the run time of the first query would produce far greater benefits than shaving half the time of the long query.

    I am not saying to ignore long running queries, but you have to do some analysis to determine if fixing those would actually have a positive impact overall.

    Dave

  • Eric M Russell (5/1/2015)


    djackson 22568 (5/1/2015)


    Commenting mainly to benefit from what others share...

    We are implementing a major product that solves this issue by having a shadow of live, and all reports are run against that system. Production is configured to actually prevent most reports from running!

    What's a "shadow of live" ?

    How does this solution contrast with SQL Server's transaction replication feature, or is it just built on top of transaction replication?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/1/2015)


    Eric M Russell (5/1/2015)


    djackson 22568 (5/1/2015)


    Commenting mainly to benefit from what others share...

    We are implementing a major product that solves this issue by having a shadow of live, and all reports are run against that system. Production is configured to actually prevent most reports from running!

    What's a "shadow of live" ?

    How does this solution contrast with SQL Server's transaction replication feature, or is it just built on top of transaction replication?

    I think I may have answered this above.

    Caché has write daemons that commit transactions to the file system. Like SQL Server, everything is journaled (logged) in a journal file (transaction log). The contents of the logs are copied to the shadow server. The write daemons on the second server commit the transactions from the log. Since I am in the middle of implementation, and halfway through the courses for certification, I am not confident enough to state more than that. Overall, I keep hearing how we need to forget everything we know about SQL Server or Oracle, yet I don't see any significant differences from a logical point of view. Oracle and SQL Server are "different", but overall I see these as implementation details rather than significant architectural issues.

    Dave

  • Eric M Russell (5/1/2015)


    Static code analysis doesn't know how many times the code is executed, how much data it accesses, or how it's impacted when running in parallel with other specific code. However, something like an SSMS plugin or report can be useful for enforcing best coding practices in general.

    Querying the TOP X ranked plans based on total reads, wait time, etc. over a period of time is the best method of identifying problem T-SQL code. That's what I turn to when performance tuning.

    Quick thought: detecting a bomb when it goes off isn't that hard; it's harder to find the ones that are about to go off.

    😎
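
    For concreteness, one way to write the "TOP X ranked plans" query described above is against sys.dm_exec_query_stats; the TOP value and the ORDER BY column here are arbitrary choices, not Eric's exact query.

    -- Rank cached plans by their cumulative cost since compilation; ordering by a
    -- total rather than an average also surfaces the cheap-but-frequent statements.
    SELECT TOP (10)
           qs.execution_count,
           qs.total_worker_time / 1000                       AS total_cpu_ms,
           qs.total_logical_reads                            AS total_reads,
           qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
           SUBSTRING(st.text,
                     (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset
                            WHEN -1 THEN DATALENGTH(st.text)
                            ELSE qs.statement_end_offset
                       END - qs.statement_start_offset) / 2) + 1) AS statement_text,
           qp.query_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle)    AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY qs.total_logical_reads DESC;  -- or total_worker_time, total_elapsed_time, ...

    Because the numbers are totals accumulated since each plan was cached, this is also what catches the tenth-of-a-second query that runs a million times a day, per the point about total processing cycles elsewhere in this thread.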

  • djackson 22568 (5/1/2015)


    david.gugg (5/1/2015)


    We run a daily trace that captures SQL running longer than X seconds. This gives us a good idea of what SQL can be tuned to provide the biggest improvement to our system.

    The trace is started via a SQL Agent job that also aggregates data into tables for monitoring blocks, deadlocks, and long running SQL's historical performance.

    An awful lot of technical people believe "the longest running queries (or sections of code) are the biggest issue". That simply isn't true overall. I wish I could remember who was the first to teach me this but I don't. Total processing cycles is what you are looking for. A piece of code that uses 20% of your overall processor time per month might only take a second to run.

    Imagine you have a query that runs for 1/10th of a second, but runs a million times a day. Imagine you have a query that runs for 30 minutes once a month. Shaving 10% off of the run time of the first query would produce far greater benefits than shaving half the time of the long query.

    I am not saying to ignore long running queries, but you have to do some analysis to determine if fixing those would actually have a positive impact overall.

    You are absolutely right Dave, one has to include all the metrics in the equation. As an example, I recently changed a function that took 250 ms on average and, during peaks, was called 1,000 times/sec. That works out to roughly 250 seconds of work demanded per wall-clock second, which on a 32-core system is about 8x oversubscription, or roughly 900,000 seconds of work for every 3,600 seconds of elapsed time. Tuned it down to 3-4 ms and the server was happy again.

    😎

  • Eirikur Eiriksson (5/1/2015)


    Eric M Russell (5/1/2015)


    Static code analysis doesn't know how many times the code is executed, how much data it accesses, or how it's impacted when running in parallel with other specific code. However, something like an SSMS plugin or report can be useful for enforcing best coding practices in general.

    Querying the TOP X ranked plans based on total reads, wait time, etc. over a period of time is the best method of identifying problem T-SQL code. That's what I turn to when performance tuning.

    Quick thought: detecting a bomb when it goes off isn't that hard; it's harder to find the ones that are about to go off.

    😎

    There are certainly bad coding practices that static code analysis can sniff out. However, even well-written SQL that passes the sniff test can blow up inadvertently when mixed with enough data, or the wrong parameters, or when it runs in parallel with itself or other types of SQL.

    It would be interesting to see if a static analysis tool can actually identify in development the same queries that would end up causing the TOP 10 bottlenecks in production.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/1/2015)


    Eirikur Eiriksson (5/1/2015)


    Eric M Russell (5/1/2015)


    Static code analysis doesn't know how many times the code is executed, how much data it accesses, or how it's impacted when running in parallel with other specific code. However, something like an SSMS plugin or report can be useful for enforcing best coding practices in general.

    Querying the TOP X ranked plans based on total reads, wait time, etc. over a period of time is the best method of identifying problem T-SQL code. That's what I turn to when performance tuning.

    Quick thought: detecting a bomb when it goes off isn't that hard; it's harder to find the ones that are about to go off.

    😎

    There are certainly bad coding practices that static code analysis can sniff out. However, even well-written SQL that passes the sniff test can blow up inadvertently when mixed with enough data, or the wrong parameters, or when it runs in parallel with itself or other types of SQL.

    It would be interesting to see if a static analysis tool can actually identify in development the same queries that would end up causing the TOP 10 bottlenecks in production.

    I have tried out a few tools and found that having some code metrics gives almost the same, or sometimes even a better, indication of where potential problems lie. Off the top of my head, between 50% and 75% of badly performing code can be identified using static analysis.

    😎

  • Eirikur Eiriksson (5/1/2015)


    Eric M Russell (5/1/2015)


    Eirikur Eiriksson (5/1/2015)


    Eric M Russell (5/1/2015)


    Static code analysis doesn't know how many times the code is executed, how much data it accesses, or how it's impacted when running in parallel with other specific code. However, something like an SSMS plugin or report can be useful for enforcing best coding practices in general.

    Querying the TOP X ranked plans based on total reads, wait time, etc. over a period of time is the best method of identifying problem T-SQL code. That's what I turn to when performance tuning.

    Quick thought: detecting a bomb when it goes off isn't that hard; it's harder to find the ones that are about to go off.

    😎

    There are certainly bad coding practices that static code analysis can sniff out. However, even well-written SQL that passes the sniff test can blow up inadvertently when mixed with enough data, or the wrong parameters, or when it runs in parallel with itself or other types of SQL.

    It would be interesting to see if a static analysis tool can actually identify in development the same queries that would end up causing the TOP 10 bottlenecks in production.

    I have tried out a few tools and found that having some code metrics gives almost the same, or sometimes even a better, indication of where potential problems lie. Off the top of my head, between 50% and 75% of badly performing code can be identified using static analysis.

    😎

    Well, 95% of badly performing SQL is identified and eliminated in my head before I even type it out... yet, when I deploy the SQL to production, a query for the TOP 10 worst performing query plans always returns something new. :hehe:

    So, what is this static analysis tool that you use for T-SQL? I'd like to give it a try.

    You're not talking about something like SQL Sentry Plan Explorer, are you? I already look at execution plans.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eirikur Eiriksson (5/1/2015)


    djackson 22568 (5/1/2015)


    david.gugg (5/1/2015)


    We run a daily trace that captures SQL running longer than X seconds. This gives us a good idea of what SQL can be tuned to provide the biggest improvement to our system.

    The trace is started via a SQL Agent job that also aggregates data into tables for monitoring blocks, deadlocks, and long running SQL's historical performance.

    An awful lot of technical people believe "the longest running queries (or sections of code) are the biggest issue". That simply isn't true overall. I wish I could remember who was the first to teach me this but I don't. Total processing cycles is what you are looking for. A piece of code that uses 20% of your overall processor time per month might only take a second to run.

    Imagine you have a query that runs for 1/10th of a second, but runs a million times a day. Imagine you have a query that runs for 30 minutes once a month. Shaving 10% off of the run time of the first query would produce far greater benefits than shaving half the time of the long query.

    I am not saying to ignore long running queries, but you have to do some analysis to determine if fixing those would actually have a positive impact overall.

    You are absolutely right Dave, one has to include all the metrics in the equation. As an example, I recently changed a function that took 250 ms on average and, during peaks, was called 1,000 times/sec. That works out to roughly 250 seconds of work demanded per wall-clock second, which on a 32-core system is about 8x oversubscription, or roughly 900,000 seconds of work for every 3,600 seconds of elapsed time. Tuned it down to 3-4 ms and the server was happy again.

    😎

    Excellent. I have worked with a lot of people who would have ignored that fix.

    The best one in my recollection was some VB 6 code that was reading a file, doing statistical analysis and projections, then writing to a file.

    Oops, the person who wrote it put the reads of the source, and the file opens, writes, and closes of the target, inside the loop! A process that ran for 8 hours took less than 15 minutes when I was done. I really can't take credit for this. Any competent developer would have found this 2 minutes after starting to debug it.

    I recall a discussion here regarding whether college is useful. This is an area where I think a good education is useful. Understanding why things run faster on the processor, slower in cache, slower still in memory, and almost at a dead stop on disk, helps you to more quickly identify low hanging fruit.

    Dave

  • djackson 22568 (5/1/2015)


    ...

    I recall a discussion here regarding whether college is useful. This is an area where I think a good education is useful. Understanding why things run faster on the processor, slower in cache, slower still in memory, and almost at a dead stop on disk, helps you to more quickly identify low hanging fruit.

    They teach that in college now? Taking a step back even further from system engineering, the most important thing I took away from college, what has benefited me the most, was Critical Thinking. Any subject-oriented knowledge can be learned from a good book.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/1/2015)


    Eirikur Eiriksson (5/1/2015)


    Eric M Russell (5/1/2015)


    Eirikur Eiriksson (5/1/2015)


    Eric M Russell (5/1/2015)


    Static code analysis doesn't know how many times the code is executed, how much data it accesses, or how it's impacted when running in parallel with other specific code. However, something like an SSMS plugin or report can be useful for enforcing best coding practices in general.

    Querying the TOP X ranked plans based on total reads, wait time, etc. over a period of time is the best method of identifying problem T-SQL code. That's what I turn to when performance tuning.

    Quick thought: detecting a bomb when it goes off isn't that hard; it's harder to find the ones that are about to go off.

    😎

    There are certainly bad coding practices that static code analysis can sniff out. However, even well-written SQL that passes the sniff test can blow up inadvertently when mixed with enough data, or the wrong parameters, or when it runs in parallel with itself or other types of SQL.

    It would be interesting to see if a static analysis tool can actually identify in development the same queries that would end up causing the TOP 10 bottlenecks in production.

    I have tried out a few tools and found that having some code metrics gives almost the same, or sometimes even a better, indication of where potential problems lie. Off the top of my head, between 50% and 75% of badly performing code can be identified using static analysis.

    😎

    Well, 95% of badly performing SQL is identified and eliminated in my head before I even type it out... yet, when I deploy the SQL to production, a query for the TOP 10 worst performing query plans always returns something new. :hehe:

    So, what is this static analysis tool that you use for T-SQL? I'd like to give it a try.

    You're not talking about something like SQL Sentry Plan Explorer, are you? I already look at execution plans.

    I think you're quite spot on with the 95% of badly performing code; my point is about identifying that code more quickly than by reading through hundreds of thousands of lines of code.

    After working with quite a few tools like Rational, SonarQube, etc., I started to use a simpler method of parsing sys.all_sql_modules.definition: quicker, more accurate, perfectly customizable, and entirely free ;-) Happy to share the code if you like.

    😎
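
    Eirikur's script isn't posted in this thread, so the following is only a guess at the general shape of such a sys.all_sql_modules.definition scan; the pattern list is purely illustrative.

    -- Hypothetical "poor man's static analysis": flag modules whose definition
    -- matches a list of known smells.
    WITH smells (pattern, smell) AS
    (
        SELECT pattern, smell
        FROM (VALUES
                 (N'%(NOLOCK)%',       N'NOLOCK hint'),
                 (N'%DECLARE%CURSOR%', N'cursor usage'),
                 (N'%SELECT *%',       N'SELECT *'),
                 (N'%@@IDENTITY%',     N'@@IDENTITY instead of SCOPE_IDENTITY()')
             ) AS v (pattern, smell)
    )
    SELECT OBJECT_SCHEMA_NAME(m.object_id) AS module_schema,
           OBJECT_NAME(m.object_id)        AS module_name,
           s.smell
    FROM sys.all_sql_modules AS m
    JOIN sys.all_objects AS o
      ON o.object_id = m.object_id
     AND o.is_ms_shipped = 0               -- skip shipped system modules
    JOIN smells AS s
      ON m.definition LIKE s.pattern
    ORDER BY module_schema, module_name, s.smell;

    The appeal of this approach is that the rules live in a single VALUES list, which is what makes it, as Eirikur says, perfectly customizable.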

  • From my DBA role, I have also run into problems finding problem "code" on and off.

    The tools I use most are on the SQL Server side, so I can find which T-SQL statements are the most fruitful to tune (e.g. those that use the most CPU, do the most IO, or run the most often), even from just the SQL Server DMVs over the procedure cache...

    However, developers usually talk in their own "dialect", of which T-SQL may not always be a part, so failure to communicate is usually the roadblock in such an exercise.

    Entity Framework is one case (as an earlier post in this topic already discussed), while another arises from an ERP (we use SAP).

    The problem is: developers may not know exactly which statement / "code" they wrote actually caused the poor T-SQL the DBA identified.

    In fact, the T-SQL statement is just the "product" of the "data tier middleware" (e.g. EF) the developers use, which translates their code into the DB engine's dialect.

    In the SAP case (I was a newbie to SAP three years ago), it was not until I understood its tools (the "DBA Cockpit", invoked by the SAP transaction "ST04") that I could effectively discuss with developers what they should watch out for.

    I guess that, with the proliferation of all kinds of data tier middleware in the developers' world, any such tool vendor must also supply its own "translation tools", so that the generated T-SQL can be tracked back to the original statement. (I see SAP's tools use some sort of machine-generated comment that indicates the module / line of code from which the corresponding T-SQL statement was generated.)

    I would much appreciate it if anyone could teach me any such method for "Entity Framework" or any other kind of "data tier middleware".

    (Our development team comes with all sorts of tools; they just don't need to tell the DBA what is being used until they really need to seek help.)
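
    Not a full answer to the Entity Framework question, but a server-side starting point is to tie an executing statement back to the application and host that issued it; mapping it further back to the originating application code has to come from the middleware itself, for example by stamping a comment into the generated SQL the way the SAP tooling described above does. A sketch of the server-side part, using only standard DMV columns:

    -- Which application / host / login issued the statements currently executing.
    -- program_name is whatever Application Name the connection string sets
    -- (ADO.NET clients report ".Net SqlClient Data Provider" unless one is set).
    SELECT s.session_id,
           s.program_name,
           s.host_name,
           s.login_name,
           DB_NAME(r.database_id) AS database_name,
           r.cpu_time,
           r.logical_reads,
           t.text                 AS current_sql
    FROM sys.dm_exec_sessions AS s
    JOIN sys.dm_exec_requests AS r
      ON r.session_id = s.session_id
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE s.is_user_process = 1;

    Giving each service or module its own Application Name in the connection string is a cheap way to make that column genuinely useful for this kind of back-tracking.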

