slow database performance

  • Hi All,

    Last week, users complained about performance on a specific database, saying almost all of their queries were running slow. They provided one sample query that took more than 10 minutes, and they mentioned that the query would normally finish in a few seconds. I looked at the plan; it was doing a full table scan, as they were using

    SELECT * FROM tablename WHERE colname IN ('val1', 'val2', 'val3', ...);

    It was returning only 71 rows, and the wait type was PAGEIOLATCH_SH. There was no blocking.

    I took the plan and explained it to them: the table was a heap suffering from 99.9% fragmentation with a page count of more than 10K. Once that was addressed, the query completed in around 4 seconds.
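
    The usual fix for a badly fragmented heap like this is one of the following (a sketch with hypothetical object names; the thread doesn't spell out exactly which step was taken):

    ```sql
    -- Hypothetical table/column names. Either option rewrites the heap's pages contiguously.
    ALTER TABLE dbo.tablename REBUILD;            -- rebuild the heap in place (SQL 2008+)

    CREATE CLUSTERED INDEX CIX_tablename_colname  -- or convert the heap into a clustered table
        ON dbo.tablename (colname);
    ```

    The clustered-index option is usually preferred, since a rebuilt heap will simply fragment again.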

    My question is: if all the queries on a database are performing poorly, where do I start my troubleshooting?

    Honestly, so many things came to mind that it was overwhelming. I was checking for blocking, checking Task Manager for resource utilization, wondering whether other databases were taking more memory, whether any SQL Agent jobs such as index maintenance or backups were running, and so on.

    Also, that instance is shared among 50-odd databases. Some of them are 3 TB, 2 TB, or 1 TB, and others are around 900 MB. Physical RAM is 96 GB, of which 90 GB is allocated to max server memory. It's a dedicated SQL Server QA box.

    Given all this information, if someone comes and says that all the queries of a specific database on that instance are running slow, where should I start my troubleshooting? Please advise; I'm looking for inputs here. Are there any specific questions we can ask the application team or the users?

    Thanks,

    Sam

  • PAGEIOLATCH_SH

    Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem.

    Ignore memory; you have it pretty much configured correctly. What you need to do is put clustered indexes on your heaps. That is your first step.

    Then you can look at statistics (I just run sp_updatestats), then open Activity Monitor and expand the Data File I/O section. You want the response time to be very low; anything more than 100 ms says you have a disk that is overworked or damaged.

    Stick a Profiler trace on to catch any queries taking more than "x" seconds, and use the Performance Dashboard in the Reports section of Management Studio.
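
    The heaps in question can be found with a query along these lines (a sketch using the standard catalog views; run it in each database you're investigating):

    ```sql
    -- Sketch: list heaps (tables with no clustered index) and their page counts, largest first.
    SELECT s.name AS schema_name,
           t.name AS table_name,
           SUM(ps.used_page_count) AS used_pages
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    JOIN sys.indexes AS i ON i.object_id = t.object_id
                         AND i.index_id  = 0          -- index_id 0 = heap
    JOIN sys.dm_db_partition_stats AS ps
                          ON ps.object_id = t.object_id
                         AND ps.index_id  = i.index_id
    GROUP BY s.name, t.name
    ORDER BY used_pages DESC;
    ```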


    MVDBA

    Thanks for the inputs. Yes, as you mentioned, I think we are overworking the disks. I can see a lot of "I/O taking longer than 15 seconds" messages in the SQL Server errorlog during that time frame. It's not just the one database; it's happening with other databases as well.
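
    Those warnings can be pulled straight out of the errorlog with xp_readerrorlog (a sketch; the search string matches the standard long-I/O message text):

    ```sql
    -- Sketch: search the current SQL Server errorlog (log 0, type 1 = SQL log)
    -- for the stock long-I/O warning.
    EXEC xp_readerrorlog 0, 1, N'taking longer than 15 seconds';
    ```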

    It looks like all the databases (around 50 of them) are big (some 2 TB, some 1 TB, some 50 GB), yet max server memory is only 90 GB, leaving 6 GB for the OS. Memory doesn't look sufficient here, and the databases are growing every quarter. One more thing: management has dumped all these databases onto one SQL instance to save licensing costs.

    Can anyone give a nice analogy that I can put forward to management to make them understand what is happening? Does anyone have a picture to share that illustrates this kind of memory contention? I know many of you are MVPs and have probably given great examples in community presentations; that's the reason I am asking. People like analogies.

    Thanks,

    Sam


    I'd worry about memory, but not in the way you distributed it.  You have a fair number of fairly large databases.  Adding memory is generally the single best hardware improvement that you can make to a server.  I've also seen some of the code that people are running on this server and, although memory is not a panacea for making poor code run faster, it does help.

    If you only have the Standard Edition of SQL Server, remember that SQL Server can and does use memory beyond what you've allocated as max memory for a bunch of things.  I'd set the machine up with 256 GB of RAM and allocate the max for SQL Server Standard Edition, which is 128 GB.  You might also want to consider using 2 instances on the server (and roughly balance the expected load between the two), which shouldn't cost you any more in licensing, and allocate, say, 120 GB to each instance.

    If you're using the SQL Server Enterprise Edition, max the machine out for memory and make sure that your max memory setting saves some for the operating system.

    Again, though... that's only going to help.  It's not the ultimate fix, and I'm not sure that we can get there.  As you said on your other thread, a lot of this stuff is 3rd party junk with rows that are more than 23K bytes of "Row Overflow" data, no clustered indexes, poor indexing, and some poor code including SELECT * on those very wide tables, a good portion of which suck on scans because of the wide NVARCHAR data, a lot of which doesn't go out of row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Oh... almost forgot... your question was...

    vsamantha35 wrote:

    My question is: if all the queries on a database are performing poorly, where do I start my troubleshooting?

    The answer is, the system knows.  With the understanding that the information will change if the cache clears for one reason or another, you can use some of the built-in reports to tell you what the top 10 queries are for I/O and CPU time.  Keep in mind that these are not necessarily the "longest running" queries (which may not actually be the problem); they are the queries that are likely causing the biggest problems because of how often they're being used.

    In the Object Explorer window of SSMS (press the {f8} key if it's not currently displayed), right-click on the instance name and follow the path of popups below.  The report I have highlighted and the one below it are particularly eye-opening for most.

    You can also use Adam Machanic's sp_WhoIsActive to get even more information on what is causing your system the most pain (not including recompiles, which can be a REAL pain as well) and, thus, what you should start looking at first.  Just remember, all of this is based on what is in cache.  When the cache clears, things will change.
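
    The same top-10 lists those reports draw on can be pulled straight from the plan-cache DMVs; a sketch (cache-dependent, as noted above, so the numbers reset whenever the cache clears):

    ```sql
    -- Sketch: top 10 cached statements by total logical reads (swap the ORDER BY
    -- to total_worker_time for the CPU view).
    SELECT TOP (10)
           qs.total_logical_reads,
           qs.total_worker_time AS total_cpu_microseconds,
           qs.execution_count,
           SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset
                         WHEN -1 THEN DATALENGTH(st.text)
                         ELSE qs.statement_end_offset
                       END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;
    ```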

    You could also look into "query store", which will help with the problem of cache clearing unexpectedly.  I personally don't use it simply because I've not needed to... I have my own ways. 😀


    --Jeff Moden



  • Hi Jeff,

    Thanks a lot for the reply. I have a follow-up question. Kindly advise.

    Suppose a person is hired as a new DBA, doesn't have any clue what this server is, and is expected to handle this particular SQL instance. As an SQL expert/SME/escalation engineer/performance engineer, how do we calculate how much RAM is really required for this instance so that it can satisfy all the database queries? What numbers should be looked at? What should I capture so that I can give a rough number to management?

    Also, this is a UAT box, and once databases are refreshed from prod to UAT, the developers run all sorts of ad hoc queries. In that case, how would I even get to a rough number and tell management or the systems team that we might need this much memory for this SQL Server to perform well? I want to know a systematic approach to tackling this issue. Please advise.

    Secondly, can you give a real-life example/analogy/metaphor for this type of memory contention, so that even a layman can understand and visualize what memory contention is all about? Everyone likes stories and likes to visualize; if people have that picture in mind, everyone will be on the same page and agree on a conclusion rather than playing a blame game. If you can think of any real-life examples, please share; that would be very helpful for convincing management and the other developers.

    Thanks,

    Sam

    Ok... first of all, you didn't say anywhere else that this was a "UAT" box.  I thought it was a prod box because you were talking about licensing costs, etc.  Generally speaking, Dev and Test boxes can use the Developer Edition, which is free as of 2016 SP1, and, generally, they don't need nearly as much in the way of CPU and RAM.

    So, Step 1 is to not worry about making UAT boxes run as fast as production.  There's actually an advantage in having Dev and UAT boxes being a bit lame compared to production... if you can make something run really well in Dev or UAT, then it will usually do at least as well in prod and frequently better.  In other words, concentrate on fixing bad code.

    To wit, my Dev boxes have only a 3rd of the CPUs (and MAXDOP has been limited in a similar fashion) and a 3rd of the RAM that my prod box has.  The Dev box continues to run on spinning rust rather than SSDs.

    The UAT box only has a 3rd of the CPU resources and only a 6th of the RAM.

    And, yeah... I have more than one database on Dev and UAT that goes over the 1 TB mark, and several that approach it.

    As for some magic method of determining how much RAM you should have, the answer, as with all else having to do with SQL Server, is "It Depends".  You can look at things like how much memory is being used at any given time and still be wrong because of maintenance routines, large batch code, etc.  You'll also need a whole lot less as you fix bad code.  There's a whole lot of proof on the internet for that latter point.
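
    If you do want some numbers to watch while forming that rough estimate, the standard Buffer Manager counters are a reasonable start (a sketch; the thresholds that matter are workload-dependent):

    ```sql
    -- Sketch: buffer-pool pressure indicators. A steadily low or falling
    -- Page life expectancy means pages are evicted soon after they're read in.
    SELECT counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name IN ('Page life expectancy', 'Page reads/sec', 'Lazy writes/sec');
    ```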

    And I have to ask, why the hell do you have Developers running a bunch of ad hoc queries on the UAT box?

    As for a "real life" example, you've already provided one, but it'll work against you.  You have some bad code running that pulls 12.4 GB of mostly unused data (71 rows out of 449 thousand) into RAM.  Compared to the 90 GB of RAM you have, the query required more than 1/7th of it, because the whole heap had to live in RAM in order to be scanned.  Adding more RAM isn't going to fix such code.  Unfortunately, that's not a good justification for adding more RAM to a UAT box.  The only reason you want a whole lot of memory in prod is so that, when multiple copies of such code mistakenly make it to prod, they don't bring the server to its knees.
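
    That per-object memory footprint can be observed directly in the buffer pool; a sketch using the standard DMVs (run in the affected database; in-row and row-overflow allocation units map to partitions via hobt_id):

    ```sql
    -- Sketch: how much of the buffer pool each object in the current database occupies.
    SELECT OBJECT_NAME(p.object_id) AS object_name,
           COUNT(*)                 AS cached_pages,
           COUNT(*) * 8 / 1024      AS cached_mb     -- 8 KB per page
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
    JOIN sys.partitions       AS p  ON p.hobt_id = au.container_id
                                   AND au.type IN (1, 3)   -- in-row and row-overflow data
    WHERE bd.database_id = DB_ID()
    GROUP BY p.object_id
    ORDER BY cached_pages DESC;
    ```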

    You can do the following Google search on this question...

    https://www.google.com/search?q=determine+how+much+ram+sql+server+needs

    ... and then realize that it truly does "depend".  It mostly depends on the workload and on whether or not the box is a prod box.

    To summarize, my suggestion of adding extra RAM was based on how much bad code I think you may have and on it being a prod box.  That suggestion also won't make the bad code that you posted run any faster; it'll just help keep such code from bringing a prod server to its knees until you can fix it.  Having low amounts of RAM in Dev and UAT will help you find such bad code before it makes it to prod. 😉

    --Jeff Moden



  • Thanks a lot Jeff.
