Performance improves after reboot

  • I am an ETL Developer who was asked to
    look at a SQL Server performance problem.
    We have two DBAs, but they're too busy just keeping the hardware up and running on the other 60-70 servers.
    They haven't had a chance to look at performance and optimization.

    Here is what happens.
    SQL Server performance gradually deteriorates over a two-week period.
    We reboot the server. Performance considerably improves.
    We notice a big drop in Read/Write latency, for example.
    Then the same cycle repeats:
    performance is good for a week, then starts to slow down,
    and Read/Write latency increases again. We reboot, and performance and latency improve.
    What does that tell us?

  • Could be quite a few things. My best guess is that you don't have a max memory set for the server and you are getting starved for RAM.
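
    If you want to check that quickly, here is a minimal sketch (the value below is just an example, not a recommendation):

      -- Show the current max server memory setting (in MB; 2147483647 = the unlimited default)
      EXEC sp_configure 'show advanced options', 1;
      RECONFIGURE;
      EXEC sp_configure 'max server memory (MB)';

      -- Example only: cap SQL Server so the OS and other processes keep some headroom
      -- EXEC sp_configure 'max server memory (MB)', 112640;
      -- RECONFIGURE;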

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • We set Max Server Memory to
    112640 MB.
    SQL Server is installed on a virtual machine
    with 128 GB of RAM.

    Min Memory = 0
    Max Degree of Parallelism = 0
    Cost Threshold for Parallelism = 5
    Blocked Process Threshold = 30

    It's an OLTP database. 3 TB.
    Main purpose = Web Application.
    Plus a daily ETL that runs at 4:45 AM for 2 hours.
    The database is a member of an Availability Group.
    It is the primary node.
    Asynchronous commit is used.
    Replication is also configured.

  • I read this Brent Ozar article.
    It's a tutorial on how to collect performance counter data and try to analyze it.
    It can take months of experimenting to understand this material fully.

    We have the SOLARWINDS tool.
    It's a pretty good SQL Server performance monitoring tool
    that has all the statistics we need.
    The question is how do you read it, understand it, and what conclusions should these stats lead to.

    My question is a high-level one.
    What changes on SQL Server after a reboot that can improve performance?
    To be more specific, why does Read/Write latency drop after a reboot and then increase again?
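
    For reference, one way to read those latencies straight from SQL Server, as a sketch. Note that sys.dm_io_virtual_file_stats accumulates its counters from instance startup, which is one reason latency figures derived from it look better right after a reboot (a monitoring tool that samples deltas won't be fooled by this, but it's worth knowing):

      -- Average I/O latency per database file since instance startup
      SELECT DB_NAME(vfs.database_id) AS database_name,
             mf.name AS file_name,
             vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
             vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
      FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      JOIN sys.master_files AS mf
        ON mf.database_id = vfs.database_id
       AND mf.file_id = vfs.file_id
      ORDER BY avg_read_ms DESC;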

  • Check your SAN drivers.

  • As a start, run a report on historical page life expectancy.
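
    For example, a sketch of reading the current value (SolarWinds presumably already keeps the history for you):

      -- Current page life expectancy, in seconds
      SELECT object_name, counter_name, cntr_value AS ple_seconds
      FROM sys.dm_os_performance_counters
      WHERE counter_name = 'Page life expectancy'
        AND object_name LIKE '%Buffer Manager%';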

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • RVO - Wednesday, February 8, 2017 10:38 PM

    I read this Brent Ozar article.
    It's a tutorial on how to collect performance counter data and try to analyze it.
    It can take months of experimenting to understand this material fully.
    Plus, Mr. Ozar has a very tricky style.
    He is definitely a very knowledgeable person,
    but he sometimes leaves an investigator more puzzled than before reading his articles.

    Example:
    B.Ozar: "Look at the % Usage metric, which monitors how much of the page file Windows is using. "
    ------- How do I do that? Provide a sample.
    B.Ozar: "Generally speaking, you don’t want to see a SQL Server swapping memory out to disk"
    ------- How do I see it? Provide a sample

    In other words, his approach assumes that the person reading his text has the same or a very similar level of expertise
    and knows what he means.
    The truth is that very often we have no clue what he is talking about.
    We need more samples and detailed explanations.

    Plus, we have the SOLARWINDS tool.
    It's a pretty good SQL Server performance monitoring tool
    that has all the statistics we need.
    The question is how do you read it, understand it, and what conclusions should these stats lead to.

    My question is a high-level one.
    What changes on SQL Server after a reboot that can improve performance?
    To be more specific, why does Read/Write latency drop after a reboot and then increase again?

    Now there are even more things that could be causing problems: virtualization. Your settings are mostly defaults, too. Let's add in that you are trying to run a web-facing app with a 3 TB database on a virtualized server with just 128 GB of RAM. And oh, BTW, you also have replication AND an "HA Group", which could mean Always On. Even more things that can gum up the works. These and other things lead me to the following:

    I'm going to be blunt here: you just don't have the experience or knowledge to address what is going on. You actually stated as much with "how do you read it, understand it, and what conclusions should these stats lead to". But there are just SOOO many things to check here that we could literally spend weeks going back and forth and still not get to the root cause(s) of the problem(s). I have seen this over and over on this and other forums.

    Thus I STRONGLY recommend you hire a performance tuning professional who can spend a few hours, or perhaps a few days, reviewing your system and helping you improve the myriad things that are suboptimal or just plain bad, while also figuring out what is causing the problem you posted about here.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Here are some basic things I would check.  TempDB files - both data and logs.  When you reboot a server these get reset.  Do you have enough space for these files?  How many tempdb data files are you using?  If you are using only 1 tempdb data file, you may need to increase that number.  You can also try running DBCC FREEPROCCACHE, DBCC DROPCLEANBUFFERS, DBCC FREESYSTEMCACHE ('ALL') and DBCC FREESESSIONCACHE.
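
    For reference, those commands look like this - but be careful: they flush caches instance-wide, and everything runs slower while the caches rebuild, so treat this as a diagnostic sketch, not routine maintenance:

      DBCC FREEPROCCACHE;            -- empty the plan cache
      DBCC DROPCLEANBUFFERS;         -- drop clean pages from the buffer pool
      DBCC FREESYSTEMCACHE ('ALL');  -- release unused entries from all system caches
      DBCC FREESESSIONCACHE;         -- flush the distributed-query connection cache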

    Hope these suggestions help.  Bottom line, don't give up.  You can and will figure out what is causing your issue, and you'll become a better DBA for what you learn.

    Jon

  • RVO - Wednesday, February 8, 2017 10:38 PM

    Example:
    B.Ozar: "Look at the % Usage metric, which monitors how much of the page file Windows is using. "
    ------- How do I do that? Provide a sample.
    B.Ozar: "Generally speaking, you don’t want to see a SQL Server swapping memory out to disk"
    ------- How do I see it? Provide a sample

    The truth is that very often we have no clue what he is talking about.
    We need more samples and detailed explanations.

    To be honest, that's probably the worst thing you could say on an open forum like this; for starters, you'll have all the flamers attempting to shoot you down.
    If you don't understand the items you were provided, take them offline to your Windows admin and\or storage admin and engage their help in understanding what's going on.
    Frequently, Brent's posts are designed specifically to give you the information you need to approach your Windows\storage admins in an attempt to get help from them with issues you may be seeing outside of SQL Server (i.e. at the environment level).

    Of course, if you're saying that none of you understand it, then I'm afraid you really are in big trouble.

    RVO - Wednesday, February 8, 2017 10:38 PM


    My question is a high-level one.
    What changes on SQL Server after a reboot that can improve performance?
    To be more specific, why does Read/Write latency drop after a reboot and then increase again?

    When you reboot, the plan cache is cleared, so any bad plans will be removed.
    Disk usage will increase as the buffer pool recharges.
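
    A sketch of how to watch that recharge happen, if you're curious (note this DMV scan can itself be slow against a large buffer pool):

      -- Buffer pool memory per database; watch this climb back up after a restart
      SELECT DB_NAME(database_id) AS database_name,
             COUNT(*) * 8 / 1024 AS buffer_mb
      FROM sys.dm_os_buffer_descriptors
      GROUP BY database_id
      ORDER BY buffer_mb DESC;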

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry,
    I removed my controversial comments about Brent.
    I have to be honest:
    I learned so much from Brent.
    Even if I sometimes wish he could be a bit more specific, overall he's a giant,
    and I have huge respect for him.

  • jonathan.ellison - Friday, February 10, 2017 5:21 AM

    Here are some basic things I would check.  TempDB files - both data and logs.  When you reboot a server these get reset.  Do you have enough space for these files?  How many tempdb data files are you using?  If you are using only 1 tempdb data file, you may need to increase that number.  You can also try running DBCC FREEPROCCACHE, DBCC DROPCLEANBUFFERS, DBCC FREESYSTEMCACHE ('ALL') and DBCC FREESESSIONCACHE.

    Hope these suggestions help.  Bottom line, don't give up.  You can and will figure out what is causing your issue, and you'll become a better DBA for what you learn.

    Jon

    Jon,
    I am not a DBA. I am an ETL Developer (consultant) who was asked to help.
    The DBAs have not been able to fix the performance issue so far.
    In my opinion, they never really tried.

    I don't understand why it's called "Initial Size (MB)".
    It should be "Current File Size". Period.
    In SQL Server 2014 it isn't reset to anything other than the size it was before the reboot. I did a test on my local SQL Server:
    I manually grew tempdb.mdf from 8 MB to 50 MB. Rebooted, and it's still 50 MB.

    When our SQL Server was first installed in April 2016, our DBA created 8 tempdb files (we have 16 processors).
    Each was 2 GB, with Autogrowth = 200 MB.
    Today the files are:

    tempdb.mdf ---------------- 30.2GB
    tempdb_db2.mdf -------- 2GB
    tempdb_db3.mdf -------- 32.6GB
    tempdb_db4.mdf -------- 32.3GB
    tempdb_db5.mdf -------- 29.8GB
    tempdb_db6.mdf -------- 32.6GB
    tempdb_db7.mdf -------- 32.7GB
    tempdb_db8.mdf -------- 32.7GB

    During the day we have from 600 to 2,000 objects in tempdb.
    Can I try to shrink tempdb and then watch when, and by how much, the file sizes change?
    Although that's probably not a good idea. If it needs to grow, everything will be much slower. Huge performance impact.

    We had a major slowdown on January 5/6.
    Instead of 1.5 hours, the ETL took 9 and 7 hours.
    I wonder: what if tempdb autogrowth by 200 MB kicked in on those days?
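
    One way to check is the default trace, which records autogrow events - a sketch (note it only keeps a small rolling window of files, so January may already have aged out, but it will catch the next one):

      -- Autogrow events recorded in the default trace
      DECLARE @path nvarchar(260);
      SELECT @path = path FROM sys.traces WHERE is_default = 1;

      SELECT DatabaseName, FileName, StartTime,
             Duration / 1000 AS duration_ms,      -- Duration is in microseconds
             IntegerData * 8 / 1024 AS growth_mb  -- IntegerData is the number of pages grown
      FROM sys.fn_trace_gettable(@path, DEFAULT)
      WHERE EventClass IN (92, 93)                -- 92 = data file autogrow, 93 = log file autogrow
      ORDER BY StartTime DESC;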

  • RVO - Friday, February 10, 2017 1:13 PM

    jonathan.ellison - Friday, February 10, 2017 5:21 AM

    Here are some basic things I would check.  TempDB files - both data and logs.  When you reboot a server these get reset.  Do you have enough space for these files?  How many tempdb data files are you using?  If you are using only 1 tempdb data file, you may need to increase that number.  You can also try running DBCC FREEPROCCACHE, DBCC DROPCLEANBUFFERS, DBCC FREESYSTEMCACHE ('ALL') and DBCC FREESESSIONCACHE.

    Hope these suggestions help.  Bottom line, don't give up.  You can and will figure out what is causing your issue, and you'll become a better DBA for what you learn.

    Jon

    Jon,
    I am not a DBA. I am an ETL Developer (consultant) who was asked to help.
    The DBAs have not been able to fix the performance issue so far.
    In my opinion, they never really tried.

    I don't understand why it's called "Initial Size (MB)".
    It should be "Current File Size". Period.
    In SQL Server 2014 it isn't reset to anything other than the size it was before the reboot. I did a test on my local SQL Server:
    I manually grew tempdb.mdf from 8 MB to 50 MB. Rebooted, and it's still 50 MB.

    When our SQL Server was first installed in April 2016, our DBA created 8 tempdb files (we have 16 processors).
    Each was 2 GB, with Autogrowth = 200 MB.
    Today the files are:

    tempdb.mdf ---------------- 30.2GB
    tempdb_db2.mdf -------- 2GB
    tempdb_db3.mdf -------- 32.6GB
    tempdb_db4.mdf -------- 32.3GB
    tempdb_db5.mdf -------- 29.8GB
    tempdb_db6.mdf -------- 32.6GB
    tempdb_db7.mdf -------- 32.7GB
    tempdb_db8.mdf -------- 32.7GB

    During the day we have from 600 to 2,000 objects in tempdb.
    Can I try to shrink tempdb and then watch when, and by how much, the file sizes change?
    Although that's probably not a good idea. If it needs to grow, everything will be much slower. Huge performance impact.

    We had a major slowdown on January 5/6.
    Instead of 1.5 hours, the ETL took 9 and 7 hours.
    I wonder: what if tempdb autogrowth by 200 MB kicked in on those days?

    If you manually grow tempdb, it will stay at that size post reboot.  If it auto-grows to a size, it will shrink back to the initial file size at reboot.

    As for your ETL running long, there are a lot of things that could be.  I am assuming your ETL is an SSIS package (or packages).  If so, and you are using an Integration Services Catalog, SSIS will run in a separate memory space.  So if you have (for example) 128 GB of RAM on your server, and 124 GB is in use, your SSIS packages will only get 4 GB to work with.  That isn't a lot, and you will likely end up paging to disk.
    We ran into that issue recently.  Something that normally took 1-1.5 hours to complete was taking 9+, and it kept getting stuck on one particular step that normally took 1 minute but was taking 7+ hours to complete.  We reduced the memory used by the SQL instance that hosts our SSIS catalog, and performance improved greatly.

    If you are concerned about tempdb autogrow, you could go in, set the values manually (so they have new minimums at reboot) and change the autogrow to a more reasonable value for a database of that size.  Although that is a HUGE tempdb... how much data are you moving around in there?  I'd be concerned that some temporary objects are not being cleaned up properly or that transactions are being left open.
    Well, unless those sizes seem normal to you.
    On our data warehouse, our tempdb is under 10 GB and our SSIS catalog instance's is at 51 MB.  Our financial software has a big one at 103 GB total size, but that is less than half the size of your 224.9 GB tempdb.

    EDIT - also, your autogrow doesn't look like it is behaving very efficiently.  I would try bumping all of the tempdb files up to a starting size of 30 GB, as it looks like that is what you are using.  Having one file still at 2 GB suggests the autogrow is behaving strangely.
    With respect to the January 5/6 thing, did you have a reboot around then?  If so, having tempdb autogrow by 200 MB from 2 GB all the way up to 30+ GB will be quite slow.  I would try setting the files to 30 GB (or so), put the autogrow at 1 GB or more, and make sure instant file initialization is on (NOTE: instant file initialization gets mixed reviews on whether it should be on or off).
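
    If you do resize them, the statement is a sketch like this (the logical name tempdev is just an example - get your actual names from tempdb.sys.database_files first):

      -- Repeat for each tempdb data file, using its own logical name
      ALTER DATABASE tempdb
      MODIFY FILE (NAME = tempdev, SIZE = 30GB, FILEGROWTH = 1GB);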

    The above is all just my opinion on what you should do.
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I would definitely attempt to shrink tempdb.  I would shrink each file individually, as much as possible.  Also, you need to see which drive your tempdb files are stored on and check how much space is left on that drive.  I would be willing to bet you are running out of drive space where tempdb is stored.  If you can shrink all the tempdb .mdf files and the log file, you should see an improvement in your ETL job time.
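
    A sketch of both checks (tempdev is an example logical name - take yours from tempdb.sys.database_files):

      -- Free space on the volume(s) holding tempdb files
      SELECT DISTINCT vs.volume_mount_point,
             vs.available_bytes / 1024 / 1024 / 1024 AS free_gb
      FROM sys.master_files AS mf
      CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
      WHERE mf.database_id = DB_ID('tempdb');

      -- Shrink one tempdb data file toward a 2 GB target (target size is in MB)
      USE tempdb;
      DBCC SHRINKFILE (tempdev, 2048);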

    Jon

  • 1) Do you have trace flags 1117 and 1118 on? Best to have those, I think, to ensure optimal tempdb operations.

    2) I would basically redo the system if you can (from a tempdb perspective). In a maintenance outage, size all tempdb data files EXACTLY the same, with EXACTLY the same growth factor. If you need a two-pass operation to do this, that's fine.
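
    A quick sketch for checking whether they're on (they are normally set as -T1117 / -T1118 startup parameters rather than enabled at runtime):

      -- Is either trace flag enabled globally?
      DBCC TRACESTATUS (1117, 1118, -1);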

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
