SQL Server 2016 SP1 (Azure IaaS) slower than SQL 2008 SP2 on premise

  • I have an issue with a stored proc that uses a cursor (yes, I KNOW THIS ISN'T GOOD) however it is client app code and I don't have scope to change it. The stored proc on premise using SQL 2008 SP2  and takes 30 mins.  In Azure IaaS using SQL 2016 SP1 it takes 200 mins.  I would have expected it to be faster than on prem given it has been upgraded to three versions newer not six times worse in execution time.

    OS: Win2008 Ent / SQL 2008 SP2 x64 32GB RAM / 16 cpu
    OS: Win2012 R2 DC / SQL 2016 SP1 x64 56GB RAM / 16 cpu (Azure IaaS with managed Premium SSDs)

    Anyone experienced this before?

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • quackhandle1975 - Tuesday, August 28, 2018 7:37 AM

    I have an issue with a stored proc that uses a cursor (yes, I KNOW THIS ISN'T GOOD) however it is client app code and I don't have scope to change it. The stored proc on premise using SQL 2008 SP2  and takes 30 mins.  In Azure IaaS using SQL 2016 SP1 it takes 200 mins.  I would have expected it to be faster than on prem given it has been upgraded to three versions newer not six times worse in execution time.

    OS: Win2008 Ent / SQL 2008 SP2 x64 32GB RAM / 16 cpu
    OS: Win2012 R2 DC / SQL 2016 SP1 x64 56GB RAM / 16 cpu (Azure IaaS with managed Premium SSDs)

    Anyone experienced this before?

    qh

    Did you compare the query plans between the two?
    It could be the new CE introduced in 2014. There are some suggestions on ways to handle CE issues in this blog:
    SQL Server 2016 new features to deal with the new CE

    Sue

  • Thanks Sue, I never considered the CE, so I will have a look at that.  I have used Actual Execution Plan for the query on SQL 2016 Azure and most of the work (65%) is due to the cursor. It uses a Clustered Index Insert [CWT_PrimaryKey]. I don't have access to the on-premise environment as it is current production, but need to get the plan to compare.

    Cheers

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • Just a further update on this for anyone with similar issues.  Firstly we looked at the entire on-premise SQL environment to see what was happening:

    1. MAXDOP was set to 8
    2. There is no HA (eg - Mirroring) on on-premise
    3. Tempdb is set to multiple data files 8 x 5GB / 8 x 2GB

    So, to keep everything consistent we applied these to the Azure SQL Servers.  Aaaaaand….it was still the same poor performance. We even took the db out of the AG and this gave only a negligible perf gain.  Damn! :crazy: Then one thing we did look at was this:
    https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-performance

     And specifically this one:
    "Enable read caching on the disk(s) hosting the data files and TempDB data files." 

    Thus we shut down the server(s) and applied this and wow this jumped performance up by a few notches. Not as good as on premise however but still magnitude of  improvement than we were getting. :satisfied: However I am still left with the issue where specific stored procs (using a cursor) complete in around 3.5 hours for On-premise. The Azure environment takes double that. I am now looking at increasing the disks for more IOPS onthe SQL trans log drives as I ran sp_whoisactive (a brilliant tool if you haven't used it before) and I am getting heavy WRITELOG waits when these stored procs run. 

    Any further recommendations greatly received as this has almost got me stumped. :pinch:

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • quackhandle1975 - Friday, August 31, 2018 2:50 AM

    Just a further update on this for anyone with similar issues.  Firstly we looked at the entire on-premise SQL environment to see what was happening:

    1. MAXDOP was set to 8
    2. There is no HA (eg - Mirroring) on on-premise
    3. Tempdb is set to multiple data files 8 x 5GB / 8 x 2GB

    So, to keep everything consistent we applied these to the Azure SQL Servers.  Aaaaaand….it was still the same poor performance. We even took the db out of the AG and this gave only a negligible perf gain.  Damn! :crazy: Then one thing we did look at was this:
    https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-performance

     And specifically this one:
    "Enable read caching on the disk(s) hosting the data files and TempDB data files." 

    Thus we shut down the server(s) and applied this and wow this jumped performance up by a few notches. Not as good as on premise however but still magnitude of  improvement than we were getting. :satisfied: However I am still left with the issue where specific stored procs (using a cursor) complete in around 3.5 hours for On-premise. The Azure environment takes double that. I am now looking at increasing the disks for more IOPS onthe SQL trans log drives as I ran sp_whoisactive (a brilliant tool if you haven't used it before) and I am getting heavy WRITELOG waits when these stored procs run. 

    Any further recommendations greatly received as this has almost got me stumped. :pinch:

    qh

    It's very likely to be the new CE. Have you been down that avenue yet?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin - Friday, August 31, 2018 5:28 AM

    quackhandle1975 - Friday, August 31, 2018 2:50 AM

    Just a further update on this for anyone with similar issues.  Firstly we looked at the entire on-premise SQL environment to see what was happening:

    1. MAXDOP was set to 8
    2. There is no HA (eg - Mirroring) on on-premise
    3. Tempdb is set to multiple data files 8 x 5GB / 8 x 2GB

    So, to keep everything consistent we applied these to the Azure SQL Servers.  Aaaaaand….it was still the same poor performance. We even took the db out of the AG and this gave only a negligible perf gain.  Damn! :crazy: Then one thing we did look at was this:
    https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-performance

     And specifically this one:
    "Enable read caching on the disk(s) hosting the data files and TempDB data files." 

    Thus we shut down the server(s) and applied this and wow this jumped performance up by a few notches. Not as good as on premise however but still magnitude of  improvement than we were getting. :satisfied: However I am still left with the issue where specific stored procs (using a cursor) complete in around 3.5 hours for On-premise. The Azure environment takes double that. I am now looking at increasing the disks for more IOPS onthe SQL trans log drives as I ran sp_whoisactive (a brilliant tool if you haven't used it before) and I am getting heavy WRITELOG waits when these stored procs run. 

    Any further recommendations greatly received as this has almost got me stumped. :pinch:

    qh

    It's very likely to be the new CE. Have you been down that avenue yet?

    You're not the first person to ask that however I didn't get any noticeable gain when I turned it on for that particular database.  Unless I was missing something.

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • Another update (for anyone that cares) 😀 I did some tests with running the process and using resource monitor to see what was going on whilst it was running.  CPU, zip, RAM, zip. TempDB, you guessed it, zip.  They were all doing nothing.  Only workload was massive writes to the db log drive.  So By The Power Of Greyskull Azure, we increased the disk size to 2TB so we get an IOPS increase from 5000 to 7500.  

    Yeah nothing happened. Still tons of writes to the log drive. I bloody hate computers sometimes.  The next experiment is to try a G series based VM in Azure which apparently gives you 64,000 IOPS. These babies cost $$$$ so the client ain't gonna pay for it, we just want to see if it improves log write performance.

    Stay tuned! :Wow:

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • have  you tried setting Delayed Durability? just to see the impact - and permanent setting if it improves and the DB can leave with it on (or at least the particular cursor query that has the issue.
    https://www.mssqltips.com/sqlservertip/5121/reduce-sql-server-writelog-waits-using-delayed-durability/

    And cloud vCpus are always slower than On-prem - those 16 vCpu of that VM equate to a loss of 30 to 45% power when compared to a 16 Core (32 threads) server on prem.

    Also log drive should not have write cache enabled.
    Out of curiosity did you try Disk Striping? would be curious to see the impact.
    Tempdb - did you put it on the local temp storage or on a premium disk? local normally faster. Even if it doesn't seem to make a difference here.

    just to see the impact I would also try 
    - run it with the db set to simple logging
    - increase frequency of log backups

  • quackhandle1975 - Sunday, September 2, 2018 4:06 PM

    Another update (for anyone that cares) 😀 I did some tests with running the process and using resource monitor to see what was going on whilst it was running.  CPU, zip, RAM, zip. TempDB, you guessed it, zip.  They were all doing nothing.  Only workload was massive writes to the db log drive.  So By The Power Of Greyskull Azure, we increased the disk size to 2TB so we get an IOPS increase from 5000 to 7500.  

    Yeah nothing happened. Still tons of writes to the log drive. I bloody hate computers sometimes.  The next experiment is to try a G series based VM in Azure which apparently gives you 64,000 IOPS. These babies cost $$$$ so the client ain't gonna pay for it, we just want to see if it improves log write performance.

    Stay tuned! :Wow:

    qh

    What I really hate isn't the hardware... it's the hype.  The good part about it all is that good code is even more important now.  😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Right, we have a solution!  Two things, one of the DBAs here decided to stick the offending code into a temp table.  Hey presto  - the process runs in just over a minute!!!

    Also we found an issue with the following:
    https://blogs.msdn.microsoft.com/alwaysonpro/2018/02/06/analyze-synchronous-commit-impact-on-high-commit-rate-workloads/

    Which makes sense as the only major setup difference between On premise and Azure environment was in Azure we are using Always On AG's however we didn't test it properly as when taking the database out of the AG or (stopping the secondary server) then running the code there was no performance gain.  We simply wrapped the cursor in a transaction and hey presto, runs in 4 minutes as opposed to 200 mins! And easier to implement than the temp table fix.

    As Jeff pointed out above you can get lost in the hype, as were convinced the poor performance was the fault of Azure, however it was down to the (poor) code. Nice one, Jeff!

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • quackhandle1975 - Monday, September 3, 2018 9:25 AM

    Right, we have a solution!  Two things, one of the DBAs here decided to stick the offending code into a temp table.  Hey presto  - the process runs in just over a minute!!!

    Also we found an issue with the following:
    https://blogs.msdn.microsoft.com/alwaysonpro/2018/02/06/analyze-synchronous-commit-impact-on-high-commit-rate-workloads/

    Which makes sense as the only major setup difference between On premise and Azure environment was in Azure we are using Always On AG's however we didn't test it properly as when taking the database out of the AG or (stopping the secondary server) then running the code there was no performance gain.  We simply wrapped the cursor in a transaction and hey presto, runs in 4 minutes as opposed to 200 mins! And easier to implement than the temp table fix.

    As Jeff pointed out above you can get lost in the hype, as were convinced the poor performance was the fault of Azure, however it was down to the (poor) code. Nice one, Jeff!

    qh

    Heh... thanks for the feedback.  Like I tell people, "Performance is in the code... or not." 😀

    That notwithstanding, I still prefer on-premise to anything in the cloud.  There's no need for trickery with on-premise... just good code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I am just please we found a solution as when you are dealing with 10+ year old code and you are moving it to Azure/Cloud you can run into issues as the primary task is to just migrate it *as is* we don't have scope due to budget/politics/time constraints, etc  to re-write the code so I am sure in the next few years people are going to run into these types of issues when more systems are migrated to the Cloud.

    In my issue everyone was focusing on the cursor, which we all know is bad for SQL Server, but it still initially ran faster on premise.

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • quackhandle1975 - Tuesday, September 4, 2018 3:48 AM

    I am just please we found a solution as when you are dealing with 10+ year old code and you are moving it to Azure/Cloud you can run into issues as the primary task is to just migrate it *as is* we don't have scope due to budget/politics/time constraints, etc  to re-write the code so I am sure in the next few years people are going to run into these types of issues when more systems are migrated to the Cloud.

    In my issue everyone was focusing on the cursor, which we all know is bad for SQL Server, but it still initially ran faster on premise.

    qh

    I think it's awesome that you've documented what you ran into and the fixes that you ended up with.  It's going to help a lot of folks with similar issues.

    If you don't mind a suggestion, what you went through would make an awesome article and would help even more folks because it would be headlined instead of being present "only" in a forum post that comparatively few will read.

    As for me, I've bookmarked/added it to my briefcase this thread because of your documentation on the subject.  I really appreciate it because I know that, someday, I'm going to be in the same boat.  Thank you again.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you Jeff, my pleasure.  This is part of a huge migration to Azure (we go live this weekend) however once I get my life back, yeah I think that's a great idea. I've only ever done editorials for SSC, and provided Mr Jones approves I think I could replicate the issue in Azure and show the solution.

    Hold that thought. 😀

    qh

    [font="Tahoma"]Who looks outside, dreams; who looks inside, awakes. – Carl Jung.[/font]
  • Just a note I remembered when reading this.. I did see a video SOMEWHERE that pretty much said you always want to wrap just about everything in transactions in the cloud

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply