Initial load of data to table slow

  • First let me say I'm not a DBA.  I'm looking for suggestions from others who have run into this issue.  I'm trying to gather info to help out our DBAs; they are swamped right now, with one leaving soon.

    We are running SQL Server 2016.  Data is being loaded through an SSIS data flow, which copies the data from a 'work' table in one database to a table in another database on the same server; no transformations are done on the data.

    The issue we are running into is loading about 1.4 million rows into a table that already has 140 million rows.  The initial load of the data takes anywhere from 2+ hours to almost 12 hours.  The table isn't critical at this time, so the delay isn't causing major issues.  Last month's load took over 2 hours; a week or so after the data loaded, I deleted the 1.4 million rows that had been added.  Then I ran the step to load the data again, and this time it took less than 4 minutes.
    To me it appears to be an issue with the table needing to allocate more space during that initial load of the data.

    One DBA did try moving the data to a new filegroup, and the indexes were updated after the move.  This did not fix the issue; it actually took longer.  This slowness has been seen across our servers since we upgraded to 2016.

    Any ideas you have that I could present to our DBAs would be helpful.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • Something to check is database autogrow.  If the database is growing when you do this import, it will slow things down.
    Another thing you could try, which we did in our shop, is to disable the indexes, load the data, then rebuild the indexes.  Depending on the number of indexes and their size, you may or may not get some benefit by disabling them first.
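
    For example, here's a minimal sketch of that pattern (the table and index names are placeholders, not from the original post):

        -- Hypothetical names.  Disable each nonclustered index before the load;
        -- leave the clustered index alone or the table becomes unreadable.
        ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable DISABLE;

        -- ... run the SSIS load here ...

        -- Rebuilding ALL also re-enables the disabled indexes.
        ALTER INDEX ALL ON dbo.BigTable REBUILD;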

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are any negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Make sure you don't have auto shrink enabled; it adds latency to have to expand the data and log files again. If they have grown to a size that handles the loads, don't shrink them. If your recovery model is set to full, you may get better throughput with bulk-logged. Play with the Data Flow DefaultBufferMaxRows and DefaultBufferSize properties to find the best combination for the server and data. In the Destination, set FastLoadMaxInsertCommitSize higher or lower depending on resources, and set the ValidateExternalMetaData property to false.
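
    A quick way to check and change those database settings (the database name is a placeholder):

        -- Check auto-shrink and the recovery model:
        SELECT name, is_auto_shrink_on, recovery_model_desc
        FROM sys.databases
        WHERE name = N'YourTargetDb';

        -- Turn auto-shrink off, and switch to bulk-logged for the load window:
        ALTER DATABASE YourTargetDb SET AUTO_SHRINK OFF;
        ALTER DATABASE YourTargetDb SET RECOVERY BULK_LOGGED;
        -- ... run the load, then switch back and take a log backup ...
        ALTER DATABASE YourTargetDb SET RECOVERY FULL;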

  • Look to see if the SQL Server service account has the IFI (instant file initialization) privilege - https://docs.microsoft.com/en-us/sql/relational-databases/databases/database-instant-file-initialization?view=sql-server-2017
    Also make sure that the log file is large enough to cope with the transaction - if it is being shrunk, it will need to grow again, and that can be a slow operation, especially if the growth increment is very small or very big. Between 3 and 7 GB may be acceptable.
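
    On SQL Server 2016 SP1 and later you can check IFI from a DMV, and the log file's size and growth increment are visible in sys.master_files (the database name below is a placeholder):

        -- Reports whether the service account holds the
        -- Perform Volume Maintenance Tasks privilege (IFI):
        SELECT servicename, instant_file_initialization_enabled
        FROM sys.dm_server_services;

        -- Log file size and growth increment for the target database:
        SELECT name,
               size * 8 / 1024 AS size_mb,
               CASE WHEN is_percent_growth = 1
                    THEN CONCAT(growth, '%')
                    ELSE CONCAT(growth * 8 / 1024, ' MB')
               END AS growth_increment
        FROM sys.master_files
        WHERE database_id = DB_ID(N'YourTargetDb')
          AND type_desc = 'LOG';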

  • bmg002 - Friday, August 3, 2018 12:30 PM

    Something to check is database autogrow.  If the database is growing when you do this import, it will slow things down.
    Another thing you could try, which we did in our shop, is to disable the indexes, load the data, then rebuild the indexes.  Depending on the number of indexes and their size, you may or may not get some benefit by disabling them first.

    After the delete of the data, the reload only took 4 minutes, so I don't think it is an issue with the indexes.  Or does deleting the rows not remove the corresponding index entries?

    Thanks everyone for the info, I'll be sure to pass it along.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • 1. Depending on the size of the data, SQL may have needed to allocate more space, so auto grow kicks in. Many times this is left at the default, which grows the file by 10%; on a growing database you get many iterations of "growth" actions, and the 10% increment keeps getting larger as the database grows. Setting auto grow to a fixed physical size is more advisable, as it gives you more control over how much the file grows by and also lets you plan for the growth. Another reason small increments are a bad idea is page allocation: when the space of the database is expanded, SQL pre-allocates the pages in groupings, so the more of these "groupings" you have, the less optimal your database will be. I like 4 GB, as this is roughly where SQL's limit is when creating extents and so forth.
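
    For example, switching from percentage growth to a fixed increment looks like this (the database and logical file names are placeholders):

        -- Hypothetical names; set fixed growth increments instead of 10%:
        ALTER DATABASE YourTargetDb
            MODIFY FILE (NAME = N'YourTargetDb_Data', FILEGROWTH = 4GB);
        ALTER DATABASE YourTargetDb
            MODIFY FILE (NAME = N'YourTargetDb_Log', FILEGROWTH = 1GB);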

    Second to that: when you deleted the records, the space would have still been allocated, so when you re-inserted the data SQL would have simply overwritten the pages or created new pages in the same "space" - hence no need to grow the files on disk.
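
    You can see this by comparing reserved vs. used space after the delete (the table name is a placeholder); the reserved figure stays high:

        -- Hypothetical table name; reserved space remains allocated after a DELETE:
        EXEC sp_spaceused N'dbo.BigTable';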

    IFI not being enabled can cause a great amount of delay when growing your database.

    Joe Torre - Friday, August 3, 2018 5:08 PM

    Make sure you don't have auto shrink enabled; it adds latency to have to expand the data and log files again. If they have grown to a size that handles the loads, don't shrink them. If your recovery model is set to full, you may get better throughput with bulk-logged. Play with the Data Flow DefaultBufferMaxRows and DefaultBufferSize properties to find the best combination for the server and data. In the Destination, set FastLoadMaxInsertCommitSize higher or lower depending on resources, and set the ValidateExternalMetaData property to false.

    2. My next stop would be to limit the batch sizes so that you can keep the buffer pool optimized. If you do not set a commit size you could get spills from RAM to disk, creating double work.
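
    To illustrate the same principle outside SSIS (the file and table names here are hypothetical), a plain BULK INSERT uses BATCHSIZE to control how many rows are committed per transaction:

        -- Hypothetical names; smaller batches mean smaller transactions:
        BULK INSERT dbo.BigTable
        FROM 'C:\staging\bigtable.dat'
        WITH (BATCHSIZE = 100000, TABLOCK);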

    3. Lastly, you could use the DMVs to look at the wait stats and see what is holding up the insert. There is also a procedure written by Adam Machanic, sp_whoisactive (http://whoisactive.com/), that pulls a lot of that type of information together while the load is running, which could be useful.
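
    A simple starting point while the load is running:

        -- Show what active requests are currently waiting on:
        SELECT r.session_id, r.status, r.command,
               r.wait_type, r.wait_time, r.last_wait_type,
               t.text AS current_sql
        FROM sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE r.session_id > 50;   -- skip system sessions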

    Hope this helps.
