Backup taking MUCH longer on prod HELP:)

  • michael.berry 22479 (8/23/2012)


    WaitTypeWait_SResource_SSignal_SWaitCountPercentageAvgWait_SAvgRes_SAvgSig_S

    PAGEIOLATCH_SH93277.2493264.1413.10399999516.970.02330.02330.0000

    MSQL_XP68382.9568382.950.0031779712.440.21520.21520.0000

    PREEMPTIVE_OS_GETPROCADDRESS68380.7768380.770.0031779712.440.21520.21520.0000

    BACKUPIO64827.2064814.3512.85115627111.800.05610.05610.0000

    ASYNC_IO_COMPLETION62775.3862775.370.0118811.42333.9116333.91160.0000

    BACKUPBUFFER61542.3761414.42127.95236889511.200.02600.02590.0001

    PAGEIOLATCH_EX43050.0943045.394.7018565767.830.02320.02320.0000

    LCK_M_S36037.1636036.830.3328716.5612.552112.55200.0001

    LCK_M_U17102.2717102.270.01553.11310.9504310.95030.0001

    WRITELOG12215.1212188.0027.136467402.220.01890.01880.0000

    Okay let's see.

    The above waitstats show you what SQL has been waiting for since it's last restart or since you last cleared the waitstats. On top of the list is pageIOLATCH_SH which is associated with reading datapages from the database files. Can't say to much about that right now, your average wait is 23ms per read. Which is, without nowing further info of your system neither good or bad at this point.

    then Msql_XP and Preeemptyive_OS_GetProcAddress are associated with using extended stored procs. Now since the wait times are so similar with the backup wait times, I'm making the assumption that you are backing up with a 3rd party tool that is hooking into XPs.

    BACKUPIO, ASYNC_IO_COMPLETION (during backups) and Backupbuffer are all related with creating backups. Together with the assumption that you use the 3rd party tool that makes up for the xp waits We can see that 55% of the time that sql is waiting is for backups.

    Now, I don't no what your normal load has been on this server. If you haven't done much apart from the backups, it could be perfectly normal 55% of the time that sql has been waiting is backup related.

    I would suggest that you do the following:

    1) run the waitstat query on dev server and store the results for later reference

    2) clear the waitstat on the dev server

    3) start the backup on the dev server and wait till it finishes

    4) run waitstats query again and post th results

    do the same 4 steps on production.

    that way we can see the waitstats just assosiated with the backup process and have a good compare between the dev and prod waits.

    Also, ask if their are any differences between the LUN for the dev and the LUN for production.

    Different amount disk spindles, different SANs? , different raid? different fabric, different paths to SAN?

    regards,

    Edward

  • working with SAN guys now to dig deeper.

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.

    backup database [MyDatabase] to disk = N'NUL' with copy_only, stats = 10

    This will tell you how long it takes to read the data from the disk and if that is the problem.

    If it is fast, then the problem is more than likely writing the backup to your backup file.

    Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.

  • Michael Valentine Jones (8/24/2012)


    You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.

    backup database [MyDatabase] to disk = N'NUL' with copy_only, stats = 10

    This will tell you how long it takes to read the data from the disk and if that is the problem.

    If it is fast, then the problem is more than likely writing the backup to your backup file.

    Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.

    Now that is a really good suggestion!!

    I will do so. My test that I mention at the beginning has copy_only so I am good with that.

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • OK. it is taking a long time on prod...

    and here are the wait stats while it is running

    type wait totalwait

    BACKUPIO 988.943242 5746.281352

    BACKUPBUFFER 990.484487 5674.775883

    BACKUPTHREAD 0 8.825387371

    BACKUP 0 0

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • I may have missed it so I have to ask... are both using the same recovery model? Are both pointed to the SAN using the same type of network cards? Are the backup areas for both servers configured the same way (IE raid level, etc)? Is the "path" from the two servers to the backup area the same? Is the size of the data the same?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/24/2012)


    I may have missed it so I have to ask... are both using the same recovery model? Are both pointed to the SAN using the same type of network cards? Are the backup areas for both servers configured the same way (IE raid level, etc)? Is the "path" from the two servers to the backup area the same? Is the size of the data the same?

    Both are using same recover ability model

    the VM is local storage I found out and prod is SAN.

    I tried running a Copy_only on both and even tried local storage on prod and its still the same.

    database is nightly copy from prod to dev so it is same size.

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • michael.berry 22479 (8/24/2012)


    Michael Valentine Jones (8/24/2012)


    You should run a copy_only backup to the nul device to see if the problem is reading the database data or writing the backup file to disk.

    backup database [MyDatabase] to disk = N'NUL' with copy_only, stats = 10

    This will tell you how long it takes to read the data from the disk and if that is the problem.

    If it is fast, then the problem is more than likely writing the backup to your backup file.

    Make sure that you use the COPY_ONLY option so that SQL Server does not see that as part of the backup chain.

    It is slow on prod and fast on dev.....

    even with no writing so I believe we can assume (always scared to) that it is not the write but the read.

    I had the network guys monitor the SAN and before I kicked off the backup on prod the disk was at 2500 bits/read per second then when the backup occurred we jumped to 22 million bits per sec (22 GB per sec) so it seems the reads of the database are working... but still very very slow backup completion.

    Now that is a really good suggestion!!

    I will do so. My test that I mention at the beginning has copy_only so I am good with that.

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • I think a key element in your reply is: vm uses local storage and your prod db uses SAN.

    So how is your Server connected to your SAN? 1gbit iscsi? 10gb fiber?

    Are you backing up when everybody is backing up?

    How many disks are underneath your database data LUN?

    What is the queuedepth setting of the HBA of your server?

    Just a couple of questions to keep you busy 😉

    Have you got enough memory on your physical server? And have you set max server memory?

    Edward

  • Edward Dortland (8/28/2012)


    I think a key element in your reply is: vm uses local storage and your prod db uses SAN.

    So how is your Server connected to your SAN? 1gbit iscsi? 10gb fiber?

    Are you backing up when everybody is backing up?

    How many disks are underneath your database data LUN?

    What is the queuedepth setting of the HBA of your server?

    Just a couple of questions to keep you busy 😉

    Have you got enough memory on your physical server? And have you set max server memory?

    Edward

    I appreciate the response

    above I said "I tried running a Copy_only on both and even tried local storage on prod and its still the same."

    so I can stop looking at the SAN I believe for write.. now the issue apparently is read from db for the backup.

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • I'm talking about your read. Your database file is still on the SAN lun right? Or did you also try and move your mdf to local disks?

    Regarding tour backup software, you are using exactly the same versions on both servers?

    Regards,

    Edward

  • Edward Dortland (8/28/2012)


    I'm talking about your read. Your database file is still on the SAN lun right? Or did you also try and move your mdf to local disks?

    Regarding tour backup software, you are using exactly the same versions on both servers?

    Regards,

    Edward

    I have sent these questions to my server & storage admins and will get back to you. Thanks!

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • micber (8/28/2012)


    Edward Dortland (8/28/2012)


    I think a key element in your reply is: vm uses local storage and your prod db uses SAN.

    So how is your Server connected to your SAN? 1gbit iscsi? 10gb fiber?

    Are you backing up when everybody is backing up?

    How many disks are underneath your database data LUN?

    What is the queuedepth setting of the HBA of your server?

    Just a couple of questions to keep you busy 😉

    Have you got enough memory on your physical server? And have you set max server memory?

    Edward

    I appreciate the response

    above I said "I tried running a Copy_only on both and even tried local storage on prod and its still the same."

    so I can stop looking at the SAN I believe for write.. now the issue apparently is read from db for the backup.

    The answers from my admin

    The SQL data LUN (E drive) is connected to OURSERVERNAME over 8GB fibre channel. The LUN lives on our 500GB SATA aggregate which has 37 disks in it. The SQL data LUN shares the disk I/O of that aggregate between 5 total volumes I don't know for sure what the queue depth is set to on that HBA but I believe the default is 16.

    Michael B
    Data Architect
    MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA

  • Since you are sharing spindles with other servers, looks like you need to have the SAN guys monitor what is happening during the backups. Looks like contention between serveral servers trying to use the same set of disks at the same time.

  • I agree with lynn, have your SAN admin look for iops overloading on that aggregate. Is it Netapp?

    Meanwhile to see if it is truly the SAN, try restoring a back of the database to local disks and then make a backup to NUL from that local db. That would verify our theory.

Viewing 15 posts - 16 through 30 (of 40 total)

You must be logged in to reply to this topic. Login to reply