Help me understand bulk insert blocking chains on unrelated tables

  • Here's an issue I've been observing (SQL 2014 SP1 CU3):

    - In a database with 20 tables (each with a clustered key on a bigint identity column and one other index; no constraints, defaults, foreign keys, or triggers)

    - And 20 BULK INSERT statements running at once, one per table, each loading from a CSV with BATCHSIZE 50000, CHECK_CONSTRAINTS, and KEEPNULLS (a sketch of the statement shape follows this list).

    - Lots of blocking chains occur where:

    -- The statements involved are all bulk inserts.

    -- There is one lead bulk insert, and two or more blocked bulk inserts.

    -- All target completely different tables with no relationships to each other.

    -- The last wait stats sometimes indicate a LOGBUFFER wait on the lead blocker, and PAGELATCH_EX on the blocked processes.
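
    As a sketch, the statements being run against each table look roughly like this (table name, file path, and terminators are placeholders; the real options may differ slightly):

        BULK INSERT dbo.StagingTable01
        FROM 'D:\loads\StagingTable01.csv'
        WITH (
            FIELDTERMINATOR = ',',    -- assumed CSV delimiter
            ROWTERMINATOR   = '\n',   -- assumed row terminator
            BATCHSIZE       = 50000,
            CHECK_CONSTRAINTS,
            KEEPNULLS
        );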

    Of course I understand there is a lot of IO pressure, but I'm wondering what the blocking resource is. I haven't yet been able to catch it in time to check sys.dm_tran_locks, in case it reveals it. I'm using SQL Sentry, but it doesn't appear to capture that info in the Blocking SQL tab.
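
    One way to catch the blocking resource without having to be watching at exactly the right moment is to poll sys.dm_os_waiting_tasks (which exposes the blocking session and a resource_description) together with sys.dm_tran_locks from a separate session while the loads are running. A minimal sketch:

        SELECT wt.session_id,
               wt.blocking_session_id,
               wt.wait_type,
               wt.wait_duration_ms,
               wt.resource_description,          -- the latch/lock resource being waited on
               tl.resource_type,
               tl.request_mode,
               tl.request_status
        FROM sys.dm_os_waiting_tasks AS wt
        LEFT JOIN sys.dm_tran_locks AS tl
               ON tl.request_session_id = wt.session_id
              AND tl.request_status = 'WAIT'
        WHERE wt.blocking_session_id IS NOT NULL;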

  • In sys.sysprocesses, does your waitresource show something like 2:1:1? If so, it might be tempdb contention.
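
    For example, something like this would show it (2:1:1 is the tempdb PFS page and 2:1:3 the SGAM page, the classic tempdb allocation contention pattern):

        SELECT spid,
               blocked,
               lastwaittype,
               waitresource,    -- 2:1:1 or 2:1:3 would point at tempdb allocation pages
               waittime
        FROM sys.sysprocesses
        WHERE blocked <> 0;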

    Roshan Joe

When you say 'bulk inserts', do you mean straight from a file or table, without doing any joins, filtering, or sorting?

    Bulk inserting into a clustered table or table with indexes will cause implicit sorting.

    How many files does tempdb have?

    Microsoft recommends number of tempdb files = min(#CPUs, 8).
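
    A quick way to compare the two numbers (counting only tempdb data files, not the log):

        SELECT (SELECT COUNT(*)
                FROM tempdb.sys.database_files
                WHERE type_desc = 'ROWS') AS tempdb_data_files,
               (SELECT cpu_count
                FROM sys.dm_os_sys_info)  AS logical_cpus;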

    Log access is single threaded whereas DATA access is not.

    So LOG can easily become a blocker.

    Is the LOG on a separate drive to the DATA file?

    For more info,

    http://www.sqlskills.com/blogs/paul/worrying-wait-type/


  • Search for, download and create sp_whoisactive. You can use it to see blocking chains and wait types in real time. Use Extended Events to capture blocking events and anything else you're interested in for later analysis.
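
    If you go the Extended Events route, one option is the blocked process report, which captures the blocked and blocking statements plus the wait resource. A sketch (the session name and the 5-second threshold are just examples):

        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'blocked process threshold (s)', 5;  -- report blocking lasting > 5 seconds
        RECONFIGURE;

        CREATE EVENT SESSION BlockedProcessReport ON SERVER
        ADD EVENT sqlserver.blocked_process_report
        ADD TARGET package0.event_file (SET filename = N'BlockedProcessReport');

        ALTER EVENT SESSION BlockedProcessReport ON SERVER STATE = START;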

    John

MadAdmin, yes, literally BULK INSERT statements straight from a file with no further joins, filtering, or sorting.

    It has a bigint identity which is populated automatically, plus one index on another bigint column which comes from the file. That's one number per file, so hopefully it's not sorting in any special way after that. I will be experimenting with some of those sorting options and with adding the indexes before/after the load (I do plan to experiment to work out how to speed this stuff up).

    8 files for tempdb, 16 cores, and separate disks. I think the underlying storage is a SAN, and the network cards are being saturated, so we're upgrading them (virtually; it's Hyper-V).

    Log waits are definitely one of our bottlenecks, but I expected that sessions waiting on the log would go to a suspended state if they're not being serviced, not form blocking chains where a, b, and c are blocked by d.

    I'm currently getting 20, 30, 40 bulk inserts all waiting in massive queues on each other. I wonder if there's an internal limit, or if possibly they're saturating the schedulers, and maybe a MAXDOP hint would help? I'll add that to the list of things to try.
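
    If scheduler saturation turns out to be a suspect, one quick check (just a sketch) is sys.dm_os_schedulers; consistently high runnable_tasks_count on the visible online schedulers would point at CPU queuing rather than an internal limit:

        SELECT scheduler_id,
               current_tasks_count,
               runnable_tasks_count,    -- tasks waiting for CPU time on this scheduler
               work_queue_count
        FROM sys.dm_os_schedulers
        WHERE status = 'VISIBLE ONLINE';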

    The database is in simple mode; but it's also multi-terabyte. We'd actually like to make it highly available; but I can imagine that AlwaysOn and full mode are going to start killing things. Also I'm not sure if TF 610 operates on and can improve bulk inserts on simple mode databases; I suspect not. I haven't seen much if anything on performing mass bulk inserts in AOAG environments. I'm even going so far as to investigate In-Memory OLTP stuff to see if we can just eliminate this step (because it is basically staging stuff which will be put to disk elsewhere later anyway).

    John -> Yes, I'm using SQL Sentry and also sp_WhoIsActive of course! However, they only identified what I've listed so far; no actual root causes. This is a pretty heavy system and I expect there to be lots of bottlenecks. What I'm particularly interested in, though, is how the blocking chains are forming. Multiple sessions waiting on log writes usually would not form blocking chains. One normally has to hold locks on a resource that others require, and log writes aren't a lockable resource. But I haven't found any indication of what it is.
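
    To see the shape of the chains as they form, one option is to walk blocking_session_id recursively from sys.dm_exec_requests while the loads are queued; a sketch (head blockers with no active request of their own won't show up here):

        ;WITH chain AS
        (
            -- Anchor: head blockers - sessions that block others but aren't blocked themselves
            SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_resource, 0 AS depth
            FROM sys.dm_exec_requests AS r
            WHERE r.blocking_session_id = 0
              AND r.session_id IN (SELECT blocking_session_id FROM sys.dm_exec_requests)
            UNION ALL
            -- Recursive part: sessions blocked, directly or indirectly, by those heads
            SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_resource, c.depth + 1
            FROM sys.dm_exec_requests AS r
            JOIN chain AS c ON r.blocking_session_id = c.session_id
        )
        SELECT * FROM chain
        ORDER BY depth, blocking_session_id;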

  • Have a look at the query plan created for the bulk insert.

    There was, or maybe still is, an issue where a plan gets created with a sort even though you're not doing an identity insert.

    Found this after a bit of a hunt.

    https://connect.microsoft.com/SQLServer/feedback/details/348970/bulk-insert-with-identity-column-creates-query-plan-with-sort
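
    If SQL Sentry isn't showing a plan, one way to pull it yourself is from the plan cache (or from sys.dm_exec_requests while a load is running); a sketch:

        SELECT st.text, qp.query_plan
        FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
        CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
        WHERE st.text LIKE '%BULK INSERT%';   -- note: this query will also match itself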


  • If you use sp_whoisactive @get_locks = 1, you might be able to see exactly what resources the tasks are waiting for.

    This might give you a hint as to what is happening.
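
    For example (assuming a reasonably recent version of the procedure; the extra parameters just make head blockers easier to spot):

        EXEC sp_WhoIsActive
             @get_locks = 1,             -- XML of the locks each session holds or waits on
             @find_block_leaders = 1,    -- adds blocked_session_count for head blockers
             @sort_order = '[blocked_session_count] DESC';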

    /SG

Thanks Mad & Stefan. I had tried to look at the query plans for the bulk inserts before, but SQL Sentry said they weren't generated.

    I'll try again. I'll also set up the @get_locks option and see what I can find out. Probably in about 24 hours.

  • Cody K (11/30/2015)


    The database is in simple mode; but it's also multi-terabyte. We'd actually like to make it highly available; but I can imagine that AlwaysOn and full mode are going to start killing things.

    That being said, if the database goes corrupt, what's your recovery plan?

    --Jeff Moden



  • Jeff Moden (11/30/2015)


    Cody K (11/30/2015)


    That being said, if the database goes corrupt, what's your recovery plan?

    Restore from backup, with an outage of a few days and losing up to a day of data.
