Strange self blocking

  • We have an odd issue that has cropped up recently.  I know that I can fix it by re-writing the query, but I'd like a bit of information about why it is happening and hoping that the experts here can help.

    We have an SSIS package that has 2 steps - a T-SQL step and a data flow step.  The T-SQL is just a truncate on the destination table.  Nothing exciting.
    The data flow step is where things are interesting.  
    It does a select which returns roughly 70,000 rows (69,129 rows to be exact) and inserts them into a table.
    Sounds easy enough so far, but the SELECT grabs data from the table it is inserting to.

    So step 1 is to truncate dbo.d.
    Next we have a data source of a T-SQL statement.
    To dumb down the T-SQL statement, it is something like this:
    SELECT a.name, b.date, a.fulldatealternatekey
    from dbo.a
    outer apply
    (select c.date
    from dbo.c
    where date <> expndate) as b
    where a.fulldatealternatekey > ISNULL((select max(excDate) FROM dbo.d),'1900-01-01 00:00:00.000')
    order by a.fulldatealternatekey

    And then we have a data destination of dbo.d.
    In an average day, this will block 2 to 5 times and block for approximately 25 seconds.  But on a bad day (which occurs roughly once every 2 months), it will block until I manually go in and stop the package execution.
    The blocking is that the insert bulk is blocked by the select.  I have verified this in Red-Gate SQL Monitor.

    I see several ways to fix the above (since the table is truncated the max excDate will be null so ignore the whole where clause.  Or presuming we wanted to remove the truncate step at the start (which I think was the original intent), move that "(select max(excDate) FROM dbo.d)" part into a variable (which should also improve performance)), but I am more curious as to why this would block itself.

    My theory is that the server ran out of memory space to store the entire result set or estimated it to be much higher than it actually was and was working in stages (10,000 rows at a time possibly?).  Due to this retained the shared lock on dbo.d but since there was a shared lock, the insert bulk on dbo.d could not succeed.

    Does this sound plausable?  Or is there some other weird voodoo happening that I am not thinking about?

    EDIT - wanted to include more information.  The SSIS server is a SQL Server 2012 server running as an SSIS catalog using the SQL Agent jobs to execute the SSIS packages.  The source and destination tables are all SQL Server 2012 as well.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Considering that the "b" query appears to be a doubled triangular join (just one diagonal from being a full up square join {Cartesian Product/Cross Join}), I'm not surprised that your having performance problems with the code.  I can't make a suggestion because I don't actually know what the requirements that you're trying solve for are nor do I actually know if it's a double triangular join because there are no aliases on the column names of the WHERE clause within "b".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Change the size of the batch in the SSIS package to a smaller number.

    And, floow Jeff's suggestion to try to re-write thje query

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Oh, the plan is to rewrite the query for sure.  And my bad about missing the aliases in the subquery.  It should have been:

    SELECT a.name, b.date, a.fulldatealternatekey
    from dbo.a
    outer apply
    (select c.date
    from dbo.c
    where c.date <> c.expndate) as b
    where a.fulldatealternatekey > ISNULL((select max(d.excDate) FROM dbo.d),'1900-01-01 00:00:00.000')
    order by a.fulldatealternatekey

    I suppose for my example, I didn't really need to put in the outer apply, but was just trying to keep the logic of the original code the same.

    Michael, the problem with decreasing the size of the batch is wouldn't it still hold a shared lock with the data source on dbo.d and then when it tries to do the insert into dbo.d at the destination end, I'd end up with blocking that could likely never end?

    My proposed solution to the data source is:
    DECLARE @maxDate datetime
    SELECT @maxDate =   max(excDate) FROM dbo.d
    SELECT a.name, b.date, a.fulldatealternatekey
    from dbo.a
    outer apply
    (select c.date
    from dbo.c
    where date <> expndate) as b
    where a.fulldatealternatekey > ISNULL(@maxDate,'1900-01-01 00:00:00.000')
    order by a.fulldatealternatekey

    or just completely remove the where clause as step 1 of this SSIS package is to truncate dbo.d.  I inherited this package and am working with the package developer about fixing it.  

    Performance actually isn't horrible if there is no blocking.  Takes less than a minute to complete.  Not super fast by any means and could be faster for sure, but it is better performance than I had anticipated.  But when blocking occurs, it gets stuck for hours.  Longest I've seen was 7 hours before I cancelled it manually.  I think that the above rewrite or removing the where clause completely would improve performance substantially.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply