Slow query - optimisation help required!

  • I have the following query:

    INSERT INTO dbo.Load9_PotentialOverlaps_(Master_Id, Master_GUID, Master_SubmissionID, Duplicate_Id, duplicate_GUID, Duplicate_SubmissionID, MatchKeyType)

    SELECT a.id, a.GUID, a.SubmissionID, b.id, b.GUID, b.SubmissionID, 6

    FROM dbo.All_keys_ AS a

    INNER JOIN dbo.Load9_keys_ AS b

    on a.idxMatchKey1 = b.idxMatchKey1

    WHERE NOT EXISTS (SELECT 1 from dbo.Load9_PotentialOverlaps_ as c WHERE c.duplicate_ID IN (a.id,b.id)) OPTION (MAXDOP 2);

    I've attached the estimated execution plan if anyone wants to look at it.

    The main tables used by the above query are as below:

    Table sizes:

    dbo.All_keys_ = 88 million rows

    dbo.Load9_keys_ = 750 thousand rows

    Using sp_whoisactive, I can see regular wait info of the type:

    (20ms)PAGEIOLATCH_SH:MyDatabase:1(*)

    Wait stats from this moring (SQL Server restarted last night around 9PM)

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • What is the actual execution plan?

    What is the definition of the Load9_PotentialOverlaps table?

    Also why all the indexes on varbinary(8000) columns? The max length for an index key is 900 bytes, so inserts of anything over 900 bytes will cause the insert to fail, so I would get rid of the indexes or shrink the column down to 900.

  • ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Actual execution plan?

  • Also why all the indexes on varbinary(8000) columns? The max length for an index key is 900 bytes, so inserts of anything over 900 bytes will cause the insert to fail, so I would get rid of the indexes or shrink the column down to 900.

    These are hash indexes. I was unable to create the index on the columns so I had to generate a hashbyte code for the columns which is basically a VARBAINARY(8000).

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Please see latest attached file.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Such a big difference between the estimated and the actual, which is why you should always attach the actual plan as that is what SQL actually does not just what it think it will do.

    Well the biggest cost is the insert into the Load9_PotentialOverlaps_ table, so I would recommend dropping / disabling the indexes before the insert then rebuilding them after the insert

    Then look at creating a clustered index and insert into a B-Tree instead of a heap and see which is quicker.

  • You're getting type conversions on a couple of the columns. I'd look to see if that might be causing issues. Also, a Nested Loops join for millions of rows with a seek operation? That seems a little iffy. You might try a JOIN hint there, just as an experiment. See what it does with a hash or a merge. I'm just curious. It doesn't look like the kind of place I'd expect to see a loop. Plus, the seek is the highest cause apart from the insert operation.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • anthony.green (3/13/2013)


    Such a big difference between the estimated and the actual, which is why you should always attach the actual plan as that is what SQL actually does not just what it think it will do.

    Well the biggest cost is the insert into the Load9_PotentialOverlaps_ table, so I would recommend dropping / disabling the indexes before the insert then rebuilding them after the insert

    Then look at creating a clustered index and insert into a B-Tree instead of a heap and see which is quicker.

    Yes, I have disabled the indexes in the hope it speeds things up. As they say, every little bit helps!

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Grant Fritchey (3/13/2013)


    You're getting type conversions on a couple of the columns. I'd look to see if that might be causing issues. Also, a Nested Loops join for millions of rows with a seek operation? That seems a little iffy. You might try a JOIN hint there, just as an experiment. See what it does with a hash or a merge. I'm just curious. It doesn't look like the kind of place I'd expect to see a loop. Plus, the seek is the highest cause apart from the insert operation.

    Good point about the type conversion. I've changed the Master_ID and Duplicate ID to INTs. I don't want to get too excited but I think this (and the disabbling of indexes) may have made a big difference.

    In 2 minutes, I have a table of 5 million rows populated which is a massive improvement on previous runs.

    I also tried the JOIN hints but they seem to make things worse?! To be fair it may require further testing but for now I think I will test the new changes some more and report back any further prblems.

    Thanks guys!

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Silly question but regarding type conversions. If I have a column that's NVARCHAR(500) and I insert it into a new table that has a column of NVARCHAR(50), would that also carry a conversion overhead?

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • No, you won't get a conversion. You may see truncated data of course, but no conversion.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Okay great, just one more question that's been bugging me for a while.

    My staging table has a lot of NVRCHAR(500) columns. Now I know many of my columns will never be that big however, in terms of storage, would it make a difference if I decreased the size to say NVARCHAR(200)?

    Not sure but I have a feeling that many of my queries are taking longer than expected because of the amount of IO?

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • The amount of storage is the cause of IO, not the potential storage. If there are 300 bytes not used, they're not used. There's no additional cost for that. It's still a good practice to only use the size you actually need, but it won't seriously negatively impact you in this regard. Now, if you were designating them as MAX, that's different. But other than that, no big deal.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Grant. Well it's back to the drawing board for me. Need to find other ways of doing the same thing in my code without retreiving so much data!

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply