Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQL Man of Mystery

Wes Brown is a PASS chapter leader and SQL Server MVP. He writes for SQL Server Central and maintains his blog at http://www.sqlserverio.com. Wes is Currently serving as a Senior Lead Consultant at Catapult Systems. Previous experiences include Product Manager for SQL Litespeed by Quest software and consultant to fortune 500 companies. He specializes in high availability, disaster recovery and very large database performance tuning. He is a frequent speaker at local user groups and SQLSaturdays.

The Dangers of Sub-queries!!!!

Ok, now that I have your attention this really should be titled the danger of not qualifying all objects, but that isn’t quite as sensational enough to make you click through to here :)

Imagine if you will, a developer expects ten records to be inserted into a table and the query has been running for almost an hour.  A quick glance and sp_who2 shows that it is using a lot of IO, I mean a lot. The first thing that happens is a command decision from the boss to kill the query. I took a quick peek and found the plan still in the plan cache while we waited for this thing to roll back. As I look at the plan I see something really wrong. It looks like the table receiving the inserts isn’t getting the ten records we thought but 169 million records, every record from the table in the select clause.

I look at the query and on the surface everything looks good. When I hit the check mark it compiles without error. Looking at the plan in detail I notice a little icon that tipped me off on where to look next.

image

Oh that’s bad. Warnings: No join predicate. Ouch. But this is a sub-query, how can I have no join predicate? Lets take a look at an example.

DROP TABLE #t1
DROP TABLE #t2
GO
CREATE TABLE #t1 (
  t1id    INT   NOT NULL,
  t1name  VARCHAR(50)   NULL,
  t1place VARCHAR(50)   NULL)

ALTER TABLE #t1
 ADD   PRIMARY KEY ( t1id )

CREATE TABLE #t2 (
  t2id    INT   NOT NULL,
  t1name  VARCHAR(50)   NULL,
  t1place VARCHAR(50)   NULL)

ALTER TABLE #t2
 ADD   PRIMARY KEY ( t2id )

INSERT INTO #t1
SELECT
  spid,
  loginame,
  hostname
FROM
  MASTER.dbo.sysprocesses
WHERE  hostname <> ''

INSERT INTO #t2
SELECT
  spid,
  loginame,
  hostname
FROM
  MASTER.dbo.sysprocesses
WHERE  hostname <> ''

This gives us two tables with some data. We want to find all the records in #t1 that also exist in #t2 but instead of a join we will use an IN and a sub-query.

SELECT
  t1name,
  t1place
FROM
  #t1
WHERE  t1id IN (SELECT
                 t1id
                FROM
                 #t2)
 

Looks simple enough and will pass the compile test all day long even though t1id doesn’t exist in #t2. Since you can use columns from the top level query in the sub-query this is a perfectly valid piece of T-SQL.

It gives us this plan when we look at it though.

image

And there you have it, a join without an ON clause as far as the optimizer is concerned. By not qualifying all the columns in the sub-query we opened ourselves up to this error. This isn’t the optimizers fault! If we re-write this as it was mean to be:

SELECT
  t1name,
  t1place
FROM
  #t1
WHERE  t1id IN (SELECT
                 t2id
                FROM
                 #t2)
 

We get a plan that is more to our liking.

image

You could also re-write this as a join instead of using the IN clause and would have avoided this problem as well. Scary way to learn that SQL Server will do exactly what you tell it to even if you are wrong!


Comments

No comments.

Leave a Comment

Please register or log in to leave a comment.