any way this can avoid of index scan?

  • Hi, it looks impossible to me, but, is anyone know could this kind of query avoid of index scan?

    select * from A where col1 not in (select col1 from B)

    thank you all!

  • No, that will require a scan of both tables.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • sqlnes (7/17/2013)


    Hi, it looks impossible to me, but, is anyone know could this kind of query avoid of index scan?

    select * from A where col1 not in (select col1 from B)

    thank you all!

    As Gail said, no, but there are scans and there are scans. Play with the following code and examine the execution plans of the two statements. Note also the NULL handling behaviour of NOT IN - is it what you were expecting?

    DROP TABLE #A

    SELECT TOP 10000 col1 = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    INTO #A

    FROM syscolumns a, syscolumns b

    CREATE UNIQUE CLUSTERED INDEX ucx_col1 ON #A (col1)

    DROP TABLE #B

    SELECT TOP 3 col1 = CAST(col1 AS INT)

    INTO #B

    FROM #A

    INSERT INTO #B (col1) SELECT NULL

    INSERT INTO #B (col1) SELECT TOP 10000 col1 = 10000+ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    CREATE UNIQUE CLUSTERED INDEX ucx_col1 ON #B (col1)

    -- full scan

    SELECT a.col1

    FROM #A a

    WHERE a.col1 NOT IN (SELECT b.col1 FROM #B b)

    -- range scan

    SELECT a.col1

    FROM #A a

    WHERE NOT EXISTS (SELECT 1 FROM #B b WHERE b.col1 = a.col1)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you Gail & Chris for explain.

    Just checked my tables and the "col1" on both table are non null. So it seems it's unavoidable.

  • sqlnes (7/17/2013)


    Thank you Gail & Chris for explain.

    Just checked my tables and the "col1" on both table are non null. So it seems it's unavoidable.

    What's unavoidable? (Apart from D & T of course)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • 2 index scans are unavoidable?

  • sqlnes (7/17/2013)


    2 index scans are unavoidable?

    That's correct. You can have an index scan too if you want - just up the ratio of looked-up rows to parent rows, but you're stuck with the scans. Like I said earlier, check the actual execution plans of the two alternative ways of writing the query. Run them together (run the prep part as a batch, then the two queries as a separate batch) to compare the plans one above the other. If you are unsure of what I mean by "not all scans are equal", look at the scan operators for the lookup tables in each plan.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris. And sorry I forgot to mention that I've followed your first reply and compared the explains and they are identical. Then I checked the column on my 2 tables and both are not null...

    I guess may be I could ask the developer to add some more criterion on the first table would reduce the cost?

  • sqlnes (7/17/2013)


    Thanks Chris. And sorry I forgot to mention that I've followed your first reply and compared the explains and they are identical. Then I checked the column on my 2 tables and both are not null...

    I guess may be I could ask the developer to add some more criterion on the first table would reduce the cost?

    Really? What type of join is shown for Query 1 and Query 2?

    Query 2 is about 20x faster than Query 1 on this pc, with or without nullable columns.

    I'm using 2K8, I wouldn't expect 2K5 to be much different.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (7/17/2013)


    Really? What type of join is shown for Query 1 and Query 2?

    NOT IN and NOT EXISTS do perform identically with identical execution plans if the columns are defined as NOT NULL.

    http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/17/2013)


    ChrisM@Work (7/17/2013)


    Really? What type of join is shown for Query 1 and Query 2?

    NOT IN and NOT EXISTS do perform identically with identical execution plans if the columns are defined as NOT NULL.

    http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

    <headdesk> Thanks Gail.

    DROP TABLE #A

    CREATE TABLE #A (col1 INT NOT NULL)

    INSERT INTO #A (col1)

    SELECT TOP 10000 col1 = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM syscolumns a, syscolumns b

    CREATE UNIQUE CLUSTERED INDEX ucx_col1 ON #A (col1)

    DROP TABLE #B

    CREATE TABLE #B (col1 INT NOT NULL)

    INSERT INTO #B (col1)

    SELECT TOP 300 col1 = CAST(col1 AS INT)

    FROM #A

    INSERT INTO #B (col1) SELECT TOP 10000 col1 = 10000+ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    CREATE UNIQUE CLUSTERED INDEX ucx_col1 ON #B (col1)

    -- range scan

    SELECT a.col1

    FROM #A a

    WHERE a.col1 NOT IN (SELECT b.col1 FROM #B b)

    -- range scan

    SELECT a.col1

    FROM #A a

    WHERE NOT EXISTS (SELECT 1 FROM #B b WHERE b.col1 = a.col1)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply