Can someone tell me what's happening with this query?

  • Hello,

    I'm seeing a difference in a query between SQL 2005 and SQL 2008 and I'm not understanding. The below demonstrates it. On SQL 2008, the second query below will return 3 records. The first query is the same query but without the where clause. It returns one record. On SQL 2005, both queries return one record. If you run the inner query by itself, it will only return one record. So I don't see how the second query can select three records from a set of one.

    I know the join is not needed but this is a generated query. It just so happens that the select clause didn't include anything from the joined table in this case.

    Any help with what is happening would be appreciated. This seems at least like a behavior difference between SQL 2005 and 2008.

    Thanks,

    Kevin

    create table table1(Id int, Name varchar(10))

    create table table2(Id int)

    insert into table1 values(101, 'Test')

    insert into table2 values(101)

    insert into table2 values(101)

    insert into table2 values(101)

    SELECT * FROM (SELECT DISTINCT [Table1].* , ROW_NUMBER() OVER (ORDER BY [Table1].Name) AS RowNumber

    FROM [Table1]

    LEFT JOIN [Table2] ON [Table1].Id = [Table2].Id) AS TempTable

    SELECT * FROM (SELECT DISTINCT [Table1].* , ROW_NUMBER() OVER (ORDER BY [Table1].Name) AS RowNumber

    FROM [Table1]

    LEFT JOIN [Table2] ON [Table1].Id = [Table2].Id) AS TempTable

    WHERE RowNumber BETWEEN 1 AND 10

    drop table table1

    drop table table2

  • kevinv (5/8/2013)


    I'm seeing a difference in a query between SQL 2005 and SQL 2008 and I'm not understanding. The below demonstrates it. On SQL 2008, the second query below will return 3 records. The first query is the same query but without the where clause. It returns one record. On SQL 2005, both queries return one record. If you run the inner query by itself, it will only return one record. So I don't see how the second query can select three records from a set of one.

    I want to confirm for you that it's neither system nor environment. I can get this to happen on 2k5 SP2 and 2k8R2 SP1. Excellent test presentation.

    Next, minor adjustment for those of us who don't like to mess with perm tables:

    create table #table1(Id int, Name varchar(10))

    create table #table2(Id int)

    insert into #table1 values(101, 'Test')

    insert into #table2 values(101)

    insert into #table2 values(101)

    insert into #table2 values(101)

    SELECT * FROM (SELECT DISTINCT [#table1].* , ROW_NUMBER() OVER (ORDER BY [#table1].Name) AS RowNumber

    FROM [#table1]

    LEFT JOIN [#table2] ON [#table1].Id = [#table2].Id) AS TempTable

    SELECT * FROM (SELECT DISTINCT [#table1].* , ROW_NUMBER() OVER (ORDER BY [#table1].Name) AS RowNumber

    FROM [#table1]

    LEFT JOIN [#table2] ON [#table1].Id = [#table2].Id) AS TempTable

    WHERE RowNumber BETWEEN 1 AND 10

    drop table #table1

    drop table #table2

    The difference is in the optimization plan and if the join gets applied. Activate show actual execution plans. In 2k5 the join is never actually applied to the process, because it's never actually used in the final query. The optimizer went into the subquery and made some assumptions.

    This seems like a bug in 2k5 that was repaired for 2k8, when the row_number() wasn't being properly applied. If you go poking around long enough in Connect you'll probably find where it was reported. What's interesting is that the inner query still doesn't recognize that there's 3 rows when ran uniquely. Seems like it's still a bit buggy.

    I'd still have expected to see 3 rows here:

    SELECT DISTINCT [#table1].* , ROW_NUMBER() OVER (ORDER BY [#table1].Name) AS RowNumber

    FROM [#table1]

    LEFT JOIN [#table2] ON [#table1].Id = [#table2].Id

    However that execution plan still applies the row_number() after the join/sorting, whereas when it's a subquery it doesn't.

    EDIT: I will learn to type without skipping things that are in my head one of these days. However, that execution plan still applies the row_number() after the optimizer has determined it didn't need the join when it wasn't a subquery, so only one row is resulting off the distinct.

    In some ways, I'd expect ROW_NUMBER() to simply override a distinct.

    /EDIT

    Very odd.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks for the info. I'll next have to see how I can fix it.

  • You can see the effect with SQL Server 2008 R2 (RTM):

    SELECT @@VERSION

    -- Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64) Jun 28 2012 08:36:30

    -- (Build 7601: Service Pack 1) (Hypervisor)

    SELECT --DISTINCT

    [#table1].* , ROW_NUMBER() OVER (ORDER BY [#table1].Name) AS RowNumber

    FROM [#table1]

    LEFT JOIN [#table2] ON [#table1].Id = [#table2].Id

    -- (3 row(s) affected)

    SELECT DISTINCT

    [#table1].* , ROW_NUMBER() OVER (ORDER BY [#table1].Name) AS RowNumber

    FROM [#table1]

    LEFT JOIN [#table2] ON [#table1].Id = [#table2].Id

    -- (3 row(s) affected)

    SELECT @@VERSION

    -- Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64) Apr 2 2010 15:48:46

    -- (Build 7601: Service Pack 1)

    SELECT --DISTINCT

    [#table1].* , ROW_NUMBER() OVER (ORDER BY [#table1].Name) AS RowNumber

    FROM [#table1]

    LEFT JOIN [#table2] ON [#table1].Id = [#table2].Id

    -- (3 row(s) affected)

    SELECT DISTINCT

    [#table1].* , ROW_NUMBER() OVER (ORDER BY [#table1].Name) AS RowNumber

    FROM [#table1]

    LEFT JOIN [#table2] ON [#table1].Id = [#table2].Id

    -- (1 row(s) affected)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply