Different results from left outer join and an inner join

  • This one is really strange and unfortunately I can't give an example of the bug. I did recreate the scenario with temp tables, primary keys and unique indexes but it works. There may be an entirely other way to rewrite this which I welcome but here it goes. There is a table with a pool and company cross joined to a date table and I need to get the last rate equal to or less than the date from another table. At first I was using an inner join figuring there'd be at least one rate but lets face it, never trust the users :hehe: so I changed it to a left outer figuring if they didn't enter a rate for the month it would get last months rate. This will be averaged at the end but as I started building the code I wanted to make sure it was gettin the correct dates. The code that works in temp tables is the following

    BEGIN

    SET NOCOUNT ON

    CREATE TABLE #pool_def(

    pool_id CHAR(10),

    company_id CHAR(10),

    unique_id INT IDENTITY PRIMARY KEY,

    CONSTRAINT unqPoolDef UNIQUE(pool_id, company_id)

    )

    INSERT INTO #pool_def VALUES('pool', 'subcmp1')

    INSERT INTO #pool_def VALUES('pool', 'subcmp2')

    CREATE TABLE #c_doy2k(

    date_time DATETIME,

    unique_id INT IDENTITY PRIMARY KEY,

    CONSTRAINT unqDoy2k UNIQUE(date_time)

    )

    DECLARE @i INT

    SET @i = 0

    BEGIN TRANSACTION

    WHILE @i < 31 BEGIN

    INSERT INTO #c_doy2k

    VALUES(DATEADD(d,@i,'20100101'))

    SET @i = @i + 1

    END

    COMMIT TRANSACTION

    CREATE TABLE #srate(

    datex DATETIME,

    pool_id CHAR(10),

    company_id CHAR(10),

    date2 DATETIME,

    rate FLOAT,

    unique_id INT IDENTITY PRIMARY KEY,

    CONSTRAINT unqSrate UNIQUE(datex,pool_id, company_id),

    CONSTRAINT unqSrate2 UNIQUE(pool_id, company_id,date2)

    )

    INSERT INTO #srate

    SELECT '20100101', 'pool', 'subcmp1', '20100101', 1 UNION ALL

    SELECT '20100105', 'pool', 'subcmp1', '20100105', 2 UNION ALL

    SELECT '20100101', 'pool', 'subcmp2', '20100101', 3 UNION ALL

    SELECT '20100105', 'pool', 'subcmp2', '20100105', 4

    DECLARE

    @sDate DATETIME,

    @eDate DATETIME

    SELECT

    @sDate = '20100101',

    @eDate = '20100131'

    SELECT pd.pool_id, pd.company_id, doy.date_time, MAX(srate.datex) rateDate

    FROM #pool_def pd

    INNER JOIN #c_doy2k doy ON

    doy.date_time BETWEEN @sDate AND @eDate

    INNER JOIN #srate srate ON

    srate.pool_id = pd.pool_id AND

    srate.company_id = pd.company_id AND

    srate.datex <= doy.date_time

    WHERE pd.pool_id = 'pool'

    GROUP BY pd.pool_id, pd.company_id, doy.date_time

    ORDER BY pd.pool_id, pd.company_id, doy.date_time

    SELECT pd.pool_id, pd.company_id, doy.date_time, MAX(srate.datex) rateDate

    FROM #pool_def pd

    INNER JOIN #c_doy2k doy ON

    doy.date_time BETWEEN @sDate AND @eDate

    LEFT OUTER JOIN #srate srate ON

    srate.pool_id = pd.pool_id AND

    srate.company_id = pd.company_id AND

    srate.datex <= doy.date_time

    WHERE pd.pool_id = 'pool'

    GROUP BY pd.pool_id, pd.company_id, doy.date_time

    ORDER BY pd.pool_id, pd.company_id, doy.date_time

    DROP TABLE #c_doy2k

    DROP TABLE #pool_def

    DROP TABLE #srate

    END

    GO

    from the example you can see that the result set is

    POOL SUBCMP1 2010-01-01 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-02 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-03 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-04 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-05 00:00:00.0002010-01-05 00:00:00.000

    POOL SUBCMP1 2010-01-06 00:00:00.0002010-01-05 00:00:00.000

    and again with the left outer

    POOL SUBCMP1 2010-01-01 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-02 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-03 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-04 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-05 00:00:00.0002010-01-05 00:00:00.000

    POOL SUBCMP1 2010-01-06 00:00:00.0002010-01-05 00:00:00.000

    Now the interesting part is that in the production tables which are bigger and uglier than what I pieced together I get two different result sets from the following

    BEGIN

    DECLARE

    @sDate DATETIME,

    @eDate DATETIME

    SELECT

    @sDate = '20100101',

    @eDate = '20100131'

    SELECT pd.pool_id, pd.company_id, doy.date_time, MAX(srate.datex) rateDate

    FROM pool_def pd

    INNER JOIN c_doy2k doy ON

    doy.date_time BETWEEN @sDate AND @eDate

    INNER JOIN srate srate ON

    srate.pool_id = pd.pool_id AND

    srate.company_id = pd.company_id AND

    srate.datex <= doy.date_time

    WHERE pd.pool_id = 'pool'

    GROUP BY pd.pool_id, pd.company_id, doy.date_time

    ORDER BY pd.pool_id, pd.company_id, doy.date_time

    SELECT pd.pool_id, pd.company_id, doy.date_time, MAX(srate.datex) rateDate

    FROM pool_def pd

    INNER JOIN c_doy2k doy ON

    doy.date_time BETWEEN @sDate AND @eDate

    LEFT OUTER JOIN srate srate ON

    srate.pool_id = pd.pool_id AND

    srate.company_id = pd.company_id AND

    srate.datex <= doy.date_time

    WHERE pd.pool_id = 'pool'

    GROUP BY pd.pool_id, pd.company_id, doy.date_time

    ORDER BY pd.pool_id, pd.company_id, doy.date_time

    END

    GO

    but this is actually what i get from production

    POOL SUBCMP1 2010-01-01 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-02 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-03 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-04 00:00:00.0002010-01-01 00:00:00.000

    POOL SUBCMP1 2010-01-05 00:00:00.0002010-01-05 00:00:00.000

    POOL SUBCMP1 2010-01-06 00:00:00.0002010-01-05 00:00:00.000

    with the left outer i get

    POOL LEADCMP 2010-01-01 00:00:00.000NULL

    POOL LEADCMP 2010-01-02 00:00:00.000NULL

    POOL LEADCMP 2010-01-03 00:00:00.000NULL

    POOL LEADCMP 2010-01-04 00:00:00.000NULL

    POOL LEADCMP 2010-01-05 00:00:00.000NULL

    POOL LEADCMP 2010-01-06 00:00:00.000NULL

    The version of the server is as follows

    Microsoft SQL Server 2005 - 9.00.4053.00 (Intel X86) May 26 2009 14:24:20 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

    Any ideas whatsoever?

  • Aren't the differences what you'd expect?

    With an inner join there must be a match in the second table, so the only rows that can appear in the resultset will have values for the srate.datex column

    With a left outer join you're allowing there not to be a matching row at all in the srate table. When there isn't, the column values will appear as null, exactly what you're seeing in the second result set.

    It means that you have rows in the pool_def/c_doy2k tables that has no match at all on the conditions that you specified in the srate table

    Sorry, but I see nothing that's unexpected here.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • To reproduce that in your test example, change your insert into the first temp table as follows:

    INSERT INTO #pool_def VALUES('pool', 'subcmp1')

    INSERT INTO #pool_def VALUES('pool', 'subcmp2')

    INSERT INTO #pool_def VALUES('pool', 'LEADCMP')

    Because there are no rows in the #srate table that have the company_id = LEADCMP, when you do a left join, you'll see the rows with leadcmp appear, with null values from #srate.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/21/2010)


    Aren't the differences what you'd expect?

    With an inner join there must be a match in the second table, so the only rows that can appear in the resultset will have values for the srate.datex column

    With a left outer join you're allowing there not to be a matching row at all in the srate table. When there isn't, the column values will appear as null, exactly what you're seeing in the second result set.

    It means that you have rows in the pool_def/c_doy2k tables that has no match at all on the conditions that you specified in the srate table

    Sorry, but I see nothing that's unexpected here.

    You are absolutely right, I didn't even notice the lead company, thank you. brain dead :w00t:

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply