Correlated Joins Using "Apply"

  • I had this problem too and solved it by row numbering like in the article. Although I had read about the APPLY function it did not come to my mind. I therefore think the article is indeed valuable, even if it is not the newest feature, because the use of the APPLY function is little known. Would be interesting if somebody could shed some more light on the performance. Thanks.

  • Hi,

    If i understood the problem (you did not show the result set)

    I think that you can also write this query this way (sql 2000/2005):

    SELECT

    lvt.ID,

    lvt.PhoneNumber,

    lvt.CreateDate,

    lvc.CallWindowStart,

    lvc.CallWindowEnd,

    lvc.LVCallDispositionID

    FROM

    LVTransaction lvt

    INNER JOIN

    LVCall lvc

    on (lvc.LVTransactionID = lvt.ID

    and lvc.CreateDate in

    (select top 3 CreateDate

    from LVCall lvc1

    where lvc1.LVTransactionID = lvc.LVTransactionID

    Order By CreateDate DESC))

    order by lvt.ID,lvt.CreateDate

    But it doesn't show as good preformance as the others (Cross Aplly & Row Number).

    Rani_w

  • How is APPLY any different from LEFT OUTER JOIN?

  • Well, once again, I've been enlightened. I've printed this article, and I bet I use it soon.

    Thanks.

  • From the tests below it seems like the ROW_NUMBER method is faster than the OUTER APPLY.

    How much faster depends on the number of calls per transaction (the difference is smaller if there are few calls per transaction).

    --Create transaction table

    CREATE TABLE #table1 (ID INT IDENTITY

    ,PhoneNumber VARCHAR(12)

    ,CreateDate DATETIME)

    --Create a call table

    CREATE TABLE #table2 (LVTransactionID INT

    ,CallWindowStart DATETIME

    ,CallWindowEnd DATETIME

    ,CallNumber INT)

    --Populate transactions with 8 unique phone numbers

    INSERT INTO #table1

    SELECT '310 404 1001', '20090227 01:00:00'

    UNION ALL

    SELECT '310 404 1002', '20090227 02:00:00'

    UNION ALL

    SELECT '310 404 1003', '20090227 03:00:00'

    UNION ALL

    SELECT '310 404 1004', '20090227 04:00:00'

    UNION ALL

    SELECT '310 404 1005', '20090227 05:00:00'

    UNION ALL

    SELECT '310 404 1006', '20090227 06:00:00'

    UNION ALL

    SELECT '310 404 1007', '20090227 07:00:00'

    UNION ALL

    SELECT '310 404 1008', '20090227 08:00:00'

    --For performance testing assume that these phone numbers were used in a large number of transactions each

    INSERT INTO #table1

    SELECT a.PhoneNumber, a.CreateDate FROM #table1 a, #table1 b, #table1 c, #table1 d

    --Populate table2 with one call for each transaction

    INSERT INTO #table2

    SELECT ID, CreateDate, DATEADD(mi,1,CreateDate), 1

    FROM #table1

    --A minute later a new call is placed for all but the first transaction

    INSERT INTO #table2

    SELECT LVTransactionID, DATEADD(mi,1,CallWindowStart), DATEADD(mi,1,CallWindowEnd), CallNumber + 1

    FROM #table2

    WHERE LVTransactionID > 1 AND CallNumber = 1

    --A minute later a new call is placed for all but the two first transactions

    INSERT INTO #table2

    SELECT LVTransactionID, DATEADD(mi,1,CallWindowStart), DATEADD(mi,1,CallWindowEnd), CallNumber + 1

    FROM #table2

    WHERE LVTransactionID > 2 AND CallNumber = 2

    --A minute later a new call is placed for all but the three first transactions

    INSERT INTO #table2

    SELECT LVTransactionID, DATEADD(mi,1,CallWindowStart), DATEADD(mi,1,CallWindowEnd), CallNumber + 1

    FROM #table2

    WHERE LVTransactionID > 3 AND CallNumber = 3

    --A minute later a new call is placed for all but the four first transactions

    INSERT INTO #table2

    SELECT LVTransactionID, DATEADD(mi,1,CallWindowStart), DATEADD(mi,1,CallWindowEnd), CallNumber + 1

    FROM #table2

    WHERE LVTransactionID > 4 AND CallNumber = 4

    --We now have data for testing the result and performance of different methods

    SELECT 'Testing performance: '+CAST((SELECT COUNT(*) FROM #table1) AS varchar(10))+' transactions and '+CAST((SELECT COUNT(*) FROM #table2) AS varchar(10))+' calls'

    DECLARE @StartTime DATETIME

    --Test CTE/ROW_NUMBER method

    SET @StartTime = GETDATE();

    WITH lvc AS

    (

    SELECT LVTransactionID

    , CallWindowStart

    , CallWindowEnd

    , CallNumber

    , ROW_Num = ROW_NUMBER() OVER

    (PARTITION BY LVTransactionID

    ORDER BY CallWindowStart DESC)

    FROM #table2

    )

    SELECT lvt.ID

    ,lvt.PhoneNumber

    , lvt.CreateDate

    , lvc.CallWindowStart

    , lvc.CallWindowEnd

    , lvc.CallNumber

    FROM #table1 lvt

    INNER JOIN lvc ON lvt.ID=lvc.LVTransactionID

    WHERE ROW_Num <= 3

    SELECT DATEDIFF(millisecond, @StartTime, GETDATE()) AS [ExecutionTime CTE/ROW_NUMBER]

    --Test the OUTER APPLY method

    SET @StartTime = GETDATE();

    SELECT

    lvt.ID,

    lvt.PhoneNumber,

    lvt.CreateDate,

    lvc.CallWindowStart,

    lvc.CallWindowEnd,

    lvc.CallNumber

    FROM

    #table1 lvt

    OUTER APPLY

    (

    SELECT top 3 *

    FROM #table2

    WHERE LVTransactionID = lvt.ID

    ORDER BY CallWindowStart DESC

    ) AS lvc

    SELECT DATEDIFF(millisecond, @StartTime, GETDATE()) AS [ExecutionTime OUTER APPLY]

    DROP TABLE #table1

    DROP TABLE #table2

    GO

  • OOps! I didn't realized this was and old post.


    * Noel

  • We've been using the ROW_NUMBER() method for a while now and it's great for reporting purposes. Not to mention, its speed is great compared to using UDFs or inserting into a temp table within a loop.

    To get the top X records per group

    SELECT * FROM

    (SELECT ROW_NUMBER() OVER (PARTITION BY ChowID ORDER BY ChowID, EndDate) as RowNumber, ChowID, EndDate, Name FROM StatLog) as a

    WHERE a.RowNumber <= @TopCount ORDER BY a.Name, a.EndDate

    /* Anything is possible but is it worth it? */

  • If you are going to write a piece of code then you have an obligation to the user. Why? This is a service business. The user experience is the most important part of the service business.

    If your code runs slower than another piece of code and both return the same result, you should use the other code. An article explaining the new "feature" runs poorer than the old "feature" may be providing a public service. But, I would really like to see code samples that help me provide the user with a better experience instead of a worse one.

  • my first thought was to set up a CTE for the transactions, then join that to the Call data and use TOP 3. If the data can be returned on multiple rows i'd think this would be pretty fast (haven't tested it).

  • P.S. great article!

  • Does a regular correlated sub-query perform the same thing?

    SELECT

    lvt.ID,

    lvt.PhoneNumber,

    lvt.CreateDate,

    lvc.CallWindowStart,

    lvc.CallWindowEnd,

    lvc.LVCallDispositionID

    FROM

    LVTransaction lvt

    OUTER APPLY --<<<<<<<<<<<<<< Replace OUTER APPLY with WHERE.

    (

    SELECT top 3 *

    FROM LVCall

    WHERE lvtransactionID = lvt.ID

    Order By CreateDate DESC

    ) as lvc

  • nope. you'd get this error as stated in the article:

    Msg 4104, Level 16, State 1, Line 2

    The multi-part identifier "lvt.ID" could not be bound.

    Hence the use of apply (or row_number).

  • Andy, I didn't do the same thing noted in the article:

    --Correlated INNER JOIN Attempt

    SELECT

    lvt.ID,

    lvt.PhoneNumber,

    lvt.CreateDate,

    lvc.CallWindowStart,

    lvc.CallWindowEnd,

    lvc.LVCallDispositionID

    FROM

    LVTransaction lvt

    INNER JOIN

    (SELECT TOP 3 * FROM LVCall WHERE LVTransactionID = lvt.ID ORDER BY CreateDate DESC) lvc

    on lvc.LVTransactionID = lvt.ID -- <<<<<<<<<<<<<<<< mine DOESN'T have this.

    Mine isn't a correlated inner join, it's a simple correlated subquery. Note the <<<<<<<<<<<'s. In mine, the inner query will reference the outside query for each row in the outside query. There is no "JOIN.... ON... "

  • you're still looking for lvt.ID in a query that only has LVCalls in the from statement

  • Ok. Sorry, I had a minor error in the WHERE Clause. I meant:

    SELECT

    lvt.ID,

    lvt.PhoneNumber,

    lvt.CreateDate,

    lvc.CallWindowStart,

    lvc.CallWindowEnd,

    lvc.LVCallDispositionID

    FROM

    LVTransaction lvt

    WHERE lvt.ID in ( --<<<<<<<<<<<<<<< THIS IS WHAT I MEANT

    SELECT top 3 *

    FROM LVCall

    WHERE lvtransactionID = lvt.ID

    Order By CreateDate DESC

    ) as lvc

    Try this. It is essentially the same thing.

    create table #T1 (pk int identity(1,1), C1 int)

    create table #T2 (pk int identity(1,1), C2 int)

    insert #T1 (C1) values (1)

    insert #T1 (C1) values (2)

    insert #T2 (C2) values (1)

    insert #T2 (C2) values (2)

    select *

    from #T1 T1

    where pk in (select pk from #T2 T2 where T1.pk = T2.pk)

    drop table #T1

    drop table #T2

Viewing 15 posts - 16 through 30 (of 49 total)

You must be logged in to reply to this topic. Login to reply