Query restructure

  • I have a query which updates columns in Table1 from max value in Table2 joining on Table1 and Table2 as follows:

    UPDATE TAB1

    SET TAB1.COL1 = TAB2.COL1,

    TAB1.COL2 = TAB2.COL2

    FROM TAB1 , TAB2 B

    WHERE TAB1.COL3 = TAB2.COL3

    AND TAB1.COL4 = TAB2.COL4

    AND TAB2.COL1 = ( SELECT MAX(COL1)

    FROM TAB2 C

    WHERE TAB1.COL3 = C.COL3

    AND TAB1.COL4 = C.COL4)

    Although query is serving the requirement I have but I suspect there may be a better way to structure this query. Any suggestions?

  • What does your execution plan look like? Based on what I can see of the criteria I don't really see a better way. The only thing you might want to do is create a derived table in the from instead of a correlated subquery in the where. Something like this:

    [font="Courier New"]UPDATE TAB1

       SET TAB1.COL1 = TAB2.COL1,

           TAB1.COL2  = TAB2.COL2

    FROM    

       TAB1 JOIN

       TAB2 B ON

           TAB1.COL3 = TAB2.COL3 AND

           TAB1.COL4 = TAB2.COL4 JOIN

       (SELECT  

           MAX(COL1) AS COL1

       FROM    

           TAB2 C

       WHERE  

           TAB1.COL3 = C.COL3 AND

           TAB1.COL4 = C.COL4) AS D ON

           B.Col1 = D.Col1

    [/font]

    As always test both, checking the plans and stats.

  • I think Jack meant something like:

    UPDATE T1

    SET Col1 = T2.Col1

    &nbsp&nbsp&nbsp&nbsp,Col2 = T2.Col2

    FROM Tab1 T1

    &nbsp&nbsp&nbsp&nbspJOIN Tab2 T2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspON T1.Col3 = T2.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T1.Col4 = T2.Col4

    &nbsp&nbsp&nbsp&nbspJOIN

    &nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT C.Col3, C.Col4, MAX(C.Col1) AS Col1

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 C

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspGROUP BY C.Col3, C.Col4

    &nbsp&nbsp&nbsp&nbsp) D

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspON T2.Col3 = D.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col4 = D.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col1 = D.Col1

    I think, however, that the TSQL UPDATE syntax is dangerous in this instance.

    If there are multiple rows of in TAB2 containing the same Col1, Col3 and Col4 values with

    different values in Col2 then a random last value of Col2 will be updated.

    To get around this, either the value of Col2 needs to be defined:

    UPDATE T1

    SET Col1 = T2.Col1

    &nbsp&nbsp&nbsp&nbsp,Col2 = T2.Col2

    FROM Tab1 T1

    &nbsp&nbsp&nbsp&nbspJOIN Tab2 T2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspON T1.Col3 = T2.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T1.Col4 = T2.Col4

    &nbsp&nbsp&nbsp&nbspJOIN

    &nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT C.Col3, C.Col4, MAX(C.Col1) AS Col1

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 C

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspGROUP BY C.Col3, C.Col4

    &nbsp&nbsp&nbsp&nbsp) D

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspON T2.Col3 = D.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col4 = D.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col1 = D.Col1

    &nbsp&nbsp&nbsp&nbspJOIN

    &nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT C2.Col3, C2.Col4, C2.Col1, Max(C2.col2) AS Col2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 C2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspGROUP BY C2.Col3, C2.Col4, C2.Col1

    &nbsp&nbsp&nbsp&nbsp) D2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspON D.Col3 = D2.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND D.Col4 = D2.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND D.Col1 = D2.Col1

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col2 = D2.Col2

    or the ANSI UPDATE syntax should be used to report an error if there is more than one value of Col2

    UPDATE Tab1

    SET Col1 =

    &nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT T21.Col1

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 T21

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspWHERE T21.Col3 = Tab1.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T21.Col4 = Tab1.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T21.Col1 =

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT MAX(T211.Col1)

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 T211

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspWHERE T211.Col3 = T21.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T211.Col4 = T21.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp)

    &nbsp&nbsp&nbsp&nbsp)

    &nbsp&nbsp&nbsp&nbsp,Col2 =

    &nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT T21.Col2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 T21

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspWHERE T21.Col3 = Tab1.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T21.Col4 = Tab1.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T21.Col1 =

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT MAX(T211.Col1)

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 T211

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspWHERE T211.Col3 = T21.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T211.Col4 = T21.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp)

    &nbsp&nbsp&nbsp&nbsp)

    WHERE EXISTS

    (

    &nbsp&nbsp&nbsp&nbspSELECT *

    &nbsp&nbsp&nbsp&nbspFROM Tab2 T2

    &nbsp&nbsp&nbsp&nbspWHERE T2.Col3 = Tab1.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col4 = Tab1.Col4

    )

    If the ANSI syntax looks a bit scary, then you could try a hybrid syntax:

    UPDATE T1

    SET Col1 = T2.Col1

    &nbsp&nbsp&nbsp&nbsp,Col2 =

    &nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT T21.Col2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 T21

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspWHERE T21.Col3 = D.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T21.Col4 = D.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T21.Col1 = D.Col1

    &nbsp&nbsp&nbsp&nbsp)

    FROM Tab1 T1

    &nbsp&nbsp&nbsp&nbspJOIN Tab2 T2

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspON T1.Col3 = T2.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T1.Col4 = T2.Col4

    &nbsp&nbsp&nbsp&nbspJOIN

    &nbsp&nbsp&nbsp&nbsp(

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspSELECT C.Col3, C.Col4, MAX(C.Col1) AS Col1

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspFROM Tab2 C

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspGROUP BY C.Col3, C.Col4

    &nbsp&nbsp&nbsp&nbsp) D

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspON T2.Col3 = D.Col3

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col4 = D.Col4

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspAND T2.Col1 = D.Col1

  • Thanks for the suggestions and the comment about possible multiple values of Col2 for Tab2. I will try different structures and check which execution plan looks better.

  • The execution plan for my original query and the derived table query is exactly the same with 50%-50% cost. I am sure that if I add the complexity for multiple values for Tab2.Col2 the execution plan will change.

    I have following non-clustered indexes on tables, which the execution plan is using:

    CREATE INDEX [IX_Tab2_01] ON [dbo].[Tab2]

    (

    [Col3] ASC,

    [Col4] ASC,

    [Col1] ASC

    )

    INCLUDE

    (

    [Col2]

    )

    CREATE INDEX [IX_Tab1_01] ON [dbo].[Tab1]

    (

    [Col3] ASC,

    [Col4] ASC

    )

  • please help me solve this problem:

    ALTER proc [dbo].[S_RptCustProd]

    @StartPrd varchar(6),

    @EndPrd varchar(6),

    @Class varchar(15),

    @Tipe varchar(1)

    as

    if exists(select * from tempdb.dbo.sysobjects where id=object_id('tempdb.dbo.#summary'))

    drop table #summary

    create table #summary (

    CustNo varchar(20) null,

    CustName varchar(100) null,

    Industry varchar(50) null

    )

    declare @Start int, @End int, @ColName varchar(50), @StartStr varchar(50), @InvoicePajak varchar(50)

    set @Start = left(@StartPrd,4)

    set @End = left(@EndPrd,4)

    insert into #summary (CustNo, CustName, Industry)

    select distinct B.CustNo, B.CustName, B.Industry

    from SalesData A

    left outer join MsCustomer B on A.CustNo = B.CustNo

    where A.InvoiceDate between @StartPrd+'01' and dateadd(dd,-1,dateadd(mm,1,@EndPrd+'01'))

    and A.ItemClass like @Class + '%'

    and A.CustNo like @Tipe + '%'

    while @Start <= @End begin

    set @ColName = 'C' + convert(varchar,@Start)

    set @StartStr = convert(varchar,@Start)

    exec ('alter table #summary add ' + @ColName + ' numeric(17,2) null')

    exec ('update #summary set ' + @ColName + ' = isnull(' +

    '(select sum(isnull(OriPrice,0)-isnull(OriDisc,0)-isnull(OriOutSource,0)- isnull((select sum(OriPPh) from PPh where InvoicePajak = '''+@InvoicePajak+''' and Period between '''+@StartPrd+''' and '''+@EndPrd+'''),0)) ' +

    ' from SalesData B ' +

    ' where #summary.CustName = B.Customer ' +

    ' and B.InvoiceDate between ''' + @StartPrd + '01'' and dateadd(dd,-1,dateadd(mm,1,''' + @EndPrd + '01'')) ' +

    ' and year(B.InvoiceDate) = ' + @StartStr + ' ' +

    ' and B.ItemClass like ''' + @Class + '%'' ' +

    ' and B.CustNo like '''+ @Tipe + '%''),0) ') /*+

    ' group by Customer),0) ')*/

    set @Start = @Start + 1

    end

    select * from #summary

    i use that procedure but there is an error like this:

    Msg 130, Level 15, State 1, Line 1

    Cannot perform an aggregate function on an expression containing an aggregate or a subquery.

    please help me.... thx...

  • KB (8/21/2008)


    The execution plan for my original query and the derived table query is exactly the same with 50%-50% cost.

    Ummmm.... never trust the "cost" of a batch in the execution plan. Not only is it frequently wrong, but sometimes it's downright misleading. Here's an example of code where one part takes 0% and the other takes 100%... but not in real life. If you actually measure the parts with statistics on, the reverse is true...

    SET NOCOUNT ON

    --=======================================================================================

    -- Recursive method shown by (Name with-held)

    --=======================================================================================

    PRINT '========== Recursive method =========='

    --===== Turn on some performance counters ===============================================

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

    --===== Execute the code being tested ===================================================

    DECLARE @DateVal DATETIME

    SET @DateVal = '2008-01-01'

    ;with mycte as

    (

    select @DateVal AS DateVal

    union all

    select DateVal + 1

    from mycte

    where DateVal + 1 < DATEADD(yy, 5, @DateVal)

    )

    select @Bitbucket = d.dateval

    from mycte d

    OPTION (MAXRECURSION 0)

    --===== Turn off the performance counters and print a separator =========================

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',90)

    GO

    --=======================================================================================

    -- Tally table method by Jeff Moden

    --=======================================================================================

    PRINT '========== Tally table method =========='

    --===== Turn on some performance counters ===============================================

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

    --===== Execute the code being tested ===================================================

    DECLARE @StartDate AS DATETIME

    SET @StartDate = '2008-01-01'

    SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))

    @Bitbucket = @StartDate-1+t.N

    FROM Tally t

    ORDER BY N

    --===== Turn off the performance counters and print a separator =========================

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',90)

    ... and if you don't have a Tally table yet, now's the time...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • pakaw_man (8/24/2008)


    please help me solve this problem:

    I'm off to bed just now... but I have a suggestion for you... start your own thread for this problem. It'll get more attention, that way. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply