Self Join

  • I'm working on a report. I need the details and a summary from one table, basically. I'm using a self-join query. It doesn't perform really well, but it certainly does what I need. Of course the redundant data is poor database design, but this is just a report. So the report is based on a "select" like the one at the end of the code. Is there a better way to do this sort of thing? Here's the code:

    declare @transactions table (
        TranId int not null primary key identity(1,1)
        ,TranCode varchar(4) not null
        ,TranDescription varchar(30) null
        ,TranAmount money
    )

    insert into @transactions(TranCode, TranDescription, TranAmount)
    values
         ('100', 'REST', 100.00)
        ,('100', 'REST', 100.00)
        ,('100', 'REST', 100.00)
        ,('100', 'REST', 100.00)
        ,('200', 'BAR', 200.00)
        ,('200', 'BAR', 120.00)
        ,('200', 'BAR', 160.00)
        ,('200', 'BAR', 9.00)
        ,('200', 'BAR', 45.00)
        ,('300', 'GIFT', 78.00)
        ,('300', 'GIFT', 82.00)
        ,('300', 'GIFT', 35.00)
        ,('300', 'GIFT', 100.00)
        ,('300', 'GIFT', 120.00)
        ,('300', 'GIFT', 72.00)
        ,('300', 'GIFT', 66.00)

    select t.TranCode, t.TranDescription, t.TranAmount, TranTotals.TranCodeTotal
    from @transactions t
    left join (
        select TranCode, SUM(TranAmount) as TranCodeTotal
        from @transactions
        group by TranCode
    ) as TranTotals on t.TranCode = TranTotals.TranCode
    order by TranCode


  • You could use a SUM Window function as an alternative...

    select
        TranCode
        ,TranDescription
        ,TranAmount
        ,sum(TranAmount) over (partition by TranCode) as TranCodeTotal
    from @transactions
    order by TranCode

  • Thanks for the alternative. Hmm, it sure is cleaner syntax-wise. The interesting thing is that, if I understand the execution plan correctly, it takes twice as long. Rather shocking! I would have expected them to be virtually identical.

    When I look at the execution plan, the self join takes 4% of the total batch, while the "window" version takes 8% of the batch. I'm not really great at reading execution plans, but that seems like twice as long.


  • For this kind of analysis, you either have to look closely at the execution plan or have a realistic amount of test data. If you look at the execution plan, you'll see that the self join does two clustered index scans against your table. While that may be fine with a few rows, it is definitely not the most efficient approach when you get to millions or billions of rows. So, for larger amounts of data, I do believe the windowing approach is more efficient.



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL

  • Makes good sense. I'll have to study the plan. I would think the window and the self join end up with the same plan. Probably a good lesson to be learned here.

    Thanks and Happy Holidays.


  • Don't be fooled by the % of total batch numbers in the execution plan. 8% of a batch that runs in 20 ms is much better than 4% of a batch that runs in 500 ms. You have to weigh these numbers against CPU and logical reads.

    You should compare the execution plan against the numbers you see from SET STATISTICS IO, TIME ON.
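    As a sketch of that measurement approach, both queries from earlier in the thread can be run in one batch with statistics enabled; the numbers to compare appear in the Messages tab, not in the plan's cost percentages. (This assumes the @transactions table variable from the original post is declared and populated in the same batch.)

        -- Compare actual work done, not estimated cost percentages.
        set statistics io, time on;

        -- Self-join version: look at "logical reads" and "CPU time" for this statement.
        select t.TranCode, t.TranDescription, t.TranAmount, TranTotals.TranCodeTotal
        from @transactions t
        left join (
            select TranCode, SUM(TranAmount) as TranCodeTotal
            from @transactions
            group by TranCode
        ) as TranTotals on t.TranCode = TranTotals.TranCode
        order by TranCode;

        -- Window-function version: compare its reads and CPU time against the above.
        select TranCode, TranDescription, TranAmount,
            sum(TranAmount) over (partition by TranCode) as TranCodeTotal
        from @transactions
        order by TranCode;

        set statistics io, time off;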

    Todd Fifield

  • I agree with Todd, the cost percentages by themselves aren't that useful. You'll have to compare I/O, CPU, and duration between the two queries with varying amounts of data to see which is the more efficient for your purposes.

  • Costs in execution plans are, by and large, junk data. Use the I/O and time stats instead.

    Another option would be to Outer Apply the sub-query, but that's likely to get the same execution plan as the Outer Join.
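    A sketch of that OUTER APPLY variant, using the table and columns from the original post (as noted, it will likely produce much the same plan as the LEFT JOIN):

        -- Correlated sub-query applied per row; OUTER APPLY keeps rows
        -- even if the sub-query returned nothing, like a left join.
        select t.TranCode, t.TranDescription, t.TranAmount, TranTotals.TranCodeTotal
        from @transactions t
        outer apply (
            select SUM(x.TranAmount) as TranCodeTotal
            from @transactions x
            where x.TranCode = t.TranCode
        ) as TranTotals
        order by t.TranCode;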

    In the real query, are you running against a table variable or a "real" table? That will have a huge impact on the execution plan and the overall speed, in most cases. (Table variables reduce the "cost", but usually take more work and time to run. One of the reasons "costs" are largely a junk value.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Yeah, I've come to that conclusion. I was trying to use that stat as a simple shortcut in this case. Should have known better!

    Thanks all!

