Aggregate function

  • Hey all,

    I'm trying to pull an ID number field out of a table based on grouping another ID field and getting the MAX value of another field. My SQL statement looks like this:

    select apprentice_id, agreement_id, max(start_date)

    from apprenticeshiptraining.tbl_apprentice_agreement

    group by apprentice_id

    I know this won't work because the agreement_id isn't in the group by clause, but this is the basic idea of what I need to do.

    Anybody have an idea how I can do this? Thanks so much!

    Jude

  • To know how to handle it, we need to know what agreement_id you want.

    Do you want the start date of each? Add it to the group by.

    Do you want the latest one? Grab the MAX()

    First one? Grab the MIN()

    Something else entirely? Let us know.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Thanks so much for your quick reply! What I want is the agreement_id that goes with the MAX(start_date). For each apprentice_id I have several agreement_id's. I want to pull the one with the latest start_date.

    Jude

  • I think this will work:

    select

    apprentice_id,

    agreement_id

    from

    apprenticeshiptraining.tbl_apprentice_agreement AA Join

    (

    Select

    apprentice_id,

    Max(start_date) as max_date

    From

    apprenticeshiptraining.tbl_apprentice_agreement

    Group By

    apprentice_id

    ) MA

    AA.apprentice_id = MA.apprentice_id And

    AA.start_date = MA.max_date

  • Thanks Jack! That does seem to work.

    Jude

  • :hehe: Mornin Jack :hehe:

    I had an almost identical post typed out, then decided to try to get used to the new 2005 stuff that I always ignore. Here's an attempt using a CTE with Row_Number (Probably overkill for this particular query)

    ;WITH MA( Apprentice_id, Agreement_ID, RN )

    AS

    (SELECTApprentice_ID,

    Agreement_ID,

    ROW_NUMBER() OVER (PARTITION BY Apprentice_ID, Agreement_ID ORDER BY [Start_Date] DESC) RN

    FROM apprenticeshiptraining.tbl_apprentice_agreement)

    SELECT Apprentice_ID, Agreement_ID, [Start_Date]

    FROM apprenticeshiptraining.tbl_apprentice_agreement A

    INNER JOIN MA ON A.Apprentice_ID = MA.Apprentice_ID AND A.Agreement_ID = MA.Agreement_ID

    WHERE RN = 1

    See if that one works?

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Thanks Seth, that works too. Is there a performance benefit to doing it your way over Jack's? Just so you know, the code will be a sub-query inside of another query.

    Jude

  • Jude, if you could provide feedback on which one is more efficient I would appreciate it. You should be able to pull some overall cost from the execution plan and if you were to put the following in prior to executing the query and post back the output that would be great too.

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    I personally have had some performance issues with the "ROW_NUMBER() OVER" function so I am very curious to see how it works in your situation.

    David

    @SQLTentmaker

    β€œHe is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Q-Girl (12/26/2008)


    Thanks Seth, that works too. Is there a performance benefit to doing it your way over Jack's? Just so you know, the code will be a sub-query inside of another query.

    Jude

    They're completely different methods, so one of them is going to be more efficient. If I had to guess, I'd say Jack's would be, but I can't say for sure. Like David, I've seen some performance issues appear with Row_Number over, but usually that's only with huge number sequences. If you're going to have sets that only go up to 5 or so, it may pick up the speed quite a bit.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (12/26/2008)


    :hehe: Mornin Jack :hehe:

    I had an almost identical post typed out, then decided to try to get used to the new 2005 stuff that I always ignore. Here's an attempt using a CTE with Row_Number (Probably overkill for this particular query)

    ;WITH MA( Apprentice_id, Agreement_ID, RN )

    AS

    (SELECTApprentice_ID,

    Agreement_ID,

    ROW_NUMBER() OVER (PARTITION BY Apprentice_ID, Agreement_ID ORDER BY [Start_Date] DESC) RN

    FROM apprenticeshiptraining.tbl_apprentice_agreement)

    SELECT Apprentice_ID, Agreement_ID, [Start_Date]

    FROM apprenticeshiptraining.tbl_apprentice_agreement A

    INNER JOIN MA ON A.Apprentice_ID = MA.Apprentice_ID AND A.Agreement_ID = MA.Agreement_ID

    WHERE RN = 1

    See if that one works?

    I almost used a CTE as well but would have just put the derived table into a CTE and the referenced the CTE in the join in place of the derived table.

    Jude,

    You could just put the 2 queries in SSMS and turn on show actual execution plan and see which takes more resources.

    If using it in a sub-query I'd probably go with the derived table, but without seeing the whole query I can't say for sure.

  • Hey Jack,

    How do I turn on "show actual execution plan"? I'm a bit of a newbie to SQL Server. πŸ˜€

  • A few things

    1. To turn on actual execution plan, go to the Query option in the menu and select 'Include actual execution plan.'

    Alternatively, hit CTRL + M.

    2. I believe my query is actually wrong (You may have corrected it already). I don't think I should be using Agreement_ID in the PARTITION BY.

    3. Knowing your data is important here, as the derived table and CTE query are actually different in more than method. With the way that derived table is written, if you have more than 1 appointment that has the MAX(date), you will get multiple returns. I think there may be another problem with the Row_Number solution as well, I just haven't figured out what it is yet.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Hey Seth,

    As I am in a bit of a time crunch here, I'm going with Jack's solution. I really appreciate all the advice but I don't have time to play with it anymore. Have a great day!!!

    Jude

  • CTRL + L will show an estimated Execution plan. CTRL + M will show the actual execution plan when you run the query.

    There are some videos on Execution plans at jumpstarttv.com. A good starter video is here.

  • Thanks Jack!

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply