Top N and group by query...

  • swjohnson

    Hall of Fame

    Points: 3254

    Ok, my brain is fried and need some extra eyes and brains....

    I have a table with three fields

    EmailID (int and identity)

    ProjID (int)

    EmailStatus (int)

    Ok, lets says that in the table are 3000 records for ProjID 1454, 50 records for ProjID 1455, and 2 records for ProjID 1456 (all have the same status =100)

    What I need to do in a set based solution (I can accomplish it in a cursor or temp table format but want something faster...) is Select the Top 200 records for each ProjID where emailStatus=100. 

    I have been trying varying combinations of Top N and Group by to no avail.  Basically the query should return 200 records for ProjID 1454, the 50 records for 1455 and the 2 records for 1456. 

    Again, I have code to do this via a cursor and temp tables but want to see about a set based solution.  (We are on SQL 2000) 

    Any thoughts?

    Thanks

    SJ

  • Ram Ram

    Ten Centuries

    Points: 1079

    Try this out

    DECLARE @Rows INT

    SELECT  @Rows = 200

    -- Replace @tbl with your actual tablename

     

    SELECT A.*

    FROM @tbl AS A

     LEFT JOIN  (

       SELECT  ProjId,

        MIN(EmailId) AS MinId,

        MAX(EmailId) AS MaxId,

        COUNT(ProjId) AS RCnt

       FROM @tbl

       GROUP BY ProjId

      &nbsp AS Derived

      ON A.ProjId = Derived.ProjId

    WHERE A.EmailId BETWEEN Derived.MinId AND Derived.MinId + @Rows-1

    Ram

  • Andrew in WV

    SSCommitted

    Points: 1655

    Ram's solution works as long as all values of EmailID for a particular project are contiguous. I think this will work regardless:

    SELECT EmailID, ProjID, EmailStatus
    FROM <TableName>
    WHERE EmailID IN (
    	SELECT TOP 200 EmailID
    	FROM <TableName> IT
    	WHERE <TableName>.ProjID = IT.ProjID
    	  AND EmailStatus = 100
    )
    ORDER BY ProjID, EmailStatus, EmailID
    

    --Andrew

  • swjohnson

    Hall of Fame

    Points: 3254

    Thanks to both you and Ramamoorthy.  After a break and some more tinkering I was able to come up with something similar to Andrew's query.  However, both work for what I need but Ramamoorthy's is faster in my experiments and the execution plans are quite different.  Especially when I use it on my large recordsets (over 100,000 records pending) and it is not dependent upon consecutive/contiguous EmailIDs. 

    Thanks again!

    SJ

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply