Advnaced Grouping Help Needed

  • I have a query where I need to return every 10th row of a particular group upto a certian amount which is different for each group. Example:

    RowID, Color, Salesman

    1,blue,bob

    2,red,bob

    3,purple,bob

    4,purple,jane

    5,blue,jane

    6,red,jane

    What I need is every 2nd row of all of the rows but only 1 for bob & 2 for jane. So the result will look like:

    2,red,bob

    4,purple,jane

    6,red,jane

    I got as far as the row count (every 2nd), but am stuck at how to limit the amount differently for each group. I appreciate any help.

    Thank You In Advance.

    JT

  • Jason,

    Is there a business reason for this?

    Todd Fifield

  • I know... My original question back to the requestor, but yes. What we are doing is an extract of sample data to be used for testing and they want so many of one group to test and more or less of others to test.

  • Not sure I understand what you're after. Try these two queries to see if they help.

    BEGIN TRAN

    --Sample Data

    SELECT RowID, Color, Salesman

    INTO #sales

    FROM (SELECT 1, 'blue', 'bob'

    UNION ALL SELECT 2, 'red', 'bob'

    UNION ALL SELECT 3, 'purple', 'bob'

    UNION ALL SELECT 4, 'purple', 'jane'

    UNION ALL SELECT 5, 'blue', 'jane'

    UNION ALL SELECT 6, 'red', 'jane') a(RowID, Color, Salesman)

    --First go

    SELECT RowID, Color, Salesman

    FROM #sales

    WHERE RowID % 2 = 0

    /*--------The above returns--------*RowID Color Salesman

    ----------- ------ --------

    2 red bob

    4 purple jane

    6 red jane

    \*---------------------------------*/

    --But you wanted to ensure that there is only ever 1 row for

    --bob and 2 for jane

    SELECT RowID, Color, Salesman

    FROM (SELECT RowID, Color, Salesman,

    ROW_NUMBER() OVER (PARTITION BY Salesman ORDER BY RowID DESC) AS rn

    FROM (SELECT RowID, Color, Salesman

    FROM #sales

    WHERE RowID % 2 = 0) innerQ ) outerQ

    WHERE rn <= 2

    /*--------The above returns--------*RowID Color Salesman

    ----------- ------ --------

    2 red bob

    4 purple jane

    6 red jane

    \*---------------------------------*/

    ROLLBACK


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Maybe this one will do?

    WITH SampleData AS

    (

    SELECT * FROM (VALUES

    (0, 'blue', 'bob'),

    (1, 'red', 'bob'),

    (3, 'purple', 'bob'),

    (4, 'purple', 'jane'),

    (5, 'blue', 'jane'),

    (7, 'red', 'jane'),

    (8, 'green', 'bob'),

    (9, 'yellow', 'bob'),

    (10, 'black', 'jane'),

    (11, 'white', 'jane')

    ) D(ID, Color, FirstName)

    ),

    NumberedSampleData AS

    (

    SELECT

    ID, Color, FirstName,

    ROW_NUMBER() OVER (ORDER BY ID) RowNum

    FROM

    SampleData

    ),

    FilteredSampleData AS

    (

    SELECT

    ID, Color, FirstName, RowNum,

    ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY RowNum) FirstNameRowNum

    FROM

    NumberedSampleData

    WHERE

    RowNum % 2 = 0

    )

    SELECT

    *

    FROM

    FilteredSampleData

    WHERE

    FirstNameRowNum <= CASE FirstName WHEN 'Bob' THEN 1 WHEN 'Jane' THEN 2 ELSE FirstNameRowNum END

    ORDER BY

    ID

    Assumptions:

    - The n-th row is determined by the order of ID's

    - ID's do not form a consecutive sequence

    EDIT: Oops, I missed the column headers in the original sample data, so I used my own one. If the second assumption is not valid, the numbering of rows in NumberedSampleData can be eliminated. Also, if there are a lot of groups to filter on, maybe it's better to put these groups in a separate table together with the number of rows to filter on, rather than hard-coding the number of rows in the WHERE clause.

  • If you're looking for a fixed percentage of random rows from each partition (salesperson), have you considered using NTILE?

    SELECT

    *

    FROM (

    SELECT *, NTILE(10) OVER(PARTITION BY Salesman ORDER BY NEWID()) AS [Tenth part]

    FROM MyTable

    ) d

    WHERE [Tenth part] = 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you everyone for the input. I greatly appreciate your time and effort. Each one of the postings gave me a different way to run with this. for the N-th row part of the query I am already using the ROW_NUMBER () OVER function and is working great. But I am going to try performing "UNIONS" with the different criteria. Luckily there is only about 9 groups, but I am dealing with millions of rows and therefore the N-th row must be duplicated for each group. I will post more when I have a solid solution in place.

    Thanks again for the help.

    JT

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply