Get list of all workers having same SET of Rates

  • Hi All,

    I have attached some test data for you that has two temp tables "#worker" and "#worker_rate".

    The issue is to find all workers who are sharing SAME SET of rate_codes.

    I'm able to get the output as "workers sharing same rate_codes", but unfortunately I could not get the list of workers sharing same SET of rate_codes. Also definition of SAME SET is not defined.

    I don't know what I'm missing. Any help will be appreciated.

  • nice job providing the sample data!

    in this case, you need to get the "sets" organized first so they can be compared.

    Someone else may have another method, but i thought using FOR XML to create a comma delimtied list would work fine.

    select distinct T1.worker_id,AllRateCodes.Codez from #worker_rate T1

    cross apply (SELECT Codez = STUFF((SELECT ',' + T2.rate_code

    from #worker_rate T2

    where T1.worker_id = T2.worker_id

    ORDER BY rate_code

    FOR XML PATH('')),1,1,'')) AllRateCodes

    now that that is organized, I'm not sure if you need more than that, since it's visualized, ort if you need to join that resultset agaisnt itself so you can compare them?

    i'm ASSUMING workerid is integers here really so i can order by and prevent duplicates(ie 1=2 and 2=1, which is repetitively redundant)

    With MyCTE

    AS

    (

    select distinct T1.worker_id,AllRateCodes.Codez from #worker_rate T1

    cross apply (SELECT Codez = STUFF((SELECT ',' + T2.rate_code

    from #worker_rate T2

    where T1.worker_id = T2.worker_id

    ORDER BY rate_code

    FOR XML PATH('')),1,1,'')) AllRateCodes

    )

    SELECT * FROM MYCTE T1 INNER JOIN MyCTE T2

    ON T1.Codez = T2.Codez

    WHERE T1.worker_id <> T2.worker_id

    AND T1.worker_id < T2.worker_id

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Here's a pure TSQL method to compare with Lowell's string-concatenation method.

    ;WITH Rates AS (SELECT worker_id, rate_code, rate_count = COUNT(*) OVER(PARTITION BY worker_id)

    FROM #worker_rate r)

    SELECT

    rate_group = DENSE_RANK() OVER(ORDER BY r1.worker_id),

    r2.worker_id

    FROM Rates r1

    INNER JOIN Rates r2

    ON r2.worker_id >= r1.worker_id

    AND r2.rate_code = r1.rate_code

    AND r2.rate_count = r1.rate_count

    GROUP BY r1.worker_id, r2.worker_id

    HAVING MAX(r1.rate_count) = COUNT(*)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • To the OP: Please specify precisely what your expected results are. Do not describe them. List out the rows you want to see.

    I'm thinking this is a relational division problem and I'm curious which answer provided previously is correct.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dear All,

    Thanks for your kind support. It was really not easy to write a script specially when requirement is not clear. I made a few changes in suggested queries and it gave me desired results.

    With MyCTE

    AS

    (

    select distinct T1.worker_id,AllRateCodes.SET_OF_CODES from #worker_rate T1

    cross apply

    (

    SELECT SET_OF_CODES = STUFF((SELECT ',' + T2.rate_code

    from #worker_rate T2

    where T1.worker_id = T2.worker_id

    ORDER BY rate_code

    FOR XML PATH('')),1,1,'')

    ) AllRateCodes

    )

    SELECT

    T1.WORKER_ID, T1.SET_OF_CODES, DENSE_RANK() OVER (ORDER BY T1.SET_OF_CODES) AS SNO FROM MYCTE T1 ORDER BY SET_OF_CODES

    The required output was like this:

    If worker 1 has rates a/b/c

    then we have to find other workers who have rates a/b/c.

    If worker 2 has a/b/c/d

    It should not show with worker 1 as a/b/c and with other worker as a/b/c/d.

    So worker_id will be distinct in the list.

    worker_id set_of_codes sno

    4 1,2 1

    1 1,2,3 2

    2 1,2,3 2

    3 1,2,3 2

  • Another possibility might be to use the intersect operator

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Just wondering what the grand-masters will say about a checksum solution?

    From what I know about checksums they are pretty reliable way to tell one set from the other... Any comments are welcome.

    ; WITH CTE

    as (

    SELECT worker_id,

    CHECKSUM_AGG(CHECKSUM(rate_code)) chk

    FROM #worker_rate

    GROUP BY worker_id

    )

    SELECT worker_id,

    DENSE_RANK() OVER (ORDER BY chk) SNO

    FROM CTE

  • a.myasnikov (4/8/2014)


    Just wondering what the grand-masters will say about a checksum solution?

    From what I know about checksums they are pretty reliable way to tell one set from the other... Any comments are welcome.

    ; WITH CTE

    as (

    SELECT worker_id,

    CHECKSUM_AGG(CHECKSUM(rate_code)) chk

    FROM #worker_rate

    GROUP BY worker_id

    )

    SELECT worker_id,

    DENSE_RANK() OVER (ORDER BY chk) SNO

    FROM CTE

    That would be a great idea and it definitely shows "thinking outside the box" but... CHECKSUM and CHECKSUM_AGG use a simple "Exlusive OR" (an "adder" by any other name) that allows for duplicates to occur.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I do not get the right result with the CHECKSUM suggestion. May be I am doing something wrong.

    This is the result I am getting:

    worker_id SNO

    4 1

    1 2

    2 2

    3 2

    This is the result from the XML approach:

    worker_id Codezworker_id Codez

    1 1,2,32 1,2,3

    1 1,2,33 1,2,3

    2 1,2,33 1,2,3

    The nice thing about the XML approach is that you get also the list of rate codes. IMHO it would be nice having an ordered set function for string aggregation (http://connect.microsoft.com/SQLServer/feedback/details/728969/feature-request-ordered-set-functions-within-group-clause).

    There is another solution I learned from Peter (Peso) which yield a much better performance. Here is a nice article comparing three different approaches.

    http://social.technet.microsoft.com/wiki/contents/articles/22165.t-sql-relational-division.aspx

    WITH C1 AS (

    SELECT

    worker_id,

    COUNT(*) AS cnt,

    MIN(rate_code) AS min_rc,

    MAX(rate_code) AS max_rc

    FROM

    #worker_rate

    GROUP BY

    worker_id

    )

    , C2 AS (

    SELECT

    B.worker_id,

    B.rate_code,

    A.cnt,

    A.min_rc,

    A.max_rc

    FROM

    C1 AS A

    INNER JOIN

    #worker_rate AS B

    ON A.worker_id = B.worker_id

    )

    SELECT

    A.worker_id AS lwid,

    B.worker_id AS rwid

    FROM

    C2 AS A

    INNER JOIN

    C2 AS B

    ON A.worker_id < B.worker_id

    AND A.rate_code = B.rate_code

    AND A.cnt = B.cnt

    AND A.min_rc = B.min_rc

    AND A.max_rc = B.max_rc

    GROUP BY

    A.worker_id,

    B.worker_id

    HAVING

    COUNT(*) = MIN(B.cnt);

    GO

  • May I ask, as it doesn't appear to be obvious to me at any rate, what exactly are you looking for as a final result set? Can you show us what this would look like based on your sample data?

  • article posted today

    http://www.sqlservercentral.com/articles/T-SQL/107611/

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Have a look at this article which I wrote about this question and a way of getting the results using binary flags that performs well with many rows.

    http://www.sqlservercentral.com/articles/T-SQL/107611

  • If you read the article, you should probably also look at the discussion thread:

    http://www.sqlservercentral.com/Forums/Topic1559372-3529-1.aspx?Update=1


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply