Permutations (difficult one)

  • Hi,

    We're tyring to find the best way to solve the following, hope you can help please.

    We have a fairly large table of users (2500+) and the users belong to 1 or more groups (at the moment the max number of groups a user is in is 32 but it changes daily).

    A sample data set looks like:

    Usr | Grp

    ==============

    SmithA | GrpA

    SmithA | GrpB

    MouseM | GrpD

    MouseM | GrpB

    MouseM | GrpC

    The result set we need would be:

    SmithA | GrpA

    SmithA | GrpB

    SmithA | GrpA,GrpB

    MouseM | GrpB

    MouseM | GrpC

    MouseM | GrpD

    MouseM | GrpB,GrpC

    MouseM | GrpB,GrpD

    MouseM | GrpC,GrpD

    MouseM | GrpB,GrpC,GrpD

    Note the alphabetical order of the grps - it's not particularly imporant but if they aren't sorted alphabetical then we'd need all combinations.

    Any ideas please?

  • What a truly bizarre result set you need. You have to return the data in three different types of ways?

    Can you post some ddl and sample data? I think you will need three different queries here, one for each "type" of result.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I saw the first part and thought... oh, use STUFF...

    and then I saw the odd requirement (show all permutations)

    Here's the sample data part I did from the example... since he's new and all...

    -- CREATE TABLES

    CREATE TABLE #Person(

    PersonID INT IDENTITY,

    UserID VARCHAR(15) NOT NULL,

    CONSTRAINT pkPerson PRIMARY KEY (PersonID));

    CREATE TABLE #Groups(

    GroupName VARCHAR(4) PRIMARY KEY );

    GO

    -- ADD DATA

    INSERT INTO #Person(UserID) VALUES ('SmithA'),('MouseM');

    INSERT INTO #Groups(GroupName) VALUES ('GrpA'),('GrpB'),('GrpC'),('GrpD');

    As you'll note in several people's signatures - the best way to get a quick answer, or ANY answer - is to make it as easy as possible for people to recreate your problem. This includes CREATE TABLE and INSERT statements to at least recreate your situation.

    For a newbie, nice job! But sample tables and data are a HUGE help.

  • Take a look at this article:

    Generating n-Tuples with SQL[/url]

    Then look into the discussion thread for a slight performance improvement on the approach:

    http://www.sqlservercentral.com/Forums/Topic1301485-3122-5.aspx

    Coding up the improved approach with your sample data yields this:

    WITH SampleData (Usr, Grp) AS

    (

    SELECT 'SmithA','GrpA'

    UNION ALL SELECT 'SmithA','GrpB'

    UNION ALL SELECT 'MouseM','GrpD'

    UNION ALL SELECT 'MouseM','GrpB'

    UNION ALL SELECT 'MouseM','GrpC'

    ),

    UNIQUEnTuples (n, Usr, Tuples, ID) AS

    (

    SELECT 1, Usr, CAST(Grp AS VARCHAR(8000)), Grp

    FROM SampleData

    UNION ALL

    SELECT 1 + n.n, n.Usr, t.Grp + ',' + n.Tuples, Grp

    FROM UNIQUEnTuples n

    CROSS APPLY

    (

    SELECT Grp

    FROM SampleData t

    WHERE t.Grp < n.ID AND t.Usr = n.Usr

    ) t

    -- WHERE n <= 5

    )

    SELECT *

    FROM UNIQUEnTuples

    ORDER BY Usr, n;

    This has the potential to run like a pig if you really do have 32 groups possible for a Usr. You might be able to improve the performance by running the anchor leg of the query first to a Temp table and then using a WHILE loop to implement the recursive leg.

    If you're not sure how to do this, let me know.

    Edit: I added the commented WHERE clause, because you might want to run it with that WHERE clause uncommented the first time you try with your 32 groups, just to see how long it takes.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • pietlinden (1/27/2014)


    ...

    and then I saw the odd requirement (show all permutations)

    ...

    Technically what the OP is looking for is combinations (not permutations).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • First of all that's amazing, it give exactly the required output. Thank you very much. Currently I'm really struggling to see how it works but I'll try and disect this evening.

    There's a small problem though, if I run this on my real data, it runs out of memory after 30 minutes.

    There are 474 users who belong to on average 6 groups (max 32).

    The average length of a group name is 20 characters (max 47).

    In the interests of being able to get the task nailed today (the business are badgering me) is there something we can do to optimise?

    I've created a table of the 88 distinct groups with a int column 1-88 indexing them.

    Every row in the user table now has an extra int column containing this GrpID.

    Is that helpful and how would it change the query pls?

  • I would bet it's not memory that's the issue. It's row counts.

    Consider the case where you said you have one user in 32 groups. That user alone would generate POWER(2,32)-1 rows (working from memory here so check the article to be sure). That's a lot of rows.

    You can try the loop I suggested but it will still have to deal with ridiculous numbers of rows.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (1/28/2014)


    That user alone would generate POWER(2,32)-1 rows (working from memory here so check the article to be sure). That's a lot of rows.

    Amusingly:

    select power(2 ,32)

    Msg 232, Level 16, State 3, Line 1

    Arithmetic overflow error for type int, value = 4294967296.000000.

    Besides, if I remember these formulas right, it's 32P31.

    Out of curiousity, what is the business requirement causing this technical requirement? Besides functionally painful, it's certainly nearly impossible for an end user to use at 32 groups crosswired into combination sets, so it strikes me as a midstep. What's the end goal from this source data?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Ah ha, thought how to reduce the problem size down to 12 groups 🙂

    Since the Roles with multiple groups consist of <=12 groups all I need to to do is filter the LDAP data and reduce it those entries that are in the set of groups that are part of multi group roles!...yippeee

    Original problem is

    A role consists of between 1 and x groups from a set of x groups.

    A role can only have a particular group once.

    A user may have more than one role

    Example:

    Role | Group

    Helpdesk | Password Reset

    Security | Password Reset,

    Security | Enable User

    Reception | Enable User

    Reception | Disable User

    The data I have is of the form

    User | Group

    SmithA | Password Reset

    SmithA | Enable User

    BloggsJ | Password Reset

    MouseM | Enable User

    MouseM | Disable User

    From that I need to deduce the users have the following roles

    User | Role

    SmithA | Security

    BloggsJ | Helpdesk

    MouseM | Reception

    Thanks

  • Evil Kraig F (1/28/2014)


    dwain.c (1/28/2014)


    That user alone would generate POWER(2,32)-1 rows (working from memory here so check the article to be sure). That's a lot of rows.

    Amusingly:

    select power(2 ,32)

    Msg 232, Level 16, State 3, Line 1

    Arithmetic overflow error for type int, value = 4294967296.000000.

    ...

    Indeed, but this works:

    SELECT POWER(CAST(2 AS BIGINT) ,32)

    -- Result: 4294967296


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • oliveraustin (1/28/2014)


    Ah ha, thought how to reduce the problem size down to 12 groups 🙂

    Since the Roles with multiple groups consist of<=12 groups and the rest of the all I need to to do is filter the LDAP data and reduce it those entries that are in the set of groups that are part of multi group roles!...yippeee

    Original problem is

    A role consists of between 1 and x groups from a set of x groups.

    A role can only have a particular group once.

    A user may have more than one role

    Example:

    Role | Group

    Helpdesk | Password Reset

    Security | Password Reset,

    Security | Enable User

    Reception | Enable User

    Reception | Disable User

    The data I have is of the form

    User | Group

    SmithA | Password Reset

    SmithA | Enable User

    BloggsJ | Password Reset

    MouseM | Enable User

    MouseM | Disable User

    From that I need to deduce the users have the following roles

    User | Role

    SmithA | Security

    BloggsJ | Helpdesk

    MouseM | Reception

    Thanks

    So does this mean you have a suitable solution now?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • oliveraustin (1/28/2014)


    Ah ha, thought how to reduce the problem size down to 12 groups 🙂

    Since the Roles with multiple groups consist of<=12 groups and the rest of the all I need to to do is filter the LDAP data and reduce it those entries that are in the set of groups that are part of multi group roles!...yippeee

    Original problem is

    A role consists of between 1 and x groups from a set of x groups.

    A role can only have a particular group once.

    A user may have more than one role

    Example:

    Role | Group

    Helpdesk | Password Reset

    Security | Password Reset,

    Security | Enable User

    Reception | Enable User

    Reception | Disable User

    The data I have is of the form

    User | Group

    SmithA | Password Reset

    SmithA | Enable User

    BloggsJ | Password Reset

    MouseM | Enable User

    MouseM | Disable User

    From that I need to deduce the users have the following roles

    User | Role

    SmithA | Security

    BloggsJ | Helpdesk

    MouseM | Reception

    Thanks

    I might be barking up the wrong tree completely here, but this may do what you want

    with RoleGroup as (

    SELECT *

    FROM (VALUES

    ('Helpdesk' , 'Password Reset'),

    ('Security' , 'Password Reset'),

    ('Security', 'Enable User'),

    ('Reception' , 'Enable User'),

    ('Reception' , 'Disable User')

    ) rg(Role, Grp)

    ),

    UserGroup as (

    SELECT *

    FROM (VALUES

    ('SmithA' , 'Password Reset'),

    ('SmithA' , 'Enable User'),

    ('BloggsJ' , 'Password Reset'),

    ('MouseM' , 'Enable User'),

    ('MouseM' , 'Disable User')

    ) ug(Usr, Grp)

    ),

    potentialRoles AS (

    SELECT usr, role, ug.grp

    FROM (SELECT usr, grp, COUNT(*) OVER (PARTITION BY usr) gCount, ROW_NUMBER() OVER (PARTITION BY Usr ORDER BY grp) N FROM UserGroup) ug

    INNER JOIN (SELECT role, grp, COUNT(*) OVER (PARTITION BY role) gCount, ROW_NUMBER() OVER (PARTITION BY role ORDER BY grp) N FROM RoleGroup) rg

    ON ug.grp = rg.grp and ug.N = rg.N and ug.gCount = rg.gCount

    )

    SELECT usr, role

    FROM potentialRoles

    GROUP BY usr, role;

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply