Permutations (difficult one)

Question

Permutations (difficult one)

oliveraustin

SSC Veteran

Points: 236
More actions
January 27, 2014 at 10:19 am

#303499

Hi,
We're tyring to find the best way to solve the following, hope you can help please.
We have a fairly large table of users (2500+) and the users belong to 1 or more groups (at the moment the max number of groups a user is in is 32 but it changes daily).
A sample data set looks like:
Usr | Grp
==============
SmithA | GrpA
SmithA | GrpB
MouseM | GrpD
MouseM | GrpB
MouseM | GrpC
The result set we need would be:
SmithA | GrpA
SmithA | GrpB
SmithA | GrpA,GrpB
MouseM | GrpB
MouseM | GrpC
MouseM | GrpD
MouseM | GrpB,GrpC
MouseM | GrpB,GrpD
MouseM | GrpC,GrpD
MouseM | GrpB,GrpC,GrpD
Note the alphabetical order of the grps - it's not particularly imporant but if they aren't sorted alphabetical then we'd need all combinations.
Any ideas please?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286573 More actions · Answer 1

What a truly bizarre result set you need. You have to return the data in three different types of ways?

Can you post some ddl and sample data? I think you will need three different queries here, one for each "type" of result.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

pietlinden SSC Guru Points: 63425 More actions · Answer 2

I saw the first part and thought... oh, use STUFF...

and then I saw the odd requirement (show all permutations)

Here's the sample data part I did from the example... since he's new and all...

-- CREATE TABLES

CREATE TABLE #Person(

PersonID INT IDENTITY,

UserID VARCHAR(15) NOT NULL,

CONSTRAINT pkPerson PRIMARY KEY (PersonID));

CREATE TABLE #Groups(

GroupName VARCHAR(4) PRIMARY KEY );

GO

-- ADD DATA

INSERT INTO #Person(UserID) VALUES ('SmithA'),('MouseM');

INSERT INTO #Groups(GroupName) VALUES ('GrpA'),('GrpB'),('GrpC'),('GrpD');

As you'll note in several people's signatures - the best way to get a quick answer, or ANY answer - is to make it as easy as possible for people to recreate your problem. This includes CREATE TABLE and INSERT statements to at least recreate your situation.

For a newbie, nice job! But sample tables and data are a HUGE help.

Dwain Camps SSC Guru Points: 86908 More actions · Answer 3

Take a look at this article:

Generating n-Tuples with SQL[/url]

Then look into the discussion thread for a slight performance improvement on the approach:

http://www.sqlservercentral.com/Forums/Topic1301485-3122-5.aspx

Coding up the improved approach with your sample data yields this:

WITH SampleData (Usr, Grp) AS

(

SELECT 'SmithA','GrpA'

UNION ALL SELECT 'SmithA','GrpB'

UNION ALL SELECT 'MouseM','GrpD'

UNION ALL SELECT 'MouseM','GrpB'

UNION ALL SELECT 'MouseM','GrpC'

),

UNIQUEnTuples (n, Usr, Tuples, ID) AS

(

SELECT 1, Usr, CAST(Grp AS VARCHAR(8000)), Grp

FROM SampleData

UNION ALL

SELECT 1 + n.n, n.Usr, t.Grp + ',' + n.Tuples, Grp

FROM UNIQUEnTuples n

CROSS APPLY

(

SELECT Grp

FROM SampleData t

WHERE t.Grp < n.ID AND t.Usr = n.Usr

) t

-- WHERE n <= 5

)

SELECT *

FROM UNIQUEnTuples

ORDER BY Usr, n;

This has the potential to run like a pig if you really do have 32 groups possible for a Usr. You might be able to improve the performance by running the anchor leg of the query first to a Temp table and then using a WHILE loop to implement the recursive leg.

If you're not sure how to do this, let me know.

Edit: I added the commented WHERE clause, because you might want to run it with that WHERE clause uncommented the first time you try with your 32 groups, just to see how long it takes.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 4

pietlinden (1/27/2014)
...
and then I saw the odd requirement (show all permutations)
...

Technically what the OP is looking for is combinations (not permutations).

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

oliveraustin SSC Veteran Points: 236 More actions · Answer 5

First of all that's amazing, it give exactly the required output. Thank you very much. Currently I'm really struggling to see how it works but I'll try and disect this evening.

There's a small problem though, if I run this on my real data, it runs out of memory after 30 minutes.

There are 474 users who belong to on average 6 groups (max 32).

The average length of a group name is 20 characters (max 47).

In the interests of being able to get the task nailed today (the business are badgering me) is there something we can do to optimise?

I've created a table of the 88 distinct groups with a int column 1-88 indexing them.

Every row in the user table now has an extra int column containing this GrpID.

Is that helpful and how would it change the query pls?

Dwain Camps SSC Guru Points: 86908 More actions · Answer 6

I would bet it's not memory that's the issue. It's row counts.

Consider the case where you said you have one user in 32 groups. That user alone would generate POWER(2,32)-1 rows (working from memory here so check the article to be sure). That's a lot of rows.

You can try the loop I suggested but it will still have to deal with ridiculous numbers of rows.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 7

dwain.c (1/28/2014)
That user alone would generate POWER(2,32)-1 rows (working from memory here so check the article to be sure). That's a lot of rows.

Amusingly:

select power(2 ,32)

Msg 232, Level 16, State 3, Line 1

Arithmetic overflow error for type int, value = 4294967296.000000.

Besides, if I remember these formulas right, it's 32P31.

Out of curiousity, what is the business requirement causing this technical requirement? Besides functionally painful, it's certainly nearly impossible for an end user to use at 32 groups crosswired into combination sets, so it strikes me as a midstep. What's the end goal from this source data?

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

oliveraustin SSC Veteran Points: 236 More actions · Answer 8

Ah ha, thought how to reduce the problem size down to 12 groups 🙂

Since the Roles with multiple groups consist of <=12 groups all I need to to do is filter the LDAP data and reduce it those entries that are in the set of groups that are part of multi group roles!...yippeee

Original problem is

A role consists of between 1 and x groups from a set of x groups.

A role can only have a particular group once.

A user may have more than one role

Example:

Role | Group

Helpdesk | Password Reset

Security | Password Reset,

Security | Enable User

Reception | Enable User

Reception | Disable User

The data I have is of the form

User | Group

SmithA | Password Reset

SmithA | Enable User

BloggsJ | Password Reset

MouseM | Enable User

MouseM | Disable User

From that I need to deduce the users have the following roles

User | Role

SmithA | Security

BloggsJ | Helpdesk

MouseM | Reception

Thanks

Dwain Camps SSC Guru Points: 86908 More actions · Answer 9

Evil Kraig F (1/28/2014)
dwain.c (1/28/2014)
That user alone would generate POWER(2,32)-1 rows (working from memory here so check the article to be sure). That's a lot of rows.
Amusingly:
select power(2 ,32)
Msg 232, Level 16, State 3, Line 1
Arithmetic overflow error for type int, value = 4294967296.000000.
...

Indeed, but this works:

SELECT POWER(CAST(2 AS BIGINT) ,32)

-- Result: 4294967296

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 10

oliveraustin (1/28/2014)
Ah ha, thought how to reduce the problem size down to 12 groups 🙂
Since the Roles with multiple groups consist of<=12 groups and the rest of the all I need to to do is filter the LDAP data and reduce it those entries that are in the set of groups that are part of multi group roles!...yippeee
Original problem is
A role consists of between 1 and x groups from a set of x groups.
A role can only have a particular group once.
A user may have more than one role
Example:
Role | Group
Helpdesk | Password Reset
Security | Password Reset,
Security | Enable User
Reception | Enable User
Reception | Disable User
The data I have is of the form
User | Group
SmithA | Password Reset
SmithA | Enable User
BloggsJ | Password Reset
MouseM | Enable User
MouseM | Disable User
From that I need to deduce the users have the following roles
User | Role
SmithA | Security
BloggsJ | Helpdesk
MouseM | Reception
Thanks

So does this mean you have a suitable solution now?

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

mickyT SSChampion Points: 10360 More actions · Answer 11

oliveraustin (1/28/2014)
Ah ha, thought how to reduce the problem size down to 12 groups 🙂
Since the Roles with multiple groups consist of<=12 groups and the rest of the all I need to to do is filter the LDAP data and reduce it those entries that are in the set of groups that are part of multi group roles!...yippeee
Original problem is
A role consists of between 1 and x groups from a set of x groups.
A role can only have a particular group once.
A user may have more than one role
Example:
Role | Group
Helpdesk | Password Reset
Security | Password Reset,
Security | Enable User
Reception | Enable User
Reception | Disable User
The data I have is of the form
User | Group
SmithA | Password Reset
SmithA | Enable User
BloggsJ | Password Reset
MouseM | Enable User
MouseM | Disable User
From that I need to deduce the users have the following roles
User | Role
SmithA | Security
BloggsJ | Helpdesk
MouseM | Reception
Thanks

I might be barking up the wrong tree completely here, but this may do what you want

with RoleGroup as (

SELECT *

FROM (VALUES

('Helpdesk' , 'Password Reset'),

('Security' , 'Password Reset'),

('Security', 'Enable User'),

('Reception' , 'Enable User'),

('Reception' , 'Disable User')

) rg(Role, Grp)

),

UserGroup as (

SELECT *

FROM (VALUES

('SmithA' , 'Password Reset'),

('SmithA' , 'Enable User'),

('BloggsJ' , 'Password Reset'),

('MouseM' , 'Enable User'),

('MouseM' , 'Disable User')

) ug(Usr, Grp)

),

potentialRoles AS (

SELECT usr, role, ug.grp

FROM (SELECT usr, grp, COUNT(*) OVER (PARTITION BY usr) gCount, ROW_NUMBER() OVER (PARTITION BY Usr ORDER BY grp) N FROM UserGroup) ug

INNER JOIN (SELECT role, grp, COUNT(*) OVER (PARTITION BY role) gCount, ROW_NUMBER() OVER (PARTITION BY role ORDER BY grp) N FROM RoleGroup) rg

ON ug.grp = rg.grp and ug.N = rg.N and ug.gCount = rg.gCount

)

SELECT usr, role

FROM potentialRoles

GROUP BY usr, role;