Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

All Possible Combinations Loop Expand / Collapse
Author
Message
Posted Thursday, February 07, 2013 11:15 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, April 23, 2013 10:40 PM
Points: 22, Visits: 45
Hi,

I am trying to build something that combines all values of all variables using SQL Server. For Example:

A B C D
A1 B1 C1 D1
A2 B2 C2
A3 C3
A4 C4
C5

A1
B1
C1
D1
A1 B1
A1 B2
A1 B1 C1 D1 and so on..

Basically making all possible filter combinations on variables. I have been a little bit successful but the problem is with storage because the combinations can be so many that they are running me out of storage space.

I wanted your help in trying to make a SQL procedure to make all these filter combinations but more importantly comparing each combination as soon as they are generated with another table on a criteria to check performance before deciding to save or delete them.

This is my code so far:

DECLARE @i INT  
SET @i = 0
WHILE (@i < 15)
BEGIN
SET @i = @i + 1
EXEC ('CREATE TABLE VAR' + @i + '(VAR' + @i + ' VARCHAR(50))')
END

INSERT INTO VAR1 (VAR1) SELECT DISTINCT VAR1 FROM [Training Data]
DECLARE @i INT
SET @i = 0
WHILE (@i < 15)
BEGIN
SET @i = @i + 1
EXEC('INSERT INTO VAR' + @i + ' (VAR' + @i + ')' + ' SELECT DISTINCT VAR' + @i + ' FROM [Training Data]')
END

CREATE TABLE ALL_COMB (VAR1 VARCHAR(50), VAR2 VARCHAR(50), VAR3 VARCHAR(50), VAR4 VARCHAR(50), VAR5 VARCHAR(50), VAR6 VARCHAR(50), VAR7 VARCHAR(50), VAR8 VARCHAR(50), VAR9 VARCHAR(50), VAR10 VARCHAR(50), VAR11 VARCHAR(50), VAR12 VARCHAR(50), VAR13 VARCHAR(50), VAR14 VARCHAR(50), VAR15 VARCHAR(50))
INSERT INTO ALL_COMB (VAR1, VAR2, VAR3, VAR4, VAR5, VAR6, VAR7, VAR8, VAR9, VAR10, VAR11, VAR12, VAR13, VAR14, VAR15) SELECT * FROM VAR1, VAR2, VAR3, VAR4, VAR5, VAR6, VAR7, VAR8, VAR9, VAR10, VAR11, VAR12, VAR13, VAR14, VAR15
ORDER BY VAR1, VAR2, VAR3, VAR4, VAR5, VAR6, VAR7, VAR8, VAR9, VAR10, VAR11, VAR12, VAR13, VAR14, VAR15;

OR CAN I ATLEAST LIMIT THE OBSERVATIONS GOING INTO ALL_COMB BY STARTING OBSERVATION AND ENDING OBSERVATION.

Thanks in advance for the help.

-Akber.
Post #1417264
Posted Thursday, February 07, 2013 12:04 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Yesterday @ 1:17 PM
Points: 8,641, Visits: 8,273
Hi and welcome to SSC! It is really hard to figure out what you are trying to do here based on your description. Can you please try to reword what you are trying to do. It would probably be a good idea to post ddl and sample data. You can read about best practices for posting questions by reading the article found at the first link in my signature.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Post #1417287
Posted Thursday, February 07, 2013 12:16 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, April 23, 2013 10:40 PM
Points: 22, Visits: 45
Ok.

What I am trying to do is to filter a dataset on all possible combinations.

My code does create combinations in a separate dataset but I am unable to completely run it because of the size which will run up to 800 TB and I dont have that space.

Is there a way to slow SQL down and generate these combination one by one with a condition that checks performance of each combination against a criteria specified?

Thanks for trying to help!
Post #1417300
Posted Thursday, February 07, 2013 12:23 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Yesterday @ 1:17 PM
Points: 8,641, Visits: 8,273
akberali67 (2/7/2013)
Ok.

What I am trying to do is to filter a dataset on all possible combinations.

My code does create combinations in a separate dataset but I am unable to completely run it because of the size which will run up to 800 TB and I dont have that space.

Is there a way to slow SQL down and generate these combination one by one with a condition that checks performance of each combination against a criteria specified?

Thanks for trying to help!


You are familiar with your project but I am not. I can see your code that is creating some tables and then inserting data into them. I get what you are saying about disc space but I don't understand what you mean about slow sql down??? I shudder to think that you are creating 800TB of data using a loop. Does this take somewhere around 3 weeks to run? If you can clarify what you are trying to do I am sure there is a way to do whatever it is without needing an additional 800TB of disc space!!!


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Post #1417305
Posted Thursday, February 07, 2013 12:33 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, April 23, 2013 10:40 PM
Points: 22, Visits: 45
The code ran for about a day on my 3 TB partition and I got an error. My dataset is really small but the combinations increase the size. This is an example:

A B C D Outcome
A1 B1 C1 D1 1
A2 B2 C2 0
A3 C3 1
A4 C4 1
C5 0

So, like in excel I filter the data and filter on the variable A on A1, then another combination to filter would be on variable A on A2, B on B2 assuming B2 occurs multiple times. So, I am trying to make all filter combinations and check how the outcome is for them.

I hope I have explained my problem well enough.
Post #1417313
Posted Thursday, February 07, 2013 12:37 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, April 23, 2013 10:40 PM
Points: 22, Visits: 45
A,B,C,D,Outcome
A1,B1,C1,D1,1
A2,B2,C2,,0
A3,,C3,,1
A4,,C4,,1
,,C5,,0
Post #1417316
Posted Thursday, February 07, 2013 12:41 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Yesterday @ 1:17 PM
Points: 8,641, Visits: 8,273
akberali67 (2/7/2013)
The code ran for about a day on my 3 TB partition and I got an error. My dataset is really small but the combinations increase the size. This is an example:

A B C D Outcome
A1 B1 C1 D1 1
A2 B2 C2 0
A3 C3 1
A4 C4 1
C5 0

So, like in excel I filter the data and filter on the variable A on A1, then another combination to filter would be on variable A on A2, B on B2 assuming B2 occurs multiple times. So, I am trying to make all filter combinations and check how the outcome is for them.

I hope I have explained my problem well enough.


Not to be rude but you haven't explained your problem at all. If you look back at what you posted you will notice there are no details to someone who is not familiar with your project.

What does the base table look like and why do you need every single combination of each column?

I would recommend using a cross join instead of looping. And if you filled up a 3TB partition that should be self explanatory that whatever you are trying to do here is not the best approach.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Post #1417318
Posted Thursday, February 07, 2013 12:43 PM


UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Today @ 1:06 PM
Points: 1,456, Visits: 14,281
akberali67 (2/7/2013)
A,B,C,D,Outcome
A1,B1,C1,D1,1
A2,B2,C2,,0
A3,,C3,,1
A4,,C4,,1
,,C5,,0


deleted....didnt read the OP correctly !


__________________________________________________________________
you can lead a user to data....but you cannot make them think !
__________________________________________________________________
Post #1417319
Posted Thursday, February 07, 2013 5:30 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 7:24 PM
Points: 2,346, Visits: 3,192
I think this article will help you:
Generating n-Tuples with SQL

There's a performance improved version of the UNIQUEnTuples rCTE (the one I think you'll need) late in the discussion thread here:
http://www.sqlservercentral.com/Forums/Topic1301485-3122-5.aspx (last post on that page).

I would have worked up an example for you but I am unclear how the input data is structured (i.e., no DDL and consumable sample data to work with).



No loops! No CURSORs! No RBAR! Hoo-uh!

INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Are you too recursively challenged?
Splitting strings based on patterns can be fast!
Post #1417400
Posted Thursday, February 07, 2013 9:10 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, April 23, 2013 10:40 PM
Points: 22, Visits: 45
Hi,

This is great but the problem is that I have my data in variables (columns) and need it outputted in variables back, so that I can use them to filter another dataset. I have tried SAS, R, MATLAB over the past month but I couldnt work it out. I was close with SQL but It was very inefficient in terms of time and size. Let me share some actual data with you, though its a bit confidential, I guess random data wont hurt.

https://docs.google.com/spreadsheet/ccc?key=0Aok4OTJtBDbrdFc2WFZ5MUpHTF91NFlWeTlwZ0RLRmc&usp=sharing

So, I am trying to filter this dataset on all possible filter combinations from var1 to var15 and then check Outcome for performance in terms of %.

select sum(Outcome)/count(Outcome) as Perc, count(Outcome) as Lines from Sample_Data where var1 = 1  

var1 = 1 is one of the filter combination possible.

I hope this helps understand better. Thanks for trying to help.

-Akber.
Post #1417452
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse