All Possible Combinations Loop

  • akberali67 (2/8/2013)


    Hi,

    I understood the issue; it is with the concept. Mathematically, it needs more theory (permutations, combinations and selections) implemented within the code. I will have to buy books and read to be able to figure it out. Thanks for the help.

    I'm glad you said that because I looked at it again and I just wasn't getting it.

    It looks like you're trying to set up some kind of experimental design model and evaluate outcomes. But it's been a long time since I've done anything like that and I am no statistician! So I'm not sure I'll be able to help much more until you can provide more information to go on.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • I understand, thanks for the help. I will get on it and let you know through this thread if I have a breakthrough. Cheers and thanks again for taking the time.

  • It's difficult to figure out exactly what you are looking for here, but I suspect that one or the other of the GROUP BY extensions might just get you what you need. Have a look at this and see if it's on the right track. You might have to refer to BOL (Books Online, the SQL Server help system).

    ;WITH SampleData (
        [Customer ID],
        Var1, Var2, Var3, Var4, Var5, Var6, Var7, Var8, Var9, Var10, Var11, Var12, Var13, Var14, Var15,
        [Outcome Label])
    AS (
        SELECT '123',3,0,5,0.5,0.885889264,0.6,0.023603498,0,1,0.89,1,0.654,0,6,1,1 UNION ALL
        SELECT '288',1,0,4,0.066666667,0.994711653,0.8,0.366666667,0,1,0.53,0,0.400040004,4,1,1,1 UNION ALL
        SELECT '227',1,1,4,1,0.846637933,0.6,0.333333333,0,0,0.95,1,0,1,18,1,1 UNION ALL
        SELECT '7',1,0,4,1,0.846637933,0.6,0.333333333,0,0,0.95,1,0,5,3,4,0
    )
    SELECT Var1, Var2, Var3,
        SUM([Outcome Label])*1.0/COUNT([Outcome Label]) AS PERC,
        COUNT([Outcome Label]) AS LINES
    FROM SampleData
    GROUP BY ROLLUP(Var1, Var2, Var3)
    HAVING SUM([Outcome Label])*1.0/COUNT([Outcome Label]) < 0.8;
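
    [Editor's note, not part of the original reply: one caveat with ROLLUP is that the NULLs it produces in its summary rows look exactly like NULLs in the underlying data. The built-in GROUPING() function distinguishes the two; a minimal sketch against the same SampleData CTE:]

    SELECT Var1, Var2, Var3,
        GROUPING(Var1) AS G1, -- 1 = Var1 is rolled up in this row; 0 = it's a real data value
        GROUPING(Var2) AS G2,
        GROUPING(Var3) AS G3,
        SUM([Outcome Label])*1.0/COUNT([Outcome Label]) AS PERC
    FROM SampleData
    GROUP BY ROLLUP(Var1, Var2, Var3);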

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Or possibly this:

    -- GROUP BY ROLLUP and GROUP BY CUBE seem to be close.
    -- Neither can handle more than 12 columns ("variables") in one sitting.
    -- 4 sample rows generate 14,300 rows of unfiltered data in about 16 seconds,
    -- which comes down to 4,096 rows when filtered for PERC < 0.8.
    -- Note the use of the built-in AVG function.
    ;WITH SampleData (
        [Customer ID],
        Var1, Var2, Var3, Var4, Var5, Var6, Var7, Var8, Var9, Var10, Var11, Var12, Var13, Var14, Var15,
        [Outcome Label])
    AS (
        SELECT '123',3,0,5,0.5,0.885889264,0.6,0.023603498,0,1,0.89,1,0.654,0,6,1,1 UNION ALL
        SELECT '288',1,0,4,0.066666667,0.994711653,0.8,0.366666667,0,1,0.53,0,0.400040004,4,1,1,1 UNION ALL
        SELECT '227',1,1,4,1,0.846637933,0.6,0.333333333,0,0,0.95,1,0,1,18,1,1 UNION ALL
        SELECT '7',1,0,4,1,0.846637933,0.6,0.333333333,0,0,0.95,1,0,5,3,4,0
    )
    SELECT Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12, --Var13,Var14,Var15,
        AVG([Outcome Label]*1.0) AS PERC,
        COUNT([Outcome Label]) AS LINES
    FROM SampleData
    /*
    GROUP BY
        ROLLUP(Var1),ROLLUP(Var2),ROLLUP(Var3),ROLLUP(Var4),ROLLUP(Var5),
        ROLLUP(Var6),ROLLUP(Var7),ROLLUP(Var8),ROLLUP(Var9),ROLLUP(Var10),
        ROLLUP(Var11),ROLLUP(Var12) --,Var13,Var14,Var15
    */
    GROUP BY CUBE(Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12) --,Var13,Var14,Var15
    HAVING AVG([Outcome Label]*1.0) < 0.8
    ORDER BY Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12 --,Var13,Var14,Var15


  • Are ROLLUP and CUBE stored procedures? SQL doesn't seem to be recognizing them. I am using SQL Server 2005; maybe that's the problem?

  • akberali67 (2/8/2013)


    Are ROLLUP and CUBE stored procedures? SQL doesn't seem to be recognizing them. I am using SQL Server 2005; maybe that's the problem?

    No, but the syntax was changed in SQL 2008 to make it more compliant with ANSI SQL. In 2005, the syntax is

    GROUP BY <field list>

    WITH CUBE
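
    [Editor's note, not part of the original reply: to make the difference concrete, here is a minimal sketch of both forms, using the SampleData table and Var columns from the queries earlier in the thread. The two queries return the same result:]

    -- SQL Server 2008 and later (ANSI syntax)
    SELECT Var1, Var2, COUNT(*) AS LINES
    FROM SampleData
    GROUP BY CUBE(Var1, Var2);

    -- SQL Server 2005 equivalent
    SELECT Var1, Var2, COUNT(*) AS LINES
    FROM SampleData
    GROUP BY Var1, Var2
    WITH CUBE;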

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (2/8/2013)


    akberali67 (2/8/2013)


    Are ROLLUP and CUBE stored procedures? SQL doesn't seem to be recognizing them. I am using SQL Server 2005; maybe that's the problem?

    No, but the syntax was changed in SQL 2008 to make it more compliant with ANSI SQL. In 2005, the syntax is

    GROUP BY <field list>

    WITH CUBE

    Drew

    This is the 2008 forum section. 2008 & 2012 can handle up to 12 grouping columns with the WITH CUBE extension; 2005 is limited to 10 grouping columns. I think it's worth the OP experimenting with this despite the restriction, if it gives us a better handle on the problem.


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.

    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • OK, got it.

  • Yup! Yup! Yup!

    This is it. I mean, I will have to do some correlation to find out which variables are not important, but that's such a minor giveaway compared to what this solution is doing. I am using the ROLLUP function. I ran it on a small set and it worked.

    I have put the entire code to run on the full dataset, and by my calculation it should run for about two weeks, but it's well worth it. I am sure it will work on that one too, because it works for this one.

    I truly can't thank you enough, because over the past month I have gone half bald thinking of ways to do this. Thank you, thank you, thank you!

    Cheers and God Bless!

    -Akber Khan.

  • akberali67 (2/7/2013)


    I have tried SAS, R, and MATLAB over the past month but I couldn't work it out.

    Akber, I don't mean to stick my nose in here, since it seems like you have a solution with CUBE. I'm just curious: did you try PROC SUMMARY in SAS? It would be something like this, and would be pretty efficient. I'm just asking because you refer to the columns as 'variables', which makes me think you will be re-importing this back into some kind of stats program.

    proc summary data=YourDataset NoPrint Nway Missing;
        class Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12;
        output out=work.want (rename=(_freq_=Lines) drop=_type_);
    run;

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • Hi Greg,

    I tried quite a lot of advanced functions in SAS. PROC SUMMARY would only summarize fields, but I am trying to build all possible filter combinations, which should create over a zillion results. Thanks for trying to help, though.

    -Akber.
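
    [Editor's note, not part of the original post: for a rough sense of the scale involved, CUBE over n grouping columns aggregates by every subset of those columns, i.e. 2^n grouping sets, and each grouping set can emit one row per distinct combination of its column values, which is where the combinatorial explosion comes from. A quick sanity check:]

    -- Number of grouping sets generated by CUBE over 12 columns
    SELECT POWER(2, 12) AS GroupingSets; -- 4096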

  • Finally ran out of system memory after 1.5 hours. I don't know what to do; this stuff is making me crazy.

    Msg 701, Level 17, State 123, Line 3

    There is insufficient system memory to run this query.

  • akberali67 (2/8/2013)


    Finally ran out of system memory after 1.5 hours. I don't know what to do; this stuff is making me crazy.

    Msg 701, Level 17, State 123, Line 3

    There is insufficient system memory to run this query.

    It may be worth experimenting with a subset of your data to establish parameters. Those 4 sample rows across 12 variables (columns) generate over 40,000 output rows, which comes down to around 4,000 rows with the PERC filter. Try doubling up the number of sample rows a few times. You know you can't run this on your current system; what you need to find out is whether SQL Server can run this at all.



  • OK, that does make sense.

    Just one last question, and I am sorry if I am bothering you: how much memory do you anticipate I would need to run the entire dataset? I have about 28,000 rows of data and about 93 rows of distinct values.

    Thanks again for your invaluable time, I truly appreciate it.

    -Akber.

  • akberali67 (2/9/2013)


    OK, that does make sense.

    Just one last question, and I am sorry if I am bothering you: how much memory do you anticipate I would need to run the entire dataset? I have about 28,000 rows of data and about 93 rows of distinct values.

    Thanks again for your invaluable time, I truly appreciate it.

    -Akber.

    akberali67 (2/8/2013)


    Are ROLLUP and CUBE stored procedures? SQL doesn't seem to be recognizing them. I am using SQL Server 2005; maybe that's the problem?

    This is the 2008 forum section. The equivalent syntax for 2005 is as follows:

    -- SQL Server 2005 compatible query
    SELECT Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12, --Var13,Var14,Var15,
        AVG([Outcome Label]*1.000) AS PERC,
        COUNT([Outcome Label]) AS LINES
    FROM SampleData
    GROUP BY Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12 --,Var13,Var14,Var15
    WITH CUBE
    HAVING AVG([Outcome Label]*1.000) < 0.8
    ORDER BY Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12 --,Var13,Var14,Var15

    I've imported the full sample set of over 10,000 rows from the spreadsheet. Using the syntax above, this little PC with 4 GB RAM and SQL Server 2008 running locally will process the sample table doubled up using UNION ALL (21,430 rows total) across 12 variables, taking about 90 seconds to output 789,190 unfiltered rows. The filtered rowcount is only slightly less.

    Adding another set to the input caused an error. Here's the full workings except the import:

    SELECT Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12, --Var13,Var14,Var15,
        AVG([Outcome Label]*1.000) AS PERC,
        COUNT([Outcome Label]) AS LINES
    FROM (
        SELECT * FROM SampleData
        UNION ALL
        SELECT * FROM SampleData
        --UNION ALL
        --SELECT * FROM SampleData
    ) d
    --WHERE rn <= 4     -- 14,300 rows / 00:00:12
    --WHERE rn <= 8     -- 28,506 rows / 00:00:12
    --WHERE rn <= 16    -- 51,620 rows / 00:00:13
    --WHERE rn <= 32    -- 74,496 rows / 00:00:16
    --WHERE rn <= 64    -- 93,434 rows / 00:00:18
    --WHERE rn <= 128   -- 100,472 rows / 00:00:18
    --WHERE rn <= 256   -- 118,362 rows / 00:00:18 / memory grant = 1216kb
    --WHERE rn <= 512   -- 140,024 rows / 00:00:18 / mg = 1344kb
    --WHERE rn <= 1024  -- 157,524 rows / 00:00:19 / mg = 1648kb
    --WHERE rn <= 2048  -- 164,458 rows / 00:00:22 / mg = 2224kb
    --WHERE rn <= 4096  -- 175,350 rows / 00:00:20 / mg = 3392kb
    --WHERE rn <= 8192  -- 768,366 rows / 00:00:53 / mg = 6208kb
    -- Entire sample table (10,715 rows):
    --WHERE rn <= 10715 -- 789,190 rows / 00:01:00 / mg = 8748 / ONE COPY OF TABLE (10,715 rows)
    -- Sample table doubled up using UNION ALL (21,430 rows):
    WHERE rn <= 10715   -- 789,190 rows / 00:01:20 / mg = 13,520
    -- Sample table trebled up using UNION ALL:
    --WHERE rn <= 10715 -- 0 rows / 00:00:00 / mg = 0 / THREE COPIES OF TABLE (0 rows)
    -- Failed: "The query processor ran out of internal resources and could not produce a query plan.
    -- This is a rare event and only expected for extremely complex queries or queries that reference
    -- a very large number of tables or partitions. Please simplify the query. If you believe you have
    -- received this message in error, contact Customer Support Services for more information."
    GROUP BY Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12 --,Var13,Var14,Var15
    WITH CUBE
    --HAVING AVG([Outcome Label]*1.0) < 0.8
    ORDER BY Var1,Var2,Var3,Var4,Var5,Var6,Var7,Var8,Var9,Var10,Var11,Var12 --,Var13,Var14,Var15

    I don't know whether this error (Error 8623) is dependent upon system resources, in which case you might get lucky using a server with more resources, or dependent upon SQL Server limitations. I suspect it's the latter but haven't yet found good evidence.


Viewing 15 posts - 16 through 30 (of 36 total)
