Looping through 65 millions of combinations: Excel or SQL or something else?

  • Hello,

    I have to count the number of persons in a company according to multiple dimensions / filters. The output must be a table with all combinations and the number of persons for each combination. If 0, it doesn't have to be in the table.

    For example, I have to find:

    - Number ALL men
    - Number ALL men who are from Germany
    - Number ALL men who are from France
    - Number ALL men who are from X
    - Number ALL women
    - Number ALL women who are from Germany
    - Number ALL women who are from France
    - Number ALL women who are from X

    This is only with two dimensions. I could have betwen 5 and 15, which gives in the worst case 65 000 000 posible combinations.

    For now I'm doing it with Excel / VBA, with up to 15 "For each".
    But I feel like I could do something more efficient with a Select Count in SQL Server. I just have no idea how to do those loops in SQL, and if it's worth it.

    Any help on this?

    Thanks.

  • "Dimensions"? Is this in a cube and thus you need an MDX query, or in a table in the data engine and thus you'd need T-SQL. You've posted in a T-SQL forum, which would suggest the latter, however, you can't use T-SQL to query a Dimension in a cube, hence my confusion.

    I must say, I don't really understand your requirement here; I have no idea how you suddenly got to "65M possible combinations". Could you  provide some sample data and expected results to show what you're after here please? There's a link in my signature on how to post DDL and sample data for questions in my signature.

    Thanks.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • allard.julien.70 - Wednesday, May 23, 2018 8:12 AM

    Hello,

    I have to count the number of persons in a company according to multiple dimensions / filters. The output must be a table with all combinations and the number of persons for each combination. If 0, it doesn't have to be in the table.

    For example, I have to find:

    - Number ALL men
    - Number ALL men who are from Germany
    - Number ALL men who are from France
    - Number ALL men who are from X
    - Number ALL women
    - Number ALL women who are from Germany
    - Number ALL women who are from France
    - Number ALL women who are from X

    This is only with two dimensions. I could have betwen 5 and 15, which gives in the worst case 65 000 000 posible combinations.

    For now I'm doing it with Excel / VBA, with up to 15 "For each".
    But I feel like I could do something more efficient with a Select Count in SQL Server. I just have no idea how to do those loops in SQL, and if it's worth it.

    Any help on this?

    Thanks.

    Have a look at GROUP BY in Books Online (the SQL Server help system)
    SELECT Gender, Country, Count = COUNT(*)
    FROM table
    GROUP BY Gender, Country
    WITH ROLLUP / WITH CUBE

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Try this.  If you wanted to go to an indeterminate number of dimensions, you'd need to use dynamic SQL.  I don't know whether, or at what point, performance will become a problem.

    -- Test harness
    CREATE TABLE #People (
         FullName varchar(100)
    ,    Gender char(1)
    ,    Country char(20)
        );
        TRUNCATE TABLE #People;

    SELECT TOP 100
         CAST(NEWID() AS varchar(100))
    ,    CASE ABS(CHECKSUM(NEWID())) % 2
            WHEN 0 THEN 'M'
            ELSE 'F'
        END
    ,     CASE ABS(CHECKSUM(NEWID())) % 3
            WHEN 0 THEN 'Germany'
            WHEN 1 THEN 'France'
            ELSE 'X'
        END
    FROM sys.all_columns;

    -- Eyeball the test data
    SELECT * FROM #People;
    SELECT
         Gender
    ,    COUNT(*)
    FROM #People
    GROUP BY Gender;
    SELECT
         Country
    ,    COUNT(*)
    FROM #People
    GROUP BY Country;

    -- Get the breakdown
    WITH Genders AS (
        SELECT DISTINCT Gender
        FROM #People
        UNION ALL SELECT NULL
        )
    , Countries AS (
        SELECT DISTINCT Country
        FROM #People
        UNION ALL SELECT NULL
        )
    SELECT
         g.Gender
    ,    c.Country
    ,    COUNT(*)
    FROM Genders g
    CROSS JOIN Countries c
    JOIN #People p
    ON COALESCE(g.Gender,p.Gender) = p.Gender
    AND COALESCE(c.Country,p.Country) = p.Country
    GROUP BY
         g.Gender
    ,    c.Country;

  • John Mitchell-245523 - Wednesday, May 23, 2018 8:43 AM

    Try this.  If you wanted to go to an indeterminate number of dimensions, you'd need to use dynamic SQL.  I don't know whether, or at what point, performance will become a problem.

    John, your last statement can be reduced drastically using WITH CUBE.

    SELECT
      Gender
    ,  Country
    ,  COUNT(*)
    FROM #People
    GROUP BY Gender,
        Country WITH CUBE

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply