How to write this query?

  • Hi,
    I have this dataset:

    create table test1
    (

    [Category] char(1) null,

    [CNT] int null

    )

    insert into test1 ([Category], [CNT])

    values ('A', 5)

    ;

    insert into test1 ([Category], [CNT])

    values ('B', 3)

    ;


    How can I end up with this dataset:

    create table test2

    ([Category] char(1) null)

    insert into test2 (category)

    values ('A')

    ;

    insert into test2 (category)

    values ('A')

    ;

    insert into test2 (category)

    values ('A')

    ;

    insert into test2 (category)

    values ('A')

    ;

    insert into test2 (category)

    values ('A')

    ;

    insert into test2 (category)

    values ('B')

    ;

    insert into test2 (category)

    values ('B')

    ;

    insert into test2 (category)

    values ('B')

    ;


    The idea is one separate row for each category, that corresponds to the count of the category.
    Any help is appreciated!
    (Sorry, I couldn't fix the line spacing in the code...)
    -M

  • Like this, for one method:

    create table dbo.test1
    ([Category] char(1) null,
     [CNT] int null
    );

    create table dbo.test2
    ([Category] char(1) null);

    insert into dbo.test1 ([Category], [CNT])
    values ('A', 5),('B', 3);


    WITH eTally(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) n FROM [sys].[all_columns] AS [ac] CROSS JOIN [sys].[all_columns] AS [ac2])
    INSERT INTO dbo.test2([Category])
    SELECT t1.[Category]
    FROM
      dbo.test1 t1
      CROSS APPLY (SELECT TOP(t1.[CNT]) n FROM eTally ORDER BY n) t;

    SELECT * FROM dbo.test2;

    DROP TABLE [dbo].[test2];
    DROP TABLE [dbo].[test1];

  • I don't really see the need for the overhead and potential issues of reading the sys.all_columns view.


    ;WITH
    cteTally10 AS (
      SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cteTally1000 AS (
      SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
      FROM cteTally10 c1
      CROSS JOIN cteTally10 c2
      CROSS JOIN cteTally10 c3
    )
    --INSERT INTO dbo.test2 ( [Category] )
    SELECT t1.[Category]
    FROM dbo.test1 t1
    INNER JOIN cteTally1000 t ON t.number BETWEEN 1 AND t1.CNT

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Thursday, March 22, 2018 10:48 AM

    I don't really see the need for the overhead and potential issues of reading the sys.all_columns view.


    ;WITH
    cteTally10 AS (
      SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cteTally1000 AS (
      SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
      FROM cteTally10 c1
      CROSS JOIN cteTally10 c2
      CROSS JOIN cteTally10 c3
    )
    --INSERT INTO dbo.test2 ( [Category] )
    SELECT t1.[Category]
    FROM dbo.test1 t1
    INNER JOIN cteTally1000 t ON t.number BETWEEN 1 AND t1.CNT

    It was a quick, throw together cte based tally table, one that Jeff has even used.  I could have just as easily taken the time to type this:

    with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
         e2(n) as (select 1 from e1 a cross join e1 b),
         e4(n) as (select 1 from e2 a cross join e2 b),
         etally(n) as (select n = row_number() over (order by (select null)) from e4 a cross join e4 b)

  • To add a further tweak, nothing about the ROW_NUMBER is needed here, so there's no need to add the overhead of generating it.

    Something like this is sufficient and avoids the unnecessary overhead:

    WITH
    n AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n(n)),
    n3 AS (SELECT n=n1.n FROM n n1, n n2, n n3)

    SELECT t1.[Category]
    FROM dbo.test1 t1 CROSS APPLY (SELECT TOP (cnt) n FROM n3)n;


    The overhead's pretty small, but real. 🙂

    Cheers!

    EDIT: I should point out that this isn't guaranteed (that there's additional overhead); it's been a while since I've run the full battery of tests, but I remember that sometimes the optimizer implemented the query forms in the same way. Still, I'd rather not just hope that the optimizer gets it right, and if it's unnecessary, there's no need to clutter up the query 🙂

  • Thanks everybody!
    I ended up using the response by Mr. Jacob Wilkins.
    There is something I'm not understanding though: The cte that is called 'n', why is it just a table of 10 zeros? I don't get how that makes this query work. Also, I noticed the query doesn't work for values over 1,000. What can I do to change that upper limit? Sorry if I'm being dense...
    -m

  • That sort of CTE is part of a common construct to form an on-the-fly tally or numbers table.

    The specific values don't really matter; it's just generating a certain number of rows. In this case, the first CTE generates 10 rows. The second CTE cross joins it to itself 3 times, forming 10^3=1000 rows.

    Most commonly, a ROW_NUMBER is calculated over those rows to generate the tally or numbers table. In this case, we don't need a list of numbers; we just need enough rows to "explode" your original rows into the appropriate number of new rows. For example, if you have something with a count of 5 on one row, we need to make that result in 5 new rows, so we use APPLY to "match" that row to 5 rows from the CTE.

    If you need more than 10^3, just add an appropriate number of cross joins in the second CTE.

    Cheers!

  • Thank you sir, your explanation made me understand it perfectly. I added one more join for a 10K limit which should be enough.
    -martin

  • Just thought I'd mention that I tried both cte tally schemes and they produce a different number of rows.     Jacob, yours seems to produce a doubled result set.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline - Tuesday, March 27, 2018 11:14 AM

    Just thought I'd mention that I tried both cte tally schemes and they produce a different number of rows.     Jacob, yours seems to produce a doubled result set.

    I'd need to see a repro script 🙂

    IF OBJECT_ID('test1') IS NOT NULL DROP TABLE test1;

    CREATE TABLE test1 ([Category] CHAR(1) NULL,[CNT] INT NULL);
    INSERT INTO test1 ([Category], [CNT]) VALUES ('A', 5),('B', 3);

    WITH
    n AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n(n)),
    n3 AS (SELECT n=n1.n FROM n n1, n n2, n n3)

    SELECT t1.[Category]
    FROM dbo.test1 t1 CROSS APPLY (SELECT TOP (cnt) n FROM n3)n;

    WITH
    cteTally10 AS (
    SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cteTally1000 AS (
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
    FROM cteTally10 c1
    CROSS JOIN cteTally10 c2
    CROSS JOIN cteTally10 c3
    )
    SELECT t1.[Category]
    FROM dbo.test1 t1
    INNER JOIN cteTally1000 t ON t.number BETWEEN 1 AND t1.CNT;

    Results:

    Category
    --------
    A
    A
    A
    A
    A
    B
    B
    B

    (8 rows affected)

    Category
    --------
    A
    B
    A
    B
    A
    B
    A
    A

    (8 rows affected)

    If you mean the tally itself, and not its application to this particular query, I'd still need to see a repro script 🙂

    IF OBJECT_ID('test1') IS NOT NULL DROP TABLE test1;

    CREATE TABLE test1 ([Category] CHAR(1) NULL,[CNT] INT NULL);
    INSERT INTO test1 ([Category], [CNT]) VALUES ('A', 5),('B', 3);

    WITH
    n AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n(n)),
    n3 AS (SELECT n=n1.n FROM n n1, n n2, n n3)

    SELECT COUNT(*) FROM n3;

    WITH
    cteTally10 AS (
    SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cteTally1000 AS (
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
    FROM cteTally10 c1
    CROSS JOIN cteTally10 c2
    CROSS JOIN cteTally10 c3
    )
    SELECT COUNT(*) FROM cteTally1000;

    Results:

    -----------
    1000

    (1 row affected)

    -----------
    1000

    (1 row affected)

    Cheers!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply