Adding duplicate rows to a resultset

  • I have an interesting challenge to create duplicate rows based on a column value in a result set.

    As an example, I have a resultset with 2 records.

    ---------------------------------

    | id | Product | Description | Qty |

    ---------------------------------

    | 1 | P1 | Product 1 | 2 |

    | 2 | P2 | Product 2 | 3 |

    ---------------------------------

    The resultset should look like below with 5 rows

    ---------------------------------

    | id | Product | Description | Qty |

    ---------------------------------

    | 1 | P1 | Product 1 | 2 |

    | 1 | P1 | Product 1 | 2 |

    | 2 | P2 | Product 2 | 3 |

    | 2 | P2 | Product 2 | 3 |

    | 2 | P2 | Product 2 | 3 |

    ---------------------------------

    I thought about using temp tables and repeating the rows based on the qty value with recursive loops. I wasnt sure if there is a better option or possibly a query trick.

    help is much appreciated!

  • You should use a Tally (or Numbers) table for this. There are several articles about Tally tables on this site. You can join to your table to a Tally table, limiting the number of rows selected from the Tally table based on the value of your Qty column.

  • Hi

    You can use a recursive CTE with a counter like this:

    DECLARE @t TABLE (id INT, product VARCHAR(100), qty INT)

    INSERT INTO @t

    SELECT 1, 'P1', 2

    UNION SELECT 2, 'P2', 3

    ;WITH recurse (id, product, qty, cnt) AS

    (

    SELECT id, product, qty, qty

    FROM @t

    UNION ALL

    SELECT id, product, qty, cnt - 1

    FROM recurse

    WHERE cnt - 1 != 0

    )

    SELECT id, product, qty

    FROM recurse

    ORDER BY id

    Greets

    Flo

  • andrewd.smith (3/24/2009)


    You should use a Tally (or Numbers) table for this. There are several articles about Tally tables on this site. You can join to your table to a Tally table, limiting the number of rows selected from the Tally table based on the value of your Qty column.

    ... sure you are right. The Tally Solution is the better approach!

    SELECT id, product, qty

    FROM @t t

    JOIN Tally ON Tally.N <= t.qty

    Thanks for correction! 🙂

    Flo

  • Thanks guy .... tally table worked like a charm

  • If the maximum value for quantity is relatively small, or if you don't have a Numbers Table available, something like the following provides an efficient alternative:

    declare @data

    table (

    id integer primary key,

    product char(2) not null,

    [description] varchar(100) not null,

    quantity tinyint not null

    );

    insert @data

    (

    id,

    product,

    [description],

    quantity

    )

    values (

    1,

    'P1',

    'Product 1',

    2

    );

    insert @data

    (

    id,

    product,

    [description],

    quantity

    )

    values (

    2,

    'P2',

    'Product 2',

    3

    );

    select d.id,

    d.product,

    d.[description],

    d.quantity

    from @data d

    cross

    apply (

    -- Assuming MAX(quantity) <= 9

    select convert(integer, 1) union all

    select 2 union all

    select 3 union all

    select 4 union all

    select 5 union all

    select 6 union all

    select 7 union all

    select 8 union all

    select 9

    ) ca (rn)

    where ca.rn <= quantity

    order by

    d.id;

    For larger values, a dynamic Numbers Table could be used (generated using TOP (n) and the ROW_NUMBER() function over some system table CROSS JOINed to itself a number of times:

    select top (1000)

    rn = ROW_NUMBER() over (order by a1.allocation_unit_id)

    from master.sys.allocation_units a1,

    master.sys.allocation_units a2,

    master.sys.allocation_units a3,

    master.sys.allocation_units a4

    order by

    rn;

  • Paul White (12/4/2009)


    For larger values, a dynamic Numbers Table could be used (generated using TOP (n) and the ROW_NUMBER() function over some system table CROSS JOINed to itself a number of times:

    select top (1000)

    rn = ROW_NUMBER() over (order by a1.allocation_unit_id)

    from master.sys.allocation_units a1,

    master.sys.allocation_units a2,

    master.sys.allocation_units a3,

    master.sys.allocation_units a4

    order by

    rn;

    Ooooo.... be careful. The idea of using a single cross join is a great idea because it's nasty fast and doesn't cost much. When you start adding addition cross-joins and you go over the boundry of one of the single cross-joins, it starts to become very expensive insofar as possible growth in the TempDB log file.

    I built a "little" race between the Cross-Join method and the cascading CTE method that Itzik has in his book to build a Billion row numeric table just to see wha'hapuns. 😛 The cascading CTE method did nothing to TempDB... the multiple Cross-Join method cause the TempDB logfile to blow out to over 40GB.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff,

    Yes one should be careful. I wouldn't advocate using the method for billions of rows 😉

    I had much smaller maximum values for 'Quantity' in mind - maybe thousands.

    (And if a Tally/Numbers table is to hand that's obviously the way to go anyway)

    I have heard you mention your test versus Itzik's method before - one day I'd like to look into it for myself. Is there anywhere I can find the code for your tests? Just if it's easy - it intrigues me slightly. I imagine the issue is the normal one (row count spools) but I'd be good to know for sure.

    Paul

  • Paul White (12/6/2009)


    Thanks Jeff,

    Yes one should be careful. I wouldn't advocate using the method for billions of rows 😉

    I had much smaller maximum values for 'Quantity' in mind - maybe thousands.

    (And if a Tally/Numbers table is to hand that's obviously the way to go anyway)

    I have heard you mention your test versus Itzik's method before - one day I'd like to look into it for myself. Is there anywhere I can find the code for your tests? Just if it's easy - it intrigues me slightly. I imagine the issue is the normal one (row count spools) but I'd be good to know for sure.

    Paul

    Ah... you can do the tests for yourself... under 8k, the Tally table normally wins especially after it caches. For VARCHAR(MAX), believe it or not, a well written WHILE loop will many times win (of course, I'm thinking "splits" here). Of course, for splits, a very well written CLR will win (and I do mean very well written like the guy that FLO introduced on that very long split thread).

    If you need something up to 16 million, I'd recommend the single cross join on sys.All_Columns in 2k5.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was removed by the editor as SPAM

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply