The Joy of Numbers

  • Heh... dang it... I gotta remember... "Scroll Down". You have that in the code. :blush:

    I'm running some tests on this (cuz it's interesting)... I'll give a complete report when I'm done but, so far, the Insert completed on my humble 6 year old desktop (P4 1.8 Ghz, 1GB Ram, 5400 RPM Ide Drive, Windows XP, SQL Server 2005 sp2 Developer's Edition) in only 29 minutes and the conversion of the "i" column to INT NOT NULL (I didn't convert to BIGINT, just INT) only took an hour.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... the Billion row results are in. On the orignal code that Russ submitted (looks like modification of Itzek's code),

    Insert Duration - 00:29:00

    Convert column to NOT NULL - 01:01:42

    Add Clustered PK - 01:27:59

    Total - 2:58:41

    First, we'll revert do Itzek's original rendition of the code with is about 5 minutes faster on my humble box... just notice the difference in placement of the ROW_NUMBER...

    ;WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY C) AS n FROM L5)

    SELECT N AS i INTO NUMBERS_TEST FROM Nums WHERE n <= 100000000

    Now, lemme show you one of those "undocumented" tricks that makes all the difference in the world... let's add a something that will make the "i" column NOT NULL as it's being built...

    ;WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY C) AS n FROM L5)

    SELECT [font="Arial Black"]ISNULL([/font]N[font="Arial Black"],0) [/font]AS i INTO NUMBERS_TEST FROM Nums WHERE n <= 100000000

    Believe it or not, it doesn't slow the code down at all and has the added benefit of making the result column NOT NULL. That wipes out the whole 01:01:42 for converting the column to NOT NULL in a separate step.

    Now, for the big suprise... after the table was built, the dedicated database I made for this test had expanded to 16.5 GB. Now, the database is set to the Simple Recovery Mode, and the "used" portion did drop back down to that vicinity, but the database MDF file expanded to over 30 GB to build the clustered index. In other words, adding the clustered index temporarily caused the disk usage to almost double for the sorts involved.

    I will say that I'm pretty happy that I don't have to create a Billion row numbers table everyday!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • james elmer (1/23/2007)


    Good article. Interesting that my create table/insert while loop (to 10,000 rows) executed in 4 seconds and the set based solution executed in 17 seconds.

    Heh... guess I've waited long enough... would you post your code, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    At the part of eliminate duplicate what if we have a '!' character in string?

    So following solution is so simple and suitable for any scenario.

    And I use a new way for publishing numbers table.

    DECLARE @s-2 VARCHAR(500)='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaffffffffffffffffffffffffdddddddddddd ghqwer a d qqqq [pppp]'

    DECLARE @result VARCHAR(500)='';

    WITH c AS

    (SELECT 1 AS n

    UNION ALL

    SELECT 1 + n FROM c WHERE n < 100),

    c1 AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM c c1

    CROSS JOIN c c2)

    , k AS

    (SELECT n, k = SUBSTRING (@s, n, 1)

    FROM c1

    WHERE n <= LEN(@s))

    SELECT @result = @result + k

    FROM k k1

    WHERE NOT EXISTS

    (SELECT *

    FROM k k2

    WHERE k1.k = k2.k

    AND k1.n+1 = k2.n);

    SELECT @result AS removed;

    /*

    removed

    -----------------------

    afd ghqwer a d q [p]

  • {edit} Still testing... comment removed. Sorry.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In splitting string using number table is not this approch simpler?

    DECLARE @s-2 VARCHAR(80)='Army,Navy,Air Force,Marines'

    ;WITH c AS

    (SELECT 1 AS n

    UNION ALL

    SELECT n+1 FROM c WHERE n < 100)

    SELECT i

    FROM (SELECT CASE WHEN CHARINDEX(',', @s-2 + ',', n) - n = 0 THEN ''

    ELSE SUBSTRING(@s, n, CHARINDEX(',', @s-2 + ',', n) - n)

    END, n

    FROM c

    WHERE n <= LEN(@s)

    ) d(i, n)

    WHERE SUBSTRING(',' + @s-2, n, 1) = ','

  • _ms65g_ (5/20/2010)


    In splitting string using number table is not this approch simpler?

    DECLARE @s-2 VARCHAR(80)='Army,Navy,Air Force,Marines'

    ;WITH c AS

    (SELECT 1 AS n

    UNION ALL

    SELECT n+1 FROM c WHERE n < 100)

    SELECT i

    FROM (SELECT CASE WHEN CHARINDEX(',', @s-2 + ',', n) - n = 0 THEN ''

    ELSE SUBSTRING(@s, n, CHARINDEX(',', @s-2 + ',', n) - n)

    END, n

    FROM c

    WHERE n <= LEN(@s)

    ) d(i, n)

    WHERE SUBSTRING(',' + @s-2, n, 1) = ','

    Sorry... got pulled away on other things...

    Yes... that method is much "simpler". It's also much slower in the grand scheme of things because recursive counters are as bad or worse than While Loops for performance and they use a lot more "reads" even if the reads are in memory.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you,

    But I do not mean publishing number table, I mean Splitting method is not simpler?

    Also, A simplified method for publishing number table (no loop, no recursion, and no ranking)

    ;WITH C (i) AS

    (SELECT '0' UNION SELECT '1' UNION SELECT '2' UNION SELECT '3' UNION SELECT '4' UNION SELECT '5'

    UNION SELECT '6' UNION SELECT '7' UNION SELECT '8' UNION SELECT '9')

    SELECT c1.i + c2.i + c3.i + c4.i + 1 AS nbr --10000 numbers

    FROM C c1, C c2, C c3, C c4

  • _ms65g_ (5/23/2010)


    Thank you,

    But I do not mean publishing number table, I mean Splitting method is not simpler?

    Also, A simplified method for publishing number table (no loop, no recursion, and no ranking)

    ;WITH C (i) AS

    (SELECT '0' UNION SELECT '1' UNION SELECT '2' UNION SELECT '3' UNION SELECT '4' UNION SELECT '5'

    UNION SELECT '6' UNION SELECT '7' UNION SELECT '8' UNION SELECT '9')

    SELECT c1.i + c2.i + c3.i + c4.i + 1 AS nbr --10000 numbers

    FROM C c1, C c2, C c3, C c4

    Oh be careful... That method will always generate all 10000 numbers even it you put a filter on it and it has the added disadvantage of not producing an ordered output. Compare the actual execution plans of your method and a derivative of Itzik Ben-Gan's method...

    --===== This method causes all 10000 rows to show up in the execution plan even thouth only 100

    -- are called for. It also has the disadvantage of not being sorted. I wouldn't use this method.

    ;WITH C (i) AS

    (

    SELECT '0' UNION ALL SELECT '1' UNION ALL SELECT '2' UNION ALL SELECT '3' UNION ALL SELECT '4' UNION ALL

    SELECT '5' UNION ALL SELECT '6' UNION ALL SELECT '7' UNION ALL SELECT '8' UNION ALL SELECT '9'

    ),

    cteTally AS

    (

    SELECT c1.i + c2.i + c3.i + c4.i + 1 AS nbr --10000 numbers

    FROM C c1, C c2, C c3, C c4

    )

    SELECT nbr

    FROM cteTally t

    WHERE nbr BETWEEN 1 AND 100

    ;

    --===== Although a bit more complicated, this method is very effective because it only generates the

    -- number of rows requested AND has the advantage of a sorted result set.

    WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E4)

    SELECT N FROM cteTally WHERE N BETWEEN 1 AND 100

    Also, although it doesn't appear to hurt anything, the split code you posted has an unnecessary SELECT in it. Take a look...

    DECLARE @s-2 VARCHAR(80);

    SELECT @s-2 = 'Army,Navy,Air Force,Marines';

    --===== Your splitter with Itzik's inline Tally table

    WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E4)

    --===== Do your split

    SELECT i

    FROM (SELECT CASE WHEN CHARINDEX(',', @s-2 + ',', n) - n = 0 THEN ''

    ELSE SUBSTRING(@s, n, CHARINDEX(',', @s-2 + ',', n) - n)

    END, n

    FROM cteTally

    WHERE n <= LEN(@s)

    ) d(i, n)

    WHERE SUBSTRING(',' + @s-2, n, 1) = ','

    ;

    --===== An even simpler splitter, also with Itzik's inline Tally table

    WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E4)

    --===== Do the split

    SELECT SUBSTRING(@s, N, CHARINDEX(',', @s-2 + ',', N) - N) AS Item

    FROM cteTally

    WHERE N < LEN(@s) + 2

    AND SUBSTRING(',' + @s-2, N, 1) = ','

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Remove duplicate side-by-side characters from a string

    New approach using numbers table

    CREATE FUNCTION dbo.fnRemoveDupesI (@String VARCHAR(8000)) 

    RETURNS VARCHAR(8000) 

    AS

    BEGIN

        DECLARE @result VARCHAR(8000) = '';

       

       ;WITH DataOrder

       AS

       (

          SELECT ID, Data

                 ,ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) AS RowNum

            FROM (SELECT SUBSTRING(@String, nbr, 1), nbr 

                    FROM Nums 

                   WHERE nbr <= LEN(@String)

                 ) D(data, ID)

       )

      

      SELECT @result = @result + Data

        FROM (SELECT ID, Data

                     ,DENSE_RANK() OVER (ORDER BY ID - RowNum) As [Rank]

                FROM DataOrder

             )D

       GROUP BY Data, [Rank]

       ORDER BY MIN(ID)

       RETURN @result

    END;

  • _ms65g_ (5/28/2010)


    Remove duplicate side-by-side characters from a string

    New approach using numbers table

    CREATE FUNCTION dbo.fnRemoveDupesI (@String VARCHAR(8000)) 

    RETURNS VARCHAR(8000) 

    AS

    BEGIN

        DECLARE @result VARCHAR(8000) = '';

       

       ;WITH DataOrder

       AS

       (

          SELECT ID, Data

                 ,ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) AS RowNum

            FROM (SELECT SUBSTRING(@String, nbr, 1), nbr 

                    FROM Nums 

                   WHERE nbr <= LEN(@String)

                 ) D(data, ID)

       )

      

      SELECT @result = @result + Data

        FROM (SELECT ID, Data

                     ,DENSE_RANK() OVER (ORDER BY ID - RowNum) As [Rank]

                FROM DataOrder

             )D

       GROUP BY Data, [Rank]

       ORDER BY MIN(ID)

       RETURN @result

    END;

    Be careful with the multirow concatenation thing you got going on there. You might be better off using the FOR XML PATH subquery method for ordered string concatenation (but then you have to worry about XML entities).

    Thanks

    Mike C

  • Hi,

    Here is a new approach for splitting a delimited string. It is set-based, without CHARINDEX, using number table…

    See:

    DECLARE @s-2 VARCHAR(620)='item_01,item_02,item_03';

    SET @s-2 = ','+ @s-2 + ',';

    ;WITH C AS

    (SELECT n, ROW_NUMBER() OVER(ORDER BY n) AS rn

    FROM Nums --Number Table

    WHERE SUBSTRING(@s, n, 1) = ','

    AND n <= LEN(@s))

    SELECT SUBSTRING(@s, A.n + 1, B.n - 1 - A.n) AS item

    FROM C A

    INNER JOIN C B

    ON A.rn + 1 = B.rn

    AND B.n - A.n > 1;

    Also here is the simplified version of my technique for removing duplicate side-by-side characters.

    DECLARE @result VARCHAR(8000) = '',

    @string VARCHAR(8000) = 'SQQQQQQQQQQLLLLLLL Serrrrrrrrveerr';

    SELECT @result = @result + Data

    FROM (SELECT ID,

    Data,

    ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) - ID

    FROM (SELECT SUBSTRING(@String, n, 1), n

    FROM Nums

    WHERE n <= LEN(@String)

    ) D(data, ID)

    ) D(ID, Data, RowNum)

    GROUP BY Data, RowNum

    ORDER BY MIN(ID)

    SELECT @result AS Result

  • _ms65g_ (8/17/2010)


    Hi,

    Here is a new approach for splitting a delimited string. It is set-based, without CHARINDEX, using number table…

    See:

    DECLARE @s-2 VARCHAR(620)='item_01,item_02,item_03';

    SET @s-2 = ','+ @s-2 + ',';

    ;WITH C AS

    (SELECT n, ROW_NUMBER() OVER(ORDER BY n) AS rn

    FROM Nums --Number Table

    WHERE SUBSTRING(@s, n, 1) = ','

    AND n <= LEN(@s))

    SELECT SUBSTRING(@s, A.n + 1, B.n - 1 - A.n) AS item

    FROM C A

    INNER JOIN C B

    ON A.rn + 1 = B.rn

    AND B.n - A.n > 1;

    That method is quite old, actually. Most people don't use it because, since it actually has to read the numbers table twice, it's usually slower than the CHARINDEX method.

    If you'd like to convert the code (I don't want to be accused of writting biased code :-D) so that it can be played against a table with a CSV column in it, I'd be happy to demonstrate/backup my claim with code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply