Creating multiple Rows from one row by substringing a column

  • I need to create from 1 to n rows from a single row where the one column in the read table contains a string from 8 to 16 characters long.  The rows written contain a single column 8 bytes long containing all each row being a different combination of 8 consecutive characters from the column of the read table. 
    INPUT:  IN_NUM contains '3456738947'
    Output Row 1: NUM_7 contains '34567389'
    Output Row 2: NUM_7 contains '45673894'
    Output Row 3: NUM_7 contains '56738947'
    My ultimate input table will have over 1,000,000 rows and many more columns
    My ultimate output table will have about 4,000,000 rows and a subset of columns from the input table

  • lsinoff - Tuesday, January 2, 2018 9:33 PM

    I need to create from 1 to n rows from a single row where the one column in the read table contains a string from 8 to 16 characters long.  The rows written contain a single column 8 bytes long containing all each row being a different combination of 8 consecutive characters from the column of the read table. 
    INPUT:  IN_NUM contains '3456738947'
    Output Row 1: NUM_7 contains '34567389'
    Output Row 2: NUM_7 contains '45673894'
    Output Row 3: NUM_7 contains '56738947'
    My ultimate input table will have over 1,000,000 rows and many more columns
    My ultimate output table will have about 4,000,000 rows and a subset of columns from the input table

    This is pretty easy to do with SUBSTRING and some randomization combined with CROSS APPLY.  I don't understand the 4 million output rows, though.  Are there not 8! combinations of the 8 bytes?  That's a pretty big number.  Also, what's the business motivation for this?  There's may a better way.

  • I do not know enough TSQL to code the loop needed to generate to create the three rows from 1 row as shown in my example.

  • Here's a sample code to show you how to do it. It uses a cte that creates a tally table on the fly. You can convert the cte into a view, function or store it in a table. I suggest that you read more about tally tables to understand how do they replace loops.

    CREATE TABLE #SampleData ( IN_NUM varchar(16));
    INSERT INTO #SampleData VALUES('3456738947'), ('41724246874135');

    WITH
    E(n) AS(
      SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
    ),
    E2(n) AS(
      SELECT a.n FROM E a, E b
    ),
    E4(n) AS(
      SELECT a.n FROM E2 a, E2 b
    ),
    cteTally(n) AS(
      SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
      FROM E4
    )
    SELECT IN_NUM, SUBSTRING( IN_NUM, n, 8)
    FROM #SampleData s
    JOIN cteTally t ON LEN(s.IN_NUM) - 7 >= t.n
    --ORDER BY IN_NUM, n
    ;

    GO
    DROP TABLE #SampleData

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • lsinoff - Tuesday, January 2, 2018 9:33 PM

    I need to create from 1 to n rows from a single row where the one column in the read table contains a string from 8 to 16 characters long.  The rows written contain a single column 8 bytes long containing all each row being a different combination of 8 consecutive characters from the column of the read table. 
    INPUT:  IN_NUM contains '3456738947'
    Output Row 1: NUM_7 contains '34567389'
    Output Row 2: NUM_7 contains '45673894'
    Output Row 3: NUM_7 contains '56738947'
    My ultimate input table will have over 1,000,000 rows and many more columns
    My ultimate output table will have about 4,000,000 rows and a subset of columns from the input table

    I'm a bit confused.  Ignoring the typo in Row2 of the output, what are you actually looking for the given value of '3456738947'.  Is it just 3 rows as you have listed or do you really want 8 rows using the full rotation of the "ring counter" that you've described?

    How about the example of '41724246874135'  that Luis used?  Do you want just the first 3 rows from the "ring" or do you want the full "ring" of 14 rows?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just the three rows.  Some accounts are sent in by our clients with prefixes and/or suffixes and some of the accounts in our Data Base also have prefixes and/or suffixes.  A match on any 7 consecutive characters, same last name and same date of birth will be considered an good match. I need to extract all the 7 character consecutive strings from our DB.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply