Split a field into multiple records

  • Hi,

    I have a record in a database that is text, separated by char(13). I would like to split this field into separate records and get the resulting table. I have found split stored functions on the net, but they always split some string that comes from outside the select/table. They never seem to show an example of spliting a field that came from the select statement/record in a table. Anyone know the answer?

    For example, I see:

    @foo = 'xxx xxx'

    select * from split(@foo, ' ');

    But that is rather useless. I want something more like:

    select bigtext, id from sometable

    <split(bigtext, char(13)) > cross join <with the id>

    I would then wind up with multiple records for each id. In reality, bigtext is a field containing events all concatenated together but separated by char(13). But I need to split up the field for each record into multiple records so that I can then query based on date, event, etc. The resulting table would go from

    id   |  text

    1       "line1CRline2CRline3"

    to something like

    id   |  line

    1       line1

    1       line2

    1       line3

    Like I said above, I have found stored procedures (functions) that will split a string, and they work. I just haven't seen how to split an actual field in a record and recover the resulting values as multiple records in a new/temporary table.

    Thanks.

    Perry

  • Actually, a function would slow stuff down here... you want the full table to be split... why do it a line at a time?  Try this, instead...

    --==================================================================================================

    --      This section is just setting up for the demonstration and would not be included in final code

    --==================================================================================================

    --===== If the test table exists, drop it

         IF OBJECT_ID('TempDB..#myHead') IS NOT NULL

            DROP TABLE #myHead

    --===== Create the test table

     CREATE TABLE #myHead

            (

            RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

            DelimitedText VARCHAR(8000)

            )

    --===== Populate the test table with items separated by Cr

     INSERT INTO #myHead

     SELECT 'hello'+CHAR(13)+'thanks'+CHAR(13)+'movies'+CHAR(13)+'tickets' UNION ALL

     SELECT 'abc'+CHAR(13)+'de'+CHAR(13)+'fghi' UNION ALL

     SELECT 'xyz' UNION ALL

     SELECT 'Now is the time'+CHAR(13)+'for all good men'+CHAR(13)+'to come to the aid of their country'

    --==================================================================================================

    --      This demonstrates the solution

    --==================================================================================================

    --===== Split the items and return the RowNum (ID) where it came from, as well

     SELECT RowNum,

            SUBSTRING(CHAR(13) + h.DelimitedText + CHAR(13), t.N + 1,

            CHARINDEX(CHAR(13), CHAR(13) + h.DelimitedText + CHAR(13), t.N + 1) - t.N - 1)

       FROM dbo.Tally t,

            #myHead h

      WHERE SUBSTRING(CHAR(13) + h.DelimitedText + CHAR(13), t.N, 1) = CHAR(13)

        AND t.N < LEN(CHAR(13) + h.DelimitedText + CHAR(13))

    If you don't already have a "Tally" table, it's time to make one... here's how...

    --===== Create and populate the Tally table on the fly

     SELECT TOP 11000 --equates to more than 30 years of dates

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Allow the general public to use it

      GRANT SELECT ON dbo.Tally TO PUBLIC

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Good one, you nailed it. I was trying something like it with a cross join to create your tally table. I like your solution better; much less code and the tally table might be usefuli in the future.

    Thanks.

    Perry

     

  • Aye... and thank you for the feedback, Perry.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s. You would be amazed at just how useful that Tally table is gonna be...  and, a lot of the time, it will greatly enhance the performance of your code... for example... how would you find the number of Tuesdays between any two dates (well, up to 30 years worth)?  Take a look at the following post...

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=169&messageid=313277

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    What a clean solution!

    Will it work on the following example? Instead of splitting one delimited column into multiple rows, I need to separate values that are all in one column that are semi column ( limited.

     

    For example I have a column called tryit and the value = '1234;456.75;01/01/2001;ABBR;T...@GSU.ORG;75%;$12.75;'.

    I want to write a query so that the results can be split out to 7 output

    columns. In other words, I would like the output to be:

     Column 1 = 1234,

     Column 2 = 456.75,  

     Column 3 = 01/01/2001,

     Column 4 = ABBR,

     Column 5 = T...@GSU.ORG,

     Column 6 = 75%,

     Column 7 =  $12.75  

    Thanks!

    Yiming

  • First... replace all the CHAR(13) occurances with ';' including the single quotes.  Some use similar code to make a function where they can pass the delimiter to the function but almost nothing beats good inline code for performance (there are rare exceptions on multi-processor boxes depending on what you call "performance").

    Then, you'll need to use that code as a derived table and do a "cross-tab" bit of code to create your rows.  If this is from a file, I'm thinking that BCP or BULK INSERT would be better suited for the job, though.

    And thanks for the compliment!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I use Bulk Insert. It works well for most of the strings, but fails for the ones that have double quotes("). So I have to import the whole string to one column first and split the delimited column into different columns.

    Paul Cresham's FromProgressArray function works great for this case -  http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=229031. I think your Tally table could be a very good idea as well and it DOES work great!

    Thanks,

    Yiming

  • Paul Cresham's function is great... I have a similar one that uses a Tally table... thought you might enjoy seeing it... the documentation is longer than the function itself...

    CREATE FUNCTION dbo.fSplit

    /****************************************************************************************

     Purpose:

     This function splits names and strings of words based on a  delimiter of up to 255

     characters.

     Notes:

     1. Max length of string that can be split is 8000-(Length of delimiter X 2)

     2. Designed for use on MS-SQL Server 2000 and MSDE 2000.

     3. Will error under the following conditions:

        a. Length of delimiter <1 or >255

        b. Length of string + (Length of delimiter X 2) > 8000

     4. Unexpected results will occur if a leading or trailing delimiter is included in the

        string to be parsed

     5. Returns a blank (space) when multiple delimiters are encountered (ie. 'Jeff,,Moden'

        would return a blank as the 2nd word).

     Usage:

     dbo.fSplit(StringToBeSplit,SegmentToReturn,DelimiterString)

    -----------------------------------------------------------------------------------------

     Revisions:

     Rev 00 - 06/05/2005 - Jeff Moden - Initial creation and test (single space delimiter)

     Rev 01 - 06/06/2005 - Jeff Moden - Add delimiter character as an input

     Rev 02 - 08/17/2005 - Jeff Moden - Increase possible size of delimiter to 10 characters

     Rev 03 - 09/03/2005 - Jeff Moden - Increase possible size of delimiter to 255 characters

    ****************************************************************************************/

    --===== Define the input parameters

           (

            @pString VARCHAR(8000), --String to be split

            @pWordNum SMALLINT,     --Number of the word to return

            @pDelim VARCHAR(255)    --Delimiter to base the split on (max is 255 or BOOM!)

            )

    --===== Define the return type

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    --======================================================================================

    --      Body of function

    --======================================================================================

    --===== Declare the local variables

    DECLARE @LenDelim TINYINT --Holds length of delimiter (max is 255 or BOOM!)

    DECLARE @Words TABLE

            (

            WordNum INT IDENTITY(1,1), --The number of the word that has been split out

            Word    VARCHAR(8000)      --The word that has been split out

            )

    --===== Determine length of delimiter.  Had to do this way because spaces have 0 for a

         -- value of LEN

        SET @LENDelim = LEN('|'+@pDelim+'|')-2

    --===== Prepare the string to be split for parsing by adding a delimiter to each end

        SET @pString = @pDelim+@pString+@pDelim

    --===== Do the parsing and store the parsed words in the table variable

     INSERT INTO @Words (Word)

     SELECT SUBSTRING(@pString,    --What to split

                      N+@LenDelim, --Where to start

                      CHARINDEX(@pDelim,@pString,N+@LenDelim)-@LenDelim-N --Length

                     )

       FROM dbo.Tally

      WHERE SUBSTRING(@pString,N,@LenDelim) = @pDelim --Finds leading delimiters

        AND N < LEN(@pString)-@LenDelim               --except the last 1

      ORDER BY N --Force the order of words or segments to be correct

    --===== Return the word indicated by the input parameter

     RETURN (SELECT Word

               FROM @Words

              WHERE WordNum = @pWordNum)

    --======================================================================================

    --      End of function

    --======================================================================================

    END

    ...and I owe my good fortune with Tally tables to another fellow by the name of Adam Mechanic.

    The reason why I didn't use a function like dbo.fSplit to solve this original problem is that the code to split the whole table runs a fair bit faster, in most cases.  People forget that functions are a form of RBAR (pronounced "ree-bar" and is a "Moden-ism" for "Row By Agonizing Row ) and, if written incorrectly, can really slow things down.  I equate them to a stored procedure that only returns a single value.

    On the Bulk Insert thing.... yeah, I love it... fastest gun in the West.  And, like BCP, it can use a "Format" file... they take a bit of study and some trial and error because the documentation Microsoft provides on the tool is sparse, sometimes incorrect, and poorly written at best.

    Soooooo.... here's another treat... it's a format file that I wrote to import Quoted CSV like what you are asking for...

    8.0
    27
    1SQLCHAR01""0leadingquote""
    2SQLCHAR060"\",\""2State""
    3SQLCHAR060"\",\""3NpaNxx""
    4SQLCHAR060"\",\""4RateCenter""
    5SQLCHAR060"\",\""5SwitchCLLI""
    6SQLCHAR060"\",\""6Lata""
    7SQLCHAR060"\",\""7NPA""
    8SQLCHAR060"\",\""8NewNpa""
    9SQLCHAR060"\",\""9NXX""
    10SQLCHAR060"\",\""10City""
    11SQLINT060"\",\""11TimeZoneOffset""
    12SQLBIT060"\",\""12ObservesDST""
    13SQLCHAR060"\","13County""
    14SQLINT060",\""14CountyPopulation""
    15SQLCHAR060"\","15ZipCode""
    16SQLINT060","16ZipCodeCount""
    17SQLINT060",\""17ZipCodeFrequency""
    18SQLCHAR060"\",\""18FIPS""
    19SQLCHAR060"\",\""19MSACBSA""
    20SQLINT060"\",\""20MSACBSACode""
    21SQLBIT060"\",\""21Overlay""
    22SQLCHAR060"\",\""22NxxUseType""
    23SQLDATETIME060"\","23NxxIntroVersion""
    24SQLDECIMAL060","24Latitude""
    25SQLDECIMAL060",\""25Longitude""
    26SQLCHAR060"\",\""26OCN""
    27SQLCHAR060"\"\r\n"27Company""

    ...the \" is like \r and \n but is how you mark double quotes as part of the delimiter.  So, to mark a delimiter as "," (includes the double quotes), you need \",\" and that needs to be enclosed in double quotes which gives you "\",\""  as the delimiter in many cases.

    By the way, here's a couple of rows of data from the raw file to play with if you want...

    "State","NPANXX","Rate_Center","Switch_CLLI","LATA","NPA","New_Npa","NXX","City","TimeZone","Observes_DST","County","County_Pop__x1000_","ZipCode_PostalCode","ZipCode_Count","ZipCode_Freq_","FIPS","MSA_CBSA","MSA_CBSA_CODE","Overlay","NXX_Use_Type","NXX_Intro_version","Latitude","Longitude","OCN","Company"

    "OH","740365","CALDWELL","CLMDOHZIBMD","324","740","","365","NEWCOMERSTOWN","5","1","TUSCARAWAS",91,"43832",0,-1,"39157021700","New Philadelphia-Dover, OH","35420","0","L","2001-10-31",40.26,-81.60,"4863","LEVEL 3 COMMUNICATIONS, LLC - OH"

    "OH","740366","NEWARK","NWRKOHXB36C","324","740","","366","NEWARK","5","1","LICKING",145,"43055",4294,99,"39089751600","Columbus, OH","18140","0","L","1998-01-25",40.08,-82.42,"0665","ALLTEL OHIO, INC. - WESTERN OHIO"

    "OH","740367","CHESHIRE","CHSGOH36RS1","324","740","","367","CHESHIRE","5","1","GALLIA",31,"45620",447,53,"39053953500","Point Pleasant, WV-OH","38580","0","L","1998-01-25",38.94,-82.14,"9321","AMERITECH OHIO"

    "OH","740367","CHESHIRE","CHSGOH36RS1","324","740","","367","CHESHIRE","5","1","GALLIA",31,"45631",273,32,"39053953500","Point Pleasant, WV-OH","38580","0","L","1998-01-25",38.94,-82.14,"9321","AMERITECH OHIO"

    "OH","740367","CHESHIRE","CHSGOH36RS1","324","740","","367","CHESHIRE","5","1","GALLIA",31,"45614",124,15,"39053953500","Point Pleasant, WV-OH","38580","0","L","1998-01-25",38.94,-82.14,"9321","AMERITECH OHIO"

    "OH","740368","DELAWARE","DLWROHXADS0","324","740","","368","DELAWARE","5","1","DELAWARE",110,"43015",379,97,"39041010200","Columbus, OH","18140","0","L","1998-05-18",40.30,-83.06,"0615","VERIZON NORTH INC.-OH"

    "OH","740369","DELAWARE","DLWROHXADS0","324","740","","369","DELAWARE","5","1","DELAWARE",110,"43015",3728,96,"39041010200","Columbus, OH","18140","0","L","1998-05-18",40.30,-83.06,"0615","VERIZON NORTH INC.-OH"

    "OH","740369","DELAWARE","DLWROHXADS0","324","740","","369","DELAWARE","5","1","DELAWARE",110,"43061",74,2,"39041010200","Columbus, OH","18140","0","L","1998-05-18",40.30,-83.06,"0615","VERIZON NORTH INC.-OH"

    "OH","740370","PORTSMOUTH","CLMDOHZIBMD","324","740","","370","PORTSMOUTH","5","1","SCIOTO",79,"45662",0,-1,"39145993600","Portsmouth, OH","39020","0","L","2002-10-25",38.73,-83.00,"4863","LEVEL 3 COMMUNICATIONS, LLC - OH"

    "OH","740372","PORTSMOUTH","OTWYOHXBRS0","324","740","","372","OTWAY","5","1","SCIOTO",79,"45657",647,65,"39145992300","Portsmouth, OH","39020","0","L","1998-01-25",38.86,-83.18,"0615","VERIZON NORTH INC.-OH"

    "OH","740372","PORTSMOUTH","OTWYOHXBRS0","324","740","","372","OTWAY","5","1","SCIOTO",79,"45652",203,20,"39145992300","Portsmouth, OH","39020","0","L","1998-01-25",38.86,-83.18,"0615","VERIZON NORTH INC.-OH"

    Obviously, I include the "FIRSTROW = 2" option of Bulk Insert to skip the delimited header record.

    The file I import with this is the ZipCode/NpaNxx cross reference file that a company called "Maponics" provides me with.  It has 427 thousand records and takes 27 seconds to import into a fully formed table that's ready to rock.

    Do notice how the leading quote is handled (ignored) in the format file... that's one of the main keys.

    Hope this helps...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply