October 7, 2006 at 11:29 am
Hi,
I have a record in a database that is text, separated by char(13). I would like to split this field into separate records and get the resulting table. I have found split stored functions on the net, but they always split some string that comes from outside the select/table. They never seem to show an example of splitting a field that came from the select statement/record in a table. Anyone know the answer?
For example, I see:
@foo = 'xxx xxx'
select * from split(@foo, ' ');
But that is rather useless. I want something more like:
select bigtext, id from sometable
<split(bigtext, char(13)) > cross join <with the id>
I would then wind up with multiple records for each id. In reality, bigtext is a field containing events all concatenated together but separated by char(13). But I need to split up the field for each record into multiple records so that I can then query based on date, event, etc. The resulting table would go from
id | text
1 "line1CRline2CRline3"
to something like
id | line
1 line1
1 line2
1 line3
Like I said above, I have found stored procedures (functions) that will split a string, and they work. I just haven't seen how to split an actual field in a record and recover the resulting values as multiple records in a new/temporary table.
Thanks.
Perry
October 7, 2006 at 6:41 pm
Actually, a function would slow stuff down here... you want the full table to be split... why do it a line at a time? Try this, instead...
--==================================================================================================
-- This section is just setting up for the demonstration and would not be included in final code
--==================================================================================================
--===== If the test table exists, drop it
IF OBJECT_ID('TempDB..#myHead') IS NOT NULL
DROP TABLE #myHead
--===== Create the test table
CREATE TABLE #myHead
(
RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
DelimitedText VARCHAR(8000)
)
--===== Populate the test table with items separated by Cr
INSERT INTO #myHead
SELECT 'hello'+CHAR(13)+'thanks'+CHAR(13)+'movies'+CHAR(13)+'tickets' UNION ALL
SELECT 'abc'+CHAR(13)+'de'+CHAR(13)+'fghi' UNION ALL
SELECT 'xyz' UNION ALL
SELECT 'Now is the time'+CHAR(13)+'for all good men'+CHAR(13)+'to come to the aid of their country'
--==================================================================================================
-- This demonstrates the solution
--==================================================================================================
--===== Split the items and return the RowNum (ID) where it came from, as well
SELECT RowNum,
       SUBSTRING(CHAR(13) + h.DelimitedText + CHAR(13), t.N + 1,
                 CHARINDEX(CHAR(13), CHAR(13) + h.DelimitedText + CHAR(13), t.N + 1) - t.N - 1)
  FROM dbo.Tally t,
       #myHead h
 WHERE SUBSTRING(CHAR(13) + h.DelimitedText + CHAR(13), t.N, 1) = CHAR(13)
   AND t.N < LEN(CHAR(13) + h.DelimitedText + CHAR(13))
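Applied to the table sketched in the original question (sometable, id, and bigtext are the poster's names, so treat this as an untested adaptation of the same technique):

```sql
--===== Same technique against the poster's table: one row per CR-delimited line
SELECT s.id,
       SUBSTRING(CHAR(13) + s.bigtext + CHAR(13), t.N + 1,
                 CHARINDEX(CHAR(13), CHAR(13) + s.bigtext + CHAR(13), t.N + 1) - t.N - 1) AS line
  FROM dbo.Tally t,
       dbo.sometable s
 WHERE SUBSTRING(CHAR(13) + s.bigtext + CHAR(13), t.N, 1) = CHAR(13)
   AND t.N < LEN(CHAR(13) + s.bigtext + CHAR(13))
```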
If you don't already have a "Tally" table, it's time to make one... here's how...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
--Jeff Moden
Change is inevitable... Change for the better is not.
October 8, 2006 at 5:16 pm
Jeff,
Good one, you nailed it. I was trying something like it with a cross join to create your tally table. I like your solution better; much less code, and the tally table might be useful in the future.
Thanks.
Perry
October 8, 2006 at 5:25 pm
Aye... and thank you for the feedback, Perry.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 8, 2006 at 5:30 pm
p.s. You would be amazed at just how useful that Tally table is gonna be... and, a lot of the time, it will greatly enhance the performance of your code... for example... how would you find the number of Tuesdays between any two dates (well, up to 30 years worth)? Take a look at the following post...
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=169&messageid=313277
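For the curious, a hedged sketch of the Tally-table approach to that Tuesday-counting question (the dates are purely illustrative, and DATENAME(dw, ...) depends on the server's language setting):

```sql
--===== Count the Tuesdays between two dates using the Tally table
DECLARE @StartDate DATETIME,
        @EndDate   DATETIME
SELECT  @StartDate = '20060101',
        @EndDate   = '20061231'

SELECT COUNT(*) AS Tuesdays
  FROM dbo.Tally t
 WHERE t.N <= DATEDIFF(dd, @StartDate, @EndDate) + 1  --one N per day, inclusive
   AND DATENAME(dw, DATEADD(dd, t.N - 1, @StartDate)) = 'Tuesday'
```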
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2006 at 10:40 pm
Jeff,
What a clean solution!
Will it work on the following example? Instead of splitting one delimited column into multiple rows, I need to separate values that are all in one column and are semicolon (;) delimited.
For example I have a column called tryit and the value = '1234;456.75;01/01/2001;ABBR;T...@GSU.ORG;75%;$12.75;'.
I want to write a query so that the results can be split out to 7 output
columns. In other words, I would like the output to be:
Column 1 = 1234,
Column 2 = 456.75,
Column 3 = 01/01/2001,
Column 4 = ABBR,
Column 5 = T...@GSU.ORG,
Column 6 = 75%,
Column 7 = $12.75
Thanks!
Yiming
October 10, 2006 at 6:26 am
First... replace all the CHAR(13) occurrences with ';', including the single quotes. Some use similar code to make a function where they can pass the delimiter to the function, but almost nothing beats good inline code for performance (there are rare exceptions on multi-processor boxes, depending on what you call "performance").
Then, you'll need to use that code as a derived table and do a "cross-tab" bit of code to create your columns. If this is from a file, I'm thinking that BCP or BULK INSERT would be better suited for the job, though.
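A hedged sketch of that derived-table-plus-cross-tab approach against the example value (the table and column names, MyTable/id/tryit, are assumptions from the question; the trailing ';' in the sample value is harmless because only items 1 through 7 are pivoted):

```sql
--===== Split each semicolon-delimited value, number the items, then pivot
--      them into columns with MAX(CASE...). dbo.MyTable(id, tryit) is hypothetical.
SELECT d.id,
       MAX(CASE WHEN d.ItemNumber = 1 THEN d.Item END) AS Column1,
       MAX(CASE WHEN d.ItemNumber = 2 THEN d.Item END) AS Column2,
       MAX(CASE WHEN d.ItemNumber = 3 THEN d.Item END) AS Column3,
       MAX(CASE WHEN d.ItemNumber = 4 THEN d.Item END) AS Column4,
       MAX(CASE WHEN d.ItemNumber = 5 THEN d.Item END) AS Column5,
       MAX(CASE WHEN d.ItemNumber = 6 THEN d.Item END) AS Column6,
       MAX(CASE WHEN d.ItemNumber = 7 THEN d.Item END) AS Column7
  FROM (--==== Derived table: same split as before, plus an item number found
        --     by counting the delimiters at or before each split position
        SELECT m.id,
               t.N - DATALENGTH(REPLACE(LEFT(';' + m.tryit, t.N), ';', '')) AS ItemNumber,
               SUBSTRING(';' + m.tryit + ';', t.N + 1,
                         CHARINDEX(';', ';' + m.tryit + ';', t.N + 1) - t.N - 1) AS Item
          FROM dbo.Tally t,
               dbo.MyTable m
         WHERE SUBSTRING(';' + m.tryit + ';', t.N, 1) = ';'
           AND t.N < LEN(';' + m.tryit + ';')
       ) d
 GROUP BY d.id
```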
And thanks for the compliment!
--Jeff Moden
Change is inevitable... Change for the better is not.
October 11, 2006 at 4:06 pm
Jeff,
I use Bulk Insert. It works well for most of the strings, but fails for the ones that have double quotes ("). So I have to import the whole string to one column first and then split the delimited column into different columns.
Paul Cresham's FromProgressArray function works great for this case - http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=229031. I think your Tally table could be a very good idea as well and it DOES work great!
Thanks,
Yiming
October 11, 2006 at 8:21 pm
Paul Cresham's function is great... I have a similar one that uses a Tally table... thought you might enjoy seeing it... the documentation is longer than the function itself...
CREATE FUNCTION dbo.fSplit
/****************************************************************************************
Purpose:
This function splits names and strings of words based on a delimiter of up to 255
characters.
Notes:
1. Max length of string that can be split is 8000-(Length of delimiter X 2)
2. Designed for use on MS-SQL Server 2000 and MSDE 2000.
3. Will error under the following conditions:
a. Length of delimiter <1 or >255
b. Length of string + (Length of delimiter X 2) > 8000
4. Unexpected results will occur if a leading or trailing delimiter is included in the
string to be parsed
5. Returns a blank (space) when multiple delimiters are encountered (i.e., 'Jeff,,Moden'
would return a blank as the 2nd word).
Usage:
dbo.fSplit(StringToBeSplit,SegmentToReturn,DelimiterString)
-----------------------------------------------------------------------------------------
Revisions:
Rev 00 - 06/05/2005 - Jeff Moden - Initial creation and test (single space delimiter)
Rev 01 - 06/06/2005 - Jeff Moden - Add delimiter character as an input
Rev 02 - 08/17/2005 - Jeff Moden - Increase possible size of delimiter to 10 characters
Rev 03 - 09/03/2005 - Jeff Moden - Increase possible size of delimiter to 255 characters
****************************************************************************************/
--===== Define the input parameters
(
@pString VARCHAR(8000), --String to be split
@pWordNum SMALLINT, --Number of the word to return
@pDelim VARCHAR(255) --Delimiter to base the split on (max is 255 or BOOM!)
)
--===== Define the return type
RETURNS VARCHAR(8000)
AS
BEGIN
--======================================================================================
-- Body of function
--======================================================================================
--===== Declare the local variables
DECLARE @LenDelim TINYINT --Holds length of delimiter (max is 255 or BOOM!)
DECLARE @Words TABLE
(
WordNum INT IDENTITY(1,1), --The number of the word that has been split out
Word VARCHAR(8000) --The word that has been split out
)
--===== Determine the length of the delimiter. Had to do it this way because LEN
-- returns 0 for a string of spaces
SET @LenDelim = LEN('|'+@pDelim+'|')-2
--===== Prepare the string to be split for parsing by adding a delimiter to each end
SET @pString = @pDelim+@pString+@pDelim
--===== Do the parsing and store the parsed words in the table variable
INSERT INTO @Words (Word)
SELECT SUBSTRING(@pString, --What to split
N+@LenDelim, --Where to start
CHARINDEX(@pDelim,@pString,N+@LenDelim)-@LenDelim-N --Length
)
FROM dbo.Tally
WHERE SUBSTRING(@pString,N,@LenDelim) = @pDelim --Finds leading delimiters
AND N < LEN(@pString)-@LenDelim --except the last 1
ORDER BY N --Force the order of words or segments to be correct
--===== Return the word indicated by the input parameter
RETURN (SELECT Word
FROM @Words
WHERE WordNum = @pWordNum)
--======================================================================================
-- End of function
--======================================================================================
END
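A quick usage sketch (the literal string and the MyTable/tryit names are just illustrations, not anything from the original posts):

```sql
--===== Return the 2nd "word" from a comma-delimited string
SELECT dbo.fSplit('Jeff,Moden,SQL', 2, ',') AS Word2 --returns 'Moden'

--===== Or split a semicolon-delimited column out to separate columns
SELECT dbo.fSplit(tryit, 1, ';') AS Column1,
       dbo.fSplit(tryit, 2, ';') AS Column2,
       dbo.fSplit(tryit, 3, ';') AS Column3
  FROM dbo.MyTable
```

Just remember that calling a scalar function like this once per column per row is exactly the row-by-row overhead described below, so it's best saved for small result sets.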
...and I owe my good fortune with Tally tables to another fellow by the name of Adam Mechanic.
The reason why I didn't use a function like dbo.fSplit to solve this original problem is that the code to split the whole table runs a fair bit faster, in most cases. People forget that functions are a form of RBAR (pronounced "ree-bar", a "Moden-ism" for "Row By Agonizing Row") and, if written incorrectly, can really slow things down. I equate them to a stored procedure that only returns a single value.
On the Bulk Insert thing.... yeah, I love it... fastest gun in the West. And, like BCP, it can use a "Format" file... they take a bit of study and some trial and error because the documentation Microsoft provides on the tool is sparse, sometimes incorrect, and poorly written at best.
Soooooo.... here's another treat... it's a format file that I wrote to import Quoted CSV like what you are asking for...
8.0
27
1   SQLCHAR      0   1    ""          0   leadingquote       ""
2   SQLCHAR      0   60   "\",\""     2   State              ""
3   SQLCHAR      0   60   "\",\""     3   NpaNxx             ""
4   SQLCHAR      0   60   "\",\""     4   RateCenter         ""
5   SQLCHAR      0   60   "\",\""     5   SwitchCLLI         ""
6   SQLCHAR      0   60   "\",\""     6   Lata               ""
7   SQLCHAR      0   60   "\",\""     7   NPA                ""
8   SQLCHAR      0   60   "\",\""     8   NewNpa             ""
9   SQLCHAR      0   60   "\",\""     9   NXX                ""
10  SQLCHAR      0   60   "\",\""     10  City               ""
11  SQLINT       0   60   "\",\""     11  TimeZoneOffset     ""
12  SQLBIT       0   60   "\",\""     12  ObservesDST        ""
13  SQLCHAR      0   60   "\","       13  County             ""
14  SQLINT       0   60   ",\""       14  CountyPopulation   ""
15  SQLCHAR      0   60   "\","       15  ZipCode            ""
16  SQLINT       0   60   ","         16  ZipCodeCount       ""
17  SQLINT       0   60   ",\""       17  ZipCodeFrequency   ""
18  SQLCHAR      0   60   "\",\""     18  FIPS               ""
19  SQLCHAR      0   60   "\",\""     19  MSACBSA            ""
20  SQLINT       0   60   "\",\""     20  MSACBSACode        ""
21  SQLBIT       0   60   "\",\""     21  Overlay            ""
22  SQLCHAR      0   60   "\",\""     22  NxxUseType         ""
23  SQLDATETIME  0   60   "\","       23  NxxIntroVersion    ""
24  SQLDECIMAL   0   60   ","         24  Latitude           ""
25  SQLDECIMAL   0   60   ",\""       25  Longitude          ""
26  SQLCHAR      0   60   "\",\""     26  OCN                ""
27  SQLCHAR      0   60   "\"\r\n"    27  Company            ""
...the \" is like \r and \n but is how you mark double quotes as part of the delimiter. So, to mark a delimiter as "," (including the double quotes), you need \",\", and that needs to be enclosed in double quotes, which gives you "\",\"" as the delimiter in many cases.
By the way, here's a couple of rows of data from the raw file to play with if you want...
"State","NPANXX","Rate_Center","Switch_CLLI","LATA","NPA","New_Npa","NXX","City","TimeZone","Observes_DST","County","County_Pop__x1000_","ZipCode_PostalCode","ZipCode_Count","ZipCode_Freq_","FIPS","MSA_CBSA","MSA_CBSA_CODE","Overlay","NXX_Use_Type","NXX_Intro_version","Latitude","Longitude","OCN","Company"
"OH","740365","CALDWELL","CLMDOHZIBMD","324","740","","365","NEWCOMERSTOWN","5","1","TUSCARAWAS",91,"43832",0,-1,"39157021700","New Philadelphia-Dover, OH","35420","0","L","2001-10-31",40.26,-81.60,"4863","LEVEL 3 COMMUNICATIONS, LLC - OH"
"OH","740366","NEWARK","NWRKOHXB36C","324","740","","366","NEWARK","5","1","LICKING",145,"43055",4294,99,"39089751600","Columbus, OH","18140","0","L","1998-01-25",40.08,-82.42,"0665","ALLTEL OHIO, INC. - WESTERN OHIO"
"OH","740367","CHESHIRE","CHSGOH36RS1","324","740","","367","CHESHIRE","5","1","GALLIA",31,"45620",447,53,"39053953500","Point Pleasant, WV-OH","38580","0","L","1998-01-25",38.94,-82.14,"9321","AMERITECH OHIO"
"OH","740367","CHESHIRE","CHSGOH36RS1","324","740","","367","CHESHIRE","5","1","GALLIA",31,"45631",273,32,"39053953500","Point Pleasant, WV-OH","38580","0","L","1998-01-25",38.94,-82.14,"9321","AMERITECH OHIO"
"OH","740367","CHESHIRE","CHSGOH36RS1","324","740","","367","CHESHIRE","5","1","GALLIA",31,"45614",124,15,"39053953500","Point Pleasant, WV-OH","38580","0","L","1998-01-25",38.94,-82.14,"9321","AMERITECH OHIO"
"OH","740368","DELAWARE","DLWROHXADS0","324","740","","368","DELAWARE","5","1","DELAWARE",110,"43015",379,97,"39041010200","Columbus, OH","18140","0","L","1998-05-18",40.30,-83.06,"0615","VERIZON NORTH INC.-OH"
"OH","740369","DELAWARE","DLWROHXADS0","324","740","","369","DELAWARE","5","1","DELAWARE",110,"43015",3728,96,"39041010200","Columbus, OH","18140","0","L","1998-05-18",40.30,-83.06,"0615","VERIZON NORTH INC.-OH"
"OH","740369","DELAWARE","DLWROHXADS0","324","740","","369","DELAWARE","5","1","DELAWARE",110,"43061",74,2,"39041010200","Columbus, OH","18140","0","L","1998-05-18",40.30,-83.06,"0615","VERIZON NORTH INC.-OH"
"OH","740370","PORTSMOUTH","CLMDOHZIBMD","324","740","","370","PORTSMOUTH","5","1","SCIOTO",79,"45662",0,-1,"39145993600","Portsmouth, OH","39020","0","L","2002-10-25",38.73,-83.00,"4863","LEVEL 3 COMMUNICATIONS, LLC - OH"
"OH","740372","PORTSMOUTH","OTWYOHXBRS0","324","740","","372","OTWAY","5","1","SCIOTO",79,"45657",647,65,"39145992300","Portsmouth, OH","39020","0","L","1998-01-25",38.86,-83.18,"0615","VERIZON NORTH INC.-OH"
"OH","740372","PORTSMOUTH","OTWYOHXBRS0","324","740","","372","OTWAY","5","1","SCIOTO",79,"45652",203,20,"39145992300","Portsmouth, OH","39020","0","L","1998-01-25",38.86,-83.18,"0615","VERIZON NORTH INC.-OH"
Obviously, I include the "FIRSTROW = 2" option of Bulk Insert to skip the delimited header record.
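The call itself would look something like this (the file paths and the target table name are hypothetical; only the FIRSTROW = 2 option comes from the post above):

```sql
--===== Import the quoted CSV using the format file shown above
BULK INSERT dbo.NpaNxxZipCode
FROM 'C:\Import\NpaNxx.csv'
WITH (
     FORMATFILE = 'C:\Import\NpaNxx.fmt',
     FIRSTROW   = 2, --skip the delimited header record
     TABLOCK         --take a table lock for a faster load
     )
```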
The file I import with this is the ZipCode/NpaNxx cross reference file that a company called "Maponics" provides me with. It has 427 thousand records and takes 27 seconds to import into a fully formed table that's ready to rock.
Do notice how the leading quote is handled (ignored) in the format file... that's one of the main keys.
Hope this helps...
--Jeff Moden
Change is inevitable... Change for the better is not.