When I got done reading Stephen Lasham's article Extracting a String Within Delimiters - Part 2, I thought that I would share with the community, some functions I had just written a couple weeks earlier. I have found that I do a lot of string parsing where I work. It turns out that I am always either reading flat files into SQL, or having to parse some table a Web Developers populates with delaminated strings. As such, I happen to have two functions I've written that I think you may find interesting.
The Problem
For a lot of cases I can use DTS to bring in the flat files, and I do. But what if I have a situation where the flat files have the same kind of data, but different column lengths. I ran into this when I started setting up Performance Logs on all our servers. A Server with 4 processors will have more columns than a server with 2. But the type of data I'll still be trapping, CPU data, will still be the same. Wouldn't it be nice to put all those logs into a central location and have one SQL server pull them in? To do this though, I would have to create a DTS package for each log file. Or, create some convoluted DTS logic to handle it.
What about when I get a Web Developer that wants to dump her data into SQL? It's early in the project and she doesn't know how many columns she will be eventually giving you. Wouldn't it be nice to give her a two-column table? One column for the key information, and one column for data values? She could then concatenate her Key values and Data values and dump them into two varchar fields. Later you could just parse them up, and if there are extra columns you either discard them, or handle them with some logic.
In either circumstance, wouldn't it be easy if we could pull the data into SQL, but then have the parsing ability of the DTS package?
The Solution
I've written two custom functions, and they are:
FN_Repetitive_Str_Parse & FN_PatCount
FN_Repetitive_Str_Parse is just as the name suggests. It will take in three parameters: The string you want to parse, the delimiter, and the number of the column you want returned. It will then return the data from the column you specify.
FN_PatCount is much like PATINDEX, but instead of giving you the position of the pattern in the string, It will count the number of times a the pattern occurs within a string. The return is an int value. This is Useful for dynamically finding out how many columns are in your string. You can then use this number to iterate through the string and lift out the variable columns.
This sample code is lifted right from the header of Repetitive String Parse. It demonstrates the ability of the function to turn two delimited values into a 5 column table.
DECLARE @KEY_DATA VARCHAR(200),
@VAL_DATA VARCHAR(200)
SELECT @KEY_DATA = 'AB12345~PA~ORD0001',
@VAL_DATA = '253~USER1'
SELECT dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 1) AS PART,
dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 2) AS LOC,
dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 3) AS [ORDER],
dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 1) AS SHIP_QTY,
dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 2) AS NETID,
GETDATE() AS DATE_STAMP
This next bit of code is lifted out of a script I wrote to parse Performance logs from various computers. The number of data elements is dependant upon the hardware of the system generating the log, and thus varies from system to system. I wanted to make the parsing algorithm as general as possible, and thus translated a horizontal table into a vertical one. Once the raw data (comma delaminated, Quote-string Identified) is Bulk imported into a holding table, this part of the script parses it into a vertical table. Another script reads this 'parsed' vertical table, and depending on column labels, reconstructs the data into a RDS architecture. But I digress. Here's the relevant parsing section
CREATE TABLE #LOG_OUTPUT
( FILNAME VARCHAR(80),
FILDATE DATETIME,
LOGINFO VARCHAR(150)
)
---===<<< Some DDL so there's output to view >>>===---
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"COLUMN1","COLUMN2","COLUMN3","COLUMNn"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_1","DATAC2_1","DATAC3_1","DATACn_1"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_2","DATAC2_2","DATAC3_2","DATACn_2"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_3","DATAC2_3","DATAC3_3","DATACn_3"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_4","DATAC2_4","DATAC3_4","DATACn_4"')
---===<<< This is the code that parses the Log files >>>===---
DECLARE @PARSE_COUNT SMALLINT,
@FIELD_IDX SMALLINT,
@ROW_ID SMALLINT,
@OLD_DATE DATETIME,
@FILEDATE DATETIME,
@LOGINFO VARCHAR(2000)
DECLARE @PARSE_TBL TABLE
( DATESTAMP DATETIME,
COLUMN_HEAD VARCHAR(255),
DATA VARCHAR(255)
)
DECLARE PARSE_PERFLOGS CURSOR FOR
SELECT FILDATE, LOGINFO FROM #LOG_OUTPUT
OPEN PARSE_PERFLOGS
FETCH NEXT FROM PARSE_PERFLOGS
INTO @FILEDATE, @LOGINFO
SELECT @ROW_ID = 1,
@OLD_DATE = @FILEDATE
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @PARSE_COUNT = DBO.FN_PATCOUNT (',"', @LOGINFO),
@FIELD_IDX = 1
IF @ROW_ID = 1
WHILE @FIELD_IDX <= @PARSE_COUNT
BEGIN
INSERT INTO @PARSE_TBL
SELECT @FILEDATE AS FILEDATE,
dbo.FN_Repetitive_Str_Parse (@LOGINFO, ',"', @FIELD_IDX) AS COLUMN_HEAD,
@FIELD_IDX AS DATA
SET @FIELD_IDX = @FIELD_IDX + 1
END
ELSE
WHILE @FIELD_IDX <= @PARSE_COUNT
BEGIN
INSERT INTO @PARSE_TBL
SELECT @FILEDATE AS FILEDATE,
@FIELD_IDX AS FIELD,
dbo.FN_Repetitive_Str_Parse (@LOGINFO, ',"', @FIELD_IDX) AS COLUMN_HEAD
SET @FIELD_IDX = @FIELD_IDX + 1
END
FETCH NEXT FROM PARSE_PERFLOGS
INTO @FILEDATE, @LOGINFO
IF @OLD_DATE = @FILEDATE
SET @ROW_ID = @ROW_ID + 1
ELSE
SET @ROW_ID = 1
SET @OLD_DATE = @FILEDATE
END
CLOSE PARSE_PERFLOGS
DEALLOCATE PARSE_PERFLOGS
DROP TABLE #LOG_OUTPUT
SELECT * FROM @PARSE_TBL
Here's the source code for the custom Functions.
-------------------- dbo.FN_Repetitive_Str_Parse ---------------------
IF EXISTS (SELECT NAME FROM SYSOBJECTS
WHERE NAME = 'FN_Repetitive_Str_Parse')
DROP FUNCTION FN_Repetitive_Str_Parse
GO
CREATE FUNCTION dbo.FN_Repetitive_Str_Parse
/*****************************************************************
** Name : dbo.FN_Repetitive_Str_Parse
**
** Description : DECLARE @KEY_DATA VARCHAR(200),
** @VAL_DATA VARCHAR(200)
** SELECT @KEY_DATA = 'AB12345~PA~ORD0001',
** @VAL_DATA = '253~USER1'
** SELECT dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 1) AS PART,
** dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 2) AS LOC,
** dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 3) AS [ORDER],
** dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 1) AS SHIP_QTY,
** dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 2) AS NETID,
** GETDATE() AS DATE_STAMP
**
**
**
**
**
**
** Written By : Chris Cathers 8/17/04
**
** Uses Tables:
**
** Parameters :
**
** Returns :
**
** Modifications:
**
**
**
**
**
*****************************************************************/( @STR2PARSE VARCHAR(5000),
@CHECK_CHAR VARCHAR(10),
@DATA_ELEMENT SMALLINT
)
RETURNS VARCHAR(100)
AS
BEGIN
DECLARE @STR_VAL VARCHAR(100),
@STR_INC SMALLINT,
@STR_LEN SMALLINT,
@FIELD_CNT TINYINT,
@CHK_LEN TINYINT,
@MAX_FIELD TINYINT,
@FIELD_IDX SMALLINT,
@VAL_INDEX SMALLINT,
@TEST_CHAR VARCHAR(10),
@TEST_STRING1 VARCHAR(100),
@RETURN_STAT TINYINT
---===<<< Create temp table that contains Positional Data >>>===---
DECLARE @POS_DATA TABLE
( Element_type CHAR(4),
Element_Count TINYINT,
Element_Position SMALLINT
)
DECLARE @VAL_LIST TABLE
( STR_VALUE VARCHAR(800),
DATA_ELEMENT SMALLINT
)
---===<<< Var Initilalization >>>===---
SELECT @STR_LEN = LEN(@STR2PARSE),
@STR_INC = 0,
@FIELD_CNT = 0,
@CHK_LEN = LEN(@CHECK_CHAR),
@TEST_CHAR = ''
---===<<< Parse string >>>===---
-- ****************************************
-- Generate a table recording the position in the string
-- where the delimiter exists.
-- ****************************************
WHILE @STR_INC <= @STR_LEN
BEGIN
IF SUBSTRING(@STR2PARSE, @STR_INC - @CHK_LEN, @CHK_LEN) = @CHECK_CHAR
BEGIN
SET @FIELD_CNT = @FIELD_CNT + 1
INSERT INTO @POS_DATA
VALUES ('LIST', @FIELD_CNT, @STR_INC)
END
SET @STR_INC = @STR_INC + 1
END
-- ****************************************
-- Test to see if the string ended with the delimiter.
-- If it did not, Record the position of the last field.
-- ****************************************
IF (SELECT SUBSTRING(@STR2PARSE, MAX(ELEMENT_POSITION) + 1, 1)
FROM @POS_DATA) <> ''
BEGIN
SET @FIELD_CNT = @FIELD_CNT + 1
INSERT INTO @POS_DATA
VALUES ('LIST', @FIELD_CNT, LEN(@STR2PARSE) + 1)
END
---===<<< Parse values and Columns >>>===---
SELECT @FIELD_CNT = MIN(ELEMENT_COUNT),
@MAX_FIELD = MAX(ELEMENT_COUNT)
FROM @POS_DATA
WHILE @FIELD_CNT <= @MAX_FIELD
BEGIN
SELECT @FIELD_IDX = CASE
WHEN @FIELD_CNT = 1 THEN 1
ELSE @STR_INC - @CHK_LEN + 1
END
SELECT @STR_INC = ELEMENT_POSITION FROM @POS_DATA
WHERE ELEMENT_TYPE = 'LiST'
AND ELEMENT_COUNT = @FIELD_CNT
IF @FIELD_CNT = @MAX_FIELD
SET @TEST_STRING1 = LTRIM(SUBSTRING(@STR2PARSE, @FIELD_IDX, @STR_INC - @FIELD_IDX))
ELSE
SET @TEST_STRING1 = LTRIM(SUBSTRING(@STR2PARSE, @FIELD_IDX, @STR_INC - @FIELD_IDX - @CHK_LEN))
INSERT INTO @VAL_LIST
SELECT @TEST_STRING1,
@FIELD_CNT
SELECT @FIELD_CNT = @FIELD_CNT + 1
END
-- **********************************************************
-- Fetch the value requested by user.
-- If necessary clean out any occurrences of the delimiter
-- **********************************************************
SELECT @STR_VAL = CASE
WHEN CHARINDEX(@CHECK_CHAR, STR_VALUE, 2) > 0 THEN LEFT(STR_VALUE, LEN(STR_VALUE) - @CHK_LEN)
ELSE STR_VALUE
END
FROM @VAL_LIST
WHERE DATA_ELEMENT = @DATA_ELEMENT
RETURN ISNULL(@STR_VAL, '')
END
GO
GRANT EXECUTE ON [dbo].[FN_Repetitive_Str_Parse] TO [PUBLIC]
GO
-----------------------------------------------------------------------
-------------------------- FN_PATCOUNT -------------------------------
IF EXISTS (SELECT NAME FROM SYSOBJECTS
WHERE NAME = 'FN_PATCOUNT'
AND XTYPE = 'FN')
DROP FUNCTION FN_PATCOUNT
GO
CREATE FUNCTION FN_PATCOUNT
/*****************************************************************
** Name : FN_PATCOUNT
**
** Description : Will return a count of how many times the search
** pattern occurs in the submitted string.
** (A custom Metadata function)
**
**
** Written By : Chris Cathers 2004-09-03
**
** Uses Tables:
**
** Parameters :
**
** Returns :
**
** Modifications:
**
**
**
**
**
*****************************************************************/( @PATX VARCHAR(255),
@STR VARCHAR(8000)
)
RETURNS SMALLINT
AS
BEGIN
DECLARE @PATCOUNT SMALLINT,
@PATIDX SMALLINT,
@PATLEN TINYINT
SELECT @PATCOUNT = 1,
@PATIDX = 1,
@PATLEN = LEN(@PATX)
WHILE @PATIDX <= LEN(@STR)
BEGIN
IF (SELECT SUBSTRING(@STR, @PATIDX - @PATLEN, @PATLEN)) = @PATX
SET @PATCOUNT = @PATCOUNT + 1
SET @PATIDX = @PATIDX + 1
END
RETURN @PATCOUNT
END
GO
GRANT EXECUTE ON [dbo].[FN_PATCOUNT] TO [PUBLIC]
GO
-----------------------------------------------------------------------
Conclusions
With the creative use of the canned string functions that come with SQL, we can create some custom functions that can make everyone's job a lot easier.