SQLServerCentral Article

Custom String Functions

,

When I got done reading Stephen Lasham's article Extracting a String Within Delimiters - Part 2, I thought that I would share with the community, some functions I had just written a couple weeks earlier. I have found that I do a lot of string parsing where I work. It turns out that I am always either reading flat files into SQL, or having to parse some table a Web Developers populates with delaminated strings. As such, I happen to have two functions I've written that I think you may find interesting.

The Problem

For a lot of cases I can use DTS to bring in the flat files, and I do. But what if I have a situation where the flat files have the same kind of data, but different column lengths. I ran into this when I started setting up Performance Logs on all our servers. A Server with 4 processors will have more columns than a server with 2. But the type of data I'll still be trapping, CPU data, will still be the same. Wouldn't it be nice to put all those logs into a central location and have one SQL server pull them in? To do this though, I would have to create a DTS package for each log file. Or, create some convoluted DTS logic to handle it.

What about when I get a Web Developer that wants to dump her data into SQL? It's early in the project and she doesn't know how many columns she will be eventually giving you. Wouldn't it be nice to give her a two-column table? One column for the key information, and one column for data values? She could then concatenate her Key values and Data values and dump them into two varchar fields. Later you could just parse them up, and if there are extra columns you either discard them, or handle them with some logic.

In either circumstance, wouldn't it be easy if we could pull the data into SQL, but then have the parsing ability of the DTS package?

The Solution

I've written two custom functions, and they are:

FN_Repetitive_Str_Parse & FN_PatCount

FN_Repetitive_Str_Parse is just as the name suggests. It will take in three parameters: The string you want to parse, the delimiter, and the number of the column you want returned. It will then return the data from the column you specify.

FN_PatCount is much like PATINDEX, but instead of giving you the position of the pattern in the string, It will count the number of times a the pattern occurs within a string. The return is an int value. This is Useful for dynamically finding out how many columns are in your string. You can then use this number to iterate through the string and lift out the variable columns.

This sample code is lifted right from the header of Repetitive String Parse. It demonstrates the ability of the function to turn two delimited values into a 5 column table.

          DECLARE @KEY_DATA VARCHAR(200),
                  @VAL_DATA VARCHAR(200)
          SELECT  @KEY_DATA = 'AB12345~PA~ORD0001',
                  @VAL_DATA = '253~USER1'
          SELECT  dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 1) AS PART,
                  dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 2) AS LOC,
                  dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 3) AS [ORDER],
                  dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 1) AS SHIP_QTY,
                  dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 2) AS NETID,
                  GETDATE() AS DATE_STAMP

This next bit of code is lifted out of a script I wrote to parse Performance logs from various computers. The number of data elements is dependant upon the hardware of the system generating the log, and thus varies from system to system. I wanted to make the parsing algorithm as general as possible, and thus translated a horizontal table into a vertical one. Once the raw data (comma delaminated, Quote-string Identified) is Bulk imported into a holding table, this part of the script parses it into a vertical table. Another script reads this 'parsed' vertical table, and depending on column labels, reconstructs the data into a RDS architecture. But I digress. Here's the relevant parsing section

CREATE TABLE #LOG_OUTPUT
( FILNAME  VARCHAR(80),
  FILDATE  DATETIME,
  LOGINFO  VARCHAR(150)
)
---===<<<  Some DDL so there's output to view  >>>===---
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"COLUMN1","COLUMN2","COLUMN3","COLUMNn"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_1","DATAC2_1","DATAC3_1","DATACn_1"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_2","DATAC2_2","DATAC3_2","DATACn_2"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_3","DATAC2_3","DATAC3_3","DATACn_3"')
INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_4","DATAC2_4","DATAC3_4","DATACn_4"')
---===<<<  This is the code that parses the Log files  >>>===---
DECLARE @PARSE_COUNT  SMALLINT,
        @FIELD_IDX    SMALLINT,
        @ROW_ID       SMALLINT,
        @OLD_DATE     DATETIME,
        @FILEDATE     DATETIME,
        @LOGINFO      VARCHAR(2000)
 
 
 
DECLARE @PARSE_TBL TABLE
( DATESTAMP    DATETIME,
  COLUMN_HEAD  VARCHAR(255),
  DATA         VARCHAR(255)
)
 
DECLARE PARSE_PERFLOGS CURSOR FOR
SELECT FILDATE, LOGINFO FROM #LOG_OUTPUT
 
OPEN PARSE_PERFLOGS
FETCH NEXT FROM PARSE_PERFLOGS
INTO @FILEDATE, @LOGINFO
 
SELECT @ROW_ID = 1,
       @OLD_DATE = @FILEDATE
 
WHILE @@FETCH_STATUS = 0
BEGIN 
  SELECT @PARSE_COUNT = DBO.FN_PATCOUNT (',"', @LOGINFO),
         @FIELD_IDX = 1
  IF @ROW_ID = 1
  WHILE @FIELD_IDX <= @PARSE_COUNT
    BEGIN
      INSERT INTO @PARSE_TBL
      SELECT  @FILEDATE AS FILEDATE, 
              dbo.FN_Repetitive_Str_Parse (@LOGINFO, ',"', @FIELD_IDX) AS COLUMN_HEAD,
              @FIELD_IDX AS DATA
  
      SET @FIELD_IDX = @FIELD_IDX + 1
  END
  ELSE
    WHILE @FIELD_IDX <= @PARSE_COUNT
    BEGIN
      INSERT INTO @PARSE_TBL
      SELECT  @FILEDATE AS FILEDATE,
              @FIELD_IDX AS FIELD,
              dbo.FN_Repetitive_Str_Parse (@LOGINFO, ',"', @FIELD_IDX) AS COLUMN_HEAD
  
      SET @FIELD_IDX = @FIELD_IDX + 1
    END
 
  FETCH NEXT FROM PARSE_PERFLOGS
  INTO @FILEDATE, @LOGINFO
 
  IF @OLD_DATE = @FILEDATE
    SET @ROW_ID = @ROW_ID + 1
  ELSE
    SET @ROW_ID = 1
  SET @OLD_DATE = @FILEDATE
 
END
CLOSE PARSE_PERFLOGS
DEALLOCATE PARSE_PERFLOGS
DROP TABLE #LOG_OUTPUT
SELECT * FROM @PARSE_TBL

Here's the source code for the custom Functions.

-------------------- dbo.FN_Repetitive_Str_Parse  ---------------------
IF EXISTS (SELECT NAME FROM SYSOBJECTS 
WHERE NAME = 'FN_Repetitive_Str_Parse')
   DROP FUNCTION FN_Repetitive_Str_Parse
GO
 
CREATE FUNCTION dbo.FN_Repetitive_Str_Parse 
/*****************************************************************
** Name        : dbo.FN_Repetitive_Str_Parse
**
** Description : DECLARE @KEY_DATA VARCHAR(200),
**                  @VAL_DATA VARCHAR(200)
**          SELECT  @KEY_DATA = 'AB12345~PA~ORD0001',
**                  @VAL_DATA = '253~USER1'
**          SELECT  dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 1) AS PART,
**                  dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 2) AS LOC,
**                  dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 3) AS [ORDER],
**                  dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 1) AS SHIP_QTY,
**                  dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 2) AS NETID,
**                  GETDATE() AS DATE_STAMP
**          
**            
**            
**            
**            
**            
** Written By  : Chris Cathers 8/17/04
**
** Uses Tables:  
**
** Parameters  : 
**
** Returns     : 
**
** Modifications:  
**            
**            
**                  
**            
**                  
*****************************************************************/(        @STR2PARSE           VARCHAR(5000), 
         @CHECK_CHAR          VARCHAR(10),
         @DATA_ELEMENT        SMALLINT
)
RETURNS VARCHAR(100)
 
AS
 
BEGIN
DECLARE @STR_VAL       VARCHAR(100),
        @STR_INC       SMALLINT,
        @STR_LEN       SMALLINT,
        @FIELD_CNT     TINYINT,
        @CHK_LEN       TINYINT,
        @MAX_FIELD     TINYINT,
        @FIELD_IDX     SMALLINT,
        @VAL_INDEX     SMALLINT,
        @TEST_CHAR     VARCHAR(10),
        @TEST_STRING1  VARCHAR(100),
        @RETURN_STAT   TINYINT
 
 
---===<<<  Create temp table that contains Positional Data  >>>===---
 
DECLARE @POS_DATA TABLE
( Element_type     CHAR(4),
  Element_Count    TINYINT,
  Element_Position SMALLINT
)
DECLARE @VAL_LIST TABLE
( STR_VALUE        VARCHAR(800),
  DATA_ELEMENT     SMALLINT
)
 
 
---===<<<  Var Initilalization  >>>===---
 
 
SELECT @STR_LEN = LEN(@STR2PARSE),
       @STR_INC = 0,
       @FIELD_CNT = 0,
       @CHK_LEN = LEN(@CHECK_CHAR),
       @TEST_CHAR = ''
 
---===<<<  Parse string   >>>===---
-- ****************************************
-- Generate a table recording the position in the string
-- where the delimiter exists.
-- ****************************************
WHILE @STR_INC <= @STR_LEN
  BEGIN
    IF SUBSTRING(@STR2PARSE, @STR_INC - @CHK_LEN, @CHK_LEN) = @CHECK_CHAR
    BEGIN
      SET @FIELD_CNT = @FIELD_CNT + 1
      INSERT INTO @POS_DATA
      VALUES ('LIST', @FIELD_CNT, @STR_INC)
    END
    SET @STR_INC = @STR_INC + 1
  END 
 
-- ****************************************
-- Test to see if the string ended with the delimiter.
-- If it did not, Record the position of the last field.
-- ****************************************
IF (SELECT SUBSTRING(@STR2PARSE, MAX(ELEMENT_POSITION) + 1, 1)
              FROM @POS_DATA) <> ''
  BEGIN
    SET @FIELD_CNT = @FIELD_CNT + 1
    INSERT INTO @POS_DATA
    VALUES ('LIST', @FIELD_CNT, LEN(@STR2PARSE) + 1)
  END
 
 
 
---===<<<  Parse values and Columns  >>>===---
SELECT  @FIELD_CNT = MIN(ELEMENT_COUNT),
        @MAX_FIELD = MAX(ELEMENT_COUNT)
  FROM @POS_DATA
 
WHILE @FIELD_CNT <= @MAX_FIELD
BEGIN
  SELECT @FIELD_IDX = CASE 
      WHEN @FIELD_CNT = 1 THEN 1
      ELSE @STR_INC - @CHK_LEN + 1
    END
 
  SELECT @STR_INC = ELEMENT_POSITION FROM @POS_DATA
      WHERE ELEMENT_TYPE = 'LiST'
      AND   ELEMENT_COUNT = @FIELD_CNT
 
  IF @FIELD_CNT = @MAX_FIELD
    SET @TEST_STRING1 = LTRIM(SUBSTRING(@STR2PARSE, @FIELD_IDX, @STR_INC - @FIELD_IDX))
  ELSE
    SET @TEST_STRING1 = LTRIM(SUBSTRING(@STR2PARSE, @FIELD_IDX, @STR_INC - @FIELD_IDX - @CHK_LEN))
 
    INSERT INTO @VAL_LIST
    SELECT @TEST_STRING1,
           @FIELD_CNT
 
  SELECT @FIELD_CNT = @FIELD_CNT + 1
 
END
 
-- **********************************************************
-- Fetch the value requested by user.
-- If necessary clean out any occurrences of the delimiter
-- **********************************************************
SELECT @STR_VAL = CASE
          WHEN CHARINDEX(@CHECK_CHAR, STR_VALUE, 2) > 0 THEN LEFT(STR_VALUE, LEN(STR_VALUE) - @CHK_LEN)
          ELSE STR_VALUE 
        END
  FROM @VAL_LIST
  WHERE DATA_ELEMENT = @DATA_ELEMENT  
 
 
RETURN ISNULL(@STR_VAL, '')
 
END
GO
GRANT  EXECUTE ON [dbo].[FN_Repetitive_Str_Parse]  TO [PUBLIC]
GO
-----------------------------------------------------------------------
 
-------------------------- FN_PATCOUNT  -------------------------------
IF EXISTS (SELECT NAME FROM SYSOBJECTS 
WHERE NAME = 'FN_PATCOUNT'
AND XTYPE = 'FN')
   DROP FUNCTION FN_PATCOUNT
GO
 
CREATE FUNCTION FN_PATCOUNT 
/*****************************************************************
** Name        : FN_PATCOUNT
**
** Description : Will return a count of how many times the search
**            pattern occurs in the submitted string. 
**            (A custom Metadata function)
**            
**            
** Written By  : Chris Cathers 2004-09-03
**
** Uses Tables:  
**
** Parameters  : 
**
** Returns     : 
**
** Modifications:  
**            
**            
**                  
**            
**                  
*****************************************************************/(       @PATX      VARCHAR(255),
        @STR       VARCHAR(8000)
)
RETURNS SMALLINT
AS
BEGIN
 
        
DECLARE @PATCOUNT  SMALLINT,
        @PATIDX    SMALLINT,
        @PATLEN    TINYINT
 
SELECT  @PATCOUNT = 1,
        @PATIDX = 1,
        @PATLEN = LEN(@PATX)
 
WHILE @PATIDX <= LEN(@STR)
BEGIN
 IF (SELECT SUBSTRING(@STR, @PATIDX - @PATLEN, @PATLEN)) = @PATX
  SET @PATCOUNT = @PATCOUNT + 1
  SET @PATIDX = @PATIDX + 1
END
 
RETURN @PATCOUNT
END
GO
GRANT  EXECUTE   ON [dbo].[FN_PATCOUNT]  TO [PUBLIC]
GO
-----------------------------------------------------------------------

Conclusions

With the creative use of the canned string functions that come with SQL, we can create some custom functions that can make everyone's job a lot easier.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating