REGEXP - SQL Server 2008

  • Greetings all;

    Although I consider myself rather advanced at most of this stuff... I've run across a challenge I am struggling with. Some background:

    I have a table with a varchar text field. In that text field are going to be tokens or "data tags" that I want to find then ultimately replace. I figured I could use VBScript RegExp to do this. The EXECUTE method actually returns exactly what I'm looking for; Location of the RegExp expression then the text. However it is returning in a collection.

    What makes this data tag special is that it will contain a reference to a value located somewhere else in the database. (ie: <datatag qid="3" default="Last Name "/> )

    In essence this is working like a mail-merge.

    Design considerations were to;

    1) find all data tags in the <datatag qid.. (RegExp EXECUTE method would handle this nicely)

    2) load these tags in a temp table or table variable

    3) resolve all of the data tags in the temp table and loading them back to same

    4) replace all of the data tags in source data with resolved values.

    (ie. the answer for question ID 3 for this reference is "Smith" so store the data tag 3 and Smith )

    If I were to use the RegExp EXECUTE method which returns the results in a collection, how will I be able to access that information in my SQL Script (stored procedure)?

    Any thoughts?

    Kurt

    Sr. DBA

    Tarrytown, NY

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • I have this very same question. Have you learned anything more on this? I am starting to think this is not solvable using a direct call to the VBScript.regexp from sqlserver. I am thinking now that I need to design my own function and compile it into a DLL. That function will utilize the regexp object and return whatever I want from the resulting collection. Then I just make a single call to MY function from TSQL.

    I was just trying to avoid that since it creates another maintenance item for the sys admin. Would rather have all this native in my sql code.

  • The approach I took was what I previously described. I wrote a function that looks for the elements of the data tag and stores the result into a table variable. Once I've identified all of the data tags, which by the way, when inserting the data tag into the table variable I load up the value & default values as well. Once I've run to the end of the text field I then go back and perform the substitution.

    Here is my code:

    USE [CONX_Dev]

    GO

    /****** Object: UserDefinedFunction [dbo].[reResolveDataTagToAnswer] Script Date: 01/21/2011 15:05:02 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- =============================================

    -- Author:Kurt W. Zimmerman

    -- Create date: January 04, 2011

    -- Description:Function to resolve data tags and replace with answers

    -- =============================================

    ALTER FUNCTION [dbo].[reResolveDataTagToAnswer]

    (

    --declare

    -- Add the parameters for the function here

    @reResponseID int

    , @BodyString varchar(max)

    )

    RETURNS varchar(max)

    as

    begin

    declare @startLoc bigint

    declare @DataTag varchar(max)

    declare @DataTagLen int

    declare @Quote varchar(1)

    declare @EndDataTag varchar(max)

    declare @EndDataTagLen int

    declare @DataToken varchar(max)

    declare @MetaToken varchar(max)

    declare @FoundToken bit

    declare @EndDataTagLoc int

    declare @EndMetaTagLoc int

    declare @quQuestionID int

    declare @qidLoc int

    declare @qidLocEnd int

    declare @Answer varchar(max)

    declare @NumSubstutions int

    declare @SubSubstutionCnt int = 0

    declare @DefaultTag varchar(max)

    declare @DefaultTagLen int

    declare @DefaultTagLoc int

    declare @DefaultTagLocEnd int

    declare @DefaultAnswer varchar(max)

    declare @MetaTag varchar(max)

    declare @MetaTagLen int

    declare @EndMetaTag varchar(max)

    declare @EndMetaTagLen int

    declare @Results table (ID int primary key identity (1,1)

    ,DataToken varchar(max)

    ,quQuestionID int

    ,Answer varchar(max)

    ,DefaultAnswer varchar(max)

    )

    -- If the below data tag is to be used then the below predefined variables are to be used.

    --<datatag qid="3" default="Last Name "/>

    ----set @Quote = '"'

    ----set @DataTag = '<datatag qid="'

    ----set @DataTagLen = LEN(@DataTag)

    ----set @EndTag = '/>'

    ----set @EndTagLen = LEN(@EndTag)

    ----set @DefaultTag = 'default="'

    ----set @DefaultTagLen = LEN(@DefaultTag)

    --<datatag id='11191'>Phone Number Label</datatag>

    set @Quote = ''''

    set @DataTag = '<datatag id='''

    set @DataTagLen = LEN(@DataTag)

    set @EndDataTag = '</datatag>'

    set @EndDataTagLen = LEN(@EndDataTag)

    set @DefaultTag = '>'

    set @DefaultTagLen = LEN(@DefaultTag)

    set @MetaTag = '<metatag id='''

    set @MetaTagLen = LEN(@MetaTag)

    set @EndMetaTag = '</metatag>'

    set @EndMetaTagLen = LEN(@EndMetaTag)

    --<matatag id='1'>Status</matatag>

    --<matatag id='2'>User ID</matatag>

    --<matatag id='3'>User Name</matatag>

    --<matatag id='4'>Submit Date</matatag>

    --<matatag id='5'>Response ID</matatag>

    --IDName

    --1Status

    --2usUserID

    --3UserName

    --4SubmitDate

    --5ResponseID

    --set @BodyString = '<datatag id=''11191''>Phone Number Label</datatag> <datatag id=''11190''> Number Label</datatag> <datatag id=''11189''> TextArea Label</datatag>'

    --set @reResponseID = 135

    --set @BodyString = '<datatag id=''11191''>Phone Number Label</datatag><datatag id=''11190''>Number Label</datatag><datatag id=''11189''>TextArea Label</datatag> '

    --print @BodyString

    set @FoundToken = case when CHARINDEX(@DataTag, @BodyString) > 0 then 1 else 0 end

    set @startLoc = CHARINDEX(@DataTag, @BodyString)

    set @EndDataTagLoc = CHARINDEX(@EndDataTag, @BodyString, @startLoc) + 1

    set @DataToken = SUBSTRING(@BodyString,@startLoc, @EndDataTagLoc - @startLoc + 1)

    while @FoundToken = 'true'

    begin

    set @startLoc = CHARINDEX(@DataTag,@BodyString, @startLoc)

    set @EndDataTagLoc = CHARINDEX(@EndDataTag,@BodyString, @startLoc) + @EndDataTagLen

    set @DataToken = SUBSTRING(@BodyString, @startLoc, @EndDataTagLoc - @startLoc)

    --print @StartLoc

    --print @EndTagLoc

    --print @DataToken

    --print '==========='

    set @qidLoc = CHARINDEX(@Quote, @DataToken) + 1

    set @qidLocEnd = CHARINDEX(@Quote, @DataToken, @qidLoc + 1) - 1

    set @quQuestionID = SUBSTRING(@DataToken, @qidLoc, @qidLocEnd - @qidLoc + 1)

    --print @qidLoc

    --print @qidLocEnd

    --print @quQuestionID

    --print '==========='

    set @DefaultTagLoc = CHARINDEX(@DefaultTag, @DataToken) + @DefaultTagLen

    set @DefaultTagLocEnd = CHARINDEX(@EndDataTag, @DataToken,@DefaultTagLoc)

    set @DefaultAnswer = SUBSTRING(@DataToken, @DefaultTagLoc, @DefaultTagLocEnd - @DefaultTagLoc )

    --print @DefaultTagLoc

    --print @DefaultTagLocEnd

    --print @DefaultAnswer

    --print '==========='

    select top 1 @Answer = Answer

    from reAnswer

    where quQuestionID = @quQuestionID

    and reResponseID = @reResponseID

    and Active = 'true'

    insert into @Results (DataToken, quQuestionID, Answer, DefaultAnswer) values (@DataToken, @quQuestionID, @Answer, @DefaultAnswer)

    set @DataToken = null

    set @quQuestionID = null

    set @Answer = null

    set @DefaultAnswer = null

    set @startLoc = @EndDataTagLoc

    set @FoundToken = case when CHARINDEX(@DataTag, @BodyString, @EndDataTagLoc) > 0 then 1 else 0 end

    end

    set @NumSubstutions = @@identity

    set @startLoc = CHARINDEX(@MetaTag, @BodyString)

    set @EndDataTagLoc = CHARINDEX(@EndMetaTag, @BodyString, @startLoc) + 1

    set @DataToken = SUBSTRING(@BodyString,@startLoc, @EndDataTagLoc - @startLoc + 1)

    set @FoundToken = case when CHARINDEX(@MetaTag, @BodyString) > 0 then 1 else 0 end

    while @FoundToken = 'true'

    begin

    set @startLoc = CHARINDEX(@MetaTag,@BodyString, @startLoc)

    set @EndDataTagLoc = CHARINDEX(@EndMetaTag,@BodyString, @startLoc) + @EndMetaTagLen

    set @MetaToken = SUBSTRING(@BodyString, @startLoc, @EndMetaTagLoc - @startLoc)

    --print @StartLoc

    --print @EndTagLoc

    --print @DataToken

    --print '==========='

    set @qidLoc = CHARINDEX(@Quote, @MetaToken) + 1

    set @qidLocEnd = CHARINDEX(@Quote, @MetaToken, @qidLoc + 1) - 1

    set @quQuestionID = SUBSTRING(@MetaToken, @qidLoc, @qidLocEnd - @qidLoc + 1)

    --print @qidLoc

    --print @qidLocEnd

    --print @quQuestionID

    --print '==========='

    set @DefaultTagLoc = CHARINDEX(@DefaultTag, @MetaToken) + @DefaultTagLen

    set @DefaultTagLocEnd = CHARINDEX(@EndDataTag, @MetaToken,@DefaultTagLoc)

    set @DefaultAnswer = SUBSTRING(@MetaToken, @DefaultTagLoc, @DefaultTagLocEnd - @DefaultTagLoc )

    --print @DefaultTagLoc

    --print @DefaultTagLocEnd

    --print @DefaultAnswer

    --print '==========='

    select top 1 @Answer = case when @quQuestionID = 1 then Status

    when @quQuestionID = 2 then cast(usUserID as varchar)

    when @quQuestionID = 3 then UserName

    when @quQuestionID = 4 then cast(SubmitDate as varchar)

    when @quQuestionID = 5 then cast(reResponseID as varchar) end

    from reResponseMeta_View

    where reResponseID = @reResponseID

    insert into @Results (DataToken, quQuestionID, Answer, DefaultAnswer) values (@MetaToken, @quQuestionID, @Answer, @DefaultAnswer)

    set @MetaToken = null

    set @quQuestionID = null

    set @Answer = null

    set @DefaultAnswer = null

    set @startLoc = @EndMetaTagLoc

    set @FoundToken = case when CHARINDEX(@MetaTag, @BodyString, @EndMetaTagLoc) > 0 then 1 else 0 end

    end

    set @NumSubstutions = @NumSubstutions + @@identity

    --select * from @Results

    if @NumSubstutions > 0

    begin

    while @SubSubstutionCnt <= @NumSubstutions

    begin

    set @SubSubstutionCnt += 1

    select @DataToken = DataToken, @Answer = isnull(isnull(Answer, DefaultAnswer),'')

    from @Results

    where ID = @SubSubstutionCnt

    set @BodyString = REPLACE(@BodyString, @DataToken, @Answer)

    end

    end

    --print @BodyString

    return @BodyString

    end

    Hope this helps.

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Yep. If you could have manipulated the collection returned by the VBScript.RegExp EXECUTE method, the code might have been a little shorter. You didn't pursue the vbs method further.

    I might have been tempted to do the XML processing outside of SQL Server there. But the code looks good.

  • Why use the VBScript option? You're in 2008, leverage the full .NET version of Regex.

    If you change the function into a stored proc you will have a lot more luck at using the list based items (you can reutrn the matches list etc... as a recordset).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Maybe I misread your last comment. I have to be able to do a regex for both testing and value extraction WITHIN TSQL because I need to create an output table that can then be joined with other tables. The output table will have a column created using the regex.

    I would use C#.Net if I could for my application. I definitely avoid VBScript when possible. But my challenge is that I am attempting to query a database who's design I have no control over (it is the database underlying a third party application). The main problem is that the database violates a couple normalization rules - the worst one - 1NF. There is a field that is non-atomic, and I need to extract from this text field coded information that can be used to join with another part of the database.

    In particular, I have a field I will call "Comment" which is being used to store a cross-reference to account. If I find the pattern AMS#[0-9][0-9]* anywhere in the field, I want to extract the digits.

    So I need to use a regex in the WHERE clause AND I need to use it in the SELECT clause.

    Example:

    select acctNo(comment)

    from funkyTable

    where regexp('AMS#[0-9][0-9]*',comment) > 0

    Then I would create the function acctNo to match against the regex '

    'AMS#([0-9][0-9]*)' and happily extract matched group 1.

    Maybe you are saying that I could do this with any .NET DLL (in my case C#.NET).

  • Assuming you're running this against a SQL Server 2008 version, you can create functions and stored procedures using what is called SQLCLR. These are .NET assemblies built and compiled for direct use in SQL.

    Using that as a backdrop - you can then build a SQLCLR function (or stored proc depending on how you want to use it), which then calls the "full" .NET regex functions.

    Here are a few starting points:

    http://www.sqlteam.com/article/writing-clr-stored-procedures-in-charp-returning-data-part-2

    or from BOL:

    http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx

    and then, there's always the motherlode (Phil Factor's version of implementing Regex via CLR):

    http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/[/url]

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply