Manipulating a String

  • I'm still battling with PatIndex.I have two issues TYpe_1 is partly correct except

    that if

    1.

    For the Column Type_1 : using the first record , I'm trying to extract the '80' as the TYPE_1

    its seperated by a space. Sometimes in one space sometimes its a double space.

    2.

    For Column Code1 : i.e '80 98439 9991 997 -50.000' I want to extract the 98439.Sometimes this

    is 4 characters long or 3 or five.

    If I do it in two steps it works. How can I do it in one step and split those two field.

    Output should be:

    '80 98439 9991 997 -50.000' should be Type = 80 Code = 98439

    '0000058916 00000074000039708' should be Type = NULL Code = NULL

    '0000058916 00000074000039708' should be Type = NULL Code = NULL

    '01 759 9991 997 -0.560' should be Type = 01 Code = 759

    '01 9990 9991 997 -127.910' should be Type = 01 Code = 9990

    CREATE TABLE #Test (String Varchar(100),ROWID INT IDENTITY(1,1))

    INSERT #test VALUES ('80 98439 9991 997 -50.000')

    INSERT #test VALUES ('0000058916 00000074000039708')

    INSERT #test VALUES ('80 98408 9991 997 -10.000')

    INSERT #test VALUES ('80 98402 9991 997 -1.400')

    INSERT #test VALUES ('01 9990 9991 997 -127.910')

    INSERT #test VALUES ('01 759 9991 997 -0.560')

    INSERT #test VALUES ('80 98440 9991 997 -84.330')

    INSERT #test VALUES ('80 98474 9991 997 -2.000')

    INSERT #test VALUES ('80 98402 9991 997 -0.280')

    INSERT #test VALUES ('0000058916 00000074000039708')

    INSERT #test VALUES ('80 98408 9991 997 -10.000')

    INSERT #test VALUES ('80 98402 9991 997 -1.400')

    INSERT #test VALUES ('80 98416 3 997 -1.750')

    INSERT #test VALUES ('80 98402 3 997 -0.250')

    INSERT #test VALUES ('01 6060 3 997 -1000.000 CB')

    INSERT #test VALUES ('METLIFE_MALL_2 LA TREE D')

    INSERT #test VALUES ('80 98418 3 997 -5.000')

    INSERT #test VALUES ('80 98421 3 997 -9.500')

    INSERT #test VALUES ('80 98402 3 997 -2.030')

    INSERT #test VALUES ('01 6060 3 997 -500.000 CB')

    INSERT #test VALUES ('M.A.N_SUPERMKT KI NG WILLIAM D')

    INSERT #test VALUES ('80 98418 3 997 -5.000')

    INSERT #test VALUES ('80 98421 3 997 -4.750')

    INSERT #test VALUES ('80 98402 3 997 -1.370')

    INSERT #test VALUES ('01 6060 3 997 -100.000 CB')

    INSERT #test VALUES ('M.A.N_SUPERMKT KI NG WILLIAM D')

    INSERT #test VALUES ('80 98418 3 997 -5.000')

    INSERT #test VALUES ('80 98421 3 997 -0.950')

    INSERT #test VALUES ('80 98402 3 997 -0.830')

    INSERT #test VALUES ('80 98416 3 997 -1.750')

    INSERT #test VALUES ('80 98402 3 997 -0.250')

    INSERT #test VALUES ('01 6060 3 997 -20.000 CB')

    INSERT #test VALUES ('METLIFE_MALL_2 LA TREE D')

    INSERT #test VALUES ('80 98418 3 997 -5.000')

    INSERT #test VALUES ('80 98421 3 997 -0.190')

    INSERT #test VALUES ('80 98402 3 997 -0.730')

    INSERT #test VALUES ('80 98484 9991 997 -50.000')

    INSERT #test VALUES ('80 98402 9991 997 -7.000')

    INSERT #test VALUES ('80 98416 3 997 -1.750')

    INSERT #test VALUES ('80 98402 3 997 -0.250')

    INSERT #test VALUES ('80 98416 3 997 -1.750')

    INSERT #test VALUES ('80 98402 3 997 -0.250')

    INSERT #test VALUES ('80 98416 3 997 -1.750')

    INSERT #test VALUES ('80 98402 3 997 -0.250')

    INSERT #test VALUES ('80 98416 3 997 -1.750')

    INSERT #test VALUES ('80 98402 3 997 -0.250')

    INSERT #test VALUES ('0000074164 00000074000039708')

    INSERT #test VALUES ('80 98408 9991 997 -10.000')

    INSERT #test VALUES ('80 98402 9991 997 -1.400')

    INSERT #test VALUES ('80 98416 3 997 -1.750')

    INSERT #test VALUES ('80 98402 3 997 -0.250')

    ----Step 1

    drop table ##DDD

    SELECT

    SUBSTRING(String, 1, CHARINDEX(' ', String) - 1) AS TYPE_1,

    LTRIM(RTRIM(SUBSTRING(String,CHARINDEX(' ', String) + 1, LEN(String)))) AS Code

    ,MonetoryValue =

    CASE

    WHEN CHARINDEX('.',REVERSE(String)) = 0 THEN NULL ELSE

    REVERSE(SUBSTRING(REVERSE(String), CHARINDEX('.',REVERSE(String))+1,

    CHARINDEX(' ',REVERSE(String),CHARINDEX('.',REVERSE(String))) - CHARINDEX('.',REVERSE(String))))

    END +

    CASE

    WHEN CHARINDEX('.',String) =0 THEN NULL ELSE

    REPLACE(REPLACE(SUBSTRING(String,CHARINDEX('.',String)

    ,LEN(String) - CHARINDEX('.',String)+1),'CB',''),'FC','')

    END

    ,String INTO ##DDD

    FROM

    #Test WHERE String != 'M.A.N_SUPERMKT KI NG WILLIAM D'

    --Step 2

    SELECT Type_1,MonetoryValue,LEFT(Code, NULLIF(CHARINDEX(' ', Code) - 1, -1)) AS Code1,Code,string FROM ##DDD

  • Output should be:

    '80 98439 9991 997 -50.000' should be Type = 80 Code = 98439

    '0000058916 00000074000039708' should be Type = NULL Code = NULL

    '0000058916 00000074000039708' should be Type = NULL Code = NULL

    '01 759 9991 997 -0.560' should be Type = 01 Code = 759

    '01 9990 9991 997 -127.910' should be Type = 01 Code = 9990

    Hey! Matt Miller! This looks like a great place for one of those awesome Regex things you do! 🙂

    RayM,

    In the absense of Regex, you first need to build a Tally table to make your life easier for a lot of things. A Tally table is nothing more than a table with a single column of very well indexed sequential numbers... it's used to replace loops and the like... here's how to build one...

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    Once that's done, your problem becomes a whole lot easier and doesn't require performance robbers like the REVERSE function...

    SELECT s.RowID,

    Type = MAX(CASE WHEN s.Posit =1 AND LEN(s.SplitString) =2 THEN s.SplitString ELSE NULL END),

    Code = MAX(CASE WHEN s.Posit =2 AND LEN(s.SplitString)<=5 THEN s.SplitString ELSE NULL END)

    FROM (--==== Derived table "s" splits all of the strings and marks the position (ordinal index) of each

    SELECT RowID,

    SplitString = SUBSTRING(' '+h.String+' ', t.N+1, CHARINDEX(' ', ' '+h.String+' ', t.N+1)-t.N-1),

    Posit = t.N-LEN(REPLACE(LEFT(' '+h.String+' ',t.N), ' ', ''))

    FROM dbo.Tally t

    RIGHT OUTER JOIN --Necessary in case String is NULL

    #Test h

    ON SUBSTRING(' '+h.String+' ', t.N, 1) = ' '

    AND t.N < LEN(' '+h.String+' ')

    AND h.String NOT LIKE '%[_]%'

    )s

    WHERE s.Posit IN (1,2)

    GROUP BY s.RowID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry... missed the "MonetaryValue" column that you included in your code but not in your problem definition... this will do it...

    SELECT s.RowID,

    Type = MAX(CASE WHEN s.Posit =1 AND LEN(s.SplitString) =2 THEN s.SplitString ELSE NULL END),

    Code = MAX(CASE WHEN s.Posit =2 AND LEN(s.SplitString)<=6 THEN s.SplitString ELSE NULL END),

    MonetaryValue = MAX(CASE WHEN s.Posit =5 THEN s.SplitString ELSE NULL END)

    FROM (--==== Derived table "s" splits all of the strings and marks the position (ordinal index) of each

    SELECT RowID,

    SplitString = SUBSTRING(' '+h.String+' ', t.N+1, CHARINDEX(' ', ' '+h.String+' ', t.N+1)-t.N-1),

    Posit = t.N-LEN(REPLACE(LEFT(' '+h.String+' ',t.N), ' ', ''))

    FROM dbo.Tally t

    RIGHT OUTER JOIN --Necessary in case String is NULL

    #Test h

    ON SUBSTRING(' '+h.String+' ', t.N, 1) = ' '

    AND t.N < LEN(' '+h.String+' ')

    AND h.String NOT LIKE '%[_]%'

    )s

    WHERE s.Posit IN (1,2,5)

    GROUP BY s.RowID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SELECT * FROM tbPROCESSED

    This is the full picture of what I'm trying to do.The problematic field is called Actual_Text which is the string that requires manipulation.

    I'm deriving three fields from field Actual_Text

    fields Type,Code & MonetoryValue

    Field Type is the digits starting from the left and end when it encounters a space

    only where there is a Valid monetoryValue

    ie. '80 98437 3 997 -2.500'

    in this case Type will be = 80

    ie. 'PARK AVENUE MANOR Pr RAZOO 0'

    in this case Type will be = null

    ie. '530719000254'

    in this case Type will be = null

    -----------------------------Code

    Field Code is the digits after the first space starting from the left and ends when it encounters the

    second space ,only where there is a Valid monetoryValue

    ie. '80 98437 3 997 -2.500'

    Code will be = 98437

    ie. 'PARK AVENUE MANOR Pr RAZOO 0'

    Code will be = null

    ie. '530719000254'

    Code will be = null

    ie. '01 6060 3 997 -20.000 CO'

    Code will be = 6060

    ie. '0997 2247'

    Code will be = null because there is no MonetoryValue

    ---MonetoryValue

    I can extract the monetory value only problem is ho do I remove

    the alpha characters without using replace

    i.e '01 6060 3 997 -20.000 CO'

    '-20.000 CO' to exclude CO etc...

    '01 6060 3 997 -50.000 CB'

  • The code I posted extracts all three fields even for the "01" records...

    You didn't even try the code, did you?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/22/2007)


    Output should be:

    '80 98439 9991 997 -50.000' should be Type = 80 Code = 98439

    '0000058916 00000074000039708' should be Type = NULL Code = NULL

    '0000058916 00000074000039708' should be Type = NULL Code = NULL

    '01 759 9991 997 -0.560' should be Type = 01 Code = 759

    '01 9990 9991 997 -127.910' should be Type = 01 Code = 9990

    Hey! Matt Miller! This looks like a great place for one of those awesome Regex things you do! 🙂

    Sorry - just realized I got "paged" to the SSC forum during the long vacation weekend....:)

    Yes - you could use a CLR Regex Function to do just that if you want to. It's actually rather cute, since it allows you to "throw out" the result if you don't have BOTH (if you so desire).

    I just put this together, if you should want to go down that path.

    Here's the CLR (.NET code):

    Imports System

    Imports System.Data

    Imports System.Data.SqlClient

    Imports System.Data.SqlTypes

    Imports Microsoft.SqlServer.Server

    Imports System.Text.RegularExpressions

    Partial Public Class UserDefinedFunctions

    Private Const optionS As RegexOptions = RegexOptions.CultureInvariant

    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=True)< _

    Public Shared Function RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlString

    ' Add your code here

    Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)

    Return New SqlString(CType(rex.IsMatch(New String(input.Value)), String))

    End Function

    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=True)< _

    Public Shared Function RegexMatchGroup(ByVal input As SqlChars, ByVal pattern As SqlString, byval groupnum as SqlInt32) As SqlString

    Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)

    Return New SqlString(CType(rex.Matches(New String(input.Value))(0).Groups(groupnum.Value).ToString, String))

    End Function

    End Class

    At that point, you could use something like this to pick out the pieces:

    select dbo.regexmatchgroup('80 12345 p9a996458aaa','^(?<1>[0-9]{2}) (?<2>[0-9]{3,5})',2)

    The last parameter could be 0, 1, or 2 (0=whole string, 1/2 = the named backreferences to 80 and 12345 respectively in my example).

    It does require (at least for me) knowing how to correctly structure the grouping constructs for .NET, but this should give you a running start.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • There he is! And I knew he'd have a pretty good answer for this one, too!

    How was the weekend, Matt?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply