Manipulating a String
Posted Thursday, November 22, 2007 12:08 PM
SSC Journeyman


Group: General Forum Members
Last Login: Sunday, April 26, 2009 7:37 AM
Points: 87, Visits: 307
I'm still battling with PATINDEX. I have two issues; TYPE_1 is only partly correct.

1.
For the column TYPE_1: using the first record, I'm trying to extract the '80' as the TYPE_1.
It's separated from the rest by a space; sometimes a single space, sometimes a double space.

2.
For the column Code1: e.g. from '80 98439 9991 997 -50.000' I want to extract the 98439. Sometimes this
is 4 characters long, sometimes 3 or 5.

If I do it in two steps it works. How can I do it in one step and split out those two fields?

Output should be:
'80 98439 9991 997 -50.000' should be Type = 80 Code = 98439
'0000058916 00000074000039708' should be Type = NULL Code = NULL
'0000058916 00000074000039708' should be Type = NULL Code = NULL
'01 759 9991 997 -0.560' should be Type = 01 Code = 759
'01 9990 9991 997 -127.910' should be Type = 01 Code = 9990


CREATE TABLE #Test (String Varchar(100),ROWID INT IDENTITY(1,1))


INSERT #test VALUES ('80 98439 9991 997 -50.000')
INSERT #test VALUES ('0000058916 00000074000039708')
INSERT #test VALUES ('80 98408 9991 997 -10.000')
INSERT #test VALUES ('80 98402 9991 997 -1.400')
INSERT #test VALUES ('01 9990 9991 997 -127.910')
INSERT #test VALUES ('01 759 9991 997 -0.560')
INSERT #test VALUES ('80 98440 9991 997 -84.330')
INSERT #test VALUES ('80 98474 9991 997 -2.000')
INSERT #test VALUES ('80 98402 9991 997 -0.280')
INSERT #test VALUES ('0000058916 00000074000039708')
INSERT #test VALUES ('80 98408 9991 997 -10.000')
INSERT #test VALUES ('80 98402 9991 997 -1.400')
INSERT #test VALUES ('80 98416 3 997 -1.750')
INSERT #test VALUES ('80 98402 3 997 -0.250')
INSERT #test VALUES ('01 6060 3 997 -1000.000 CB')
INSERT #test VALUES ('METLIFE_MALL_2 LA TREE D')
INSERT #test VALUES ('80 98418 3 997 -5.000')
INSERT #test VALUES ('80 98421 3 997 -9.500')
INSERT #test VALUES ('80 98402 3 997 -2.030')
INSERT #test VALUES ('01 6060 3 997 -500.000 CB')
INSERT #test VALUES ('M.A.N_SUPERMKT KI NG WILLIAM D')
INSERT #test VALUES ('80 98418 3 997 -5.000')
INSERT #test VALUES ('80 98421 3 997 -4.750')
INSERT #test VALUES ('80 98402 3 997 -1.370')
INSERT #test VALUES ('01 6060 3 997 -100.000 CB')
INSERT #test VALUES ('M.A.N_SUPERMKT KI NG WILLIAM D')
INSERT #test VALUES ('80 98418 3 997 -5.000')
INSERT #test VALUES ('80 98421 3 997 -0.950')
INSERT #test VALUES ('80 98402 3 997 -0.830')
INSERT #test VALUES ('80 98416 3 997 -1.750')
INSERT #test VALUES ('80 98402 3 997 -0.250')
INSERT #test VALUES ('01 6060 3 997 -20.000 CB')
INSERT #test VALUES ('METLIFE_MALL_2 LA TREE D')
INSERT #test VALUES ('80 98418 3 997 -5.000')
INSERT #test VALUES ('80 98421 3 997 -0.190')
INSERT #test VALUES ('80 98402 3 997 -0.730')
INSERT #test VALUES ('80 98484 9991 997 -50.000')
INSERT #test VALUES ('80 98402 9991 997 -7.000')
INSERT #test VALUES ('80 98416 3 997 -1.750')
INSERT #test VALUES ('80 98402 3 997 -0.250')
INSERT #test VALUES ('80 98416 3 997 -1.750')
INSERT #test VALUES ('80 98402 3 997 -0.250')
INSERT #test VALUES ('80 98416 3 997 -1.750')
INSERT #test VALUES ('80 98402 3 997 -0.250')
INSERT #test VALUES ('80 98416 3 997 -1.750')
INSERT #test VALUES ('80 98402 3 997 -0.250')
INSERT #test VALUES ('0000074164 00000074000039708')
INSERT #test VALUES ('80 98408 9991 997 -10.000')
INSERT #test VALUES ('80 98402 9991 997 -1.400')
INSERT #test VALUES ('80 98416 3 997 -1.750')
INSERT #test VALUES ('80 98402 3 997 -0.250')

----Step 1
IF OBJECT_ID('tempdb..##DDD') IS NOT NULL DROP TABLE ##DDD --only drop it if it already exists
SELECT
SUBSTRING(String, 1, CHARINDEX(' ', String) - 1) AS TYPE_1,
LTRIM(RTRIM(SUBSTRING(String,CHARINDEX(' ', String) + 1, LEN(String)))) AS Code
,MonetoryValue =
CASE
WHEN CHARINDEX('.',REVERSE(String)) = 0 THEN NULL ELSE
REVERSE(SUBSTRING(REVERSE(String), CHARINDEX('.',REVERSE(String))+1,
CHARINDEX(' ',REVERSE(String),CHARINDEX('.',REVERSE(String))) - CHARINDEX('.',REVERSE(String))))
END +
CASE
WHEN CHARINDEX('.',String) =0 THEN NULL ELSE
REPLACE(REPLACE(SUBSTRING(String,CHARINDEX('.',String)
,LEN(String) - CHARINDEX('.',String)+1),'CB',''),'FC','')
END
,String INTO ##DDD
FROM
#Test WHERE String != 'M.A.N_SUPERMKT KI NG WILLIAM D'


--Step 2
SELECT Type_1,MonetoryValue,LEFT(Code, NULLIF(CHARINDEX(' ', Code) - 1, -1)) AS Code1,Code,string FROM ##DDD
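
One way to collapse the two steps into a single statement is to move Step 1 into a derived table, so no ##DDD table is needed. This is only a minimal sketch, assuming the #Test table created above. The alias d and the column Rest are made up for illustration, and the rules for returning NULL on non-numeric rows are not applied here.

```sql
-- Assumes #Test (String, ROWID) exists as created above.
SELECT d.Type_1,
       -- Code1 = everything in Rest up to its first space
       LEFT(d.Rest, CHARINDEX(' ', d.Rest + ' ') - 1) AS Code1,
       d.String
  FROM (-- Derived table "d" does the work of Step 1
        SELECT String,
               -- Appending a space guarantees CHARINDEX finds one, even if String has none
               LEFT(String, CHARINDEX(' ', String + ' ') - 1) AS Type_1,
               -- LTRIM swallows a second space after the type, if there is one
               LTRIM(SUBSTRING(String, CHARINDEX(' ', String + ' ') + 1, LEN(String))) AS Rest
          FROM #Test
       ) d
```

Because the space is appended before every CHARINDEX call, the length passed to LEFT is never negative, so rows without any space do not raise an error.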





Post #425054
Posted Thursday, November 22, 2007 9:15 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 9:34 AM
Points: 35,532, Visits: 32,115
Output should be:
'80 98439 9991 997 -50.000' should be Type = 80 Code = 98439
'0000058916 00000074000039708' should be Type = NULL Code = NULL
'0000058916 00000074000039708' should be Type = NULL Code = NULL
'01 759 9991 997 -0.560' should be Type = 01 Code = 759
'01 9990 9991 997 -127.910' should be Type = 01 Code = 9990


Hey! Matt Miller! This looks like a great place for one of those awesome Regex things you do! :)

RayM,

In the absence of Regex, you first need to build a Tally table to make your life easier for a lot of things. A Tally table is nothing more than a table with a single column of very well indexed sequential numbers... it's used to replace loops and the like... here's how to build one...

--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
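
As a small illustration of how a Tally table replaces a loop, the following sketch lists every day of November 2007 without a WHILE loop. It assumes the dbo.Tally table built above; the start date and the 30-row limit are arbitrary choices for the example.

```sql
-- Assumes dbo.Tally (column N, starting at 1) exists as built above.
-- Each Tally row N becomes "start date + (N - 1) days".
SELECT DATEADD(dd, t.N - 1, '20071101') AS TheDate
  FROM dbo.Tally t
 WHERE t.N <= 30          -- November has 30 days
 ORDER BY t.N
```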

Once that's done, your problem becomes a whole lot easier and doesn't require performance robbers like the REVERSE function...
 SELECT s.RowID,
        Type = MAX(CASE WHEN s.Posit = 1 AND LEN(s.SplitString) =  2 THEN s.SplitString ELSE NULL END),
        Code = MAX(CASE WHEN s.Posit = 2 AND LEN(s.SplitString) <= 5 THEN s.SplitString ELSE NULL END)
   FROM (--==== Derived table "s" splits all of the strings and marks the position (ordinal index) of each
         SELECT RowID,
                SplitString = SUBSTRING(' '+h.String+' ', t.N+1, CHARINDEX(' ', ' '+h.String+' ', t.N+1)-t.N-1),
                Posit       = t.N-LEN(REPLACE(LEFT(' '+h.String+' ',t.N), ' ', ''))
           FROM dbo.Tally t
          RIGHT OUTER JOIN --Necessary in case String is NULL
                #Test h
             ON SUBSTRING(' '+h.String+' ', t.N, 1) = ' '
            AND t.N < LEN(' '+h.String+' ')
            AND h.String NOT LIKE '%[_]%'
        )s
  WHERE s.Posit IN (1,2)
  GROUP BY s.RowID
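
To see what the derived table produces, here is a sketch that runs the same two expressions against a single string held in a variable. It assumes dbo.Tally exists as built above; the variables @s and @p are illustrative only. Because Posit counts spaces, a double space would shift the ordinal positions, so this sketch first collapses a double space with REPLACE. That extra step is an assumption of the sketch and is not part of the query above.

```sql
DECLARE @s VARCHAR(100), @p VARCHAR(102)

-- Collapse a double space to a single one so the ordinals stay 1, 2, 3...
SET @s = REPLACE('80  98439 9991 997 -50.000', '  ', ' ')
-- Pad with a leading and trailing space so every token sits between two delimiters
SET @p = ' ' + @s + ' '

SELECT Posit       = t.N - LEN(REPLACE(LEFT(@p, t.N), ' ', '')), -- spaces seen so far = token number
       SplitString = SUBSTRING(@p, t.N + 1, CHARINDEX(' ', @p, t.N + 1) - t.N - 1)
  FROM dbo.Tally t
 WHERE t.N < LEN(@p)               -- LEN ignores the trailing pad, so the last delimiter is skipped
   AND SUBSTRING(@p, t.N, 1) = ' ' -- keep only the positions that hold a delimiter
 ORDER BY t.N
```

For this input the rows come back as Posit 1 through 5 with the tokens 80, 98439, 9991, 997 and -50.000. A single REPLACE only handles a double space; longer runs of spaces would need more work.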



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #425122
Posted Thursday, November 22, 2007 9:23 PM


Sorry... missed the "MonetaryValue" column that you included in your code but not in your problem definition... this will do it...

 SELECT s.RowID,
        Type          = MAX(CASE WHEN s.Posit = 1 AND LEN(s.SplitString) =  2 THEN s.SplitString ELSE NULL END),
        Code          = MAX(CASE WHEN s.Posit = 2 AND LEN(s.SplitString) <= 6 THEN s.SplitString ELSE NULL END),
        MonetaryValue = MAX(CASE WHEN s.Posit = 5 THEN s.SplitString ELSE NULL END)
   FROM (--==== Derived table "s" splits all of the strings and marks the position (ordinal index) of each
         SELECT RowID,
                SplitString = SUBSTRING(' '+h.String+' ', t.N+1, CHARINDEX(' ', ' '+h.String+' ', t.N+1)-t.N-1),
                Posit       = t.N-LEN(REPLACE(LEFT(' '+h.String+' ',t.N), ' ', ''))
           FROM dbo.Tally t
          RIGHT OUTER JOIN --Necessary in case String is NULL
                #Test h
             ON SUBSTRING(' '+h.String+' ', t.N, 1) = ' '
            AND t.N < LEN(' '+h.String+' ')
            AND h.String NOT LIKE '%[_]%'
        )s
  WHERE s.Posit IN (1,2,5)
  GROUP BY s.RowID



Post #425124
Posted Friday, November 23, 2007 5:53 AM
SELECT * FROM tbPROCESSED

This is the full picture of what I'm trying to do. The problematic field is called Actual_Text, which is the string that requires manipulation.
I'm deriving three fields from Actual_Text:
Type, Code and MonetoryValue.

Field Type is the run of digits starting from the left and ending at the first space,
but only where there is a valid MonetoryValue.

e.g. '80 98437 3 997 -2.500'
in this case Type will be = 80


e.g. 'PARK AVENUE MANOR Pr RAZOO 0'
in this case Type will be = NULL

e.g. '530719000254'
in this case Type will be = NULL

-----------------------------Code
Field Code is the run of digits after the first space (counting from the left), ending at the
second space, but only where there is a valid MonetoryValue.

e.g. '80 98437 3 997 -2.500'
Code will be = 98437


e.g. 'PARK AVENUE MANOR Pr RAZOO 0'
Code will be = NULL

e.g. '530719000254'
Code will be = NULL

e.g. '01 6060 3 997 -20.000 CO'
Code will be = 6060


e.g. '0997 2247'
Code will be = NULL because there is no MonetoryValue

---MonetoryValue

I can extract the monetary value; the only problem is how do I remove
the alpha characters without using REPLACE?

e.g. '01 6060 3 997 -20.000 CO'
'-20.000 CO' should exclude the CO, etc.

'01 6060 3 997 -50.000 CB'
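
One possible way to drop the trailing letters without REPLACE is to cut the value at the first character that cannot be part of a number. This is a minimal sketch, assuming the monetary value has already been isolated into a variable; @v is a made-up name used only for the example.

```sql
DECLARE @v VARCHAR(30)
SET @v = '-20.000 CO'

-- PATINDEX finds the first character that is not a digit, a dot, or a minus sign.
-- The appended 'x' guarantees a hit even when there is no trailing text.
SELECT LEFT(@v, PATINDEX('%[^0-9.-]%', @v + 'x') - 1) AS MonetoryValue
```

For '-20.000 CO' the first disallowed character is the space at position 8, so LEFT keeps the seven characters '-20.000'.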







  Post Attachments 
GenerateData_1.txt (0 views, 53.45 KB)
Post #425250
Posted Friday, November 23, 2007 11:12 AM


The code I posted extracts all three fields even for the "01" records...

You didn't even try the code, did you?


Post #425336
Posted Monday, November 26, 2007 12:33 PM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 12:04 PM
Points: 7,154, Visits: 15,630
Jeff Moden (11/22/2007)
Output should be:
'80 98439 9991 997 -50.000' should be Type = 80 Code = 98439
'0000058916 00000074000039708' should be Type = NULL Code = NULL
'0000058916 00000074000039708' should be Type = NULL Code = NULL
'01 759 9991 997 -0.560' should be Type = 01 Code = 759
'01 9990 9991 997 -127.910' should be Type = 01 Code = 9990


Hey! Matt Miller! This looks like a great place for one of those awesome Regex things you do! :)



Sorry - just realized I got "paged" to the SSC forum during the long vacation weekend....:)

Yes - you could use a CLR Regex Function to do just that if you want to. It's actually rather cute, since it allows you to "throw out" the result if you don't have BOTH (if you so desire).

I just put this together, if you should want to go down that path.

Here's the CLR (.NET code):

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions

Partial Public Class UserDefinedFunctions
    Private Const optionS As RegexOptions = RegexOptions.CultureInvariant

    ' Returns the string "True" or "False" depending on whether the pattern matches the input.
    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _
    Public Shared Function RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlString
        Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)
        Return New SqlString(CType(rex.IsMatch(New String(input.Value)), String))
    End Function

    ' Returns the text captured by group number groupnum in the first match.
    ' Note: Matches(...)(0) raises an error when the pattern does not match at all.
    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _
    Public Shared Function RegexMatchGroup(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal groupnum As SqlInt32) As SqlString
        Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)

        Return New SqlString(CType(rex.Matches(New String(input.Value))(0).Groups(groupnum.Value).ToString, String))
    End Function
End Class

At that point, you could use something like this to pick out the pieces:

select dbo.regexmatchgroup('80 12345 p9a996458aaa','^(?<1>[0-9]{2}) (?<2>[0-9]{3,5})',2)

The last parameter could be 0, 1, or 2 (0=whole string, 1/2 = the named backreferences to 80 and 12345 respectively in my example).

It does require (at least for me) knowing how to correctly structure the grouping constructs for .NET, but this should give you a running start.
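
As a rough sketch of how the two functions might be combined to return NULL for rows that do not fit the layout: the query below assumes both CLR functions have been deployed as dbo.RegexMatch and dbo.RegexMatchGroup (the first name is an assumption; only regexmatchgroup is called above). The pattern is a variation on the one above that also tolerates a double space and requires a space after the code. The three sample strings are inlined so that no table is needed.

```sql
-- Assumes the CLR functions above are deployed as dbo.RegexMatch and
-- dbo.RegexMatchGroup (hypothetical deployment names).
DECLARE @pat NVARCHAR(100)
-- Two digits, one or two spaces, three to five digits, then a space
SET @pat = N'^(?<1>[0-9]{2}) {1,2}(?<2>[0-9]{3,5}) '

SELECT s.String,
       -- Only ask for a group when the pattern matches; otherwise the CASE yields NULL
       CASE WHEN dbo.RegexMatch(s.String, @pat) = N'True'
            THEN dbo.RegexMatchGroup(s.String, @pat, 1) END AS [Type],
       CASE WHEN dbo.RegexMatch(s.String, @pat) = N'True'
            THEN dbo.RegexMatchGroup(s.String, @pat, 2) END AS Code
  FROM (          SELECT '80 98439 9991 997 -50.000' AS String
        UNION ALL SELECT '01  759 9991 997 -0.560'        -- double space
        UNION ALL SELECT '0000058916 00000074000039708'   -- does not fit the pattern
       ) s
```

The CASE guard is there because RegexMatchGroup, as written, indexes the first match directly and would raise an error on a string with no match.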


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #425955
Posted Monday, November 26, 2007 3:55 PM


There he is! And I knew he'd have a pretty good answer for this one, too!

How was the weekend, Matt?


Post #426094