Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Nested replaces ? Expand / Collapse
Author
Message
Posted Thursday, January 17, 2013 4:07 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Saturday, August 2, 2014 1:47 AM
Points: 246, Visits: 1,168
Hello all,

At this moment I need some nested REPLACE actions.

Some time ago I saw a set of routines for string 'manipulation', I think there were two sets, one for free, one which could be bought.

Now I can not find that set anymore, have been searching my own set of data, been searching with google.

Is there a set of string manipulation routines which do extend the functionality of the build in routines ?

(I did find the stringworkshop script).
Thanks,
Ben Brugman.

Post #1408313
Posted Thursday, January 17, 2013 4:15 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 12:26 AM
Points: 2,840, Visits: 3,963
i havent got what you actually needed and asked ? can you post your requirement with sample data along with desired output. see "we cant see what you see"

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done
Post #1408318
Posted Thursday, January 17, 2013 5:13 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Saturday, August 2, 2014 1:47 AM
Points: 246, Visits: 1,168
Bhuvnesh (1/17/2013)
i havent got what you actually needed and asked ? can you post your requirement with sample data along with desired output. see "we cant see what you see"


An example what I am doing at the moment, but there are many variants on this:

I would like to have an interface like:
special_replace(@string, 'abcdefgh','12345678')
where all the 'a' get replaced by a 1,
all be 'b' get replaced by a 2, etc.

or
special_replace2(@string, 'aaa|bb|c|d|e|f|g|h','11|2|3|4444|5|6|7|8')
where the 'aaa' get's replaced by a 11

or
special_replace3(@string, '1111|111|11|1|2222|222|22|2','41|31|21|11|42|32|22|12')
where the '1111' get's replaced by a 41
where the '111' get's replaced by a 31




-- To find 'exotic characters' in a table
select '--' [--], COUNT(*) as tel from (
select distinct
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(H,'a','')
,'b','')
,'c','')
,'d','')
,'e','')
,'f','')
,'g','')
,'h','')
,'i','')
,'j','')
,'k','')
,'l','')
,'m','')
,'n','')
,'0','')
,'p','')
,'q','')
,'r','')
,'s','')
,'t','')
,'u','')
,'v','')
,'w','')
,'x','')
,'y','')
,'z','')
,'0','')
,'1','')
,'2','')
,'3','')
,'4','')
,'5','')
,'6','')
,'7','')
,'8','')
,'9','')
,'8','')
AS H
FROM A_table
) as xxx

-- To find how significant the exotic characters are.

select distinct
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(H,' ','')
,',','')
,'&','')
,'/','')
,'\','')
,'|','')
AS H
FROM A_table
) as xxx

-- to do some manipulation on the string.
select distinct
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(H,' ','\1 ')
,',','\2')
,'&','\3')
,'/','\4')
,'\','\5')
,'|','\6')
AS H
FROM A_table
) as xxx


Thanks for your time and attention,
ben brugman
Post #1408352
Posted Thursday, January 17, 2013 6:32 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 12:58 PM
Points: 4,189, Visits: 3,227
Wow! That's quite a set of REPLACEs. If I'm reading what you want correctly, I see two different approaches you can take here.

The first is using the STUFF statement. http://msdn.microsoft.com/en-us/magazine/cc163473.aspx. I find it works pretty well for small numbers of replacements with varying character lengths. It's a cool function to have in your "toolbox", but please continue reading.

It looks to me like regular expressions are what you really want. I suggest writing a .NET CLR, create an assembly for it and then write database function wrappers. I've seen this used with some ridiculously complicated pattern matches.

Library and How-To: http://www.codeproject.com/Articles/42764/Regular-Expressions-in-MS-SQL-Server-2005-2008
MSDN Article: http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

I don't do the .NET library exposed to SQL Server very much because I find that most things can be accomplished with normal T-SQL (I'm not looking for an argument here) but this is one of those cases where it just makes sense. Performance is pretty good, but it of course depends on how you write your .NET library. Having used regular expressions natively in Oracle 8 and 9i, I can say that the performance of this approach wins, hands down. It'll take some work to write the first time, but then you'll find yourself using it as you need it.



Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Post #1408392
Posted Thursday, January 17, 2013 9:00 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 3:04 PM
Points: 13,296, Visits: 12,147

I would like to have an interface like:
special_replace(@string, 'abcdefgh','12345678')
where all the 'a' get replaced by a 1,
all be 'b' get replaced by a 2, etc.

or
special_replace2(@string, 'aaa|bb|c|d|e|f|g|h','11|2|3|4444|5|6|7|8')
where the 'aaa' get's replaced by a 11

or
special_replace3(@string, '1111|111|11|1|2222|222|22|2','41|31|21|11|42|32|22|12')
where the '1111' get's replaced by a 41
where the '111' get's replaced by a 31


What a bizarre requirement!!! I can't come up with a real world scenario where any of these make sense. The regex suggestion won't really help here because you need to modify the values. All the regex is going to do is tell you if it matches some pattern, which you don't have.

I do however think that CLR would be the way to go here. You are going to have to loop through these strings character by character and t-sql just isn't the best way to do that.

I understand the requirements for the first and third versions but the second one I don't get why 'aaa' becomes 11 and why does 'd' become 4444????

For the first one you could do a pretty straight forward ascii calculation. Just force the string to upper first and then for each character take the ascii value and subtract 16.

The third one would be a modification to that logic, just need to track which character you are working with and a counter. When the character changes you output the counter and the same ascii calculation.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1408481
Posted Thursday, January 17, 2013 12:13 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Saturday, August 2, 2014 1:47 AM
Points: 246, Visits: 1,168
Thanks for your input,
Suggestions how to proceed are very welcome.
My goal at the moment is finding similar strings.
For a long reply see under the signature,

Ben Brugman


Sean Lange (1/17/2013)

I would like to have an interface like:
special_replace(@string, 'abcdefgh','12345678')
where all the 'a' get replaced by a 1,
all be 'b' get replaced by a 2, etc.

or
special_replace2(@string, 'aaa|bb|c|d|e|f|g|h','11|2|3|4444|5|6|7|8')
where the 'aaa' get's replaced by a 11
This was a fantasie example. (Correctly spotted)

or
special_replace3(@string, '1111|111|11|1|2222|222|22|2','41|31|21|11|42|32|22|12')
where the '1111' get's replaced by a 41
where the '111' get's replaced by a 31
This is a part of an actual script I made.



What a bizarre requirement!!! I can't come up with a real world scenario where any of these make sense. The regex suggestion won't really help here because you need to modify the values. All the regex is going to do is tell you if it matches some pattern, which you don't have.

The example code was actual code I used.
First to study what characters were used.
(Alphabetic, Numeric, +-.,; etc)
This was for me the fastest way to find out which characters where used in a column. First getting rid of all 'normal' characters then eliminating the others.
Afterwards only the others were removed so that the columns could be compared more easy.


I do however think that CLR would be the way to go here. You are going to have to loop through these strings character by character and t-sql just isn't the best way to do that.

Processing speed was no issue.
The speed I can (or could) write the code and rerun and change the code made the script as it was.
And I do not master CLR code.

I understand the requirements for the first and third versions but the second one I don't get why 'aaa' becomes 11 and why does 'd' become 4444????

Correctly spotted this was an artificial example.
I didn't want to use a predictive structure to avoid not fitting solutions. (sorry).


For the first one you could do a pretty straight forward ascii calculation. Just force the string to upper first and then for each character take the ascii value and subtract 16.

Again you spotted the correct function. Indeed this is the count of a sequence of same characters. This actually is the anwser to a puzzle, I think I will throw this puzzle in this forum. The above was part of the solution. (So don't tell anyone yet)
(My solution for the puzzle was rather cumbersom with loads of replaces, I am wondering if other come up with other solutions to learn from).

This would probably require a function and then a loop within the function for each position. Now I did a copy/past 26 times for the alphabeth, which for me works faster than writing a function with a loop.

But sometime ago I saw that somebody had allready solved this problem. (With functions ?) And I was hoping to find that solution again and re-using that solution.

The number of replaces I use in largest script is over 50 which is nog very elegant.
In stringworkshop there was a far more elegant solution but that was for a single string parameter, not for a column in a select. I could adapt that, but that would take more time.


The third one would be a modification to that logic, just need to track which character you are working with and a counter. When the character changes you output the counter and the same ascii calculation.

Now it is within a single select statement, doing this for a column would make the code more complex again.

So as you can read I am building these examples on the fly running them and then modifying them to my next requirements. Changes are almost continues. Running speed is not a problem, because the building takes some time.

I use the technique with nested replaces fairly often, because I can do it inline and build the code fast. Building this code inline although not elegant makes the code easy portable. And if speed is no issue, then it works fast enough.

A problem with this code is overview (to much replaces) and it is difficult to keep count of the correct number of replaces and the matching number of parameters for the replace. (A miscount is easely made).

The actual use this time was.
Excel with over 25 000 rows.
Containing strings and the translations of the strings.
But some strings (or equivalent strings) appear more than once, sometimes with small differences like a comma, or another special character.

So I wanted to eliminate the special characters.
(But wanted first to know which special characters where used, I found two different spaces for example, still have to study that).

Also in the next step I am thinking of eliminating repeating characters, to exclude some writing mistakes and perhaps something like replacing 'qu' with 'q'
and replacing all 'c' with an 'k' ('disc' and 'disk' for example is the same word for me).

And maybe substituting all numeric strings for the same numeric string. For example replacing any nummeric string with 123.

(I am thinking of breaking up the strings without special characters in triads (sets of three characters) and test how many of them match up, and give this a score if allmost all triads are the same, the difference might be a small spelling error or a typing mistake).
Example.
I am a fish.
Would become.
iamafish
And then:
iam
ama
maf
afi
fis
ish


A sentence like
I am the fish
or
I am an fish

Would give a large number of the same triads in the same order, so a large likelyhood of the 'same' sentence.


Thanks for your input,
Suggestions how to proceed are very welcome.
My goal at the moment is finding similar strings.

Ben Brugman


Post #1408559
Posted Thursday, January 17, 2013 12:58 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 12:55 PM
Points: 2,033, Visits: 3,041
For that, I would use a function with PATINDEX to retain only the desired chars.

If performance is not an issue, you can use a scalar function.

I can give you a sample function if you would like. I wrote somewhat similar functions to do "remove all non-alpha, non-numeric" chars and duplicate Oracle's "InitCap" function (first letter of each word is upper case, remaining letters of same word are all lower case).


SQL DBA,SQL Server MVP('07, '08, '09)
"In America, every man is innocent until proven broke!" Brant Parker
Post #1408579
Posted Thursday, January 17, 2013 10:57 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:20 AM
Points: 3,627, Visits: 5,269
Special_replace2 ain't so hard:

DECLARE @MyString VARCHAR(100) = 'aaabbcdefgh'

;WITH Replacements (n,a,b) AS (
SELECT 1, 'aaa|bb|c|d|e|f|g|h','11|2|3|4444|5|6|7|8'),
Transform (ItemNumber, a, b) AS (
SELECT ItemNumber, MAX(a), MAX(b)
FROM (
SELECT n, ItemNumber, a=a.Item, b=NULL
FROM Replacements
CROSS APPLY DelimitedSplit8K(a, '|') a
UNION ALL
SELECT n, ItemNumber, NULL, Item
FROM Replacements
CROSS APPLY DelimitedSplit8K(b, '|')) a
GROUP BY n, ItemNumber),
rCTEReplace (n, s, r) AS (
SELECT n=1, MyString, REPLACE(MyString, a, b)
FROM (SELECT @MyString) a(MyString)
JOIN Transform ON ItemNumber = 1
UNION ALL
SELECT n+1, s, REPLACE(r, a, b)
FROM rCTEReplace
JOIN Transform ON ItemNumber = n+1
)
SELECT *
FROM rCTEReplace
WHERE n = (SELECT COUNT(*) FROM Transform)


Ain't gonna win no performance contests though!



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1408719
Posted Thursday, January 17, 2013 11:13 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:20 AM
Points: 3,627, Visits: 5,269
Probably will run a little faster with dynamic SQL though.

DECLARE @SQL NVARCHAR(MAX) = ''
DECLARE @MyString VARCHAR(100) = 'aaabbcdefgh'

;WITH Replacements (n,a,b) AS (
SELECT 1, 'aaa|bb|c|d|e|f|g|h','11|2|3|4444|5|6|7|8'),
Transform (ItemNumber, a, b) AS (
SELECT ItemNumber, MAX(a), MAX(b)
FROM (
SELECT n, ItemNumber, a=a.Item, b=NULL
FROM Replacements
CROSS APPLY DelimitedSplit8K(a, '|') a
UNION ALL
SELECT n, ItemNumber, NULL, Item
FROM Replacements
CROSS APPLY DelimitedSplit8K(b, '|')) a
GROUP BY n, ItemNumber)
SELECT @SQL='SELECT r=' + s + '''' + @MyString + ''',' + (
SELECT '''' + a + ''',''' + b + '''),'
FROM Transform
ORDER BY ItemNumber
FOR XML PATH('')) + 's=''' + @MyString + ''''
FROM (
SELECT 'REPLACE('
FROM Transform
FOR XML PATH('')) a(s)

--PRINT @SQL
EXEC (@SQL)





My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1408722
Posted Friday, January 18, 2013 1:48 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Saturday, August 2, 2014 1:47 AM
Points: 246, Visits: 1,168
ScottPletcher (1/17/2013)
For that, I would use a function with PATINDEX to retain only the desired chars.


First I used the replace to find which were the undesired chars.
Next I used the replace to get rid of the undesired chars.
(Maybe a next step would be to use replace to replace all 'c' with a 'k', to make some text like 'disc' and 'disk' the same. And maybe I'll remove all double characters to get even more matches.).

I do not see how I would do that with the PATINDEX function.
I can only see complex solutions when using the PATINDEX for this problem. But maybe I am missing something. (Sorry for that).

Thanks for your response,
ben brugman

Post #1408758
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse