SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Replacing recurring characters in a string with single character


Replacing recurring characters in a string with single character

Author
Message
j.heidrich
j.heidrich
Say Hey Kid
Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)

Group: General Forum Members
Points: 684 Visits: 63
Hi all,

I have a problem where I want to write a function to remove recurring characters from a string and replace them with a single same character.

For instance I have the string '12333345566689' and the result should be '12345689'. In Oracle I could do this with "regexp_replace('12333345566689', '(.)\1+', '\1')", but in T-SQL the only solution I could think of is something like this:


DECLARE @code NVARCHAR(255)
SET @code = '12333345566689';
SET @code = REPLACE(REPLACE(REPLACE(@Code, '1', '~1'), '1~', ''), '~1', '1');



and repeat this for 2 - 9. But I'm sure there is a more elegant version for this in SQL Server 2012. Unfortunately I haven't found any solution in other posts, so maybe someone has an idea for this?

Thanks in advance.
Cadavre
Cadavre
SSCrazy Eights
SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)

Group: General Forum Members
Points: 9430 Visits: 8492
Probably not the best way to do this, just off the top of my head: -

DECLARE @code NVARCHAR(255);
SET @code = '12333345566689';

WITH CTE(N) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))a(N)),
CTE2(N) AS (SELECT 1 FROM CTE x CROSS JOIN CTE y),
CTE3(N) AS (SELECT 1 FROM CTE2 x CROSS JOIN CTE2 y),
CTE4(N) AS (SELECT 1 FROM CTE3 x CROSS JOIN CTE3 y),
CTE5(N) AS (SELECT 1 FROM CTE4 x CROSS JOIN CTE4 y),
CTE6(N) AS (SELECT TOP (LEN(@code)) 1 FROM CTE5 x CROSS JOIN CTE5 y),
TALLY(N) AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM CTE6)
SELECT @code = REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(@code)),SUBSTRING(@code,N,1)+SUBSTRING(@code,N,1),SUBSTRING(@code,N,1)+CHAR(7)),CHAR(7)+SUBSTRING(@code,N,1),''),CHAR(7),'')
FROM TALLY;

SELECT @code;



Returns "12345689"

Based on REPLACE Multiple Spaces with One and The "Numbers" or "Tally" Table: What it is and how it replaces a loop.


Forever trying to learn

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better, quicker answers on SQL Server performance related questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



If you litter your database queries with nolock query hints, are you aware of the side effects?
Try reading a few of these links...

(*) Missing rows with nolock
(*) Allocation order scans with nolock
(*) Consistency issues with nolock
(*) Transient Corruption Errors in SQL Server error log caused by nolock
(*) Dirty reads, read errors, reading rows twice and missing rows with nolock


Craig Wilkinson - Software Engineer
LinkedIn
Luis Cazares
Luis Cazares
SSC-Forever
SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)

Group: General Forum Members
Points: 43478 Visits: 19859
Here's a different method that might work well. Should we start testing? :-D

DECLARE @code nvarchar(4000) = '12333345566689';

with seed1 (a)
as
(
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
),
numbers (n) as
(
select top (LEN(@code)) row_number() over (order by (select null))
from seed1 s1, seed1 s2, seed1 s3
)
select a.b.value('(./text())[1]', 'nvarchar(4000)') as [text]
from (
select CASE WHEN SUBSTRING( @code, n, 1) = SUBSTRING( @code, n - 1, 1) THEN ''
ELSE SUBSTRING( @code, n, 1) END
FROM numbers
for xml path (''), type
)a(b)




Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?


How to post data/code on a forum to get the best help: Option 1 / Option 2
mickyT
mickyT
SSCrazy
SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2914 Visits: 3318
A slight variation to Luis's method and implemented as an inline table valued function

CREATE FUNCTION compressDuplicates(@code nvarchar(4000))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH cteTally as (
SELECT TOP(LEN(@code) - 1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E1(N)
,(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E2(N)
,(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E3(N)
),
startpattern as (
SELECT SUBSTRING(@code,1,1) P
UNION ALL
SELECT SUBSTRING(@code,N+1,1) P
FROM cteTally
WHERE SUBSTRING(@code,N,1) <> SUBSTRING(@code,N+1,1) COLLATE Latin1_General_BIN -- Change to suit comparison required
)
SELECT CAST(r AS NVARCHAR(4000)) result
FROM (
SELECT P
FROM startpattern
FOR XML PATH('')
) AS A(R);



If there is non numerics in your string you will need to decide how to handle case, etc and set the collation accordingly
dwain.c
dwain.c
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18523 Visits: 6431
Just to be different, and seeing as how we are in the SQL 2012 forum which allows me to use LAG, I will offer this:


WITH SampleData AS
(
SELECT code=CAST('12333345566689333' AS NVARCHAR(4000))
UNION ALL SELECT '12333345566689'
),
Tally (n) AS
(
SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns
),
SplitString AS
(
SELECT code, s, p, n
FROM
(
SELECT code, s, p=LAG(s, 1, '') OVER (PARTITION BY code ORDER BY n), n
FROM
(
SELECT code, n, s=SUBSTRING(code, n, 1)
FROM SampleData a
CROSS APPLY
(
SELECT TOP (LEN(code)) n
FROM Tally
) b
WHERE SUBSTRING(code, n, 1) <> ''
) a
) a
WHERE s <> p
)
SELECT code
,ReducedString=
(
SELECT s + ''
FROM SplitString b
WHERE a.code = b.code
ORDER BY n
FOR XML PATH('')
)
FROM SampleData a;





My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
dwain.c
dwain.c
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18523 Visits: 6431
OK, so the purists will argue that LAG is overkill, and for them I offer this instead:


WITH SampleData AS
(
SELECT code=CAST('12333345566689333' AS NVARCHAR(4000))
UNION ALL SELECT '12333345566689'
),
Tally (n) AS
(
SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns
),
SplitString AS
(
SELECT code, s, n
FROM
(
SELECT code, n, s=SUBSTRING(code, n, 1)
FROM SampleData a
CROSS APPLY
(
SELECT TOP (LEN(code)) n
FROM Tally
) b
WHERE SUBSTRING(code, n, 1) <> SUBSTRING(code, n-1, 1)
) a
)
SELECT code
,ReducedString=
(
SELECT s + ''
FROM SplitString b
WHERE a.code = b.code
ORDER BY n
FOR XML PATH('')
)
FROM SampleData a;





My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
j.heidrich
j.heidrich
Say Hey Kid
Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)Say Hey Kid (684 reputation)

Group: General Forum Members
Points: 684 Visits: 63
Many thanks for your anwers.

I tried the two different approaches from dwain.c with my sample table with 100k nvarchar(50) strings. On my machine with Win 8 64, SQL Server 2012, Core i7 2.4 Ghz and 16 GB RAM the second version without LAG function took around 5 seconds. The first version with LAG executed for 12 minutes when I aborted it!! Slight difference...

With the other three approaches I have to admit that I can't get it to work with my sample table. How can I incorporate a column from my table into the code instead of the @code parameter?
Anyway, I think the 5 seconds will be hard to beat.
dwain.c
dwain.c
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18523 Visits: 6431
j.heidrich (1/21/2014)
Many thanks for your anwers.

I tried the two different approaches from dwain.c with my sample table with 100k nvarchar(50) strings. On my machine with Win 8 64, SQL Server 2012, Core i7 2.4 Ghz and 16 GB RAM the second version without LAG function took around 5 seconds. The first version with LAG executed for 12 minutes when I aborted it!! Slight difference...

With the other three approaches I have to admit that I can't get it to work with my sample table. How can I incorporate a column from my table into the code instead of the @code parameter?
Anyway, I think the 5 seconds will be hard to beat.


Thanks for posting those performance results. I was just having a bit of fun with LAG, knowing it wouldn't be as good as the option without it. I'm casting my net wide to find an instance where LAG beats an earlier code pattern in performance and your test shows me this ain't one of 'em.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
dwain.c
dwain.c
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18523 Visits: 6431
MickyT's FUNCTION should work like this with your table replacing SampleData:


WITH SampleData AS
(
SELECT code=CAST('12333345566689333' AS NVARCHAR(4000))
UNION ALL SELECT '12333345566689'
)
SELECT *
FROM SampleData
CROSS APPLY compressDuplicates(code);





My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
ChrisM@Work
ChrisM@Work
SSC-Forever
SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)

Group: General Forum Members
Points: 43143 Visits: 20015
Couple of small changes to MickeyT's otherwise excellent function:

DECLARE @code nvarchar(4000);
SET @code = '99944777777775588888888888812' ;

WITH cteTally as (
SELECT TOP(LEN(@code) - 1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E1(N)
,(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E2(N)
,(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E3(N)
),
startpattern as (
SELECT SUBSTRING(@code,1,1) P
UNION ALL
SELECT SUBSTRING(@code,N+1,1) P
FROM cteTally
WHERE SUBSTRING(@code,N,1) <> SUBSTRING(@code,N+1,1) COLLATE Latin1_General_BIN -- Change to suit comparison required
)
SELECT CAST(r AS NVARCHAR(4000)) result
FROM (
SELECT P
FROM startpattern
FOR XML PATH('')
) AS A(R);

-- result: "<P>9</P><P>4</P><P>7</P><P>5</P><P>8</P><P>1</P><P>2</P>"
--------------------------------------------------------------------------------------
WITH cteTally as (
SELECT TOP(LEN(@code) - 1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E1(N)
,(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E2(N)
,(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) E3(N)
),
startpattern as (
SELECT FirstNumOfSet = SUBSTRING(@code,n,1)
FROM cteTally
WHERE SUBSTRING(@code,N,1) <> SUBSTRING(@code,N-1,1) COLLATE Latin1_General_BIN -- Change to suit comparison required
)

SELECT result = CAST(XMLresult AS NVARCHAR(4000))
FROM (
SELECT FirstNumOfSet AS [text()]
FROM startpattern
FOR XML PATH('')
) d (XMLresult);


-- result: "947581"



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search