October 24, 2024 at 11:02 am
Hi there,
This was posted in the wrong forum, hence duplicating here. I have SQL version 2019 (thanks to Phil, Scott and Steve for their previous inputs).
I've tried CROSS APPLY, PATINDEX and many other functions, but can't nail this.
For each record, I want to extract all numbers which follow a '#' and then create a new row for each.
Example String 1: "Hello world. #1234 has been replaced by #014521"
To return:
1234
014521
Example String 2: "#687459"
To return:
687459
If there is no '#', then return blank.
Thanks in advance.
October 24, 2024 at 5:36 pm
Post got submitted twice. Sorry.
-- Itzik Ben-Gan 2001
October 24, 2024 at 5:36 pm
For this you can use a "splitter" (AKA "tokenizer" function.)
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521';
SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(@String,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';
Returns:
SomeNbr
----------
1234
014521
Against a table it would look like this:
DECLARE @Strings TABLE (StringID INT IDENTITY, SomeString VARCHAR(1000));
INSERT @Strings(SomeString)
VALUES
('Hello world. #1234 has been replaced by #014521'),
('Numbers, numbers... #012 #999 #Numbers'),
('The quick brown fox...'),
('#000 Another row... #123') ;
SELECT
StringID = s.StringID,
SomeNbr = REPLACE(split.[value],'#','')
FROM @Strings AS s
CROSS APPLY STRING_SPLIT(s.SomeString,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';
This returns:
StringID SomeNbr
----------- -----------
1 1234
1 014521
2 012
2 999
4 000
4 123
This gets us what we need except for "If there is no '#', then return blank." For the blanks we'll need to push our logic into a subquery and leverage OUTER APPLY like so:
SELECT
StringID = s.StringID,
SomeNbr = ISNULL(split.SomeNbr,'')
FROM @Strings AS s
OUTER APPLY
(
SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(s.SomeString,' ') AS split
WHERE split.[value] LIKE '#[0-9]%'
) AS split;
Returns:
StringID SomeNbr
----------- -----------
1 1234
1 014521
2 012
2 999
4 000
4 123
-- Itzik Ben-Gan 2001
October 25, 2024 at 9:45 am
Hi Alan,
Many thanks for the code you have provided. I am still evaluating how to apply this to an existing table, but I do have an observation. If there is no space between the number and '#', I get this:-
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521#2345';
SELECT SomeNbr = REPLACE(split.[value],'#','')
FROM STRING_SPLIT(@String,' ') AS split
WHERE split.[value] LIKE '#[0-9]%';
Result:
1234
0145212345
Should be:
1234
014521
2345
Also, if I declare this string with parenthesis around a number, it give me just the first occurrence of a #, 1234:-
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by (#014521) (#2345)
Result:
1234
Thank you again!
p.s. How do I avoid the extra spaces between each line of text? lol
October 25, 2024 at 7:47 pm
Or what's the question? It seems the topic's scope has crept after solutions were posted
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
October 25, 2024 at 10:56 pm
It's too difficult to answer. I am a new learner.
October 25, 2024 at 10:57 pm
It is too difficult for me.
October 30, 2024 at 9:09 pm
UPDATED 10/31/2024
The was an error in my code, I just changed "bernieML.samd.ngrams8K" to "dbo.ngrams8K". I was using code from my own DB (BernieML) with a different schema (samd).
Sorry for the late reply - I didn't get the email saying there were replies...
To deal with cases where there aren't spaces before "#" or situations like (#123), or (for fun) numbers without preceding #'s, we can do this:
SELECT
Item = SUBSTRING(split.[value],1,IIF(i.Pos=0,8000,i.Pos-1))
FROM STRING_SPLIT(@String,'#') AS split
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',split.[Value]))) AS i(Pos)
WHERE split.[value] LIKE '[0-9]%';
On a pre-2019 system I would recommend delimitedSplit8K. The solution would look like this:
SELECT
Item = SUBSTRING(split.[value],1,IIF(i.Pos=0,8000,i.Pos-1))
FROM STRING_SPLIT(@String,'#') AS split
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',split.[Value]))) AS i(Pos)
WHERE split.[value] LIKE '[0-9]%';
This parameter value includes each of the aforementioned issues, and will still be handled correctly:
DECLARE @String VARCHAR(8000) =
'Emp 223 says: (#1234) has been replaced by #014521#2345. Please call me at 555-333-1234.';
For customized split requirements I use NGrams8K. Note the code below and results.
DECLARE @String VARCHAR(8000) =
'Emp 223 says: (#1234) has been replaced by #014521#2345. Please call me at 555-333-1234.';
;--==== (1) Get the position of each delimiter (ng.Position)
SELECT ng.Position
FROM dbo.NGrams8k(@String,1) AS ng
WHERE ng.Token = '#';
;--==== (2) Build out suffixes from each delimiter position (s.Suffix)
SELECT
ng.Position,
s.Suffix
FROM dbo.NGrams8k(@String,1) AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
WHERE ng.Token = '#';
;--==== (3) Find the end position for each item (i.Pos)
SELECT
ng.Position,
s.Suffix,
i.Pos
FROM dbo.NGrams8k(@String,1) AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',s.Suffix))) AS i(Pos)
WHERE ng.Token = '#';
;--==== (4) use s.Suffix and i.Pos to build the "Item"
SELECT
Item = SUBSTRING(s.Suffix,1,IIF(i.Pos=0,8000,i.Pos-1))
FROM dbo.NGrams8k(@String,1) AS ng
CROSS APPLY (VALUES(SUBSTRING(@String,ng.Position+1,8000))) AS s(Suffix)
CROSS APPLY (VALUES(PATINDEX('%[^0-9]%',s.Suffix))) AS i(Pos)
WHERE ng.Token = '#';
Results from each:
-- Itzik Ben-Gan 2001
October 31, 2024 at 1:16 am
Alan... you might want to explain what the bernieML.samd. stuff is.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2024 at 1:42 am
I am still evaluating how to apply this to an existing table..
And there's the rub. Please provide 5 to 10 rows of readily consumable data for use to operate with. Please see the article at the first link in my signature line below for one way to provide such a thing.
In the meantime, the following works for the variable version...
DECLARE @String VARCHAR(8000) = 'Hello world. #1234 has been replaced by #014521#2345';
SELECT Number = TRIM(sp2.Number)
FROM STRING_SPLIT(@String,'#') sp1
CROSS APPLY (SELECT value FROM STRING_SPLIT(sp1.value,' '))sp2(Number)
WHERE sp2.Number LIKE ('[0-9]%')
;
Here's the result...
This would make a good, fairly high performance iTVF (inline Table Valued Function).
Looking forward to some readily consumable data to take it there.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2024 at 3:59 pm
Alan... you might want to explain what the bernieML.samd. stuff is.
Thanks Jeff - I fixed my code. I was using my own DB and forgot to remove the DB.Schema reference. It's been a couple years since I posted here - rust and lack of coffee
-- Itzik Ben-Gan 2001
Viewing 11 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy