Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

How to find values in a comma delimeted string with len>3 Expand / Collapse
Author
Message
Posted Wednesday, April 17, 2013 7:58 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, August 21, 2013 8:11 AM
Points: 5, Visits: 54
Hi everyone,

can you please help me with the following? I have a nvarchar column with data like this: ', 55,85,1,4,9888,6587,'
How can I found the rows that include in their comma delimeted string, values with len()>3?

I.e. if I had the following rows:

', 55,85,1,4,9888,6587,'
', 55,85,1,4,98,65,'

I would like a statement that returns only the first row (beacuse it has values: 9888 and 6587) or only the values: 9888, 6587.

Thank you in advance
Post #1443273
Posted Wednesday, April 17, 2013 8:11 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:48 AM
Points: 5,415, Visits: 10,069
Here's one way of doing it. It may not be the most efficient way, so you'll want to test for performance if you're going to use it in production.

SELECT Mynvarcharcol
FROM Mytable
WHERE Mynvarcharol LIKE '%[^,][^,][^,][^,]%'

John
Post #1443283
Posted Thursday, April 18, 2013 12:22 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 12:07 PM
Points: 1,061, Visits: 2,570
[I think what the OP wants is to return only the comma-delimited strings where any one of the values in the string has LEN(value) > 3. If that's the case, this works, but may need performance tweaking depending on the OP's database schema and data:


-- NOTE: Be sure to remove the lowercase letter 'x' from the CREATE and DROP T-SQL keywords in this code - my company blocks internet traffic that includes certain T-SQL keywords.

CxREATE TABLE #TEMP (rowID int identity(1,1), delimitedString varchar(60))

GO

INSERT INTO #TEMP

VALUES (', 55,85,1,4,9888,6587,'), (', 55,85,1,4,98,65,' )

GO

WITH cte1 AS (

SELECT c.rowID, s.Item, LEN(s.Item) as valueLen

FROM #TEMP c

OUTER APPLY dbo.DelimitedSplit8K(c.delimitedString,',') s

WHERE LEN(s.Item) > 3

)

SELECT t.rowID, t.delimitedString

FROM #TEMP t

WHERE EXISTS (SELECT 1 FROM cte1 c WHERE t.rowID = c.rowID)

GO

DxROP TABLE #TEMP



OP, you'll need to have some column to uniquely identify each row of your input - if the table has a primary key, you can use that in place of the rowID column I created in my #TEMP table. In cte1, I parsed each string into its values using my version of Jeff Moden's string splitter function and included the rowID where LEN(<parsed value>) > 3. Then I selected rows from #TEMP where the rowID exists in cte1, which gives me only the rows where the LEN() > 3 for any parsed value in the comma-delimited string.

If the CTE in my code causes performance issues with your data, you could convert it to a temp table with an appropriate index to speed things up. No matter what you do, though, parsing delimited strings for any number of rows will be pretty slow.

Hope that helps!


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1444079
Posted Thursday, April 18, 2013 1:17 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 12:25 PM
Points: 20,729, Visits: 32,489
labri (4/17/2013)
Hi everyone,

can you please help me with the following? I have a nvarchar column with data like this: ', 55,85,1,4,9888,6587,'
How can I found the rows that include in their comma delimeted string, values with len()>3?

I.e. if I had the following rows:

', 55,85,1,4,9888,6587,'
', 55,85,1,4,98,65,'

I would like a statement that returns only the first row (beacuse it has values: 9888 and 6587) or only the values: 9888, 6587.

Thank you in advance


Which do you want, the entire row in which there is at least one numeric value with a length greater than 3 or just those numeric values from that row with a length greater than 3?




Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1444114
Posted Thursday, April 18, 2013 2:38 PM


SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Friday, October 17, 2014 7:06 AM
Points: 612, Visits: 2,853
It's work noting that wolfkill and I are using the same splitter (I mine just has a different name).

-- sample data 
IF OBJECT_ID('tempdb..#x') IS NOT NULL
DROP TABLE #x;

CREATE TABLE #x (id int identity primary key, val nvarchar(100) NOT NULL);

INSERT INTO #x
SELECT '55,85,1,4,9888,6587' UNION ALL --we want this one
SELECT '55,85,1,4,98,65' UNION ALL --we don't want this
SELECT '1122,33333,22,11,40' UNION ALL --we want this one
SELECT '312,9,8,7' --we don't want this
GO

-- code to get what you need
WITH legitIDs AS
( SELECT id
FROM #x x
CROSS APPLY dbo.splitString(x.val,',')
WHERE LEN(item)>=4
GROUP BY id)
SELECT val
FROM #x x
JOIN legitIDs l ON x.id=l.id;

--cleanup
DROP TABLE #x;
GO



-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)

"I can't stress enough the importance of switching from a 'sequential files' mindset to 'set-based' thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code. " -- Itzek Ben-Gan 2001

My blog
Post #1444145
Posted Friday, April 19, 2013 6:16 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, August 21, 2013 8:11 AM
Points: 5, Visits: 54
Thank you all for your answers! I found what I needed using the
Alan.B's query.
Post #1444343
Posted Friday, April 19, 2013 7:20 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 12:07 PM
Points: 1,061, Visits: 2,570
labri (4/19/2013)
Thank you all for your answers! I found what I needed using the
Alan.B's query.


Glad you got what you needed. Thanks for posting back to let us know you did!


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1444379
Posted Friday, April 19, 2013 8:38 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 12:25 PM
Points: 20,729, Visits: 32,489
I was doing some testing, code below:


set nocount on;
-- sample data
IF OBJECT_ID('tempdb..#x') IS NOT NULL
DROP TABLE #x;

CREATE TABLE #x (id int identity primary key, val nvarchar(100) NOT NULL);
GO

INSERT INTO #x
SELECT '55,85,1,4,9888,6587' UNION ALL --we want this one
SELECT '55,85,1,4,98,65' UNION ALL --we don't want this
SELECT '1122,33333,22,11,40' UNION ALL --we want this one
SELECT '312,9,8,7' --we don't want this
GO 10000

set nocount on;
--===== Create the timer and bit-bucket variables and start the timer.
DECLARE @BitBucket nvarchar(100); --< change or add variables needed here to eliminate display distortion

Declare @cpu_ int,
@lreads_ bigint,
@eMsec_ bigint,
@Reads_ bigint,
@Writes_ bigint;
declare @CpuMs int,
@LogRds bigint,
@Elapsed bigint,
@Reads bigint,
@Writes bigint;

dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;
dbcc dropcleanbuffers with no_infomsgs;


Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
, @Reads_ = reads
, @Writes_ = writes
From
sys.dm_exec_requests
Where
session_id = @@spid;

---
WITH legitIDs AS
( SELECT id
FROM #x x
CROSS APPLY dbo.DelimitedSplit8K(x.val,',')
WHERE LEN(item)>=4
GROUP BY id)
SELECT
@BitBucket = val
FROM
#x x
JOIN legitIDs l ON x.id=l.id;
---

Select
@CpuMs = cpu_time - @cpu_
, @LogRds = logical_reads - @lreads_
, @Elapsed = total_elapsed_time - @eMsec_
, @Reads = reads - @Reads_
, @Writes = writes - @Writes_
From
sys.dm_exec_requests
Where
session_id = @@spid;
--===== Display the duration
PRINT '----- Using DelimitedSplit/CROSS APPLY -----';
PRINT 'CPU(ms): ' + right(' ' + cast(@CpuMs as varchar(10)),10) +
' Logical Reads: ' + right(' ' + cast(@LogRds as varchar(10)),10) +
' Elapsed(ms): ' + right(' ' + cast(@Elapsed as varchar(10)),10) +
' Reads: ' + right(' ' + cast(@Reads as varchar(10)),10) +
' Writes: ' + right(' ' + cast(@Writes as varchar(10)),10);

set nocount off;
GO

set nocount on;
--===== Create the timer and bit-bucket variables and start the timer.
DECLARE @BitBucket nvarchar(100); --< change or add variables needed here to eliminate display distortion

Declare @cpu_ int,
@lreads_ bigint,
@eMsec_ bigint,
@Reads_ bigint,
@Writes_ bigint;
declare @CpuMs int,
@LogRds bigint,
@Elapsed bigint,
@Reads bigint,
@Writes bigint;

dbcc freeproccache with no_infomsgs;
dbcc freesystemcache('ALL') with no_infomsgs;
dbcc dropcleanbuffers with no_infomsgs;


Select
@cpu_ = cpu_time
, @lreads_ = logical_reads
, @eMsec_ = total_elapsed_time
, @Reads_ = reads
, @Writes_ = writes
From
sys.dm_exec_requests
Where
session_id = @@spid;

---
select
@BitBucket = val
FROM
#x x
where
patindex('%[^,][^,][^,][^,]%',val) > 0;
---

Select
@CpuMs = cpu_time - @cpu_
, @LogRds = logical_reads - @lreads_
, @Elapsed = total_elapsed_time - @eMsec_
, @Reads = reads - @Reads_
, @Writes = writes - @Writes_
From
sys.dm_exec_requests
Where
session_id = @@spid;
--===== Display the duration
PRINT '----- Using PATINDEX -----';
PRINT 'CPU(ms): ' + right(' ' + cast(@CpuMs as varchar(10)),10) +
' Logical Reads: ' + right(' ' + cast(@LogRds as varchar(10)),10) +
' Elapsed(ms): ' + right(' ' + cast(@Elapsed as varchar(10)),10) +
' Reads: ' + right(' ' + cast(@Reads as varchar(10)),10) +
' Writes: ' + right(' ' + cast(@Writes as varchar(10)),10);

set nocount off;
GO

--cleanup
DROP TABLE #x;
GO



Results:


Beginning execution loop
Batch execution completed 10000 times.

----- Using DelimitedSplit/CROSS APPLY -----
CPU(ms): 422 Logical Reads: 80564 Elapsed(ms): 1667 Reads: 80 Writes: 0

----- Using PATINDEX -----
CPU(ms): 109 Logical Reads: 245 Elapsed(ms): 110 Reads: 0 Writes: 0





Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1444462
Posted Friday, April 19, 2013 9:13 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 12:07 PM
Points: 1,061, Visits: 2,570
Lynn Pettis (4/19/2013)
I was doing some testing, code below:

<snipped>

Results:


Beginning execution loop
Batch execution completed 10000 times.

----- Using DelimitedSplit/CROSS APPLY -----
CPU(ms): 422 Logical Reads: 80564 Elapsed(ms): 1667 Reads: 80 Writes: 0

----- Using PATINDEX -----
CPU(ms): 109 Logical Reads: 245 Elapsed(ms): 110 Reads: 0 Writes: 0




I see the cleverness of using PATINDEX to check for the existence of a substring of four consecutive non-comma characters and agree that it works for the minimal specifications that the OP provided.

If, however, the values in these comma-delimited strings are supposed to be numbers (and not strings of numerals) and what the OP really wants is any row where the comma-delimited string includes a value >= 1000 (on which I wish the OP had provided some clarification or that I had asked for such), the PATINDEX solution won't work. It will return 1 for the string '100, 192, 38, 3' because the second value consists of [space][1][9][2] - four non-comma characters - but 192 is not >= 1000. The string-split method using LEN() as both Alan.B and I wrote it will do the same thing, but it can be easily modified to convert the parsed values to numeric datatypes and compare them to the reference value as numbers to get only rows where at least one parsed, converted value >= 1000.


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1444476
Posted Friday, April 19, 2013 10:14 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 12:25 PM
Points: 20,729, Visits: 32,489
wolfkillj (4/19/2013)
Lynn Pettis (4/19/2013)
I was doing some testing, code below:

<snipped>

Results:


Beginning execution loop
Batch execution completed 10000 times.

----- Using DelimitedSplit/CROSS APPLY -----
CPU(ms): 422 Logical Reads: 80564 Elapsed(ms): 1667 Reads: 80 Writes: 0

----- Using PATINDEX -----
CPU(ms): 109 Logical Reads: 245 Elapsed(ms): 110 Reads: 0 Writes: 0




I see the cleverness of using PATINDEX to check for the existence of a substring of four consecutive non-comma characters and agree that it works for the minimal specifications that the OP provided.

If, however, the values in these comma-delimited strings are supposed to be numbers (and not strings of numerals) and what the OP really wants is any row where the comma-delimited string includes a value >= 1000 (on which I wish the OP had provided some clarification or that I had asked for such), the PATINDEX solution won't work. It will return 1 for the string '100, 192, 38, 3' because the second value consists of [space][1][9][2] - four non-comma characters - but 192 is not >= 1000. The string-split method using LEN() as both Alan.B and I wrote it will do the same thing, but it can be easily modified to convert the parsed values to numeric datatypes and compare them to the reference value as numbers to get only rows where at least one parsed, converted value >= 1000.


I had asked the same question and received no answer as well. I do have another piece of code that will deal with that.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1444505
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse