Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Parse field with Numeric ranges and comma deliminated values Expand / Collapse
Author
Message
Posted Wednesday, March 24, 2010 10:35 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:47 AM
Points: 143, Visits: 612
Ok I am a bit stuck

I have a field that contains a mix of ranged data. All data is numeric; field type is varchar.

Example of field contents:

44100-44110, 44150

I need to derive a result in a NEW table that will contain one record each of

44100
44101
44102
44103
44104
44105
44106
44107
44108
44109
44110
44150


Can anyone help with a piece of code that would accomplish this?

J


<hr noshade size='1' width='250' color='#BBC8E5'>


Regards,

Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
Post #889119
Posted Wednesday, March 24, 2010 10:53 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 8:51 AM
Points: 6,600, Visits: 8,900
This looks interesting, and I'd like to help you out, but there just isn't enough information to begin. So, to start off, please read the first link in my signature and then post some table DDL statements, and some DML to put some representative data into it, and then I'll be able to see what you're working with.

Will all of the data always have one range, followed by a comma and one additional number? If not, ensure that you include enough variations of data so that we can see what we really need to work with.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #889140
Posted Wednesday, March 24, 2010 11:09 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 8:51 AM
Points: 6,600, Visits: 8,900
Based on the single row of test data you supplied, and assuming that all other rows are in the same exact format, this will work:
declare @test table (MixedData varchar(50))
insert into @test values ('44100-44110, 44150')


;WITH CTE (RangeData, SingleData) AS
(SELECT SUBSTRING(MixedData, 1, Charindex(',', MixedData)-1),
convert(int, SUBSTRING(MixedData, CharIndex(',', MixedData)+1, 50))
FROM @test),
CTE2 (RangeStart, RangeEnd, SingleData) AS
(SELECT convert(int, SUBSTRING(RangeData, 1, CharIndex('-', RangeData)-1)),
convert(int, SUBSTRING(RangeData, CharIndex('-', RangeData)+1, 50)),
SingleData
FROM CTE),
Tens (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),
Thousands(N) AS (SELECT 1 FROM Tens t1 CROSS JOIN Tens t2 CROSS JOIN Tens t3),
Millions (N) AS (SELECT 1 FROM Thousands t1 CROSS JOIN Thousands t2),
Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM Millions)
SELECT N
FROM Tally, CTE2
WHERE N BETWEEN CTE2.RangeStart and CTE2.RangeEnd
OR N = CTE2.SingleData

Note that this utilizes a virtual tally table. If you already have your own tally table, you can omit the parts that build it.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #889150
Posted Wednesday, March 24, 2010 11:15 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:47 AM
Points: 143, Visits: 612
Thank you, I was just preparing to post the create statements and a text file of some data.

I will take a look at your code. With regards to all rows containing the same exact format; not really...


SOME will have no range in it (44100-44110) and only contain things like 44110,44125,44157, etc (maybe three or so values, maybe 20)...

Other fields might have multiple ranges

ex: 441100-44110, 44128, 44524-44555, 44112, 52144


Things like this... Makes it interesting for sure.



<hr noshade size='1' width='250' color='#BBC8E5'>


Regards,

Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
Post #889160
Posted Wednesday, March 24, 2010 11:08 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 8:51 AM
Points: 6,600, Visits: 8,900
-- See how this starts off by creating a table 
-- and inserting representative test data into it?
-- If you do this, it makes it a LOT easier for all
-- of the volunteers on this site to just copy/paste
-- this into a query window and start working on it.
DECLARE @test TABLE (MixedData varchar(50))
INSERT INTO @test
SELECT '44100-44110, 44150' UNION ALL
SELECT '44115,44125,44157' UNION ALL
SELECT ' 44250-44350, 44128, 44524-44555, 44112, 52144'

;WITH
-- This begins a virtual tally table. This query will be a LOT faster (< 1 second) with a permanent one.
-- See the article below for how to build one.
Tens (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),
Thousands(N) AS (SELECT 1 FROM Tens t1 CROSS JOIN Tens t2 CROSS JOIN Tens t3),
Millions (N) AS (SELECT 1 FROM Thousands t1 CROSS JOIN Thousands t2),
Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM Millions),

-- first, need to break down each mixed data row into separate items, either a single item or a range.
-- See Jeff Moden's article The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
-- at http://www.sqlservercentral.com/articles/T-SQL/62867/ for how a tally table can split strings apart.
Elements (Items) AS (
SELECT RTRIM(LTRIM(SUBSTRING(',' + MixedData + ',',N+1,CHARINDEX(',',',' + MixedData + ',',N+1)-N-1)))
FROM Tally, @test
WHERE N < LEN(',' + MixedData + ',')
AND SUBSTRING(',' + MixedData + ',',N,1) = ','),

-- Now, for each item, get the range start/end. A range will have a "-".
-- If just a single item, use the item for both the range start and end.
Elements2 (RangeStart, RangeEnd) AS (
SELECT CONVERT(INT, CASE WHEN CharIndex('-', Items) > 0 THEN LEFT(Items, CharIndex('-', Items)-1)
ELSE Items
END),
CONVERT(INT, CASE WHEN CharIndex('-', Items) > 0 THEN SUBSTRING(Items, CharIndex('-', Items)+1, len(Items))
ELSE Items
END)
FROM Elements)

-- Finally, return all the individual items
SELECT N
FROM Tally, Elements2
WHERE N BETWEEN Elements2.RangeStart and Elements2.RangeEnd

Edit: corrected name misspelling


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #889536
Posted Thursday, March 25, 2010 8:25 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 3:25 AM
Points: 12,914, Visits: 32,071
nice breakthrough Wayne; I couldn't get my mind around the multiple ranges;
performance on this solution sux, took my decent dev machine 1:09 to return the 151 rows, we might look at optimizing it , but i'm just happy to add a solution like that to my toolbox.


Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #889855
Posted Thursday, March 25, 2010 8:40 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:47 AM
Points: 143, Visits: 612
Same here. I PM'd but should have replied publicly. Yes the results of this script are amazing and I just could not get my head around it either.

Performance is really bad (BUT IT WORKS)... I do need to tune it however and I am going to be working on that. The performance hit is in the joins. If anyone has a recommendation or modifications that would be great. Otherwise I will post what I come up with here.

Jeff


<hr noshade size='1' width='250' color='#BBC8E5'>


Regards,

Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
Post #889867
Posted Thursday, March 25, 2010 11:25 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:47 AM
Points: 143, Visits: 612
The new code is working well. Need to make changes to include some additional code I need for my purpose.


<hr noshade size='1' width='250' color='#BBC8E5'>


Regards,

Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
Post #890061
Posted Thursday, March 25, 2010 11:29 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 8:51 AM
Points: 6,600, Visits: 8,900
Lowell (3/25/2010)
nice breakthrough Wayne; I couldn't get my mind around the multiple ranges;
performance on this solution sux, took my decent dev machine 1:09 to return the 151 rows, we might look at optimizing it , but i'm just happy to add a solution like that to my toolbox.


Yes, the performance isn't very good. Several things contribute to this:
1. using a virtual (aka in-line or dynamic) tally table vs. a physical tally table with a good clustered index
2. the tally table is being called twice
3. we're looking at numbers that require building it into the million-row range

If you replace the virtual tally table with a physical one, things perform much better. On my system, it goes from 50+ seconds to <1 second.

To create the permanent tally table with a clustered index:
;WITH
Tens (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),
Thousands(N) AS (SELECT 1 FROM Tens t1 CROSS JOIN Tens t2 CROSS JOIN Tens t3),
Millions (N) AS (SELECT 1 FROM Thousands t1 CROSS JOIN Thousands t2),
Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM Millions)
SELECT N
INTO dbo.TALLY
FROM Tally
ORDER BY N

ALTER TABLE dbo.TALLY ALTER COLUMN N int NOT NULL
ALTER TABLE dbo.TALLY ADD CONSTRAINT [PK_TALLY] PRIMARY KEY CLUSTERED (N)

Then, change the previously posted code for this solution to:
-- See how this starts off by creating a table 
-- and inserting representative test data into it?
-- If you do this, it makes it a LOT easier for all
-- of the volunteers on this site to just copy/paste
-- this into a query window and start working on it.
DECLARE @test TABLE (MixedData varchar(50))
INSERT INTO @test
SELECT '44100-44110, 44150' UNION ALL
SELECT '44115,44125,44157' UNION ALL
SELECT ' 44250-44350, 44128, 44524-44555, 44112, 52144'

;WITH
-- first, need to break down each mixed data row into separate items,
-- either a single item or a range. See Jeff Moden's article
-- The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
-- at http://www.sqlservercentral.com/articles/T-SQL/62867/ for how
-- a tally table can split strings apart.
Elements (Items) AS (
SELECT RTRIM(LTRIM(SUBSTRING(',' + MixedData + ',',N+1,CHARINDEX(',',',' + MixedData + ',',N+1)-N-1)))
FROM dbo.Tally, @test
WHERE N < LEN(',' + MixedData + ',')
AND SUBSTRING(',' + MixedData + ',',N,1) = ','),

-- Now, for each item, get the range start/end. A range will have a "-".
-- If just a single item, use the item for both the range start and end.
Elements2 (RangeStart, RangeEnd) AS (
SELECT CONVERT(INT, CASE WHEN CharIndex('-', Items) > 0 THEN LEFT(Items, CharIndex('-', Items)-1)
ELSE Items
END),
CONVERT(INT, CASE WHEN CharIndex('-', Items) > 0 THEN SUBSTRING(Items, CharIndex('-', Items)+1, len(Items))
ELSE Items
END)
FROM Elements)

-- Finally, return all the individual items
SELECT N
FROM dbo.Tally, Elements2
WHERE N BETWEEN Elements2.RangeStart and Elements2.RangeEnd

Lowell and Jeff, I'm interested in how this improves the performance for you.

My results:

Table '#07020F21'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TALLY'. Scan count 13, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 165 ms.

Execution plan is attached.

Edit: added performance results and execution plan
Edit2: corrected name misspelling


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings


  Post Attachments 
Parse Field with numeric ranges and single items.sqlplan (2 views, 106.91 KB)
Post #890070
Posted Thursday, March 25, 2010 11:58 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 10:47 AM
Points: 143, Visits: 612
From 4.5 hours to get through HALF the records to 13 seconds to process the entire job.


<hr noshade size='1' width='250' color='#BBC8E5'>


Regards,

Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
Post #890115
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse