I'll pick up that gauntlet.. How about we up the ante a little. Split the string, return the PK, the split value AND the index in the original string....
Heh... Good Man! I just knew you'd be the one to bite, Matt... we've been having a lot of fun at this on other threads... 🙂
Ok... here's the rules...
1. You're going to use a CLR... you may pick ANY language to write the CLR but you must tell us what you used.
2. I'm going to use my ol' favorite... Tally Table. Here's the code for it just in case you don't have it for testing...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
FROM Master.dbo.SysColumns sc1,
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
3. We both use the same data. With that in mind, here's the data we'll use. We may not add, remove, or alter any columns or calculated columns. We may add/remove/change any indexes on the column we see fit but we must reveal what they are.
DROP TABLE JBMTest
--===== Create and populate a 100,000 row test table.
-- Column "RowNum" has a range of 1 to 100,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Takes about 2 seconds to execute.
SELECT TOP 100000
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
-- Takes about 1 second to execute.
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (SomeID)
--===== Change a couple of the SomeCSV values to produce special situations
SET SomeCSV = NULL
WHERE SomeID = 1
SET SomeCSV = ''
WHERE SomeID = 2
SET SomeCSV = ','
WHERE SomeID = 3
SET SomeCSV = ',Part02,Part03,Part04,,,Part07,Part08,Part09,'
WHERE SomeID = 4
SET SomeCSV = ',Part02,Part03,Part04, , ,Part07,Part08,Part09,'
WHERE SomeID = 5
SET SomeCSV = 'Part01'
WHERE SomeID = 6
SET SomeCSV = NULL
WHERE SomeID = 7
4. The result of the split will be a table containing 1 row for each "split value". Although the table may contain any number of columns or indexes, it must contain at least the Primary Key column (SomeID) from the test table, the SplitValue, and the [Index] of that SplitValue. The end result of the new table must have a Unique Clustered Index at completion.
5. "Missing" values in the original test table must be single blanks in the new table as verified by DATALENGTH.
6. Fully NULL string in the test table must return a NULL for the SplitValue. Index in that case may be NULL or zero, your choice.
7. SplitValue column must be able to handle max string width of 8000 characters (obviously NOT NVarChar).
8. All code, except for SELECT TOP 100 * for verification purposes and duration measurment code, must be included in duration measurements.
With all of that in mind... here's my submittal including duration code...
--===== Identify the run
PRINT SPACE(12)+'Jeff Moden''s Tally Table Solution for Indexed Parsing.'
--===== Declare and start a time to measure duration with
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
--===== Make sure the scratchpad table doesn't already exist
IF OBJECT_ID('TempDB..#MyHead','U') IS NOT NULL
DROP TABLE #MyHead
--===== Do the split including an index for each part
-- Empty or blank strings/split values return BLANKS with the correct Index.
-- Totally Null string returns NULL with a "0" index, saves about 2 seconds if NULL returned instead.
-- Leading and trailing spaces in each split value are removed.
SplitValue = LTRIM(RTRIM(SUBSTRING(h.SomeCsv, t.N+1, CHARINDEX(',', h.SomeCsv, t.N+1)-t.N-1))),
[Index] = ISNULL(t.N-DATALENGTH(REPLACE(LEFT(h.SomeCsv,t.N),',','')),0)
FROM dbo.Tally t
RIGHT OUTER JOIN --Necessary in case SomeCsv is NULL
(SELECT SomeID, SomeCsv = ','+SomeCsv+',', Length = LEN(SomeCSV)+2 FROM dbo.jbmTest) h
ON SUBSTRING(h.SomeCsv, t.N, 1) = ','
AND t.N < h.Length
--===== Add the Unique Clustered Index
CREATE UNIQUE INDEX UCIX_tmpMyHead_SomeID_Index
ON #MyHead (SomeID,[Index])
--===== Display the duration
PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'
--===== Check the first 100 rows...
SELECT TOP 100 *
ORDER BY SomeID,[Index]
For the phone number challenge... I'll let you produce the test data...
is pronounced "ree-bar
" and is a "Modenism
" for R
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
"If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
"Change is inevitable... change for the better is not."
When you put the right degree of spin on it, the number 3|8
is also a glyph that describes the nature of a DBAs job. 😉
How to post code problems
Create a Tally Function (fnTally)