SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


String splitter to table weird result


String splitter to table weird result

Author
Message
Digs
Digs
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1195 Visits: 564
SQL 2005

Ok create a table



CREATE TABLE [dbo].[DIM_TEST](
[ID] [int] NULL,
[Data] [varchar](max) NULL
)

go





1) First row add a record [ID] say 100
2) In the [Data] field copy and paste data from DataSample.txt. (NOTE: Download 'SaveAs' to desktop to do this safely).
3) Down load fnDelimitedSplit, and install. Function is from this forum (NOTE: Download 'SaveAs' to desktop to do this safely [Lynn Petits developer, thanks to her])
4) I have inspected the sample data set, all the format is consistent, knock on wood!

Create another table to capture test results



CREATE TABLE [dbo].[DIM_TEST_1](
[ItemID] [int] NULL,
[Item] [varchar](500) NULL
)



Run this code



DELETE FROM dbo.DIM_TEST_1
go

DECLARE @Str AS VARCHAR(MAX)

SET @Str = (SELECT Data FROM dbo.DIM_TEST WHERE ID = 100)

INSERT INTO dbo.DIM_TEST_1
SELECT * FROM dbo.fnDelimitedSplit(@Str,'#')
go

SELECT * FROM dbo.DIM_TEST_1 ORDER BY ItemID DESC





So whats strange then
1) The data sample goes from 1928 to 2010, the SQL result is NOT
2) The records should be 20400 plus, final result set is 16000 or so

3) The splitting stops, no error report from SQL studio.

Questions
1) Why does it not split the whole datasample.txt
2) Why does it stop

Any ideas welcome...:-)
Attachments
DataSample.txt (68 views, 1.00 MB)
fnDelimiterSplit.txt (84 views, 2.00 KB)
Lowell
Lowell
SSC-Forever
SSC-Forever (48K reputation)SSC-Forever (48K reputation)SSC-Forever (48K reputation)SSC-Forever (48K reputation)SSC-Forever (48K reputation)SSC-Forever (48K reputation)SSC-Forever (48K reputation)SSC-Forever (48K reputation)

Group: General Forum Members
Points: 48296 Visits: 40594
Digs thanks; as always, you post everything someone might need to test the scenario. I really appreciate it.

this was interesting as heck; i got similar results when trying to use the string you posted directly as a delimited string in a statement, 16026 rows with yours function as well as another, slightly ITV functions; yet a third gave up at 157 rows. it returned results pretty quickly; under 1 minute.

what was weirder though, was when i stuck the value into a table instead of an inline statement, and then just ran a cross apply to get the list, it took a long time to run(more than 15 minutes!), but i got 20486 rows. I show that the data itself was more than 1M chars.

i suspect that a CTE would be extremely quick, as my text editor had no problem doing a find and replace on it so i could insert CRLf to see the #rows.

select MyAlias.Item from #temp cross apply [dbo].[DelimitedSplit](theval,'#') MyAlias

Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
Digs
Digs
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1195 Visits: 564
Thanks for taking the time to look at this.

But I dont understand your advice.

What code changes do I make to fnDelimitedSplit to make it work ( hopefully in a few minutes rather thank 20 minutes) correctly ?

Or are you suggesting the DataSample text be formatted in a different way ??
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)

Group: General Forum Members
Points: 154822 Visits: 41774
I got something about the use of an illegal operand in a TOP clause when I used Lynn's code. And, I'm sorry, I don't have the time to troubleshoot it.

When I used my function, it took 34 seconds and returned the correct number of rows. I spot checked the first 2, last 2 and a couple of rows in the middle and everything seems to be fine. A secondary split at the comma level took about 22 seconds and returned the correct number of rows and data, as well.

However, before I post such code, I have to tell you that splitting over a million charcters in 56 seconds is still horribly slow... Tally tables and MAX datatypes just don't get along and just using MAX makes the code twice as slow even if it fits within the 8k limit.

So let me ask... where does this data come from? Is it in a file somewhere because if it is, it'll take less than half a second to load the data in its final form with my old friend, BULK INSERT. For example, here's the BULK INSERT code I used on the example file given by the OP (stored in C:\Temp)


CREATE TABLE #MyHead
(
Col1 VARCHAR(10),
Col2 DATETIME,
Col3 DECIMAL(9,2),
Col4 DECIMAL(9,2),
Col5 DECIMAL(9,2),
Col6 DECIMAL(9,2),
Col7 BIGINT,
Col8 DECIMAL(9,2)
)
;
BULK INSERT #MyHead
FROM 'C:\Temp\DataSample.txt'
WITH (
CODEPAGE = 'RAW',
DATAFILETYPE = 'CHAR',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '#',
TABLOCK
)
;




... and here's the CPU and duration stats for that tiny bit of nasty fast code...

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.

SQL Server Execution Times:
CPU time = 437 ms, elapsed time = 475 ms.

(20485 row(s) affected)


And that's on an 8 year old desktop box. ;-)

If this data starts out in a file, then use BULK INSERT to load it. If the data DOESN'T start out in a file, consider putting it there before you load it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Digs
Digs
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1195 Visits: 564
I cant use BULK Insert as the client is asp.net site hosted on a shared server and dont allow CLR and it a pain to do a webclient pull then a save to folder then BULK INSERT into SQL table.

My code is trying to avoid the file save part of the data flow

.net webclient > SQL string > Store proc into table

But as you can see the splitter is dropping some rows.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)

Group: General Forum Members
Points: 154822 Visits: 41774
Ah... understood. Here's code to take your very long "parameter" and twist it into what I believe you really want for a final table... "we doan nid no stinken funksions". :-P

The details as to how this is done are in the comments, Digs.

--2D Split Combined with Refactor to Table

--===== Load the passed parameter
DECLARE @Parameter VARCHAR(MAX);
SET @Parameter = (SELECT Data FROM dbo.DIM_TEST);

--===== Suppress the auto-display of rowcounts to keep them from being
-- mistaken as part of the result set.
SET NOCOUNT ON;

--===== Add a start comma to the Parameter and change all "group"
-- delimiters to a comma so we can handle all the elements the same way.
SET @Parameter = ','+REPLACE(@Parameter,'#',',');

--===== Do the split with some row numbering
WITH -- This stuff makes an "inline" Tally "table"...
E1(N) AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000
E16(N) AS (SELECT 1 FROM E8 a, E8 b), --10,000,000,000,000,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E16),
cteSplit AS
( --=== This does the split and vertical stacking...
SELECT ROW_NUMBER() OVER (ORDER BY N)-1 AS RowNumber,
SUBSTRING(@Parameter,N+1,CHARINDEX(',',@Parameter,N+1)-N-1) AS Element
FROM cteTally
WHERE N < LEN(@Parameter)
AND SUBSTRING(@Parameter,N,1) = ',' --Notice how we find the comma
)
--==== ... and this puts the data back together in a fixed table format
-- using classic Cross-Tab code. It also converts columns that are
-- supposed to be numeric (Jeff Moden)
SELECT (RowNumber/8)+1 AS RowNumber,
CAST(MAX(CASE WHEN RowNumber%8 = 0 THEN Element END) AS VARCHAR(10)) AS [Symbol],
CAST(MAX(CASE WHEN RowNumber%8 = 1 THEN Element END) AS DATETIME) AS [Date],
CAST(MAX(CASE WHEN RowNumber%8 = 2 THEN Element END) AS DECIMAL(9,2)) AS [Open],
CAST(MAX(CASE WHEN RowNumber%8 = 3 THEN Element END) AS DECIMAL(9,2)) AS [High],
CAST(MAX(CASE WHEN RowNumber%8 = 4 THEN Element END) AS DECIMAL(9,2)) AS [Low],
CAST(MAX(CASE WHEN RowNumber%8 = 5 THEN Element END) AS DECIMAL(9,2)) AS [Close],
CAST(MAX(CASE WHEN RowNumber%8 = 6 THEN Element END) AS BIGINT) AS [Volume],
CAST(MAX(CASE WHEN RowNumber%8 = 7 THEN Element END) AS DECIMAL(9,2)) AS [AdjClose]
INTO dbo.JBMTest
FROM cteSplit
GROUP BY RowNumber/8;

SELECT TOP 10 * FROM dbo.JBMTest;
SELECT COUNT (*) FROM dbo.JBMTest;




--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Digs
Digs
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1195 Visits: 564
Wow that took 21 sec on my machine.

Thanks Jeff. Will study this code!
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)SSC Guru (154K reputation)

Group: General Forum Members
Points: 154822 Visits: 41774
Heh... I have to say that no one can knock on you about getting back with a response fast enough. Thanks for the feedback, Digs.

I'll also say that most of us wouldn't have spent a minute on this problem if you hadn't posted the test data like you did and the usage for it like you did. Well done, Digs

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Digs
Digs
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1195 Visits: 564
Jeff,

All works fine: And Fast !

I will keep your name on my list when I need to spend $$$ on my project .

Minutes have gone down to seconds !!:-):-D:-P;-)w00tCoolHehe
WayneS
WayneS
SSCoach
SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)

Group: General Forum Members
Points: 15567 Visits: 10635
Digs (5/1/2010)
Jeff,

All works fine: And Fast !

I will keep your name on my list when I need to spend $$$ on my project .

Minutes have gone down to seconds !!:-):-D:-P;-)w00tCoolHehe


Jeff must be slipping :-D... usually when he gets involved, minutes go down to milliseconds! w00t

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search