February 14, 2012 at 2:19 am
Hi,
I am trying to loop thru a current dataset of 10million+ records.
I receive XML data in a text file. One column with 10million rows. I am then stripping out the tags to obtain the columns and values. I will then be pivotting this information. The only snag is when i import this info it obviously needs to groupd certain records together ie tags between <album> and </ablum> is associated with 1 record. I have scripted a cursor with will loop throug the info to obtain the first <ablum> and insert that info into a secondary table with the ID, ColumnName, and Value. It will then get the next record and will have to associate the same ID to the next record but with the new columnName and Value and so forth. Till it reaches the next <ablum> it will set ID + 1 and perform the same loop.
What is the best way to handle a process like this. Currently its run 2hrs30mins+ and i have stopped the query.
Please help?
February 14, 2012 at 2:27 am
Please post table definitions, sample data and desired output. Read this to see the best way to post this to get quick responses.
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 14, 2012 at 4:09 am
Hi,
These are the two tables being used:
--This one holds the inital imported data
CREATE TABLE
[dbo].[XMLLoading]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[ColumnSource] [varchar](8000) NULL
)
--This one is used to hold the new cleanedData with associated RecordID
CREATE TABLE
[dbo].[XMLPivotTable]
(
[RecordID] [int] NOT NULL,
[ColumnName] [varchar](1000) NULL,
[ItemName] [varchar](max) NULL
)
The Imported data into [dbo].[XMLLoading] looks something like this:
1<?xml version="1.0"?>
2<rimco>
3 <album>
4 <producttype>music</producttype>
6 <barcode>034571173344</barcode>
8 <barcodetype>UPC</barcodetype>
10 <origin>USA / Canada</origin>
12 <title>Complete Songs 2</title>
14 <artist>Johnson, Graham</artist>
16 <altartist>Graham Johnson</altartist>
18 <composer>Faure</composer>
20 <stock>CDA 67334</stock>
22 <coupling>CDA 67334</coupling>
24 <source>AMARANTH</source>
26 <label>Hyperion</label>
28 <media>CD</media>
30 <mediatype>CD</mediatype>
32 <mediacount>1</mediacount>
34 <releasedate>2005-03-31 00:00:00</releasedate>
36 <coding>None</coding>
I then clean up the info using the following script and insert into [dbo].[XMLPivotTable]:
DECLARE
@RecordIDINT
,@GetPreviousRecordINT
,@ColumnNameVARCHAR(1000)
,@ItemNameVARCHAR(MAX)
,@getRecordIDCURSOR
SET @getRecordID = CURSOR FOR
SELECT
CASE WHEN
charindex('</', ColumnSource) = 0
THEN NULL
ELSE SubString(ColumnSource, charindex('<', ColumnSource)+1, (charindex('>', ColumnSource)-1)-charindex('<', ColumnSource)) END AS ColumnName,
CASE WHEN
charindex('</', ColumnSource) = 0
THEN NULL
ELSE SubString(ColumnSource, charindex('>', ColumnSource)+1, (charindex('</', ColumnSource)-2)-charindex('>', ColumnSource)+1) END AS ItemName
FROM
dbo.XMLLoading WITH(NOLOCK)
WHERE
(CHARINDEX('</', ColumnSource) > CHARINDEX('<', ColumnSource)
ORColumnSource like '%<album>%')
OPEN @getRecordID
SET @RecordID = 0
FETCH NEXT FROM @getRecordID INTO @ColumnName, @ItemName
WHILE @@FETCH_STATUS = 0
BEGIN
IF @ColumnName IS NULL
SET @GetPreviousRecord = @RecordID + 1
INSERT INTO
XMLPivotTable
SELECT
@GetPreviousRecord AS RecordID
,@ColumnName AS ColumnName
,@ItemName AS ItemName
FETCH NEXT FROM @getRecordID INTO @ColumnName, @ItemName
SET @RecordID = @GetPreviousRecord
END
CLOSE @getRecordID
DEALLOCATE @getRecordID
--I also PIVOT this information dynamically.
--DECLARE
--@ColumnNamePvtVARCHAR(MAX)
--,@SQLNVARCHAR(MAX)
--
--
--SELECT
-- @ColumnNamePvt = Coalesce(@ColumnNamePvt +',['+ColumnName+']' , '['+ColumnName+']')
--FROM
--(
--SELECT DISTINCT
--ColumnName
--FROM
--XMLPivotTable
--WHERE
--ColumnName IS NOT NULL
--) AS A
--
--SET @sql =
--
--'SELECT RecordID, ' +
--@ColumnNamePvt +
--'FROM
--(
--SELECT
--RecordID,
--ColumnName,
--ItemName
--FROM
--XMLPivotTable
--WHERE
--ColumnName IS NOT NULL
--) AS a
--PIVOT
--(
--MAX(ItemName)
--FOR ColumnName IN (' + @ColumnNamePvt + ')
--) AS pvt'
--
--EXEC sp_executesql @sql
--PRINT @sql
This process is way to slow.
Any suggestions??
February 15, 2012 at 2:09 am
Hello,
you can try to do it without using cursors, that should speed up your process.
I think in a solution using intermediate tables, let me explain:
You coded a query to strip a row and obtain its columnName, itemName; use this query to populate a new table,
CREATE TABLE
[dbo].[IntermediateResults1]
([ID] [int] NOT NULL,
[ColumnName] [varchar](1000) NULL,
[ItemName] [varchar](max) NULL)
this table should contain
1, NULL, NULL
2, NULL, NULL
3, NULL, NULL
4, producttype, music
6, barcode, 034571173344
8, barcodetype, UPC
10, origin, USA / Canada
12, title, Complete Songs 2
14, artist, Johnson, Graham
16, altartist, Graham Johnson
18, composer, Faure
20, stock, CDA 67334
...
Add a column albumId to this table, the albumId to assign to a row is the previous row where columnName is NULL.
Go to create a new IntermediateAlbumId table:
SELECT ID as AlbumId
INTO IntermediateAlbumId
FROM IntermediateResults1
WHERE columnName IS NULL
Now there are some options to populate the column IntermediateResults1.AlbumId. That's one:
UPDATE IntermediateResults1 SET AlbumId =
(SELECT max(albumId)
FROM IntermediateAlbumId A
WHERE A.AlbumId <= Id)
WHERE albumId IS NULL
or, if it is to slow, iterate this query
UPDATE IntermediateResults1 SET AlbumId = newAlbumId FROM
(SELECT TOP 10000 Id,
(SELECT max(albumId)
FROM IntermediateAlbumId A
WHERE A.AlbumId <= Id) AS newAlbumId
FROM IntermediateResults1
WHERE albumId IS NULL)
I haven't checked these queries, please correct them.
Hope that this helps,
Francesc
February 15, 2012 at 10:33 am
Is there a reason to import the XML this way? If you were to use SSIS on the source XML file you wouldn't be having to play some outlandish games like you are now.
Even without that - using OPENXML would allow you to retrieve whole albums at a time, preserving the relations.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 16, 2012 at 1:04 am
Hi,
With SSIS 2005 you can only import 1 column at a time using the XML source component. i know 2008 handles the XML better so for now rather than creating many data flows into different temp tables then joining them all together again.
I was trying some other way until somebody could help me otherwise.
Please advise.
February 16, 2012 at 12:21 pm
ismailmallum (2/16/2012)
Hi,With SSIS 2005 you can only import 1 column at a time using the XML source component. i know 2008 handles the XML better so for now rather than creating many data flows into different temp tables then joining them all together again.
I was trying some other way until somebody could help me otherwise.
Please advise.
How are you using it? I took a mocked up model of the format you previewed, repeated the album element in there a dozen times, and ask it to import them, and it brought them in as 12 records (in 2005). No need to cycle through each element and re-pivot them: just define the album table, and point the XML source at that node and it will auto-parse the inner elements for you.
I've done a lot of these feeds in previous jobs, and it will chew through huge XML files in a few minutes.
Yes - 2008 allows me to play some extra games with the data, but it looks like 2005 can handle your file structure. Unless there's an extra snag that doesn't show in the sample you offered up.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 17, 2012 at 2:25 am
When i try importing using the XML source it gives me a message:
Error at Data Flow Task 1 [XML source [1]]: The XML Source Adapter does not support mixed content model on Complex Types.
What does this mean?
February 17, 2012 at 8:24 am
That means you have instances of elements with text AND sub-elements.
So - this would be an example:
<myBadNode>
This is a sample piece of text for <Context>Sample</Context> purposes.
</myBadNode>
This kind of content is tough to deal with because there's an ambiguity as to what you're supposed to do with the contents (it is one piece of info? 2? 3?).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 9 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply