Performance Problems converting data into columns

  • Hello,

    I am in need of some help with an SSIS package (SQL 2008). I am trying to extract data out of a progress database and provide a reporting database for our users.

    The area I am having some trouble is with some tables which have columns where data is stored and split by a ‘;’ . This data in the system actually represents different fields in the application so the end game is to have these in their own columns for people to select from.

    Example of table:

    GroupNumber Group_id GroupName AddressLine1 AddressLine2 AddressLine3 Postcode DataColumn

    Group1 0001 TestGroup Capital letter Ltd Test Street Test Town XX1 22A 00-00-00;112115;TestBank

    GROUP2 0002 TestGroup2 Bad letter Ltd Test2 Street Test2 Town XX2 11B 11-00-55;522445;TestBank2

    The way the data needs to end up is to have the data column split out into:

    Sortcode AccountNumber BankName

    00-00-00 112115 TestBank

    11-00-55 522445 TestBank2

    The issue I am having is performance of the methods I have tried so far: the DataColum has up to 200 ‘columns’ within it. This is fine over a few records <100 but when it performs across >40,000 then I am starting to notice some serious performance problems results are taking as long as 8 minutes. If I only split 10 out of 100 columns it performs much better over 40,000 rows than splitting them all.

    Example data:

    CREATE TABLE GroupTable

    (

    GroupNumber VARCHAR(6) NULL

    ,Group_ID INT

    ,GroupName VARCHAR(10) NULL

    ,AddressLine1 VARCHAR(50) NULL

    ,AddressLine2 VARCHAR(50) NULL

    ,AddressLine3 VARCHAR(50) NULL

    ,Postcode VARCHAR(8) NULL

    ,Datacolumn VARCHAR(300)

    );

    INSERT INTO GroupTable (GroupNumber,Group_ID,GroupName,AddressLine1,AddressLine2,AddressLine3,Postcode,datacolumn)

    VALUES

    ('Group1',1,'testgroup','Capital letter Ltd','1 Test Street','Test Town','XX1 22A','00-00-00;112115;TestBank'),

    ('Group2',2,'testgroup2','Low letter Ltd','2 Test2 Street','Test2 Town','XX2 11B','11-00-55;522445;TestBank2'),

    ('Group1',1,'testgroup','Capital letter Ltd','1 Test Street','Test Town','XX1 22A','10-10-10;112115;TestBank'),

    ('Group2',2,'testgroup2','Low letter Ltd','2 Test2 Street','Test2 Town','XX2 11B','11-00-00;522445;TestBank2')

    I originally started using the following function and it was working ok until I got to pull larger amounts of data (and adding in more columns to split): (this was in an old post on these forums)

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER FUNCTION [dbo].[FromProgressArray] (@array NVARCHAR(4000), @index INT)

    RETURNS NVARCHAR(256)

    AS

    BEGIN

    IF @index = 0

    RETURN( LEFT(@array, CHARINDEX(';', @array) - 1) )

    DECLARE @counter INT

    SELECT @counter = 0

    WHILE @counter +1 < @index

    BEGIN

    IF (CHARINDEX(';', @array) <> 0)

    SELECT @array = SUBSTRING(@array, CHARINDEX(';', @array) + 1, LEN(@array))

    ELSE

    SELECT @array = ''

    SELECT @counter = @counter + 1

    END

    IF CHARINDEX(';', @array) <> 0

    SELECT @array = LEFT(@array, CHARINDEX(';', @array) - 1)

    RETURN( @array )

    END

    Example query:

    SELECT

    GroupNumber

    ,Group_ID

    ,GroupName

    ,AddressLine1

    ,AddressLine2

    ,AddressLine3

    ,Postcode

    ,dbo.FromProgressArray(datacolumn,1) AS sortcode

    ,dbo.FromProgressArray(datacolumn,2) AS AccountNumber

    ,dbo.FromProgressArray(datacolumn,3) AS BankName

    FROM

    dbo.GroupTable

    I have also tried to use the tally table method mentioned by Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/62867/

    Although this performs 50% quicker I have a problem here transposing the rows into columns.

    I was using the function DelimitedSplit8K

    Example query:

    SELECT

    GroupNumber

    ,GroupName

    ,AddressLine1

    ,AddressLine2

    ,AddressLine3

    ,Postcode

    ,MAX(CASE WHEN dsk.ItemNumber = 1 THEN dsk.Item END) AS SortCode

    ,MAX(CASE WHEN dsk.ItemNumber = 2 THEN dsk.Item END) AS AccountNumber

    ,MAX(CASE WHEN dsk.ItemNumber = 3 THEN dsk.Item END) AS BankName

    FROM

    GroupTable

    CROSS APPLY dbo.DelimitedSplit8K(datacolumn, ';') dsk

    GROUP BY

    GroupNumber

    ,GroupName

    ,AddressLine1

    ,AddressLine2

    ,AddressLine3

    ,Postcode

    The problem here is that the group record can have multiple entries in the table and the only way to distinguish the differences is once the datacolumn has been split out. This causes problems when using the above method as the grouping messes it up.

    As you can see only two results are shown not the 4 that should be. If you add in dsk.itemnumber to the grouping the case ceases to work. I will also point out that the datacolumn can have no data between the ; .In the event it finds one it should add in a blank or NULL entry.

    I have also looked into using the derived column feature but am not sure about the best way of doing this and haven’t found that much when using search engines (I maybe using the wrong keywords though).

    Could someone tell me which way should be more efficient? Using a function or derived column in SSIS and assist with getting me on the right road so to say?

    Many Thanks

  • OK so I have actually got this to a reasonable preformance time.

    Using the delimetedsplit8k works great. this issue was with my query:

    If you use a derrived table to like below you can produce it for the number of columns you need and the correct number of rows:

    SELECT

    GroupNumber

    ,GroupName

    ,AddressLine1

    ,AddressLine2

    ,AddressLine3

    ,Postcode

    ,dsk.item AS sortcode

    ,dsk2.item AS AccountNumber

    ,dsk3.item AS BankName

    FROM

    GroupTable

    CROSS APPLY (SELECT item FROM dbo.DelimitedSplit8K(datacolumn, ';' ) WHERE dbo.DelimitedSplit8K.ItemNumber = 1) dsk

    CROSS APPLY (SELECT item FROM dbo.DelimitedSplit8K(datacolumn, ';' ) WHERE dbo.DelimitedSplit8K.ItemNumber = 2) dsk2

    CROSS APPLY (SELECT item FROM dbo.DelimitedSplit8K(datacolumn, ';' ) WHERE dbo.DelimitedSplit8K.ItemNumber = 3) dsk3

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply