Parsing a summary / detail flat file

  • My team has a flat file from another system that we need to parse and import into 2 tables. We're currently importing this into a staging table then parsing the staging table RBAR style in WHILE loop. As you can imagine, this is causing us an extraordinary amount of pain. It's taking several days to process one file and the bigger they get...

    Each line on the file has a record id. There's 1 (the header), 2 (the vendor info), and 3 (the details). A vendor can have 1-N number of 3 records listed after it but (and here's the kicker) none of the detail records have any identifying information that connect it to the vendor info in record 2. The only way we know they are connect is by the order.

    Example Data:

    1MyFile05122013

    2VendorID123 StartDate Terminated AgreementNum AnotherCol AnotherCol2

    3ContractNum SaleDate Product Region Col1 Col2 Col3

    2VendorID456 StartDate Terminated AgreementNum AnotherCol AnotherCol2

    3ContractNum SaleDate Product Region Col1 Col2 Col3

    3ContractNum SaleDate Product Region Col1 Col2 Col3

    3ContractNum SaleDate Product Region Col1 Col2 Col3

    3ContractNum SaleDate Product Region Col1 Col2 Col3

    2VendorID789 StartDate Terminated AgreementNum AnotherCol AnotherCol2

    3ContractNum SaleDate Product Region Col1 Col2 Col3

    3ContractNum SaleDate Product Region Col1 Col2 Col3

    So all the 3 records contain all the details of the vendor that are above it. But again, there's no identifying information between the 3 records and the 2 records. To add to this burden, the report is a rolling 3 month report of all records, so we can't trucate the tables and start over from scratch because it's not inception to date. Also, we have to delete out the current month (though I'm working to get that changed).

    Our Staging table looks like this:

    CREATE TABLE [dbo].[Staging](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [Extract_Record] [varchar](2000) NULL,

    [UNID] [int] NULL

    ) ON [PRIMARY]

    GO

    Because we can't immediately tell what record is what and we don't want to mess up the order, we insert all the lines into the Extract_Record column and parse everything out later with substrings. We use UNID to create a unifying ID for all these records and insert them into their tables.

    Forgive me for scrubbing the heck out of all my column names, but I'm erroring on the side of protecting our business. Here's what our code looks like:

    BEGIN

    DECLARE @i INT,

    @max-2 INT,

    @rec SMALLINT,

    @newid INT,

    @TransactionID INT,

    @unid INT;

    --Below code sets unid to identify complete record sets

    SELECT @i=1,@max=MAX(id),@newid=0

    FROM dbo.Staging;

    WHILE @i<=@max

    BEGIN

    SELECT @rec=CONVERT(SMALLINT,LEFT(extract_RI,1))

    FROM dbo.Staging

    WHERE id=@i;

    IF @rec>1 AND @rec<4

    BEGIN

    IF @rec=2

    BEGIN

    SET @newid=@newid+1;

    UPDATE dbo.Staging

    SET unid=@newid

    WHERE id=@i;

    END

    ELSE

    BEGIN

    UPDATE dbo.Staging

    SET unid=@newid

    WHERE id=@i;

    END

    END

    SET @i=@i+1;

    END

    SELECT @i=1,@unid=NULL, @rec=NULL;

    WHILE @i<=@max

    BEGIN

    SELECT @unid=unid

    FROM dbo.Staging

    WHERE unid IS NOT NULL and id=@i;

    IF @unid IS NOT NULL

    BEGIN

    SELECT @rec=CONVERT(SMALLINT,LEFT(extract_RI,1))

    FROM dbo.Staging

    WHERE id=@i;

    IF @rec=2

    BEGIN

    --SELECT @ri=Substring(extract_RI, 2,10)

    --FROM dbo.Staging

    --WHERE id=@i;

    --Below code deletes current month data. Reporting team only requires previous months data

    WITH CurrentMonth AS

    (select * from dbo.Staging

    WHERE LTRIM(RTRIM(Substring(Extract_Record, 97,6))) = Substring(CONVERT(varchar,GETDATE(),112),1,6))

    DELETE FROM dbo.Staging FROM dbo.Staging ser

    INNER JOIN CurrentMonth cm

    ON ser.UNID = cm.UNID

    SELECT @TransactionID=@@IDENTITY;

    --Below code loads differential data from staging table to final reporting table

    INSERT INTO Summary (Col1,

    Col2,

    Col3,

    Col4,

    Col5,

    Col6,

    Col7,

    Col8,

    Col9

    )

    SELECT

    LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,

    LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,

    LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,

    LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,

    LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,

    LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,

    LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,

    LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,

    LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9

    FROM dbo.Staging ri

    LEFT OUTER JOIN Summary rit

    ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))

    WHERE id=@i

    AND rit.Col1 IS NULL

    AND rit.Col2 IS NULL

    AND rit.Col3 IS NULL

    AND rit.Col4 IS NULL

    AND rit.Col5 IS NULL

    AND rit.Col6 IS NULL

    AND rit.Col7 IS NULL

    AND rit.Col8 IS NULL

    AND rit.Col9 IS NULL;

    SELECT @TransactionID=@@IDENTITY;

    END

    ELSE IF @rec=3

    BEGIN

    --Below code loads differential data from staging table to final reporting table

    INSERT INTO Detail (TransactionID,

    Col1,

    Col2,

    Col3,

    Col4,

    Col5,

    Col6,

    Col7,

    Col8,

    Col9,

    Col10,

    Col11,

    Col12,

    Col13,

    Col14,

    Col15,

    Col16)

    SELECT @TransactionID,

    LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,

    LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,

    LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,

    LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,

    LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,

    LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,

    LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,

    LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,

    LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,

    LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,

    LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,

    LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,

    LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,

    LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,

    LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,

    LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16

    FROM dbo.Staging ri2

    LEFT OUTER JOIN sap.tblRITransactionDetail rit2

    ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))

    WHERE id=@i

    AND rit2.Col1 IS NULL

    AND rit2.Col2 IS NULL

    AND rit2.Col3 IS NULL

    AND rit2.Col4 IS NULL

    AND rit2.Col5 IS NULL

    AND rit2.Col6 IS NULL

    AND rit2.Col7 IS NULL

    AND rit2.Col8 IS NULL

    AND rit2.Col9 IS NULL

    AND rit2.Col10 IS NULL

    AND rit2.Col11 IS NULL

    AND rit2.Col12 IS NULL

    AND rit2.Col13 IS NULL

    AND rit2.Col14 IS NULL

    AND rit2.Col15 IS NULL

    AND rit2.Col16 IS NULL;

    END

    END

    SET @i=@i+1;

    END

    END

    So, messy, slow and very very annoying. I've got a few thoughts on how we can start to fix it, but before I share I wanted to see what everyone else was thinking. I'm hoping someone has an ephiphany that can help.

    So, any ideas?

    Just an FYI: there's no fixing the input file in our immediate future. Just getting this much information was like pulling teeth and the other team takes months upon months (if even that soon) to work changes through their SDLC. So I have to do what I can to mitigate the load issue now with the file that I have.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Can you use SSIS?

    I've done something similar recently with SSIS, so if you can use it I think you have a good option.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Also, could you mock-up some sample data and provide a destination for the data? Could help figure out how to approach the problem.

  • Stefan Krzywicki (5/13/2013)


    Can you use SSIS?

    I've done something similar recently with SSIS, so if you can use it I think you have a good option.

    Yes, we can use SSIS. In fact we're using SSIS to load the file into our staging table and run the proc that does the update.

    I've got a few meetings, but after those I'll dummy up some data and sanitze the destination tables and post them in a bit.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (5/13/2013)


    Stefan Krzywicki (5/13/2013)


    Can you use SSIS?

    I've done something similar recently with SSIS, so if you can use it I think you have a good option.

    Yes, we can use SSIS. In fact we're using SSIS to load the file into our staging table and run the proc that does the update.

    I've got a few meetings, but after those I'll dummy up some data and sanitze the destination tables and post them in a bit.

    I have two files I need to process every week that have a similar format. Descriptive lines that I don't need to import and multiple header rows that are interspersed with corresponding data rows. I use a conditional split to check the line type and divert it to the proper parsing and insert, which I think you do with the IF @rec=2 line.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:

    /****** Object: Table [dbo].[Detail] Script Date: 05/13/2013 10:56:50 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Detail]') AND type in (N'U'))

    DROP TABLE [dbo].[Detail]

    GO

    /****** Object: Table [dbo].[Summary] Script Date: 05/13/2013 10:56:50 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Summary]') AND type in (N'U'))

    DROP TABLE [dbo].[Summary]

    GO

    /****** Object: Table [dbo].[Detail] Script Date: 05/13/2013 10:56:51 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[Detail](

    [TransactionDetailID] [int] IDENTITY(1,1) NOT NULL,

    [TransactionID] [int] NULL,

    [Col1] [varchar](4) NOT NULL,

    [Col2] [varchar](4) NOT NULL,

    [Col3] [varchar](20) NULL,

    [Col4] [varchar](8) NULL,

    [Col5] [varchar](24) NULL,

    [Col6] [varchar](8) NULL,

    [Col7] [varchar](8) NULL,

    [Col8] [varchar](24) NOT NULL,

    [Col9] [varchar](10) NOT NULL,

    [Col10] [varchar](4) NOT NULL,

    [Col11] [varchar](10) NOT NULL,

    [Col12] [varchar](4) NOT NULL,

    [Col13] [varchar](24) NOT NULL,

    [Col14] [varchar](3) NOT NULL,

    [Col15] [varchar](17) NOT NULL,

    [Col16] [varchar](5) NOT NULL,

    CONSTRAINT [PK_Detail_TransactionDetailID] PRIMARY KEY NONCLUSTERED

    (

    [TransactionDetailID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    /****** Object: Table [dbo].[Summary] Script Date: 05/13/2013 10:56:51 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[Summary](

    [TransactionID] [int] IDENTITY(1,1) NOT NULL,

    [Col1] [varchar](4) NOT NULL,

    [Col2] [varchar](10) NOT NULL,

    [Col3] [varchar](3) NOT NULL,

    [Col4] [varchar](20) NOT NULL,

    [Col5] [varchar](8) NOT NULL,

    [Col6] [varchar](10) NOT NULL,

    [Col7] [varchar](10) NOT NULL,

    [Col8] [varchar](30) NULL,

    [Col9] [varchar](8) NULL,

    CONSTRAINT [PK_Summary_TransactionID] PRIMARY KEY NONCLUSTERED

    (

    [TransactionID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie - just to verify... you are using SQL 2008? (I know you posted it in this forum, but we both know that folks sometimes make mistakes...)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (5/13/2013)


    Brandie - just to verify... you are using SQL 2008? (I know you posted it in this forum, but we both know that folks sometimes make mistakes...)

    Yes, I am using 2008. Plain vanilla Enterprise Edition 2008.

    NOT R2, BTW.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Yo Jeff, sounds like a candidate for the "Quirky Update"?

    Robyn Page's Article[/url]

    Jeff Moden's Article[/url]

    I added a second UNID column and ran this:

    DECLARE @RollingCounterLevel1 int

    , @RollingCounterLevel2 int

    UPDATE Staging

    SET @RollingCounterLevel1 = UNID = CASE LEFT(Extract_Record, 1) WHEN '1' THEN ID ELSE @RollingCounterLevel1 END

    , @RollingCounterLevel2 = UNID2 = CASE LEFT(Extract_Record, 1) WHEN '1' THEN NULL WHEN 2 THEN ID ELSE @RollingCounterLevel2 END

    It gave me the correct 1st and 2nd level IDs, you might need to tweak it depending if you want the 2nd level value to be itself or the 1st level value.

    As I remember there are several prerequisites to make sure the Quirky Update works (hints, parallelism, indexes, etc.) and MS doesn't "guarantee" it will always work, but Jeff couldn't find a situation where it didn't work as he expected.

    Chad

  • Brandie,

    I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line, so I made up my own. This solution assumes that you can get the rows into a table in the correct order, parse the rows into columns, and once you can assign an identifying number to the rows that belong together, you can do what you need to do with them. This takes three full scans of the table to work, but with some divide-and-conquer (temp tables instead of CTEs, etc.) supported by proper indexing, you may be able to get it to perform acceptably.

    In the output of this code, rows with the same listNbr belong together.

    -- Please remove the extraneous letter "x" from the CREATE keyword. My company filters internet traffic with certain T-SQL keywords in it.

    CxREATE TABLE dbo.fileParseTest (rowNbr INT IDENTITY(1,1), rowType int, data varchar(20))

    INSERT INTO dbo.fileParseTest(rowType, data)

    VALUES (1, 'HEADER')

    ,(2, 'VENDOR123')

    ,(3, 'CONTRACTX')

    ,(3, 'CONTRACTY')

    ,(3, 'CONTRACT1')

    ,(3, 'CONTRACT2')

    ,(3, 'CONTRACT3')

    ,(3, 'CONTRACT4')

    ,(3, 'CONTRACT5')

    ,(3, 'CONTRACT6')

    ,(3, 'CONTRACT7')

    ,(3, 'CONTRACT8')

    ,(3, 'CONTRACT9')

    ,(2, 'VENDOR456')

    ,(3, 'CONTRACTZ')

    ,(2, 'VENDOR789')

    ,(3, 'CONTRACTA')

    ,(3, 'CONTRACTB')

    ,(2, 'VENDORXYZ')

    ,(3, 'CONTRACT1')

    ;

    WITH cteA AS (

    SELECT ROW_NUMBER() OVER (ORDER BY rowNbr) as listNbr

    ,rowNbr

    FROM dbo.fileParseTest

    WHERE rowType = 2

    )

    ,cteB AS (

    SELECT c1.listNbr, c1.rowNbr AS rowNbrStart, ISNULL(c2.rowNbr - 1, 100000) AS rowNbrEnd

    FROM cteA c1

    LEFT OUTER JOIN cteA c2

    ON c1.listNbr + 1 = c2.listNbr

    )

    SELECT b.listNbr, fpt.rowNbr, fpt.rowType, fpt.data

    FROM cteB b

    LEFT OUTER JOIN dbo.fileParseTest fpt

    ON fpt.rowNbr BETWEEN b.rowNbrStart and b.rowNbrEnd

    listNbrrowTypedata

    12VENDOR123

    13CONTRACTX

    13CONTRACTY

    13CONTRACT1

    13CONTRACT2

    13CONTRACT3

    13CONTRACT4

    13CONTRACT5

    13CONTRACT6

    13CONTRACT7

    13CONTRACT8

    13CONTRACT9

    22VENDOR456

    23CONTRACTZ

    32VENDOR789

    33CONTRACTA

    33CONTRACTB

    42VENDORXYZ

    43CONTRACT1

    Jason Wolfkill

  • Chad Crawford (5/13/2013)


    Yo Jeff, sounds like a candidate for the "Quirky Update"?

    I don't think that the QU is necessary here, though it would work.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Brandie Tarvin (5/13/2013)


    So, any ideas?

    Brandie, Try this out:

    -- Let's add a column to the staging table to make deletes easier. You could even index this if you want to.

    -- However, this isn't necessary.

    IF NOT EXISTS (SELECT * FROM sys.columns WHERE object_id = OBJECT_ID('dbo.Staging','U') AND name = 'DateField')

    ALTER TABLE dbo.Staging ADD DateField DATE;

    -- this is necessary only to prevent any other updates to the Summary table until all of the inserts have taken place.

    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

    -- get the vendor records assigned first, and assign sequential numbers for the UNID

    WITH cte AS

    (

    SELECT ID, UNID, Extract_Record, DateField, ROW_NUMBER() OVER (ORDER BY ID) AS RN

    FROM dbo.Staging

    WHERE Extract_Record LIKE '2%' -- get the vendor data

    )

    UPDATE cte

    SET UNID = RN,

    DateField=LTRIM(RTRIM(Substring(Extract_Record, 97,6)))+'01';

    -- now assign the details to the same UNID for the vendor record < this current record

    -- if you don't want to add the DateField column to the Staging table, you can remove

    -- the 2nd CROSS APPLY and the update to that column.

    UPDATE t1

    SET UNID = t2.UNID,

    DateField = t3.DateField

    FROM dbo.Staging t1

    CROSS APPLY (SELECT MAX(UNID) FROM dbo.Staging WHERE ID < t1.ID) t2(UNID)

    CROSS APPLY (SELECT DateField FROM dbo.Staging WHERE UNID = t2.UNID) t3

    WHERE t1.Extract_Record LIKE '3%';

    -- delete the current month:

    -- Again, if you don't want to use the DateField column, just use your existing delete statement

    -- DELETE FROM dbo.Staging where DateField = CONVERT(DATETIME, CONVERT(char(6),GETDATE(),112) + '01')

    BEGIN TRANSACTION

    DECLARE @max-2 INT;

    SELECT @max-2=ISNULL(MAX(TransactionID),0) FROM dbo.Summary;

    -- the repeatable read prevents any other transaction from inserting data that would change the max setting

    -- I need to control the value of the identity column, so that I can associate the identity value with the UNID value for the summary record.

    -- The association is the @max-2 from above with the UNID already calculated for this record.

    SET IDENTITY_INSERT dbo.Summary ON;

    INSERT INTO dbo.Summary( TransactionID, Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9)

    SELECT @max-2+UNID,

    LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,

    LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,

    LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,

    LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,

    LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,

    LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,

    LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,

    LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,

    LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9

    FROM dbo.Staging ri

    LEFT OUTER JOIN Summary rit

    ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))

    AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))

    WHERE rit.TransactionID IS NULL

    -- TransactionID is a PK -->> NOT NULL. A null here means that no matching record was found, so it's okay to insert this record.

    -- No need to compare every column.

    AND ri.Extract_Record LIKE '2%'; -- just the "2" records

    -- don't forget to turn IDENTITY INSERT off!

    SET IDENTITY_INSERT dbo.Summary OFF;

    INSERT INTO Detail (TransactionID,

    Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,

    Col9,Col10,Col11,Col12,Col13,Col14,Col15,Col16)

    SELECT @max-2+UNID, -- since we already have the UNID matching in the table, and we inserted UNID+@Max into the summary, use this same calc for the detail.

    LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,

    LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,

    LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,

    LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,

    LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,

    LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,

    LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,

    LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,

    LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,

    LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,

    LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,

    LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,

    LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,

    LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,

    LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,

    LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16

    FROM dbo.Staging ri2

    LEFT OUTER JOIN dbo.Detail rit2

    ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))

    AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))

    WHERE rit2.TransactionDetailID IS NULL

    -- TransactionDetailID is PK -->> NOT NULL. Null here means no matching record was found, so it's okay to insert.

    -- No need to compare every column

    AND ri2.Extract_Record LIKE '3%' -- just the "3" rows

    COMMIT TRANSACTION

    SELECT * FROM dbo.Staging ORDER BY ID;

    SELECT * FROM dbo.Summary;

    SELECT * FROM dbo.Detail;

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • wolfkillj (5/13/2013)


    I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line

    Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

    I need a solution that addresses my current setup, including the inconsistent data.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (5/13/2013)


    wolfkillj (5/13/2013)


    I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line

    Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

    I need a solution that addresses my current setup, including the inconsistent data.

    Handling the inconsistent data is my favorite part of doing it in SSIS, but the other approaches put forward here look good too.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply