SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Parsing a summary / detail flat file


Parsing a summary / detail flat file

Author
Message
WayneS
WayneS
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21868 Visits: 10653
Chad Crawford (5/13/2013)
Yo Jeff, sounds like a candidate for the "Quirky Update"?

I don't think that the QU is necessary here, though it would work.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings

WayneS
WayneS
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21868 Visits: 10653
Brandie Tarvin (5/13/2013)

So, any ideas?


Brandie, Try this out:

-- Let's add a column to the staging table to make deletes easier. You could even index this if you want to.
-- However, this isn't necessary.
IF NOT EXISTS (SELECT * FROM sys.columns WHERE object_id = OBJECT_ID('dbo.Staging','U') AND name = 'DateField')
ALTER TABLE dbo.Staging ADD DateField DATE;

-- this is necessary only to prevent any other updates to the Summary table until all of the inserts have taken place.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- get the vendor records assigned first, and assign sequential numbers for the UNID
WITH cte AS
(
SELECT ID, UNID, Extract_Record, DateField, ROW_NUMBER() OVER (ORDER BY ID) AS RN
FROM dbo.Staging
WHERE Extract_Record LIKE '2%' -- get the vendor data
)
UPDATE cte
SET UNID = RN,
DateField=LTRIM(RTRIM(Substring(Extract_Record, 97,6)))+'01';


-- now assign the details to the same UNID for the vendor record < this current record
-- if you don't want to add the DateField column to the Staging table, you can remove
-- the 2nd CROSS APPLY and the update to that column.
UPDATE t1
SET UNID = t2.UNID,
DateField = t3.DateField
FROM dbo.Staging t1
CROSS APPLY (SELECT MAX(UNID) FROM dbo.Staging WHERE ID < t1.ID) t2(UNID)
CROSS APPLY (SELECT DateField FROM dbo.Staging WHERE UNID = t2.UNID) t3
WHERE t1.Extract_Record LIKE '3%';


-- delete the current month:
-- Again, if you don't want to use the DateField column, just use your existing delete statement
-- DELETE FROM dbo.Staging where DateField = CONVERT(DATETIME, CONVERT(char(6),GETDATE(),112) + '01')


BEGIN TRANSACTION
DECLARE @MAX INT;
SELECT @MAX=ISNULL(MAX(TransactionID),0) FROM dbo.Summary;
-- the repeatable read prevents any other transaction from inserting data that would change the max setting

-- I need to control the value of the identity column, so that I can associate the identity value with the UNID value for the summary record.
-- The association is the @MAX from above with the UNID already calculated for this record.
SET IDENTITY_INSERT dbo.Summary ON;
INSERT INTO dbo.Summary( TransactionID, Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9)
SELECT @MAX+UNID,
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9
FROM dbo.Staging ri
LEFT OUTER JOIN Summary rit
ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))
WHERE rit.TransactionID IS NULL
-- TransactionID is a PK -->> NOT NULL. A null here means that no matching record was found, so it's okay to insert this record.
-- No need to compare every column.
AND ri.Extract_Record LIKE '2%'; -- just the "2" records
-- don't forget to turn IDENTITY INSERT off!
SET IDENTITY_INSERT dbo.Summary OFF;

INSERT INTO Detail (TransactionID,
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,
Col9,Col10,Col11,Col12,Col13,Col14,Col15,Col16)
SELECT @MAX+UNID, -- since we already have the UNID matching in the table, and we inserted UNID+@Max into the summary, use this same calc for the detail.
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,
LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,
LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,
LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,
LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,
LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,
LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,
LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16
FROM dbo.Staging ri2
LEFT OUTER JOIN dbo.Detail rit2
ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))
WHERE rit2.TransactionDetailID IS NULL
-- TransactionDetailID is PK -->> NOT NULL. Null here means no matching record was found, so it's okay to insert.
-- No need to compare every column
AND ri2.Extract_Record LIKE '3%' -- just the "3" rows
COMMIT TRANSACTION

SELECT * FROM dbo.Staging ORDER BY ID;

SELECT * FROM dbo.Summary;
SELECT * FROM dbo.Detail;



Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings

Brandie Tarvin
Brandie Tarvin
SSC-Dedicated
SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)

Group: General Forum Members
Points: 38385 Visits: 9279
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line


Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

I need a solution that addresses my current setup, including the inconsistent data.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Brandie Tarvin
Brandie Tarvin
SSC-Dedicated
SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)SSC-Dedicated (38K reputation)

Group: General Forum Members
Points: 38385 Visits: 9279
Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Sioban Krzywicki
Sioban Krzywicki
SSCarpal Tunnel
SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)

Group: General Forum Members
Points: 4526 Visits: 8096
Brandie Tarvin (5/13/2013)
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line


Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

I need a solution that addresses my current setup, including the inconsistent data.


Handling the inconsistent data is my favorite part of doing it in SSIS, but the other approaches put forward here look good too.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
wolfkillj
wolfkillj
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2730 Visits: 2582
Brandie Tarvin (5/13/2013)
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line


Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

I need a solution that addresses my current setup, including the inconsistent data.


I went back and re-read your OP. If I understand you correctly, rows of different types (vendor and detail) have different data and so can't be parsed into the same set of columns. But you CAN reliably parse out the record ID value, leaving the rest of the row in a single column, and you CAN parse each type of row into its particular column set, right? If so, you can use the code I posted as a basis for identifying the rows that belong together (e.g., a record ID = 2 row with its associated record ID = 3 row(s)) and assigning them a single identifying number. Then, you can split the rows out into the different tables based on their record ID values (vendors and detail), using that identifying number as a foreign key. Does that get you where you need to go, or am I overlooking something.

I didn't study WayneS's solution in detail, but it looks like he took a very similar approach to keeping together the groups of rows that belong together.

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Evil Kraig F
Evil Kraig F
SSC-Insane
SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)

Group: General Forum Members
Points: 20937 Visits: 7660
Brandie Tarvin (5/13/2013)
Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.


Brandie, in SSIS are you familiar with using a Synchronous Tranformation Script Component? They are a godsend for issues like this where you're dependant on order for transformation. Also, plus side, they aren't stream interrupts.

Here's the short form: You fire up a script component midstream and add a column with GUIDs (easiest). Set the an internal variable to the first guid and then reset it on (1) lines. Carry through the old GUID on any other line.

Dump to database staging table.

Let me know if you need me to walk you through the code/setup. I'm hoping you're okay as unless you're familiar with them Script Transforms require a lot of screenshots. ;-)

EDIT: Side note: If you want to get fancy, said script transform can also split the string into components based on values and send it to multiple streams. Might be overkill on the first pass, but that's how I usually set them up.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
mister.magoo
mister.magoo
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10725 Visits: 7891
Craig's idea sounds great!

Before I saw it, I came up with this , which is similar but different to Wayne's method for tying the rows together...
Just including it as an option, but probably Craig's SSIS magic is much better as this does some horrible sorting operations Sick


;with magic as
(
select
st.ID
, st.Extract_Record
, st.UNID
, case left(st.Extract_Record,1)
when '2' then rank() over(partition by left(st.Extract_Record,1) order by st.id)
else row_number() over(order by st.id) - rank() over(partition by left(st.Extract_Record,1) order by st.id)
end AS new_ID
from staging AS st
where st.Extract_Record like '[23]%'
)
update magic
set unid = new_ID




MM


select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Jeff Moden
    Jeff Moden
    SSC Guru
    SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)SSC Guru (217K reputation)

    Group: General Forum Members
    Points: 217579 Visits: 41995
    Brandie Tarvin (5/13/2013)
    Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:


    Heh... ok. I've watched this long enough and now they're starting to use really nasty 4 letter words... like SSIS.:-D

    How many lines of data does your typical file have? And, any limits on the type of SQL Server objects that you can create?

    --Jeff Moden

    RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

    Helpful Links:
    How to post code problems
    How to post performance problems
    Forum FAQs
    Evil Kraig F
    Evil Kraig F
    SSC-Insane
    SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)

    Group: General Forum Members
    Points: 20937 Visits: 7660
    Jeff Moden (5/13/2013)
    Brandie Tarvin (5/13/2013)
    Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:


    Heh... ok. I've watched this long enough and now they're starting to use really nasty 4 letter words... like SSIS.:-D

    How many lines of data does your typical file have? And, any limits on the type of SQL Server objects that you can create?


    LOL, there is nothing wrong with SSIS, particularly to solve an iterative programming artifact that breaks normalization rules because it was originally built to be processed by iterative programs. :-P

    Not my magic, though, Magoo. I learned off others just like everyone else. ;-) I did however trip over a link that I think embodies the process in the most simplified method possible; it's just the facts, ma'am:
    http://agilebi.com/jwelch/2007/07/12/processing-a-flat-file-with-header-and-detail-rows/

    (As to the SSIS, what did you expect me to solve this with... SQLCLR? Hehe )


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. |Tally Tables

    Twitter: @AnyWayDBA
    Go


    Permissions

    You can't post new topics.
    You can't post topic replies.
    You can't post new polls.
    You can't post replies to polls.
    You can't edit your own topics.
    You can't delete your own topics.
    You can't edit other topics.
    You can't delete other topics.
    You can't edit your own posts.
    You can't edit other posts.
    You can't delete your own posts.
    You can't delete other posts.
    You can't post events.
    You can't edit your own events.
    You can't edit other events.
    You can't delete your own events.
    You can't delete other events.
    You can't send private messages.
    You can't send emails.
    You can read topics.
    You can't vote in polls.
    You can't upload attachments.
    You can download attachments.
    You can't post HTML code.
    You can't edit HTML code.
    You can't post IFCode.
    You can't post JavaScript.
    You can post emoticons.
    You can't post or upload images.

    Select a forum

































































































































































    SQLServerCentral


    Search