Parsing a summary / detail flat file
Posted Monday, May 13, 2013 11:38 AM


Chad Crawford (5/13/2013)
Yo Jeff, sounds like a candidate for the "Quirky Update"?

I don't think that the QU is necessary here, though it would work.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1452250
Posted Monday, May 13, 2013 11:52 AM


Brandie Tarvin (5/13/2013)

So, any ideas?


Brandie, try this out:
-- Let's add a column to the staging table to make deletes easier. You could even index this if you want to.
-- However, this isn't necessary.
IF NOT EXISTS (SELECT * FROM sys.columns WHERE object_id = OBJECT_ID('dbo.Staging','U') AND name = 'DateField')
ALTER TABLE dbo.Staging ADD DateField DATE;
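-- (If you do decide to index DateField, something like the commented-out line below would
--  cover the month-based delete. The index name is my own illustration, not part of the
--  original post, and the index really isn't required.)
-- CREATE INDEX IX_Staging_DateField ON dbo.Staging (DateField);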

-- this is necessary only to prevent any other updates to the Summary table until all of the inserts have taken place.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- get the vendor records assigned first, and assign sequential numbers for the UNID
WITH cte AS
(
SELECT ID, UNID, Extract_Record, DateField, ROW_NUMBER() OVER (ORDER BY ID) AS RN
FROM dbo.Staging
WHERE Extract_Record LIKE '2%' -- get the vendor data
)
UPDATE cte
SET UNID = RN,
    DateField = LTRIM(RTRIM(SUBSTRING(Extract_Record, 97, 6))) + '01';


-- now assign the details to the same UNID for the vendor record < this current record
-- if you don't want to add the DateField column to the Staging table, you can remove
-- the 2nd CROSS APPLY and the update to that column.
UPDATE t1
SET UNID = t2.UNID,
DateField = t3.DateField
FROM dbo.Staging t1
CROSS APPLY (SELECT MAX(UNID) FROM dbo.Staging WHERE ID < t1.ID) t2(UNID)
CROSS APPLY (SELECT DateField FROM dbo.Staging WHERE UNID = t2.UNID) t3
WHERE t1.Extract_Record LIKE '3%';


-- delete the current month:
-- Again, if you don't want to use the DateField column, just use your existing delete statement
-- DELETE FROM dbo.Staging where DateField = CONVERT(DATETIME, CONVERT(char(6),GETDATE(),112) + '01')


BEGIN TRANSACTION
DECLARE @MAX INT;
SELECT @MAX=ISNULL(MAX(TransactionID),0) FROM dbo.Summary;
-- the repeatable read prevents any other transaction from inserting data that would change the max setting

-- I need to control the value of the identity column, so that I can associate the identity value with the UNID value for the summary record.
-- The association is the @MAX from above with the UNID already calculated for this record.
SET IDENTITY_INSERT dbo.Summary ON;
INSERT INTO dbo.Summary( TransactionID, Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9)
SELECT @MAX+UNID,
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,10))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 16,3))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 19,20))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 39,8))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 47,10))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 57,10))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 67,30))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 97,8))) AS Col9
FROM dbo.Staging ri
LEFT OUTER JOIN dbo.Summary rit
ON LTRIM(RTRIM(Substring (ri.Extract_Record, 2,4))) = LTRIM(RTRIM(rit.Col1))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 6,10))) = LTRIM(RTRIM(rit.Col2))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 16,3))) = LTRIM(RTRIM(rit.Col3))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 19,20)))= LTRIM(RTRIM(rit.Col4))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 39,8))) = LTRIM(RTRIM(rit.Col5))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 47,10)))= LTRIM(RTRIM(rit.Col6))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 57,10)))= LTRIM(RTRIM(rit.Col7))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 67,30)))= LTRIM(RTRIM(rit.Col8))
AND LTRIM(RTRIM(Substring(ri.Extract_Record, 97,8)))= LTRIM(RTRIM(rit.Col9))
WHERE rit.TransactionID IS NULL
-- TransactionID is a PK -->> NOT NULL. A null here means that no matching record was found, so it's okay to insert this record.
-- No need to compare every column.
AND ri.Extract_Record LIKE '2%'; -- just the "2" records
-- don't forget to turn IDENTITY INSERT off!
SET IDENTITY_INSERT dbo.Summary OFF;

INSERT INTO Detail (TransactionID,
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,
Col9,Col10,Col11,Col12,Col13,Col14,Col15,Col16)
SELECT @MAX+UNID, -- since we already have the UNID matching in the table, and we inserted UNID+@Max into the summary, use this same calc for the detail.
LTRIM(RTRIM(Substring(Extract_Record, 2,4))) AS Col1,
LTRIM(RTRIM(Substring(Extract_Record, 6,4))) AS Col2,
LTRIM(RTRIM(Substring(Extract_Record, 10,20))) AS Col3,
LTRIM(RTRIM(Substring(Extract_Record, 30,8))) AS Col4,
LTRIM(RTRIM(Substring(Extract_Record, 38,24))) AS Col5,
LTRIM(RTRIM(Substring(Extract_Record, 62,8))) AS Col6,
LTRIM(RTRIM(Substring(Extract_Record, 70,8))) AS Col7,
LTRIM(RTRIM(Substring(Extract_Record, 78,24))) AS Col8,
LTRIM(RTRIM(Substring(Extract_Record, 102,10))) AS Col9,
LTRIM(RTRIM(Substring(Extract_Record, 112,4))) AS Col10,
LTRIM(RTRIM(Substring(Extract_Record, 116,10))) AS Col11,
LTRIM(RTRIM(Substring(Extract_Record, 126,4))) AS Col12,
LTRIM(RTRIM(Substring(Extract_Record, 130,24))) AS Col13,
LTRIM(RTRIM(Substring(Extract_Record, 154,3))) AS Col14,
LTRIM(RTRIM(Substring(Extract_Record, 157,17))) AS Col15,
LTRIM(RTRIM(Substring(Extract_Record, 174,5))) AS Col16
FROM dbo.Staging ri2
LEFT OUTER JOIN dbo.Detail rit2
ON LTRIM(RTRIM(Substring(ri2.Extract_Record, 2,4))) = LTRIM(RTRIM(rit2.Col1))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 6,4))) = LTRIM(RTRIM(rit2.Col2))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 10,20))) = LTRIM(RTRIM(rit2.Col3))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 30,8))) = LTRIM(RTRIM(rit2.Col4))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 38,24)))= LTRIM(RTRIM(rit2.Col5))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 62,8))) = LTRIM(RTRIM(rit2.Col6))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 70,8))) = LTRIM(RTRIM(rit2.Col7))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 78,24))) = LTRIM(RTRIM(rit2.Col8))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 102,10))) = LTRIM(RTRIM(rit2.Col9))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 112,4))) = LTRIM(RTRIM(rit2.Col10))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 116,10))) = LTRIM(RTRIM(rit2.Col11))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 126,4))) = LTRIM(RTRIM(rit2.Col12))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 130,24))) = LTRIM(RTRIM(rit2.Col13))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 154,3))) = LTRIM(RTRIM(rit2.Col14))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 157,17)))= LTRIM(RTRIM(rit2.Col15))
AND LTRIM(RTRIM(Substring(ri2.Extract_Record, 174,5))) = LTRIM(RTRIM(rit2.Col16))
WHERE rit2.TransactionDetailID IS NULL
-- TransactionDetailID is PK -->> NOT NULL. Null here means no matching record was found, so it's okay to insert.
-- No need to compare every column
AND ri2.Extract_Record LIKE '3%'; -- just the "3" rows
COMMIT TRANSACTION;

SELECT * FROM dbo.Staging ORDER BY ID;

SELECT * FROM dbo.Summary;
SELECT * FROM dbo.Detail;
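To see how the two UPDATEs tie the rows together, here's a tiny self-contained illustration of the same pattern. The table variable and sample strings are a toy of my own, not Brandie's actual layout:

-- Toy data: two vendor ("2") rows, each followed by its detail ("3") rows.
DECLARE @Demo TABLE (ID INT IDENTITY(1,1), Extract_Record VARCHAR(20), UNID INT NULL);
INSERT INTO @Demo (Extract_Record)
VALUES ('2 Vendor A'), ('3 Detail A1'), ('3 Detail A2'), ('2 Vendor B'), ('3 Detail B1');

-- Step 1: number the vendor rows 1, 2, 3, ...
WITH v AS
(
    SELECT ID, UNID, ROW_NUMBER() OVER (ORDER BY ID) AS RN
    FROM @Demo
    WHERE Extract_Record LIKE '2%'
)
UPDATE v SET UNID = RN;

-- Step 2: each detail row inherits the highest vendor UNID assigned above it.
UPDATE d
SET UNID = x.UNID
FROM @Demo AS d
CROSS APPLY (SELECT MAX(UNID) AS UNID FROM @Demo WHERE ID < d.ID) AS x
WHERE d.Extract_Record LIKE '3%';

-- Rows 1-3 end up with UNID = 1, rows 4-5 with UNID = 2.
SELECT ID, Extract_Record, UNID FROM @Demo ORDER BY ID;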



Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1452252
Posted Monday, May 13, 2013 12:12 PM


wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line


Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

I need a solution that addresses my current setup, including the inconsistent data.
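For anyone trying to reproduce that single-column import, a rough sketch follows. The table definition, view, file path, and column length are placeholders of mine (not the CREATE TABLE code Brandie attached earlier); the only point is that every line of the file lands unparsed in Extract_Record:

-- Hypothetical single-column staging table; the later code in this thread also expects
-- ID and UNID columns, so they're included here.
IF OBJECT_ID('dbo.Staging', 'U') IS NULL
    CREATE TABLE dbo.Staging
    (
        ID             INT IDENTITY(1,1) PRIMARY KEY,
        UNID           INT NULL,
        Extract_Record VARCHAR(250) NOT NULL
    );
GO

-- A one-column view lets BULK INSERT load whole lines without needing a format file;
-- the identity column supplies a load sequence.
CREATE VIEW dbo.Staging_Load AS
SELECT Extract_Record FROM dbo.Staging;
GO

-- FIELDTERMINATOR = '\0' (a character that never appears in the file) keeps each line as
-- a single field. The path is a placeholder; for LF-only files use ROWTERMINATOR = '0x0a'.
BULK INSERT dbo.Staging_Load
FROM 'C:\Import\VendorExtract.txt'
WITH (FIELDTERMINATOR = '\0', ROWTERMINATOR = '\n');
GO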


Brandie Tarvin, MCITP Database Administrator

Webpage: http://www.BrandieTarvin.net
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.

Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Post #1452261
Posted Monday, May 13, 2013 12:16 PM


Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.

Brandie Tarvin, MCITP Database Administrator

Webpage: http://www.BrandieTarvin.net
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.

Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Post #1452263
Posted Monday, May 13, 2013 12:40 PM
Brandie Tarvin (5/13/2013)
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line


Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

I need a solution that addresses my current setup, including the inconsistent data.


Handling the inconsistent data is my favorite part of doing it in SSIS, but the other approaches put forward here look good too.


--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Post #1452272
Posted Monday, May 13, 2013 12:52 PM


Brandie Tarvin (5/13/2013)
wolfkillj (5/13/2013)
I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line


Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

I need a solution that addresses my current setup, including the inconsistent data.


I went back and re-read your OP. If I understand you correctly, rows of different types (vendor and detail) have different data and so can't be parsed into the same set of columns. But you CAN reliably parse out the record ID value, leaving the rest of the row in a single column, and you CAN parse each type of row into its particular column set, right? If so, you can use the code I posted as a basis for identifying the rows that belong together (e.g., a record ID = 2 row with its associated record ID = 3 row(s)) and assigning them a single identifying number. Then, you can split the rows out into the different tables based on their record ID values (vendors and detail), using that identifying number as a foreign key. Does that get you where you need to go, or am I overlooking something?

I didn't study WayneS's solution in detail, but it looks like he took a very similar approach to grouping the rows that belong together.
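For the foreign-key piece, a minimal sketch could look like the line below, borrowing the TransactionID column from WayneS's code above. The constraint name is just an illustration, and it assumes dbo.Summary(TransactionID) is the primary key, as the comments in that code state:

-- Hypothetical constraint tying each Detail row back to its Summary row.
ALTER TABLE dbo.Detail
    ADD CONSTRAINT FK_Detail_Summary
    FOREIGN KEY (TransactionID) REFERENCES dbo.Summary (TransactionID);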


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1452276
Posted Monday, May 13, 2013 4:02 PM


Brandie Tarvin (5/13/2013)
Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.


Brandie, in SSIS are you familiar with using a Synchronous Transformation Script Component? They are a godsend for issues like this, where you're dependent on row order for the transformation. Also, on the plus side, they don't interrupt the stream.

Here's the short form: you fire up a Script Component midstream and add a column populated with GUIDs (easiest). Set an internal variable to the first GUID and then reset it on the header ("1") lines. Carry the old GUID through on any other line.

Dump to database staging table.

Let me know if you need me to walk you through the code/setup. I'm hoping you're okay, because unless you're familiar with them, Script Transforms require a lot of screenshots to explain.

EDIT: Side note: If you want to get fancy, said script transform can also split the string into components based on values and send it to multiple streams. Might be overkill on the first pass, but that's how I usually set them up.
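For anyone who would rather stay in T-SQL, a rough set-based analogue of that carry-forward idea is sketched below. This is my own sketch, not Craig's Script Component code; it assumes the dbo.Staging layout used earlier in the thread, treats the "2" rows as the headers, and needs SQL Server 2012 or later for the ordered windowed SUM:

-- Tag every row with the count of header ("2") rows seen so far, so each "3" row carries
-- forward the group number of the "2" row above it -- the same idea as resetting a
-- variable on header lines and carrying it through on the detail lines.
WITH grouped AS
(
    SELECT ID,
           UNID,
           SUM(CASE WHEN Extract_Record LIKE '2%' THEN 1 ELSE 0 END)
               OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING) AS GroupNo
    FROM dbo.Staging
    WHERE Extract_Record LIKE '[23]%'
)
UPDATE grouped
SET UNID = GroupNo;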



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables

Twitter: @AnyWayDBA
Post #1452350
Posted Monday, May 13, 2013 4:33 PM


Craig's idea sounds great!

Before I saw it, I came up with this, which is similar to, but different from, Wayne's method for tying the rows together...
I'm just including it as an option; Craig's SSIS magic is probably much better, as this does some horrible sorting operations.

;with magic as
(
select
st.ID
, st.Extract_Record
, st.UNID
, case left(st.Extract_Record,1)
when '2' then rank() over(partition by left(st.Extract_Record,1) order by st.id)
else row_number() over(order by st.id) - rank() over(partition by left(st.Extract_Record,1) order by st.id)
end AS new_ID
from staging AS st
where st.Extract_Record like '[23]%'
)
update magic
set unid = new_ID




MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

Post #1452354
Posted Monday, May 13, 2013 5:04 PM


Brandie Tarvin (5/13/2013)
Duh. Forgot there is a record 4 line, which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:


Heh... ok. I've watched this long enough, and now they're starting to use really nasty four-letter words... like SSIS.

How many lines of data does your typical file have? And are there any limits on the types of SQL Server objects that you can create?


--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1452360
Posted Monday, May 13, 2013 5:09 PM


Jeff Moden (5/13/2013)
Brandie Tarvin (5/13/2013)
Duh. Forgot there is a record 4 line, which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:


Heh... ok. I've watched this long enough, and now they're starting to use really nasty four-letter words... like SSIS.

How many lines of data does your typical file have? And are there any limits on the types of SQL Server objects that you can create?


LOL, there is nothing wrong with SSIS, particularly for solving an iterative programming artifact that breaks normalization rules because it was originally built to be processed by iterative programs.

Not my magic, though, Magoo. I learned from others just like everyone else. I did, however, trip over a link that I think embodies the process in the most simplified form possible; it's just the facts, ma'am:
http://agilebi.com/jwelch/2007/07/12/processing-a-flat-file-with-header-and-detail-rows/

(As to the SSIS, what did you expect me to solve this with... SQLCLR?)


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables

Twitter: @AnyWayDBA
Post #1452361