SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Whats the easiest way to read the first 10 characters of a flat file


Whats the easiest way to read the first 10 characters of a flat file

Author
Message
pmcquirk 96901
pmcquirk 96901
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 15
Basically I am recieveing multiple flat files of two different file layouts, the file names do not give a clue to the layout, howevet the files (pipe delimited with different numbers of columns) have the column names in the first row and the first colum name is different for each layout.

So Im looking for an easy way to read the data ( in a ascript ?? or powershell ??) and set a vaiable in my package to identify the data flow I need to use.

SSIS 2012.
sqlsurfing
sqlsurfing
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1002 Visits: 1190
Powershell would work.

You could read first line until crlf - that would give you the format of your file, then use some string processing in Powershell to build some type of "logic" to assess the type of "data file format"

--------------------------------------------------
...0.05 points per day since registration... slowly crawl up to 1 pt per day hopefully :-D
Phil Parkin
Phil Parkin
SSC Guru
SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)

Group: General Forum Members
Points: 50487 Visits: 21152
pmcquirk 96901 (5/6/2013)
Basically I am recieveing multiple flat files of two different file layouts, the file names do not give a clue to the layout, howevet the files (pipe delimited with different numbers of columns) have the column names in the first row and the first colum name is different for each layout.

So Im looking for an easy way to read the data ( in a ascript ?? or powershell ??) and set a vaiable in my package to identify the data flow I need to use.

SSIS 2012.


You could also use a script task, as you suggest. Here is a link that should get you started, should you choose to go down that path. Within the script task, you would also, of course, set the package variable which you would use in subsequent precedence constraints to select the relevant data flow.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
dg227
dg227
Mr or Mrs. 500
Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)

Group: General Forum Members
Points: 575 Visits: 819
Without PS or a script component task, you can treat the data records in the given file as a single column text field. In the data flow, start with a conditional split looking at the substring(1,10,[field]) and divert the remaining records from there. The records can then be shredded in subsequent derived column fields based on your file definition/needs.

It looks like the OP decided on PS (not a bad way to go), but I'll post a sample package and files this afternoon to demonstrate the above.
Phil Parkin
Phil Parkin
SSC Guru
SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)

Group: General Forum Members
Points: 50487 Visits: 21152
dg227 (5/7/2013)
Without PS or a script component, you can treat the data records in the given file as a single column text field. In the data flow, start with a conditional split looking at the substring(1,10,[field]) and divert the remaining records from there. The records can then be shredded in subsequent derived column fields based on your file definition/needs.


This would divert only the first row.

"..divert the remaining records from there"
Please explain how you would implement this.

Also, I suggested a Script Task, not a Script Component - there is a difference.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
pmcquirk 96901
pmcquirk 96901
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 15
Thanks everyone,
I'm going to use powershell to split the files into separate directories, and loop the loadds through all the files in the directory.

peter
dg227
dg227
Mr or Mrs. 500
Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)

Group: General Forum Members
Points: 575 Visits: 819
As follow up from yesterday, I've uploaded a zip file containing the dtsx package (detailed below) and a folder (containing an A and B file) that should be saved in your local c:\test folder.

File A has four columns and 5 data records, file B has 6 columns and 3 data records.

In a nutshell, the package loops through that SSC_20130507 directory with the two files, parsing them in a single data flow. Using a Delimiter variable (a pipe, in this example) and key values to be found in each file's header, first column ("Col1A" and "Col1B", respectively ... this is shortened to 5 characters from the OP's 10 character request, but demonstrates the functionality) the Data flow determine's each file's type (A or B) based on that key, appends "A" or "B" to the file record, and then conditionally splits the data flow from there. Demonstrating that each file's data can then be parsed and used in its own subsequent set of tasks, a derived column task shreds the column data based on the Delimiter var in to distinct columns, which can then be later used in the data flow (terminating here in Union All transformations, for simplicity).

Basically, while there can be different approaches to handling this problem, including Script Tasks in the control flow, external scripting (PowerShell), etc., I wanted to demonstrate that yes, you can accomplish this fairly easily within a single data flow.
Attachments
SSC_20130507_Example.zip (9 views, 24.00 KB)
Phil Parkin
Phil Parkin
SSC Guru
SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)SSC Guru (50K reputation)

Group: General Forum Members
Points: 50487 Visits: 21152
I've just checked your solution - the data flow starts by using a Script Component to check the file type - and every single row of every file will go through that component, despite your original quote:

Without PS or a script component, you can treat the data records in the given file as a single column text field. In the data flow, start with a conditional split looking at the substring(1,10,[field]) and divert the remaining records from there.


Nonetheless, an interesting working example - thanks for posting it.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
dg227
dg227
Mr or Mrs. 500
Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)Mr or Mrs. 500 (575 reputation)

Group: General Forum Members
Points: 575 Visits: 819
True, that's is why I edited my earlier response when you pointed that out, changing component to task (which is what I had originally meant - wrong term).

The solution does require all file rows to pass through the data flow's script component in order to flag the row with the correct file type and then conditionally splits from there. I don't see that as notable issue because all records are likely going to have to pass through a data flow somewhere in order to be processed (substantially large files could see a performance issue, I suppose, by running each record through that script component ... that'd have have to be tested/evaluated based on business needs, as would the selected solution at the end of the day).

I'm glad that the sample package and related files worked! (first posting/response with supporting documentation/examples like that)

Thanks!

:-D
Lowell
Lowell
SSC Guru
SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)

Group: General Forum Members
Points: 70188 Visits: 40923
in a kind-of-similar thread, someone had a table full of varbinary(max) datatypes full of graphic images, without the filename/extension, so they couldn't figure out if the file was *.jpg, *.gif or what.

That thread is here if you are curious:
http://www.sqlservercentral.com/Forums/Topic1178898-392-1.aspx

when i was helping there, i wrote a snippet that read the first ten bytes in vb.NET from everything in a local folder so i could do the analysis and make some TSQL code to map the first ten bytes to known extensions.

here's the vb code i used:

Dim myFilePath As String = String.Empty
Dim myFileBrowser As New OpenFileDialog
Try
With myFileBrowser
'With statement is used to execute statements using a particular object, here,_

'setting filters so that Text files and All Files choice appears in the Files of Type box
'in the dialog
If .ShowDialog() = DialogResult.OK Then
'showDialog method makes the dialog box visible at run time
myFilePath = .FileName

End If
End With
Catch ex As Exception

MsgBox(ex.Message)
Finally
'If Not (sr Is Nothing) Then
' sr.Close()
'End If
End Try
'if we have a valid path, we can go forward.
If myFilePath <> String.Empty Then
' Open a file that is to be loaded into a byte array
Dim oFile As System.IO.FileInfo
oFile = New System.IO.FileInfo(myFilePath)

Dim oFileStream As System.IO.FileStream = oFile.OpenRead()
Dim lBytes As Long = oFileStream.Length
'the above would read the whole file into the byte array, we want just the 1st 10 bytes for testing.
lBytes = 10
If (lBytes > 0) Then
Dim fileData(lBytes - 1) As Byte

' Read the file into a byte array
oFileStream.Read(fileData, 0, lBytes)
oFileStream.Close()
Debug.Print(oFile.Extension & "||" & BitConverter.ToString(fileData))
End If


End If


Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search