What's the easiest way to read the first 10 characters of a flat file
Posted Monday, May 6, 2013 3:29 PM
Forum Newbie

Basically I am receiving multiple flat files with two different layouts. The file names give no clue to the layout; however, the files (pipe-delimited, with different numbers of columns) have the column names in the first row, and the first column name is different for each layout.

So I'm looking for an easy way to read the data (in a script? or PowerShell?) and set a variable in my package to identify the data flow I need to use.

SSIS 2012.
Post #1449907
Posted Monday, May 6, 2013 5:47 PM


SSC-Enthusiastic

Powershell would work.

You could read the first line, up to the CRLF - that would give you the format of the file - then use some string processing in PowerShell to build some type of "logic" to work out which data file format you have.
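A minimal sketch of that idea: read only the header line and branch on its first pipe-delimited column name. The function name and the "Col1A"/"Col1B" header values are placeholders, not the OP's actual column names.

```powershell
# Read only the first line of the file and use its first
# pipe-delimited column name to decide which layout it is.
function Get-FlatFileLayout {
    param([string]$Path)

    # -TotalCount 1 stops after one line, so large files are not fully read.
    $header      = Get-Content -Path $Path -TotalCount 1
    $firstColumn = ($header -split '\|')[0]

    switch ($firstColumn) {
        'Col1A'  { 'LayoutA' }   # placeholder header names -
        'Col1B'  { 'LayoutB' }   # substitute your real first-column names
        default  { 'Unknown' }
    }
}

# Example: classify a small sample file.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.txt'
Set-Content -Path $sample -Value 'Col1A|Col2|Col3|Col4'
Get-FlatFileLayout -Path $sample   # -> LayoutA
```

The result could then be written into an SSIS package variable (e.g. via a script argument or an Execute Process Task's output) to drive precedence constraints.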




--------------------------------------------------
...0.05 points per day since registration... slowly crawl up to 1 pt per day hopefully
Post #1449940
Posted Monday, May 6, 2013 11:10 PM


SSCertifiable

pmcquirk 96901 (5/6/2013)
Basically I am receiving multiple flat files with two different layouts. The file names give no clue to the layout; however, the files (pipe-delimited, with different numbers of columns) have the column names in the first row, and the first column name is different for each layout.

So I'm looking for an easy way to read the data (in a script? or PowerShell?) and set a variable in my package to identify the data flow I need to use.

SSIS 2012.


You could also use a script task, as you suggest. Here is a link that should get you started, should you choose to go down that path. Within the script task, you would also, of course, set the package variable which you would use in subsequent precedence constraints to select the relevant data flow.



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1449976
Posted Tuesday, May 7, 2013 7:40 AM
SSC Veteran

Without PS or a script component task, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field],1,10) and divert the remaining records from there. The records can then be shredded into distinct columns by subsequent Derived Column transformations, based on your file definition/needs.
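For reference, a conditional split condition along those lines might look something like this in the SSIS expression language (column and header names here are illustrative; note that SUBSTRING takes the column first and is 1-based):

```
SUBSTRING([Column 0], 1, 5) == "Col1A"
```

One such condition per layout would route rows to the matching output, with the default output catching anything unrecognised.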

It looks like the OP decided on PS (not a bad way to go), but I'll post a sample package and files this afternoon to demonstrate the above.
Post #1450141
Posted Tuesday, May 7, 2013 7:53 AM


SSCertifiable

dg227 (5/7/2013)
Without PS or a script component, you can treat the data records in the given file as a single-column text field. In the data flow, start with a conditional split looking at SUBSTRING([field],1,10) and divert the remaining records from there. The records can then be shredded into distinct columns by subsequent Derived Column transformations, based on your file definition/needs.


This would divert only the first row.

"..divert the remaining records from there"
Please explain how you would implement this.

Also, I suggested a Script Task, not a Script Component - there is a difference.



Post #1450149
Posted Tuesday, May 7, 2013 10:42 AM
Forum Newbie

Thanks everyone,
I'm going to use PowerShell to split the files into separate directories, and loop the loads through all the files in each directory.
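A sketch of that split-into-directories approach, for anyone following along: move each incoming file into a per-layout subfolder based on its header's first column. The folder names, file mask, and "Col1A"/"Col1B" header values are assumptions, not the OP's actual values.

```powershell
# Sort incoming flat files into per-layout subfolders based on the
# first pipe-delimited column name in each file's header row.
function Split-FlatFiles {
    param([string]$Incoming)

    $destA = Join-Path $Incoming 'LayoutA'
    $destB = Join-Path $Incoming 'LayoutB'
    New-Item -ItemType Directory -Path $destA, $destB -Force | Out-Null

    Get-ChildItem -Path $Incoming -Filter '*.txt' -File | ForEach-Object {
        # Read only the header line, then branch on its first column name.
        $firstColumn = ((Get-Content -Path $_.FullName -TotalCount 1) -split '\|')[0]
        switch ($firstColumn) {
            'Col1A' { Move-Item -Path $_.FullName -Destination $destA }
            'Col1B' { Move-Item -Path $_.FullName -Destination $destB }
            # unrecognised headers: leave the file in place for review
        }
    }
}
```

An SSIS Foreach Loop container per subfolder can then run the matching data flow over each set of files.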

peter
Post #1450249
Posted Wednesday, May 8, 2013 8:02 AM
SSC Veteran

As follow up from yesterday, I've uploaded a zip file containing the dtsx package (detailed below) and a folder (containing an A and B file) that should be saved in your local c:\test folder.

File A has four columns and five data records; file B has six columns and three data records.

In a nutshell, the package loops through the SSC_20130507 directory containing the two files, parsing them in a single data flow. Using a Delimiter variable (a pipe, in this example) and key values found in the first column of each file's header row ("Col1A" and "Col1B", respectively - shortened to 5 characters from the OP's 10-character request, but it demonstrates the functionality), the data flow determines each file's type (A or B), appends "A" or "B" to the file record, and then conditionally splits the data flow from there. To demonstrate that each file's data can then be parsed and used in its own subsequent set of tasks, a Derived Column transformation shreds the column data into distinct columns based on the Delimiter variable; those columns can then be used later in the data flow (terminating here in Union All transformations, for simplicity).

Basically, while there can be different approaches to handling this problem, including Script Tasks in the control flow, external scripting (PowerShell), etc., I wanted to demonstrate that yes, you can accomplish this fairly easily within a single data flow.


  Post Attachments 
SSC_20130507_Example.zip (4 views, 24.74 KB)
Post #1450583
Posted Wednesday, May 8, 2013 8:32 AM


SSCertifiable

I've just checked your solution - the data flow starts by using a Script Component to check the file type - and every single row of every file will go through that component, despite your original quote:

Without PS or a script component, you can treat the data records in the given file as a single column text field. In the data flow, start with a conditional split looking at the substring(1,10,[field]) and divert the remaining records from there.


Nonetheless, an interesting working example - thanks for posting it.



Post #1450609
Posted Wednesday, May 8, 2013 10:17 AM
SSC Veteran

True, that's why I edited my earlier response when you pointed that out, changing "component" to "task" (which is what I had originally meant - wrong term).

The solution does require all file rows to pass through the data flow's script component in order to flag each row with the correct file type; it then conditionally splits from there. I don't see that as a notable issue, because all records are likely going to have to pass through a data flow somewhere in order to be processed. (Substantially large files could see a performance hit, I suppose, from running each record through that script component - that would have to be tested/evaluated against business needs, as would whichever solution is selected at the end of the day.)

I'm glad the sample package and related files worked! (First post/response with supporting documentation/examples like that.)

Thanks!

Post #1450676
Posted Wednesday, May 8, 2013 10:30 AM


SSChampion

In a kind-of-similar thread, someone had a table full of varbinary(max) columns holding graphic images, without the file name/extension, so they couldn't figure out whether each file was a *.jpg, *.gif or whatever.

That thread is here if you are curious:
http://www.sqlservercentral.com/Forums/Topic1178898-392-1.aspx

When I was helping there, I wrote a snippet in VB.NET that read the first ten bytes of everything in a local folder, so I could do the analysis and write some T-SQL to map the first ten bytes to known extensions.

Here's the VB code I used:
Dim myFilePath As String = String.Empty

' Let the user pick a file with the standard open-file dialog.
Try
    Using myFileBrowser As New OpenFileDialog()
        ' ShowDialog makes the dialog box visible at run time.
        If myFileBrowser.ShowDialog() = DialogResult.OK Then
            myFilePath = myFileBrowser.FileName
        End If
    End Using
Catch ex As Exception
    MsgBox(ex.Message)
End Try

' If we have a valid path, we can go forward.
If myFilePath <> String.Empty Then
    Dim oFile As New System.IO.FileInfo(myFilePath)
    Dim oFileStream As System.IO.FileStream = oFile.OpenRead()

    ' oFileStream.Length would read the whole file into the byte array;
    ' we want just the first 10 bytes for testing.
    Dim lBytes As Integer = 10

    If lBytes > 0 Then
        Dim fileData(lBytes - 1) As Byte

        ' Read the first bytes of the file into the byte array.
        oFileStream.Read(fileData, 0, lBytes)
        oFileStream.Close()
        Debug.Print(oFile.Extension & "||" & BitConverter.ToString(fileData))
    End If
End If


Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #1450686