Extract data from CSV file - SSIS
Posted Tuesday, April 2, 2013 7:14 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, August 1, 2014 8:41 AM
Points: 101, Visits: 437
Hi all,

I have a CSV file with data going back to 2005, and new data is appended to it daily. I'm able to use a Conditional Split to include data for the current two years and exclude the rest.


But I just wanted to know whether something can be done in the Script Task (control flow) itself, so that the Flat File connection manager picks up only the current year's data. I don't want to pass all the rows to the Conditional Split and then exclude them. Any help on this?

Thanks
Post #1437875
Posted Tuesday, April 2, 2013 7:25 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 1:35 AM
Points: 5,161, Visits: 12,004
Why not archive the file (or the very old data in the file) rather than letting it grow?


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1437881
Posted Tuesday, April 2, 2013 7:28 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, August 1, 2014 8:41 AM
Points: 101, Visits: 437
Archiving the file is not in my scope. The file is generated by a third-party vendor on a daily basis; I'm just accessing their backup CSV files to feed data into my database.

Is there any way to read only the data for the current year, or the last 30 days or so, from the CSV file, instead of reading everything and using the split?

thanks
Post #1437884
Posted Monday, April 8, 2013 11:59 PM


SSCertifiable


Group: General Forum Members
Last Login: Yesterday @ 7:06 PM
Points: 7,125, Visits: 12,720
How are your C# or VB.NET skills? I am thinking pre-processing the file might be one way. You could use a Script Task to read the file in reverse order until you found data that has already been imported. Depending on how much data you expect and how much memory you have available, you could either buffer the data as you read backwards and write it in one shot when you hit old data, or, once you found old data, change course and read forward to the end of the file, writing a new file to disk as you go. Either way you would end up with a file that you could import straight away using a Data Flow Task. It would be tedious, but not terribly difficult if you have some .NET skills.

See the second example in the article linked below for how to read a file in reverse order; the first example shows how to write a file byte by byte. It is written for .NET 4.0, but these techniques also work in 2.0; the 2.0 article just lacks the nice example of reading a file in reverse using Seek.
http://msdn.microsoft.com/en-us/library/system.io.filestream.seek(v=vs.100).aspx
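To make the reverse-read idea concrete, here is a minimal sketch of the technique in Python (the thread's context is .NET, but the block-by-block seek-from-the-end logic is the same; `is_old` and the file path are placeholders you would adapt to your own data):

```python
import os

def tail_new_lines(path, is_old, block_size=4096):
    """Read `path` backwards in fixed-size blocks and return, in file order,
    the trailing lines that come after the last line for which is_old(line)
    is true. Stops reading as soon as old data is found."""
    new_lines = []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            lines = buf.split(b"\n")
            buf = lines[0]  # possibly incomplete first line; keep for next block
            for line in reversed(lines[1:]):
                text = line.decode("utf-8")
                if not text:
                    continue
                if is_old(text):
                    return list(reversed(new_lines))
                new_lines.append(text)
        # Reached the start of the file: handle the very first line too.
        if buf:
            text = buf.decode("utf-8")
            if text and not is_old(text):
                new_lines.append(text)
    return list(reversed(new_lines))
```

Because it reads fixed-size blocks from the end, memory use is bounded by the amount of new data plus one block, regardless of total file size.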


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1440156
Posted Tuesday, April 9, 2013 12:46 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 1:35 AM
Points: 5,161, Visits: 12,004
opc.three (4/8/2013)
How are your C# or VB.net skills? I am thinking pre-processing the file might be one way. You could use a Script Task to read the file in reverse order until you found data that has already been imported. Depending on how much data you expect and how much memory you have available you could store the data as you reversed through the file and then write it in one shot when you found old data, or, you could not store the data as you reversed and just changed course to read the file in forward-order to the end when you found old data, writing a new file to disk as you went. Either way you would end up with a file that you could import straightaway using a Data Flow Task. It would be tedious but not terribly difficult if you have some .NET skills.

See the second example in this article to see how to read a file in reverse order. And the first example shows you how to write a file byte by byte. It is for .NET 4.0 but these techniques will work in 2.0, it's just that the 2.0 article does not have the nice example to show you how to read a file in reverse using Seek.
http://msdn.microsoft.com/en-us/library/system.io.filestream.seek(v=vs.100).aspx


Neat idea. Maybe using a Script Component source would mean that this could be done in one hit in a data flow (though I'm not sure how you would gracefully 'escape' from the data flow when the old records are reached).



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1440168
Posted Tuesday, April 9, 2013 10:08 AM


SSCertifiable


Group: General Forum Members
Last Login: Yesterday @ 7:06 PM
Points: 7,125, Visits: 12,720
Phil Parkin (4/9/2013)
opc.three (4/8/2013)
How are your C# or VB.net skills? I am thinking pre-processing the file might be one way. You could use a Script Task to read the file in reverse order until you found data that has already been imported. Depending on how much data you expect and how much memory you have available you could store the data as you reversed through the file and then write it in one shot when you found old data, or, you could not store the data as you reversed and just changed course to read the file in forward-order to the end when you found old data, writing a new file to disk as you went. Either way you would end up with a file that you could import straightaway using a Data Flow Task. It would be tedious but not terribly difficult if you have some .NET skills.

See the second example in this article to see how to read a file in reverse order. And the first example shows you how to write a file byte by byte. It is for .NET 4.0 but these techniques will work in 2.0, it's just that the 2.0 article does not have the nice example to show you how to read a file in reverse using Seek.
http://msdn.microsoft.com/en-us/library/system.io.filestream.seek(v=vs.100).aspx


Neat idea. Maybe using a Script Component source would mean that this could be done in one hit in a data flow (though I'm not sure how you would gracefully 'escape' from the data flow when the old records are reached).

Good thought. I think a Script Component set up as a Source could keep it all in one Data Flow, which might avoid an intermediate file. The only tradeoff I can think of is having to parse the rows manually in .NET, or maybe use a Derived Column Transform if pushing a single-column row down the pipeline. I have not used a Script Component Source in a long time, but I think once we had pushed all the new data onto the pipeline as new rows and found the first old row, we would just report success and the Data Flow would happily move on.
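The "stop feeding rows once old data appears" behaviour can be sketched as a generator (Python, illustrative only; in an actual Script Component source you would return from CreateNewOutputRows, or call SetEndOfRowset on the output buffer, at the equivalent point):

```python
def new_rows(lines_newest_first, is_old):
    """Yield rows newest-first and stop cleanly at the first old one --
    the analogue of returning early from the source's row-creation method."""
    for line in lines_newest_first:
        if is_old(line):
            return  # gracefully end the 'pipeline'; no more rows are produced
        yield line

rows = list(new_rows(["2013,b", "2013,a", "2005,y", "2004,x"],
                     lambda r: r[:4] < "2012"))
# rows == ["2013,b", "2013,a"]
```

The point is that the downstream consumer never sees the old rows at all, which is exactly what distinguishes this from filtering with a Conditional Split after a full read.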


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1440446
Posted Wednesday, April 10, 2013 5:22 AM
SSC Veteran


Group: General Forum Members
Last Login: Friday, October 10, 2014 4:13 AM
Points: 205, Visits: 1,339
You don't tell us how big the file is; you might not be able to hold it in memory to read it in reverse.
In any case, you can read the file line by line and write the lines you want to another file.
There must be something in each record which gives its date or year.
If the file is sorted, you can use a pretty simple and fast method like:


Imports System
Imports System.IO

Public Class testa
    Public Shared Sub test()

        Dim line As String
        Dim reader As StreamReader = New StreamReader("C:\temp\input.txt")
        Dim writer As StreamWriter = New StreamWriter("C:\temp\output.txt")

        ' Loop over each line until ReadLine returns Nothing (end of file).
        line = reader.ReadLine()
        Do While line IsNot Nothing
            ' Keep only rows whose leading date field is after the cutoff.
            ' Adjust the length (9) and the cutoff string to the real layout.
            If Mid(line, 1, 9) > "something" Then
                writer.WriteLine(line)
            End If
            line = reader.ReadLine()
        Loop

        writer.Close()
        reader.Close()

    End Sub
End Class
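As a companion sketch of the same forward filter with an explicit cutoff date, here it is in Python (assuming, purely for the example, that each row starts with an ISO yyyy-mm-dd date; adjust the slice length and the cutoff to the real file layout):

```python
from datetime import date, timedelta

# Example cutoff: keep only the last 30 days. ISO dates compare
# correctly as plain strings ('yyyy-mm-dd'), so no parsing is needed.
CUTOFF = date.today() - timedelta(days=30)

def filter_recent(src, dst):
    """Copy only rows dated on or after CUTOFF from src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line[:10] >= CUTOFF.isoformat():
                fout.write(line)
```

Note this still reads the whole file once, like the VB version above; it just produces a small file that the Data Flow can then import without a Conditional Split.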

Thanks
Gosta
Post #1440734
Posted Wednesday, April 10, 2013 5:25 AM


SSCertifiable


Group: General Forum Members
Last Login: Yesterday @ 7:06 PM
Points: 7,125, Visits: 12,720
Gosta Munktell (4/10/2013)
You don't tell us how big the file is.
You might not be able to store it in the internal memory and read it reversed?

You do not need to store it as you read through it in reverse, or you could; both options were mentioned above, and the size of the file and the available memory would likely dictate the choice.

Anyhow you can read the file per line and write it to another file.
There must be something in the file which tells the date or year of a record.
If the file is sorted you can use a pretty simple and fast method like:


Imports System
Imports System.IO

Public Class testa
    Public Shared Sub test()

        Dim line As String
        Dim reader As StreamReader = New StreamReader("C:\temp\input.txt")
        Dim writer As StreamWriter = New StreamWriter("C:\temp\output.txt")

        ' Loop over each line until ReadLine returns Nothing (end of file).
        line = reader.ReadLine()
        Do While line IsNot Nothing
            ' Keep only rows whose leading date field is after the cutoff.
            ' Adjust the length (9) and the cutoff string to the real layout.
            If Mid(line, 1, 9) > "something" Then
                writer.WriteLine(line)
            End If
            line = reader.ReadLine()
        Loop

        writer.Close()
        reader.Close()

    End Sub
End Class

Thanks
Gosta

What do you think reading the file with a Flat File Connection Manager and then applying the Conditional Split in the Data Flow accomplishes? That is what the original poster is already doing.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1440738
Posted Wednesday, April 10, 2013 5:51 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, August 1, 2014 8:41 AM
Points: 101, Visits: 437
Hi All,

Thanks for the replies.

As you say, to keep it simple I use a Conditional Split to filter the records from the flat file and load them. But the problem is that the flat file source reads all the records. I want to restrict it to read only the required records, not the entire file. Probably I have to implement some logic within the Script Component to read only the required data from the flat file.
Post #1440749