Extract data from CSV file - SSIS

  • Hi all,

    I have a csv file with data from 2005...and on daily basis the data is getting appended in the csv file.. Im able to use a conditional split to include data for current 2 years and exclude the rest...

    But I just wanted to know if something can be done in the script task(control flow) itself, so that the flat file connection manager itself picks up the data for the current year only. I don wanna pass all the rows to the conditional split and exclude the rows...any help on this?

    Thanks

  • Why not archive the file (or the very old data in the file) rather than letting it grow?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • archiving the file is not in my scope...the file is being generated by the third party vendor on a daily basis...im just accessing their back up csv files to feed data for my database.

    Is there by anyways to read the data from CSV file for current year or last 30 days or so...instead reading everything and using the split?

    thanks

  • How are your C# or VB.net skills? I am thinking pre-processing the file might be one way. You could use a Script Task to read the file in reverse order until you found data that has already been imported. Depending on how much data you expect and how much memory you have available you could store the data as you reversed through the file and then write it in one shot when you found old data, or, you could not store the data as you reversed and just changed course to read the file in forward-order to the end when you found old data, writing a new file to disk as you went. Either way you would end up with a file that you could import straightaway using a Data Flow Task. It would be tedious but not terribly difficult if you have some .NET skills.

    See the second example in this article to see how to read a file in reverse order. And the first example shows you how to write a file byte by byte. It is for .NET 4.0 but these techniques will work in 2.0, it's just that the 2.0 article does not have the nice example to show you how to read a file in reverse using Seek.

    http://msdn.microsoft.com/en-us/library/system.io.filestream.seek(v=vs.100).aspx

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/8/2013)


    How are your C# or VB.net skills? I am thinking pre-processing the file might be one way. You could use a Script Task to read the file in reverse order until you found data that has already been imported. Depending on how much data you expect and how much memory you have available you could store the data as you reversed through the file and then write it in one shot when you found old data, or, you could not store the data as you reversed and just changed course to read the file in forward-order to the end when you found old data, writing a new file to disk as you went. Either way you would end up with a file that you could import straightaway using a Data Flow Task. It would be tedious but not terribly difficult if you have some .NET skills.

    See the second example in this article to see how to read a file in reverse order. And the first example shows you how to write a file byte by byte. It is for .NET 4.0 but these techniques will work in 2.0, it's just that the 2.0 article does not have the nice example to show you how to read a file in reverse using Seek.

    http://msdn.microsoft.com/en-us/library/system.io.filestream.seek(v=vs.100).aspx

    Neat idea. Maybe using a Script Component source would mean that this could be done in one hit in a data flow (though I'm not sure how you would gracefully 'escape' from the data flow when the old records are reached).

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (4/9/2013)


    opc.three (4/8/2013)


    How are your C# or VB.net skills? I am thinking pre-processing the file might be one way. You could use a Script Task to read the file in reverse order until you found data that has already been imported. Depending on how much data you expect and how much memory you have available you could store the data as you reversed through the file and then write it in one shot when you found old data, or, you could not store the data as you reversed and just changed course to read the file in forward-order to the end when you found old data, writing a new file to disk as you went. Either way you would end up with a file that you could import straightaway using a Data Flow Task. It would be tedious but not terribly difficult if you have some .NET skills.

    See the second example in this article to see how to read a file in reverse order. And the first example shows you how to write a file byte by byte. It is for .NET 4.0 but these techniques will work in 2.0, it's just that the 2.0 article does not have the nice example to show you how to read a file in reverse using Seek.

    http://msdn.microsoft.com/en-us/library/system.io.filestream.seek(v=vs.100).aspx

    Neat idea. Maybe using a Script Component source would mean that this could be done in one hit in a data flow (though I'm not sure how you would gracefully 'escape' from the data flow when the old records are reached).

    Good thought. I think a Script Component setup as a Source could keep it all in one Data Flow, which might avoid an intermediate file. The only tradeoff I can think of would be having to manually parse the row in the .NET or maybe a Derived Column Transform if pushing a single-column row down. I have not used a Script Component Source in a long time but I think after we pushed all the new data onto the pipeline as new rows and found the first old data then we would just set "DTS task = Success" and the Data Flow would happily move on.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • You don't tell us how big the file is.

    You might not be able to store it in the internal memory and read it reversed?

    Anyhow you can read the file per line and write it to another file.

    There must be something in the file which tells the date or year of a record.

    If the file is sorted you can use a pretty simple and fast method like:

    Imports System

    Imports System.IO

    Public Class testa

    Public Shared Sub test()

    Dim line As String

    Dim reader As StreamReader = New StreamReader("C:\temp\input.txt")

    Dim writer As StreamWriter = New StreamWriter("C:\temp\output.txt")

    line = "start"

    ' Loop over each line in file, While line is Not Nothing.

    Do While (Not line Is Nothing)

    line = reader.ReadLine

    If Mid(line, 1, 9) <= "something" Then ' Adjust

    Else

    writer.WriteLine(line)

    End If

    Loop

    writer.Close()

    reader.Close()

    End Sub

    End Class

    Thanks

    Gosta

  • Gosta Munktell (4/10/2013)


    You don't tell us how big the file is.

    You might not be able to store it in the internal memory and read it reversed?

    You do not need to store it as you read through in in reverse, or you could, both options were mentioned above and size of file and available memory would likely dictate the choice.

    Anyhow you can read the file per line and write it to another file.

    There must be something in the file which tells the date or year of a record.

    If the file is sorted you can use a pretty simple and fast method like:

    Imports System

    Imports System.IO

    Public Class testa

    Public Shared Sub test()

    Dim line As String

    Dim reader As StreamReader = New StreamReader("C:\temp\input.txt")

    Dim writer As StreamWriter = New StreamWriter("C:\temp\output.txt")

    line = "start"

    ' Loop over each line in file, While line is Not Nothing.

    Do While (Not line Is Nothing)

    line = reader.ReadLine

    If Mid(line, 1, 9) <= "something" Then ' Adjust

    Else

    writer.WriteLine(line)

    End If

    Loop

    writer.Close()

    reader.Close()

    End Sub

    End Class

    Thanks

    Gosta

    What do you think using a Flat-File Manager to read the file and then applying the Conditional Split in the Data Flow accomplishes? :Whistling:

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi All,

    Thanks for the replies.

    As you say to keep it simple I use conditional split to filter the records from falt file and load it. But the proble is the flat file reads all the records.. I wanted to restrict the flat file to read only the reqd records and not the entire one..Probably I got to implete some logic within the script component to read olny the reqd data from flat file.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply