Using the Script Task in SSIS to Process Data Files When They Arrive

  • Comments posted to this topic are about the item Using the Script Task in SSIS to Process Data Files When They Arrive

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • This is my first post ever so yea herewe go... let me say thanks for such an amazing and complete step by step post.

    I just came across with this email in my mailbox, just when it is almost time for me to develop a solution like the one you posted. My doubts and limitations rise when im stopped for that medium-sized wall called c#, so i need to keep practicing.

    But i also have to achieve this using SSIS 2008 R2. So my question is, is it still possible?

    Cheers.

  • Excellent use of SSIS

    however I did spot a potential bottleneck, you could potentially get stuck in loop if the process that held the file open has halted or crashed or ended with a file lock still in place

    Consider the following as a replacement for your wait code, Hope this proves useful at times when things get stuck 🙂

    private void WaitForExclusiveAccess(FileInfo dataFile)

    {

    // local variable to say how many seconds to wait in between checking if we can gain

    // exclusive access to the found file

    int secondsToWaitBetweenAttempts = 5;

    // local variable to pass to events that require parameters be passed by ref

    bool fireAgain = true;

    int Loopcount = 1;

    int EndlessLoopCount = 10;

    //multiple of the interval between open attemtps ie 10 in this case will lapse after 60 seconds of trying to open the file

    // Loop indefinitely checking if we can access the data file.

    while (1 == 1) {

    if (Loopcount > EndlessLoopCount) {

    string strError = string.Format("failed to gain exclusive access to file {0} . Waited {1} seconds trying every {2} Seconds, Exiting package", foundFile.FullName, secondsToWaitBetweenAttempts * EndlessLoopCount, secondsToWaitBetweenAttempts);

    Dts.Events.FireInformation(0, null, strError, string.Empty, 0, true);

    throw new TimeoutException(strError);

    }

    try {

    // Attempt to gain access to the file.

    using (Stream stream = new FileStream(dataFile.FullName, FileMode.Open)) {

    // If we made it here no exception was thrown meaning we

    // could access the file. We will break out of the loop and allow

    // the rest of the package to continue processing.

    break; // TODO: might not be correct. Was : Exit Try

    }

    // We are not interested in ending the program when an IOException

    // occurs in this area. This type of exception means we could not

    // gain access to the file.

    // In general, programming algorithms that leverage exceptions for

    // control flow are frowned upon. However in the case of file access

    // it is an acceptable pattern.

    } catch (IOException generatedExceptionName) {

    }

    // raise an information event saying we could not gain exclusive access to the found file and will wait

    Dts.Events.FireInformation(0, null, "Could not gain exclusive access to file " + foundFile.FullName + ". Waiting " + secondsToWaitBetweenAttempts.ToString() + " seconds before trying again...", string.Empty, 0, fireAgain);

    // wait some time before checking whether the file can be used

    Threading.Thread.Sleep(secondsToWaitBetweenAttempts * 1000);

    Loopcount += 1;

    }

    }

    or if people have converted to VB

    Private Sub WaitForExclusiveAccess(ByVal dataFile As FileInfo)

    ' local variable to say how many seconds to wait in between checking if we can gain

    ' exclusive access to the found file

    Dim secondsToWaitBetweenAttempts As Integer = 5

    ' local variable to pass to events that require parameters be passed by ref

    ' EndlessLoopcount multiple of the interval between open attemtps ie 10 in this case will lapse after 60 seconds of trying to open the file

    Dim fireAgain As Boolean = True, Loopcount As Integer = 1, EndlessLoopCount As Integer = 10

    ' Loop indefinitely checking if we can access the data file.

    While 1 = 1

    If Loopcount > EndlessLoopCount Then

    Dim strError As String = String.Format("failed to gain exclusive access to file {0} . Waited {1} seconds trying every {2} Seconds, Exiting package", foundFile.FullName, secondsToWaitBetweenAttempts * EndlessLoopCount, secondsToWaitBetweenAttempts)

    Dts.Events.FireInformation(0, Nothing, strError, String.Empty, 0, True)

    Throw New TimeoutException(strError)

    End If

    Try

    ' Attempt to gain access to the file.

    Using stream As Stream = New FileStream(dataFile.FullName, FileMode.Open)

    ' If we made it here no exception was thrown meaning we

    ' could access the file. We will break out of the loop and allow

    ' the rest of the package to continue processing.

    Exit Try

    End Using

    ' We are not interested in ending the program when an IOException

    ' occurs in this area. This type of exception means we could not

    ' gain access to the file.

    ' In general, programming algorithms that leverage exceptions for

    ' control flow are frowned upon. However in the case of file access

    ' it is an acceptable pattern.

    Catch generatedExceptionName As IOException

    End Try

    ' raise an information event saying we could not gain exclusive access to the found file and will wait

    Dts.Events.FireInformation(0, Nothing, "Could not gain exclusive access to file " + foundFile.FullName + ". Waiting " + secondsToWaitBetweenAttempts.ToString() + " seconds before trying again...", String.Empty, 0, fireAgain)

    ' wait some time before checking whether the file can be used

    Threading.Thread.Sleep(secondsToWaitBetweenAttempts * 1000)

    Loopcount += 1

    End While

    End Sub

  • Yes SQL 2008 R2 SSIS can support this

    Looking under "Control Flow tools"

    you will find in order of use in this package

    1. Script Task

    2. File system Task

    3. Execute SQL Task (just truncate the staging table with an sql statement)

    4. Data flow Task

    5. File System Task

    that should do it

  • excellent and comprehensive article.

    I only have a couple of questions..

    first - if you where required to process multiple files per day, would you adapt this process, or would you use a process similiar to the others whereby the package is run every 5 minutes (and subsequently loaded and unloaded from memory)?

    secondly - I've not really covered c# before but it would appear to be the prefered language SQL2005 onwards. So could you recommend any useful resource where I could start to bring my skill in this area up to date

    thanks

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

  • This is a great idea, but in my particular case our sysadmin will not allow the dot net framework to be installed on his SQL Servers because of perceived security vulnerabilities. Is he right, or are there valid arguments I can make to have it installed?

  • zapallina (10/25/2012)


    This is my first post ever so yea herewe go... let me say thanks for such an amazing and complete step by step post.

    I just came across with this email in my mailbox, just when it is almost time for me to develop a solution like the one you posted. My doubts and limitations rise when im stopped for that medium-sized wall called c#, so i need to keep practicing.

    But i also have to achieve this using SSIS 2008 R2. So my question is, is it still possible?

    Cheers.

    Thank you for your kind comments. SSIS 2008 R2 uses a different version of the .NET Framework than does SSIS 2012 and while I have only tested this on SSIS 2012, to my knowledge I am not doing anything in the C# code that would prevent the technique from working for SSIS 2008 R2. That said, the C# Script Task development environment is a bit different in SSIS 2008 R2 so you may not be able to copy and paste the entire block of code from the article and paste into a SSIS 2008 R2 Script Task development window. You may need to be precise and only take the using statements you need plus the individual methods to fit them into the SSIS 2008 R2 Script task structure.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • andre.pageot (10/25/2012)


    Excellent use of SSIS

    however I did spot a potential bottleneck, you could potentially get stuck in loop if the process that held the file open has halted or crashed or ended with a file lock still in place

    Consider the following as a replacement for your wait code, Hope this proves useful at times when things get stuck 🙂

    We have to draw the line somewhere when producing code (especially for demonstration purposes) which by nature will leave some things 'out of scope' and this was one of those cases. Thanks for posting, you offered a very nice addition (and in two languages no less!).

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • dave-dj (10/25/2012)


    excellent and comprehensive article.

    I only have a couple of questions..

    first - if you where required to process multiple files per day, would you adapt this process, or would you use a process similiar to the others whereby the package is run every 5 minutes (and subsequently loaded and unloaded from memory)?

    If it were part of the requirement to process multiple files per day then you have some options. In keeping with the idea that we want our SSIS package to run continuosly, and not have to stop and start it every so often, I would look into putting the entire set of tasks into a For Loop with a condition of 'where 1=1'. This would effectively bring us back to the 'watcher' after processing a file. If there was a possibility for a file to arrive during the processing of a previous file, then you could consider setting WatcherInputFindExistingFiles to True to ensure that file were processed.

    secondly - I've not really covered c# before but it would appear to be the prefered language SQL2005 onwards. So could you recommend any useful resource where I could start to bring my skill in this area up to date

    SSIS 2005 only supports VB.net. Microsoft heard the calls from their customers though and added support for C# to SSIS 2008, which I was very happy about since my background was in C# and converting language skills back and forth was a nuisance.

    Here is a great set of resources I wish were around when I was getting started that will help you get started with C# as a programming language, using Visual Studio. Working with C# as a stand-alone language in Visual Studio is slightly different than working with it as a scripting language in SSIS, however the generic C# language skills learned from these lessons will immediately translate to working with C# in SSIS.

    Channel 9 > C# Fundamentals: Development for Absolute Beginners

    Thanks for reading, and posting!

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Stan Kulp-439977 (10/25/2012)


    This is a great idea, but in my particular case our sysadmin will not allow the dot net framework to be installed on his SQL Servers because of perceived security vulnerabilities. Is he right, or are there valid arguments I can make to have it installed?

    Well, your sysadmin may not realize it, but chances are the .NET Framework is already installed on the server hosting SQL Server. The Framework is pre-installed on all newer versions of Windows, and is required for the SQL Server installer to function since at least SQL Server 2008 (and maybe SQL Server 2005). I have heard rumors that once SQL Server is installed you can uninstall the .NET Framework, however that may lead to unintended consequences and I am not sure that is a supported configuration.

    As for security concerns, having the .NET Framework installed on a server is not typically considered a security threat in and of itself but anything you have installed on a server increases the attackable surface area that you need to protect against so your sysadmin is right to consider all possibilities.

    It's also possible your sysadmin may have been referring to the SQLCLR, and not necessarily the .NET Framework itself. While the .NET Framework is installed on the operating system, SQLCLR is an option that must be enabled within the database engine that leverages the .NET Framework. The SQLCLR feature is disabled by default within SQL Server and while it is not typically considered a security threat in and of itself, as I said, anything that is enabled (or installed) is worth investigating as a potential security threat to ensure your systems are properly secured.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Doh. 2008 is what I meant! 🙂

    Thanks for the resource guide though and also the suggested solution for multiple files.

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

  • nice work...any way you could apply this technique to a SharePoint list view? I use them frequently as a 'queue' for gathering information and then use SSIS (with the CodePlex SharePoint List Adapters) to process into SQL...it would be great to have a near-realtime processing technique like this!

  • Brandt Smith (10/25/2012)


    nice work...any way you could apply this technique to a SharePoint list view? I use them frequently as a 'queue' for gathering information and then use SSIS (with the CodePlex SharePoint List Adapters) to process into SQL...it would be great to have a near-realtime processing technique like this!

    Thanks. The real-time aspect of this solution relies on the FileSystemWatcher class that is included in the .NET Framework. I did some searching but did not find that anyone had created a SharePointListWatcher class, unfortunately. I am not a SharePoint developer but in searching I did learn that SharePoint Web Services allow us to access the items in a List from C#. Once you can gain access to the List from within a C# Script Task in SSIS then keeping track of what items you have processed so far and deciding when to process a List item as "new" is only a few lines of code away.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • thanks...I may fiddle around with that a bit...i guess traditionally the soln would be to write a SharePoint event receiver-but then you're customizing SP...I'd rather leave it OOTB as much as possible and use SSIS to be a kind of queue server to pick up requests...in my case from a SP extranet

    SP List Item = Request to do something in SQL

  • Thank you very much for the great article! This certainly goes into the "briefcase". I have one question, which you sort of answered in one of the other comments, but I would like to make sure. How can we make this SSIS package actually run continuously? You mentioned placing the entire code inside the loop WHILE 1=1. So guess we would create a job executing the package, start it and it would run forever. Are there other methods?

Viewing 15 posts - 1 through 15 (of 62 total)

You must be logged in to reply to this topic. Login to reply