SSIS File Exists Over Range of Time

  • Comments posted to this topic are about the item SSIS File Exists Over Range of Time

  • Great, simple example.

    There's a "FileWatcher" task at http://www.sqlis.com that is available for free.

    I have used it (SQL 2005), and love it (for what it does 🙂 ).

    This example, however, gives me some additional coding practices, which I continue to consume as I browse this excellent site.

  • It always concerns me when I see an endless loop used as a timing device. It'd seem that having a number of packages running on a server, each using an endless loop as a file watcher, might be detrimental to performance.

    It's very easy to add a "Sleep" statement in the body of the loop to put the thread to sleep for a few seconds or minutes:

    System.Threading.Thread.Sleep(milliseconds)

    I've done the file watcher thing by putting the thread to sleep for 60000 milliseconds (one minute), and counting how many minutes have transpired before failing the task:

    Dim i As Integer = 0

    While True
        If System.IO.File.Exists(Dts.Variables("FlatFileConnection").Value.ToString()) Then
            Exit While
        Else
            System.Threading.Thread.Sleep(60000) 'sleep for one minute
            i += 1
            'Exit after 240 minutes (4 hours):
            If i > 240 Then
                Exit While
            End If
        End If
    End While

    This is much simpler and more efficient than using a DateTime variable and comparing its value to Now(). The way the code in this article is written, Now() is called millions and millions of times in the body of the loop, while in the revised loop the comparison is performed only once a minute, and, at most, 240 times. The revised loop achieves the same objective while requiring far fewer CPU cycles.

  • There's also a tweak I would make to eliminate a (very small) chance of failure if the file is put out there literally at the last second. Change

    ...

    If Now() >= NewDT Then
        Dts.TaskResult = ScriptResults.Failure
    Else
        Dts.TaskResult = ScriptResults.Success
    End If

    ...

    to

    ...

    If System.IO.File.Exists(Dts.Variables("FlatFileConnection").Value.ToString) Then
        Dts.TaskResult = ScriptResults.Success
    Else
        Dts.TaskResult = ScriptResults.Failure
    End If

    ...

    This way, success depends on the real thing you want to check (i.e., the existence of the file). Otherwise, if the file arrives at the last second, or some other performance issue causes the task to run slowly, you may get a failure even with the file out there.
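
    Putting this together with the sleep loop from the earlier post, a minimal sketch of the combined script body might look like the following (the 240-minute cap and the "FlatFileConnection" variable are carried over from above; treat this as an illustration, not the article's exact code):

    Dim filePath As String = Dts.Variables("FlatFileConnection").Value.ToString()
    Dim i As Integer = 0

    'Poll once a minute, giving up after 240 minutes (4 hours):
    While Not System.IO.File.Exists(filePath)
        System.Threading.Thread.Sleep(60000)
        i += 1
        If i > 240 Then
            Exit While
        End If
    End While

    'Success depends on the file actually being there, not on the clock:
    If System.IO.File.Exists(filePath) Then
        Dts.TaskResult = ScriptResults.Success
    Else
        Dts.TaskResult = ScriptResults.Failure
    End If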

  • Very good points and I like the ideas. Thanks for the input. I will try that out; it should be a big improvement with the items I handle.

  • sknox (3/31/2009)


    There's also a tweak I would make to eliminate a (very small) chance of failure if the file is put out there literally at the last second ... This way, success depends on the real thing you want to check (i.e., the existence of the file).

    The only thing is that I don't want it to keep checking past a certain amount of time, so in your case wouldn't the job keep going until the file is found?

  • Does File.Exists evaluate to True even if the file is still copying?

    If a file is large it can take several minutes to copy, and if the package picks it up too soon, the file will not be read properly.

    The Filewatcher task noted above does a good job of managing this for you.

  • That's a great question. I'm really not sure if the file "exists" while it's being copied to a file server. Our packages work with a lot of really big (>100 megs) files, and FTP'ing them from remote sites can take several minutes.

    It'd require a bit more complicated code, but it should be possible to write the file watcher in such a way that it monitors the file for a minute or so to see if the file's size is changing (I'm assuming a large file "grows" as it's downloaded).

    The System.IO.FileInfo class has a Length property that returns the size (in bytes) of a file. Once the file is "there", get the size, wait a minute or two, and compare the size with the initial value.

    The System.IO.File class that Silver uses in the article has a number of methods that could also be useful: GetCreationTime, GetLastAccessTime, and GetLastWriteTime. I'm not sure which of these change as a file is being written, but I'd bet at least one of them ought to be useful.

    Once it (meaning: size, time, etc.) stops changing, it should be safe to assume the file is stable and ready for processing.
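
    As a rough sketch of that size-stability idea (the one-minute poll interval is just an example, and this assumes the file already exists at the path held in the same "FlatFileConnection" variable):

    Dim filePath As String = Dts.Variables("FlatFileConnection").Value.ToString()
    Dim info As New System.IO.FileInfo(filePath)
    Dim lastSize As Long = -1

    'Wait until the file's size stops changing between polls:
    While True
        info.Refresh() 're-read the size from disk
        If info.Length = lastSize Then
            Exit While 'unchanged since the last poll; assume the transfer is done
        End If
        lastSize = info.Length
        System.Threading.Thread.Sleep(60000)
    End While

    A production version would also want the same sort of timeout counter used in the polling loop above, so it can't wait forever.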

  • sknox (3/31/2009)


    There's also a tweak I would make .....

    ...

    If System.IO.File.Exists(Dts.Variables("FlatFileConnection").Value.ToString) Then
        Dts.TaskResult = ScriptResults.Success
    Else
        Dts.TaskResult = ScriptResults.Failure
    End If

    ...

    That's a real nice upgrade to any package. Seriously, most SSIS packages run unattended, and anything we (as developers) can do to enhance the reliability of our packages -- and reduce support calls at the same time! -- is a real good idea! Sometimes a line or two of additional code can eliminate a lot of trouble later on!

  • I think the comments above are right: the sleep command is required, and so is the file check logic after the loop.

    Suggestion 3.

    The script putting the file onto the server should copy it to a separate folder and, after the copy completes, use a 'rename' command run on the target server to 'move' the file into the target folder.

    It is very unlikely for a rename command to be interrupted mid-execution, so when SSIS goes to pick up the file, it will be completely written.
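
    As a minimal sketch of that staging pattern, assuming the producing script happened to be .NET (the folder and file names are made up for illustration; the same idea works with a shell rename/move command):

    'Copy or FTP the file into a staging folder first, then rename/move it
    'into the folder the package polls. File.Move is a rename when source
    'and target are on the same volume, so the file only appears once complete.
    Dim stagingPath As String = "D:\staging\data.csv"
    Dim targetPath As String = "D:\incoming\data.csv"
    System.IO.File.Move(stagingPath, targetPath)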

  • It seems like another issue to consider is what if the file exists but is still in the process of being written, such as a file that is FTP'd from another system. In cases like that, you probably want to check not only that the file exists but that it can also be opened, using File.Exists in combination with File.OpenRead.
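
    A minimal sketch of that check (again assuming the path is held in the same "FlatFileConnection" variable) could look like this:

    Dim filePath As String = Dts.Variables("FlatFileConnection").Value.ToString()
    Dim fileReady As Boolean = False

    If System.IO.File.Exists(filePath) Then
        Try
            'An in-progress transfer usually still holds the file open,
            'so OpenRead throws an IOException until the writer is done.
            Using stream As System.IO.FileStream = System.IO.File.OpenRead(filePath)
                fileReady = True
            End Using
        Catch ex As System.IO.IOException
            fileReady = False 'still locked; try again later
        End Try
    End If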

  • Truckin (3/31/2009)


    Suggestion 3.

    The script putting the file onto the server should copy it to a separate folder and, after the copy completes, use a 'rename' command run on the target server to 'move' the file into the target folder.

    It is very unlikely for a rename command to be interrupted mid-execution, so when SSIS goes to pick up the file, it will be completely written.

    I agree with the above statement 100 percent. The safest way to work with large data files is to copy them to a staging directory and then execute a move to the directory being polled by the package. I have never once had a contention or timing issue processing files in this manner and doing so removes the need for most of the other overhead.

  • Agreed. I used this exact approach of moving from a stage directory to an incoming directory in a recent J2EE implementation (it works as long as the directories are on the same disk partition, so the move is a rename rather than a copy). However, in the case of a file being "pushed" to another system such as via FTP, something must monitor the file and know when it has been completely transferred before performing the move, or errors can result. I guess my overall statement is: be sure to consider the transfer latency SOMEWHERE in your architecture. 🙂

  • V-Man (3/31/2009)


    ... guess my overall statement is be sure to consider the transfer latency SOMEWHERE in your architecture. 🙂

    For sure! The good news in all of this is that the .Net base classes provide the tools needed to perform these validations. At work we've had numerous cases where the original developer used only small test cases, and their packages failed after weeks or months of running because of those "transfer latencies" that are bound to occur eventually. And, very often these failures look like FTP or network or file server issues, and not something as simple as overly long file transfer times.

    Good discussion on this short article. I've enjoyed reading input from others who've had to deal with stuff like this.

  • This is great, but I think there is another solution that may be even simpler, with no coding required. I have a set of csv files that need to be uploaded to our production database every weekday. I created a set of packages to load these files and run them step by step in a scheduled SQL Server Agent job. The first step runs the first package to load the first csv file on schedule; on the job step's Advanced page, I set the Retry Attempts and Retry Interval options to cover the expected range of time. For example, you can set Retry Attempts to 4 and Retry Interval to 30 minutes to cover two hours. So, if the csv file is not at the location on schedule, the first step will fail and be retried up to 4 times within the designed two hours.

    Charley lou, MCSD

    DBA

    GMP Securities L.P.
