SSIS File Exists Over Range of Time

  • Certainly that works if the file does not show up within the allocated # of intervals, but what if the file is not the expected size?

    Frequently we get files that are not correct.

    Our tell tale sign that something has gone wrong at the processor is the file is 1k in size, no where near the 15 mb the file usually contains.

    So in addition to the file existing, I check the filesize, I set the lowest value to to the smallest file we have received in the last year, (Less a few mb). Sure it May send a false notification once in while, but its good to understand the when files are out of boundry if you will.

    But for simplicity that surely works.

    Ultimately I wouldn't want to log "Failure" for each retry simply because the file hasn't gotten there yet. I'd rather handle the exception so it exits gracefully in an expected situation.

  • Interesting view on building a "File watch" process. I have built similar solutions but I tend not to use VB to control the delay as I recall the CPU usage shoots up.

    Instead, I use the VB purely to check for the file and return true or false back to the package/ populate a variable.

    I also use this inside a "for loop container" and include a SQL task to process the delay (does not hit the CPU).

    The properties on the "for loop container" are then configured to increment up to a given point,

    i.e. n times the delay in the SQL Task

    and include checks on the EvalExpresion to drop out when the file is found (previously populated variable =true) or when the total delay has been exceeded.

    The work flows continuing from the container can then also check the previously populated variable and act accordingly.

    Simple modifications to the VB to use a variable to hold the file to watch, this package can be called from a parent package, and collecting parent variables, allows the same child package to be called multiple times, e.g. waiting for n number of files to be available.

    I am sure there are another dozen ways to achieve the same goal.

  • Yes, you are right. The Retry Attempts and Retry Interval option from agent job relys on the failure of package execution and it will record every failure of retry in log. So if the source file is existed but is still in the process of being written, the uploading package will be failed anyway. That just takes the step of the job waiting for a next retry and that's it. I think the file size is quite unreliable to be taken as judgement. 1K means an empty file (it's ok to me. that means no new record today), but how big the size means good file and will that be accurate? Actually, I developed a custom task component to check the file existence (similar code but in C##). It can be used in any package that I want to check the file existence. All I need to do is to pass the full path file name as the parameter to that component. That works prety well for me.

  • Great article! This is very useful including the additional tweaks other users added. These steps are working great for a package I created.

  • I believe you can use the FileInfo.OpenWrite() method to check if the file is avaialbe for opening to write (assuming that you will not be able to if something else is writing to it)

    Dim oLoadFile As FileInfo = New FileInfo(cFilePath)

    Dim lComplete As Boolean

    Complete = False

    ...

    'Loop until the file is openable (e.g. FTP or copy complete) or iTimeOut is reached

    While lComplete = False

    lComplete = True

    Try

    oLoadFile.OpenWrite()

    Catch

    lComplete = False

    System.Threading.Thread.Sleep(iCycleTime)

    End If

    i += iCycleTime

    If i > iTimeOut Then

    Exit While

    End Try

    End While

    ...etc

    If it ain't broke, don't fix it...

  • File.IsReadWrite() is a better approach, IMO

  • I have a similar situation, except the files we receive (approx 30 files) are sent more frequently. The source system is scheduled to send the files every :15 throughout the day at :00, :15, :30 and :45. The problem is, depending on the file size and load on the source server, the files can arrive anywhere in that 15 minute window.

    The expectation is that we try to stay as current as possible on this data, so we're checking for and processing updates every 5 minutes. Unfortunately, about 30% of the time, we end up grabbing a file as it's being written and end up with a partial file. We copy it from our FTP/File server to our SQL server for processing. I wish the senders could do a rename on the file after it's been written, but that's not currently an option. Since FTP doesn't lock the file, the Konesan's File Watcher task doesn't do us much good as it's able to open the file at any point as the file is being written.

    I'm contemplating a solution that checks the file sizes after we've robocopied the files to the SQL server to see if the file sizes match. If they do, great, if not, re-run the robocopy script and keep checking until it does.

    Curious what others might do in this situation?

  • michael.groh (3/31/2009)


    It always concerns me when I see an endless loop used as a timing device. It'd seem that having a number of packages running on a server, each using an endless loop as a file watcher, might be detrimental to performance.

    It's very easy to add a "Sleep" statement in the body of the loop to put the thread to sleep for a few seconds or minutes:

    System.Threading.Thread.Sleep(milliseconds)

    I've done the file watcher thing by putting the thread to sleep for 60000 milliseconds (one minute), and counting how many minutes have transpired before failing the task:

    Dim i As int

    While True

    If System.IO.File.Exists(Dts.Variables("FlatFileConnection").Value.ToString)Then

    Exit While

    Else

    System.Threading.Thread.Sleep(60000)

    i += 1

    'Exit after 240 minutes (4 hours):

    If i > 240 Then

    Exit While

    End If

    End If

    End While

    This is much simpler and more efficient code than using a DateTime variable and comparing it's value to Now(). The way the code in this article is written, Now() is called millions and millions of times in the body of the loop, while in the revised loop, the comparison is performed only once a minute, and, at most, 240 times. The revised loop achieves the same objective while requiring a lot fewer CPU cycles.

    I created a DTS and a SSIS Package to check for the existence of a backup file, rename and move the file to an archive folder and a task to delete the files after a given date. It needs to have a timer and when it finds the file it loads it into a Table. That was years ago.

    Does anyone have sample code to do this?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply