Downloading files for Import?

  • I need to automate downloading some IRS zip files from an IRS web site on a monthly basis to import into a SQL database and I'm wondering what the standard and/or best way to do this is. I am specifically asking about the automated donwloading, not the importing (I know how to do that).

    Surprisingly (to me anyway) I haven't actually done automated downloads for a SQL process in almost a decade, I've always gotten involved in these situations after a customer already had a way to automatically deliver the files to a share, folder, etc to help them with the Import.

    Ten years or so ago, the way I did it was to run a BAT file script that used some primitive FTP utility to download the file, and then PKZIP or something similar to unzip it. But even then, that seemed kludgy and archaic. Today it strikes me as downright antediluvian, and I can't help thinking that there must be a better/easier approach.

    How is this typically done? If there are multiple ways, is any better/easier/more advantageous for a SQL Import? I am open to suggestions, so anything that is straight from SQL, built-in tools/utilities, DOS commands, even .Net programming. I would also use SSIS if it has a way of doing this easily, but I am not using that for the Import, so there's no built-in synergy there.

    If you want to look at the smallest example of one of these files, its here (1.9MB Zip file). The IRS site to download all of these files is here: http://www.irs.gov/uac/SOI-Tax-Stats---Exempt-Organizations-Business-Master-File-Extract-(EO-BMF)

    Thanks, ...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks for the suggestion, arbarona. However, this is not quite what I am asking for.

    This link explains how to write a custom script in SSIS to use either the System.Net.WebClient object, or the SSIS HttpClientConnection objects to read the file from a website and write it back out to a Flat File. I suppose I could scavenge the VB/C# script there to write my own command line utility, but I'm going to hold that as a last resort since this seems messy and will require a lot of code-testing because 1:I'm not downloading a text file, but a ZIP of a text file, and 2:its just a demo snippet, not a finished solution which means that I'll have to develop all of the error-handling and exception conditions for it.

    I suppose that what I'm hoping for is a command-line tool that will just download and unzip it to a file or table, and handle/report any errors appropriately.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • {oops}

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • SSIS has an FTP object that can be used to automatically download the files. Won't work if it's FTPS, but will for FTP. Will that work for you? It can also call the shell and run an unzip command as if it were the command-line.

    I have a process that outputs, encrypts (via command-line), and then FTP uploads a bunch of files every night. Do the whole thing in a pretty simple SSIS package. Not the same direction as what you're doing, and encryption instead of unzipping, but it's just the same actions in the oposite direction.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I found this solution.

    I tested and works OK!, since the files are the same every month you can use variables to hold the file names.

    take at look

    http://www.sqlis.com/post/Downloading-a-file-over-HTTP-the-SSIS-way.aspx

    to unzip is easy

    Yo need winzip on the SSIS Machine.

    You need an Execute Process Task in your SSIS

    Configure as follow

    Executable: C:\Program Files (x86)\WinZip\WINZIP32.EXE

    Arguments:-e -o "FilefromIRS.zip"

    Working Directory: "UNCpath"

    I can email you a sample about the unzip.

    Regards

  • GSquared (9/11/2012)


    SSIS has an FTP object that can be used to automatically download the files. Won't work if it's FTPS, but will for FTP. Will that work for you? It can also call the shell and run an unzip command as if it were the command-line.

    I have a process that outputs, encrypts (via command-line), and then FTP uploads a bunch of files every night. Do the whole thing in a pretty simple SSIS package. Not the same direction as what you're doing, and encryption instead of unzipping, but it's just the same actions in the oposite direction.

    Thanks, Gus. I really don't want to jump this into SSIS for this unless its dead easy and reliable. This site does not already use SSIS, so I would have to go through the whole deal of setting it up for them, along with standards for logging, reporting, etc. Plus, I already have the SQL import procedures and structures almost done.

    I have been looking around today and it does seem like there are some much better command-line tools for downloading today than there were 10 years ago. Some (like CoreFTP) are native Windows (and not a crummy Unix port) and will do both the download and the unzip at the same time, all from a single DOS command line (avoiding all of that awful FTP conversation-script dreck from years ago). Plus they have a free version, so that's the way I'm leaning at the moment.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (9/11/2012)


    GSquared (9/11/2012)


    SSIS has an FTP object that can be used to automatically download the files. Won't work if it's FTPS, but will for FTP. Will that work for you? It can also call the shell and run an unzip command as if it were the command-line.

    I have a process that outputs, encrypts (via command-line), and then FTP uploads a bunch of files every night. Do the whole thing in a pretty simple SSIS package. Not the same direction as what you're doing, and encryption instead of unzipping, but it's just the same actions in the oposite direction.

    Thanks, Gus. I really don't want to jump this into SSIS for this unless its dead easy and reliable. This site does not already use SSIS, so I would have to go through the whole deal of setting it up for them, along with standards for logging, reporting, etc. Plus, I already have the SQL import procedures and structures almost done.

    I have been looking around today and it does seem like there are some much better command-line tools for downloading today than there were 10 years ago. Some (like CoreFTP) are native Windows (and not a crummy Unix port) and will do both the download and the unzip at the same time, all from a single DOS command line (avoiding all of that awful FTP conversation-script dreck from years ago). Plus they have a free version, so that's the way I'm leaning at the moment.

    Since the shop does not all ready have SSIS i would look to avoid that unless they're looking to upgrade their entire ETL process library. (which i did in my current job) when i have a process like this i tend to like purpose built steps so if something better comes along (better ftp client) i only have to change one step instead of an entire ssis package.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • RBarryYoung (9/11/2012)


    GSquared (9/11/2012)


    SSIS has an FTP object that can be used to automatically download the files. Won't work if it's FTPS, but will for FTP. Will that work for you? It can also call the shell and run an unzip command as if it were the command-line.

    I have a process that outputs, encrypts (via command-line), and then FTP uploads a bunch of files every night. Do the whole thing in a pretty simple SSIS package. Not the same direction as what you're doing, and encryption instead of unzipping, but it's just the same actions in the oposite direction.

    Thanks, Gus. I really don't want to jump this into SSIS for this unless its dead easy and reliable. This site does not already use SSIS, so I would have to go through the whole deal of setting it up for them, along with standards for logging, reporting, etc. Plus, I already have the SQL import procedures and structures almost done.

    I have been looking around today and it does seem like there are some much better command-line tools for downloading today than there were 10 years ago. Some (like CoreFTP) are native Windows (and not a crummy Unix port) and will do both the download and the unzip at the same time, all from a single DOS command line (avoiding all of that awful FTP conversation-script dreck from years ago). Plus they have a free version, so that's the way I'm leaning at the moment.

    Well, I stand corrected. It turns out that CoreFTP is an even bigger pile of excrement than the previous FTP tools that I have used. Oh well, maybe I can get WGet to work...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Just to clarify, you don't seem to be looking for an FTP client, you're looking for a client that will do an http download and extract a zip file?

    I'd personally just use a powershell script (you could write a .Net app, but powershell is at least more transparent to support after it's implemented).

    There's an article here with an example, it's only a few lines of very simple code:

    http://bartdesmet.net/blogs/bart/archive/2006/11/25/PowerShell-_2D00_-How-to-download-a-file_3F00_.aspx

    The required error handling is already in the DownloadFile method. Then you can either continue to extract the file in Powershell, or have a separate command line step to unzip.

  • At my employer, we have a couple of methods, depending on site in question. One method was written up by our developers and has a webservice call that goes out every day searching for the files. This method uses a table in SQL Server that we put in the file names, the host FTP server, and when we expect these files to be available.

    Another method we use is the SSIS / FTP method for jobs that we run on demand.

    RoboFTP is setup to deal with a lot of files, but that is a push-to-us methodology and not what you want. I don't know if RoboFTP can deal with pulling down files or not.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • RBarryYoung (9/11/2012)


    GSquared (9/11/2012)


    SSIS has an FTP object that can be used to automatically download the files. Won't work if it's FTPS, but will for FTP. Will that work for you? It can also call the shell and run an unzip command as if it were the command-line.

    I have a process that outputs, encrypts (via command-line), and then FTP uploads a bunch of files every night. Do the whole thing in a pretty simple SSIS package. Not the same direction as what you're doing, and encryption instead of unzipping, but it's just the same actions in the oposite direction.

    Thanks, Gus. I really don't want to jump this into SSIS for this unless its dead easy and reliable. This site does not already use SSIS, so I would have to go through the whole deal of setting it up for them, along with standards for logging, reporting, etc. Plus, I already have the SQL import procedures and structures almost done.

    I have been looking around today and it does seem like there are some much better command-line tools for downloading today than there were 10 years ago. Some (like CoreFTP) are native Windows (and not a crummy Unix port) and will do both the download and the unzip at the same time, all from a single DOS command line (avoiding all of that awful FTP conversation-script dreck from years ago). Plus they have a free version, so that's the way I'm leaning at the moment.

    I use CoreFTP all the time. Great tool.

    Edit: What problems are you running into with it?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Have you considered using Powershell? I'm presuming you have it on the server, so you could craft a Powershell script to grab the file(s), then unzip them. Once they're unzipped, you could then import them into SQL using whatever method you prefer.

    Below is a series of functions for Powershell for FTPing files around:

    http://gallery.technet.microsoft.com/scriptcenter/PowerShell-FTP-Client-db6fe0cb

    And this has some code for zipping and unzipping files in Powershell:

    http://blogs.msdn.com/b/daiken/archive/2007/02/12/compress-files-with-windows-powershell-then-package-a-windows-vista-sidebar-gadget.aspx

  • HowardW (9/12/2012)


    Just to clarify, you don't seem to be looking for an FTP client, you're looking for a client that will do an http download and extract a zip file?

    Yes, I think that's a good description, Howard. I don't think that I understood at the beginning that FTP alone was no longer sufficient for most of the downloads, because most sites only seem to support HTTP now. The unzip doesn't have to be built-in because I know how to do that separately, but it would be convenient.

    I'd personally just use a powershell script (you could write a .Net app, but powershell is at least more transparent to support after it's implemented).

    There's an article here with an example, it's only a few lines of very simple code:

    http://bartdesmet.net/blogs/bart/archive/2006/11/25/PowerShell-_2D00_-How-to-download-a-file_3F00_.aspx

    The required error handling is already in the DownloadFile method. Then you can either continue to extract the file in Powershell, or have a separate command line step to unzip.

    Thanks, Howard. I do want to learn Powershell, but I've been putting it off. I'll take a look at this link..

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Brandie Tarvin (9/12/2012)


    At my employer, we have a couple of methods, depending on site in question. One method was written up by our developers and has a webservice call that goes out every day searching for the files. This method uses a table in SQL Server that we put in the file names, the host FTP server, and when we expect these files to be available.

    ...

    Hmmm, I like the sound of that. Any possibilty that I could get a copy of the code??? .... :-D:-D:-D

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply