Best Practice for Importing a Text File??

  • I'm building an application using VB6 and SQL Server 7. Users will load a text file into SQL Server. There are several methods available; I'm looking for the best solution.

    1. I could call either bcp or TYPE from xp_cmdshell (a rough sketch is below). The sticky part is security: I don't want my users to be SA, and both bulk inserts and xp_cmdshell require the user to be SA.

    2. Build a component that will live in MTS/COM+ to authenticate users. Once a user is authenticated, the COM+ object will log into SQL Server as SA and call bcp via xp_cmdshell.

    3. Same as above, but using SQL-DMO in place of xp_cmdshell and bcp. My guess is that SQL Server gets the same command whether I use DMO to build the command line or simply execute xp_cmdshell from my connection object, but I'm not sure.

    4. Build a DTS package and use the dtsrun command.

    To complicate matters just a bit more, this is an interactive, multi-user application. I don't know how to bcp into a temp table (or maybe you can't). If I create the table in SQL, I'm left using sp_executesql to run dynamic queries in the SP. I *hate* doing that! It defeats the purpose of having a stored procedure. Anyway, if there's a way to bcp into a #table, please let me know.
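
    For reference, option 1 boils down to something like this from T-SQL. Just a sketch; the server, database, table, and file path below are placeholders:

    -- Sketch of option 1: shell out to bcp from T-SQL (all names and paths are placeholders)
    EXEC master..xp_cmdshell
        'bcp MyDb.dbo.ImportStage in C:\Imports\datafile.txt -c -S MYSERVER -T'
    -- -c = character mode, -S = server, -T = trusted connection (or -U/-P for a SQL login)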

    IMHO either there is something I'm not considering, or #2 is the answer.

    Thanks for your time. I look forward to your suggestions.

    John

  • hmmm... interesting problem..

    I like option 4) and have used it many times. The package can be secured, and you can run it any time via DTSRun, or batch it up and load it at another point in time.
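
    For what it's worth, DTSRun itself is a one-liner; it's wrapped in T-SQL here purely for illustration (the server and package name are placeholders), and the same command works from a CmdExec job step or a command prompt:

    -- Sketch: run a saved DTS package by name (server and package name are placeholders)
    EXEC master..xp_cmdshell 'DTSRun /S MYSERVER /E /N "LoadTextFile"'
    -- /S = server, /E = use Windows authentication, /N = package name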

    I gather the application is authenticating the end user, from how I read it? If so, method 2) is also not a bad option. The trick is that changing the SA password will impact your code. Alternatively, use Windows authentication.

    Why not BCP into a generic table that includes some unique identifier (like a batch #) to identify the imported data at a later point in the program? This assumes the file has the same structure each and every time, though.
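
    A rough sketch of that idea, assuming the file lands one raw line per row (all table and column names below are made up):

    -- Sketch: generic staging table keyed by a batch number (all names are placeholders)
    CREATE TABLE ImportStage (
        BatchNo int           NOT NULL,
        RawLine varchar(1000) NULL
    )

    -- bcp won't know the batch number, so one approach is to load into a one-column
    -- scratch table first, then stamp the rows as they're copied across:
    DECLARE @BatchNo int
    SELECT @BatchNo = ISNULL(MAX(BatchNo), 0) + 1 FROM ImportStage

    INSERT INTO ImportStage (BatchNo, RawLine)
    SELECT @BatchNo, data
    FROM ImportScratch    -- placeholder: the one-column table bcp loaded into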


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • You can let users other than SA have access to xp_cmdshell. Not a good idea, but it is an option. Bulk insert has its own role in SQL2K; the user doesn't have to be a member of sysadmin for it to work.
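
    For what it's worth, on 2000 that looks something like this (the login, table, and file path are placeholders):

    -- Sketch (SQL 2000): add the login to the bulkadmin fixed server role,
    -- then BULK INSERT works without sysadmin (INSERT permission on the table is still needed)
    EXEC sp_addsrvrolemember 'MYDOMAIN\ImportUser', 'bulkadmin'

    BULK INSERT MyDb.dbo.ImportStage
    FROM 'C:\Imports\datafile.txt'
    WITH (ROWTERMINATOR = '\n')    -- here: one raw line per row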

    #2 would be OK, but is it worth the effort? I think you're right about #3: shelling is shelling. #4 is sound and fairly easy to maintain.

    I only partially agree about the use of dynamic SQL in procs. Sure, you lose the performance gain of having the sp compiled, but you still keep the advantage of having the code easily editable: you can swap the code for something better later on without breaking your app. Unless your data has the same structure every time, I don't see any other options here.

    Couple of other ideas. One thing I use is a "pick up" folder: I run a job that checks for files with a certain extension and, if any are found, I run a DTS job against them. This means I can run the job using the Agent account rather than the user account. It also serializes server load, so you don't have 18 people loading files at once.

    The other is a hack, but it works. I needed a couple of my users to be able to create databases, set up replication, and a few other things. Rather than give them SA access directly, I built an app that does everything they need (just a couple of textboxes and a go button for them); it logs on using an sa account and password that I lightly encrypted to discourage anyone from browsing the exe. I also restricted the program's use to those same people by checking their login up front.

    Go with #4 if you can.

    Andy

  • From the VB side you have the option of the .AppendChunk method from ADO. I don't use it per se, but I do have several web apps running a component we purchased that does (ASPUpload from Persits Software). Here's the Microsoft page giving a VB example:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/ado270/htm/mdmthappendchunkx.asp

    K. Brian Kelley

    bk@warpdrivedesign.org


  • I run bcp into a generic table:

    -- build and run the CREATE TABLE with the connection's spid appended to the name
    DECLARE @sql nvarchar(4000)
    SET @sql = 'create table tempTable' + CAST(@@SPID AS varchar(10))
             + ' (data varchar(1000))'
    EXEC sp_executesql @sql

    The whole text file gets dumped into the data field. How can I append the batchID in this step? From here, I do an insert into another table that adds the batch_ID and an identity column named Row_Number for sorting purposes.
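
    For clarity, that second step is roughly the following (ImportTarget and the batch value are placeholders; Row_Number is the identity on the target table):

    -- Sketch of the second step: copy the raw rows across, stamping the batch ID
    DECLARE @batchID int
    SET @batchID = 42                  -- placeholder: however the batch number gets assigned

    INSERT INTO ImportTarget (batch_ID, data)
    SELECT @batchID, data
    FROM tempTable123                  -- placeholder: the per-spid table bcp loaded into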


  • I think you're on to something with that "pick up" folder. Is there a way to set up a job to check for files every minute or so? That way, instead of having to use the VB app to kick off the job, the user could just copy the file into the right directory (they have to do this anyway). Everything would run as the Agent account, which should give me the rights I need for xp_cmdshell and bcp.

    I like that idea! I swear, the more I learn about SQL Server, the less I need VB!

  • I think probably the easiest way would be to use the FileSystemObject in your job; that would let you determine whether there are files that match your criteria, then pass the filename to your app, stored proc, whatever. You could also use FSO directly in an sp, but that's more work. Jobs let you use VBScript, so it's easy to use FSO there.

    Remember to think about disk permissions too!

    Andy

  • I do something like this too... I shell out, do a dir /b /o-d, and dump the results into a table so I can look at them. That lets me see every file in the directory and then pick the one I want (this only works on NT). Then I read the data in from that file and dump it to a table. I also use the same trick to look at my backup directories and apply the last known good full, all diffs, and t-log incrementals up to a point in time for a restore.
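
    In case it helps, the directory-listing part is only a few lines (the path below is a placeholder):

    -- Sketch: capture a bare directory listing into a table via xp_cmdshell
    CREATE TABLE #DirList (
        RowID    int identity(1,1),
        FileName varchar(255) NULL
    )

    INSERT INTO #DirList (FileName)
    EXEC master..xp_cmdshell 'dir /b /o-d C:\Imports\*.txt'

    -- /o-d lists newest first; the LIKE filters out NULL/noise rows xp_cmdshell can return
    SELECT TOP 1 FileName
    FROM #DirList
    WHERE FileName LIKE '%.txt'
    ORDER BY RowID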

    Wes

    I'd recommend DTS, with SQL Agent running it every xx minutes.

    One thing to do when you do this is keep a table and when the job starts, check this table for a semaphore (flag). If one is set, then bail. You don't want to have this thing running twice.

    If there is no flag set, then set the flag in your package. From there, you can use an ActiveX task and the FileSystemObject to get a list of all the files and process them.

    Don't forget to lower the semaphore (reset the flag) when you end the package.
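
    A bare-bones version of that check written as T-SQL (for example in an Execute SQL task or a first job step); JobFlags is a made-up single-row table:

    -- Sketch: single-row semaphore table so the load never runs twice at once
    -- (create JobFlags once, with one row where Running = 0)
    UPDATE JobFlags SET Running = 1 WHERE Running = 0
    IF @@ROWCOUNT = 0
        RETURN                         -- flag already raised: another run is in progress

    -- ... pick up and process the files here ...

    UPDATE JobFlags SET Running = 0    -- lower the flag when the run finishes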

    Steve Jones

    steve@dkranch.net

  • New question along the same lines... I'm still importing a text file. Now the debate is over where the file should be during the import. My instincts tell me it should be on the SQL box so it isn't being pulled across the network. My DBAs disagree but haven't convinced me. Can you?

  • Nothing wrong with either approach to me. I would normally try not to do a lot of extra disk access on the server, but this probably wouldn't amount to much. The only advantage I can see to copying it to the server is that you can then rule out network glitches once you start. Either way, you have to deal with permissions.

    I do use a folder on the server to "drop" files for pickup by other members of our IS team, as many as 10-15 a day. No problems, and it's very convenient.

    Andy

  • I was more concerned with I/O performance when bringing in large (40 MB) files. This was my DBA's take:

    <snip>

    SQL Server boxes should only contain database files, database log files, and database backup files. By placing static files on the NT server running SQL Server, you run the risk of a drive filling up and causing the database to fail. You also cause the disk controller and disk drive to add processing time, which is taken away from the database and will cause performance problems. The files that need to be loaded should reside on a file server and be pulled into the database via a SQL statement or DTS job, depending on how you design it.

    </snip>

    ??????????

  • 40 MB is not all that huge, and their points are fair; I'm not sure it's worth arguing about, even! Two things to consider. One is that whether or not you copy the file to the server's disk, you're going to generate a fair amount of disk activity as you process it anyway, both in the db itself and in the transaction log, along with the associated processor and memory that will be required. The other is that you usually run jobs like this off-peak, when you have the capacity anyway.

    You could benchmark it, of course, but getting a usable metric is tricky. Disk space is hardly worth talking about: why would you keep the file after you've processed it successfully? Those kinds of files SHOULD be archived onto a file server or dumped in a folder to be burned to CD or tape. As I mentioned above, I use a drop folder; typically around the middle of the month I archive all the files from the previous month into a zip, then delete the files. The zip stays on the server. Does it have to be on that server? No, it just works for me.

    My advice is that the difference between your way and their way isn't much, and isn't worth any loss of goodwill you might need later!

    Andy
