bulk insert vs openrowset

  • I intend to import CSV files into SQL Server. Which would be the better option: BULK INSERT or OPENROWSET? Could anyone elaborate on the advantages and disadvantages of each?

    We have a current implementation in which we use BULK INSERT to load files into SQL Server via temp tables before loading them into their actual tables. We do this because the actual tables have other fields that are system-defined rather than coming from the file. We intend to improve this to make the load faster without sacrificing the flexibility to import files of different formats. One option is using OPENROWSET (BULK) to bypass the temp tables, with format files to define the fields. Isn't this the same as using BULK INSERT?
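
    For context, a minimal sketch of the staging pattern described above (the table and column names are hypothetical):

    -- Stage the raw file into a table that matches the CSV layout
    -- (assumes #SalesStaging has already been created to match the file's columns)
    BULK INSERT #SalesStaging
    FROM '\\servername\folder\sales.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)

    -- Then load the actual table, supplying the system-defined columns
    INSERT dbo.Sales (Region, Amount, LoadDate, LoadedBy)
    SELECT Region, Amount, GETDATE(), SUSER_SNAME()
    FROM #SalesStaging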

  • I'm a big fan of SSIS (SQL Server Integration Services). You'll find that you can do a lot more with the data in terms of cleansing, and logging exactly what happened, using this functionality than with the features you are currently considering.

    To create a basic SSIS package you can use the Import/Export Data wizard. Save the package and then make it as complicated as it needs to be to accomplish your task. For example, you could have it watch a directory so that it picks up a file as soon as it arrives, getting the data to the users in the quickest possible time - as opposed to the polling interval of a scheduled job.

    Hope this helps,
    Rich


  • BULK INSERT really only works if you're doing a raw data dump from the CSV file into a table whose structure exactly matches the file.

    OPENROWSET allows you to effectively query the CSV file, so you can select specific columns from it and include a WHERE clause. OPENROWSET can be used with a format file (FORMATFILE) that tells SQL Server the data types, lengths, and column names. That's useful if you don't want all of the data imported, or want it imported in a specific way, giving you more control over the import.
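
    For instance, a minimal sketch (the paths, format file, and column names are hypothetical):

    -- Query the CSV directly; the format file supplies column names and types
    INSERT dbo.Sales (Region, Amount, LoadDate)
    SELECT t.Region, t.Amount, GETDATE()
    FROM OPENROWSET(
             BULK '\\servername\folder\sales.csv',
             FORMATFILE = '\\servername\folder\sales.fmt',
             FIRSTROW = 2
         ) AS t
    WHERE t.Amount > 0

    Note this also sidesteps the staging step: system-defined columns such as LoadDate can be supplied inline in the SELECT.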

  • I'm also a big fan of SSIS. The only worrying thing is that once it's deployed and live in production and something goes wrong, the only way I know to debug it is by opening the package in BIDS. Our DBAs are not skilled enough to do this and would require assistance from developers. It's also not as straightforward to apply a fix and redeploy as it is when the load is implemented as a stored procedure. The package is scheduled to run nightly, and we can't get hold of the developer until morning. Beyond that, I'm not sure of other ways to get the package up and running sooner.

  • <snip>

    For example, you could have it watch a directory so that it picks up a file as soon as it arrives, getting the data to the users in the quickest possible time - as opposed to the polling interval of a scheduled job.

    Could you explain how you do that in some more detail, please? Is the SSIS package set to be in a perpetual loop until it finds the file? I wasn't aware SSIS had a built-in tool for monitoring folders.

    Thanks,

    Rich

  • I'm interested in this too. I would love to use this for log files.

    There is a File Watcher Task available from www.SQLIS.com. Some people have reported memory leaks with it, but I'm not sure whether that's true. Another alternative is to write your own Windows service in VB/C#/whatever that watches the folder and kicks off your SSIS package when it finds a new file. I've done that before, and it has virtually no memory overhead.

    bt




  • Thanks for the link!

    Rich

  • Couldn't that also be done via a SQL Agent job that runs every, say, 30 minutes or 1 hour? Just periodically check whether there is a file and, if so, execute the rest of the SSIS package.
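
    A minimal sketch of that polling check in T-SQL (the path and job name are hypothetical; xp_fileexist is undocumented but long-standing):

    DECLARE @exists int

    -- Returns 1 in @exists if the file is present
    EXEC master.dbo.xp_fileexist '\\servername\folder\sales.csv', @exists OUTPUT

    IF @exists = 1
        EXEC msdb.dbo.sp_start_job @job_name = 'Load CSV Package'  -- the job running the SSIS package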

  • This is an 'it depends' answer.

    You can use either. If all the CSV files are of the same format, you could BULK INSERT to a staging table before transferring to the output tables, which is what I do.

    This is the code I use to transfer the files; beware, it uses a cursor.

    -- Collect the file names in the folder into a temp table
    CREATE TABLE #files
    (
        rowid int IDENTITY(1,1),
        fname varchar(255) NULL
    )

    DECLARE @filename varchar(255),
            @count int,
            @sql nvarchar(500),
            @result int,
            @cmd nvarchar(4000)

    -- DIR /a-d /b lists file names only (no directories, bare format)
    INSERT #files (fname)
    EXECUTE @result = master.dbo.xp_cmdshell 'DIR \\servername\folder /a-d /b'

    IF EXISTS (SELECT * FROM #files WHERE fname = 'File Not Found')
    BEGIN
        PRINT 'No Files Found'
        RETURN 0
    END

    IF @result <> 0
    BEGIN
        PRINT 'Command (DIR) Failed. Result = ' + CAST(@result as varchar)
        RETURN 1
    END

    IF NOT EXISTS (SELECT fname FROM #files WHERE fname IS NOT NULL)
    BEGIN
        PRINT 'Failed to Retrieve Files'
        RETURN 1
    END

    -- Loop over the file names and bulk load each one
    DECLARE filecursor CURSOR FOR
    SELECT fname
    FROM #files
    WHERE fname IS NOT NULL

    OPEN filecursor
    FETCH NEXT FROM filecursor INTO @filename

    WHILE @@FETCH_STATUS = 0
    BEGIN
        BULK INSERT .....
        FETCH NEXT FROM filecursor INTO @filename
    END

    CLOSE filecursor
    DEALLOCATE filecursor
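
    Since BULK INSERT won't accept a variable for the file path, the elided step inside the loop presumably builds dynamic SQL. A minimal sketch of what that step might look like (the staging table name and share path are hypothetical):

    -- BULK INSERT can't take a variable path directly, so build the statement dynamically
    SET @sql = N'BULK INSERT dbo.Staging FROM ''\\servername\folder\'
             + @filename
             + N''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'')'
    EXEC sp_executesql @sql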

    Far away is close at hand in the images of elsewhere.
    Anon.
