Bulk insert of large files (more then 450MB)

  • Hi everyone

    I have to populate a table with a pretty large text file and I'm happily using the very good BULK INSERT statement and this "import" procedure is one of many I have on my DB.

    A series of condition must be met before the parent procedure starts to call all the child procedures and one is (obviously) to check if all needed files are in the import directory.

    From time to time the copy of this big file takes more than usual and the procedure starts with this file still in use by the system (still being copied to the destination directory).

    Is there any technique that I can use to detect if the file is still in use without generating the error and delay the parent import start?

    I'm using SQL Server 2008

    Thanks a lot in advance

  • adminorama (9/11/2012)


    Hi everyone

    I have to populate a table with a pretty large text file and I'm happily using the very good BULK INSERT statement and this "import" procedure is one of many I have on my DB.

    A series of condition must be met before the parent procedure starts to call all the child procedures and one is (obviously) to check if all needed files are in the import directory.

    From time to time the copy of this big file takes more than usual and the procedure starts with this file still in use by the system (still being copied to the destination directory).

    Is there any technique that I can use to detect if the file is still in use without generating the error and delay the parent import start?

    I'm using SQL Server 2008

    Thanks a lot in advance

    Yes. Set a flag.

    Let the process that generates the offending file create a second - even empty - file after it completes creating the file your process need then your process has to check for the existence of the flag-file, if not there sleep for a couple of minutes and try again, if there, do whatever has to be done and delete the flag-file.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Thanks

    The problem is that I don't have any power over the file generation and getting the flag file from the client could take a lot of time.

    I was hoping for a SQL solution, but if there isn't any I'd better try the tecnique you suggested

  • Wandering around the web I found that OLE Automation could be a solution, so I want to share this

    I created a function that returns "0" if the file is free and "1" if it's used by another process or deas not exists

    CREATE FUNCTION fnc_IsFileInUse(@filename VARCHAR(1000))

    RETURNS BIT

    AS

    BEGIN

    DECLARE @FS INT

    DECLARE @OLEResult INT

    DECLARE @FileID INT

    DECLARE @source NVARCHAR(255)

    DECLARE @description NVARCHAR(255)

    DECLARE @flag BIT

    SET @source = 'Exist'

    SET @description = 'Exist'

    EXECUTE @OLEResult = sp_OACreate 'Scripting.FileSystemObject', @FS OUT

    IF @OLEResult <> 0

    BEGIN

    EXEC sp_OAGetErrorInfo NULL, @source OUTPUT, @description OUTPUT

    GOTO displayerror

    END

    EXECUTE @OLEResult = sp_OAMethod @FS, 'OpenTextFile', @FileID OUT, @filename, 1

    IF @OLEResult <> 0

    BEGIN

    EXEC sp_OAGetErrorInfo NULL, @source OUTPUT, @description OUTPUT

    GOTO displayerror

    END

    EXECUTE @OLEResult = sp_OADestroy @FileID

    EXECUTE @OLEResult = sp_OADestroy @FS

    displayerror:

    IF @source IS NULL

    AND @description IS NULL

    BEGIN

    SET @flag = 1

    END

    ELSE

    BEGIN

    SET @flag = 0

    END

    RETURN @flag

    END

    I've tested this function on SQL Server 2008 and SQL Server 2005, it needs OLE Automation to be enabled and the user permission on the included stored procedures, so this might not be a solution suitable for everyone

  • Nice reseach, congrats! Thank you for providing a working solution, I'm sure people will benefit from it.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Yes... at the beginnning of your proc, try to bulk insert the first row from the file into a single wide column on a dummy temp table using the "LASTROW" option set to "1". If you get an "Error 21" (just look for an error), then the file isn't ready. Then you can either exit or set a WAITFOR DELAY and try again. This keeps you from having to do things like turning on OLE Automation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you too Jeff, this can be a good solution too!

  • adminorama (9/12/2012)


    Wandering around the web I found that OLE Automation could be a solution, so I want to share this

    I created a function that returns "0" if the file is free and "1" if it's used by another process or deas not exists

    CREATE FUNCTION fnc_IsFileInUse(@filename VARCHAR(1000))

    RETURNS BIT

    AS

    BEGIN

    DECLARE @FS INT

    DECLARE @OLEResult INT

    DECLARE @FileID INT

    DECLARE @source NVARCHAR(255)

    DECLARE @description NVARCHAR(255)

    DECLARE @flag BIT

    SET @source = 'Exist'

    SET @description = 'Exist'

    EXECUTE @OLEResult = sp_OACreate 'Scripting.FileSystemObject', @FS OUT

    IF @OLEResult <> 0

    BEGIN

    EXEC sp_OAGetErrorInfo NULL, @source OUTPUT, @description OUTPUT

    GOTO displayerror

    END

    EXECUTE @OLEResult = sp_OAMethod @FS, 'OpenTextFile', @FileID OUT, @filename, 1

    IF @OLEResult <> 0

    BEGIN

    EXEC sp_OAGetErrorInfo NULL, @source OUTPUT, @description OUTPUT

    GOTO displayerror

    END

    EXECUTE @OLEResult = sp_OADestroy @FileID

    EXECUTE @OLEResult = sp_OADestroy @FS

    displayerror:

    IF @source IS NULL

    AND @description IS NULL

    BEGIN

    SET @flag = 1

    END

    ELSE

    BEGIN

    SET @flag = 0

    END

    RETURN @flag

    END

    I've tested this function on SQL Server 2008 and SQL Server 2005, it needs OLE Automation to be enabled and the user permission on the included stored procedures, so this might not be a solution suitable for everyone

    To by pass OLE Automation which you would typically want to do anyway you can put code along these

    lines in a SQL CLR Function to test for the readiness of a file to be read:

    try

    {

    using (File.Open(filepath, FileMode.Open, FileAccess.Read, FileShare.None)) { }

    }

    catch (Exception)

    {

    return false;

    }

    return true;

  • OTF (9/13/2012)


    To by pass OLE Automation which you would typically want to do anyway you can put code along these

    lines in a SQL CLR Function to test for the readiness of a file to be read:

    try

    {

    using (File.Open(filepath, FileMode.Open, FileAccess.Read, FileShare.None)) { }

    }

    catch (Exception)

    {

    return false;

    }

    return true;

    Thanks for this approach, I tought about using a CLR function but in that case I would have to deal with CLR authorizations and enabling (pretty much the same to enablig OLE automation) and maintaining an external piece of code.

    Nonetheless it's a valid solution.

  • Hi Everyone

    I just found that in some cases the OLEAutomation check fails, giving me the "green light" when the file is still being copied

    I've gone through the CLR solution with a little twist, this peace of function also tell me if a file is nonexistent

    Here's the c# part

    using System;

    using System.Data.SqlTypes;

    using Microsoft.SqlServer.Server;

    using System.IO;

    namespace IsFileInUse

    {

    public class IsFileInUse

    {

    [SqlFunction()]

    public static SqlInt32 isFileInUse(string fullFileName)

    {

    SqlInt32 FILE_IS_FREE = 0;

    SqlInt32 FILE_IS_IN_USE = 1;

    SqlInt32 FILE_IS_MISSING = 2;

    SqlInt32 intResult = FILE_IS_FREE;

    try

    {

    using (File.Open(fullFileName, FileMode.Open, FileAccess.Read, FileShare.None)) { }

    }

    catch (Exception e)

    {

    if (e.GetType() == typeof(FileNotFoundException))

    {

    intResult = FILE_IS_MISSING;

    }

    else

    {

    intResult = FILE_IS_IN_USE;

    }

    }

    return intResult;

    }

    }

    }

    The SQl bit to register the assembly

    CREATE ASSEMBLY [IsFileInUse]

    FROM 'C:\IsFileInUse.dll'

    WITH PERMISSION_SET = EXTERNAL_ACCESS

    The sql bit to create the function

    CREATE FUNCTION fnc_IsFileInUse(@fullFileName NVARCHAR(MAX))

    RETURNS INT

    EXTERNAL NAME IsFileInUse.[IsFileInUse.IsFileInUse].isFileInUse

    And the example on how to use this function

    DECLARE @intResult INT

    SELECT @intResult = dbo.fnc_IsFileInUse('c:\bigSoonToBeImportedFile.txt')

    IF @intResult = 1

    BEGIN

    --report file in use

    END

    ELSE IF @intResult = 2

    BEGIN

    -- report file missing

    END

    ELSE

    BEGIN

    -- import the file

    END

    Obviously in order to register the assebly and to use it, CLR must be enabled and the right permissions must be set

    EXEC sp_configure 'clr enabled', 1

    GO

    RECONFIGURE WITH OVERRIDE

    GO

    EXEC sp_changedbowner 'sa'

    GO

    ALTER DATABASE databaseNameHere SET TRUSTWORTHY ON

    GO

    I'd like to thank everyone who helped me

  • Unfortunately BULK INSERT does not play nice sometimes so we have to apply a workaround. We cannot CATCH errors from BULK INSERT related to gaining exclusive access to the file straightaway. We must use a proc and a nested TRY/CATCH, i.e. two TRY/CATCH blocks and a context switch to catch one error.

    A variation of Jeff's solution is the one I would go with.

    I would just try loading the file straightaway and catch the exception from the inner proc call. If it is an exception about 'exclusive access' like this ...

    Cannot bulk load because the file "%s" could not be opened. Operating system error code 32(The process cannot access the file because it is being used by another process.).

    ... then wait 5 seconds and try again.

    The technique requires no additional features be enabled which helps you on the security and auditing side and while managing program flow using exceptions is generally frowned upon, checking for exclusive file access is actually one of the few scenarios where it is acceptable. How do you know if a file is ready unless you try opening it? And if you simply test for access and proceed if you get exclusive access, by the time you drop into your processing code something else could have grabbed exclusive access in between. Now, is it likely that someone will grab exclusive access after it initially your particular file becomes available? Maybe not in your scenario but it never hurts to use a known design pattern that simply works.

    Something like this:

    1. create new folder called C:\@

    2. save the text file attached to this post to C:\@3. Open the text file in Excel to block others from gaining exclusive access to the file.

    4. Run this SQL on a local instance:

    USE tempdb;

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.incoming_data')

    AND type IN (N'U') )

    DROP TABLE dbo.incoming_data;

    GO

    CREATE TABLE dbo.incoming_data (identifier VARCHAR(50));

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.load_data_inner')

    AND type IN (N'P', N'PC') )

    DROP PROCEDURE dbo.load_data_inner;

    GO

    CREATE PROC dbo.load_data_inner

    AS

    BEGIN

    SET NOCOUNT ON;

    TRUNCATE TABLE dbo.incoming_data;

    BEGIN TRY

    BULK INSERT dbo.incoming_data

    FROM 'c:\@\1.txt'

    WITH

    (

    FIELDTERMINATOR ='|',

    ROWTERMINATOR =''

    );

    END TRY

    BEGIN CATCH

    -- notice this code will not be hit when an exclusive access error occurs

    RAISERROR('dbo.load_data_inner CATCH',10,1) WITH NOWAIT;

    END CATCH

    END

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.load_data_outer')

    AND type IN (N'P', N'PC') )

    DROP PROCEDURE dbo.load_data_outer;

    GO

    CREATE PROC dbo.load_data_outer

    AS

    BEGIN

    SET NOCOUNT ON;

    WHILE 1 = 1

    BEGIN

    BEGIN TRY

    EXEC dbo.load_data_inner;

    BREAK;

    END TRY

    BEGIN CATCH

    RAISERROR('dbo.load_data_outer CATCH',10,1) WITH NOWAIT;

    DECLARE @error_number INT = ERROR_NUMBER(),

    @error_message NVARCHAR(2048)= ERROR_MESSAGE();

    RAISERROR('%d',10,1,@error_number) WITH NOWAIT;

    RAISERROR('%s',10,1,@error_message) WITH NOWAIT;

    IF (

    @error_number = 4861

    AND @error_message LIKE '%Operating system error code 32(The process cannot access the file because it is being used by another process.).%'

    )

    BEGIN

    RAISERROR('waiting for exclusive access...',10,1) WITH NOWAIT;

    WAITFOR DELAY '00:00:05';

    END

    ELSE

    BEGIN

    -- something we did not expect happened so bubble up the error, i.e. exit this proc

    RAISERROR( @error_message,11,1);

    END

    END CATCH

    END

    END

    GO

    EXEC dbo.load_data_outer

    GO

    SELECT *

    FROM dbo.incoming_data

    GO

    5. Switch to the messages Tab in the SSMS window to see what the code is doing.

    6. When you have had enough close the file from Excel and the code should succeed.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks a lot for this powerful alternative, I cannot test it right now but makes perfect sense and I'll surely give it a try soon.

    I had a very complete overview of the problems thanks to your and other people's answer, this forum is so good!

    Thank you very much

  • opc.three (9/19/2012)


    I would just try loading the file straightaway and catch the exception from the inner proc call. If it is an exception about 'exclusive access' like this ...

    +1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply