Binary Files Import and Export

  • Here is what I did.

    create table data(data varchar(max));

    INSERT INTO data(data)

    SELECT * FROM

    OPENROWSET(BULK 'D:\docs\DB\oracle.txt', SINGLE_BLOB) as f ;

    declare @cmd varchar(512);

    SET @cmd =

    'bcp "SELECT data FROM woof.dbo.data" queryout c:\1.txt -T -c -C 1251';

    EXEC master..xp_cmdshell @cmd;

    Even the Russian characters were preserved perfectly whithin the text file and no jarbled ones.

  • Errrrrr this line is too long. You could've shown me at least the code

    of the create table I would've believed you.

    I've just tried even to dump the mp3 and pdf files with this code

    I haven't noticed any corruption.

    Show me please the output of your

    xp_msver.

  • Errrrrr this line is too long. You could've shown me at least the code

    of the create table I would've believed you.

    I've just tried even to dump the mp3 and pdf files with this code

    I haven't noticed any corruption.

    Show me please the output of your

    xp_msver.

    OK - the Create table and insert is as follows:

    [highlight="#FFFFCC"]CREATE TABLE TestFiles (

    FileId bigint IDENTITY(1,1),

    FileDesc nvarchar(100),

    FileData varbinary(max)

    )

    INSERT INTO TestFiles(FileDesc, FileData)

    SELECT

    'This is my uploaded file',

    (SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_BLOB) as f);[/highlight]

    When I bcp using your latest suggestion the output file contains ascii characters that are the hex representation of those ascii characters. So if the original file began with the characters 'pti', the output file begins '707469'. Hence the output file is exactly twice as long as the source file except that it has a carriage return line feed combination immediately after the strangely formatted data.

    Output from xp_msver is as follows:

    [highlight="#FFFFCC"]IndexNameInternal_ValueCharacter_Value

    1ProductNameNULLMicrosoft SQL Server

    2ProductVersion5898249.00.1399.06

    3Language1033English (United States)

    4PlatformNULLNT INTEL X86

    5CommentsNULLNT INTEL X86

    6CompanyNameNULLMicrosoft Corporation

    7FileDescriptionNULLSQL Server Windows NT

    8FileVersionNULL2005.090.1399.00

    9InternalNameNULLSQLSERVR

    10LegalCopyrightNULL© Microsoft Corp. All rights reserved.

    11LegalTrademarksNULLMicrosoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

    12OriginalFilenameNULLSQLSERVR.EXE

    13PrivateBuildNULLNULL

    14SpecialBuild91684864NULL

    15WindowsVersion2483819575.2 (3790)

    16ProcessorCount44

    17ProcessorActiveMask150000000f

    18ProcessorType586PROCESSOR_INTEL_PENTIUM

    19PhysicalMemory81918191 (8588959744)

    20Product IDNULLNULL[/highlight]

    And yes I know I'm running this on a SQL installation that hasn't had the service packs installed yet.

  • Robert (8/29/2008)


    Errrrrr this line is too long. You could've shown me at least the code

    of the create table I would've believed you.

    I've just tried even to dump the mp3 and pdf files with this code

    I haven't noticed any corruption.

    Show me please the output of your

    xp_msver.

    OK - the Create table and insert is as follows:

    [highlight="#FFFFCC"]CREATE TABLE TestFiles (

    FileId bigint IDENTITY(1,1),

    FileDesc nvarchar(100),

    FileData varbinary(max)

    )[/highlight]

    First of all I would strongly reccomend you to use

    the varchar(max) type for the text files.

    [highlight="#FFFFCC"]INSERT INTO TestFiles(FileDesc, FileData)

    SELECT

    'This is my uploaded file',

    (SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_BLOB) as f);[/highlight]

    Try to use (SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_CLOB) in conjunction with the varchar(max)

    When I bcp using your latest suggestion the output file contains ascii characters that are the hex representation of those ascii characters. So if the original file began with the characters 'pti', the output file begins '707469'. Hence the output file is exactly twice as long as the source file except that it has a carriage return line feed combination immediately after the strangely formatted data.

    Ok looks like by default the bcp dumps the file using the UTF8 that what makes your

    file twice as big - for each character it takes two bytes although you have to use -w

    option in order to dump the text file using Unicode characters instead of -N for binaries.

    So try to use e.g.

    bcp "select field from yourdb.table" -T -c -C 1250.

    Output from xp_msver is as follows:

    [highlight="#FFFFCC"]IndexNameInternal_ValueCharacter_Value

    1ProductNameNULLMicrosoft SQL Server

    2ProductVersion5898249.00.1399.06

    3Language1033English (United States)

    4PlatformNULLNT INTEL X86

    5CommentsNULLNT INTEL X86

    6CompanyNameNULLMicrosoft Corporation

    7FileDescriptionNULLSQL Server Windows NT

    8FileVersionNULL2005.090.1399.00

    9InternalNameNULLSQLSERVR

    10LegalCopyrightNULL© Microsoft Corp. All rights reserved.

    11LegalTrademarksNULLMicrosoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

    12OriginalFilenameNULLSQLSERVR.EXE

    13PrivateBuildNULLNULL

    14SpecialBuild91684864NULL

    15WindowsVersion2483819575.2 (3790)

    16ProcessorCount44

    17ProcessorActiveMask150000000f

    18ProcessorType586PROCESSOR_INTEL_PENTIUM

    19PhysicalMemory81918191 (8588959744)

    20Product IDNULLNULL[/highlight]

    And yes I know I'm running this on a SQL installation that hasn't had the service packs installed yet.

    Nasty 🙂

  • First of all I would strongly reccomend you to use the varchar(max) type for the text files.

    For the purposes of experimentation I did so but really I'm looking for a generic solution to store and retrieve any kind of file to and from a database.

    Try to use (SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_CLOB) in conjunction with the varchar(max)

    I did this with the amended table structure.

    Ok looks like by default the bcp dumps the file using the UTF8 that what makes your

    file twice as big - for each character it takes two bytes although you have to use -w

    option in order to dump the text file using Unicode characters instead of -N for binaries.

    So try to use e.g.

    bcp "select field from yourdb.table" -T -c -C 1250.

    The output file was 2 bytes larger than the input file and those two bytes were an extra carriage return line feed combination on the end of the output file.

    I'm beginning to wish I'd never started this!

  • Another option is to use CLR, even though not everyone is a fan of that option. The SQL# library that is available at http://www.SQLsharp.com/ has functions to do this. Since those functions are not available in the free version, I can say that you can accomplish this by using the BinaryReader and BinaryWriter classes and the SqlBytes datatype to map to VARBINARY(MAX).

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Another option is to use CLR, even though not everyone is a fan of that option. The SQL# library that is available at http://www.SQLsharp.com/ has functions to do this. Since those functions are not available in the free version, I can say that you can accomplish this by using the BinaryReader and BinaryWriter classes and the SqlBytes datatype to map to VARBINARY(MAX).

    Thanks I'll try that.

    Regards

    Robert

  • Robert (8/28/2008)


    I'm also having trouble implementing this without corruption of the file after it's undergone a round trip. I am importing a text file using OPENROWSET and as far as I can see that arrives in the varbinary field perfectly. The query:

    SELECT DATALENGTH(FileData) FROM TestFiles

    returns a number exactly equal to the test file size.

    However when I output it back to a file using the options you specify for bcp, I get a file that is 4 bytes longer than the original. The 4 extra bytes are right at the beginning of the file before the expected data starts and are as follows (in hex):

    B5 42 00 00

    Any ideas?

    OK. Here is the answer for this I've done some research during the spare time.

    The research is called RTFM 🙂

    The article here called

    http://msdn.microsoft.com/en-us/library/ms191212.aspx

    The MSSQL2005-2008 create the 8 byte prefix at the beginning of the file

    during the bulk export. To avoid it - use the .fmt file.

    My bcp.fmt file looks like this:

    9.0

    1

    1 SQLBINARY 0 0 "" 1 data ""

    Then the bcp command :

    bcp "SELECT data FROM testdb.dbo.data WHERE file_id=1" queryout 0.dat -T -f bcp.fmt

    That would be the solution.

  • Great - that was indeed the answer EXCEPT that I had to have 8.0 at the top of the format file rather than 9.0 (my output file had a 4 byte prefix rather than 8 bytes).

    Thanks for your help.

    Robert

  • EDIT: Wow... This was boneheaded of me. I didn't see there were two other pages after the first, in which DE's questioned were eventually answered. Leaving the rest of my post here since I can't delete it outright...

    Apologies for the necroposting, but this thread came up a handful of times when I was googling around trying to find a solution to the very same problem.

    In short, DeafEater's problem is real, not imagined. I've had the same problem, and it doesn't matter whether I'm connecting to SQL Server 2000, 2005, or 2008. It doesn't matter what version of the client tools I'm using.

    In my case, I'm importing and exporting binary files, mostly ZIPs. I'm importing files into the table like so:

    CREATE TABLE FileStore (

    [id] [int] IDENTITY(1,1) NOT NULL,

    [filename] [varchar](255) NULL,

    [created] [datetime] NULL,

    [filedata] [varbinary](max) NULL,

    )

    insert FileStore (filename, created, filedata) select '$(_FILENAME)', GETDATE(), BulkColumn from Openrowset ( Bulk '$(_FILENAME)', Single_Blob) as filedata

    Originally, I exported ZIP files from the database with:

    bcp "select top 1 filedata from dbiutil.dbo.FileStore order by created desc" queryout test01.zip -S fssrv -T -n

    Which appeared to work fine, but test01.zip was always four bytes larger than it should be- and WinZip wasn't able to open it. Later, I figured out that Info-ZIP (zip.exe and unzip.exe) and 7zip (7za.exe) had no problems extracting the ZIP. However, Info-ZIP gives this warning:

    Archive: test0.zip

    warning [test0.zip]: 4 extra bytes at beginning or within zipfile

    (attempting to process anyway)

    creating: test01/

    inflating: test01/a

    inflating: test01/b

    Naturally, this stumped me. After running into a few forum threads here and elsewhere, I tried doing the BCP without the -n (native type) switch...

    c:\bcptesting>bcp "select top 1 filedata from dbiutil.dbo.FileStore order by created desc" queryout test02.zip -S fssrv -T

    Enter the file storage type of field filedata [image]:

    Enter prefix-length of field filedata [4]: 0

    Enter length of field filedata [0]:

    Enter field terminator [none]:

    Do you want to save this format information in a file? [Y/n] y

    Host filename [bcp.fmt]:

    Starting copy...

    1 rows copied.

    Network packet size (bytes): 4096

    Clock Time (ms.): total 6000

    This time, test01.zip had no extra bytes, and I was able ot open it in WinZip, Info-ZIP, and 7zip without errors or warnings. Indeed, the md5sum hash of the test02.zip that I exported with this method was the same as the original file.

    It should be obvious to anyone reading this that those four extra bytes is the "prefix-length of field filedata" mentioned above. By specifically telling bcp to use 0 things are imported and exported without any corruption or added bytes. Since running through this interactively is hardly what we want, just take the resulting bcp.fmt and use it for any future exports...

    c:\cbtesting>bcp "select top 1 filedata from dbiutil.dbo.FileStore order by created desc" queryout test03.zip -S fssrv -T -f bcp.fmt

    Starting copy...

    1 rows copied.

    Network packet size (bytes): 4096

    Clock Time (ms.): total 5922

    Again, test03.zip has the same md5sum hash as the original file and test02.zip.

    It's a bit tedious to have to ship an extra file like this, but at least there's some solution. It works the same way no matter what version of SQL Server you're using from 2000 on; same goes for the client tools. Microsoft forgot to add a parameter for bcp.exe for specifying the "prefix-length of field filedata," forcing you to run through the manual process once first.

    BCP doesn't have a bug, it's just poorly designed. 😛

    Aaron

  • Hey,

    I'm having the exact same problem and just posted a detailed description at MSDN forums yesterday [/url] Unfortunately, I haven't gotten any answers there yet.

    I found a posting this morning that suggested the problem is that the extra characters are because the BCP QUERYOUT utility is adding the file length at the beginning of the VARBINARY field contents before reading it in the external file. The solution is supposed to be creating format files, though I don't know if you need a different file for each file type (PDF, XLS, DOC, etc.) Here's the posting

    I'll post something back if I can get this to work.

    Good luck,

    Bill

  • Hi,

    I'm having the exact same problem. Here's my post on it over in MSDN forums [/url]

    I found some postings elsewhere that suggested the problem is the BCP QUERYOUT is adding the file length to the beginning of the contents--those are the extra characters [/url]

    Apparently the solution is to specify a format file during your output that stops the utility from adding the file length data. I'm going to play with that in the next day or two and will post back if I find something.

    Good luck,

    Bill

  • You should read the whole thread first because it has been answered years ago.

  • Hey, thanks. Yes, the interface was very confusing. I responded twice because I was Replying to the post at the bottom of Page 1 and it kept refreshing without showing my reply appended the post I was replying to. I will study the second and third pages carefully. Cheers, Bill

  • Off topic, but I've been playing with R on and off for the last 6 months and it's a lot better for this type of thing than SQL. You can write a function to read every file in a directory or one at a time and analyze them.

    Last week I even found it easier to use R to analyze some MSDB data than doing it through SQL

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply