saving PDF or word document to SQL server database

  • tomorrow I have just been given a trial period with a small company

    I have used MS Access with object field to store objects in, and have just stored the link to the file.

    With the job tomorrow, I have to save the pdf/word link to the database and make it so that the link can be changed. They said the file they want also saved as hexadecimal format? Is this a good thing to do.

    And if storing the whole file in the database would be a viable option, how would I do that?

    Please give me a complete example that I can hopefully follow to complete their task, that then I can at least have some idea of what steps need to be completed, as of course no 2 examples would be exactly the same with different names for different fields, servers etc.

  • This was removed by the editor as SPAM

  • With files you basically have two options.

    1.  put the file in the db

    2.  store the file on the disk and store the silename in the db.

    Storing files in a db has pros and cons.

    pros

    you get the security of the sql server

    All files are backed up when the db is backed up

    cons

    SQL server is optimized to serve data.  File servers are optimized for serving files

    Backups are slow because you are backing up all of the files everytime.  If you store the files on the disk, you can backup the db then do a backup of the folder of only changed files

    I hope this helps

  • here's a real basic example from vb6: hope it helps. you don't save the file in any format...you actually save the file, in binary format, within the database using the code below.

    to extract it, it's basically the same.

    dim Conn as ADODB.Connection

    dim RS as ADODB.RecordSet

    dim binObj as ADODB.Stream

    Conn.ConnectionString="Provider=SQLOLEDB;Persist Security Info=False;User ID=sa;Password=;Initial Catalog=khc405;Network Library=dbmssocn; Data Source=db1;"

    Conn.Open

    Set RS = New ADODB.Recordset

                                        sql = "SELECT * FROM SOMETABLE WHERE FILENAME='HOWTO.PDF'

                                        RS.Open sql, GLBcn, adOpenDynamic, adLockOptimistic

                                        If Not (RS.BOF And RS.eof) Then

                                            Set binObj = New ADODB.Stream

                                            binObj.Type = adTypeBinary

                                            binObj.Open

                                            binObj.LoadFromFile (App.Path & "\SomeFolder\" & ''HOWTO.PDF''

                                            RS!FILEDATA = binObj.Read

                                            RS!FileName ='HOWTO.PDF'

                                            RS.Update

                                            binObj.Close

                                            Set binObj = Nothing

         End If

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Can you give a vb6 example of how to extract the file?

  • here's an example; this is getting everything out of a table in a loop, and should give you the framework you are looking for:

    Private Function ExportBlobs()

        Dim ConnObj As Object

        Dim rsObj As Object

        Dim binObj As Object

        Dim SavePath As String

        Dim SQL As String

        On Error GoTo ExportBlobs_Error

       

        Set ConnObj = CreateObject("ADODB.Connection")

        Set rsObj = CreateObject("ADODB.Recordset")

        Set binObj = CreateObject("ADODB.Stream")

        ConnObj.ConnectionString = "Provider=SQLOLEDB;Persist Security Info=True;User ID=sa;Password=yourpass;Initial Catalog=webpictures;Network Library=dbmssocn; Data Source=myserver;"

        ConnObj.CursorLocation = adUseClient

        ConnObj.Open

       

        SQL = "SELECT PICTURENAME,PICTUREDATA FROM HDSPICTURES"

        SavePath = "C:\"

        Set rsObj = glbcn.Execute(SQL)

        Do While Not rsObj.eof

            Set binObj = CreateObject("ADODB.Stream")

            binObj.Type = adTypeBinary

            binObj.Open

            binObj.Write rsObj!PICTUREDATA

            binObj.SaveToFile SavePath & "\" & rsObj!PICTURENAME, adSaveCreateOverWrite

            binObj.Close

            Set binObj = Nothing

            rsObj.MoveNext

        Loop

        On Error GoTo 0

        Exit Function

       

    ExportBlobs_Error:

       

        MsgBox "Error " & Err.Number & " (" & Err.Description & ") in procedure ExportBlobs of Form frmTest"

    End Function

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Another pro: File storage in the database often takes less disk space, especially if files are small and disk size is large.  Since a disk cluster can only hold one file, the last cluster for a file often has wasted space in the cluster (e.g. if cluster size is 4k, and file is 1k, there is 3k unused space).  If you store the file in the database, you basically have all files in one physical file (the mdf), and the clusters are all filled, and pages in the database are more likely filled with the file data.

    Chris: I would also argue that your 1st point under your "cons".  If the the application designer is considering storing the files in the database, then the files are in fact "data", and should be stored in the database.  Files as data should be treated as data, and not left to the whims and vagaries of file servers (where files can "disappear", and security is more of a problem).  Having the files in the database make them protected by referential integrity rules, for instance, the "file" will not be deleted unless the related data row is deleted.

    I beleive (and design systems as such) that if files are data important to the system, protect them by storing them in the database.

    Hope this helps



    Mark

  • Your sample code is good. But it should be pointed out that once you have the file data in your Stream object (binObj), there are many options besides just writing it to files.  The stream can be written to disk files (as shown), or to other processes on the machine, or to HTTP protocol so that your local web browser can display the file (very handy for displaying pdf's), or to a communications stream.

    Hope this helps.



    Mark

  • ...or to other processes on the machine...

    Very interesting point! Would it then be possible to make the "document" behave like an OLE object? In other words: is it simple to extract a BLOB field (being a Word document for example) directly to Word so when clicking "save" in Word, the document is automatically updated in the field? And if so, what would be the example code?

    Thanks in advance!

  • While still searching for information about storing, retrieving and using documents in SQL Server, I stranded again on this very topic.

    A year later, I now wonder whether the above guestion is too hard to answer, was too stupid to ask or has being viewed too little. 😉

    Anyway, with Sql Server 2008 (Express Edition), maybe the new Filestream type shines another light on this matter?

    With a varbinary(max) Filestream column, it's possible to convert the data it holds to a WinAPI file handle. Would it anyhow be possible to create an (for example MS Word-) OLE-object from that file handle?

    I realize this may be posted in the wrong forum, but I didn't find a similar topic in the 2008 forum.

  • Why not use the IMAGE column ? You can assing a byte array to it to fill it with your data.

  • Just storing documents isn't the problem or question. It's the wish to open the document directly with an appropriate client like or as an OLE-object, without the trouble of making, maintaining and controlling a temporary file on the file system.

  • Hi There,

    I too, am debating on how to store word documents in 2005, there is a data type of "image" now, that can store Word, Excel, PDF and Visio documents. So, can someone explain the difference between an image data type and a varbinary(max) datatype.

    On another note, I do attend to agree with Ten Centuries about how to store the data.

    Thanks,

    Database Diva 😉

  • Hai Guys try this...

    Reading the File

    The files will be read into a File Stream and then the File Stream will be converted into byte array using BinaryReader in order to save into the database table.

    C#

    // Read the file and convert it to Byte Array

    string filePath = Server.MapPath("APP_DATA/TestDoc.docx");

    string filename = Path.GetFileName(filePath);

    FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read);

    BinaryReader br = new BinaryReader(fs);

    Byte[] bytes = br.ReadBytes((Int32)fs.Length);

    br.Close();

    fs.Close();

    --------------------

    //insert the file into database

    string strQuery = "insert into tblFiles(Name, ContentType, Data) values (@Name, @ContentType, @data)";

    SqlCommand cmd = new SqlCommand(strQuery);

    cmd.Parameters.Add("@Name", SqlDbType.VarChar).Value = filename;

    cmd.Parameters.Add("@ContentType", SqlDbType.VarChar).Value = "application/vnd.ms-word";

    cmd.Parameters.Add("@Data", SqlDbType.Binary).Value = bytes;

    InsertUpdateData(cmd);

    -------------------------

    string strQuery = "select Name, ContentType, Data from tblFiles where id=@id";

    SqlCommand cmd = new SqlCommand(strQuery);

    cmd.Parameters.Add("@id", SqlDbType.Int).Value = 1;

    DataTable dt = GetData(cmd);

    if (dt != null)

    {

    download(dt);

    }

    -----------------------------

    Download the File

    Here is the function which initiates the download of file. It basically reads the file contents into a Byte array and also gets the file name and the Content Type. Then it writes the bytes to the response using Response.BinaryWrite

    private void download (DataTable dt)

    {

    Byte[] bytes = (Byte[])dt.Rows[0]["Data"];

    Response.Buffer = true;

    Response.Charset = "";

    Response.Cache.SetCacheability(HttpCacheability.NoCache);

    Response.ContentType = dt.Rows[0]["ContentType"].ToString();

    Response.AddHeader("content-disposition", "attachment;filename="

    + dt.Rows[0]["Name"].ToString());

    Response.BinaryWrite(bytes);

    Response.Flush();

    Response.End();

    }

  • About my wish to open a document directly from field:

    I have given up hope, but I also have reasoned why not to try further (which I have...). For the dutch reading people among us: The reasoning. Plainly translated, it says:

    It's not the wish to prevent storing files in the normal file system, it's to prevent the necessity of managing temporary files. Normally, the downside of files in a database is the loss of performance due to growth, but with the new Filestream column type in SQL Server 2008, that downside is gone. The field type takes care of:

    - storing the files in the normal file system, outside the fysical database-file,

    - the ability to make full backups, including those files,

    - no loss of performance due to growth,

    - transactional capability (on NTFS drives).

    Well, with such a super modern Filestream field, you can obtain a filehandle, but I could not convert that handle to a valid file name for opening in an arbitrary application. What ís possible, is to open the file directly in an application, but with the following disadvantages:

    - a meaningless file name like 00000496-000002cb-007c,

    - you have to have the rights for that specific folder,

    - you are rumbling in the "database" directly.

    While the latter two are not unimportant, certain the first took my attention. At first sight, it looks like no problem to work in a file with a random name, but it will become problematic/unhandy when working on multiple files. The buttons on the task bar and the captions in the Alt-Tab window are now meaningless, which results in the necessity to read the document before determining that was not the document you was looking for. This disadvantage is the same with OLE-like files, so that wish is also gone.

    The next possibility is to work with Word-macro's (or a COM add-in), but not every file type supports (Office) macro's. So the one remaining solution still is to work with temporary files in combination with a Windows Shell change notifier on that specific folder.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply