how to design table for store document

  • I want to design table for storing document file.

    Which should be my design, any suggestion?

    1. store all content in column, so I can do full text search by sql server 2005

    2. store only path of file, but I have no ideat of how I can do full text search in the content of file

    Any help would be appreciated 🙂

  • this is a pretty regular debate. I usually fall back on business requirements to help decide.

    The one technical issue that does help decide is backups. If you store the document in the database when it gets backed up, the document gets backed up and the cost is larger databases. The benefits of storing a pointer is that the database itself is less bloated, but you have to worry about synchronized backups (and restores) with files on the file system. I usually fall on the side of a bigger database & a nice clean backup mechanism.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I would store the document in the database.

    I would put the table in question on its own filgroup.

    The disadvantages for storing only the link to the document in the filesystem:

    http://groups.google.com/group/microsoft.public.sqlserver.programming/browse_thread/thread/28cf7a9cf600daf0

    http://groups.google.se/group/microsoft.public.sqlserver.programming/browse_frm/thread/c5bfd673c1ee4524/455d6ec4c91a75d4?hl=sv&lnk=st&q=kurt+sune+image&rnum=18#455d6ec4c91a75d4

    From Steven Abbott:

    Not long ago, someone called with a request for advice about

    digitizing an important and massive 19th century manuscript

    archive. Google-aided research quickly convinced me that,

    contrary to popular wisdom, the problems associated with

    digitizing had nothing to do with scanners and storage.

    Information, not technology, is the problem. Just having a giant

    collection of a million images stored as Image1, Image2, and so

    on isn’t the answer. A historical archive needs to be an active

    database that scholars can access in all sorts of ways.

    Because the archive in question couldn’t feasibly be scanned

    with OCR to convert the image data to text data, the database

    would have to rely on the integrity of links between images and

    some sort of textual record of what each image was. For example,

    the textual database might say that Image99 is a letter from A to

    B about subject C, written on date D. Given a million (or so)

    images, it should be clear why the integrity of that textual

    database is crucial. Ideally, you’d want to make sure that each

    image stored sufficient data within itself that a workable

    database could be assembled from the images themselves...

    /m

  • forgot:

    Use the correct datatype: varchar(max), xml etx

    In SQL 2008 there will probably be a datatype of type file.

    /m

  • Thank you everyone,

    Now, I have another question. if I choose to store document in the database,

    I found that in sql server 2005 have image datatype and varchar(max)

    what the different between them?

    If my doucument is .doc and .pdf

    🙂

  • You should probably take a look at the binary data type. I'm not terribly familiar with its use, but I think that's what you need for storing .DOC files. VARCHAR(max) is for storing string data, not specific file structures.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I too am just weighing up text BLOBs against stored file locations, and it isn't a simple decision. But BOL says image is deprecated, and varbinary(max) preferred. (And I gather FILESTREAM will be new for 2008.)

  • Avoid Image, NText and Text datatypes where you can. They will soon no longer be with us. Not sure if it's SQL 2k8 that's taking them away or the version after that, though.

    Use Varchar(Max), NVarchar(Max) and Varbinary(MAX) instead. This way you don't have to worry about massive database changes the next time you upgrade your SQL Server edition.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Thank you for your inform

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply