How to fetch PDF files from a folder and save them in the database

  • I have a small project in which I need to fetch a PDF file from my system and save it in the database, and also fetch the file's name and save that in the database.

  • You could use a SQL CLR routine for file access like: https://nclsqlclrfile.codeplex.com/

    Please spell out the process in more detail.

    CEWII

  • See http://www.sommarskog.se/blobload.txt for a simple example.

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se

  • You could also use SSIS with the Import Column transformation. It is very easy to set up and needs no code 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I want to try your method to sort a large number of PDF files in the e-library. :-) So how are things going in your project? By the way, I know how to get a PDF from my system into the database, but I do not know how to get the names of those PDF files.

    ranamrana (7/23/2013)


    I have a small project in which I need to fetch a PDF file from my system and save it in the database, and also fetch the file's name and save that in the database.
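
For the question about capturing the file names along with the contents, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it uses SQLite (from the standard library) as a stand-in for SQL Server, and the table `pdf_files`, its columns, and the function `load_pdfs` are hypothetical names that do not come from this thread. The idea carries over to any client language: enumerate the folder, then insert each file's name and bytes as one row.

```python
import sqlite3
from pathlib import Path


def load_pdfs(folder, conn):
    """Store the name and raw bytes of every PDF found directly in `folder`.

    Returns the number of files stored. The table and column names are
    hypothetical; SQLite stands in for SQL Server here.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_files ("
        " file_name TEXT PRIMARY KEY,"
        " content BLOB NOT NULL)"
    )
    count = 0
    for path in sorted(Path(folder).iterdir()):
        # Match the extension case-insensitively so "B.PDF" is picked up too.
        if path.is_file() and path.suffix.lower() == ".pdf":
            conn.execute(
                "INSERT OR REPLACE INTO pdf_files (file_name, content)"
                " VALUES (?, ?)",
                (path.name, path.read_bytes()),
            )
            count += 1
    conn.commit()
    return count
```

Calling `load_pdfs(some_folder, sqlite3.connect("library.db"))` would load every PDF in that folder. Against SQL Server the same loop would run through a SQL Server driver with a `varbinary(max)` column for the content; that part is not shown here.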

  • There are advantages and disadvantages to storing documents actually within the database.

    If you look just at performance, storing files under 1 MB in size inside the database generally gives faster access than storing them outside it. Above 1 MB, storing the file outside is generally faster, due to the different ways in which SQL Server and raw NTFS deal with data management.

    If you look at data integrity, you generally have far better control when the data is stored inside SQL Server, unless you can use SQL Server 2012 or above and FileTable storage.

    FileTable can give you the best of all worlds for associating documents with traditional database data. The SQL Server database engine knows about the documents that are stored in the FileTable repository, and the files are also accessible to non-SQL Server applications.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • To add to what Ed posted...

    I went through this about a year ago with telephone call recordings. I, too, was well aware of the advice to store larger files on disk instead of in the database. To wit, most of the calls were greater than 1 MB and I was determined to get them out of my database.

    I'm not sure why they did it, but they stored the file in both places and also stored the file path in the database. Being the diligent pedant that I am, I wrote some code to verify that all of the files existed and {drum roll, please} they did NOT all exist. That led me to the following conclusion.

    No one protects data better than a DBA and his/her backups. Files can be touched by too many people, and the DBA has no control over when or even if those files get backed up. Despite the performance hit and the need to do partitioning to accommodate the growing backup requirements, I'll take that hit and all of the work to keep those call recordings in the database, because it would appear {another drum roll, please} that's the only place where I can guarantee that they won't disappear or get lost.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
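
The existence check described in the post above (confirming that every file path stored in the database still points to a file on disk) can be sketched roughly as follows. The actual code is not shown in the thread, so this is only an assumption of what such a check might look like: it is written in Python, uses SQLite as a stand-in for SQL Server, and the names `find_missing_files`, `call_recordings`, and `file_path` are hypothetical.

```python
import sqlite3
from pathlib import Path


def find_missing_files(conn, table="call_recordings", path_column="file_path"):
    """Return the stored paths that no longer point to a file on disk.

    `table` and `path_column` are hypothetical names. They are interpolated
    into the query text, so pass only trusted identifiers.
    """
    rows = conn.execute(f"SELECT {path_column} FROM {table}")
    return [path for (path,) in rows if not Path(path).is_file()]
```

An empty list means every referenced file was found. Anything else is a row whose document has gone missing from the file system, which is the inconsistency the post warns about when files and metadata are managed separately.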

  • I'm with Jeff!

    If you put everything in one corner, you know where everything is!

  • As Jeff and aaron have said, data integrity can be the main issue with document storage, regardless of what people may say about performance.

    With FileTable storage, I think you can prevent anything other than SQL Server from deleting or adding documents. FileTable objects are also included in SQL Server backup and restore, which means data integrity is maintained through these operations. This effectively unifies the management of the documents and metadata.

    If you design a system with the documents and metadata being stored in separately managed containers, you should not expect the two containers to remain consistent.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Nah, the point with FileTable is exactly that it makes it possible to update the database from Explorer and similar tools. The difference from storing just a file path in the database is that if you delete the file from Explorer, this is also reflected in the database, so you get consistency. And the files are included in the database backup.

    Now, users need to have access to the share where the files are located; if they don't have access, the files cannot be manipulated. But if they don't have that access, there is not much point in using FileTable at all, and you can use regular FILESTREAM instead.

    For FILESTREAM to be meaningful, you should use the Win32 API to write and read files. If you read them with SQL statements, you should get the same performance as with regular blobs.

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se
