Using JavaScript to import Word and PDF documents into SQL Server

  • pcq0125

    SSCrazy

    Points: 2875

    I am looking for examples or use-case code showing how to use JavaScript to import Word documents and PDFs into SQL Server.

    Thanks,

  • ZZartin

    SSC-Dedicated

    Points: 30385

    Do you mean you want to store the document as a varbinary in SQL Server and do so from JavaScript?

  • pcq0125

    SSCrazy

    Points: 2875

    That is correct. I want to store the content of the Word and PDF documents as varbinary in SQL Server. Also, I would like to do some parsing using JavaScript as well, if possible. Please advise.

    Thanks.

  • ZZartin

    SSC-Dedicated

    Points: 30385

    pcq0125 wrote:

    That is correct. I want to store the content of the Word and PDF documents as varbinary in SQL Server. Also, I would like to do some parsing using JavaScript as well, if possible. Please advise.

    Thanks.

    Hmm... I guess I would have to ask why you are trying to store the documents in SQL Server. It's not an ideal file store. And why are you using JavaScript instead of a language that might interface with SQL Server better, like PowerShell?

  • pcq0125

    SSCrazy

    Points: 2875

    Thanks for your response. The purpose of importing the Word documents into SQL Server is to parse the data into a table and export it to Excel. The current framework was already built in JavaScript by the previous developer, and I have not used it for this kind of integration before, so I need some advice.

  • Erland Sommarskog

    SSC-Insane

    Points: 23828

    I guess that pcq0125 wants to use JavaScript because he/she is in a browser. You rarely run PowerShell from a browser...

    I don't do JavaScript, so I cannot offer any examples. But there are two alternatives for storing documents in SQL Server: a plain varbinary(MAX) column or FILESTREAM. In the first case, it is no different from inserting any other value: you pass the value as a binary parameter. As long as the documents are small, below 1 MB, this method works well. FILESTREAM is good for larger documents, as you can load and retrieve the document faster if you use the OpenSqlFilestream interface. But how you would call it from JavaScript, I don't know.
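
    As a rough illustration of the plain varbinary(MAX) route, here is a minimal sketch that assumes the JavaScript runs under Node.js rather than in a browser. The table dbo.Documents, its columns, and the execute callback are all hypothetical; the callback stands in for whichever SQL Server driver the project already uses. The point it shows is that the file is read as raw bytes and handed over as a parameter, never spliced into the SQL text.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical table: dbo.Documents (FileName nvarchar(260), Content varbinary(MAX)).
const INSERT_SQL =
  'INSERT INTO dbo.Documents (FileName, Content) VALUES (@fileName, @content)';

// Reads the file as raw bytes and pairs it with a parameterized INSERT.
function buildInsert(filePath) {
  // No encoding argument, so readFileSync returns a Buffer and the bytes stay intact.
  const content = fs.readFileSync(filePath);
  return {
    sql: INSERT_SQL,
    params: { fileName: path.basename(filePath), content },
  };
}

// `execute(sql, params)` is a hypothetical callback wrapping the real driver;
// it is expected to bind `content` as a binary (varbinary) parameter.
async function storeDocument(filePath, execute) {
  const { sql, params } = buildInsert(filePath);
  return execute(sql, params);
}
```

    With a real driver, execute would map params.content to that driver's binary parameter type. Nothing here addresses FILESTREAM.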

    By the way, I don't see any problems with storing documents in a database. If you store the documents outside the database, and only store directory paths in the database, you have a challenge to uphold the integrity of the data.

     

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se

  • pcq0125

    SSCrazy

    Points: 2875

    Thank you, Erland, for your advice. I have not picked up the project yet, but this is the first time I have heard of people using JavaScript to import Word and PDF documents into SQL Server. I am still looking for advice. Thanks,

  • Jeff Moden

    SSC Guru

    Points: 995126

    pcq0125 wrote:

    Thanks for your response. The purpose of importing the Word documents into SQL Server is to parse the data into a table and export it to Excel. The current framework was already built in JavaScript by the previous developer, and I have not used it for this kind of integration before, so I need some advice.

    SQL Server is the wrong tool for parsing either Word documents or PDFs, whether you're exporting them to Excel or not. So is Java.

    There are COTS programs that do this much better than most of us could (they usually come with OCR programs, which is an added bonus).  Spend the money on the purchase... it'll be much cheaper in the long run and the code will likely run better because it's been tested over years by thousands of customers.

    --Jeff Moden



  • Erland Sommarskog

    SSC-Insane

    Points: 23828

    Jeff is right on the money. Parsing PDFs or Word documents in SQL is so completely alien that it did not even occur to me that you might be considering that. I assumed that the question was only about storing them in SQL Server.

    (To be exact: there is one situation where such documents may be parsed in SQL Server, and that is if you full-text index the column. In that case the full-text component has filters that understand these formats. But that is not code you write yourself; all you do is set up full-text indexing and specify which filters you want.)


  • pcq0125

    SSCrazy

    Points: 2875

    Hi Erland,

    I would like to give an example: the link below, under the section "CURRENT ASSET LINK", is supposed to be a Word document. How can I parse out data such as First Name, Last Name, SSN and so forth using the full-text component with the filters you mentioned? Any examples?

    Thanks

    https://eforms.com/estate-planning/current-assets-list/

  • Erland Sommarskog

    SSC-Insane

    Points: 23828

    You can't. These filters exist only for a specific purpose, and you have no access to them. More precisely, they extract all words from the document, so that you can search for documents that include the word "weekend" or whatever you may want to search for.

    If you want to parse out the contents of a Word document or a PDF, that is not a question for an SQL Server forum, because that is not a task which is possible to implement in SQL Server in any practical way.

    It is worth pointing out that SQL Server is not a general-purpose programming environment, but a specialised environment for handling large volumes of data.


  • pcq0125

    SSCrazy

    Points: 2875

    My thought is to import the Word and PDF documents into a varbinary(max) or nvarchar(max) column, then iterate over the data with trim and replace operations and insert the results into the appropriate table fields using SQL. For example, search for the First Name value and insert it into the First Name field. Back to my original question: how can this be done in Java or JavaScript?

    Thanks

  • Erland Sommarskog

    SSC-Insane

    Points: 23828

    The data type to use for Word or PDF documents is varbinary(MAX), nothing else. While you see text when you look at them in specialised editors (that is, Word and Acrobat), the files are entirely binary. .docx files are actually zip archives, by the way.
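
    To make the "entirely binary" point concrete, here is a small sketch (Node.js assumed; the function name is hypothetical) that looks only at the leading bytes of a file. PDF files begin with the ASCII characters %PDF-, and zip archives, which is what .docx files are, normally begin with the bytes PK 0x03 0x04.

```javascript
// Leading "magic" bytes for the two container formats discussed above.
const PDF_MAGIC = Buffer.from('%PDF-', 'latin1');
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]); // "PK" followed by 0x03 0x04

// Returns 'pdf', 'zip' or 'unknown' based on the first bytes of the buffer.
// Note that 'zip' covers any zip archive, not only .docx files.
function sniffDocumentType(bytes) {
  if (bytes.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC)) return 'pdf';
  if (bytes.subarray(0, ZIP_MAGIC.length).equals(ZIP_MAGIC)) return 'zip';
  return 'unknown';
}
```

    A check like this can reject obviously wrong uploads before they are stored, but it says nothing about the content. Reading the text inside still requires a library that understands the format.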

    How to parse such documents from JavaScript is obviously not a question you can expect to get a complete answer to in an SQL Server forum. But as Jeff says, you would use libraries that understand these file formats. You would never do the parsing from scratch.
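
    For the step that comes after such a library, here is a hedged sketch of the field lookup described earlier in the thread. It assumes a library has already turned the document into plain text with one "Label: value" pair per line, which real forms may well not follow. The label list and function names are hypothetical.

```javascript
// Hypothetical labels; a real form needs its own list and probably looser patterns.
const FIELD_LABELS = ['First Name', 'Last Name', 'SSN'];

// Escapes characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Finds the first line of the form "Label: value" for each label.
// Returns an object keyed by label; labels that are not found map to null.
function extractFields(text, labels = FIELD_LABELS) {
  const row = {};
  for (const label of labels) {
    // [ \t]* rather than \s* so a match can never run across a line break.
    const pattern = new RegExp(
      `^[ \\t]*${escapeRegExp(label)}[ \\t]*:[ \\t]*(.*?)[ \\t]*$`,
      'mi'
    );
    const match = pattern.exec(text);
    row[label] = match ? match[1] : null;
  }
  return row;
}
```

    The resulting object maps naturally onto one table row, with each value passed to SQL Server as an ordinary parameter.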

    I would suggest that you consult Google for some ideas. I did a quick search on parse word document javascript and I got back hits both at Stack Overflow (that will give you concrete answers) and GitHub (which may have useful libraries).


  • pcq0125

    SSCrazy

    Points: 2875

    Thanks Erland.
