SQLServerCentral Article

Rendering PDFs Natively in SQL

,

There are good reasons to generate reports from a reporting server like SSRS or from client-side applications. In general I want to minimize load on the central SQL server and allow as much processing as possible to be offloaded from the SQL server onto other machines. That said, sometimes I encounter a particular use case where generating a PDF document from SQL seems to make sense. One example I came across was in a couponing environment where the system needed to generate a serialized coupon for download from a web server.

Originally the PDF generation was being done on the web server that retrieved the data, including the serial number, from SQL. This worked OK, but i wanted to have tighter control over the creation of the coupon to discourage tampering with coupons, creation of forged coupons and the like. The actual issuance of the coupon happened within a SQL transaction, and resulted in the creation of a new Coupon Instance row that was tracked through the complete lifecycle of the coupon. Additionally, the template for the coupon was stored and managed within the SQL database.

Generating the actual PDF of that coupon based on this data was pretty light-weight, and encapsulating rendering of the coupon together with the related data routines in SQL seemed to make sense. In this way external untrusted or semi-trusted machines do not need to build the final PDF or touch individual coupon data elements.

Whether or not you agree with the architecture I selected, I want to share how I created a SQL CLR function to generate a PDF document within SQL.

First of all, I selected iTextSharp for the actual PDF manipulation. This is a quality port of iText, a quality and well-supported open source Java library for PDF generation and manipulation. iText is well-documented, with extensive free examples and tutorials online as well as available paid documentation. I placed the itextsharp.dll in my project folder.

I decided that I wanted my CLR function to return the PDF document in memory, without using the server's filesytem. I also decided that I wanted this function to accept two parameters: a) an XML parameter that can contain a list of text strings with X/Y positions and font size specifications, and b) a varbinary(MAX) parameter that can optionally contain an existing PDF document that will be used as a template, allowing the specified text to be overlayed on top of this template PDF.

I used Visual Studio 2010 to create a new "Visual C# SQL CLR Database Project". I added the required references, including the reference to the iTextSharp assembly and various system assemblies. Visual Studio is kind enough to template out a rough placeholder for the code. I edited this generated code to create a declaration like this:

public partial class Functions
{
   [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlBytes RenderPDF(
        SqlBytes TemplatePDF,
        SqlXml FieldsXML
        )
    {
        // Put your code here
        //....
    }
}

These declarations indicate that we are creating a SQL UDF function that will return SqlBytes (the .NET equivalent of varbinary(MAX), and will accept the two parameters I described above.

Within this function, I added the following block:

using (MemoryStream PDFStream = new MemoryStream())
{
//....
}

The idea here is that I want to instantiate a new MemoryStream variable named PDFStream, but I want that instance to be disposed as soon as this block of code goes out of scope. This helps make sure that I don't tie up memory any longer than needed. I will use this MemoryStream to hold the PDF document that I render.

Within this block, I next created references to the font that I want my text to be rendered in. I considered allowing the font specification to be passed in as part of the XML specification: one day I may pursue that enhancement, but for now hard-coding to use Arial meets my needs.

BaseFont f_cn = BaseFont.CreateFont("c:\\windows\\fonts\\ARIAL.ttf", BaseFont.CP1252, BaseFont.NOT_EMBEDDED);

Next, I declared some local variables that the function uses internally. I could have created these with the "using" statement, but I find the nested "using" statements to be confusing and hard to read, so I decided to call .Dispose on my own as needed. Note that I could have (perhaps should have) wrapped things in a try/finally block to make sure that .Dispose gets called in the event of an error. Perhaps I am lazy, but I figured that if in the rare case of an unexpected error .Dispose didn't get called due to an exception that .NET's garbage collection would be good enough to clean up for me. (I might as well take advantage of the benefits of "managed" code at least a little bit, right?)

//for modfiying an existing PDF document
PdfReader reader = null;
PdfStamper stamper = null;
//for creating a new PDF document
Document doc1 = null;
PdfWriter writer = null;
//for directly writing to a PDF document
PdfContentByte canvas;

Next, I inspect the TemplatePDF parameter. If it NULL (meaning the caller did not provide an existing PDF document to use as a template), I need to create a new PDF document from scratch. Otherwise, I can manipulate the provided document.

Note that there is a difference between (TemplatePDF.IsNull) and (TemplatePDF == null): the first one checks to see if the SQL parameter was set to NULL, whereas the second one checks to see if the C# variable is null. We want to use the first check, not the second.

if (!TemplatePDF.IsNull)
{
  //modify existing PDF document
  reader = new PdfReader(TemplatePDF.Stream);
  stamper = new PdfStamper(reader, PDFStream);
  canvas = stamper.GetOverContent(1);
}
else
{
  //create new PDF document from scratch
  doc1 = new Document();
  writer = PdfWriter.GetInstance(doc1, PDFStream);
  doc1.Open();
  canvas = writer.DirectContent;
}

Next I inspect the FieldsXML parameter. If it is null (meaning the caller did not provide XML) then we have no text to overlay. This might be the case if the caller simply wanted the specified TemplatePDF document echoed back to them, though I am not sure why the caller would want such a thing.

If XML is provided in FieldsXML, I expect it to contain a document made up of a hierarchy like:

<Fields>
<Field>...</Field>
<Field>...</Field>
</Fields>

I then want to walk through each <Field> node, and look for elements of <TextValue>, <XPos>, <YPos> and <FontSize>. To walk through the XML I chose to use LINQ. I retrieve the text nodes of these elements and store the values in local variables to help make the code more readable. Then for each <Field> I set the font and position on the canvas and call canvas.ShowText to render the specified text.

if (!FieldsXML.IsNull) {
  string thisValue;
  int thisXPos;
  int thisYPos;
  int thisFontSize;
  canvas.SaveState();
  canvas.BeginText();
  //Process the fields specified in the FieldsXML parameter
  XmlReader fieldsXMLReader = FieldsXML.CreateReader();
  fieldsXMLReader.Read();
  fieldsXMLReader.ReadStartElement("Fields");
  while (fieldsXMLReader.Name == "Field")
  {
    XElement thisFieldNode = (XElement) XNode.ReadFrom(fieldsXMLReader);
    thisValue =
      thisFieldNode
      .Elements("TextValue")
      .Nodes()
      .OfType<XText>()
      .First()
      .Value;
    thisXPos = Convert.ToInt32(
      thisFieldNode
      .Elements("XPos")
      .Nodes()
      .OfType<XText>()
      .First()
      .Value);
    thisYPos = Convert.ToInt32(
      thisFieldNode
      .Elements("YPos")
      .Nodes()
      .OfType<XText>()
      .First()
      .Value);
    thisFontSize = Convert.ToInt32(
      thisFieldNode
      .Elements("FontSize")
      .Nodes()
      .OfType<XText>()
      .First()
      .Value);
    canvas.SetFontAndSize(f_cn, thisFontSize);
    canvas.SetTextMatrix(thisXPos, thisYPos);
    canvas.ShowText(thisValue);
  }

  fieldsXMLReader.ReadEndElement();
  canvas.EndText();
  canvas.RestoreState();
}

Finally, I dispose of the object instances I created, and return the rendered PDF as the result of the function.

if (!TemplatePDF.IsNull)
{
    stamper.Dispose();
    reader.Dispose();
}
else
{
    doc1.Close();
    doc1.Dispose();
}
return (new SqlBytes(PDFStream.ToArray()));

One note on the X/Y coordinates: the Y coordinate system starts from the bottom of the page and goes up. In other words, the bigger the Y value, the higher on the page the text will appear.

I made sure that the Project Properties / Database options indicated that the permission level was "External", and that the Project Properties / Application options indicated that the target framework was .NET Framework 3.5 (which is the most recent framework supported by SQL 2005 and 2008). I built the project, and this gave me the PDFCLR.dll that I needed.

I experimented with using Visual Studio 2010 to deploy the project to my local development SQL server, but was unsuccessful in getting this to work reliably. Even when I made it work a time or two, it seemed that Visual Studio's deployment took nearly as long as it took me to create the assembly in SQL by hand.

But once I had tested the assembly in SQL, I created a stored procedure to do the rebuilding of the C# code, the signing of the .DLL, and the registration of SQL all automatically using T-SQL. Please see my earlier article "Deploying CLR Assemblies with T-SQL" for a description of that approach. I found it to be a great way both to deploy the CLR assembly and to keep the source code and related SQL deployment configuration encapsulated within the database for portability.

The attached BuildAndRegisterCLRAssembly.cs file contains a script that you can run to build and properly register this entire assembly described above: no Visual Studio is needed.

Using the PDFCLR Assembly

Once the CLR assembly is deployed in the database, I can use it like this:

--Create XML to define field values and placement
DECLARE @FieldsXML xml
SET @FieldsXML = CAST(
'<Fields>
  <Field>
    <TextValue>Hello World</TextValue>
    <XPos>100</XPos>
    <YPos>700</YPos>
    <FontSize>18</FontSize>
  </Field>
  <Field>
    <TextValue>One more line, just for fun.</TextValue>
    <XPos>150</XPos>
    <YPos>650</YPos>
    <FontSize>12</FontSize>
  </Field>
</Fields>'
 AS xml)
--If the fields are to be overlayed an existing PDF document, provide the 
--existing PDF document data in @PDFTemplate.  (Otherwise a new, blank document
--will be created.)
DECLARE @PDFTemplate varbinary(MAX)
SET @PDFTemplate = NULL
DECLARE @ResultPDF varbinary(MAX)
SET @ResultPDF = dbo.udfRenderPDF(@PDFTemplate, @FieldsXML)
/*The PDF file now exists in the @ResultPDF variable.  You can do whatever you
want with the data.  For testing, you can write the binary data to a file using
this sputilWriteBinaryToFile utility procedure so that you can open the PDF in
Adobe Acrobat Reader.
*/SELECT @ResultPDF AS 'SamplePDF'
EXEC [dbo].[sputilWriteBinaryToFile]
  @FileData = @ResultPDF,
  @FilePath = 'C:\Temp',
  @Filename = 'test.pdf'
SELECT 'Check file C:\Temp\test.pdf for results' AS Notes  

Summary

Being able to call CLR functions from SQL is very powerful. But with this power comes great responsibility--both for proper deployment and configuration and for proper architecture. I want to be careful not to burden the SQL server with non-database processing that is better-suited for other distributed servers. At the same time, when portability, encapsulation and tight control of the environment are needed, I want to appropriately use CLR for specific uses.

This article attempts to show one such real-world example of server-side processing using CLR, that together with my earlier article "Deploying CLR Assemblies with T-SQL" provide fairly detailed step-by-step instructions on how to create and deploy a SQL CLR function in a way that should be approachable by a SQL developer or DBA who is not necessarily a .NET / Visual Studio developer.

Please see the attached BuildAndRegisterCLRAssembly.sql file for complete source code.

Resources

Rate

4.77 (62)

You rated this post out of 5. Change rating

Share

Share

Rate

4.77 (62)

You rated this post out of 5. Change rating