
JSON.SQL: A CLR-resident JSON serializer/deserializer for SQL Server


Introduction

Native XML support in SQL Server provides a fast, efficient means to translate relational data to hierarchical data and vice versa. However, business systems may need formats other than XML, for perfectly valid reasons: compliance with industry standards such as SWIFT ISO 15022 or HL7 V2.x messaging, or integration with third parties that impose a format of their own.

JSON (short for JavaScript Object Notation) is one such lightweight, non-XML data interchange format in common use, particularly in front-end GUI systems for transmission between a web server and a browser client. It is a language-independent data format, capable of representing simple data types and objects in the form of associative arrays.

Here is an example XML block with its corresponding JSON equivalent:

<Order>
  <OrderId>123</OrderId>
  <CustomerId>456</CustomerId>
  <PalletNumber>A98765</PalletNumber>
  <ShipToZip>33431-1234</ShipToZip>
  <Discount>20%</Discount>
  <OrderDate>2011-06-07 09:27:52.863</OrderDate>
</Order>
<Order>
  <OrderId>124</OrderId>
  <CustomerId>457</CustomerId>
  <PalletNumber>B00001</PalletNumber>
  <ShipToZip>37803-0000</ShipToZip>
  <Discount>None</Discount>
  <OrderDate>2011-06-07 09:29:28.217</OrderDate>
</Order>

which is this in JSON:

[
  {
    "OrderId": 123,
    "CustomerId": 456,
    "PalletNumber": "A98765",
    "ShipToZip": "33431-1234",
    "Discount": "20%",
    "OrderDate": "\/Date(1307453272863-0400)\/"
  },
  {
    "OrderId": 124,
    "CustomerId": 457,
    "PalletNumber": "B00001",
    "ShipToZip": "37803-0000",
    "Discount": "None",
    "OrderDate": "\/Date(1307453368220-0400)\/"
  }
]

Notice that the character count (excluding whitespace) for the XML example is 437 characters, while the JSON example is 322 characters, a 26% reduction in overhead.

The one other item here that requires some mention is the date format used in JSON. Believe it or not, there is no standard for representing dates in JSON (due to what can only be called an oversight in the ECMAScript specification). In Microsoft's case, a date is serialized as the number of milliseconds since January 1, 1970 12:00 AM UTC, wrapped in a "\/Date()\/" literal to mark the value as a datetime rather than a string. This might not port everywhere, since some JSON parsers instead use good ol' ISO 8601 (the more readable YYYY-MM-DDThh:mm:ss.sTZD format).
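
To illustrate, here's a quick T-SQL sketch of my own (not part of the project) that converts the first OrderDate value above back to a datetime. The value is split into seconds and milliseconds because DATEADD only accepts an INT increment:

DECLARE @ms BIGINT = 1307453272863;   -- from "\/Date(1307453272863-0400)\/"

SELECT DATEADD(MILLISECOND, @ms % 1000,
       DATEADD(SECOND, CAST(@ms / 1000 AS INT), '19700101')) AS OrderDateUTC;
-- returns 2011-06-07 13:27:52.863 (UTC); apply the -0400 offset to recover
-- the local 09:27:52.863 shown in the XML version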

Project Goals

My goal for this project was to implement JSON as an importable and exportable format for SQL Server via the CLR. While I did find existing examples of JSON parsers and loaders on the Web, including one in native T-SQL that is one of the more, er, complex and lengthy T-SQL procedures I've seen, I decided that I'd rather go with a .NET CLR implementation to take advantage of the JavaScript JSON parsing logic available in .NET.

Because my environment restricted me to the .NET Framework 2.0, I chose the JSON.Net CodePlex project to serialize and deserialize my JSON. Other frameworks exist, such as Jayrock, as does native JSON support in later versions of the .NET Framework; I'll try those in future versions of my project, and leave it to you to do the same. I'll also leave to the Gentle Reader any support for ISO 8601 or other date formats you may need.

JSON.Net

You can find JSON.Net, along with its extensive technical documentation, on CodePlex.

Implementation

In JSON.Net, serialization and deserialization of JSON are pretty simple. Here are the methods I use to do both:

private static String Serialize(DataTable dt, char QuoteChar)
{
    JsonSerializer json = new JsonSerializer();
    json.NullValueHandling = NullValueHandling.Include;
    json.ObjectCreationHandling = ObjectCreationHandling.Reuse;
    json.MissingMemberHandling = MissingMemberHandling.Ignore;
    json.ReferenceLoopHandling = ReferenceLoopHandling.Ignore;
    json.DefaultValueHandling = DefaultValueHandling.Include;
    json.Converters.Add(new Newtonsoft.Json.Converters.DataTableConverter());

    StringWriter sw = new StringWriter();
    JsonTextWriter jsonWriter = new JsonTextWriter(sw);
    jsonWriter.Formatting = Formatting.Indented;
    jsonWriter.QuoteChar = QuoteChar;
    json.Serialize(jsonWriter, dt);
    String output = sw.ToString();
    jsonWriter.Close();
    sw.Close();
    return output;
}

 

private static void Deserialize(String jsonText, ref DataTable dt)
{
    JsonSerializer json = new JsonSerializer();
    json.NullValueHandling = NullValueHandling.Ignore;
    json.ObjectCreationHandling = ObjectCreationHandling.Replace;
    json.MissingMemberHandling = MissingMemberHandling.Ignore;
    json.ReferenceLoopHandling = ReferenceLoopHandling.Serialize;
    StringReader sr = new StringReader(jsonText);
    JsonTextReader reader = new JsonTextReader(sr);
    dt = (DataTable)json.Deserialize(reader, typeof(DataTable));
    reader.Close();
}

In a nutshell, my Serialize method takes a populated DataTable and serializes the data in it to a JSON-formatted string (with the specified JavaScript-legal quote character for string values). Deserialize goes the other way, taking a string of JSON-formatted text, and deserializing it into columns and rows in the passed DataTable object.

I created two ways to query JSON data:

  1. Using a CLR function, a passed SQL SELECT statement is executed and the results returned to the caller as a single NVARCHAR(MAX) string of JSON text;
  2. Using a CLR procedure, a passed SQL SELECT statement is executed and the results returned to the caller as a multiple-row result set of JSON text (the JSON text is broken out into multiple rows delimited by carriage return/line feed).

Here is the function definition:

[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read, IsDeterministic = true)]
[return: SqlFacet(MaxSize = -1)]
public static SqlString xf_JSON_Query(
    [SqlFacet(MaxSize = -1, IsNullable = false)] SqlString SelectStatement,
    [SqlFacet(IsNullable = false)] SqlBoolean UseDoubleQuotes)
{
    DataTable dt = new DataTable();
    SqlConnection conn = new SqlConnection("context connection=true");
    conn.Open();
    SqlDataAdapter da = new SqlDataAdapter((String)SelectStatement, conn);
    da.Fill(dt);
    char QuoteChar;
    if (UseDoubleQuotes == true)
    {
        QuoteChar = '"';
    }
    else
    {
        QuoteChar = '\'';
    }
    String json = Serialize(dt, QuoteChar);
    conn.Close();
    return (SqlString)json;
}
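
Calling the function from T-SQL looks something like this (a sketch: the dbo schema and the Orders table are my stand-ins from the opening example, not part of the project):

SELECT dbo.xf_JSON_Query(
    N'SELECT OrderId, CustomerId, PalletNumber, ShipToZip, Discount, OrderDate FROM dbo.Orders',
    1   -- 1 = use double quotes around string values
) AS JSONOutput;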

 

And here is the procedure:

[Microsoft.SqlServer.Server.SqlProcedure]
[return: SqlFacet(MaxSize = -1)]
public static void xsp_JSON_Query(
    [SqlFacet(MaxSize = -1, IsNullable = false)] SqlString SelectStatement,
    [SqlFacet(IsNullable = false)] SqlBoolean UseDoubleQuotes)
{
    DataTable dt = new DataTable();
    SqlConnection conn = new SqlConnection("context connection=true");
    conn.Open();
    SqlDataAdapter da = new SqlDataAdapter((String)SelectStatement, conn);
    da.Fill(dt);
    char QuoteChar;
    if (UseDoubleQuotes == true)
    {
        QuoteChar = '"';
    }
    else
    {
        QuoteChar = '\'';
    }
    String json = Serialize(dt, QuoteChar);
    conn.Close();

    // Create a single-column result set schema to send back the JSON data
    SqlMetaData[] columns = new SqlMetaData[1];
    columns[0] = new SqlMetaData("JSONData", SqlDbType.NVarChar, 4000);
    SqlDataRecord dr = new SqlDataRecord(columns);
    if (dt.Rows.Count == 0)
    {
        // send NULL back if no rows returned by query
        SqlContext.Pipe.Send(dr);
    }
    else
    {
        // send the data back, one line of JSON text per result row
        bool PipeOpen = false;
        StringReader str = new StringReader(json);
        String line;
        while ((line = str.ReadLine()) != null)
        {
            dr.SetValue(0, line);
            if (PipeOpen == false)
            {
                // SendResultsStart sends only the metadata, so the first
                // row must still be sent explicitly below
                SqlContext.Pipe.SendResultsStart(dr);
                PipeOpen = true;
            }
            SqlContext.Pipe.SendResultsRow(dr);
        }
        if (PipeOpen == true)
        {
            SqlContext.Pipe.SendResultsEnd();
        }
    }
}
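
And the corresponding procedure call, which streams the same JSON back as a multi-row result set (again a sketch, with the same stand-in table; the parameter names assume the T-SQL wrapper mirrors the C# signature):

EXEC dbo.xsp_JSON_Query
    @SelectStatement = N'SELECT OrderId, CustomerId FROM dbo.Orders',
    @UseDoubleQuotes = 1;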

For the reverse process, I created two procedures:

  1. A CLR procedure that takes a parameterized JSON NVARCHAR(MAX) string and a string that is the fully-qualified name of an existing table in a database, and outputs a multiple-row result set whose columns match the passed table and JSON attributes one-to-one (the passed table itself is not modified or loaded, but simply used as a "template" whose schema is matched to the JSON attributes in the string);
  2. A CLR procedure that takes a parameterized JSON NVARCHAR(MAX) string and a destination table and performs a .NET SqlBulkCopy operation (bcp) to bulk load the JSON data into the passed table. All usual SqlBulkCopy options are exposed as parameters to the procedure.

These are:

[Microsoft.SqlServer.Server.SqlProcedure]
public static SqlInt32 xsp_JSON_Transform(
    [SqlFacet(MaxSize = -1, IsNullable = false)] SqlString JSONText,
    [SqlFacet(MaxSize = 128, IsNullable = false)] SqlString TemplateTableName)
{
    DataTable dt = new DataTable();
    SqlConnection conn = new SqlConnection("context connection=true");
    conn.Open();

    // load the result's schema from the template target table
    String SelectStatement = "SELECT * FROM " + (String)TemplateTableName + " WHERE 1=2";
    SqlDataAdapter da = new SqlDataAdapter(SelectStatement, conn);
    da.Fill(dt);

    // load the json into the empty dt with its schema
    Deserialize((String)JSONText, ref dt);

    // create a list structure that matches the row schema
    List<SqlMetaData> OutputColumns = new List<SqlMetaData>(dt.Columns.Count);
    SqlDbType dbType;
    foreach (DataColumn col in dt.Columns)
    {
        SqlMetaData OutputColumn;
        dbType = GetDBType(col.DataType);
        if (SqlDbType.Binary == dbType ||
            SqlDbType.Image == dbType ||
            SqlDbType.NText == dbType ||
            SqlDbType.NVarChar == dbType ||
            SqlDbType.Text == dbType ||
            SqlDbType.Udt == dbType ||
            SqlDbType.VarBinary == dbType ||
            SqlDbType.VarChar == dbType ||
            SqlDbType.Variant == dbType ||
            SqlDbType.Xml == dbType)
        {
            OutputColumn = new SqlMetaData(col.ColumnName, dbType, col.MaxLength);
        }
        else
        {
            OutputColumn = new SqlMetaData(col.ColumnName, dbType);
        }
        OutputColumns.Add(OutputColumn);
    }

    // Build our SqlDataRecord and start the results
    SqlDataRecord record = new SqlDataRecord(OutputColumns.ToArray());
    SqlContext.Pipe.SendResultsStart(record);

    // Now send all the cells
    foreach (DataRow row in dt.Rows)
    {
        for (int col = 0; col < dt.Columns.Count; col++)
        {
            record.SetValue(col, row.ItemArray[col]);
        }
        SqlContext.Pipe.SendResultsRow(record);
    }

    // And complete the results
    SqlContext.Pipe.SendResultsEnd();
    return (SqlInt32)dt.Rows.Count;
}

and

[Microsoft.SqlServer.Server.SqlProcedure]
public static SqlInt32 xsp_JSON_BulkCopy(
    [SqlFacet(MaxSize = -1, IsNullable = false)] SqlString JSONText,
    [SqlFacet(MaxSize = 128, IsNullable = false)] SqlString DestinationTableName,
    [SqlFacet(MaxSize = 128, IsNullable = true)] SqlString DestinationServerName,
    [SqlFacet(MaxSize = 128, IsNullable = true)] SqlString DestinationDatabaseName,
    [SqlFacet(MaxSize = 128, IsNullable = true)] SqlString DestinationUserName,
    [SqlFacet(MaxSize = 128, IsNullable = true)] SqlString DestinationPassword,
    [SqlFacet(IsNullable = true)] SqlInt32 BatchSize,
    [SqlFacet(IsNullable = true)] SqlInt32 Timeout,
    [SqlFacet(IsNullable = true)] SqlBoolean CheckConstraints,
    [SqlFacet(IsNullable = true)] SqlBoolean KeepIdentity,
    [SqlFacet(IsNullable = true)] SqlBoolean KeepNulls,
    [SqlFacet(IsNullable = true)] SqlBoolean TableLock,
    [SqlFacet(IsNullable = true)] SqlBoolean FireTriggers,
    [SqlFacet(IsNullable = true)] SqlBoolean UseInternalTransaction)
{
    DataTable dt = new DataTable();
    SqlConnection conn = new SqlConnection("context connection=true");
    String ConnStr;
    SqlInt32 rtnval = -1;
    conn.Open();

    // get the connection's server and database name, if needed
    String ServerName;
    String DatabaseName;
    if (DestinationServerName.IsNull == false)
    {
        ServerName = (String)DestinationServerName;
    }
    else
    {
        ServerName = conn.DataSource;
    }
    if (DestinationDatabaseName.IsNull == false)
    {
        DatabaseName = (String)DestinationDatabaseName;
    }
    else
    {
        DatabaseName = conn.Database;
    }

    // get the schema of the destination table & build the datatable from it
    String SelectStatement = "SELECT * FROM " + (String)DestinationTableName + " WHERE 1=2";
    SqlDataAdapter da = new SqlDataAdapter(SelectStatement, conn);
    da.Fill(dt);

    // close the context connection
    if (conn.State == ConnectionState.Open)
    {
        conn.Close();
    }

    // load the json into the empty dt with its schema
    Deserialize((String)JSONText, ref dt);

    // open the connection to the bulk load destination
    // ("User ID" is the connection-string keyword for SQL authentication)
    if (DestinationUserName.IsNull == false && DestinationPassword.IsNull == false)
    {
        ConnStr = "Server=" + ServerName + ";Database=" + DatabaseName +
                  ";User ID=" + (String)DestinationUserName +
                  ";Password=" + (String)DestinationPassword;
    }
    else
    {
        ConnStr = "Server=" + ServerName + ";Database=" + DatabaseName + ";Trusted_Connection=Yes";
    }
    conn.ConnectionString = ConnStr;
    conn.Open();

    // prepare the bulk copy options; SqlBoolean.IsTrue is null-safe,
    // so a NULL parameter simply leaves that option unset
    SqlBulkCopyOptions options = new SqlBulkCopyOptions();
    if (CheckConstraints.IsNull &&
        KeepIdentity.IsNull &&
        KeepNulls.IsNull &&
        TableLock.IsNull &&
        FireTriggers.IsNull &&
        UseInternalTransaction.IsNull)
    {
        options = SqlBulkCopyOptions.Default;
    }
    else
    {
        if (CheckConstraints.IsTrue)
        {
            options = options | SqlBulkCopyOptions.CheckConstraints;
        }
        if (KeepIdentity.IsTrue)
        {
            options = options | SqlBulkCopyOptions.KeepIdentity;
        }
        if (KeepNulls.IsTrue)
        {
            options = options | SqlBulkCopyOptions.KeepNulls;
        }
        if (TableLock.IsTrue)
        {
            options = options | SqlBulkCopyOptions.TableLock;
        }
        if (FireTriggers.IsTrue)
        {
            options = options | SqlBulkCopyOptions.FireTriggers;
        }
        if (UseInternalTransaction.IsTrue)
        {
            options = options | SqlBulkCopyOptions.UseInternalTransaction;
        }
    }

    SqlBulkCopy bulkCopy = new SqlBulkCopy(conn, options, null);
    bulkCopy.DestinationTableName = (String)DestinationTableName;
    if (BatchSize.IsNull == false)
    {
        bulkCopy.BatchSize = (int)BatchSize;
    }
    if (Timeout.IsNull == false)
    {
        bulkCopy.BulkCopyTimeout = (int)Timeout;
    }

    // BCP IN the data in the data table
    bulkCopy.WriteToServer(dt);

    // cleanup
    conn.Close();
    rtnval = (SqlInt32)dt.Rows.Count;
    return rtnval;
}
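
Putting these two to work looks something like this (a sketch: the @json literal and dbo.Orders are my stand-ins, and the parameter names again assume the T-SQL wrappers mirror the C# signatures):

DECLARE @rc INT;
DECLARE @json NVARCHAR(MAX) =
    N'[{"OrderId":125,"CustomerId":458,"PalletNumber":"C12345"}]';

-- shred the JSON into a result set shaped like dbo.Orders
EXEC @rc = dbo.xsp_JSON_Transform
    @JSONText = @json,
    @TemplateTableName = N'dbo.Orders';

-- or bulk load it straight into dbo.Orders on the local instance,
-- letting every optional parameter default to NULL
EXEC @rc = dbo.xsp_JSON_BulkCopy
    @JSONText = @json,
    @DestinationTableName = N'dbo.Orders',
    @DestinationServerName = NULL, @DestinationDatabaseName = NULL,
    @DestinationUserName = NULL, @DestinationPassword = NULL,
    @BatchSize = NULL, @Timeout = NULL,
    @CheckConstraints = NULL, @KeepIdentity = NULL, @KeepNulls = NULL,
    @TableLock = NULL, @FireTriggers = NULL, @UseInternalTransaction = NULL;

SELECT @rc AS RowsProcessed;   -- both procedures return the row count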

Both procedures rely on a helper method, GetDBType, to map .NET types to SQL data types:

private static SqlDbType GetDBType(Type t)
{
    // Map a .NET type to a SqlDbType by converting the type's name to the
    // DbType enum (e.g. "Int32" -> DbType.Int32), then reading the
    // corresponding SqlDbType off a throwaway SqlParameter
    SqlParameter p = new SqlParameter();
    TypeConverter tc = TypeDescriptor.GetConverter(p.DbType);
    if (tc.CanConvertFrom(t))
    {
        p.DbType = (DbType)tc.ConvertFrom(t.Name);
    }
    else
    {
        // Try brute force: the enum converter can usually convert from the
        // type's name string even when it can't convert from the type itself
        try
        {
            p.DbType = (DbType)tc.ConvertFrom(t.Name);
        }
        catch
        {
            // Do nothing; fall through to the SqlParameter default
        }
    }
    return p.SqlDbType;
}

And that's it. A complete working solution with source and db installation script is included with the article.
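
If you'd like to see what the installation script boils down to, here is a minimal deployment sketch. The assembly names, class name, and DLL paths below are my assumptions; the included script is the authoritative version:

EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- JSON.Net must be cataloged before the assembly that references it;
-- the permission sets required may differ in your environment
CREATE ASSEMBLY [Newtonsoft.Json]
FROM 'C:\CLR\Newtonsoft.Json.dll'       -- hypothetical path
WITH PERMISSION_SET = SAFE;

CREATE ASSEMBLY [JSON.SQL]
FROM 'C:\CLR\JsonSql.dll'               -- hypothetical path
WITH PERMISSION_SET = EXTERNAL_ACCESS;  -- xsp_JSON_BulkCopy opens outbound connections
GO

-- "JsonSql" is a hypothetical class name; match it to the actual source
CREATE FUNCTION dbo.xf_JSON_Query
    (@SelectStatement NVARCHAR(MAX), @UseDoubleQuotes BIT)
RETURNS NVARCHAR(MAX)
AS EXTERNAL NAME [JSON.SQL].[JsonSql].xf_JSON_Query;
GO

CREATE PROCEDURE dbo.xsp_JSON_Query
    @SelectStatement NVARCHAR(MAX), @UseDoubleQuotes BIT
AS EXTERNAL NAME [JSON.SQL].[JsonSql].xsp_JSON_Query;
GO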

But I Hate The CLR!

Yes, I do, and here's why: it violates that old, old rule that a database server should <> an application server. It's very easy (trust me, VERY easy) for a badly-written CLR procedure -- or a fantastically-written one running on a few thousand or million rows of data -- to take out your SQL Server, usually memory- and CPU-wise, if not via an out-and-out crash or stack dump (not to mention the irate users, coworkers, and bosses you'll leave behind after your moment of glory). CLR memory exists outside your SQL Server's normal memory pool, so what you allocate for buffer cache is unavailable to the CLR and vice versa. Now try loading a billion-row table into a CLR procedure, and watch your dynamically-adjusting SQL Server instance go boink. Ditto CPU and I/O: consider anything you run in the CLR as taking something away from the SQL Server side. You must be as careful as a Tweeting Congressman at a press conference when using the CLR for much of anything.

Having said that, here are some other things I've used the CLR for, successfully:

  1. URL parsing
  2. String and key hashing
  3. File system access
  4. SMTP emailing directly from T-SQL
  5. Parallel threaded and thread-managed SQL execution from within a T-SQL procedure or SQL Agent job (spawns multiple SQL statements and waits for completion of all of them)

But back on topic, I wanted to know the particular performance characteristics of these JSON procedures. I suspected that they would be less efficient than the corresponding XML commands (FOR XML and OPENXML) on the same data, since XML support is native to SQL Server and does not use the CLR.

Turned out I was partially right.

Selected Performance Testing Results

I ran several iterations of tests, measuring serialization and deserialization of JSON vs. XML on two 100K-row tables: one narrow with mostly numeric data, the other wide with mostly textual data, in increments of 1000 rows per run. I measured CPU, IO, and memory usage, both SQL and CLR memory. I also ran both forward-allocation and backward-allocation iterations: in other words, I ran the number of records in 1000-row batches both up -- 1000, 2000, 3000, etc. -- and back down -- 100,000, then 99,000, then 98,000, etc. This lets us consider the effect of garbage collection and dynamic memory management on the process: how quickly does the CLR release memory back to the OS? I also ran a third, random-allocation iteration, where each test executed over a random number of rows between 1000 and 100K (in multiples of 1000), to see how rowset volatility affected performance and resources.

Testing was done on a server with other concurrent heavy workloads, in the form of interactive and ad hoc SQL, T-SQL and CLR procedures, SQL Agent jobs, and about 20 other databases present and active with approximately 100 concurrent user connections.

Below are some selected results. The xsp_JSON_Query procedure takes 45 seconds or so to return 100K rows of data as JSON, with a 500MB or so CLR memory requirement:

[Charts not reproduced: xsp_JSON_Query elapsed time and CLR memory usage by row count]
The xf_JSON_Query bests that by returning 100K rows in less than 5 seconds. The two use identical calls to the JSON serializer, so the performance hit on the proc seems to be with streaming the results row by row to the client. Here are the results for the function:

[Charts not reproduced: xf_JSON_Query elapsed time, CLR memory used, and LOG of CLR memory used]

Note that I plotted both the raw numbers for CLR memory used and a second chart of the LOG of CLR memory used. I did that as a statistical check on the "plateaus" seen in the raw CLR memory numbers, which I interpret as fixed-size chunks of memory allocated by the CLR. The close-to-log growth of those numbers assures me that the rate of growth of memory consumption will flatten if and when larger result sets are serialized.

Finally, I checked the bulk copy procedure:

[Charts not reproduced: xsp_JSON_BulkCopy elapsed time and resource usage]
So generally you can bulk load 100K JSON records into 100K table rows in about 5 seconds. Note that the IO and memory characteristics of BCP differ from those of the data-returning procedures.

And finally, as a control, here's the performance of the same data output using FOR XML AUTO, ELEMENTS:

[Charts not reproduced: FOR XML AUTO, ELEMENTS elapsed time, CPU, and IO by row count]
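
For reference, the control statement looks something like this (a sketch; the Orders table is the hypothetical one from the opening example):

SELECT OrderId, CustomerId, PalletNumber, ShipToZip, Discount, OrderDate
FROM dbo.Orders AS [Order]   -- alias so FOR XML AUTO emits <Order> elements
FOR XML AUTO, ELEMENTS;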
On the surface, it appears that the JSON function outputs the same data faster than FOR XML: in fact, a good 60-75% faster. The flattening of the CPU and IO curves on the right of the FOR XML graphs is intriguing, though; perhaps the XML routines do better at larger volumes thanks to more efficient data caching and memory management. I'll leave that one for another day.

Also, I left out the memory chart deliberately, as it was flat and very low. FOR XML appears to be very memory efficient (I should mention that between tests I'm performing a DBCC DROPCLEANBUFFERS to clear cache as much as possible to a pristine state before starting the next test).

While further higher-volume testing might be appropriate, 100K rows was sufficient for my purposes. Maybe I'll try larger volumes when and if circumstances warrant.

Conclusion

JSON provides an interesting, viable, and perhaps even useful alternative to XML as an input/output choice for SQL Server. Hopefully in the future MS will start supporting native JSON in the SQLOS: can you say FOR JSON and OPENJSON?

Download

Please read the disclaimer below, and understand that I do not provide any support for this software, source or binaries, including but not limited to installation, configuration, maintenance, and tuning. If you're good with that, then you can download the complete binaries and source code here.

Disclaimer

The program (source code AND binaries) is provided to you as is, without warranty. There is no warranty for the program, expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose and non infringement of third party rights. The entire risk as to the quality and performance of the program is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair and correction.

Acknowledgements

Thanks to James Newton-King for allowing me the use of JSON.Net in this project and article.

 

 
