JSON support in SQL Server 2016

At last, SQL Server has caught up with other RDBMSs by providing a useful measure of JSON-support. It is a useful start, even though it is nothing like as comprehensive as the existing XML support. For many applications, what is provided will be sufficient. Robert Sheldon describes what is there and what isn't.

SQL Server 2016 is finally adding support for JSON, a lightweight format for exchanging data between different source types, similar to how XML is used. JSON, short for JavaScript Object Notation, is based on a subset of the JavaScript programming language and is noted for being human readable and easy for computers to parse and generate.

According to Microsoft, it is one of the most highly ranked requests on the Microsoft connect site and so for many, its inclusion in SQL Server is welcome news. That is, unless you were expecting the same sort of robust support we’ve seen with XML. SQL Server 2016 does not approach JSON with such vehemence, nor does it match what you’ll find in products such as PostgreSQL.

SQL Server 2016 includes no JSON-specific data type and consequently none of the kinds of methods available to the XML data type. SQL Server 2016 continues to use the NVARCHAR type to store JSON data. However, it does provide several important T-SQL language elements that make working with JSON much easier than it has been in the past, so Microsoft is at least moving in the right direction, even if it still has some catching up to do.

Getting to know JSON

Although JSON is a bit more complex than what we’ll cover here, it can help to have a basic understanding of what makes up a JSON code snippet before starting in on the SQL Server support. At its most basic, a JSON snippet can contain objects, arrays, or both. An object is an unordered collection of one or more name/value pairs (properties), enclosed in curly braces, as shown in the following example:

For each property, the name component (FirstName, Current, Age, and Phone) is enclosed in double quotes and followed by a colon. The name component, sometimes referred to as the key, is always a string. The property’s value follows slightly different rules. If the value is a string, you should enclose it in double quotes. If it is a numeric value, Boolean value (true or false), or null value, do not enclose it in quotes.

An array is simply an ordered collection of values, enclosed in square brackets, as in the following example:

An array supports the same types of values as an object: string, number, true, false, or null. In addition, both objects and arrays can contain other objects and arrays as their values, providing a way to nest structures, as shown in the following example:

At the top level, we have a JSON object that includes a single property. The property’s name is Employees, and the value is an array, which contains two values. Each array value is a JSON object that includes the Name, PII, and LoginID properties. The Name and PII values are also JSON objects, which contain their own name/value pairs.

As we work through the examples in this article, you’ll get a better sense of how these various components work.

Formatting query results as JSON

One of the JSON-related features supported in SQL Server 2016 is the ability to return data in the JSON format, which we do by adding the FOR JSON clause to a SELECT statement. We’ll explore the basics of how to use a FOR JSON clause to return data in the JSON format, using either the AUTO argument or the PATH argument.

First, however, we need some data on which to work. The following SELECT statement retrieves two rows from the vEmployee view in the AdventureWorks2016CTP3 database:

It returns the following results, although you might see some differences with the final product, since the data and examples are based on the CTP 3 release of SQL Server 2016:

FirstName MiddleName LastName EmailAddress PhoneNumber
Terri Lee Duffy terri0@adventure-works.com 819-555-0175
Roberto NULL Tamburello roberto0@adventure-works.com 212-555-0187

AUTO mode

To return these results as JSON, to support a specific application, we simply add the FOR JSON clause to the statement, as shown in the following example.

Notice that the clause includes the AUTO argument, which indicates that the results should be returned in AUTO mode. When you specify this mode, the database engine automatically determines the JSON format, based on the order of the columns in the SELECT list and the tables in the FROM clause. In this case, the FOR JSON AUTO clause causes the SELECT statement to return the following results.

From these results, you might be able to see that the JSON output includes an array that contains two values, with each value a JSON object. Not surprisingly, as the results become more involved, it becomes more difficult to read them. In such cases, you can use a local or online JSON formatter/validator to turn the JSON snippet into something more readable. For example, I fed the previous results into the formatter at https://jsonformatter.curiousconcept.com/ and came up with the following JSON:

As you can see, it is now much easier to see our top-level array and the two object values it contains. Each object corresponds to a row returned by the SELECT statement. Going forward, I’ll show only the formatter-fed results so they’re more readable, but know that SQL Server returns the data as a single-line value, without all the whitespace and line breaks, as you saw above.

Now that you’ve gotten a taste of the FOR JSON AUTO clause, let’s look at what happens when we join tables:

As our SELECT statement becomes more complex, so too does the JSON output, as shown in the following results:

The information from the Person table is now part of the p array, which itself is one of the values in the parent object. As you’ll recall, AUTO mode formats the results based on the order of the columns in the SELECT list and the tables in the FROM clause, so let’s mix up that column order:

Now the SELECT statement will return the JSON with the data from the Employee table treated as the nested object:

As you can see, we have two e arrays, embedded in the outer objects. We can continue to play around with our SELECT statement to try to get closer to the JSON results we want, or we can instead use the PATH mode, which gives us full control over the format of the JSON output. For all but the most basic SELECT statements, you’ll likely want to use the PATH mode.

PATH mode

To use the PATH mode, we start be specifying PATH in the FOR JSON clause, rather than AUTO, as shown in the following example:

When we switch to the PATH mode, the database engine flattens out our results and returns the data as two object values within a single array:

Using the PATH mode in this way is fairly straightforward; however, this is PATH at its most basic. The mode lets us be far more specific. For example, we can control how the the database engine nests the JSON output by specifying column aliases that define the structure, as shown in the following SELECT clause:

In this case, we are defining the Name object, which contains the First, Middle, and Last values; the PII object, which contains the DOB and NatID values; and the LoginID name/value pair, as shown in the following results:

In some cases, you will want to add a single, top-level element to your JSON output to serve as a root. To do so, you must specify it as part of the FOR JSON clause, as shown in the following example:

To specify the root, we add the ROOT option to the FOR JSON clause and, in this case, name the root Employees, which gives us the following results:

If you compare these results to those from the previous example, you will see that the outer element has been changed from an array to an object that contains only the Employees property. The Employees value is now the array that was the outer element in the previous example.

You might have also noticed that the second employee, Roberto, includes no middle name. That is because the MiddleName column in the source table is null. By default, the database engine does not include a JSON element whose value is null. However, you can override this behavior by adding the INCLUDE_NULL_VALUES option to the FOR JSON clause, as shown in the following SELECT statement:

SELECT

Now the results will show that Roberto’s middle name is null by assigning the null value to the Middle property:

There are, of course, other considerations to take into account when using this clause, so be sure to refer to SQL Server 2016 documentation. In the meantime, let’s look at how to convert a JSON snippet to traditional rowset data.

Converting JSON to rowset data using the OPENJSON function

To return a JSON snippet as rowset data, we use the OPENJSON rowset function to convert the data to a relational format. The function returns three values:

  • key: Property name within the object or index of the element within the array.
  • value: Property value within the object or value of the array element specified by the index.
  • type: Value’s data type, represented numerically, as described in the following table:
Numeric value Data type
0 null
1 string
2 int
3 true or false
4 array
5 object

To test how the the OPENJSON function works, let’s assign a JSON snippet to a variable and then use the function to call the variable, as shown in the following example:

The JSON snippet contains a single object that includes a property for each data type. The SELECT statement uses the OPENJSON rowset function within the FROM clause to retrieve the JSON data as a rowset, as shown in the following results:

key value type
FirstName NULL 0
LastName Duffy 1
NatID 245797967 2
Current false 3
Skills [“Dev”,”QA”,”PM”] 4
Region {“Country”:”Canada”,”Territory”:”North America”} 5

Notice that the type column in the results identifies the data type for each value. As expected, the column shows the Skills value an array, with all of the array’s elements included in the results for that row. The same goes for the Region value, which is an object. The row includes all the properties within that object.

In some cases, you will want to return only the key and value columns, so you will need to specify those columns in your SELECT list:

Notice that you must delimit the key column because Microsoft chose to return a column name that is also a T-SQL reserved keyword. As the following table shows, the results include only those two columns:

key value
FirstName NULL
LastName Duffy
NatID 245797967
Current false
Skills [“Dev”,”QA”,”PM”]
Region {“Country”:”Canada”,”Territory”:”North America”}

Now let’s move on to a more complex JSON snippet, which we’ll use for the remaining examples in this article:

The JSON shown here comes from the output generated from the last example in the preceding section. As you’ll recall, the database engine actually outputs the JSON in a format much less readable than what is shown here, but it can be easier to work with when assigning the JSON to a variable. So that’s the approach we’ll take for the remaining examples:

If you plan to try out the next batch of examples, you can use this variable definition for each one, which avoids all the whitespace you get when you run the results through a parser. Now let’s use the OPENJSON function to convert the JSON in the variable:

The example uses OPENJSON at its most basic, with no other parameters defined. As a result, the SELECT statement returns only a single row for the Employees array, as shown in the following table:

key value
Employees [{“Name”:{“First”:”Terri”,”Middle”:”Lee”,”Last”:”Duffy”},”PII”:{“DOB”:”1971-08-01″,”NatID”:”245797967″},”LoginID”:”adventure-works\\terri0″},{“Name”:{“First”:”Roberto”,”Middle”:null,”Last”:”Tamburello”},”PII”:{“DOB”:”1974-11-12″,”NatID”:”509647174″},”LoginID”:”adventure-works\\roberto0″}]

To better control our results, we need to pass a second argument into the OPENJSON function. The argument is a JSON path that instructs the database engine on how to parse the data. For example, the following path instructs the database engine to return data based on the Employees property:

When you specify a JSON path, you start with a dollar sign ($) to represent the item as it exists in its current context. You then specify one or more elements as they appear hierarchically in the JSON snippet, using periods to separate the elements. In this case, the path specifies only the root element, Employees, giving us the results shown in the following table:

key value
0 {“Name”:{“First”:”Terri”,”Middle”:”Lee”,”Last”:”Duffy”},”PII”:{“DOB”:”1971-08-01″,”NatID”:”245797967″},”LoginID”:”adventure-works\\terri0″}
1 {“Name”:{“First”:”Roberto”,”Middle”:null,”Last”:”Tamburello”},”PII”:{“DOB”:”1974-11-12″,”NatID”:”509647174″},”LoginID”:”adventure-works\\roberto0″}

This time, we get a row for each element in the Employees array. If we want to break the results down even further, we must work down the hierarchy. For example, to reference an element within the Employees array, we must specify the element’s index, as it exists within the array. An array’s index is zero-based, which means the index count starts with 0, so if we want to retrieve the first element in the Employees array, we must specify 0 after the root name, within square brackets, as shown in the following statement:

The first element in the Employees array is a JSON object that contains three properties, so that is what the SELECT statement returns, as shown in the following results:

key value
Name {“First”:”Terri”,”Middle”:”Lee”,”Last”:”Duffy”}
PII {“DOB”:”1971-08-01″,”NatID”:”245797967″}
LoginID adventure-works\terri0

Because the first two values are objects, the entire contents of those objects are returned. However, we can instead return only one of those objects by specify the object name:

Now the SELECT statement returns only the three properties within the Name object:

key value
First Terri
Middle Lee
Last Duffy

The OPENJSON examples we’ve looked at so far have used the default schema when returning the data as a rowset, but there are limits to how well we can control the results. Fortunately, the OPENJSON function also lets us add a WITH clause to our SELECT statement in order to define an explicit schema. In the following example, the schema flattens out our data so we can easily see the details for each employee:

The WITH clause specifies each column, using names that link to the original JSON. For example, the Name.First column returns the employee’s first name. The column name is based on the First property within the Name object. For each column, we also provide a T-SQL data type. The SELECT statement now returns the results shown in the following table:

Name.First Name.Middle Name.Last PII.DOB PII.NatID
Terri Lee Duffy 1971-08-01 245797967
Roberto NULL Tamburello 1974-11-12 509647174

If we want to define more readable column names, we can instead create column definitions that each includes the new name, followed the data type, and then a path reference, as shown in the following example:

Notice that, for the path, we do not need to reference the Employees array itself. That’s taken care of in the OPENJSON function. But we still need to specify the dollar sign to show the current context. We then follow with the Name or PII object name and then the property name. The SELECT statement now returns the results shown in the following table:

FirstName MiddleName LastName BirthDate NationalID
Terri Lee Duffy 1971-08-01 245797967
Roberto NULL Tamburello 1974-11-12 509647174

The preceding examples should give you at least a basic idea of how to turn a JSON snippet into rowset data. Again, refer to SQL Server 2016 documentation to get more specifics about how to use the OPENJSON function.

More JSON functions in SQL Server 2016

In addition to OPENJSON, SQL Server 2016 includes several other functions for working with JSON data. We’ll review how to use the ISJSON, JSON_value functions, and JSON_ QUERY functions.

ISJSON

The ISJSON function lets you test whether a text string is correctly formatted JSON. This is a particularly important function, considering that SQL Server 2016 doesn’t support a JSON data type. At least this way, you have some way to validate your data.

The ISJSON function returns 1 if a string is valid JSON, otherwise returns 0. The only exception to this is if the string is null, in which case the function returns null. The following SELECT statement tests our ubiquitous @json variable to verify whether it is valid:

As we hoped, the SELECT statement returns the following results:

Now let’s pass in text that is not valid JSON by tagging on the Age element without a value:

As expected, we receive the second message:

JSON_VALUE

Another handy JSON-related function in SQL Server 2016 is JSON_VALUE, which lets us extract a scalar value from a JSON snippet, as shown in the following example:

The JSON_VALUE function takes two arguments. The first is the JSON itself, and the second is a path that defines which element’s value we want to retrieve. In this case, the path specifies the First property in the Name object, which is part of the first element in the Employees array. As we would expect, the SELECT statement returns the value Terri.

We can just as easily return the NatID value for the second employee:

Now the SELECT statement returns 509647174. Suppose, however, that we try to retrieve something other than a scalar value. For example, the following path specifies only the PII object for the second employee:

This time, the SELECT statement returns a null value. By default, the database engine returns a null value if the path does not exist or is not applicable to the current situation. In this example, we’ve specified an element that cannot return a scalar value, so the database engine returns the null value.

When specifying a path in a JSON-related expression, you can control the results by preceding the path with the lax or strict option. The lax option is the default and is implied if not specified, which means that the database engine returns a null value if a problem arises. For example, the following path explicitly includes the lax option:

Once again, out statement returns a null value because we’re specifying an element that cannot return a scalar value. We can instead specify the strict option, in which case, the database engine will raise an error if a problem occurs:

This time we receive very different results:

JSON_QUERY

Another useful JSON-related tool is the JSON_QUERY function, which can extract an object or array from a JSON snippet. For example, the following SELECT statement retrieves the PII object for the second employee:

Like the JSON_value function, the JSON_QUERY function takes two arguments: the JSON source and a path indicating what data to extract. The SELECT statement returns the following results:

If we want to return the Employees array, we simply specify $.Employees as our path:

Now the SELECT statement returns just about everything in our JSON snippet:

Summary: JSON and SQL Server 2016

This article should give you what you need to start working with JSON data in SQL Server. As you can see, however, JSON support is nowhere nearly as robust as XML support. And if you’re working with other database management systems, you’ll quickly discover that the JSON features in SQL Server 2016 have some catching up to do before they can match what’s been implemented in other products.

Even so, what SQL Server 2016 provides is better than nothing, and the JSON support is solid and could prove more than adequate much of the time. In fact, for some organizations, the JSON features already implemented in SQL Server 2016 will be enough to meet their needs. Best of all, the JSON-related functionality is straightforward and easy-to-use, so you should be able to incorporate it into your workflow with relatively little pain.