Importing JSON Collections into SQL Server

Microsoft introduced native support for JSON in SQL Server in version 2016. In this article, Phil Factor shows how to import JSON documents into SQL Server database tables, even if there is a bit of complexity in the original data.

It is fairly easy to import JSON collections of documents into SQL Server if there is an underlying ‘explicit’ table schema available to them. If each of the documents has a different schema, then you have little chance. Fortunately, schema-less data collections are rare.

In this article we’ll start simply and work through a couple of examples before ending by creating a SQL Server database schema with ten tables, constraints and keys. Once those are in place, we’ll import a single JSON document, filling the ten tables with the data of 70,000 fake records from it.

Let’s start this gently, putting simple collections into strings which we will insert into a table. We’ll then try slightly trickier JSON documents with embedded arrays and so on. We’ll start by using the example of sheep-counting words, collected from many different parts of Great Britain and Brittany. The simple aim is to put them into a table. I use sheep-counting words not because they are of general importance, but because they can represent whatever data you are trying to import.

You will need access to SQL Server 2016 or later, or Azure SQL Database or Azure SQL Data Warehouse, to play along, and you can download the data and code from GitHub.

Converting Simple JSON Arrays of Objects to Table-sources

We will start off by creating a simple table that we want to import into.
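A minimal sketch of such a table might look like this; the table and column names here are my choice for illustration, and the full build script is in the GitHub download.

CREATE TABLE dbo.SheepCount
  (
  Region NVARCHAR(40) NOT NULL,  -- where the words were collected
  Number INT NOT NULL,           -- the number being counted
  Word NVARCHAR(30) NOT NULL,    -- the dialect word for that number
  CONSTRAINT PK_SheepCount PRIMARY KEY (Region, Number)
  );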

We then choose a simple JSON format.
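Something like this would do: a plain array of objects whose keys match the columns of the table (the sample data in the GitHub download follows the same pattern).

[
  { "Region": "Wensleydale", "Number": 1, "Word": "Yain" },
  { "Region": "Wensleydale", "Number": 2, "Word": "Tain" },
  { "Region": "Wensleydale", "Number": 3, "Word": "Eddero" }
]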

We can very easily use OpenJSON to create a table-source that reflects the contents.
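Here is a sketch of the query, assuming the JSON above is in a variable called @JSONData:

DECLARE @JSONData NVARCHAR(MAX) =
  N'[{ "Region": "Wensleydale", "Number": 1, "Word": "Yain" },
     { "Region": "Wensleydale", "Number": 2, "Word": "Tain" }]';

SELECT Region, Number, Word
  FROM OPENJSON(@JSONData)
    WITH (Region NVARCHAR(40), Number INT, Word NVARCHAR(30));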

Once you have a table source, the quickest way to insert JSON into a table will always be the straight insert, even after an existence check. It is a good practice to make the process idempotent by only inserting the records that don’t already exist. I’ll use the MERGE statement just to keep things simple, though the left outer join with a null check is faster. The MERGE is often more convenient because it will accept a table-source such as a result from the OpenJSON function. We’ll create a temporary procedure to insert the JSON data into the table.
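A minimal sketch of such a temporary procedure, using the table and column names assumed above (the version in the download may differ in detail):

CREATE PROCEDURE #SaveJSONSheepCount
  @JSONData NVARCHAR(MAX)
AS
MERGE dbo.SheepCount AS target
USING
  (SELECT Region, Number, Word
     FROM OPENJSON(@JSONData)
       WITH (Region NVARCHAR(40), Number INT, Word NVARCHAR(30))
  ) AS source
ON target.Region = source.Region AND target.Number = source.Number
WHEN NOT MATCHED BY TARGET THEN
  INSERT (Region, Number, Word)
  VALUES (source.Region, source.Number, source.Word);
GO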

Now we try it out. Let’s assemble a couple of simple JSON strings from a table-source.
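For instance, we can use FOR JSON PATH against a VALUES table-source to build the strings. The variable names are just for this example; the counting words are from the collected data.

DECLARE @Wensleydale NVARCHAR(MAX) =
  (SELECT Region, Number, Word
     FROM (VALUES ('Wensleydale', 1, 'Yain'),
                  ('Wensleydale', 2, 'Tain'),
                  ('Wensleydale', 3, 'Eddero')
          ) AS SheepCount (Region, Number, Word)
    FOR JSON PATH);

DECLARE @Borrowdale NVARCHAR(MAX) =
  (SELECT Region, Number, Word
     FROM (VALUES ('Borrowdale', 1, 'Yan'),
                  ('Borrowdale', 2, 'Tyan'),
                  ('Borrowdale', 3, 'Tethera')
          ) AS SheepCount (Region, Number, Word)
    FOR JSON PATH);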

Now we can EXECUTE the procedure to store the Sheep-Counting Words in the table.
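Continuing in the same batch as the DECLAREs above:

EXECUTE #SaveJSONSheepCount @JSONData = @Wensleydale;
EXECUTE #SaveJSONSheepCount @JSONData = @Borrowdale;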

Check to see that they were imported correctly by running this query:
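Against the sketch table above, something like:

SELECT Region, Number, Word
  FROM dbo.SheepCount
  ORDER BY Region, Number;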

Converting JSON Arrays of Objects that have Embedded Arrays to Table-sources

What if you want to import the sheep-counting words from several regions? So far, what we’ve been doing is fine for a collection that models a single table. However, real life isn’t like that. Not even Sheep-Counting Words are like that. A little internalized Chris Date will be whispering in your ear that there are two relations here, a region and the name for a number.

Your JSON for a database of sheep-counting words will more likely look like this (I’ve just reduced it to two numbers in the sequence array rather than the original twenty). Each JSON document in our collection has an embedded array.
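Here is an illustration of the shape I mean. The key names (‘region’ and ‘sequence’) are the ones I’ll assume in the examples that follow; the real file is in the GitHub download.

[
  {
    "region": "Wensleydale",
    "sequence": [ { "number": 1, "word": "Yain" },
                  { "number": 2, "word": "Tain" } ]
  },
  {
    "region": "Borrowdale",
    "sequence": [ { "number": 1, "word": "Yan" },
                  { "number": 2, "word": "Tyan" } ]
  }
]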

After a bit of thought, we remember that the OpenJSON function actually allows you to put a JSON value in a column of the result. This means that you just need to CROSS APPLY each embedded array, passing to the ‘cross-applied’ OpenJSON function the JSON fragment representing the array, which it will then parse for you.
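A sketch of the query, assuming the key names shown above:

DECLARE @JSONData NVARCHAR(MAX) =
  N'[{"region":"Borrowdale",
      "sequence":[{"number":1,"word":"Yan"},{"number":2,"word":"Tyan"}]}]';

SELECT TheRegions.region, TheWords.number, TheWords.word
  FROM OPENJSON(@JSONData)
    WITH (region NVARCHAR(40),
          sequence NVARCHAR(MAX) '$.sequence' AS JSON) AS TheRegions
    CROSS APPLY OPENJSON(TheRegions.sequence)
      WITH (number INT, word NVARCHAR(30)) AS TheWords;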

I haven’t found the fact documented anywhere, but you can leave out the path elements from the column declaration of the WITH statement if the columns are exactly the same as the JSON keys, with matching case.

The ability to drill into sub-arrays by cross-applying OpenJSON function calls allows us to easily insert a large collection of documents that have embedded arrays. This is looking a lot more like something that could, for example, tackle the import of a MongoDB collection, as long as it was exported as a document array with commas between documents. I’ll include, with the download on GitHub, the JSON file that contains all the sheep-counting words that have been collected. Here is the updated stored procedure:
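The version in the download does a little more, but a minimal sketch of it, with my assumed table and key names, would be:

CREATE PROCEDURE #SaveJSONSheepCountRegions
  @JSONData NVARCHAR(MAX)
AS
MERGE dbo.SheepCount AS target
USING
  (SELECT TheRegions.region, TheWords.number, TheWords.word
     FROM OPENJSON(@JSONData)
       WITH (region NVARCHAR(40),
             sequence NVARCHAR(MAX) '$.sequence' AS JSON) AS TheRegions
       CROSS APPLY OPENJSON(TheRegions.sequence)
         WITH (number INT, word NVARCHAR(30)) AS TheWords
  ) AS source
ON target.Region = source.region AND target.Number = source.number
WHEN NOT MATCHED BY TARGET THEN
  INSERT (Region, Number, Word)
  VALUES (source.region, source.number, source.word);
GO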

We can now very quickly ingest the whole collection into our table, pulling the data in from the file included with the download on GitHub, so you can try it out. There are thirty-three different regions in the JSON file.
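Reading the file and calling the procedure is a one-liner apiece. The file path and name here are assumptions; adjust them to wherever you put the download, and use SINGLE_NCLOB instead if your copy of the file is UTF-16.

DECLARE @JSON NVARCHAR(MAX) =
  (SELECT BulkColumn
     FROM OPENROWSET (BULK 'C:\data\SheepCountingWords.json', SINGLE_CLOB) AS TheFile);

EXECUTE #SaveJSONSheepCountRegions @JSONData = @JSON;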

We can now check that it is all in and correct.
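A quick aggregation over the assumed table shows whether every region arrived with its full complement of words:

SELECT Region, COUNT(*) AS WordsStored
  FROM dbo.SheepCount
  GROUP BY Region
  ORDER BY Region;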

Giving …

Just as a side-note, this data was collected for this article from various places on the internet, but mainly from Yan Tan Tethera. Each table was pasted into Excel and tidied up. The JSON code was created by using three simple functions: one for the cell-level value, one for the row value, and a final summation. This allowed simple adding, editing and deleting of data items. The technique is only suitable where columns are of fixed length.

Importing a More Complex JSON Data Collection into a SQL Server Database

We have successfully imported the very simplest JSON files into SQL Server. Now we need to consider those cases where the JSON document or collection represents more than one table.

In any relational database, we can take two approaches to JSON data: we can accommodate it, meaning we treat it as an ‘atomic’ unit and store the JSON unprocessed, or we can assimilate it, meaning that we turn the data into a relational format that can be easily indexed and accessed.

  • To accommodate JSON, we store it as a CLOB, usually NVARCHAR(MAX), with extra columns containing the extracted values for the data fields with which you would want to index the data. This is fine where all the database has to do is to store an application object without understanding it.
  • To assimilate JSON, we need to extract all the JSON data and store it in a relational form.

Our example represents a very simple customer database with ten linked tables. We will first accommodate the JSON document by creating a table (dbo.JSONDocuments) that merely stores, in each row, the reference to the customer, along with all the information about that customer, each aspect (addresses, phones, email addresses and so on) in separate columns as CLOB JSON strings.

We then use this table to successively assimilate each JSON column into the relational database.

This means that we need to parse the full document only once.

To be clear about the contents of the JSON file, we will be cheating by using spoof data. We would never have unencrypted personal information in a database or a JSON file. Credit Card information would never be unencrypted. This data is generated entirely by SQL Data Generator, and the JSON collection contains 70,000 documents. The method of doing it is described here.

We’ll make other compromises. We’ll have no personal identifiers either. We will simply use the document order. In reality, the JSON would store the surrogate key of person_id.

The individual documents will look something like this.
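I can’t reproduce a real one here, but the general shape is roughly the following. The key names are illustrative assumptions that I’ll reuse in the sketches below; the real collection is in the GitHub download.

{
  "Name": { "Title": "Mr", "FirstName": "...", "LastName": "...", "FullName": "..." },
  "Addresses": [ { "TypeOfAddress": "Home", "Start_date": "...", "End_date": null,
                   "AddressLine1": "...", "City": "...", "PostCode": "..." } ],
  "CreditCards": [ { "CardNumber": "...", "ValidFrom": "...", "ValidTo": "...", "CVC": "..." } ],
  "EmailAddresses": [ { "EmailAddress": "..." } ],
  "Notes": [ { "Text": "...", "Date": "..." } ],
  "Phones": [ { "TypeOfPhone": "Mobile", "DiallingNumber": "...",
                "Dates": [ { "Start_date": "...", "End_date": null } ] } ]
}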

We will import this into a SQL Server database designed like this:

The build script is included with the download on GitHub.

So, all we need now is the batch to import the JSON file that contains the collection and populate the tables with the data. We will now describe the individual parts of the batch.

We start out by reading the customersUTF16.json file into a variable.
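A sketch, assuming the file is sitting in C:\data (adjust the path to suit); SINGLE_NCLOB is used because the file is UTF-16:

DECLARE @JSON NVARCHAR(MAX) =
  (SELECT BulkColumn
     FROM OPENROWSET (BULK 'C:\data\customersUTF16.json', SINGLE_NCLOB) AS TheFile);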

The next step is to create a table at the document level, with the main arrays within each document represented by columns. (In some cases, there are sub-arrays; the phone numbers, for example, have an array of dates.) This means that this initial slicing of the JSON collection needs to be done only once. In our case, the columns are as follows (a sketch of the table appears after the list):

  • The details of the Name,
  • Addresses,
  • Credit Cards,
  • Email Addresses,
  • Notes,
  • Phone numbers
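Here is a minimal sketch of such a staging table. The column names are my assumptions for illustration; the actual build script is in the GitHub download.

IF OBJECT_ID('dbo.JSONDocuments') IS NULL
  CREATE TABLE dbo.JSONDocuments
    (
    Document_id INT NOT NULL PRIMARY KEY, -- the document order, doubling as person_id
    FullName NVARCHAR(100) NULL,          -- extracted for convenience
    Name NVARCHAR(MAX) NULL,              -- JSON: the details of the name
    Addresses NVARCHAR(MAX) NULL,         -- JSON: array of addresses
    CreditCards NVARCHAR(MAX) NULL,       -- JSON: array of credit cards
    EmailAddresses NVARCHAR(MAX) NULL,    -- JSON: array of email addresses
    Notes NVARCHAR(MAX) NULL,             -- JSON: array of notes
    Phones NVARCHAR(MAX) NULL             -- JSON: array of phones, each with a dates sub-array
    );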

We fill this table via a call to OpenJSON. By doing this, we have the main details of each customer available to us when slicing up embedded arrays. The batch is designed so it can be rerun and should be idempotent. This means that there is less of a requirement to run the process in a single transaction.

Now we fill this table with a row for each document, each representing the entire data for a customer. Each item of root data, such as the id and the customer’s full name, is held as a column. All other columns hold JSON. This table will be an ‘accommodation’ to the JSON data, in that each row represents a customer, but each JSON document in the collection is shredded to provide a JSON string that represents the attributes and relations of that customer. We can now assimilate this data step by step.
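Still in the same batch as the DECLARE above, here is a sketch of the fill. It uses the default OPENJSON schema so that the array index (the [key] column) gives us the document order, and the key paths follow the document shape assumed earlier. Refilling the staging table from scratch keeps the step idempotent.

DELETE FROM dbo.JSONDocuments;  -- the staging table is simply rebuilt on each run

INSERT INTO dbo.JSONDocuments
  (Document_id, FullName, Name, Addresses, CreditCards, EmailAddresses, Notes, Phones)
SELECT CONVERT(INT, EachDocument.[key]) + 1,              -- document order as the id
       JSON_VALUE(EachDocument.value, '$.Name.FullName'),
       JSON_QUERY(EachDocument.value, '$.Name'),
       JSON_QUERY(EachDocument.value, '$.Addresses'),
       JSON_QUERY(EachDocument.value, '$.CreditCards'),
       JSON_QUERY(EachDocument.value, '$.EmailAddresses'),
       JSON_QUERY(EachDocument.value, '$.Notes'),
       JSON_QUERY(EachDocument.value, '$.Phones')
  FROM OPENJSON(@JSON) AS EachDocument;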

First we need to create an entry in the person table if it doesn’t already exist, as that has the person_id. We need to do this first because otherwise the foreign key constraints will protest.
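A sketch, assuming a dbo.Person table keyed on person_id with the usual name columns (if person_id is an IDENTITY column in your build, you would need SET IDENTITY_INSERT around this):

INSERT INTO dbo.Person (person_id, Title, FirstName, LastName)
SELECT Docs.Document_id,
       JSON_VALUE(Docs.Name, '$.Title'),
       JSON_VALUE(Docs.Name, '$.FirstName'),
       JSON_VALUE(Docs.Name, '$.LastName')
  FROM dbo.JSONDocuments AS Docs
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.Person
      WHERE Person.person_id = Docs.Document_id);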

Now we do the notes. We’ll tackle these first because they are a bit awkward: there is a many-to-many relationship between notes and people, because the same standard note (an overdue invoice payment, for example) can be associated with many customers. We’ll use a table variable to allow us to guard against inserting duplicate records.
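A sketch of the approach, assuming tables called dbo.Note (the text of each distinct note) and dbo.NotePerson (the association), and the note keys from the document sketch:

DECLARE @Notes TABLE
  (person_id INT, NoteText NVARCHAR(MAX), InsertionDate DATETIME2);

INSERT INTO @Notes (person_id, NoteText, InsertionDate)
SELECT Docs.Document_id, TheNotes.[Text], TheNotes.[Date]
  FROM dbo.JSONDocuments AS Docs
    CROSS APPLY OPENJSON(Docs.Notes)
      WITH ([Text] NVARCHAR(MAX) '$.Text', [Date] DATETIME2 '$.Date') AS TheNotes;

-- any note text we haven't seen before
INSERT INTO dbo.Note (Note)
SELECT DISTINCT N.NoteText
  FROM @Notes AS N
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.Note WHERE Note.Note = N.NoteText);

-- associate each person with each of their notes, once only
INSERT INTO dbo.NotePerson (person_id, Note_id, InsertionDate)
SELECT N.person_id, TheNote.Note_id, N.InsertionDate
  FROM @Notes AS N
    JOIN dbo.Note AS TheNote ON TheNote.Note = N.NoteText
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.NotePerson
      WHERE NotePerson.person_id = N.person_id
        AND NotePerson.Note_id = TheNote.Note_id);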

Addresses are complicated because they involve three tables. There is the address, which is the physical place; the abode, which records when and why the person was associated with the place; and a third table which constrains the type of abode. We create a table variable to support the various queries without any extra shredding.
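The sketch below assumes tables called dbo.Address, dbo.Abode and dbo.AddressType, with the address keys from the document sketch; the real build script on GitHub will differ in detail.

DECLARE @Addresses TABLE
  (person_id INT, TypeOfAddress NVARCHAR(40), Start_date DATE, End_date DATE,
   AddressLine1 NVARCHAR(60), City NVARCHAR(40), PostCode NVARCHAR(20));

INSERT INTO @Addresses
SELECT Docs.Document_id, A.TypeOfAddress, A.Start_date, A.End_date,
       A.AddressLine1, A.City, A.PostCode
  FROM dbo.JSONDocuments AS Docs
    CROSS APPLY OPENJSON(Docs.Addresses)
      WITH (TypeOfAddress NVARCHAR(40), Start_date DATE, End_date DATE,
            AddressLine1 NVARCHAR(60), City NVARCHAR(40), PostCode NVARCHAR(20)) AS A;

-- any new types of abode
INSERT INTO dbo.AddressType (TypeOfAddress)
SELECT DISTINCT A.TypeOfAddress FROM @Addresses AS A
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.AddressType WHERE AddressType.TypeOfAddress = A.TypeOfAddress);

-- the physical places
INSERT INTO dbo.Address (AddressLine1, City, PostCode)
SELECT DISTINCT A.AddressLine1, A.City, A.PostCode FROM @Addresses AS A
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.Address
      WHERE Address.AddressLine1 = A.AddressLine1 AND Address.PostCode = A.PostCode);

-- the abode ties the person to the place for a period, with a type
INSERT INTO dbo.Abode (person_id, Address_id, TypeOfAddress, Start_date, End_date)
SELECT A.person_id, Addr.Address_id, A.TypeOfAddress, A.Start_date, A.End_date
  FROM @Addresses AS A
    JOIN dbo.Address AS Addr
      ON Addr.AddressLine1 = A.AddressLine1 AND Addr.PostCode = A.PostCode
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.Abode
      WHERE Abode.person_id = A.person_id AND Abode.Address_id = Addr.Address_id);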

Credit cards are much easier since they are a simple sub-array.
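A sketch, assuming a dbo.CreditCard table and the card keys from the document sketch:

INSERT INTO dbo.CreditCard (person_id, CardNumber, ValidFrom, ValidTo, CVC)
SELECT Docs.Document_id, TheCards.CardNumber, TheCards.ValidFrom, TheCards.ValidTo, TheCards.CVC
  FROM dbo.JSONDocuments AS Docs
    CROSS APPLY OPENJSON(Docs.CreditCards)
      WITH (CardNumber NVARCHAR(20), ValidFrom DATE, ValidTo DATE, CVC CHAR(3)) AS TheCards
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.CreditCard
      WHERE CreditCard.person_id = Docs.Document_id
        AND CreditCard.CardNumber = TheCards.CardNumber);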

Email Addresses are also simple. We’re on the downhill slopes now.
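And the sketch for email addresses, assuming a dbo.EmailAddress table:

INSERT INTO dbo.EmailAddress (person_id, EmailAddress)
SELECT Docs.Document_id, TheEmails.EmailAddress
  FROM dbo.JSONDocuments AS Docs
    CROSS APPLY OPENJSON(Docs.EmailAddresses)
      WITH (EmailAddress NVARCHAR(120)) AS TheEmails
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.EmailAddress
      WHERE EmailAddress.person_id = Docs.Document_id
        AND EmailAddress.EmailAddress = TheEmails.EmailAddress);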

Now we add these customers’ phones. The various dates for the start and end of the use of the phone number are held in a sub-array within the individual phone objects. That makes things slightly more awkward.
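A sketch of the double CROSS APPLY needed, assuming a dbo.Phone table and the phone keys from the document sketch:

INSERT INTO dbo.Phone (person_id, TypeOfPhone, DiallingNumber, Start_date, End_date)
SELECT Docs.Document_id, ThePhones.TypeOfPhone, ThePhones.DiallingNumber,
       TheDates.Start_date, TheDates.End_date
  FROM dbo.JSONDocuments AS Docs
    CROSS APPLY OPENJSON(Docs.Phones)
      WITH (TypeOfPhone NVARCHAR(20), DiallingNumber NVARCHAR(20),
            Dates NVARCHAR(MAX) '$.Dates' AS JSON) AS ThePhones
    CROSS APPLY OPENJSON(ThePhones.Dates)
      WITH (Start_date DATE, End_date DATE) AS TheDates
  WHERE NOT EXISTS
    (SELECT 1 FROM dbo.Phone
      WHERE Phone.person_id = Docs.Document_id
        AND Phone.DiallingNumber = ThePhones.DiallingNumber
        AND Phone.Start_date = TheDates.Start_date);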

Conclusion

JSON support was a long time coming to SQL Server, but now that we have it, it opens up several possibilities.

No SQL Server Developer or admin needs to rule out using JSON for ETL (Extract, Transform, Load) processes to pass data between JSON-based document databases and SQL Server. The features that SQL Server has are sufficient, and far easier to use than the SQL Server XML support.

A typical SQL Server database is far more complex than the simple example used in this article, but it is certainly not an outrageous idea that a database could have its essential static data drawn from JSON documents: these are more versatile than VALUES clauses and more efficient than individual INSERT statements.

I’m inclined to smile on the idea of transferring data between the application and database as JSON. It is usually easier for front-end application programmers, and we Database folks can, at last, do all the checks and transformations to accommodate data within the arcane relational world, rather than insist on the application programmer doing it. It will also decouple the application and database to the extent that the two no longer would need to shadow each other in terms of revisions.

JSON collections of documents represent an industry-standard way of transferring data. It is today’s CSV, and it is good to know that SQL Server can support it.