
JQ - Tool for Querying JSON documents using the CLI

In my role as a Data Engineer, I have an ever-increasing need to query JSON documents using the CLI (command line interface) in these ways:

  • Filtering the output from a cloud CLI to just the useful information
  • Extracting the output from an API
  • Evaluating JSON documents to ingest them into a database
  • Reformatting information to submit to an API
  • Software/data archaeology

For the use cases above, uploading JSON to a database is overkill or premature, so the jq command line utility lets me query JSON documents directly from the command line. This tool is written in C and you can install it on Windows, Linux, or macOS.
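
For example, here is a hypothetical AWS CLI call of my own (assuming you have the AWS CLI configured) piped through jq to list just your bucket names.

# List only the bucket names from the AWS CLI's JSON output
aws s3api list-buckets | jq -r '.Buckets[].Name'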

Installing jq

The Windows installation is the most involved as it depends on how you intend to run it. As I use macOS and Linux I can only offer approximate guidance here. I have a tool called Parallels Desktop on my Mac, which allows me to run Windows 11.

To install jq I used the steps outlined below.

  1. Download jq-win64.exe from the most recent release at https://github.com/jqlang/jq/releases. Depending on your machine's processor you may have to choose jq-windows-amd64.exe instead
  2. Rename the downloaded exe to jq.exe
  3. Create a folder for your app.  I used C:\program files\jq
  4. Copy jq.exe to your new folder
  5. Open up your Windows system properties/environment variables and add your new folder to the PATH environment variable (steps 3 to 5 are sketched as commands below)
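
For those comfortable in a terminal, here is a rough cmd.exe sketch of steps 3 to 5. The paths are assumptions and you will need an elevated prompt; note that setx truncates values longer than 1,024 characters, so the GUI route in step 5 is the safer option.

mkdir "C:\program files\jq"
copy "%USERPROFILE%\Downloads\jq-windows-amd64.exe" "C:\program files\jq\jq.exe"
setx PATH "%PATH%;C:\program files\jq"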

For Macs it is just a case of using the Homebrew package manager

brew install jq
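
On Linux, jq is in the standard package repositories of most distributions, for example

# Debian/Ubuntu
sudo apt-get install jq
# Fedora
sudo dnf install jq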

To check that everything is working, start a new command shell window and type

jq --version
# This should bring back something similar to
# jq-1.7.1

Helpful resources to help you learn jq

Two resources worth bookmarking are:

  • The official jq manual at https://jqlang.github.io/jq/manual/
  • The jqplay playground at https://jqplay.org, which lets you experiment with jq expressions in the browser

Using jq to pretty print a JSON file

Let us suppose we extract a small fragment of JSON from an application log. As you can see, it isn't easy to absorb what it is telling you, though some of you might recognise it as part of an AWS S3 PUT event.

"s3": {"s3SchemaVersion": "1.0","configurationId": "testConfigRule","bucket": {"name": "example-bucket","ownerIdentity": {"principalId": "EXAMPLE"},"arn": "arn:aws:s3:::example-bucket"},"object": {"key": "test0.000000key","size": 1024,"eTag": "0123456789abcdef0123456789abcdef","sequencer": "0A1B2C3D4E5F678901"}}}

For the sake of these examples, rather than pipe the results of a cloud command line call into jq, we are going to store the fragment in a file called logfragment.json. On Windows you can type the following into your command window. Note that cmd.exe does not understand single quotes, so use double quotes around jq expressions there; the single-quoted examples later in this article are for bash, zsh, or PowerShell.

type logfragment.json | jq "."
# or as an alternative
jq "." logfragment.json

This will pretty-print your logfragment.json file. As you can see, it is much more readable.

{
  "s3": {
    "s3SchemaVersion": "1.0",
    "configurationId": "testConfigRule",
    "bucket": {
      "name": "example-bucket",
      "ownerIdentity": {
        "principalId": "EXAMPLE"
      },
      "arn": "arn:aws:s3:::example-bucket"
    },
    "object": {
      "key": "test0.000000key",
      "size": 1024,
      "eTag": "0123456789abcdef0123456789abcdef",
      "sequencer": "0A1B2C3D4E5F678901"
    }
  }
}

Using jq to strip down a JSON file

JSON documents can be verbose, yet for diagnostic purposes we may need only a handful of attributes.

Looking at our example logfragment.json file, we are only interested in three attributes

  • The bucket name
  • The key within the bucket
  • The size of the object referenced by the key

The jq application allows us to select just the items we are interested in.  In the example below we take those three attributes and render them in a far simpler document.

jq '.s3|{"name":.bucket.name, "key":.object.key, "size":.object.size}' logfragment.json

The pipe symbol allows us to chain jq operations together. Reading the expression from left to right we:

  • Extract the s3 node; everything after the pipe is evaluated relative to this node
  • Assign a key called "name" and give it the value of the relative node path .bucket.name
  • Next, assign a key called "key" and give it the value of the relative node path .object.key
  • Finally, assign a key called "size" and give it the value of the relative node path .object.size

We could have written our jq query by referencing the full path of the attributes, as follows.

jq '{"name":.s3.bucket.name, "key":.s3.object.key, "size":.s3.object.size}' logfragment.json

The document produced from either form is shown below.

{
  "name": "example-bucket",
  "key": "test0.000000key",
  "size": 1024
}

We don't have to produce output in JSON format; we could instead ask for just the values.

# The default is to wrap strings in double-quotes
jq '.s3|.bucket.name, .object.key, .object.size' logfragment.json
# If we just wanted the "raw" values
jq -r '.s3|.bucket.name, .object.key, .object.size' logfragment.json
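
For our log fragment the first command prints the quoted values and the second prints them raw.

# Default output
"example-bucket"
"test0.000000key"
1024
# With -r
example-bucket
test0.000000key
1024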

Using jq to produce a CSV file

We can ask jq to produce our output in CSV format using the @csv operator. The input to @csv must be an array, so as we have a single record we wrap our extracted fields in square brackets to convert them into a one-line array.

# Just the csv data row
jq -r '[.s3|.bucket.name, .object.key, .object.size]|@csv' logfragment.json
# CSV with headers
jq -r '["name","key","size"],[.s3|.bucket.name, .object.key, .object.size]|@csv' logfragment.json

In the second example above we are creating the equivalent of a SQL UNION of two arrays, as illustrated in the sketch after this list.

  • The first array holds our column names
  • The second array holds our data
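
The comma operator produces a stream of values, and @csv formats each array in that stream as a CSV row. You can see this in isolation using the -n (--null-input) switch, which lets jq run without an input document.

jq -nr '["a","b"],[1,2]|@csv'
# "a","b"
# 1,2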

Using jq to find the JSON keys at a particular level

When I have a large JSON object my starting point is to look at what keys exist, as this gives me an indication of the objects contained in the JSON document. For our logfragment.json we scarcely need such a technique, but it works as follows.

jq '.s3|keys' logfragment.json
#  This will produce the following
#
#    [
#      "bucket",
#      "configurationId",
#      "object",
#      "s3SchemaVersion"
#    ]
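
Note that keys returns the key names in alphabetical order. If you would rather see them in the order they appear in the document, jq also provides keys_unsorted (the -c switch asks for compact, single-line output).

jq -c '.s3|keys_unsorted' logfragment.json
# ["s3SchemaVersion","configurationId","bucket","object"]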

Using jq to extract from an array

So far we have simply flattened and simplified a small JSON fragment, though you can apply the same techniques to a much larger document. What we will try next is to extract data from an array and its parent. For this I will use a simple document, which I will store in a file called orders.json, purely to illustrate the concept.

{
    "orderId": 123,
    "orderDate": "2024-01-20",
    "orderItems": [
        {
            "productCode": "KYB00001",
            "price": 102.50,
            "quantity": 1,
            "discountPercentage": 0.0
        },
        {
            "productCode": "CBL00002",
            "price": 2.50,
            "quantity": 100,
            "discountPercentage": 10.0
        }
    ],
    "customerId":"7a27135f-4b2f-4ade-8754-2b001639418b"
}

We want the order header information to repeat for each order item, and to perform a simple calculation for the total order item price. We can do this with the following jq expression.

jq -r '.customerId as $custId|.orderId as $Id|.orderDate as $PurchaseDate|
    [
       "CustomerNumber",
       "OrderNumber",
       "OrderDate",
       "ProductCode",
       "UnitPrice",
       "Quantity",
       "ItemPrice"
    ],
(.orderItems[]|
    [
        $custId,
        $Id,
        $PurchaseDate,
        .productCode, 
        .price, 
        .quantity, 
        (.price * .quantity * (1 -.discountPercentage /100))
    ]
)
|@csv' orders.json

Our jq expression above illustrates some interesting concepts.

  • A jq expression can be split over many lines to aid readability
  • We can assign a value to a variable at the parent level to be used later (a standalone illustration follows the output below)
  • We can perform calculations

Our output will appear as follows

"CustomerNumber","OrderNumber","OrderDate","ProductCode","UnitPrice","Quantity","ItemPrice"
"7a27135f-4b2f-4ade-8754-2b001639418b",123,"2024-01-20","KYB00001",102.50,1,102.5
"7a27135f-4b2f-4ade-8754-2b001639418b",123,"2024-01-20","CBL00002",2.50,100,225

Building your library of jq expressions

So far the jq expressions I have shown you have been typed in longhand though, just like the useful SQL queries a DBA keeps at their disposal, it would be handy if they could be stored in files. Fortunately, just as sqlcmd has the -i switch, jq has -f or --from-file.

Store everything within, but excluding, the single quotes in the previous jq expression in a file called orders.jq. Now you can run the following and get the same result as before.

jq -rf orders.jq orders.json

The first file contains our jq expression, the second contains the JSON we wish to run it against.

Think of a jq expression stored in a file as being akin to a stored procedure such as the sp_blitz procs SQL Server DBAs know and love.

Again, just as you would pass arguments to sqlcmd using the -v switch, jq has the --arg and even --argjson switches.
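
As a quick illustrative sketch of my own, suppose we only want the object key when the object exceeds a minimum size that we pass in as an argument.

# --arg always binds the value as a string, hence the tonumber conversion
jq --arg minSize 1000 '.s3.object|select(.size >= ($minSize|tonumber))|.key' logfragment.json
# --argjson parses the value as JSON, so the number stays a number
jq --argjson minSize 1000 '.s3.object|select(.size >= $minSize)|.key' logfragment.json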

Closing thoughts

There are times when ingesting JSON into a database is absolutely the right thing to do. The jq utility is useful when all we need is a quick and dirty extraction.

I have only given you the basics of what jq can do. Believe me, this barely scratches the surface, though for more complicated interactions I think SQL Server's JSON functionality may be more appropriate.

There is a yq utility too, which can handle JSON, YAML, and XML. For JSON processing it doesn't support everything that jq can though, depending on your use case, it may fit your needs. As a lot of web and database tools use large YAML files for configuration, I have found it similarly useful.
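
As a brief sketch, assuming the Go implementation of yq (version 4) and an invented config.yaml, extracting a value from a YAML configuration file looks very familiar.

# Pull a single value out of a YAML config file
yq '.database.host' config.yaml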

One use case I should mention is the use of jq/yq syntax when putting together ETL mapping specifications where a data source is JSON. This is a subject on which I hope to write more at a later date.

For Python fans, both the pyjq and yq libraries allow Python applications to use jq and/or yq.

Where these utilities really come into their own is in shell scripting (bash, PowerShell, etc.). This is especially true in resource-constrained environments such as GitHub Actions, where small, efficient, but powerful tooling is a godsend.
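
As a final illustration of my own, here is a one-line pipeline using the public GitHub API to fetch the latest jq release tag.

curl -s https://api.github.com/repos/jqlang/jq/releases/latest | jq -r '.tag_name'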

 
