Reading and Writing your Database's Documentation using JSON


One of the problems to which I keep returning is finding the best way to read and apply documentation for databases. As part of a series of articles I'm writing for Redgate's Product Learning, I've been demonstrating how to maintain a single source of database documentation, in JSON, and then add and update the object descriptions during Flyway migrations. It is a single re-executable script that documents the tables, views and table-valued functions of a database, and their columns, from a central JSON document. This can be generated from the database using a separate script that produces a JSON representation of a list of hashtables giving the name and description of every column. It is easily readable and can be reliably parsed to apply any documentation that you wish to add to the database. Ideally, you'd use YAML for readable structured documents, but JSON can easily be read by SQL. I chose to stop at listing table sources (tables, views, TVFs and so on) and their columns, feeling sure that most people would stop documenting at that point, if not before.

However, I then decided it was worth doing a general version for anyone doing SQL Server development.

What should be documented?

Microsoft supports extended properties for almost anything that can be described in the metadata, but most often we use them for table-sources and columns. In the article I've mentioned, I included only columns. Then I thought to myself, 'Hell no Phil, you like documenting indexes too!'. Then I thought about all the triggers I'd forgotten about in my life: forgot they were there, and forgot what precisely they did. After moodily kicking gravel around for a while, I decided to rewrite everything to include all the major parts of tables and any other objects that could be a table source, such as views and table-valued functions. These components would be constraints, triggers, columns, indexes and, in the case of functions, parameters too. By doing this, we can begin to track where the data in a database is, and how it is being used.

Basically, there are two essential aspects to this task: generating the JSON collection, and saving it to the target database. When saving the descriptions, the script checks for the existence of every column, index and constraint, and only attaches documentation to those that exist. It should effectively document every version of the database and could even be executed against an empty database without breaking.
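To show the shape of the saving side, here is a minimal sketch of the existence check and the re-executable write for a single attribute. It assumes the standard MS_Description property name and, purely for illustration, the Title column of Person.Person; the real script loops over everything in the JSON document.

```sql
-- A sketch: only write the description if the column actually exists,
-- and update rather than add if the property is already there.
IF COL_LENGTH('Person.Person', 'Title') IS NOT NULL --only document what exists
  BEGIN
    IF EXISTS (SELECT 1 FROM sys.extended_properties
                WHERE major_id = OBJECT_ID('Person.Person')
                  AND minor_id = COLUMNPROPERTY(OBJECT_ID('Person.Person'), 'Title', 'ColumnId')
                  AND class = 1 AND name = 'MS_Description')
      EXEC sys.sp_updateextendedproperty @name = N'MS_Description',
        @value = N'A courtesy title. For example, Mr. or Ms.',
        @level0type = N'SCHEMA', @level0name = N'Person',
        @level1type = N'TABLE',  @level1name = N'Person',
        @level2type = N'COLUMN', @level2name = N'Title';
    ELSE
      EXEC sys.sp_addextendedproperty @name = N'MS_Description',
        @value = N'A courtesy title. For example, Mr. or Ms.',
        @level0type = N'SCHEMA', @level0name = N'Person',
        @level1type = N'TABLE',  @level1name = N'Person',
        @level2type = N'COLUMN', @level2name = N'Title';
  END;
```

Because both branches are guarded, the batch can be run against any version of the database, including one where the table hasn't yet been created.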

The JSON Collection

Here's just one table as an example, the indefatigable Person.Person table from AdventureWorks2016. OK, it's wordy, but it is informative.

[
   {
      "TableObjectName":"Person.Person",
      "Type":"user table",
      "Description":"Human beings involved with AdventureWorks: employees, customer contacts, and vendor contacts.",
      "Attributes":{
         "Check constraint":{
            "CK_Person_EmailPromotion":"Check constraint [EmailPromotion] >= (0) AND [EmailPromotion] <= (2)",
            "CK_Person_PersonType":"Check constraint [PersonType] is one of SC, VC, IN, EM or SP."
         },
         "Column":{
            "BusinessEntityID":"Primary key for Person records.",
            "PersonType":"Primary type of person: SC = Store Contact, IN = Individual (retail) customer, SP = Sales person, EM = Employee (non-sales), VC = Vendor contact, GC = General contact",
            "NameStyle":"0 = The data in FirstName and LastName are stored in western style (first name, last name) order.  1 = Eastern style (last name, first name) order.",
            "Title":"A courtesy title. For example, Mr. or Ms.",
            "FirstName":"First name of the person.",
            "MiddleName":"Middle name or middle initial of the person.",
            "LastName":"Last name of the person.",
            "Suffix":"Surname suffix. For example, Sr. or Jr.",
            "EmailPromotion":"0 = Contact does not wish to receive e-mail promotions, 1 = Contact does wish to receive e-mail promotions from AdventureWorks, 2 = Contact does wish to receive e-mail promotions from AdventureWorks and selected partners. ",
            "AdditionalContactInfo":"Additional contact information about the person stored in xml format. ",
            "Demographics":"Personal information such as hobbies, and income collected from online shoppers. Used for sales analysis.",
            "rowguid":"ROWGUIDCOL number uniquely identifying the record. Used to support a merge replication sample.",
            "ModifiedDate":"Date and time the record was last updated."
         },
         "Default constraint":{
            "DF_Person_ModifiedDate":"Default constraint value of GETDATE()",
            "DF_Person_NameStyle":"Default constraint value of 0",
            "DF_Person_EmailPromotion":"Default constraint value of 0",
            "DF_Person_rowguid":"Default constraint value of NEWID()"
         },
         "Foreign key constraint":{
            "FK_Person_BusinessEntity_BusinessEntityID":"Foreign key constraint referencing BusinessEntity.BusinessEntityID."
         },
         "Index":{
            "PK_Person_BusinessEntityID":"Clustered index created by a primary key constraint.",
            "IX_Person_LastName_FirstName_MiddleName":null,
            "AK_Person_rowguid":"Unique nonclustered index. Used to support replication samples.",
            "PXML_Person_AddContact":"Primary XML index.",
            "PXML_Person_Demographics":"Primary XML index.",
            "XMLPATH_Person_Demographics":"Secondary XML index for path.",
            "XMLPROPERTY_Person_Demographics":"Secondary XML index for property.",
            "XMLVALUE_Person_Demographics":"Secondary XML index for value."
         },
         "Primary key constraint":{
            "PK_Person_BusinessEntityID":"Primary key (clustered) constraint"
         },
         "Sql trigger":{
            "iuPerson":"AFTER INSERT, UPDATE trigger inserting Individual only if the Customer does not exist in the Store table and setting the ModifiedDate column in the Person table to the current date."
         }
      }
   }
]

Notice that each table source has several categories and what is effectively a list of names and descriptions for each category. In the example, we have Check constraint, Column, Default constraint, Foreign key constraint, Index, Primary key constraint and SQL trigger.

Two issues here:

  • Keeping it readable: Because the attributes of a table source, such as columns, indexes and constraints, have to be unique, there is no point representing them in an array unless you want to pile in the details. You can create them as a hashtable where the key is the name of the attribute and the value is the description stored as an EP. Simple, eh? You then assign this hashtable to a key representing the class of attribute, such as a constraint, index, column, parameter, trigger or whatever. The FOR JSON clause cannot be tickled into doing this, so you need to create these hashtables in SQL and coerce them into a JSON hashtable using JSON_Query().
  • Remember the purpose: You will immediately say that you want more information about columns or indexes than just the name. Well, sure; it can be automated, but it makes any documentation unwieldy. Still, it is a great temptation to grab information like datatype and dependencies. It is like taking a child into a sweetshop. You plead with them to take only what they can eat, but they just turn pink and their bottom lip quivers. Instead, I'll merely point out that a data dictionary gives you the satellite image of your data, and the more information you add, the less you can see at a glance. If you want more, you can do what the writers of AdventureWorks did, and automate a lot of the documentation. You have, after all, 3,750 NVARCHAR characters of information you can put into an EP; if you want more, you just create a series of EP blocks to hold it all. Don't overdo it, though: remember that, like Marcel Proust's 'À la recherche du temps perdu', everyone will think it wonderful, but nobody will read it.
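The hashtable trick in the first point can be shown in miniature. This sketch (assuming the Person.Person table and the MS_Description property) builds the Column hashtable for a single table:

```sql
-- Build '{"name":"description",...}' pairs with STRING_AGG, then wrap the
-- result in JSON_QUERY() so FOR JSON embeds it as an object, not an escaped string.
SELECT JSON_QUERY('{' + STRING_AGG(
         '"' + STRING_ESCAPE(TheColumns.name, 'JSON') + '":'
         + COALESCE('"' + STRING_ESCAPE(CONVERT(NVARCHAR(3750), ep.value), 'JSON') + '"',
                    'null'), ',') + '}') AS [Column]
FROM sys.columns AS TheColumns
  LEFT OUTER JOIN sys.extended_properties ep
    ON ep.major_id = TheColumns.object_id
   AND ep.minor_id = TheColumns.column_id
   AND ep.class = 1 AND ep.name = 'MS_Description'
WHERE TheColumns.object_id = OBJECT_ID('Person.Person')
FOR JSON PATH;
```

JSON_Query() with a single string argument tells FOR JSON that the string is already valid JSON, which is what stops it being escaped all over again on the way out.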

The trick I use is to have a sentence in the EP that says something like 'Uses a datatype of….' or 'This is referenced by ……' that can be updated automatically by SQL as the result of a metadata query. Actually, this second example is a bad one, because our example of Person.Person is referenced by:

  • dbo.ufnGetContactInformation
  • dbo.uspGetEmployeeManagers
  • dbo.uspGetManagerEmployees
  • HumanResources.vEmployee
  • HumanResources.vEmployeeDepartment
  • HumanResources.vEmployeeDepartmentHistory
  • Person.CK_Person_EmailPromotion
  • Person.CK_Person_PersonType
  • Person.iuPerson
  • Person.vAdditionalContactInfo
  • Purchasing.vVendorWithContacts
  • Sales.iduSalesOrderDetail
  • Sales.vIndividualCustomer
  • Sales.vPersonDemographics
  • Sales.vSalesPerson
  • Sales.vSalesPersonSalesByFiscalYears
  • Sales.vStoreWithContacts.

We are in Marcel Proust territory already. Even adding the text of constraints can occasionally ruin the literary appeal of a data dictionary. You are much better off using a good collection of metadata queries when you need to drill down to the detail.
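Those drill-down queries needn't be elaborate. A list of referencing objects like the one above can come from a dependency query something like this sketch, run in the AdventureWorks database:

```sql
-- List the objects that reference Person.Person, schema-qualified
SELECT referencing_schema_name + N'.' + referencing_entity_name AS referencing_object
FROM sys.dm_sql_referencing_entities(N'Person.Person', N'OBJECT')
ORDER BY referencing_object;
```

Keeping queries like this to hand, rather than baking their results into the documentation, keeps the data dictionary readable and the detail fresh.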

Getting JSON documentation from a live database

So here is the query that produces this JSON data document for an entire database. You might need this when creating your script, or when finding out what changes have been made that need to be documented. Most of the time, you'll use the JSON script as the 'master' source of documentation because it tells you what hasn't yet been documented: an undocumented attribute has a NULL against it. (It will have a blank string against it if a zero-length string has been assigned to the extended property.)

DECLARE @JSON NVARCHAR(MAX);
SELECT @JSON =
  (
  SELECT Object_Schema_Name(tables.object_id) + '.' + tables.name AS TableObjectName,
    Lower(Replace(type_desc, '_', ' ')) AS [Type], --the type of table source
    Coalesce(Convert(NVARCHAR(3800), ep.value), '') AS "Description",
     (
     SELECT JSON_Query('{' + String_Agg(EachAttribute.attribute, ',') + '}')
     FROM
       (SELECT '"'+type_desc+'":{'
          +String_Agg(
        '"'+String_Escape(AllChildObjects.name,N'JSON')
        +'":'
        +Coalesce('"'+String_Escape(Convert(NVARCHAR(3750),AllChildObjects.value),N'JSON')+'"','null')
        , ',')
          WITHIN GROUP ( ORDER BY AllChildObjects.NaturalOrder)  +'}' AS attribute
       FROM 
          (--first we get the indexes and their descriptions
          SELECT 'Index' AS type_desc, TheIndexes.name, value,
            TheIndexes.index_id AS naturalOrder
            FROM sys.indexes AS TheIndexes
              LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = tables.object_id
               AND epcolumn.minor_id = TheIndexes.index_id
               AND epcolumn.class = 7 --an index
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE TheIndexes.object_id = tables.object_id
          UNION ALL--then we get the columns and their descriptions
          SELECT 'Column' AS type_desc, TheColumns.name, value,
            TheColumns.column_id AS naturalOrder
            FROM sys.columns AS TheColumns
              LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = tables.object_id
               AND epcolumn.minor_id = TheColumns.column_id
               AND epcolumn.class = 1 --a column or object
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE TheColumns.object_id = tables.object_id
         UNION ALL
      --then we get the parameters and their descriptions
          SELECT 'Parameter' AS type_desc, TheParameters.name, value,
            TheParameters.parameter_id AS naturalOrder
            FROM sys.parameters AS TheParameters
              LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = TheParameters.object_id
               AND epcolumn.minor_id = TheParameters.parameter_id
               AND epcolumn.class = 2 --a parameter
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE TheParameters.object_id = tables.object_id
          UNION ALL --and then we get the constraints, triggers and other child objects
      SELECT Stuff(Lower(Replace(type_desc,'_',' ')),1,1,Upper(Left(type_desc,1)))
       COLLATE DATABASE_DEFAULT , ChildObjects.name,  value, object_id 
        FROM sys.objects AS ChildObjects
             LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = ChildObjects.object_id
               AND epcolumn.minor_id = 0 --an object
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE ChildObjects.parent_object_id = tables.object_id
              AND ChildObjects.type <> 'IT'--who wants internal tables?
          ) AllChildObjects
        GROUP BY type_desc
      ) AS EachAttribute
  ) AS Attributes
    FROM sys.objects tables
      LEFT OUTER JOIN sys.extended_properties ep
        ON ep.major_id = tables.object_id
       AND ep.minor_id = 0
       AND ep.name = 'MS_Description'
    WHERE Type IN ('IF', 'FT', 'TF', 'U', 'V')
  FOR JSON AUTO
  );
IF ((SELECT IsJSON(@JSON))=1) SELECT @JSON ELSE RAISERROR ('Bad JSON from statement',16,1 )

This can easily be executed within SQLCMD.exe to provide a JSON file of the database.
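The file can be read back into SQL just as easily. As a sketch (the placeholder string stands in for the data dictionary document you saved), OPENJSON will shred the collection to list every attribute that still lacks a description:

```sql
-- Find the undocumented attributes in a saved data dictionary
DECLARE @JSON NVARCHAR(MAX) = N'<the data-dictionary JSON goes here>';
SELECT t.TableObjectName,
       attr.[key] AS AttributeClass, --e.g. Column, Index, Check constraint
       item.[key] AS AttributeName
FROM OPENJSON(@JSON)
       WITH (TableObjectName NVARCHAR(200), Attributes NVARCHAR(MAX) AS JSON) AS t
  CROSS APPLY OPENJSON(t.Attributes) AS attr
  CROSS APPLY OPENJSON(attr.[value]) AS item
WHERE item.[type] = 0; --type 0 means the JSON value was null, i.e. no description
```

This is the quickest way of finding what has crept into the database since the documentation was last brought up to date.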

Saving a JSON Documentation file to disk from a database

Here is a script that can be executed in PowerShell.

$Problems = @() # initialise the list of problems before we start
$param1=@{'server'='MyServer';'database'='MyDatabase';
          'uid'='MyUser';'pwd'='MyPassword';
          'uid'='MyUser';'pwd'='MyPassword';
          'outputReport'='MyPathTo\Datadictionary.JSON'
}
$SQLCmdAlias = "$($env:ProgramFiles)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\sqlcmd.exe"
Set-Alias SQLCmd   $SQLCMDAlias -Scope local
#is that alias correct?
if (!(test-path  ((Get-alias -Name SQLCmd).definition) -PathType Leaf))
  { $Problems += 'The alias for SQLCMD is not set correctly yet' }
  $query = @'
Set Nocount on
DECLARE @JSON NVARCHAR(MAX);
SELECT @JSON =
  (
  SELECT Object_Schema_Name(tables.object_id) + '.' + tables.name AS TableObjectName,
    Lower(REPLACE(cast(type_desc as nvarchar(max)),cast( '_' as nvarchar(max)),cast( ' ' as nvarchar(max)))) AS [Type], --the type of table source
    Coalesce(Convert(NVARCHAR(3800), ep.value), '') AS [Description],
     (
     SELECT JSON_Query('{' + String_Agg(EachAttribute.attribute, ',') + '}')
     FROM
       (SELECT char(34)+type_desc+char(34)+':{'
          +String_Agg(
        char(34)+String_Escape(AllChildObjects.name,N'JSON')
        +char(34)+':'
        +Coalesce(char(34)+String_Escape(Convert(NVARCHAR(3750),AllChildObjects.value),N'JSON')+char(34),'null')
        , ',')
          WITHIN GROUP ( ORDER BY AllChildObjects.NaturalOrder)  +'}' AS attribute
       FROM 
          (--first we get the indexes and their descriptions
          SELECT 'Index' AS type_desc, TheIndexes.name, value,
            TheIndexes.index_id AS naturalOrder
            FROM sys.indexes AS TheIndexes
              LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = tables.object_id
               AND epcolumn.minor_id = TheIndexes.index_id
               AND epcolumn.class = 7 --an index
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE TheIndexes.object_id = tables.object_id
          UNION ALL--then we get the columns and their descriptions
          SELECT 'Column' AS type_desc, TheColumns.name, value,
            TheColumns.column_id AS naturalOrder
            FROM sys.columns AS TheColumns
              LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = tables.object_id
               AND epcolumn.minor_id = TheColumns.column_id
               AND epcolumn.class = 1 --a column or object
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE TheColumns.object_id = tables.object_id
          UNION ALL
      --then we get the parameters and their descriptions
          SELECT 'Parameter' AS type_desc, TheParameters.name, value,
            TheParameters.parameter_id AS naturalOrder
            FROM sys.parameters AS TheParameters
              LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = TheParameters.object_id
               AND epcolumn.minor_id = TheParameters.parameter_id
               AND epcolumn.class = 2 --a parameter
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE TheParameters.object_id = tables.object_id
          UNION ALL --and then we get the constraints, triggers and other child objects
      SELECT Stuff(Lower(REPLACE(cast(type_desc as nvarchar(max)),cast('_' as nvarchar(max)),cast(' ' as nvarchar(max)))),1,1,Upper(Left(type_desc,1)))
       COLLATE DATABASE_DEFAULT , ChildObjects.name,  value, object_id 
        FROM sys.objects AS ChildObjects
             LEFT OUTER JOIN sys.extended_properties epcolumn --get any description
                ON epcolumn.major_id = ChildObjects.object_id
               AND epcolumn.minor_id = 0 --an object
               AND epcolumn.name = 'MS_Description' --you may choose a different name
            WHERE ChildObjects.parent_object_id = tables.object_id
              AND ChildObjects.type <> 'IT'--who wants internal tables?
          ) AllChildObjects
        GROUP BY type_desc
      ) AS EachAttribute
  ) AS Attributes
    FROM sys.objects tables
      LEFT OUTER JOIN sys.extended_properties ep
        ON ep.major_id = tables.object_id
       AND ep.minor_id = 0
       AND ep.name = 'MS_Description'
    WHERE Type IN ('IF', 'FT', 'TF', 'U', 'V')
  FOR JSON AUTO
  );
IF ((SELECT IsJSON(@JSON))=1) SELECT @JSON ELSE RAISERROR ('Bad JSON from statement',16,1 )
'@
  if (!([string]::IsNullOrEmpty($param1.uid)) -and ([string]::IsNullOrEmpty($param1.pwd)))
  { $problems += 'No password is specified' }
    if (!([string]::IsNullOrEmpty($param1.uid)))
    {
      $MyJSON = sqlcmd -Q "$query" -S "$($param1.server)" -d "$($param1.database)" -U $($param1.uid) -P $($param1.pwd) -o "$($param1.outputReport)" -u -y0
      $arguments = "-S $($param1.server) -d $($param1.database) -U $($param1.uid) -P $($param1.pwd) -o `"$($param1.outputReport)`" -u -y0"
    }
    else
    {
      $MyJSON = sqlcmd -S "$($param1.server)" -d "$($param1.database)" -Q "$query" -E -o "$($param1.outputReport)" -u -y0
      $arguments = "-S $($param1.server) -d $($param1.database) -E -o `"$($param1.outputReport)`" -u -y0"
    }
    if (!($?))
    {
      #report a problem and send back the args for diagnosis (hint, only for script development)
      $Problems += "sqlcmd failed with code $LASTEXITCODE, with parameters $arguments"
    }
    $possibleError = Get-Content -Path $param1.outputReport -raw
    if ($PossibleError -like '*Sqlcmd: Error*')
    {
      Remove-Item $param1.outputReport;
    }

 
