Blog Post

Using PyApacheAtlas to import SQL Server extended properties to Azure Purview

,

Azure Purview is a compliance and data governance platform in Microsoft’s Azure cloud. The platform is currently in public preview and has quite a few features; but, does not currently import any of the extended properties that you may have added over the years to your SQL Server objects.

 

Luckily Azure Purview has a number of different ways you can interact with its APIs. Purview CLI is a rich tool for the command line, PyApacheAtlas is a python based tool, and you can .NET or even Kafka. For this topic we’ll use PyApacheAtlas, which I’m sure you already knew based on the blog title.

What’s in this post?

This post is going to be restricted to only SQL Server Table Columns and only Extended Properties named MS_Description. Quite a few years ago I worked on a data catalog project where we added descriptions for many of the tables, views, and columns to the database using extended properties named MS_Description. Let’s assume you have some of these for this post keeping in mind that the Purview APIs provide so many functions beyond what this post covers and that the code here could be modified to do so much more as well.

Starting out I thought it would be great to import the sensitivity classifications that SSMS creates. Pre-SQL 2019 these were held in Extended Properties and now have their very own DMV (sys.sensitivity_classifications). While this sounded great in theory it wasn’t as exciting when I wrote the code. This is because Azure Purview already has system classifications at a more granular scale for each of the ones you find in SSMS and Purview also adds these as it executes a scan on the data source. It does a pretty good job too. With that said, I shifted my focus to adding descriptions instead.

Remember these in SSMS?

ms_description

Using PyApacheAtlas with Purview

There’s a bit of upfront work to get the script running, you’ll need at a minimum:

  • Azure account
  • Azure Purview installed and configured
  • Python and PyApacheAtlas installed
  • Credentials (Tenant ID, Client ID, Client Secret, and the name of your Purview instance)
  • Likely you’ll also need a machine with internet access and a keyboard. Smile

Once you’ve collected these items you should be good to go. There are plenty of guides on Bing.com to help with configuring a service principal or Purview; so, I’m not going into detail on those here.

Let’s jump into the code! First we’ll cover the different sections of code and then the whole script will be posted at the bottom.

This first section imports any necessary items and authenticates with Purview. Note the sections starting on line 14 and 19. This is where you’ll need to fill in your environment’s data to successfully authenticate. Each of these values is blank because you really don’t want my credentials, right?

You’ll also want to change line 5 to point to whatever SQL Server you want to pull descriptions from.

image

In this next bit of code, starting on line 24, we query The SQL Server instance and load all the database names into a list named searchdbs. We’ll loop over this list to query each individual database for extended properties.

image

Looping over each database, we run a T-SQL query to grab the extended properties. This query has been limited to only tables and just MS_Description but could easily be changed. Keep in mind we are only after the description for this sample.

image

Lastly, we review each row from the query and try to add the description with a partial update using partial_update_entity. Caution! This will overwrite the existing description.

Note: For this sample, we’re allowing the update to fail if the entity doesn’t exist in Purview. You could easily add a check before hand with an IF condition to evaluate if the asset existed and if not skip. Keep in mind that Purview is currently limited to 10 million or 40 million API calls per month depending on your SKU. Reducing API calls to a minimum and using bulk API operations is highly recommended.

image

To pull of a successful update you’ll need a couple things:  The qualifiedName and the Purview typeName.

Using Purview Studio we can see that the type of this column is “MS SQL Column”.  The typeName for “MS SQL Column” is “mssql_column”. That is exclusive for a table column. Yes, it sounds silly but view columns have their own type so your query would need to account for this and put the appropriate typeName for your asset. For the record, a view column is “mssql_view_column”.

In Purview the qualifiedName follows a naming standard that is fairly simple. For relational databases this is:  type://highest level/n level/n level/…#lowest level. Using an easy naming standard allows us to programmatically build the qualifiedName in our query.

For SQL Server this would be: mssql://server/instance/db/schema/table#column

Take a look at the following example for a real world qualifiedName.

image

Also noted in the picture above is the description that our python code is targeting. Once you’ve added all the credentials and have successfully added your SQL Server to Purview and scanned it you should be ready to run the code and add your descriptions. Yay!

Here’s the full code (without descriptions):

import json

import os

import pyodbc

#this is the location of your SQL Server

server = ‘DESKTOP’

database = ‘master’

driver= ‘{ODBC Driver 17 for SQL Server}’

#setup authentication for Azure Purview

from pyapacheatlas.auth import ServicePrincipalAuthentication

from pyapacheatlas.core.client import PurviewClient

from pyapacheatlas.core.typedef import TypeCategory, ClassificationTypeDef

from pyapacheatlas.core.entity import AtlasClassification

oauth = ServicePrincipalAuthentication(

tenant_id=os.environ.get(“AZURE_TENANT_ID”, “”),

client_id=os.environ.get(“AZURE_CLIENT_ID”, “”),

client_secret=os.environ.get(“AZURE_CLIENT_SECRET”, “”)

)

client = PurviewClient(

account_name=os.environ.get(“PURVIEW_NAME”, “”),

authentication=oauth

)

#Query SQL Server for a list of databases

with pyodbc.connect(‘DRIVER=’+driver+’;SERVER=’+server+’;trusted_connection=Yes;’) as conn:

with conn.cursor() as cursor:

cursor.execute(“SELECT name FROM sys.databases where database_id > 4”)

alldbrows = cursor.fetchall()

databases = dict(enumerate(db[0] for db in alldbrows))

searchdbs = list(databases.values())

#For each database query extended properties and sensitivity labels

for db in searchdbs:

#print(db)

#with pyodbc.connect(‘DRIVER=’+driver+’;SERVER=’+server+’;trusted_connection=Yes;’) as searchconn:

with pyodbc.connect(‘DRIVER=’ + driver + ‘;SERVER=’ + server + ‘;PORT=1433;DATABASE=’ + db + ‘;trusted_connection=Yes;’) as searchconn:

with searchconn.cursor() as dbcursor2:

#query only looks at tables. SQL Server columns in tables for Purview are type mssql_column. Views are mssql_view_column.

dbcursor2.execute(”    SELECT CONCAT(‘mssql://’, @@SERVERNAME, ‘/’, TABLE_CATALOG, ‘/’, TABLE_SCHEMA, ‘/’, TABLE_NAME, ‘#’, COLUMN_NAME) AS qualifiedName, “

”        ‘mssql_column’ as typeName, “

”        CONVERT(nvarchar(4000), DATA_TYPE) as dataType, “

”        CONVERT(nvarchar(4000), ex.value) as columndescription “

”    FROM INFORMATION_SCHEMA.COLUMNS ISC “

”    OUTER APPLY ::fn_listextendedproperty(null,’Schema’, isc.TABLE_SCHEMA, ‘Table’, isc.TABLE_NAME, ‘Column’, isc.COLUMN_NAME) ex “

”    WHERE ex.value is not null and ex.name = ‘MS_Description'” )

row = dbcursor2.fetchone()

while row:

print (“Processing:” + str(row[0]) + ” | ” + str(row[1]) + ” | ” + str(row[2]) + ” | ” + str(row[3]))

qName = str(row[0])

tName = str(row[1])

dName = str(row[2])

columndescription = str(row[3])

try:

client.partial_update_entity(

typeName= tName,

qualifiedName= qName,

attributes={“description”:columndescription}

)

print(f”tUpdated {qName}”)

except Exception as e:

print (f”t{e} : This asset may not exist in Purview. Setup the datasource and scan the database to add it. Validate it exists by searching the qualifiedName ({qName})”)

#Move to next row in resultset

row = dbcursor2.fetchone()

For many more sample scripts with PyApacheAtlas and Purview:

https://github.com/wjohnson/pyapacheatlas

Original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating