Azure Purview Best Practice: Naming your scans

If you’ve ever been a DBA and seen the mess that you get with SQL Agent Jobs without a clean naming standard for your job schedules and job names then you’ll appreciate this tip.

If you haven’t been a DBA that’s OK too. Years ago I came up with my own naming standard for SQL Agent artifacts and I’ve always felt better when the messy room was clean. No Really! That’s exactly what this is like. A messy room where you are pretty sure you put the item you’re looking for in but you just can’t seem to find it until you clean 95% of the mess and then you’re so exhausted that you don’t have time to do what you wanted to in the first place. Ever been there?

If you’re new to Azure Purview or don’t know what it is check out this resource which gives a great overview of this Data Catalog and Governance platform: https://docs.microsoft.com/en-us/azure/purview/overview.

Scan Naming Best Practice

I’ve blogged previously on the SQL Agent naming standard which I applied to Azure Data Factory (ADF) triggers and you can find that here: https://sqltechblog.com/2020/04/30/azure-data-factory-trigger-tip/

The catch with Azure Purview is you can’t schedule a scan daily. To do this you’d need to setup a scan for each day of the week and that’s really unnecessary. Instead Purview allows you to configure a weekly, monthly, or on-demand (once) scan. Since we’re talking about meta data for assets that shouldn’t change frequently weekly or monthly is just fine; but, this breaks my naming standard for ADF and SQL Agent.

Let’s take a look at the messy room. To find scans that have been run you need to choose insights from the menu on the left and then click “Scans”, finally choose “View more” on the chart. Once clicked you’ll get a grid showing all scans for the last 30 days, which you can filter by name or change the time range. It should look like this but with extra columns for success, failed, canceled counts and more. I took those out because they aren’t needed for this topic. Let’s have a look!

The first thing you may notice is that Last scan run date and Scan name are not in any order. Yes, you can click the header and order the grid results but why?

Next you probably see that all these scans and the detail provided is really not helpful if you want to know what the resource is that was scanned. Which data lake was scanned? Are these all the same one? BUT, WAIT! That last line!! It’s different!!! Now’s the time where I say, I’ll give you $20 if you can’t tell me which server this scanned and when it runs.

Using a proper naming standard helps us keep the room clean. Just from that last line, I know this ran for a SQL Server named MySQLServer001 and it runs every Thursday at 2 PM.

I’m not saying you need to follow my standard or that you need one at all. What I am suggesting is that life will be happier if you identify a naming standard that works for you and adopt it for your organization.

Here’s what I’ve come up with:

Purview scans are Weekly, Monthly, or Once and scan names cannot exceed 63 characters. Knowing this we want to have a name that is easily readable from Purview Studio’s grid. My standard is [DataSource]-[day]-[time] for daily, [DataSource]-[DD]-[time] for monthly, and [DataSource]-[YYMMDDHH] for once.

MySQLServer001-Mon-2300 = Scans MySQLServer001 every Monday at 11 PM.

MySQLServer001-08-0500 = Scans MySQLServer001 on the 8th day of each month at 5 AM.

MySQLServer001-21081214 = Scanned MySQLServer001 once in the 2 PM hour on 8/12/21.

It’s worth noting that I didn’t account for scans that run every other month or every few months. I’d suggest running your scans more frequently than every quarter and with that said I didn’t bother thinking of anything past monthly but you “could” do [datasource]-[nM]-[DD]-[time] if you had a need to scan infrequently. In this case n represents the number of months (up to 5) and M is there for context.

Azure Purview Best Practice: Naming your scans

Scan Naming Best Practice

Rate

Share

Share

Rate