How might classification and better documentation improve data safety?


In this post, we imagine how auto-classification of data could be used to build better documentation, giving you confidence that your organization can use its data without creating risk or compromising regulatory compliance.


You may have read recently that we in Foundry, Redgate’s R&D division, have spent the last month exploring the theme of data classification. ‘SQL Atlas’ is our imagined solution to the challenges you’ve told us about during this research.

So, what’s the problem?

Every company and organization collects data in some way, shape or form. We rely on this data to provide our products and services to our customers. Crucially, we’ve also realized it has additional value.

Business intelligence helps us become more efficient and spot new commercial opportunities. Using real data in development and QA means we can launch better, more stable features for our products. And we can improve the customer support experience if customer service representatives can access the details that give them context for a customer’s problem.

The problem is that all these business activities generate data sprawl: copies of data strewn across your organization. As we become more aware of our obligations to treat particular pieces of sensitive data carefully, free access to and use of data becomes less attractive (especially when hefty HIPAA, SOX or GDPR fines are a real risk). So we revoke access and build complex request systems for the consumers of data within our organization, trying to manage the flow of data centrally, but in doing so we create a bottleneck that chokes the business.

How might ‘SQL Atlas’ help?

What if you could devolve and distribute the responsibility for accessing, preparing and using sensitive data appropriately? Here is how we imagine SQL Atlas might make that work.

Step 1 – Auto-classification

In the first step, SQL Atlas does the hard work for you, traversing your SQL Server to help you answer the question: ‘What data have we got, and how dangerous is it?’

Fig 1. The results of a data scan, showing the amount of sensitive data in production and its classification.
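Because SQL Atlas is an imagined tool, there is no real scanner to show. As a rough illustration only, here is a minimal Python sketch of one possible approach, assuming classification works by matching column names against patterns. The category names, regular expressions and the `classify_column` and `summarize` functions are all hypothetical; a real scanner would probably also sample the values stored in each column.

```python
import re
from typing import Optional

# Hypothetical rules: each classification category is paired with a
# regular expression that is matched against column names.
CATEGORY_PATTERNS = {
    "Contact info": re.compile(r"e-?mail|phone|address", re.IGNORECASE),
    "Financial": re.compile(r"card|iban|account_?number", re.IGNORECASE),
    "Health": re.compile(r"diagnosis|medical|prescription", re.IGNORECASE),
}


def classify_column(column_name: str) -> Optional[str]:
    """Return the first category whose pattern matches the name, or None."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(column_name):
            return category
    return None


def summarize(columns: list[str]) -> dict[str, int]:
    """Count columns per category; unmatched columns count as 'Unclassified'."""
    counts: dict[str, int] = {}
    for name in columns:
        category = classify_column(name) or "Unclassified"
        counts[category] = counts.get(category, 0) + 1
    return counts


if __name__ == "__main__":
    print(summarize(["CustomerEmail", "CardNumber", "OrderDate", "Diagnosis"]))
```

Per-category counts like these are the kind of figures a results view such as the one in Fig 1 could be built from.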

Step 2 – Treatment policy management

SQL Atlas comes with its own classification categories (or you can use your own), and from these it creates a set of corresponding rules for handling each type of data.

Fig 2. Tuning the rules that apply the classification categories to each data type, along with their corresponding treatment policies.

The policies that these rules correspond to describe how the data must be handled according to the regulation(s) that your organization is subject to.

Fig 3. The collection of your policies, defined by regulation, describing how to deal with different pieces of sensitive data.
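One way to picture step 2 is as a lookup from each classification category to a treatment policy tied to a regulation. The sketch below assumes a simple in-memory mapping; the `TreatmentPolicy` fields, the policy names and the handling text are hypothetical placeholders, not SQL Atlas’s actual model and not what any regulation requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TreatmentPolicy:
    name: str
    regulation: str
    handling: str  # plain-language instructions for whoever consumes the data


# Hypothetical rules: classification category -> treatment policy.
# The handling text is placeholder wording only.
POLICIES = {
    "Contact info": TreatmentPolicy(
        "Mask contact details", "GDPR",
        "Replace with realistic fake values before use outside production."),
    "Health": TreatmentPolicy(
        "Keep health data in production", "HIPAA",
        "Do not copy out of production; use synthetic records instead."),
}

# Data the scan did not flag as sensitive.
UNCLASSIFIED_POLICY = TreatmentPolicy(
    "No special handling", "None", "May be used as-is.")

# Sensitive data whose category has no rule yet.
REVIEW_POLICY = TreatmentPolicy(
    "Review required", "Unknown", "Ask the data owner before using this data.")


def policy_for(category: Optional[str]) -> TreatmentPolicy:
    """Return the treatment policy that applies to a classification category."""
    if category is None:
        return UNCLASSIFIED_POLICY
    return POLICIES.get(category, REVIEW_POLICY)


print(policy_for("Health").handling)
```

The fallback is a deliberate choice in this sketch: a category that has no rule is routed to review rather than treated as safe.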

Step 3 – Using the data catalogue

The result of a scan is a single repository: a catalogue of all the data held within SQL Server. For each piece of data, the catalogue holds basic information such as:

  • A description – a summary of what the data is
  • The owner – who is responsible for the collection/management of this data
  • Access conditions – who’s allowed access to this data (and in what form: raw, treated, etc.)
  • Possible uses – a description of what the data is allowed to be used for

But crucially, it also now holds new information about the sensitivity of the data, including:

  • The classification category – what, if anything, makes this piece of data sensitive
  • The appropriate treatment policy – the policy that describes how this piece of data must be handled, given its classification, for business-as-usual (BAU) activities like BI or development
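To make the shape of a catalogue entry concrete, here is a minimal sketch of the fields listed above as a Python dataclass, with a simple search a data consumer might run. Identifying each entry by a `column` name, the `search` function and the sample entries are all hypothetical illustrations, not part of any real SQL Atlas API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogueEntry:
    column: str                    # hypothetical identifier, e.g. database.schema.table.column
    description: str               # what the data is
    owner: str                     # who is responsible for collecting/managing it
    access_conditions: str         # who may access it, and in what form
    possible_uses: list[str]       # what the data may be used for
    classification: Optional[str]  # what, if anything, makes it sensitive
    treatment_policy: str          # how it must be handled for BAU activities


def search(catalogue: list[CatalogueEntry], term: str) -> list[CatalogueEntry]:
    """Case-insensitive search over column names and descriptions."""
    term = term.lower()
    return [e for e in catalogue
            if term in e.column.lower() or term in e.description.lower()]


# Hypothetical sample entries.
catalogue = [
    CatalogueEntry("Sales.dbo.Customers.Email", "Customer email address",
                   "CRM team", "Support staff (raw); developers (masked)",
                   ["Customer support", "Development (masked)"],
                   "Contact info", "Mask contact details"),
    CatalogueEntry("Sales.dbo.Orders.OrderDate", "Date the order was placed",
                   "Sales ops", "All staff", ["BI", "Development"],
                   None, "No special handling"),
]

for entry in search(catalogue, "customer"):
    print(entry.column, "->", entry.treatment_policy)
```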

SQL Atlas has now created a single catalogue of the data held in your organization’s SQL Servers. When someone within your organization needs to consume this data, they can query the catalogue.

Fig 4. Users can select the data they want to use from the data catalogue.

When the consumer selects the data they’re interested in, SQL Atlas presents them with the documentation you prepared in step 2, which describes how they must prepare and handle that data before they can use it.

Fig 5. SQL Atlas presents the corresponding handling policy for each piece of data that the user searching the catalogue is interested in.
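The hand-off described above can be sketched as a function that takes the columns a consumer selected and returns the handling guidance for each one. This is a minimal illustration under stated assumptions: the `CLASSIFICATIONS` and `HANDLING` tables, the column names and the `handling_notes` function are hypothetical stand-ins for the catalogue and the step 2 documentation.

```python
from typing import Optional

# Hypothetical inputs: each column's classification from the scan, and the
# handling text written for each category in step 2.
CLASSIFICATIONS: dict[str, Optional[str]] = {
    "Customers.Email": "Contact info",
    "Patients.Diagnosis": "Health",
    "Orders.OrderDate": None,
}
HANDLING: dict[str, str] = {
    "Contact info": "Mask with realistic fake values before use outside production.",
    "Health": "Do not copy out of production; use synthetic records instead.",
}


def handling_notes(selected: list[str]) -> dict[str, str]:
    """Map each selected column to the guidance its consumer must follow."""
    notes: dict[str, str] = {}
    for column in selected:
        if column not in CLASSIFICATIONS:
            notes[column] = "Not found in the catalogue."
            continue
        category = CLASSIFICATIONS[column]
        if category is None:
            notes[column] = "Not classified as sensitive; no special handling recorded."
        else:
            # A category with no written guidance is flagged, not assumed safe.
            notes[column] = HANDLING.get(
                category, "No policy written for this category yet; ask the data owner.")
    return notes


for column, note in handling_notes(["Customers.Email", "Orders.OrderDate"]).items():
    print(f"{column}: {note}")
```

Unknown columns and unmapped categories each get an explicit message in this sketch, so nothing silently passes as safe.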

Try the demo

We’ve created an interactive demo so you can try out the tool for yourself.

What next?

Can you imagine implementing this in your organization? What might stop you? What opportunities could this create? Let us know – we’d love to talk. If you’d like to get more involved with the development of this product, you can send us an email. Or, if you’d just like to be kept informed of developments, sign up for the latest updates.