I have worked in more than one regulated industry, and since the banking crisis of 2008, I have witnessed a sea change in the approach to regulation. The UK Financial Services Authority (FSA) was seen to be a toothless tiger. The UK government replaced the FSA with two separate bodies, each with its own more focused set of responsibilities. They pursue those responsibilities with greater vigour.
The GDPR carries harsh financial penalties. Because of these penalties, organisations take the GDPR more seriously than they did the old Data Protection Act, especially given the number of high-profile data breaches and a more hostile security threat environment. The companies that used to carry out audits are now being asked to audit a wider range of activities and at greater depth. Larger companies are likely to use more than one auditor in order to spread their risk.
So what does this mean for data engineers and DBAs?
In my case, I find I am at the beck and call of the auditing team. Many of their questions result in unpredictable workloads. My organisation may have planned for me to work on a number of revenue-generating products and features, but auditing work trumps that. I have started to think about how I can make the auditing process less painful for all concerned. To do this, we have to design our systems and processes for auditability. I should like to share with you what it is like to work with the auditors and how we have reduced the impact of the inevitable auditing workload.
Answering questions from the auditors
Always be honest with the auditors, even though some of the answers you have to give may be embarrassing to you and your organisation. Where I felt unable to answer their questions, I clearly stated that this was the case. The role of an auditor includes informing the organisation where it is at risk. Auditors cannot do this if the answers they receive are inaccurate or untruthful.
My sessions with the auditors felt like being grilled by a panel of business analysts who use the data warehouse BEAM methodology. This methodology was championed by Lawrence Corr and Jim Stagnitto. BEAM uses the 7Ws: who, what, where, when, how many, why, and how.
Where questions from auditors differ from those from business analysts is that the auditors ask to see documented proof to support your answers. You should be prepared to demonstrate systems whether or not you have documented proof. I do not work late on days when I spend significant time with the auditors. It is surprising how exhausting intensive questioning can be, particularly when you have faced a team of intelligent inquisitors.
Given the increasing number of high-profile data breaches, the auditors treat "who has access to customer data" as a major theme encapsulating many questions. A few examples:
- Where is the catalogue of all IT systems, especially those holding customer data? This is a required artefact under GDPR Article 30 (records of processing activities).
- How many systems use your centralised login directory (Active Directory)?
- How many people have the capability of changing permissions to systems holding customer data?
- What is your software release process for systems holding customer data?
- How many people can release to a live environment?
- Can your processes be bypassed? Easily?
- How can you be sure a release does not put customer data at risk?
- Where do you hold decisions recorded for systems holding customer data?
- Can you demonstrate your implementation of the GDPR processes for a given list of requests?
Artefacts I found useful
We had done a lot of preparation work before the GDPR became active back in May 2018. We maintain a number of the original artefacts we produced to keep on top of our GDPR obligations, such as:
- Catalogue of our various systems
- Diagrams of the dataflows between our key systems
- The third parties we engage with
Not only are these of use to the auditors, they have real business value too.
Our data engineering CI/CD pipeline uses GitHub Actions to deploy software. GitHub can represent a pipeline visually as a DAG (Directed Acyclic Graph). A DAG is a graph in which every dependency has a direction and no path loops back on itself. I provided the auditing team with a screenshot of our pipeline and supporting commentary, which clearly showed that further steps could not run unless specific steps succeeded:
- Code linting had to pass in order for unit tests to run
- Unless unit tests passed integration tests would not run
- Deployment to higher environments depended on passing integration tests
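The gating described in the list above can be sketched in a few lines of Python. This is an illustration of the idea only, not our GitHub Actions configuration: the stage names and the `run_pipeline` function are hypothetical, and it assumes Python 3.9 or later for the standard library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each stage maps to the stages it depends on.
PIPELINE = {
    "lint": set(),
    "unit_tests": {"lint"},
    "integration_tests": {"unit_tests"},
    "deploy": {"integration_tests"},
}


def run_pipeline(results):
    """Return the status of each stage given pass/fail outcomes.

    `results` maps a stage name to True (passed) or False (failed).
    A stage is "skipped" when any stage it depends on did not pass.
    """
    status = {}
    # static_order() yields every stage after all of its dependencies
    # and raises CycleError if the graph contains a loop.
    for stage in TopologicalSorter(PIPELINE).static_order():
        if any(status[dep] != "passed" for dep in PIPELINE[stage]):
            status[stage] = "skipped"
        else:
            status[stage] = "passed" if results.get(stage, False) else "failed"
    return status
```

With a failing unit test stage, both the integration tests and the deployment come back as "skipped", which is the property the auditors wanted to see demonstrated.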
We designed our GDPR processes to record evidence that they had carried out the desired actions and in what timeframes. We had to think carefully about this because without care it is easy to design a system that negates the whole purpose of a "right to be forgotten" request.
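One reading of that risk is that the evidence log ends up storing the very personal data the request asked us to erase. A minimal sketch of one way to avoid this is to key the evidence on an opaque request reference and record only what was done and when. Everything here is hypothetical rather than a description of our system: the `ErasureEvidence` class, the system names, and the 30-day `TARGET`, which is an assumed internal target and not a statement of the legal deadline.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from uuid import uuid4

# Assumption: a 30-day internal target; confirm the real deadline with your DPO.
TARGET = timedelta(days=30)


@dataclass
class ErasureEvidence:
    """Evidence for one erasure request, holding no personal data."""

    received_at: datetime
    # Opaque reference, so the log never contains a name or email address.
    request_id: str = field(default_factory=lambda: uuid4().hex)
    # system name -> (rows erased, time the erasure finished)
    actions: dict = field(default_factory=dict)

    def record(self, system, rows_erased, finished_at):
        self.actions[system] = (rows_erased, finished_at)

    def completed_at(self, expected_systems):
        """Time the last expected system finished, or None if any is outstanding."""
        expected = set(expected_systems)
        if not expected or not expected <= set(self.actions):
            return None
        return max(self.actions[system][1] for system in expected)

    def within_target(self, expected_systems):
        done = self.completed_at(expected_systems)
        return done is not None and done - self.received_at <= TARGET
```

The list of expected systems would come from the systems catalogue, so a request only counts as complete once every system holding customer data has reported back.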
My team records the decisions we make with regard to the software and databases we build in Markdown files in our Git repositories. These reveal why we wrote our software the way that we did, and for the auditors they show that we have a decision-making process with a record of the decisions taken.
Diagrams for information flow are best kept up-to-date. Leaving an update until just before audit time introduces unnecessary stress and doubt.
Collating information for the auditors can be an intensive task with the relevant information in many locations. I find it useful to provide hyperlinks in each location to the other locations so that it is easy to navigate between them. In effect each location becomes a portal through to the others. This is also useful to internal staff and particularly to new starters.
I have found that organisational thought tends to homogenise over time, so I found the questions posed by the auditors useful in provoking different modes of thought. In some cases their questions changed the way I thought about what I was building.
One of the auditable tasks I have to go through every financial quarter is a critical systems access attestation. This reveals who has access to what and confirms whether or not our starters/leavers/movers process is working. It helps if this process is automated though retrofitting systems with such capability can be difficult.
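Where access lists can be exported, the core of an attestation is a set comparison. The sketch below is an assumption about how such a check might look, not our actual tooling: the `attest_access` function, its inputs, and the finding names are all hypothetical.

```python
def attest_access(access, current_staff, approved):
    """Compare actual access with the HR roster and the approved access lists.

    access:        system name -> set of user ids that currently have access
    current_staff: set of user ids of people still employed
    approved:      system name -> set of user ids approved for that system
    """
    findings = {}
    for system, users in sorted(access.items()):
        # People who have left but still hold access.
        leavers = users - current_staff
        # Current staff with access nobody approved, e.g. after a role move.
        unapproved = (users & current_staff) - approved.get(system, set())
        if leavers or unapproved:
            findings[system] = {
                "leavers_with_access": sorted(leavers),
                "unapproved_access": sorted(unapproved),
            }
    return findings
```

An empty result means the starters/leavers/movers process held up for that quarter, while any entry gives the attestation owner a specific list to chase.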