Do 75% of data breaches really come from “insiders”?

There’s a lot of information out there on data breaches. I’ve written before about one source that I trust: the Verizon Data Breach Investigations Report (DBIR).

The 2018 DBIR studied a sample of 2,216 confirmed data breaches, and of these it found that 28% involved internal actors. The DBIR uses a publicly accessible database of security incidents, and applies quality filters to data before including it in the report.

Only 28%? I heard that 75% of breaches come from “insiders”

Different studies sample different breaches, so it’s natural that findings about who is behind breaches vary somewhat. However, I heard about a report where the gap was large enough that I wanted to look into it: a presentation at the PASS Summit in 2018 cited a 2017 article which found that three quarters of data breaches came from “insiders.”

This figure seems very high to me. Every day we hear about data breaches in the news — from bagels to literally everything else. Could three quarters of these really be due to the malice or incompetence of employees?

Let’s find out where that number comes from.

Follow the article trail

I’m not going to link to the articles involved in spreading this statistic, because I think they’re clickbait. I followed a trail of:

A somewhat legitimate looking security article which says, “nearly three-quarters of incidents are due to insider threats.” The article doesn’t define what insider threats are, but says “Not all insider threats are deliberate,” and then references a different study.
The source referenced for the “nearly three-quarters” stat in that article is an eBook by a security tools retailer. In the undated eBook, they write…
- “According to the most recent Clearswift Insider Threat Index (CITI) report, 74% of security breaches originate from within the extended global enterprise.”

It looks like the original source of the “75%” number is the security company Clearswift. They used to publish their “Threat Index” as a PDF, but in more recent years it appears to be published only as a press release, such as this one for 2017.

What do we know about the methodology?

Studying data breaches is hard, for a few reasons. Not everyone wants to talk about them. Often, it’s a long time until the breach is discovered. And getting to the bottom of the breaches can be tough.

The folks at Clearswift describe their research as having…

surveyed 600 senior business decision makers and 1,200 employees across the UK, US, Germany and Australia
Clearswift 2017 threat index

It’s not clear how many of these respondents had confirmed data breaches that impacted customers, or how the causes of those breaches were assessed.

Reading the press release, here’s some clarification on the numbers for 2017:

“Threats from an employee – inadvertent or malicious – make up 42% of incidents”
“When looking at the extended enterprise – employees, customers, suppliers, and ex-employees – this number reaches 74%, compared to 26% of attacks from parties unknown to the organization.”
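To make the arithmetic behind those two quotes explicit, here is a minimal sketch in Python. It assumes (my reading of the wording, not something the press release states outright) that the 42% employee figure is counted inside the 74% extended-enterprise figure. The variable names are mine.

```python
# Figures quoted from the 2017 press release, in percentage points.
employee_share = 42             # "threats from an employee"
extended_enterprise_share = 74  # employees + customers + suppliers + ex-employees
unknown_party_share = 26        # "parties unknown to the organization"

# Assumption: employees are a subset of the extended enterprise, so the
# remainder is attributed to customers, suppliers, and ex-employees.
other_insider_share = extended_enterprise_share - employee_share

# The two top-level categories account for every incident exactly once.
total_share = extended_enterprise_share + unknown_party_share

print(other_insider_share)  # 32
print(total_share)          # 100
```

Under that assumption, roughly a third of incidents are attributed to people who are not current employees at all, yet still land in the “insider” bucket.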

Looking at sources: do they define ‘data breach’ and ‘security incident’?

The folks who write the Verizon DBIR are very careful about defining terms at the beginning of their report — which is one of the reasons I’m such a big fan of it. Here are the definitions from the 2018 DBIR:

Security incidents are: “a security event that compromises the integrity, confidentiality or availability of an information asset”.
Data breaches are: “an incident that results in the confirmed disclosure — not just potential exposure — of data to an unauthorized party.”

Clearswift uses both of these terms, but their threat indexes do not define them or differentiate between them. It often reads as if they use the terms interchangeably. That’s a big problem, and it may be that the people who are taking their surveys aren’t sure what the definitions are, either.

These numbers shouldn’t add up, should they?

Another puzzling thing about the Clearswift numbers is that they add up a bit too neatly.

If 74% of attacks originate from ‘inside’ (the extended enterprise, through malice or accident) and 26% originate from hackers outside, does that mean there were no cases at all where hackers collaborated with a malicious employee? Or where hackers took advantage of an employee’s mistake?

Isn’t it natural, and even likely, that many data breaches have multiple points of origin?
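Here is a small sketch of why tidy totals are a warning sign. The breach records below are invented purely for illustration (they are not from any report), and `actor_shares` is a hypothetical helper. The point is only that once a single breach is allowed to involve more than one kind of actor, the per-actor percentages no longer have to sum to 100%.

```python
from collections import Counter

# Hypothetical breach records, invented for illustration only.
# Each record is the set of actor types involved in one breach.
breaches = [
    {"external"},
    {"external"},
    {"internal"},
    {"external", "internal"},  # e.g. a hacker exploiting an employee's mistake
]

def actor_shares(records):
    """Return the percentage of breaches that involved each actor type."""
    counts = Counter(actor for actors in records for actor in actors)
    total = len(records)
    return {actor: 100 * n / total for actor, n in counts.items()}

shares = actor_shares(breaches)
total_of_shares = sum(shares.values())

print(shares["external"])  # 75.0
print(shares["internal"])  # 50.0
print(total_of_shares)     # 125.0
```

If a study’s categories sum to exactly 100%, each breach was probably forced into a single bucket, either by the survey design or by the respondent.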

Separating the research from the clickbait

When I’m reading about data breaches, I ask these questions:

Is the source study clearly referenced and available for review?
Does the source study define its terms clearly?
Does the source study clearly state how many distinct confirmed data breaches were analyzed?
Does the source study allow multiple causes for a data breach?

If all of those are a ‘yes’, it’s probably not clickbait.

In this case, I think it probably is.