Blog Post

BIT9 PathNames

,

We live in a day and age when security (data, network, server, etc) is seemingly at the forefront of the daily news. In our quest to improve security we seemingly always find more and more products that are supposed to help with that security. Sometimes, those products require a data repository. As it happens, it is not uncommon for that repository to be in a database. Today, I am looking into a specific issue with the Bit9 product.

Why?

There are some very good reasons as a matter of fact. One really big reason is that I could find no decent information about this issue. The bigger reason is the seemingly indifferent level of response and the delays I saw in response from the Bit9 support channels.

In this specific case, the first response from their support channels took more than two days. Subsequent responses were more than a week later. For the client, this actually caused delays in a project they were working on. Was the issue significant? It was not an outage causing issue, but it was one that did seem to continue to grow and cause concerns with disk space.

The Issue

bit9It may be appropriate to discuss what Bit9 does prior to breaking into the details about the issue. If you have never had any experience with this product before, you may be running a product from a competitor such as Symantec. Bit9 is a an endpoint protection tool from Carbon Black. This helps to prevent against malware and endpoint attacks. In short it is a security based tool to protect your computing enterprise.

Go ahead and click the image to link to their site if you wish to learn more about how it works. For me, the nitty gritty specifics on how it works is a bit outside the scope of this article.

In short, Bit9 records a record of every filename and every filepath for every client machine. It stores these in a database in SQL Server. This shouldn’t be too much of an issue. For one instance serving about 300 client machines we saw about 1.2 million paths being stored. One another instance with far fewer clients and far fewer actual paths on those clients (physically checked) we had 114 million paths stored in the database. The difference in size was 20GB vs ~160GB. When the server is installed with all defaults, you can imagine how this looked for this instance – a very bloated OS volume.

Investigating this issue from a database perspective, I looked to find what was consuming so much space in the database. To do that, I ran my tablespace script and discovered the following:

bit9_tablespacehl

A significant standout there with the pathnames table. Looking at the definition of the table I see something far more interesting and disturbing all at once.

pathnames_table

I see a table with three columns and two large string fields. Each of these fields has a non-clustered index on it. This may or may not be such a big problem (other than the fact that the size of the non-clustered indexes on this table are much larger than the data) except that each of the string fields is an exact duplicate of the other. That’s right. Within this table, the data is duplicated into this second string field and each field has its own index. Not only does it appear that I have a ton of duplicated data, it appears I have entirely useless indexes (neither had been touched for a read since the server had been up)

I inquired about this to the folks at Bit9, both from a design perspective and from an archival perspective. Absolute silence on the design (expected). The inquiry about archival (or purge) of non-essential data did fetch a response – albeit a painfully slow response. The basic question is: “Is it safe to purge or archive old or unnecessary data.” The response we received was “run this and we will tell you what to do next.”

Great, they sent a script to help determine the state of data within the database. I am not posting their script here. Suffice it to say that the script they sent was not very pretty. They query about 20 tables, union the results from those tables, then perform a not in operation to see how many of the pathnames are invalid. No problem. Executing the script did reveal the following:

orphan_result

There seems to be the problem. 95.89% of the rows being stored in the pathnames table are orphaned records! This is a bit of a problem. The software does not appear to manage removal of invalid paths. From here, I knew what the course of action needed to be and acted on it. A big piece of the equation was provided from the results of the script. Another piece was provided in the makeup of the script. From these pieces of information, I created a purge script to help manage the orphan problem. I then put that script into an agent job and set it to run on a weekly basis.

If you find yourself in the same boat, here is the script needed to purge the data. If running this, I recommend disabling the two non-clustered indexes and then performing the delete (especially if you sit at 96% orphaned and over 100 million rows). After deleting the mass amount of orphans, go ahead and rebuild the indexes to stay in compliance with the software contract until Bit9 responds about the index requirement and the schema of the table.

-- select count(1) from [dbo].[OrphanedPathnameIds] (nolock)
CREATE TABLE #pathname ( pathname_id BIGINT );
GO
INSERT INTO #pathname
( pathname_id
)
SELECT pathname_id
FROM dbo.pathnames
WHERE pathname_id NOT IN (
SELECT DISTINCT pathname_id
FROM dbo.antibodies
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.antibody_instances
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.antibody_instances_deleted
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.antibody_instance_groups
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.temp_antibody_instances
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.processing_temp_antibody_instances1
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.processing_temp_antibody_instances2
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.processing_temp_antibody_instances3
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT old_pathname_id
FROM dbo.temp_antibody_instances
WHERE old_pathname_id IS NOT NULL
UNION
SELECT DISTINCT old_pathname_id
FROM dbo.processing_temp_antibody_instances1
WHERE old_pathname_id IS NOT NULL
UNION
SELECT DISTINCT old_pathname_id
FROM dbo.processing_temp_antibody_instances2
WHERE old_pathname_id IS NOT NULL
UNION
SELECT DISTINCT old_pathname_id
FROM dbo.processing_temp_antibody_instances3
WHERE old_pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.antibody_instances_snapshots
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.approval_requests
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.internal_events
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT process_pathname_id
FROM dbo.internal_events
WHERE process_pathname_id IS NOT NULL
UNION
SELECT DISTINCT process_pathname_id
FROM dbo.events
WHERE process_pathname_id IS NOT NULL
UNION
SELECT DISTINCT process_pathname_id
FROM dbo.remote_events
WHERE process_pathname_id IS NOT NULL
UNION
SELECT DISTINCT process_pathname_id
FROM dbo.remote_internal_events
WHERE process_pathname_id IS NOT NULL
UNION
SELECT DISTINCT process_pathname_id
FROM dbo.approval_requests
WHERE process_pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.events
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.remote_events
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.remote_internal_events
WHERE pathname_id IS NOT NULL
UNION
SELECT DISTINCT file_path_id
FROM dbo.notification_files
WHERE file_path_id IS NOT NULL
UNION
SELECT DISTINCT proc_path_id
FROM dbo.notification_files
WHERE proc_path_id IS NOT NULL
UNION
SELECT DISTINCT proc_path_id
FROM dbo.notification_directories
WHERE proc_path_id IS NOT NULL
UNION
SELECT DISTINCT dir_path_id
FROM dbo.notification_directories
WHERE dir_path_id IS NOT NULL
UNION
SELECT DISTINCT proc_path_id
FROM dbo.notification_regkeys
WHERE proc_path_id IS NOT NULL
UNION
SELECT DISTINCT pathname_id
FROM dbo.uploaded_files
WHERE pathname_id IS NOT NULL );
GO
CREATE TABLE #filenames ( filename_id BIGINT );
GO
INSERT INTO #filenames
( filename_id
)
SELECT filename_id
FROM dbo.filenames
WHERE filename_id NOT IN (
SELECT DISTINCT filename_id
FROM dbo.antibodies
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.antibody_instances
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.antibody_instances_deleted
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.antibody_instance_groups
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.temp_antibody_instances
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.processing_temp_antibody_instances1
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.processing_temp_antibody_instances2
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.processing_temp_antibody_instances3
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT old_filename_id
FROM dbo.temp_antibody_instances
WHERE old_filename_id IS NOT NULL
UNION
SELECT DISTINCT old_filename_id
FROM dbo.processing_temp_antibody_instances1
WHERE old_filename_id IS NOT NULL
UNION
SELECT DISTINCT old_filename_id
FROM dbo.processing_temp_antibody_instances2
WHERE old_filename_id IS NOT NULL
UNION
SELECT DISTINCT old_filename_id
FROM dbo.processing_temp_antibody_instances3
WHERE old_filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.antibody_instances_snapshots
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.approval_requests
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.internal_events
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.events
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT process_filename_id
FROM dbo.internal_events
WHERE process_filename_id IS NOT NULL
UNION
SELECT DISTINCT process_filename_id
FROM dbo.events
WHERE process_filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.remote_internal_events
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.remote_events
WHERE filename_id IS NOT NULL
UNION
SELECT DISTINCT process_filename_id
FROM dbo.remote_internal_events
WHERE process_filename_id IS NOT NULL
UNION
SELECT DISTINCT process_filename_id
FROM dbo.remote_events
WHERE process_filename_id IS NOT NULL
UNION
SELECT DISTINCT process_filename_id
FROM dbo.approval_requests
WHERE process_filename_id IS NOT NULL
UNION
SELECT DISTINCT file_name_id
FROM dbo.notification_files
WHERE file_name_id IS NOT NULL
UNION
SELECT DISTINCT proc_name_id
FROM dbo.notification_files
WHERE proc_name_id IS NOT NULL
UNION
SELECT DISTINCT proc_name_id
FROM dbo.notification_directories
WHERE proc_name_id IS NOT NULL
UNION
SELECT DISTINCT proc_name_id
FROM dbo.notification_regkeys
WHERE proc_name_id IS NOT NULL
UNION
SELECT DISTINCT filename_id
FROM dbo.uploaded_files
WHERE filename_id IS NOT NULL );
GO
--select *
DELETE fn
FROM #filenames f
INNER JOIN dbo.filenames fn
ON f.filename_id = fn.filename_id;
SELECT TOP 5000000 p.pathname_id
INTO #presel
FROM #pathname p;
--select *
DELETE pn
FROM #presel p
INNER JOIN pathnames pn
ON p.pathname_id = pn.pathname_id;
DELETE pn
FROM #presel p
INNER JOIN #pathname pn
ON p.pathname_id = pn.pathname_id;
DROP TABLE #presel;
DROP TABLE #pathname;
DROP TABLE #filenames;

Yep! It is just like that. The worst part of the whole thing is actually the delete. The delete operation will touch about 37 indexed views to update and delete. Be prepared for a long running process if you have a situation like I encountered.

For kicks, here is what that plan would look like in one of my favorite tools (SQL Sentry Plan Explorer).

bit9_deleteplan

What is your pain scale and story?

pain_scale

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating