Being a student at the British Columbia Institute of Technology (BCIT) is a special thing. Its an environment where real world tasks are out there waiting for you to apply your very own approach to them. That approach must not only be effective, robust, and innovative; it should be also acceptable in a real world environment. Its not just another school project it is something that tomorrow may be used in any scale organization and it better be working well.
With the growing interest in security across many organizations I found my course project on Network Intrusion Detection particularly interesting. Nowadays we work with many information systems under different platforms, running different operating systems, and working with a range of applications. In case of a system getting compromised the question is will I be able to extract the data I need, when I need it, and manipulate it across all of those systems to achieve the result I expect?
As there are so many opinions out there present at many organizations, that the concept of security system analysis is becoming much broader in every way. In such a heterogeneous environment its really easy to lose track of your system, which should never happen in the security world. The story that I wanted to share with you today will involve one tool that can gather, process, and manage security data coming from across many platforms that tool is the least expected to be heard about in this context it is SQL Server 2005.
The scenario for my project was the following: The Company, having a suspicion of a break into their internal network, has hired an analyst to gather information of how was the network compromised, what was the weak part of the network, and how to protect the network from similar attacks in the future. I was assigned as an analyst, having Intrusion Detection System (IDS) logs as the only artifacts available.
The task was clear until the most important piece of evidence came into play over 3 million records of intrusion detection alerts. The alerts covered parts of companys network that were monitored over 4 days using Snort (a popular IDS running under Linux). Alerts ranged from suspected break-ins to hundred of thousands of false positives. There were a total of more than 1500 types of alerts, indicating different levels of lethality. The challenge was to find out which of those attacks have successfully contributed to overall security compromise resulting in breaking into the system. That information was supposed to be helpful in determining what actions ought to be taken to prevent breaks in a future. After looking at the data and having no other data available about the network except those intrusion detection alerts, I would never expect to be clear on anything that happened on the network during those 4 days; the only part that gave a little hope was the 3 weeks that were given for the project to be completed.
Being at the Launch of SQL Server 2005, along with Visual Studio 2005, I remembered those case studies where corporate IT managed huge numbers of critical data with SQL Server. Having my complimentary copy of SQL Server 2005 Standard I decided to see how SQL can help me with tons of my critical data. There was no strict requirement for the tool to use for data analysis, and the logs were just huge collection of flat file records, so I started with loading all the records into a SQL Server database.
The format of the actual text file was a bit challenging as it took quite some time to get all the parts settled into their respective columns and have everything organized. In a couple of days after having all the data loaded and basic analysis done, I decided to find out what other approaches were taken by other students. Well, I wasnt surprised that of the hundreds of tools I Googled that some of them claimed to be the industry de facto in security log analysis. Among them was SnortSnarf (http://www.snort.org/dl/contrib/data_analysis/snortsnarf/) a PERL program that uses the original Snort logs to create custom reports based on the requirements specified by security analyst. Another tool SawMill (www.sawmill.net) was designed to analyze many types of logs and build custom reports based on queries. The list went on and on, but the two tools listed above were the most popular ones. To my disappointment, nobody went the SQL Server route because apparently it involved too much Transact-SQL programming.
Well it was too late for me, as all the data had been nicely loaded into my database and was ready to be analyzed. So I decided to check on the performance indicators of the approaches of other students. Once everyone had chosen their analysis process and implemented some basic data manipulations, I was astonished to find out that my friendly guy, SQL Server, had processed the data couple of thousand percent faster than other tools. Comparing to SnortSnarf, which used PERL, and took more then one day (non stop) just to load data, and SawMill taking gigabytes of RAM and hours of time to display a simple report with 2000 records, SQL Server took less than 10 minutes to load flat log files into the database. I was amazed and thought that this cant be yet I wasnt dreaming nice work, and those are specialized systems I thought.
Well that was just the beginning of it. Having all the data loaded in my SQL database I could run any sort of analysis on the log, group it in the most custom and flexible way, and manipulate every single parameter possible. I quickly spotted some trends that led me to an actual break in the relevant data was extracted and clearly presented in my report. I could provide actual proof based on my assumptions just by having quick access to the log components relevant to me and being able to view every angle of the database. That took me couple of hours over about 4 days.
And almost any aggregation procedure took no more than 10 seconds to output results and involved only basic Transact SQL knowledge. I must say, it was quite challenging for many students to develop any meaningful correlation using their tools; their main problem was trying to verify their suspicions within a reasonable time limit as well as their inability to manipulate certain parameters to be certain of an assumption.
Now I thought it would be nice to share a sample of my solution. So in the following part, I will present certain steps that start from loading the flat file into the database and doing some custom aggregation to detect serious security alerts.
Here is a typical log record from Snort IDS:
[**] [1:553:7] POLICY FTP anonymous login attempt [**]
[Classification: Misc activity] [Priority: 3]
06/18-20:42:05.904428 24.xx.x.xx:3440 -> 192.168.3.7:21
TCP TTL:124 TOS:0x0 ID:57593 IpLen:20 DgmLen:56 DF
***AP*** Seq: 0x7BCACCA4 Ack: 0xAD645A33 Win: 0xFFDC TcpLen: 20
Here we can see many relevant, interesting, and sometime optional data. The data that I cared about in this record was the following: The alert, date and time, Source and Destination IPs, Ports on source and destination, packet flags that were set.
To keep things as simple as possible, I have loaded SQL Server Management Studio and started up the Import/Export Wizard on one of the databases to load my flat file into the table. For simplicity, every single record was represented in a separate column. Once the data was loaded I wrote up a small query that took care of that single record to be split into a collection of columns that will represent parts of data that I need to work with.
In the diagram below we can see initial format of the table alert_log is getting converted to a delimited_alert_log table, where every piece of data relevant to us will have its respective column:
Here is an example of the string manipulation query that was used to convert the data from initial column containing individual alert in every row to a table with the format described in the schema above:
select SUBSTRING(Col001, patindex('%]%',Col001)+2, patindex('%[**]%',Col001)
-LEN(SUBSTRING(Col001, 0, patindex('%]%',Col001)))-4) as Alert,
SUBSTRING(Col001, patindex('%__/__%',Col001), 5) as Date,
SUBSTRING(Col001, patindex('%-__:%',Col001)+1, 8) as Time,
SUBSTRING(Col001, patindex('%.%',Col001)+8, patindex('%->%',Col001)
-patindex('%.%',Col001)-8) as [SourceIP and Port],
SUBSTRING(Col001, patindex('%->%',Col001)+3, patindex('%TTL%',Col001)
-patindex('%->%',Col001)-8) as [DestinationIP and Port],
SUBSTRING(Col001, patindex('%Seq:%',Col001)-9,8) as [Flags Set]
Now we have all the data nicely formatted into separate columns (in table delimited_alert_log) and ready to perform any sort of manipulation on it. Having a basic understanding of the alerts in the log we need to find a starting point which can give us more clues.
For example, according to Snort IDS documentation, alert ATTACK-RESPONSES directory listing implies the destination host has been successfully compromised and is sending the directory listing data to an originator of the attack thru an HTTP stream. Having that information, I might want to find out what alerts have been triggered before actual information was sent. That way I might be able to determine what the critical weakness of the system is. Also, I might want to have all the alerts to be sorted in a chronological way, to capture the exact steps of an attacker.
Here is the Transact-SQL query that will return alerts preceding the actual break in to my system:
select Date, Time, [SourceIP and Port], Alert
where SUBSTRING([SourceIP and Port], 0, patindex('%:%',[SourceIP and Port]))
select SUBSTRING([DestinationIP and Port], 0, patindex('%:%',[DestinationIP and Port]))
where Alert like '%ATTACK-RESPONSES%'
order by Date, Time
We can also run many other queries that will give us more detailed information about an actual break in as well as other types of reconnaissance done on our host.
This is an example of an ad hoc solution that could be used to implement analysis once the network has been compromised. The most useful thing that I find is being able to run on-demand data analysis. We could have had our log information automatically analyzed with any specified occurrences already setup using queries. That way we could have the most current information about the status of the system and will not have to spend many hours analyzing an already compromised infrastructure.
To achieve this level of automation we could use certain builtin components of SQL Server 2005 that will periodically upload new data in to a database from a log flat file. Once that data is uploaded, we could have stored procedures implementing analysis logic executed and custom alerts and notifications delivered to an administrator using SQL Server Notification Services.
As an automated tool, we could use SQL Sever Agent through the SQL Server Management Studio and have it execute a Job that would upload the log file at any schedule we choose and perform the automated analysis of our choice.
The most amazing thing is that all of the processing is done using one high performance database engine and doesnt require complex Transact-SQL understanding. Not only is high performance a factor but also accessibility as many scenarios described in here could be implemented using SQL Server 2005 Express Edition. Using a tool such as SQL Server minimizes adjustment to a client and the system that is used in particular environment.
Feel free to post you comments and share your opinions at email@example.com