Abstract. Eight sites participated in the second DARPA off-line intrusion detection evaluation in 1999. A test bed generated live background traffic ...
Although detection accuracy for old attacks in these two categories was roughly 80%, detection accuracy for new and novel attacks was below 25% even at high false alarm rates. These results demonstrated that current intrusion detection systems do not detect new attacks well and refocused research goals on techniques which can detect new attacks. Results of the real-time evaluation generally agreed with those of the off-line evaluation. Detection rates for the systems and attacks in common were similar. Two interesting results from the off-line evaluation were that slow stealthy scans were not well detected by some intrusion detection systems and false alarm rates of a network-based system used by the Air Force were similar to those of a reference keyword-based system used in the off-line evaluation.
3 Conclusions from the 1998 Evaluation
The 1998 evaluation uncovered a widespread interest in obtaining training and test corpora containing normal traffic and attacks to develop and evaluate intrusion detection systems. To date, more than 90 sites have downloaded all or part of the 1998 off-line intrusion detection archival corpus from a Lincoln Laboratory web site.
Information from sites which have downloaded this corpus indicates that it is being used to evaluate and develop both commercial and research intrusion detection systems and to train security analysts. A processed subset of this corpus was also redistributed as part of a contest sponsored by the International Conference on Knowledge Discovery in Databases. This contest attracted 24 participants who used modern approaches to pattern classification to achieve high performance on a constrained intrusion detection task.
The 1998 evaluation also demonstrated that it is possible to evaluate a diverse collection of intrusion detection systems, but that this is more complex than initial analyses suggested. Every component of the evaluation, from designing and managing the test bed, to generating background traffic, to scoring systems, to automating, running, marking ground truth for, and verifying attacks, was complicated by the wide variety of traffic, attacks, and intrusion detection systems included. For example, labeling attacks involved annotating every network packet associated with each attack. This was partially automated, but it required extensive hand correction and analyses that had to be customized for each attack. Experiences of the Lincoln Laboratory evaluators led to suggestions for reducing the cost and complexity of the evaluation. These included simplifying scoring procedures, requesting more detailed and formal system descriptions from participants, more fully automating attack generation and verification, and automating more of the daily procedures required to continuously run the test bed. Experiences of the many participants and others also led to suggestions for improving the evaluation process.
These included providing training data containing no attacks to train anomaly detectors, simplifying scoring procedures, exploring false alarm rates with a richer range of background traffic, providing a written security policy, and performing more detailed analyses of misses and false alarms. All of these suggestions were incorporated in the 1999 evaluation.
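The packet-level attack labeling described above (annotating every network packet associated with each attack) can be sketched in miniature. This is a hypothetical illustration, not the evaluators' actual tooling: packets are simplified to (timestamp, source IP, destination IP) tuples, and each attack is described by its attacker address and time window; hand correction would still be needed for traffic this rule mislabels.

```python
# Hypothetical sketch of partially automated ground-truth marking:
# tag each packet that falls inside a known attack's time window and
# involves the attacker's address. Field layout is assumed, not the
# evaluation's real format.
def label_packets(packets, attacks):
    """packets: list of (timestamp, src_ip, dst_ip) tuples.
    attacks: list of (attacker_ip, start, end, name) tuples.
    Returns a parallel list of attack names (None = background)."""
    labels = []
    for ts, src, dst in packets:
        label = None
        for attacker_ip, start, end, name in attacks:
            if start <= ts <= end and attacker_ip in (src, dst):
                label = name
                break
        labels.append(label)
    return labels
```

A rule this simple fails exactly where the text says automation failed: spoofed addresses, trojans phoning home from victim machines, and attacks overlapping in time all require per-attack custom analysis.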
4 Overview of the 1999 Evaluation
The 1999 evaluation was a blind off-line evaluation, as in 1998, but modified based on suggestions from 1998 and also with major extensions to enhance the analysis and cover more attack types. Figure 1 shows a block diagram of the 1999 test bed. Major changes for 1999 are the addition of a Windows NT workstation as a victim, the addition of an inside tcpdump sniffer machine, and the collection of both Windows NT audit events and inside tcpdump sniffing data for inclusion in archival data provided to participants. Not shown in this figure are new Windows NT workstations added to support NT attacks, new inside attacks, and new stealthy attacks designed to avoid detection by network-based systems tested in 1998. The Windows NT victim machine and associated attacks and audit data were added due to increased reliance on Windows NT systems by the military. Inside attacks and inside sniffer data to detect these attacks were added due to the dangers posed by inside attacks. Stealthy attacks were added due to an emphasis on sophisticated attackers who can carefully craft attacks to look like normal traffic. In addition, two new types of analyses were performed. First, an analysis of misses and high-scoring false alarms was performed for each system to determine why systems miss specific attacks and what causes false alarms. Second, participants were optionally permitted to submit attack forensic information that could help a security analyst identify important characteristics of the attack and respond. This identification information included the attack category, the name for old attacks, ports/protocols used, and IP addresses used by the attacker.
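The optional forensic submission described above can be pictured as a small structured record. The field names below are assumptions for illustration only, not the evaluation's actual submission format.

```python
# Hypothetical shape of an attack-identification record: category,
# known name (for old attacks), ports/protocols, and attacker IPs.
# All field names are assumed, not taken from the evaluation spec.
from dataclasses import dataclass, field

@dataclass
class AttackForensics:
    category: str                      # "probe", "DoS", "R2L", "U2R", "Data"
    name: str = ""                     # empty for new/unnamed attacks
    ports: list = field(default_factory=list)
    protocols: list = field(default_factory=list)
    attacker_ips: list = field(default_factory=list)

report = AttackForensics(category="probe", name="portsweep",
                         ports=[21, 23, 80], protocols=["tcp"],
                         attacker_ips=["10.20.30.40"])
```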
Another major change in 1999 was a focus on determining the ability of systems to detect new attacks without first training on instances of these attacks. The 1998 evaluation demonstrated that systems could not detect new attacks well. The new 1999 evaluation was designed to evaluate enhanced systems which can detect new attacks and to analyze why systems miss new attacks. Many new attacks were thus developed and only examples of a few of these were provided in training data.
5 Test Bed Network and Background Traffic
The inside of the simulated Eyrie Air Force base shown in Figure 1 contains four victim machines which are the most frequent targets of attacks in the evaluation (Linux 2.0.27, SunOS 4.1.4, Sun Solaris 2.5.1, Windows NT 4.0), a sniffer to capture network traffic, and a gateway to hundreds of other inside emulated PCs and workstations. The outside simulated internet contains a sniffer, a gateway to hundreds of emulated workstations on many other subnets and a second gateway to thousands of emulated web servers. Data collected to evaluate intrusion detection systems include network sniffing data from the inside and outside sniffers, Solaris Basic Security Module (BSM) audit data collected from the Solaris host, Windows NT audit event logs collected from the Windows NT host, nightly listings of all files on
the four victim machines, and nightly dumps of security-related files on all victim machines.
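One natural use of the nightly file listings mentioned above is to diff consecutive nights and flag files that appeared, disappeared, or changed. A minimal sketch, assuming each listing is reduced to a path-to-checksum mapping (the real listing format is not specified here):

```python
# Sketch: compare two nights' file listings. Input format (dicts of
# path -> checksum) is an assumption made for this illustration.
def diff_listings(old, new):
    """Return (added, removed, changed) path lists between two nights."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

Such diffs are one way a host-based system could notice trojan installation or tampering with security-related files between evaluation days.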
Custom software automata in the test bed simulate hundreds of programmers, secretaries, managers, and other types of users running common UNIX and Windows NT application programs. In addition, custom Linux kernel modifications provided by the AFRL allow a small number of actual hosts to appear as if they are thousands of hosts with different IP addresses. Figure 2 shows the average number of connections per day for the most common TCP services. As can be seen, web traffic dominates but many other types of traffic are generated which use a variety of services.
User automata send and receive mail, browse web sites, send and receive files using FTP, use telnet and ssh to log into remote computers and perform work, monitor the router remotely using SNMP, and perform other tasks. In addition to automatic traffic, the test bed allows human actors to generate background traffic and attacks when the traffic or attack is too complex to automate. Background traffic characteristics including the overall traffic level, the proportion of traffic from different services, and the variability of traffic with time of day are similar to characteristics measured on a small Air Force base in 1998. The average volume of background traffic transmitted between the inside and outside of this test bed is roughly 411 Mbytes per day, with most of the traffic concentrated between 8:00 AM and 6:00 PM. The dominant protocols are TCP (384 Mbytes), UDP (26 Mbytes), and ICMP (98 Kbytes). These traffic rates are low compared to current rates at some large commercial and academic sites, but are representative of traffic measured at the beginning of this project. These rates also lead to sniffed data file sizes that can still be transported over the internet without practical difficulties. The flat test bed structure without firewalls or other protective devices simplifies maintenance and attack generation. Future evaluations will include firewalls, more complex architectures, attacks against firewalls, and more complex attacks including man-in-the-middle attacks that take advantage of a network hierarchy.
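The per-protocol byte counts quoted above can be sanity-checked against the roughly 411 Mbytes/day total: TCP alone accounts for about 94% of the bytes. (Mbytes are treated as 10^6 bytes here; the paper does not state which convention it uses.)

```python
# Per-protocol daily byte counts from the text (1 Mbyte = 10**6 bytes
# assumed). TCP dominates the byte volume.
daily_bytes = {"tcp": 384e6, "udp": 26e6, "icmp": 98e3}
total = sum(daily_bytes.values())                  # ~410 Mbytes/day
shares = {proto: b / total for proto, b in daily_bytes.items()}
```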
6 Attacks
Twelve new Windows NT attacks were added in 1999 along with stealthy versions of many 1998 attacks, new inside console-based attacks, and six new UNIX attacks.
The 56 different attack types shown in Tables 2 and 3 were used in the evaluation.
Attacks in normal font in these tables are old attacks from 1998 executed in the clear (114 instances). Attacks in italics are new attacks developed for 1999 (62 instances), or stealthy versions of attacks used in 1998 (35 instances). Details on attacks including further references and information on implementations are available in [3,9,10,13].
Five major attack categories and the attack victims are shown in Tables 2 and 3. Primary victims listed along the top of these tables are the four inside victim hosts, shown in the gray box of Figure 1, and the Cisco router. In addition, some probes query all machines in a given range of IP addresses as indicated by the column labeled “all” in Table 2.
The upper row of Table 2 lists probe or scan attacks. These attacks automatically scan a network of computers or a DNS server to find valid IP addresses (ipsweep, lsdomain, mscan), active ports (portsweep, mscan), host operating system types (queso, mscan), and known vulnerabilities (satan). All of these probes except two (mscan and satan) are either new in 1999 (e.g. ntinfoscan, queso, illegalsniffer) or are stealthy versions of 1998 probes (e.g. portsweep, ipsweep). Probes are considered stealthy if they issue ten or fewer connections or packets or if they wait longer than 59 seconds between successive network transmissions. Stealthy probes are similar to clear probes because they gather similar information concerning IP addresses, vulnerable ports, and operating system types. They differ because this information is gathered at a slower rate and because less, but more focused, information is gathered from each attack instance. For example, stealthy port sweeps are slow and focus only on ports with known vulnerabilities. The new “illegalsniffer” attack is different from the other probes. During this attack, a Linux sniffer machine is installed on the inside network running the tcpdump program in a manner that creates many DNS queries from this new and illegal IP address.
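The stealthiness rule stated above is concrete enough to express directly: a probe is stealthy if it issues ten or fewer connections/packets, or if every gap between successive transmissions exceeds 59 seconds. A minimal sketch over a list of packet timestamps:

```python
# Sketch of the stealthy-probe rule from the text: <= 10 transmissions,
# or more than 59 s between every pair of successive transmissions.
def is_stealthy(event_times):
    """event_times: sorted timestamps (seconds) of a probe's packets."""
    if len(event_times) <= 10:
        return True
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    return min(gaps) > 59
```

This rule illustrates why threshold-based scan detectors tuned for fast sweeps missed stealthy probes: a sweep spread over hours stays below any per-minute connection-count trigger.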
The second row of Table 2 contains denial of service (DoS) attacks designed to disrupt a host or network service. New 1999 DoS attacks crash the Solaris operating system (selfping), actively terminate all TCP connections to a specific host (tcpreset), corrupt ARP cache entries for a victim not in others’ caches (arppoison), crash the Microsoft Windows NT web server (crashiis), and crash Windows NT (dosnuke).
Table 3. Remote to Local (R2L), User to Root (U2R), and Data attacks.
The first row of Table 3 contains Remote to Local (R2L) attacks. In these attacks, an attacker who does not have an account on a victim machine gains local access to the machine (e.g. guest, dict), exfiltrates files from the machine (e.g. ppmacro), or modifies data in transit to the machine (e.g. framespoof). New 1999 R2L attacks include an NT PowerPoint macro attack (ppmacro), a man-in-the-middle web browser attack (framespoof), an NT trojan-installed remote-administration tool (netbus), a Linux trojan SSH server (sshtrojan), and a version of a Linux FTP file access-utility with a bug that allows remote commands to run on a local machine (ncftp). The second row of Table 3 contains user to root (U2R) attacks where a local user on a machine is able to obtain privileges normally reserved for the UNIX super user or the Windows NT administrator. All five NT U2R attacks are new this year and all other attacks except one (xterm) are versions of 1998 U2R attacks that were redesigned to be stealthy to network-based intrusion detection systems evaluated in 1998. Techniques used to make these U2R attacks stealthy are described in [3,10,13]. They include running the attack over multiple sessions, embedding the attack in normal user actions, writing custom buffer overflow machine code that does not spawn a root-level shell but simply “chmod’s” a file, bundling the complete attack into one shell script, setting up delayed “time bomb” attacks, and transferring the attack and the attack output using common network services. The bottom row in Table 3 contains Data attacks. This is a new attack type added in 1999. The goal of a Data attack is to exfiltrate special files which the security policy specifies should remain on the victim hosts.
These include “secret” attacks where a user who is allowed to access the special files exfiltrates them via common applications such as mail or FTP, and other attacks where privilege to access the special files is obtained using a U2R attack (ntfsdos, sqlattack). Note that an attack could be labeled as both a U2R and a Data attack if one of the U2R attacks was used to obtain access to the special files.
The “Data” category thus specifies the goal of an attack rather than the attack mechanism.
Attack implementation was simplified for U2R attacks in 1999 by integrating attack automation software with the automaton used to generate telnet sessions. This made it easier to embed attacks within normal telnet sessions. In addition, attack verification was simplified by running all attacks from a separate dedicated machine and sniffing traffic to and from that machine. This made it easier to collect network traffic generated by each attack. Custom software was required to change routing tables in the test bed gateways whenever the IP address of the dedicated attacker machine changed. This made it possible to isolate network traffic generated by attacks for all but inside attacks, which were launched from the console of a victim, and for attacks which installed trojans or other types of malicious software on inside machines. Any network traffic for these two types of attacks had to be extracted from inside sniffer data by hand.
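The isolation idea above (all attacks launched from one dedicated machine, so its address pulls the attack traffic out of the capture) reduces to a filter by endpoint. A hedged sketch, with packets again simplified to (timestamp, src, dst) tuples rather than a real pcap capture:

```python
# Sketch of attack-traffic isolation: keep only packets to or from the
# dedicated attacker machine. Tuple layout is assumed for illustration;
# the test bed worked on raw tcpdump captures.
def attack_traffic(capture, attacker_ip):
    """capture: list of (timestamp, src_ip, dst_ip) tuples."""
    return [pkt for pkt in capture if attacker_ip in (pkt[1], pkt[2])]
```

As the text notes, this filter cannot see console-launched inside attacks or traffic generated later by installed trojans, which is why those cases required manual extraction from the inside sniffer data.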
7 Participants and Scoring