«Workshop held 9-10 January 2008 in Arlington, VA Prepared for US Strategic Command Global Innovation and Strategy Center (USSTRATCOM/GISC) Prepared ...»
The majority of content analysis techniques that have been applied to the study of the behavior of VNSAs in cyberspace are quantitative in nature. The classification techniques described above are representative of this approach, in which the goal of the analysis is to automate the process of identifying terrorism-related content within a dataset by comparing it to a “terrorist-content” template of some kind. This type of analysis does not address the qualitative properties of the data, however; that perspective is being investigated by the researchers from the University of Arizona responsible for the Dark Web collection. Using a dataset collected by the semi-automated spidering methodology described above, they used quantitative methods to study qualitative attributes of the dataset, including technical sophistication, content richness, and Web interactivity, with the goal of gaining insight into the level of advancement and effectiveness of terrorists’ use of the Internet (Qin et al, 2007). The researchers also performed a benchmark comparison of the terrorist/extremist sites to US government sites, which have been identified as the top in the world in terms of Web technical sophistication and interactivity by the CyPRG group of the University of Arizona (CYPRG, 2008).
The study focused on a qualitative analysis of the Web presence of Islamic terrorist groups rooted in the Middle East, such as Al Qaeda, Palestinian Islamic Jihad, and Hamas. About 220,000 multimedia Web sites and documents were evaluated for 13 technical sophistication attributes, five content richness attributes, and 11 Web-interactivity attributes which compose the so-called Dark Web Attribute System (DWAS).
The level of a site’s technical sophistication was measured by its use of basic HTML techniques (lists, tables, frames, and forms), advanced HTML techniques (DHTML/SHTML, predefined and self-defined script functions), and embedded multimedia content, such as background images, music and streaming of audio/video. In addition, a site’s use of dynamic Web programming languages, such as CGI, PHP, and JSP/ASP, for functions such as user login and online transaction processing was also evaluated. As shown in Table 2, each attribute was assigned a weight based on the opinion of Web experts obtained through an email survey (Qin et al, 2007).
The content richness attributes evaluated the variety and volume of information offered by a site, and were measured by the number of hyperlinks and the number of downloadable documents, images, and audio/video files it contained. The third attribute category, Web interactivity, evaluated the sites for three types of interactivity: one-to-one level interactivity, community-level interactivity, and transaction-level interactivity such as online shops, online payment options, and online application forms that provide functionality for activities such as donating to extremist groups or applying for access to restricted content (Qin et al, 2007).
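The idea of expert-assigned attribute weights can be illustrated with a minimal sketch. The attribute names and weight values below are hypothetical placeholders, since the Table 2 weights are not reproduced here, and the weighted sum is an assumed way of combining them rather than the scoring procedure reported by Qin et al.

```python
# Hypothetical weights for four technical sophistication attribute groups.
# These values are illustrative assumptions, not the Table 2 weights.
TECH_WEIGHTS = {
    "basic_html": 1.0,
    "advanced_html": 2.0,
    "embedded_multimedia": 3.0,
    "dynamic_programming": 4.0,
}


def weighted_score(values: dict[str, float], weights: dict[str, float]) -> float:
    """Return the sum of weight * value over every weighted attribute.

    Attributes missing from `values` are treated as absent (value 0).
    """
    return sum(weights[name] * values.get(name, 0.0) for name in weights)


# Presence (1) or absence (0) of each attribute group on one hypothetical site.
example_site = {
    "basic_html": 1,
    "advanced_html": 0,
    "embedded_multimedia": 1,
    "dynamic_programming": 1,
}

example_score = weighted_score(example_site, TECH_WEIGHTS)
print(example_score)
```

Scores computed this way for two collections of sites could then be compared attribute by attribute, which is the kind of comparison the study reports.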
Attribute information was automatically extracted from the terrorist/extremist dataset and from 277,000 documents collected from US government sites. The results of a statistical analysis of the datasets indicate that US government sites are significantly more advanced in the use of basic HTML techniques to organize the sites and the implementation of dynamic programming languages to provide functions such as user login and online applications. The results also indicate that there is significantly more embedded media available on terrorist sites as compared to the government sites. The researchers believe this to be a significant finding that demonstrates the extent to which the Internet is used by NSA groups as a means of disseminating information.
Since multimedia content is more attractive and leaves a more lasting impression than text, the effort these groups have expended to include such content indicates a desire to make a strong statement to both supporters and enemies. Examples of such content include movie clips of suicide bombing attacks in Iraq posted to online forums, video clips of the beheading of American Nicholas Berg posted on a Malaysian terrorist site, and pictures of executed Iraqi “traitors” who cooperated with US forces (Qin et al, 2007).
The US government sites demonstrate a higher degree of content richness based on the much larger volume of downloadable multimedia content they provide. This result seems incongruous with the analysis of embedded media discussed above, but it is actually indicative of the nature of the majority of the terrorist sites investigated. While US government sites are usually hosted on dedicated Web servers, the NSA groups’ sites are often hosted by free ISPs, which restrict the sites’ size and use of bandwidth. It is theorized that this explains the extra effort expended to include embedded multimedia content as an alternative to downloadable files.
The results of the Web interactivity comparison indicate that the US government sites support significantly more one-to-one level interaction, while the terrorist/extremist groups support much more community-level interaction through online forums, bulletin boards, and chat rooms. This confirms the results of studies that indicate that NSAs are using the Internet as an integral method of communication. Forums such as www.shawati.com and www.kuwaitchat.net have tens of thousands of members and hundreds of thousands of postings, where the members are a mix of NSAs, supporters, and sympathizers. In some cases, forum members can receive regular messages from members of terrorist groups, such as the late terrorist leader Abu Mus’ab Zarqawi in Iraq, who used to post messages directly to the forum www.islamic-f.net.
The results of the statistical analysis of the two data sets demonstrate that although there are significant differences between the sub-attributes of the US government and terrorist/extremist sites, there is no appreciable difference in the broader categories of technical sophistication, content richness, and Web interactivity. This implies that NSA groups employ the same level of Internet sophistication as the US government when it comes to communicating with the public (Qin et al, 2007).
The significant volume of forum and chat room postings that were uncovered during the data collection process indicates that they are methods of communication heavily employed by terrorists/extremists, and the authors of the study suggest that security and law-enforcement experts “should pay more attention” to these types of online communication. The researchers continue to pursue this type of qualitative analysis; future research directions include incorporating additional attributes into the DWAS, expanding the analysis to Web sites from other parts of the world, performing a time-series analysis of the Dark Web data, and exploring the use of more advanced machine learning techniques to search for patterns in the media content collected from the sites (Qin et al, 2007).
An adequately informed application of predictive modeling techniques has the potential to provide analysts with situational assessment, forecasting, and deterrence strategies (Asal et al, 2008). Traditionally, a realistic model can only be constructed from a training set of historical records, but (fortunately) terror-related plots are small in number, with “only one or two major terrorist incidents every few years - each one distinct in terms of planning and execution” (Jonas and Harper, 2006). Thus, the models often have to be augmented by input from outside authoritative sources, and rely heavily on hypotheses that are based on historical patterns of behavior. This makes the availability of a clean, content-rich dataset crucial to the success of the model. While they do not specifically model the emergent behavior of VNSAs in cyberspace, the examples that follow are indicative of state-of-the-art research in the application of predictive modeling techniques to the counter-terrorism domain. The models focus on the corporeal behavior of VNSAs; however, they are better informed when they include information about the VNSAs’ cyberspace activities obtained by the collection and analysis methods described above.
Predictive Modeling: Hidden Markov Models and Bayesian Networks
Researchers at the University of Connecticut are developing a tool for modeling and detecting terrorist networks that can “assist analysts with: 1) identifying terrorist threats; 2) predicting possible terrorist actions; and 3) elucidating ways to counteract terrorist activities” (Allanach et al, 2004). The architecture of the so-called Adaptive Safety Analysis and Monitoring (ASAM) tool “is based on the premise that terrorist networks can be evaluated using transaction-based models” and suspicious links between people, places, and things. For example, a sequence of events (transactions) that may or may not be cause for concern could consist of an individual withdrawing money from the bank, buying chemicals that could be used to create a chemical weapon, and then purchasing a plane ticket to the United States. The ASAM tool models the evolution of such transactions using hidden Markov models (HMMs) and dynamic Bayesian networks (DBNs). An HMM is a type of stochastic signal model used to evaluate the likelihood of a sequence of observations and to infer the most likely sequence of events from a noisy sequence of observations. The model represents the interconnection between a hypothetical series of transactions that lead to the completion of the task that is being modeled. Figure 7 depicts a Markov chain model developed by the researchers to represent a plot by members of Al Qaeda to execute a truck bombing during the 2004 Olympics in Athens, Greece. There are 9 states in this model, and the probability of transition between the states is printed beside the edges that connect them (Singh, Allanach et al, 2004). The HMMs are parameterized by the transition probability matrix, the emission matrix, and the initial probability vector.
Figure 7: Markov chain representing a hypothetical plot by members of Al Qaeda to execute a truck bombing at the 2004 Olympics in Athens, Greece (Singh, Allanach et al, 2004).
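The two uses of an HMM mentioned above, scoring the likelihood of an observation sequence and inferring the most likely hidden state sequence, correspond to the standard forward and Viterbi algorithms. The sketch below is a generic textbook illustration on a three-state left-to-right model. The states, observation symbols, and all probabilities are made-up assumptions; they are not the nine-state model of Figure 7 or any parameters used in ASAM.

```python
def forward_likelihood(obs, init, trans, emit):
    """Return P(obs | model) using the forward algorithm."""
    n = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def viterbi(obs, init, trans, emit):
    """Return the most likely hidden-state sequence for obs."""
    n = len(init)
    # delta[s] = probability of the best path that ends in state s
    delta = [init[s] * emit[s][obs[0]] for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, new_delta = [], []
        for s in range(n):
            best = max(range(n), key=lambda p: delta[p] * trans[p][s])
            step.append(best)
            new_delta.append(delta[best] * trans[best][s] * emit[s][o])
        backpointers.append(step)
        delta = new_delta
    # Trace the best path backwards from the most probable final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: three hidden stages (0, 1, 2) and three observable
# transaction types (0, 1, 2). All numbers are illustrative assumptions.
INIT = [1.0, 0.0, 0.0]            # initial probability vector
TRANS = [[0.6, 0.4, 0.0],         # transition probability matrix
         [0.0, 0.7, 0.3],
         [0.0, 0.0, 1.0]]
EMIT = [[0.8, 0.1, 0.1],          # emission matrix (rows: states)
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]]

observations = [0, 1, 2]
print(forward_likelihood(observations, INIT, TRANS, EMIT))
print(viterbi(observations, INIT, TRANS, EMIT))
```

The three arguments `init`, `trans`, and `emit` are exactly the initial probability vector, transition probability matrix, and emission matrix that parameterize an HMM.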
The HMMs are the foundation of the ASAM tool, since they provide the “template models” for potential terrorist activity; new transactional data is compared to these templates as a way of tracking the development of scenarios. In general, if there is enough historical data available, the parameters of the model can be generated automatically from a “learning” algorithm called the Baum-Welch algorithm (Singh, Tu et al, 2004). For counter-terrorism applications, it is nearly impossible to collect adequate historical data to make this approach feasible, so the researchers designed the models and assigned parameters based on the recommendations of intelligence analysts.
The information gathered from the HMMs is reported to probabilistic models that represent larger scale terrorist activities. These overarching plots are represented by dynamic Bayesian networks (DBNs), and the ASAM system utilizes a hierarchy of subordinate dynamic Bayesian networks (sub-DBNs) that report upwards to a final comprehensive DBN that evaluates the overall probability of terrorist activity.
Figure 8 depicts the DBN representing the global threat model for potential terrorist activity at the 2004 Olympics (GeNle 2.0, 2003). Each node in the model represents a terrorist sub-plot that is described by an underlying HMM, and the links between nodes represent direct probabilistic dependencies between the subplots. The conditional probabilities of each node are updated whenever they receive information from their corresponding HMM, and the global threat level at any given time is a function of the current conditional probabilities assigned to each subplot. Simply stated, the global threat level increases as the terrorist groups successfully execute the subplots described by each node in the overarching network.
Figure 8: Dynamic Bayesian network representing the global threat level for a terrorist attack at the 2004 Olympics in Athens (GeNle 2.0, 2003).
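How subplot probabilities might combine into a single threat level can be illustrated with a noisy-OR node, a common simplification in Bayesian networks. This is an assumed combination rule chosen for brevity, not the conditional probability tables of the ASAM DBN; the subplot names, link strengths, and probabilities are hypothetical, and the temporal dimension of a dynamic network is omitted.

```python
from math import prod


def noisy_or_threat(subplot_probs, link_strengths, leak=0.0):
    """Combine subplot probabilities into one threat probability.

    subplot_probs:  P(subplot is under way), e.g. reported by an HMM.
    link_strengths: P(subplot alone triggers the threat | subplot under way).
    leak:           P(threat) when no subplot is under way.
    Subplots are assumed independent of one another.
    """
    not_triggered = prod(
        1.0 - link_strengths[name] * p for name, p in subplot_probs.items()
    )
    return 1.0 - (1.0 - leak) * not_triggered


# Hypothetical link strengths for two subplots.
STRENGTHS = {"subplot_a": 0.9, "subplot_b": 0.6}

before = noisy_or_threat({"subplot_a": 0.2, "subplot_b": 0.5}, STRENGTHS)
# New evidence raises the probability that subplot_a is under way.
after = noisy_or_threat({"subplot_a": 0.8, "subplot_b": 0.5}, STRENGTHS)
print(before, after)
```

Under this rule the threat level rises whenever evidence raises the probability of any subplot, which matches the qualitative behavior described for the global threat model.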
The inputs to the ASAM system are relevant transactional data, such as proven communication between suspicious individuals and financial transactions. Table 3 shows the transactions that would characterize the states of the HMM of the truck bombing scenario shown in Figure 7.
This type of transactional information could be generated by a program such as the Evidence Extraction and Link Discovery (EELD) project, a government initiative with the goal of extracting relevant data from large quantities of classified and unclassified data sources (Allanach et al, 2004; EFF, 2008).
As the incoming transactions fulfill the state transition requirements of the HMMs, the transaction space evolves and the probability of the terrorists successfully executing the plot can be evaluated. Each state transition that is detected can be visualized as the completion of a link between nodes on a graph, as shown in Figure 9. Using a probabilistic graph matching methodology, the pattern these links create can be compared to the HMM state representing successful task completion, resulting in a measure of the probability that the terrorist group is executing the subplot.
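The idea of updating a completion probability as each transaction arrives can be sketched with a standard Bayesian filter over HMM states. This is a swapped-in, simpler technique and not the probabilistic graph matching used by ASAM; the three-state model and all probabilities are hypothetical.

```python
def track_completion(obs_seq, init, trans, emit, final_state):
    """Return P(final_state | observations so far) after each observation."""
    n = len(init)
    history = []
    belief = list(init)
    for t, o in enumerate(obs_seq):
        if t > 0:
            # Predict: push the current belief through the transition matrix.
            belief = [sum(belief[p] * trans[p][s] for p in range(n))
                      for s in range(n)]
        # Update: weight each state by how well it explains the observation.
        weighted = [belief[s] * emit[s][o] for s in range(n)]
        total = sum(weighted)
        if total == 0.0:
            raise ValueError("observation is impossible under this model")
        belief = [w / total for w in weighted]
        history.append(belief[final_state])
    return history


# Hypothetical left-to-right model; state 2 stands for task completion.
INIT_BELIEF = [1.0, 0.0, 0.0]
TRANSITIONS = [[0.6, 0.4, 0.0],
               [0.0, 0.7, 0.3],
               [0.0, 0.0, 1.0]]
EMISSIONS = [[0.8, 0.1, 0.1],
             [0.1, 0.8, 0.1],
             [0.1, 0.1, 0.8]]

completion = track_completion([0, 1, 2], INIT_BELIEF, TRANSITIONS, EMISSIONS,
                              final_state=2)
print(completion)
```

Each element of the returned list is the estimated probability, after one more observed transaction, that the modeled task has reached its final state.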
Deterring VNSA in Cyberspace
This information is then reported to the corresponding sub-DBN and used to compute the global threat level. The ASAM tool can provide analysts with three types of results: 1) A likelihood of observations, which is a “measure of the confidence of the match between the observed events and the template models”; 2) Evidence from observations such as transaction type, description, and time; and 3) Probability of a terrorist attack, which is a function of the global threat level based on the overarching DBN.