The American Spark
Data Mining May Not Work As Anti-Terrorist Tool: Report

By Cliff Montgomery - July 27th, 2007

A Congressional Research Service report, updated on June 5th, 2007, examines the limitations of data
mining as a tool for catching terrorists.

We quote from this report below:

"Data mining has become one of the key features of many homeland security initiatives. Often used as a
means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis
tools to discover previously unknown, valid patterns and relationships in large data sets.

"Consequently, data mining consists of more than collecting and managing data; it also includes analysis and
prediction.

"In the context of homeland security, data mining [is being used as] a potential means to identify terrorist
activities, such as money transfers and communications, and to identify and track
individual terrorists themselves, such as through travel and immigration records.

"While data mining represents a significant advance in the type of analytical tools currently available, there are
limitations to its capability.

"One limitation is that although data mining can help reveal patterns and relationships, it does not tell the user
the value or significance of these patterns. These types of determinations must be made by the user.

"A second limitation is that while data mining can identify connections between behaviors and/or variables, it
does not necessarily identify a causal relationship. Successful data mining still requires skilled technical and
analytical specialists who can structure the analysis and interpret the output.

"In the public sector, data mining applications initially were used as a means to detect fraud and
waste, but have grown to also be used for purposes such as measuring and improving program performance.
However, some of the homeland security data mining applications represent a significant expansion in the
quantity and scope of data to be analyzed.

"Some efforts that have attracted a higher level of congressional interest include the Terrorism Information
Awareness (TIA) project (now discontinued) and the Computer-Assisted Passenger Pre-screening System II
(CAPPS II) project (now canceled and replaced by Secure Flight).

"Other initiatives that have been the subject of congressional interest include the Multi-State Anti-Terrorism
Information Exchange (MATRIX), the Able Danger program, the Automated Targeting System (ATS), and
data collection and analysis projects being conducted by the National Security Agency (NSA).

"As with other aspects of data mining, while technological capabilities are important, there are other
implementation and oversight issues that can influence the success of a project’s outcome.

"One issue is data quality, which refers to the accuracy and completeness of the data being analyzed.

"A second issue is the inter-operability of the data mining software and databases being used by different
agencies.

"A third issue is mission creep, or the use of data for purposes other than for which the data were originally
collected.

"A fourth issue is privacy.

"Questions that may be considered include the degree to which government agencies should use
and mix commercial data with government data, whether data sources are being used for purposes other than
those for which they were originally designed, and possible application of the Privacy Act to these initiatives. It
is anticipated that congressional oversight of data mining projects will grow as data mining efforts continue to
evolve.

"While data mining products can be very powerful tools, they are not self-sufficient applications.

"[For instance], the validity of the patterns discovered is dependent on how they compare to 'real world'
circumstances. [...] While possibly re-affirming a particular profile, [a certain discovery] does not necessarily
mean that the application will identify a suspect whose behavior significantly deviates from the original model.

"Another limitation of data mining is that while it can identify connections between behaviors and/or variables, it
does not necessarily identify a causal relationship.

"Beyond these specific limitations, some researchers suggest that the circumstances surrounding our
knowledge of terrorism make data mining an ill-suited tool for identifying (predicting) potential terrorists before
an activity occurs.

"Successful 'predictive data mining' requires a significant number of known instances of a particular behavior in
order to develop valid predictive models. For example, data mining used to predict types of consumer behavior
(i.e., the likelihood of someone shopping at a particular store, the potential of a credit card usage being
fraudulent) may be based on as many as millions of previous instances of the same particular behavior.

"Moreover, such a robust data set can still lead to 'false positives' [errors].

"In contrast...a CATO Institute report suggests that the relatively small number of terrorist incidents or
attempts each year are too few and individually unique 'to enable the creation of valid predictive models.' "
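
The report's point about 'false positives' and rare events can be made concrete with a little arithmetic. The sketch below is only an illustration: the population size, the number of true cases, and both accuracy rates are hypothetical assumptions chosen for the example, and none of them comes from the CRS or CATO reports.

```python
def screening_outcomes(population, true_cases, sensitivity, false_positive_rate):
    """Estimate what a screening model flags when the target behavior is rare.

    All inputs are hypothetical; this is plain base-rate arithmetic, not a
    model of any real program named in the report.
    """
    innocent = population - true_cases
    true_positives = true_cases * sensitivity          # real cases correctly flagged
    false_positives = innocent * false_positive_rate   # innocent people wrongly flagged
    # Share of all flagged people who are real cases
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Assumed figures for illustration only: 300 million records, 1,000 real cases,
# and a model that is 99% accurate in both directions.
tp, fp, precision = screening_outcomes(
    population=300_000_000,
    true_cases=1_000,
    sensitivity=0.99,
    false_positive_rate=0.01,
)

print(f"Real cases flagged:      {tp:,.0f}")
print(f"Innocent people flagged: {fp:,.0f}")
print(f"Share of flags that are real cases: {precision:.4%}")
```

Under these assumed numbers, a model that is right 99 percent of the time still flags roughly three million innocent people in order to find about 990 real cases. Far fewer than one flag in a thousand points to an actual suspect.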


