Data Diver Disses Terror-Mining

Published December 12, 2006

Jeff Jonas is one of the country's leading practitioners of the dark art of data analysis. Casino chiefs and government spooks alike have used his CIA-funded "Non-Obvious Relationship Awareness" software to scour databases for hidden connections.
So you'd think that Jonas would be all into the idea of using these data-mining systems to predict who the next terrorist attacker might be.
Think again. "Though data mining has many valuable uses, it is not well suited to the terrorist discovery problem," he writes in a new study, co-authored with the Cato Institute's Jim Harper. "This use of data mining would waste taxpayer dollars, needlessly infringe on privacy and civil liberties, and misdirect the valuable time and energy of the men and women in the national security community." Are you listening, NSA?
Jonas doesn't have a problem cobbling together information on suspects from various databases. It's using these databases to forecast a terrorist's behavior -- think market research, but for Al-Qaeda -- that Jonas hates. "The possible benefits of predictive data mining for finding planning or preparation for terrorism are minimal. The financial costs, wasted effort, and threats to privacy and civil liberties are potentially vast," he writes.

One of the fundamental underpinnings of predictive data mining in the commercial sector is the use of training patterns. Corporations that study consumer behavior have millions of patterns that they can draw upon to profile their typical or ideal consumer. Even when data mining is used to seek out instances of identity and credit card fraud, this relies on models constructed using many thousands of known examples of fraud per year.
Terrorism has no similar indicia. With a relatively small number of attempts every year and only one or two major terrorist incidents every few yearseach one distinct in terms of planning and executionthere are no meaningful patterns that show what behavior indicates planning or preparation for terrorism. Unlike consumers shopping habits and financial fraud, terrorism does not occur with enough frequency to enable the creation of valid predictive models. Predictive data mining for the purpose of turning up terrorist planning using all available demographic and transactional data points will produce no better results than the highly sophisticated commercial data mining done today [with results in the low single-digits ed.]. The one thing predictable about predictive data mining for terrorism is that it would be consistently wrong.
Without patterns to use, one fallback for terrorism data mining is the idea that any anomaly may provide the basis for investigation of terrorism planning. Given a typical American pattern of Internet use, phone calling, doctor visits, purchases, travel, reading, and so on, perhaps all outliers merit some level of investigation. This theory is offensive to traditional American freedom, because in the United States everyone can and should be an outlier in some sense. More concretely, though, using data mining in this way could be worse than searching at random; terrorists could defeat it by acting as normally as possible.
Treating anomalous behavior as suspicious may appear scientific, but, without patterns to look for, the design of a search algorithm based on anomaly is no more likely to turn up terrorists than twisting the end of a kaleidoscope is likely to draw an image of the Mona Lisa.

Civil libertarians and bloggers have talked 'til they're blue in the face about how lame this kind of terror-predicting is. But I don't think I've ever heard a giant of the field, like Jonas, come out against the practice -- at least not on-the-record. Let's hope this is one conversation that the feds are monitoring.
(Big ups: Daou)
UPDATE 11:49 AM: Shane Harris here. Die-hard proponents of pattern-based 'data mining' to catch terrorists will remain unconvinced by Jonas' and Harper's argument. While it's true that data mining in the commercial sector is based upon "training patterns," backers of systems such as Total Information Awareness will say, yes, and that's why data mining for terrorists has to start with hundreds -- maybe thousands -- of known or potential terrorist patterns to look for. A major part of TIA research was the creation of terrorist attack templates through red teaming exercises, in which experts were paid to come up with devious and clandestine plots that a terrorist might conceivably attempt. Their various machinations would, presumably, leave a set of digital footprints -- airline tickets purchased, money wired, hotels paid for, and so on -- and THAT data would be mined for clues.
What's also interesting about this paper is the combination of the authors. Jim Harper is a well-known and articulate activist, and has long since staked out central territory in the security vs. privacy debate. But Jonas has stayed out of politics. Indeed, those who've met him will know that he sticks out like a sore West coast thumb among Washington gear heads, being unafraid to use the word "dude" in formal conversation and happily acknowledging his ignorance of most Beltway insider baseball. But those who know Jonas and have heard him speak about electronic terrorist hunting know that, like his co-author Harper, he has a strong libertarian streak. Maybe Jonas wouldn't put it quite that way -- dude -- but it's there.

Story Continues