When the Pentagon asked Google to quietly build a tool that could identify objects in drone footage, what it found was a nascent worker revolt against explicitly building weapons.
The program, Project Maven, was launched in April 2017 with the goal of processing video taken by drones. The plan was to identify and label objects in those videos using computer algorithms, with those labels helpful for troops who would be picking targets.
It presented the possibility that the military could finally process all the thousands of hours of video collected by drones, much of which typically sat unused and unwatched, and then rapidly make life-or-death decisions relying on trusted computer algorithms.
But in March 2018, Google workers raised internal objections to the company's participation in the project, before coming out with a public letter arguing that Google should not be "[b]uilding this technology to assist the US Government in military surveillance -- and potentially lethal outcomes."
In response to the letter, and the resignations of multiple employees, Google announced it would not renew the contract and published a set of guiding principles for how it would use and develop artificial intelligence, or AI. While Google maintains military contracts to this day, Project Maven hangs over the Pentagon and all of Silicon Valley as a cautionary tale about applying commercial code to military ends.
But Maven was hardly the first time the Pentagon contracted tech companies to build an object recognition tool. A year before Maven got up and running, the Air Force Research Lab signed a contract on a little-noticed program called VIGILANT, details of which were disclosed in October 2021 in response to a Freedom of Information Act request.
"VIsual Global InteLligence and ANalytics Toolkit," or VIGILANT, was first commissioned in 2016.
While the Air Force has not publicly disclosed details of its deal with Kitware, a New York-based technology company, for VIGILANT, documents about the regularly changing contract offer some insight into how the military hopes to adapt the kind of data processing that thrives in Silicon Valley to use in wars abroad.
The contracts reveal a desire to increase the fundamental tempo of intelligence collection and, with it, targeting. The promise is time savings from algorithms taking a first sweep for potential targets. The possible follow-on effects are impenetrable targeting tools, with errors as classified as the airstrikes themselves and harder to attribute.
Crucial to developing algorithms to identify objects is training this processing on synthetic data, or fake data concocted from fragments of real information. Creating that data has long been a part of training automated identification algorithms. It's how tech companies regularly prepare machines to operate in the real world, where they might encounter rare events.
To create this synthetic data for drone footage, the Air Force Research Lab turned to Kitware, a company with an existing framework for processing data from multiple sources. The software is open source, meaning that all its code is publicly disclosed so that programmers can develop and riff off of its initial kernel of code.
When reached for comment, a spokesperson for Kitware said the company did not have permission to speak publicly about the program.
The documents outlining the contract with Kitware described how the Air Force thought the technology could be used for everything from combat to farming.
"This demonstration system will be delivered for analyst assessment and transition into operations at Air Force, the intelligence community and commercial companies," reads the contract award. "With great potential for both military and commercial analytics, applications range from mapping crop damage in precision agriculture to commercial vehicle counting to adversary order of battle monitoring."
VIGILANT, at least in 2016, was pitched as both sword-adjacent and a ready-made plowshare.
The Pentagon Gets Interested in Automated Target Recognition
The military had begun looking in earnest at object recognition after academic researchers demonstrated in a very narrow 2015 test that computers could do better than humans at labeling objects.
"As far as the Department of Defense was concerned, that was an important day," former Deputy Secretary of Defense Robert Work said in an interview with The Atlantic.
The 2016 contract award for VIGILANT focused only on satellite footage, and came with the premise that it would release "both the framework and the analytics as open source," facilitating its use by organizations outside of government.
A year after the initial award of VIGILANT, the DoD started funding Project Maven, with the stated objective of developing algorithms to label data in drone-collected videos, and then figuring out how the military could incorporate those algorithms in planning.
Maven, like VIGILANT, was about using AI to shift the balance of time. In May 2017, John Shanahan was an Air Force lieutenant general overseeing a range of emerging tech acquisitions. He told Defense One that the goal of Maven was to clean up video, "finding the juicy parts where there's activity and then labeling the data." This would replace the work done at the time by three-person teams of analysts. With the AI doing a first pass over the video, the human analysts would in theory be able to devote more of their time to confirming highlighted findings, instead of discovering and analyzing changes themselves.
"Project Maven focuses on computer vision -- an aspect of machine learning and deep learning -- that autonomously extracts objects of interest from moving or still imagery," Marine Corps Col. Drew Cukor, head of the Algorithmic Warfare Cross-Function team, said in a DoD press release in July 2017.
While VIGILANT trained on satellite imagery, and Maven on drone images, any hard separations between the programs would blur with the launch of VIGILANT 2.
VIGILANT 2, awarded by the Air Force to Kitware in February 2018, expanded the focus to electro-optical sensors, primarily but not exclusively on satellites. Working with data across a range of sources was a goal of the software outlined in the first contract. In VIGILANT 2, it becomes explicit, with the contract noting, "While commercial satellite data will be the focus of this effort the technology developed can be applied to other electro-optical platforms both in the air and space domains."
That meant building a model with, and for, data from satellites and data from aircraft, including footage recorded by military drones. To be most useful, the algorithm processing that data would have to be able to work across a range of sensors. In practice, the algorithm was to deliver "change detection capabilities," by contrasting recent images with past collected imagery. That's a data processing challenge and a data-identifying challenge.
When done by open-source analysts, change detection often starts with identifying a point of interest, and then looking backward in time at earlier images while fixated on that point to recognize any small changes. The process, useful for investigations, is a kind of after-the-fact assessment. For the military, which wants to direct the movement of people and vehicles in real time, automated analysis could identify changes of military significance faster -- at least, that's the idea.
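In its simplest form, automated change detection means lining up an older image and a newer one of the same spot and flagging where they differ. The sketch below shows only that basic idea, on tiny made-up grids of grayscale values. The contract documents do not describe how VIGILANT actually detects change, so the function name, the threshold and the data are all assumptions for illustration.

```python
# Illustrative only: the contract does not describe how VIGILANT detects change.
# Images are small grids of grayscale values (0-255); all names are hypothetical.

def changed_pixels(past, recent, threshold=50):
    """Return (row, col) positions whose brightness differs by more than threshold."""
    if len(past) != len(recent) or any(len(p) != len(r) for p, r in zip(past, recent)):
        raise ValueError("images must have the same dimensions")
    return [
        (r, c)
        for r, (past_row, recent_row) in enumerate(zip(past, recent))
        for c, (old, new) in enumerate(zip(past_row, recent_row))
        if abs(new - old) > threshold
    ]

past_image = [
    [10, 12, 11],
    [13, 10, 12],
    [11, 12, 10],
]
recent_image = [
    [10, 12, 11],
    [13, 200, 12],  # something bright appeared in the center
    [11, 12, 40],   # a dimmer change, below the default threshold
]

flagged = changed_pixels(past_image, recent_image)
print(flagged)  # [(1, 1)]
```

Even in this toy version, the result depends on a human choice. With the default threshold only the bright center pixel is flagged, while a lower threshold of 20 would also flag the dimmer corner. What counts as a change is set by whoever tunes the tool.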
If the algorithm could correctly identify a vehicle, and if it could track it across satellite and drone footage quickly enough, and comprehensively enough, then what VIGILANT 2 offered would be the means to see and follow that vehicle's movements across a country, possibly leading the military to locate and then target insurgent networks. That premise, tantalizing as it is, comes with deep caveats at every stage, from the specificity of tracking to even correctly identifying a vehicle in the first place.
Models of Failure
Accurately identifying a target, especially at distance from an actual firefight, can be a difficult task. Doing so in accordance with the laws of war, which set standards for when members of a military are legally allowed to pull triggers, is an elaborate process, one for which the Joint Chiefs of Staff published a 230-page manual in 2016. The manual, which was made available to the public in November of that year, emphasizes the importance of correctly identifying a target, noting, "In extreme cases, failure to exercise due diligence in target development can result in outcomes that have negative strategic repercussions for the United States and its allies."
Because so much in military targeting hinges on the identification being correct, the stakes for any targeting tool are high. This is especially difficult in a field, like computer vision, where errors are an almost inevitable part of development.
"Machine learning is only ever going to be as good as the quality of its labels. It is only ever going to be that good," said Liz O'Sullivan, who in January 2019 left a job at the tech company Clarifai over objections to that company's work on Project Maven. "If you have too many blue and orange people in your data set, and you're trying to identify pink people, it's not going to work as well. And it's going to have a higher error rate. What degree of error rate is acceptable when you're taking a human life?"
O'Sullivan left Clarifai after realizing that the company's technology would end up being used for military operations.
"Part of my journey was basically realizing that if you want to advance the science of object detection, you're just inadvertently going to be contributing to this global arms race. And so it didn't make any sense for me to ultimately continue down that path," she said.
The Air Force Research Laboratory, which commissioned VIGILANT 2, did not respond to a request to comment for this story.
One of the special challenges for VIGILANT 2 is that it was training not only with publicly available data, but also classified data. It would also incorporate synthetic data into its analysis, so that the object identification algorithm could learn to find certain types of objects for the military without those items having been captured by satellites or drone footage. In these cases, Kitware would build 3D models of objects and then incorporate them into the process of object identification.
Incorporating synthetic data for rare or hard-to-observe events is a fairly common process in training identification algorithms. Doing it for the military, and generating synthetic data that anticipates as-yet undetected possible targets, risks targeting decisions being made on imagined fears, coded through AI into legibility.
"They're trying to use computer vision to read minds, and it's not a crystal ball," said O'Sullivan. "It can't see through walls, and the assertion that it can infer intent based on patterns is just outrageously naive."
Consider the steps one would take to identify a suspected chemical weapons facility. Are there external tanks or barrels? If those are known and modeled, can that model be reasonably incorporated into existing footage of facilities? And what happens if a building with such a setup is selected as a valid target? If the intelligence, modeling and targeting were all correct, then it's possible a strike on such a facility would meet its military objectives.
If any part of it was wrong, then a wholly innocuous facility that just happens to look like a valid target ends up destroyed, and lives are likely lost, too.
Some strikes, and the possibility of helpful data, are deliberately left off video. In a Dec. 12, 2021, story, The New York Times detailed the actions of Talon Anvil, a military unit charged with finding targets for the U.S. war against ISIS in Syria and Iraq. Talon Anvil operated from 2014 through 2019, a timeline that overlaps with the first known operational uses of Project Maven for computer-assisted object identification and targeting.
As reported, in a move designed to avoid accountability for casualties inflicted by its strikes, "Talon Anvil started directing drone cameras away from targets shortly before a strike hit, preventing the collection of video evidence."
In an update to the VIGILANT 2 contract, from September 2020, a work order specifically requested that more of this data labeling work be done by unsupervised machine learning. Unsupervised learning is a process that puts a tremendous amount of trust in AI to find similarities in images, and then group those found objects together in useful categories, rather than having a human dictate those categories.
This was a design call that leaned toward speed of identification over accuracy, making more labels available quickly at the expense of training the algorithm to more accurately find known quantities.
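To see what it means for software to choose its own categories, consider a bare-bones version of k-means clustering, one widely used unsupervised technique. The contract does not say which methods Kitware was asked to explore, so the algorithm choice, the function name and the measurements below (hypothetical object lengths and widths in meters) are assumptions for illustration only.

```python
# Illustrative only: a tiny k-means clustering, one common unsupervised method.
# The contract does not name the algorithms involved; all data here is made up.

def kmeans(points, k, iterations=10):
    """Group 2D points into k clusters; returns one cluster index per point."""
    centers = [points[i] for i in range(k)]  # naive start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center (squared distance).
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2,
            )
            for p in points
        ]
        # Move each center to the average of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels

# Hypothetical (length, width) measurements of objects found in imagery.
objects = [(4.0, 1.8), (12.0, 2.5), (4.5, 1.9), (11.5, 2.6), (4.2, 1.7), (12.4, 2.4)]

labels = kmeans(objects, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The output sorts the objects into group 0 and group 1, but it never says what either group is. Deciding that one cluster means "car" and the other "truck," or something more consequential, still falls to a person, or to whatever system is built on top.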
"The contractor will explore techniques and algorithms that will provide the USAF with a high degree of [Automatic Target Recognition] flexibility by producing new Deep Learning based models that can be trained as fast as possible with the lofty goal of hours," reads the contract.
This is in contrast to existing targeting systems, which can take months or years to build and are sensor-specific. By asking Kitware to build Automatic Target Recognition that can load in hours onto a new camera, the Air Force is suggesting that the process itself is sufficiently trustworthy to be put into combat rapidly.
Another addition outlined in the VIGILANT 2 contract is an emphasis on incorporating sensors from satellites as well as sensors from aircraft into the same analysis. The Air Force also requested that the identification software be designed to fit on smaller devices, able to run, as much as possible, entirely on a satellite or a plane.
While this is still specified as a "user-in-the-loop" technique, meaning a person would still be involved in the analysis, the ability to process intelligence on a machine without sending it back to the computer of a human operator means the human would, at best, be approving targeting assessments made by an algorithm, instead of having the option to review both the data and the assessment.
In the world of public and peer-reviewed research on computer vision, examples of algorithmic error abound. In one of the more famous examples, researchers had a trained algorithm that correctly identified a Granny Smith apple with 85% confidence. But when the researchers instead put a piece of paper on the apple that read "iPod," the algorithm said it was an iPod with 99.7% confidence.
Placing trust in algorithms for targeting decisions, even if it is just the initial winnowing down of collected evidence, means leaving military actions open to unique errors from machine intelligence.
The military is investigating some of these limitations, and the results are not promising. This month, Air Force Maj. Gen. Daniel Simpson described a target-recognition AI, trained on a specific angle of a missile. Fed a different angle of that same missile, the algorithm correctly identified it only 25% of the time. But more troubling for Simpson, Defense One reports, is that "It was confident that it was right 90 percent of the time, so it was confidently wrong. And that's not the algorithm's fault. It's because we fed it the wrong training data."
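The gap Simpson describes, between how often a system is right and how sure it claims to be, can be measured with simple arithmetic. The sketch below uses four made-up predictions chosen to mirror the figures he cited. It is not data from the Air Force test, and the function name is hypothetical.

```python
# Illustrative only: made-up predictions, not data from the Air Force test.
# Each pair is (confidence the model reported, whether it was actually right).

def accuracy_and_confidence(predictions):
    """Return (share of correct answers, average reported confidence)."""
    total = len(predictions)
    accuracy = sum(1 for _, correct in predictions if correct) / total
    mean_confidence = sum(conf for conf, _ in predictions) / total
    return accuracy, mean_confidence

predictions = [
    (0.90, True),
    (0.90, False),
    (0.90, False),
    (0.90, False),
]

accuracy, mean_confidence = accuracy_and_confidence(predictions)
gap = mean_confidence - accuracy
print(f"accuracy: {accuracy:.0%}, confidence: {mean_confidence:.0%}, gap: {gap:.0%}")
```

A well-calibrated system would show a gap near zero. Here the model is right a quarter of the time while reporting 90% confidence, which leaves a human reviewer who relies on that confidence score with little signal that anything is wrong.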
VIGILANT by Kitware is just one of a host of software tools built to bring data processing from the online worlds of Silicon Valley to the life-or-death stakes of military operations. As of September 2020, VIGILANT 2's contract award was for up to almost $8 million, or about 1/10th the cost of a single F-35A Joint Strike Fighter.
Yet despite its huge promise at discount rates, the technology is fundamentally trying to solve a human problem. AI can look for patterns in video footage, and it can make approximated guesses about what objects it has found on the ground below. But it is doing so at human direction, from where the cameras are put to what kinds of objects it is told to look for. What AI mostly adds to that is a kind of distancing between the fallible decisions of intelligence collection and the false certainty of algorithmic assessment.
"We've always been worried that the military is going to take their horrible track record of drone violence and use that as a training set to generate a model," O'Sullivan said. "And in doing so lock in all of the mistakes that we've made in the past as predictive of being what kinds of mistakes will happen in the future. And I think there's a really great risk of that happening with this case."