Want better AI for the DoD? Stop treating data like currency

Jaim Coddington

Five years ago, the U.S. military was behind on artificial intelligence. The easiest way to explain the state of AI in the Department of Defense was by comparison to Silicon Valley, and it wasn’t a flattering look. When this became apparent, it took time for the defense community to articulate why AI mattered and why it deserved a greater share of budgets and brainpower.

Today, tech-savvy leaders throughout the service branches and the combat support agencies have a clear concept of what’s at stake for defense AI. They understand that AI can make violent conflict more precise, more humane and more predictable; that it can reduce the likelihood of war by giving decision-makers better, timelier information and lowering the risk of misunderstanding between strategic rivals. Above all, defense leaders have embraced the reality that AI will give American and allied war fighters a strategic and tactical edge in the next war.

As an alumnus of the Algorithmic Warfare Cross-Functional Team, I like to think that Project Maven played an outsized role in this process of discovery. The team’s original mandate was to pathfind — to “kick over rocks” and identify all of the technical and institutional friction points that made defense AI slow and hard.

We found a lot. Network and hardware limitations were profound, and data scarcity shocked some commercial vendors who were used to plentiful, unclassified AI training sets. We learned exactly how sparse cleared engineering talent can be, and we wrestled with the quirks of information assurance. Even mundane office tools were hard to come by — we pushed for government-accredited access to Slack, to no avail.

One insight stands out from lessons learned over four years of bureaucratic friction. It speaks to an unfortunate cultural tendency, something that senior leaders could abolish if they realized how pervasive and problematic it is: Many data brokers in the defense community still treat data like currency. This was the No. 1 blocker at Project Maven, hands down. The team lost months waiting for partner organizations to release data from archives or grant access to data streams. The timeline for some of the team’s requests, even when initiated by senior leaders, could be measured not in days or weeks, but in the passing of seasons.

Data sharing can get political. Popular wisdom says that data is the new oil. As the raw material for AI and machine learning, data has intrinsic value. This is why data labeling companies working for the government sometimes try to make data proprietary after labeling, just to squeeze a little more money and leverage out of the pipeline. And there are legitimate concerns about data proliferation — if AI training data falls into the wrong hands, it could be used to reverse engineer military algorithms for adversarial purposes.

But between government offices, data-as-currency kills progress. It creates bottlenecks in AI pipelines. Slow data means expensive, talented engineers left idle until the data dump finally arrives and the ingest process can begin. It means uncertainty for project managers, who must wait until data ingest and discovery are complete before they can understand the raw materials they have to work with. It means a bored and underemployed labeling workforce. It means vendors sitting on their hands while waiting for the government to provide data access and define priorities of work.

It means that testing and evaluation, the linchpin of artificial intelligence development that tells leaders whether their models are worth deploying at all, is harder to design because of incomplete data discovery and a scarcity of labeled data. Ultimately, slow data means that deploying performant algorithms for the war fighter is delayed by months, even years. On that timeline, we can forget about agile development and user-driven design.
