Engineers who do research and development (R&D) at companies that work with the U.S. Department of Defense often get the opportunity to work on cutting-edge technology. Sometimes, technology developed in R&D programs isn’t used on the traditional battlefield, but it is just as mission critical.
Cyber security is one example. One successful cyber program with far-reaching defense and, potentially, commercial applications is the U.S. Defense Advanced Research Projects Agency (DARPA) SafeDocs program.
The challenge: Every day, individuals and organizations in the military, government, and commercial industries receive electronic content, including Portable Document Format (PDF) and other digital media files. This content can come from unauthorized or potentially compromised sources, which creates security risks. In today’s world, an attacker can hide malware in plain sight.
FAST Labs’ solution: As announced in December 2019, BAE Systems’ FAST Labs™ research and development organization was contracted under the SafeDocs program to develop new cyber tools designed to help prevent vulnerabilities in electronic files that can lead to cyberattacks.
Phase 1 and Phase 2 of the SafeDocs program set forth the lofty goal of dramatically improving software’s ability to detect and reject invalid or maliciously crafted input data. Based on the success of its performance in these two phases, FAST Labs was recently awarded a Phase 3 option to collaborate with defense and industry partners to refine its toolset.
“As is often the case with disruptive early technology R&D programs, when we first began work on the Phase 1 contract, our approach was all conjecture,” said David Woolrich, technical director at BAE Systems’ FAST Labs. “At the start, we didn’t have any fully built tools, proven mathematical theorems, or ideas tested against actual data. Now, we have done all three with really solid results.”
FAST Labs’ R&D team created a tool suite to understand and identify safe features of electronic data formats using a Language-Theoretic Security (LangSec) approach developed by DARPA project manager Sergey Bratus and his collaborators. LangSec offers a systematic approach to parser design and input validation. A parser, which breaks data inputs down into manageable objects for further processing, can itself contain exploitable flaws and behaviors. The team tied its approach back to the underlying format grammar to classify files and identify areas for improvement in existing parsers.
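To make the LangSec idea concrete, here is a minimal sketch of a recognizer that validates an entire input against a small grammar before any of it is processed. The record format (uppercase key, equals sign, decimal value) and the names `RECORD` and `recognize` are hypothetical and chosen only for illustration. They are not part of the SafeDocs toolset.

```python
import re
from typing import Dict, Optional

# Hypothetical toy grammar: one "KEY=VALUE" record per line,
# where KEY is 1-16 uppercase letters and VALUE is 1-9 decimal digits.
RECORD = re.compile(r"[A-Z]{1,16}=[0-9]{1,9}")


def recognize(data: str) -> Optional[Dict[str, int]]:
    """Return the parsed records only if the WHOLE input matches the grammar.

    Any line that falls outside the grammar causes the entire input to be
    rejected (None), so downstream code never sees partially valid data.
    """
    records: Dict[str, int] = {}
    for line in data.split("\n"):
        # fullmatch rejects trailing junk that a prefix match would let through.
        if RECORD.fullmatch(line) is None:
            return None
        key, value = line.split("=")
        if key in records:
            # Duplicate keys are ambiguous, so treat them as invalid input.
            return None
        records[key] = int(value)
    return records


if __name__ == "__main__":
    print(recognize("WIDTH=10\nHEIGHT=20"))
    print(recognize("WIDTH=10\nHEIGHT=20; rm -rf /"))
```

The design choice worth noting is that the function either accepts the complete input or rejects it outright. It never tries to repair or skip malformed parts, which is where many parser flaws tend to arise.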
Unique approach: The FAST Labs R&D team developed a technique that runs multiple existing file parsers, each looking for a different range of features, to detect malicious payloads in files. For a complex format such as PDF, which can contain text processing, image rendering, links, and JavaScript, no single parser can analyze all of these features or detect all possible flaws or malicious features. Having a diversity of parsers leaves little room for attackers to hide. FAST Labs’ tools exploit this diversity theoretically and practically.
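The parser-diversity idea can be illustrated with a small sketch: several independent checkers each examine the same file, and disagreement among them is treated as a warning sign. The three checkers below are deliberately simplistic stand-ins for real parsers, and the `classify` rule (accept only on unanimous agreement) is an assumption made for illustration. It does not reproduce the topological and statistical analysis used in FAST Labs’ tools.

```python
from typing import Callable, Dict

Checker = Callable[[bytes], bool]


def has_pdf_header(data: bytes) -> bool:
    # PDF files are expected to begin with a "%PDF-" version header.
    return data.startswith(b"%PDF-")


def has_eof_marker(data: bytes) -> bool:
    # PDF files are expected to end with an "%%EOF" marker.
    return data.rstrip().endswith(b"%%EOF")


def has_no_script(data: bytes) -> bool:
    # Crude stand-in for a parser that looks for embedded JavaScript.
    return b"/JavaScript" not in data and b"/JS" not in data


# Hypothetical ensemble; a real system would wrap full, independent parsers.
CHECKERS: Dict[str, Checker] = {
    "header": has_pdf_header,
    "eof": has_eof_marker,
    "no_script": has_no_script,
}


def profile(data: bytes) -> Dict[str, bool]:
    """Run every checker and record which ones accept the file."""
    return {name: check(data) for name, check in CHECKERS.items()}


def classify(data: bytes) -> str:
    """Accept only if all checkers agree the file looks well-formed."""
    verdicts = profile(data)
    if all(verdicts.values()):
        return "accept"
    if not any(verdicts.values()):
        return "reject"
    return "suspicious"  # checkers disagree, so flag for closer inspection


if __name__ == "__main__":
    clean = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n%%EOF\n"
    scripted = b"%PDF-1.7\n<< /S /JavaScript /JS (app.alert(1)) >>\n%%EOF"
    print(classify(clean), classify(scripted))
```

Because each checker looks at a different aspect of the file, a payload that slips past one of them still has to satisfy all the others, which is the intuition behind leaving attackers little room to hide.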
Additionally, the approach combines topological methods with statistics, neither of which is commonly used in formal computer language design or analysis, making this work the first attempt in this vein. The results have been extremely successful: in a recent exercise analyzing one million files, all but 44 were correctly classified. This far exceeds expectations for a tool at this stage of development.
“These are extraordinary results with far-reaching impact. The files we're testing are used by everyone in defense and commercial settings, including PDF, JPEG, MPEG, and CSV files,” added Woolrich. “All of these formats are in daily use, so being able to reliably determine the safety of these files helps everyone.”
Interested in hearing more about the opportunities at BAE Systems’ FAST Labs? Check out career opportunities at https://jobs.baesystems.com/fastlabs.