CAMBRIDGE, MA – What if people never again had to worry about software applications exposing them to cyber-attacks? A new tool in development, called DeepCode, aims to take on that worry by analyzing applications for vulnerabilities and fixing them automatically.
Developed by Draper and Boston University, DeepCode is designed to comb through millions of lines of software code in seconds and pinpoint vulnerabilities, making those defects not just easier to find but faster to fix. Experts estimate that millions of vulnerabilities exist in software used by banks, hospitals and governments, and that most of those defects go undetected and unpatched.
DeepCode tackles one of the trickiest problems in analyzing software applications: finding, prioritizing and repairing the most severe vulnerabilities. Because DeepCode is based on deep learning, it can handle the large volumes of code this requires. To find the problems, the Draper-Boston University team equipped DeepCode with machine-learning algorithms that can learn from natural, not synthetic, datasets, so that the approach reflects real-world code.
Learning from real-world data is an advance over the current crop of static analyzers, which rely on hard-coded rules to detect exploitable vulnerabilities in software, according to Rebecca Russell, a machine learning scientist at Draper and an author of a paper on the topic. “One of the biggest benefits of DeepCode is its ability to learn the patterns of vulnerabilities just from examples,” Russell said.
Russell and her colleagues trained DeepCode on millions of open source C and C++ functions. They also compared it against three open source tools and found that DeepCode was more accurate. The result of their efforts, described in a paper accepted at the upcoming IEEE ICMLA 2018 conference, is a fast and scalable vulnerability detection tool based on deep feature representation learning that directly interprets source code, which the researchers said is a first.
A second paper, accepted for the 2018 NIPS conference and developed by a second Draper-Boston University team, describes DeepCode’s ability to automatically repair the software vulnerabilities it finds. Specifically, the team employed generative adversarial networks (GANs), which pair two deep neural networks, a generator and a discriminator, that train against each other without human supervision. A GAN learns to generate new data that resembles the data set it was trained on; here, that means repairing bad source code by using good source code as the ‘teacher.’
“Using this approach, DeepCode can do what other repair systems can’t, which is to repair source code even when it doesn’t have labeled code pairs,” said Onur Ozdemir, a machine intelligence scientist at Draper and an author of the paper. “This type of ground-truth data is extremely hard to curate in real life and simply does not exist in amounts large enough to train deep learning models.”
Draper plans to continue developing DeepCode into a software program that can be deployed on a laptop in seconds, “and that’s a preferred scenario for customers who don’t want their software taken off-site for vulnerability detection and repair,” said Marc McConley, DeepCode’s technical director. DeepCode is also suited to customers with large, already-deployed software applications and databases, such as enterprise customers in defense, the military, business and government.
“DeepCode can help you review code you already have and help prioritize your corrective actions. Realistically, a security analyst can only look at a subset of functions in a large, multimillion line codebase. DeepCode will flag the functions that most likely contain vulnerabilities—making for a much more efficient triage effort,” he said.
Funding for DeepCode was provided by the Air Force Research Laboratory and the DARPA Mining and Understanding Software Enclaves (MUSE) program.
Draper, which has amassed a large body of vulnerability, exploit, malware, rootkit and backdoor information, provides cyber security capabilities to commercial, government and nonprofit customers who are increasingly concerned about the next cyber threat. Draper has previously applied its multidisciplinary engineering capabilities to a variety of related programs, including inherently secure processors; machine learning to combat online extremism, cyberbullying and other abuse of social media applications; and cryptographically encoded, high-bandwidth communications for UAVs.