But while the open source movement has spawned a colossal ecosystem that we all depend on, we don’t fully understand it, say experts like Aitel. There are countless software projects, millions of lines of code, numerous mailing lists and forums, and a sea of contributors whose identities and motivations are often obscure, making it difficult to hold them accountable.
This can be dangerous. For example, hackers have quietly inserted malicious code into open source projects on numerous occasions in recent years. Backdoors can elude detection for a long time, and in the worst cases, entire projects have been handed over to bad actors who exploit people’s trust in communities and open source code. Sometimes the very social networks these projects depend on are disrupted or even taken over. Tracking all of this has been a mostly, but not entirely, manual effort, which means it can’t keep pace with the astronomical size of the problem.
Bratus argues that we need machine learning to digest and understand the expanding universe of code, including useful techniques such as automated vulnerability discovery, as well as tools to understand the community of people who write, patch, implement and influence this code.
The ultimate goal is to detect and counter any malicious campaign aimed at submitting faulty code, launching influence operations, sabotaging development or even taking control of open source projects.
To do this, researchers will use tools such as sentiment analysis to study social interactions within open source communities like the Linux kernel mailing list, which should help identify who is positive or constructive and who is negative or destructive.
Researchers want to know what kinds of events and behaviors can disrupt or hurt open source communities, which members are trustworthy, and whether there are particular groups that warrant increased vigilance. These answers are necessarily subjective. But at the moment, there are few ways to find them.
Experts worry that the blind spots of the people running open source software leave the whole edifice ripe for manipulation and attack. For Bratus, the main threat is the prospect of “untrusted code” running America’s critical infrastructure, a situation that could invite unpleasant surprises.
Here’s how the SocialCyber program works. DARPA has contracted several teams of what it calls “performers,” including small cybersecurity research shops with deep technical skills.