Sign up FAST! Login

Machine learning program fixes 10 times as many open-source code errors as its predecessors.

Stashed in: Turing, MIT TR, Machine Learning, Software is eating the world., Artificial Intelligence, Machines Writing Code, Deep Learning

To save this post, select a stash from drop-down menu or type in a new one:

Someday soon machine learning algorithms will be fixing all the bugs in software.

MIT researchers have developed a machine-learning system that can comb through repairs to open-source computer programs and learn their general properties, in order to produce new repairs for a different set of programs.

The researchers tested their system on a set of programming errors, culled from real open-source applications, that had been compiled to evaluate automatic bug-repair systems. Where those earlier systems were able to repair one or two of the bugs, the MIT system repaired between 15 and 18, depending on whether it settled on the first solution it found or was allowed to run longer.

While an automatic bug-repair tool would be useful in its own right, professor of electrical engineering and computer science Martin Rinard, whose group developed the new system, believes that the work could have broader ramifications.

“One of the most intriguing aspects of this research is that we’ve found that there are indeed universal properties of correct code that you can learn from one set of applications and apply to another set of applications,” Rinard says. “If you can recognize correct code, that has enormous implications across all software engineering. This is just the first application of what we hope will be a brand-new, fabulous technique.”


Users of open-source programs catalogue bugs they encounter on project websites, and contributors to the projects post code corrections, or “patches,” to the same sites. So Long was able to write a computer script that automatically extracted both the uncorrected code and patches for 777 errors in eight common open-source applications stored in the online repository GitHub.


As with all machine-learning systems, the crucial aspect of Long and Rinard’s design was the selection of a “feature set” that the system would analyze. The researchers concentrated on values stored in memory — either variables, which can be modified during a program’s execution, or constants, which can’t. They identified 30 prime characteristics of a given value: It might be involved in an operation, such as addition or multiplication, or a comparison, such as greater than or equal to; it might be local, meaning it occurs only within a single block of code, or global, meaning that it’s accessible to the program as a whole; it might be the variable that represents the final result of a calculation; and so on.

Long and Rinard wrote a computer program that evaluated all the possible relationships between these characteristics in successive lines of code. More than 3,500 such relationships constitute their feature set. Their machine-learning algorithm then tried to determine what combination of features most consistently predicted the success of a patch.

“All the features we’re trying to look at are relationships between the patch you insert and the code you are trying to patch,” Long says. “Typically, there will be good connections in the correct patches, corresponding to useful or productive program logic. And there will be bad patterns that mean disconnections in program logic or redundant program logic that are less likely to be successful.”

Software is eating... Software. 

This is really cool.

You May Also Like: