Translating code from one programming language to another is not just a nuisance. The Commonwealth Bank of Australia spent 5 years and a whopping $750 million to convert the code on its platform from COBOL (a programming language first developed over 50 years ago) to Java.
A surprisingly large part of the software behind our day-to-day lives is written in old programming languages, and translating it to newer and more efficient languages is extremely expensive and time-consuming.
According to a new paper, researchers have developed a transcompiler, a system that converts source code from one high-level programming language (such as C++ or Python) to another. When done by humans, this migration is challenging and time-consuming, as it requires extensive knowledge of both languages. But if you could train an algorithm to do that for you, you’d save a lot of time and resources.
Facebook researchers trained the new Artificial Intelligence (AI) on open source GitHub projects. The AI is unsupervised, so it learns on its own, mostly by looking for patterns in datasets. It requires a minimal amount of human supervision and expertise.
The TransCoder (as the AI was called) was trained on 2.8 million open source repositories, targeting translation at the function level. The AI started by looking for common keywords like “for,” “while,” “if,” and “try” and also digits, mathematical operators, and common English words or strings that appear in the source code. After the initial training period, the algorithm also went through denoising and back-translation steps (which were applied only to functions).
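The denoising step can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the researchers' code: the function name add_noise, the <mask> placeholder, and the probabilities are all hypothetical. The idea is that the model is shown a deliberately corrupted function and trained to recover the original.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, chosen for this sketch


def add_noise(tokens, rng, p_mask=0.15, p_drop=0.10, window=3):
    """Corrupt a token list by randomly dropping, masking, and locally shuffling tokens."""
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_drop:
            continue                  # drop this token entirely
        if r < p_drop + p_mask:
            noisy.append(MASK)        # hide this token behind the placeholder
        else:
            noisy.append(tok)         # keep this token unchanged
    # Local shuffle: add a small random offset to each position and re-sort,
    # so tokens can only move a short distance from where they started.
    keys = [i + rng.uniform(0, window) for i in range(len(noisy))]
    order = sorted(range(len(noisy)), key=lambda i: keys[i])
    return [noisy[i] for i in order]


# Usage: corrupt a tokenized line of code. A denoising model would be
# trained to map the corrupted version back to the original tokens.
original = ["for", "i", "in", "range", "(", "n", ")", ":"]
corrupted = add_noise(original, random.Random(0))
```

Back-translation follows a similar spirit: the model translates a function into another language and then tries to translate it back, using the original function as the target.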
This is not the first time something like this has been attempted. Several translation algorithms have already been developed, but Facebook engineers say this AI outperforms them by a ‘significant’ margin.
The results weren’t perfect. The TransCoder was tested on 852 parallel functions in all three languages (C++, Java, and Python), exhibiting remarkable accuracy. When translating from C++ to Java, 74.8% of the functions returned the expected output. From C++ to Python, the figure was 67.2%. The highest accuracy was obtained when translating from Java to C++ (91.6%), and the lowest was obtained from Python to Java (56.1%).
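The percentages above count a translation as correct when the translated function returns the expected output. A rough Python sketch of that kind of scoring follows. The name computational_accuracy and the shape of the test cases are assumptions made for illustration, not the researchers' evaluation code.

```python
def computational_accuracy(cases):
    """Return the fraction of translated functions whose outputs match the reference.

    `cases` is a list of (reference_fn, translated_fn, inputs) tuples, where
    `inputs` is a list of argument tuples. This layout is hypothetical.
    """
    if not cases:
        return 0.0
    passed = 0
    for reference, translated, inputs in cases:
        try:
            ok = all(translated(*args) == reference(*args) for args in inputs)
        except Exception:
            ok = False  # a translation that crashes counts as a failure
        if ok:
            passed += 1
    return passed / len(cases)


# Usage: one faithful "translation" and one that returns the wrong result.
def reference_double(x):
    return x * 2

def good_translation(x):
    return x + x

def bad_translation(x):
    return x ** 2

score = computational_accuracy([
    (reference_double, good_translation, [(1,), (3,)]),
    (reference_double, bad_translation, [(1,), (3,)]),
])
```

Scoring by behavior rather than by matching text means a translation can look quite different from a human-written one and still count as correct.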
So it’s still not exactly perfect, but the approach has promise. To make things even better, it can be easily adapted to a number of different programming languages.
“TransCoder can easily be generalized to any programming language, does not require any expert knowledge, and outperforms commercial solutions by a large margin,” the coauthors wrote. “Our results suggest that a lot of mistakes made by the model could easily be fixed.”
While the algorithm hasn’t yet been adapted to languages such as COBOL, it’s only one step away. A quick and cheap revolution could finally be coming to our ATMs.
Andrei's background is in geophysics, and he's been fascinated by it ever since he was a child. Feeling that there is a gap between scientists and the general audience, he started ZME Science -- and the results are what you see today.