Researchers from Facebook AI have developed a neural network model that can translate code from one programming language to another.
Their proposed model, called TransCoder is able to translate between the tree popular programming languages: Java, C++, and Python. The method was developed leveraging recent advances in unsupervised machine translation and learns to translate source code in a completely unsupervised way.
Transcoder works by first initializing the model with cross-lingual language model pretraining, which maps code to language-agnostic intermediate representations. Secondly, a denoising auto-encoder is used to train the decoder to output valid sequences even when fed with noisy and corrupt data. In the last step which is back-translation, the model generates parallel data that can be used for training. The final model which researchers trained was a transformer with 6 layers, 8 attention heads, and dimensionality of 1024. Single encoder and single decoders were used for all three programming languages.
Researchers conducted the experiments using the Github public dataset that contains more than 2.8 million open-source repositories. They also collected a test set consisting of 852 parallel functions. The evaluations showed that TransCoder is able to successfully translate programming functions between the three popular languages: Python, Java, and C++. Also, researchers claim that the proposed model outperforms rule-based commercial baselines for programming language translations.
More details about the method can be read in the paper published on arxiv.