OpenAI has released a new large-scale unsupervised language model that achieves state-of-the-art performance on many language modeling benchmarks.
The model, called GPT-2, is based on a simple yet, as the results show, remarkably powerful idea: OpenAI trained it simply to predict the next word in 40 GB of Internet text. This objective allows the model to learn in an unsupervised manner, without any task-specific training.
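To make the objective concrete, here is a toy sketch of "predict the next word" learned from raw text alone. It uses simple bigram counts rather than GPT-2's transformer (a deliberate simplification for illustration), but the key point is the same: the only supervision is the text itself.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # Count how often each word follows each other word.
    # No labels are needed -- the raw text is the training signal,
    # just as in GPT-2's next-word prediction objective.
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Return the next word most frequently seen after `word` in training.
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(predict_next(model, "the"))
```

GPT-2 replaces these frequency counts with a 1.5-billion-parameter transformer that conditions on the entire preceding context rather than a single word, but it is trained on exactly this kind of unlabeled prediction task.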
The results show that GPT-2 can perform rudimentary reading comprehension, machine translation, question answering, and summarization, having learned nothing more than to predict the next word.
The new model is a successor to the GPT model. It is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. According to OpenAI, the diversity of the training dataset, combined with the simple training objective, allows the model to learn many tasks across different domains.
GPT-2 outperforms other language models trained on specific domains (such as Wikipedia, news, or books) without being trained on those domain-specific datasets.
As mentioned in the official blog post, OpenAI is not releasing the full trained model, citing concerns about malicious applications of the technology. However, they have released the technical paper along with a smaller model for researchers to experiment with.
The implementation of GPT-2 can be found here.