OPT: Open Pre-trained Transformer Language Models

Meta has released a series of (semi-)open language models, ranging up to 175 billion parameters, with capabilities similar to GPT-3.

They’re releasing the largest model only to select research groups, and the larger, more powerful models will likely be difficult to run on consumer hardware in the immediate future.

In this technical report, we present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data collection and efficient training.

We are releasing all of our models between 125M and 30B parameters, and will provide full research access to OPT-175B upon request. Access will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories.
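The smaller checkpoints are straightforward to try locally. Here is a minimal sketch, assuming the released weights are available through the Hugging Face transformers library under IDs such as facebook/opt-125m (the prompt and generation settings are just illustrative):

```python
# Sketch: load the smallest OPT checkpoint and generate a short continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # smallest model in the suite; runs comfortably on CPU
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open pre-trained transformers are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```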

Meta had previously released an open source FairSeq translation model that can be run with CTranslate2.
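For those earlier translation models, CTranslate2 exposes a small Python API once a FairSeq checkpoint has been converted with its bundled converter. A rough sketch, with a hypothetical model directory and pre-tokenized input:

```python
import ctranslate2

# Load a FairSeq model previously converted with CTranslate2's
# ct2-fairseq-converter tool (the directory name here is hypothetical).
translator = ctranslate2.Translator("wmt_ende_ct2/", device="cpu")

# translate_batch expects pre-tokenized input (e.g. SentencePiece pieces).
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(" ".join(results[0].hypotheses[0]))
```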