We are pleased to announce the release of OpenNMT-py v3.0.
The main motivation was to simplify the data loading API, which relied on an old version of Torchtext.
We decided to remove Torchtext from OpenNMT-py completely.
We did our best to make the code structure more consistent, but of course it is not perfect. Also, we have not yet reworked the library examples and documentation. Help is very welcome.
We just released version 3.0 of CTranslate2! Here’s an overview of the main changes:
The main highlight of this version is the integration of the Whisper speech-to-text model that was published by OpenAI a few weeks ago.
Its architecture is very similar to a text-to-text Transformer model, but it uses Conv1D layers to transform the audio features. On GPU, Conv1D layers are implemented using cuDNN, which is a new optional dependency.
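To give an idea of what the Conv1D layers do to the audio features, here is a minimal pure-Python sketch. It is not the CTranslate2 or cuDNN implementation. The helper names `conv1d_output_length` and `conv1d` are hypothetical, and the layer settings (3000 input frames, two convolutions with kernel size 3 and padding 1, the second with stride 2) are assumptions taken from the published Whisper description rather than from this announcement.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0):
    """Number of output frames produced by a 1D convolution."""
    return (length + 2 * padding - kernel_size) // stride + 1


def conv1d(signal, kernel, stride=1, padding=0):
    """Naive single-channel 1D convolution (cross-correlation, no kernel flip)."""
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    k = len(kernel)
    return [
        sum(padded[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(padded) - k + 1, stride)
    ]


# Assumed Whisper-like front end: 3000 feature frames go through two
# convolutions (kernel size 3, padding 1); only the second uses stride 2.
frames = 3000
after_first = conv1d_output_length(frames, kernel_size=3, stride=1, padding=1)
after_second = conv1d_output_length(after_first, kernel_size=3, stride=2, padding=1)
print(after_first, after_second)

# Tiny numeric check of the naive convolution on a toy signal.
print(conv1d([1, 2, 3, 4], [1, 0, -1], stride=2, padding=1))
```

Under these assumptions the strided second layer halves the sequence length, so the Transformer encoder that follows attends over fewer positions than there are input frames.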
The current implementation already supports many CTranslate2 features and optimizations such as quantization, asynchronous execution, decoding with random sampling, etc. It is up to 3x faster than the implementation in the Transformers library: