Mixtral: open-source LLM by Mistral AI

Mixtral is a recently released open-source language model from Mistral AI, similar in capability to ChatGPT. I’ve been experimenting with it for the last day and it’s pretty good, roughly on par with GPT-3.5.

I tried Anyscale’s API to run the Mixtral model and it worked great. You have to make an account, but light use for testing is free. Anyscale’s API is compatible with the OpenAI API, so it’s easy to switch between providers to get a better price.

https://docs.endpoints.anyscale.com/
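Since the endpoint speaks the OpenAI API, switching providers is mostly a matter of changing the base URL and model name. Here is a minimal sketch using the official openai Python client (v1.x); the base URL and model identifier are assumptions based on Anyscale's docs, so check your account for the exact values and use your own API key:

```python
from openai import OpenAI

# Point the standard OpenAI client at Anyscale's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",  # assumed Anyscale endpoint URL
    api_key="YOUR_ANYSCALE_API_KEY",                   # placeholder; use your own key
)

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",      # assumed Anyscale model identifier
    messages=[{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}],
)
print(response.choices[0].message.content)
```

Switching back to OpenAI (or any other compatible provider) should only require changing `base_url`, `api_key`, and `model`.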


My understanding is that Mixtral uses a Mixture of Experts architecture of some sort.

I’ve been quite impressed by OpenChat (https://openchat.team/); it’s FOSS, licensed under Apache 2.0, and similar in quality to ChatGPT 3.5.

I also use the Chatbot Arena Leaderboard (a Hugging Face Space by LMSYS) to keep track of the latest challengers to OpenAI.


From the Mixtral paper abstract:

We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
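To make the routing described above concrete, here is a toy PyTorch sketch of a top-2 sparse MoE feed-forward layer: a small router scores 8 experts per token, only the two highest-scoring experts run, and their outputs are combined using the normalized router weights. The dimensions and layer details are made up for illustration; this is not Mixtral's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy sparse MoE feed-forward layer: route each token to the top-2 of 8 experts."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # gating network that scores experts
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                              # x: (n_tokens, d_model)
        logits = self.router(x)                        # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1) # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)           # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```

Only the selected experts run for each token, which is why the full model carries far more parameters than it activates per token (47B total vs. 13B active in the abstract's figures).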