Mistral AI has released its new Mixtral 8x7B language model and detailed its performance in a new blog post. The company claims it is the best open language model currently available.
At the end of last week, Mistral released a new language model via a torrent link. Today, the company published more details about Mixtral 8x7B and announced an API service and new funding.
According to the company, Mixtral is a sparse Mixture-of-Experts (SMoE) model with open weights, licensed under Apache 2.0. OpenAI is rumored to use a similar architecture for GPT-4. In each layer, a router selects two of eight groups of parameters (the "experts") to process each token, so only a fraction of the total parameters is used at each step, which reduces cost and latency. Specifically, Mixtral has 45 billion parameters but uses only 12 billion of them per token. It is the largest model to date from the start-up, which released the relatively powerful Mistral 7B in September.
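To make the routing idea concrete, here is a toy sketch in plain Python. It is not Mistral's code: the dimensions, the random linear "experts", and the helper names (`moe_layer`, `make_expert`) are hypothetical. Only the overall structure follows the description above: eight experts, a router that scores each token, and only the top two experts actually running. Normalizing the gate weights with a softmax over the two selected scores is one common way to do top-2 gating and is an assumption here.

```python
import math
import random

NUM_EXPERTS = 8  # eight experts per layer, as in Mixtral 8x7B
TOP_K = 2        # two experts are selected for each token


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed, dim):
    """Hypothetical toy expert: a fixed random dim x dim linear map."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_layer(x, router_weights, experts, k=TOP_K):
    """Route one token vector x through the top-k experts (toy sketch)."""
    # Router: one score (logit) per expert, computed from the token itself.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights: softmax over the selected experts' scores only (assumed scheme).
    gates = softmax([logits[i] for i in top])
    # Only the selected experts are evaluated; the others are never called.
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


dim = 4
rng = random.Random(0)
router = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(seed=i, dim=dim) for i in range(NUM_EXPERTS)]
token = [0.5, -1.0, 0.25, 2.0]

output, chosen = moe_layer(token, router, experts)
print(len(output), len(chosen))  # 4 2

# Share of parameters active per token, using the article's figures.
print(round(12 / 45, 2))  # 0.27
```

Each token still sees a weighted mix of two experts, but the other six experts' weights are never touched for that token, which is where the savings in cost and latency come from. With the article's figures, roughly a quarter of the parameters are active per token.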
Mixtral 8x7B outperforms Meta’s Llama 2 70B
According to Mistral, Mixtral outperforms Llama 2 70B on most benchmarks while offering six times faster inference. The company also says it is more truthful and less biased than Meta's model. Mistral calls it the “strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs.” On standard benchmarks, it also matches or outperforms OpenAI's GPT-3.5.
Mixtral handles contexts of up to 32,000 tokens, supports English, French, Italian, German, and Spanish, and can write code.
Mistral releases Instruct version of Mixtral
In addition to the base Mixtral 8x7B model, Mistral is launching Mixtral 8x7B Instruct, which has been optimized for careful instruction following through supervised fine-tuning and Direct Preference Optimization (DPO). It scores 8.30 on MT-Bench, which, according to Mistral, makes it the best open-source model, with performance comparable to GPT-3.5.
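For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair, as defined in the original DPO paper. Mistral has not published its exact training setup, so this only illustrates the general technique, not how Mixtral Instruct was trained. The function names and the value `beta=0.1` are assumptions chosen for illustration, and the inputs are summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being trained and under a frozen reference model.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (beta=0.1 is an assumed value).

    Each argument is the summed log-probability of a full response, either
    under the policy being trained or under the frozen reference model.
    """
    # How much more (or less) likely each response became relative to the reference.
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    # Reward the policy for raising the chosen response more than the rejected one.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Policy identical to the reference: the loss is log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931

# Policy shifted toward the chosen response: the loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0) < math.log(2))  # True
```

Unlike reinforcement learning from human feedback with a separate reward model, DPO optimizes the model directly on preference pairs, with `beta` controlling how far it may drift from the reference model.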
Mixtral is available in beta on Mistral's platform. According to Mistral, the smaller Mistral 7B and a more powerful prototype model that outperforms GPT-3.5 are also available there.