Efficient Large Scale Language Modeling with Mixtures of Experts

I think mixture-of-experts models are a really promising area. Argos Translate already works somewhat like a mixture-of-experts model: it selects a separate language model based on the source and target language. Learning to choose an expert from the network's weights might work even better.
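The idea above can be sketched as a tiny gated mixture-of-experts: a gate scores each expert from the input features and routes to the top-scoring one, instead of selecting by a fixed key like a language pair. This is a minimal illustrative sketch, not Argos Translate's actual API; the expert functions and gate weights are made up for the demo.

```python
import math

# Toy experts: each is just a function over the input vector.
# In a real MoE these would be separate sub-networks.
def expert_double(x):
    return [2 * v for v in x]

def expert_negate(x):
    return [-v for v in x]

experts = [expert_double, expert_negate]

# Fixed gate weights for the demo: one weight vector per expert.
# In a trained model these would be learned network weights.
gate_weights = [
    [0.9, -0.2, 0.4, 0.1],   # scores expert_double
    [-0.5, 0.3, 0.2, -0.1],  # scores expert_negate
]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(x):
    # Gate logits: dot product of input with each expert's weight vector.
    scores = [sum(w * v for w, v in zip(ws, x)) for ws in gate_weights]
    probs = softmax(scores)
    # Hard top-1 routing: only the best-scoring expert runs.
    chosen = max(range(len(probs)), key=probs.__getitem__)
    return experts[chosen](x), chosen

x = [1.0, 1.0, 1.0, 1.0]
output, idx = route(x)
print(idx, output)  # expert 0 wins (gate scores 1.2 vs -0.1)
```

Hard top-1 routing keeps per-input compute constant as the number of experts grows, which is the efficiency argument behind large-scale MoE models.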

Related: