Thank you for this awesome project. Can someone please share recommendations for getting maximum performance?
I deployed LibreTranslate to Kubernetes. Is it better to scale it horizontally (e.g. to 5-10 pods), or to run one instance with the maximum 'requests' (in k8s terminology)? Is a GPU server required, or can it run fast on CPU only? Can you recommend a particular GPU (and explain why that one)?
The underlying translation engine is CTranslate2, which supports both CPU and GPU execution and publishes benchmarking information. In my experience CPU works well, and the best way to scale up is to make many simultaneous requests to LibreTranslate from different threads.
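That fan-out over threads can be sketched in Python. This is a minimal sketch, assuming a LibreTranslate instance listening at `localhost:5000`; the URL, language pair, and helper names are illustrative, not from this thread:

```python
# Sketch: issue many concurrent requests to a LibreTranslate instance.
# Assumption: an instance is running at LT_URL (adjust for your deployment).
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

LT_URL = "http://localhost:5000/translate"  # assumed local instance

def translate(text, source="en", target="es"):
    """POST a single /translate request and return the translated text."""
    payload = json.dumps({"q": text, "source": source, "target": target}).encode()
    req = urllib.request.Request(
        LT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["translatedText"]

def translate_many(texts, worker=translate, max_workers=8):
    """Fan requests out across threads; CTranslate2 handles them in parallel on CPU."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, texts))
```

Since the requests are I/O-bound on the client side, threads (rather than processes) are enough to keep the server's CPU cores busy; tune `max_workers` against your pod's CPU allocation.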
Ah, very cool! Would you be interested in sharing the kubernetes scripts (or instructions on how you did that) with others? I know there’s been some interest from others. If you publish those on GitHub we can then link your repo in the README.
Yep! I will soon post my own example with a Docker build, an autoscaling config, and deployment to DigitalOcean K8s via GitHub Actions CI.
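For anyone who wants to try autoscaling before that example lands, here is a minimal HorizontalPodAutoscaler sketch. It assumes a Deployment named `libretranslate` with CPU requests set; the name and thresholds are illustrative, not from this thread:

```yaml
# Minimal HPA sketch (assumes a Deployment named "libretranslate"
# with CPU resource requests configured).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: libretranslate
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: libretranslate
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization is a reasonable scaling signal here since translation on CPU is compute-bound.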