LTEngine - LLM powered local machine translation

Yes, of course. To clarify: I was referring to the cooling of the VRAM modules on the graphics card. If they get hotter than 84 degrees Celsius, the card seems to throttle. I’ve already replaced the thermal pads and installed a heat sink and fan on the backplate, which lowered the temperature by about 6 degrees Celsius, but it still runs hot. Even when not all 15 threads are in use, the VRAM temperature rises above 84 degrees Celsius.

Here are the specifications. The other PC components may not be as important, as the main load is on the graphics card.

ASRock B850M-X R2.0

AMD Ryzen 5 7500F

Corsair DIMM 64 GB DDR5-5200 (2x 32 GB)

ASUS TUF GeForce RTX 3090 24 GB OC Version


Thanks for the answer. Could you please tell me what the translation speed is when using LTEngine with Gemma 3 27B?

Regarding cooling: consumer graphics cards have vertical cooling slots and aren’t designed for server cases. I got an RTX 4060 Ti 16 GB, and it ran at over 80 degrees Celsius in a normal PC tower case, even with extra fans. Moving it into a 4U server case with adequate airflow brought the GPU down to 70 degrees Celsius at full load. At 70 degrees the card quickly ramps its fans above 45% duty, so the temperature only rarely reaches 71 degrees. You need to modify the GPU’s fan curve to favor low temperatures over quiet operation. I’ll try this soon and will report back.

Edit:
How much system RAM does a llama.cpp server need? Twice the GPU’s VRAM?
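Not necessarily 2x. A common rule of thumb is that the footprint is roughly the quantized weight file plus the KV cache; with full GPU offload most of that sits in VRAM, and llama.cpp memory-maps the weights, so system RAM mainly needs to cover the model file itself. Here is a minimal back-of-the-envelope sketch; the model numbers in the example call (layer count, KV heads, head size, bits per weight) are illustrative assumptions, not official Gemma 3 27B values, so check the model's GGUF metadata before relying on them:

```python
# Rough llama.cpp memory estimate: quantized weights + KV cache.
# This is a rule-of-thumb sketch, not llama.cpp's actual allocator math;
# runtime overhead (compute buffers, scratch) is ignored.

def estimate_gguf_memory_gb(n_params, bits_per_weight, ctx_len,
                            n_layers, n_kv_heads, head_dim,
                            kv_bytes_per_elem=2):  # fp16 KV cache
    """Very rough total footprint in decimal GB."""
    weights_bytes = n_params * bits_per_weight / 8
    # K and V caches: 2 tensors per layer, each ctx * kv_heads * head_dim elems
    kv_bytes = 2 * n_layers * ctx_len * n_kv_heads * head_dim * kv_bytes_per_elem
    return (weights_bytes + kv_bytes) / 1e9

if __name__ == "__main__":
    # Hypothetical example: a 27B model at ~4.5 bits/weight (Q4_K_M-ish),
    # 8k context, 62 layers, 16 KV heads, head size 128 -- assumed values.
    print(round(estimate_gguf_memory_gb(27e9, 4.5, 8192, 62, 16, 128), 2))
```

Under those assumed numbers the estimate lands around 19 GB, which fits a 24 GB card; a longer context grows only the KV-cache term, so context length is the knob to watch.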