PDF Translation

argosopentech · October 25, 2023, 10:56pm

Is PDF translation possible? My general experience with PDFs has been that they work well as a document format but trying to edit them is difficult.

argosopentech · October 25, 2023, 10:57pm

pierotofy · October 26, 2023, 1:34am

Yeah I think you could by using PyPDF2.
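A minimal sketch of what that could look like, assuming PyPDF2 2.0 or later is installed (`PdfReader` and `extract_text()` are its text-extraction calls). The `translate` argument is a hypothetical placeholder for whatever translation backend you plug in, not a LibreTranslate API. Note this only pulls out plain text; it does not rebuild the PDF layout, which is the hard part.

```python
from typing import Callable, List


def extract_pages(pdf_path: str) -> List[str]:
    """Return the extracted text of each page (layout is not preserved)."""
    # Imported lazily so the rest of the sketch works without PyPDF2 installed.
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    reader = PdfReader(pdf_path)
    # extract_text() may yield nothing for scanned or image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def translate_pages(pages: List[str], translate: Callable[[str], str]) -> List[str]:
    """Translate each non-empty page; `translate` is a hypothetical callable."""
    return [translate(text) if text.strip() else text for text in pages]
```

Usage would be something like `translate_pages(extract_pages("input.pdf"), my_translate)`, where `my_translate` is your own function that takes a string and returns its translation.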

hu-man · March 8, 2025, 11:14pm

TL;DR: any update? Roadblocks?

I’m just trying to understand, do the selfhost open source devs who make & use this to translate .txt, .odt, .odp, .docx?!, .pptx?!, .epub, .html, DO THEY USE GOOGLE TRANSLATE FOR THEIR PDFS? For all these years? Or what?

Sorry to vent but I’m rigging my own pdf to odt/epub to pdf converter just to be able to translate my private pdf docs offline, libretranslate being the only offline doc translator which supports more than a handful of languages…

How are you supporting ppt but not pdf? It’s the most irrational contrast: we all use pdfs, and you’re out here doing docx and ppts…

pierotofy · March 9, 2025, 7:07am

LibreTranslate is open source software; I understand you want PDF support, but no one has had time to work on this feature yet.

If PDF support is important to you, you could either contribute a pull request, or offer to sponsor its development.

argosopentech · March 9, 2025, 9:48pm

Like @pierotofy said this is open source software. And you’re not paying us anything to use it. Most of the people working on it are volunteers. You’re welcome to submit a pull request to add support for PDF translation yourself.

PDF is a more difficult format to parse than ppts and docx. That’s why we currently support ppts and not PDF.

hu-man · March 10, 2025, 1:05pm

I can convert a pdf to epub preserving style, translate it with your software, then convert it back to pdf. But ‘no pdfs because ppts are easier’.

Zero talk about pdfs for 3 years. I would contribute if I saw any kind of acknowledgement and encouragement of this critical need.

So the devs working on this, for years, are translating their private/company pdfs, with google translate.

Sorry for being a ‘nuisance’ & Bye.

PS: This thread sorely called for -dev talk- not -mod talk-. Read the product vibes.

ArtanisTheOne · March 10, 2025, 2:32pm

Argos Translate’s main function is translating input. The supported formats are there because they are straightforward to implement and clearly beneficial to users without much maintenance burden. PDFs are messy, all over the place, and just not straightforward to develop around.

pierotofy · March 10, 2025, 3:22pm

Again, you are welcome to contribute a pull request as we’d love to add support for PDFs. As others have mentioned, you’ll find it’s not as straightforward as you might think it is.

NicoLe · April 5, 2025, 2:49pm

We’ve been adding pdf support to our app (which is a parallel project to LT) because it was highly requested, but so far the feature is very basic and quite destructive in terms of layout (many tags switching upon translation) and even sometimes text coherence (when scanning multiple columns).
I’m going to dive into it later this spring but I don’t know how long it’ll take to get something decent in and out.
Currently, we’re working on other features to process multilingual text or audio inputs properly.
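To illustrate the multi-column coherence problem mentioned above, here is a small sketch. It is not the app's actual method, just one simple heuristic: group extracted text blocks into columns by their left edge, then read each column top to bottom. `Block`, `reading_order`, and `column_gap` are hypothetical names, and the coordinates assume a top-left origin with y growing downward.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    x0: float  # left edge of the text block
    y0: float  # top edge (assumption: y grows downward)
    text: str


def reading_order(blocks: List[Block], column_gap: float = 50.0) -> List[str]:
    """Group blocks into columns by left edge, then read each column top to bottom."""
    columns: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.x0):
        # A block joins the current column if its left edge is close to
        # the left edge of that column's first block.
        if columns and block.x0 - columns[-1][0].x0 <= column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])

    ordered: List[Block] = []
    for column in columns:  # columns are already left to right
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return [b.text for b in ordered]
```

A plain top-to-bottom sort would interleave lines from the left and right columns, which is what breaks sentences before they even reach the translator. Real pages with headers, footnotes, or figures spanning columns would need more than this.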