Issue with the argos-translate master branch/spacy and LibreTranslate

Hi,
I am trying to develop a dual stanza/spacy SBD for Argos, and I tried replacing the files from argos 1.9.6 in my LT lab with those from the master branch.
With 1.9.6:

debian@libretranslate...:~$ argos-translate --from en --to de "Hello World!"
Hallo Welt!

Then:

cp /usr/local/lib/python3.9/dist-packages/argostranslate-master/* /usr/local/lib/python3.9/dist-packages/argostranslate/

Then:

debian@libretranslate-venv-b3-16-gra11:~$ argos-translate --from en --to de "Hello World!"
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/argostranslate/sbd.py", line 28, in __init__
    self.nlp = spacy.load("xx_sent_ud_sm", exclude=["parser"])
  File "/usr/local/lib/python3.9/dist-packages/spacy/__init__.py", line 50, in load
    return util.load_model(
  File "/usr/local/lib/python3.9/dist-packages/spacy/util.py", line 472, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'xx_sent_ud_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/argos-translate", line 3, in <module>
    from argostranslate import cli
  File "/usr/local/lib/python3.9/dist-packages/argostranslate/cli.py", line 6, in <module>
    from argostranslate import translate
  File "/usr/local/lib/python3.9/dist-packages/argostranslate/translate.py", line 411, in <module>
    sentencizer: sbd.ISentenceBoundaryDetectionModel = SpacySentencizerSmall(),
  File "/usr/local/lib/python3.9/dist-packages/argostranslate/sbd.py", line 31, in __init__
    spacy.cli.download("xx_sent_ud_sm")
AttributeError: module 'spacy' has no attribute 'cli'

Of course, I installed spacy before doing this. From my experience developing the same kind of dual-dependency kit and running it on the Locomotive machine I have on the same infra, I can tell that spacy.cli.download works just fine on Locomotive.

Can anyone tell what may be wrong with the code in argos-translate/master? Do I need to open an issue on GitHub for this?

Here is my code from Locomotive, which may be helpful:

import os
import shutil
import sys

import spacy
import spacy.cli
import stanza

# config, utils_dir, spacy_dir and stanza_dir are defined elsewhere in the Locomotive script.
stanza_lang_code = config['from']['code']
spacy_utils = os.path.join(utils_dir, "spacy")

if os.path.isfile(os.path.join(spacy_dir, "senter", "model")):
    print('Spacy library ready.')
    segment_with = 'spacy'
elif os.path.isdir(os.path.join(stanza_dir, stanza_lang_code)):
    print('Stanza library ready.')
    segment_with = 'stanza'
else:
    try:
        stanza.download(stanza_lang_code, dir=stanza_dir, processors="tokenize")
        segment_with = 'stanza'
    except Exception as e:
        if not str(e).startswith('Unsupported language'):
            print(f'{e}.')
            sys.exit(1)
        print(f'Stanza said: "{e}"; hence, will use spacy multilingual.')
        # Clean up what the failed stanza download left behind.
        os.remove(os.path.join(stanza_dir, "resources.json"))
        os.rmdir(stanza_dir)
        # Spacy download is very verbose, trying to circumvent it:
        # download only when the model is not already saved in utils.
        if not os.path.isfile(os.path.join(spacy_utils, "senter", "model")):
            try:
                spacy.cli.download("xx_sent_ud_sm")
                print('Downloaded spacy model. Writing to utils.')
                nlp = spacy.load("xx_sent_ud_sm")
                nlp.to_disk(spacy_utils)
            except Exception as err:
                print(f'{err}.')
                sys.exit(1)
        print('Spacy model saved. Copying...')
        shutil.copytree(spacy_utils, spacy_dir)
        print('Spacy library ready.')
        segment_with = 'spacy'

I think I know what’s wrong with the argos code: it’s not waiting until spacy has finished downloading, and returns an error straight away.
I will code a function “get_spacy” in the networking module, cache the spacy model, and recode the class to take a Path argument that is set in the package module depending on whether spacy is explicit (i.e. contained in the package) or not.
There is also some code for downloading an sbd package that is obsolete now; I’ll rewrite it too.

The code is in the final debugging stages.
I’ll finish next week by testing the last packages where I included spacy xx. I will also try a zh package with zh_spacy instead of stanza, and will make the PR afterwards.
Argos should then support either stanza or spacy language-specific SBDs (in the package), or the generic spacy xx model outside of packages.

Argos is debugged and working. Now LibreTranslate returns the following error and doesn’t translate any further:
[image: LibreTranslate error screenshot]
Any idea?
Meanwhile, on the Argos CLI, nothing is wrong:
[image: Argos CLI output screenshot]
I’ll check the spacy packages and make a PR.

It looks like HTML is being passed when JSON is expected somewhere.
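
One way to confirm that kind of mismatch is to look at the response's Content-Type before parsing the body. A small sketch, assuming the client has the header and the body as strings; parse_json_body is a hypothetical helper, not part of LibreTranslate:

```python
import json


def parse_json_body(content_type: str, body: str):
    """Parse a JSON response body, failing clearly when HTML arrives instead."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":
        # An HTML error page (for example a proxy's 500 page) ends up here.
        snippet = body.strip()[:60]
        raise ValueError(f"expected JSON, got {media_type or 'unknown'}: {snippet!r}")
    return json.loads(body)


print(parse_json_body("application/json; charset=utf-8", '{"translatedText": "Hallo Welt!"}'))

try:
    parse_json_body("text/html", "<!doctype html><title>500 Internal Server Error</title>")
except ValueError as err:
    print(err)
```

If the second branch fires, the server (or the proxy in front of it) answered with an error page rather than a translation, which points at the backend rather than the client.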

OK, so that looks like a type error within the LT code following the commits made since 1.9.6 in argos. Somewhat weird, but who knows.

My server (proxied through wsgi/Apache) now returns an error 500, and activating LT_DEBUG does not yield anything.

I am going to check spacy on the CLI before debugging the LT code on my workstation with PyCharm. I already got some decent sw-fr packages yesterday.

OK, so I checked every scenario and opened the PR in Argos.
Sorry, I am not great with Git: I sent 14 commits in this thing. I tried to rebase, but to no avail.
One thing though: LT is still down.
I’ll first upgrade it to the very latest version; maybe it’ll work again. If not, I’ll set about finding out what’s wrong with it.

OK, now that I am sure the code for Argos works, I built a LibreTranslate conda environment and plugged it into a PyCharm project.
I installed the environment and the packages, including the ones with spacy (Swahili and Tatar, the latter of which I use for dev).
The flask interface on localhost is operational, and debug mode returns logs on the conda prompt. I’ll get to the bottom of it.
For now, with argos 1.9.6, translating German works without problems:

127.0.0.1 - - [30/Jan/2025 13:19:09] "POST /translate HTTP/1.1" 200 - 

Translating Swahili:

[2025-01-30 13:20:11,890] ERROR in app: Exception on /translate [POST]
...
File "C:\Users\nglec\.conda\envs\LibreTranslate\lib\site-packages\stanza\pipeline\core.py", line 70, in __init__
    raise Exception(f"Resources file not found at: {resources_filepath}. Try to download the model again.")
Exception: Resources file not found at: C:\Users\nglec\.local\share\argos-translate\packages\translate-sw_fr-1_2\stanza\resources.json. Try to download the model again.
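
The traceback shows the stanza loader looking for stanza/resources.json inside the package directory, which a package that ships spacy instead would not contain. Here is a sketch of picking the sentence splitter from what the package directory actually holds. The "spacy" subdirectory name, the returned labels and pick_sbd itself are assumptions for illustration, not the Argos implementation.

```python
from pathlib import Path
import tempfile


def pick_sbd(package_dir: Path) -> str:
    """Return which sentence-boundary backend a package directory supports."""
    if (package_dir / "stanza" / "resources.json").is_file():
        return "stanza"
    if (package_dir / "spacy").is_dir():  # assumed layout for a packaged spacy model
        return "spacy-packaged"
    # Nothing packaged: fall back to the generic, cached xx_sent_ud_sm model.
    return "spacy-cached"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    with_stanza = root / "pkg_with_stanza"
    (with_stanza / "stanza").mkdir(parents=True)
    (with_stanza / "stanza" / "resources.json").write_text("{}")
    with_spacy = root / "pkg_with_spacy"
    (with_spacy / "spacy").mkdir(parents=True)
    bare = root / "pkg_bare"
    bare.mkdir()
    results = [pick_sbd(p) for p in (with_stanza, with_spacy, bare)]
    print(results)
```

Checking for the file before constructing the stanza pipeline avoids the "Resources file not found" exception for packages that carry no stanza data.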

After the final round of debugging, LT works fine in debug mode:
Generic spacy, cached


(There’s some room for improvement on the Tatar model. I’ll see what I can improve in Locomotive to make better datasets and, when I get some free time, display alternatives as tabs in LT: the third alternative is way more accurate than the first.)

Packaged Spacy


Actually it’s an xx_sent_ud_sm, but within the package directory.

Stanza:


No modesty in this.

As for my lab server, the guys at web ops did not give the service account permission to create a subdir in its $HOME :grimacing:… so it had some trouble creating the new $HOME/.config/argos-translate directory…

Issue closed, but if you host an instance, beware of permissions.
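
For anyone hosting an instance, a check like the following, run as the service account, can reveal the problem before the server starts. It is a sketch using only the standard library; can_create_dir is a hypothetical helper, and the directory name is the one mentioned above.

```python
import os
import tempfile
from pathlib import Path


def can_create_dir(path: Path) -> bool:
    """Return True if the directory exists or can be created, and is writable."""
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError:
        # Covers PermissionError when a parent directory is not writable.
        return False
    return os.access(path, os.W_OK | os.X_OK)


# Demonstrated on a temporary directory so the sketch has no side effects in $HOME.
with tempfile.TemporaryDirectory() as tmp:
    ok = can_create_dir(Path(tmp) / ".config" / "argos-translate")
    print(ok)
```

On the server it would be called with Path.home() / ".config" / "argos-translate".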