I’m planning to start moving toward spaCy for sentence boundary detection. Stanza works pretty well, but it has a lot of bugs [1][2] and requires installing PyTorch, which is a ~700MB dependency.
My tentative plan to make this backwards compatible is to:
- Keep including Stanza models in .argosmodel packages when possible
- Continue to support Stanza in Argos Translate with stanza==1.1.1
I still need to figure out whether to put the spaCy data files in the .argosmodel packages or have spaCy download any models it needs on the first run.
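The two options don't have to be mutually exclusive. Here is a rough sketch of how a loader could prefer data bundled in the package and fall back to downloading on first use. Everything here is an assumption for illustration, not existing Argos Translate code: the `load_sentence_model` helper, the `spacy/<model_name>` directory layout inside the package, and passing the load and download functions in as parameters (which keeps the sketch runnable without spaCy installed).

```python
from pathlib import Path
from typing import Any, Callable


def load_sentence_model(
    package_dir: Path,
    model_name: str,
    load: Callable[[str], Any],
    download: Callable[[str], None],
) -> Any:
    """Prefer model data bundled in the package, else download on first use.

    `package_dir / "spacy" / model_name` is a hypothetical layout for an
    unpacked .argosmodel package, not the actual package format.
    """
    bundled = package_dir / "spacy" / model_name
    if bundled.is_dir():
        # Option 1: the data files ship inside the package.
        return load(str(bundled))
    try:
        # The model may already be installed from an earlier run.
        return load(model_name)
    except OSError:
        # spacy.load raises OSError when the named model cannot be found.
        # Option 2: fetch it on first run, then try again.
        download(model_name)
        return load(model_name)


# Possible wiring with spaCy itself (requires spaCy to be installed):
#   import spacy
#   from spacy.cli import download
#   nlp = load_sentence_model(Path("pkg"), "en_core_web_sm", spacy.load, download)
```

One caveat with the download path: a model installed while the process is running may not be importable until the process restarts, so the retry after `download` might need extra handling in practice.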