I did some basic benchmarking of several sentence-splitting libraries.
Here are the results:

| Model | Average accuracy | Average runtime (seconds) |
|---|---|---|
| spaCy en_core_web_sm | 0.924311498164287 | 0.0250468651453654 |
| spaCy xx_sent_ud_sm | 0.924311498164287 | 0.00476229190826416 |
| Argos Translate 2 Beta | 0.515548280365557 | 1.87798078854879 |
| Stanza en | 0.924311498164287 | 0.0219400326410929 |

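It looks like both spaCy and Stanza are pretty accurate for English (about 92.4% average accuracy) and run quickly, at roughly 0.02 to 0.03 seconds on average. The spaCy xx_sent_ud_sm
model is about five times faster still (around 0.005 seconds) without a loss in accuracy. Argos Translate 2 Beta is well behind on both measures, at about 51.6% accuracy and 1.9 seconds.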
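
For anyone who wants to run a similar comparison, here is a minimal sketch of a benchmark harness. It is not the script that produced the numbers above. The accuracy metric (fraction of expected sentences reproduced exactly), the `regex_splitter` baseline, and the `SAMPLES` data are all assumptions made for illustration.

```python
import re
import time
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# A splitter is any function that maps a text to a list of sentences.
Splitter = Callable[[str], List[str]]


def regex_splitter(text: str) -> List[str]:
    """Naive baseline (hypothetical): split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Assumed metric: fraction of expected sentences that were produced exactly."""
    if not expected:
        return 1.0
    predicted_set = set(predicted)
    return sum(1 for s in expected if s in predicted_set) / len(expected)


def benchmark(
    splitter: Splitter, samples: Sequence[Tuple[str, List[str]]]
) -> Tuple[float, float]:
    """Return (average accuracy, average runtime in seconds) over all samples."""
    accuracies, runtimes = [], []
    for text, expected in samples:
        start = time.perf_counter()
        predicted = splitter(text)
        runtimes.append(time.perf_counter() - start)
        accuracies.append(sentence_accuracy(predicted, expected))
    return mean(accuracies), mean(runtimes)


# Tiny made-up test set; a real benchmark needs far more data.
SAMPLES = [
    ("Hello world. How are you?", ["Hello world.", "How are you?"]),
    ("Dr. Smith arrived. He was late.", ["Dr. Smith arrived.", "He was late."]),
]

if __name__ == "__main__":
    accuracy, runtime = benchmark(regex_splitter, SAMPLES)
    print(f"accuracy={accuracy:.3f} runtime={runtime:.6f}s")
```

To benchmark a real library, pass a different splitter function. With spaCy, for example, that could be a function that returns `[s.text for s in nlp(text).sents]`, where `nlp = spacy.load("xx_sent_ud_sm")` is loaded once outside the timed loop so model loading is not counted in the runtime.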