Efficiently Modeling Long Sequences with Structured State Spaces

New paper that appears to offer a substantial performance improvement over Transformers in some areas, particularly long-sequence tasks such as the Long Range Arena benchmark.
