I think “10-100x faster” is probably an exaggeration. My understanding is that the
mmap system call maps a file into the process’s address space, so pages are loaded lazily on first access and the OS can keep the weights in its page cache across runs. That gives the OS more flexibility and control over loading the model weights into memory, but I don’t think it’s going to be 10x faster. It probably is faster, though, especially for a single inference run.
It’s very exciting to see all of this open source experimentation being done!
Agreed, the way they named the PR is a bit misleading.