Yes, the back-translation approach is a great way to test accuracy, but it doesn’t beat a linguist’s evaluation.
I’ve been thinking of launching an initiative to create a community group of native speakers who could help evaluate models. I just have to figure out the right incentive model, because I doubt people will simply volunteer their time for this (and if they do, the evaluations might not get done in a timely manner).