Filtering data with a pre-trained model

argosopentech · March 12, 2022, 1:56am

The technique suggested here is to use a pre-trained translation model to translate your source data, compare each generated translation with the corresponding target sentence, and filter out the pairs whose target doesn’t match the generated translation well. You then train a new model on the filtered data.
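
A rough sketch of that filtering loop, in case it helps. Everything named here is an assumption for illustration: `translate` stands in for whatever pre-trained model you have, `similarity` uses character overlap from Python's `difflib` as a cheap substitute for a proper metric, and the `0.5` threshold is arbitrary and would need tuning on real data.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real MT metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
    threshold: float = 0.5,  # arbitrary cutoff, assumed for this sketch
) -> List[Tuple[str, str]]:
    """Keep (source, target) pairs whose target resembles the model's output."""
    kept = []
    for source, target in pairs:
        hypothesis = translate(source)  # translation from the pre-trained model
        if similarity(hypothesis, target) >= threshold:
            kept.append((source, target))
    return kept


# Toy stand-in for a pre-trained model (hypothetical lookup table).
toy_model = {"Hola Mundo": "Hello World", "Buenos días": "Good morning"}


def toy_translate(text: str) -> str:
    return toy_model.get(text, "")


data = [
    ("Hola Mundo", "Hello World"),           # well-aligned pair
    ("Buenos días", "Click here to login"),  # misaligned pair
]
filtered = filter_pairs(data, toy_translate)
```

With the toy data only the first pair survives. In practice you would swap `toy_translate` for a call into your actual model and probably replace `similarity` with a sentence-level translation metric.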

argosopentech · March 12, 2022, 2:27am

If you had much more powerful language models in the future, you could even try something like this:

Is this a high quality translation? (yes/no)

__en__ Hola Mundo

Hello World

yes
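
A minimal sketch of how a prompt like that could drive filtering, assuming you had such a model. The `complete` callable is hypothetical (it takes a prompt string and returns the model's text continuation), and `fake_complete` below only exists to make the example run.

```python
from typing import Callable, Iterable, List, Tuple

QUESTION = "Is this a high quality translation? (yes/no)"


def build_prompt(source: str, target: str, target_tag: str = "__en__") -> str:
    """Format one pair like the example prompt, leaving the answer to the model."""
    return f"{QUESTION}\n\n{target_tag} {source}\n\n{target}\n\n"


def filter_with_judge(
    pairs: Iterable[Tuple[str, str]],
    complete: Callable[[str], str],  # hypothetical language model call
) -> List[Tuple[str, str]]:
    """Keep the pairs for which the model's answer starts with "yes"."""
    kept = []
    for source, target in pairs:
        answer = complete(build_prompt(source, target))
        if answer.strip().lower().startswith("yes"):
            kept.append((source, target))
    return kept


# Stand-in for a real model, only so the sketch can be run end to end.
def fake_complete(prompt: str) -> str:
    return "yes" if "Hello World" in prompt else "no"


judged = filter_with_judge(
    [("Hola Mundo", "Hello World"), ("Hola Mundo", "Buy now")],
    fake_complete,
)
```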