I managed to run a proof of concept on Stanza's Italian tokenizer, removing PyTorch inference from the loop.
- I edited Stanza's code to run the ONNX exporter, which gave me an ONNX model.
- I removed the PyTorch inference code and replaced it with the newly exported ONNX model. The tokenization results were identical, although small numerical differences showed up in the raw model outputs:
[[{'id': (1,), 'text': 'Questa', 'misc': 'start_char=0|end_char=6'}, {'id': (2,), 'text': "e'", 'misc': 'start_char=7|end_char=9'}, {'id': (3,), 'text': 'una', 'misc': 'start_char=10|end_char=13'}, {'id': (4,), 'text': 'frase', 'misc': 'start_char=14|end_char=19'}, {'id': (5,), 'text': '.', 'misc': 'start_char=19|end_char=20'}], [{'id': (1,), 'text': 'Questa', 'misc': 'start_char=21|end_char=27'}, {'id': (2,), 'text': "e'", 'misc': 'start_char=28|end_char=30'}, {'id': (3,), 'text': 'un', 'misc': 'start_char=31|end_char=33'}, {'id': (4,), 'text': 'altra', 'misc': 'start_char=34|end_char=39'}, {'id': (5,), 'text': '.', 'misc': 'start_char=39|end_char=40'}]]
["Questa e' una frase.", "Questa e' un altra."]
In a nutshell, I edited Stanza's trainer.py as follows:
def predict(self, inputs):
    self.model.eval()
    units, labels, features, _ = inputs
    if self.use_cuda:
        units = units.cuda()
        labels = labels.cuda()
        features = features.cuda()

    pred = self.model(units, features)

    # One-off export of the model to ONNX; run once, then leave it commented out.
    # torch.onnx.export(self.model,                # model being run
    #                   (units, features),         # model input (or a tuple for multiple inputs)
    #                   "/home/piero/Downloads/staging/it.onnx",  # where to save the model
    #                   export_params=True,        # store the trained parameter weights inside the model file
    #                   opset_version=10,          # the ONNX version to export the model to
    #                   do_constant_folding=True,  # whether to execute constant folding for optimization
    #                   input_names=['units', 'features'],  # the model's input names
    #                   output_names=['modelOutput'])       # the model's output names
    # exit(1)

    import onnxruntime
    ort_session = onnxruntime.InferenceSession("/home/piero/Downloads/staging/it.onnx",
                                               providers=["CPUExecutionProvider"])

    def to_numpy(tensor):
        return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    # compute the ONNX Runtime output prediction
    ort_inputs = {'units': to_numpy(units), 'features': to_numpy(features)}
    ort_outs = ort_session.run(None, ort_inputs)

    # compare ONNX Runtime and PyTorch results (needs: import numpy as np)
    # np.testing.assert_allclose(to_numpy(pred), ort_outs[0], rtol=1e-03, atol=1e-05)

    # return ort_outs[0]
    return pred.data.cpu().numpy()
Uncommenting the second-to-last line (return ort_outs[0]) makes predict return the ONNX Runtime output instead of the PyTorch prediction.
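As a follow-up refinement (not part of the snippet above), the InferenceSession should really be created once and reused, since building it on every predict call is far more expensive than a single run. A sketch of a cached variant, under the same hard-coded model path:

import onnxruntime

def predict(self, inputs):
    self.model.eval()
    units, labels, features, _ = inputs

    # Build the session lazily on the first call and reuse it afterwards
    # (hypothetical attribute name; anything not clashing with the trainer works).
    if not hasattr(self, '_ort_session'):
        self._ort_session = onnxruntime.InferenceSession(
            "/home/piero/Downloads/staging/it.onnx",
            providers=["CPUExecutionProvider"])

    def to_numpy(tensor):
        return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    ort_inputs = {'units': to_numpy(units), 'features': to_numpy(features)}
    return self._ort_session.run(None, ort_inputs)[0]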