Log probability translation scores

Someone emailed me and asked what the score values mean in Argos Translate and CTranslate2. I looked into it, and they are log probabilities.

They want to recreate the probability value, which can be done with e^(log prob). However, I’m not certain that a natural log is used and not log base 10 or something else. I tried to look at the code but wasn’t able to easily figure it out.
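For example, if the score really is a natural log probability, recovering the probability would look something like this (the score value here is made up):

import math

log_prob = -0.2231          # made-up example score

# if it's a natural log, exp() recovers the probability
print(math.exp(log_prob))   # ~0.80

# if it were base 10 instead, the result would be quite different
print(10 ** log_prob)       # ~0.60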

So they made this post on the OpenNMT forum to ask. Just thought I’d post this as an FYI, or in case anyone here knows the answer.

I think they might be computed from log_probs:

scores = self.model.generator(dec_out.squeeze(1))
log_probs = log_softmax(scores, dim=-1)

Here dec_out is the decoder layer output in the transformer. (This is from translator.py in OpenNMT.) But I might be wrong; perhaps the folks on the OpenNMT forum will provide more insight.
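As a quick sanity check of that idea (just a toy tensor, not the actual OpenNMT generator output), exponentiating the log_softmax output gives values that behave like probabilities:

import torch
import torch.nn.functional as F

scores = torch.tensor([[2.0, 1.0, 0.5, -1.0]])   # stand-in for generator scores over a tiny vocab

log_probs = F.log_softmax(scores, dim=-1)         # negative values, one per token
probs = log_probs.exp()                           # undo the log

print(probs)                                      # each value in (0, 1)
print(probs.sum(dim=-1))                          # sums to 1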


I saw a similar “log softmax” function in the CTranslate2 code in scoring.cc.

I asked ChatGPT about it and it said that it was likely the natural logarithm.

In the context of log_softmax, the logarithm is the natural logarithm (logarithm base e, where e ≈ 2.718).

In most machine learning libraries (such as PyTorch or TensorFlow), log_softmax and other log-based operations use the natural logarithm because it simplifies mathematical derivations and gradient computations, especially in backpropagation.
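One way to check that (assuming PyTorch here, since that’s what OpenNMT uses) is to compare log_softmax against torch.log of the softmax; torch.log is the natural log, and the two match:

import torch
import torch.nn.functional as F

scores = torch.tensor([2.0, 1.0, 0.5, -1.0])

a = F.log_softmax(scores, dim=-1)
b = torch.log(F.softmax(scores, dim=-1))  # torch.log is the natural logarithm

print(torch.allclose(a, b))  # True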

It looks like the softmax function generates probabilities, so the log softmax is probably generating the log probabilities.

The softmax function, often used in the final layer of a neural network model for classification tasks, converts raw output scores — also known as logits — into probabilities by taking the exponential of each output and normalizing these values by dividing by the sum of all the exponentials. This process ensures the output values are in the range (0,1) and sum up to 1, making them interpretable as probabilities.
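Written out by hand (just to illustrate the definition above), that looks like:

import math

logits = [2.0, 1.0, 0.5, -1.0]            # raw output scores

exps = [math.exp(x) for x in logits]      # exponentiate each score
total = sum(exps)
probs = [e / total for e in exps]         # normalize by the sum

print(probs)       # each value in (0, 1)
print(sum(probs))  # 1.0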

I’m not sure to what extent these are really “probabilities” versus just values between 0 and 1 where the probabilities of all of the possible tokens in the vocab sum to 1. For a translation of a full sentence, not just an individual token, I think the token log probabilities get combined, presumably by summing them (which corresponds to multiplying the probabilities), as sketched below.
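If that’s right, a rough sketch of the idea would look something like this (just an illustration of summing log probabilities; I don’t know exactly how CTranslate2 normalizes its sentence-level scores):

import math

# hypothetical per-token log probabilities for a 4-token output
token_log_probs = [-0.1, -0.3, -0.05, -0.2]

sentence_log_prob = sum(token_log_probs)     # adding logs == multiplying probabilities
sentence_prob = math.exp(sentence_log_prob)

# a length-normalized variant (average log probability per token)
avg_log_prob = sentence_log_prob / len(token_log_probs)

print(sentence_log_prob, sentence_prob, avg_log_prob)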