AI Model Performance Metrics

- Perplexity

  • measures how well the model predicts text, based on how probable it finds each token in a sequence.
  • A language model with a perplexity of 20 is, on average, as uncertain as if it were choosing among 20 equally likely options for each word in a sequence, per www.comet.com.
  • Lower perplexity signifies greater confidence and better predictions.
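
As a rough sketch, assuming per-token log-probabilities are already available from the model, perplexity is the exponential of the average negative log-probability per token:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-(average log-probability per token))."""
    avg_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical per-token probabilities for a short sequence.
log_probs = [math.log(p) for p in (0.25, 0.10, 0.05, 0.50)]
print(f"perplexity = {perplexity(log_probs):.2f}")  # ~6.33
```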

- Temperature

  • controls the randomness of the output by rescaling token probabilities before sampling.
  • A common issue is hallucination, where a model generates false information.
  • Lowering the temperature can reduce, but not eliminate, hallucinations.
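
Conceptually, temperature divides the logits before the softmax, so low temperatures sharpen the distribution and high temperatures flatten it. A minimal sketch of temperature-scaled sampling, with a made-up four-token vocabulary, might look like this:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, softmax, then sample one token index."""
    scaled = [l / temperature for l in logits]
    top = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(l - top) for l in scaled]
    probs = [e / sum(exps) for e in exps]
    return random.choices(range(len(probs)), weights=probs)[0]

# Hypothetical logits over a tiny four-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]
print(sample_with_temperature(logits, temperature=0.2))  # almost always token 0 (near-greedy)
print(sample_with_temperature(logits, temperature=1.5))  # noticeably more random
```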

- Latency

  • evaluates a model’s response time, which is essential for interactive experiences.
  • End-to-End Latency (E2E) = Network Latency + Model Processing Time + Output Generation Time + Post-processing Time.
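
A simple way to approximate this breakdown is to time each stage of a request separately. The sketch below uses placeholder stage functions (`call_model`, `generate_output`, and `postprocess` are hypothetical stand-ins, not a real client API):

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical stages standing in for a real request pipeline.
def call_model(prompt):       # network round-trip + model processing
    time.sleep(0.05)
    return "raw model response"

def generate_output(raw):     # output (token) generation
    time.sleep(0.03)
    return raw.upper()

def postprocess(text):        # e.g. detokenization, formatting
    time.sleep(0.01)
    return text.strip()

raw, t_model = timed(call_model, "What is perplexity?")
out, t_generate = timed(generate_output, raw)
final, t_post = timed(postprocess, out)

e2e = t_model + t_generate + t_post
print(f"E2E latency: {e2e * 1000:.1f} ms "
      f"(model {t_model * 1000:.1f} + generation {t_generate * 1000:.1f} + post {t_post * 1000:.1f})")
```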

- Token Throughput

  • measures the model’s processing speed: how many tokens it can handle per second.
  • For example, if a model generates 1,000 tokens per sequence across a batch of 50 sequences in 10 seconds, the throughput is (1,000 * 50) / 10 = 5,000 tokens per second.
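
The same arithmetic as a small helper, reproducing the example above:

```python
def tokens_per_second(tokens_per_sequence, batch_size, elapsed_seconds):
    """Throughput = total tokens generated / elapsed wall-clock time."""
    return (tokens_per_sequence * batch_size) / elapsed_seconds

print(tokens_per_second(1_000, 50, 10))  # 5000.0, matching the example above
```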

**Note:** these metrics are important for model monitoring and improvement.