Latency

Simon Budziak, CTO
Latency in AI refers to the time delay between sending a request to a model and receiving the response. It is a critical metric for user experience, especially in real-time applications like voice agents or interactive chatbots.
It is often measured in:
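- Time to First Token (TTFT): How long it takes from sending the request until the model starts writing, that is, until the first token of the response arrives.
- Total Generation Time: How long it takes from sending the request until the complete answer is finished.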
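
As a rough illustration, both metrics can be captured by timing a streamed response. The sketch below is a minimal example under stated assumptions, not a reference implementation: `measure_latency` and `fake_model_stream` are hypothetical names, and the fake stream stands in for whatever streaming client a real application would use. It assumes the stream is lazy, so the request is effectively sent when iteration begins.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple


def measure_latency(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Return (ttft_seconds, total_seconds, full_text) for a token stream.

    Assumption: the stream is lazy, so the clock started here covers the
    whole request. With a real client, start the clock before sending.
    """
    start = time.perf_counter()
    first_token_at: Optional[float] = None
    parts: List[str] = []

    for token in stream:
        if first_token_at is None:
            # The first token has arrived: this marks the TTFT.
            first_token_at = time.perf_counter()
        parts.append(token)

    end = time.perf_counter()
    # If no token ever arrived, report TTFT as equal to the total time.
    ttft = (first_token_at if first_token_at is not None else end) - start
    return ttft, end - start, "".join(parts)


def fake_model_stream(
    tokens: List[str], first_delay: float = 0.05, per_token_delay: float = 0.01
) -> Iterator[str]:
    """Hypothetical stand-in for a model: waits, then yields tokens one by one."""
    time.sleep(first_delay)  # simulated delay before the first token
    for i, token in enumerate(tokens):
        if i > 0:
            time.sleep(per_token_delay)  # simulated per-token generation time
        yield token


if __name__ == "__main__":
    ttft, total, text = measure_latency(fake_model_stream(["Hel", "lo", "!"]))
    print(f"TTFT: {ttft * 1000:.1f} ms, total: {total * 1000:.1f} ms, text: {text!r}")
```

For a voice agent or an interactive chatbot, TTFT is usually the number the user perceives first, while total generation time matters more when the full answer is needed before anything can be shown or spoken.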