I wish someone would also thoroughly measure prompt processing speeds across the...

JLO64 · 2026-03-17T19:29:22 1773775762

In my use case for small models, I typically generate at most 100 tokens per API call, so prompt processing takes up the majority of the wait time from the user's perspective. I found OAI's models to be quite poor at this and switched to Anthropic's API for that reason alone.

I've found Haiku to be pretty fast at PP (prompt processing), but I would be willing to investigate another provider if it offers faster speeds.

asselinpaul · 2026-03-17T20:53:19 1773780799

OpenRouter has this information

coder543 · 2026-03-17T22:40:27 1773787227

I do not see prompt processing, only some kind of nebulous “throughput” that could be output or input+output, but definitely not input only.
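To make the distinction concrete, here is a rough sketch of how the three figures differ when computed from one streamed response. It assumes you have timed the request yourself and treats time to first token as a stand-in for prompt processing time. That is only an approximation, since it also includes network latency and any queueing, so the input figure is a lower bound. The `StreamTiming` class and the function names are made up for illustration and are not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical container for timings you measured on one streamed call."""
    prompt_tokens: int   # tokens in the request
    output_tokens: int   # tokens generated
    ttft_s: float        # seconds from sending the request to the first token
    total_s: float       # seconds from sending the request to the last token


def _check(t: StreamTiming) -> None:
    # Guard against timings that would make the rates meaningless.
    if t.ttft_s <= 0 or t.total_s <= t.ttft_s:
        raise ValueError("expected 0 < ttft_s < total_s")


def input_tokens_per_s(t: StreamTiming) -> float:
    """Input-only rate: prompt tokens divided by time to first token."""
    _check(t)
    return t.prompt_tokens / t.ttft_s


def output_tokens_per_s(t: StreamTiming) -> float:
    """Output-only rate: tokens after the first, over the time after the first."""
    _check(t)
    return (t.output_tokens - 1) / (t.total_s - t.ttft_s)


def blended_tokens_per_s(t: StreamTiming) -> float:
    """Input+output rate: the kind of single number that hides both of the above."""
    _check(t)
    return (t.prompt_tokens + t.output_tokens) / t.total_s


# Made-up numbers: a long prompt with a short reply.
timing = StreamTiming(prompt_tokens=8000, output_tokens=100, ttft_s=2.0, total_s=3.0)
print(input_tokens_per_s(timing))    # 4000.0
print(output_tokens_per_s(timing))   # 99.0
print(blended_tokens_per_s(timing))  # 2700.0
```

With these made-up numbers, two thirds of the wait is spent before the first token arrives, and neither the output rate nor the blended rate would tell you that.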