
I wish someone would also thoroughly measure prompt processing speeds across the major providers. Output speeds are useful too, but they are already more commonly measured.


In my use case for small models, I typically generate at most 100 tokens per API call, so prompt processing takes up the majority of the wait time from the user's perspective. I found OAI's models to be quite poor at this and switched to Anthropic's API for that reason alone.

I've found Haiku to be pretty fast at PP, but I would be willing to investigate another provider if it offers faster speeds.


OpenRouter has this information.


I do not see prompt processing, only some kind of nebulous “throughput” that could be output or input+output, but definitely not input only.
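To make the distinction concrete, here is a rough Python sketch of how the three numbers differ. The function names are my own, the figures in the usage lines are made up for illustration, and it assumes time to first token (TTFT) is a usable proxy for prompt processing time. Measured from the client side, TTFT also includes network latency and queueing, so the input-only figure is a lower bound at best.

```python
def prefill_tokens_per_second(prompt_tokens: int, ttft_s: float) -> float:
    """Input-only speed: prompt tokens divided by time to first token.

    Assumption: TTFT is dominated by prompt processing. Network and
    queueing delays are included, so this underestimates the true speed.
    """
    return prompt_tokens / ttft_s


def decode_tokens_per_second(output_tokens: int, total_s: float, ttft_s: float) -> float:
    """Output-only speed: generated tokens over the time after the first token."""
    return output_tokens / (total_s - ttft_s)


def blended_tokens_per_second(prompt_tokens: int, output_tokens: int, total_s: float) -> float:
    """Input+output 'throughput': all tokens over the whole request duration."""
    return (prompt_tokens + output_tokens) / total_s


# Hypothetical request: 8000 prompt tokens, 100 output tokens,
# first token after 2.0 s, request finished after 3.0 s.
prefill = prefill_tokens_per_second(8000, 2.0)
decode = decode_tokens_per_second(100, 3.0, 2.0)
blended = blended_tokens_per_second(8000, 100, 3.0)
print(prefill, decode, blended)
```

With a long prompt and a short completion, the three numbers can differ by more than an order of magnitude, which is why a single unlabeled "throughput" figure does not answer the question.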



