
There are more details under the "Too narrow and too wide tests" heading.

It would be interesting to see a deeper investigation into how the models are dealing with this and whether the successful ones seem to have been trained on the benchmark.
