Hacker News
adityamwagh
17 hours ago
on: SWE-bench Verified no longer measures frontier cod...
> We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.
No shit, Sherlock!