
adityamwagh 17 hours ago | on: SWE-bench Verified no longer measures frontier cod...

> We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.

No shit, Sherlock!
