
dchftcs · 2026-06-11T04:45:29 1781153129

At this point, letting an agent go like this is akin to not leashing your dog in public. It's not easy to draw a precise line, but there probably needs to be real punishment for doing these things.

dchftcs · 2026-06-10T02:12:22 1781057542

If there were a viable way to make all projects low-stakes, we'd have done it already. Consider this: microservices.

dchftcs · 2026-06-09T19:35:13 1781033713

I suspect this will be a significant problem blocking long-horizon tasks in practice: the more turns there are, the larger the chance that the classifier produces a false positive somewhere along the way. The user's disappointment will also scale with the length of the task, since you're in the middle of some complex thing and now get derailed, after having already paid for many tokens.
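To put rough numbers on that, here is a minimal sketch. It assumes each turn is checked independently with the same per-turn false positive rate, which is a simplification; the 0.1% rate and the function name are made up for illustration, not measured from any real classifier.

```python
def cumulative_false_positive(p_per_turn: float, turns: int) -> float:
    """Chance of at least one false positive across `turns` checks.

    Assumes every turn is an independent check with the same
    false positive rate `p_per_turn` (a hypothetical value).
    """
    return 1.0 - (1.0 - p_per_turn) ** turns


if __name__ == "__main__":
    # Hypothetical per-turn rate of 0.1%; longer tasks get riskier quickly.
    for turns in (10, 100, 1000):
        risk = cumulative_false_positive(0.001, turns)
        print(f"{turns:>5} turns -> {risk:.1%} chance of at least one false positive")
```

Under these assumptions, a rate that looks negligible for a single turn becomes the likely outcome once a task runs to around a thousand turns.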

dchftcs · 2026-06-08T16:32:11 1780936331

Or kids. Or work.

dchftcs · 2026-06-08T08:57:06 1780909026

It's clear to me they are subsidizing inference in exchange for market share, and doing it at this scale makes the most sense if their target is getting more user data. Note that this sort of pricing isn't far off from the equivalent token-based pricing of ChatGPT or Claude subscription plans, which are more clearly subsidized by the user's data.

dchftcs · 2026-06-07T13:35:54 1780839354

The development and acquisition of valuable domain knowledge is a hard, risky, expensive and slow process, because the valuable domain knowledge isn't yesterday's; it's today's and tomorrow's. In fields where domain knowledge matters, it is also deeply intertwined with engineering - you wouldn't task Jeff Dean with developing Unreal Engine from scratch.

With that said, there are still many SWE principles that are not fully internalized or adequately practiced by domain experts, and that will remain the case for as long as domain knowledge remains valuable, because software engineering is yet another domain.

dchftcs · 2026-06-07T02:35:21 1780799721

This is a lot tamer than what Claude Code's team claims tbf.

dchftcs · 2026-06-03T11:44:05 1780487045

If the problem resolves to P=NP, that result would probably be more celebrated than being able to formulate the problem, but being able to formulate the problem and get people interested in it is probably worth more than the average primal-dual trick to prove a polylog integrality gap for some integer linear program.

dchftcs · 2026-05-30T11:49:23 1780141763

SpaceX is headed by a person who is a strong ally of a politician who openly challenges Denmark's sovereignty over Greenland. Guess you wouldn't mind selling your organs to the same group of powerful people for a few bucks because you're not virtue signalling?

Ray20 · 2026-05-30T12:31:20 1780144280

Yeah, but they say it is because of weapons and climate change and not because of all of this

dchftcs · 2026-05-30T12:55:25 1780145725

Right, you could disagree on which things to prioritize over dollar profits. My main point is that these preferences are not irrational, as was asserted. At the scale of a sovereign wealth fund or a pension fund, you need to care about externalities. In the case of Denmark vs SpaceX you have something relatively concrete; in other cases we need to keep in mind that the goal of these funds is to improve the welfare of those they serve, and see past the dollar signs to take into account the consequences of the investments.

dchftcs · 2026-05-29T16:00:36 1780070436

An article whose title quotes tokens-per-second throughput without any qualifier, e.g. the size of the model, should immediately be classified as spam.