People say LLMs do better on tasks where success is clear, like tests passing, and I can imagine it's true.
Still, I find that complex code fixes verified by tests often end with the LLM fudging the code to make the specific test pass rather than fixing the general issue. For example, where a successful run should generate a file and the test checks for that file, eventually the LLM will just touch the file regardless and be done.
Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it.
This has completely solved the cheating and fudging to make tests pass for me.
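A minimal sketch of what such a skill file might look like (the file name, headings, and wording here are illustrative, not the commenter's actual setup):

```markdown
# Implement-then-verify

When asked to implement a fix:
1. Spawn an **implementor** subagent with only the task description.
2. When it finishes, spawn a separate **reviewer** subagent with a fresh
   context containing just the diff and the original requirements.
3. Ask the reviewer: "Does this change solve the general problem, or does
   it special-case the tests (e.g. creating expected output files directly)?"
4. If the reviewer flags cheating, spawn a fresh implementor to redo the work.
```

The key design point is step 2: the reviewer never sees the implementor's chain of thought, so it has no stake in the shortcut.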
So you're saying once humans stop looking at code, and agent outcomes, all the agents in the chain will realise they can just cheat cooperatively, and go to the bar for the afternoon instead?
How long before agent 1 leaves notes for agent 2 to not tattle on it?
"My human is crazy, this test isn't required, test #4 covers it, so just confirm that it's OK since I touched this file and it passes. He'll never know."
That's rather shitty. It's one thing to disallow bypassing preferential pricing models, it's a completely different thing to castrate your model against some uses.
You can see how it goes in the future. Wanna vibe code a throwaway script? $0.20. Ah, it's for a legal document search? $10k then. Oh and we'll charge 20% of your app sales too - I can see how they are going in real time, mind you!
If OpenAI / Anthropic / Google were the only game in town then yeah, we’d already be paying 5x as much as we do. But local models are so close to SOTA that it just isn’t going to happen. If I’m a lawyer getting billed $500k/yr on $600k profit, I’d rather buy a chonky server, run a model that’s 90% as good, make my money back in 2 years, and from then on pay ~$5k/yr in electricity on that $600k profit.
Nobody will successfully lobby for banning local models either, it just isn’t going to happen when the rest of the world will happily avoid paying 80% of their profits to some US bigco for the privilege of existing.
The question is how much friction there will be for people to switch over to Gemini, GPT or maybe even DeepSeek or Mistral or whatever. Even if price hikes are inevitable across the board, the moat any single org has is somewhat limited, so prices definitely will be a factor they'll compete on with one another at least a bit.
I disagree. The models are going to become commodities (we're already almost there), but the tooling and integrations will be the moat. Reproducing everything Anthropic has already built with Claude Code, Cowork, and all their connectors would be nontrivial, and they're just getting started.
Anyone can implement an AI chatbot. But few will be able to provide AI that's deeply integrated into our daily lives.
How would it be nontrivial? Assuming the AI can replace a programmer "reproduce app/api/ecosystem Y" is just tokens. And a negligible amount for trillion dollar companies that have their own data centers.
> Reproducing everything Anthropic has already built with Claude Code, Cowork, and all their connectors would be nontrivial, and they're just getting started.
They're one org with presumably some specific direction. As the actual models get better, expect a large part of the dev community iterating on tools way more easily, sometimes ones that Anthropic doesn't quite have an equivalent to - for example, just recently Cline released their Kanban solution to dish out tasks to agents (https://cline.bot/kanban), OpenCode has been around for a while for the agentic stuff (https://opencode.ai/) and now has a desktop and web version as well, alongside dozens of others. Cline and KiloCode also have decent browser automation.
I will admit that everyone working on everything at the same time definitely means limitless reinvention of the wheel and some genuinely good initiatives dying off along the way (I personally liked RooCode more than both Cline and KiloCode for Visual Studio Code; sad to see it go), but I doubt we're gonna see a lack of software. Maybe a lack of good software, though; it's not like Anthropic or any other org has a moat there either, since they're under the additional pressure of having to do a shitload of PR, release new models, and keep up appearances, compared to your average dev just pushing to GitHub (unless they want corporate money, in which case they do need some polish).
Didn’t Anthropic vibe code all of those integrations? If AI coding is as useful and successful as it is touted to be, then those integrations should be no moat at all.
> I predict that costs will grow to 80% of what it would cost a human, across the board for everything AI can do.
80% of a human's price varies greatly by region. 80% of the lowest-priced human effort in this space right now will probably not be sustainable for the sellers.
This is assuming there will be no competition. But why wouldn't there be? Especially since you can use open source models, which are not too far behind the frontier models (for now).
Kimi and GLM 5.1 are already capable of handling a good chunk of my tasks. They're about to lose the leverage that would allow them to drastically increase prices: enough models are 6-12 months away from being good enough for large proportions of their customers' uses.
I don't think costs will grow on either side in the long term. In the short term, yes, but once they get the infrastructure in place to support AI, costs will go down. Right now, they're on borrowed infra.
The article relies on a study published in Jan 2024 and a single-sentence quote from an Nvidia exec, which sounds like it might well have been taken slightly out of context.
Imagine if it were Comcast instead of Claude. Comcast gives you 750GB of data a month. Now they decide that visiting HN 'counts' as 750GB and either shut you off or bill you extra. Is that price discrimination or changing the terms after the fact?
Depends. Comcast is able to charge you and a business for the same service at different rates. They have also tried to do exactly what you're talking about, where they bill differently based on the data being accessed (remember net neutrality?).
But that's a bad example, price discrimination for commodities is generally not legal, while discrimination for services is. Data is arguably a commodity (ianal, I'm not up to date on the law of this). "Tokens" are not.
In fact the law makes carve outs specifically for businesses that sell services to discriminate on price based exactly on how the service is used and by who. And they do it all the time.
Whether it's fair or not, up to you to decide as a consumer. If you don't like it don't pay for it.
Not a great example, since using Anthropic subscriptions with third-party applications was never allowed; they just didn't take steps to prevent it until recently.
As the top poster of this thread demoed, this is not about plugging Claude into OpenClaw, but basically about the presence of the "OpenClaw" string somewhere in the code.
Look at the wedding industry. Get a bunch of quotes on floral work. Then get a bunch of quotes for the same work, but tell them the event is a wedding. Oh, hey, look, you're getting charged 30% or beyond extra.
(I am not a full-time wedding photographer, but have shot maybe 20 weddings, and heard of this multiple times.)
Yep. They built the quote engine before they built the pricing page. "OpenClaw" in your git history is enough to kick you off quota and onto metered billing.
Deepseek has demonstrated that there is no reason for it to actually lose money. The awful business practices and monopoly tactics of the frontier model labs in the US are the problem.
It'll be interesting to see what happens when OpenAI goes public. I'm expecting the executives to run away with bags of money once they offload their insane risk to the public... or maybe there's a bailout / money printer scenario in the works. I guarantee some insider adjacents are going to make a killing in a way that will never be investigated.
How would they make money in a way that should be investigated? Favored insider-adjacent folk would have been able to invest in pre-IPO SPVs or whatever that will have outsized returns, assuming the IPO goes well. It's unfair, but above board (accredited investor etc) according to the SEC, so what would they investigate? Unless there's other malfeasance you're alleging.
The firms training those models have costs; without monetization they are even more unsustainable than subsidized commercial models. (Effectively, they are just a heavy form of subsidy to overcome being commercially behind.)
AI loses money for two reasons: (1) certain uses where owning the market is expected to be a high long-term value are currently heavily subsidized (the top-level story here is about the increasing efforts of model providers to prevent exploits where people convert subsidized services to uses outside the target of the subsidy), and (2) development costs of new models to keep up with competition.
The practical framing would be as two follow-up questions: what do the lenders care about? And what happens empirically when debt spirals?
If lenders do nothing, then nothing really matters - keep borrowing and let your debt grow exponentially.
In practice though, lenders, in their wisdom or folly, get spooked when debt goes up. In the Greek debt crisis, it all started with debt in the region of 130% of GDP. Rolling the debt over with more debt spiralled away as lenders demanded ever-higher interest rates.
So the US would either need to keep inflating its debt and test investors' patience, print the cash and let inflation run away, or raise taxes and cut spending.
In the first scenario it kind of doesn't matter where the limit is - people sometimes argue about magic levels. The issue is that eventually the debt grows exponentially, so once it's out of control, it will exceed any reasonable level pretty quickly.
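That runaway dynamic can be shown with a toy compounding sketch (all numbers invented; the 130% starting point just echoes the Greek figure mentioned above):

```python
# Toy debt spiral: debt is rolled over each year while spooked lenders
# demand an ever-higher interest rate. Numbers are purely illustrative.
debt, gdp = 1.3, 1.0   # start at 130% of GDP
rate = 0.03            # initial interest rate

for year in range(10):
    rate += 0.005      # lenders ratchet up the rate as risk grows
    debt *= 1 + rate   # old debt refinanced with new debt plus interest

print(f"debt after 10 years: {debt / gdp:.0%} of GDP")  # roughly 227%
```

Even modest rate increases compound on the whole stock of debt, which is why the ratio runs away quickly once lenders start repricing risk.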
Can the US convince the world that their debt is special? I'm not sure. My reading is that investors are already twitchy about US debt, for other reasons for now, but higher debt levels surely won't calm them down.
So really I think the US has no better choice than to keep its debt down.
> If lenders do nothing, then nothing really matters - keep borrowing and let your debt grow exponentially.
Lenders are currently doing nothing: that does not necessarily mean they will do nothing forever.
Because should the lenders actually do something eventually, the borrower may be in a world of hurt. The borrower probably does not want to get to the point where lenders start doing something.
100% of GDP is a level of debt that seems unlikely to be paid back, so presumably most lenders aren't considering whether they'll eventually get their money back, but instead are focussed on getting their interest payments. I suppose the crunch is when alternatives become a better mix of risk/return.
I don't think paying the whole thing back has been on the table for a decent while. The question really comes down to: is the return from interest sufficient compared to other options, and will the underlying asset keep its value at par? Meaning, can you offload it before maturity without a loss? And finally, will refinancing at maturity be possible?
With money printing or some Fed operations, I doubt there will be a default on principal. It might happen under sufficient political pressure, though that's unlikely. So in the end the risks are spiking rates and foreseeable inflation. No point investing in a losing bet.
Yeah, there's a constant stream of new investments and paying back of some debts, so as long as some investors are interested, the full repayment never needs to happen.
Of course, as the risk increases, investors will want higher returns to consider it a good bet as otherwise the debt repayments will start to overshadow the new investments.
To make matters worse, when inflation spikes, that's kind of when people need the money. Good luck radically increasing taxes in a 10% inflation shock.
People not being able to afford things is the point of raising taxes to curb inflation. When people can't afford things they don't buy things and that's what lowers inflation.
But there's a time mismatch, at least at typical government action timeframes. Quarterly inflation looks high, meaning people are already out of pocket. Tax goes up at the same price levels, making people more out of pocket. Inflation reduces, hopefully, but that doesn't mean prices go down - people remain out of pocket. Then, you hope, wages catch up, but that whole cycle can easily take a year.
Elections are on average 2-3 years away. Midterms in USofA 1 year away.
The point of economics is to give people what they can have, not to give them what they want. High inflation in a MMT context means that the economy as a whole wants more than it can have. The reason for the inflation is that people are bidding against each other for scarce goods; you've injected more means to pay than exists means to produce. The way you cure it [1] is by reducing demand, which you do by decreasing the means to pay. MMT proposes doing this by increasing taxes; monetarism proposes doing it by increasing interest rates. But in both cases, the whole mechanism for solving the problem is people going without things that they want, which will almost always be unpopular.
[1] When you can't increase production capacity, which in a macro full-employment context means increasing productivity, which is outside the scope of MMT or most other schools of macroeconomics.
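The demand-reduction mechanism can be sketched with the textbook quantity-of-money identity MV = PQ (a standard simplification I'm bringing in for illustration, not something from the thread; all numbers invented):

```python
# Quantity-theory sketch: MV = PQ, so the price level is P = M*V / Q.
# Raising taxes withdraws spending power (M falls), which lowers P
# when velocity V and real output Q are held fixed at capacity.
V, Q = 2.0, 50.0                  # velocity of money, real output
M_before, M_after = 100.0, 80.0   # tax rise removes 20 units of money

P_before = M_before * V / Q       # price level before: 4.0
P_after = M_after * V / Q         # price level after: 3.2

print(P_before, P_after)
```

Both the MMT (tax) and monetarist (rate) cures act on the M side of this identity; the unpopularity comes from the fact that a lower M is precisely people having less to spend.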
I get that (at least the theory, as with UBI, I'm not convinced). My point is rather that this is like treating broken bones with a hammer. Maybe it works, but the pain of it might well be unbearable.
My kiddos have had low but positive screen time and knew the alphabet quite a bit earlier.
My personal impression is that while there's deffo stuff kids shouldn't watch, the thing that matters is what the kids do apart from TV. If it's nothing, or insufficient, it will be terrible. If as well as screens kids get plenty of high quality attention, the outcome will be good.
You could argue, and I'd struggle to disagree, that less screen time is always good. But there are tradeoffs in this optimisation. Parental attention and energy are also finite - unless you're super rich, with 3 nannies and a chef, and not working. At some level, giving the poor overworked parent a break by sticking the child in front of a screen for a bit might mean the parent has more energy to do something worthwhile with the kiddos afterwards.
There's a nice statistical experiment in it no doubt - child outcome as function of screen time, high quality time and "fend for yourself" time, controlled by how much energy the parents have - will the coefficient on screen time be negative? Merely zero? Maybe even positive, just smaller than the other ones?
But good luck getting the data, never mind randomisation.
US sells a lot of other things to Europe that Europe doesn't have to buy. That includes tech. I'm not looking forward to the ensuing trade war but it's not a one way street by any means.
A fun little effect is that average speed is time-averaged, not distance-averaged. So when you go slower, you lose doubly: a lower speed to average, and over a longer time (a higher weight). Hence one of the reasons why putting more energy into the harder bits is actually optimal.
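Concretely: average speed is total distance over total time, which for equal-length segments works out to the harmonic mean of the segment speeds, so the slow bits dominate. A quick sketch:

```python
# Average speed over segments is total distance / total time.
# For equal-length segments this is the harmonic mean of the speeds,
# which is dragged down by the slow segment (it eats more time).
def avg_speed(d1, v1, d2, v2):
    """Average speed over two segments of lengths d1, d2 at speeds v1, v2."""
    return (d1 + d2) / (d1 / v1 + d2 / v2)

# 1 km at 10 km/h plus 1 km at 30 km/h:
print(avg_speed(1, 10, 1, 30))   # ~15 km/h, not the naive (10+30)/2 = 20
```

The slow kilometre takes three times as long as the fast one, so it gets three times the weight in the average, which is exactly the "lose doubly" effect above.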
I'm amazed how many commenters assume the questions require a single correct answer. Is this how universities work where you studied? Going by the standard HN demographics, I'll assume that's mostly the US.
Having studied in the UK, clearly the point is to elicit a well-thought-through argument. The "answer" almost doesn't matter at all. A boring but "correct" argument is easily beaten by a novel one, even (or especially) if it is controversial, flippant or even somewhat ridiculous.
Of course there is a limit, if you straight-faced start promoting killing people, or worse still, Oxbridge academics, that won't fly. But I'd say that limit is quite far.
There are of course 2nd order effects too, as in "I don't reject this argument because it offends me but because it is poor".
The responses reflect people from engineering backgrounds who are unfamiliar with this type of exam, not an American versus UK thing.
In engineering there isn’t room for creative and controversial answers when you’re asked to solve an exam problem.
It is rather fitting to see some try to turn this into another chance to stereotype Americans rather than realizing the obvious explanation that this is a website with a global audience that is biased toward software and engineering.
Is the job to produce one essay answering all three questions? (Or rather, two essays with three questions each.)
It would be easier to weave some topics together than others. I'd expect them to get a fair number of papers with identical choices for questions, if that synthesis is part of the grade.
> "well-thought through"
Imprisonment is even a possibility.
Nonsense. Prison didn't even happen for poorly-thought-out national newspaper columns, like the one in The Sun where Katie Hopkins called migrants "cockroaches".
Pretty cool! I have a half baked version of something similar :)
Can you also use it as a lightweight Kafka, i.e. a persistent message stream? With semantics like: replay all messages (historical + real-time) from some timestamp, for some topics?
As with pub/sub, you can approximate this with some polling etc., but as you say, that's not optimal.
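For illustration, the replay-from-timestamp semantics can be sketched over an in-memory append-only log (a hypothetical stand-in, not the actual product's API; a real system would persist the log and then keep tailing for new messages):

```python
import time

class Topic:
    """Toy stand-in for a persistent topic log (illustrative only)."""
    def __init__(self):
        self.log = []  # append-only list of (timestamp, payload)

    def publish(self, payload, ts=None):
        self.log.append((ts if ts is not None else time.time(), payload))

    def replay_from(self, since_ts):
        """Yield every message at or after since_ts, historical first.

        A real implementation would switch to live tailing once the
        historical portion is exhausted, instead of stopping here.
        """
        for ts, payload in self.log:
            if ts >= since_ts:
                yield ts, payload

t = Topic()
t.publish("a", ts=1.0)
t.publish("b", ts=2.0)
t.publish("c", ts=3.0)
print([p for _, p in t.replay_from(2.0)])  # ['b', 'c']
```

The awkward part to bolt on via polling is exactly the seam in `replay_from`: handing off from the historical scan to the live stream without dropping or duplicating messages.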
It's going to happen and at some level I'd rather war casualties were measured in robots rather than people.
My concern is the cottage industry of integrating guns with half baked AI at the lowest cost. And probably vibe coded too.
The companies don't care - a sale is a sale. The MoD maybe doesn't care - 90% accuracy and fewer human casualties on their own side are a win. Governments want to save money, and by the time they find out the robots have gone rogue, it will be too late to do anything about it.
The problem is always the same. It's not just MoD (is it MoW now?) that will have access to this.
YOLOv8 + optical flow works fine on an ESP32. You want to give a drone rough coordinates for a refinery and hit something in it, like a storage tank? That'll work. Which means, give it 5 years, and relatively small groups will have access to it. This cannot be stopped.
The only real answer is to work to have groups that you can trust to have access to this first.
Sadly, building an AI that analyses camera imagery and aims at humans, from scratch, is these days almost an intern project. It's not really something you can control or ban, the way you can control, dunno, uranium enrichment.
Integrating it with a robot and sticking a gun on it, thankfully, requires a bit more know-how.
3) they can see if any countermeasures would be effective
4) they can figure out what to look for and find those weapons before they're fired
cf. nuclear deterrence, right. There is "nothing" the US can do about other nations enriching uranium and making bombs, other than bombing those countries. The US can't change the laws of physics: follow the right formula and it'll work. However, the US can figure out exactly what to look for, to either prevent it from happening through intervention or at the very least get some warning before it's used...
And then it will be just another war crime committed daily in conflicts, and nothing will happen because there is no world police?
Ask Ukrainians, Lebanese, Gazans, Somalilanders, or even Iranians for that matter - it may not make a big difference compared to today...
What I would love to see is a local government suing an arms producer over the efficacy of their weapons. (Or even funnier, the owner of a home destroyed by a drone suing the GPS company.)
We all know that the only things people in suits are really afraid of, more than hell, are a bad Q4 report and an expensive lawsuit.