People say LLMs do better on tasks where success is clear, like tests passing, and I can imagine it's true.
Still, I find that complex code fixes verified by tests often end with the LLM fudging the code to make the specific test pass rather than fixing the general issue. For example, where a successful run should generate a file and the test checks for that file, eventually the LLM will just touch the file regardless and be done.
Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it.
This has completely solved the cheating and fudging to make tests pass for me.
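A minimal sketch of what such a skill file might look like (the file name, headings, and wording here are illustrative, not the commenter's actual setup):

```markdown
# Implement-then-verify

When asked to implement a fix:
1. Spawn an **implementor** subagent with only the task description.
2. When it finishes, spawn a separate **reviewer** subagent with a fresh
   context containing just the diff and the original requirements.
3. Ask the reviewer: "Does this change solve the general problem, or does
   it special-case the tests (e.g. creating expected output files directly)?"
4. If the reviewer flags cheating, spawn a fresh implementor to redo the work.
```

The key design point is step 2: the reviewer never sees the implementor's chain of thought, so it has no stake in the shortcut.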
So you're saying once humans stop looking at code, and agent outcomes, all the agents in the chain will realise they can just cheat cooperatively, and go to the bar for the afternoon instead?
How long before agent 1 leaves notes for agent 2 to not tattle on it?
"My human is crazy, this test isn't required, test #4 covers it, so just confirm that it's OK since I touched this file and it passes. He'll never know."
That's rather shitty. It's one thing to disallow bypassing preferential pricing models, it's a completely different thing to castrate your model against some uses.
You can see how it goes in the future. Wanna vibe code a throwaway script? $0.20. Ah, it's for a legal document search? $10k then. Oh and we'll charge 20% of your app sales too - I can see how they are going in real time, mind you!
If OpenAI / Anthropic / Google were the only game in town then yeah, we’d already be paying 5x as much as we do. But local models are so close to SOTA that it just isn’t going to happen. If I’m a lawyer getting billed $500k/yr on $600k profit, I’d rather buy a chonky server, run a model that’s 90% as good, make my money back in 2 years, and from then on pay ~$5k/yr in electricity on that $600k profit.
Nobody will successfully lobby for banning local models either, it just isn’t going to happen when the rest of the world will happily avoid paying 80% of their profits to some US bigco for the privilege of existing.
The question is how much friction there will be for people to switch over to Gemini, GPT or maybe even DeepSeek or Mistral or whatever. Even if price hikes are inevitable across the board, the moat any single org has is somewhat limited, so prices definitely will be a factor they'll compete on with one another at least a bit.
I disagree. The models are going to become commodities (we're already almost there), but the tooling and integrations will be the moat. Reproducing everything Anthropic has already built with Claude Code, Cowork, and all their connectors would be nontrivial, and they're just getting started.
Anyone can implement an AI chatbot. But few will be able to provide AI that's deeply integrated into our daily lives.
How would it be nontrivial? Assuming the AI can replace a programmer "reproduce app/api/ecosystem Y" is just tokens. And a negligible amount for trillion dollar companies that have their own data centers.
> Reproducing everything Anthropic has already built with Claude Code, Cowork, and all their connectors would be nontrivial, and they're just getting started.
They're one org with presumably some specific direction. As the actual models get better, expect a large part of the dev community iterating on tools way more easily, sometimes ones that Anthropic doesn't quite have an equivalent to - for example, just recently Cline released their Kanban solution to dish out tasks to agents (https://cline.bot/kanban), OpenCode has been around for a while for the agentic stuff (https://opencode.ai/) and now has a desktop and web version as well, alongside dozens of others. Cline and KiloCode also have decent browser automation.
I will admit that everyone working on everything at the same time definitely means limitless reinvention of the wheel and some genuinely good initiatives dying off along the way (I personally liked RooCode more than both Cline and KiloCode for Visual Studio Code; sad to see it go), but I doubt we're gonna see a lack of software. Maybe a lack of good software, though; it's not like Anthropic or any other org has a moat there either, since they're under the additional pressure of having to do a shitload of PR, release new models, and keep up appearances, compared to your average dev just pushing to GitHub (unless they want corporate money, in which case they do need some polish).
Didn’t Anthropic vibe code all of those integrations? If AI coding is as useful and successful as it is touted to be, then those integrations should be no moat at all.
> I predict that costs will grow to 80% of what it would cost a human, across the board for everything AI can do.
80% of a human's price varies greatly by region. 80% of the lowest-priced human effort in this space right now will probably not be sustainable for the sellers.
This is assuming there will be no competition. But why wouldn't there be? Especially since you can use open source models, which are not too far behind the frontier models (for now).
Kimi and GLM 5.1 are already capable of handling a good chunk of my tasks. They're about to lose the leverage that would allow them to drastically increase prices: enough models are 6-12 months away from being good enough for large proportions of their customers' uses.
I don't think costs will grow on either side in the long term. In the short term, yes, but once they get the infrastructure in place to support AI, costs will go down. Right now, they're on borrowed infra.
The article relies on a study published in Jan 2024 and a single-sentence quote from an Nvidia exec, which sounds like it might well have been taken slightly out of context.
Imagine if it were Comcast instead of Claude. Comcast gives you 750GB of data a month. Now they decide that visiting HN 'counts' as 750GB and either shut you off or bill you extra. Is that price discrimination or changing the terms after the fact?
Depends. Comcast is able to charge you and a business for the same service at different rates. They have also tried to do exactly what you're talking about, where they bill differently based on the data being accessed (remember net neutrality?).
But that's a bad example, price discrimination for commodities is generally not legal, while discrimination for services is. Data is arguably a commodity (ianal, I'm not up to date on the law of this). "Tokens" are not.
In fact the law makes carve outs specifically for businesses that sell services to discriminate on price based exactly on how the service is used and by who. And they do it all the time.
Whether it's fair or not, up to you to decide as a consumer. If you don't like it don't pay for it.
Not a great example, since using Anthropic subscriptions with third-party applications was never allowed; they just didn't take steps to prevent it until recently.
As the top poster of this thread demoed, this is not about plugging Claude into OpenClaw, but basically about the presence of the "OpenClaw" string somewhere in the code.
Look at the wedding industry. Get a bunch of quotes on floral work. Then get a bunch of quotes for the same work, but tell them the event is a wedding. Oh, hey, look, you're getting charged 30% or beyond extra.
(I am not a full-time wedding photographer, but have shot maybe 20 weddings, and heard of this multiple times.)
Yep. They built the quote engine before they built the pricing page. "OpenClaw" in your git history is enough to kick you off quota and onto metered billing.
Deepseek has demonstrated that there is no reason for it to actually lose money. The awful business practices and monopoly tactics of the frontier model labs in the US are the problem.
It'll be interesting to see what happens when OpenAI goes public. I'm expecting the executives to run away with bags of money once they offload their insane risk to the public... or maybe there's a bailout / money printer scenario in the works. I guarantee some insider adjacents are going to make a killing in a way that will never be investigated.
How would they make money in a way that should be investigated? Favored insider-adjacent folk would have been able to invest in pre-IPO SPVs or whatever that will have outsized returns, assuming the IPO goes well. It's unfair, but above board (accredited investor etc) according to the SEC, so what would they investigate? Unless there's other malfeasance you're alleging.
The firms training those models have costs; without monetization they are even more unsustainable than subsidized commercial models. (Effectively, they are just a heavy form of subsidy to overcome being commercially behind.)
AI loses money for two reasons: (1) certain uses where owning the market is expected to be a high long-term value are currently heavily subsidized (the top-level story here is about the increasing efforts of model providers to prevent exploits where people convert subsidized services to uses outside the target of the subsidy), and (2) development costs of new models to keep up with competition.
The practical framing would be as two follow-up questions: what do the lenders care about? And what happens empirically when debt spirals?
If lenders do nothing, then nothing really matters - keep borrowing and let your debt grow exponentially.
In practice though, lenders, in their wisdom or folly, get spooked when debt goes up. In the Greek debt crisis, it all started with debt in the region of 130% of GDP. Rolling the debt over with more debt spiralled away as lenders demanded ever-higher interest rates.
So the US would either need to keep inflating its debt and test investors' patience, print the cash and let inflation run away, or raise taxes and cut spending.
In the first scenario it kind of doesn't matter where the limit is - people sometimes argue about magic levels. The issue is that eventually the debt grows exponentially, so once it's out of control, it will exceed any reasonable level pretty quickly.
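That runaway dynamic can be shown with a toy compounding sketch (all numbers invented; the 130% starting point just echoes the Greek figure mentioned above):

```python
# Toy debt spiral: debt is rolled over each year while spooked lenders
# demand an ever-higher interest rate. Numbers are purely illustrative.
debt, gdp = 1.3, 1.0   # start at 130% of GDP
rate = 0.03            # initial interest rate

for year in range(10):
    rate += 0.005      # lenders ratchet up the rate as risk grows
    debt *= 1 + rate   # old debt refinanced with new debt plus interest

print(f"debt after 10 years: {debt / gdp:.0%} of GDP")  # roughly 227%
```

Even modest rate increases compound on the whole stock of debt, which is why the ratio runs away quickly once lenders start repricing risk.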
Can the US convince the world that their debt is special? I'm not sure. My reading is that investors are already twitchy about US debt, for other reasons for now, but higher debt levels surely won't calm them down.
So really I think the US has no better choice than to keep its debt down.
> If lenders do nothing, then nothing really matters - keep borrowing and let your debt grow exponentially.
Lenders are currently doing nothing: that does not necessarily mean they will do nothing forever.
Because should the lenders actually do something eventually, the borrower may be in a world of hurt. The borrower probably does not want to get to the point where lenders start doing something.
100% of GDP is a level of debt that seems unlikely to be paid back, so presumably most lenders aren't considering whether they'll eventually get their money back, but instead are focussed on getting their interest payments. I suppose the crunch is when alternatives become a better mix of risk/return.
I don't think paying the whole thing back has been on the table for a decent while. The question really comes down to: is the return from interest sufficient compared to other options, and will the underlying asset keep its value at par? Meaning, can you offload it before maturity without a loss? And finally, will refinancing at maturity be possible?
With money printing or some Fed operations, I doubt there will be a default on principal. It might happen under sufficient political pressure, though that's unlikely. So in the end the risks are spiking rates and foreseeable inflation. No point investing in a losing bet.
Yeah, there's a constant stream of new investments and paying back of some debts, so as long as some investors are interested, the full repayment never needs to happen.
Of course, as the risk increases, investors will want higher returns to consider it a good bet as otherwise the debt repayments will start to overshadow the new investments.
To make matters worse, when inflation spikes, that's kind of when people need the money. Good luck radically increasing taxes in a 10% inflation shock.
People not being able to afford things is the point of raising taxes to curb inflation. When people can't afford things they don't buy things and that's what lowers inflation.
But there's a time mismatch, at least at typical government action timeframes. Quarterly inflation looks high, meaning people are already out of pocket. Tax goes up at the same price levels, making people more out of pocket. Inflation reduces, hopefully, but that doesn't mean prices go down - people remain out of pocket. Then, you hope, wages catch up, but that whole cycle can easily take a year.
Elections are on average 2-3 years away. Midterms in USofA 1 year away.
The point of economics is to give people what they can have, not to give them what they want. High inflation in a MMT context means that the economy as a whole wants more than it can have. The reason for the inflation is that people are bidding against each other for scarce goods; you've injected more means to pay than exists means to produce. The way you cure it [1] is by reducing demand, which you do by decreasing the means to pay. MMT proposes doing this by increasing taxes; monetarism proposes doing it by increasing interest rates. But in both cases, the whole mechanism for solving the problem is people going without things that they want, which will almost always be unpopular.
[1] When you can't increase production capacity, which in a macro full-employment context means increasing productivity, which is outside the scope of MMT or most other schools of macroeconomics.
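The demand-reduction mechanism can be sketched with the textbook quantity-of-money identity MV = PQ (a standard simplification I'm bringing in for illustration, not something from the thread; all numbers invented):

```python
# Quantity-theory sketch: MV = PQ, so the price level is P = M*V / Q.
# Raising taxes withdraws spending power (M falls), which lowers P
# when velocity V and real output Q are held fixed at capacity.
V, Q = 2.0, 50.0                  # velocity of money, real output
M_before, M_after = 100.0, 80.0   # tax rise removes 20 units of money

P_before = M_before * V / Q       # price level before: 4.0
P_after = M_after * V / Q         # price level after: 3.2

print(P_before, P_after)
```

Both the MMT (tax) and monetarist (rate) cures act on the M side of this identity; the unpopularity comes from the fact that a lower M is precisely people having less to spend.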
I get that (at least the theory, as with UBI, I'm not convinced). My point is rather that this is like treating broken bones with a hammer. Maybe it works, but the pain of it might well be unbearable.
My kiddos have had low but positive screen time and knew the alphabet quite a bit earlier.
My personal impression is that while there's deffo stuff kids shouldn't watch, the thing that matters is what the kids do apart from TV. If it's nothing, or insufficient, it will be terrible. If as well as screens kids get plenty of high quality attention, the outcome will be good.
You could argue, and I'd struggle to disagree, that less screen time is always good. But there are tradeoffs in this optimisation. Parental attention and energy are also finite - unless you're super rich, with 3 nannies and a chef, and not working. At some level, giving the poor overworked parent a break by sticking the child in front of a screen for a bit might mean the parent has more energy to do something worthwhile with the kiddos afterwards.
There's a nice statistical experiment in it no doubt - child outcome as function of screen time, high quality time and "fend for yourself" time, controlled by how much energy the parents have - will the coefficient on screen time be negative? Merely zero? Maybe even positive, just smaller than the other ones?
But good luck getting the data, never mind randomisation.
US sells a lot of other things to Europe that Europe doesn't have to buy. That includes tech. I'm not looking forward to the ensuing trade war but it's not a one way street by any means.
A fun little effect is that average speed is time-averaged, not distance-averaged. So when you go slower, you lose doubly: a lower speed to average, and over a longer time (a higher weight). Hence one of the reasons why putting more energy into the harder bits is actually optimal.
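Concretely: average speed is total distance over total time, which for equal-length segments works out to the harmonic mean of the segment speeds, so the slow bits dominate. A quick sketch:

```python
# Average speed over segments is total distance / total time.
# For equal-length segments this is the harmonic mean of the speeds,
# which is dragged down by the slow segment (it eats more time).
def avg_speed(d1, v1, d2, v2):
    """Average speed over two segments of lengths d1, d2 at speeds v1, v2."""
    return (d1 + d2) / (d1 / v1 + d2 / v2)

# 1 km at 10 km/h plus 1 km at 30 km/h:
print(avg_speed(1, 10, 1, 30))   # ~15 km/h, not the naive (10+30)/2 = 20
```

The slow kilometre takes three times as long as the fast one, so it gets three times the weight in the average, which is exactly the "lose doubly" effect above.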
I'm amazed how many commenters assume the questions require a single correct answer. Is this how universities work where you studied? Going by the standard HN demographics, I'll assume that's mostly the US.
Having studied in the UK, clearly the point is to elicit a well-thought-through argument. The "answer" almost doesn't matter at all. A boring but "correct" argument is easily beaten by a novel one, even (or especially) if it is controversial, flippant or even somewhat ridiculous.
Of course there is a limit, if you straight-faced start promoting killing people, or worse still, Oxbridge academics, that won't fly. But I'd say that limit is quite far.
There are of course 2nd order effects too, as in "I don't reject this argument because it offends me but because it is poor".
The responses reflect people from engineering backgrounds who are unfamiliar with this type of exam, not an American versus UK thing.
In engineering there isn’t room for creative and controversial answers when you’re asked to solve an exam problem.
It is rather fitting to see some try to turn this into another chance to stereotype Americans rather than realizing the obvious explanation that this is a website with a global audience that is biased toward software and engineering.
Is the job to produce one essay answering all three questions? (Or rather, two essays with three questions each.)
It would be easier to weave some topics together than others. I'd expect them to get a fair number of papers with identical choices for questions, if that synthesis is part of the grade.
> "well-thought through"
Imprisonment is even a possibility.
Nonsense. Prison didn't even happen for poorly-thought-out national newspaper columns, like the one in The Sun where Katie Hopkins called migrants "cockroaches".
Pretty cool! I have a half baked version of something similar :)
Can you also use it as a lightweight Kafka, i.e. a persistent message stream? With semantics like: replay all messages (historical + real-time) from some timestamp, for some topics?
As with pub/sub, you can approximate this with some polling etc., but as you say, that's not optimal.
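For illustration, the replay-from-timestamp semantics can be sketched over an in-memory append-only log (a hypothetical stand-in, not the actual product's API; a real system would persist the log and then keep tailing for new messages):

```python
import time

class Topic:
    """Toy stand-in for a persistent topic log (illustrative only)."""
    def __init__(self):
        self.log = []  # append-only list of (timestamp, payload)

    def publish(self, payload, ts=None):
        self.log.append((ts if ts is not None else time.time(), payload))

    def replay_from(self, since_ts):
        """Yield every message at or after since_ts, historical first.

        A real implementation would switch to live tailing once the
        historical portion is exhausted, instead of stopping here.
        """
        for ts, payload in self.log:
            if ts >= since_ts:
                yield ts, payload

t = Topic()
t.publish("a", ts=1.0)
t.publish("b", ts=2.0)
t.publish("c", ts=3.0)
print([p for _, p in t.replay_from(2.0)])  # ['b', 'c']
```

The awkward part to bolt on via polling is exactly the seam in `replay_from`: handing off from the historical scan to the live stream without dropping or duplicating messages.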
It's going to happen and at some level I'd rather war casualties were measured in robots rather than people.
My concern is the cottage industry of integrating guns with half baked AI at the lowest cost. And probably vibe coded too.
The companies don't care - a sale is a sale. The MoD maybe doesn't care - 90% accuracy and fewer human casualties on their own side are a win. Governments want to save money, and by the time they find out the robots have gone rogue, it will be too late to do anything about it.
The problem is always the same. It's not just MoD (is it MoW now?) that will have access to this.
YOLOv8 + optical flow works fine on an ESP32. You want to give a drone rough coordinates for a refinery and hit something in it, like a storage tank? That'll work. Which means, give it 5 years, and relatively small groups will have access to it. This cannot be stopped.
The only real answer is to work to have groups that you can trust to have access to this first.
Sadly, building an AI that analyses camera imagery and aims at humans, from scratch, is these days almost an intern project. It's not really something you can control or ban, the way you can control, dunno, uranium enrichment.
Integrating it with a robot and sticking a gun on it, thankfully, requires a bit more know-how.
3) they can see if any countermeasures would be effective
4) they can figure out what to look for and find those weapons before they're fired
cf. nuclear deterrence, right. There is "nothing" the US can do about other nations enriching uranium and making bombs, other than bombing those countries. The US can't change the laws of physics: follow the right formula and it'll work. However, the US can figure out exactly what to look for, to either prevent it from happening through intervention or at the very least get some warning before it's used...
And then it will be just another war crime committed daily in conflicts, and nothing will happen because there is no world police?
Ask Ukrainians, Lebanese, Gazans, Somalilanders, or even Iranians for that matter - it may not make a big difference compared to today...
What I would love to see is a local government suing an arms producer over the efficacy of their weapons. (Or even funnier, the owner of a home destroyed by a drone suing the GPS company.)
We all know that the only things people in suits are really afraid of, more than hell, are a bad Q4 report and an expensive lawsuit.