This take is always bizarre to me. You're not talking about the internet, you're talking about the websites you choose to use. There are alternatives for every single website/service that you don't like. They're often exactly like the Internet of yore in that they're not as streamlined, are more niche, and have fewer people using them (these are aspects of the "fun" internet that people forget). The internet is a bunch of networked servers, not the handful of sites you feel like you're stuck using for some reason.
> This take is always bizarre to me. You're not talking about the internet, you're talking about the websites you choose to use. There are alternatives for every single website/service that you don't like.
Yeah, the problem is that a lot of those are effectively dead, subsumed by Reddit and Facebook.
I've sometimes dug up still-existing sites from the 2000s I used to visit, and the results are typically depressing. Such as:
* Site still exists, but is terribly broken. Doesn't render, uses now-incompatible SSL, or something. It's a forgotten server in somebody's closet, still chugging, but not being maintained, so whatever remains will probably vanish whenever the disk/PSU/etc fails.
* Last posts from 2015, mostly with "gee, it's kind of dead in here, anyone still around?" comments at the end of threads.
* Discussion is down to 5 people that post once a month, and there's also a thread with obituaries for past well known members.
Indeed. I was trying to sell a loft bed a couple years ago and Craigslist is essentially dead for that sort of thing now, killed by Facebook (I deleted my account in 2021 and I have to say that eschewing that corner of the internet has been a net positive for my mental health). The only replies I got were obvious scams (“I love this! I’ll pay you $100 more than you’re asking for it!”)
Some forums are still alive, although not with the vigor that they had twenty years ago (talkbass.com is one that springs to mind).
I maintain a blog, but I doubt I have many readers (or any). I made a deliberate choice not to put any sort of analytics on the site so that I won’t be tempted to obsess about whether anyone actually visits. For some of the individual blogs I read via RSS that I know don’t get much readership, I make a point of commenting on the rare posts to encourage the authors to write more. It doesn’t seem to make much difference, although I’m sure they appreciate the positive feedback.
A lot of the delightful weirdness is gone. All the tilde sites with hand-made HTML and lots of flashing gifs and blink tags may have been tacky, but they were fun. I don’t get the kind of pleasure from most websites that I did back in the days of the worst of the net, which often surfaced delightfully strange things that were completely unfiltered.
It's almost preferable for sites to die than for them to be captured by the various ideological extremes that it seems necessary for them to subscribe to these days.
> the handful of sites you feel like you're stuck using for some reason
Billions have been spent building walls around niche and small sites to funnel people into major platforms. Pretending this ad/discoverability infrastructure doesn't exist is very naive.
All those other markets are vastly bigger in scale and trade trillions on a daily basis, so it's an irrelevant comparison. Also, the expectation is that the majority of that 25% are insiders. So you're comparing catching fish in a barrel, where 25% of the participants hooked the fish before you got a turn, vs fishing in the open ocean (so 5% is pretty good, with additional voting rights/tax benefits).
The reason is that one scenario just requires your imagination to conjure up a reality that doesn't currently exist (Doctor AI), whereas the other rests on actual experience, which is messier and has more details than a story about the future.
People are starting to catch on to the AI scare mongering, let the quantum computer scare mongering begin. We should probably start giving these companies lots of money lest other countries beat us to it.
I don't think you even need to go that far. Just dispute the charges with your credit card. Very high likelihood of a successful refund since they already acknowledged their error in writing.
There's a fundamental power imbalance: if you do this to any service, they will likely ban your account. So the monetary reward has to be enough to merit moving all your data and workflows off them in advance and never using them again.
I naively disputed Steam not honouring a refund (it was for about 0.5% of what I'd spent with them up to that point), a couple of pounds at most. I'd paid by PayPal, and as Steam refused to abide by UK law (the Consumer Rights Act says broken stuff has to be fixed or refunded), I raised the issue with PayPal. I expected Steam would refund me; instead they did not dispute that they'd unlawfully failed to refund me, so PayPal - Steam's payment provider - cancelled the charge.
In response, Steam 'limited' my Steam account - effectively closing it temporarily. Since it's limited, they won't let me pay via PayPal anymore, so I haven't bought anything from them since [I have cashed in CS skins, and used that cash to 'buy' games].
It was an interesting lesson in 'might is right'. PayPal were able to refund the transaction because Steam need them and had no argument against the refund. Steam were able to cut me off because this appears to be a loophole in UK consumer law - sellers who break the law can just dismiss buyers who ask for refunds. Lesson learnt.
From Steam's point of view, they pissed off a customer and probably burnt 30mins-1hour of support time answering my requests, way more than the cost of the refund. But selling a game which, as I later found, Steam knew was broken, and then not refunding it because I had the tenacity to try and fix it - meaning the game sat open for longer than their auto-refund window - is not on imo. Petty of me for sure. Crappy of Steam too.
Why should they? Freedom of association is a key Western principle. Steam chose not to associate with them anymore. If the user doesn't like it, they should have sued in court instead.
If I report my employer for an OSHA violation and they retaliate that's illegal. Of course such laws hardly ever stopped anyone so it's a very bad idea to depend on it but the principle is certainly there.
I think there's a line between retaliating against someone, and refusing to help them in the future.
I do not believe that refusing to do business with an individual, where your business provides a non-life-critical service, is retaliation. A water company refusing to provide water to your home would be problematic. A luxury handbag store refusing to allow you to purchase more luxury handbags would not.
Imagine, as a hypothetical, that a customer goes into your store for the sole purpose of wasting your support staff's time. They are not going to make a purchase. They are also not directly committing a crime. They are just hurting your business for no particular reason.
Should you, as a business owner, be forced to allow them to continue to be on your property?
I think the ideal answer is yes for critical public spaces, and no for ordinary retail.
Steam clearly falls into the latter category and should be free to ban customers for any reason save discrimination against protected classes.
> I do not believe that refusing to do business with an individual, where your business provides a non-life-critical service, is retaliation.
This isn't accurate. It might not threaten your life or pose any great hurdle to overcome, but retaliation has nothing to do with that. If they did it in response to an action you took, not to solve a problem but out of spite or to otherwise get back at you, then it is retaliation.
That isn't the same as refusing to do business with someone who isn't productive to associate with. The two are entirely separate categories.
Of course any business (including Steam) will attempt to argue that an instance of the former is actually the latter, and a difficult customer will attempt to argue that an instance of the latter is actually the former. Regardless, Steam (and most other businesses) behave in a clearly retaliatory manner regarding chargebacks. In cases where the company failing to respect the individual's legal rights is what led to the chargeback that shouldn't be permissible.
To frame it in the terms you used, any otherwise legal activity stemming directly from the company having violated an individual's legal rights should be treated in the same way that a protected class is.
I think someone exercising their legal rights, such as their right to enter a business open to the public and their right to free speech inside that establishment, in a way that harms the business should be something a business can "punish" by refusing to do business with that individual.
I do not think it would be good public policy to prohibit this. I also don't believe, in the United States at least, this conduct is currently legally prohibited.
I previously gave an example of a situation in which I think the correct resolution is for the business to, as you put it, retaliate against someone exercising their legal rights.
A second example of the same type of retaliation is a business denying future sales to an individual who repeatedly purchases and then returns physical merchandise. I think blacklisting that individual is both morally and legally sound.
For the record, I think the definition of "retaliation" needs to include a desire to harm the other party. If your only desire is self-protection, I do not believe it qualifies as retaliation.
A limited account is allowed access to all prior purchases. It can even download those purchases again (incurring costs on Valve's part without paying anything).
I don't believe anything was rescinded in the situation being discussed; Valve just prevented the user from continuing to use their community/marketplace services. This makes sense because they were put into the bucket containing fraudulent or abusive user accounts.
Are you saying it's fine, iyo, for companies to use market position to work around consumer protection laws? I don't feel like Valve/Steam should be allowed to sell games they know are broken and then refuse refunds (they could also fix them!).
> can even download those
So what you're saying is I should find a fat juicy data pipe somewhere and download stuff from Steam until I fill /dev/null... ;oP
Seriously though, the 15 minutes or so of support time will have cost more than the game did in this case, but it really is the principle. Stealing lots of small amounts from lots of people is still criminally dishonest.
I grew up when we owned game systems and the games, and they couldn't phone home to see if I still had permission to play. I was recently considering installing Steam, but this kind of thing gave me pause. I couldn't invest any money in something that could have the rug pulled out from under it at any time.
No, that's not how that works. This stuff is a non-event. You dispute the charge, they have a period where they can defend their claim (8/10 times they don't), you get your money. This is a very basic transaction that happens every single day to every major company. "Banning" you costs more than your refund and carries additional legal risks.
I know being helpless against tech companies is a major trope in these comments but this is basic everyday transaction stuff. Plan on being on hold with your credit card company but not being a central target for a trillion dollar AI startup because you asked for a $100 refund.
I can tell you first-hand (from the side doing the banning) that you’re wrong.
You’re not going to get an email telling you that you’re banned. Your payments will just start being declined, and they won’t be able to help you. They’ll suggest you try another card. That won’t work either.
Maxmind includes a “chargeback risk score” in the api response for everybody who uses their minfraud service. They’re not doing that because companies don’t use it.
A scammer went to the trouble of creating an entirely different ebay account registered to literally "pirate[xxxxx]@..." using my same name. Then they found a tracking number to my same zip code. Then they bought (fake) items from a second scammer account using my stolen credit card to "wash" the money.
When I filed a chargeback ebay came back with a fat stack of paperwork and absolutely fucking buried me. They had the tracking number to "me", they had "me", they had the invoices to "me", they had my credit card, and their lengthy report had all the right words in all the right places, dressed up in all the right banking mumbo-jumbo, and they convinced my bank so well that my bank suggested I was a fraudster myself and then closed my accounts. I couldn't even sue them, because at that precise time I was moving cross country and couldn't get to the court I'd have needed to sue them in. I ended up eating the better part of $1000.
Ebay is absolutely fucking savage at chargebacks. They appear to have people trained specifically to bury in paperwork anyone that tries to challenge fraudulent charges.
I'm sorry that happened to you, but that's a <1% event. I don't know why I'm getting pushback for suggesting a simple credit card dispute. It's almost as if the people responding are suggesting not doing anything is the way to go or you'll get banned--which of course, regardless of how many one-off stories people may have, is a ridiculous assumption.
Good luck surviving a real court case against any FAANG company. They could bleed any individual's bank account dry with delays, forcing you to pay hundreds of thousands in lawyer's retainer fees for years.
The guy who invented intermittent windshield wipers went bankrupt and had to wait something like 20 years for his case. He won, but it probably wasn't worth it.
I would bet that would be highly unlikely to succeed. I have tried to dispute charges with my credit card for similar issues, and they always side with the business. I don’t think they even check.
Probably, but if a business cheated me out of that much I wouldn't be doing business with them again regardless so at least to me it would make no difference.
I think the big secret is that AI is just software. In the same way that a financial firm doesn't all of a sudden make a bunch of money because Microsoft shipped an update to Excel, AI is inert without intention. If there are any major successes in AI output, it's because a person got it to do that. Claude Code is great, but it will also wipe out a database even though it's instructed not to (I can confirm from experience). The idea that there's some secret innovation that will come out any minute doesn't change the fact that it's software that requires human interaction to work.
Yes, and it has been said since day one of LLMs that all we need to do is keep things that way - no action without human intervention. Just like it was said that you should never grant AI direct access to change your production systems. But the stories of people who have done exactly that and had their systems damaged and deleted show that people aren't trying to even keep such basic safety nets in place.
AI is getting strong enough that if people give some general direction as well as access to production systems of any kind, things can go badly. It is not true that all implementations of agentic AI require human intervention for every action.
My cynical rule of thumb: By default we should imagine LLMs like javascript logic offloaded into a stranger's web-browser.
The risks are similar: No prompts/data that go in can reliably be kept secret; A sufficiently-motivated stranger can have it send back completely arbitrary results; Some of those results may trigger very bad things depending on how you use or even just display them on your own end.
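To make that rule of thumb concrete, here is a minimal sketch (in Python; all names are mine, not any real API) of what "treat the model like a stranger's browser" looks like in practice: validate its output against a strict allowlist before acting on it, exactly as you'd validate a form field posted from an untrusted client.

    import json

    ALLOWED_ACTIONS = {"summarize", "search", "noop"}  # hypothetical action set

    def handle_model_output(raw: str) -> dict:
        reply = json.loads(raw)  # may raise; model output is untrusted input
        if reply.get("action") not in ALLOWED_ACTIONS:
            raise ValueError("unexpected action from model")  # never execute blindly
        return reply

    print(handle_model_output('{"action": "search", "query": "weather"}'))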
P.S. This conceptual shortcut doesn't quite capture the dangers of poison data, which could sabotage all instances even when they happen to be hosted by honorable strangers.
The problem is, out of ten companies who take this approach, nine will indeed destroy themselves and one will end up with a trillion-dollar market cap. It will outcompete hundreds of companies who stuck with more conservative approaches. Everybody will want to emulate company #10, because "it obviously works."
I don't see any stabilizing influences on the horizon, given how much cash is sloshing around in the economy looking for a place to land. Things are going to get weird, stupid, and chaotic, not necessarily in that order.
On a more serious note, they were mostly f*cked by their PaaS provider imo. Claude will always do dumb shit. Especially if you tell it not to do something... By doing so you generally increase the likelihood of it doing it.
It's even obvious why if you think about it: the pattern of "you had one job, but you failed" or "only this can't happen, it happened!", in all its various forms, is all over literature, online content, etc.
But their PaaS provider not scoping permissions properly is the root cause, all things considered. While Claude did cause this issue there, something else would've happened eventually otherwise.
Also, some folks seem to be forgetting the virtues of boring, time-tested platforms & technologies in their rush to embrace the new & shiny & vibe-***ed. & also forgetting to thoroughly read documentation. It’s not terribly surprising to me that an “AI-first” infrastructure company might make these sorts of questionable design decisions.
The problem is that destruction isn't contained to the company. If an AI agent exposes all company data and that includes PII or health information, that could have an impact on a large number of people.
PII breaches have been pretty consistently a problem for the last several decades, predating modern LLMs.
So that is a structural problem with their data and security management and operations, totally independent of the architecture for doing large scale token inference.
Remember that these models are getting better; this means they get trusted with increasingly more important things by the time an error explodes in someone's face.
It would be very bad if the thing which explodes is something you value which was handed off to an AI by someone who incorrectly thought it safe.
AI companies which don't openly report that their AI can make mistakes are being dishonest, and that dishonesty would make this normalization of deviance even more prevalent than it already is.
That’s not a technical/AI problem in any sense, that’s a social problem in organizing and coordinating control structures
Further, it’s only a problem to the extent that the downsides or risks are not accounted for which again… is a social problem not a technological problem
This isn’t a problem for organizations that have well aligned incentives across their workflows
A well organized company that has solid incentives is not going to diminish their own capacity by prematurely deploying a technology that is not capable of actually improving
The issue is that 99% of the organizations that people deal with have entirely orthogonal incentives to them. They are then attributing the pain in dealing with that organization to the technology rather than the misaligned incentives
> That’s not a technical/AI problem in any sense, that’s a social problem in organizing and coordinating control structures
As @TeMPOraL here likes to point out, it can be genuinely fruitful to anthropomorphise AI. I only partially agree: this is true for *some* of the failure modes.
> A well organized company that has solid incentives is not going to diminish their own capacity by prematurely deploying a technology that is not capable of actually improving
Sure, but society as a whole doesn't have the right solid incentives to make sure that companies have the right solid incentives to do this. We can tell this quite easily by all the stupid things that get done.
> The issue is that 99% of the organizations that people deal with have entirely orthogonal incentives to them.
This is also fundamentally the AI alignment problem, that all AI are trained on some fitness function which is a proxy for what the trainer wanted, which is a proxy for what incentives their boss gave them, which is a proxy that repeats up to the owners in a capitalist society, which is a proxy for economic growth, which is a proxy for votes in a democracy, which is a proxy for good in a democracy.
I wrote a whole ass paper at the end of 2022 demonstrating that unless we fix society we will deterministically create anti-social AGI because humans do not generate pro-social data.
LLMs are a distribution. Unlike a Python script or a Turing machine, an LLM is capable of generating any series of tokens. Developers need to stop reasoning about LLM agents as deterministic and start to think about agents in terms of Monte Carlo and Las Vegas algorithms. It isn't enough to have an agent; it also requires a cheap verifier.
If I was a Ph.D. student today, I'd probably do a thesis on cheap verifiers for LLM agents. Since LLM agents are not reliable and therefore not very useful without it, that is a trillion dollar problem.
Once a developer groks that concept, the agents stop being scary and the potential is large.
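For anyone who hasn't met the terminology: a Las Vegas algorithm always returns a correct answer but takes a random amount of time, which is exactly the shape of a sample-until-verified agent loop. A toy sketch, where generate and verify are stand-ins I made up, not any real API:

    import random

    def generate() -> int:
        return random.randint(0, 9)   # stand-in for one LLM sample

    def verify(candidate: int) -> bool:
        return candidate % 3 == 0     # stand-in for a cheap, sound check

    def las_vegas_agent(max_tries: int = 100) -> int:
        for _ in range(max_tries):
            c = generate()
            if verify(c):
                return c              # anything returned is guaranteed valid
        raise RuntimeError("verifier never accepted; escalate to a human")

The whole scheme lives or dies on verify() being both cheap and sound, which is why the cheap-verifier problem above is the trillion dollar one.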
If you told a programmer 30 years ago that someday we'd switch from a deterministic to nondeterministic paradigm for programming computers, they'd ask if we'd put lead back in the drinking water.
Not even a few years ago if you introduced a component to a system that would result in non-deterministic output... Hell, a single function... You would be named and shamed for it because it went against every principle you should be learning as a novice writer of software.
I have used the LLM tools, and I see the real-world potential for these things. But how it's all being sold and applied now: it's upside down.
It has always been non-deterministic but we relied on low level engineers who knew the dark magicks to keep the horrors at bay.
Bit flips in memory are super common. Even CPUs sometimes output the wrong answer for calculations because of random chance. Network errors are common, at scale you'll see data corruption across a LAN often enough that you'll quickly implement application level retries because somehow the network level stuff still lets errors through.
Some memory chips are slightly out of timing spec. This manifests itself as random crashes, maybe one every few weeks. You need really damn good telemetry to even figure out what is going on.
Compilers do indeed have bugs. Native developers working in old hairy code bases will confirm, often with stories of weeks spent debugging what the hell was going on before someone figured out the compiler was outputting incorrect code.
It is just that the randomness has been so rare, or the effects so minor, that it has all been, mostly, an inconvenience. It worries people working in aviation or medical equipment, but otherwise people accept the need for an occasional reboot or they don't worry about a few pixels in a rendered frame being the wrong color.
LLMs are uncertainty amplifiers. Accept a lot of randomness and in return you get a tool that was pure sci-fi bullshit 10 years ago. Hell, when reading science fiction nowadays I am literally going "well we have that now, and that, oh yeah we got that working, and I think I just saw a paper on that last week."
With the old way of doing things you could spend energy to reduce errors, and balance that against the entropy of your environment/new features/whatever at a rate appropriate for your problem.
It's not obvious if that's the case with llm based development. Of course you could 'use llms until things get crazy then stop' but that doesn't seem part of the zeitgeist.
> It's not obvious if that's the case with llm based development. Of course you could 'use llms until things get crazy then stop' but that doesn't seem part of the zeitgeist.
Harnesses are coming online now that are designed to reduce failure rates and improve code quality. Systems that designate sub-agents that handle specific tasks, that put quality gates in place, that enforce code quality checks.
One system I saw (sadly not open source yet) spends ~70% of tokens on review and quality. I'll admit the current business model of Anthropic/OpenAI would be very unfriendly to that way of working. There is going to be some conflict popping up there. Maybe open weight models will save us, maybe not.
If Moore's Law had iterated once or twice more we wouldn't be having this conversation. We'd all be running open weight models on our 64GB+ VRAM video cards at home and most of these discussions would be moot. AI company valuations would be a fraction of what they are.
> It has always been non-deterministic but we relied on low level engineers who knew the dark magicks to keep the horrors at bay.
This is a disingenuous comparison.
First of all, what you're talking about is nondeterminism at the hardware level, subverting the software, which is, on an ideal/theoretical computer, fully deterministic (except in ways that we specifically tell it not to be, through the use of PRNGs or real entropy sources).
Second of all, the frequency with which traditional programs are nondeterministic in this manner is multiple orders of magnitude less than the frequency of nondeterminism in LLMs. (Frankly, I'd put that latter number at 1.)
This is part of a class of bullshit and weaselly replies that I've seen attempting to defend LLMs over the years, where the LLMs' fundamental characteristics are downplayed because whatever they're being compared to occasionally exhibits some similar behavior—regardless of the fact that it's less frequent, more predictable, and more easily mitigated.
> First of all, what you're talking about is nondeterminism at the hardware level, subverting the software, which is, on an ideal/theoretical computer, fully deterministic (except in ways that we specifically tell it not to be, through the use of PRNGs or real entropy sources).
Malloc and free were never deterministic outside of the simplest systems.
The second we accepted OS preemption we gave up deterministic performance.
Good teams freeze their build tools at a specific version because even minor revs of compilers can change behavior.
I've used way too many schema generator tools that I'd describe as "wishfully deterministic".
Heuristics have been used for years in computer science, resulting in surprising behavior. My point is that if we ramp up the rate of WTF we are willing to tolerate, the power of the systems we can build increases drastically.
> Second of all, the frequency with which traditional programs are nondeterministic in this manner is multiple orders of magnitude less than the frequency of nondeterminism in LLMs. (Frankly, I'd put that latter number at 1.)
Building a RAG lookup system that takes in questions from the user, looks up answers in a doc, and returns results, can be built with reliability damn near approaching 99.99%.
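To illustrate where that reliability comes from, a toy sketch of the guardrail (data and names are made up): answer only from retrieved text, and refuse rather than guess.

    DOCS = {
        "refunds": "Refunds are processed within 14 days of the request.",
        "shipping": "Orders ship within 2 business days.",
    }

    def rag_answer(question: str) -> str:
        hits = [text for key, text in DOCS.items() if key in question.lower()]
        if not hits:
            return "I can't find that in the documentation."  # refuse, don't guess
        # a real system would have an LLM paraphrase `hits`; constraining it
        # to the retrieved text is what keeps the failure rate low
        return hits[0]

    print(rag_answer("How long do refunds take?"))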
I have seen code generation harnesses that also dramatically reduce non-determinism of LLM generated code, but that will continue to be a hard problem.
My phone camera applies non-deterministic optimizations to images I take, and has done so for years now.
GPS is non-deterministic (noisy), we smooth over the issues. GPS routing is also iffy, but again we smooth over the issues.
The question is whether useful products can be made with a technology. You can shove enough guardrails on an LLM interface to make it useful. That much is clear. I derive massive value from LLMs and other transformer-based systems literally every day. From the modern speech transcription systems, which are damn near magic compared to what we had a few years back, to image recognition, to natural language interfaces for search over company documents.
If we completely discard coding agents, LLMs are still an insanely impactful technology.
Those guardrails add costs, and latency. For some scenarios that is fine, but for others it isn't. Chat bot support agents implemented by the lowest bidder don't have any attempt at guardrails. Better systems are better built.
I agree that current LLMs all suffer from the problem that control messages are intermixed with data; that is a crappy pattern that the industry has known is bad for literally decades (since the 70s, 80s?). It seems like an intractable flaw in these systems.
But that doesn't make the system unusable any more than the thousand other protocols suffering from the same flaw are unusable.
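For comparison, the classic in-band version of the same flaw, and the fix SQL eventually got (parameterized queries); LLM prompts currently have no analogous way to keep instructions and data in separate channels:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")

    evil = "x'); DROP TABLE users; --"

    # in-band (the LLM situation): data spliced into the control channel
    # conn.executescript(f"INSERT INTO users VALUES ('{evil}')")  # drops the table

    # out-of-band (the fix): data travels beside the statement, never inside it
    conn.execute("INSERT INTO users VALUES (?)", (evil,))
    print(conn.execute("SELECT name FROM users").fetchall())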
The single best example for this discussion is superscalar out-of-order execution, which can't be used in aerospace, medical devices, or industrial control systems where you need to guarantee that code finishes within a certain time, because technically it isn't deterministic.
Neither is stochastic gradient descent, which is the source of the LLM problem. Nor is UDP, the network protocol that powers video calls, live streaming, and online gaming.
> If I was a Ph.D. student today, I'd probably do a thesis on cheap verifiers for LLM agents. Since LLM agents are not reliable and therefore not very useful without it, that is a trillion dollar problem.
PhD theses are for (ideally) setting up a new world standard in some research area (at the end, you build your PhD thesis out of the deep emotional shards of this completely destroyed life dream), and not for some personal self-discovery project that you hope will turn you into the popular kid on the block.
That is like telling students to never do a PhD thesis on superscalar out-of-order execution, stochastic gradient descent, or UDP. I'm framing it as an analogous problem. What is missing is a cheap verification process.
> That is like telling students to never do a PhD thesis on superscalar out-of-order execution, stochastic gradient descent, or UDP.
No decent PhD advisor would let their PhD student base their PhD thesis on such well-known concepts: a doctoral study programme is a journey into something never-seen-before (with a very high likelihood of failing and shattering your life). Anything else is failure.
(Obvious exception: either the advisor or the PhD student can convince the other that there could be something really, really deep to be found in, say, "superscalar out-of-order execution", "stochastic gradient descent" or UDP that generations of researchers overlooked, and which once discovered might necessitate rewriting all the standard textbooks on the topic.)
What would a verifier even look like without having all of the same problems that the chatbot itself does? Are humans themselves not the cheap verifiers?
My observation is that the true believers really don't want to think of models as an inert pile of weights. There's some mysticism attached to imagining it's the ship's computer from Star Trek, HAL-9000 or C-3PO. A file loaded into memory and executed over is just so... _pedestrian_.
Canonically, the Star Trek computers have pretty much always been just computers, not themselves sentient because the software running on them just isn't.
I'm still not sure if HAL-9000 was supposed to be conscious or just an interesting plot device with a persona as superficial as LLMs are dismissed as today.
LLMs could definitely play the part of all three of your examples, given the flaws they showed on-screen. Could even do a decent approximation of Data (though perhaps not Lore without some jailbreaking).
Still weird that even the best of them isn't really ready to be KITT.
I think the market isn't for anyone but other businesses. We're all ants trying to understand how AI is going to eradicate the lower levels of society.
> doesn't change the fact that it's software that requires human interaction to work.
Have you ever seen Claude Code launch a subagent? You've used it, right? You've seen it launch a subagent to do work? You understand that that is, in fact, Claude Code running itself, right?
I don't think subagents are representative of anything particularly interesting on the "agents can run themselves" front.
They're tool calls. Claude Code provides a tool that lets the model say effectively:
run_in_subagent("Figure out where JWTs are created and report back")
The current frontier models are all capable of "prompting themselves" in this way, but it's really just a parlor trick to help avoid burning more tokens in the top context window.
It's a really useful parlor trick, but I don't think it tells us anything profound.
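To spell out why it's "just" a tool call, here's roughly what the mechanism reduces to, with a stubbed model call (all names are mine, not Anthropic's):

    def call_model(prompt: str, history: list[str]) -> str:
        return f"report for: {prompt}"  # stub standing in for one real LLM call

    def run_in_subagent(task: str) -> str:
        return call_model(task, history=[])  # fresh context; nothing inherited

    parent_history = ["...a long, expensive transcript..."]
    # the parent keeps only the short report; the subagent's working tokens
    # never enter (or bloat) the parent's context window
    parent_history.append(run_in_subagent("Figure out where JWTs are created"))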
The mechanism being simple is the interesting part. If one large complex goal can be split into subgoals and the subgoals completed without you, then you need a lot fewer humans to do a lot more work.
The OP says AI requires human interaction to work. This simply isn't true. You know yourself that as agents get more reliable you can delegate more to them, including having them launch more subagents, thereby getting more work done, with fewer and fewer humans. The unlock is the Task tool, but the power comes from the smarter and smarter models actually being able to delegate hierarchical tasks well!
Wtf? A sub-agent is a tool you give an agent, saying "If you need to analyze logs, delegate to the logs_viewer agent", so that the context window doesn't fill up with hundreds of thousands of tokens unnecessarily. In what universe does that mechanism somehow mean you need fewer humans?
Do you think this means "Build a car" can be accomplished just because an LLM can send a prompt to another LLM who reports back a response?
Does your Linux server decide what processes it should launch at what time with a theory of what will happen next in order to complete a goal you specified in natural language? If so yes, I reckon you sure have!
Claude does not have a "theory" of anything, and I'd argue applying that mental model to LLM+Tools is a major reason why Claude can delete a production database.
Well, humans also routinely accidentally delete production databases. I think at this point arguing that LLMs are just clueless automatons that have no idea what they are doing is a losing battle.
They’re not clueless; they just don’t have a memory and they don’t have judgement.
They create the illusion of being able to make decisions, but they are always just following a simple template. They do not consider nuance; they cannot judge between two difficult options in a real sense.
Which is why they can delete prod databases, and why they cannot do expert-level work.
Not sure if you are being pedantic but mathematics is quite different from other fields because it is highly structured, reasoning is explicit and it contains a dense volume of high level training data. Results are able to be verified easily due to its structure.
Even then, they are most effective in assisting and are not able to produce results independently. If you have proof otherwise I would love to read up on it
I like to think of LLMs as idiot savants. Exceptional at certain tasks, but might also eat the table cloth if you stop paying attention at the wrong time.
With humans, you can kind of interview/select for a more normalized distribution of outcomes, with outliers being less probable, but not impossible.
I mean maybe it’s a losing battle today, but it is correct. So in a few years when the dust settles, we’ll probably all be using LLMs as clueless automatons that still do useful work as tools
Maybe. But probably not. It doesn't matter if it's AGI though. If those other apps and tools do simple things that are predictable, then we can be pretty sure what will happen. If those tools can modify their own configuration and create new cron jobs, it becomes much harder to say anything about what will happen.
Most of us work on software that can modify its own configuration and create new jobs. I, too, have worked in ansible and terraform.
The key break here is the lack of predictability, and I think it's important that we don't get too starry-eyed and accept that that might be a weakness - not a strength.
My claude has never yet launched itself from my terminal, gave itself a prompt, and then got to work. It has only ever spawned a sub-agent after I had given it a prompt. It was inert until a human got involved.
If that is software running itself, then an if statement that spawns a process conditionally is running itself.
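The reductio can be made literal in a few lines (standard library only, run as a script):

    import subprocess, sys

    if "--again" not in sys.argv:
        # this "spawns itself" exactly as much as an agent launching a subagent
        subprocess.run([sys.executable, __file__, "--again"])
    else:
        print("still inert until a human ran the outer script")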
Substance aside, I feel this comment is combative enough to be considered unhelpful. Patronizing and talking down to others convinces no one and only serves as a temporary source of emotional catharsis and a less temporary source of reputational damage.
All AI requires steering as the results begin to decohere and self-enshittify over time.
AI in the hands of an expert operator is an exoskeleton. AI left alone is a stooge.
Nobody has built an all-AI operator capable of self-direction and choices superior to a human expert. When that happens, you'd better have your debts paid and bunker stocked.
We haven't seen any signs of this yet. I'm totally open to the idea of that happening in the short term (within 5 years), but I'm pessimistic it'll happen so quickly. It seems as though there are major missing pieces of the puzzle.
For now, AI is an exoskeleton. If you don't know how to pilot it, or if you turn the autopilot on and leave it alone, you're creating a mess.
This is still an AI maximalist perspective. One expert with AI tools can outperform multiple experts without AI assistance. It's just got a much longer time horizon on us being wholly replaced.
Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".
The mental calisthenics required to justify this stuff must be exhausting.
> The mental calisthenics required to justify this stuff must be exhausting.
It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.
> Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".
Are you finding people that actually say this?
When it can quote something like that, it's a training error. A popular enough work gets quoted and copied by people online, and then it's not properly deduplicated. It's a very small fraction of works it can do that with, and the cleaner your data the less it happens.
I'll once again quote that stable diffusion launched with fewer weights than training images. It had some accidental memorizations, but there wasn't room for its core functionality to be memorization-based.
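For reference, the back-of-envelope behind that claim, using commonly cited approximate figures (both numbers are assumptions, not exact):

    params = 1.1e9          # Stable Diffusion v1: UNet + text encoder + VAE, roughly
    images = 2.3e9          # LAION-2B-en scale, its initial training set
    print(params / images)  # roughly 0.5 parameters per training image

Under one parameter per image leaves no room for broad memorization; only a small set of heavily duplicated images could be (and, as noted above, accidentally were) memorized.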
This is a perfect example of 'begging the question': arriving at a conclusion from a premise assumed to be true without evidence. Your reductio does not actually demonstrate that copyright applies to LLMs, because you did not demonstrate how transcoding is comparable to inference, just that LLMs can reproduce some passages from copyrighted works. You could also produce passages from copyrighted works by generating enough random sequences of words, but no one is arguing that is comparable to transcoding. The claim that people who do not share this conclusion are engaging in motivated reasoning is based only on your assumption and has no logical backing, and is therefore begging the question.
AI is destroying the economic premise that has drawn so much investment into Silicon Valley. It's going from a capital-light business model with network-driven moats that allow market domination, to a capital-heavy, high-burn-rate model with the potential not only to offer ZERO moat protection but to destroy the moats that already exist. Cloud infrastructure + vibe coding now make it possible to quickly replace existing apps with custom-fit alternatives. Open source + cheap Chinese LLMs may not be as good as Opus, but maybe good enough turns out to be good enough (Sun Microsystems vs. Linux is a good example). Currently AI has just as much potential to destroy Silicon Valley as it does to build it up.
This is some aggressive consultant fluff. Few companies have such distinctive "profit" measures. If "the financial logic is rarely examined carefully" then maybe there's a reason, since analysis like this is mostly fantastical and brittle. This is the sort of argument that is both rational and implausible. A manager might use this logic to rationalize firing an engineering team (which is mostly why guys like this get hired), but they won't use it to manage an engineering team.