Counterpoint: If humans shipped perfect products they would no longer have jobs. The majority of time spent in an organization goes to fixing problems humans caused. For good reasons and bad excuses. We are not machines.
What we, collectively as a species, are building now with AI is a mirror that reflects the failures and successes we contributed to.
No engineer here has a perfect record. No senior or principal either. We make a ton of mistakes that are rarely written about.
This is an opportunity for the ones that assume they have mastered the craft to put up or shut up. Anyone can write a blog with or without AI.
Put your skills to work and implement the system that solves the problem you lament. Otherwise, get off my lawn.
It's another voice screaming into the void without offering a solution. The solution is not to build a faster horse. It is not to reminisce about the past. That ship sailed.
Fix the problem. It's the 100th blog repeating the same thing we've read for two years. Nothing was accomplished here except wasting time on the obvious to pat yourself on the back.
A lot of time is being wasted writing blogs raising red flags.
I think it’s worth recognizing that people’s issue with LLMs isn’t that they make mistakes. And I think hammering the argument that humans also make mistakes indicates a bit of a disconnect from the more common reasons for frustration with LLM use.
Ultimately I think people find it frustrating because many of us have spent years refining our communication so that it is deliberate and precise. LLMs essentially put a layer of indirection in front of both of those goals. If I prepare some communication (email, code, a blog post, etc.) and try to use an LLM more actively, I find at best I end up with something that more or less captures what I probably was going to communicate but doesn’t quite feel like an extension of my own thoughts so much as a slightly blurred approximation.
I think this also explains to some degree why it seems folks who were never particularly critical of their own communication have a hard time comprehending why anyone could be upset about this.
There is of course the flip side: when receiving communication, I now have to work out whether I’m reading a 5-paragraph, meticulously formatted email (or a 200-line, meticulously tested function) only because whoever sent it was too lazy to write 2-3 well-thought-out sentences (or make a 15-line diff to an existing function). And of course the answer here for the AI pragmatist is that I should consider having an AI summarize these extensive communications back down to an easily digestible 2-3 sentence summary (or employ an AI to do code review for me).
For those that value precise communications, this experience is pretty exhausting.
You won't ship a perfect product even if you make 0 mistakes. Software maintenance is adapting the product based on feedback from the outside world which you could never get during development.
We can, because the reality is that America has led in AI since the beginning and has had the best frontier models. It's not like some other country held the top spot for any given period of time. No one in Europe or China. I'd give it the benefit of the doubt if there was precedent. But the only logical position to take is that the lead is widening, and while most AIs will cross some threshold where they are good enough for most people, the actual frontier will remain firmly on American soil.
A junior tinkering in their garage, in domains where they have little experience, executed a flawed test and decided to call it a benchmark. It's extremely common nowadays because words don't mean anything anymore. The forums that used to be filled with technical people doing real work are now filled with the masses of vibe researchers doing this kind of stuff. This is what happens when anything goes over some popularity threshold.
HN is the last bastion of serious inquiry these days. But it's not immune, as OP's comment proves.
You're right, I've certainly been a bit presumptuous to call this 'a benchmark'. It is indeed a flawed test. Yet it's given me the occasion to try some open source models, and for my workflow some of them are incredibly competitive with SOTA closed source models.
Aside from the fabricated drama and the trend chasing, OpenAI still has the best overall model and API service. Anthropic is really good, no doubt. But gpt-5.4 is a better model than even Opus, even if it's a marginal advantage. I use both.
I dunno, my experience mirrors the parent poster's: we use Opus for all our coding, but gpt-5.4 for all of our enterprise agentic work via API (a much bigger amount of tokens). It just seems to be more optimized for this.
That is just idealism. Being "open" doesn't get you any advantage in the real world. You're not going to meaningfully compete in the new economy using "lesser" models. The economy does not care about principles or ethics. No one is going to build a long-term business that provides actual value on open models. They can try. They can hype. And they can swindle and grift and scalp some profit before they become irrelevant. But it will not last.
Why? Because what was built with an open model can be sneezed into existence by a frontier model run via a first-party API with the best-practice configurations the providers publish in usage guides that no one seems to know exist.
The difference between the best frontier model (gpt-5.4-xhigh or opus 4.6) and the best open model is vast.
But that is only obvious when your use case is actually pushing the frontier.
If you're building a CRUD app, or the modern equivalent of a TODO app, even a lemon can produce that nowadays, so you will assume open has caught up to closed because your use case never required frontier intelligence.
A model with open weights gives you a huge advantage in the real world.
You can run it on your own hardware, with perfectly predictable costs and predictable quality, without having to worry about how many tokens you use, or whether your subscription limits will be reached at the most inconvenient moment, forcing you to wait until they reset, or whether the token price will be increased, or your subscription limits decreased, or whether your AI provider will swap the model for a worse one, and so on.
Moreover, no matter how good a "frontier model" may be, it can still produce worse results than a worse model when the programmer who manages it does not also have "frontier intelligence". When liberated from the constraints of a paid API, you may be able to use an AI coding assistant in much more efficient ways, exactly like when time-sharing access to powerful mainframes was replaced by the unconstrained use of personal computers.
When I was very young, I went through the transition from using a mainframe remotely to using my own computer. I certainly do not want to return to that straitjacket style of work.
The vision has been that the open and/or small models, while 8-16 months behind, would eventually reach sufficient capabilities. In this vision, not only do we have freedom of compute, we also get less electricity usage. I suspect long-term the frontier mega models will mainly be used for distillation, like we see from Gemini 3 to Gemma 4.
Can't we simply parse and remove any style="display: none;", aria-hidden="true", and tabindex="1" attributes before the text is processed and get around this trick? What am I missing?
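For what it's worth, here is a rough sketch of the filtering idea from the question above, using only Python's standard library. The function and class names are made up for illustration, and treating tabindex="-1" as suspicious is my own assumption on top of the three attributes mentioned. It only inspects inline attributes on the link itself, so it says nothing about links hidden through stylesheets or hidden ancestors.

```python
from html.parser import HTMLParser


def looks_hidden(attrs):
    """Heuristic: True if the tag's own inline attributes mark it as hidden."""
    a = {name: (value or "") for name, value in attrs}
    style = a.get("style", "").replace(" ", "").lower()
    return (
        "display:none" in style
        or a.get("aria-hidden", "").lower() == "true"
        # "1" is the value quoted in the comment; "-1" is an added assumption.
        or a.get("tabindex") in ("1", "-1")
    )


class VisibleLinkCollector(HTMLParser):
    """Collects href values of <a> tags that do not look hidden."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not looks_hidden(attrs):
            self.links.append(href)


def visible_links(html):
    """Return hrefs of links that pass the inline-attribute heuristic."""
    collector = VisibleLinkCollector()
    collector.feed(html)
    return collector.links
```

A link that is hidden by a class name or an external stylesheet passes straight through this filter, since nothing on the tag itself gives it away.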
If you do that and don't follow robots.txt, you are blocked. If you do that and follow robots.txt, fine. That's all we wanted you to do anyway. Just follow the instructions that well-behaved scrapers are meant to follow.
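As a minimal sketch of what "follow robots.txt" looks like in practice, assuming a Python scraper: the standard library's urllib.robotparser can answer whether a URL may be fetched. The robots.txt content, the /trap/ path, and the "example-bot" user agent below are all hypothetical, and the rules are parsed from a string so nothing here touches the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download the site's own file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def make_checker(robots_txt, user_agent):
    """Build a function that reports whether user_agent may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# "example-bot" is a placeholder user agent name.
allowed = make_checker(EXAMPLE_ROBOTS_TXT, "example-bot")
```

A scraper that calls allowed(url) before every request never touches the disallowed path, whether or not it also strips hidden links.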
Just have the link visible, but CSS it so that it's either small as hell or just off screen. Google / bots will follow it, real people will never see it.
That solution can be recreated by a skilled, AI-boosted senior platform engineer in a few days, and parity achieved in a few weeks. Nothing of value was lost.
A lot of time is being wasted writing blogs raising red flags.
That's the easy part.