Hacker News | h4ny's comments

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

SVG generation is a good test because it's extremely easy to subjectively assess with visual reasoning where humans are strong. However, pelican on a bike specifically may be overused at this point.

I'm not an AI skeptic but I'm skeptical of the intent of this article. It makes great claims about agent-first engineering and tries to make a real case based on a real product, with real users, and a real team that's been growing — all without even saying what was built or showing it, just like every other AI hype article.

At the time we wrote the article we hadn’t released the product and weren’t ready to talk about it. It was an internal prototype that looked very much like the current Codex app.

So, did this internal prototype ultimately end up being used to create/influence a real product, e.g. Codex app?

Yep!

No detail about this in the article or your comment here, but the voluminous lines of code get a big call-out. Very interesting!

And this thread too is filled with users saying "I have also done this or that" but, bar one user, nobody followed up with a link to anything.

You have the source to everything you use in life, right? You can make your own car, petrol, shampoo, grow your own food, build your own house, wire your own electricity (and generate it), can switch to having your own reserve of drinking water anytime and plumb it, etc.

Nothing against you personally but that kind of logic is getting old. I get that you don't trust corporations, but asserting that open source projects don't do rug pulls, and that having the source means you're safe because you can spin up the version you want even if they screw you over, misses the point of how we all function as a society.

The problem isn't open source or corporations, or that someone made the mistake of trusting someone who seemed trustworthy to begin with. People who take the opportunity to push their own beliefs and narratives by capitalizing on emotional situations like this, instead of finding constructive ways to make things better, are the worst.


If I woke up one day to find some corporation had snuck in overnight and subbed out my shampoo for their newest scent without asking, then yeah I'd be looking for more reliable options for that too


The car is a better example. I'd be infuriated if my car received an OTA that made it play ads or something. I have to trust that the company won't do that (or buy a car that doesn't have OTA capability).


As others pointed out it's not the same dynamic when it's not about software.

The big picture is that agency was lost and that's not OK.


> You can make your own car, petrol, shampoo, grow your own food, build your own house, wire your own electricity (and generate it), can switch to having your own reserve of drinking water anytime and plumb it, etc.

*can* is a lot better than *do*. I would prefer that all of these processes be documented such that new sources of these products and services can be created if need be. That's really what having the source for a given piece of software is: the documentation required to reproduce it.

Imagine if the process of generating electricity was a big secret and controlled by a single company. That company would be unreasonably powerful, no?

I trust corporations enough that I do pay them to provide me with goods and services. I do not trust them enough to set them up as the only viable source for goods and services.


I think that's a generally good approach and a fantastic example of framing things professionally, but it also doesn't fix the core of the problem, which I see as problematic if the leadership of an engineering-focused company doesn't understand it immediately.

Luck with your employer also plays a big part in how you approach this too.


> I'm being a bit facetious here...

Maybe just don't do that? It's never helpful in good-faith discussions and just indicates a lack of empathy and maybe a lack of understanding of the actual issue being discussed.

> So, you haven't identified any actual problems with them being on social media though.

The problems GP raised seem pretty clear to me. Could you give us some examples of what you would consider to be "actual problems" in this context?

> Just that kids are doing something new and sometimes scary...

Any sane parent wouldn't send their kids to learn to ride a bicycle on the open road and without any supervision. You'd find a park or an empty lot somewhere, let them test it out, assess their ability to deal with potential dangers and avoid harming others at the same time, and let them be on their own once they are able to give you enough confidence that they can handle themselves most of the time without your help.

The problem with today's social media for children is that there is no direct supervision or moderation of any kind. Like many have pointed out, social media extends to things like online games as well, and the chance that you will see content that is implicitly or explicitly unsuitable for children is extremely high. Just try joining the Discord channels of guilds of any online game to see for yourself.

Not all things new and scary come with a moderate to high risk of irreparable harm.


I encourage everyone to read the definition on the home page:

> Definition: A gaming dark pattern is something that is deliberately added to a game to cause an unwanted negative experience for the player with a positive outcome for the game developer.

And also the detailed descriptions of each of the dark patterns, for example:

https://www.darkpattern.games/pattern/12/grinding.html

Quoting just the short descriptions of the dark patterns without considering the definition above effectively mischaracterizes the intent of the website and is not using the tool as intended; read that way, all the patterns seem like they can be/are just enjoyable mechanics to many.

Some of the users reviewing games on the website seem to also miss the point (inaccurate reviews), which leads to comments like https://news.ycombinator.com/item?id=45947761#45948330.

It is increasingly the case in predatory games that a very subtle combination of the listed mechanics makes them dark patterns collectively, so it's also important to consider the patterns in groups.


Some people criticize the definition of dark patterns because they can't face their addiction.


This feels like a step backwards and now people who never bothered to write proper, appropriate commit messages for others to start with can care even less.

I personally don't see what the use case of this is -- you shouldn't even be hired in the first place if you can't even describe the changes you made properly.


GGP's sentiment resonates with me. I invest a fair bit of time into LLMs to keep up on how things are evolving and I do throw both small and large tasks at them. I'm seeing great results with some small tasks but with anything that is remotely close to actual engineering I just can't get satisfactory results.

My largest project is a year old, it's full-stack JavaScript, and I have consciously used patterns and structures and diligently added documentation right from the beginning so that the code base is as LLM friendly as possible.

I see great results on refactoring with limited scope, scaffolding test cases (I still choose to write my own tests but LLMs can also generate very good tests if I explicitly point to existing tests of highly related code, such as some repository methods), documenting functions, etc. but I'm just not seeing the kind of quality that people claim that LLMs can do for them on complex tasks.

I want to believe that LLMs are actually capable of doing what at least a good junior engineer can do but I'm not seeing that in my own experience. Whenever we point out these issues we are encountering, we just basically get the "git gud" response with no practical details on what we can actually do to get the results that people claim to be getting. Then people start blaming our lack of structures, patterns, problems with our prompts, the language, our stack, etc. when we complain about the "git gud" response being too vague. Nobody claiming to be seeing great results seems to want to do a comprehensive write-up or, better still, a stream of their entire workflow to teach others how to do actual, good engineering with LLMs on real-world problems either -- they all just want to give high level details and assert success.

On top of that, the fact that none of the people I know in engineering, working in both large organizations and respectable startups that are pushing AI, are seeing those kinds of results naturally makes me even more skeptical of claims of success. What I often hear from them is that mediocre engineers think they are being productive but are actually just offloading the work to their colleagues through review, and that nobody seems to be seeing tangible returns from using AI in their workflow, but people in C-suites are pushing AI anyway.

If just about anything can be "your fault", how can anyone claiming that LLMs are great for real engineering without showing evidence be so confident that what they're claiming but not showing is actually the case?

I feel like every time I comment on anything related to your blog posts I probably come across as belligerent and get downvoted, but I really don't intend to.


Which model and tools are you using in that repo?


Could you elaborate on what you mean by "moral basis" in your comment?


It is in their selfish interest to push for open weights.

That's not to say they are being selfish, or to judge in any way the morality of their actions. But because of that incentive, you can't logically infer moral agency in their decision to release open-weights, IP-free CPUs, etc.


By selfish interests you mean the public good?


Leaving China aside, it's arguably immoral that our leading AI models are closed and concentrated in the hands of billionaires with questionable ethical histories (at best).


I mean China's push for open weights/source/architecture probably has more to do with them wanting legal access to markets than it does with those things being morally superior.


Of course, but that translates into a benefit for most people, even for Americans. In my case (European), I cannot but support the Chinese companies in this respect, as we would be especially in trouble if the common models are the norm.


If by being selfish they end up doing the morally superior thing, then I much prefer to go with the Chinese.

Even more so now that Trump is in command.


Not speaking for everyone but to me the problem is the normalization of bad behavior.

Some people in this thread are already interpreting policies that allow contributions of AI-generated code to mean that it's OK to not understand the code they write and to offload that work to the reviewers.

If you have ever had to review code that an author doesn't understand or written code that you don't understand for others to review, you should know how bad it is even without an LLM.

> Why do you care? Their sandbox their rules...

* What if it's a piece of software or dependency that I use and support? That affects me.

* What if I have to work with these people in this community? That affects me.

* What if I happen to have to mentor new software engineers who were conditioned to think that bad practices are OK? That affects me.

Things are usually less sandboxed than you think.

