Unfortunately for the people mad about this, I predict the only thing they will accomplish by pressuring the rsync maintainers is to discourage everyone else from responsibly disclosing their use of AI. You’re just going to make people disable Claude attribution on their commits to avoid drama.



I never care about AI usage disclosure, because I don't believe that human produced code is necessarily better than AI produced code, unless it's someone I personally know.

People need to be responsible for code they commit and push anyways. This has never changed. Whether the code is written by hand, by their cat walking over the keyboard, or by AI, is not my concern.

A project's code quality can decline for all kinds of reasons. I don't think it's productive to laser-focus on whether it's produced by AI or not. That's a distraction. If one person just wants to find an excuse to criticize AI, and another person wants to fight back and defend AI, sure, go for it. But that's not how you would want to assess a project's code quality.


Something as simple as requiring sign-offs like the DCO may be relevant to people who care. I do think the driveby stuff may get smaller. People don't need to get stuff upstream. I have lots of patches I am keeping downstream and instead have a trigger system: when new package updates drop into Debian, I rebuild the package with my patches on top using quill. Other systems like Gentoo basically always supported this flow.

So, why bother forking or going upstream? Maybe it's selfish. I think publishing the patches is cool, but I feel less of a need to force other people into doing what I want, or even to write every possible configuration or solution. I just hack it for me.
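For anyone wondering what "requiring sign-offs like the DCO" amounts to mechanically: the DCO convention is a `Signed-off-by: Name <email>` trailer at the end of the commit message (what `git commit -s` adds). Below is a minimal sketch of a check a project could run on incoming commits. The function names `signoffs` and `has_dco_signoff` are made up for illustration, and the rule "the sign-off email must match the author email" is an assumption, not something every DCO project enforces.

```python
import re

# DCO-style trailer: "Signed-off-by: Name <email>"
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>\s]+@[^<>\s]+)>$"
)


def signoffs(commit_message: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs from sign-off lines in the last paragraph."""
    paragraphs = [p for p in commit_message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    found = []
    # Trailers conventionally live in the final paragraph of the message.
    for line in paragraphs[-1].splitlines():
        match = SIGNOFF_RE.match(line.strip())
        if match:
            found.append((match["name"], match["email"]))
    return found


def has_dco_signoff(commit_message: str, author_email: str) -> bool:
    """Hypothetical policy: some sign-off email must equal the author email."""
    return any(
        email.lower() == author_email.lower()
        for _, email in signoffs(commit_message)
    )
```

Note that this only checks that the trailer is present and well formed. It says nothing about whether the certification behind it is true, which is the part this thread is actually arguing about.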


> People need to be responsible for code they commit and push anyways.

Well the GPL (which rsync is licensed under) says: "This program comes with ABSOLUTELY NO WARRANTY" so actually nobody is responsible for anything.


I think they meant in terms of karma/reputation for the individual, and the project. Traditionally open source is heavily based on these social currencies.

Nobody is suing the maintainer for support here so this is completely irrelevant.

> You’re just going to make people disable Claude attribution on their commits to avoid drama.

People should be doing this regardless of drama. No reason to provide free advertising for trillion dollar corporations. Generated-by trailers are only relevant when contributing to third party projects, in that case disclosure is polite.


The value of the Claude attribution is that you can tell at a glance who used AI.

I don't care about the advertising angle. We all know Claude by now. I want some indicator that AI was used.
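To make "tell at a glance" concrete, here is a rough sketch of scanning a commit message for AI attribution trailers. It assumes the attribution shows up as a `Co-Authored-By`, `Generated-by`, or `Assisted-by` trailer whose value names the tool. The key set, the tool-name list, and the function name `ai_attributions` are illustrative assumptions, not a standard.

```python
import re

# Generic "Key: value" trailer line.
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z0-9-]*):\s*(?P<value>.+)$")

# Assumptions for illustration only: which trailer keys and tool names to flag.
AI_TRAILER_KEYS = {"co-authored-by", "generated-by", "assisted-by"}
AI_TOOL_NAMES = ("claude", "copilot", "cursor")


def ai_attributions(commit_message: str) -> list[str]:
    """Return trailer lines that look like AI attribution."""
    hits = []
    for line in commit_message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if not match:
            continue
        key = match["key"].lower()
        value = match["value"]
        # Flag only when both the key and a known tool name match.
        if key in AI_TRAILER_KEYS and any(
            tool in value.lower() for tool in AI_TOOL_NAMES
        ):
            hits.append(f"{match['key']}: {value}")
    return hits
```

This is a plain substring match, so a human co-author who happens to be named Claude would be flagged too, and a commit with the trailer stripped would not be flagged at all. It tells you who disclosed, not who used AI.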


At my employer, if AI is not used, it shows up on your performance report and you’ll be told if you don’t start using it, you will be dismissed. I work at a medium sized successful YC-backed SaaS. So here, the attribution is meaningless - they look at your Bedrock and LLM API calls as well as Claude Code history.

If the company policy is to have everyone using it then everyone is going to assume you're using it.

I don't see a need for an attribution line in this case.


Do you fellow ICs have access to those reports and can correlate commits from you to the prompts used to create them easily?

Not currently. Each IC's report is kept private unless they voluntarily share it, and ICs don't have visibility into other ICs' Claude Code or Cursor logs. I think we're moving toward a model where it will be easier to correlate commits with chats, but the timeline is not clear.

Seems far more efficient to just have a line in the commit message then.


> they look at your Bedrock and LLM API calls as well as Claude Code history.

This is fucking insane. How does this correlate with productivity in any way? The results are all that matters, who cares how you got there?


> The results are all that matters, who cares how you got there?

I actually said this at $JOB to a manager, to which they replied "yes, but in the future all code will be AI generated, so that's the 'results' we are looking for"....

If it's decent code, but attributed to AI, how does that change things? What real-world impacts does that have?

That's what I can't for the life of me figure out. Bad code is bad code regardless of who is writing it. Adding a disclaimer about how it was written is meaningless. Hell, it could say "Written by the Easter bunny" and that would have 0 impact on its utility.


Not the commenter you replied to:

I think many people in this camp have political or ethical concerns and want to avoid contributing to or supporting the companies behind frontier-AI tools. Or they have moral or technical concerns and want to boycott usage to maintain their principles.

It should be fairly widely known at this point.


Obviously people have those concerns. The comment above specifically said:

> The value of the Claude attribution is that you can tell at a glance who used AI.

Specifies none of that, which is why I was asking the question.

> technical concerns

Which is exactly why I asked what I did. What technical concerns could possibly exist if the code is good? What does adding that attribution remove or add to technical concerns that you can't already see from the code itself?


Maybe you want to resist normalizing the use of GenAI for programming?

I know my personal choice doesn’t make much of a difference but I refuse to own a car. I advocate at my local city council to remove car storage from streets, remove parking minimums, add better transit, make the core of our city car-free. It sometimes feels easier to join in and just accept that this is the way of the world but I refuse to believe in inevitability: building cities for the benefit of cars is a choice.

Maybe some folks want to avoid AI code because they don’t want to make that choice?

I can’t say for them. But I do know there’s no sense pretending like they don’t have a point or feigning shock that someone might not have the same view as you do.


Are you intentionally misreading my comments?

I asked a simple question: What technical concerns could possibly exist if the code is good?

I made it very clear I wasn't talking about personal, political, or ethical arguments.

> But I do know there’s no sense pretending like they don’t have a point or feigning shock that someone might not have the same view as you do.

Where am I feigning shock? Are you reading the right comment thread before you're replying?


> That's what I can't for the life of me figure out. Bad code is bad code regardless of who is writing it. Adding a disclaimer about how it was written is meaningless.

Maybe I was reading too much into this part of your comment.

Plenty of folks don’t separate the ethical, political, or moral from the technology. For them using it is condoning it. Like for me, owning a car is contributing to car culture. It might be inconvenient for me or seem backwards to others but it’s worth resisting. They want to know that something was written with AI so they can avoid supporting AI or condoning its use.


That's fair. I guess at that point I think adding a label to the repo as opposed to the commit would make more sense to me.

And why do you want to know that? So you can call our projects slop? Ostracize us?

Because LLMs are not humans, and the code they produce will have a different distribution of failure modes than human written code, so attribution is useful info while reviewing?

> while reviewing

As I said, disclosure is polite when contributing code to third party projects which will undergo human review.

No need for such things in one's own projects.


>which will undergo human review

This can be largely assumed to be true for any open source code. It's kinda the point of open source.


Nope. It cannot be assumed at all. Maintainer could just as easily tell Claude to review the hand written code you sent instead of spending any effort on it. Maintainer could sit on the patch for months on end only to swoop in later and rewrite it instead of engaging with you, thereby erasing your contribution and attribution. Maintainer could just ignore you entirely despite the pervasive "patches welcome" attitude.

If there's one thing I learned not to do in open source, it's to assume nonsense like that.


I'm referring to the fact that "open source" quite literally means "readable by humans [and machines]", and anything beyond that is a subject of debate. There are more users than readers in nearly all cases, but being able to read the code as a user is a significant benefit at times, and it's one of the reasons it's such a large ecosystem in terms of both users and contributors. (it usually being free is another big reason, of course)

Even with coding agents gaining popularity, many humans still look at the code at some point.


I see. That depends on how much I care about the project. My favorite ones get weeks of review and refinement, to the point I still consider them to be more or less hand written. Not all projects get to be that important.

for the same reason we want to know who wrote an article, a book, a movie, a song, a play, a journal paper, a painting, and on and on.

why do so many people want to hide who the real author is?

we should be very wary of anyone claiming they’re the author of something when they’re absolutely not. if jon wrote a book and i take credit, that’s shady as hell.


Ghostwriting is a thing.

Yes, and I respect ghostwritten work less than I do a work with a disclosed true author. Same would be true of unattributed AI-generated code.

yes because there's people who can't write but want to pretend that they can, just like the people who don't disclose they're using these tools. If you're the Gwyneth Paltrow of programming you're not making a great case for yourself, and I'd like to know before touching any of the software.

I don't know, am I? Why don't you check out my work and decide for yourself? Better than forming prejudiced opinions about others.

>Why don't you check out my work and decide for yourself?

because no person can read every line of code written in software they use, or track every commit made to a project. Integrity and authorship matter. If a person lies or obfuscates the origin of what they produce, an article, software, what have you, they're doing it for a reason, otherwise they would be honest. That's not prejudice, that's recognizing deceit. And you don't eat fruit from a rotten tree.


> because no person can read every line of code written in software they use, or track every commit made to a project

Ask Claude to do it for you.

> they're doing it for a reason

And you concluded that the reason was they were pretenders who can't hack it.

That's your prejudice. Not interested in helping you categorize me, thanks.


That's fine; don't contribute to projects that ask for proper attribution, then, and I suppose everyone will be happy.

As I said, attribution is polite. If you're going to openly pre-judge and disrespect people on that basis though, the calculus changes.

So that the AI model that generated the code can get proper credit and we'll know to use it (or not) next time.

That's not at all what someone who wants to "tell at a glance who used AI" actually wants to know.

You don't need an AI attribution tag to recognize slop. In my experience reviewing PRs, the slop-pushers are most aggressive about stripping the AI attribution anyway. It's the normal devs who use a little bit of AI who leave it in.

The tag is helpful because AI authorship is different from human authorship. When you work with a project or team for long enough you start to trust certain people and their intuition, but when they start submitting AI-produced code you have to reset and review it like AI code.

I use these tools a lot, too. But I want to know where the code came from so I can review it accordingly. The source matters.

> Ostracize us?

I don't know why you're so defensive. If AI wrote the code just be honest about it.

If you outsourced the code writing to some guy named Bob on Fiverr, I'd want to know that too.



I'm not interested in joining some argument you're having with someone on lobste.rs

You're not supposed to join. You said you didn't know why I was defensive. I showed you those posts as evidence of the stigma attached to LLMs and their usage. Now you know why.

Maybe you should step back and see if there's a reason why there's a stigma, instead of stubbornly insisting that there's nothing different between submitting work that you wrote yourself, vs. work done by an LLM.

Maybe you should stop making excuses for the summary dismissal of people's years old projects as slop on the basis that an LLM touched it.

It doesn’t help your case that your response is to say “well I’ll just hide my use.” That’s fraud.


Yes.

You're not entitled to know what specific tools were used to produce something, generally speaking.

In the absence of such an entitlement, not volunteering to disclose the tools used is not fraud.


I don't think a PR written by AI is the same thing as using a "tool". If the code is largely generated by AI, that means the AI was the author, not you with some tool.

At what point does it cease to be AI generated and become my own work?

If LLM generates some code but I edit it, does it become my own work? How much editing must be done?

How large is "largely" ? Exactly how many bits of information must come from my fingers tapping the keyboard in order for me to qualify for authorship? Be precise.

If I write something but the LLM polishes it up a bit, is it still my work? Or is it AI generated?


I don't have an answer about the stage at which something should be considered authored by AI - we are in uncharted territory on this.

There are some precedents and rulings related to copyright and AI, so we have at least some rubric by which "authorship" can be determined. But when it comes to AI doing polishing of existing code - that is less certain.


Consider the rules around copyright. If your part of it is substantive, then it's your own work. If it isn't, then it isn't.

I'm not going to define substantive for you. That's something you should feel obligated to research and learn about yourself; anything less is dishonest.


Copyright provides for works made for hire. You are the author, yet your employer owns it. Your employer owns your output and gets credit for it despite not having written even a single bit of it. You're essentially ghostwriting your employer's software.

So "consider copyright" isn't really strengthening your position.


Some people prefer organic grown food for all kinds of reasons, does it matter to you they would want the same for code? (Also, I'm not picking a side here)

It matters when I'm contributing to their projects. In that case I'll go out of my way to be polite and learn their rules.

That's really all anyone is asking of you. It's odd that this is your position, and yet you seem to be arguing (in your other comments) in a way that seems like you think that you should be able to do whatever you want, with any project, their requirements be damned.

Because the reasons for doing it matter. The "different failure modes" argument is a fair point. Since it changes the way code is reviewed, it is polite to disclose use of LLMs.

But you and others in this thread seem hellbent on stigmatizing it to the point you take it as evidence of someone's incompetence. So I'm not at all sympathetic to your "requirements".

That's really all anyone's asking of you: enough respect for your fellow programmers that you avoid pre-judging them. If you can't do that, then what do we care about your "requirements"?


So we can know which commits will be infringing others’ copyright.

If Claude is actually good enough to commit to rsync, of course I'm going to look at that and think "it's good enough for my side project too." And (benefit to companies aside) that is useful info to know, if it's true.

Yeah, this is why it's obnoxious and this is why scummy marketers do it. If you don't aggressively turn it off, they leech an implicit endorsement out of you.

- Sent from my iPhone


Alto hug the iphone sigoff is hilaripus sonce fhe meyboard is so bad it always comes across asa an ask doe forgivebeds

— Sent from my iPhone


Indeed. The best endorsement is done explicitly by obnoxious users.

I use Linux, btw.


Is that a bad thing? I mean from the perspective of Anthropic's marketing department sure, but if agents are just another type of tool in a developer's tool belt - as I see people like to claim recently - attribution feels kinda weird. In the end it is the developer who is responsible for their commits.

Yeah I think it's a bad thing. It's context about how open source code was written that is lost.

And I guess maybe there's no such thing as bad press but at least in this case it doesn't seem like effective marketing for Anthropic.


“Don’t get mad at people for doing something unethical or immoral, or they’ll do something unethical or immoral!”

Disabling attribution of LLM-generated code is fraud, because you’re saying you wrote the code.

Of course that fits right in with the use of an LLM to generate code in the first place, since what it’s actually doing is regurgitating its inputs stripped of any license and copyright notice.


I'm very certain that this is not fraud, across multiple legal systems, both Roman and common law. In both cases fraud requires that a person be deprived of a material good. Neither the defrauded person nor their material loss is present in this case. Maybe there is an oddball legal system somewhere in the world where fraud is something entirely different, but I doubt it. "Fraud", just like "Decorator Pattern", is a well-established and pretty simple concept, even if there are edge cases. This does not fit at all.

In academia this is misattribution; outside of academia this does not exist.

This is clearly not copyright infringement either, as LLMs do not claim copyright, nor could they. Just like the photograph taken by the monkey, or pictures drawn by crows. LLM output is not a creative work either.

Whether this is unethical or immoral is a totally different question. I really don't think so, and I don't think you argue that position well.


It is misrepresentation for gain, that gain does not need to be monetary to be material. For example, it can be reputational.

It also is copyright infringement, because what the LLM “generates” are actually portions of its training set, which were covered by copyright. Just passing through an LLM does not remove that copyright from that work.


No, you are wrong.

In German and French (Roman) legal systems this is a "Vermögensdelikt", and explicitly about material damage and gain. Yes, common law can be more broad (in Canada it isn't really, it just also includes services, btw.), and yet this clearly does not meet the definition, as that requires a damaged/defrauded party and a fraudulent/gaining party. We are not talking about somebody usurping somebody else's reputation, after all.

You misuse a technical term that is well established since antiquity.

You do not know what this word means. If you want to argue about semantics, look up the definition. This works especially well for legal terms as laws define them.

(That said, IANAL and there are very many different legal systems and I am not ruling out that there exists one that is completely different - laws can be changed at will, after all.)

It is also obviously not copyright infringement, because this is simply not how copyright works, at all. I cannot and will not explain all of copyright here. Instead I will point this out: all code produced by a human who has read copyrighted code would fall under your definition.


No, you are wrong. You are either willfully misunderstanding what I’m calling fraud, or you are misinformed as to what “material gain” means in many legal systems.

With respect to the former, “fraud” is a shorthand for “fraudulent misrepresentation,” which is what you’re doing when you take someone else’s IP and try to contribute it to a project without securing the right to do so. It can be read as implicit in the attempt to contribute to the project that you have secured this permission (or do not need to, because the work is original to you). Whether the code came out of an LLM or was copied from another project or Stack Overflow doesn’t matter, it’s that you’re misrepresenting the rights you have that’s the fraudulent part.

For the latter, I specifically pointed out that the gain from fraudulent misrepresentation need not be monetary. The gain can be reputational or any other sort of benefit. For example, someone pretending to be a fictional person to gain access to a space they otherwise wouldn’t is still committing fraud.

Finally, you’re wrong about whether the output of an LLM infringes copyright of material in its training set. Just running a copyrighted work through an LLM does not remove the copyright on that work if reproduced by the LLM.


You are misinformed, I suspect you have no idea what you are talking about.

As I said, I do not know all legal systems in the world. If there is one where "material gain" matches your idea, please cite the law or a case that includes LLM usage. As I explained, Canadian law even includes services, and yet this still very much does not match the definition, for the reasons explained.

I do understand very well what you mean by "fraud", and I do not misrepresent it - your opinion on what it should be is plain and simple wrong. I explained why in my previous posts.

You are under the impression that legal science is some kind of folk etymology. It absolutely is not. Fraud is §263 StGB, Art. 313-1 Code pénal or §380 of the Canadian Criminal Code. (They are all remarkably similar, because they share a millennia-old tradition, making them IMHO fascinating cultural artifacts.) Here [0] is a structured version of one of these texts. Think of it as a symbolic execution of the law. You can see there is a structural mismatch with your "case". Nobody usurps anything from somebody else, and all three laws include that in their definition. That was my original claim.

You think you can somehow make up your own private definitions, develop your own private theories about them, apply them, and argue about the semantics of your made-up terms. That is the opposite of how jurisprudence works. It is rigorous, with well-established scientific and scholastic methods. It operates on terms defined by the law: in the case of "fraud", the previous citations, especially in criminal law, and nothing else. German legal science has its own theory of what counts as "nothing else" under the name "Wortlautgrenze". These terms and methods vary from jurisdiction to jurisdiction, but by surprisingly little.

Don't call your code a decorator pattern because you think it is decorative. Different pattern libraries have definitions for that, and you need to be able to argue that it fits. Likewise, if you feel something involves some kind of misrepresentation, it's probably not fraud. If things have different names, that is probably for good reason, especially in legal science.

[0] https://www.iurastudent.de/schemata/schema-zum-betrug-263-i-...


"Disabling attribution of LLM-generated code is fraud, because you’re saying you wrote the code."

Should there be attribution for Google or Stack Overflow copy/paste? Who should we bully about this?


> Should there be attribution for Google or Stack Overflow copy/paste?

Obviously, and I'm a bit taken aback that anyone thinks otherwise.


Yes, in fact, this is why people who do that are looked down upon.

They are in fact committing fraud if they do not attribute the code in their commit properly, because by committing it they’re claiming to have rights by virtue of authorship that they do not have. (Namely, the right to contribute that code to the project.) They may also be committing copyright infringement, depending on the copyright and license status of some code they found via Google or Stack Overflow.

It’s always fascinating to me to see how many people on Hacker News have such extremely poor understanding of how intellectual property actually works, and how misrepresenting themselves or their work can actually have consequences.


Are there any court cases you can point to that have clearly established that using LLM generated code can be a copyright violation? My understanding is that this is very far from being settled law.

What cases can you cite that have determined it’s not?

It’s clear on its face that LLMs can and do store and reproduce copyrighted works, using a form of (somewhat) lossy data compression. And using a lossy stochastic or perceptual form of compression to reproduce a copyrighted work doesn’t somehow make it not storage or reproduction, otherwise sharing MP3 files wouldn’t be copyright infringement.

Anyone engaging in responsible risk management should assume that anything LLM-generated is infringing until determined otherwise by the courts, not the other way around.


There are billion-dollar entities preparing to fight this very question out in court as we speak.

Your interpretation of the law is certainly plausible, but it is clearly not a settled question.

If you really are so confident, go bet on Kalshi and make some easy money: https://kalshi.com/markets/kxnytoai/new-york-times-wins-open...


Outside of situations where it is required by contract, attributing AI usage is a courtesy, nothing more.

So it’s OK to just paste other people’s IP into a change you’re submitting to a project without caring about the license or originator?

It's only fraud if a person signed their name stating such.

Their name being attached to the commit is itself irrelevant, as there is no way to submit a patch otherwise. You could use a fake name, but you're just moving this fraud problem around.

You're going to have a hard time convincing anyone that using a tool constitutes fraud. Frankly, it's silly, if not genuinely stupid.

Film photographers in the early 2000s routinely called digital "not real photography" and Photoshop "cheating" because you could delete bad shots and fix everything later. Traditional musicians and critics dismissed drum machines, synthesizers, and autotune as soulless tools.


Intent and custom both matter quite a bit in law. It is customary to treat the name attached to a commit as the copyright holder of any changes represented by that commit, just as it was for the sender of an email containing a patch back when that was how such work was done.

Often this is also spelled out in a project’s contribution guidelines, and some projects have even had more explicit copyright assignment policies they required contributors to agree to, but the lack of such guidelines or assignment policies does not mean the custom as normally observed in the field is irrelevant.


> Intent and custom both matter quite a bit in law.

Indeed, and I'm not aware of any (Western, at least) legal system that would consider it fraud to not disclose that an LLM had generated some code.

I'd like to gently point out that your insistence on fraud here is hurting your overall argument, and is causing people to focus on the language you're using, instead of the substance of what you're trying to say. I do agree with you that people should disclose LLM generation when writing commits. But the way you're going about arguing this "fraud" thing is an unproductive dead end.


The fraud isn’t (directly) in hiding that the LLM generated some code. The fraud is in the (implicit) misrepresentation of ownership of and/or rights to the code.

When you send a patch or pull request to a project, you’re saying (implicitly) that you have the necessary rights to contribute the intellectual property it contains. If you used an LLM to “generate” some of it, that is not necessarily the case.

A similar situation would occur if you agreed to pay someone else to create a patch, and then submitted it under your own name without paying them. Because it’s a work for hire, it’s not yours until they’re paid for it, so you’re fraudulently misrepresenting your rights to that patch to the project. If you did pay the creator, you don’t have to attribute them unless it’s in the contract between you and the creator, or unless the project requires such attribution.


This argument gets trotted out every time but it doesn't convince me of anything. Yes, calling things out creates an incentive for people to hide them, but so what?

Setting aside the whole AI = bad argument, let's do a metaphor. Tax evasion is bad and unethical and you should call it out where you see it. But wait, that creates an incentive for people to hide it! So I'd better not call it out, it's best to just keep my mouth shut.


I'd be willing to bet that an undisclosed LLM disclosure will follow a developer around for the rest of their career.

That kind of fraud absolutely should. (I suspect you mean “undisclosed LLM use.”)

Thank you, that's what I meant

I'm willing to bet that in two years that's going to be completely irrelevant because the amount of code written by hand will drop to less than 10%.

I mean, I don't think commits are the place for tool attributions. I want to know what the change was, I'm not really interested in your tool selection (put that in the PR if it's relevant). It'd be just as irrelevant to see "written on my macbook in neovim"

Depends on what the Claude attribution actually means. A lot of people will just get the thing building and then ship. To me that attribution is generally a red flag.

It means “this contribution likely infringes someone else’s copyright.”

[flagged]


It makes no sense at all to do that. The only thing that matters is whether the code is good.

That’s not the only thing that matters. The provenance of the code also matters enormously, specifically whether the person contributing it actually has the right to do so.

If I contributed code to an Open Source project behind my old employer’s back, that would have been bad, because that code was owned by them and not me, even if I wrote it on my own time using my own equipment, because of the contract I signed with them.

If I copied code out of an AGPLv3-licensed codebase and contributed it to a BSD-licensed codebase without telling anyone, that would have been bad, because I did not have the right to change the license on that code to BSD (or change the license on the codebase to which I was contributing to AGPLv3).

If you use an LLM to produce code, you may well be doing the latter since an LLM is actually just regurgitating portions of its inputs. This is not a hypothetical scenario; I’ve personally encountered a case of someone using an LLM to attempt to contribute code I recognized from a specific Open Source project under one license to another project under a different license, while claiming they “wrote it themselves.”

Any project that accepts contributions needs to take liability seriously and manage their risk appropriately.


> This is not a hypothetical scenario; I’ve personally encountered a case of someone using an LLM to attempt to contribute code I recognized from a specific Open Source project under one license to another project under a different license

You say you "recognized code". Does it mean that you weren't able to find the exact match?

> an LLM is actually just regurgitating portions of its inputs

You seem to be talking about the inputs to the autoregressive pretraining stage. Correct? Then that's not how LLMs work, unless we define "portions" as blocks of a few letters.


I found exact matches. I also found inexact matches, where C functions had been turned into C++ member functions and the like. “Recognized” does not somehow imply a lack of precision.

The LLM the person used was trained on a very large corpus of Open Source code, and reproduced that code exactly. Just like LLMs have reproduced chapters of books and articles from the New York Times exactly.


> I found exact matches.

Were those functions trivial? With, say, a 1% probability that someone who had never seen them would write them the same way?

> Just like LLMs have reproduced chapters of books and articles from the New York Times exactly.

Have you read the articles? As far as I remember, they fed large chunks of an article to an LLM multiple times and sometimes got a not-so-long exact match. That may mean that LLMs can infer a style and that humans are predictable.


> […] fed large chunks of an article multiple times to an LLM […]

So they had to prompt? An LLM? I’ve gotten this argument before and still don’t get what it’s trying to say. These models do not output anything unless prompted; that’s not any kind of gotcha.

On the code output front, there is a lot of relevant evidence beyond the NYT lawsuit [0].

If I slightly modify GPL code, that doesn’t give me the right to relicense.

[0] https://arxiv.org/html/2601.02671 and https://arxiv.org/abs/2506.12286 and https://ai.stanford.edu/blog/verbatim-memorization/


No, the functions weren’t trivial, and a lot of the surrounding code and structure bore substantial similarities as well. If you saw the two files next to each other, you’d assume it was the result of a copy-paste-adjust process if you didn’t know an LLM was involved.
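
For anyone who wants to run a side-by-side check like this themselves, here is a minimal sketch of one way to do it. This is an assumption about a reasonable approach, not the method used in the case above: split both files into tokens so whitespace and line breaks don't matter, then measure what fraction of the contributed file's k-token windows also appear in the reference file. The names `tokens`, `shingles`, and `overlap` and the window size `k=8` are hypothetical choices for illustration.

```python
import re

# Identifiers, runs of digits, or any single non-space character.
# Whitespace is discarded, so reformatting alone does not hide a match.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(source: str) -> list[str]:
    """Split source text into a flat list of crude lexical tokens."""
    return TOKEN_RE.findall(source)


def shingles(toks: list[str], k: int) -> set[tuple[str, ...]]:
    """Return the set of all windows of k consecutive tokens."""
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def overlap(candidate: str, reference: str, k: int = 8) -> float:
    """Fraction of the candidate's k-token windows that also occur in the reference.

    Returns 0.0 when the candidate is shorter than k tokens.
    """
    cand = shingles(tokens(candidate), k)
    if not cand:
        return 0.0
    ref = shingles(tokens(reference), k)
    return len(cand & ref) / len(cand)
```

Calling `overlap(contributed_text, upstream_text)` gives a number between 0 and 1. A high value only flags a pair of files for human review; it does not prove copying. This sketch does not strip comments or string literals, and renamed identifiers or C-to-C++ restructuring would lower the score, so inexact matches still need someone to read the two files together.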

I can only speculate that the model that generated the code hasn't undergone selective unlearning for verbatim data (SUV) or something similar. As you understand, "sometimes generates verbatim code" and "just regurgitates [non-trivial] portions of its input" are different statements.

The possibility of SUV clearly shows that a model does more than "just regurgitating."


"LLM produced licensed code and person contributed it" is indistinguishable from "person contributed licensed code". The LLM is irrelevant. Result is the same as if they had copy pasted it.

Yes, exactly.

Unfortunately, a large number of people are being told (and here, you can see many who believe it) that the output of an LLM either carries no copyright or is copyrighted by the one prompting it. In other words, even right here on Hacker News it’s widely believed that LLMs “launder” copyright.


Irrelevant either way. It's your name on the commit, and the code either infringes or it does not. Whether an LLM was used is immaterial.

Not irrelevant. A large number of people who would not copy and paste code from one project to another will attempt to contribute the copyright-infringing output of an LLM and not think twice.

[flagged]


Is this comment LLM generated?

Have fun with 1000x more Buns that literally no one is using or maintaining. An entire software industry built on top of a burning garbage pile of crappy, dead code.


It is; that user has responded to me using LLMs before…

> An entire software industry built on top of a burning garbage pile of crappy, dead code.

That has been the case for the last, oh, decade or so. Where do you think LLMs learned to slop code?


Things have been bad, but every company using its own bespoke LLM reimplementation of rsync and similar is so, so much worse.

Why would every company do it though? They'll just all be using the same (Anthropic's) AI-enabled fork.

You think Anthropic wants to be the sole maintainer of thousands of forked OSS projects...? I seriously doubt that would happen, for legal, marketing, and logistical reasons alike.

Anthropic, probably not. I could totally see Altman or even Musk deciding to do that exact thing as a showcase of sorts.

[flagged]


It just reads like LinkedIn slop. One melodramatic sentence after another.

Consider collecting related thoughts into paragraphs.


The Fortune 10 company that I spent decades at and retired from just a couple years ago noticed this issue immediately and issued a blanket ban on the use of these tools for the company’s own code that to my knowledge has not been rescinded. (They also started developing their own coding-specific LLM, training solely on code they owned, around the same time.)

You might consider that the large, public players in this market have a very large incentive to promote the idea that this is not true, that they consider themselves large and powerful enough to actually flout the law, and that they plan to use the argument that enforcement will be too damaging to the economy to make their view the “new normal.”

This playbook has been run before, by Uber and Lyft, by Airbnb, by Tesla with “FSD,” and so on. It’s very clearly the approach being taken.


They’re using Claude lmao

[flagged]


Or you’re misinformed about what my old employer is actually doing, or how they’re doing it.

I'm not

"let's go the opposite way"

Do you have any popular open source projects? Or are you just an Internet gremlin?



