It's not a clean-room implementation, but not because it's trained on the intern...

Calavar · 2026-02-05T22:38:36 1770331116

The classical definition of a clean-room implementation is something that's made by looking at the output of a prior implementation but not at its source.

I agree that having a reference compiler available is a huge caveat though. Even if we completely put training data leakage aside, they're developing against a programmatic checker for a spec that's already had millions of man-hours put into it. This is an optimal scenario for agentic coding, but the vast majority of problems that people will want to tackle with agentic coding are not going to look like that.

visarga · 2026-02-06T05:43:07 1770356587

This is the reimplementation scenario for agentic coding. If you have a good spec and a battery of tests, you can delete the code and reimplement it. Code is no longer the product of eng work; it is more like bytecode now: you regenerate it, you don't read it. If you have to read it, then you are just walking a motorcycle.

We have seen at least three of these projects: JustHTML, FastRender, and this one. All started from beefy tests and specs. They show that reimplementation without manual intervention kind of works.

Calavar · 2026-02-06T06:18:19 1770358699

I think that's overstating it.

JustHTML is a success in large part because it's a problem that can be solved in a four-digit number of lines of code. The whole codebase can sit in an LLM's context at once. Do LLMs scale beyond that?

I would classify both FastRender and the Opus C compiler as interesting failures. They are interesting because they got a non-negligible fraction of the way to feature-complete. They are failures because they ended with no clear path for moving the needle forward to 80% feature-complete, let alone 100%.

From the original article:

> The resulting compiler has nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

From the experiments we've seen so far, it seems that a large enough agentic codebase will inevitably collapse under its own weight.

jayd16 · 2026-02-06T19:12:42 1770405162

> Code is no longer the product of eng work

Never was.

franktankbank · 2026-02-06T14:55:22 1770389722

Great way to get constantly moving holes.

array_key_first · 2026-02-05T22:54:11 1770332051

If you read the entire GCC source code and then create a compatible compiler, it's not clean room. Which Opus basically did since, I'm assuming, its training set contained the entire source of GCC. So even if they weren't actively referencing GCC, I think that counts.

nmilo · 2026-02-06T00:00:50 1770336050

What if you just read the entire GCC source code in school 15 years ago? Is that not clean room?

hex4def6 · 2026-02-06T00:27:28 1770337648

No.

I'd argue that no one would really care given it's GCC.

But if you worked for GiantSodaCo on their secret recipe under NDA, then started a new soda company 15 years later whose product tastes suspiciously similar to GiantSodaCo's, you'd probably have legal issues. It would be hard to argue that you weren't using proprietary knowledge in that case.

Zambyte · 2026-02-06T15:28:38 1770391718

Given that GCC is not public domain, the copyright holders will probably care.

pertymcpert · 2026-02-06T06:44:37 1770360277

I read the source. If anything it takes concepts from LLVM more than GCC, but the similarities aren't very deep.