
EMM_386 · 2026-06-03T13:30:43 1780493443

This is an excellent and very interesting write-up.

It's so refreshing to read technical articles that are clearly written by a knowledgeable human and explained perfectly like this. By walking the reader through this with the example screenshots it unfolds and gets more interesting as you continue reading.

It's also strange to realize that these days, most articles are not like this.

ammar2 · 2026-06-03T16:43:51 1780505031

heh, a friend actually pointed out a typo on a first draft and said "maybe you shouldn't fix it to show it's not LLM written".

EMM_386 · 2026-06-02T16:01:33 1780416093

Be careful with this sort of logic ("reflection=expensive").

Everything should obviously be measured.

I've worked with large .NET code bases that used attributes for things like plugins and it was completely negligible for overall performance in the grand scheme of things.
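As a rough illustration of "measure it" rather than assume it, here is a Python analogue. `inspect`/`getattr` stand in for .NET reflection, and the plugin module, the `plugin` decorator and the `is_plugin` marker are all hypothetical. It times the reflection-style scan separately from the per-call cost once the result is cached.

```python
import inspect
import timeit
import types

# Hypothetical plugin module; the `is_plugin` marker attribute loosely plays
# the role a custom attribute would play on a .NET plugin type.
plugins = types.ModuleType("plugins")


def plugin(fn):
    """Decorator that tags a function as a plugin."""
    fn.is_plugin = True
    return fn


@plugin
def greet(name: str) -> str:
    return f"hello, {name}"


@plugin
def shout(name: str) -> str:
    return name.upper()


def helper() -> None:
    """Untagged function: discovery should skip it."""


plugins.greet, plugins.shout, plugins.helper = greet, shout, helper


def discover(module: types.ModuleType) -> dict:
    """Reflection-style discovery: scan a module for tagged functions."""
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if getattr(obj, "is_plugin", False)
    }


def per_call_us(fn, number: int = 20_000) -> float:
    """Average wall-clock cost of one call to `fn`, in microseconds."""
    return timeit.timeit(fn, number=number) / number * 1e6


if __name__ == "__main__":
    registry = discover(plugins)  # pay the reflection cost once, then cache it
    print(f"discover():        {per_call_us(lambda: discover(plugins)):.3f} us")
    print(f"call via registry: {per_call_us(lambda: registry['greet']('x')):.3f} us")
    print(f"direct call:       {per_call_us(lambda: greet('x')):.3f} us")
```

One plausible reason such costs end up negligible in practice is that discovery tends to run once at startup, after which each call is just a dictionary lookup; the numbers on your own machine are what matter.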

EMM_386 · 2026-05-31T14:57:04 1780239424

I always use a standard workflow and it has never been a problem.

- Define the task and the goal, write a short spec document (markdown is fine)

- Point the agent at it in plan mode and have it write the plan to disk with phases. Iterate on its plan if necessary here and now.

- Have each agent tackle one phase and update the plan as a living document (switch models if some phases are harder than others)

- Clear and repeat until done

I've never had to overcomplicate this and it's worked both on enterprise-scale projects and personal projects. I am not sure what I'm missing - if anything.
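If you want each fresh session to find its place in the living plan mechanically, one option is to keep every phase as a checklist. A minimal Python sketch, assuming a hypothetical `## Phase N: title` heading plus `- [ ]`/`- [x]` item layout for the plan file (nothing in the workflow above prescribes this format, and the function names are made up):

```python
import re
from typing import Optional

# Hypothetical plan file contents; the agent ticks items off as it works.
PLAN_MD = """\
# Plan: export feature

## Phase 1: data model
- [x] add ExportJob table
- [x] migration

## Phase 2: API
- [x] POST /exports
- [ ] GET /exports/{id}

## Phase 3: UI
- [ ] export button
"""

PHASE_HEADING = re.compile(r"^## (Phase \d+:.*)$", re.MULTILINE)
CHECKBOX = re.compile(r"^- \[([ x])\]", re.MULTILINE)


def phases(plan: str) -> list[tuple[str, list[bool]]]:
    """Split the plan into (phase title, [is each item done?]) pairs."""
    heads = list(PHASE_HEADING.finditer(plan))
    result = []
    for i, head in enumerate(heads):
        end = heads[i + 1].start() if i + 1 < len(heads) else len(plan)
        body = plan[head.end():end]
        result.append((head.group(1).strip(), [m == "x" for m in CHECKBOX.findall(body)]))
    return result


def next_phase(plan: str) -> Optional[str]:
    """Title of the first phase with an unticked item, or None when all are done.
    A phase with no checklist items at all counts as done."""
    for title, items in phases(plan):
        if not all(items):
            return title
    return None


if __name__ == "__main__":
    print("Next session should work on:", next_phase(PLAN_MD))
```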

visarga · 2026-06-01T04:32:52 1780288372

I think what you are doing is good, and I have a similar workflow, but the idea here is to automate some of your manual approval work with coded tests. Tests are easy to generate, so have as many as possible and think hard about what to test for; the agent will then deviate less and be more autonomous.
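As a minimal sketch of what such coded checks could look like, assume a hypothetical spec item for a `parse_duration` function (the function, its rules and every test case are made up for illustration). The tests are written from the spec up front; the implementation is a stand-in for whatever the agent produces, and the tests replace a manual review of each attempt.

```python
import re
import unittest

# Hypothetical spec item: "parse_duration turns strings like '1h30m' or '45s'
# into a number of seconds and rejects anything else with ValueError."
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Stand-in for the code the agent would write to satisfy the tests."""
    # Reject empty input and anything left over once valid tokens are removed.
    if not text or _TOKEN.sub("", text) != "":
        raise ValueError(f"bad duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _TOKEN.findall(text))


class ParseDurationSpec(unittest.TestCase):
    """Acceptance tests written from the spec before handing off to the agent."""

    def test_examples_from_spec(self):
        cases = {"45s": 45, "2m": 120, "1h30m": 5400, "1h0m5s": 3605}
        for text, seconds in cases.items():
            with self.subTest(text=text):
                self.assertEqual(parse_duration(text), seconds)

    def test_rejects_garbage(self):
        for text in ["", "abc", "10", "5x", "1h 30m"]:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_duration(text)


if __name__ == "__main__":
    unittest.main(argv=["spec"], exit=False)
```

The agent's loop then becomes "change code, run the tests" instead of "change code, wait for a human".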

EMM_386 · 2026-05-20T21:27:36 1779312456

"I don't see them be making the next great song"

Meanwhile, songs that people think are by humans, but are actually AI-generated, are hitting number one on some Spotify charts. Spotify has to start labelling them as such. One AI "band" had an entire album of hits.

Also - music is subjective. Mathematics isn't.

And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much more proof is needed, since that is literally proof that it can be done.

truncate · 2026-05-20T21:58:24 1779314304

>> Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.

There are quite a few questions around that. Music is subjective and obviously different people have different tastes, but I wouldn't call any of them actually good music or real hits.

>> LLM discovered a new way to reason about a conjecture

I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building a new kind of maths, or approaching it in a creative/artistic way. That's what I was referring to.

I can't speak for maths or hard science as I'm not trained in them, but LLMs definitely lack creativity when it comes to code. That may not matter down the line.

EMM_386 · 2026-05-11T16:16:03 1778516163

If an AI agent finds zero bugs in a software utility, why should that be taken to mean the AI agent is not very good at finding bugs?

What if there are actually zero bugs?

> Five issues felt like nothing as we had expected an extensive list.

The expectation here may not match reality, but not necessarily because Mythos isn't as capable as claimed. curl may just happen to be a well-hardened tool that doesn't have too many security vulnerabilities in its present state.

zamadatix · 2026-05-11T17:09:00 1778519340

The author considered the same w.r.t. remaining bugs:

> More to find

> These were absolutely not the last bugs to find or report. Just while I was writing the drafts for this blog post we have received more reports from security researchers about suspected problems. The AI tools will improve further and the researchers can find new and different ways to prompt the existing AIs to make them find more.

> We have not reached the end of this yet.

> I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.

And that makes sense: it would be quite an argument from coincidence to say there was just one proper find remaining, and that only Mythos managed to find it right at the point it was released, while the other projects had been quickly hoovering up every other find until then. Possible, but not the safest assumption to start questioning from.

EMM_386 · 2026-05-11T13:24:40 1778505880

There are terminal libraries that do this:

https://github.com/vadimdemedes/ink

This is what the Claude Code CLI uses (or was using?), and it caused many issues such as flickering, thrashing, and latency.
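A common way to reduce flicker in terminal UIs is to diff the previous and next frames and rewrite only the rows that changed, instead of clearing and repainting everything. The Python sketch below illustrates that general idea; it is not a description of how ink works internally, the function names are hypothetical, and it assumes full-screen mode (row 1 is the top of the screen) with no line wrapping.

```python
def diff_frame(prev: list[str], curr: list[str]) -> str:
    """Return the ANSI output that turns screen `prev` into screen `curr`,
    touching only rows whose text changed."""
    out = []
    for row in range(max(len(prev), len(curr))):
        old = prev[row] if row < len(prev) else None
        new = curr[row] if row < len(curr) else None
        if old == new:
            continue  # unchanged row: emit nothing for it
        # CUP moves the cursor to (row + 1, column 1); EL 2 erases that line.
        out.append(f"\x1b[{row + 1};1H\x1b[2K")
        if new is not None:
            out.append(new)
    return "".join(out)


def full_redraw(curr: list[str]) -> str:
    """Naive approach for comparison: clear the screen and repaint every row."""
    return "\x1b[2J\x1b[H" + "\r\n".join(curr)


if __name__ == "__main__":
    before = ["status: thinking", "tokens: 120", "> "]
    after = ["status: thinking", "tokens: 121", "> "]
    print(len(full_redraw(after)), "bytes for a full redraw")
    print(len(diff_frame(before, after)), "bytes for a diffed redraw")
```

Only the `tokens:` row is rewritten in the diffed case, so the other rows never blank out between frames.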

EMM_386 · 2026-05-11T02:26:38 1778466398

You don't need to go back to coding by hand if you know how to do it already. There is a middle ground.

If you understand good software architecture, architect it. Create a markdown document just as you would if you had a team of engineers working with you and would hand off to them. Be specific.

Let the AI do the implementation of your architecture.

EMM_386 · 2026-05-06T15:18:38 1778080718

I think Firefox applies more aggressive subpixel rendering and path smoothing before stroking. It resamples the glyph outline path at a higher precision level before handing it to the stroke algorithm.
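Whether Firefox does exactly this is a guess; the sketch below only illustrates the general technique of resampling (flattening) a curved outline segment into straight lines before stroking, and how the approximation error shrinks as sampling gets finer. It uses a quadratic Bézier, the curve type in TrueType outlines; the control points and function names are arbitrary choices for illustration. For a quadratic curve the deviation should drop roughly fourfold each time the segment count doubles.

```python
import math

Point = tuple[float, float]


def quad_point(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Evaluate a quadratic Bézier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])


def flatten(p0: Point, p1: Point, p2: Point, segments: int) -> list[Point]:
    """Resample the curve as `segments` straight lines (segments + 1 points)."""
    return [quad_point(p0, p1, p2, i / segments) for i in range(segments + 1)]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def max_error(p0: Point, p1: Point, p2: Point, segments: int, probes: int = 2000) -> float:
    """Estimate how far the flattened polyline strays from the true curve."""
    poly = flatten(p0, p1, p2, segments)
    worst = 0.0
    for i in range(probes + 1):
        t = i / probes
        k = min(int(t * segments), segments - 1)  # polyline piece covering this t
        worst = max(worst, dist_to_segment(quad_point(p0, p1, p2, t), poly[k], poly[k + 1]))
    return worst


if __name__ == "__main__":
    # Arbitrary control points for one curved piece of a glyph outline.
    p0, p1, p2 = (0.0, 0.0), (50.0, 100.0), (100.0, 0.0)
    for n in (2, 4, 8, 16):
        print(f"{n:2d} segments -> max deviation {max_error(p0, p1, p2, n):.3f}")
```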

EMM_386 · 2026-05-03T22:01:12 1777845672

> "in September 2025, Banksy painted a mural on the Royal Courts of Justice depicting a judge bludgeoning a protester with a gavel"

His other works aren't subtle.

EMM_386 · 2026-04-19T23:15:48 1776640548

You can read the latest Claude Constitution plus more info here:

https://www.anthropic.com/news/claude-new-constitution