This is an excellent and very interesting write-up.
It's so refreshing to read technical articles that are clearly written by a knowledgeable human and explained perfectly like this. Because it walks the reader through the problem with example screenshots, the article unfolds and gets more interesting as you continue reading.
It's also strange to realize that these days, most articles are not like this.
Be careful with this sort of logic ("reflection=expensive").
Everything should obviously be measured.
I've worked with large .NET code bases that used attributes for things like plugins, and the reflection cost was negligible for overall performance.
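A minimal sketch of the "measure it" advice, written in Python as an analogy rather than in .NET: it times a direct method call against a lookup by name at run time. The `Plugin` class, its `run` method, and the helper names are hypothetical, and absolute numbers on the CLR will differ, so treat this only as the shape of the measurement.

```python
import timeit


class Plugin:
    """Hypothetical plugin used only for this measurement."""

    def run(self, x: int) -> int:
        return x + 1


def call_direct(plugin: Plugin, x: int) -> int:
    # Ordinary method call written out at the call site.
    return plugin.run(x)


def call_reflective(plugin: object, method_name: str, x: int) -> int:
    # Reflection-style call: look the method up by name at run time.
    return getattr(plugin, method_name)(x)


def measure(number: int = 100_000) -> dict:
    """Return total seconds spent on `number` calls of each style."""
    plugin = Plugin()
    return {
        "direct": timeit.timeit(lambda: call_direct(plugin, 1), number=number),
        "reflective": timeit.timeit(
            lambda: call_reflective(plugin, "run", 1), number=number
        ),
    }


if __name__ == "__main__":
    for style, seconds in measure().items():
        print(f"{style}: {seconds:.4f}s")
```

The useful comparison is not the ratio between the two numbers but how either one compares with the total time your application spends on real work.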
I always use a standard workflow and it has never been a problem.
- Define the task and the goal, write a short spec document (markdown is fine)
- Point the agent at it in plan mode and have it write the plan to disk with phases. Iterate on its plan if necessary here and now.
- Have each agent tackle a phase and update the plan as a living document (switch models if some phases are more difficult than others)
- Clear and repeat until done
I've never had to overcomplicate this and it's worked both on enterprise-scale projects and personal projects. I am not sure what I'm missing - if anything.
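To make the "plan on disk with phases" step concrete, here is a rough sketch that assumes the plan is a markdown file with one checkbox line per phase. That format, and the helper names `next_phase` and `mark_done`, are my assumptions for illustration, not something any particular agent tool requires.

```python
import re
from typing import Optional

# Assumed plan format: one markdown checkbox per phase, e.g.
#   - [ ] Phase 2: implement
PHASE_RE = re.compile(r"^- \[(?P<done>[ xX])\] (?P<title>.+)$")


def next_phase(plan_text: str) -> Optional[str]:
    """Return the title of the first unchecked phase, or None if all are done."""
    for line in plan_text.splitlines():
        match = PHASE_RE.match(line)
        if match and match.group("done") == " ":
            return match.group("title")
    return None


def mark_done(plan_text: str, title: str) -> str:
    """Return the plan text with the named phase checked off."""
    updated = []
    for line in plan_text.splitlines():
        match = PHASE_RE.match(line)
        if match and match.group("title") == title:
            line = f"- [x] {title}"
        updated.append(line)
    return "\n".join(updated)
```

Each fresh agent session can then be pointed at whatever `next_phase` returns, and the file itself carries the state between sessions.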
I think what you are doing is good, and I have a similar workflow, but the idea here is to automate some of your manual approval work with coded tests. Since tests are easy to generate, have as many as possible and think hard about what to test for. The agent will then deviate less and be more autonomous.
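As a sketch of what replacing manual approval with coded tests could look like, the helper below runs a test command and reports whether it passed. The function name `phase_passes` and the example test path are hypothetical.

```python
import subprocess
import sys


def phase_passes(test_command: list) -> bool:
    """Run a test command; True only if it exits with status 0."""
    result = subprocess.run(test_command, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical layout: one test directory per phase of the work.
    command = [sys.executable, "-m", "pytest", "tests/phase_2"]
    print("approved" if phase_passes(command) else "needs another pass")
```

The agent only moves on when the check returns `True`, so the quality of the gate depends entirely on how well the tests capture what you would have checked by hand.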
Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
Also - music is subjective. Mathematics isn't.
And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much proof is needed - since that is literally proof that it can be done.
>> Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
There are quite a few questions around that. Music is subjective and obviously different people have different tastes, but I wouldn't call any of them actually good music or real hits.
>> LLM discovered a new way to reason about a conjecture
I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building a new kind of maths, or approaching it in a creative/artistic way. That's what I was referring to.
I can't speak for maths or hard science as I'm not trained in that, but the creativity aspect in code is definitely lacking when it comes to LLMs. May not matter down the line.
If an AI agent finds zero bugs in a software utility, how can that be taken to mean the AI agent is not very good at finding bugs?
What if there are actually zero bugs?
> Five issues felt like nothing as we had expected an extensive list.
The expectation here may not match reality, but not necessarily because Mythos isn't as capable as claimed. curl may just happen to be a well-hardened tool that doesn't have too many security vulnerabilities in its present state.
The author considered the same w.r.t. remaining bugs:
> More to find
> These were absolutely not the last bugs to find or report. Just while I was writing the drafts for this blog post we have received more reports from security researchers about suspected problems. The AI tools will improve further and the researchers can find new and different ways to prompt the existing AIs to make them find more.
> We have not reached the end of this yet.
> I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.
And that makes sense. It would be quite a coincidence if there were just one proper find remaining and only Mythos managed to find it, right when it was released, while the other projects had been quickly hoovering up every other find until that point. Possible, but not the safest assumption to start from.
You don't need to go back to coding by hand if you know how to do it already. There is a middle ground.
If you understand good software architecture, architect it. Create a markdown document just as you would if you had a team of engineers working with you and were handing it off to them. Be specific.
Let the AI do the implementation of your architecture.
I think Firefox applies more aggressive subpixel rendering and path smoothing before stroking. It resamples the glyph outline path at a higher precision level before handing it to the stroke algorithm.
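To illustrate what "resampling the outline at a higher precision" can mean in general, here is a small sketch that flattens a quadratic Bézier segment (the curve type used in TrueType outlines) into a polyline with a chosen number of segments. Whether Firefox actually does anything like this is not confirmed here; `flatten_quadratic` is a hypothetical helper, not browser code.

```python
def flatten_quadratic(p0, p1, p2, segments):
    """Approximate a quadratic Bezier curve with `segments` straight pieces.

    p0 and p2 are the endpoints and p1 is the control point, each an (x, y)
    tuple. More segments means the polyline follows the true curve more
    closely before any stroke is applied to it.
    """
    points = []
    for i in range(segments + 1):
        t = i / segments
        u = 1.0 - t
        x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
        y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
        points.append((x, y))
    return points


if __name__ == "__main__":
    coarse = flatten_quadratic((0, 0), (1, 2), (2, 0), segments=4)
    fine = flatten_quadratic((0, 0), (1, 2), (2, 0), segments=32)
    print(len(coarse), len(fine))
```

A stroker working from the finer polyline has shorter, flatter pieces to offset, which is one plausible reason the stroked outline could look smoother.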