sutterd · 2026-06-14T22:26:50 1781476010

I think there are different views you can have here. I think PG is in the group that thinks if you get a billion dollars, you earned a billion dollars. His distinction is between getting the billion dollars honestly or dishonestly. The alternate view is that you can get 100 billion dollars, presumably honestly, but that doesn't mean you earned a billion dollars. The first group will say this is splitting hairs. The second group will say that is the whole point. If the company gets a billion dollars, did it earn a billion dollars? Even more to the point, if the company earns a billion dollars, does the founder/CEO, or whoever is referenced in this post, earn a billion dollars? I think the two groups will just see this differently.

sutterd · 2026-06-13T15:50:40 1781365840

Fable was a big improvement in planning for me over Opus. I usually do a bit of work preparing tasks before handing them off to Opus or else I get bad results. I didn't plan on writing software this week because I was working on other things but changed my mind to test out Fable. I didn't have any work prepared. Fable was able to write the high level plans that later turned into coding tasks. Of course any model could write plans like that, but I had confidence in these plans, similar to how Opus 4.5 gave me a huge jump in confidence in the code it wrote. (Honest, I am not paid to write this.)

justiceforsaas · 2026-06-13T16:03:07 1781366587

Honestly the code gen part has been “good enough” for a while now, especially with models like Opus. The broader point this post is making is that newer SOTA models are improving at the "planning layer", and this is the part a senior developer would usually handle (identifying edge cases, thinking ahead, thinking about tradeoffs, etc.)

dpbrinkm · 2026-06-13T16:38:48 1781368728

Was it really that bad when you would use skills like the superpowers pack?

sutterd · 2026-06-06T18:23:32 1780770212

A large share of invested money is passive, especially in the S&P 500. If some people pull out, it could cause a very damaging cascade. There would be a forced sale of stocks with maybe no buyers.

sutterd · 2026-06-04T22:48:52 1780613332

I am doing a solo project that is pretty big, meaning it is not something I could vibe code. I can do a lot with AI that I could never do on my own, but I am not seeing an improvement of several multiples in my productivity. I spend so much time doing what I call "AI wrangling", trying to get it to do what I want. Claude is writing all the JavaScript and Python code, but ultimately I am programming in English. What is good is that it is effectively a very high level computer language, where the agent can often implement a lot of underlying code from a short English description. But many other times it takes a lot of work to get what you want.

matheusmoreira · 2026-06-05T00:19:10 1780618750

I measured an ~8x increase in the number of commits I've been pushing, and I've actually been trying to restrain myself. I could do a lot more if I stopped reviewing and editing the code. I think it's got more to do with my executive ability than raw productivity though. AI essentially cured my ADHD by making the execution of my ideas virtually painless.

raptor99 · 2026-06-05T00:56:14 1780620974

LOL "I measured an 8x increase in the number of commits I've been pushing" is an absolutely useless statement

matheusmoreira · 2026-06-05T03:38:48 1780630728

Subscribed to Claude a few months ago. I immediately started working with it on my programming language. Since then, I've implemented a compacting garbage collector, a size class based memory allocator, a unified value heap, deeply optimized hash tables and even implemented shapes like V8 and Self, redesigned the value representation, created a Common Lisp style condition system, implemented UTF-8 text decoding, refined the generators API, increased the number of tests from ~200 to ~1200 and improved the test suite to the point it runs all of those tests in parallel in under two seconds, implemented stack protection support, added an aarch64 matrix to the GitHub CI, fixed a zillion bugs, improved performance, perfected tail call optimization. I did so much stuff I'm probably forgetting some. And these aren't "lol just do it" prompts either, I'm putting effort into refining design and implementation. I review every line. Just finished designing safe hash table iteration in spite of mutability: generation counters that get bumped whenever the table is reallocated. It's actually gonna be more powerful than what other languages do. Next up on my todo list is to implement my language's unified pattern matcher, static allocation for all interpreter internal data in order to get rid of all initialization code and achieve nearly zero startup time, and then finally a bytecode interpreter to close the performance gap on the likes of Python.
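
The generation-counter idea mentioned above can be illustrated with a minimal sketch. This is not the commenter's implementation, which is not shown in the thread. The class name `GenerationTable`, its bucket layout, and the growth threshold are all assumptions made for illustration. The only part taken from the comment is the rule itself: bump a counter whenever the table is reallocated, and have iterators compare against the value they started with.

```python
class GenerationTable:
    """Toy chained hash table (hypothetical) with a reallocation counter."""

    def __init__(self, capacity=4):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self.generation = 0  # bumped every time the bucket array is reallocated

    def set(self, key, value):
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                # In-place update: no reallocation, generation is unchanged.
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        # Assumed growth policy: double once the load factor exceeds 2.
        if self._size > 2 * len(self._buckets):
            self._grow()

    def _grow(self):
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in pairs:
            self._buckets[hash(k) % len(self._buckets)].append((k, v))
        self.generation += 1  # any live iterator is now stale

    def items(self):
        start = self.generation
        for bucket in self._buckets:
            for pair in list(bucket):
                yield pair
                # On resumption, refuse to continue over a reallocated table.
                if self.generation != start:
                    raise RuntimeError("table reallocated during iteration")
```

In this sketch, overwriting an existing key during iteration is allowed because it does not reallocate, while an insert that triggers growth makes the next step of any live iterator fail loudly instead of walking stale storage.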

Dramatically improved my static site generator Pugneum to the point it's better than markdown and added Atom and RSS feeds, used it to write several articles about my language. Pace is so fast I actually need to write those articles by hand in order to crystalize the knowledge I learned. If I don't I'm afraid I'll just forget everything. No LLMs for the articles themselves, but they sure as hell took all the pain away from writing them. Pugneum even has back references and table of contents generation now. Claude even helped me refine my website's CSS, something I'm not very good at.

Also created my own invoicing system for $DAYJOB so I can invoice companies from my terminal. Started a decompilation project for my cherished childhood games and I've already almost finished decompiling one game's engine after just a few days. Been working on my cyberdeck project too, this one's a bit slow because I got to the point where I'll actually need to spend money on it to move forward. All this inside the rootless development virtual machine system built on top of QEMU and systemd that I developed together with Claude, whose network isolation I'm currently hardening. Started reverse engineering my laptop again! And I'm actually making progress! Made a color scheme app for the keyboard LEDs controller I made many years ago, with loads and loads of color schemes! Found some kind of bug in my keyboard while doing it, in less than an hour I had the root cause and a fix applied locally, sent the fix to systemd, it got merged. Planning to ramp up my free and open source software participation as well now that exploring codebases is a breeze. Already have some mesa patches ready for upstream. Have been playing with strace since I use it so much.

Better?

jimbokun · 2026-06-05T04:00:40 1780632040

I’m sure raptor99 is unimpressed while not being able to point to any similar accomplishments in their own work in the same timeframe.

onlyrealcuzzo · 2026-06-04T23:10:46 1780614646

I'm building a memory safe programming language with a declarative concurrency model that's close to release.

There is ZERO chance I would ever be able to complete it on my own.

I doubt it'll get traction, but if it doesn't, I am pretty confident a future language will take the ideas for polymorphic synchronization and profile-guided optimization.

It has an easy version/mode of compilation that makes Rust's affine ownership accessible like a high-level scripting language, and it can progressively become more strict, where the compiler does ~99% of the work for you, and you just pick options as it finds issues (that it explains to you like you're 5) along the way.

Along the way, I also built a suite of tools that helps identify complexity better than anything I've seen (which was necessary to get the LLMs to be able to unslop themselves and write something that actually works).

I doubt the Ruby community shrugs it off, but time will tell.

pizlonator · 2026-06-05T00:35:06 1780619706

How do you know it’s actually memory safe?

onlyrealcuzzo · 2026-06-05T01:57:58 1780624678

I have ~5500 memory safety fuzz tests, four different test suites with between ~80%-99% line/branch coverage each, and the same design as Rust, and haven't found a memory safety issue in 4 weeks, and I'm still planning another ~4 weeks of testing before release, more if need be.

Rust had memory safety bugs well after release - IIUC all the way until after the 1.0 release.

So, it's highly unlikely to be perfect, but I think it'll be in better shape than Go or Rust were when they initially launched.

mohamedkoubaa · 2026-06-04T23:11:46 1780614706

I have the same experience, though I feel myself getting better at wrangling over the past few months

sutterd · 2026-05-10T00:37:38 1778373458

This url worked fine:

https://chrismorgan.info/no-query-strings#:~:text=So%20I%E2%...

but this one was too long:

https://chrismorgan.info/no-query-strings?a=1

sutterd · 2026-05-10T00:50:03 1778374203

Doh! The part past the # does not go to the server, so that wasn't a longer URL. How about:

https://chrismorgan.info/%6e%6f-%71%75%65%72%79-%73%74%72%69...

abanana · 2026-05-10T11:25:30 1778412330

Indeed, that's not a query string! The #, and following text, is a fragment, is client-side only, and isn't the subject of the blogpost. Neither is percent encoding, which is just another way to send the exact same path from your browser to the server.

Note that it has nothing to do with the length of the URL. That's just the error message he's chosen to use, because "4xx stop pissing about with my URLs" doesn't exist in the spec.
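
The fragment point can be checked with a small sketch using Python's standard `urllib.parse`. The helper name `server_visible` and the `example.com` URLs are made up for illustration, and this only models which parts of a URL go into the request target, not how any particular server responds.

```python
from urllib.parse import urlsplit, unquote


def server_visible(url):
    """Return the path plus query, i.e. the part a browser puts in the request.

    The fragment (everything after '#') is kept client-side and dropped here.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# A text fragment adds nothing to what the server receives.
print(server_visible("https://example.com/page#:~:text=hello"))
# A query string does.
print(server_visible("https://example.com/page?a=1"))
# Percent-encoding plain lower-case letters decodes back to the same path.
print(unquote("/%6e%6f-query"))
```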

chrismorgan · 2026-05-10T13:32:43 1778419963

> percent encoding, which is just another way to send the exact same path

This is not true for all characters. Some can only be expressed by percent-encoding, and decoding them will either break things completely (e.g. %20) or change the meaning of the URL (e.g. %2F, %3F in paths).

Yes, you can encode x as %78 and it should work identically, and you can decode %78 to x and it should work identically—though in both cases, I reckon there’s a strong case for blocking the request as suspicious, and I will probably start doing that soon.

But take these examples of improperly decoding:

• /foo%2Fbar/baz.html has path «"foo/bar", "baz.html"».

• /foo/bar/baz.html has path «"foo", "bar", "baz.html"».

• /foo%3Fbar/baz?quux has path «"foo?bar", "baz"» and query "quux".

• /foo?bar/baz?quux has path «"foo"» and query "bar/baz?quux".
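
The four cases above can be reproduced with a short sketch in Python's standard `urllib.parse`. The function name `parse_target` is hypothetical. The order of operations is the point: split off the query and split the path on "/" first, and only then percent-decode each segment, so an encoded %2F or %3F never acts as a delimiter.

```python
from urllib.parse import urlsplit, unquote


def parse_target(target):
    """Split a request target into decoded path segments and the raw query."""
    parts = urlsplit(target)
    # Split on literal '/' before decoding; drop the empty piece before the
    # leading slash.
    segments = [unquote(segment) for segment in parts.path.split("/")[1:]]
    return segments, parts.query


for target in (
    "/foo%2Fbar/baz.html",
    "/foo/bar/baz.html",
    "/foo%3Fbar/baz?quux",
    "/foo?bar/baz?quux",
):
    print(target, parse_target(target))
```

Decoding the whole string first and splitting afterwards would collapse the first pair of targets into the same result, and likewise the second pair, which is the improper decoding the list describes.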

abanana · 2026-05-10T18:20:24 1778437224

Indeed, it's essential in some cases. I was talking about in the context of sutterd's suggestion, where just lower-case letters have been encoded.

> strong case for blocking the request as suspicious

Yep, as there shouldn't be any "normal" reason to do such a thing.

sutterd · 2026-04-26T19:34:14 1777232054

I never adopted Opus 4.6 because it was too prone to doing things on its own. Anthropic called it "a bias towards action". I think 4.5 and 4.7 are much better in this regard. I'm not saying they are immune to this kind of thing though.

sutterd · 2026-04-26T18:32:00 1777228320

I’m working on a side project and AI is writing all the code. The code it produces is not good, and this comes from someone who has experience producing bad code. One thing I’m worried about is places like GitHub being full of AI code, which leads to AI being trained on AI code. It seems like this will lead to a downward spiral.

sutterd · 2026-04-23T20:32:30 1776976350

What kind of performance are people getting now? I was running 4.7 yesterday and it did a remarkably bad job. I recreated my repo state exactly and ran the same starting task with 4.5 (which I have preferred to 4.6). It was even worse, by a large margin. It is likely my task was difficult or poorly posed, but I still have some idea of what 4.5 should have done on it. This was not it. What experiences are other people having with 4.7? How about with other model versions, if they are trying them? (In both cases, I ran on max effort, for whatever that is worth.)

sutterd · 2026-04-20T03:27:39 1776655659

With my use of Claude Code, I find 4.7 to be pretty good about clarifying things. I hated 4.6 for not doing this and had generally kept using 4.5. Maybe they put this in the chat prompt to try to keep the experience similar to before? I definitely do not want this in Claude Code.

mh- · 2026-04-20T05:22:00 1776662520

I agree with your thoughts on 4.6.

It's possible they tried to train this out of it for 4.7 and overcorrected, and the addition to the system prompt is to rein it in a bit.

sutterd · 2026-04-19T02:33:49 1776566029

You don't have to use adaptive thinking. It had been turned off on my main work computer. I was using a different computer on a trip and I started getting so angry at Claude for doing a bad job. I eventually figured out it was adaptive thinking and set it to "hard" and it started working again. At the time I think "hard" was the top choice. With 4.7, my computer now shows "xhard", which I assume is the equivalent setting. There is one higher setting than this, which I haven't tried yet. I would tell you how to change these settings, but I don't remember. By the way, I have been happy with 4.7 so far. I actually did not like 4.6 and preferred 4.5 and used that most of the time until this new release.

scrollop · 2026-04-19T06:14:18 1776579258

"With Opus 4.6, extended thinking was a toggle you managed: turn it on for hard stuff, off for quick stuff. If you left it on, every question paid the thinking tax whether it needed to or not. Now, with Opus 4.7, extended thinking becomes adaptive thinking."

https://claude.com/resources/tutorials/working-with-claude-o...

You want extended thinking? It's now adaptive thinking, and Opus will turn it on if it thinks it needs to. But it probably won't, according to user reports, as tokens are expensive. Except Opus 4.7 now uses 35% more and outputs more thinking tokens.

sutterd · 2026-04-20T03:07:46 1776654466

I am getting pretty good performance. Even on trivial questions it seems to go through the thinking process to the end. If they are using adaptive thinking, it seems to work much better than before. I will see how my experience goes with more usage.

matheusmoreira · 2026-04-19T17:14:12 1776618852

> You don't have to use adaptive thinking.

With Opus 4.7 you absolutely do. Users don't have a choice.

https://code.claude.com/docs/en/model-config

> Opus 4.7 always uses adaptive reasoning. The fixed thinking budget mode and CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING do not apply to it.

__s · 2026-04-19T04:22:01 1776572521

/effort