Hahaha this hits home too hard. Back in the early 2000s people would moan whenever they spotted a hint of autotune; in 2026 it's the industry standard.
I think it really speaks to people's incredible ability to be stuck in the past, rather than the new technology being "bad".
This is an amazing comment. I'm old. I was born in the 70s, grew up in the 80s and 90s and miss those times so much. But that is because I was young, immortal, the world was mine to discover.
In 20 years people will be missing the 2020s too. It is just human nature to complain.
That's just not accurate. I haven't studied SWE Bench Pro in detail, so I can't tell you exactly what the flaw is, but SOTA models routinely make bad architectural choices I have to intervene to fix.
TL;DR it's very effective because it tests models directly on REAL codebases: "The benchmark is constructed from GPL-style copyleft repositories and private proprietary codebases". The use case is very real.
It doesn't sound to me like this benchmark is attempting to measure architecture design. As far as I see in the paper, they do not evaluate the architectural quality of a task completion, only whether the model is capable of completing it at all.
Could you elaborate? I hear some people say a big model should be driving a smaller model, and others say a small model should be driving a bigger model.
When I have an expensive task that is clearly defined, I will get opus to write an LLM workflow for it, and then I will execute it with a smaller model. (Starting with the smallest one, and then upgrading if the task fails.)
But this is a single well defined task, designed by me and Opus in concert. If I need ongoing agentic work, Opus would be too expensive. I'm not sure if Haiku is big enough to be the driver yet. And Sonnet is probably too big! Haha.
(Grok looks promising, optics aside... Grok 4 Fast was almost there but not quite. Great for interactive / realtime (steered) work though.)
But I'm thinking you need a smallish model which can delegate both up and down. I'm not exactly sure what that looks like though. Because the model needs to be big enough to know when it's struggling, instead of pattern matching to something stupid and getting stuck in a loop trying to solve it the wrong way.
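A minimal sketch of the "start with the smallest model, upgrade if the task fails" pattern described above. The model names and the `run_task` function are placeholders I made up for illustration, not a real SDK call:

```python
# Hypothetical escalation loop: try the cheapest model tier first,
# upgrade only when the task fails. run_task() is a stand-in for
# actually executing an LLM workflow with a given model.

MODEL_TIERS = ["haiku", "sonnet", "opus"]  # cheapest first

def run_task(model: str, task: str) -> bool:
    """Stand-in for running the task with a model and checking the result.
    For the demo we pretend only 'sonnet' and above succeed."""
    return MODEL_TIERS.index(model) >= MODEL_TIERS.index("sonnet")

def run_with_escalation(task: str) -> str:
    """Walk the tiers from cheapest to most expensive; return the
    first model that completes the task."""
    for model in MODEL_TIERS:
        if run_task(model, task):
            return model
    raise RuntimeError("task failed on every tier")

print(run_with_escalation("well-defined task"))  # -> sonnet
```

The real work, of course, is defining "task failed" well enough that the check is cheap and reliable.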
All of the major models' memory is handled by smaller, more specific models.
I do not know about the future, but I believe that, like the human brain (the amygdala + cerebral cortex), AGI will have smaller but more specific submodels running in parallel to craft a compelling heuristic.
The thing about Spotify is that it is NOT driven by record labels; it is a platform for the individual, meaning an individual can upload their music in a laissez-faire situation.
If they disallow AI artists tomorrow, they are going against what they created the company for.