There have been a bunch. Did any auditor lose a license, credibility, or even a night's sleep? Even accountants aren't held to their standards, and they are supposed to guard the holiest of holies: shareholder money.

Ok, then. What shitshow? Does it not pale in comparison to Chrome and Edge?

I think the commenter means that some Linux applications store the passwords they need for access to external resources in plain text.

It does seem to be a word game, because "it's not stopping any bullets" either isn't honest (it does stop bullets from hitting you when the enemy doesn't know where to shoot) or it's limited, just like obscurity is ("it may stop a few bullets, but it won't stop all, and there will be other weapons it can't stop either"). I think public key exchange is considered security, but it still requires you to keep your private keys obscure.

Perhaps a better word would be resistance (to intrusion), which is a dimension orthogonal to visibility.


There is no need for age verification: ban all mobile devices and access to anything vaguely "social" (and possibly porn) for everyone under 18. Impose heavy fines. But that requires more guts than governments have. They get lobbied, coerced by corruptible friends, and threatened, and we end up with the worst of all outcomes: addicted children with no attention span and no privacy.

In a sense, but it is a bit more devious. It basically invalidates all past fMRI studies. Not that anyone should have taken those seriously, but it looks like another nail in the coffin. fMRI analysis is (was?) basically: squeeze each brain scan into a standard box, then average the BOLD responses (that's roughly oxygen usage between 3s and 9s after activity). This abstract says that --at least in some cases-- those averages are wrong. Not just hiding information through aggregation, but flat-out lying.
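
To make that concrete, here's a toy version of the averaging step in Python (made-up numbers, numpy only, not any real pipeline):

    import numpy as np

    # Hypothetical toy data: 10 subjects, 4 voxels, already warped
    # ("squeezed") into the same standard box.
    rng = np.random.default_rng(0)
    n_subjects, n_voxels = 10, 4
    bold = rng.normal(0.0, 0.05, size=(n_subjects, n_voxels))

    # Each subject strongly activates ONE voxel, but a different one.
    for s in range(n_subjects):
        bold[s, rng.integers(n_voxels)] += 1.0

    # The "standard" group analysis: average the BOLD response per voxel.
    print(bold.mean(axis=0))
    # Every voxel now shows a modest "activation", although no single
    # voxel was active in most subjects.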

Just from reading the link, I do see an objection: they studied repetitions, which are known to be different from the initial response, so this may not be the fMRI's eulogy.


In which cases did it say the averages were lying? An average and a median can be drastically different without the average being false, right?

The averages a "standard" fMRI analysis produces highlight brain areas that may not even have been involved in the majority of subjects, because the pattern is so widespread. That is in contrast with your usual average or median over, e.g., height. It's a bit like averaging the squares played on a chess board and concluding that the opening is played in the middle two columns.
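
In toy Python, with made-up moves:

    # Hypothetical edge-only opening: all play is on the a- and h-files,
    # yet the average lands squarely in the middle of the board.
    files = "abcdefgh"
    moves = ["a4", "h4", "a5", "h5"]
    cols = [files.index(m[0]) + 1 for m in moves]  # a=1 ... h=8
    print(sum(cols) / len(cols))                   # 4.5: "the middle two columns"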

I mean.. it kind of is, and even more about control of these two columns..

This may be objectively scored, but it is not an indication of anyone's coding capabilities. This test measures which model almost accidentally came up with the best strategy (against other bots). This is not representative of coding. You would need to test 100 or more such puzzles, widely spread across the puzzle spectrum, to get an idea which model is best at finding strategies involving an English dictionary.
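
Back-of-the-envelope, with assumed win rates (plain binomial error, nothing model-specific):

    import math

    # Standard error of an estimated win rate p after n independent puzzles.
    def stderr(p, n):
        return math.sqrt(p * (1 - p) / n)

    print(stderr(0.5, 1))    # ~0.5: a single puzzle tells you almost nothing
    print(stderr(0.5, 100))  # ~0.05: ~100 puzzles can separate models whose
                             # true success rates differ by more than ~10 points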

I don't think that is entirely fair... I don't see them stating anywhere that they are measuring coding capabilities... "Using complex games to probe real intelligence."

And this seems very much in line with the methodology in ARC-AGI-3.

The results here, in the OP article and in https://www.designarena.ai all tell a similar story: Kimi K2.6 is up and in the SOTA mix.


The task was writing a "bot" to play the game. The title is "Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge." How does that not imply measuring coding capabilities?

> You would need to test 100 or more such puzzles, widely spread across the puzzle spectrum

Would you? I am not very knowledgeable about LLMs, but my understanding was that each query is essentially a stateless inference with the previous input/output as context. In that case, isn't a single puzzle, yielding hundreds of queries, essentially hundreds of path-dependent but individual tests?
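
Roughly the loop I have in mind, sketched in Python (call_model and game_step are hypothetical stand-ins, not any vendor's API):

    # Each call is stateless: the model's only "memory" is the transcript
    # we resend as context, so turn N depends on turns 1..N-1.
    def play_puzzle(call_model, game_step, first_prompt, max_turns=300):
        transcript = [first_prompt]
        for _ in range(max_turns):
            reply = call_model("\n".join(transcript))  # fresh inference per turn
            transcript.append(reply)
            state = game_step(reply)          # the game's response to the move
            if state is None:                 # hypothetical end-of-game signal
                break
            transcript.append(state)
        return transcript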


From what I understood, it's a coding challenge: the models wrote a player for that specific word game. E.g. https://github.com/rayonnant-ai/aicc/blob/main/wordgempuzzle...

Generally speaking, would you accept a conclusion based on an event that happened only once?

If you are referring to the parent post, yes, hard to draw conclusions from such a small sample size.

For our testing, we use hundreds of different environments across disciplines, and it seems to line up with subjective experience better than other benchmarks. We test coding, agentic coding, and non-coding reasoning in the environments.



Watched that just a month or two ago with my 6YO. It's great, a very underrated film.

But the reason it's not set during the Superb Owl is that the film is set in London, and most Londoners do not watch American football.


> dismissing the Manhattan Project as hopelessly stalled in 1944

Then again, there are enough examples of failed projects. Why should this be comparable to the Manhattan Project? In 1944, it was only two years underway, whereas Shor's algorithm is over 30 years old. Tons of articles have been published on quantum computing, while the A-bomb was kept as secret as possible, making learning from other countries, sometimes even from colleagues, impossible. In 1942, an atomic explosion was still hypothetical, whereas quantum computing had its first commercial service 7 years ago. Etc.

So, while in principle lack of progress doesn't guarantee failure, a comparison to the Manhattan Project is stylistic bullshit.


> Then again, there are enough examples of failed projects. Why should this be comparable to the Manhattan Project? In 1944, it was only two years underway, whereas Shor's algorithm is over 30 years old.

1944 is a bit arbitrary. Szilard for one was thinking about it earlier:

> […] He conceived the nuclear chain reaction in 1933, and patented the idea in 1936. In late 1939 he wrote the letter for Albert Einstein's signature that resulted in the Manhattan Project that built the atomic bomb….

* https://en.wikipedia.org/wiki/Leo_Szilard

Partly inspired in 1932 by reading Wells' book, published in 1914:

* https://en.wikipedia.org/wiki/The_World_Set_Free

How long was humanity thinking about flying before the Wright brothers and 1903? We had Babbage's analytical engine (and Lovelace) in 1837, with Zuse's Z2 and the British bombes both in 1940; Zuse's Z3 in 1941.


The main point is that just as you can't ask for a tiny nuclear explosion because nuclear physics just doesn't work that way, you also can't ask for a factorization of 21 with Shor's algorithm. Quantum computing just doesn't work that way, sorry.

The analogy between nuclear fission and quantum computing doesn’t really work. Fission was a relatively new physical phenomenon that the Manhattan Project scientists were studying in order to turn it into a weapon of mass destruction on a scale that had no precedent except in natural disasters. Quantum computing is a new technology that is supposed to make already effectively computable problems computable faster; ideally it provides an increase in capacity, not capability. It should definitely be able to make tiny computations work before going for the bigger problems. That’s how all computing works: if it can’t solve simple problems, it’s never going to solve bigger ones. What you’re saying here essentially sounds like “there will be a magical event one day when quantum computing solves the biggest computing problems and we’ll all realize it works.”

I am not particularly invested either way in the likelihood of quantum computing being a major breakthrough, but this seems like yet another area of computing research that, like crypto and LLMs, has in recent years been increasingly flooded by people on a hype train.


> The main point is that just as you can't ask for a tiny nuclear explosion because nuclear physics just doesn't work that way

You absolutely can, which is why Fermi did just that as part of the Manhattan Project with Chicago Pile-1, demonstrating the first self-sustaining nuclear chain reaction.

In 1942.

Your analogy is broken.


That was not a tiny explosion, methinks.

Given that 15 has already been factored using Shor's algorithm on a real quantum computer, I think we can.

No, you really can't. Being able to factor 15 but not 21 with Shor's algorithm is normal. I know it sounds absurd, but it really is that way, because factoring 21 is about 100 times harder than factoring 15.

See https://algassert.com/post/2500 for details.
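
A classical sketch of one reason 15 is such an outlier (as I understand the linked post): every valid base a has a multiplicative order mod 15 that is a power of two, which keeps the period-finding part of Shor's algorithm unusually small, whereas mod 21 the orders include 3 and 6:

    from math import gcd

    def mult_order(a, n):
        # Smallest r > 0 with a**r % n == 1: the period Shor's algorithm finds.
        r, x = 1, a % n
        while x != 1:
            x = (x * a) % n
            r += 1
        return r

    for n in (15, 21):
        orders = {a: mult_order(a, n) for a in range(2, n) if gcd(a, n) == 1}
        print(n, orders)
    # n=15: all orders are 2 or 4 (powers of two)
    # n=21: orders include 3 and 6, which demands a larger, more precise circuit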


My point was that the comparison with nuclear explosions is wonky, since we (in the world of that analogy) already saw a tiny nuclear explosion 15 years ago. And we keep being told that explosions 100 times larger are just around the corner, but explosions 25% larger are way too hard to expect.

I get that there's a lot of R&D going on to make larger quantum computers a thing and that there's been very definite progress, but factoring 21 is just too hard to expect for now. That also pushes the date at which pre-quantum cryptography is broken further into the future. If we still struggle to factor one of the smaller 5-bit numbers, factoring the 128-bit numbers necessary to break elliptic curve cryptography seems quite far away.


Some cleaning up is probably required. I opened the "regelrecht" repo, and it contains a bunch of links and references to GitHub.
