slashdave · 2026-05-07T17:08:28 1778173708

Some of us actually want to get work done

slashdave · 2026-05-07T02:51:35 1778122295

You've forgotten the important part: permissions

slashdave · 2026-05-04T01:20:52 1777857652

They are also surprisingly good at finding bugs that humans often miss

slashdave · 2026-05-04T01:18:59 1777857539

> like AGILE and SCRUM

Yeah, likely

> development patterns that are good practices.

Wait, now you lost me

notepad0x90 · 2026-05-04T04:36:18 1777869378

While those things sucked, the patterns imprinted on devs' minds and were passed from one generation of devs to the next, until that's just the way things are done. CI is a good example, as are some of the documentation practices. Lots of teams that don't use agile or scrum formally follow those concepts because that's just how they've seen others do things in the industry. I predict LLM work will get some overdone and abstract thing like agile/scrum that lots of people hate and few think is useful, but then out of it the actually useful bits and pieces will become self-organizing standards of sorts.

slashdave · 2026-05-04T01:14:30 1777857270

> When a sysadmin moved to AWS, they didn't feel like they were losing their ability to understand networking.

Wait, is this the same AWS I have been using?

slashdave · 2026-05-04T00:56:29 1777856189

Eh, conceptually true, but in practice, it is rather hard to get any decent performance out of a GPU and still produce a deterministic answer.

And in any case, setting the temperature to zero will not produce a useful result, unless you don't mind your LLM constantly running into infinite loops.
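A minimal sketch of why determinism is hard here, assuming the usual explanation: floating-point addition is not associative, so a parallel reduction that sums in a different order can give a slightly different result. This is plain Python on a CPU, not GPU code, and the variable names are made up for illustration.

```python
# Same three numbers, two summation orders.
# A parallel reduction is free to pick either grouping,
# unless it is forced into a fixed (and usually slower) order.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(repr(left_first))
print(repr(right_first))
print(left_first == right_first)  # False: the results differ in the last bit
```

The difference is tiny, but with greedy decoding (temperature zero) a near-tie between two candidate tokens could plausibly flip on a difference like this, and everything generated after that point diverges.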

slashdave · 2026-05-03T05:05:51 1777784751

This particular change feels... human driven.

slashdave · 2026-05-03T04:59:54 1777784394

I was surprised by the ranking, until I read what the test was. Not horribly relevant for coding.

The current ranking of all tests makes more sense (well, except for how well Gemini does)

https://aicc.rayonnant.ai

mpeg · 2026-05-03T07:06:17 1777791977

If you look at the ranking breakdown though, Kimi K2.6 has only participated in the last 5 challenges (Claude dominated before then), and if you only count those it would be in first place.

Sammi · 2026-05-03T12:44:44 1777812284

It also has a DNF. So it has a high ceiling but also unfortunately a low floor. So using Kimi means accepting high variability of the output.

Personally what I've found that has made coding agents more and more useful over the last year is that they have gotten a higher and higher floor, not that they have gotten a higher and higher ceiling. They were already plenty smart a year ago, it was just that they failed so often and so spectacularly that it made them a liability. Now they have become much more reliable, which is the key thing that has transitioned them into being actually useful. For the most part I don't use them to work on really intellectually difficult tasks. I mostly use them to work on very boring and labor intensive tasks. Most commercial software development work is just boring drudgery like this. Certainly the bulk of what I need them for is. I need them to just not crap their pants all the time while they're at it.

So I'm kinda wary seeing the poor reliability of Kimi.

mpeg · 2026-05-03T14:48:04 1777819684

If you look at the last 5 challenges (the ones Kimi was in), both Claude and Kimi have 1 DNF; ChatGPT has 2.

I'm not sure this is enough data to form an opinion, but going by what we have Kimi would be as reliable as Claude

SeriousM · 2026-05-03T07:36:55 1777793815

The ranking of gold medals only makes sense if all models had participated in all tests.

DNP = Did not participate

In this regard, Kimi got more and better medals than Claude.

dvfjsdhgfv · 2026-05-03T07:57:25 1777795045

Well, the link you provided basically confirms Kimi's dominance.

r0fl · 2026-05-03T13:25:36 1777814736

All those models and the site is not responsive on mobile. Ironic.

slashdave · 2026-04-26T16:21:32 1777220492

One of the most important ingredients of a proposal for a drug development program is viability: is there a path to market? The issue with Alzheimer’s is that until recently there was no practical diagnostic test. Without a test, there is no way to run a clinical trial. No clinical trial means no drug approval. Research is halted before it even begins.

tim333 · 2026-04-26T17:46:09 1777225569

The effects of Alzheimer’s are pretty obvious. I don't think not being able to detect it is much of an issue.

slashdave · 2026-04-26T05:46:59 1777182419

Proving a negative is a pretty high bar. You also have the problem of defining "real intelligence", which I suspect you can't.

famouswaffles · 2026-04-26T06:05:57 1777183557

Intelligence is Intelligence. It's intelligent because it does intelligent things. If someone feels the need to add a 'real' and 'fake' moniker to it so they can exclude the machine and make themselves feel better (or for whatever reason) then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it. It's the equivalent of nonsensical rambling. At the end of the day, the semantic quibbling won't change anything.

latexr · 2026-04-26T08:51:15 1777193475

> It's intelligent because it does intelligent things.

Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.

> then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it.

That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul. Similarly, one does not need to have a tight definition of “life” to say a dog is alive but a rock isn’t. Definitions evolve all the time when new information arises, and some (like “art”) we haven’t been able to pin down despite centuries of thinking about it.

famouswaffles · 2026-04-26T13:16:33 1777209393

>Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.

If you wanted to insist a calculator wasn't intelligent and satisfy my conditions then you can. At the very least you can test for the sort of intelligence that is present in humans but absent from calculators and cleanly separate the two. These are very easy conditions if there is some actual real difference.

>That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul.

No it's not, and this is a silly argument. Foul food tastes different. Sometimes it even looks different. You can test for it and satisfy my conditions.

You come across a shiny piece of yellow metal that you think is gold. It looks like gold, feels like gold and tests like gold. Suddenly a strange fellow comes about insisting that it's not actually gold. No, apparently there is a 'fake' gold. You are intrigued so you ask him, "Alright, what exactly is fake gold, and how can I test or tell them apart?". But this fellow is completely unable to answer either question. What would you say about him? He's nothing more than a mad man rambling about a distinction he made up in his head.

What I'm asking you to do is incredibly easy and basic with a real distinction. I'm not going to tell you to stop believing in your fake gold, but I am going to tell you I and no one else can be expected to take you seriously.

latexr · 2026-04-26T14:37:03 1777214223

> At the very least you can test for the sort of intelligence that is present in humans but absent from calculators and cleanly separate the two.

But you can only do that now, in hindsight. Before calculators, one could argue being able to do math was a sign of intelligence, but once something new comes along which can do math in a non-intelligent way, you can realise “ah, right, my definition was incomplete/incorrect, I need something better”.

> Foul food tastes different.

You’re right, that was a bad example.

> You come across a shiny piece of yellow metal that you think is gold. (…) He's nothing more than a mad man rambling about a distinction he made up in his head.

No, that is not right. Fool’s gold is a thing.

https://en.wikipedia.org/wiki/Pyrite

It’s not the same as gold and you can test for it, but that doesn’t mean you know how to do it. Yet it’s perfectly possible that by being exposed to the real and fake thing you’ll get a feel for each one as there are subtle visual clues. It doesn’t mean you can articulate exactly what those are, yet you’re able to do it.

It’s like tasting two similar beers or sodas. You may be able to identify them by taste and understand they’re different but be unable to articulate exactly how you know which is which to the point someone else can use your verbal instructions to know the difference. That doesn’t mean the difference isn’t there or that you can’t tell, it just means you haven’t yet found yourself the proper way to extract and impart what you instinctively understood.

famouswaffles · 2026-04-26T15:02:41 1777215761

>But you can only do that now, in hindsight.

No, you could always do that. The meaning you take from it is up to you, but you could always separate humans and calculators.

>No, that is not right. Fool’s gold is a thing.

I know what fool's gold is. I used it for contrast. Fool's gold can be tested for.

>but that doesn’t mean you know how to do it.

It doesn't matter. If you claim it exists but you don't know how to do it and you can't point to anyone who can, it's the same as something you made up.

>It’s like tasting two similar beers or sodas. You may be able to identify them by taste and understand they’re different but be unable to articulate exactly how you know which is which to the point someone else can use your verbal instructions to know the difference.

You are still making the same mistake. Two similar beers or sodas taste different. No one is asking you to come up with a theory for intelligence. All you have to say here is the equivalent of "It tastes different" and let me taste it for myself. But even that much, you cannot do. So why on earth should I treat what you say as worth anything?

slashdave · 2026-04-26T15:54:55 1777218895

> Intelligence is Intelligence.

> It's the equivalent of nonsensical rambling

I see

famouswaffles · 2026-04-26T16:06:47 1777219607

Glad I could clear that up for you