while those things sucked, the patterns imprinted on devs' minds and were passed from one generation of devs to the next until that's just the way things are done. CI is a good example, as are some of the documentation practices. Lots of teams that don't use agile or scrum formally follow those concepts because that's just how they've seen others do things in the industry. I predict LLM work will get some overdone and abstract thing like agile/scrum that lots of people hate and few think is useful, but then out of it the actually useful bits and pieces will become self-organizing standards of sorts.
Eh, conceptually true, but in practice, it is rather hard to get any decent performance out of a GPU and still produce a deterministic answer.
And in any case, setting the temperature to zero will not produce a useful result, unless you don't mind your LLM constantly running into infinite loops.
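For anyone wondering why fast and deterministic pull against each other: one commonly cited reason (my assumption about what is meant here, not something spelled out above) is that floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can land on a different result. A minimal CPU-only sketch of that effect, with made-up values and a hypothetical helper name:

```python
# Illustration only: the same four numbers summed in two different orders.
# `sum_in_order` is a hypothetical helper, not part of any GPU library.

def sum_in_order(values):
    """Add the values strictly left to right using 64-bit floats."""
    total = 0.0
    for v in values:
        total += v
    return total

# Order A: the first 1.0 is absorbed by 1e16 (too small to register), then cancelled away.
order_a = sum_in_order([1e16, 1.0, -1e16, 1.0])

# Order B: the large values cancel first, so both 1.0 terms survive.
order_b = sum_in_order([1e16, -1e16, 1.0, 1.0])

print(order_a, order_b)  # 1.0 2.0
```

If a tiny difference like this shifts which token scores highest, even greedy decoding can diverge from one run to the next.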
If you look at the ranking breakdown though, Kimi K2.6 has only participated in the last 5 challenges (Claude dominated before then), and if you only count those it would be in first place.
It also has a DNF. So it has a high ceiling but also unfortunately a low floor. So using Kimi means accepting high variability of the output.
Personally, what I've found has made coding agents more and more useful over the last year is that they have gotten a higher and higher floor, not a higher and higher ceiling. They were already plenty smart a year ago; it was just that they failed so often and so spectacularly that it made them a liability. Now they have become much more reliable, which is the key thing that has transitioned them into being actually useful. For the most part I don't use them to work on really intellectually difficult tasks. I mostly use them to work on very boring and labor-intensive tasks. Most commercial software development work is just boring drudgery like this. Certainly the bulk of what I need them for is. I need them to just not crap their pants all the time while they're at it.
So I'm kinda wary seeing the poor reliability of Kimi.
One of the most important ingredients of a proposal for a drug development program is viability: is there a path to market? The issue with Alzheimer’s is that until recently there was no practical diagnostic test. Without a test, there is no way to run a clinical trial. No clinical trial means no drug approval. Research is halted before it even begins.
Intelligence is Intelligence. It's intelligent because it does intelligent things. If someone feels the need to add a 'real' and 'fake' moniker to it so they can exclude the machine and make themselves feel better (or for whatever reason) then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it. It's the equivalent of nonsensical rambling. At the end of the day, the semantic quibbling won't change anything.
> It's intelligent because it does intelligent things.
Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.
> then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it.
That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul. Similarly, one does not need to have a tight definition of “life” to say a dog is alive but a rock isn’t. Definitions evolve all the time when new information arises, and some (like “art”) we haven’t been able to pin down despite centuries of thinking about it.
>Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.
If you want to insist a calculator isn't intelligent and still satisfy my conditions, you can. At the very least you can test for the sort of intelligence that is present in humans but absent from calculators and cleanly separate the two. These are very easy conditions to meet if there is some actual real difference.
>That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul.
No it's not, and this is a silly argument. Foul food tastes different. Sometimes it even looks different. You can test for it and satisfy my conditions.
You come across a shiny piece of yellow metal that you think is gold. It looks like gold, feels like gold and tests like gold. Suddenly a strange fellow comes along insisting that it's not actually gold. No, apparently there is a 'fake' gold. You are intrigued so you ask him, "Alright, what exactly is fake gold, and how can I test for it or tell them apart?" But this fellow is completely unable to answer either question. What would you say about him? He's nothing more than a madman rambling about a distinction he made up in his head.
What I'm asking you to do is incredibly easy and basic with a real distinction. I'm not going to tell you to stop believing in your fake gold, but I am going to tell you that neither I nor anyone else can be expected to take you seriously.
> At the very least you can test for the sort of intelligence that is present in humans but absent from calculators and cleanly separate the two.
But you can only do that now, in hindsight. Before calculators, one could argue being able to do math was a sign of intelligence, but once something new comes along which can do math in a non-intelligent way, you can realise “ah, right, my definition was incomplete/incorrect, I need something better”.
> Foul food tastes different.
You’re right, that was a bad example.
> You come across a shiny piece of yellow metal that you think is gold. (…) He's nothing more than a mad man rambling about a distinction he made up in his head.
It’s not the same as gold and you can test for it, but that doesn’t mean you know how to do it. Yet it’s perfectly possible that by being exposed to the real and fake thing you’ll get a feel for each one as there are subtle visual clues. It doesn’t mean you can articulate exactly what those are, yet you’re able to do it.
It’s like tasting two similar beers or sodas. You may be able to identify them by taste and understand they’re different but be unable to articulate exactly how you know which is which to the point someone else can use your verbal instructions to know the difference. That doesn’t mean the difference isn’t there or that you can’t tell, it just means you haven’t yet found the proper way to extract and impart what you instinctively understood.
No, you could always do that. The meaning you take from it is up to you, but you could always separate humans and calculators.
>No, that is not right. Fool’s gold is a thing.
I know what fool's gold is. I used it for contrast. Fool's gold can be tested for.
>but that doesn’t mean you know how to do it.
It doesn't matter. If you claim it exists but you don't know how to do it and you can't point to anyone who can, it's the same as something you made up.
>It’s like tasting two similar beers or sodas. You may be able to identify them by taste and understand they’re different but be unable to articulate exactly how you know which is which to the point someone else can use your verbal instructions to know the difference.
You are still making the same mistake. Two similar beers or sodas taste different. No one is asking you to come up with a theory for intelligence. All you have to say here is the equivalent of "It tastes different" and let me taste it for myself. But even that much, you cannot do. So why on earth should I treat what you say as worth anything?