> No. It did not. Of course it didn't. Why would they do that? I feel like this ...

paxys · on Sept 13, 2024

The point still is that considering they 1) named the model strawberry, 2) released a promo video showing the model solving this exact problem successfully and 3) put a strawberry joke as an example prompt on the landing page, you'd have expected them to actually have fixed it. Otherwise why even bring it up so many times?

viraptor · on Sept 13, 2024

People who understand where the problem comes from know to ignore it and understand the irony of the naming. Others have a few seconds of fun. The problem and its solution are almost irrelevant.

paxys · on Sept 13, 2024

An AI that people consider to be AGI, or at least on a path to AGI, not being able to count the number of letters in a word is "almost irrelevant"?

viraptor · on Sept 13, 2024

Yes. Nobody is seriously going to ask the system to count letters in a word they just typed. It could be interesting for crosswords and similar things, but adding "use python" solves those already.

Bringing up the strawberry is like saying: "people consider computers to be great at dealing with numbers, but they can't even add 0.1 and 0.2 correctly". You learn about this limitation once, understand how to deal with it, get on with your life.
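
To make the floating-point comparison concrete, here is a minimal Python sketch of the limitation and the usual ways of dealing with it. It assumes standard IEEE 754 double-precision floats, which is what CPython uses on common platforms; the helper name `sums_to` is made up for this example.

```python
import math
from decimal import Decimal

# 0.1 and 0.2 have no exact binary representation, so the sum
# is not exactly equal to the float literal 0.3.
total = 0.1 + 0.2
naive_equal = (total == 0.3)


def sums_to(a: float, b: float, expected: float) -> bool:
    """Hypothetical helper: compare a float sum with a tolerance."""
    return math.isclose(a + b, expected)


# Decimal arithmetic on string inputs avoids the binary rounding entirely.
exact_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(naive_equal)              # the direct comparison fails
print(sums_to(0.1, 0.2, 0.3))   # the tolerance-based comparison succeeds
print(exact_equal)              # the decimal comparison succeeds
```

Once you know the direct comparison fails, you reach for a tolerance or a decimal type and move on.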

paxys · on Sept 13, 2024

"Nobody is seriously going to..."

People will throw any problem at it they want to. That's the entire point of general intelligence.

lupire · on Sept 13, 2024

Answering simple computational questions correctly is not almost irrelevant.

viraptor · on Sept 14, 2024

If the question is simpler than any reasonable use case, then it's basically irrelevant.

It's as irrelevant as asking a math genius what 1+1 is and getting the answer "a bazillion". Was it wrong? Yes. Was everyone's time wasted? Also yes.

OpenAI people know this is something models don't answer right. And they barely care to attempt fixing it; it's basically a joke at this point. But it's irrelevant because nobody pays OpenAI to ask about the spelling of words they just typed. If they actually had that need, then "use python" or similar approaches work just fine. The model could also be taught to call a function to get the token-to-spelling mapping it needs.
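
As a rough illustration of what the "use python" route amounts to, the sketch below counts letters with ordinary string operations. The function names `count_letter` and `spell` are hypothetical and only stand in for the kind of tool a model might be taught to call; they are not part of any OpenAI API.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


def spell(word: str) -> list[str]:
    """Hypothetical 'spelling' tool: return the word as individual characters,
    which is the character-level view a token-based model does not see directly."""
    return list(word)


print(count_letter("strawberry", "r"))
print(spell("strawberry"))
```

Code works on characters rather than tokens, so the count is exact regardless of how the word is tokenized.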

bdjsiqoocwk · on Sept 14, 2024

Right, I also thought that was a strange take.

zamadatix · on Sept 13, 2024

The author's battle is that it doesn't solve the class of problem correctly 100% of the time like a one-off hack does, not that it does nothing for it at all. https://i.imgur.com/Ar3rlJ1.png

paxys · on Sept 13, 2024

This was the case with 4o as well. It sometimes solved it, sometimes didn't.

zamadatix · on Sept 13, 2024

One could say that of anything down to a Markov chain with noise. The perspective is that the new model named after the problem solves it significantly more reliably than the previous one, without a problem-specific hack.

It's also worth noting the current model is the lower-scoring o1-preview, not o1.

GaggiX · on Sept 13, 2024

Well they did in the testing I have seen.

SecretDreams · on Sept 13, 2024

Sometimes I wonder if our brain is some heuristic tool or just an infinite number of cobbled-together fast paths.

Like, there are so many adults that when faced with a new task, they really struggle to pick it up. Is this because tangentially related fast paths weren't learned in their "training phase"?

zamadatix · on Sept 13, 2024

I think a particularly useful "feature" of the brain is it can often identify when it doesn't have a fast path (or the fast path is broken) for something and then revert to trying inefficient and generic approaches, even after the optimal learning period has passed.

This is a stark difference from LLMs, where it's either learned or not, "just add noise". Models like o1 take a very small step in that kind of direction.

SecretDreams · on Sept 13, 2024

Ya, basically LLMs just always "go for it" even if they've got no chance of getting it right. They just need a feature to identify when they're interpolating vs extrapolating, since they're not so great at the latter :).

lupire · on Sept 13, 2024

A century of neuroscience has said "yes" every time and every way that has been checked.

lupire · on Sept 13, 2024

The OP went a little too far with the specific prompt, but "generally intelligently decide when to use an exact computation (programming) module" was part of the LLM plugin model that was solved last year. Why the regression?