Hacker News

I responded to your point empirically, with problems not conventionally understood to be solvable with "text generation", and your response was in effect that I must be wrong because I'm afraid you might be right. Not an especially strong debate move.

Can you refute the argument I made, or do you just want to claim LLMs are drinking all our water?




Well, I don't believe the LLM solved those problems; I believe the user did. The LLM statistically aggregated large amounts of information, and the user read the output, realized there was something to it, and fixed the problem. Those accounts don't mention the 1,000 other prompts the same technical user ran that yielded garbage results, which the user was intelligent enough to disregard.

No, that's false, in every example I gave. But I appreciate you making clearer that I correctly ascertained your original claim, that you believe they literally are just random text generators, and that people are simply cherry picking the rare meaningful text out of them.

That's what I thought you meant by "statistical text generator", and is why I was moved to comment.


1) I never said random.

2) I never said cherry-picking RARE meaningful text.

3) It is not false in every example you gave just because you say it is.

4) If I didn't know better, I might think you're confused about what statistical means (hint: it's not random).

No, it's false in each example because I'm either a first- or secondhand party to it happening (except for the Erdős thing), and I know it's false.

You managed to include in your blanket and conclusory rebuttal "solving undergrad math problems instantaneously". That was one of my examples because (1) it pertains to the subthread, (2) I was talking about it upthread, and (3) I have direct firsthand knowledge.

As I said elsewhere: I've fed thousands of math problems through ChatGPT (starting with 4o and now with 5.5). They've all been randomized. They do not appear in textbooks. They cover all the ground from late high school trig to university calc III. I do this habitually, every time I work an "interesting" problem, to get critiques on my own work. GPT has been flawless, routinely spotting errors or missed opportunities. If I have any complaint, it's that GPT tends to be too much better than I am at any given point, using concepts from later courses to solve simpler problems.

Square that with the claim you're making.

I can do the same thing with vulnerability research (I've been a vuln researcher since 1996 and I use LLMs to find vulnerabilities). But this thread is about math, and it's even easier to show you're wrong in the context of math.


That's convenient. But I have a challenge for you if you're brave enough to face your delusions. Paste this into your LLM of choice and see what happens:

"A farmer has 17 sheep. 9 ran away. He then bought enough to double what he had. His neighbor, who had 4 dogs and 14 sheep, gave him one-third of her animals. The farmer sold 5 sheep on Monday and again the next day, which was Wednesday. Each sheep weighs about 150 lbs. How many sheep does the farmer have?"


17 sheep - 9 ran away = 8 sheep

He bought enough to double what he had: 8 more sheep, so 16 sheep

Neighbor has 4 dogs + 14 sheep = 18 animals

One-third of her animals = 6 animals

But the problem does not say all 6 were sheep. It says “animals.” So the exact sheep count depends on which animals she gave him.

Then:

16 + s - 5 - 5 = 6 + s

where s is the number of sheep among the 6 animals she gave him.

So the answer is not uniquely determined.

Possible sheep count: 6 to 12 sheep, depending on whether the neighbor gave him 0 to 6 sheep.

(I clipped the GPT5 answer here, but will note additionally that even the LLM built into the Google search results page handles this question; both note the possible trick question with the days of the week.)


And that's the wrong answer. It's a word problem, not a math problem. Also, even if it really were a math problem, the range wouldn't be 0-6 sheep from the neighbor; it would be 2-6. So it failed on the math, too.

Are you trying to win this debate with a Facebook "ONLY THE SMARTEST 1% CAN SOLVE" question? The whole point of the question is for some loser to be able to say "no, you missed XYZ ambiguity" any time a sane answer is given.

By your logic, the only "correct" answer for an LLM to give to this is "the person who asked you this is fucking with you, this is not a real question". I concede: this is a limitation of modern LLMs: they will try to answer stupid questions.


No, it's a real question. And even as a math question, it has an answer. The neighbor has 18 animals, only 4 of which are dogs. The farmer receives one-third of those, which is 6. So for the farmer to receive 0 sheep, he would have to receive 6 dogs, but there are only 4 dogs. LOGICALLY, the farmer must receive at least 2 sheep from the neighbor. There's no ambiguity. That's logic. That's intelligence. It's real, actual math: basic arithmetic. A person can easily sit down and work this out. It illustrates that the AI is generating responses statistically and not actually thinking. There are two full layers of failure here: the word problem, and the math problem underneath it.
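For what it's worth, the "at least 2 sheep" constraint is easy to check by brute force. Here's a quick sketch in Python (variable names are mine, not from the thread):

```python
# The neighbor has 4 dogs and 14 sheep and gives away one-third
# of her 18 animals, i.e. 6 of them. Enumerate every possible
# split of those 6 animals into dogs and sheep.
final_counts = set()
for dogs_given in range(0, 4 + 1):      # she only owns 4 dogs
    sheep_given = 6 - dogs_given        # the rest of the 6 must be sheep
    if not 0 <= sheep_given <= 14:      # she only owns 14 sheep
        continue
    # Farmer: 17 - 9 = 8 sheep, doubled to 16, then sells 5 + 5 = 10.
    final_counts.add(16 + sheep_given - 10)

print(sorted(final_counts))  # prints [8, 9, 10, 11, 12]
```

Since at most 4 of the 6 transferred animals can be dogs, the neighbor gives between 2 and 6 sheep, so the final count is 8 to 12, not the 6 to 12 range in the quoted answer.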

I'm really not interested in this Calvinball argument where we try to conclude whether or not LLMs can do math by avoiding as much as possible actually doing math.

Obviously, they can do math.


A concise problem that requires actual logic will naturally seem a bit convoluted, but an intelligent being can sit down and work it out logically. Anyway, it's not an argument. It's empirical evidence that supports my argument. You have chosen to ignore it or otherwise rationalize it away. Nothing I can do about that.

I'm comfortable with what the thread says about our respective arguments at this point. Thanks!


