Hacker News

I responded to your point empirically, with problems not conventionally understood to be solvable with "text generation", and your response was in effect that I must be wrong because I'm afraid you might be right. Not an especially strong debate move.

Can you refute the argument I made, or do you just want to claim LLMs are drinking all our water?




Well, I don't believe the LLM solved those problems; I believe the user did. The LLM statistically aggregated large amounts of information, and the user read the output, realized there was something to it, and fixed the problem. Those accounts don't mention the 1,000 other prompts the same technical user ran that yielded garbage results, which the user was intelligent enough to disregard.

No, that's false, in every example I gave. But I appreciate you making clearer that I correctly ascertained your original claim, that you believe they literally are just random text generators, and that people are simply cherry picking the rare meaningful text out of them.

That's what I thought you meant by "statistical text generator", and is why I was moved to comment.


1) I never said random.

2) I never said cherry-picking RARE meaningful text.

3) It is not false in every example you gave just because you say it is.

4) If I didn't know better, I might think you're confused about what statistical means (hint: it's not random).

No, it's false in each example because I'm either a first- or secondhand party to it happening (except for the Erdős thing), and I know it's false.

You managed to include in your blanket and conclusory rebuttal "solving undergrad math problems instantaneously". That was one of my examples because (1) it pertains to the subthread, (2) I was talking about it upthread, and (3) I have direct firsthand knowledge.

As I said elsewhere: I've fed thousands of math problems through ChatGPT (starting with 4o and now with 5.5). They've all been randomized. They do not appear in textbooks. They cover all the ground from late high school trig to university calc III. I do this habitually, every time I work an "interesting" problem, to get critiques on my own work. GPT has been flawless, routinely spotting errors or missed opportunities. If I have any complaint, it's that GPT tends to be too much better than I am at any given point, using concepts from later courses to solve simpler problems.

Square that with the claim you're making.

I can do the same thing with vulnerability research (I've been a vuln researcher since 1996 and I use LLMs to find vulnerabilities). But this thread is about math, and it's even easier to show you're wrong in the context of math.


That's convenient. But I have a challenge for you if you're brave enough to face your delusions. Paste this into your LLM of choice and see what happens:

"A farmer has 17 sheep. 9 ran away. He then bought enough to double what he had. His neighbor, who had 4 dogs and 14 sheep, gave him one-third of her animals. The farmer sold 5 sheep on Monday and again the next day, which was Wednesday. Each sheep weighs about 150 lbs. How many sheep does the farmer have?"


17 sheep - 9 ran away = 8 sheep

He bought enough to double what he had: 8 more sheep, so 16 sheep

Neighbor has 4 dogs + 14 sheep = 18 animals

One-third of her animals = 6 animals

But the problem does not say all 6 were sheep. It says “animals.” So the exact sheep count depends on which animals she gave him.

Then:

16 + s - 5 - 5 = 6 + s

where s is the number of sheep among the 6 animals she gave him.

So the answer is not uniquely determined.

Possible sheep count: 6 to 12 sheep, depending on whether the neighbor gave him 0 to 6 sheep.

(I clipped the GPT5 answer here, but will note additionally that even the LLM built into the Google search results page handles this question; both note the possible trick question with the days of the week.)


And that's the wrong answer. It's a word problem, not a math problem. Also, even if it really were a math problem, the range wouldn't be 0-6 sheep from the neighbor; it would be 2-6. So it failed on the math, too.

Are you trying to win this debate with a Facebook "ONLY THE SMARTEST 1% CAN SOLVE" question? The whole point of the question is for some loser to be able to say "no, you missed XYZ ambiguity" any time a sane answer is given.

By your logic, the only "correct" answer for an LLM to give to this is "the person who asked you this is fucking with you, this is not a real question". I concede: this is a limitation of modern LLMs: they will try to answer stupid questions.


No, it's a real question. And even as a math question, it has an answer. The neighbor has 18 animals, only 4 of which are dogs. The farmer receives one-third of those, which is 6. So for the farmer to receive 0 sheep, he would have to receive 6 dogs, but there are only 4 dogs. LOGICALLY, the farmer must receive at least 2 sheep from the neighbor. There's no ambiguity. That's logic. That's intelligence. It's real, actual math: basic arithmetic. A person can easily sit down and work this out. It illustrates that the AI is generating responses statistically and not actually thinking. There are two full layers of failure here: the word problem, and the math problem underneath it.
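For what it's worth, the "at least 2 sheep" constraint is easy to check by brute force. Here's a quick sketch in Python (variable names are mine, not from the thread):

```python
# The neighbor has 4 dogs and 14 sheep and gives away one-third
# of her 18 animals, i.e. 6 of them. Enumerate every possible
# split of those 6 animals into dogs and sheep.
final_counts = set()
for dogs_given in range(0, 4 + 1):      # she only owns 4 dogs
    sheep_given = 6 - dogs_given        # the rest of the 6 must be sheep
    if not 0 <= sheep_given <= 14:      # she only owns 14 sheep
        continue
    # Farmer: 17 - 9 = 8 sheep, doubled to 16, then sells 5 + 5 = 10.
    final_counts.add(16 + sheep_given - 10)

print(sorted(final_counts))  # prints [8, 9, 10, 11, 12]
```

Since at most 4 of the 6 transferred animals can be dogs, the neighbor gives between 2 and 6 sheep, so the final count is 8 to 12, not the 6 to 12 range in the quoted answer.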

I'm really not interested in this Calvinball argument where we try to conclude whether or not LLMs can do math by avoiding as much as possible actually doing math.

Obviously, they can do math.


A concise problem that requires actual logic will naturally seem a bit convoluted, but an intelligent being can sit down and work it out logically. Anyway, it's not an argument. It's empirical evidence that supports my argument. You have chosen to ignore it or otherwise rationalize it away. Nothing I can do about that.

I'm comfortable with what the thread says about our respective arguments at this point. Thanks!


