Hacker News | kang's comments

this misunderstands what thinking is ..

> Thinkism sets aside practice and experience

thinking succeeds experience & precedes practice; it's not apart from them


try replacing the word with 'thinking'

orgdown is better than markdown which is better than markup, except the text ain't hyper. i've been working on this since feb & have reached xanadu

Unlike artificial carbon capture, natural carbon capture, like the algae here, becomes insect/worm/bird feed or manure/coal.


“Hello, I’m from the algae company. I came to replace your tired algae with a fresh one.”

> The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

5.5pro is amazing but this implication might not be true & is the core argument of this piece.

AI will prove all sorts of things - interesting, boring & incorrect.

Sorting them will be the task of the PhD.


The task of a proof verifier is much simpler than the task of a proof finder (it’s basically equivalent to P vs. NP), and hence the bar for the required skills is lower. Merely verifying proofs isn’t research, and doesn’t impart research skills.
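
To make the verifier/finder asymmetry concrete, here is a minimal sketch using subset sum as a stand-in for "proof search". This is only an analogy, not a claim about real theorem provers, and the function names `verify` and `find` are made up for illustration. Checking a proposed certificate is one linear pass, while the naive finder has to walk an exponential number of candidates.

```python
from itertools import combinations


def verify(numbers, target, certificate):
    """Check a proposed solution: distinct, in-range indices that sum to target."""
    return (
        len(set(certificate)) == len(certificate)
        and all(0 <= i < len(numbers) for i in certificate)
        and sum(numbers[i] for i in certificate) == target
    )


def find(numbers, target):
    """Brute-force search over every index subset; exponential in len(numbers)."""
    indices = range(len(numbers))
    for size in range(len(numbers) + 1):
        for candidate in combinations(indices, size):
            if verify(numbers, target, candidate):
                return list(candidate)
    return None


numbers = [3, 34, 4, 12, 5, 2]
print(verify(numbers, 9, [2, 4]))  # cheap: one pass over the certificate
print(find(numbers, 9))            # expensive: tries subsets until one passes
```

Note that `find` is built on top of `verify`: the search only knows what to look for because somebody already fixed the target and the passing test.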


Verification on its own is not research, but judgement is research.

"Hey, prove something a machine can't": sure, I can't. "Hey, say something worth proving & judge it well": ah, now I might have a few unique observations/ideas/curiosities/problems from my being human.

Imo, the feeling of intelligence, or the test of originality (originativity), for AI is subjective & comes down to 4 paths: novel relative to a reference class, valuable within a domain, counterfactually sensitive to internal state and environment, and revisable through learning.


Verification is generally a much lower bar than solution generation. I don’t think it’s likely sorting out the right from wrong will end up being this huge PhD level effort.

Verification & solution generation are both downstream of generating the problem & defining the passing test, i.e. judgement.

I saw this experiment decades ago on the internet, set to a music concert. I always wanted to do a cursor moshpit


somebody did this a month ago https://www.youtube.com/watch?v=fdbXNWkpPMY

i am increasingly going schizo, where every single thing I post/see posted gets copied and karma farmed on social media. further, any novelty I share with an llm gets eaten/absorbed by the harness as a feature.


right? I've had that feeling dozens of times during summer 2025 with the earliest claude cli. It would fail, I'd fix the bug, next session when I ask it to solve the same problem, it would succeed!


Timestamp?


I am not sure if it aligns to the approach in OP's article but it's the last ~minute or two of the linked video.


this economic model works for all 'bounty' related work


They claim their models have PhDs but they still can't automate their own red teams. The bounty is not a bounty; it is for gathering training data so that, for the next deployment, they can claim they have the safest possible & most super duper aligned agentic computer-using AI that will never ever make any bioweapons.

I am also willing to bet money that for their next marketing campaign they will claim they have automated the red team for bioweapons research prevention & whatnot.


The answer should be obvious: it's both.

Zurada was one of our AI textbooks, and it makes it visual that, from a simple classifier right up to a large language model, we are mathematically creating a shape (that the signal interacts with). More parameters mean the shape can be curved in more ways, and more data means the curve gets higher definition.

They reach with data, treating the neural network as a black box, something that could be derived mathematically from the information we already know.


Well both aren’t “more important”, since that’s illogical. I think recent strides in high performance small LLMs have shown that the tasks LLMs are useful for may not require the level of representational capacity that trillion-parameter models offer.

However: the labs releasing these high-intelligence-density models are getting them by first training much larger models and then distilling down. So the most interesting question to me is, how can we accelerate learning in small networks to avoid the necessity of training huge teacher networks?


It seems you haven't done the due diligence on what part of the API is expensive - constructing a prompt shouldn't be same charge/cost as llm pass.


It seems you haven't done the due diligence on what the parent meant :)

It's not about "constructing a prompt" in the sense of building the prompt string. That of course wouldn't be costly.

It is about reusing llm inference state already in GPU memory (for the older part of the prompt that remains the same) instead of rerunning the prompt and rebuilding those attention tensors from scratch.
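
A toy sketch of the idea, for anyone following along. `ToyPrefixCache` and its per-token "state" are hypothetical stand-ins, not any provider's API or a real attention implementation; the only point is that a request sharing a prefix with an earlier one pays compute only for the new tokens.

```python
class ToyPrefixCache:
    """Toy model of prefix (inference-state) caching; one unit of work per token."""

    def __init__(self):
        self.states = {}          # token prefix (tuple) -> state after that prefix
        self.tokens_computed = 0  # stand-in for GPU work actually performed

    def _step(self, prev_state, token):
        # Stand-in for one forward-pass step; deterministic on purpose.
        self.tokens_computed += 1
        return (prev_state * 31 + sum(map(ord, token))) % 1_000_003

    def run(self, tokens):
        # Find the longest prefix whose state is already cached.
        n = len(tokens)
        while n > 0 and tuple(tokens[:n]) not in self.states:
            n -= 1
        state = self.states[tuple(tokens[:n])] if n > 0 else 0
        # Only the uncached suffix costs any compute.
        for i in range(n, len(tokens)):
            state = self._step(state, tokens[i])
            self.states[tuple(tokens[: i + 1])] = state
        return state


cache = ToyPrefixCache()
context = "you are a helpful assistant".split()
cache.run(context)                  # first turn: every token is computed
cache.run(context + ["continue"])   # second turn: only the new token is computed
print(cache.tokens_computed)
```

Building the prompt string is free either way; what the cache saves is rerunning the model over the unchanged part of the context.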


You not only skipped the diligence but confused everyone repeating what I said :(

that is what caching is doing. the llm inference state is being reused. (attention vectors are an internal artefact at this level of abstraction; effectively, at this level, the cached state is the prompt.)

The part of the prompt that has already been inferred no longer needs to be a part of the input, to be replaced by the inference subset. And none of this is tokens.


>It seems you haven't done the due diligence on what part of the API is expensive - constructing a prompt shouldn't be same charge/cost as llm pass.

I think you missed what the parent meant then, and the confusing way you replied seemed to imply that they're not doing inference caching (the opposite of what you wanted to mean).

The parent didn't say that caching is needed merely to avoid reconstructing the prompt as a string. He just takes for granted that it means inference caching, to avoid starting the session totally new. That's how I read "from prompting with the entire context every time" (not the mere string).

So when you answered as if they're wrong, and wrote "constructing a prompt shouldn't be same charge/cost as llm pass", you seemed to imply "constructing a prompt shouldn't be same charge/cost as llm pass [but due to bad implementation or overcharging it is]".


You are right, I was wrong in my understanding there. It stemmed from my own implementation: an inference often wrote extra data such as tool calls, so I was using it to preserve relevant information along with the desired output, to be able to throw away the prompt every time. I realize inference caching is a better way (with its pros and cons).


I said "prompting with the entire context every time." I think it should be clear even to laypersons that the "prompting" cost refers to what the model provider charges you when you send them a prompt.

