
"In the 890s, having recently converted to Orthodox Christianity, Boris ensured his church would be independent from the Patriarchate of Constantinople." --- I thought Orthodox Christianity was created by the Great Schism in 1054.

I don't know if you are trolling or what, but you win at the internets today


I concur - it does not make sense to do in LLM prompts what can be done in code. Code is cheaper, faster, and deterministic, and we have lots of experience working with code.

In particular, all bookkeeping logic should move into the symbolic layer: https://zby.github.io/commonplace/notes/scheduler-llm-separa...


""" we need to build:

    Formal specification layers that agents execute against, not just prompts
"""

It is probably easier to just write that program.


Yep. "Formal specification layers", aka code.

Talked with someone this morning who is using "formal methods" to validate their AI generated code.

They are using the same AI to generate the proofs.


And how is that going for them? I think this might be a solid research program - but the blog presented it as some kind of practical approach.

That's why you should just subscribe to multiple LLM vendors. One model to write specs, one to write code against the specs and another to validate the code. Problem solved. (I have heard this proposed at work.)

Right, because to trust that those "formal specifications" are correct, you will have to write them by hand.

First you need to write these specifications, and if you say you'll just tell the LLM to write them - then how would that be different from just telling the LLM to write the program?

I guess you can argue that these are two independent processes, so you can combine them to get something more reliable than either alone - this might be a viable path. But from what I've heard, writing formal specifications is just really hard - I haven't seen anything practical in this area.


It is not novel - but with the new models it is just becoming practical.

If you have a test that fails 50% of the time - is that test valuable or not? A 50% failure rate looks like a coin toss, but by itself it does not tell us whether the test is noise or whether it is actually separating bad states from good ones. For a test to be useful it needs a positive Youden's J statistic (https://en.wikipedia.org/wiki/Youden%27s_J_statistic): sensitivity + specificity - 1. The failure rate alone does not let us calculate sensitivity and specificity.
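To make the point concrete, here is a small illustrative sketch (the counts are made up for the example): two tests with the same 50% overall failure rate can have completely different Youden's J values, depending on whether the failures line up with the genuinely bad states.

```python
def youden_j(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # fraction of bad states the test flags
    specificity = tn / (tn + fp)  # fraction of good states the test passes
    return sensitivity + specificity - 1

# Coin-toss test: fails half the time regardless of the true state.
noise = youden_j(tp=25, fn=25, tn=25, fp=25)   # J = 0.0 -> worthless

# Informative test: also fails 50% of the time overall (tp + fp = 50
# out of 100 runs), but the failures concentrate on the bad states.
signal = youden_j(tp=45, fn=5, tn=45, fp=5)    # J = 0.8 -> useful

print(noise, signal)  # 0.0 0.8
```

Both tests "fail 50% of the time", but only the second one carries information - which is exactly why the failure rate alone tells you nothing.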

I see a similar problem with this article: the author notices that LLMs produce a lot of errors and concludes that they are useless and produce only a simulacrum of work. The author makes an interesting observation about how LLMs disrupt the way we judge knowledge work. But the conclusion that LLMs do only a simulacrum of work is where the argument fails.


Gee, a thing by a guy, with a name. What are you saying exactly? So the test in question is a test the LLM is asked to carry out, right? Then your point is that if it's a load of vacuous flannel 49% of the time, but meaningful 51% of the time, on average this is genuine work so we can't complain about the 49%?

Wait, you're probably talking about the test of discarding a report based on something superficial like spelling errors. Which fails with LLMs due to their basic conman personalities and smooth talking. And therefore ..?


> For a test to be useful it needs to have positive Youden’s statistic

This is not true as stated. I'd try to gloss over the absolutes relative to the context, but if I'm totally honest, I'm not sure I understand what idea you're trying to communicate.


I don't know - it looks like an interesting idea - but ... I am struggling to put this politely. When I go into the repo and find that it does things like lip-syncing talking avatars, I start to wonder what percentage of the development effort goes into marketing.

The idea is for non-tech people to relate to agents through a human-style interaction - that part is actually only a relatively small piece of the system, but it brings it to life for people.

It's a way to encapsulate the personality and expertise - at least that's the idea :)


I like how the author notices that it really got a start with cloud computing.


