braebo's comments | Hacker News

No type system is as strong as TypeScript — certainly not Kotlin.

Give Scala a try :)

Packing people into tiny spaces like sardines should be illegal.

You can easily persist agent memories in a markdown file though.
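
A minimal sketch of what that can look like, assuming a Node/Bun-style runtime (the file name and entry format here are made up):

  // Append a timestamped memory entry to a markdown file that the agent
  // is instructed to read at session start. AGENT_MEMORY.md is hypothetical.
  import { appendFile } from "node:fs/promises";

  async function remember(note: string): Promise<void> {
    await appendFile("AGENT_MEMORY.md", `- ${new Date().toISOString()}: ${note}\n`);
  }

  await remember("Production deploys require manual approval.");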

And the guy in Memento had tattoos of key information. That didn’t mean he no longer had memory loss.

Pretty good metaphor.

Limited space to work with, highly context-dependent, and likely to get confused as you cover more surface area.


Yup, and the agent will happily ignore any and all markdown files, and will say "oops, it was in the memory, will not do it again", and will do it again.

Humans actually learn. And if they don't, they are fired.


To me it sounds like a tooling problem. OP seems to be trying to use probabilistic text systems as if they enforce rules, but rule enforcement should really live outside the model. My sense is that there was a failure to verify the agent's intent.

The tooling that invokes the model should really define some kind of guardrails. I feel like there's an analogy to be had here with the difference between an untyped program and a typed program. The typed program has external guardrails that get checked by an external system (the compiler's type checker).
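
To make the analogy concrete, here's a rough sketch in TypeScript; the action shape, the allowed paths, and callModel are all hypothetical stand-ins, not any real harness's API:

  // The model proposes a structured action; a deterministic validator
  // (the "type checker") sits outside the model and rejects bad actions
  // before anything runs.
  type Action = { kind: "read" | "write"; path: string };

  // Stand-in for whatever completion API the harness uses.
  async function callModel(prompt: string): Promise<string> {
    return JSON.stringify({ kind: "write", path: "src/index.ts" });
  }

  const ALLOWED = [/^src\//, /^docs\//];

  function validate(action: Action): Action {
    if (action.kind === "write" && !ALLOWED.some((re) => re.test(action.path))) {
      throw new Error(`guardrail: write outside allowed paths: ${action.path}`);
    }
    return action;
  }

  const action = validate(JSON.parse(await callModel("...")) as Action);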


What tooling? It's a probabilistic text generator that runs in a black box on the provider's server. What tooling will have which guardrails to make sure that these scattered markdown files are properly injected and used in the text generation?

That's the million-dollar question. Maybe have systems of agents that all validate each other's work? Maybe something needs to be done at the harness level? I don't suppose we could realistically expect 100% accuracy, but if we take 100% to be the upper limit, we could build systems that get us closer to that ideal.

This is faith in magic. "There's some magic way to make a probabilistic text generator running in the cloud never miss local files"

No no, that’s not what I’m saying. The fact that the data is stored in files is incidental. It could be in a database, in a knowledge graph, or derived from some other data. Regardless of where it is, something should know to include it in the context, but only when it’s relevant.

So for instance you could start by trying to classify the prompt in some way. If you use an LLM for this, you might need to get it to return a machine-parsable data format. Then your harness can pattern match on the classification and use it to enrich the prompt with additional context. The challenge would be in determining how exactly you want to go about this, balancing tradeoffs such as accuracy, cost, time, etc.

For the classification step you might begin with something like "Determine whether the following prompt is a QUESTION or a STATEMENT. Respond using only one of the two words. Prompt: $PROMPT"

You could have multiple back-and-forths like this; at each round you gain more information about the prompt, and you can use that information to determine further classifications and/or context to include.
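
As a rough sketch of one round (the label set, the memory file, and callModel are all made up for illustration):

  import { readFile } from "node:fs/promises";

  // Stand-in for any completion API.
  async function callModel(prompt: string): Promise<string> {
    return "QUESTION";
  }

  // Classify the prompt, then pattern match on the label to decide what
  // extra context to inject before the real completion call.
  async function enrich(prompt: string): Promise<string> {
    const label = (await callModel(
      "Determine whether the following prompt is a QUESTION or a STATEMENT. " +
      "Respond using only one of the two words. Prompt: " + prompt
    )).trim();

    if (label === "QUESTION") {
      const memory = await readFile("AGENT_MEMORY.md", "utf8");
      return memory + "\n\n" + prompt;
    }
    return prompt; // unrecognized label: fall back to the raw prompt
  }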


> Regardless of where it is, something should know to include it in the context,

Magic. You're talking about magic. You keep reiterating the same faith that "There's some magic way to make a probabilistic text generator running in the cloud never miss local files", where "files" means "files, knowledge graphs, databases, etc.".

It doesn't matter how the data is stored. You can't know when to include something relevant in the context, because the whole thing, context included, is running in the cloud. You are not in the driver's seat. Literally anything you include locally in the prompt can and will be ignored.


I’m not following. If I run an agent on Ollama locally, it’s not in the cloud. I don’t see what the cloud has to do with the argument.

As to your other point, that anything you include in the prompt can and will be ignored: yes, I agree. You could draw an analogy to how a teacher assigns an in-class reading assignment and follows it up with a reading comprehension quiz. If your mind wanders during the reading, you may find that you fail the quiz, because “anything you include in the prompt can and will be ignored”. The quiz result therefore serves as an evaluation.
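
Sketching the quiz idea in code (everything here, including callModel, is hypothetical):

  // After injecting the memory file, quiz the model on a fact from it; a
  // wrong answer suggests the context was ignored, and the harness can
  // retry or escalate rather than trust the session.
  async function callModel(prompt: string): Promise<string> {
    return "approval"; // stand-in for any completion API
  }

  async function contextWasUsed(memory: string, q: string, expected: string) {
    const answer = await callModel(memory + "\n\nAnswer in one word: " + q);
    return answer.trim().toLowerCase().includes(expected.toLowerCase());
  }

  const ok = await contextWasUsed(
    "- Deploys require manual approval.",
    "What do deploys require?",
    "approval"
  );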


Which it will start ignoring after two or three messages in the session.

And you'll blow the context over time and send the LLM to the sanatorium. It can't fit everything the way the human brain can.

If a junior fucks up production, that will carry extraordinary weight: they appreciate the severity, feel the social shame, and will have nightmares about it. If you write some negative prompt to "not destroy production", then you also need to define some sort of watertight memory-weighting system that doesn't exist, and specify it in great detail. Otherwise the LLM will treat that command as only as important as the last negative prompt you typed in, or ignore it when it conflicts with a more recent command.


> And you'll blow the context over time and send the LLM to the sanatorium. It can't fit everything the way the human brain can.

The LLM did have this capability at training time, but weights are frozen at inference time. This is a big weakness in current transformer architectures.


That's not learning.

Which open model has the same performance as Opus 4.7?


They don't have to be at parity today.

If the frontier models reach a point of barely any noticeable improvement, the trade-off changes.

You do not need a perfect substitute if you are getting it for free...

People will factor in future expectations about the development of open source vs frontier models. Why do you think OAI and Anthropic are pushing hard on marketing? It's for this reason. They want to get contractual commitments that firms have to honour whilst open source closes the gap.


The person they were responding to said "Open models have the same performance on coding tasks now." AFAIK this is bullshit, but I'd love to be corrected if I'm wrong.


Claude Code Desktop is as close as I can see them getting, as it seems the big bet is that the IDE is on its way out as models improve.


That’s the LSP, not the runtime. Bun runs TypeScript very fast. It’s a fantastic language and ecosystem.


I’ve just checked FFI in Bun and it’s marked as experimental. There are great libraries in the C/C++ world, and FFI is kinda table stakes for using them.
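
For reference, here's roughly what bun:ffi usage looks like; the library name and the add symbol are hypothetical:

  // Load a native library and call a C function through Bun's FFI.
  import { dlopen, FFIType, suffix } from "bun:ffi";

  const lib = dlopen(`libfastmath.${suffix}`, {
    add: { args: [FFIType.i32, FFIType.i32], returns: FFIType.i32 },
  });

  console.log(lib.symbols.add(2, 3)); // 5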


Nowhere did I say "runtime".

Even with Bun, it's because of Zig, not TypeScript, and that only proves my point even more.


You're right, we should just not use any interpreted/scripting languages because they're not as fast as compiled ones.

Why does a CLI tool that just wraps APIs need this native performance?


Svelte for eliminating countless categories of complexity introduced by React.


We could use LLMs to scan extension source code and list all of the behavior not disclosed on the extension's page, such as adware or geolocation tracking. Then another LLM, running locally, could disable it and warn you with a message explaining the situation.
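
A rough sketch of the scanning half, assuming a local model served by Ollama (the file path, model name, and prompt are made up):

  // Ask a local model to flag undisclosed behaviors in extension source,
  // via Ollama's /api/generate endpoint.
  import { readFile } from "node:fs/promises";

  const source = await readFile("extension/background.js", "utf8");

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({
      model: "llama3",
      stream: false,
      prompt:
        "List any behaviors in this extension source that look like " +
        "tracking, adware, or undisclosed network calls:\n\n" + source,
    }),
  });

  const { response } = await res.json();
  console.log(response);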


Same, but it’s certainly too slow.


Avoiding this with Opus has been trivial in my experience.


Even with Sonnet or "lower" models like Kimi it's trivial. The only thing I still find with AI-generated code is some degree of overengineering.

