Ha funny, I was speccing out an idea for real-time Claude Code interaction from local apps using some tricks vs using the Agent SDK when I got the popup to try Fable. So of course I gave it a go, and it triggered the sensitive content warning immediately, which I was very confused by until I put two and two together.
Fun times when “safety” means both the safety of mankind, and also the safety of revenues
I’ve been using this general pattern - a custom cli app for deterministic tasks, skills for the agent harness, run the skills in the agent and it produces artifacts for you by using the cli and its own agentic reasoning - a lot lately for work. Things like “give me an executive brief of the activity in these teams backlogs over the last month” and in 5-10 minutes I have a few page doc I can read that is cited with the tickets it analyzed and I don’t have to go bug people or ask them to do yet another task for me, just make sure your backlog is updated and detailed like normal practice. It’s awesome and really fits a useful spot between pure agent usage (which is hard to get consistent results from on repeat tasks) and not having to build/buy a full blown app for every random thing.
This approach works well, I agree. But I keep wishing that I could invert it. The architecture I keep yearning for is a traditional CLI program that encodes most workflow knowledge/decisions as real code, but which does "just a little bit of coding agent invocation" during one specific workflow step.
Not sure how to accomplish this. Anyone have any suggestions? Are there libraries for this yet? (And how would they even work? It feels like, to do this right, there would have to be some background service that CLI software could expect to interact with via a well-known local IPC socket — similar to how e.g. the docker daemon works. But I'm unaware of any coding agent software/frameworks that expose such an IPC capability...)
I’m building this! It was originally designed for human accessibility for interactive CLIs, but it turned out to be really useful for giving agents the ability to follow structured workflows.
It runs as a background terminal that the agent can observe, and then exposes all interaction options as structured commands that can be run from the foreground CLI which then update the state of the background terminal via IPC. My hope is to establish a sort of “ARIA for terminals” standard to improve accessibility for both humans and agents. Email in profile, ping me if you’re interested in giving it a spin (just have plugins for Inquirer + Commander right now, hoping to broaden to other frameworks & TUIs soon).
I reverted this due to impending billing changes, but Claude and most LLM providers to my knowledge do offer a way to directly fire a prompt to the LLM in a "headless" or non-interactive mode. Specifically "claude -p <your_prompt_here>" is the way to do it with Claude Code. It allows for using the agent to do a one-off command with a given structured prompt. Originally Lathe would use this from the Go application to allow you to extend a tutorial directly from the UI without directly interacting with the LLM.
You'd have to exec out, so it's a little clunkier than IPC, but I think you could achieve what you want with it.
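For anyone wanting to try the exec-out route, here is a minimal Python sketch. The only part taken from the comment above is `claude -p <prompt>`; the function names, the timeout default, and the injectable `run` parameter are assumptions made for illustration (Lathe was a Go application, and this is not its code).

```python
import subprocess
from typing import Callable


def build_command(prompt: str, binary: str = "claude") -> list[str]:
    """Argv for a one-off, non-interactive run: `claude -p <prompt>`."""
    return [binary, "-p", prompt]


def run_headless(
    prompt: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout_s: float = 600.0,  # hypothetical default; tune it for your task
) -> str:
    """Exec out to the agent and return whatever it printed on stdout."""
    completed = run(
        build_command(prompt),
        capture_output=True,  # collect stdout/stderr instead of inheriting the terminal
        text=True,
        timeout=timeout_s,
        check=False,  # the exit code is inspected manually below
    )
    if completed.returncode != 0:
        raise RuntimeError(
            f"agent exited with {completed.returncode}: {completed.stderr.strip()}"
        )
    return completed.stdout.strip()
```

Passing `run` as a parameter keeps the function testable without the agent installed: a test can hand in a fake that returns a canned `CompletedProcess`.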
But in my experience, to actually get where they're going quickly (as opposed to spending hours and hundreds of dollars stumbling around in the dark), coding agents generally need more interactive hand-holding than that. If you just fire off one non-interactive session and wait for it to come to a stop, the problem usually isn't fully and correctly solved at the point the LLM decides to "finish." And if you then start another non-interactive session to continue the work, the new session will have lost the old session's state/memory/context, and so will stumble through many of the same mistakes and misapprehensions.
What you really want, for a CLI program with a "use coding agent to do X" workflow-step, is for the CLI program to play the role of a human in a temporary durable coding-agent conversation session: prompting the agent; then waiting for it to finish responding (and side-effecting); then either asking the agent itself to evaluate an "am I done yet" predicate with a constrained output syntax; or having the CLI program do its own out-of-band validation of the changes made to the shared state by the agent; where, in either case, if the agent isn't "done yet", then the workflow step must continue poking it — or prompt the human to make a decision on how to proceed (possibly involving providing direct input to the LLM, but this is not ideal; ideally the CLI "abstracts away" the need for the end-user to understand the intricacies of the conversation the program is having with the LLM. Even more ideally, the conversation just whizzes by and the human doesn't have to think about an LLM being involved at all.)
Basically, think of this not as the CLI program saying to an agent "answer me this question" or "edit this file for me", but rather, the CLI program popping open a mini "guided + 99%-of-the-time automated" TUI coding-agent micro-IDE "inside" the workflow, in about the same way that git pops open your EDITOR inside `git commit`.
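The control flow described above (prompt, wait, check "am I done yet" out of band, poke again, and fall back to the human only if that fails) can be sketched in Python roughly as follows. Everything here is hypothetical naming: `send` stands in for whatever transport keeps one durable conversation alive, and `is_done` is the program's own validation of the shared state. With Claude Code, `send` might wrap `claude -p` together with a session-resume option, but that wiring is not verified here, so it is left as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# (prompt, session_id or None) -> (reply_text, session_id)
SendFn = Callable[[str, Optional[str]], Tuple[str, str]]


@dataclass
class StepResult:
    done: bool                 # did the out-of-band check ever pass?
    turns: int                 # how many prompts were sent
    session_id: Optional[str]  # lets a caller hand the session to a human


def run_agent_step(
    task_prompt: str,
    send: SendFn,
    is_done: Callable[[], bool],
    nudge: str = "The validation check still fails. Please keep going.",
    max_turns: int = 5,
) -> StepResult:
    """Play the human's role: prompt, wait, validate, and re-prompt."""
    session_id: Optional[str] = None
    prompt = task_prompt
    for turn in range(1, max_turns + 1):
        # Reusing session_id is what keeps the agent's context between turns.
        _reply, session_id = send(prompt, session_id)
        if is_done():
            return StepResult(True, turn, session_id)
        prompt = nudge  # not done yet, so poke the same session again
    # Budget exhausted: the caller decides whether to involve the user.
    return StepResult(False, max_turns, session_id)
```

If `done` comes back false, that is the point where the CLI would stop and ask the user how to proceed, ideally without exposing the raw conversation.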
> Basically, think of this not as the CLI program saying to an agent "answer me this question" or "edit this file for me", but rather, the CLI program popping open a mini "guided + 99%-of-the-time automated" TUI coding-agent micro-IDE "inside" the workflow, in about the same way that git pops open your EDITOR inside `git commit`.
Isn't this simply having your mechanistic script call `claude "Prompt that is well honed to provide a mini, guided, 99%-of-the-time automated LLM action to $THE_THING"`? And, possibly including some `--allowed-tools`?
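That suggestion could look something like this Python sketch, which fills `$THE_THING` into a fixed prompt and hands the terminal to the agent the way git hands it to your editor. The prompt wording, the function names, and the `Read`/`Edit` tool list are assumptions for illustration; check the flag spelling and tool names against your installed version.

```python
import subprocess
from string import Template
from typing import Sequence

# Hypothetical "well honed" prompt; the real one is whatever your workflow needs.
PROMPT = Template(
    "You are one step in an automated workflow. "
    "Do exactly this and then stop: $the_thing"
)


def build_guided_command(
    the_thing: str,
    allowed_tools: Sequence[str] = ("Read", "Edit"),  # assumed tool names
) -> list[str]:
    """Argv for `claude "<prompt>" --allowed-tools <tools>`."""
    prompt = PROMPT.substitute(the_thing=the_thing)
    # The prompt goes before the flag so a multi-value flag cannot absorb it.
    return ["claude", prompt, "--allowed-tools", ",".join(allowed_tools)]


def open_guided_session(the_thing: str) -> int:
    """Run the agent attached to the current terminal and return its exit code."""
    # No output capture: stdin/stdout are inherited so the user can step in.
    return subprocess.run(build_guided_command(the_thing), check=False).returncode
```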
I agree! I want to say I first saw this pattern in some work Simon Willison did (Rodney and Showboat). For certain workflows the pair of Skills + CLI give me a nice balance between the flexibility of LLMs and the consistency of a CLI.
Can you give some examples of the deterministic tasks? So in your example, was the deterministic task “fetch this team’s backlog”? And then the LLM parts are “process each backlog” and “combine a summary”?
Generally it’s things like that yes, but also stuff like preparing the pulled content to be summarized and then taking those summary batches and preparing for synthesis in a templated format
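To make that concrete, here is a rough Python sketch of the deterministic side: splitting pulled tickets into batches for summarization, then slotting the returned batch summaries into a fixed template for the synthesis pass. The batch size, template text, and function names are all made up for illustration and are not the actual CLI.

```python
from typing import Iterator, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical output template for the synthesis step.
SYNTHESIS_TEMPLATE = """\
# Executive brief: {team}
Period: {period}

{sections}
"""


def render_synthesis_input(
    team: str, period: str, batch_summaries: Sequence[str]
) -> str:
    """Arrange per-batch summaries in a fixed layout for the final synthesis."""
    sections = "\n\n".join(
        f"## Batch {i}\n{summary.strip()}"
        for i, summary in enumerate(batch_summaries, start=1)
    )
    return SYNTHESIS_TEMPLATE.format(team=team, period=period, sections=sections)
```

The agent only ever sees one batch at a time plus the final templated document, which is a large part of why the results stay consistent between runs.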
I backed jetkvm on kickstarter or indiegogo or whatever site they launched on out of excitement. It’s a well made device, the software is clean. It feels like it’s stopped being iterated on though which is a bummer (the tariff timing was likely brutal).
Even though I have it on Ethernet on a gigabit network at home, with WiFi 7 mesh that gets 950 Mbps up/down to the internet, I’ve never been able to use jetkvm, whether through their cloud portal or through direct IP connection, without it feeling very sluggish. It’s plugged into an aging NUC that I know is still performant, and I previously had NoMachine set up with settings configured to a smooth, basically realtime feel even over Tailscale remotely, so I don’t know if I'm just using the wrong settings or what. Might just have to have codex computer use fiddle with it to see if it can figure it out.
Been using his skills a lot lately, they are wonderful. I’ve added an issue-to-specs skill that grounds the issues with a technical plan against the current codebase, and a research skill that spawns a bunch of agents to look up best practices on the internet for those issues with specs. It really dials things in. I need to issue a PR to his project for those two…
I built a site that's similar in concept to Hacker News, but is entirely fed by RSS feed content, that is then bullet-pointed summarized on the article page: https://engineered.at/
But I also extract topics automatically from the content too with LLMs, to allow for dynamic topic pages that users can separately subscribe to to tune their feeds.
Haven't promoted it much, but it's pretty amazing what you can do for a couple bucks a month. And my main thesis with this site is that by locking the content to only rss feeds of known blogs, you dramatically reduce the spam submission risk (basically eliminate it). Doesn't handle the spam comment side of things, but that's a different problem.
This looks great, I've wanted something like this for a while. Finding how to click through to the actual item in the feed was a high point of friction for me.
I went to a topic and then clicked on the header of something I was interested in expecting to be brought to the blog post directly. Needing to click on that same title again to be brought to the post was unintuitive to me, I searched around the page, went back and forth a few times and eventually figured it out.
As a user I would love to be able to click directly through to the article FROM the topic feed. I would expect that the comments is a URL to the page that the header currently brings me to. This would match my expectations from using sites like reddit/HN.
A one or two liner summary directly on the topics feed would be really great I think.
I presume you’re politely asking in order to block? Which is fine, I get it. On my phone right now but can update later.
I do want to ask though (and I should make this clear in a FAQ or something): the way I check RSS feeds uses adaptive scheduling, so I intentionally don’t check feeds of sites too rapidly. Then the summarization is based on the full article content but I never render that full content on the site (to avoid traffic hijacking concerns). Given that: what’s the concern?
I do appreciate you addressing the concerns about traffic hijacking, but at the same time I really don't like having my content run through a text mangler like an LLM. I get the use case, but at the end of the day it's my content and I'm a bit prickly.
That said, I'm not necessarily planning to immediately block your crawlers, I intend to just add them to a list I maintain for personal reference. I'm mostly interested in correlating the crawling traffic that I see with various sources, I have been gathering data about crawling activity and sources that I display on an embedded map on my site. I have caddy annotate traffic with a header indicating what the crawler is, and if the fleet behaves nicely then they don't get added to the blocklist.
Interesting. In terms of "crawling", the way the engine I built works is that by default it just polls the RSS feed of a site on an adjusting cadence, like any other RSS feed reader. On some sites, the engine can do a follow-up scrape of the article link from the RSS feed if the full content of the article isn't provided in the feed. So it's not real crawling, more fetching/scraping if necessary.
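For a sense of what an adjusting cadence can mean, here is a minimal Python sketch of one common approach: poll sooner after a check that found new items, back off after an empty one, and clamp to a floor and ceiling. The halving/doubling rule and the 30-minute and 1-day bounds are assumptions for illustration, not necessarily what the engine does.

```python
from datetime import timedelta

# Hypothetical bounds: never poll more often than MIN, never less often than MAX.
MIN_INTERVAL = timedelta(minutes=30)
MAX_INTERVAL = timedelta(days=1)


def next_interval(current: timedelta, new_items: int) -> timedelta:
    """Pick the wait before the next poll of a single feed."""
    if new_items > 0:
        proposed = current / 2  # feed is active, so check sooner
    else:
        proposed = current * 2  # nothing new, so back off
    # Clamp so a quiet feed is still checked daily and a busy one is not hammered.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))
```

A feed that posts a few times a week quickly settles near the ceiling under a rule like this, which is the "don't check too rapidly" behavior in practice.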
Figured it out, had a random block of Firefox versions less than 147 in my ApplicationController for some reason. Of course my home internet went down though so I’ll push in a few.
If I set my UA to "FUCKIT" I can use the site perfectly fine. Why is there a User Agent Filter that disables the whole website? This should be maybe a warning, not a complete block.
You know, I had set up some analytics filtering based on geoip because I was getting crazy spam traffic from China and Singapore, but that should only be affecting analytics, not the whole site. Mind if I ask where you're located? (you can email me privately if preferred: me@dchuk.com)
This is awesome thank you for building this! I’ve wanted to try building some sort of companion robot for my kids to play with, that leverages edge voice AI models and likely an api connection to one of the big SOTA providers for the brain, with a custom system prompt to have a very simple child-like personality. I feel like it would be neat to use an android phone as the “face” where you get a nice screen to render on, compute for the edge models, and the forward and rear facing cams for the vision all included. One of those daydream projects…
- Check out Reachy Mini or Stack-chan on orobotio, those are the top 2 SOTA AI personality robots, what you're looking for! Not an exact match, but I see the vision, it's definitely buildable.
- What AI services are you most familiar with? Claude? GPT? OpenRouter? HuggingFace? Gemini? Something else? I'm planning to support all of them, but will be helpful to know which would help you most.
- What would you estimate your skill level with programming, 3d printing, and building robotics electronics? beginner - moderate - expert?
I rebuilt an app I found in rust and extended it in a bunch of ways that I use everyday for this use case and it works flawlessly if this is any help: https://github.com/dchuk/jarkdown-rs
Interesting concept. Two sided marketplaces are hard to bootstrap but maybe just enough curiosity would get the flywheel going. Hell they should just try and convince people to enroll as providers but then also use the service even if it’s hitting their own machines until there’s some degree of supply and demand pressure then try and get only providers to sign up. Or set up some way to encourage providers to promote others to use the service (the 100% rev share kind of breaks that concept but anything can change).
I wish this was self hostable, even for a license fee. Many businesses have fleets of Macs, sometimes even in stock as returned equipment from employees. Would allow for a distributed internal inference network, which has appeal for many orgs who value or require privacy.
Ignore all the hate in the comments here, anyone denying the direction of software development and it aggressively becoming agentic has their own reckoning to deal with…
I love this concept. While I’m a Rails guy myself, I appreciate the value of Django too, and an agent-optimized version of it makes sense.
I feel like the next logical steps are this exact concept but in Go / Rust to get even more performance out of everything and to also get the single deployable binary too
I've actually been vibe coding a port of Django to Rust as a fun learning experience. I didn't expect it to be possible, but I've already got the core ORM working (including makemigrations, migrate, and inspectdb) with basic admin support running.
Single file deployment, and the process seems to only use 3-4 MB of memory.
I've been able to use inspectdb on existing Django databases, and then browse and change that data using the rust admin.
I am probably not the right person to build a production ready version of this - since I am not a Rust developer - but gee I am impressed by how good it is becoming.