When running long autonomous tasks it is quite frequent to fill the context, eve...

SequoiaHope · 2026-03-14T01:16:36 1773450996

Yep, I have an autonomous task that has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this, so I have to run a follow-on session to critically examine everything that was done. Long context will be great for this.

lukan · 2026-03-14T09:09:58 1773479398

Are those long unsupervised sessions useful? In the sense, do they produce useful code or do you throw most of it away?

brookst · 2026-03-14T12:00:02 1773489602

I get very useful code from long sessions. It’s all about having a framework of clear documentation, a clear multi-step plan including validation against docs and critical code reviews, acceptance criteria, and closed-loop debugging (it can launch/restart the app, control it, and monitor logs).

I am heavily involved in developing those, and then routinely let Opus run overnight and have a flawless or nearly flawless product in the morning.

MikeNotThePope · 2026-03-14T01:18:11 1773451091

I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.

ashdksnndck · 2026-03-14T01:33:34 1773452014

My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations.
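A minimal sketch of that push-and-wait loop, assuming three hypothetical callables (`push_revision`, `wait_for_failures`, `fix`) that stand in for whatever the agent and CI system actually expose. None of these names come from a real tool, and the iteration cap is an added assumption so the loop cannot run forever.

```python
from typing import Callable, List


def iterate_until_clean(
    push_revision: Callable[[int], None],
    wait_for_failures: Callable[[int], List[str]],
    fix: Callable[[List[str]], None],
    max_iterations: int = 10,
) -> int:
    """Push, wait for CI, fix, repeat. Returns the 1-based iteration that came back clean."""
    for iteration in range(1, max_iterations + 1):
        push_revision(iteration)
        failures = wait_for_failures(iteration)  # blocks until CI reports
        if not failures:
            return iteration
        fix(failures)  # hypothetical: the agent addresses the reported failures
    raise RuntimeError(f"CI still failing after {max_iterations} iterations")


# Fake CI for illustration: failures shrink with each pushed revision.
fake_ci_results = [["unit test", "lint"], ["AI code review"], []]
pushed: List[int] = []

clean_at = iterate_until_clean(
    push_revision=pushed.append,
    wait_for_failures=lambda i: fake_ci_results[i - 1],
    fix=lambda failures: None,
)
```

With the fake results above, the loop stops on the third push.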

tudelo · 2026-03-14T01:40:52 1773452452

I mean if you don't have your company paying for it I wouldn't bother... We are talking sessions of 500-1000 dollars in cost.

takwatanabe · 2026-03-14T11:45:54 1773488754

Right. At Opus 4.6 rates, once you're at 700k context, each tool call costs ~$1 just for cache reads alone. 100 tool calls = $100+ before you even count outputs. 'Standard pricing' is doing a lot of work here lol
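The arithmetic behind that estimate can be sketched as follows. The per-token rate here is not a published price: it is back-derived from the "~$1 per call at 700k context" figure above, so treat it as an assumption and swap in whatever rate actually applies.

```python
def implied_rate_per_million(context_tokens: int, usd_per_call: float) -> float:
    """USD per million cached tokens implied by a given per-call cost."""
    return usd_per_call / (context_tokens / 1_000_000)


def cache_read_cost(context_tokens: int, usd_per_million: float) -> float:
    """USD cost of re-reading `context_tokens` cached tokens once."""
    return context_tokens / 1_000_000 * usd_per_million


CONTEXT_TOKENS = 700_000
TOOL_CALLS = 100

# Assumption: ~$1 per call, taken from the comment, not from a price list.
rate = implied_rate_per_million(CONTEXT_TOKENS, usd_per_call=1.0)
per_call = cache_read_cost(CONTEXT_TOKENS, rate)
total = per_call * TOOL_CALLS

print(f"implied rate: ${rate:.2f} per million cached tokens")
print(f"{TOOL_CALLS} tool calls: ${total:.2f} in cache reads")
```

Under that assumption the implied rate is roughly $1.43 per million cached tokens, and 100 tool calls come to about $100 before outputs.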

brookst · 2026-03-14T12:03:20 1773489800

Cache reads don’t count as input tokens you pay for lol.

https://www.claudecodecamp.com/p/how-prompt-caching-actually...

boredtofears · 2026-03-14T01:15:58 1773450958

All of those things are smells imo, you should be very weary of any code output from a task that causes that much thrashing to occur. In most cases it’s better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope)

grafmax · 2026-03-14T01:31:08 1773451868

A person has a supervision budget. They can supervise one agent in a hands-on way or many mostly-hands-off agents. Even though there's some thrashing, assistants still get farther as a team than a single micromanaged agent. At least that’s my experience.

not_kurt_godel · 2026-03-14T02:44:23 1773456263

Just curious, what kind of work are you doing where agentic workflows are consistently able to make notable progress semi-autonomously in parallel? Hearing people are doing this, supposedly productively/successfully, kind of blows my mind given my near-daily in-depth LLM usage on complex codebases spanning the full stack from backend to frontend. It's rare for me to have a conversation where the LLM (usually Opus 4.6 these days) lasts 30 minutes without losing the plot. And when it does last that long, I usually become the bottleneck in terms of having to think about design/product/engineering decisions; having more agents wouldn't be helpful even if they all functioned perfectly.

avereveard · 2026-03-14T02:59:46 1773457186

I've passed that bottleneck with a review task that produces engineering recommendations along six axes (encapsulation, decoupling, simplification, deduplication, security, reducing documentation drift) and an ideation task that gives, per component, a new feature idea, an idea to improve an existing feature, and an idea to expand a feature to be more useful. These two generate constant bulk work that I move into a new chat, where it's grouped by changeset and sent to sub-agents to protect the context window.

What I'm doing mostly these days is maintaining a goal.md (project direction) and spec.md (coding and process standards, global across projects). And developing new macro tasks: I have one in progress that is meant to automatically build PNG mockups and self-review them.
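A rough sketch of the "group by changeset, hand each group to a sub-agent" step, assuming a hypothetical `Recommendation` record and a plain-text prompt format. Neither is taken from Roo Code, Kiro, or any other tool; they only illustrate the data shape.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Recommendation:
    axis: str       # e.g. "security", "simplification"
    changeset: str  # hypothetical label for the set of files changed together
    summary: str


def group_by_changeset(recs: List[Recommendation]) -> Dict[str, List[Recommendation]]:
    """Bucket recommendations so each sub-agent only sees one changeset."""
    groups: Dict[str, List[Recommendation]] = defaultdict(list)
    for rec in recs:
        groups[rec.changeset].append(rec)
    return dict(groups)


def build_subagent_prompts(groups: Dict[str, List[Recommendation]]) -> Dict[str, str]:
    """One compact prompt per changeset, keeping the parent context small."""
    prompts = {}
    for changeset, recs in groups.items():
        lines = [f"- [{r.axis}] {r.summary}" for r in recs]
        prompts[changeset] = f"Changeset: {changeset}\n" + "\n".join(lines)
    return prompts


recs = [
    Recommendation("security", "auth", "Validate token expiry server-side"),
    Recommendation("simplification", "auth", "Inline the single-use helper"),
    Recommendation("encapsulation", "billing", "Hide the invoice table behind a repository"),
]
prompts = build_subagent_prompts(group_by_changeset(recs))
```

Each entry in `prompts` would then go to its own delegated task, so the parent chat never has to hold every recommendation at once.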

not_kurt_godel · 2026-03-14T03:17:20 1773458240

What are you using to orchestrate/apply changes? Claude CLI?

avereveard · 2026-03-14T04:52:57 1773463977

I prefer in-IDE tools because I can review changes and pull in context faster.

At home I use Roo Code, at work Kiro. Tbh as long as it has task delegation I'm happy with it.

grafmax · 2026-03-14T23:17:26 1773530246

I work on a 1M LOC, 15-year-old repo. Like you, it's across the full stack. Bugs in certain pieces of complex business logic would have catastrophic consequences for my employer. Basically, I peel poorly specified work items off my queue, each into its own worktree and session at high reasoning/effort, and provide a well-specified prompt.

These things eat into my supervision budget:

* LLM loses the plot and I have to nudge (like you)
* Thinking hard to better specify prompts (like you)
* Reviewing all changes (I do not vibe code except for spikes or other low-risk areas)
* Manual things I have to do (for things I have not yet automated with agent-authored scripts)
* Meetings
* etc

So, yes, my supervision budget is a bottleneck. I can only run 5-8 agents at a time because I have only so much time in the day.

Compare that vs a single agent at high reasoning/effort: I am sitting waiting for it to think. Waiting for it to find the code area I'm talking about takes time. Compiling, running tests, fixing compile errors. A million other things.

Any time I find myself sitting and waiting, this is a signal to me to switch to a different session.

chrisweekly · 2026-03-14T01:31:12 1773451872

weary (tired) -> wary (cautious)

saaaaaam · 2026-03-14T01:48:29 1773452909

Wary, not weary. Wary: cautious. Weary: tired.

dentalnanobot · 2026-03-14T07:21:28 1773472888

This is really common, I think because there’s also “leery” - cautious, distrustful, suspicious.