I want to just talk to the Mac and have it do things. I tried computer use and other alternatives, but the latency made it unusable.
I want to be able to control the Mac itself, apps, and the browser. I also need it to figure things out by itself given a goal.
Claude Code with the --chrome flag is kind of good, but it's too slow. I wanted to try faster APIs, like the one hosted on Cerebras, but it's too expensive.
Do you want to do something that can't be done through AppleScript, macOS accessibility APIs, or something like Puppeteer to control the browser?
Or something you don't understand how to do manually?
Because I guess I don't understand the attraction of using an LLM for system automation where interfaces already exist, other than as a form of documentation, or to write code against those interfaces.
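For what it's worth, the non-LLM route is pretty direct. A minimal sketch (macOS only; Safari and the URL are just placeholders) of driving an app with AppleScript from Python:

```python
import subprocess

def osascript_cmd(script: str) -> list[str]:
    """Build the argv for running an AppleScript snippet via osascript."""
    return ["osascript", "-e", script]

# Example: tell Safari (placeholder app) to open a URL.
open_url = 'tell application "Safari" to open location "https://example.com"'
cmd = osascript_cmd(open_url)

# On an actual Mac you would run it with:
# subprocess.run(cmd, check=True)
```

An LLM can still be useful here as a code generator: it writes the AppleScript or Puppeteer glue once, and the deterministic script then runs with no per-invocation latency.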
Grok is pretty bad. No wonder usage is low. I think they messed up when they removed the human annotation team and went in the direction of automation.
The bet could eventually pay off if they figure out how to train useful models without human help. Imagine is terrible too.
More competition is great for us users. I hope they recover. In the meantime, why not host OSS models like Google does?
My understanding is that inference (running existing models) is around a quarter of the average compute budget for AI companies, with training new models taking the other three quarters.
As such, using only 11% of their GPUs indicates that they've elected not to do as much training as they are capable of.
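To spell out the arithmetic (the 1/4 split and the 11% figure are rough numbers from this thread, not verified data):

```python
# Rough figures quoted in this thread -- treat them as assumptions.
inference_share = 0.25       # ~1/4 of compute normally goes to inference
training_share = 0.75        # ~3/4 normally goes to training
observed_utilization = 0.11  # fraction of GPUs reportedly in use

# If serving inference alone would normally occupy ~25% of the fleet,
# running everything on 11% means training is far below capacity.
unused = 1.0 - observed_utilization
print(f"GPUs not in use: {unused:.0%}")
```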
I was interested in trusted execution environments and how safe they are. If you search Google Scholar and start reading, they seem super vulnerable. My impression is that the industry has no better option, and that TEEs are a way to tell customers they're safe when they're not.
I have to try Kimi. I was looking for an alternative. If you have any experience or advice, please share. I saw Kimi is at the top of the OpenRouter ranking.
Thanks, I am trying it right now. I had a $5/month opencode plan, so I will play with that. I use Zed and I added Pi ACP, so I can try both Pi and Kimi. I will also try it in opencode and via Kimi Code.
Use Kimi 2.6 for planning and a cheap model (preferably local) for execution, then Kimi once again for reviewing it. Finally, I review the code myself. Saves a lot on tokens.
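That plan/execute/review split can be sketched as a simple stage-to-model router (the model identifiers and the commented-out client call are placeholders, not real endpoints):

```python
# Hypothetical stage-to-model routing for a plan/execute/review loop.
STAGE_MODELS = {
    "plan": "moonshotai/kimi-k2",    # strong model for planning (placeholder id)
    "execute": "local/small-coder",  # cheap or local model for edits (placeholder id)
    "review": "moonshotai/kimi-k2",  # strong model again for review
}

def pick_model(stage: str) -> str:
    """Return the model to use for a given pipeline stage."""
    try:
        return STAGE_MODELS[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}")

# The actual calls would go through whatever OpenAI-compatible client you use:
# client.chat.completions.create(model=pick_model("plan"), messages=[...])
```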
I use Kimi at home via a kimi.com subscription and Kimi CLI (sometimes running inside Zed, sometimes not). My favorite model by far. And it's just $20.
I have to use a supposedly frontier model at work and I hate it.
I am looking for a good alternative to Claude Code + Opus that is not Codex. I tried switching back to Opus 4.6. The attitude of 4.7 is the bigger problem: it's difficult to make it check things before answering, and it assumes it knows better than me and than reality. Plus all the latest shenanigans they pulled. Pretty disgusted that I am still using them.
I forgot to add the tendency to not own problems and fix them immediately, instead deflecting and saying it shouldn't be done now, it's not its responsibility, etc. Just terrible.
It's lazy, does not take ownership or responsibility, wants to defer work, and I have to force it to check reality. It likes to guess and assume it's correct and I am wrong. AGENTS.md is not helping at all. It's in full enshittification phase, yay!
I have been noticing a similar pattern with Opus 4.7: I have to repeat multiple times during a conversation that it should solve problems now, not later. It tries hard to avoid work, either by saying it's not its responsibility because the problem was already there, or that we can do it later.
I would love to unleash parallel agents, but I am still checking every single edit while enforcing minimal, stateless, modular code, and I have the AI check in with me before writing the next file.
A lot of times, I find it has incredibly stupid ideas and tends to make the code very messy. I would love to figure out how to stop that from happening automatically.
The upside of checking in on the code, though, is that I can come up with smart directions for the AI from both a product and tech perspective. This is especially helpful when the dumb suggestions add a lot of complexity.
I think it's like when a product person asks for a new feature, or when a founder building their own product selects which feature is smarter to build and how.
I'm expecting we'll likely end back up on agents making PRs and having to review them. Either that, or giving up on quality and dealing with very messy code. I've been trying various automated testing/linting strategies, and they only work so well.
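The testing/linting strategies I mean usually reduce to a gate like this (ruff and pytest are just example tools, not a recommendation; swap in whatever your project uses):

```python
import subprocess
import sys

# Hypothetical pre-merge gate: run each check, reject the agent's work if any fails.
CHECKS = [
    ["ruff", "check", "."],  # lint (example tool)
    ["pytest", "-q"],        # tests (example tool)
]

def gate(checks=CHECKS) -> bool:
    """Return True only if every check command exits cleanly."""
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            return False
    return True

if __name__ == "__main__":
    sys.exit(0 if gate() else 1)
```

The catch, as noted above, is that a gate like this only catches what the tools can see; messy-but-passing code sails through.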
That would be a nightmare. One thing is reviewing a PR generated by a human who used AI and cares about the code; another is reviewing wild agents, especially when they make changes everywhere.
I'm not excited about it either, but the main ways I've been able to discover LLM-isms that sneak in are
1. seeing them flash by in the agent's window as it's making edits (i.e. manual oversight), or
2. running into an unexpected issue down the line.
If LLMs cannot automatically generate high quality code, it seems like it may be difficult to automatically notice when they generate bad code.
Why? The vast majority of PR tools let you comment on specific lines of code, which you can't do with a prompt text area. The PR UI is superior to the standard agent UI.
I think the issue is deeper than prompts, AGENTS.md, smart flows, etc. The problem is that LLMs are searchers, trained to prefer some results. So if the dumb solution is in the search space and the smart solution is not, they won't spit it out.
To elaborate: That advice isn’t as objective as you think.
What one developer calls clean the other calls messy.
My advice is to use it, then document the issues when it gets messy. It takes some time, but no more than recruiting, training, and paying another engineer.
If these results are because of vampire attacks, they will stop being so good once the closed-model vendors figure out how to pollute the answers being sucked out.
Also, they are not quite as good when you use them in your daily flow; fine for shallow reasoning, maybe, but not for coding and harder tasks. Or at least I haven't found an open model as good as the closed ones. I would love to, so if you have some cool setups, please share.
Any solution I might be missing?