I want to just talk to the Mac and have it do things. I tried computer use and other alternatives, but the latency made it unusable.
I want to be able to control the Mac itself, apps, and the browser. I also need it to figure things out by itself given a goal.
Claude Code with the --chrome flag is kind of good, but it's too slow. I wanted to try faster APIs, like the one hosted on Cerebras, but it's too expensive.
Do you want to do something that can't be done through AppleScript, macOS accessibility APIs, or something like Puppeteer to control the browser?
Or something you don't understand how to do manually?
Because I guess I don't understand the attraction of using an LLM for system automation where interfaces already exist, other than as a form of documentation, or to write code against those interfaces.
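For what it's worth, the non-LLM route is pretty direct. A minimal sketch (macOS only; Safari and the URL are just placeholders) of driving an app with AppleScript from Python:

```python
import subprocess

def osascript_cmd(script: str) -> list[str]:
    """Build the argv for running an AppleScript snippet via osascript."""
    return ["osascript", "-e", script]

# Example: tell Safari (placeholder app) to open a URL.
open_url = 'tell application "Safari" to open location "https://example.com"'
cmd = osascript_cmd(open_url)

# On an actual Mac you would run it with:
# subprocess.run(cmd, check=True)
```

An LLM can still be useful here as a code generator: it writes the AppleScript or Puppeteer glue once, and the deterministic script then runs with no per-invocation latency.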
Grok is pretty bad. No wonder usage is low. I think they messed up when they removed the human annotation team and went in the direction of automation.
The bet could eventually pay off if they figure out how to train useful models without human help. Imagine is terrible too.
More competition is great for us users. I hope they recover. In the meantime, why not host OSS models like Google does?
My understanding is that inference (running existing models) is around a quarter of the average compute budget for AI companies, with training new models taking the other three quarters.
As such, using only 11% of their GPUs indicates that they've elected not to do as much training as they are capable of.
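To spell out the arithmetic (the 1/4 split and the 11% figure are rough numbers from this thread, not verified data):

```python
# Rough figures quoted in this thread -- treat them as assumptions.
inference_share = 0.25       # ~1/4 of compute normally goes to inference
training_share = 0.75        # ~3/4 normally goes to training
observed_utilization = 0.11  # fraction of GPUs reportedly in use

# If serving inference alone would normally occupy ~25% of the fleet,
# running everything on 11% means training is far below capacity.
unused = 1.0 - observed_utilization
print(f"GPUs not in use: {unused:.0%}")
```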
I was interested in trusted execution environments and how safe they are. If you search Google Scholar and start reading, they seem super vulnerable. My impression is that the industry has no better option, and that TEEs are a way to tell customers they're safe when they're not.
I have to try Kimi. I was looking for an alternative. If you have any experience or advice, please share. I saw Kimi is at the top of the OpenRouter ranking.
Thanks, I am trying it right now. I had a $5/month opencode plan, so I will play with that. I use Zed and I added Pi ACP, so I can try both Pi and Kimi. I will also try it in opencode and via Kimi Code.
Use Kimi 2.6 for planning and a cheap model (preferably local) for execution, then Kimi once again for reviewing it. Finally, I review the code myself. Saves a lot on tokens.
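That plan/execute/review split can be sketched as a simple stage-to-model router (the model identifiers and the commented-out client call are placeholders, not real endpoints):

```python
# Hypothetical stage-to-model routing for a plan/execute/review loop.
STAGE_MODELS = {
    "plan": "moonshotai/kimi-k2",    # strong model for planning (placeholder id)
    "execute": "local/small-coder",  # cheap or local model for edits (placeholder id)
    "review": "moonshotai/kimi-k2",  # strong model again for review
}

def pick_model(stage: str) -> str:
    """Return the model to use for a given pipeline stage."""
    try:
        return STAGE_MODELS[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}")

# The actual calls would go through whatever OpenAI-compatible client you use:
# client.chat.completions.create(model=pick_model("plan"), messages=[...])
```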
I use Kimi at home via a kimi.com subscription and Kimi CLI (sometimes running inside Zed, sometimes not). My favorite model by far. And it's just $20.
I have to use a supposedly frontier model at work and I hate it.
I am looking for a good alternative to Claude Code + Opus that is not Codex. I tried switching back to Opus 4.6. The attitude of 4.7 is the bigger problem: it's difficult to make it check things before answering, and it assumes it knows better than me and than reality. Plus all the latest shenanigans they pulled. Pretty disgusted that I am still using them.
I forgot to add the tendency to not own problems and fix them immediately, instead deflecting and saying it shouldn't be done now, it's not its responsibility, etc. Just terrible.
It's lazy, does not take ownership or responsibility, wants to defer work, and I have to force it to check reality. It likes to guess and assume it's correct and I am wrong. AGENTS.md is not helping at all. It's in full enshittification phase, yay!
I have been noticing a similar pattern with Opus 4.7: I have to repeat multiple times during a conversation that it should solve problems now, not later. It tries hard to avoid work, either by saying it's not its responsibility because the problem was already there, or that we can do it later.
I would love to unleash parallel agents, but I am still checking every single edit while enforcing minimal, stateless, modular code, and I have the AI check in with me before writing the next file.
A lot of times, I find it has incredibly stupid ideas and tends to make the code very messy. I would love to figure out how to stop that from happening automatically.
The upside of checking in on the code, though, is that I can come up with smart directions for the AI from both a product and tech perspective. This is especially helpful when the dumb suggestions add a lot of complexity.
I think it's like when a product person asks for a new feature, or when a founder building their own product selects which feature is smarter to build and how.
I'm expecting we'll likely end back up on agents making PRs and having to review them. Either that, or giving up on quality and dealing with very messy code. I've been trying various automated testing/linting strategies, and they only work so well.
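The testing/linting strategies I mean usually reduce to a gate like this (ruff and pytest are just example tools, not a recommendation; swap in whatever your project uses):

```python
import subprocess
import sys

# Hypothetical pre-merge gate: run each check, reject the agent's work if any fails.
CHECKS = [
    ["ruff", "check", "."],  # lint (example tool)
    ["pytest", "-q"],        # tests (example tool)
]

def gate(checks=CHECKS) -> bool:
    """Return True only if every check command exits cleanly."""
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            return False
    return True

if __name__ == "__main__":
    sys.exit(0 if gate() else 1)
```

The catch, as noted above, is that a gate like this only catches what the tools can see; messy-but-passing code sails through.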
That would be a nightmare. One thing is reviewing a PR generated by a human who used AI and cares about the code; another is reviewing wild agents, especially when they make changes everywhere.
I'm not excited about it either, but the main ways I've been able to discover LLM-isms that sneak in are
1. seeing them flash by in the agent's window as it's making edits (i.e. manual oversight), or
2. running into an unexpected issue down the line.
If LLMs cannot automatically generate high quality code, it seems like it may be difficult to automatically notice when they generate bad code.
Why? The vast majority of PR tools let you comment on specific lines of code, which you can't do with a prompt text area. The PR UI is superior to the standard agent UI.
I think the issue is deeper than prompts, AGENTS.md, smart flows, etc. The problem is that LLMs are searchers, trained to prefer some results. So if the dumb solution is in the search space and the smart solution is not, they won't spit it out.
To elaborate: That advice isn’t as objective as you think.
What one developer calls clean the other calls messy.
My advice is to use it, then document the issues when it gets messy. It takes some time, but no more than recruiting, training, and paying another engineer.
If these results are because of vampire attacks, they will stop being so good once the closed-model vendors figure out how to pollute the answers being sucked out.
Also, they are not quite as good when you use them in your daily flow; fine for shallow reasoning, maybe, but not for coding and harder tasks. Or at least I haven't found an open model as good as the closed ones. I would love to, so if you have some cool setups, please share.
Any solution I might be missing?