Hacker News | extr's comments

Really funny to describe OpenAI/Anthropic as a "SaaS"

Yeah, care to elaborate? I'm not seeing the joke.

This isn't even about cyber attacks. This is just LLM development which is increasingly just called software development. And at least for cyber it says "Sorry I can't help with that"!

I'm a big fan of Anthropic. Just check my post history. I've been accused of working there. But this is complete bullshit and they need to get real. Silent sandbagging is not acceptable, especially given they've shown with this release their safety filters have HUGE amounts of false positives.

It's increasingly obvious that the only safeguard we've got is open models and semi-open ones like those from China. Crazy world.

Interesting it's in python!

The points in this article don't really land for me. They are mostly critiques of particular MCP implementations rather than the modality itself. My impression right now:

- MCPs are great for stateless, mostly read-only interactions with document store type things. Notion/Slack/Linear are perfect use cases. I have those MCPs connected to claude code and they work great. These tools never had CLIs or super well used public APIs to begin with. MCP handles the auth for me. Cool.

- MCPs are great but not fully necessary for "function shaped" things where you're trying to run some Function and that Function has a lot of parameters with some subtlety to them and perhaps needs some examples to really help the LLM understand. Though you can get away with a skill + curl, or a hand rolled script even.

- MCPs are not so great for interacting with more complex stateful systems with large surface area. You don't want/need an AWS MCP, for example. And of course Cloudflare is the canonical example here where they do have an MCP but it has a special "Code Mode" because they have a huge product surface and a lot of state.
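To make the "function shaped" case concrete, here is a rough sketch of the hand rolled script option: a small wrapper whose help text carries the parameter subtleties and an example, so the model can read `--help` instead of an MCP tool schema. Everything named here (`create-report`, the metric choices, the payload fields) is hypothetical and only for illustration; it builds a payload and makes no network calls.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """CLI for a hypothetical 'create-report' function (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="create-report",
        description="Build the JSON payload for a hypothetical report API.",
        # The example in the epilog plays the role of MCP tool examples.
        epilog="Example: create-report --metric revenue --window 7d --group-by region",
    )
    parser.add_argument("--metric", required=True, choices=["revenue", "signups"],
                        help="Which metric to aggregate.")
    parser.add_argument("--window", default="7d",
                        help="Lookback window such as 24h or 7d (default: 7d).")
    parser.add_argument("--group-by", action="append", default=[],
                        help="Dimension to group by; may be repeated.")
    return parser


def build_payload(argv: list[str]) -> dict:
    """Parse CLI-style arguments into the (assumed) request payload."""
    args = build_parser().parse_args(argv)
    return {"metric": args.metric, "window": args.window, "group_by": args.group_by}


# Explicit argv keeps the example deterministic.
print(json.dumps(build_payload(["--metric", "revenue", "--group-by", "region"])))
```

The design choice is just that the explicitness lives in the argument help and the example line, which is roughly what a skill plus curl would also have to spell out.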

Most companies are somewhere in the vast space between being a document store type thing and AWS, so aren't really sure what their MCP should look like, or how customers will use it, but feel like they're missing the boat if they don't ship something. So they ship an MCP and perhaps the people who need the document type stuff load it up and get some use out of it, but others are not so satisfied. Or maybe from the other direction, people are trying to use your product but aren't super technical or don't know how to best use it with AI, but "loading up an MCP" seems like a reasonable way to start, so they ask everyone "Where's your MCP"?

I run into this at work all the time. We get a lot of requests for an MCP. But our product is not so simple to just stuff into a bunch of stateless API calls. And we question whether the people requesting the MCP really know what they want it for, exactly, other than to hook up to claude code so they can say "claude go do everything" (which is a valid sentiment, but implies a lot of work on our end to figure out how to make that work well).


IMO they have all been clean and noticeable upgrades over their predecessors. Opus 4.7 in particular was a solid jump in capabilities.


I think it's telling how split the opinions are around all of this. A lot of people distinctly disliked 4.7.

Are the dividing lines around personality? Working domains? Opinionated software stuff?

Who knows?


Most of my coworkers feel the opposite about 4.7, and that 4.6 was, to them, significantly better, to the point that several stopped using Claude Code.


4.5 -> 4.7 was a solid jump for me having skipped 4.6. It probably does depend on the specific tasks.


It didn't change at all, same as 4.6. Good morning to the Anthropic office btw.


The advantage is that /code-review supplies a structured idea of how to review and what that process should look like and then launches independent subagents to approach the issue from multiple angles.

It's analogous to how in the early days you could see benefits by telling the models to "think step by step". /code-review is something like "review angle by angle". "Consider removed behavior" and also "Look at language gotchas" and also "Look at test changes"...etc. Yes these are all somewhat implicitly already part of what "code review" means, but the models perform best with explicitness.
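As a rough illustration of "review angle by angle", the sketch below turns a diff into one explicit prompt per angle, each of which could be handed to an independent reviewer. This is not how /code-review is actually implemented; the function name and prompt wording are made up, and only the three angle names come from the examples above.

```python
# Hypothetical sketch: NOT the real /code-review implementation.
ANGLES = [
    "Consider removed behavior",
    "Look at language gotchas",
    "Look at test changes",
]


def build_review_prompts(diff: str, angles: list[str] = ANGLES) -> list[str]:
    """Return one self-contained prompt per review angle."""
    return [
        f"You are reviewing a code change. Focus only on this angle: {angle}.\n"
        f"Report concrete findings with file and line where possible.\n\n"
        f"Diff:\n{diff}"
        for angle in angles
    ]


prompts = build_review_prompts("- old_call()\n+ new_call()")
print(len(prompts))
```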

If you want my 2c as a power user: just don't think about it and use /code-review xhigh --fix. This will cover like 98% of what you want out of code review. It's a good skill.


We've all spent time -fixing someone's bright idea of a -fix. I'm sceptical of the time saving of applying a -fix before I understand the problem(s).

Outsourcing comprehension to a machine is probably gonna cost you more time in the long run.


I don't even bother looking at the code until I've run a code review pass on it. Why waste my time with trivial bug fixes? I find the best way to spend time right now is like:

- Defining the issue/ticket, what "success" looks like (if I have a good idea of this), high level approach guidance 50%

- Dispatch agent to work on it 5%

- Occasionally return and nudge agent + send /simplify or /code-review 5%

- Look at the code/session summary, divergences from the plan, ask followup questions 40%

Occasionally yes there is some solution the AI chose that is suboptimal and I would prefer fixed in a different way. Mostly though it's straightforward.


Thank you I will try this!

Is there something equivalent when coding in the first place? Eg /code high “prompt”


Are you thinking of the /effort level in Claude Code? I would just go with xhigh as a reasonable default. Most important thing in prompting is specifying what "done" and "success" looks like to you. Ask Claude to help you come up with a well formed request and spend most of your time on that, then paste that into a brand new session.


No, more like: is there a specific slash tool to use when coding or planning? I guess that’s just Claude Code in general, but since there’s a specific review tool I was curious about specific coding tools.


It’s simpler to just use “review code”. It’s also way cheaper


Hey Boris, some feedback. I like the new /code-review skill but was disappointed you guys removed /simplify because I quite liked the focus on finding code reuse/efficiency opportunities.

I see now in 2.1.152 you added those focus areas back to /code-review, but still bundled with the correctness finding. It would be great to have more fine grained control over the /code-review angles beyond just effort level. Or maybe you would recommend that I just specify that as freeform input after effort level?


Yep, you can add free-form input. Will update /simplify to only check for code quality and not bugs (the way it used to work), that's a good suggestion.


Damn already there in 154. Thank you man.


For a lot (most) of what we do with programming, the process actually doesn't matter. I understand you are a real ass dude who is in this shit for the love of the game. I respect that. You are a true artisan and exist in a kind of rarified space. There will always be a place for people like you and in some senses you are correct - you are not replaceable by any AI as they currently function today.

However, 99.9999% of coding is not like that. Non-coders don't care about the code at all. They just care about outcomes. People don't care if it's "slop" if it works. Similar to bug prevalence, the optimal level of slop is not zero and will be decided by the market, not by coders.


LLMs are even more useful to experts who know the limitations. However the process matters even more if you want to build robust and scalable secure systems that generate millions of dollars and can explain that accurately to high value clients.

I do not want a $10M - $100M issue (lawsuits) because I admitted that I don't understand why a breach happened after using a coding agent. Responsibility and reputation can't be vibe-coded.

So:

> However, 99.9999% of coding is not like that. Non-coders don't care about the code at all. They just care about outcomes. People don't care if it's "slop" if it works. Similar to bug prevalence, the optimal level of slop is not zero and will be decided by the market, not by coders.

There's a vast difference between code that works as a prototype and code that works in production. I don't think you would trust someone with no experience to fly a commercial plane just because they vibe-coded a flight simulator, without ever going through the process of becoming a pilot.

But since "it works", it is ok right?


Did you RTFA?

