Hacker News | frabcus's comments

LLMs are reducing n-day exploit time rapidly.

https://red.anthropic.com/2026/n-days/

So that is a poor bandaid to use now. Maybe instead validate things beforehand, and have more of a cathedral model and a human reputation system.


Have some kind of provenance, e.g. like Debian has had for 30 years: key signing in person, etc.

That has also been implemented recently. With staged publishing, the author must verify a new release with 2FA, so automated attacks don't work anymore. Some human in the loop must verify a release.

They can't under GDPR. The DMA is for market access - there are other laws for privacy. Those require use commensurate with what is needed for the service, so anyone who e.g. scraped all of a user's local info and stole it would be breaking EU privacy laws themselves.

This is not complicated. Even in the US, every other industry is regulated to your benefit, you're just used to it and haven't realised. Digital technology obviously needs to be too. And yes, you have to do it properly.


AISI in the UK has been doing this for years - there are lots of papers https://www.aisi.gov.uk/category/safeguards and specific reports, e.g. this on GPT 5.5 https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...

This old post goes into lots of detail about what they do to red team and why: https://www.aisi.gov.uk/blog/early-lessons-from-evaluating-f...

NIST's similar unit in the US is now called CAISI https://www.nist.gov/caisi - interesting that the most recent post is an evaluation of DeepSeek capabilities, which sounds more like watching China. But presumably this executive order alters the emphasis?


Right, the original article says "Do you think macOS will get better or worse in the next 2 years?" (rhetorically implying "worse").

That could easily be true and Apple "will use even more tokens and spend even more money".


Sure, but what would be a bet that can disprove the author? If there’s nothing, then the post is useless.


Something that would disprove the author would be an increase in industry code quality. A reduction in bug rates might be a reasonable proxy measure.


Well, or not spawn any external commands, and actually have tools made of code written by someone who thought about what the agents at each level should be limited to doing.
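A minimal sketch of that idea, assuming a hypothetical two-level setup (a "planner" and a "worker") and two made-up tools. Every name here is illustrative, not taken from any real agent framework. The tools are plain Python functions, so nothing shells out, and each agent level has an explicit allow-list:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


class ToolNotAllowed(Exception):
    """Raised when an agent level requests a tool outside its allow-list."""


# Hypothetical tools, written as ordinary functions instead of external commands.
def count_words(text: str) -> int:
    return len(text.split())


def to_upper(text: str) -> str:
    return text.upper()


TOOLS: Dict[str, Callable[[str], object]] = {
    "count_words": count_words,
    "to_upper": to_upper,
}


@dataclass(frozen=True)
class AgentLevel:
    name: str
    allowed: FrozenSet[str]  # tool names this level may call


def call_tool(level: AgentLevel, tool: str, arg: str) -> object:
    # Deny anything that is unknown or not explicitly granted to this level.
    if tool not in TOOLS or tool not in level.allowed:
        raise ToolNotAllowed(f"{level.name} may not call {tool}")
    return TOOLS[tool](arg)


# Assumed levels: the planner can only inspect, the worker can also transform.
PLANNER = AgentLevel("planner", frozenset({"count_words"}))
WORKER = AgentLevel("worker", frozenset({"count_words", "to_upper"}))
```

Here `call_tool(WORKER, "to_upper", "hi")` succeeds, while the same call with `PLANNER` raises `ToolNotAllowed`. Because there is no generic "run a command" tool, what each level can do is limited to what someone deliberately wrote and granted.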


Or just run agents in a container…


In the limit we want the LLM to write the code (like in RLMs).


Make harness independent of model, so when pricing or quality changes you can switch.

Avoid lock-in to a stack from one provider (things like a harness that only works with models from one provider and so on).

Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.

You can improve the harness, fix bugs in it, and make it compatible with different systems and techniques.

This game happens in every new cycle of developer technology. The good bet historically has always been to use open source - there's a reason most developer tooling just before the AI revolution was open source (even things like Java and .NET which used to be proprietary).
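As a rough illustration of "harness independent of model", here is a sketch in which the harness depends only on a small interface. The `Model` protocol, the two stand-in backends, and `run_harness` are all hypothetical names for this example; a real adapter would wrap a provider API or a local runtime behind the same method:

```python
from typing import Protocol


class Model(Protocol):
    """The only thing the harness knows about a model."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for a hosted model (assumption: a real one would call an API)."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ReverseModel:
    """Stand-in for a local model, with deliberately different behaviour."""

    def complete(self, prompt: str) -> str:
        return prompt[::-1]


def run_harness(model: Model, task: str) -> str:
    # Prompt construction lives in the harness, not in any backend.
    prompt = f"Task: {task}"
    return model.complete(prompt)
```

Switching providers then means passing a different object to `run_harness`, without touching the harness logic itself.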


>Make harness independent of model

You can use Claude Code with almost any model.

>Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.

You can do that with Claude Code.


Reporters Without Borders' recently released Press Freedom Index 2026 puts Malta 67th and the UK 18th. So no, certainly not much better - although looking at some of the historic data, it was better e.g. in 2010.

https://rsf.org/en/index


Yeah US is 64th lol. Totally unbiased


Agreed. All I see is a Grok summary of a lot of X posts. The original link is not suitable. Does anyone have a link to a proper announcement?



I was finding this really interesting, that maybe a human had written it and it really reflected a vision for how we build software in this new world. I want to know the way, I'm curious!

Until I got to "One platform, three modes." and my brain just pattern matched "AI slop" and the entire post dissolved into meaninglessness for me.

I don't know if I can stop my mind reaching this conclusion. I'm sure someone at GitLab made some effort to carefully edit the post... But that it wasn't entirely rooted in a human who'd worked out how this stuff goes, but clearly had lots of AI writing it out... Just made my instinct go "this isn't worth paying attention to after all".

