
frabcus · 2026-06-10T07:19:37 1781075977

LLMs are reducing n-day exploit time rapidly.

https://red.anthropic.com/2026/n-days/

So that is a poor band-aid to use now. Maybe instead validate things beforehand, and have more of a cathedral model and a human reputation system.

frabcus · 2026-06-10T07:16:20 1781075780

Have any kind of provenance, e.g. like Debian has had for 30 years: key signing in person, etc.

tpetry · 2026-06-10T08:51:52 1781081512

That has also been implemented recently. With staged publishing the author must verify a new release with 2FA, so automated attacks don't work anymore. Some human in the loop must verify a release.

frabcus · 2026-06-09T06:09:27 1780985367

They can't under GDPR. The DMA is for market access - there are other laws for privacy. Those require use commensurate with what is needed for the service, so anyone who e.g. scraped all of a user's local info and stole it would be breaking EU privacy laws themselves.

This is not complicated. Even in the US, every other industry is regulated to your benefit, you're just used to it and haven't realised. Digital technology obviously needs to be too. And yes, you have to do it properly.

frabcus · 2026-06-03T07:23:15 1780471395

AISI in the UK has been doing this for years - there are lots of papers https://www.aisi.gov.uk/category/safeguards and specific reports, e.g. this on GPT 5.5 https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...

This old post goes into lots of detail about what they do to red team and why: https://www.aisi.gov.uk/blog/early-lessons-from-evaluating-f...

NIST's similar unit in the US is now called CAISI https://www.nist.gov/caisi - interesting that the most recent post is an evaluation of DeepSeek capabilities, which sounds more like watching China. But presumably this executive order alters the emphasis?

frabcus · 2026-05-25T08:07:36 1779696456

Right, the original article says "Do you think macOS will get better or worse in the next 2 years?" (rhetorically implying "worse").

That could easily be true and Apple "will use even more tokens and spend even more money".

simianwords · 2026-05-25T08:08:52 1779696532

Sure but what would be a bet that can disprove the author? If there’s nothing then the post is useless.

lmm · 2026-05-25T08:35:00 1779698100

Something that would disprove the author would be an increase in industry code quality. A reduction in bug rates might be a reasonable proxy measure.

frabcus · 2026-05-17T07:44:06 1779003846

Well, or not spawn any external commands, and actually have tools made of code written by someone who thought about what the agents at each level should be limited to doing.
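A minimal sketch of that idea, under stated assumptions: the tools are plain in-process functions (nothing is spawned), and each agent level has an explicit allow-list. The tool names, the level names `planner` and `worker`, and the `call_tool` helper are all hypothetical, made up for illustration and not taken from any real harness.

```python
from typing import Callable, Dict, FrozenSet


# Hypothetical in-process tools: plain functions, no subprocess or shell involved.
def word_count(text: str) -> str:
    return str(len(text.split()))


def to_upper(text: str) -> str:
    return text.upper()


TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": word_count,
    "to_upper": to_upper,
}

# Hypothetical agent levels, each with an explicit allow-list of tool names.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "planner": frozenset({"word_count"}),
    "worker": frozenset({"word_count", "to_upper"}),
}


def call_tool(level: str, name: str, arg: str) -> str:
    """Run a tool only if this agent level is explicitly allowed to use it."""
    if name not in ALLOWED.get(level, frozenset()):
        raise PermissionError(f"{level!r} may not call {name!r}")
    return TOOLS[name](arg)
```

Anything not listed for a level is refused by default, so an unknown level or a new tool has no access until someone deliberately grants it.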

alfiedotwtf · 2026-05-17T07:58:48 1779004728

Or just run agents in a container…

zbyforgotp · 2026-05-17T07:47:07 1779004027

In the limit we want the LLM to write the code (like in RLMs).

frabcus · 2026-05-17T07:40:27 1779003627

Make harness independent of model, so when pricing or quality changes you can switch.

Avoid lock-in to a stack from one provider (things like a harness that only works with models from one provider and so on).

Use local models (a couple of them do work a bit now, if you have 20GB video RAM), which saves money and is more private, and works offline.

You can improve the harness, fix bugs in it, make it compatible with different systems and techniques.

This game happens every time in new cycles of developer technology. The good bet historically has always been to use open source - there's a reason most developer tooling just before the AI revolution was open source (even things like Java and .NET which used to be proprietary).
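A rough sketch of what "harness independent of model" could look like, assuming the harness only ever talks to a model through one narrow interface. The `Harness` class, the `Backend` type, and the two stub backends are hypothetical stand-ins, not any real provider's API; a real backend would wrap a hosted or local model behind the same call shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a backend is any callable mapping a prompt to a completion.
Backend = Callable[[str], str]


@dataclass
class Harness:
    backends: Dict[str, Backend]
    active: str

    def switch(self, name: str) -> None:
        """Change model without touching the rest of the harness."""
        if name not in self.backends:
            raise KeyError(name)
        self.active = name

    def run(self, prompt: str) -> str:
        return self.backends[self.active](prompt)


# Stub backends standing in for a hosted model and a local model.
def hosted_stub(prompt: str) -> str:
    return f"[hosted] {prompt}"


def local_stub(prompt: str) -> str:
    return f"[local] {prompt}"


harness = Harness({"hosted": hosted_stub, "local": local_stub}, active="hosted")
```

When pricing or quality changes, the switch is a single `harness.switch("local")` call, and nothing else in the harness needs to know which provider is behind it.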

DeathArrow · 2026-05-17T10:54:02 1779015242

>Make harness independent of model

You can use Claude Code with almost any model.

>Use local models (a couple of them do work a bit now, if you have 20GB video RAM), which saves money and is more private, and works offline.

You can do that with Claude Code.

frabcus · 2026-05-17T07:20:54 1779002454

Reporters Without Borders' recently released Press Freedom Index 2026 puts Malta 67th and the UK 18th. So no, certainly not much better - although looking at some of the historic data, it was better e.g. in 2010.

https://rsf.org/en/index

nlitened · 2026-05-17T11:15:14 1779016514

Yeah US is 64th lol. Totally unbiased

frabcus · 2026-05-14T08:09:48 1778746188

Agreed. All I see is a Grok summary of a lot of X posts. The original link is not suitable. Does anyone have a link to a proper announcement?

paulddraper · 2026-05-14T13:56:03 1778766963

https://twitterwebviewer.com/?tweet=2054610152817619388

frabcus · 2026-05-11T22:05:54 1778537154

I was finding this really interesting, that maybe a human had written it and it really reflected a vision for how we build software in this new world. I want to know the way, I'm curious!

Until I got to "One platform, three modes." and my brain just pattern matched "AI slop" and the entire post dissolved into meaninglessness for me.

I don't know if I can stop my mind reaching this conclusion. I'm sure someone at GitLab made some effort to carefully edit the post... But the fact that it wasn't entirely rooted in a human who'd worked out how this stuff goes, and clearly had lots of AI writing it out, just made my instinct go "this isn't worth paying attention to after all".