What's novel here is the extremely small KV cache memory usage at long context windows: about 0.77GB at 512K, a 90% reduction compared to the already very small KV cache of DeepSeek V4 Flash.
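For a sense of scale, here is a rough back-of-the-envelope sketch of what those two figures imply. It assumes "512K" means 512 × 1024 tokens and that a GB is 10^9 bytes; neither is stated in the comment, and the function names are made up for illustration.

```python
def kv_bytes_per_token(cache_gb: float, context_tokens: int) -> float:
    """Average KV cache bytes per token (assumes 1 GB = 1e9 bytes)."""
    return cache_gb * 1e9 / context_tokens


def implied_baseline_gb(cache_gb: float, reduction: float) -> float:
    """Size of the cache being compared against, given a fractional reduction."""
    return cache_gb / (1.0 - reduction)


if __name__ == "__main__":
    tokens = 512 * 1024  # assumption: "512K" is 524,288 tokens
    print(f"{kv_bytes_per_token(0.77, tokens):.0f} bytes per token")
    print(f"{implied_baseline_gb(0.77, 0.90):.1f} GB implied baseline")
```

Under those assumptions this works out to roughly 1.5KB of cache per token, and the 90% figure implies a baseline of about 7.7GB at the same context length.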
I wouldn't expect any of the American labs to be particularly good at (or have much desire for) efficiency work; they've consistently shown themselves to be uninterested in (if not incapable of) actually improving on those kinds of things. The closest we've seen lately is that maybe GPT-5.5 (and Opus 4.{7,8}?) are more token-efficient, i.e. they solve things with fewer tokens...? It hasn't been coupled with any other kind of efficiency bump, though, and we're seeing higher costs anyway in most places where the American labs are involved.
The only players that seem capable of a consistent pattern of doing more with less money are the Chinese labs.
The guide describes it as a projection, although there is apparently an extra step: "A factorized coordinate lookup (X and Y matrices) attaches spatial location information directly to the input."
12B at int8 would take up 12GB of memory, or 75% of system memory, which technically fits within 16GB, but the OS will not like that. EDIT: On my 18GB MacBook Pro, LM Studio reports a "partial GPU offload" for the int8 MLX weights. Can't test because the `gemma_unified` architecture is NYI.
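The arithmetic behind that estimate, as a minimal sketch. It counts weights only, assumes 1 GB = 10^9 bytes, and ignores KV cache, activations, and runtime overhead, so real usage will be higher. The helper names are hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fraction_of_ram(weights_gb: float, ram_gb: float) -> float:
    """Share of system memory the weights would occupy."""
    return weights_gb / ram_gb


if __name__ == "__main__":
    weights = weight_memory_gb(12, 8)  # 12B parameters at int8
    for ram in (16, 18):
        share = fraction_of_ram(weights, ram)
        print(f"{weights:.0f}GB of weights on {ram}GB RAM: {share:.0%}")
```

On an 18GB machine the weights alone are about two-thirds of memory before the OS and the inference runtime take their share, which fits with a partial offload being reported.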
There's a 64MB Game Boy Advance cartridge with Shrek on it [1]. Looks pretty horrible [2]. But the GBA only has 32KB fast / 256KB slow RAM, and a 16MHz CPU.
Video resolution: 128x72, hahah. Late 90s RealPlayer postage stamp video is back! To its credit, that whole movie is probably smaller than RealPlayer itself was.
AGI is just an "artificial" (a program) version of general intelligence (the general-purpose intelligence humans have).
Nothing in AGI implies "surpass humans in every cognitive task".
Not even "match in every cognitive task" is really required. There are humans that by definition have "general intelligence" that still don't match other humans "in every cognitive task", just in some.
Why should AGI need to match ALL humans in EVERY cognitive task then? An AGI just needs to be as good as an average (or even slightly below average) human, in human-like cognition.
I wonder, does that mean uBlock Origin has anti-anti-adblock functionality? (My guess is yes, but I wanted to take the opportunity to spell that word.)