I do find it fascinating how blatantly people lie about things. The article mentions "Organic farming uses 84% more land for the same yield, but yields are 55% lower by area than conventional." Yes, and that is the exact signature indicating that organic produce receives fewer harsh chemicals and is less exploitative of the ecosystem than "regular" produce. Not worth the read; it's simply anti-organic propaganda. Note: I'm not personally religious about organic vs. non-organic, but neither do I enjoy overt misinformation campaigns.
By pointing out the exact things that will likely happen, you are, oddly enough, hedging against at least some of them happening!
A) I reckon it's true that smaller models will continue to improve massively through optimization and better and better harnesses. This tech is all still very young, and A LOT of resources and (good)will are being thrown at it.
B) The 1T+ models will be able to sideload and improve upon a lot of the fundamental improvements that happen to the smaller models to speed up incredibly while getting better at tools while (on a gradient) getting -more- things right.
C) More of an observation that I think is worth keeping clearly in mind: Karl Popper's black swan and all, truth in our temporal world IS a gradient!
> The 1T+ models will be able to sideload and improve upon a lot of the fundamental improvements that happen to the smaller models to speed up incredibly while getting better at tools while (on a gradient) getting -more- things right.
There's less room for improvement on several fronts.
GRAM very likely scales sub-linearly with parameter growth. A 100M param model may gain reasoning by a factor of 4000, while a 100B model gains reasoning by a factor of 2, and a 1T model actually gets worse.
Additionally, the 1T model with reasoning is already pretty good. It can only improve in certain things so much.
If you score 0.02% on a metric (which small models often do), you can pretty easily get 4000x better. If you're already scoring >50%, you can't even get 2x better.
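To put rough numbers on that ceiling argument, here is a minimal sketch. It assumes the metric is a percentage capped at 100%, and `max_improvement_factor` is just a name made up for illustration. Under that assumption a 0.02% score going 4000x better lands at 80%, with headroom of up to 5000x, while anything above 50% has less than 2x of headroom left.

```python
def max_improvement_factor(score_pct: float) -> float:
    """Largest multiplicative gain still possible for a score capped at 100%.

    Assumption: the metric is a percentage in (0, 100].
    """
    if not 0 < score_pct <= 100:
        raise ValueError("score_pct must be in (0, 100]")
    return 100.0 / score_pct


if __name__ == "__main__":
    # Small-model score, the 50% break-even point, and an already-good score.
    for score in (0.02, 50.0, 80.0):
        factor = max_improvement_factor(score)
        print(f"{score}% -> at most {factor:.2f}x better")
```

The same relative gain means very different things depending on where you start, which is why "4000x better" and "2x better" are not really comparable across model sizes.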
Worse, in my opinion, since the look is simply Tesla (whether one likes that or not). No one would have batted an eyelid if Tesla had released this car, whereas Ferrari doing so comes off as incoherent.
Excellent practical guide and pictures. If OP is around on this thread: well done! Your future self is going to be appreciative too when this needs repeating at some point!
These days it's for sure the dev environment that is lacking: hardware is okay (potentially great?!), software abysmal. Running a local LLM in a stable manner implies using Vulkan. Any attempt at ROCm is totally hamstrung by haphazard hardware support, along with an online presence poisoned by people primarily discussing workarounds rather than work when it comes to AMD as a platform. Argh.
On my gfx1030 "consumer grade hardware", ROCm means using SDMA, and that is broken for my system. Forcing `HSA_ENABLE_SDMA=0` makes it "work", but also makes loading tensors to VRAM take 15x longer.
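For anyone wanting to try the same workaround without exporting the variable globally, here is a rough sketch that sets `HSA_ENABLE_SDMA=0` only for one child process. The helper names are made up for illustration, and the command in the demo is a stand-in that just echoes the variable back; swap in whatever you actually use to launch your ROCm workload.

```python
import os
import subprocess
import sys


def env_with_sdma_disabled(base_env=None):
    """Return a copy of the environment with HSA_ENABLE_SDMA forced to "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_ENABLE_SDMA"] = "0"
    return env


def run_with_sdma_disabled(cmd):
    """Run cmd with SDMA disabled for that process only (hypothetical helper)."""
    return subprocess.run(
        cmd,
        env=env_with_sdma_disabled(),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in command: prints the variable as seen by the child process.
    demo = [sys.executable, "-c", "import os; print(os.environ['HSA_ENABLE_SDMA'])"]
    print(run_with_sdma_disabled(demo).stdout.strip())
```

Scoping it per process at least keeps the slow tensor-loading path away from anything that doesn't need the workaround.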
A Vulkan compute shader is more portable, and chances are the tooling for it will still be supported for your GPU in a few years (which isn't a given for ROCm, especially when dealing with consumer cards).
It's sad that they have gone political, whereas their goal should, in my view, be almost technocratically in favour of their own stated goals of "protecting user privacy from government/corporate surveillance, defending free speech online, enforcing net neutrality, promoting encryption, and combating abusive intellectual property laws".