This is pure speculation on my part, but the way I think this will play out is something like this:
- the current CapEx will push the production side to increase capacity
- advances in TPUs, NPUs, open-weight models, and quantization will keep coming at a rapid pace
- when the spending slows/stops, hardware prices will drop, hard
- most AI workloads will move to the edge (except frontier models) because the hardware is cheaper than a subscription
(and at some point there could be a crash like 2008)
For example, most of my AI use lately has been running Qwen3-30B-A3B-UD-Q8_K_XL on a 64GB MacBook Pro with an M3 Max. It runs at ~57 tokens/s and it's mostly fine.
I do use the frontier models a bit, but only when the task is too complex for the local model.
For basic crap like analyzing an existing codebase, bouncing ideas around, or making small changes, the local model is enough.
If you run enough samples you'll get results matching the learned probability distribution; the more you sample, the higher the chance that you'll land on an unlikely response.
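Back-of-the-envelope: if the model assigns some response a per-sample probability p, the chance of seeing it at least once across n independent samples is 1 - (1 - p)^n. Quick Python sketch (the value of p is made up):

    # Chance of landing on a response of per-sample probability p
    # at least once across n independent samples: 1 - (1 - p)**n.
    p = 0.001  # hypothetical probability of the unlikely response
    for n in (10, 100, 1000, 5000):
        print(n, round(1 - (1 - p) ** n, 3))  # 0.01, 0.095, 0.632, 0.993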
I'm wondering what American society or the economy would look like following the current trends.
An economy of capital owners and everyone else on govt assistance or working for scraps? Sounds like a recipe for "interesting" times. Unhinged people are already making attempts on Sama and we are just getting started.
It's possible to treat it as purely relational, but that can be suboptimal for data access if you follow through with it.
The main cost is the join when you need to access several columns: it's flexible but expensive.
To take full advantage of a columnar layout, that join usually has to be made implicit through data alignment, so no actual join happens.
For example, segment the tables into chunks of up to N records, and keep all the related columns of a chunk contiguous so they can be accessed independently (see the sketch below).
That balances pointer chasing against joining: you avoid the I/O by loading only the needed columns of a segment, and you skip the join because the data is trivially aligned.
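Rough Python sketch of the layout (the column names and segment size are made up; real systems do this with fixed-size column chunks, e.g. PAX pages or Parquet row groups):

    # Toy segmented-columnar layout: the table is split into segments of
    # up to N rows; within a segment each column is stored contiguously,
    # and row i of every column belongs to the same record.
    N = 4096  # hypothetical segment size

    class Segment:
        def __init__(self):
            self.user_id = []  # column, contiguous within the segment
            self.amount = []   # column, contiguous within the segment

    segments = [Segment()]

    def insert(user_id, amount):
        if len(segments[-1].user_id) == N:
            segments.append(Segment())
        segments[-1].user_id.append(user_id)
        segments[-1].amount.append(amount)

    def total_amount():
        # A scan loads only the 'amount' column of each segment;
        # 'user_id' is never touched.
        return sum(sum(seg.amount) for seg in segments)

    def record(seg_idx, row):
        # The cross-column "join" is just index alignment in the segment.
        seg = segments[seg_idx]
        return seg.user_id[row], seg.amount[row]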
yeah updates are where it falls over for us. inserts were fine, reads were great, but any workflow that needed to correct a small slice of rows after the fact got painful fast. we ended up keeping the row store for the hot path and rebuilding the columnar copy overnight. probably not elegant but it stopped the bleeding.
Nice, the only weird thing was the assumptions about OLAP (and I had to speed it up to ~1.4x).
Like using strings (OLAP works way better over integral data; it sucks at strings), or assuming it's easy to scale.
It is easy-ish under fixed queries (classic MOLAP, for example), but not under arbitrary queries and frequent updates; then it degenerates into a problem much worse than OLTP.
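On the strings point, the usual workaround is dictionary encoding: map each distinct string to a small integer once, then scan over the integers. Sketch (the column values are made up):

    # Dictionary encoding: replace repeated strings with integer codes
    # so scans and group-bys compare ints instead of strings.
    values = ["US", "DE", "US", "FR", "US", "DE"]  # hypothetical column

    dictionary = {}  # string -> code
    codes = []       # encoded column, one small int per row
    for v in values:
        codes.append(dictionary.setdefault(v, len(dictionary)))

    # Count rows matching "US" by scanning the integer codes only.
    us_code = dictionary["US"]
    print(sum(1 for c in codes if c == us_code))  # 3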
These billing systems are all poorly designed from a CX perspective.
Billing is usually event driven. Each spending instance (e.g. API call) generates an event.
Events go to queues/logs, aggregation is delayed.
You get alerts when aggregation happens; if the aggregation service has a hiccup, that can be many hours later (the service SLA and the billing aggregator SLA are different).
Even if you have hard limits, the limits trigger on the last known good aggregate, so a spike can make you overshoot the limit.
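Toy Python illustration of the overshoot (the prices, batch size, and limit are all invented): the limit check only ever sees the last aggregated total, so spend arriving during the aggregation lag doesn't count against it:

    HARD_LIMIT = 100.0    # hypothetical spending cap
    events = [10.0] * 30  # spend events arriving in order
    AGG_EVERY = 15        # aggregation only runs every 15 events (the lag)

    aggregated = 0.0      # the total the limit check can see
    pending = []          # events not yet aggregated

    for i, cost in enumerate(events):
        if aggregated >= HARD_LIMIT:
            break         # the limit finally triggers, too late
        pending.append(cost)
        if (i + 1) % AGG_EVERY == 0:
            aggregated += sum(pending)
            pending.clear()

    print(HARD_LIMIT, aggregated + sum(pending))  # limit 100.0, spent 150.0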
All of these protect the company, but not the customer.
If they really cared about customer experience, then once a hard limit is hit, that limit would cap what the customer pays until it is reset, period, regardless of any lag in billing event processing.
That pushes the incentive to build a good billing system: any delay in aggregation potentially costs the provider money, so they'll make it good (it's in their own best interest).
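The customer-protecting version is trivial at invoice time (sketch; the function name is mine): whatever the pipeline eventually totals, the bill is capped at the limit and the lag becomes the provider's cost:

    def invoice(aggregated_spend, hard_limit):
        # The customer never pays more than the limit they set;
        # anything above it is absorbed by the provider.
        billed = min(aggregated_spend, hard_limit)
        provider_absorbs = max(0.0, aggregated_spend - hard_limit)
        return billed, provider_absorbs

    print(invoice(150.0, 100.0))  # (100.0, 50.0)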
It's not typically a problem that usage is event driven. At least not for prepaid phone plans. Or debit cards. Or mailboxes. Or any of a myriad of prepaid or quota'd services. It's not rocket science, just a bad business practice on the part of Google.