zozbot234, 15 days ago, on: DeepClaude – Claude Code agent loop with DeepSeek ...

Waiting for official support in llama.cpp. There is a fork that can run a lightly quantized DeepSeek V4 Flash (Q2 expert layers) entirely in 128GB of RAM, without falling back to fetching weights from disk.

Ouch. Can't run that on my M4 Mac with 48GB RAM.
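For a rough sense of why 128GB works and 48GB doesn't, here is a back-of-the-envelope sketch. It assumes the weight footprint is roughly (parameters × bits per weight) / 8, with expert layers at a Q2-class precision and everything else at an 8-bit precision. The parameter counts (300B expert, 10B other), the bits-per-weight figures (~2.56 and 8.5), and the 16 GiB headroom for the OS and KV cache are all hypothetical placeholders, not the real numbers for DeepSeek V4 Flash or for any particular fork.

```python
# Back-of-the-envelope weight-memory estimate for a mixture-of-experts model
# whose expert layers are quantized more aggressively than the rest.
# All parameter counts and bits-per-weight values are HYPOTHETICAL placeholders.

GIB = 2 ** 30  # bytes per GiB


def weight_gib(expert_params: float, other_params: float,
               expert_bpw: float, other_bpw: float) -> float:
    """Approximate size of the model weights in GiB.

    bpw = average bits per weight of a quantization format,
    including its per-block scale overhead.
    """
    total_bits = expert_params * expert_bpw + other_params * other_bpw
    return total_bits / 8 / GIB


def fits_in_ram(weights: float, ram_gib: float, headroom_gib: float) -> bool:
    """True if the weights plus reserved headroom (OS, KV cache, buffers)
    fit in RAM, i.e. no weights need to be streamed from disk."""
    return weights + headroom_gib <= ram_gib


if __name__ == "__main__":
    # Hypothetical model: 300B expert params at an assumed ~2.56 bpw (Q2-class),
    # 10B shared/attention params at an assumed 8.5 bpw (8-bit class).
    w = weight_gib(300e9, 10e9, expert_bpw=2.5625, other_bpw=8.5)
    for ram in (128, 48):
        print(f"{w:.1f} GiB of weights, {ram} GiB RAM -> fits: {fits_in_ram(w, ram, 16)}")
```

With these made-up inputs the weights come to roughly 99 GiB, which fits in 128 GiB with room to spare but is about twice what a 48GB machine has, before counting the KV cache at all.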