
You can run very large models on a MacBook with an M2 and 96 GB of memory. They run 1/3 to 1/4 slower in tokens/s than on the faster hardware, but they fit in memory.

(400 GB/s of memory bandwidth is a lot for the form factor, but the 4090 and equivalent cards have about 1 TB/s, and H100s have several times that)
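
For a rough sense of why bandwidth matters here, a back-of-the-envelope sketch. It assumes single-stream token generation is limited by memory bandwidth and that every weight is read once per token, which is a simplification. The 20 GB model size and the function name are hypothetical, picked only for illustration; the bandwidth figures are the ones quoted above.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if all weights are read once per token."""
    return bandwidth_gb_s / model_size_gb


# Hypothetical model size (assumption, not from the thread).
MODEL_SIZE_GB = 20.0

# Bandwidth figures as quoted in the comment above.
m2 = max_tokens_per_second(400.0, MODEL_SIZE_GB)
rtx4090 = max_tokens_per_second(1000.0, MODEL_SIZE_GB)

print(f"M2 upper bound:   {m2:.1f} tokens/s")
print(f"4090 upper bound: {rtx4090:.1f} tokens/s")
print(f"Bandwidth ratio:  {rtx4090 / m2:.1f}x")
```

Under these assumptions the bandwidth gap alone is about 2.5x. Real-world differences can be larger or smaller, since compute, software, and quantization also play a part.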

Edit: Here someone asked the same question: https://www.reddit.com/r/LocalLLaMA/comments/14319ra/rtx_409...



Thanks, I've been trying to find this information - Mac shared memory vs Nvidia VRAM performance differences - for the longest time and your answer and the Reddit link were both super helpful!


You’re welcome — it’s too late for me to edit my wording but hopefully you understood what I meant by “1/3 to 1/4 slower.”

That is ambiguous; I should have said instead that models that fit in memory on both take 3-4 times as long on the M2 as they do on the 4090.
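
To show how far apart the two readings are, here is a small sketch. The 30 tokens/s baseline for the 4090 is a made-up number for illustration, and both function names are hypothetical.

```python
def speed_if_fraction_slower(baseline_tps: float, fraction: float) -> float:
    """Reading 1: speed drops by the given fraction of the baseline."""
    return baseline_tps * (1.0 - fraction)


def speed_if_times_as_long(baseline_tps: float, factor: float) -> float:
    """Reading 2 (the intended one): each token takes `factor` times as long."""
    return baseline_tps / factor


# Hypothetical 4090 speed (assumption, not a measurement).
BASELINE_TPS = 30.0

for denominator in (3, 4):
    reading_1 = speed_if_fraction_slower(BASELINE_TPS, 1.0 / denominator)
    reading_2 = speed_if_times_as_long(BASELINE_TPS, float(denominator))
    print(f"1/{denominator} slower: {reading_1:.1f} tokens/s | "
          f"{denominator}x as long: {reading_2:.1f} tokens/s")
```

With that baseline, "1/3 slower" would still leave roughly two thirds of the speed, while "3 times as long" leaves only one third.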



