Hacker News

Ollama is kind of ok to get started, but as I understand it they don't give you a choice in the quantisation you'll use. Please correct me if I'm wrong.

One thing I am sure about is that they store large model files renamed to long globally unique identifiers, and I still can't see that part of the design as anything but some silly obfuscating embrace... And here again, I'd love to be shown how I'm wrong.



All in the name of UX. It’s modeled after Docker, so it defaults to doing things that way. Really does make for great ease of use, imo.
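For anyone wondering what "modeled after Docker" means for the file names: as far as I can tell, the long identifiers are content digests, i.e. each blob is named after a hash of its own bytes. Here is a minimal sketch of that idea. The `sha256-<hex>` naming and the helper names `blob_name` and `store_blob` are my assumptions for illustration, not Ollama's actual code.

```python
import hashlib
from pathlib import Path


def blob_name(data: bytes) -> str:
    """Derive a file name from the content itself (assumed 'sha256-<hex>' scheme)."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


def store_blob(data: bytes, blob_dir: Path) -> Path:
    """Write data under its digest name; identical content is stored only once."""
    blob_dir.mkdir(parents=True, exist_ok=True)
    path = blob_dir / blob_name(data)
    if not path.exists():  # same bytes give the same name, so skip the rewrite
        path.write_bytes(data)
    return path
```

The upside of naming by digest is deduplication and cheap integrity checks: two models that share a layer point at the same file, and a corrupted download no longer matches its own name. The downside is exactly what the parent complains about, namely that the files are not human-readable on disk.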


You can. When you search for a model on the Ollama website, there is a drop-down that lets you select a "tag", sort of like a Docker image tag. This lets you pick the quantization you want.

Example: https://ollama.com/library/llama3.2/tags
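In practice the tag is appended to the model name after a colon, Docker style. A small sketch of how that reference is put together; the tag `3b-instruct-q4_K_M` is only an illustrative value (check the tags page for what actually exists), and `model_ref` and `pull_command` are hypothetical helper names.

```python
def model_ref(name: str, tag: str = "latest") -> str:
    """Build a 'name:tag' reference, where the tag selects size and quantization."""
    if not name or ":" in name:
        raise ValueError("name must be non-empty and must not already contain a tag")
    if not tag:
        raise ValueError("tag must be non-empty")
    return f"{name}:{tag}"


def pull_command(name: str, tag: str = "latest") -> list[str]:
    """Return the argv for pulling that exact variant with the ollama CLI."""
    return ["ollama", "pull", model_ref(name, tag)]


if __name__ == "__main__":
    # Prints the command instead of running it, so no ollama install is needed.
    print(" ".join(pull_command("llama3.2", "3b-instruct-q4_K_M")))
```

If you leave the tag off you get whatever the library marks as the default, which is probably why it looks like there is no choice.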


You can choose the quantization by appending the right tag to the model name, but other, more advanced features are poorly supported (e.g. you need a special flag to enable flash attention, and you cannot use KV cache quantization for large contexts).
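To be concrete about the "special flag": to my knowledge it is the `OLLAMA_FLASH_ATTENTION` environment variable, set on the server process rather than passed per model. Treat that as an assumption and check the docs for your version. A rough sketch, with `server_env` as a hypothetical helper name:

```python
import os
from typing import Mapping, Optional


def server_env(flash_attention: bool = True,
               base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and optionally opt in to flash attention."""
    env = dict(os.environ if base is None else base)
    if flash_attention:
        # Assumed variable name; it must be set before the server starts.
        env["OLLAMA_FLASH_ATTENTION"] = "1"
    return env


# Usage (not run here):
#   import subprocess
#   subprocess.run(["ollama", "serve"], env=server_env())
```

The point stands that this is a server-wide switch you have to know about, not something exposed next to the model tag.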



