
This should be possible in principle, since the machines underneath are deterministic; it’s just a limitation of the implementation.


"In principle", sure, but in practice, even if you pin the seed, your float32 results can still drift, because many CUDA kernels don't guarantee the order in which parallel threads accumulate partial results. Don't count on bit-for-bit identical tensors across different GPUs or even different driver versions. It's less a bug than a consequence of how parallel reductions are usually implemented.
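For anyone wondering why ordering matters at all: floating-point addition is not associative, so adding the same numbers in a different order can round differently. Here's a minimal sketch in plain Python, no GPU involved. The helpers `f32` and `sum_f32` are made up for illustration; they emulate float32 rounding with `struct`, and the two orderings stand in for two possible thread schedules of a parallel reduction.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate left to right, rounding to float32 after every add."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


# Same four numbers, two different accumulation orders.
forward = sum_f32([1e8, 1.0, -1e8, 1.0])    # the first 1.0 is absorbed by 1e8
reordered = sum_f32([1e8, -1e8, 1.0, 1.0])  # the large terms cancel first

print(forward, reordered)  # 1.0 2.0
```

Near 1e8 the gap between adjacent float32 values is 8, so `1e8 + 1.0` rounds straight back to `1e8` and that 1.0 is lost. Real kernels see much smaller discrepancies than this contrived case, but the mechanism is the same.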



