
PyTorch doesn't offer an in-place softmax, which costs about 1 GiB of extra memory during inference (of Stable Diffusion). That said, none of these are significant improvements compared to just switching to FlashAttention inside the UNet model.
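For anyone wondering what an in-place softmax would look like, here is a minimal sketch built from PyTorch's existing in-place tensor ops. `softmax_inplace_` is a hypothetical helper name, not a PyTorch API, and this assumes inference only (the tensor doesn't require grad), since autograd would need the overwritten values. It is not a fused kernel, so it trades extra passes over the data for not allocating a second attention-sized buffer.

```python
import torch


def softmax_inplace_(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Overwrite `scores` with softmax(scores) along `dim`.

    Hypothetical helper for illustration; assumes `scores` does not
    require grad (inference only).
    """
    # Subtract the max along `dim` for numerical stability. The reduction
    # result has size 1 along `dim`, so it is small next to `scores`.
    scores.sub_(scores.amax(dim=dim, keepdim=True))
    # Exponentiate in the same buffer.
    scores.exp_()
    # Normalize in the same buffer; the sum is again a small reduction.
    scores.div_(scores.sum(dim=dim, keepdim=True))
    return scores


if __name__ == "__main__":
    x = torch.randn(2, 8, 64, 64)  # stand-in for attention scores
    expected = torch.softmax(x, dim=-1)
    ptr = x.data_ptr()
    out = softmax_inplace_(x)
    print("same buffer:", out.data_ptr() == ptr)
    print("matches torch.softmax:", torch.allclose(out, expected, atol=1e-6))
```

The only temporaries are the per-row max and sum, which are tiny next to the full attention matrix. A fused attention kernel such as FlashAttention avoids materializing that matrix at all, which is why it wins by a wider margin.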

