
I'm honestly surprised that they trained a StyleGAN. Recently, the Imagen architecture has been shown to be simpler in structure, easier to train, and even faster to produce good results. Combined with the "Elucidating" paper by NVIDIA's Tero Karras, you can train a 256px Imagen* to tolerable quality within an hour on an RTX 3090.

Here's a PyTorch implementation by the LAION people:

https://github.com/lucidrains/imagen-pytorch

And here are 2 images I sampled after training it for a few hours (roughly 2 hours for the base model plus 4 hours for the upscaler):

https://imgur.com/a/46EZsJo

* = Only the unconditional Imagen variant, which is what they show off here. The variant with a T5 text embedding takes longer to train.



Or, since they are comparing to Craiyon, why not just finetune Craiyon itself? Craiyon already exists, so you can take it off the shelf instead of retraining it from scratch. The cost of training it from scratch on everything (which is indeed quite large) is not relevant to someone who just wants to generate great food photos.


We haven't experimented much with Imagen, but our initial conclusions were that:

- It's hard to train to a photorealistic quality (we'd be happy to be proven wrong!)

- There is no strong pretrained model available yet

Judging by the LAION Discord, the situation doesn't seem to have changed much.



