Hacker News

Extremely feasible. The model predicts a sentence for an image in ~3ms on a GPU, and ~100ms on a CPU. Convolutional Networks, Recurrent Networks, and most neural nets in general are very efficient at test time. Training is the somewhat expensive part, but for example I've trained all of the models in the paper on an average (CPU) cluster machine in about one day. Of course, this assumes a pretrained image Convolutional Network that was previously trained on ImageNet; that part can take a day to reach 90%, and then a week more to get the last 10% of cutting-edge performance.
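To illustrate why test time is cheap, here is a minimal NumPy sketch of the pipeline the comment describes: a pretrained CNN supplies an image feature vector, and a recurrent decoder greedily emits one word per RNN step. All names, layer sizes, and weights below are illustrative assumptions, not the paper's actual architecture or parameters; a real system would load trained weights and a real CNN embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen small for illustration (not from the paper).
FEAT, HID, VOCAB, STOP = 512, 256, 1000, 0

# Random stand-in weights; a trained model would load learned parameters here.
W_img = rng.standard_normal((HID, FEAT)) * 0.01   # image feature -> initial hidden state
W_xh  = rng.standard_normal((HID, VOCAB)) * 0.01  # previous word (one-hot) -> hidden
W_hh  = rng.standard_normal((HID, HID)) * 0.01    # recurrent hidden -> hidden
W_hy  = rng.standard_normal((VOCAB, HID)) * 0.01  # hidden -> vocabulary scores

def generate_caption(image_feature, max_words=20):
    """Greedy test-time decoding: a handful of matrix multiplies per word."""
    h = np.tanh(W_img @ image_feature)  # condition the RNN on the image
    word = STOP                          # start from a special token by convention
    caption = []
    for _ in range(max_words):
        x = np.zeros(VOCAB)
        x[word] = 1.0                    # one-hot encoding of the previous word
        h = np.tanh(W_xh @ x + W_hh @ h) # vanilla RNN step
        word = int(np.argmax(W_hy @ h))  # greedy pick of the next word
        if word == STOP and caption:
            break
        caption.append(word)
    return caption

feat = rng.standard_normal(FEAT)  # stands in for the CNN's image embedding
print(generate_caption(feat))     # a short list of word indices
```

The per-word cost is just a few dense matrix-vector products, which is why generation takes milliseconds on a GPU; the expensive gradient computation only happens during training.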

Edit: Generating images is much harder and out of scope. What might be feasible is stitching parts of existing images together. I'm not sure; it's an open problem and would make for an excellent SIGGRAPH paper.



What kind of GPU - Intel integrated or high-end Nvidia?


A CUDA-capable GPU.





