
I love to hate on U-net. It works, but it's just so inelegant. That it's not a true convolution and only works for particular 'patch' sizes bothers me to no end.

I am not super up to date with the field, but has anyone caught on to using 'wavenet'-like architectures yet? That is, dilated convolutions.

You have to be a little clever to get residual connections to work properly, but it's a true convolution that works for any patch size, is super parameter-efficient, and captures the same multi-scale features U-net was designed for. There's a minimal sketch of the idea below.

Anecdotally, I used such an arch for some (unfortunately proprietary) 3D imaging work and achieved some nice results.
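
For concreteness, here's a minimal sketch of the kind of architecture I mean (hypothetical PyTorch code, 1D for brevity; a 3D version would just swap Conv1d for Conv3d): a stack of exponentially dilated convolutions with residual connections, so the receptive field grows geometrically while every layer remains a true convolution that accepts any input size.

    import torch
    import torch.nn as nn

    class DilatedResBlock(nn.Module):
        """One dilated conv with an additive residual connection."""
        def __init__(self, channels, dilation):
            super().__init__()
            # padding == dilation keeps the output the same length as the
            # input for kernel_size=3, so any patch size works
            self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                                  padding=dilation, dilation=dilation)
            self.act = nn.ReLU()

        def forward(self, x):
            # residual add: channel counts must match for this to line up
            return x + self.act(self.conv(x))

    class DilatedNet(nn.Module):
        def __init__(self, in_ch, channels=64, num_layers=6):
            super().__init__()
            self.inp = nn.Conv1d(in_ch, channels, kernel_size=1)
            # dilations 1, 2, 4, ... double the receptive field per layer,
            # covering the multi-scale context U-net gets from down/upsampling
            self.blocks = nn.Sequential(
                *[DilatedResBlock(channels, 2 ** i) for i in range(num_layers)])

        def forward(self, x):
            return self.blocks(self.inp(x))

    # usage: any input length works, no patch-size constraint
    x = torch.randn(8, 1, 500)  # (batch, channels, length)
    assert DilatedNet(in_ch=1)(x).shape == (8, 64, 500)

The "little clever" part is mostly keeping the channel count constant through the stack so the additive skips line up.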



> "It works".

Well, that's sorta the point. Personally I'm not a huge fan of crafting a super-specific network architecture only to end up with a 2-3% difference in performance. Certainly there are cases where a particular configuration makes sense (an LSTM for time series, for example), but I think there needs to be a rethinking of the Grand Theory of Deep Learning Architecture TM.

And frankly, I think an unsaid reason U-net is so popular is that it generalizes reasonably well with limited data, since in many fields datasets are nowhere near as massive as COCO.

I realize it's sorta asking too much (I want a NN that works out of the box, super easily, and doesn't require a TON of data), but I think that's where the current pain points are for really explosive growth in AI.


> I think there needs to be a rethinking of the Grand Theory of Deep Learning Architecture TM.

strong agree. Although perhaps not so much a rethinking as a theory at all; there's a huge dearth of theory in the field. Daily practice involves regular use of black-magic intuition for architecture, problem posing, and debugging. Weird times.





