Interesting conversation. I would add that papers by LeCun and others have been ...

phowon · on Feb 4, 2019

And convolution-based models still find use in all sorts of cool applications in language, such as: https://arxiv.org/abs/1805.04833

With regard to adversarial discussions, it's one thing to argue about whether method A or method B gives better results in a largely empirical and experimental field. But giving a very misleading characterization of a model is actively detrimental, especially when it would give casual readers the impression that the Transformer is a "convolution-based" model, a description no one in the field would use.