Even really tiny CPUs can do a sub in one cycle, so I doubt that. On superscalar CPUs, xor and sub are normally issued to the same execution units, so it wouldn't make a difference there either.
On superscalars, running the xor trick as-is would be significantly slower because it implies a data dependency where there isn't one. But all out-of-order x86 cores optimize it away internally.
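To make the dependency point concrete, here's a rough C sketch, assuming the trick in question is the xor-zeroing idiom. The function names are mine and purely illustrative. Both helpers return zero for any input, yet each expression nominally reads `x`, and that read is the dependency that isn't really there. An optimizing compiler will most likely fold both to a constant, so this shows the idea rather than what ends up in the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only.
   The result never depends on x, even though x appears as an operand. */
static uint32_t zero_via_xor(uint32_t x) { return x ^ x; }
static uint32_t zero_via_sub(uint32_t x) { return x - x; }

/* Spot-check a few inputs: both idioms must yield 0 every time. */
static void check_zeroing(void) {
    const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xFFFFFFFFu};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(zero_via_xor(samples[i]) == 0u);
        assert(zero_via_sub(samples[i]) == 0u);
    }
}
```

Whether the hardware treats `sub r, r` as dependency-breaking the same way it treats `xor r, r` varies by microarchitecture, so check the vendor's optimization manual before relying on it.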
I did read it. A compiler converts your code into assembly. Compilers usually offer several levels of optimisation, depending on what you're doing.
The article boils down to "could AI be a good compiler", and I'd say that consistency and repeatability are far more important than a one-off optimisation of a particular section of code. If you've got to the point where a section of code is worth hand-crafting in assembly, then it's probably worth your time to really understand what's happening with it. Having it "vibe compiled" for you would be a bad idea.