The surprise, I guess, is that when you plot it on a log-log chart (log of frequency against log of rank), each word's frequency falls on a straight line with respect to its rank.

Of course frequency is going to fall as rank goes up; that's just what ranking means. But there are many ways that could happen. #1 could occur 10% more often than #2. Or twice as often.
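A minimal sketch of what the "straight line" means, assuming the idealized form of the law, f(r) = C / r^s (the exponent s and constant C are illustrative parameters here, not fitted to any real corpus). Taking logs gives log f = log C - s log r, a straight line with slope -s when log f is plotted against log r. The classic case s = 1 is the "twice as much" option: #1 is exactly twice #2.

```python
import math


def zipf_frequency(rank: int, s: float = 1.0, c: float = 1.0) -> float:
    """Idealized Zipf frequency c / rank**s; s and c are illustrative, not fitted to any corpus."""
    return c / rank ** s


def log_log_slope(r1: int, r2: int, s: float = 1.0) -> float:
    """Slope of the segment between ranks r1 and r2 on a log-log plot."""
    f1, f2 = zipf_frequency(r1, s), zipf_frequency(r2, s)
    return (math.log(f2) - math.log(f1)) / (math.log(r2) - math.log(r1))


# With s = 1, #1 occurs twice as often as #2 and three times as often as #3.
print(zipf_frequency(1) / zipf_frequency(2))  # 2.0
# The slope comes out as -s whichever pair of ranks you pick, near the top or deep in the tail.
print(log_log_slope(1, 2), log_log_slope(500, 5000))
# For #1 to be only 10% ahead of #2, you'd need s = log2(1.1), roughly 0.14.
s_small = math.log2(1.1)
print(zipf_frequency(1, s=s_small) / zipf_frequency(2, s=s_small))
```

So the "10% more" world and the "twice as much" world are both straight lines; they just have different slopes. What the law pins down is that real text picks one slope and sticks with it.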

And that the law holds no matter how deep you go is also surprising. Language seems like it should be a little more chaotic than that, with the top, say, 50 words following one distribution, then the longer tail kinda bumping around at different slopes.
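One rough way to test the "different slopes" hunch on real text is to fit the log-log slope separately for the head and the tail. A sketch, assuming a naive lowercase regex tokenizer and a plain least-squares fit; the cutoff of 50 is just the number from the comment, and `rank_frequencies`, `fit_slope` and `head_tail_slopes` are hypothetical helper names, not from any library.

```python
import math
import re
from collections import Counter


def rank_frequencies(text: str) -> list[int]:
    """Word counts sorted from most to least frequent, using a naive lowercase tokenizer."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def fit_slope(freqs: list[int], start: int, stop: int) -> float:
    """Least-squares slope of log(frequency) against log(rank) for ranks start..stop-1 (1-based)."""
    xs = [math.log(r) for r in range(start, stop)]
    ys = [math.log(freqs[r - 1]) for r in range(start, stop)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def head_tail_slopes(text: str, cutoff: int = 50) -> tuple[float, float]:
    """Fit separate log-log slopes to ranks 1..cutoff (head) and the remaining ranks (tail)."""
    freqs = rank_frequencies(text)
    if len(freqs) < cutoff + 2:
        raise ValueError("need at least two distinct words past the cutoff")
    head = fit_slope(freqs, 1, cutoff + 1)
    tail = fit_slope(freqs, cutoff + 1, len(freqs) + 1)
    return head, tail
```

Usage would be something like `head_tail_slopes(open("some_novel.txt").read())`, where the file name is a placeholder. If the law holds cleanly, the two slopes should come out close to each other. One caveat: deep in a real tail, many words appear only once or twice, so the ranks collapse into a staircase of ties and a straight-line fit there gets noisier.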

This is my lay understanding. Corrections welcome :)