> I managed to crawl [..] more than 300k movies from IMDB in just a few hour...

alexbardas · on Aug 11, 2012

Very true indeed. I was also randomly changing user-agents (Mozilla, Safari, Chrome, IE). I thought this would make it harder to tell whether there was just a lot of traffic from the same network or someone was intensively crawling the site.

For me, it was more a proof of how efficient and fast a crawler can be. Also, responses from IMDB were very fast, under 0.4 seconds each, so not much time was lost there.

binarysolo · on Aug 11, 2012

Gray hat question out of curiosity and possible experience: did you also use proxies or perhaps even Tor?

joshu · on Aug 11, 2012

so how polite does one need to be? One hit per x seconds?

yorhel · on Aug 12, 2012

If the /robots.txt does not mention a Crawl-delay, one page per 3 seconds is often a safe value. Of course, this depends rather heavily on the site. In any case, if you have any specific need, always contact the people responsible for the site. I occasionally run custom queries against the database on request, for example.
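
To make that rule of thumb concrete, here is a rough sketch in Python of how a crawler might pick its delay: use the Crawl-delay from robots.txt when there is one, otherwise fall back to 3 seconds. It uses the standard library's `urllib.robotparser`; the names `crawl_delay_from_robots`, `Throttle`, and `DEFAULT_DELAY` are made up for illustration, and the wildcard user agent is an assumption.

```python
import time
import urllib.robotparser

DEFAULT_DELAY = 3.0  # seconds; the fallback value suggested above


def crawl_delay_from_robots(robots_lines, user_agent="*", default=DEFAULT_DELAY):
    """Return the Crawl-delay that applies to user_agent, or `default` if none is set."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)  # lines of an already-downloaded robots.txt
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


class Throttle:
    """Keeps successive wait() calls at least `delay` seconds apart."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()
```

In a crawl loop you would build one `Throttle(crawl_delay_from_robots(lines))` per site and call `wait()` before each request. This only spaces requests out; it does not replace asking the site owners when you need something specific.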