Apart from the self-congratulation, this is an interesting confirmation that the performance of your website impacts search engine results.
As for Google sending multiple requests: from the way the article is written, it sounds as though Google sends the requests all at once and then waits for the answers to come back one by one. You can cure this by switching keep-alive off on the server side.
Typically in your HTTP configuration you would add a line like this (example for Apache):
KeepAlive Off
You could even do this just for the Googlebot:
BrowserMatch "Googlebot" nokeepalive
That way you can 'fix' the Googlebot issue without affecting the normal users of the site. (Note that BrowserMatch is case-sensitive; Google's user-agent string spells it "Googlebot".)
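Putting the two together, the relevant fragment of an Apache 2.x configuration might look like this. The directives are real Apache directives; the surrounding values are just a sketch, and `mod_setenvif` must be loaded for `BrowserMatch` to work:

```apache
# Keep keep-alive on for ordinary visitors...
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5

# ...but disable it for Googlebot specifically (requires mod_setenvif).
BrowserMatch "Googlebot" nokeepalive
```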
Even better would be to convince Googlebot not to pipeline while keeping keep-alive on. Perhaps by sending it some kind of IIS/4 Server header?
Keep-alive and pipelining go hand in hand - a trick I abused for many years in order to serve up streaming video using JPEGs.
The funny thing is that it works both ways: if you switch keep-alive on and start dumping answers into the pipe pre-emptively (because, for instance, you know you're talking to your own little piece of JavaScript on the other side, so you can predict the next request), then you can save yourself the round-trip delays that you would have if you stuck to the regular request/answer, request/answer pattern.
Keep-alive on pretty much implies that pipelining is ok.
For many years this was the 'secret sauce' my company lived off; that nobody clued in to it amazes me to this day, since it seemed a pretty obvious thing to do.
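That pre-emptive-push trick can be sketched in a few lines of Python. A `socketpair` stands in for a real keep-alive connection, and the frame names are hypothetical - this is a toy illustration of the idea, not the original implementation:

```python
import socket
import threading

def push_server(conn):
    # Read the single request the client actually sends...
    conn.recv(4096)
    # ...then answer it AND pre-emptively push the next answer we
    # already know our own client-side script would ask for next,
    # saving a full round trip.
    for body in (b"frame-1", b"frame-2"):
        header = (b"HTTP/1.1 200 OK\r\n"
                  b"Content-Type: text/plain\r\n"
                  b"Content-Length: %d\r\n"
                  b"Connection: keep-alive\r\n\r\n" % len(body))
        conn.sendall(header + body)
    conn.close()

client, server = socket.socketpair()
t = threading.Thread(target=push_server, args=(server,))
t.start()

# One request goes out; two answers come back on the same connection.
client.sendall(b"GET /frame HTTP/1.1\r\nHost: example\r\n\r\n")
data = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break
    data += chunk
t.join()
```

The socket layer does not object: the client simply finds a second complete response waiting in its buffer without having asked for it yet.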
That's a very interesting hack, but I hope no one decides to deploy it today. You really can't predict what the next request will be - the browser can reuse the connection for some other request on a whim.
And I'm sure there's some proxy around that will panic if it gets a response before getting a request.
> Keep-alive on pretty much implies that pipelining is ok.
Well, sure, in theory. But since nearly every browser keeps it off (edit: doesn't pipeline requests), obviously a ton of servers are broken. Even Flickr was broken for many years with pipelining on (image downloads would abort randomly). Chrome is planning to eventually enable it with a bunch of heuristics; hopefully that will improve the situation.
As a rule I have it 'off' because I have seen more bugs related to keep-alive than benefits from it, but in some special cases the speedup can be dramatic, so you should always at least test to see what it does for you.
For instance, a gallery page with lots of small thumbnails could benefit from keep-alive being on.
Is pipelining a well-defined behaviour now? I thought it was still pretty browser dependent.
Obviously you need keep-alive for pipelining to work at all - but are you saying you could actually dump responses down a keep-alive connection without corresponding requests and popular modern browsers will just deal with it?
I've used it since (don't laugh) IE 3 came out because that was the only way to get decent performance out of it.
Pipelining is actually harder to guard against than to implement, because as soon as keep-alive is on the client can simply start sending multiple requests down the wire.
The other side will presumably respond to the first request by scanning the input up to and including the end of the request, then send back the result via the same connection. As soon as it has done that it will look for more input.
Both sides can 'cheat', in other words: you don't actually need to look at the input requests when you're a server, and you don't actually need to wait for that first response to arrive before sending the next if you're a client.
The socket layer doesn't care one bit - to it, it is all data - and the application layer would presumably have a hard time telling when certain bytes were sent unless it explicitly polls the other side of the connection while generating another answer or reading a new request.
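The client-side half of that 'cheat' - firing off all requests before reading any answer - can be sketched in Python. A toy server loop over a `socketpair` stands in for a real one, and the paths are made up for illustration:

```python
import socket
import threading

def serve(conn):
    # Toy HTTP/1.1 server loop: scan the input up to the end of one
    # request, answer it on the same connection, then look for more.
    buf = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        buf += chunk
        while b"\r\n\r\n" in buf:       # a complete request is buffered
            head, buf = buf.split(b"\r\n\r\n", 1)
            path = head.split()[1]      # request line: GET <path> HTTP/1.1
            body = b"you asked for " + path
            conn.sendall(b"HTTP/1.1 200 OK\r\n"
                         b"Content-Length: %d\r\n\r\n" % len(body) + body)
    conn.close()

client, server = socket.socketpair()
t = threading.Thread(target=serve, args=(server,))
t.start()

# The client 'cheats': all three requests go down the wire back to back,
# before any answer has come back.
for i in (1, 2, 3):
    client.sendall(b"GET /page%d HTTP/1.1\r\nHost: example\r\n\r\n" % i)
client.shutdown(socket.SHUT_WR)

data = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break
    data += chunk
t.join()
```

The server never notices anything unusual: it just keeps finding complete requests in its buffer and answers them in order, which is exactly what pipelining is.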
So, typically the sequence on a keep-alive connection looks like this:

client: request 1
server: answer 1
client: request 2
server: answer 2

...with a full round trip between each request/answer pair, all over a single connection.
So as soon as you switch on keep-alive you give the other side the opportunity to start pipe-lining requests or answers.
The problem with the way the Googlebot works here is that it seems to log all the pipelined but as-yet-unanswered requests as 'failed', whereas it should only log the request that was next up for answering as failed.
The obvious solutions are to either switch keep-alive 'off' for Google or speed things up to the point where it simply works.
I'm all for Google being a force pushing for adoption - pipelining is good. If Google requiring it means people pay more attention to making their sites work with it, great.
(I haven't read the spec - does a browser with pipelining on require that the responses are returned in order?)
While the article was interesting the first line "We did it. We solved one of the unsolved big SEO (Search Engine Optimization) mysteries of the modern time." had me reaching for the close button. Is it just me or is there a LOT of spin from SEO types?
As the author of the article in question, SEO and geek, I must say: I deliberately over-spun the wording of the article because - as a matter of fact - the mystery was so mysterious that nobody in the so-called SEO "community" ever noticed that there was something rotten going on in their beloved Google webmaster reports.
And (I'm not sure why - it was a Friday afternoon when I wrote that thing) I thought it would be funny to take the typical over-the-top-linkbait kind of writing style and ... do something nobody has ever done before ... put some real information in it.
Someone needs to point out that this isn't the TCP/IP level, it's just HTTP.
I know it is pedantic, but technical discussions need precision in their agreed terminology. HTTP and TCP/IP are at different stack levels, and everything discussed here is HTTP.