Method is one of the few places where the spec does indicate case-sensitive rath...

haimez · on Sept 27, 2013

This is going to be a problem for your users. RFCs are worth following when they represent a superset of the standard implementations, but when the RFC is more restrictive than what people actually use, you're doing yourself a disservice by sticking to the spec.

chrismorgan · on Sept 27, 2013

That is something I'll need to take care with. Real-world usage will be very important. At present, for performance, the parser doesn't preserve the raw header value as it reads it, so an invalid value is lost entirely. (Performance here meaning you avoid an extra heap allocation for each header.) Providing the raw value of the header when parsing fails is something I may need to do; I'm not yet sure. I already know that "invalid" values for the Expires header (especially -1, as noted in RFC 2616) are normal (and so the Expires header has been switched back from being a Tm to being a ~str for the moment).

As I get further along, I intend to use the data from the Common Crawl, which fortuitously includes response headers, to see how my validation goes. Of course, that's only a small set of the real-world headers (cache ones in particular will be scarcely stressed by that at all). Validating request headers will be harder; I've still got to figure out what to do about that.

In the end, though, I'm determined that it will work and work well. Servo using it (and thus demanding robust HTTP support) will help with that goal.

Something I discovered a few hours ago, reading the specs: I believe this header should be valid, with the value being interpreted as the weak entity tag ``Super Encoding™``. I wonder how many clients or servers would support it? No idea yet.

    etag: w/"=?US-ASCII?B?U3VwZXIg?= =?UTF-8?Q?Encoding=E2=84=A2?="
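For anyone who wants to check that reading by hand, here is a minimal sketch of the decoding step in present-day Rust, standard library only. It assumes the quoted part consists solely of RFC 2047 encoded-words separated by whitespace, handles only the US-ASCII and UTF-8 charsets, and accepts the weak prefix in either case because the example above uses lowercase. All function names are hypothetical; none of this is rust-http's API, and it says nothing about whether real clients or servers accept such a header.

```rust
/// Decodes the base64 payload of a "B" encoded-word (padding ends the input).
fn decode_base64(text: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for c in text.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
        }
    }
    Some(out)
}

/// Decodes the payload of a "Q" encoded-word: `_` is a space, `=XX` is a hex byte.
fn decode_q(text: &str) -> Option<Vec<u8>> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'_' => {
                out.push(b' ');
                i += 1;
            }
            b'=' => {
                let hex = text.get(i + 1..i + 3)?;
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    Some(out)
}

/// Decodes one encoded-word of the form `=?charset?encoding?text?=`.
fn decode_word(word: &str) -> Option<String> {
    let inner = word.strip_prefix("=?")?.strip_suffix("?=")?;
    let mut parts = inner.splitn(3, '?');
    let (charset, encoding, text) = (parts.next()?, parts.next()?, parts.next()?);
    let bytes = match encoding {
        "B" | "b" => decode_base64(text)?,
        "Q" | "q" => decode_q(text)?,
        _ => return None,
    };
    if charset.eq_ignore_ascii_case("us-ascii") {
        if !bytes.is_ascii() {
            return None;
        }
    } else if !charset.eq_ignore_ascii_case("utf-8") {
        return None; // other charsets are out of scope for this sketch
    }
    String::from_utf8(bytes).ok()
}

/// Decodes a weak entity tag whose quoted part is made only of encoded-words.
/// Whitespace between adjacent encoded-words is dropped, as RFC 2047 requires.
fn parse_weak_etag(value: &str) -> Option<String> {
    let rest = value.strip_prefix("W/").or_else(|| value.strip_prefix("w/"))?;
    let quoted = rest.strip_prefix('"')?.strip_suffix('"')?;
    quoted.split_whitespace().map(decode_word).collect()
}
```

Passing the header value above to `parse_weak_etag` should yield `Super Encoding™`; anything the sketch does not understand comes back as `None`.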

haimez · on Sept 27, 2013

I don't know the Rust type system well enough (nor the internal representation of strings), but if strings allow you to reference substrings without reallocating, then you only need one contiguous section in memory for the headers, and the values can "point" into it (maybe this is too C-like to be possible in Rust). My recommendation (feel free to ignore it) would be something that supports typed headers as well as arbitrary string headers, because the ability to fall back to strings will make your library usable in a much broader sense.
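To make the "point into one buffer" idea concrete, here is a rough sketch in present-day Rust, where borrowed string slices (`&str`) do exactly that. `RawHeader` and `split_headers` are hypothetical names for illustration, not anything from rust-http, and the sketch ignores folded (multi-line) header values.

```rust
/// A header whose name and value are slices borrowed from the original buffer.
#[derive(Debug, PartialEq)]
struct RawHeader<'a> {
    name: &'a str,
    value: &'a str,
}

/// Splits a CRLF-separated header block into borrowed name/value pairs.
/// The strings are never copied; only the Vec of slices is allocated.
fn split_headers(block: &str) -> Vec<RawHeader<'_>> {
    block
        .split("\r\n")
        .filter(|line| !line.is_empty())
        .filter_map(|line| {
            // Lines without a colon are skipped in this sketch.
            let (name, value) = line.split_once(':')?;
            Some(RawHeader {
                name: name.trim(),
                value: value.trim(),
            })
        })
        .collect()
}
```

The lifetime `'a` ties every `RawHeader` to the buffer it was cut from, so the compiler rejects any use of a header after the buffer is gone. The trade-off is that the whole buffer has to stay alive for as long as any header is in use.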

chrismorgan · on Sept 27, 2013

I'd need to think about whether that's feasible or not in the overall design. (Locally, it'd work fine, but I don't think I want to be keeping the raw value around once it's validated.)

Arbitrary string headers are essential. Conversion between the typed header and strings is part of the design (though only partially implemented at present). As for other extension-headers (as they are designated in RFC 2616), that's the header enum variant ExtensionHeader(~str, ~str).
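A minimal sketch of that shape, written in present-day Rust (`String` rather than `~str`). Only the `ExtensionHeader` variant name comes from the description above; the other variants, `parse`, and `to_pair` are hypothetical and only illustrate typed headers with a string fallback. Here a known header whose value fails to parse is kept as an `ExtensionHeader`, which is one possible way to retain the raw value, not necessarily what the library does.

```rust
#[derive(Debug, PartialEq)]
enum Header {
    ContentLength(u64),
    /// Kept as a string because "invalid" values such as -1 are common.
    Expires(String),
    /// Any header without a typed variant: (name, value).
    ExtensionHeader(String, String),
}

impl Header {
    /// Builds a typed header where possible, falling back to the raw strings.
    fn parse(name: &str, value: &str) -> Header {
        let value = value.trim();
        if name.eq_ignore_ascii_case("content-length") {
            if let Ok(n) = value.parse::<u64>() {
                return Header::ContentLength(n);
            }
        } else if name.eq_ignore_ascii_case("expires") {
            return Header::Expires(value.to_string());
        }
        Header::ExtensionHeader(name.to_string(), value.to_string())
    }

    /// Converts any header back into a (name, value) pair of strings.
    fn to_pair(&self) -> (String, String) {
        match self {
            Header::ContentLength(n) => ("Content-Length".to_string(), n.to_string()),
            Header::Expires(v) => ("Expires".to_string(), v.clone()),
            Header::ExtensionHeader(n, v) => (n.clone(), v.clone()),
        }
    }
}
```

With this layout, `Header::parse("X-Foo", "bar")` gives an `ExtensionHeader`, and `to_pair` round-trips every variant back to strings for serialization.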

e12e · on Sept 27, 2013

I'm not sure that's the case when it comes to HTTP methods, though. I thought it was, but two pretty high-traffic sites running different "real-world" web servers both give errors on this. Apparently it's an area in which we've already moved a bit away from "be lenient in what you accept; be strict in what you send".

It's been so long since I've played with netcat and HTTP that I can't remember if 'get' vs 'GET' "used to" work or not...

Still, might be something that should be possible to toggle with a flag (case insensitive parsing on/off or something like that).
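Roughly what I have in mind, as a sketch in present-day Rust. The `Method` enum, `parse_method`, and the `case_insensitive` flag are all made up for illustration and are not taken from the library. In strict mode an unrecognised token such as 'get' is kept as an extension method, which a server could then reject.

```rust
use std::borrow::Cow;

#[derive(Debug, PartialEq)]
enum Method {
    Get,
    Head,
    Post,
    Put,
    Delete,
    /// Any other token, kept exactly as it was received.
    Extension(String),
}

/// Parses a request method. With `case_insensitive` set, "get" matches GET;
/// without it, only the exact uppercase spelling matches.
fn parse_method(token: &str, case_insensitive: bool) -> Method {
    // Only allocate an uppercased copy when lenient matching is requested.
    let key: Cow<str> = if case_insensitive {
        token.to_ascii_uppercase().into()
    } else {
        token.into()
    };
    match key.as_ref() {
        "GET" => Method::Get,
        "HEAD" => Method::Head,
        "POST" => Method::Post,
        "PUT" => Method::Put,
        "DELETE" => Method::Delete,
        _ => Method::Extension(token.to_string()),
    }
}
```

So `parse_method("get", false)` yields `Extension("get")`, while `parse_method("get", true)` yields `Get`. Defaulting the flag to off would match what the two servers I tried appear to do.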

Might also be useful to keep in mind that there are very real differences between HTTP/1.0 and HTTP/1.1. For browser-facing stuff, 1.1 should be fine these days(?) -- for APIs etc., I don't know if "proper" 1.0 support makes sense or not.