
I was surprised to see their canary stages are just 5 minutes. Many problems take longer to manifest. That seems like a fairly risky release process.


It's actually longer than 5 minutes. First there is the 2% canary deploy itself, during which we start to see traffic pick up, then a 5 minute wait, then a 20% "deploy", and another 5 minute wait. All in all this comes out to around 10-15ish minutes in canary. During this stage we can shut off traffic to the canary deploy almost instantly.

Could we reduce risk by lengthening the process? Maybe, but that also makes deploys longer, which means fewer changes can get through in a day. Devs then respond with larger PRs, for example, which increases the risk profile.

So we need to balance risk against deploy duration. In my experience, when you have our scale of user base, large problems typically manifest quickly. The ones that take a lot longer to detect are generally more minor.
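For anyone trying to picture the staged rollout described above, here is a minimal sketch. The stage percentages and wait times come from the comment; the `CanaryStage` type, the `is_healthy` callback, and the abort-to-zero behavior are hypothetical names and assumptions for illustration, not the actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CanaryStage:
    traffic_percent: int  # share of traffic routed to the canary
    wait_minutes: int     # observation window after the stage is live


# Stages as described in the comment: 2% then 20%, each with a 5 minute wait.
STAGES = [CanaryStage(2, 5), CanaryStage(20, 5)]


def total_wait_minutes(stages: Sequence[CanaryStage]) -> int:
    """Lower bound on time in canary: the waits only, excluding deploy time."""
    return sum(stage.wait_minutes for stage in stages)


def run_canary(stages: Sequence[CanaryStage],
               is_healthy: Callable[[CanaryStage], bool]) -> int:
    """Walk the stages in order; return 0 (traffic shut off) on the first
    unhealthy stage, otherwise the traffic percent of the final stage."""
    for stage in stages:
        if not is_healthy(stage):
            return 0  # assumption: aborting sends canary traffic to zero
    return stages[-1].traffic_percent
```

The waits alone add up to 10 minutes, so the "10-15ish" figure is consistent with the remaining time being spent on the 2% and 20% deploys themselves.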


> around 10-15ish minutes in canary

10-15 minutes is fast, I think.

Sounds as if you can do more than 100 deployments per day? -- but I guess you don't do that many?
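A quick back-of-the-envelope check of that number, assuming deploys run strictly one after another around the clock and that the canary time is the whole deploy (both are my assumptions, the thread doesn't say):

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def max_serial_deploys_per_day(minutes_per_deploy: int) -> int:
    """Upper bound on back-to-back deploys in 24 hours (assumes no overlap)."""
    return MINUTES_PER_DAY // minutes_per_deploy


fastest = max_serial_deploys_per_day(10)  # 144
slowest = max_serial_deploys_per_day(15)  # 96
```

So "more than 100 per day" only holds toward the 10 minute end of the range, and any non-canary deploy time would lower the ceiling further.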


The problems that don't immediately manifest could very well take hours or days or longer. There has to be a limit, and 5 minutes is as good as any.


A lot of alerts use moving averages or sustain times to squelch transient noise. You have to wait for the max sustain time to pass before you can conclude that lack of alert = lack of problem.

That time could very well be 5 minutes, but the canary wait and the alert sustain times need to be coordinated.
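A rough sketch of that coordination check. The alert names, the sustain values, and the optional evaluation interval are made up for illustration; real alerting systems may add other delays on top of these.

```python
from typing import Mapping


def min_safe_wait_seconds(sustain_seconds_by_alert: Mapping[str, int],
                          eval_interval_seconds: int = 0) -> int:
    """Shortest wait after which 'no alert fired' is meaningful: the longest
    sustain time, plus one evaluation interval if the rules are polled."""
    longest = max(sustain_seconds_by_alert.values(), default=0)
    return longest + eval_interval_seconds


def wait_covers_alerts(wait_seconds: int,
                       sustain_seconds_by_alert: Mapping[str, int],
                       eval_interval_seconds: int = 0) -> bool:
    """True if the canary wait is at least as long as the slowest alert."""
    return wait_seconds >= min_safe_wait_seconds(
        sustain_seconds_by_alert, eval_interval_seconds)


# Hypothetical alerts: one sustains for 2 minutes, one for 5 minutes.
alerts = {"error_rate": 120, "p99_latency": 300}
five_minute_wait_ok = wait_covers_alerts(5 * 60, alerts)
```

With these made-up numbers a 5 minute wait just barely covers the slowest alert, and adding a 60 second evaluation interval would make it too short.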


Yeah, wouldn't you need some sort of minimum amount of traffic to be able to use canary deployment?
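To make the traffic question concrete, here is a small sketch with invented numbers. The request rate, the baseline error rate, and the "expect at least 10 errors" threshold are all assumptions of mine, not figures from the thread.

```python
def expected_canary_requests(total_rps: int, traffic_percent: int,
                             wait_minutes: int) -> int:
    """Requests the canary should see during one wait window."""
    return total_rps * traffic_percent * wait_minutes * 60 // 100


def requests_needed(min_expected_errors: int, requests_per_error: int) -> int:
    """Requests needed to expect a given error count when the baseline is
    one error per `requests_per_error` requests."""
    return min_expected_errors * requests_per_error


# Hypothetical service: 50 requests/s overall, baseline of 1 error per 100.
needed = requests_needed(min_expected_errors=10, requests_per_error=100)
at_2_percent = expected_canary_requests(50, 2, 5)    # 300 requests
at_20_percent = expected_canary_requests(50, 20, 5)  # 3000 requests
```

In this made-up case the 2% stage sees too few requests for the baseline error rate to be measurable in 5 minutes, while the 20% stage clears the bar. At a much larger request rate, both stages would.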



