
> investigating "false positive" infrastructure alerts?

Gradually with each false positive (or negative) you learn to tweak your alerts and update dashboards to reduce the noise as much as possible.



So it's really a manual, iterative process... which means there should be room for something to be done.


You learn pretty quickly. For example, I don't alert on CPU; I alert on load average, which is more realistic. I'm also a solo dev, so I use the 15-minute average, and it needs to be above a pretty high threshold three times in a row. I don't monitor RAM usage, but swap instead. When that triggers, something usually needs to be fixed.
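
For anyone who wants to try the "three times in a row" rule, here is a minimal sketch of how it could look, not what any particular monitoring tool does. The class name `ConsecutiveBreachAlert`, the helper `load15`, and the threshold value are all made up for illustration; only `os.getloadavg()` is a real standard-library call (Unix only).

```python
import os
from collections import deque


class ConsecutiveBreachAlert:
    """Fire only when the last `required` samples all exceed `threshold`.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """

    def __init__(self, threshold, required=3):
        self.threshold = threshold
        # Keep only the most recent `required` breach flags.
        self.recent = deque(maxlen=required)

    def observe(self, value):
        """Record one sample and return True if the alert should fire."""
        self.recent.append(value > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def load15():
    """Return the 15-minute load average (os.getloadavg is Unix only)."""
    return os.getloadavg()[2]


# Example with made-up samples: one dip below the threshold resets the streak.
alert = ConsecutiveBreachAlert(threshold=8.0, required=3)
samples = [9.1, 9.4, 3.0, 9.2, 9.6, 9.9]
fired = [alert.observe(s) for s in samples]
```

In a real setup you would call `alert.observe(load15())` from whatever runs your checks on a schedule, and pick the threshold based on how many cores the box has.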

Also look for a monitoring solution with quorum. That way you don't get bothered by false positives caused by a peering issue between your monitoring location and your app (which you have no control over).
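
To make the quorum idea concrete, here is a rough sketch under the assumption that each monitoring location reports a simple pass/fail result. The function name `quorum_down` and the location names are hypothetical; hosted monitoring services implement this in their own ways.

```python
def quorum_down(probe_results, quorum=None):
    """Decide whether enough locations agree that the app is down.

    probe_results maps a location name to True (check passed) or False
    (check failed). By default a strict majority of locations must fail.
    Hypothetical helper for illustration only.
    """
    failures = sum(1 for ok in probe_results.values() if not ok)
    if quorum is None:
        quorum = len(probe_results) // 2 + 1
    return failures >= quorum


# One location failing looks like a network issue on its side: no alert.
single_failure = quorum_down({"eu": False, "us": True, "asia": True})

# Two out of three failing reaches the majority: alert.
majority_failure = quorum_down({"eu": False, "us": False, "asia": True})
```

The design choice is that a single failing probe is treated as a problem with that probe's network path rather than with the app.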



