Yes, Nagios. At least, that was our solution 10+ years ago; for all I know, there could be something more current now. The core of Nagios is open source. Alternatively, you could just buy one of their books and copy their underlying approach into your existing system: https://www.nagios.org/about/propaganda/books/
They use hierarchies of nodes. So if an upstream node goes down, you only get notified for that one node, not for all of the downstream nodes.
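To make the node hierarchy idea concrete, here is a rough sketch in Python. It is not Nagios code, just an illustration of the suppression logic as I understand it. The topology, the node names, and the function `nodes_to_notify` are all made up for the example, and it assumes the hierarchy is a tree with no cycles.

```python
# Hypothetical topology: each node maps to its upstream parent (None = top level).
PARENTS = {
    "router": None,
    "switch": "router",
    "web1": "switch",
    "web2": "switch",
}


def nodes_to_notify(down, parents=PARENTS):
    """Return only the down nodes that have no down ancestor.

    Assumes the hierarchy is a tree (no cycles).
    """
    down = set(down)
    to_notify = []
    for node in sorted(down):
        # Walk upstream until we hit a down ancestor or run out of parents.
        ancestor = parents.get(node)
        while ancestor is not None and ancestor not in down:
            ancestor = parents.get(ancestor)
        if ancestor is None:
            # Nothing upstream is down, so this node is the real problem.
            to_notify.append(node)
    return to_notify
```

So if the switch fails and both web servers behind it look down too, `nodes_to_notify(["switch", "web1", "web2"])` only reports the switch.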
They also use escalating hierarchies for roles, groups, and methods of communication. So if something goes down, you don't need to notify everybody, just one role/group (at least initially).
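The escalation idea could be sketched like this in Python. Again, this is an illustration and not how Nagios is actually configured: the tiers, group names, time thresholds, and the function `who_to_notify` are assumptions invented for the example.

```python
# Hypothetical escalation tiers:
# (minutes the problem has gone unresolved, group to notify, contact method)
ESCALATION = [
    (0, "on-call", "email"),
    (15, "team-lead", "sms"),
    (60, "management", "phone"),
]


def who_to_notify(minutes_down, tiers=ESCALATION):
    """Return the (group, method) pairs whose time threshold has been reached."""
    return [
        (group, method)
        for threshold, group, method in tiers
        if minutes_down >= threshold
    ]
```

At first only the on-call group gets an email; the wider groups are only pulled in if the problem is still there after 15 or 60 minutes.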
Also, you can set the severity of notifications at a granular level. For instance, if something only goes down intermittently, you can decide whether that is important enough to be notified about or not.
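One simple way to handle the intermittent case is to look at how often a node's state changed over its recent checks and stay quiet while it is bouncing up and down. The Python below is a minimal sketch of that idea under my own assumptions; the `flap_threshold` default of 0.5 and both function names are hypothetical, and this is not the algorithm Nagios itself uses.

```python
def state_change_ratio(history):
    """Fraction of consecutive check pairs in which the state changed."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return changes / (len(history) - 1)


def should_notify(history, flap_threshold=0.5):
    """Notify only if the latest check is 'down' and the node is not flapping.

    history is a list of recent check results, oldest first, each either
    "up" or "down". flap_threshold is an assumed tuning knob.
    """
    if not history or history[-1] == "up":
        return False
    return state_change_ratio(history) < flap_threshold
```

Whether you want to suppress flapping nodes entirely or send them as a lower-severity notice is the kind of decision the threshold lets you make.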
Now this isn't my area of expertise and it's been a long time since I've seen it in use, but its entire point was to reduce the number of alerts IT support would receive anytime something went wrong.