Managing failures and configuring systems properly are critical to robust distributed services. Unfortunately, protocols offering strong fault-tolerance guarantees are generally too costly and insensitive to performance criteria. Yet system management in practice is often ad hoc and ill-defined, leading to under-utilized capacity or adverse effects from poorly behaving machines. This paper proposes a new abstraction called link-attestation groups (LA-Groups) for building robust distributed systems. Developers specify application-level correctness conditions or performance requirements for nodes. Nodes vouch for each other's acceptability within small groups through digitally signed link attestations, and then apply a link-state protocol to determine these group relationships.