Spent some time at work with HP yesterday looking at Vertica, their scalable database. One of the things I liked about it was how it responded to losing a node.
Suppose you have six nodes, n1 - n6, responsible for data partitions d1 - d6, respectively. Each node stores its own partition plus a copy of the one belonging to its neighbor on the left: under normal operation, n2 stores d1 and d2, n3 stores d2 and d3, ..., n1 stores d6 and d1. Now suppose node n3 fails. Node n4 can pick up n3's processing, since it holds the only remaining copy of d3, but it still has to do its own work too, so it will take twice as long to finish. Since every query waits on the slowest node, losing 1/6 of the cluster actually cuts throughput by 50%. Not so great.
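To make the layout concrete, here's a toy sketch in Python. The names and helper function are mine, not Vertica's; it just spells out the "store your own partition plus your left neighbor's" arrangement and why the naive failover halves throughput.

```python
# Toy sketch of the replication layout described above; names and layout
# are my own illustration, not anything from Vertica.

NODES = ["n1", "n2", "n3", "n4", "n5", "n6"]
N = len(NODES)

def stored_partitions(i):
    """Partitions held by node i (0-based): its own plus its left neighbor's."""
    return {f"d{i + 1}", f"d{(i - 1) % N + 1}"}

for i, node in enumerate(NODES):
    print(node, sorted(stored_partitions(i)))
# n1 ['d1', 'd6'], n2 ['d1', 'd2'], n3 ['d2', 'd3'], ...
#
# With n3 down, d3 survives only on n4, so the naive plan has n4 scanning
# d3 and d4 while every other node scans one partition: the whole query
# runs at n4's pace, roughly twice as long as before.
```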
But there's no reason n4 has to carry the full brunt of the failure. Sure, it's the only one with a copy of d3, but n5 has a copy of d4. So on the first query, n4 does double duty, but on the next one, some other node takes the extra work instead. For example, n6 could handle d5 (from its copy) in addition to its own d6, n5 could cover d4, and n4 could cover just d3. By passing the assignments around, each node can be kept busy without having to actually redistribute the data (or, at least, not having to redistribute it right away; the system will certainly want to start moving stuff around in the background to get back to full redundancy). Individual queries will still take twice as long, since each one waits on whichever node is doing double duty, but because that extra work rotates, each surviving node averages 6/5 partitions per query and throughput should stay close to 5/6.
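Here's a rough sketch of how that rotation could be computed. This is just my reading of the idea, not Vertica's actual scheduler, and the function name and indexing are made up. The trick is to pick, per query, how many partitions (starting at the failed one) get read from their backup copies; the node at the end of that chain does the double duty, and growing the chain with each query moves the extra work around.

```python
def per_query_plan(failed, query_no, n=6):
    """Assign each partition (0-based) to a live node for one query.

    Partition p normally lives on node p, with a copy on node (p + 1) % n.
    The first m partitions after the failure point are read from their
    copies; the node at the end of that chain picks up two partitions,
    and cycling m with the query number rotates the extra work.
    """
    m = query_no % (n - 1) + 1                    # chain length: 1 .. n-1
    return {p: (p + 1) % n if (p - failed) % n < m else p
            for p in range(n)}

# With n3 (index 2) down, watch the doubled node rotate across queries:
for q in range(5):
    plan = per_query_plan(failed=2, query_no=q)
    load = {}
    for p, node in plan.items():
        load.setdefault(node, []).append(f"d{p + 1}")
    print(f"query {q}:",
          ", ".join(f"n{node + 1}->{'+'.join(parts)}"
                    for node, parts in sorted(load.items())))
```

Over any five consecutive queries each survivor handles six partitions' worth of scanning, which is where the roughly 5/6 throughput figure comes from.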
I think this is an excellent example of a simple solution that passes the "good enough" test. There are certainly more elaborate schemes in use that degrade individual query response less, but most applications can tolerate a hit to individual operations as long as overall throughput isn't suffering too much.