Abstract
Shared counters are among the most basic coordination structures in distributed computing. Known implementations of shared counters are either blocking, non-linearizable, or have a sequential bottleneck. We present the first counter algorithm that is both linearizable, non-blocking, and can provably achieve high throughput in k-synchronous executions—executions in which process speeds vary by at most a constant factor k. The algorithm is based on a novel variation of the software combining paradigm that we call bounded-wait combining (BWC). It can thus be used to obtain implementations, possessing the same properties, of any object that supports combinable operations, such as a stack or a queue. Unlike previous combining algorithms where processes may have to wait for each other indefinitely, in the BWC algorithm, a process only waits for other processes for a bounded period of time and then “takes destiny in its own hands”. In order to reason rigorously about the parallelism attainable by our algorithm, we define a novel metric for measuring the throughput of shared objects, which we believe is interesting in its own right. We use this metric to prove that our algorithm achieves throughput of Ω(N/ log N) in k-synchronous executions, where N is the number of processes that can participate in the algorithm. Our algorithm uses two tools that we believe may prove useful for obtaining highly parallel non-blocking implementation of additional objects. The first are “synchronous locks”, locks that are respected by processes only in k-synchronous executions and are disregarded otherwise; the second are “pseduo-transactions”—a weakening of regular transactions that allows higher parallelism.
Similar content being viewed by others
References
Aspnes J., Herlihy M., Shavit N.: Counting networks. J. ACM 41(5), 1020–1048 (1994)
Attiya H., Dagan A.: Improved implementations of binary universal operations. J. ACM 48(5), 1013–1037 (2001)
Attiya, H., Guerraoui, R., Kouznetsov, P.: Computing with reads and writes in the absence of step contention. In: Proceedings of the 19th International Symposium on Distributed Computing (DISC’05) (2005)
Dwork C., Herlihy M., Waarts O.: Contention in shared memory algorithms. J. ACM 44(6), 779–805 (1997)
Ellen, F., Lev, Y., Luchangco, V., Moir, M.: Snzi: scalable nonzero indicators. In: Proceedings of the 27th Annual ACM Symposium on Principles of Distributed Computing (PODC) (2007)
Fich, F.E., Hendler, D., Shavit, N.: Linear lower bounds on real-world implementations of concurrent objects. In: Proceedings of the 46th Annual Symposium on Foundations of Computer Science (FOCS) (2005)
Goodman, J.R., Vernon, M.K., Woest, P.J.: Efficent synchronization primitives for large-scale cache-coherent multiprocessors. In: ASPLOS, pp. 64–75 (1989)
Gottlieb, A., Grishman, R., Kruskal, C.P., McAuliffe, K.P., Rudolph, L., Snir, M.: The nyu ultracomputer designing a mimd, shared-memory parallel machine. In: ISCA ’98: 25 years of the international symposia on Computer architecture (selected papers), pp. 239–254, New York, NY, USA. ACM Press, New York (1998)
Greenwald, M.: Two-handed emulation: how to build non-blocking implementations of complex data-structures using dcas. In: PODC ’02: Proceedings of the twenty-first annual symposium on Principles of distributed computing, pp. 260–269, New York, NY, USA. ACM Press, New York (2002)
Ha, P.H., Papatriantafilou, M., Tsigas, P.: Self-tuning reactive distributed trees for counting and balancing. In: OPODIS, pp. 213–228 (2004)
Hendler, D., Kutten, S.: Bounded-wait combining: Constructing robust and high-throughput shared objects. In: Proceedings of the 20th International Symposium on Distributed Computing (DISC’06), pp. xx–xx (2006)
Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. In: SPAA ’04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, pp. 206–215, New York, NY, USA. ACM Press, New York (2004)
Herlihy M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)
Herlihy M.: A methodology for implementing highly concurrent objects. ACM Trans. Program. Lang. Syst. 15(5), 745–770 (1993)
Herlihy M., Shavit N., Waarts O.: Linearizable counting networks. Distributed Comput. 9(4), 193–203 (1996)
Herlihy M., Wing J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)
Kruskal C.P., Rudolph L., Snir M.: Efficient synchronization on multiprocessors with shared memory. ACM Trans. Program. Lang. Syst. 10(4), 579–601 (1988)
Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free fifo queues. In: SPAA’05: Proceedings of the 17th annual ACM symposium on Parallelism in algorithms and architectures, pp. 253–262, NY, USA. ACM Press, New York (2005)
Shavit N., Zemach A.: Diffracting trees. ACM Trans. Comput. Syst. 14(4), 385–428 (1996)
Shavit N., Zemach A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distributed Comput. 60(11), 1355–1387 (2000)
Wattenhofer R., Widmayer P.: The counting pyramid: an adaptive distributed counting scheme. J. Parallel Distributed Comput. 64(4), 449–460 (2004)
Yew P.-C., Tzeng N.-F., Lawrie D.H.: Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. 36(4), 388–395 (1987)
Author information
Authors and Affiliations
Corresponding author
Additional information
A preliminary version of this paper appeared in [11].
D. Hendler is supported in part by a grant from the Israel Science Foundation. S. Kutten is supported in part by a grant from the Israel Science Foundation.
Rights and permissions
About this article
Cite this article
Hendler, D., Kutten, S. Bounded-wait combining: constructing robust and high-throughput shared objects. Distrib. Comput. 21, 405–431 (2009). https://doi.org/10.1007/s00446-009-0078-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00446-009-0078-4