Efficient and non-blocking agreement protocols

Gupta, Suyash; Sadoghi, Mohammad

doi:10.1007/s10619-019-07267-w

Published: 13 April 2019

Volume 38, pages 287–333, (2020)

Distributed and Parallel Databases

Suyash Gupta¹ &
Mohammad Sadoghi¹

Abstract

Large-scale distributed databases are designed to support commercial and cloud-based applications. The minimal expectation from such systems is that they ensure consistency and reliability in the presence of node failures. A distributed database guarantees reliability through atomic commitment protocols, which ensure that either all the changes of a transaction are applied or none of them are. For an efficient commitment process, the database community has mainly used the two-phase commit (2PC) protocol. However, 2PC is blocking under multiple failures. This necessitated the development of the non-blocking three-phase commit (3PC) protocol. Yet the database community remains reluctant to use 3PC, as it acts as a scalability bottleneck in the design of efficient transaction processing systems. In this work, we present the EasyCommit (EC) protocol, which leverages the best of both worlds (2PC and 3PC): it is non-blocking (like 3PC) and requires only two phases (like 2PC). EasyCommit achieves these goals by building on two key observations: (i) first transmit and then commit, and (ii) message redundancy. We present the design of the EasyCommit protocol and prove that it guarantees both safety and liveness. We also present a detailed evaluation of the EC protocol and show that it is nearly as efficient as 2PC. To cater to the needs of geographically large-scale distributed systems, we also design a topology-aware agreement protocol (Geo-scale EasyCommit) that is non-blocking, safe, and live, and that outperforms the 3PC protocol.
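To make the two observations above concrete, the following is a toy Python simulation, not the authors' implementation. It assumes reliable channels among surviving cohorts and a coordinator that crashes right after delivering its global decision to the cohorts listed in `reached`; timeouts, logging and recovery, which a full protocol must handle, are omitted. The names `two_pc_decision`, `simulate` and `forward_before_apply` are hypothetical. With `forward_before_apply=False`, a cohort simply applies the decision it receives (plain 2PC), so cohorts that voted yes but were never reached stay undecided, i.e. blocked. With `forward_before_apply=True`, each cohort first transmits the decision to every other cohort and only then applies it, which is our reading of "first transmit and then commit" combined with message redundancy.

```python
from collections import deque
from enum import Enum
from typing import Dict, List, Optional


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def two_pc_decision(votes: Dict[str, bool]) -> Decision:
    """Global decision rule: commit only if every cohort voted yes."""
    return Decision.COMMIT if all(votes.values()) else Decision.ABORT


def simulate(
    cohorts: List[str],
    votes: Dict[str, bool],
    reached: List[str],
    forward_before_apply: bool,
) -> Dict[str, Optional[Decision]]:
    """Hypothetical toy model: the coordinator sends its decision to `reached`, then crashes.

    Returns each cohort's final decision, or None if the cohort never learns it
    (a cohort that voted yes is then blocked, waiting for the coordinator).
    """
    decision = two_pc_decision(votes)
    outcome: Dict[str, Optional[Decision]] = {c: None for c in cohorts}
    # Messages still in flight: (destination cohort, decision).
    inbox = deque((c, decision) for c in reached)
    while inbox:
        node, d = inbox.popleft()
        if outcome[node] is not None:
            continue  # redundant copy of a decision this cohort already applied
        if forward_before_apply:
            # "First transmit": relay the decision to every other cohort...
            for peer in cohorts:
                if peer != node:
                    inbox.append((peer, d))
        # ...and only then commit (or abort) locally.
        outcome[node] = d
    return outcome
```

For example, with three cohorts that all vote yes and a coordinator that reaches only `A`, `simulate(["A", "B", "C"], votes, ["A"], forward_before_apply=False)` leaves `B` and `C` at `None`, while the same call with `forward_before_apply=True` gives every cohort `Decision.COMMIT`. The duplicate deliveries that the redundant relaying produces are simply ignored.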

Notes

The coordinating node is the node that initiates the commit protocol; in this work, it is also the node that receives the client's request to execute a transaction.
Partitioned database is the terminology used by the database community to refer to shared-nothing distributed databases, and it should not be confused with the term network partitioning.
The term cohort refers to a node participating in the transaction commit process; we use the terms cohort and participant interchangeably.
The INITIAL, READY and WAIT states are considered non-committable states.
Without node failures, any transaction that reaches the prepare phase is assumed to successfully commit.

Acknowledgements

We would like to acknowledge Thamir Qadah for the valuable discussions that helped us design the ExpoDB system. Further, we acknowledge the anonymous reviewers for their useful inputs and comments.

Author information

Authors and Affiliations

University of California, Davis, Davis, CA, USA
Suyash Gupta & Mohammad Sadoghi

Corresponding author

Correspondence to Suyash Gupta.

About this article

Cite this article

Gupta, S., Sadoghi, M. Efficient and non-blocking agreement protocols. Distrib Parallel Databases 38, 287–333 (2020). https://doi.org/10.1007/s10619-019-07267-w

Issue Date: June 2020
DOI: https://doi.org/10.1007/s10619-019-07267-w
