This article describes the basic ideas of the GRIT protocol, which was announced at the IEEE International Conference on Data Engineering (ICDE) 2019, and provides an example of using part of the protocol for implementing a transactional storage backend for JanusGraph. This example focuses on a system with only a single database, but as we said, GRIT can support ACID transactions for systems consisting of multiple databases.
Background
In a microservice architecture, an application may invoke multiple microservices, which are usually implemented in different application languages by different teams and may use multiple underlying databases to achieve their functionality. This popular architecture brings new challenges for consistent distributed transactions across multiple microservices. It is a real requirement to support ACID transactions in the context of microservices, but is very hard to achieve using existing technologies, since distributed transaction mechanisms designed for a single database cannot be easily extended to the cases with multiple databases through microservices.
In environments that involve multiple independent databases, the traditional two-phase commit (2PC) protocol1 was essentially the only option for distributed transactions by the system without additional application effort. However, it does not work well in a scale-out platform due to long paths of potentially many coordinating participants and the locking required over the phases. On the other hand, using a transaction log executed by a framework2 such as Saga will incur complex compensating logic by applications and may have business implications due to irreversible partially successful transactions.
To address these issues, we developed GRIT, a novel protocol for globally consistent distributed transactions that cleverly combines ideas from optimistic concurrency control (OCC) 3, 2PC, and deterministic databases4,5 to achieve, for the first time, high-performance, globally consistent transactions across microservices with multiple underlying databases.
GRIT: a protocol for distributed transactions
The following diagram illustrates the GRIT protocol in a system of microservices with two databases. The GRIT components, including GTM, GTL, DBTM, DBTL, and LogPlayer, are shown in the center.
Without the GRIT components, the diagram represents a system of plain microservice architecture with two scale-out databases. They consist of the following:
- Applications: invoke microservices to achieve their functionality.
- Microservices (Entity Services): building blocks to provide business-oriented service for applications to implement business logic. Each DB may have support for multiple microservices, and each microservice is likely independent of the other.
- DB Services: provide DB read/write interface and directly access DB servers. When supporting transactions, it also caches the read/write results of each transaction during the execution phase and sends them to its DBTM for conflict resolution at commit time.
- DB shard servers: the backend storage servers for the database, usually replicated for high availability.
The key components of GRIT include:
- Global Transaction Manager (GTM): It coordinates global transactions across multiple databases. There can be one or more GTMs.
- Global Transaction Log (GTL): It represents the transaction request queue for a GTM. The order of transaction requests in a GTL determines the relative serializability order among global transactions. Persistence of GTLs is optional.
- Database Transaction Manager (DBTM): The transaction manager at each database realm. It performs the conflict checking and resolution, i.e. local commit decision is located here.
- Database Transaction Log (DBTL): The transaction log at each database realm that logs logically committed transactions that relate to this database (including single database transactions and multi-database transactions). The order of transactions in a DBTL determines the serializability order of the whole database system, including the global order dictated by the GTM. A log sequence number (LSN) is assigned to each log entry.
- LogPlayer: This component sends log entries, in sequence, to the backend storage servers for them to apply the updates. Each DB server applies log entries of logically committed transactions in order.
For the purpose of understanding the details of the protocol, we use the following diagram to show the main steps for a distributed transaction.
In GRIT, a distributed transaction goes through three phases:
- Optimistic execution (steps 1-4): As the application is executing the business logic via microservices, the database services capture the read-set and write-set of the transaction. No actual data modification occurs at this phase.
- Logical commit (steps 5-11): Once the application requests the transaction commit, the read-set and write-set at each database service point are submitted to its DBTM. The DBTM uses the read-set and write-set for conflict checking to achieve local commit decision. The GTM will make the global commit decision after collecting all the local decisions of DBTMs for the transaction. A transaction is logically committed once its write-sets are persisted in log stores (DBTLs) for databases involved. This involves minimum coordination between the GTM and the DBTMs.
- Physical apply (steps 12-13): The log players asynchronously sends DBTL entries to backend storage servers. The data modification occurs at this phase.
Overall, our approach avoids pessimistic locking during both execution and commit process and avoids waiting for physical commit. We take the optimistic approach and also make the commit process very efficient by leveraging logical commit logs and moving the physical database changes out of the commit decision process with deterministic database technology, which is similar to log play in replication.
GRIT is able to achieve consistent high throughput and serializable distributed transactions for applications invoking microservices with minimum coordination. GRIT fits well for transactions with few conflicts and provides a critical capability for applications that otherwise need complex mechanisms to achieve consistent transactions across microservices with multiple underlying databases.
Applying GRIT for a single database
As you can see, the GRIT protocol contains two parts: one for each database (or each database realm, which can be a set of partitions of a database) performed by DBTM, DBTL, and LogPlayer, and the other for cross-database coordination by GTM and DBTMs. In the following diagram, we illustrate the design of a transactional graph store backend (called NuGraphStore) for JanusGraph using the part of GRIT protocol for a single database.
The following diagram shows how NuGraphStore is implemented with two availability zone (AZ1 and AZ2) deployment for illustration.
There are a few components involved in the NuGraphStore backend for JanusGraph:
- Storage plugin: a custom storage interface plugin to interface between the JanusGraph DB Layer and the backend storage engine and transaction protocol components.
- DBTM: performs the critical conflict checking for optimistic concurrency control. This is part of the GRIT distributed transaction protocol on single databases performing OCC.
- LogStore: replicated log store for mutations from transactions. One entry for each transaction, indexed by Log Sequence Number (LSN). It acts as a WAL (write-ahead-log) in traditional database systems. The LogStore is the DBTL in our GRIT architecture.
- LogPlayer: applies log entries to the backend servers asynchronously.
- Storage engine: backend storage engine to store KCV (Key-Column-Value) data from JanusGraph. It performs reads and mutations and supports snapshots defined by the LSN.
As an application is performing a transaction, it can read from the store and write to the store. For the read operations, the storage plugin directly communicates with the storage servers (except for reads that are found in the write-set for the transaction). The storage plugin also keeps track of the read-set as the application is reading from the store in the context of a transaction. The useful information for each read is the <key, lsn> pair, where lsn is the log sequence number reflecting the storage engine state when the key-value is read. An LSN is the log index of the entry for the mutations of a transaction. It is assigned by the LogStore and used to define the snapshot of the backend databases. A key being not found is also recorded as part of the read-set. Unlike reads, the storage plugin for writes does not directly communicate with the storage servers. Instead, the storage plugin buffers the writes in the corresponding write-set for the transaction.
When a transaction commits, the storage plugin submits the commit request with the read-set and write-set it has captured for the transaction to the DBTM. The DBTM performs the standard conflict checking for OCC for the transaction. If there is no conflict, it will persist the write-set to the replicated LogStore (i.e., it sends the write-set to the LogStore replica set, so all the replicas keep the exact same log). At this point, the transaction commit completes logically, and the DBTM responds back to the storage plugin. The LogPlayers tail the LogStores and play the log entries to the backend shard servers based on the data distribution.
It’s worth pointing out that the above description is a basic design with many opportunities to enhance for performance and reliability. It’s our belief that it is more productive to make the basic components mature before optimizing across the components or using replication for DBTM to achieve higher reliability. Also, there are different ways that we can capture the read-set and write-set. For a KV store, the simplest form we need for conflict checking is <key, lsn> pairs. To support more complex systems, however, the read-set may include ranges, or predicates, as described in.6 As of this writing, NuGraphStore is going through the open source process.
References
1. C. Mohan, Bruce Lindsay and R. Obermarck, “Transaction management in the R* distributed database management system” ACM Transactions on Database Systems (TODS), Volume 11 Issue 4, Dec. 1986, Pages 378 - 396.
2. Pat Helland, “Life beyond Distributed Transactions: an Apostate’s Opinion”, CIDR 2007.
3. H.T. Kung, J.T. Robinson, “On Optimistic Methods for Concurrency Control”, ACM Transactions on Database Systems 6:2, June, 1981.
4. Thomson, Alexander and Diamond, Thaddeus and Shao, Philip and Ren, Kun and Weng, Shu-Chun and Abadi, Daniel J, “Calvin: Fast distributed transactions for partitioned database systems”, SIGMOD 2012.
5. Kun Ren, Alexander Thomson, Daniel Abadi, “An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems”, VLDB 2014.
6. Thomas Neumann, Tobias Mühlbauer, Alfons Kemper, “Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems”, SIGMOD 2015