Monday, December 18, 2017

Saga implementations comparison

In the previous blog post we investigated the general mechanics of the saga pattern and how sagas differ from the traditional ACID approach. This article focuses on the current state of applicability of this pattern. We will introduce and compare three frameworks that presently support saga processing, namely Narayana LRA, Axon framework and Eventuate.io.

Narayana LRA

Narayana Long Running Actions is a specification developed by the Narayana team in collaboration with the Eclipse MicroProfile initiative. Its main focus is to introduce an API for coordinating long running activities with the assurance of a globally consistent outcome and without any locking mechanisms. [https://github.com/eclipse/microprofile-sandbox/tree/master/proposals/0009-LRA]

Axon framework

Axon framework is a Java based framework for building scalable and highly performant applications. Axon is based on the Command Query Responsibility Segregation (CQRS) pattern. Its main concept is event processing, which includes a separate Command bus for updates and an Event bus for queries. [http://www.axonframework.org/]

Eventuate.io

Eventuate is a platform that provides an event-driven programming model that focuses on solving distributed data management in microservices architectures. Similarly to Axon, it is based on CQRS principles. The framework stores events in a MySQL database and distributes them through the Apache Kafka platform. [http://eventuate.io]

NOTE: CQRS is an architectural pattern that splits the domain model into two separate models - the first one is responsible for updates and contains the business logic, while the other takes care of reads and provides information to the user.

Comparisons

Even though all of the above frameworks achieve the same outcome, there are several areas where we can examine how the handling of saga processing differs.

Developer friendliness

The LRA provides the developer with a traditional coordinator oriented architecture. Individual participants can join the LRA by an HTTP call to the LRA coordinator, each providing methods for saga completion and compensation. Narayana provides an LRA client which makes the REST invocations transparent.

In Axon, sagas are implemented as aggregates. An aggregate is a logical group of entities and value objects that are treated as a single unit. Axon uses a special type of event listener that allows the developer to associate a property in the events with the current saga so that the framework knows which saga should be invoked. The invocation and compensation are executed by separate event handlers and therefore by different events.

Not long ago Eventuate presented a new API called Eventuate Tram which handles the saga processing for the platform. It enables applications to send messages as part of a database transaction. The platform provides abstractions for messaging in the form of named channels, events as subscriptions to domain events, and commands for asynchronous communication.
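
To make the coordinator oriented style more concrete, here is a minimal in-memory sketch in plain Java. It does not use the Narayana LRA client or any REST calls; the `Participant`, `Coordinator` and `CoordinatorDemo` names and their methods are hypothetical and only illustrate the idea that each participant joins with a completion callback and a compensation callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract: one callback per saga outcome.
interface Participant {
    void complete();    // called when the saga closes successfully
    void compensate();  // called when the saga is cancelled
}

// Hypothetical in-memory stand-in for a saga coordinator.
class Coordinator {
    private final List<Participant> participants = new ArrayList<>();

    // A participant joins the saga by registering its callbacks.
    void join(Participant participant) {
        participants.add(participant);
    }

    // Successful end: every participant is told to complete, in joining order.
    void close() {
        for (Participant p : participants) {
            p.complete();
        }
    }

    // Cancellation: participants are compensated in reverse joining order.
    void cancel() {
        for (int i = participants.size() - 1; i >= 0; i--) {
            participants.get(i).compensate();
        }
    }
}

class CoordinatorDemo {
    // Builds a participant that records every callback in the given log.
    static Participant logging(String name, List<String> log) {
        return new Participant() {
            public void complete() { log.add(name + ":complete"); }
            public void compensate() { log.add(name + ":compensate"); }
        };
    }

    // Two participants join, then the initiator cancels the saga.
    static List<String> runCancelled() {
        List<String> log = new ArrayList<>();
        Coordinator coordinator = new Coordinator();
        coordinator.join(logging("shipment", log));
        coordinator.join(logging("invoice", log));
        coordinator.cancel();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runCancelled());
    }
}
```

In a real deployment the coordinator is a separate service and the callbacks are remote endpoints, so the sketch leaves out everything related to networking, persistence and recovery.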

Saga specifications

In the LRA the specification of how the saga should be performed is given by the initiator. The initiating service invokes participants on the provided endpoints, which allows each participant to join the LRA context. The participant can specify whether to join, create a new or ignore the corresponding LRA context by a CDI annotation.

Axon provides a set of annotations that denote the saga class. The @Saga annotation defines the class as a saga aggregate, which allows it to declare saga event handlers. A saga handler can optionally include the name of the property in the incoming event which has been previously associated with the saga. Additionally, Axon provides special annotations to mark the start and end of the saga.

In Eventuate the developer specifies the participant invocations, compensations and reply handlers in the saga definition. The platform provides a simple builder which constructs the saga step by step, providing the handlers for command and event processing.

Failure handling

In the LRA the saga compensating actions are defined as designated endpoints. If the initiating service cancels the LRA, the coordinator calls the compensation endpoints of the included participants.

In Axon, application developers need to know which events represent a failure in order to provide the correct event listeners. Tracking the progress of the execution, as well as the progress of the compensation, is also left to the developer.

Eventuate registers the compensation handlers in the saga definitions. Participants are allowed to send a special built-in command which indicates failure and therefore triggers the compensation of the previous actions.

Ordering of invocations

LRA and Eventuate invoke each participant in strict order. This approach expects each action to depend on every prior action, so the compensations are called in reverse order.

NOTE: Narayana LRA allows participants to modify this behavior by returning HTTP 202 Accepted.

Axon, on the contrary, does not control the ordering of invocations, so the programmer is allowed to send the compensation commands in any desired order.

Structure and failure recovery

As Axon and Eventuate are primarily CQRS based frameworks, they require some additional handling for the saga execution. In Axon the saga is still an aggregate, which means that it consumes commands and produces events. This may be an unwanted overhead when the application does not follow the CQRS domain pattern. The same applies to Eventuate Tram, but as the communication is hidden behind the messaging channels it does not force the programmer to follow CQRS. Both platforms record every event and command in a distributed event log, which guarantees that the saga completes, with eventual consistency, after a system failure.

The LRA, on the other hand, only requires the developer to implement processing and compensation endpoints. Processing is then handled by the LRA coordinator and is made transparent to the end users. Failures of the coordinator can be managed by replication and/or a transaction log. A participant failure is handled by a custom timeout on the saga invocation, after which the coordinator cancels the saga.

Conclusion

As we have shown, all three frameworks are production ready saga implementations. We have discussed the main advantages and drawbacks of each platform. As a part of my thesis I have created a basic example in all of the discussed frameworks. It is a simple ordering saga containing requests for shipment and invoice computation invoked on different services. In the next blog post I plan to describe the execution of this saga in every framework to discuss the main distinctions in greater detail.

- https://github.com/xstefank/axon-service
- https://github.com/xstefank/eventuate-service - pure CQRS solution
- https://github.com/xstefank/lra-service
- https://github.com/xstefank/eventuate-sagas - Eventuate Tram

Wednesday, November 8, 2017

Software Transactional Memory for the Cloud at µCon London 2017

I have just returned from µCon London 2017: The Microservices Conference. I was down there talking about our Software Transactional Memory implementation which our regular readers may recall was first introduced by Mark back in 2013. In particular I wanted to show how it can be used with the actor model features of Vert.x and the nice scaling features of OpenShift.

I have put the presentation slides in the narayana git repo so feel free to go and take a look to see what you missed out on. You can also access the source code I used for the practical demonstration of the technology from the same repository. The abstract for the talk gives a good overview of the topics discussed:

"Vert.x is the leading JVM-based stack for developing asynchronous, event-driven applications. Traditional ACID transactions, especially distributed transactions, are typically difficult to use in such an environment due to their blocking nature. However, the transactional actor model, which pre-dates Java, has been successful in a number of areas over the years. In this talk you will learn how they have been integrating this model using Narayana transactions, Software Transactional Memory and Vert.x. Michael will go through an example and show how Vert.x developers can now utilise volatile, persistent and nested transactions in their applications, as well as what this might mean for enterprise deployments."

And the demo that this abstract is referring to is pretty trivial but it does serve to draw out some of the impressive benefits of combining actors with STM. Here's how I introduced the technology:

  1. I showed how to write a simple Vert.x flight booking application that maintains an integer count of the number of bookings.
  2. I ran the application using multiple Vert.x verticle instances.
  3. I demonstrated the concurrency issues using parallel workloads.
  4. I fixed the concurrency issues by showing how to add volatile STM support to the application.

I put the without STM and the with STM versions of the demo code in the same repository as the presentation slides.
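
The kind of shared counter update that steps 3 and 4 deal with can be illustrated without Vert.x. The sketch below is not the Narayana STM API and is not the demo code from the repository: it uses `AtomicInteger.compareAndSet` from the JDK as a stand-in for the optimistic read, modify and retry cycle, and the `BookingCounter` and `BookingDemo` names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical booking counter shared by several workers.
class BookingCounter {
    private final AtomicInteger bookings = new AtomicInteger();

    // Optimistic update: read the value, compute the new one and retry
    // if another thread changed the counter in the meantime.
    void book() {
        while (true) {
            int seen = bookings.get();
            if (bookings.compareAndSet(seen, seen + 1)) {
                return;
            }
        }
    }

    int total() {
        return bookings.get();
    }
}

class BookingDemo {
    // Runs `workers` threads that each make `perWorker` bookings
    // and returns the final count.
    static int run(int workers, int perWorker) {
        BookingCounter counter = new BookingCounter();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perWorker; j++) {
                    counter.book();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

With a plain `int` field and an unsynchronized `count++` the final total can come out lower than the number of bookings made, which is the concurrency issue the demo draws out.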

I then showed how to deploy the application to the cloud using a single-node OpenShift cluster running on my laptop using Red Hat's minishift solution. For this I used "persistent" STM which allows state to be shared between JVMs and this gave me the opportunity to show some of the elastic scaling features of STM, Vert.x and OpenShift.

Tuesday, November 7, 2017

A comparison of Long Running Actions with a recent WSO paper

By now you will have hopefully heard about a new specification the Narayana team are working on in collaboration with the Eclipse MicroProfile initiative.

This specification is tailored to addressing the needs of applications which run in highly concurrent environments and need to ensure that updates to multiple resources have an atomic outcome, but where locking of the resource manager has an unacceptable impact on the overall throughput of the system. LRA has been developed using a cloud first philosophy and achieves its goal by providing an extended transaction model based on Sagas. It provides a set of APIs and components designed to work well in typical microservice architectures.

As you might expect, there are a number of other groups looking into these and similar use cases, with initiatives such as Eventuate.io [http://eventuate.io/] and Axon Framework [http://www.axonframework.org/]. Today I will provide a high-level comparison of the approach taken by the LRA framework with a paper published at the 2017 IEEE 24th International Conference on Web Services - “WSO: Developer-Oriented Transactional Orchestration of Web-Services”.

Web-Services Orchestration (WSO) [http://ieeexplore.ieee.org/document/8029827/] and LRA are both rooted in the compensating transaction models demonstrated in the Sagas paper [http://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf]. Even given their shared grounding in Sagas, there are a number of areas where differentiation can be seen, and it would be prudent for the developer to be familiar with these:

1. Ordering of invocations on participants in the transaction

LRA expects that each item is fully dependent on each prior event in the sequence and invokes these in strict order (in the absence of failure). In the case of failure, it uses a Saga-like approach to undo the executed work in reverse order:

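 // LRA completion pseudo-code:
 List<Participant> participants = lra.getParticipants();
 for (Participant participant : participants) {
  if (!participant.complete()) {
   // completion failed: undo the work in reverse order
   List<Participant> reversed = new ArrayList<>(participants);
   Collections.reverse(reversed);
   for (Participant toUndo : reversed) {
    if (!toUndo.compensate()) {
     // hand this to recovery manager but continue
    }
   }
   break;
  }
 }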

The LRA specification does allow participants to modify this behaviour by returning an HTTP "202 Accepted" status code during the completion call; the LRA coordinator will then periodically monitor the progress of that participant.

WSO on the other hand allows the transaction scoping to be declared in such a manner as to allow the concurrent execution of individual participants to be performed (pre-compensate). In the case that WSO detects a failure it then appears to use the same approach as LRA.

2. Idempotency

LRA recommends that all compensation and completion handler functions are fully idempotent. This is done to reduce the volume and complexity of the message exchange, and as a consequence makes the model consistent with the Saga principles. This may require the programmer to redevelop their various business operations to guarantee that idempotency property. An alternative approach is available where the handler functions themselves are not idempotent, but an @Status function is provided which can reliably report the current state of the participant and, in combination with the handler functions, achieves a similar effect.

WSO also does not force the programmer to develop their business logic with idempotency in mind. To achieve a consistent outcome, the developer is instead required to build a sophisticated operation with which WSO can probe the resource and try to infer what state the ongoing operation is in.
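
One common way to meet the idempotency recommendation is to remember which LRA ids a handler has already processed, so that a repeated call becomes a no-op. The sketch below is plain Java with hypothetical names (`RefundHandler`, `compensate`, a string id); it is not the LRA API, and a real participant would have to keep the set of processed ids in durable storage rather than in memory.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical compensation handler that tolerates repeated invocations.
class RefundHandler {
    private final Set<String> compensated = new HashSet<>();
    private int refundsIssued = 0;

    // Returns true if this call performed the compensation,
    // false if the same LRA id had already been compensated.
    synchronized boolean compensate(String lraId) {
        if (!compensated.add(lraId)) {
            return false; // repeated call: nothing left to undo
        }
        refundsIssued++;  // stands in for the real undo work
        return true;
    }

    synchronized int refundsIssued() {
        return refundsIssued;
    }
}

class IdempotencyDemo {
    public static void main(String[] args) {
        RefundHandler handler = new RefundHandler();
        handler.compensate("lra-1");
        handler.compensate("lra-1"); // a retry from the coordinator
        System.out.println(handler.refundsIssued());
    }
}
```

The coordinator can then safely retry the compensation call after a timeout or a crash, because the second and later calls do not repeat the refund.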

3. Structure

LRA allows a user to define a Long Running Action which is composed of several participants. Each participant then provides their operation to perform the desired update, plus compensation and confirmation handlers. Although WSO uses different terminology, structuring the transaction is semantically similar.

4. Developer-friendliness

Both LRA and WSO allow the developer to stage migration to their preferred Saga implementation. Out of the box both approaches allow the developer to continue to provide participants that contain only standard business logic, with a view to augmenting this code with the various additional routines when they are able to do so, although LRA does currently mandate the provision of a compensation handler.

5. Locking

As you might expect, both approaches eschew the use of strict resource manager locking between the various participants in their transactions and rely on the compensation framework to reconcile data integrity.

6. Orchestration

The WSO framework provides the ability to define strong orchestration of the various participants in an outer transaction. It allows a developer to declare a sequence of participant calls to be made as a result of interim results from the various participants in the orchestration. LRA facilitates a similar outcome through its provision of Nested Transactions; however, to achieve orchestration of the participants, LRA requires the developer's cooperation to identify and develop those conditional aspects.

In conclusion, as the field of microservice architectures progresses along the path to maturity, we are seeing multiple open source projects and vendors respond to the needs of their users by developing frameworks which facilitate atomic outcomes across multiple resources while relaxing some of the properties we normally expect in a transaction system.

Tuesday, September 26, 2017

JavaOne (and the latest release of Narayana!)

I am very pleased to share with you that Mark and I will be presenting at JavaOne in San Francisco.

You can find more details about our session over here:
https://events.rainfocus.com/catalog/oracle/oow17/catalogjavaone17?search=CON1725&showEnrolled=false

It will be on the topic of effective utilization of transactions in a microservice environment. We will demonstrate with theory and a live coding demo some best practices you can apply right now using Narayana.

To that point, it is with great pleasure that I also announce the release of Narayana 5.7.0.Final. If you are attending JavaOne and will be looking to follow along, why not download the release now from:
http://narayana.io/

The release notes are available over here:
https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310200&version=12335176

Monday, June 26, 2017

KIE Server updates to run with Narayana

I wanted to share some news that Narayana has now been integrated into the respected KIE BPM server.

You can read an article describing this integration over here.

It utilises the work we have been doing on Tomcat integration, quickstarts for which can be found in our repo:
https://github.com/jbosstm/quickstart/tree/master/transactionaldriver-and-tomcat
https://github.com/jbosstm/quickstart/tree/master/transactionaldriver-jpa-and-tomcat

Wednesday, June 14, 2017

Sagas and how they differ from two-phase commit

With all the talk about microservices and distributed transactions in the community these days, and in particular the use of the term Saga, it seems like a good time for a refresher on Sagas and how they differ from a 2PC transaction. As we know, transaction management is needed in a microservices architecture and Sagas can be a good fit for addressing that need. There are several ways[9] to handle and implement a Saga in such an environment, but the explanation of that topic is not the goal of this post.

This post instead introduces the concept of Saga transactions and compares it to ACID two-phase commit transactions. The two approaches have the same goal - to coordinate resources while the operations over them form one logical unit of work - but take quite different approaches to ensure that goal is met.

ACID

First, let’s refresh on the concept of an ACID transaction. ACID is defined as - atomicity, consistency, isolation and durability. These properties are provided by a system to guarantee that operations over multiple resources can be considered as a single logical operation (unit of work).
Let’s take it briefly one by one.
  • Atomicity signifies all or nothing. If one operation from the set fails then all others have to fail too. Looking from a different angle we can call the property abortability[1].
  • Consistency states that the system is consistent before the transaction starts and after it finishes. The definition of consistency varies from resource to resource. In database systems, we can consider it as not breaking any constraints defined by the primary key, foreign key, uniqueness etc. But being consistent in the broader sense means having the application in a consistent state. That is not maintained by the transactional system itself but by the application programmer. Of course, the transaction helps to defend it.
  • Isolation refers to the behaviour of concurrent operations. The ACID definition defines it as transactions being serializable - the system behaves as if all transactions were processed in a single thread, one by one. Such an implementation is not required and implementations differ widely[2]. Besides, such behaviour (even when only simulated) has performance implications. It’s usual that the isolation property is relaxed to comply with one of the isolation levels with a lower guarantee than serializability[3].
  • Durability means that if the transaction commits, all data is persisted, and even if the system crashes the data will be accessible after the restart.

With that definition of ACID transactions in mind, let’s now look at two-phase commit and Sagas.

Two-phase commit (2PC)

A well-known algorithm to achieve ACID transaction outcomes is the two-phase commit protocol. 2PC (part of a family of consensus protocols) serves to coordinate the commit of a distributed transaction, i.e. one that updates multiple resources. Those of you already familiar with Narayana will be well acquainted with a popular manifestation of this philosophy: JTA.
As its name suggests, the protocol works in two phases. The first phase is named ‘prepare’: the coordinator asks the participants whether they are ready to commit. The second phase is named ‘commit’: the coordinator commands the participants to commit and make their changes visible to the outer world. The coordinator commands a commit only if all participants voted for it. If any participant votes ‘abort’ then the whole transaction is rolled back on all participants, which means any change made on a participant during the transaction is discarded.
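
The two phases can be sketched in a few lines of plain Java. This is a deliberately simplified illustration and not how Narayana or JTA implement the protocol: the `Resource`, `TwoPhaseCoordinator` and `RecordingResource` names are hypothetical, there is no transaction log and no recovery, and on an abort vote the sketch simply asks every resource to roll back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical resource contract for this sketch.
interface Resource {
    boolean prepare();  // vote: true = ready to commit, false = abort
    void commit();
    void rollback();
}

class TwoPhaseCoordinator {
    // Returns true if the transaction committed, false if it rolled back.
    static boolean run(List<? extends Resource> resources) {
        // Phase 1: collect votes, stopping at the first abort vote.
        boolean allVotedCommit = true;
        for (Resource r : resources) {
            if (!r.prepare()) {
                allVotedCommit = false;
                break;
            }
        }
        // Phase 2: commit only if every participant voted for it.
        for (Resource r : resources) {
            if (allVotedCommit) {
                r.commit();
            } else {
                r.rollback();
            }
        }
        return allVotedCommit;
    }
}

// Test double that records the calls it receives.
class RecordingResource implements Resource {
    private final String name;
    private final boolean vote;
    private final List<String> log;

    RecordingResource(String name, boolean vote, List<String> log) {
        this.name = name;
        this.vote = vote;
        this.log = log;
    }

    public boolean prepare() { log.add(name + ":prepare"); return vote; }
    public void commit() { log.add(name + ":commit"); }
    public void rollback() { log.add(name + ":rollback"); }
}

class TwoPhaseDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean committed = TwoPhaseCoordinator.run(Arrays.asList(
                new RecordingResource("queue", true, log),
                new RecordingResource("db", false, log)));
        System.out.println(committed + " " + log);
    }
}
```

Note that every resource stays prepared, and typically holds its locks, from its `prepare` call until the coordinator reaches it in the second loop.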

For a better understanding of the process see details at our wiki at https://developer.jboss.org/wiki/TwoPhaseCommit2PC.

Saga

The Saga[4] pattern, on the other hand, works with units of work that can be undone. There is no commitment protocol included.
The original paper discusses updates to a single node database where such work is processed, but the notion can be applied to distributed transactions too.
A Saga consists of a sequence of operations, each of which could work with a resource. Changes made by an operation on a particular resource are visible to the outer world immediately. We can see it as just a group of operations (a.k.a. local transactions) which are executed one by one and grouped together by the Saga.
A Saga guarantees that either all operations succeed or all the work is undone by compensating actions. The compensating actions are not generically provided by a coordinator framework; instead, they are undo actions defined in business logic by the application programmer.
The Saga paper talks about three ways of handling a failure:
  • Backward recovery - when a running operation fails, compensating actions are executed for every previously finished operation. The compensating actions are executed in the reverse of the order in which the operations were run.
  • Forward recovery - when the running operation is aborted but in the following step, it’s replayed to finish with success. The benefit is that forward progress is ensured and work is not wasted even when the failure occurs - the particular operation is aborted, replayed and Saga continues from that operation forward.
    You don’t need to specify any compensation logic but you can be trapped in an infinite retry loop. This recovery is defined as the pure forward recovery in the paper[4].
  • Forward/backward recovery - for the sake of completeness, the paper[4] defines a way of combining forward and backward recovery. The application logic or the saga manager defines save-points. A save-point declares a place in the Saga processing from which operations can be safely retried. When the running operation fails, compensating actions are executed for the previously finished operations back to the defined save-point. From there, operations are replayed in an attempt to finish the Saga successfully.
    We can say that pure forward recovery defines a save-point after each successfully finished operation.
The undo actions for compensation and the replay actions of forward recovery are expected to be idempotent so that they can be retried multiple times. If there isn’t any special handling in the underlying transport, such as retaining results, it is up to the application developer to code the handler in an idempotent way. The saga manager then needs to be able to cope with the situation when a Saga participant responds that the operation was already done.
The approach of forward recovery could be very handy [8] (the talk references this approach at time 28:20) but we focus more on compensations (backward recovery).
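
Pure forward recovery amounts to replaying the failed operation until it succeeds. The sketch below shows that loop in plain Java; the `ForwardRecovery` class and the `runWithReplay` method are hypothetical names, the operation is assumed to be idempotent, and the attempt limit is an addition of this sketch to avoid the infinite retry loop mentioned above.

```java
import java.util.function.BooleanSupplier;

class ForwardRecovery {
    // Replays the operation until it reports success or the attempts run out.
    // Returns the number of attempts used, or -1 if every attempt failed.
    static int runWithReplay(BooleanSupplier operation, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (operation.getAsBoolean()) {
                return attempt;
            }
            // The failed attempt is considered aborted; the loop replays it.
        }
        return -1;
    }
}

class ForwardRecoveryDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical operation that fails twice before succeeding.
        int used = ForwardRecovery.runWithReplay(() -> ++calls[0] >= 3, 5);
        System.out.println("attempts used: " + used);
    }
}
```

A caller that receives -1 still has to decide what to do next, for example fall back to backward recovery, which is where save-points and compensating actions come in.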

To better outline the process, let’s take an example. Let’s say we have a message queue to which we want to send a message and a database into which we want to insert a record. If we want to group those operations, we know each of them is represented by a local transaction. For the purpose of this text we define a resource-located transaction as the counterpart of the local transaction, that is, a transaction managed by the resource itself (e.g. a database transaction or a message broker transaction). The Saga groups together business operations and defines a compensating undo action for each of them. When business logic sends the message, the local transaction is committed and the message is immediately available in the queue. When data is inserted into the database table, the local transaction is committed and the value can be read. When both business operations succeed, the Saga finishes.

Let’s try to elaborate a little bit:
  1. First, the saga manager starts the Saga.
  2. The business logic sends a message to the JMS queue. This business operation is attached to the Saga, which brings an obligation to define a compensation handler (a chunk of code which is executed in case the Saga needs to be aborted).
    Sending of the message is wrapped into a short local transaction which ends when the JMS message is acknowledged by the receiver.
  3. The business logic inserts a row into the database table - again, the operation is attached to the Saga, the SQL insert command is covered by a short local transaction and a compensation handler has to be defined to undo the insertion in case the Saga is aborted.
  4. If everything goes fine this is the end of the Saga and business logic can continue with another saga.
     

Let’s say a failure occurs while inserting a record into the database and we apply backward recovery.

  • As the database insertion fails, the ACID resource-located transaction reaches an error state and has to be rolled back by the resource. The error is returned to the Saga.
  • The saga manager is responsible for preserving data integrity. It does so by aborting the whole Saga, which means executing compensating actions for previously completed business operations. The compensating action for the database insertion is not needed as the local transaction has already been rolled back. But the compensating action for the sent JMS message is waiting to be executed.
  • The compensating action is inevitably bound to the business context, thus we have to define at least a simple one.
    Let’s assume that sending of the JMS message meant placing an order in a warehouse system. Now the compensating action is expected to undo it. Probably the easiest way to model such a cancellation is by sending a different message with a cancel command to the JMS broker queue. We expect there is a system listening to the messages which processes the real order cancellation later on.
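
The failure scenario above can be sketched in plain Java. The code uses no JMS or JDBC: a list of strings stands in for the message queue, the failing insert is simulated by an exception, and `SagaStep`, `SagaManager` and `OrderSagaDemo` are hypothetical names rather than part of any Narayana API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical saga step: a business operation plus its undo action.
class SagaStep {
    final Runnable action;
    final Runnable compensation;

    SagaStep(Runnable action, Runnable compensation) {
        this.action = action;
        this.compensation = compensation;
    }
}

class SagaManager {
    // Runs the steps in order. On a failure the previously finished steps
    // are compensated in reverse order (backward recovery).
    // Returns true if the saga completed, false if it was compensated.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> finished = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                finished.push(step);
            } catch (RuntimeException failure) {
                // The failed step is assumed to have rolled back its own
                // local transaction, so it is not compensated here.
                while (!finished.isEmpty()) {
                    finished.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}

class OrderSagaDemo {
    // Returns the messages that ended up in the simulated queue.
    static List<String> runFailingSaga() {
        List<String> queue = new ArrayList<>(); // stands in for the JMS queue
        SagaStep sendOrder = new SagaStep(
                () -> queue.add("order"),
                () -> queue.add("cancel-order"));
        SagaStep insertRow = new SagaStep(
                () -> { throw new IllegalStateException("insert failed"); },
                () -> { /* nothing to undo, the insert never committed */ });
        SagaManager.run(Arrays.asList(sendOrder, insertRow));
        return queue;
    }

    public static void main(String[] args) {
        System.out.println(runFailingSaga());
    }
}
```

The order message is never removed from the queue; the compensation adds a second message, which mirrors the point that the work of a Saga step is visible immediately and can only be undone by a further business action.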

Sagas and ACID

We consider two-phase commit to be compliant with ACID. It provides atomicity - all actions are either committed or rolled back; consistency - the system is in a consistent state before the transaction begins and after it ends; isolation - the isolation provided by resource-located transactions (i.e. locking in the database) prevents concurrent transactions from interfering with each other, so we can say that, except for heuristic outcomes, the data is as if serialized access was maintained. And of course, durability is ensured with transaction logs.

On the other hand, we consider Sagas to relax the isolation property. The outcome of each operation, each single resource-located transaction, is visible as soon as it ends. Of the other ACID properties, atomicity is preserved - the whole Saga either ends successfully or all the work is compensated; durability is preserved - all data is persisted at the end of the Saga; and consistency mostly depends on the application programmer, but at the end of the Saga the system state should be consistent. In fact, this is the same matter as for 2PC.

In the case of two-phase commit the resource-located transaction spans almost the whole lifetime of the global transaction. The impact on concurrency depends on the implementation of the underlying system but, as an example, in relational databases we can expect locks which restrict concurrent work over the same data (locks are acquired for the records being worked with and, depending on the isolation level, a lock could be acquired even for the whole table where the record resides; this is a simplification, as many databases use optimistic concurrency where locks are taken at prepare time).
On the other hand, a Saga commits each resource-located transaction immediately after the corresponding step in the business process ends. Locks are held only for that short time. Business logic inserts data into the database table, immediately commits the resource-located transaction and the locks are released. From this point of view, the Saga more readily facilitates concurrent processing.

Blending Sagas and ACID

One advantage of the two-phase commit approach is ease of use for the application programmer. It’s up to the transaction manager to handle all the transaction troubleshooting. The programmer cares only about the business logic - sending a message to the queue, inserting data into the database. Sagas, on the other hand, do require compensating actions to be created, and such actions have to be defined for each Saga in particular.

Saga is a good fit for long-lived transactions (LLT)[7] which are represented by the well-known example of booking a flight, a hotel and a taxi to the hotel in one transaction. We expect communication with third party systems over WebService or REST call. All such communication is “time-consuming” and using two-phase commit locking resources for the whole time of the transaction existence is not a good solution. Here the Saga is nice as it provides a more fine-grained approach to work with the transaction participants.

Some people have the impression that 2PC and Saga stand against each other. That’s truly not so. Each of them plays its role well and helps to solve particular use cases. Furthermore, 2PC transactions and Sagas can be deployed in such a manner as to benefit from both of their advantages.

The Saga (long running action) could define the top level unit of work. ACID transactions are a good fit as the steps (operations) of such a unit of work.


Where does Narayana fit in

As you might know, Narayana - the premier open source transaction manager - provides implementations of both of these transaction models, suitable for a wide variety of deployment environments.

Many of you are likely most familiar with our JTA 2PC implementation as used in JBoss EAP and the open source WildFly application server. That implementation is based on the X/Open specification[12]. For more information, you can refer to the Narayana documentation[10].
In terms of compensations, if you want to stick to standards, you can already use Narayana to achieve saga based transactions using the Narayana WS-BA implementation. However, Narayana also provides a modern compensating transactions framework based on CDI. For more information please refer to a great article from Paul about the compensating framework[5] or visit our Narayana quickstarts[6].


References
[1] https://www.youtube.com/watch?v=5ZjhNTM8XU8 (Transactions: myths, surprises and opportunities, Martin Kleppmann)
[2] https://wiki.postgresql.org/wiki/SSI (Serializable Snapshot Isolation (SSI) in PostgreSQL)
[3] https://en.wikipedia.org/wiki/Isolation_(database_systems) (Wikipedia, Isolation - database systems)
[5] https://developer.jboss.org/wiki/CompensatingTransactionsWhenACIDIsTooMuch (Narayana: Compensating Transactions: When ACID is too much)
[6] https://github.com/jbosstm/quickstart/tree/master/compensating-transactions (Narayana quickstart to compensating transactions)
[8] https://www.youtube.com/watch?v=xDuwrtwYHu8 (GOTO 2015, Applying the Saga Pattern, Caitie McCaffrey)
[10] http://narayana.io/documentation/index.html (Narayana documentation)
[11] https://www.progress.com/tutorials/jdbc/understanding-jta (Understanding JTA - The Java Transaction API)
[12] http://pubs.opengroup.org/onlinepubs/009680699/toc.pdf (The XA Specification - The Open Group Publications Catalog)

Wednesday, May 24, 2017

Narayana 5.6.0.Final released

The team are pleased to announce our latest release of Narayana - the premier open source transaction manager. As normal, the release may be downloaded from our website over here:
http://narayana.io/

The release notes for this version may be found over here:

This release brings with it a collection of improvements including the latest patches and feature work to improve our integration into Tomcat. The best way to get started with that is to take a look at our new quickstarts:

We also added an interesting Software Transactional Memory and vert.x example over here:

We have also managed to get CORBA JTS interop propagation working with Glassfish. You can read more about that in https://issues.jboss.org/browse/JBTM-2623. However, validating that recovery completes correctly is still in progress, so stay tuned for further details in the coming releases...