Abstract
We see more and more clients who want to move their trading platforms to Cloud providers. Many of those platforms are distributed systems that rely on UDP multicast messaging, which, at the time of publication, is not supported by Cloud providers. There is therefore no straightforward migration path: the messaging layer needs to be redesigned to work on the Cloud.
In this paper, we will take a look at Aeron, a low latency messaging solution which we have used with great success on the Cloud. The target audience for this paper is CIOs, CTOs, architects and developers who want to understand the limitations of traditional messaging systems when running on the Cloud and need an alternative solution to migrate a platform to the Cloud or for a new build.
Introduction
Traditionally many exchanges and trading firms, buy side or sell side, have used UDP multicast messaging systems as the backbone of their infrastructure. UDP provides significant advantages over TCP in terms of throughput and latency, especially in scenarios where multiple systems need to listen to the same stream of data, such as quotes, orders, trades, etc. This is because, with UDP multicast, the message fan out happens at the network level and therefore is significantly more efficient than iterating through a list of TCP sockets to publish to. The result is more consistent latency, which is often more important than the lowest possible latency and means all components, and participants, receive the data ‘at the same time’ (fairness).
Products such as 29West LBM (now Informatica Ultra Messaging), Tibco Rendezvous, Tibco FTL and others provide reliable messaging over UDP and have been widely adopted by market participants. Most of the systems we come across with our clients run on one of these products.
They work well until the institutions want to move their platform to the Cloud.
There are several reasons to do this:
- Cheaper in the cloud: Outsourcing the operation and maintenance of servers and data centres to Public Cloud providers is generally considered cheaper. This is especially true where the infrastructure is only needed at certain points in the day. Clients realise that managing their own infrastructure is, in most cases, not a competitive advantage
- Note: Most of these advantages only come with scale, so any migration programme usually aims to incorporate all services. This causes a knock-on issue, particularly for Trading systems, where an application which, due to its architecture or use case, is not best suited to running on Public Cloud must be migrated regardless. This is usually due to the large increase in the maintenance cost of a single application remaining on its own dedicated infrastructure post migration, as there is no longer a large number of applications to spread the generic support costs across
- Responsiveness: Institutions have a limited number of points of presence compared to major Cloud providers, so moving to the Cloud gets their systems, or the edge of their systems, closer to their clients, providing those clients with a better user experience
- Time to Market: Clients want to release faster, incrementally test features and leverage the automation tools available in Cloud environments
- Security: The investment in Security made by Cloud providers is larger than institutions can achieve themselves, and institutions lack the specialist skills to keep on top of security threats
- Minimal upfront investment: When launching a new business or trialling a new product, typically one doesn’t want to pay a big upfront cost for infrastructure until it is proven, so a pay-as-you-go model makes more sense
- Access to emerging technology: Much of the R&D in technologies such as artificial intelligence has been undertaken by the cloud vendors, and they generally make these technologies available only on their Public Cloud environments
Financial institutions face many challenges when trying to move to the Cloud, e.g. regulatory approval, network costs, data provenance etc. For front office systems, the lack of UDP multicast is one of these challenges, and it becomes a very big one when the whole of your existing platform is based on such a messaging architecture.
Some of the products which market participants use today support UDP unicast, which is available on the Cloud, but it significantly limits the functionality of those systems as they’ve been designed with multicast in mind.
Clients also reach out to us for alternative messaging systems because of a byproduct of cloud deployment: easy scalability. In moving to the Cloud, some of our clients are looking to leverage the ability to scale easily, either as loads increase for a product or service, or to move into new markets. This flexibility is a double-edged sword: whilst it is desired, it can cause a drastic increase in the number of instances clients are running. For instance, if they want to establish more points of presence for their clients, they have to deploy their system in (many) more regions. Since most currently installed messaging systems are licensed on a per-core or per-node basis, the cost of operating this way can increase significantly, and clients ask our advice on a more cost-effective solution.
Over the past few years, we have been using a messaging solution called Aeron and we believe that it is one of the best options available in the market today for running front office trading systems on the Cloud.
We’ll first give some background on Aeron, provide an overview, and then see why it is a great solution for the Cloud.
History of Aeron
Aeron is the brainchild of two world class technologists:
- Todd Montgomery is a networking veteran who has researched, designed, and built numerous protocols, messaging-oriented middleware systems, and real-time data systems. He has completed research for NASA, contributed to the IETF and IEEE, and co-founded two startups; one of them was 29West where he held the position of CTO. He currently works as an independent consultant on high performance systems and is active in several open source projects.
- Martin Thompson is a Java Champion with over two decades of experience building complex and high-performance computing systems. He is most recently known for his work on Aeron and SBE. He was previously co-founder and CTO of LMAX, where he created the Disruptor. Prior to LMAX, Martin worked for Betfair and for three different content companies wrestling with the world’s largest product catalogues. He was also a lead on some of the most significant C++ and Java systems of the 1990s in the automotive and finance domains.
They started working together in 2013 to develop a reference implementation of Simple Binary Encoding (SBE), a high-performance encoding mechanism specified by the FIX committee. Following the success of this project, their sponsor at the time engaged them to develop a new messaging system.
The design of Aeron started in early 2014 and is based on a set of design principles which have not changed. Aeron is designed for ultra low latency messaging over UDP multicast and unicast and offers a large set of functionalities and a high degree of flexibility.
Adaptive’s involvement
Adaptive have been involved in this journey from the start. We initially ported SBE to C#, then did the same for Aeron, and we have ongoing input into the Aeron roadmap to create features for our clients.
We will now provide an overview of Aeron and its core features.
Aeron Overview
Open source
Aeron is an open source product licensed under Apache 2.0. That’s a major difference compared to competing products in the space. Aeron is not a black box; you can look at the code, which is incredibly useful when troubleshooting issues or trying to take advantage of its design.
Brokerless
Aeron is a brokerless, i.e. peer-to-peer, messaging system. Services interact by sending messages point-to-point using a publish and subscribe model, without a central broker. Adaptive have worked within many banks and used all the mainstream messaging middleware available in the industry. Experience has taught us that peer-to-peer messaging systems are more flexible than broker-based messaging systems and require less operational overhead. You don’t need additional servers (or appliances), which are generally organised as shared infrastructure and become an organisational bottleneck for development teams.
Aeron Driver
The Aeron driver is responsible for sending and receiving data on the wire. The driver is available in C and Java and can run in process, or outside of the application as a separate process. The application communicates with the Aeron driver over IPC (inter process communication) through a small Aeron client API. As of today, Aeron has client APIs for Java, C++, and .NET. Additional languages can be added with minimal effort.
Aeron Archive
Aeron Archive is another key component of Aeron. It allows streams of messages to be recorded to disk very efficiently and then replayed. This is typically used to store market data or to offer a higher level of reliability for critical message flows such as orders or trades. Aeron Archive is very efficient and with NVMe SSDs can record streams without slowing down the producer.
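The record-and-replay idea described above can be illustrated with a minimal sketch. The Python below is not Aeron Archive's API or its on-disk format: the length-prefixed file layout and the names `record` and `replay` are assumptions made purely for illustration. It shows only the principle that a stream is appended to disk sequentially and can later be replayed from any recorded byte position.

```python
import os
import struct
import tempfile

# Hypothetical on-disk framing: a 4-byte little-endian length, then the payload.
HEADER = struct.Struct("<I")


def record(path, messages):
    """Append messages to the recording and return the end position in bytes."""
    with open(path, "ab") as recording:
        for message in messages:
            recording.write(HEADER.pack(len(message)))
            recording.write(message)
        return recording.tell()


def replay(path, from_position=0):
    """Yield recorded messages in order, starting at a recorded byte position."""
    with open(path, "rb") as recording:
        recording.seek(from_position)
        while True:
            header = recording.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # reached the end of the recording
            (length,) = HEADER.unpack(header)
            yield recording.read(length)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "orders.rec")
    checkpoint = record(path, [b"order-1", b"order-2"])
    record(path, [b"order-3"])
    print(list(replay(path)))              # the full stream
    print(list(replay(path, checkpoint)))  # only what arrived after the checkpoint
```

Writing sequentially and addressing the stream by position is what lets a late-joining or restarted consumer catch up from a known point before switching to the live stream.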
Aeron Cluster
Aeron Cluster is the most recent addition to Aeron and it greatly differentiates Aeron from competing products. It enables the implementation of high performance, fault tolerant transactional services such as matching engines, market model engines (RFQ, RFS, IOIs, auctions, etc) and many other critical services which require a high level of throughput and high availability. Aeron Cluster is a game changer for building systems on the Cloud, where VMs can disappear at any point in time and highly available setups are a must.
Aeron Cluster is a big topic on its own that we will cover in a separate article.
Aeron Features
Available transports
Aeron supports multiple transports, and has modern, relevant features:
- Inter Process Communication (IPC): an extremely fast communication medium for process-to-process or thread-to-thread communication on the same host
- UDP multicast: not available on the Cloud, as previously mentioned
- UDP unicast: available on the Cloud
- Multi Destination Cast (MDC): an emulation of multicast over unicast streams which allows a publishing application to send a message once to an MDC address listened to by one or more subscribers. This is the solution clients should use to mimic UDP multicast in Cloud applications
- Multi Destination Subscription: allows a subscription to have multiple endpoints. This is useful for transitioning seamlessly from archive streams to live streams, or for subscribing to two identical streams to achieve redundancy with zero downtime
- Aeron is the only UDP message transport on the market with native support for flow and congestion control. Congestion control is an optional feature that can be required in Cloud environments
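To make the Multi Destination Cast idea from the list above more concrete, here is a minimal sketch of multicast emulated over unicast, written with plain Python sockets on the loopback interface. It is not Aeron code and does not use Aeron's wire protocol: the `FanOutPublisher` and `Subscriber` classes, the `JOIN` registration datagram and the sample payload are all hypothetical. The application offers a message once, and the publisher sends one unicast datagram to each subscriber that has registered on its control address.

```python
import socket


class FanOutPublisher:
    """Hypothetical publisher that emulates multicast over UDP unicast."""

    def __init__(self, host="127.0.0.1"):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, 0))  # port 0 lets the OS pick a free port
        self.sock.settimeout(1.0)
        self.destinations = []

    @property
    def control_address(self):
        return self.sock.getsockname()

    def accept_registration(self):
        """Read one JOIN datagram and remember the sender as a destination."""
        data, address = self.sock.recvfrom(64)
        if data == b"JOIN" and address not in self.destinations:
            self.destinations.append(address)

    def offer(self, payload):
        """Send the payload to every destination; return the number of sends."""
        for address in self.destinations:
            self.sock.sendto(payload, address)
        return len(self.destinations)

    def close(self):
        self.sock.close()


class Subscriber:
    """Hypothetical subscriber that registers itself with the publisher."""

    def __init__(self, control_address, host="127.0.0.1"):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, 0))
        self.sock.settimeout(1.0)
        self.sock.sendto(b"JOIN", control_address)

    def poll(self):
        data, _ = self.sock.recvfrom(2048)
        return data

    def close(self):
        self.sock.close()


def demo(subscriber_count=3):
    publisher = FanOutPublisher()
    subscribers = []
    try:
        for _ in range(subscriber_count):
            subscribers.append(Subscriber(publisher.control_address))
            publisher.accept_registration()
        sends = publisher.offer(b"EURUSD 1.0842")  # made-up sample quote
        return sends, [subscriber.poll() for subscriber in subscribers]
    finally:
        publisher.close()
        for subscriber in subscribers:
            subscriber.close()


if __name__ == "__main__":
    print(demo())
```

The sketch also shows the trade-off: the fan-out moves from the network to the sender, so the cost of publishing grows with the number of subscribers.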
Monitoring and visibility
When designing distributed systems, visibility is absolutely key: distributed systems are complex, things go wrong and you need tools to troubleshoot and understand the root cause of issues. Aeron provides a large set of metrics offering very detailed visibility into what different components are doing. This does not affect its performance: you can monitor Aeron to a very fine-grained level without slowing it down. Aeron also consistently uses single-threaded agents, which are simple to reason about and make the system much easier to debug than a ‘typical’ multi-threaded code base where threads interact in ways that are hard to reason about and impossible to predict.
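The single-threaded agent pattern mentioned above can be sketched as a duty cycle: one thread repeatedly asks the agent to do a unit of work and updates counters as it goes. The Python below is an illustrative sketch, not Aeron's implementation. The `EchoAgent` class, the counter names and the `run` function are hypothetical, and the counters are held in a plain dictionary for simplicity.

```python
import time
from collections import deque


class EchoAgent:
    """Hypothetical agent: all of its state is touched by a single thread."""

    def __init__(self, inbox, counters):
        self.inbox = inbox
        self.counters = counters

    def do_work(self):
        """Process at most one message and return the amount of work done."""
        if not self.inbox:
            return 0
        message = self.inbox.popleft()
        self.counters["messages_processed"] += 1
        self.counters["bytes_processed"] += len(message)
        return 1


def run(agent, counters, max_idle_cycles=3):
    """Duty cycle: keep calling do_work, backing off when there is nothing to do.

    A real agent would loop until shut down; this sketch stops after a few
    consecutive idle cycles so that it terminates.
    """
    idle_cycles = 0
    while idle_cycles < max_idle_cycles:
        if agent.do_work() == 0:
            idle_cycles += 1
            counters["idle_cycles"] += 1
            time.sleep(0)  # yield the CPU; a stand-in for an idle strategy
        else:
            idle_cycles = 0


if __name__ == "__main__":
    counters = {"messages_processed": 0, "bytes_processed": 0, "idle_cycles": 0}
    run(EchoAgent(deque([b"abc", b"de"]), counters), counters)
    print(counters)
```

Because only one thread mutates the agent's state, no locks are needed, and updating a counter is a simple increment, which is why this style of instrumentation can be very cheap.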
On the roadmap
Most low latency messaging systems do not provide encryption: they have been designed to run on closed networks and the overhead of encryption is generally too high to make it practical. One of the upcoming features for Aeron is encryption and, thanks to its design, Aeron will be able to provide encryption with minimal performance overhead.
Kernel bypass is another area Aeron will likely be focusing on. Kernel bypass is traditionally used by low latency trading systems to reduce jitter and latency and to increase throughput. DPDK is becoming the standard in this space. Initially developed by Intel, it was open-sourced in 2013 and became a Linux Foundation project. Network adaptor vendors widely support DPDK and it is supported on Azure, Google Cloud and AWS.
Front Office Trading Systems on the Cloud
Cloud providers are putting tremendous effort into improving their network stack and we have seen a huge increase in bandwidth during the last few years. What started as 1GigE a few years ago is now 10 to 25GigE and they are lining up their infrastructures to support 50 and 100GigE. Azure, AWS and GCP have been rolling out larger pipes but also enhanced network adapters which significantly reduce CPU overhead and jitter, providing more consistent latency.
From a performance perspective, the Cloud is ready to onboard front office real-time trading systems such as Single & Multi Dealer Platforms and electronic systems which provide human-to-human workflows or human-to-machine workflows.
The messaging system is a critical part of these systems. Unlike a lot of the older messaging platforms on the market, Aeron has been designed with the Cloud in mind. In all of the cases we have seen, we find Aeron to be an extremely good fit: it is open source, free to use and extremely fast (as fast as, if not faster than, proprietary solutions), and with the MDC feature it provides a solution that, whilst not as fast as multicast, enables clients to migrate existing platforms, currently using multicast, to the Cloud.
Adaptive’s Hydra Platform
Today, some of our clients are using Aeron in production on the Cloud. Given the success we are seeing with those implementations, we have decided to standardise our offering around Aeron and we work very closely with the maintainers.
As part of our work on Aeron, we provide our clients with support for Aeron throughout their journey from inception to production. Through this experience and the experience of working with Aeron on our own projects, we have developed other features that help clients to deploy these solutions in an Enterprise environment. We have built them into a set of components we call our Hydra Platform.
See weareadaptive.com/hydra for more information.
Moving to the Cloud?
If you are planning to move to the Cloud but feel locked into a multicast messaging system, or face other technical challenges, we are here to help. Please feel free to get in touch!
Acknowledgements
I would like to thank the following people who kindly agreed to review this article: Gareth Richardson, Martin Thompson, Kevin Covington, Ian Green, John Marks, Matt Barrett, Shaun Laurens and James Kirkland.
Olivier Deheurles
I am the CTO and a co-founder of Adaptive. I’ve been designing and building real-time trading systems for more than ten years. I have worked on several open-source projects including Disruptor, Simple Binary Encoding, and Aeron.
Adaptive
The Real-time trading experts. We are a software consultancy specialising in designing and building real-time trading systems for financial and commodity markets with offices in London, Barcelona, Montreal, and New York.