22nd January 2026
Architecting 24/7 capital markets with high-availability sequencer designs
Financial markets are evolving towards always-on availability, driven by retail demand, global crypto and digital assets, and the operational requirements coming from accelerated settlement timeframes and globalisation of equities markets.
This article continues our exploration of modern distributed systems for capital markets by moving from design principles to concrete implementation choices for 24/7 markets. The on-demand recording, upon which this blog is based, includes detailed diagrams, timelines, and walkthroughs for deeper study.
What this blog covers:
- What does 24/7 mean, and why now?
- Impact on systems: change, upgrades and protocols
- Achieving 24/7 with sequencer architectures
- End-to-end flows and application protocols
- Upgrading services
- Advanced techniques of sequencer deployment
What does 24/7 mean, and why now?
The shift towards 24/7 capital markets is driven by changes in technology, market behaviour, and regulation. Traditionally, trading operated within limited hours, with clear downtime windows for maintenance, upgrades, and reporting. But today’s global markets are evolving rapidly towards round-the-clock trading.
Why 24/7 trading, and why now?
The structure of modern markets has been reshaped by the globalisation of equities, the rise of 24-hour macro volatility, and a surge in retail trading participation. Because markets are affected by geopolitical events that happen around the clock, not just when markets are open, investors increasingly need to be able to react quickly and at any time.
The cost of downtime, either planned or unplanned, can be significant to clients and operators alike. Regulatory initiatives such as DORA, T+1 settlement and SEC Regulation SCI, together with cloud partnerships such as CME-Google, Nasdaq-AWS and LSEG-Azure, require us to rethink the resiliency model of our systems so that they are cloud tolerant.
What 24/7 trading really requires
High availability is now a baseline requirement for market infrastructure. To meet the demands of a 24/7 trading environment, systems are required to be:
Highly available
- They must be available and functioning 100% of the time.
Resilient and fault tolerant
- They must perform correctly 100% of the time and maintain consistency across multiple distributed components.
- These components cannot disagree on state, such as a client order, or the order book.
- If a failure occurs, recovery should be fast, re-establish a consistent state and not disrupt normal operations.
Scalable
- Systems must scale with increased load, during times of market volatility, without degrading their availability or function.
Maintainable
- Infrastructure or components must be updated without affecting the availability of the whole system.
Impact on systems: change, upgrades and protocols
Traditionally, exchanges rely on daily resets, version cutovers, and batch windows. In 24/7 markets, these assumptions collapse. Achieving zero downtime transforms every part of the trading stack — operationally and architecturally.
Systems must support:
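- Zero downtime, i.e. 100% uptime, covering both planned updates and unplanned failures
- A design with a very large mean time between failures (MTBF) and a low mean time to recovery (MTTR)
- Targets such as 99.999% uptime (less than 30 seconds of downtime per month)
- Resiliency and fault tolerance that meet recovery point and recovery time objectives (RPO/RTO)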
- Live infrastructure changes: security or OS patches, container upgrades, JVM updates
- Business application updates
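As a quick check on the 99.999% figure: the downtime budget for an availability target is the unavailable fraction multiplied by the length of the period. The sketch below assumes a 30-day month by default; the helper name is our own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(availability: float, days: float = 30) -> float:
    """Maximum downtime allowed over `days` days at `availability` (e.g. 0.99999)."""
    return (1 - availability) * days * SECONDS_PER_DAY


# Compare common availability targets over a 30-day month.
for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%}: {downtime_budget_seconds(target):.1f} s per 30-day month")
```

At 99.999% this gives roughly 26 seconds for a 30-day month and about 27 seconds for a 31-day month, consistent with the "less than 30 seconds" target.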
Core to achieving these requirements is a consensus-based sequencer architecture combined with state machine replication, together with an application protocol that covers version upgrades, recovery protocols and control messages.
Fundamentally, you must design for failure, recovery and upgrades.
Achieving 24/7 with sequencer architectures
In traditional infrastructures, distributed failures are complex to handle. Microservices-based infrastructures often increase the failure surface area, making distributed recovery difficult to orchestrate without compromising system state.
How sequencers help achieve 24/7 availability
Sequenced architectures enable high availability and 24/7 operation by providing a foundation for building services that can tolerate and recover from failure.
- Consensus-based sequencers are themselves fault tolerant and can recover from failure.
- The sequencer allows processes to coordinate with each other, so new components can be upgraded or deployed with no loss of availability.
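The core idea can be sketched in a few lines: a sequencer assigns every input a position in a single, totally ordered log, and each replica applies that log in order with deterministic logic, so replicas cannot disagree on state. This is a minimal single-process illustration under our own assumptions; the class names are hypothetical, and it leaves out the consensus, persistence and networking that a real consensus-based sequencer provides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sequenced:
    seq: int        # position in the global ordered log
    payload: tuple  # e.g. ("NEW", order_id, qty) or ("CANCEL", order_id)


class Sequencer:
    """Hypothetical in-memory sequencer: assigns one total order to all inputs."""

    def __init__(self) -> None:
        self.log: list = []

    def submit(self, payload: tuple) -> int:
        msg = Sequenced(len(self.log) + 1, payload)
        self.log.append(msg)
        return msg.seq


@dataclass
class Replica:
    """Deterministic state machine: the same log always yields the same state."""
    name: str
    applied_seq: int = 0
    open_orders: dict = field(default_factory=dict)

    def apply(self, msg: Sequenced) -> None:
        # Refuse gaps or duplicates so a replica can never silently diverge.
        if msg.seq != self.applied_seq + 1:
            raise ValueError(f"{self.name}: expected seq {self.applied_seq + 1}, got {msg.seq}")
        kind, order_id, *rest = msg.payload
        if kind == "NEW":
            self.open_orders[order_id] = rest[0]
        elif kind == "CANCEL":
            self.open_orders.pop(order_id, None)
        self.applied_seq = msg.seq

    def catch_up(self, log: list) -> None:
        # Apply every entry after the last one this replica has seen.
        for msg in log[self.applied_seq:]:
            self.apply(msg)


sequencer = Sequencer()
sequencer.submit(("NEW", "o1", 100))
sequencer.submit(("NEW", "o2", 50))
sequencer.submit(("CANCEL", "o1"))

a, b = Replica("a"), Replica("b")
a.catch_up(sequencer.log)
b.catch_up(sequencer.log[:2])  # b lags behind (e.g. after a restart)...
b.catch_up(sequencer.log)      # ...and recovers by replaying from where it stopped
```

Recovery here is just "replay from the last applied position", which is why a lagging or restarted replica converges on exactly the same order book as its peers.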
Application Protocols
A protocol defines structure, ordering and behaviour
A protocol is not just a message format — it’s the rules by which systems behave and evolve. By definition, a protocol represents “a set of rules governing the exchange of data between 2 or more participants”.
These rules cover:
- Structure and format – the structure, encoding and decoding format of data
- Ordering – the order in which data is transmitted in each direction
- Behaviour – the changes to data and state that services make when processing messages
Upgrading Services
A key challenge in 24/7 environments is the requirement to manage change – to upgrade in a staged manner whilst retaining availability. For a period of time, various services will run older and newer versions concurrently, and protocols between services must be forward or backward compatible as a result.
Furthermore, a service upgrade may involve changes to both structure and data formats, which can break communication with clients still running older versions. Serialisation formats such as Protobuf provide mechanisms to manage message compatibility between versions, but they do not manage the other aspects of protocol modification, namely ordering and behaviour.

Therefore, a robust upgrade model includes:
- Old and new protocol versions running side by side, where any modification of structure, format, ordering or behaviour results in a new version of the protocol.
- Deterministic handover from V1 to V2 with upgrade messages that signal which paths to enable/disable.

In this example:
- protocolId – identifies the protocol group of a message
- protocolVersion – identifies the version of the protocol
- messageType – uniquely identifies a message type
- partitionId – allows routing of messages of the same type to different instances
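A header carrying these four fields can be encoded in a fixed binary layout, so every consumer can read the routing and version information before decoding the message body. The field widths and byte order below (16-bit protocolId, 8-bit protocolVersion, 16-bit messageType, 16-bit partitionId, little-endian) are assumptions for illustration; the article does not specify an encoding.

```python
import struct
from typing import NamedTuple

# Assumed layout, little-endian with no padding (7 bytes):
#   protocolId u16 | protocolVersion u8 | messageType u16 | partitionId u16
HEADER = struct.Struct("<HBHH")


class Header(NamedTuple):
    protocol_id: int       # protocolId
    protocol_version: int  # protocolVersion
    message_type: int      # messageType
    partition_id: int      # partitionId


def encode(header: Header, body: bytes) -> bytes:
    """Prefix the body with the fixed-size header."""
    return HEADER.pack(*header) + body


def decode(frame: bytes) -> tuple:
    """Split a frame into (Header, body) without interpreting the body."""
    header = Header(*HEADER.unpack_from(frame, 0))
    return header, frame[HEADER.size:]


frame = encode(Header(protocol_id=1, protocol_version=2, message_type=10, partition_id=3), b"body")
header, body = decode(frame)
# A consumer can route on header.partition_id and pick a V1 or V2 decoder
# from header.protocol_version before touching the body.
```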
Control messages are then used to coordinate behaviour between applications as part of the protocol design, for example:
- Take a snapshot
- Event markers – trading session start/end, date roll
- Enable/disable versions of other protocols
- Migrate state or activate new service or partition.
Advanced techniques of sequencer deployment
Once these basic principles and methods are in place, we can start to look at more advanced combinations of these techniques.
Business operations in 24/7 environments
Many operational aspects of trading, such as trading sessions, contract expiry and batch-processing windows, are tied to the lifecycle of deployments. In 24/7 environments, these must be modelled as part of the control protocol.
Here, control messages are sent to the sequencer by a scheduled trigger and consumed via the globally ordered log, in the same way as normal trading protocol messages such as a market order. Each component replica (in this example, the matching engines) consumes these messages, processes each control message deterministically and agrees on the actions to take.

Using control messages enables sequenced applications to coordinate key operational aspects of the system to behave in synchrony. In this example, a D message – which could be an End of Day event – causes all matching engines to cancel any orders with Time in Force = Day. This results in each replica evolving to an identical state after processing the control messages.
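The End of Day behaviour described above can be sketched as a matching engine that treats a control message as just another entry in the ordered log. The message shapes, the "D" type code and the "DAY"/"GTC" time-in-force values are hypothetical, and matching logic is omitted; the point is that cancellation is decided purely from log contents, so every replica reaches the same book.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Order:
    order_id: str
    side: str
    price: int
    qty: int
    tif: str  # time in force; "DAY" and "GTC" are assumed values


@dataclass
class MatchingEngine:
    book: dict = field(default_factory=dict)      # order_id -> Order (matching omitted)
    cancelled: list = field(default_factory=list)

    def on_message(self, msg: tuple) -> None:
        kind = msg[0]
        if kind == "NEW":
            order = msg[1]
            self.book[order.order_id] = order
        elif kind == "D":  # control message: End of Day
            # Deterministic: decided only from the book, iterated in insertion
            # order, with no wall-clock time or randomness involved.
            for order_id in [o.order_id for o in self.book.values() if o.tif == "DAY"]:
                del self.book[order_id]
                self.cancelled.append(order_id)


eod_log = [
    ("NEW", Order("o1", "BUY", 100, 10, "DAY")),
    ("NEW", Order("o2", "SELL", 101, 5, "GTC")),
    ("D",),
    ("NEW", Order("o3", "BUY", 99, 1, "DAY")),  # arrives after End of Day: survives
]

engines = [MatchingEngine(), MatchingEngine()]
for engine in engines:
    for msg in eod_log:
        engine.on_message(msg)
```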
Snapshots without latency
Similarly, your protocol can be designed so that control messages published to the sequencer trigger a snapshot.
A sequenced application must process messages on a single thread to guarantee deterministic behaviour. Executing snapshot commands across all replicas simultaneously therefore introduces systemic jitter: every node must pause message processing to capture its state, creating a synchronous bottleneck and increasing tail latency.
To reduce the latency impact of taking the snapshot in the processing thread, the snapshot operation can be delegated to a single replica. In this scenario, while one replica is taking a snapshot, the other replicas are able to continue to process the messages.
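One way to choose the snapshotting replica without any extra coordination is to derive it from the snapshot control message itself: every replica sees the same message at the same log position, so they all compute the same owner. The round-robin rule on the sequence number below is our assumption rather than the only option, and a real system would also need to skip replicas known to be down (with that membership information itself sequenced).

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SnapshotReplica:
    replica_id: int
    replica_count: int
    state: dict = field(default_factory=dict)
    applied_seq: int = 0
    snapshots: list = field(default_factory=list)  # (seq, copy of state)

    def on_message(self, seq: int, msg: tuple) -> None:
        kind = msg[0]
        if kind == "SET":
            _, key, value = msg
            self.state[key] = value
        elif kind == "SNAPSHOT":
            # Every replica computes the same owner from the same sequence number,
            # so exactly one replica pauses to copy its state; the rest keep going.
            if seq % self.replica_count == self.replica_id:
                self.snapshots.append((seq, copy.deepcopy(self.state)))
        self.applied_seq = seq


snapshot_log = [("SET", "a", 1), ("SNAPSHOT",), ("SET", "b", 2), ("SNAPSHOT",)]
snapshot_replicas = [SnapshotReplica(i, 3) for i in range(3)]
for seq, msg in enumerate(snapshot_log, start=1):
    for replica in snapshot_replicas:
        replica.on_message(seq, msg)
```

Because the snapshot records its log position, any replica can later recover from it by replaying only the entries that follow.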

Truncating logs and snapshots
Operating 24/7 means there are no maintenance periods in which to manage accumulated disk usage, so these operations must take place during system up-time.
To ensure that all necessary information is retained for recovery of processes in the event of a failure, a safe point for truncation needs to be identified.
Here, the control messages D1 and D2 are pushed into the sequencer to mark trading events, and snapshots S1 and S2 are taken. Where only the latest snapshot is required, earlier ones may be discarded asynchronously. Services retain all log entries from the latest snapshot onwards, and truncate them only once a subsequent snapshot has been taken.
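The truncation rule can be sketched as: find the latest snapshot, and everything in the log up to its position, along with any older snapshots, is no longer needed for recovery. This assumes each snapshot records the last log position it covers, that it is durable before truncation runs, and that recovery means "load the latest snapshot, then replay the entries after it"; the structures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str
    last_seq: int  # last log position included in the (durable) snapshot


def safe_truncation_point(snapshots: list) -> int:
    """Highest log position that is covered by the latest snapshot."""
    if not snapshots:
        return 0  # without a snapshot, nothing can safely be truncated
    return max(s.last_seq for s in snapshots)


def truncate(log: list, snapshots: list) -> tuple:
    """Return (retained log entries, retained snapshots).

    `log` holds (seq, message) pairs in sequence order.
    """
    point = safe_truncation_point(snapshots)
    latest = [s for s in snapshots if s.last_seq == point]
    retained_log = [(seq, msg) for seq, msg in log if seq > point]
    return retained_log, latest[-1:]


trading_log = [(1, "NEW"), (2, "D1"), (3, "NEW"), (4, "NEW"), (5, "D2"), (6, "NEW")]
taken = [Snapshot("S1", 2), Snapshot("S2", 5)]
retained_log, retained_snapshots = truncate(trading_log, taken)
# S1 and entries 1-5 can be discarded; recovery = load S2, replay entry 6.
```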

Canary deployments
Introducing new functionality presents a risk to existing business functions. Upgrades and deployments are prone to early-life failures, where issues are most likely to surface soon after the upgrade takes place.
In this instance, the partition mechanism can be used to deploy new functionality without affecting existing services.
Depending on the component deployed, this partitioning can take place over different segments. For example, a new matching engine could first be deployed to serve low-volume or test markets. Only once the change has been proven would it be rolled out to more critical markets.
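A minimal sketch of partition-based canary routing: each market maps to a partition, and only partitions explicitly enrolled in the canary are served by the new engine version. The market names, partition numbers, engine labels and the CANARY_ADD/CANARY_REMOVE control messages are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CanaryRouter:
    """Routes by partition; enrolled partitions go to the new engine version."""
    partition_of_market: dict                       # market -> partitionId
    canary_partitions: set = field(default_factory=set)

    def on_control(self, msg: tuple) -> None:
        # Enrolment changes arrive as sequenced control messages, so every
        # router replica switches at the same log position.
        kind, partition_id = msg
        if kind == "CANARY_ADD":
            self.canary_partitions.add(partition_id)
        elif kind == "CANARY_REMOVE":  # roll back if the canary misbehaves
            self.canary_partitions.discard(partition_id)

    def engine_for(self, market: str) -> str:
        partition_id = self.partition_of_market[market]
        return "engine-v2" if partition_id in self.canary_partitions else "engine-v1"


router = CanaryRouter({"TEST-MKT": 9, "LOW-VOL": 7, "MAJOR": 1})
router.on_control(("CANARY_ADD", 9))  # start with the test market
router.on_control(("CANARY_ADD", 7))  # then a low-volume market
# "MAJOR" stays on engine-v1 until the change has been proven.
```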

Upgrades in place
Upgrading without causing an interruption in service can be challenging, particularly without missing any input or adding latency to processing that input.
This can be achieved by embedding version information in the protocol and providing control messages that signal when upgrades take place.
In this example, V1 and V2 are deployed concurrently, and matching engine V2 supports both the V1 and V2 protocols, mirroring every aspect of the V1 protocol: format, ordering and behaviour. A control message signals the application upgrade: V1 ceases processing, since it cannot handle the messages that follow, and V2 consumes a snapshot of state and begins consuming the remaining messages. This in turn allows V1 to be decommissioned, completing a seamless handover.
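The handover can be sketched with two engine versions reading the same log. On the upgrade control message, V1 captures its state and stops, and V2, which still accepts V1-format messages, bootstraps from that state at that exact log position. The "UPGRADE" message name, the message layouts and passing the snapshot directly between objects are simplifying assumptions; in practice the snapshot would be written to and loaded from durable storage.

```python
import copy


class EngineV1:
    def __init__(self):
        self.positions = {}
        self.active = True
        self.handover_snapshot = None

    def on_message(self, seq, msg):
        if not self.active:
            return
        if msg[0] == "UPGRADE":  # control message: hand over to V2
            self.handover_snapshot = (seq, copy.deepcopy(self.positions))
            self.active = False  # V1 ceases processing at this log position
        elif msg[0] == "TRADE" and msg[1] == 1:  # V1 format: ("TRADE", 1, account, qty)
            _, _, account, qty = msg
            self.positions[account] = self.positions.get(account, 0) + qty
        else:
            raise ValueError(f"V1 cannot handle {msg!r}")


class EngineV2:
    def __init__(self):
        self.positions = {}
        self.active = False
        self.start_after = None

    def load_snapshot(self, snapshot):
        self.start_after, state = snapshot
        self.positions = copy.deepcopy(state)
        self.active = True

    def on_message(self, seq, msg):
        if not self.active or seq <= self.start_after:
            return
        if msg[0] == "TRADE" and msg[1] == 1:  # still accepts the V1 format
            _, _, account, qty = msg
        elif msg[0] == "TRADE" and msg[1] == 2:  # V2 format adds a price (assumed)
            _, _, account, qty, _price = msg
        else:
            return
        self.positions[account] = self.positions.get(account, 0) + qty


upgrade_log = [
    (1, ("TRADE", 1, "acct-A", 10)),
    (2, ("TRADE", 1, "acct-B", 5)),
    (3, ("UPGRADE",)),
    (4, ("TRADE", 2, "acct-A", -3, 101)),
    (5, ("TRADE", 1, "acct-B", 2)),
]
v1, v2 = EngineV1(), EngineV2()
for seq, msg in upgrade_log:
    v1.on_message(seq, msg)
    if not v1.active and not v2.active:
        v2.load_snapshot(v1.handover_snapshot)  # V1 can now be decommissioned
    v2.on_message(seq, msg)
```

No input is skipped or applied twice: V1 owns everything up to the upgrade position and V2 owns everything after it.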

Deploying new services
Systems running 24/7 need to deploy upgrades intra-day. Deploying new services poses several practical challenges: from which point in the sequenced log should the new service start, what should its initial state be, and how is that state coordinated between different replicas?
Replicas of a service must start from the same position in the log to remain consistent with each other.
One approach is to start from the snapshot of an existing service and use its state to bootstrap the new service. Alternatively, a control message (“N” in this example) may be used to define the new start point.
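Both options reduce to the same rule: every replica of the new service derives its starting state and starting position from the log itself, so they all agree. A sketch under that assumption, where the hypothetical "N" control message carries the name of the service being started and "SET" stands in for ordinary business messages:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NewServiceReplica:
    service: str
    state: dict = field(default_factory=dict)
    start_seq: Optional[int] = None  # first log position this replica applies

    def bootstrap_from_snapshot(self, snapshot_seq: int, snapshot_state: dict) -> None:
        # Option 1: reuse an existing service's snapshot and resume after it.
        self.state = dict(snapshot_state)
        self.start_seq = snapshot_seq + 1

    def on_message(self, seq: int, msg: tuple) -> None:
        if self.start_seq is None:
            # Option 2: ignore history until the "N" message naming this service.
            if msg[0] == "N" and msg[1] == self.service:
                self.start_seq = seq + 1
            return
        if seq < self.start_seq:
            return
        if msg[0] == "SET":
            _, key, value = msg
            self.state[key] = value


service_log = [(1, ("SET", "x", 1)), (2, ("N", "risk")), (3, ("SET", "y", 2)), (4, ("SET", "x", 3))]

# Option 2: two replicas replay the same log and start at the same "N" marker.
svc_a, svc_b = NewServiceReplica("risk"), NewServiceReplica("risk")
for seq, msg in service_log:
    svc_a.on_message(seq, msg)
    svc_b.on_message(seq, msg)

# Option 1: bootstrap from an existing service's snapshot taken at position 3.
svc_c = NewServiceReplica("risk")
svc_c.bootstrap_from_snapshot(3, {"x": 1, "y": 2})
for seq, msg in service_log:
    svc_c.on_message(seq, msg)
```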

Upgrading the sequencer itself
We’ve shown that sequencer architectures provide powerful mechanisms for building always-on services with zero message loss, and they also allow the sequencer itself to be upgraded without downtime.
Although uncommon, a sequencer upgrade can be completed in seconds with the Raft consensus algorithm by upgrading nodes in a rolling fashion, starting with the followers and ending with the leader. Restarting the leader triggers a new election, prompting sequencer clients to reconnect to the newly elected leader.
In some instances it may be desirable to buffer messages in flight for resubmission to the sequencer.
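Two parts of this are easy to sketch: the order in which nodes are upgraded, and a client-side buffer that keeps every message until the sequencer acknowledges it, so anything in flight during the leader change can be resubmitted. This is a simplified illustration; it assumes the sequencer deduplicates resubmissions by a client-assigned (client_id, client_seq) pair held in its replicated state, which is our assumption rather than something Raft provides by itself.

```python
from collections import OrderedDict


def rolling_upgrade_order(nodes: list, leader: str) -> list:
    """Followers first, current leader last (its restart triggers an election)."""
    return [n for n in nodes if n != leader] + [leader]


class SequencerClient:
    def __init__(self, client_id: str):
        self.client_id = client_id
        self.next_seq = 1
        self.unacked = OrderedDict()  # client_seq -> payload, in send order

    def send(self, payload, transport: list) -> int:
        client_seq = self.next_seq
        self.next_seq += 1
        self.unacked[client_seq] = payload  # buffered until acknowledged
        transport.append((self.client_id, client_seq, payload))
        return client_seq

    def on_ack(self, client_seq: int) -> None:
        self.unacked.pop(client_seq, None)

    def on_new_leader(self, transport: list) -> None:
        # Resubmit every unacknowledged message, in its original order.
        for client_seq, payload in self.unacked.items():
            transport.append((self.client_id, client_seq, payload))


def accept_once(seen: set, client_id: str, client_seq: int) -> bool:
    """Sequencer-side dedup (assumed): True only the first time a message is seen."""
    key = (client_id, client_seq)
    if key in seen:
        return False
    seen.add(key)
    return True


old_leader, new_leader = [], []
client = SequencerClient("gw-1")
client.send("order-1", old_leader)
client.send("order-2", old_leader)
client.on_ack(1)                  # only order-1 was acknowledged before the restart
client.on_new_leader(new_leader)  # order-2 is resubmitted to the new leader

seen = set()
first = accept_once(seen, "gw-1", 2)
duplicate = accept_once(seen, "gw-1", 2)  # a second copy would be dropped
```

Deduplication matters because the old leader may have committed order-2 even though its acknowledgement was lost; without it, resubmission could sequence the same order twice.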

Conclusion: Building the future of resilient, 24/7 market infrastructure
The move to 24/7 trading is accelerating. Sequencers, state-machine replication, and versioned protocols offer a viable blueprint for achieving these goals. Combined with advanced operational techniques — snapshotting, canary deployments, state migration and rolling upgrades — capital markets can begin to operate continuously without sacrificing fairness, determinism, or resilience.
