7th April 2021
Building a secure, fully managed cloud-based trading platform
Introduction
Adaptive specialises in building secure trading solutions for our clients in capital markets. Many of our clients are looking for a provider who can not only design and build a bespoke trading solution, but also run their platform as a managed service. Given the stringent security requirements of these markets, alongside the frequent need for a fully managed service, our clients require a partner with the skills and capabilities to build and run an extremely secure and resilient cloud-based trading platform.
Adaptive, through its Operate offering, is well positioned to provide a fully managed service to the capital markets industry to host and run their bespoke trading platforms. We have a team of experts spanning infrastructure, site reliability engineering (SRE), security, technical support and service management who are dedicated to this arm of the Adaptive service offering and who specialise in supporting trading systems. Adaptive has also developed a number of accelerator tools that allow us to provide a market-leading fully managed support and hosting service that saves clients time and money and minimises risk.
Adaptive’s Operate service allows our clients to retain ownership of the intellectual property (IP) of their bespoke solution, whilst removing the cost and operational overhead of managing the solution themselves. Retaining IP ownership allows clients to differentiate within their market, enabling them to compete for market share far more effectively than buying a managed solution from a traditional vendor. Such vendors maintain a single product offering and make bespoke development difficult, and client ownership of that development even more so. With Adaptive, clients can continue to innovate on their platform at their own pace, whilst we manage, host and secure it. Coupled with our accelerators, this offering differentiates us in the market and makes us a compelling choice for those who value differentiation through technology.
This post will address how the Adaptive team provides a cloud-based strategic platform that is highly secure, available and resilient.
Designing a secure cloud-based trading platform
When designing trading systems and financial applications for cloud deployments, one major concern is to ensure that data and transactions are properly protected against unauthorised access and manipulation. Sadly, security breaches of institutional systems are not uncommon and occasionally make the news. In these environments, security is typically implemented in the platform infrastructure through careful architectural design and heavily automated controls, rules and processes.
Designing an infrastructure that achieves an extremely high level of security in an Amazon Web Services (AWS) platform-as-a-service environment required a significant amount of research and development by our team.
The original objective was to develop a high-security infrastructure using only the existing services provided by AWS. During development we found that these services alone could not satisfy all of the required security attributes. For instance, at the time of development, the AWS Transfer service in private mode did not support IP-based restrictions, and external components could not be managed through automated processes executed from within a private virtual network. In addition, AWS CloudFront could not be connected to load balancers internal to a private platform network. Finally, we found incompatibilities in the handshaking between our single sign-on provider and AWS connectivity options.
Given the security controls necessary to achieve a highly secure infrastructure, and the limitations uncovered in several AWS services during the initial analysis, we devised a system architecture incorporating components not provided natively by AWS. This introduced additional interface points as potential attack vectors, so, as part of this project, protective mechanisms were developed to prevent or mitigate them.
Building the platform on the cloud
It is key to remember the principle of shared responsibility in cloud environments: Amazon is responsible for the physical and infrastructure security of the facilities and software behind AWS services, while as customers we must make the right decisions and use the resources made available to us securely. We built a secure platform for our clients by leveraging a combination of native AWS services, and added a further layer of security with custom and external services.
Access Management Best Practices
Starting with Identity and Access Management (IAM), we have developed a comprehensive set of roles and policies for users. A key element of a live production platform is to block all outbound access from compute resources and to prevent employees from having access to sensitive data. We defined a set of IAM policies following the “least privilege” approach to secure resources. User access is centrally managed through an external identity provider, and authentication and authorisation for managing the platform are enforced using role-based permissions, temporary security tokens and multi-factor authentication (MFA).
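As a rough illustration of this approach (not our actual policies; the bucket, role and account identifiers below are placeholders), the sketch shows a least-privilege policy that denies any request made without MFA, alongside an operator obtaining temporary credentials by assuming a role with an MFA token.

```python
"""Illustrative sketch: least-privilege IAM policy plus MFA-gated, temporary credentials."""
import json
import boto3

iam = boto3.client("iam")
sts = boto3.client("sts")

# Hypothetical policy: read-only access to a single platform bucket,
# and an explicit deny on everything when the request is not MFA-authenticated.
least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowScopedRead",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-trading-platform-data",
                "arn:aws:s3:::example-trading-platform-data/*",
            ],
        },
        {
            "Sid": "DenyWithoutMFA",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        },
    ],
}

iam.create_policy(
    PolicyName="example-platform-readonly",
    PolicyDocument=json.dumps(least_privilege_policy),
)

# Operators obtain short-lived credentials by assuming a role with an MFA token,
# rather than using long-lived access keys.
session = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/example-platform-operator",
    RoleSessionName="sre-session",
    DurationSeconds=3600,
    SerialNumber="arn:aws:iam::123456789012:mfa/example-operator",
    TokenCode="123456",  # one-time code from the MFA device
)
```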
Network Security
We use as many of the services offered by AWS as possible to deploy a secure network for our platform. To ensure the secure transport of data, communication is fully encrypted end to end, both with customers and between internal components. However, after deploying the virtual private network with a dynamic combination of scoped security groups, network access control lists and internal endpoints, we needed more visibility to detect malicious activity, brute-force attacks and suspicious behaviour. We therefore introduced a third-party product to enhance network analysis and provide traceability and auditing for our SRE team. The security controls we implement on the platform and the network are aligned to meet our clients’ control objectives.
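The snippet below is a minimal sketch of the kind of scoped security group described here, using placeholder VPC, CIDR and group names: inbound traffic is limited to HTTPS from an approved range, and the default allow-all egress rule is removed so that outbound access must be granted explicitly.

```python
"""Illustrative sketch: a tightly scoped security group with no default egress."""
import boto3

ec2 = boto3.client("ec2")

# Create a security group inside the platform VPC (placeholder VPC id).
sg = ec2.create_security_group(
    GroupName="example-edge-https",
    Description="Allow HTTPS only from an approved client range",
    VpcId="vpc-0123456789abcdef0",
)

# Scope ingress to TLS traffic from a single approved CIDR block;
# everything else is implicitly denied.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "198.51.100.0/24", "Description": "Approved client range"}],
        }
    ],
)

# Remove the default "allow all" egress rule so that outbound traffic must be
# explicitly whitelisted, in line with blocking outbound access by default.
ec2.revoke_security_group_egress(
    GroupId=sg["GroupId"],
    IpPermissions=[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
)
```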
Storage and encryption
Using unlimited S3 storage is easy; storing trade or customer information within secure boundaries is definitely harder. It requires encryption using KMS or CloudHSM; IP, service or geographic restrictions; robust storage policies (with deny-by-default statements); role-based access; and layers of network controls: a CDN, WAF rules, next-generation firewalls and load balancers. Where applicable, we enforce the write once, read many (WORM) principle, in which information, once written, cannot be modified. It is also mandatory to keep data encrypted at rest, whether on server volumes, in S3, in logs or in any other service processing and storing data.
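As an illustration of these controls (with placeholder bucket and key names, not our production configuration), the sketch below sets default KMS encryption, a deny-by-default statement rejecting non-TLS requests, and an Object Lock retention rule to enforce WORM behaviour.

```python
"""Illustrative sketch: an S3 bucket with KMS encryption, deny-by-default policy and WORM retention."""
import json
import boto3

s3 = boto3.client("s3")
bucket = "example-trading-platform-data"

# Object Lock must be enabled at creation time for WORM retention.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    ObjectLockEnabledForBucket=True,
)

# Encrypt everything at rest with a customer-managed KMS key by default.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-platform-key",
                }
            }
        ]
    },
)

# Deny-by-default statement: reject any request not made over TLS.
s3.put_bucket_policy(
    Bucket=bucket,
    Policy=json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "DenyInsecureTransport",
                    "Effect": "Deny",
                    "Principal": "*",
                    "Action": "s3:*",
                    "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                    "Condition": {"Bool": {"aws:SecureTransport": "false"}},
                }
            ],
        }
    ),
)

# WORM: objects written under this default retention cannot be modified
# or deleted until the retention period expires.
s3.put_object_lock_configuration(
    Bucket=bucket,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}},
    },
)
```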
Secure immutable infrastructure
Building a safe platform starts very early in the development lifecycle. To ensure the quality, consistency and security of the code, we employ a number of static code analysis tools and continuously inspect components for vulnerabilities and bad patterns. During the release cycle, application containers are scanned and published to secured repositories. We automate and standardise a golden operating system image that we distribute to accounts for consumption during the release. It is a standard SRE best practice to treat golden images as immutable objects and to manage the security baseline and software through a standard pipeline. Our process follows this best practice and makes patching straightforward: as soon as an important vulnerability is found, a new image is built and rolled out.
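The sketch below illustrates the consumption side of such a pipeline, under the assumption that golden images are identified by a tag (the tag key and instance type here are purely illustrative, not our actual pipeline): a release resolves the newest available image and launches fresh instances from it rather than patching servers in place.

```python
"""Illustrative sketch: resolving the latest golden image and deploying immutably from it."""
import boto3

ec2 = boto3.client("ec2")

# Find golden images owned by this account, identified by an assumed tag.
images = ec2.describe_images(
    Owners=["self"],
    Filters=[
        {"Name": "tag:ImageClass", "Values": ["golden-base"]},
        {"Name": "state", "Values": ["available"]},
    ],
)["Images"]

# Always deploy the newest image; older ones are kept only for rollback.
latest = max(images, key=lambda img: img["CreationDate"])

# Immutable deployment: replace instances with new ones built from the
# latest image rather than patching running servers in place.
ec2.run_instances(
    ImageId=latest["ImageId"],
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
)
```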
Surveillance
A safe environment and good practices are ineffective without proper monitoring in place. Visibility is everything. A production environment generates a huge number of logs, and the sheer volume of logs and metrics is difficult to digest without an infrastructure capable of intelligent analysis and correlation. The solution we implemented consumes streams of common AWS infrastructure logs, such as VPC Flow Logs, CloudTrail and load balancer logs. Combined with event-driven activities, the platform allows our team to properly manage and monitor the general health, strengths and weaknesses of the trading environment.
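As a small example of wiring one of these log streams into the pipeline, the sketch below (with placeholder VPC and bucket identifiers) enables VPC Flow Logs for the platform VPC with delivery to a central S3 bucket that the analysis tooling can consume.

```python
"""Illustrative sketch: enabling VPC Flow Logs delivered to a central audit bucket."""
import boto3

ec2 = boto3.client("ec2")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",  # capture both accepted and rejected traffic
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::example-platform-audit-logs/flow-logs/",
)
```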
To keep the platform in line with our automation and enforce policy compliance, we implemented a stateless rules engine that monitors activity against the infrastructure. This helps us prevent deviations caused by mistakes or malicious intent. It integrates tightly with serverless runtimes to provide real-time remediation and response with low operational overhead.
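A minimal sketch of this style of remediation is shown below. It assumes a Lambda function triggered by an EventBridge rule on CloudTrail “AuthorizeSecurityGroupIngress” events and simply reverts any security group rule that has been opened to the whole internet; it is illustrative rather than a copy of our rules engine.

```python
"""Illustrative sketch: serverless auto-remediation of world-open security group rules."""
import boto3

ec2 = boto3.client("ec2")


def handler(event, context):
    """Revert any ingress rule opened to 0.0.0.0/0 as soon as it is created."""
    request = event["detail"]["requestParameters"]
    group_id = request["groupId"]

    for permission in request.get("ipPermissions", {}).get("items", []):
        for ip_range in permission.get("ipRanges", {}).get("items", []):
            if ip_range.get("cidrIp") == "0.0.0.0/0":
                # Build the matching permission and remove it immediately.
                revoke_permission = {
                    "IpProtocol": permission["ipProtocol"],
                    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
                }
                if "fromPort" in permission:
                    revoke_permission["FromPort"] = permission["fromPort"]
                    revoke_permission["ToPort"] = permission["toPort"]
                ec2.revoke_security_group_ingress(
                    GroupId=group_id, IpPermissions=[revoke_permission]
                )

    return {"status": "checked", "group": group_id}
```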
Mitigating attack vectors
Below are some examples of the workarounds and improvements we used to overcome limitations in AWS services.
IP-based restrictions of the AWS Transfer Service
SFTP/SSH is a protocol widely targeted by bots and attackers, so when exposing it to end users we took the approach of delivering the service with an IP whitelist. We therefore developed a means to compensate for the AWS Transfer service’s lack of support for IP-based restrictions when the service is exposed on the internet. We changed the topology of the service to run only from the internal virtual private cloud (VPC). However, in its private setting, the Transfer service could only process IP restrictions when called from components internal to the VPC.
Consequently, calls originating from outside the private network were denied access. To allow IP-based restrictions on external calls, we incorporated a firewall into the platform architecture as a façade at the edge of the AWS network to filter incoming calls, which could then be securely passed on to the AWS Transfer service. IP-based restrictions and whitelisting are therefore handled by the firewall.
Since the commissioning of the platform, AWS has added an IP filtering capability to the public Transfer service by using security groups.
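For completeness, the hedged sketch below shows how such native IP filtering can be attached today, as we understand the capability (all identifiers are placeholders): the Transfer Family server endpoint is placed in the VPC with an internet-facing Elastic IP, and a security group carries the whitelist.

```python
"""Hedged sketch: Transfer Family server with IP filtering via a security group."""
import boto3

transfer = boto3.client("transfer")

transfer.create_server(
    Protocols=["SFTP"],
    EndpointType="VPC",
    EndpointDetails={
        "VpcId": "vpc-0123456789abcdef0",
        "SubnetIds": ["subnet-0123456789abcdef0"],
        # An Elastic IP makes the VPC-hosted endpoint reachable from the internet.
        "AddressAllocationIds": ["eipalloc-0123456789abcdef0"],
        # The security group carries the IP whitelist for inbound SFTP.
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)
```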
Securing the exchange of trading data over a WebSocket connection
Another requirement was to secure the exchange of trading data over a WebSocket connection (a bidirectional channel) between a client and the content delivery network (CDN) fronting an AWS-hosted server.
We proposed routing the channel through the AWS CloudFront service, which natively provides protection against distributed denial-of-service (DDoS) attacks and a web application firewall (WAF), to access both resources and the WebSocket gateway. However, CloudFront cannot be connected to load balancers internal to a private platform network. We therefore contemplated using public load balancers, but this path needed to be secured so that the rules enforced at the CloudFront level could not be bypassed.
We first tried a documented solution using whitelists of the IP addresses used by the content delivery network, dynamically updated through Lambda functions, to authorise connections between CloudFront and the load balancers. However, these IP addresses changed constantly, covered a very large range and could be shared with other AWS customers, rendering this approach impractical and insecure. This first approach was therefore abandoned, and we sought alternatives to restrict access to the public load balancers and minimise the attack surface by forcing access through CloudFront.
We experimented with an approach that injects a secure token into a custom HTTP header controlled by CloudFront and verifies it at the load balancer. This adequately prevented malicious users from accessing our public load balancers directly.
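The sketch below outlines this pattern with placeholder ARNs, header name and secret value: a listener rule on the public load balancer forwards only requests that carry the CloudFront-injected header, while the default action returns a 403 to anything that tries to bypass the CDN.

```python
"""Illustrative sketch: only forward requests carrying a CloudFront-injected secret header."""
import boto3

elbv2 = boto3.client("elbv2")

listener_arn = "arn:aws:elasticloadbalancing:eu-west-1:123456789012:listener/app/example/abc/def"

# Only requests carrying the secret header injected by CloudFront reach the platform.
elbv2.create_rule(
    ListenerArn=listener_arn,
    Priority=1,
    Conditions=[
        {
            "Field": "http-header",
            "HttpHeaderConfig": {
                "HttpHeaderName": "X-Origin-Verify",
                "Values": ["example-shared-secret"],
            },
        }
    ],
    Actions=[
        {
            "Type": "forward",
            "TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/example/123",
        }
    ],
)

# Requests that bypass CloudFront (and so lack the header) fall through to the
# listener's default action, which simply returns a fixed 403 response.
elbv2.modify_listener(
    ListenerArn=listener_arn,
    DefaultActions=[
        {
            "Type": "fixed-response",
            "FixedResponseConfig": {"StatusCode": "403", "ContentType": "text/plain"},
        }
    ],
)
```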
Conclusion
Throughout this project, the engineering team explored and experimented with various mechanisms to restrict unauthorised access to financial software systems deployed in cloud environments and render them as secure as possible. To achieve this objective, research and development activities were performed to integrate numerous security policies and controls into a single, coherent system. By partnering closely with our clients and interacting with them regularly, we continuously improve the platform and keep it aligned with the latest cloud security best practices.
As the team acquired more in-depth knowledge of the Amazon Web Services (AWS) platform-as-a-service environment, it uncovered several functional limitations. The team then added external components to overcome these limitations whilst still enforcing the highest possible levels of security. As a result, the team developed the knowledge and means to supplement AWS with external firewall and single sign-on components, and to concurrently manage deployments of mixed environments comprising both immutable infrastructure that must be deployed in seconds and components that require lengthy configuration. This led to an increased command of declarative configuration languages and infrastructure-as-code tools for automating cloud deployments. Even these tools were not sufficient for all of the required deployment operations and were supplemented by in-house development. Overall, the acquired knowledge will be directly applicable to future projects.
In conclusion, Adaptive succeeded in developing a cutting-edge, cloud-deployable platform for financial systems requiring extremely high levels of security. The services that we successfully integrated into a single platform through this project can now serve as a model for designing highly secure systems for our customers in the financial industry.
Marc-Antoine Latour
Head of Operate,
Adaptive Financial Consulting