1st February 2024

The Software Test Pyramid in Action

An in-depth practical guide to software test automation for modern web applications in 2024

Challenges in Testing Capital Markets Software

Automating the software testing of a capital markets application in a truly end-to-end manner is challenging, because such applications depend on real-time data from financial markets, which changes dynamically in unpredictable ways. Some features must behave differently when the market is open versus when it is closed, and a great deal of behavior depends on the state of the markets and on events outside of our control.

Prevalent Models for Software Test Automation

A lot has been written over the past decade or so about The Test Pyramid, AKA The Testing Pyramid. There are interesting debates about the various “shapes” that may be more appropriate for your body of tests than a pyramid (triangle, ice cream cone, honeycomb, trophy, blob, pizza, Dorito, crab, and diamond are among the proposed alternatives). If you’re not yet familiar with the literature on the topic, here are some excellent articles we recommend:

Having read all of the above, you may have found much of the material abstract and theoretical. You might also be wondering whether the software testing tools and methods mentioned in some of the older articles are still relevant today in our rapidly evolving world. In short, you might be wondering: what does a modern, comprehensive software test automation strategy for a web application look like in practice in 2024?

In this article, we will describe in detail the automated software testing strategy we implemented in a web application we recently built for one of our capital markets clients. We will describe the different types of software tests we wrote, what tools we used, and how we used them to achieve our objectives. We will assume the reader is familiar with the excellent literature referenced above. We’ll elaborate on the distribution of the different types of tests we implemented.

Spoiler alert: we will not be introducing any novel shapes to describe our body of tests.

Scope: Testing a Web Application, Focusing on the UI

We will cover testing the codebase of the web user interface (UI) and the end-to-end behavior of the application; we will not cover testing the backend services directly. We will discuss the testing tools and libraries we used, how we used them, and what we like about them, but won’t be comparing them to alternatives. To protect our client’s intellectual property, we have omitted some details, and certain topics are discussed only in the abstract.

About the Application Being Tested

The web application discussed here provides discrete experiences for mobile and desktop browsers, allowing users to trade financial instruments. The UI is served by a dedicated backend service that connects to other services for market information and trade execution.

The web UI’s tech stack includes React, RxJS, React-RxJS and MUI, with Vite and TypeScript for build tooling. The primary testing tools used are Vitest and Playwright.

The project team has dedicated Quality Assurance engineers who collaborate closely with the developers to make decisions about testing strategy, and contribute to writing end-to-end tests.

Design Principles to Facilitate Software Testing

The following principles guided the way we structured our application code to facilitate ease of testing.

Service Layer Isolation

As we’ll describe below, a large portion of our tests run against an instance of the application that has the entire service layer replaced with mocks. To maximize the surface area of what these tests cover, we strive to minimize the code that needs to be mocked out when the application is run with mock services. Code that interfaces with backend services should be isolated in separate files. It should also be responsible for as little as possible, ideally only the actual network calls to backend services. Error handling, mapping data into view models, and other concerns adjacent to the concern of fetching data from the backend should be separated from the core service layer code. Keeping this layer very thin allows us to run the app with that layer mocked out while covering the rest of the logic with our automation tests.
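
A minimal sketch of this separation follows. The file, type, and function names are illustrative, not taken from the project; the point is that the service file contains nothing but the network call, while mapping and error handling live elsewhere and remain covered when the service layer is mocked out.

```ts
// instrument/service.ts (illustrative) — the thin service layer: only the network call
export interface InstrumentDto {
  symbol: string;
  lastPrice: number;
  tradable: boolean;
}

export async function fetchInstrument(symbol: string): Promise<InstrumentDto> {
  const response = await fetch(`/api/instruments/${symbol}`);
  return response.json();
}

// instrument/view-model.ts (illustrative) — mapping lives outside the service file,
// so it stays under test even when service.ts is replaced with a mock
import { fetchInstrument } from "./service";

export interface InstrumentViewModel {
  symbol: string;
  displayPrice: string;
  tradable: boolean;
}

export async function loadInstrumentViewModel(symbol: string): Promise<InstrumentViewModel> {
  const dto = await fetchInstrument(symbol);
  return {
    symbol: dto.symbol,
    displayPrice: dto.lastPrice.toFixed(2),
    tradable: dto.tradable,
  };
}
```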

Separate Code That Does Not Require Tests

Code coverage metrics can be helpful for identifying areas of the codebase that lack sufficient tests, but we find no value in writing unit tests for code that has no material logic or behavior. Similarly, there may be non-critical code that is very expensive to test, or code that is more efficiently covered by end-to-end style tests, and is therefore deemed not worth unit testing. To get more value from code coverage metrics, we strive to keep code that we choose not to test in separate files from code that should be tested. We explicitly list these files for exclusion in our unit test code coverage configuration. This way, if we see a file flagged for having low coverage, we know it is actionable information.
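
As a sketch of what such an exclusion list could look like (the file paths below are illustrative, not the project’s actual configuration), Vitest exposes a coverage.exclude option:

```ts
// vitest.config.ts — illustrative coverage exclusions for files we deliberately don't unit test
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      exclude: [
        "src/**/service.ts",  // thin service layer, covered by feature and e2e tests instead
        "src/**/*.mock.ts",   // mock service implementations used only in mock mode
        "src/**/theme.ts",    // styling constants with no material logic or behavior
      ],
    },
  },
});
```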

Externalize Logic From React Components

It is much easier to write unit tests for pure “vanilla” functions (that don’t depend on a rendering framework) than for React components. When a UI component has logic that does not depend on the React API or the lexical scope of the component function or class, we move this logic into separate functions, outside of the component, that can be unit tested independently. Put another way, keep React components as “dumb” as possible.
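
For example (a hypothetical component, not code from the project), validation logic can live in a plain function with its own unit tests, leaving the component itself with little logic worth testing directly:

```tsx
// order-quantity.ts (illustrative) — pure, framework-free logic that Vitest can test directly
export function isValidQuantity(raw: string, maxQuantity: number): boolean {
  const quantity = Number(raw);
  return Number.isInteger(quantity) && quantity > 0 && quantity <= maxQuantity;
}

// OrderQuantityInput.tsx (illustrative) — the "dumb" component just wires the logic to the UI
import { useState } from "react";
import { isValidQuantity } from "./order-quantity";

export function OrderQuantityInput({ maxQuantity }: { maxQuantity: number }) {
  const [value, setValue] = useState("");
  return (
    <input
      value={value}
      onChange={(event) => setValue(event.target.value)}
      aria-invalid={!isValidQuantity(value, maxQuantity)}
    />
  );
}
```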

Types of Software Tests, and Tools Used for Each

We divide our tests into the following categories, which are covered in detail in the following sections.

  • Unit style tests run with Vitest
  • End-to-end style tests run with Playwright
    • Feature Tests: test the behavior of the app as an end-to-end test would, but with the service layer replaced with predictable mocks
    • End-to-end Regression Tests: test a larger number of workflows end-to-end in a deployed environment, with a higher tolerance for test failure
    • End-to-end Smoke Tests: test a small number of critical workflows end-to-end in a deployed environment in a failsafe manner

The following chart shows the distribution of the lines of code for each category of tests.

  • Page objects (described in greater detail below) are used by all of the e2e style tests, so we measured them separately.
  • The end-to-end regression tests represent a smaller slice than we would like them to, as we had just begun to develop those tests at the time this data was collected. Our goal is to grow the body of the regression tests to be roughly 7 times the size of the smoke tests.
[Figure: stacked bar chart showing the distribution of lines of code for each category of tests]

The following chart shows our “aspirational” distribution of the lines of code for each category of tests. It differs from the “actual” chart above only in the quantity of end-to-end regression tests. It represents what the distribution will look like once we’ve grown our body of regression tests to the target state.

[Figure: stacked bar chart showing the “aspirational” distribution of lines of code for each category of tests]

Unit Style Tests

The foundation of our test automation strategy is the unit and integration tests we’ve implemented using Vitest. (Vitest is fast and plays nicely with Vite; other testing libraries we’ve used on other projects also serve the same purpose very well.)

In this context, the distinction between unit and integration tests is largely theoretical and, in our view, not particularly important. A true unit test exercises the “unit under test” in isolation, meaning that any dependencies are mocked out; strictly speaking, a test of a function that invokes another, unmocked function is not a “true” unit test. In some cases it makes sense to mock out dependencies. In many cases it is far easier to write tests without setting up mocks, and it is often beneficial to test code without such isolation, because doing so is more representative of how the code is integrated in the application.

A deeper discussion of this topic is beyond the scope of this article, but in practice we generally don’t draw a meaningful distinction between these types of tests. We refer to all of them as “unit tests”, even if many of them are technically integration tests. These test files share the same naming convention and are all run with the same command. We place each test file alongside the file being tested, with a matching filename and a .test.ts suffix.
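
As a simple illustration of that convention (the function and file names here are hypothetical), a pure function in src/orders/format-price.ts would have its test co-located in src/orders/format-price.test.ts:

```ts
// src/orders/format-price.ts (illustrative)
export function formatPrice(value: number, currency: string): string {
  return new Intl.NumberFormat("en-US", { style: "currency", currency }).format(value);
}

// src/orders/format-price.test.ts — co-located, same name plus a .test.ts suffix
import { describe, expect, it } from "vitest";
import { formatPrice } from "./format-price";

describe("formatPrice", () => {
  it("formats prices with a currency symbol and two decimal places", () => {
    expect(formatPrice(1234.5, "USD")).toBe("$1,234.50");
  });
});
```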

As a general rule, if a TypeScript file has logic or behavior, we write a unit test for it, unless the logic is extremely trivial. If unit testing a particular file would require a lot of effort, we consider how essential that unit is to the behavior of the app, and if that behavior can be tested with an end-to-end style test. This guides our decision as to whether or not to invest the effort in unit testing that file. Please also see Things We Did Not Unit Test below.

We model our domain state using RxJS Observables. In our post “Unit Testing RxJS Observables – A Practical Guide”, we describe in detail how to test RxJS observable chains.
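
To give a flavor of what such a test can look like (a deliberately simple sketch; the observable and file names are hypothetical), a derived observable that is a pure function of its source can be tested by feeding it a finite source and collecting its emissions:

```ts
// price-change.ts (illustrative) — emits the change between consecutive prices
import { Observable, map, pairwise } from "rxjs";

export function priceChange$(price$: Observable<number>): Observable<number> {
  return price$.pipe(
    pairwise(),
    map(([previous, current]) => current - previous),
  );
}

// price-change.test.ts
import { describe, expect, it } from "vitest";
import { firstValueFrom, from, toArray } from "rxjs";
import { priceChange$ } from "./price-change";

describe("priceChange$", () => {
  it("emits the difference between consecutive prices", async () => {
    const source$ = from([100, 101.5, 99]);
    const changes = await firstValueFrom(priceChange$(source$).pipe(toArray()));
    expect(changes).toEqual([1.5, -2.5]);
  });
});
```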

End-to-end Style Tests

The classes of tests described in this section (feature tests, e2e smoke tests, e2e regression tests) all use Playwright. We have found Playwright to be a highly compelling tool for end-to-end style testing given the speed with which tests run, and the powerful tooling including IDE integration, code generation, debugging tools, trace viewer, and other features.

As we will describe below, all of these tests share page objects. The directory structure looks something like the following:

  • ui-tests
    • feature-tests (contains feature tests)
    • e2e-tests (contains both smoke and regression tests)
    • pages (contains page objects used by feature tests and e2e tests)

The feature-tests and e2e-tests directories each contain a dedicated Playwright config file.

Some of the application’s workflows are different for the desktop vs. mobile experiences, so for all of the tests in this section, we have separate tests for mobile and desktop.
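
One way to organize this in Playwright (a sketch under an assumed file-naming convention, not necessarily how the project’s actual config is written) is to define separate projects for desktop and mobile, each matching its own test files and using an appropriate device profile:

```ts
// playwright.config.ts (illustrative) — separate desktop and mobile test files,
// each run with a matching device profile
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "desktop",
      testMatch: /.*\.desktop\.spec\.ts/,
      use: { ...devices["Desktop Chrome"] },
    },
    {
      name: "mobile",
      testMatch: /.*\.mobile\.spec\.ts/,
      use: { ...devices["Pixel 5"] },
    },
  ],
});
```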

Feature Tests

These are “end-to-end style” integration tests that run with a mocked out service layer. We call them “Feature Tests” because they test the behavior of features, but are not true end-to-end tests. They run against an ephemeral instance of the app running on the same host as the tests. They never run against a deployed environment.

Running the Application in Mock Mode

The mock service behavior is built into the application, rather than as a standalone mock service; it works as follows.

  • In our package.json, we have a script command that runs the application in mock mode by passing --mode mock to Vite.
  • We name all of our service files in a consistent manner, for example service.ts, qualified by a parent directory named after the relevant domain data entity. Each such file contains the functions that make actual network calls to backend services, and nothing else.
  • We have a service.mock.ts corresponding to each service file, containing a mock implementation of that service, using in-memory mock data.
  • In our vite.config.ts, we use resolve.alias to resolve imports of service.ts to service.mock.ts.
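
The project’s actual configuration is not reproduced here, but a sketch of the resolve.alias approach could look like this (the regex and overall structure are our assumptions):

```ts
// vite.config.ts — a sketch of the resolve.alias approach (not the project's actual config)
import { defineConfig } from "vite";

export default defineConfig(({ mode }) => ({
  resolve: {
    alias:
      mode === "mock"
        ? [
            // In mock mode, redirect every import of ".../service" to ".../service.mock"
            { find: /^(.+)\/service$/, replacement: "$1/service.mock" },
          ]
        : [],
  },
}));
```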

We also mock out the authentication code when running in mock mode, so that users are presented with a mock login screen: a simple drop-down with different mock users to select from. The inclusion of this view in the application is driven by a conditional that resolves to a lazy import of the mock login component when env.MODE is "mock", and of the real login screen component otherwise. Vite therefore omits the mock login screen entirely from the production build.
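
A sketch of that conditional follows (the component and file names are hypothetical; both modules are assumed to default-export a React component, as React.lazy requires):

```tsx
// login-screen.ts (illustrative) — pick the login screen at build time
import { lazy } from "react";

// import.meta.env.MODE is statically replaced by Vite at build time, so the branch
// that is not taken (and its lazily imported chunk) is dropped from the production build.
export const LoginScreen =
  import.meta.env.MODE === "mock"
    ? lazy(() => import("./MockLoginScreen")) // simple drop-down of mock users
    : lazy(() => import("./RealLoginScreen")); // the real authentication flow
```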

We use different mock users to simulate different “external” scenarios that we may want to test. For example, when logged in as “Mary Market Closed”, the app will behave as if the backend tells us that the market is closed.

We program the mock services with predefined behavior to simulate real world scenarios. For example, we designate a specific instrument that fails in a specific way whenever an attempt is made to trade it. The key here is that the behavior is consistent for that instrument, so we can test specific paths, both happy and unhappy, and expect the tests to pass consistently, because the mock service is completely predictable.

We use a configuration parameter to define how long the mock responses should take to return. The mock services delay their async responses by this amount of time. When we run the application for the purpose of running tests against it, this parameter is set to zero so the tests run as fast as possible. When running the app in mock mode for development purposes, this parameter can be set to other values, for example to simulate a high latency connection to ensure loading states display and transition as expected.
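
Putting the last two ideas together, a mock service file might look something like the following sketch (the types, env variable name, and designated symbol are illustrative assumptions, not the project’s actual code):

```ts
// trade/service.mock.ts (illustrative) — a predictable mock implementation of the trade service
export interface TradeRequest {
  symbol: string;
  quantity: number;
}

export type TradeResult =
  | { status: "filled"; filledQuantity: number }
  | { status: "rejected"; reason: string };

// Artificial latency: zero when running tests, non-zero to simulate a slow connection
// during development (the env variable name here is an assumption).
const MOCK_DELAY_MS = Number(import.meta.env.VITE_MOCK_DELAY_MS ?? 0);

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// A designated instrument that always fails in a specific, predictable way,
// so unhappy paths can be asserted deterministically.
const ALWAYS_REJECTED_SYMBOL = "XXX.FAIL";

export async function executeTrade(request: TradeRequest): Promise<TradeResult> {
  await delay(MOCK_DELAY_MS);
  if (request.symbol === ALWAYS_REJECTED_SYMBOL) {
    return { status: "rejected", reason: "Instrument is not tradable" };
  }
  return { status: "filled", filledQuantity: request.quantity };
}
```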

Running the Tests Against the Application in Mock Mode

To run these tests locally, developers first run the app locally in mock mode, then run the dedicated command (configured in package.json) that runs the tests against the local server.

To run these tests in the continuous integration (CI) server, we first invoke a command (configured in package.json) that builds the application in mock mode by passing --mode mock to Vite (similar to how we run in mock mode above). The mock build of the app is self-contained: it has no external network dependencies. We configure the global setup/teardown to run a web server (via Vite preview) to serve the app we just built, so Playwright starts up the server when we run the tests and tears it down at the end. The global setup file looks something like this:

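(The project’s actual file is not shown; the following is an illustrative sketch using Vite’s preview JavaScript API, paired with a matching teardown registered via Playwright’s globalSetup/globalTeardown config options. The port and the globalThis hand-off are assumptions.)

```ts
// feature-tests/global-setup.ts (illustrative)
import { preview, type PreviewServer } from "vite";

export default async function globalSetup() {
  // Serve the mock-mode build produced earlier in the CI pipeline.
  const server = await preview({ preview: { port: 1234 } });
  // Stash the server instance so the teardown below can close it.
  (globalThis as { __previewServer?: PreviewServer }).__previewServer = server;
}

// feature-tests/global-teardown.ts (illustrative)
export default async function globalTeardown() {
  const { __previewServer: server } = globalThis as { __previewServer?: PreviewServer };
  if (!server) return;
  await new Promise<void>((resolve, reject) =>
    server.httpServer.close((error) => (error ? reject(error) : resolve())),
  );
}
```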

This ephemeral web server runs on the CI server, and the tests run on the same host against that instance of the app, referencing it via a localhost URL such as http://localhost:1234/. The web server is torn down after the tests complete.

End-to-end Smoke Tests

The end-to-end smoke tests are fully end-to-end tests that only run against a deployed environment that is fully integrated with real backend services. The objective of these tests is to cover core user workflows of the application, but only those for which we can make assertions that are expected to pass under all circumstances barring actual problems. These tests should only fail when the application is actually broken in the environment being tested. When they fail, the relevant people are notified with high urgency.

We segregate these tests from the e2e regression tests by tagging them with annotations. In our package.json we have separate script commands that run the desired subset of tests using the --grep command line option, as described in the Playwright docs.
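
As a sketch of how this can look (the test content, tag names, and script names below are illustrative):

```ts
// e2e-tests/positions.desktop.spec.ts (illustrative) — "@smoke" in the title acts as the tag
import { expect, test } from "@playwright/test";

test("user can see their positions @smoke", async ({ page }) => {
  await page.goto("/positions"); // relies on a baseURL configured per environment
  await expect(page.getByRole("heading", { name: "Positions" })).toBeVisible();
});

// package.json scripts (illustrative):
//   "e2e:smoke":      "playwright test --config=ui-tests/e2e-tests/playwright.config.ts --grep @smoke"
//   "e2e:regression": "playwright test --config=ui-tests/e2e-tests/playwright.config.ts --grep @regression"
```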

The environments against which these tests run are pre-production environments that are fully integrated with all upstream services, and behave like a production environment would, but trades are effectively paper trades, meaning they do not involve real money. These environments are only accessible via corporate VPN. We have test users for these environments, and their credentials are not treated as secrets, so we store these credentials in plain text in configuration files.

In a perfect world, we would want to have pull request branches deployed to an ephemeral environment before they are merged to the main branch, and run these tests against said environment. Many projects, including this one, don’t have the resources to do that, so in our case, these tests only protect against problems discovered after merging. When pull requests are merged to the main branch, the application is automatically deployed to a “development” environment. These tests run on the CI server against this environment after every such deployment. We also set up our CI server so we could manually trigger these tests to run on the CI server against any specified environment, and using the tests from any specified branch (to ensure changes to the tests would pass before merging).

We found that it took a great deal of iteration to evolve these tests to the point where they were passing consistently. There is a great deal of variability in the behavior of capital markets applications that makes it challenging to test real world scenarios in ways that allow for assertions that are both useful and will always pass. Among the challenges we encountered:

  • Test account getting blocked from trading for violating the pattern day trader rule
  • Trades failing due to insufficient buying power, e.g. test user had used all its cash or had too many outstanding limit/stop buy orders
  • Trades failing due to insufficient holdings, e.g. trying to sell an instrument that we no longer held
  • Varying behavior for certain types of trading when the market is open vs. when it is closed

We employed the following strategies to deal with these challenges.

  • We assigned dedicated test users for these tests to ensure that they would only be used for these smoke tests and that no other activity would interfere with account balances, orders, or positions
  • We funded the account with a large amount of (fake) money so we would not run into insufficient funds errors
  • We assigned designated instruments that the account would buy, and separate designated instruments to sell, to avoid getting blocked from trading for buying and selling the same instruments repeatedly (the pattern day trader rule)
  • We manually refreshed the test account regularly by selling the instruments that were bought in the tests, and replenishing the inventory of the instruments that are to be sold by the tests
  • We simply avoided testing certain workflows whose behavior was so variable that we could not ensure consistently passing assertions

End-to-end Regression Tests

Earlier in the project we had a playbook of regression tests that were run manually at the end of each sprint before releasing to higher environments. More recently, we started to write Playwright tests to automate this process. We treat this body of tests very differently from the smoke tests. They are only run when we manually trigger them to run against a given environment. We expect that some of these tests will fail under certain conditions due to the variable nature of the application. Failures don’t necessarily indicate that something is broken. The person who triggered the tests will investigate the failures to determine if they indicate a regression, or if the behavior is expected under the circumstances. Playwright’s excellent diagnostic tooling makes it very easy to drill into the cause of failure. While the regression testing process still requires manual effort, if a majority of these tests pass, we’ve greatly improved the efficiency of the process.

We tag these tests with annotations and also have a file naming convention to identify this set of tests and run them separately. We have dedicated test users, separate from the test users of the smoke tests, for these tests.

Page Object Model

The Page Object Model (POM) is a pattern for keeping element selectors in one place and reusing that code to avoid repetition. We created page objects for each view within the application, and used them across all of the Playwright tests.
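
A page object for one of the application’s views might look like the following sketch (the view, locators, and methods are hypothetical). Feature tests and e2e tests then interact with the view only through this class, so a change to the underlying markup is absorbed in one place.

```ts
// pages/trade-ticket-page.ts (illustrative) — locators and actions for the trade ticket view
import { expect, type Locator, type Page } from "@playwright/test";

export class TradeTicketPage {
  readonly quantityInput: Locator;
  readonly buyButton: Locator;
  readonly confirmation: Locator;

  constructor(private readonly page: Page) {
    this.quantityInput = page.getByLabel("Quantity");
    this.buyButton = page.getByRole("button", { name: "Buy" });
    this.confirmation = page.getByTestId("order-confirmation");
  }

  async goto(symbol: string) {
    await this.page.goto(`/trade/${symbol}`);
  }

  async placeBuyOrder(quantity: number) {
    await this.quantityInput.fill(String(quantity));
    await this.buyButton.click();
  }

  async expectOrderConfirmed() {
    await expect(this.confirmation).toBeVisible();
  }
}
```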

Playwright’s test generator is an excellent tool that makes it very easy to lay the foundation for a test by recording actions as you click in the browser. When using the POM pattern, after recording a test, effort is required to refactor the generated selectors/locators to use the POM instead. We felt that it was worth this effort and would make our test suite more resilient to change, and easier to read and extend in the long run.

Things We Did Not Unit Test

We chose to not directly unit test React components on this project. While the development team had favorable experience using React Testing Library to test React components on prior projects, on this project we favored covering the behavior of our React components with a two-pronged approach leveraging Vitest and Playwright, rather than adding another testing library.

  1. Wherever practical, we extracted logic from the React components into standalone pure functions that could be unit tested directly with Vitest irrespective of React.
  2. We tested the behavior of our features with a comprehensive suite of integration tests using Playwright, as described above.

Granted, the Playwright feature tests did not test every permutation of props passed to every component in the app, but they covered the user workflows of the app’s features, and asserted that the app behaved as expected. Conversely, component testing allows for more granular testing of a component’s behavior, but gives less assurance that the component is integrated into the app in a way that achieves the desired functionality for the end user. For teams building UI components used by other teams in other applications (which was not the case here), testing components with a tool such as React Testing Library is essential.

We also chose not to perform automated visual regression testing given our time and resource constraints and priorities. When a project is in a stage where many aspects of the UI are expected to continuously change, the costs of maintaining these tests can outweigh the benefits they provide. Once features have reached stability and are not expected to change rapidly, such tests can be very helpful in protecting against visual regression. We do have our share of layout/visual regressions that slip in, and they don’t always get caught right away. Given that they rarely impact actual functionality and are generally very inexpensive to fix, we find this to be an acceptable tradeoff for our project’s needs.

Disclaimer

The information presented here represents the technical decisions made by a single Adaptive team in close collaboration with their stakeholders to meet the particular goals of that client and project. It is not representative of how Adaptive implements test automation on every project. We work with our clients to find mutually agreeable solutions that meet their particular needs and take into consideration the tradeoffs involved in making technical decisions.

Want to find out more about our testing approaches and how we implement end-to-end bespoke trading technology? Get in Touch

Author: Bruce Harris