Introduction and background
Interest in developing computer algorithms that improve automatically through experience (“machine learning”) goes back many decades. The term “machine learning” (ML) was popularized in 1959 by Arthur Lee Samuel, a pioneer in computer gaming and artificial intelligence, who developed a program able to improve its own performance at playing checkers. In 1962 the program defeated a human player while running on an IBM 7094, an early transistorized mainframe computer.
Arthur L. Samuel using an IBM 7094.
Research on ML algorithms, together with steadily increasing computing power, has made it possible to address more complex problems and to reach performance milestones on tasks that were once deemed either too complex or simply out of reach for non-human “intelligent” systems.
In 1997, IBM’s Deep Blue defeated the then world chess champion Garry Kasparov using a program that evaluated over 200 million positions per second, a brute-force approach rather than a machine learning algorithm. Nevertheless, outperforming human intelligence at a complex task was an important milestone for a computer program, and in the following years ML programs mastered other complex tasks that had previously seemed out of reach for non-human intelligent systems.
Watson, a natural language question-answering computer system developed by IBM, won at Jeopardy! in 2011; Chef Watson, developed by IBM in 2013, was able to propose new recipes from a list of ingredients and a cuisine style; AlphaGo, a Go-playing program developed by DeepMind, defeated the then world Go champion Lee Sedol in 2016; and AlphaFold and AlphaFold2, protein 3D structure prediction programs developed by DeepMind, gave outstanding predictions in the past two CASP challenges (Critical Assessment of protein Structure Prediction, a biennial challenge in which participants predict the 3D structure of target proteins whose structures are known but undisclosed). It is noteworthy that most of these achievements were made by deep neural networks, a type of ML model.
AlphaFold and AlphaFold2 protein 3D structure predictions in the past two CASP challenges were head and shoulders better than those of any past challenge.
Over the years, ML programs have been applied to a wide variety of tasks in different fields, for example medicine, drug discovery, machine translation, automatic speech recognition, autonomous vehicle control[13,14], marketing, video surveillance, shipping and finance.
Given these many different fields of application and the ultimate goal, shared by human researchers and ML algorithms, of improving through experience, it is not surprising that many different techniques have been developed and tried over time. In addition, the “no free lunch” theorem, which states that no algorithm outperforms every other algorithm across all possible application domains, suggests that the ML toolkit will keep growing with time.
Despite the different fields in which ML methodologies are applied, they share some common traits. All ML models have parameters, which control the performance of the model, and hyperparameters, which affect the learning process and how well the model generalizes. They are also built following a common iterative process.
First, the algorithm is exposed to the training dataset, and the parameters of the model are optimized so that the algorithm produces the best-quality output for the training dataset. Then its performance on the task is assessed with a different dataset (the test dataset). In a final step, the hyperparameters of the algorithm are adjusted to make it as generalizable as possible to unseen data. The cycle starts over when the need arises to improve the model and more training data becomes available; the latter does not take long, given the pace at which new data is generated.
ML model lifecycle.
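As a minimal sketch of the cycle described above, the following Python example fits a one-parameter model on a training set, scores it on a held-out set and keeps the hyperparameter value with the lowest held-out error. The synthetic data, the ridge penalty `lam` and all function names are assumptions made for illustration. In practice a separate validation split is often used for the tuning step, so that the test dataset stays untouched.

```python
import random

def fit_ridge_slope(xs, ys, lam):
    """Optimize the single parameter w of the model y = w * x (closed-form ridge)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mean_squared_error(w, xs, ys):
    """Average squared difference between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_dataset(n, rng):
    """Synthetic data (assumption): y = 2x plus Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_dataset(30, rng)   # used to optimize the parameter
test_x, test_y = make_dataset(30, rng)     # used only to assess performance

results = {}
for lam in (0.0, 0.1, 1.0, 10.0):          # candidate hyperparameter values
    w = fit_ridge_slope(train_x, train_y, lam)
    results[lam] = (w, mean_squared_error(w, test_x, test_y))

# Keep the hyperparameter whose fitted model generalizes best to held-out data.
best_lam = min(results, key=lambda lam: results[lam][1])
```

Here `w` plays the role of the parameter optimized on the training data, while `lam` is the hyperparameter chosen by comparing held-out errors.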
ML methods can be classified into three broad categories according to how they learn from the training dataset: supervised learning, unsupervised learning and reinforcement learning. Supervised and unsupervised learning optimize a single task, whereas reinforcement learning optimizes a particular type of task: finding the optimal action to choose at every step in a sequence.
In supervised learning algorithms, the training data contains both the features to be trained upon and the expected outcome for every example. In unsupervised learning, the training dataset contains only features; the expected outcome is not explicitly provided. Reinforcement learning algorithms are exposed to a sequential training dataset and to the action taken at each step.
ML methods classification.
According to the output produced by each algorithm, supervised learning techniques can be further classified into regression and classification tasks, where the former produce a number and the latter a category. Unsupervised learning can be subdivided into clustering tasks, which partition the data into groups, and representation learning algorithms, which map complex data into a lower-dimensional space that is simpler to understand and analyze. Reinforcement learning is further divided into policy function algorithms, which choose the optimal action to be taken at every step, and reward function learning algorithms, which find the reward value that justifies the actions observed at each step.
Researchers have developed many algorithms in each category, ranging in complexity from simple linear regression to complex representation learning algorithms, but artificial neural networks are particularly interesting because they have been involved in many of the recent groundbreaking successes of ML. Artificial neural networks were first proposed in 1943, and comprise a set of nodes grouped in layers. Nodes in the first hidden layer take the input values, compute a weighted sum of them (a linear transformation) and then apply a non-linear operation to the result. The output value is propagated forward to the nodes of the second hidden layer, which perform the same linear transformation and non-linear operation, and so on until the last layer produces the final output values.
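The difference between a supervised classification task and an unsupervised clustering task can be sketched on the same toy data. The feature values, the labels and both function names below are hypothetical; the nearest-centroid classifier and the two-group clustering loop are simple stand-ins chosen for brevity, not methods named in the text.

```python
# Toy data (assumption): one feature per example.
features = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # used only by the supervised learner

def fit_centroids(xs, ys):
    """Supervised classification: learn one mean feature value per labelled class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return the category whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def two_means(xs, iterations=10):
    """Unsupervised clustering: split xs into two groups without using labels."""
    c0, c1 = min(xs), max(xs)
    assignment = [0] * len(xs)
    for _ in range(iterations):
        assignment = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        group0 = [x for x, a in zip(xs, assignment) if a == 0]
        group1 = [x for x, a in zip(xs, assignment) if a == 1]
        if not group0 or not group1:
            break  # all points are identical; nothing left to split
        c0, c1 = sum(group0) / len(group0), sum(group1) / len(group1)
    return assignment

centroids = fit_centroids(features, labels)
predicted_category = predict(centroids, 4.0)   # a category, as in classification
clusters = two_means(features)                 # a partition, as in clustering
```

The supervised learner needs `labels` to produce named categories, whereas the clustering function only receives `features` and returns anonymous group indices.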
By Glosser.ca - Own work, Derivative of File: Artificial neural network.svg, CC BY-SA 3.0
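The forward propagation described above can be sketched in a few lines of plain Python. The sigmoid non-linearity, the bias terms, the layer sizes and the weight values are all assumptions made for illustration; the text only specifies a weighted sum followed by a non-linear operation.

```python
import math

def sigmoid(z):
    """A common non-linear operation (assumption: the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """Each node computes a weighted sum of its inputs, then applies the non-linearity."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Propagate the values forward, layer by layer, until the last layer."""
    values = inputs
    for weights, biases in layers:
        values = dense_layer(values, weights, biases)
    return values

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer: one weight row per node
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
output = forward([1.0, 2.0], layers)
```

Each entry in `layers` holds one row of weights per node, so adding a hidden layer only means appending another `(weights, biases)` pair.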
There are many types of neural networks, differing not only in the number of hidden layers or nodes in each layer, but also in the operations performed on the input and in how the input is connected to the nodes. In fully connected feedforward networks (FNN), all the input values are connected to every node. Convolutional neural networks (CNN) preprocess the input with convolution operations before passing the outcome to a fully connected network. Recurrent (RNN) and long short-term memory (LSTM) networks, both used for sequential input, “remember” the input from previous steps to provide an output in the current step.
Despite the different types of neural networks, training any of them with a training dataset involves computing a “loss” or “cost” value from the network output, and then adjusting the network parameters to lower that value. In a supervised learning neural network, the loss function could be the root mean square difference between the expected and observed output, whereas for reinforcement learning the loss could be a function of a calculated reward. A backpropagation algorithm is then used to work out how to adjust the weights in each layer to minimize the loss function value.
The term “deep” neural network is applied to networks with more than one “hidden” layer, and it has been shown that multilayer feedforward networks can approximate functions to any desired precision[23,24]. This helps explain why they have been successfully applied to all sorts of supervised learning[26,27], unsupervised learning[28,29,30,31] and reinforcement learning tasks[32,33].
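To illustrate how a recurrent network “remembers” earlier inputs, here is a minimal sketch of a single recurrent unit. The tanh non-linearity, the weight values `w_in` and `w_state` and the function name are assumptions for illustration; real RNN and LSTM layers use weight matrices and, in the LSTM case, additional gates.

```python
import math

def run_recurrent_cell(inputs, w_in=1.0, w_state=0.5):
    """Return the hidden state after each step of a one-unit recurrent cell."""
    state = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the state from the previous step.
        state = math.tanh(w_in * x + w_state * state)
        states.append(state)
    return states

quiet = run_recurrent_cell([0.0, 0.0])
after_spike = run_recurrent_cell([1.0, 0.0])
```

Both sequences end with the same input value, yet the final state of `after_spike` is non-zero because the cell carries information over from the first step.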
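The loss-and-update loop described above can be sketched for a deliberately tiny network with one hidden node and one output weight. The example uses the mean squared error as the loss (its square root is the root mean square difference mentioned in the text); the network shape, the starting weights, the learning rate and the data are assumptions chosen to keep the sketch short.

```python
import math

def train(xs, ys, w1=0.5, w2=0.5, learning_rate=0.1, steps=200):
    """Fit pred = w2 * tanh(w1 * x) by gradient descent on the mean squared error."""
    n = len(xs)
    losses = []
    for _ in range(steps):
        grad_w1 = grad_w2 = loss = 0.0
        for x, y in zip(xs, ys):
            hidden = math.tanh(w1 * x)   # forward pass: hidden node
            error = w2 * hidden - y      # forward pass: output minus expected value
            loss += error ** 2 / n
            # Backward pass: the chain rule carries the error back through each layer.
            grad_w2 += 2.0 * error * hidden / n
            grad_w1 += 2.0 * error * w2 * (1.0 - hidden ** 2) * x / n
        losses.append(loss)
        # Move each weight against its gradient to lower the loss.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return w1, w2, losses

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [math.tanh(2.0 * x) for x in xs]   # targets this network can represent exactly
w1, w2, losses = train(xs, ys)
```

The recorded `losses` list should shrink as training proceeds, which is the behaviour that backpropagation combined with gradient descent is meant to produce.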
Machine learning models in finance
The applications of ML algorithms in the field of finance can be classified into models used for tasks in banking, asset management or trading. Banks are interested in analyzing data generated by their customers, their financial transactions or the reports filed by companies in order to target marketing campaigns, assess credit default risk or detect fraud. Fund managers are interested in tasks like optimizing their portfolios to maximize returns and control risk, detecting market regime changes and understanding correlations among assets. Trading tasks include price prediction, market impact and liquidity assessment, and algorithmic trading.
ML applications in finance.
A review of recently published deep learning networks applied in finance and banking showed that 53% of these ML models were applied to predicting the prices of stocks, currency exchange rates or oil. Another 26% were applied to stock trading, 11% to banking tasks like default risk prediction and credit assessment, and the remaining 10% to portfolio management and the prediction of macroeconomic variables. If we classify tasks into banking, asset management and trading, trading tasks represent ~79% (53% in price prediction plus 26% in stock trading) of recently published deep learning networks.
The high number of studies on price prediction (53%) can be attributed to the complexity of financial time series, which are sometimes non-linear and non-stationary and show interdependencies. Models of financial time series have evolved with time: traditional methods like autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroskedasticity (GARCH) gave way to ML models that are better able to cope with non-linearities, like random forests (RF), support vector machines (SVM), support vector regression (SVR) and deep learning (DL). Recently, some hybrid models that combine traditional methods with ML, as well as reinforcement learning (RL) models, have improved on the accuracy of earlier prediction models.
Another review of published ML models in finance classifies them into four groups (see figure below): single DL, hybrid DL, hybrid ML (excluding DL) and ensembles of methods (sets of models where the final outcome is a function of the individual outcomes of every model in the ensemble). The most frequent goal in each category was found to be price prediction. Given the interest of ML researchers in the area of trading, it is likely that future advances in RL and DL networks will soon be adapted to price prediction and other related tasks in this area.
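The grouping arithmetic in the previous paragraph can be checked with a short snippet. The percentages are those quoted from the review; the dictionary keys are shorthand labels introduced here.

```python
# Share of reviewed deep learning models per task, in percent (figures quoted above).
shares = {
    "price prediction": 53,
    "stock trading": 26,
    "banking": 11,
    "portfolio management and macroeconomic prediction": 10,
}

# Trading groups price prediction together with stock trading.
trading_share = shares["price prediction"] + shares["stock trading"]
total_share = sum(shares.values())
```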
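For readers unfamiliar with the traditional methods mentioned above, the sketch below fits a first-order autoregressive model, x[t] = c + phi * x[t-1], by ordinary least squares. It covers only the autoregressive ingredient of ARIMA and is not a full ARIMA or GARCH implementation; the function names and the toy series are assumptions for illustration.

```python
def fit_ar1(series):
    """Least-squares estimates of c and phi in x[t] = c + phi * x[t-1]."""
    previous, current = series[:-1], series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    covariance = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(previous, current))
    variance = sum((p - mean_prev) ** 2 for p in previous)
    phi = covariance / variance
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]

# Toy series (assumption) generated exactly by x[t] = 1 + 0.5 * x[t-1].
prices = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
c, phi = fit_ar1(prices)
next_price = forecast_next(prices, c, phi)
```

A model of this kind is linear in past values, which is the limitation that motivated the move towards RF, SVM, SVR and DL models for non-linear series.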
Taxonomy of machine learning models
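The idea of an ensemble, where the final outcome is a function of the individual outcomes, can be sketched as follows. The three toy models and the choice of mean or median as the combining function are assumptions for illustration; published ensembles typically combine trained ML models.

```python
from statistics import mean, median

# Hypothetical individual models: each maps a feature value to a predicted price.
def trend_model(x):
    return 2.0 * x

def shifted_model(x):
    return 2.0 * x + 1.0

def damped_model(x):
    return 1.5 * x + 0.5

def ensemble_predict(models, x, combine=mean):
    """Combine the outcome of every individual model into one final outcome."""
    return combine([model(x) for model in models])

models = [trend_model, shifted_model, damped_model]
average_prediction = ensemble_predict(models, 2.0)
median_prediction = ensemble_predict(models, 2.0, combine=median)
```

Swapping the `combine` argument changes how the ensemble aggregates its members without touching the individual models.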
Senior Software Developer & Data Scientist,
Adaptive Financial Consulting Ltd
 G. S. Handelman, H. K. Kok, R. V. Chandra, A. H. Razavi, M. J. Lee and H. Asadi (2018) eDoctor: machine learning and the future of medicine. Journal of Internal Medicine, 284; 603–619. doi: 10.1111/joim.12822
 Vamathevan, J. et al. (2019) Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 18(6): 463–477. doi:10.1038/s41573-019-0024-5
 Liu Y., Zhang J. (2018) Deep Learning in Machine Translation. In: Deng L., Liu Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_6
 Jayashree Padmanabhan & Melvin Jose Johnson Premkumar (2015) Machine Learning in Automatic Speech Recognition: A Survey. IETE Technical Review, 32:4, 240-251, doi: 10.1080/02564602.2015.1010611
 Su Yeon Choi & Dowan Cha (2019): Unmanned aerial vehicles using machine learning for autonomous flight; state-of-the-art. Advanced Robotics. doi: 10.1080/01691864.2019.1586760
 S. Aradi, Survey of Deep Reinforcement Learning for Motion Planning of Autonomous Vehicles. IEEE Transactions on Intelligent Transportation Systems, doi: 10.1109/TITS.2020.3024655.
 Ma, L., & Sun, B. (2020). Machine learning and AI in marketing – Connecting computing power to human insights. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.04.005
 Tsakanikas, V., & Dagiuklas, T. (2017). Video surveillance systems-current status and future trends. Computers & Electrical Engineering. doi:10.1016/j.compeleceng.2017.11.011
 Rundo, F. et. al. (2019) Machine Learning for Quantitative Finance Applications: A Survey Appl. Sci. 9, 5574; doi:10.3390/app9245574
 David H. Wolpert (1996) The Lack of A Priori Distinctions Between Learning Algorithms. Neural Computation 8, 1341-1390
 McCulloch, W. and Pitts, W. (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259.
 Cybenko, G. (1989) Approximation by superpositions of a sigmoidal function Mathematics of Control, Signals, and Systems, 2(4), 303–314. doi:10.1007/BF02551274
 Kurt Hornik (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251–257. doi:10.1016/0893-6080(91)90009-T
 Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986). Learning representations by back-propagating errors. Nature. 323 (6088): 533–536. doi:10.1038/323533a0. S2CID 205001834.
 Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (2015) ImageNet Large Scale Visual Recognition Challenge. IJCV. https://arxiv.org/pdf/1409.0575.pdf
 Xiaoxuan Liu et al. (2019) A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digital Health;1: e271–97. doi: 10.1016/S2589-7500(19)30123-2
 Tao, Jie et al. (2021) Unsupervised Deep Learning for Fake Content Detection in Social Media. http://hdl.handle.net/10125/70643 doi: 10.24251/HICSS.2021.032
 A. Singh and R. S. Anand, (2015) Speech Recognition Using Supervised and Unsupervised Learning Techniques. International Conference on Computational Intelligence and Communication Networks (CICN), Jabalpur, India, 2015, pp. 691-696, doi: 10.1109/CICN.2015.320.
 Leon A. Gatys et al. (2015) A Neural Algorithm of Artistic Style. https://arxiv.org/pdf/1508.06576v2.pdf
 Aaron van den Oord et al. (2016) WaveNet: A generative model for raw audio. https://arxiv.org/pdf/1609.03499.pdf
 Volodymyr Mnih et al. (2013) Playing Atari with Deep Reinforcement Learning. https://arxiv.org/pdf/1312.5602v1.pdf and also https://www.youtube.com/watch?v=Q70ulPJW3Gk
 Vinyals, O., Babuschkin, I., Czarnecki, W.M. et al. (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354. https://doi.org/10.1038/s41586-019-1724-z and also https://www.youtube.com/watch?v=cUTMhmVh1qs
 Ion Smeureanu et al. (2013) Customer segmentation in private banking sector using machine learning techniques. Journal of Business Economics and Management 14, 5: 923-939. doi:10.3846/16111699.2012.749807
 De Castro Vieira, J. R., Barboza, F., Sobreiro, V. A., & Kimura, H. (2019). Machine learning models for credit analysis improvements: Predicting low-income families’ default. Applied Soft Computing, 105640. doi:10.1016/j.asoc.2019.105640
 Perols, J. (2011). Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms. AUDITING: A Journal of Practice & Theory, 30(2), 19–50. doi:10.2308/ajpt-50009
 Yu, Q., Jiang, H., & Ma, X. (2018). The Application of Data Mining Technology in Customer Relationship Management of Commercial Banks. 2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). doi:10.1109/fskd.2018.8687183
 Zhang, Y., & Trubey, P. (2018). Machine Learning and Sampling Scheme: An Empirical Study of Money Laundering Detection. Computational Economics. doi:10.1007/s10614-018-9864-z
 Sami Ben Jabeur, Amir Sadaaoui, Asma Sghaier & Riadh Aloui (2019) Machine learning models and cost-sensitive decision trees for bond rating prediction. Journal of the Operational Research Society, doi: 10.1080/01605682.2019.1581405
 Lee, Y.-C. (2007). Application of support vector machines to corporate credit rating prediction. Expert Systems with Applications, 33(1), 67–74. doi:10.1016/j.eswa.2006.04.018
 Mona Taghavi, Kaveh Bakhtiyari, and Edgar Scavino (2013) Agent-based computational investing recommender system. In Proceedings of the 7th ACM conference on Recommender systems (RecSys '13). Association for Computing Machinery, New York, NY, USA, 455–458. doi:10.1145/2507157.2508072
 Chen, J. and Tsang, E.P.K. (2020) Detecting Regime Change in Computational Finance: Data Science, Machine Learning and Algorithmic Trading. CRC Press ISBN: 1000220168, 9781000220162
 Tsai, C.-F. (2009). Feature selection in bankruptcy prediction. Knowledge-Based Systems, 22(2), 120–127. doi:10.1016/j.knosys.2008.08.002
 Ban, G.-Y., El Karoui, N., & Lim, A. E. B. (2018). Machine Learning and Portfolio Optimization. Management Science, 64(3), 1136–1154. doi:10.1287/mnsc.2016.2644
 Gan, L., Wang, H., & Yang, Z. (2020). Machine learning solutions to challenges in finance: An application to the pricing of financial products. Technological Forecasting and Social Change, 153, 119928. doi:10.1016/j.techfore.2020.119928
 Ritter, G. (2017). Machine Learning for Trading. SSRN Electronic Journal. doi:10.2139/ssrn.3015609
 Li, X., Cao, J., & Pan, Z. (2018). Market impact analysis via deep learned architectures. Neural Computing and Applications. doi:10.1007/s00521-018-3415-3
 Akyildirim, E., Goncu, A., & Sensoy, A. (2020). Prediction of cryptocurrency returns using machine learning. Annals of Operations Research. doi:10.1007/s10479-020-03575-y
 Hansen, K. B. (2020). The virtue of simplicity: On machine learning models in algorithmic trading. Big Data & Society, 7(1), 205395172092655. doi:10.1177/2053951720926558
 Li, Y., Zheng, W., & Zheng, Z. (2019). Deep Robust Reinforcement Learning for Practical Algorithmic Trading. IEEE Access, 1–1. doi:10.1109/access.2019.2932789
 Thomas Spooner et al. (2018) Market Making via Reinforcement Learning. AAMAS2018 Conference Proceedings. https://arxiv.org/abs/1804.04216
 Jian Huang, Junyi Chai and Stella Cho (2020). Deep learning in finance and banking: A literature review and classification. Frontiers of Business Research in China 14:13 doi:10.1186/s11782-020-00082-6
 Rundo, Trenta, di Stallo, & Battiato. (2019). Machine Learning for Quantitative Finance Applications: A Survey. Applied Sciences, 9(24), 5574. doi:10.3390/app9245574
 Saeed Nosratabadi et al. (2020) Data Science in Economics: Comprehensive Review of Advanced Machine Learning and Deep Learning Methods. Mathematics 8, 1799; doi:10.3390/math8101799