Exploring Cutting-Edge DRL Algorithms for Quantitative Finance

:::info
Authors:
(1) Xiao-Yang Liu, Hongyang Yang, Columbia University (xl2427,[email protected]);
(2) Jiechao Gao, University of Virginia ([email protected]);
(3) Christina Dan Wang (Corresponding Author), New York University Shanghai ([email protected]).
:::
Table of Links
Abstract and 1 Introduction
2 Related Works and 2.1 Deep Reinforcement Learning Algorithms
2.2 Deep Reinforcement Learning Libraries and 2.3 Deep Reinforcement Learning in Finance
3 The Proposed FinRL Framework and 3.1 Overview of FinRL Framework
3.2 Application Layer
3.3 Agent Layer
3.4 Environment Layer
3.5 Training-Testing-Trading Pipeline
4 Hands-on Tutorials and Benchmark Performance and 4.1 Backtesting Module
4.2 Baseline Strategies and Trading Metrics
4.3 Hands-on Tutorials
4.4 Use Case I: Stock Trading
4.5 Use Case II: Portfolio Allocation and 4.6 Use Case III: Cryptocurrencies Trading
5 Ecosystem of FinRL and Conclusions, and References
2 RELATED WORKS
We review the state-of-the-art DRL algorithms, relevant open-source libraries, and applications of DRL in quantitative finance.
2.1 Deep Reinforcement Learning Algorithms
Many DRL algorithms have been developed. They fall into three categories: value-based, policy-based, and actor-critic algorithms.
\
A value-based algorithm estimates a state-action value function that guides the optimal policy. Q-learning [49] approximates a Q-value (expected return) by iteratively updating a Q-table, which works for problems with small discrete state and action spaces. Researchers have proposed using deep neural networks to approximate Q-value functions, e.g., the deep Q-network (DQN) and its variants double DQN and dueling DQN [1].
\
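
A minimal sketch of the tabular Q-learning update described above. The state/action sizes and hyperparameters here are illustrative assumptions, not settings from this paper or FinRL.

```python
import numpy as np

# Illustrative tabular Q-learning sketch (assumed sizes and hyperparameters).
n_states, n_actions = 10, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1      # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))          # Q-table: expected return per (state, action)

def select_action(state):
    """Epsilon-greedy action selection over the Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```

DQN and its variants replace the Q-table with a neural network that maps a state to Q-values for all actions, which removes the small-discrete-space restriction.
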
A policy-based algorithm directly updates the parameters of a policy through policy gradient [45]. Instead of estimating values, it uses a neural network to model the policy directly: the input is a state and the output is a probability distribution over actions, from which the agent samples an action at that state.
\
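
A minimal sketch of this idea using a REINFORCE-style update in PyTorch. The network sizes and hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Illustrative policy network: state in, probability distribution over actions out.
state_dim, n_actions = 8, 3   # placeholder dimensions

policy = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
    nn.Softmax(dim=-1),        # output: action probabilities
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(states, actions, returns):
    """REINFORCE update: raise log-probability of actions in proportion to their return."""
    probs = policy(states)                                        # (batch, n_actions)
    log_probs = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1))
    loss = -(log_probs * returns).mean()                          # policy-gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
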
An actor-critic algorithm combines the advantages of value-based and policy-based algorithms. It updates two neural networks: an actor network that updates the policy (a probability distribution over actions) and a critic network that estimates the state-action value function. During training, the actor network takes actions and the critic network evaluates those actions. State-of-the-art actor-critic algorithms include deep deterministic policy gradient (DDPG), proximal policy optimization (PPO), asynchronous advantage actor-critic (A3C), advantage actor-critic (A2C), soft actor-critic (SAC), multi-agent DDPG (MADDPG), and twin-delayed DDPG (TD3) [1].
\
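
As an illustration of how such actor-critic algorithms are typically invoked, the sketch below trains a PPO agent with the open-source Stable-Baselines3 library on a placeholder Gymnasium environment; the environment and training budget are assumptions for illustration, not part of this paper.

```python
import gymnasium as gym
from stable_baselines3 import PPO   # DDPG, A2C, SAC, TD3 expose the same interface

# Illustrative only: Pendulum-v1 stands in for a trading environment; any market
# environment exposing the same Gym-style interface could be swapped in.
env = gym.make("Pendulum-v1")

# PPO builds the actor (policy) and critic (value) networks internally.
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)   # actor proposes actions, critic evaluates them

# After training, query the actor for an action given an observation.
obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```
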
:::info
This paper is available on arxiv under CC BY 4.0 DEED license.
:::