This paper explores the use of a Deep Reinforcement Learning (DRL) model for dynamic portfolio management in financial markets. Combining deep neural networks with the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm, the framework can process high-dimensional market data and adapt to a dynamic environment. The TD3 agent incorporates transaction costs and risk-aversion constraints to simulate real-world investment conditions. Its decision-making state space is built from technical indicators such as the Moving Average Convergence Divergence (MACD) and the Relative Strength Index (RSI). The model was evaluated on historical data for six NASDAQ stocks from 2021 to 2023, and the results were compared with two other methods, Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG). Portfolio performance was measured by cumulative return, annual volatility, Sharpe ratio, and maximum drawdown. TD3 produced the best results in terms of cumulative and risk-adjusted return, achieving a cumulative return of $51.28 \%$ compared with $25.91 \%$ for DDPG and $17.56 \%$ for PPO. However, the TD3-managed portfolio also showed high annual volatility and drawdown, suggesting a risk-return trade-off. The results indicate that the TD3 policy can produce high returns while maintaining a certain level of risk, and that it outperforms static strategies such as buy and hold. The approach also has limitations: the model was unable to forecast short-term market movements, and it relied on lagging indicators.
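The four performance measures named above can be computed directly from a series of portfolio values. The following is a minimal sketch, not the paper's evaluation code: the function name `portfolio_metrics`, the use of daily values, the 252-trading-day annualization factor, and the default risk-free rate of zero are all illustrative assumptions.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor for daily data


def portfolio_metrics(values, risk_free_rate=0.0):
    """Compute common performance measures from daily portfolio values.

    values: sequence of portfolio values, one per trading day (assumed).
    risk_free_rate: annual risk-free rate (assumed 0 by default).
    """
    if len(values) < 3:
        raise ValueError("need at least three portfolio values")

    # Simple daily returns between consecutive values.
    returns = [values[i] / values[i - 1] - 1.0 for i in range(1, len(values))]

    # Total growth over the whole period.
    cumulative_return = values[-1] / values[0] - 1.0

    # Sample standard deviation of daily returns, scaled to one year.
    annual_volatility = stdev(returns) * math.sqrt(TRADING_DAYS)

    # Annualized Sharpe ratio on excess daily returns.
    daily_rf = risk_free_rate / TRADING_DAYS
    excess = [r - daily_rf for r in returns]
    spread = stdev(excess)
    sharpe = (
        mean(excess) / spread * math.sqrt(TRADING_DAYS)
        if spread > 0
        else float("nan")
    )

    # Maximum drawdown: largest peak-to-trough decline, as a negative fraction.
    peak = values[0]
    max_drawdown = 0.0
    for v in values:
        peak = max(peak, v)
        max_drawdown = min(max_drawdown, v / peak - 1.0)

    return {
        "cumulative_return": cumulative_return,
        "annual_volatility": annual_volatility,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
    }


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    print(portfolio_metrics([100.0, 110.0, 99.0, 120.0]))
```

Reporting maximum drawdown alongside cumulative return makes the trade-off visible: a strategy can finish with the highest return and still have suffered the deepest interim loss.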