AI in Finance

Research at the Center targets digital finance innovations, including the analysis of blockchain consensus protocols, the quantification of risks in DeFi protocols, security aspects of distributed ledgers, and algorithmic aspects of robo-advising.

Tackling and Dissecting the Non-Stationarity-Complexity Tradeoff in Return Prediction

Is there an optimal way to select model complexity and window size in non-stationary environments? Profs. Agostino Capponi and Kaizheng Wang, together with Dr. Jiacheng Zou and graduate students Chengpiao Huang and J. Antonio Sidaoui, investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows that introduce stronger non-stationarity. They resolve this tension with a novel model selection method that jointly optimizes model class and training window size using a tournament procedure that adaptively evaluates candidates on non-stationary validation data. Their theoretical analysis demonstrates that this approach balances misspecification error, estimation variance, and non-stationarity, performing close to the best model in hindsight. Applied to 17 industry portfolio returns, their method consistently outperforms standard rolling-window benchmarks, improving out-of-sample R^2 by 14-23% on average. During NBER-designated recessions, the improvements are substantial: their method achieves positive R^2 during the Gulf War recession while the benchmarks' R^2 is negative, improves R^2 in absolute terms by at least 80 bps during the 2001 recession, and delivers superior performance during the 2008 Financial Crisis. Economically, a trading strategy based on their selected model generates 31% higher cumulative returns averaged across the industries. See The Nonstationarity-Complexity Tradeoff in Return Prediction.
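
To make the tradeoff concrete, here is a minimal sketch of jointly choosing model complexity and training window by validating on the most recent observations. It is a plain grid search on synthetic data, not the tournament procedure from the paper; the polynomial model class, the function names (`fit_predict`, `select_model`), and the simulated regime shift are all illustrative assumptions.

```python
import numpy as np

def fit_predict(x_train, y_train, x_val, degree):
    """Least-squares polynomial fit of the given degree, evaluated on x_val."""
    coefs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coefs, x_val)

def select_model(x, y, degrees, windows, n_val):
    """Grid-search (degree, window) by MSE on the last n_val observations.

    Each candidate trains on the `window` observations immediately before
    the validation block, so longer windows reach further into older data.
    Returns (validation_mse, degree, window) for the best candidate.
    """
    x_val, y_val = x[-n_val:], y[-n_val:]
    train_end = len(x) - n_val
    best = None
    for degree in degrees:
        for window in windows:
            start = train_end - window
            if start < 0 or window <= degree:
                continue  # not enough data for this candidate
            pred = fit_predict(x[start:train_end], y[start:train_end], x_val, degree)
            mse = float(np.mean((pred - y_val) ** 2))
            if best is None or mse < best[0]:
                best = (mse, degree, window)
    return best

# Synthetic example (assumed data): the x-to-y relationship changes at observation 400.
rng = np.random.default_rng(0)
n = 600
x = rng.uniform(-1.0, 1.0, n)
noise = 0.1 * rng.standard_normal(n)
y = np.where(np.arange(n) < 400, -x, x + 0.5 * x**2) + noise

result = select_model(x, y, degrees=[1, 2, 5], windows=[50, 100, 500], n_val=50)
print(result)  # (validation MSE, chosen degree, chosen window)
```

In this toy setting the longest window mixes observations from both regimes, so a short window is preferred even though it leaves fewer observations to fit a flexible model.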

Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets

How can one combine the power of agentic AI and natural language to discover relationships in prediction markets? Prof. Agostino Capponi, Dr. Alfio Gliozzo, and graduate student Brian Zhu observe that prediction markets allow users to trade on outcomes of real-world events but are prone to fragmentation, with overlapping questions, implicit equivalences, and hidden contradictions across markets. They present an agentic AI pipeline that autonomously (i) clusters markets into coherent topical groups using natural-language understanding over contract text and metadata, and (ii) identifies within-cluster market pairs whose resolved outcomes exhibit strong dependence, including “same-outcome” (correlated) and “different-outcome” (anti-correlated) relationships. Using a historical dataset of resolved markets on Polymarket, they evaluate the accuracy of the agent’s relational predictions. They then synthesize the discovered relationships into a simple trading strategy to quantify how those relationships translate into actionable strategies. Results show that agent-identified relationships have around 60-70% accuracy, and the induced trading strategies have an average return of approximately 20% over week-long horizons, highlighting the ability of agentic AI and large language models to uncover latent semantic structure within prediction markets. See Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets.
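
As an illustration of how relational predictions can be scored against resolved outcomes, the sketch below computes the fraction of predicted “same-outcome” and “different-outcome” pairs that the resolutions confirm. The data layout, the market ids, and the function name `relation_accuracy` are assumptions made for this example, not the paper's evaluation code.

```python
def relation_accuracy(predictions, outcomes):
    """Fraction of predicted pair relations confirmed by resolved outcomes.

    predictions: list of (market_a, market_b, relation) tuples,
                 with relation in {"same", "different"}
    outcomes:    dict mapping market id -> resolved outcome (True/False)
    """
    hits = 0
    total = 0
    for market_a, market_b, relation in predictions:
        if market_a not in outcomes or market_b not in outcomes:
            continue  # skip pairs that include an unresolved market
        resolved_same = outcomes[market_a] == outcomes[market_b]
        predicted_same = relation == "same"
        hits += int(predicted_same == resolved_same)
        total += 1
    return hits / total if total else float("nan")

# Hypothetical resolved markets and agent predictions.
outcomes = {"m1": True, "m2": True, "m3": False, "m4": True}
predictions = [
    ("m1", "m2", "same"),       # both resolved True: correct
    ("m1", "m3", "different"),  # True vs. False: correct
    ("m2", "m4", "different"),  # both resolved True: incorrect
]
accuracy = relation_accuracy(predictions, outcomes)
print(accuracy)  # 2 of 3 predictions confirmed
```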

Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance

Can agentic AI improve collective decision-making? Prof. Agostino Capponi, Dr. Alfio Gliozzo, Chunghyun Han, and Junkyu Lee present a first empirical study of agentic AI systems as autonomous decision-makers in decentralized governance. Using more than 3,000 proposals from major protocols, they build an agentic AI voter that interprets proposal contexts, retrieves historical deliberation data, and independently determines its voting position. The agent operates within a realistic financial simulation environment grounded in verifiable blockchain data, implemented through a modular composable program (MCP) workflow that defines data flow and tool usage via the Agentics framework. They evaluate how closely the agent’s decisions align with the human and token-weighted outcomes, uncovering strong alignments measured by carefully designed evaluation metrics. Their findings demonstrate that agentic AI can augment collective decision-making by producing interpretable, auditable, and empirically grounded signals in realistic DAO governance settings. The study contributes to the design of explainable and economically rigorous AI agents for decentralized financial systems. See DAO-AI: Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance.
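
The distinction between a human outcome and a token-weighted outcome can be illustrated with a small tally. The sketch below assumes that the human outcome means a one-voter-one-vote majority, which is our reading rather than a definition taken from the study; the vote data and the function name `tally` are hypothetical.

```python
def tally(votes):
    """Return (headcount_winner, token_weighted_winner) for one proposal.

    votes: list of (choice, token_weight) tuples, one per voter.
    Ties are broken in favor of the choice that appears first.
    """
    counts, weights = {}, {}
    for choice, weight in votes:
        counts[choice] = counts.get(choice, 0) + 1
        weights[choice] = weights.get(choice, 0.0) + weight
    return max(counts, key=counts.get), max(weights, key=weights.get)

# Hypothetical proposal: two small holders vote "for", one large holder "against".
votes = [("for", 10.0), ("for", 5.0), ("against", 100.0)]
headcount_winner, token_winner = tally(votes)

agent_vote = "for"  # hypothetical agent decision
aligned_with_headcount = agent_vote == headcount_winner
aligned_with_tokens = agent_vote == token_winner
print(headcount_winner, token_winner, aligned_with_headcount, aligned_with_tokens)
```

The two outcomes can disagree when voting power is concentrated, which is why alignment is reported against both.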

Data-Driven Financial Factor Analysis

Dealing with high-dimensional, nonlinear time series data is a common issue in financial applications. Of particular interest is how to develop frameworks that not only handle the high dimensionality of a multivariate time series but also uncover its intrinsic latent dynamics without imposing restrictive parametric and modeling assumptions. Furthermore, in a supervised learning setting, a key case of interest is designing a state space model whose latent dynamic factors also preserve explanatory power for the time series of responses. Leveraging Anisotropic Diffusion Maps, a nonlinear manifold learning technique introduced by Singer and Coifman, Prof. Agostino Capponi and Graeme Baker, together with graduate student Jose Antonio Sidaoui, propose a data-driven dynamic factor framework in which a response variable depends on a high-dimensional set of covariates, without imposing any parametric model on the joint dynamics. They combine Kalman filtering with diffusion maps to develop a novel conditional sampling procedure that can be exploited for financial stress testing. They apply the method to the stress testing of equity portfolios using a combination of financial and macroeconomic factors from the Federal Reserve's supervisory scenarios, and demonstrate that this data-driven stress testing method outperforms standard scenario analysis and Principal Component Analysis benchmarks through historical backtests spanning three major financial crises. See Data-Driven Dynamic Factor Modeling via Manifold Learning.
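
For readers unfamiliar with diffusion maps, the following is a minimal sketch of the standard (isotropic) construction with a Gaussian kernel. It is not the anisotropic variant of Singer and Coifman used in the paper, and it omits the Kalman filtering step; the function name `diffusion_map`, the bandwidth `epsilon`, and the synthetic arc data are assumptions for illustration.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2):
    """Standard diffusion map embedding of the rows of X.

    Builds a Gaussian kernel, normalizes it into a Markov matrix
    P = D^{-1} K, and returns the leading nontrivial eigenvalues together
    with the corresponding eigenvectors scaled by those eigenvalues.
    """
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)
    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Skip the trivial pair (eigenvalue 1, constant eigenvector).
    keep = slice(1, n_components + 1)
    return vals[keep], psi[:, keep] * vals[keep]

# Synthetic example (assumed data): noisy points along a circular arc,
# whose single latent coordinate is the parameter t.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.cos(3 * t), np.sin(3 * t)]) + 0.01 * rng.standard_normal((200, 2))
eigenvalues, embedding = diffusion_map(X, epsilon=0.1)
print(eigenvalues)
print(abs(np.corrcoef(embedding[:, 0], t)[0, 1]))  # first coordinate tracks t
```

In this toy case the first diffusion coordinate recovers the latent position along the arc, which is the sense in which the method uncovers latent factors without a parametric model.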

Enhancing Return Prediction and Portfolio Formation with Supply Chain Data

Does incorporating supply chain relationship information from a large network of firms have incremental effects on stock return prediction and asset pricing tasks? Recent events such as the COVID-19 pandemic have demonstrated the relevance of supply chain networks in asset pricing and, more broadly, in finance. Prof. Agostino Capponi, Dr. Jiacheng Zou, and graduate student Jose Antonio Sidaoui propose a nonparametric method to aggregate rich firm characteristics over a large supply chain network to explain the cross-section of expected returns. By taking into account a firm's higher-order relationships within the supply chain graph, they construct nonlinear pricing signals that propagate information through the supply chain network via Graph Neural Networks. Analyzing all US-listed stocks with supply chain data, their model achieves over 50% higher out-of-sample Sharpe ratios compared to models using only direct suppliers and consumers, outperforming Fama-French five-factor and principal component models. See Graph Machine Learning for Asset Pricing: Traversing the Supply Chain and Factor Zoo.
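
The idea of higher-order relationships can be sketched with a simple multi-hop aggregation over an adjacency matrix. This shows only the message-passing step, with mean aggregation and no learned weights or nonlinearity, so it is not the Graph Neural Network from the paper; the three-firm chain and the function name `propagate` are hypothetical.

```python
import numpy as np

def propagate(adj, features, hops):
    """Stack mean-aggregated neighbor features for 0, 1, ..., hops hops.

    adj[i, j] = 1 if firm j is a direct supply chain partner of firm i.
    Column block k of the output holds characteristics averaged over
    firms reachable in exactly k steps along the graph.
    """
    degree = adj.sum(axis=1, keepdims=True)
    # Row-normalize; firms with no partners keep an all-zero row.
    norm = np.divide(adj, degree, out=np.zeros_like(adj, dtype=float), where=degree > 0)
    layers = [features]
    for _ in range(hops):
        layers.append(norm @ layers[-1])
    return np.hstack(layers)

# Hypothetical chain: firm 0 is linked to firm 1, and firm 1 to firm 2.
adj = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]])
features = np.array([[1.0], [2.0], [4.0]])  # one characteristic per firm
signals = propagate(adj, features, hops=2)
print(signals)
```

With two hops, firm 0's signal includes the characteristic of firm 2, a partner's partner that a model restricted to direct links would never see.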

Machine learning and data sciences for financial markets

Leveraging the research efforts of more than 60 experts in the area, the book Machine Learning and Data Sciences for Financial Markets, co-edited by Prof. Agostino Capponi and Dr. Charles-Albert Lehalle, reviews cutting-edge practices in machine learning for financial markets. Instead of seeing machine learning as a new field, the authors explore the connection between knowledge developed in quantitative finance over the past 40 years and modern techniques generated by the current revolution in data sciences and artificial intelligence. The text is structured around three main areas: “Interacting with investors and asset owners,” which covers robo-advisors and price formation; “Towards better risk intermediation,” which discusses derivative hedging, portfolio construction, and machine learning for dynamic optimization; and “Connections with the real economy,” which explores nowcasting, alternative data, and ethics of algorithms. Accessible to a wide audience, this invaluable resource will allow practitioners to include machine learning-driven techniques in their day-to-day quantitative practices, while students will build intuition and come to appreciate the technical tools and motivation behind the theory.

Personalized robo-advising and risk preference assessment

Robo-advising has grown enormously over the last decade, offering investors a wide range of financial services, from retirement planning to managing checking and savings accounts to meet investment goals. Robo-advisors democratize access to financial services by lowering barriers to entry: they charge low fees on assets under management and require low minimum investment amounts. Increased adoption will depend on whether the robo-algorithm is able to personalize its recommendations to the risk preferences of users, and whether it can learn the needs and preferences of the clients served. See Robo-Advising: Learning Investors’ Risk Preferences via Portfolio Choices and Personalized Robo-Advising: Enhancing Investment Through Client Interaction.

Goal-based robo-advising

Empirical and behavioral finance research shows that clients associate the notion of “risk” with the likelihood of not attaining their goals. For example, clients who want to pay for college expenses in five years, purchase a house in ten years, and have enough in their retirement accounts in thirty years worry about the likelihood of falling short of these goals. Robo-advisors are then confronted with the challenge of investing in risky assets to make sure that the goals are well funded by their deadlines. Robo-advisors trade off the immediate consumption of wealth to satisfy an upcoming goal against saving and maintaining wealth in the portfolio to satisfy future goals of higher priority. See Goal Based Investment Management.
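
The notion of risk as the likelihood of falling short of a goal can be illustrated with a small Monte Carlo estimate. The sketch assumes a single goal, no intermediate contributions or withdrawals, and portfolio wealth following geometric Brownian motion; the function name `goal_probability` and all parameter values are hypothetical and do not come from the referenced paper.

```python
import numpy as np

def goal_probability(wealth, goal, years, mu, sigma, n_paths=100_000, seed=0):
    """Estimate the probability that terminal wealth reaches the goal.

    Assumes lognormal terminal wealth with annual drift mu and
    annual volatility sigma (geometric Brownian motion).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    log_growth = (mu - 0.5 * sigma**2) * years + sigma * np.sqrt(years) * z
    terminal = wealth * np.exp(log_growth)
    return float(np.mean(terminal >= goal))

# Hypothetical client: 100 invested today, goal of 130 in five years.
p_conservative = goal_probability(100.0, 130.0, 5, mu=0.03, sigma=0.05)
p_aggressive = goal_probability(100.0, 130.0, 5, mu=0.07, sigma=0.18)
print(p_conservative, p_aggressive)
```

Under these assumed parameters the low-volatility portfolio is less likely to reach the goal, so the portfolio that looks safer by volatility is the riskier one in the goal-based sense.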

Machine learning in market microstructure

Big data challenges in the study of modern market microstructure require new empirical tools to analyze trading dynamics, market liquidity, and price formation. We explore machine learning models and their applications in high-frequency, fragmented trading environments. In particular, we focus on ML models designed for time-series data, such as long short-term memory (LSTM) neural networks and Transformers. We show that they perform well in predicting market price dynamics. In addition, we demonstrate how ML models can be used to identify where price information originates. See here for details.