Working papers      

 

New Large Portfolios under Short-Sale Constraints: Optimality, Estimation Accuracy, and Beating 1/N

with Han Tong, Ming Yuan and Jason Zhou (current version: Sept., 2025).

We show that the optimal portfolio under no-short-sale restriction coincides with an unconstrained solution on a subset of assets and that its Sharpe ratio converges to that of the true optimal portfolio in the special case of a factor model, even though the latter may involve short positions. This implies that long-only investors can potentially approximate efficiency. In high dimensional estimation, the no-short-sale restriction ensures that the estimated optimal portfolio converges to its population counterpart, whereas the unconstrained estimator fails to converge. Perhaps surprisingly, the short-sale constrained portfolio can also outperform the 1/N rule in general.
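The first claim in the abstract can be illustrated with a small numerical sketch. It is not the paper's method: it assumes a plain mean-variance objective with risk aversion `gamma`, a hypothetical three-asset example, and a simple projected-gradient solver. At the long-only optimum, the first-order conditions imply that the weights on the assets actually held equal the unconstrained solution computed on that subset alone.

```python
import numpy as np

def long_only_mv(mu, Sigma, gamma=1.0, n_iter=20000):
    """Projected gradient ascent on w'mu - (gamma/2) w'Sigma w subject to w >= 0."""
    step = 1.0 / (gamma * np.linalg.eigvalsh(Sigma).max())
    w = np.zeros_like(mu)
    for _ in range(n_iter):
        # Gradient step followed by projection onto the nonnegative orthant.
        w = np.maximum(w + step * (mu - gamma * Sigma @ w), 0.0)
    return w

# Hypothetical inputs: three assets with pairwise correlation 0.8.
vol = np.array([0.20, 0.15, 0.10])
corr = np.full((3, 3), 0.8) + 0.2 * np.eye(3)
Sigma = corr * np.outer(vol, vol)
mu = np.array([0.06, 0.04, 0.01])

w = long_only_mv(mu, Sigma)                 # long-only optimum (gamma = 1)
S = np.flatnonzero(w > 0)                   # assets actually held
w_sub = np.linalg.solve(Sigma[np.ix_(S, S)], mu[S])  # unconstrained solution on S
w_unc = np.linalg.solve(Sigma, mu)          # unconstrained solution on all assets

print("held assets:", S)
print("long-only weights on S:", w[S])
print("unconstrained weights on S only:", w_sub)
print("unconstrained weights, all assets:", w_unc)
```

In this example the fully unconstrained solution shorts the third asset, while the long-only solution drops it and matches the unconstrained solution restricted to the first two.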

 

 

New ChatGPT and DeepSeek: Can They Predict the Stock Market and Macroeconomy?

with Jian Chen, Guohao Tang and Wu Zhu (current version: Sept., 2025).

We study whether ChatGPT and DeepSeek can extract information from the Wall Street Journal to predict the stock market and the macroeconomy. We find that ChatGPT has predictive power. DeepSeek underperforms ChatGPT, which is trained more extensively in English. Other large language models also underperform. Consistent with financial theories, the predictability is driven by investors’ underreaction to positive news, especially during periods of economic downturn and high information uncertainty. Negative news correlates with returns but lacks predictive value. At present, ChatGPT appears to be the only model capable of capturing economic news that links to the market risk premium.

Presented at 2024 SIF.

 

 

Which Factors Matter In the Pricing Kernel?

with Bin Luo and Ti Zhou (current version: Aug., 2025).

We propose a simple and efficient framework for estimating a linear pricing kernel spanned by factors through the construction of a mean–variance efficient factor portfolio. Specifically, we leverage the insight that the mean–variance optimization problem can be reformulated as a particular OLS regression, which we solve using LASSO regularization. Our method is well suited for high-dimensional settings where the number of factors exceeds the number of observations. We establish that the out-of-sample Sharpe ratio of the resulting portfolio converges to the theoretical maximum at the fastest rate to date, implying that the estimated pricing kernel yields asymptotically zero pricing errors. Moreover, the method eliminates redundant factors by shrinking their weights toward zero. Applying our methodology to a comprehensive set of 174 characteristic factors plus the market, we find that the factor portfolio achieves an out-of-sample Sharpe ratio of 3.59, significantly outperforming competing methods and producing the lowest pricing errors for the candidate factors. The selected factors vary over time in both composition and number, with an average of around 50. Among them, earnings predictability and the market emerge as the most influential, followed by factors related to intangibles, momentum, and investment. Our framework also introduces a novel model comparison test, rejecting the hypothesis that eight widely used sparse factor models suffice to span the pricing kernel. Finally, we show that corporate bond factors contribute little to equity pricing, and that a real-time pricing kernel with only published characteristic factors continues to achieve substantial mean–variance efficiency.

Presented in 2025 at Wharton Jacobs Levy Center's Frontiers in Quantitative Finance Conference in NYC (discussed by Prof. Robert Stambaugh)
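As a rough illustration of the regression idea in the abstract above, the sketch below uses one well-known reformulation, which may differ from the paper's exact specification: regressing a vector of ones on excess returns without an intercept gives coefficients proportional to the sample mean-variance weights. The simulated returns, the penalty level, and the small coordinate-descent LASSO routine are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * max(abs(x) - lam, 0.0)

def lasso_mv_weights(R, lam, n_iter=500):
    """Coordinate descent for (1/(2T)) * ||1 - R b||^2 + lam * ||b||_1 (no intercept)."""
    T, N = R.shape
    b = np.zeros(N)
    col_sq = (R ** 2).sum(axis=0) / T
    resid = np.ones(T) - R @ b
    for _ in range(n_iter):
        for j in range(N):
            resid += R[:, j] * b[j]          # remove factor j's contribution
            rho = R[:, j] @ resid / T
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= R[:, j] * b[j]          # add the updated contribution back
    return b

# Simulated excess returns for four hypothetical factors.
rng = np.random.default_rng(0)
T, N = 600, 4
R = np.array([0.02, 0.01, 0.015, 0.0]) + 0.05 * rng.standard_normal((T, N))

# Unpenalized regression of ones on returns vs. sample mean-variance weights.
b_ols = np.linalg.lstsq(R, np.ones(T), rcond=None)[0]
mu_hat = R.mean(axis=0)
w_mv = np.linalg.solve(np.cov(R, rowvar=False), mu_hat)

print(b_ols / b_ols.sum())   # same direction as ...
print(w_mv / w_mv.sum())     # ... the sample mean-variance weights

# A positive penalty shrinks weights and can set some exactly to zero.
print(lasso_mv_weights(R, lam=0.005))
```

The proportionality in the unpenalized case follows from the Sherman-Morrison identity applied to the second-moment matrix of returns; the LASSO penalty then shrinks weights toward zero, which is how redundant factors would be dropped in a setup of this kind.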

 

 

Systematic Momentum in Corporate Bond Returns

with Cheng Gao, Sophia Zhengzi Li and Peixuan Yuan (current version: Sept., 2025).

We propose a bond systematic momentum factor and show that, together with the bond factor of Kelly, Palhares, and Pruitt (2023), it delivers the best-performing bond factor model to date. Our factor combines information from both firm- and bond-level characteristics, and is constructed from the systematic component of corporate bond returns. We find that limits to arbitrage are central to its superior performance. Furthermore, the bond systematic momentum outperforms bond factor momentum and largely explains stock systematic momentum. Our results provide new insights into momentum as well as the joint pricing and integration of corporate debt and equity markets.

 

 

Momentum and Factor Momentum: A Re-Examination

with Cheng Gao, Sophia Zhengzi Li and Peixuan Yuan (current version: Dec., 2024).

(Data and Code generating the main result)

We show that the momentum factor remains a unique and irreplaceable factor, in contrast to the redundancy finding of Ehsani and Linnainmaa (2022), which suffers from an omitted-variable problem. By adding a betting-against-systematic (BAS) factor to their framework, we find that the momentum factor exhibits significant alpha. Further, we demonstrate that even an improved factor model, such as IPCA, cannot explain the momentum unless momentum characteristics are utilized. Moreover, in an attribution analysis, we show that firm-specific components, not non-momentum factors, are the primary drivers of momentum returns.

 

 


No Sparsity in Asset Pricing: Evidence from a Generic Statistical Test

with Junnan He and Lingxiao Zhao (current version: July, 2025).

We present a novel test to determine sparsity in characteristic-based factor models. Applying the test to industry and pseudo-random portfolios, we reject the null hypothesis that fewer than ten factors are sufficient to explain returns, and show that at least thirty factors are needed for the various subsample periods examined. We find that dense models outperform sparse ones in both pricing and investing. Testing with tree-based portfolios also indicates no sparsity. Our results suggest that most existing characteristic-based models, which have fewer than six factors, are questionable, and that future research on such low-dimensional models is unlikely to be fruitful.

 

 

ETFs, Anomalies and Market Efficiency

with Ilias Filippou, Songrun He and Sophia Zhengzi Li (current version: April, 2025).

This paper investigates the role of exchange-traded fund (ETF) ownership in enhancing market efficiency, drawing on a comprehensive set of over 200 return anomalies in the U.S. equity market. Controlling for key confounding factors such as short-selling constraints, arbitrage costs, and information environments, we find that higher ETF ownership is associated with reduced mispricing and lower anomaly-based returns. Using Russell index reconstitution as a natural experiment, we provide additional causal evidence of ETF ownership attenuating anomaly profits. The mitigating effect of ETF ownership on mispricing is primarily driven by active ETFs and is more pronounced for build-up anomalies than for resolution anomalies. Further analyses reveal that ETFs attenuate mispricing by facilitating both ex-ante information acquisition and ex-post information incorporation. Overall, our findings suggest that ETFs contribute to improved market efficiency by reducing mispricing arising from a broad range of anomalies.

Presented at SGF 2023 and WFA 2023 (would be at TAU Finance Conference 2023); to be presented at Alpine Finance Summit.

 

 

Market Return Decomposition and Equity Risk Premium Predictability

with Yanyan Lin, Chongfeng Wu and Shunwei Zhu (current version: Sept., 2025).

We introduce a novel decomposition of stock market returns into a fundamental component (FC), capturing long-term growth, and an unexpected capital gain component (UC), reflecting short-term fluctuations. Unlike the permanent-temporary price decomposition of Fama and French (1988) and the long-run return framework of Bansal and Yaron (2004), our approach is explicit and simple to estimate. We show that prior works understate return predictability because predictive regressions have little relation with the fundamental component. Using 41 predictors from Goyal, Welch and Zafirov (2024), we find 33 significant predictors of market returns, compared with only 5 in their study, providing strong evidence of market predictability.

 

 


Sparse Macro Factors

with David Rapach (current version: August, 2025).

(Macro-Finance Factors Website)

We estimate sparse principal components from a large set of macro-finance variables. Each component is a sparse linear combination of the underlying variables, enhancing economic interpretability and yielding sharper asset pricing signals. Innovations to the components constitute a set of sparse macro-finance factors. Robust tests show that sparse factors tied to housing, yields, and credit spreads earn significant risk premia. Among the top 20 factor models formed from prominent characteristic-based factors and mimicking portfolios for the housing, yield, and credit spread factors, the latter three play leading roles, highlighting the importance of sparse macro-finance factors for capturing systematic risks.

Best paper award, Inquire UK and Inquire Europe, 2019

 

 

A New Option Momentum: The Role of the Systematic Component

with Heiner Beckmeyer and Ilias Filippou (current version: Sept., 2025).

This paper documents the first transaction-cost-robust option momentum, based on the systematic component of option returns. Using a cutting-edge latent factor model that incorporates information from a broad range of stock and option characteristics, we decompose option returns into their systematic and idiosyncratic components. Our findings reveal that the systematic component exhibits stronger momentum, leading to an option return momentum strategy that significantly outperforms traditional option momentum and option factor momentum. Moreover, we show that characteristics related to risk and quality have the greatest influence on our systematic option momentum, while past prices play a limited role.

Best paper awards, INQUIRE UK/Europe, 2024 and FMA Asset Mgt Consortium at Cambridge, 2024; Presented at EFA, 2024.

 

 

Going Supranational: Anomaly-Market Links and New Dimensions of Market Efficiency

with Xi Dong, Yan Li, Yanran and David Rapach (current version: April, 2025).

We connect cross-sectional anomalies to time-series market return predictability using data from 44 non-US countries. While a large set of representative anomaly returns show limited predictive power for market returns at the country level, they exhibit strong predictability when aggregated to the supranational level. We develop an international analytical framework to explain this difference: cross-sectional mispricing corrections in one country can propagate into market-wide corrections in another, enhancing supranational predictability precisely when mispricing is more country-specific than global. We further decompose anomaly–market links into three analytically-grounded market (in)efficiency measures of broad relevance: systematic mispricing, overpricing dominance, and price randomness. Supported by data, they govern the strength and nature of anomaly–market links across global markets.

 

 

Sharpe Ratio Timing with Stop Loss

with Yufeng Han, Hong Liu and Jing Xu (current version: Sept., 2025).

We propose a new approach to market timing by targeting the Sharpe ratio, which nests volatility timing as a special case when the expected market return is constant. We further develop a theoretical model and introduce a stop-loss strategy, a novel addition to market timing, to enhance performance. Empirically, we find that Sharpe ratio timing substantially outperforms volatility timing, and incorporating the stop-loss strategy improves the results even further. Our study highlights the joint importance of the risk-return tradeoff and risk control in profitable trading, captured by the Sharpe ratio and stop-loss strategy, respectively.

 

 

Intraday Option Reversals: Return Predictability and Market Efficiency

with Heiner Beckmeyer, Ilias Filippou and Zhaoque (Chosen) Zhou (current version: Jan., 2025).

We find the first option reversal patterns intraday: returns reverse half-hourly during the trading day. The reversals are both economically and statistically significant and are robust to transaction costs and various controls, such as implied volatility changes and market frictions. The reversals are unrelated to cross-day momentum. Additionally, we provide an option-demand theoretical framework to explain the patterns. Our findings suggest that intraday demand pressures are important for intraday asset pricing: they drive the reversals and have profound implications for market efficiency.

 

 

Overnight-Intraday Reversal Everywhere

with Chun Liu, Yang Liu, Tianyu Wang and Yingzi Zhu (current version: April, 2025).

We document a novel intraday reversal pattern: a strategy that buys assets with the lowest past overnight returns and sells those with the highest yields substantial intraday Sharpe ratios, two to five times larger than those of traditional reversal strategies. The pattern is robust across asset classes and holds out-of-sample. We propose a unified explanation based on asset class-specific liquidity provision, whereas the conventional channels fail. Additionally, we show that cross-sectional return dispersion positively predicts both the strategy’s expected return and conditional Sharpe ratio, and provide a new global two-factor model to explain intraday return variations across asset classes.

 

 

Leading Stocks and the Stock Market Expected Returns

with Zhuo Chen, Xianfeng Hao and Honghai Yu (current version: Feb., 2025).

We identify leading stocks using a machine learning method, and find that the negative leaders, which lead other stocks negatively, have strong predictive power for future stock market returns both in- and out-of-sample, whereas the positive leaders do not. The predictability generates significant economic value for a mean-variance investor in asset allocation. Economically, underreaction of the followers of the negative leaders appears to be the driving force for the predictability. Our study provides the first empirical evidence that bridges the lead-lag literature and the literature on the predictability of the market risk premium.

 

 

Fear in the "Fearless" Treasury Market

with Tianyang Wang, Yuanzhi Wang and Qunzi Zhang (current version: Nov., 2024).

This paper examines how fear affects the Treasury market and predicts Treasury bond returns. Using a text-based fear index from social and news media, we find that fear significantly predicts future Treasury returns, both in-sample and out-of-sample, and that the evidence suggests the global transmission of fear. We also propose a model explaining that risk aversion shocks drive bond risk premia. Our paper further explores various dimensions of fear effects, such as term, magnitude, dynamics, and sources, and compares them with other sentiments. The results highlight the critical role of fear in Treasury market dynamics.

 

 

Optimal Portfolio Choice with Economic Constraints: A Genetic Programming Approach

with Yang Liu (current version: April, 2024).

We develop a new approach to construct the mean-variance efficient portfolio by directly targeting the optimal weight with economically motivated regularization that incorporates economic constraints to guard against overfitting and enhance interpretability. Instead of struggling with noisy estimators of expected return and covariance matrix, we interpret a portfolio rule as a mapping from historical data to optimal weights and take advantage of the vigorous searching capability of genetic programming (GP) to estimate this weighting function directly. While conventional penalties, such as L1 and L2 norms, are not feasible in our model due to GP's non-parametricity, we propose a trading-frictions-based regularization to control model complexity while preserving interpretability. The out-of-sample Sharpe ratio of our GP approach more than doubles those of existing methods. Beyond portfolio choice, we also derive a model-implied expected return measure from the GP-optimal weight and find that it subsumes the predictability of other machine learning methods in the cross-section of stock returns. Our study highlights the importance of marrying machine learning and economic rationale for interpretable machine learning applications in asset pricing.

 

 


Unusual Financial Communication: ChatGPT, Earnings Calls, and Financial Markets

with Lars Beckmann, Heiner Beckmeyer, Ilias Filippou and Stefan Menze (current version: Feb., 2025).

We devise a prompting strategy for ChatGPT to detect and analyze unusual aspects of financial communication in earnings calls. We identify 25 dimensions across four categories: unusual communication styles by executives and analysts, unusual contents, and technical difficulties. Unusual financial communication is common, correlates with certain firm characteristics and fluctuates with the business cycle. Financial markets react to both aspects – unusual communication styles and unusual contents – with a negative stock return, elevated trading activity, higher volatility and option-implied uncertainty, and downward revisions of next-quarter earnings forecasts by analysts. Our study demonstrates the potential of large language models to provide new insights into the interpretation of financial textual data.

Finalist for Crowell Memorial Prize; Presented at CICF 2024.

 

 

Ex-Ante Risk Premia on Earnings Announcements: Evidence from the Options Market

with Hong Liu, Yingdong Mao and Xiaoxiao Tang (current version: April, 2025).

We provide the first estimate of the ex-ante risk premia on earnings announcements using forward-looking information from the options market. We find that the average earnings announcement risk premium is highly significant at 13 basis points, with substantial variation across firms and over time. Beyond its asset pricing implications, our study also offers economic insights into the post-earnings-announcement drift and into what drives the profitability of widely analyzed straddle strategies. Moreover, our results enable the design of a market-timing strategy based on earnings announcement risk premia, yielding an improvement in the market Sharpe ratio of 37%.

 

 

Option Expected Hedging Demand

with Xiaoxiao Tang and Zhaoque (Chosen) Zhou (current version: Feb., 2024).

Options market makers' delta hedging has had an increasing impact on underlying stock prices as both the option volume and the ratio of option volume to stock volume have grown drastically in recent years. We introduce a novel approach utilizing real-time option information to calculate the spot elasticity of delta (ED) and expected hedging demand (EHD), and find that the EHD significantly predicts future stock returns in the cross section. The positive impact of EHD on stock prices lasts up to five trading days, and then a reversal follows. The empirical evidence of a heterogeneous EHD-return relationship, influenced by ED, points to varied option market maker behaviors and is consistent with conventional economic theory. Moreover, we find that EHD has little correlation with other popular firm characteristics, representing a new risk that is not captured by conventional factor models.

 

 

Seeing is Believing: Annual Report Visuals and Stock Returns

with Xiahu Deng, Lei Gao and Bo Hu (current version: July, 2025).

Why do companies release visually enhanced annual reports that seem redundant with their 10-K filings? We develop a novel model and provide evidence, by employing multiple AI deep learning models for empirical analysis, that firms earn 3–5% abnormal returns after adding visuals to their annual reports, preceded by a surge in institutional investor attention. Firms using data-oriented graphics show no abnormal returns, whereas those with illustrative visuals highlighting innovation and technology exhibit stronger effects. Moreover, firms adopting R&D-focused visuals experience notable increases in patents granted and radical innovation, suggesting visuals mitigate investor inattention and convey nuanced fundamental information.

Presented at 2024 EFA.

 

 

Market Risk Premium Expectation: Combining Option Theory with Traditional Predictors

with Hong Liu, Yueliang (Jacques) Lu and Weike Xu (current version: April, 2025).

We propose a new state-dependent bound (SDB) on the expected market risk premium, linking option-based bounds to the traditional predictability literature. The proposed SDB significantly improves out-of-sample predictions of the market risk premium, outperforming models that rely solely on either option prices or traditional stock market predictors. Moreover, using the SDB substantially increases portfolio Sharpe ratios and enhances investor utility. In a cross-sectional analysis of expected stock returns, we show that option information provides incremental value beyond conventional firm characteristics. Our novel findings highlight the importance of integrating information in both option prices and economic state variables.

Presented at AFA 2024.

 

 

Market Risk Premium: Best Linear Predictor in High Dimension

with Fuwei Jiang, Kunpeng Li and Guoshi Tong (current version: Oct., 2024).

In the age of big data, PCA and PLS are widely used in finance for dimension reduction to identify a few predictive factors. In this paper, we make a surprising discovery that the dimension reduction can achieve the optimal lower bound of one in an equivalent model. We propose a supervised learning method to find the optimal predictive factor, which is the best linear combination of a large set of predictors. Our approach theoretically outperforms alternative dimension reduction techniques, such as PCA and PLS. Just as an efficient portfolio highlights which assets are crucial in the pricing kernel, our optimal predictive factor pinpoints the most significant combination of predictors in forecasting. When applied to predicting the market risk premium, our method empirically outperforms not only PCA and PLS, but also all the state-of-the-art machine learning methods used by Dong, Li, Rapach, and Zhou (2022). Furthermore, our method reveals a set of novel predictors. Additionally, we identify the optimal predictive factors for market volatility, bond excess returns and macroeconomic aggregates, and find that our method continues to perform the best.

 

 

Maximizing the Sharpe Ratio: A Genetic Programming Approach

with Yang Liu and Yingzi Zhu (current version: Feb., 2025).

(On-line Appendix)

While existing studies focus on minimizing model fitting errors, we maximize directly the Sharpe ratio of spread portfolios with a genetic programming (GP) approach. We find that the GP approach can double the performance in the US and outperform internationally, compared with other approaches under examination. We also apply the GP to maximize the Sharpe ratio of investing in all the underlying stocks, which amounts to searching for the stochastic discount factor that prices all the assets. We find that the Sharpe ratio is 75% greater than before, indicating the loss of relying on spread portfolios for investing and pricing can be substantial.

Presented at AFA 2024, and at 2021 CICF.

 

 

Which Expectation?

with Juhani T. Linnainmaa and Yingguang (Conson) Zhang (current version: Dec., 2023).

We test a theory of two expectations in asset pricing: investors separately form beliefs on cash flow level and cash flow growth when valuing assets. Using 123 anomalies and analysts’ earnings term structure forecasts, we find strong evidence for the separability of the two beliefs. Forecast errors in cash flow level and cash flow growth are uncorrelated. Anomaly portfolios typically manifest biases in one belief or the other but not both. Anomalies with large (small) alphas often have the two biases amplifying (offsetting) each other. The first two principal components of anomaly returns are essentially a growth bias factor and a level bias factor. The two biases explain about 50% of the anomaly portfolios' cross-sectional deviation from the CAPM. Level bias generates large initial alpha and growth bias generates persistent alpha. We also provide an explanation for the recent alpha decay with analysts’ improved forecast accuracy.

Presented at AFA 2024.

 

 

Myopic Expectations and Stock Market Mispricing

with Yingguang (Conson) Zhang and Yingzi Zhu (current version: April, 2024).

(On-line Appendix)

Are expectations in financial markets myopic? Based on a new multi-horizon expectation framework and using data of U.S. stock analysts’ forecasts, we find that their forecasts are myopic, and their myopic expectations are associated with large price distortions even in recent periods. Our study distinguishes among different sources of myopic expectations, reconciles myopia with long-horizon belief overreaction, quantifies myopia effects across horizons, tests the role of information frictions, and assesses the economic significance in terms of trading profits. Our framework is generally applicable to other settings with multi-horizon expectations, providing a useful tool for future research.

 

 

Fama-MacBeth Regression with Asset Pricing Restriction

with Yuanqi Yang and Yifeng Zhu (current version: May, 2024).

In this paper, we propose a modified Fama-MacBeth regression that incorporates asset pricing restrictions into the estimation. The restrictions require the model to explain both the time series and cross-sectional variations, and also to select factors for sparsity. Solving the estimation via a least angle regression-type algorithm, we find empirically that the new model outperforms existing factor selection methodologies in predicting cross-sectional stock returns. In addition, we propose new interpretable characteristics-based factors, and our factors outperform classical factor models.

 

 

Pockets of Factor Pricing

with Sophia Zhengzi Li and Peixuan Yuan (current version: December, 2023).

Current factor models assume certain pre-specified factors can price or explain asset returns with the same level of ability across time. In contrast with this conventional wisdom, we find that factors' pricing ability exhibits notable temporal variation and tends to cluster in certain periods referred to as "pockets." We propose a real-time approach to effectively identify the pockets, and apply it to a comprehensive set of firm characteristics. We find episodic and distinct dynamics of return predictability for different types of characteristics, contradicting the notion of continuous presence of the same factors with the same pricing ability. Exploiting factors' time-varying predictive power, we construct a composite predictor/factor that achieves a value-weighted hedge return of 3.94% per month with a high t-statistic of 13.87. Additionally, the composite factor pricing model, which incorporates a selection of factors with factor timing, demonstrates superior effectiveness in both explaining and predicting market anomalies. The factor also provides a comprehensive explanation for factor momentum, which is shown to be a consequence of the past performance of factor returns.

 

 

Did Retail Traders Take Over Wall Street? A Tick-by-Tick Analysis of GameStop's Price Surge

with Zhaoque (Chosen) Zhou (current version: November, 2024).

GameStop’s stock price unprecedentedly surged by over 2800% in January 2021. Unlike previous studies, we utilize tick-by-tick data of both stock and option trades to show that this dramatic price rise was primarily driven by overnight trading and largely fueled by institutional orders rather than retail activity. Our analysis of option trading further provides evidence of a “gamma squeeze”. Theoretically, we extend the Brunnermeier and Pedersen (2005) model to explain several of our key findings. Overall, we conclude that it is because of the institutional backing that retail investors succeeded in driving the stock price bubble.

Presented at CICF 2024.

 

 

Macro Financial Trends and Equity Risk Premium

with Yufeng Han and Yueliang (Jacques) Lu (current version: Aug., 2025).

This paper shows that trends, typically used for monetary policy guidance, are also effective in predicting market excess returns. Using a linear combination method across 14 economic and financial predictor variables, we find that moving-average trends outperform the variables’ current values in forecasting market returns. Incorporating neural networks further improves these predictions. Our findings underscore the importance of trends, supporting the Federal Reserve’s emphasis on trends over lagged variables. When accounting for nonlinearity, we find that market return predictability is significantly greater than commonly believed. Our results are robust across both U.S. and global equity markets.

 

 

Expected Index Option Return: What Can We Learn From Macro and Anomalies

with Heiner Beckmeyer and Guoshi Tong (current version: Jan., 2024).

We provide the first study on whether and how the expected returns on stock index options are predictable, extending the large literature on the predictability of the stock market in a new direction. We find that stock index option returns are predictable by common macroeconomic predictors whose predictive power on options is even stronger than on the underlying. We also find that, although stock market inefficiency, as captured by anomalies, explains future option returns, option market inefficiency plays a greater role. The economic value of incorporating the option predictability versus ignoring it can be substantial.

Presented at CICF 2024.

 

 

Bottom Up vs Top Down: What Does Firm 10-K Tell Us?

with Landon Ross, Jim Horn, Mert Pilanci and Kaihong Luo (current version: November, 2024).

In contrast to the recent increasing focus on large language models, we propose a bottom-up approach that exploits the individual predictive power of each word. Our word dictionary is constructed by using a data-driven approach, and it is these selected words that are used to build the predictive model with lasso regularized regressions and large panels of word counts. We find that our approach effectively estimates the cross-section of stocks' expected returns, so that a factor that summarizes the information generates economically and statistically significant returns, and these returns are largely unexplained by standard factor models. However, an inspection of the factor dictionary indicates that it contains many words with possible risk-related interpretations, such as currency, oil, research, and restructuring, which increase a stock's expected return, while the words acquisition, completed, derivatives, and quality decrease the expected return.

 

 

Interpretable Factors of Firm Characteristics

with Yuxiao Jiao and Yingzi Zhu (current version: February, 2024).

We propose a new approach to construct factors from firm characteristics. In contrast to existing studies, each of our factors comes from the same group of statistically related firm characteristics, making its economic interpretation possible. The number of groups is not chosen ad hoc, but rather determined by data. Applying our method to a set of 94 representative firm characteristics, we find that the factors chosen by our approach are not only easy to interpret economically, but also outperform typical machine learning models. We also apply our approach to the recent and highly effective IPCA model of Kelly, Pruitt and Su (2019), and find that our factors not only are well linked to apparent economic risks, but also can price assets no worse than the standard IPCA model.

Presented at 2024 AFA (poster) and 2024 CICF.

 

 

Empirical Asset Pricing with Probability Forecasts

with Songrun He and Linying Lv (current version: February, 2025).

We study probability forecasts in the cross-section of asset pricing and find that simple probability forecast models can perform as well as sophisticated ones, all of which deliver Sharpe ratios comparable to the best of existing return forecast models. Combining probability forecasts with return forecasts yields superior portfolio performance versus using each alone. Additionally, probability forecasts augment existing factor models and improve tail risk forecasts significantly. The results suggest that probability forecasts, so far largely ignored, can offer unique and valuable insights into understanding the cross-section of stock returns.

Presented at 2025 AFA.

 

 

How Accurate Are Survey Forecasts on the Market?

with Songrun He, Jiaen Li and Linying Lv (current version: March, 2025).

We find that three widely used survey forecasts fail to predict the stock market out-of-sample, raising important questions about the reliability of survey forecasts and the proper interpretation of the extensive literature that depends on them. In contrast, we demonstrate that a naive Bayesian learning model and analysts' expectations can significantly predict the stock market out-of-sample. This suggests that these alternatives provide more meaningful insights into investors' attitudes toward risk. As a result, studying these new sources of information may be more impactful and warrants greater attention compared to the reliance on survey forecasts.

 

 

Data-mined Anomalies and the Expected Market Return

with Peng Li, Emmanouil Platanakis, and Xiaoxia Ye (current version: July, 2025).

Based on a theoretical framework for mispricing correction persistence, we propose a two-stage anomaly selection approach to predict market returns. The first stage screens the t-statistic of anomaly returns, and the second stage estimates the slope coefficient of predictive regression to select the most promising anomalies for predicting the market. The selected data-mined anomalies from a universe of several thousand signals exhibit strong and persistent mispricing correction dynamics. We show that aggregate returns of long-short portfolios constructed from these data-mined anomalies are significantly linked to the predictability of aggregate excess market returns, delivering statistically significant out-of-sample predictions of market excess returns and surpassing the predictive power of published anomalies.

 

 

Principal Portfolios: The Multi-Signal Case

with Songrun He and Ming Yuan (current version: October, 2022).

In this paper, we extend Kelly, Malamud, and Pedersen's (2021) new asset pricing framework to allow incorporating multiple predictive signals into optimal principal portfolios. Empirically, we find that the multi-signal theory is valuable for combining signals, improving on a naive combination of single-signal principal portfolios.

 

 


Anomaly Returns and FOMC

with Lin Tan and Xiaoyan Zhang (current version: April, 2023).

We find that anomaly returns are generally unchanged during FOMC days. The average returns on the long and short legs of a comprehensive set of 207 anomalies increase by 26.3 bps and 28.8 bps, respectively, prior to the FOMC and reverse afterwards. But for a small group of anomalies that do have substantial changes, their profitability tends to go down, with absolute pricing errors greater than usual. Our evidence challenges existing studies that find the CAPM performs better during the FOMC period. Furthermore, we uncover that lower participation of retail investors contributes to the decline in profitability.

Presented at 2023 CICF.

 

 

Predictive Information in Corporate Bonds for the Equity Premium

with Sophia Zhengzi Li and Peixuan Yuan (current version: November, 2024).

This paper presents strong empirical evidence that the corporate bond market contains predictive information about the equity premium. We introduce a novel predictor, the term-structure slope of corporate bond returns, that predicts stock market returns both in- and out-of-sample. The predictability reflects gradual information diffusion stemming from market segmentation: the bond slope embeds forward-looking information about firm fundamentals and macroeconomic activity, and its predictive power is short-lived and strengthens when the two markets are less integrated. Moreover, the effect extends to stock portfolios sorted by size, value, and industry, and is particularly strong among stocks with high credit risk exposure.

Presented at CICF 2024.

 

 

Unspanned Risk and Risk-Return Tradeoff

with Huacheng Zhang (current version: Dec, 2023).

A major tenet of modern finance is the risk-return tradeoff, and yet there is a lack of empirical evidence supporting it. We provide an unspanned risk explanation, which, measured as uncertainty beyond financial markets, is well approximated by the macro uncertainty index of Baker, Bloom, and Davis (2016), 90% of which can be attributed to unspanned uncertainty. We find the first out-of-sample evidence that there is a positive risk-return tradeoff after all. In addition, we find that the unspanned risk matters at stock level too: a high-minus-low unspanned risk portfolio can generate an annualized return of 3.5%.

 

 

Hide in the Herd: Macroeconomic Uncertainty and Analyst Forecasts Dispersion

with Shen Zhao (current version: June, 2025).

We identify a negative correlation between macroeconomic uncertainty and the dispersion of analysts' earnings forecasts, attributing this to herding behavior. Our findings indicate that the herding firms, whose analysts are subject to this behavior, exhibit higher firm-level uncertainty, less informative stock prices regarding fundamentals, and a greater likelihood of overpricing, resulting in lower subsequent returns. Our study ties the relationship between macro-level uncertainty and micro-level dispersion to firm-specific characteristics, reinforcing the idea that elevated uncertainty amplifies psychological biases, leading to informational inefficiencies.

 

 

Betting Against the Crowd: Option Trading and Market Risk Premium

with Jie Cao, Gang Li and Xintong Zhan (current version: March, 2025).

We comprehensively study how option trading influences the equity market risk premium. Surprisingly, we find that trading of individual call options predicts the market index more strongly than index options. This predictability is both statistically significant and economically substantial, persisting from weeks to months. Aggregate individual options trading largely reflects investor sentiment and is primarily driven by retail investors. It also forms the key component in an ensemble learning model, combined with index option trading and other related predictors, respectively. Among all predictors examined, option trading emerges as the most powerful predictor of the market risk premium.

Presented at 2023 CICF.

 

 

Commodity Inflation Risk Premium and Stock Market Returns

with Ai Jun Hou, Emmanouil Platanakis and Xiaoxia Ye (current version: Dec, 2023).

We propose a novel measure of commodity inflation risk premium (cIRP) based on a term structure model of commodity futures. The cIRP, capturing forward-looking information in the futures markets, outperforms well-known characteristics in explaining the cross-section of commodity returns. The associated cIRP factor has the highest Sharpe ratio among the existing factors, and has substantial new information beyond them. Moreover, various aggregations of the individual cIRP predict stock market returns significantly, even after controlling for major economic predictors including the usual inflation measure. The link between commodities and the stock market is stronger than previously thought.

 

 

Heterogeneous Responses in Financial Markets: Insights from Machine Learning

with Xiaoxiao Tang and Xiwei Tang (current version: Aug., 2025).

We propose a machine learning framework that extends the Fama-MacBeth regression to capture heterogeneity in stock return responses to firm characteristics, allowing more flexible estimation of expected returns in the cross section. Using 15 representative firm characteristics, our method nearly doubles the Sharpe ratio of the long-short portfolio formed using Fama-MacBeth return forecasts. It also offers greater interpretability and outperforms other machine learning models, even in high-dimensional settings with 94 characteristics. Our results emphasize the importance of heterogeneity in stock return responses, especially during recessions, and challenge the traditional homogeneity assumption embedded in the Fama-MacBeth regression, with broad implications for empirical asset pricing.

Presented at CICF 2024.

 

 

Asymmetry in Variance: Does It Matter to Stock Returns?

with Xiaoxiao Tang (current version: December, 2024).

We propose a novel measure of asymmetry in variance, AVar, for stock returns and link it to its counterpart under the risk-neutral measure, making it possible to use forward-looking option information to rank stocks by AVar. Empirically, based on option data, we find that the greater the AVar, the greater the stock returns in the cross-section. A long-short portfolio sorted by AVar generates a monthly return of 1.61% and a Sharpe ratio of 1.37, both of which are economically and statistically significant, suggesting that AVar plays a crucial role in asset pricing. Moreover, the term structure of AVar explains the time variation in future stock returns.

 

 

Do Labor Flows Matter in the Stock Market?

with Jian Chen, Chunmian Ge, Nan Li and Jiaquan Yao (current version: Dec., 2024).

(On-line Appendix)

Using data from individual resumes of employees at public firms, we propose a novel measure of monthly innovations in aggregate labor market flows. Our findings reveal that this measure significantly predicts one-month-ahead market returns, both in- and out-of-sample. This predictability cannot be explained by existing labor or macroeconomic predictors. Further analysis indicates that the new labor market predictor captures information about firms’ investment growth, which is not efficiently incorporated into stock prices. This inefficiency arises from investor underreaction, driven by factors such as inattention, heightened information uncertainty, and the slow diffusion of negative news. Our study also reconciles and extends existing research that predominantly focuses on hiring and market returns over longer time horizons or in cross-sectional contexts. By providing robust evidence of the short-term predictability of the aggregate market risk premium, we underscore the labor market’s critical role in conveying economic information and affecting the systematic risk of the broad asset markets.

 

 

ESG and the Market Return

with Liya Chu, Kent Wang and Bohui Zhang (current version: Dec., 2022).

We propose an environmental, social, and governance (ESG) index. We find that it has significant power in predicting the stock market risk premium, both in- and out-of-sample, and delivers sizable economic gains for mean-variance investors in asset allocation. Although the index is extracted by using the PLS method, its predictability is robust to using alternative machine learning tools. We find further that the aggregate of environmental variables captures short-term forecasting power, while that of social or governance captures long-term. The predictive power of the ESG index stems from both cash flow and discount rate channels.

Presented at CICF 2024.

 

 

Does Compensation Matter? Evidence from CD&A Disclosures

with Xiumin Martin and Jie (Jane) Xu (current version: April, 2021).

We study whether the similarity of firm disclosures on the Compensation Discussion and Analysis (CD&A) has predictability for future stock returns. We find that changes to the language and construction of the CD&As predict firms' future stock returns. A portfolio that longs the CD&A "non-changers" and shorts the "changers" earns a significant Fama-French 5-factor alpha of 5.86% (annualized), for the period of 2008-2020. We further find that companies with low CD&A similarities invest less in R&D, are more likely to be targeted by short-sellers, and have greater forced CEO turnovers. Our results provide new and strong evidence on the role of executive compensation in the cross-section of stock returns.

 

 

Lottery Preference and Anomalies

with Lei Jiang, Quan Wen, and Yifeng Zhu (current version: Nov., 2022).

We construct a lottery factor based on 13 commonly used lottery proxies and show that this factor adds significant explanatory power to prominent factor models for anomalies, especially for those in the skewness and value groups. We find that anomaly returns are significantly stronger among stocks with high lottery features and are mainly driven by the short leg of lottery stocks instead of financial distress. We find further that lottery stocks are often associated with low short volume and high shorting fees, indicating that retail investors' preference to hold lottery stocks leads to a low lendable supply of such shares.

 

 

Economic Fundamentals and Short-Run Exchange Rate Prediction: A Machine Learning Perspective

with Ilias Filippou, David Rapach and Mark Taylor (current version: Jan., 2025).

(Appendix)

This paper establishes the out-of-sample predictability of monthly exchange rates based on economic fundamentals using country characteristics, global variables, and their interactions. Previous work does not find consistent evidence of short-horizon predictability, likely due to using a small set of fundamentals and inadequately capturing time variation and nonlinearities in predictive relations. By employing a large set of economic fundamentals and global variables in conjunction with machine-learning techniques, we are able to consistently and significantly outperform the stringent no-change benchmark forecast. We find stronger predictability during periods of crisis and recession. The exchange rate forecasts are also economically valuable, as they generate sizable utility gains for an investor in the context of foreign currency portfolios. To enhance our understanding of the economic drivers of exchange rate predictability, we identify the most relevant predictors for forecasting exchange rates in the fitted machine-learning models.

Presented at Vienna Symposium on Foreign Exchange Markets, 2021, and 5th Workshop in Financial Markets and Nonlinear Dynamics, 2021.

 

 

Fundamental Extrapolation and Stock Returns

with Dashan Huang, Huacheng Zhang and Yingzi Zhu (current version: January, 2022).

We propose an economic objective-driven pooling strategy to extrapolate multiple fundamentals simultaneously. This strategy outperforms naive extrapolation strategies that use a single fundamental variable and strategies that use past prices or analyst forecasts, and performs similarly to a machine learning-based pooling strategy. We propose a model to show that fundamental extrapolation has dual price effects: a cash flow effect that pushes the stock price up relative to its fundamental value and a discount rate effect that depresses the stock price by increasing the expected volatility. Our empirical results suggest that the discount rate effect dominates the cash flow effect.

Presented at AFA 2022 and EFA 2020.

 

 

Firm Fundamental Cycles

with Yufeng Han, Zhaodan Huang, and Weidong Tian (current version: Feb., 2025).

We present the first study of firm cycles analogous to business cycles in the broader economy. To identify the cycles, we construct two novel firm fundamental indexes that capture a wide range of business activities. Empirically, firm cycles help explain key market anomalies, including momentum, long-term reversal, and factor momentum. Additionally, we develop an equilibrium model in which production technology growth follows a mean-reverting process, suggesting that firm momentum (reversal) emerges following positive (negative) technology shocks. Furthermore, the model explains factor momentum, which is driven by firm characteristics and market-wide conditions but remains independent of stock momentum.

Best Paper Award, The World Finance Conference, 2019

 

 

Twin Momentum: Fundamental Trends Matter

with Dashan Huang and Huacheng Zhang (current version: June, 2021).

Using trends in firm fundamentals, we find that there is a fundamental momentum in the stock market. Buying stocks in the top quintile of fundamental trends and selling stocks in the bottom quintile earns a monthly average return of 0.85%, comparable to price momentum. Combining both price and fundamental momentum produces a twin momentum that earns an average return that exceeds their sum and is difficult to explain by short-sale impediments. Our results not only support the view that fundamental analysis is as important as technical analysis, but also indicate that trends contain incremental information beyond often used lagged fundamental predictors.
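
The top-minus-bottom quintile sort described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the equal weighting within each leg, and the input layout (one signal value and one subsequent return per stock) are all assumptions made for illustration only.

```python
def long_short_quintile_return(signals, returns):
    """Equal-weighted top-quintile return minus bottom-quintile return.

    Hypothetical helper: `signals` holds one sorting value per stock
    (e.g. a fundamental-trend measure) and `returns` holds the matching
    next-period returns. Equal weighting is an assumption.
    """
    if len(signals) != len(returns):
        raise ValueError("signals and returns must have the same length")
    n = len(signals)
    k = n // 5  # number of stocks in each extreme quintile
    if k == 0:
        raise ValueError("need at least five stocks to form quintiles")
    # Rank stock indices from the lowest to the highest signal.
    order = sorted(range(n), key=lambda i: signals[i])
    bottom, top = order[:k], order[-k:]
    long_leg = sum(returns[i] for i in top) / k
    short_leg = sum(returns[i] for i in bottom) / k
    return long_leg - short_leg


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    example_signals = list(range(10))
    example_returns = [-0.02, -0.01, 0, 0, 0, 0, 0, 0, 0.03, 0.05]
    print(long_short_quintile_return(example_signals, example_returns))
```

With ten stocks, each extreme quintile holds two names, so the spread is the mean return of the two highest-signal stocks minus that of the two lowest.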

 

 

Corporate Bond Models: A New Performance Metric

with Xu Guo, Hai Lin and Chunchi Wu (current version: Feb., 2025).

The pricing error (PE) from the IPCA model of Kelly, Palhares, and Pruitt (2023) negatively predicts corporate bond returns in the cross-section, and the PEs from other models have even greater predictability. A long-short PE portfolio based on the IPCA generates an average monthly return of 0.83%, which is economically significant and robust to using various factors and model specifications. Further analysis indicates that investor sentiment is a plausible driver of the PE predictability. Extending the IPCA model to include nontradable sentiment and macroeconomic uncertainty factors, we find that market sentiment and uncertainty play a role in the PE anomaly.

Presented at 2022 CICF.

 

 

An Information Factor: Can Informed Traders Make Abnormal Profits?

with Matthew Ma, Xiumin Martin and Matthew Ringgenberg (current version: September, 2019).

We construct an information factor (INFO) using the informed stock buying of corporate insiders and the informed selling of short sellers and option traders. INFO strongly predicts future stock returns -- a long-short portfolio formed on INFO earns monthly alphas of 1.24%, substantially outperforming existing strategies including momentum. INFO explains hedge fund returns in the time-series and cross-section. Higher values of INFO are associated with increases in aggregate hedge fund value. Moreover, funds with higher covariation between their returns and INFO outperform by 0.28% per month. The results show information processing skill is an important source of return variation.

 

 

Corporate Activities and the Market Risk Premium

with Erik Lie, Bo Meng and Yiming Qian (current version: October, 2017).

While existing asset pricing studies focus on macroeconomic variables to predict the stock market risk premium, we find that an aggregate index of corporate activities has substantially greater predictive power both in- and out-of-sample, and yields much greater economic gain for a mean-variance investor. The predictive ability of the corporate index stems from its information content about future cash flows. Cross-sectionally, the corporate index performs particularly well for stocks with great information asymmetry.

 

 

Sentiment Across Asset Markets

with Dashan Huang, Heikki Lehkonen and Kuntara Pukthuanthong (current version: June, 2018).

In this paper, we study investor sentiment in five major asset markets: stocks, bonds, commodities, currencies, and housing. Based on Thomson Reuters' sentiment measures extracted from 235 news and social media sources, we find that each market is predicted by its own sentiment. Across markets, kitchen-sink regressions reveal that the stock market is influenced only by bond sentiment, while the bond market is affected only by the currency market, which is largely unexplained by others; the commodities are related to currencies and housing, and housing can be predicted by stock and bond sentiment. With efficient information aggregation by partial least squares (PLS), the predictability of each market increases substantially when using information from all markets rather than only its own sentiment.

 

 

Cost Behavior and Stock Returns

with Dashan Huang, Fuwei Jiang and Jun Tu (current version: April, 2017).

This paper shows that investors do not fully incorporate cost behavior information into valuation. Firms with higher growth in operating costs generate substantially lower future stock returns. A long-short spread portfolio earns an average return of about 12% per year after controlling for extant risk factors and firm characteristics. Mean-variance spanning tests show that an investor can benefit from investing in this spread portfolio in addition to well-known factors. Firms with high cost growth also suffer from deteriorations in future operating performance. The negative cost growth-return relation is much stronger around earnings announcement days, among firms with lower investor attention, higher idiosyncratic volatility, and higher transaction costs, suggesting that investor underreaction and limits to arbitrage mainly drive the effect.

 

 

Taming Momentum Crashes: A Simple Stop-loss Strategy

with Yufeng Han and Yingzi Zhu (current version: August, 2015).

In this paper, we propose a stop-loss strategy to limit the downside risk of the well-known momentum strategy. At a stop-level of 10%, we find, with data from January 1926 to December 2013, that the maximum monthly losses of the equal- and value-weighted momentum strategies go down from -49.79% to -11.36% and from -64.97% to -23.28%, while the Sharpe ratios are more than doubled at the same time. We also provide a general equilibrium model of stop-loss traders and non-stop traders and show that the market price differs from the price in the case of no stop-loss traders by a barrier option.
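
As a rough illustration of how a stop-loss rule caps a monthly loss, the following Python sketch compounds daily returns within a month and exits once the cumulative loss reaches the stop level. The function name, the exit at the close of the triggering day, and the zero return earned after exiting are simplifying assumptions, not details taken from the paper.

```python
def stop_loss_month_return(daily_returns, stop_level=0.10):
    """Monthly return of a position that is closed once losses hit the stop.

    Hypothetical helper. Assumptions: the position is closed at the end of
    the first day on which the cumulative return is at or below -stop_level,
    and the proceeds earn nothing for the rest of the month.
    """
    cumulative = 1.0
    for r in daily_returns:
        cumulative *= 1.0 + r
        if cumulative - 1.0 <= -stop_level:
            break  # stop triggered: ignore the remaining days
    return cumulative - 1.0


if __name__ == "__main__":
    # Made-up daily returns, purely for illustration.
    month = [-0.05, -0.06, -0.20, 0.10]
    print(stop_loss_month_return(month))        # stop at 10%
    print(stop_loss_month_return(month, 1.0))   # stop effectively disabled
```

Because the exit happens at the close, the realized loss can exceed the stop level when prices gap down, which is in line with the abstract's maximum losses remaining somewhat above 10% in magnitude.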

 

 

Which Hedge Fund Styles Hedge Against Bad Times?

with Charles Cao and David Rapach (current version: February, 2015).

We examine hedge fund style performance in bad versus good times defined as (1) up and down equity market regimes derived from the 200-day moving average of the S&P 500 price index or (2) nonstressed and stressed financial market regimes determined endogenously using the Federal Reserve Bank of Kansas City Financial Stress Index and threshold estimation. We show that hedge fund styles often exhibit significant changes in risk factor exposures across good and bad times. For certain hedge fund styles, changes in factor exposures represent valuable hedges against bad times; in contrast, other hedge fund styles become more exposed to risk factors during bad times in a manner that magnifies downside risk exposure. In the context of “balanced” 40-30-30 portfolios that allocate across U.S. stocks, bonds, and individual hedge fund styles, we find that the Global Macro, Managed Futures, and Multi-Strategy styles provide investors with especially valuable hedges against bad times.
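
The first regime definition above, based on the 200-day moving average, can be sketched in Python as follows. The abstract does not spell out the exact classification rule, so labeling a day "up" when the price is at or above its trailing average is an assumption, as are the function name and the use of `None` for days without enough history.

```python
def moving_average_regimes(prices, window=200):
    """Label each day 'up' or 'down' relative to a trailing moving average.

    Hypothetical helper. Assumption: a day is 'up' when its price is at or
    above the average of the most recent `window` prices (including that
    day), and 'down' otherwise. Days with fewer than `window` observations
    are labeled None.
    """
    regimes = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            running_sum -= prices[i - window]  # drop the price leaving the window
        if i + 1 < window:
            regimes.append(None)
        else:
            average = running_sum / window
            regimes.append("up" if price >= average else "down")
    return regimes


if __name__ == "__main__":
    # Tiny made-up series with a 3-day window, purely for illustration.
    print(moving_average_regimes([1, 2, 3, 4, 2], window=3))
```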

 

 

Forecasting Bond Risk Premia Using Technical Indicators

with Jeremy Goh, Fuwei Jiang, and Jun Tu (current version: July, 2013).

While economic variables have been used extensively to forecast U.S. bond risk premia, little attention has been paid to the use of technical indicators, which are widely employed by practitioners. In this paper, we fill this gap by studying the predictive ability of a variety of technical indicators vis-a-vis the economic variables. We find that the technical indicators have statistically and economically significant in- and out-of-sample forecasting power. Moreover, we find that utilizing information from both technical indicators and economic variables substantially increases the forecasting performance relative to using just economic variables.

 

 

Forecasting Stock Returns During Good and Bad Times

with Dashan Huang, Fuwei Jiang and Jun Tu (current version: May, 2015).

We show that stock returns can be significantly predicted by past realized returns in both good and bad times, in and out of sample. We extend the model in Fama and French (1988) to show that stock returns display mean reversion and momentum over time, depending on the market state. Specifically, past stock returns predict future returns negatively in good times and positively in bad times, which is consistent with the change and level effects in Pástor and Stambaugh (2009).

 

 

Hansen-Jagannathan Distance: Geometry and Exact Distribution

with Raymond Kan; November, 2002.

This paper provides an in-depth analysis of the Hansen-Jagannathan (HJ) distance, which is a measure that is widely used for diagnosis of asset pricing models, and also as a tool for model selection. In the mean and standard deviation space of portfolio returns, we provide a geometric interpretation of the HJ-distance. In relation to the traditional regression approach of testing asset pricing models, we show that the HJ-distance is a scaled version of the aggregate pricing errors, and it is closely related to Shanken's (1985) cross-sectional regression test (CSRT) statistic, with the only major difference being how the zero-beta rate is estimated. For the statistical properties, we provide the exact distribution of the sample HJ-distance and also a simple numerical procedure for computing its distribution function. In addition, we propose a new test of equality of HJ-distance for two nested models. Simulation evidence shows that the asymptotic distribution for the sample HJ-distance is grossly inappropriate for typical numbers of test assets and time series observations, making the small sample analysis empirically relevant.

 

Toward a Better Understanding of the Beta Method and the Stochastic Discount Factor Method

with Raymond Kan; May, 2002.

In a standardized factor model, Kan and Zhou (1999) show the stochastic discount factor (SDF) method yields less efficient estimates than the beta method when both are based on the generalized method of moments (GMM). By modifying the common use of the SDF [via adding more moment conditions to the practice before the publication of Kan and Zhou (1999)], Jagannathan and Wang (2001) and Cochrane (2000a,b) find that the two methods have the same asymptotic variance for the new GMM estimator (which no longer admits an analytical solution). Moreover, their analysis relies on a joint normality assumption of both the asset returns and factors. In this paper, we show that: 1) once the normality assumption is relaxed, the modified SDF method is highly sensitive to factor skewness and kurtosis whereas the beta method is not, implying that the SDF estimates can be less reliable in realistic situations where the factors are leptokurtic; 2) in conditional asset pricing models, the modified SDF is in general still strictly dominated by the beta method in terms of estimation accuracy; 3) while it is not well understood and almost never used in the SDF formulation of an asset pricing model, the maximum likelihood method is well defined and has both strictly more efficient estimates and more powerful tests than the SDF method; 4) the SDF tests can have much less power than the beta method in conditional asset pricing models. In short, while the SDF set-up is an elegant theoretical formulation, empirical estimation and tests should pay as much attention to the beta method as to the SDF if not more (one more reason is that, as shown by Jagannathan and Wang (2001), estimated model pricing errors have smaller variance by using the beta method than the SDF one).

 

The End