Replication of Liu, Tsyvinski & Wu (2022)

This page documents the replication quality of our factor construction against the published reference data from Liu et al. (2022). We use their exact methodology (value-weighted, 30/40/30 breakpoints, 2x3 double sort for CMOM, sharp-year calendar) and compare against their published factor file (LTW_3factor.xlsx).
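The size leg of this construction can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function names, the `(lagged_cap, ret)` tuple layout, the use of lagged market cap as the weight, and the count-based 30% cut are all assumptions made for the example.

```python
def value_weighted_return(coins):
    """Market-cap-weighted average return of (lagged_cap, ret) pairs."""
    total_cap = sum(cap for cap, _ in coins)
    return sum(cap * ret for cap, ret in coins) / total_cap


def csmb_one_week(coins, tail_pct=30):
    """Small-minus-big return for one week with 30/40/30 size breakpoints.

    coins: list of (lagged_cap, ret) tuples for all eligible coins that week.
    The middle 40% of coins is not used by the factor.
    """
    ranked = sorted(coins, key=lambda c: c[0])  # ascending market cap
    k = len(ranked) * tail_pct // 100           # coins in each 30% tail (assumed rule)
    if k == 0:
        raise ValueError("too few coins to form the size groups")
    small, big = ranked[:k], ranked[-k:]
    return value_weighted_return(small) - value_weighted_return(big)
```

Because the legs are value-weighted, a single large coin inside a group can dominate that group's return, which matters for the divergence analysis further down.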

Data Sources

Important

The LTW reference file contains a note: “Updated through 2022 and data are from coingecko since mid-[2020]”. Correlations are therefore only meaningful in the CoinMarketCap era (January 2014 – July 2020). Post-2020 divergence reflects a data source mismatch, not a methodology error.

                 | Our data                                            | Liu et al. (2022)
Source           | CoinMarketCap via crypto2 R package                 | CoinMarketCap (pre mid-2020), CoinGecko (post mid-2020)
Retrieval method | crypto_listings() API endpoint                      | Not documented (likely web scraping or earlier API)
Sample           | April 2013 – present                                | January 2014 – April 2023 (in reference file)
Data cleaning    | Return cap at 9,900%; implied supply filter > 10^16 | Not documented; CoinGecko has built-in outlier detection

Overall Replication Quality

Replication quality, CoinMarketCap era (Jan 2014 – Jul 2020, 340 weeks)
Factor | Correlation | Our mean | LTW mean | Our SD | LTW SD
CMKT   | 99.8%       | 1.303%   | 1.330%   | 11.28% | 11.23%
CSMB   | 72.6%       | 0.663%   | 2.308%   | 11.46% | 14.12%
CMOM   | 64.7%       | -0.085%  | 2.231%   | 14.45% | 13.37%

CMKT matches Liu et al. Table I almost exactly: mean 1.3%/week, SD 11.2%, skewness 0.235 (paper: 0.234).
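The comparison restricted to the CoinMarketCap era can be sketched as below. This is an illustrative outline under stated assumptions: the window bounds, the dict-of-dates input layout, and the function names are hypothetical and not taken from the pipeline.

```python
from datetime import date
from math import sqrt

# Assumed bounds for the CoinMarketCap era described above.
CMC_START, CMC_END = date(2014, 1, 1), date(2020, 7, 31)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cmc_era_correlation(ours, ltw):
    """Correlate two weekly factor series on their common weeks inside the era.

    ours, ltw: dicts mapping a week-end date to that week's factor return.
    """
    weeks = sorted(w for w in ours.keys() & ltw.keys() if CMC_START <= w <= CMC_END)
    return pearson([ours[w] for w in weeks], [ltw[w] for w in weeks])
```

Weeks after the cutoff are dropped before correlating, so a post-2020 data source mismatch cannot affect the reported figure.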

Year-by-Year Correlations

Year-by-year correlations with LTW reference (CoinMarketCap era)
Year | Weeks | CMKT   | CSMB  | CMOM  | Avg coins
2014 | 49    | 99.6%  | 63.2% | 51.9% | 33
2015 | 52    | 99.9%  | 96.6% | 64.4% | 34
2016 | 52    | 99.7%  | 92.4% | 78.6% | 67
2017 | 52    | 99.5%  | 91.4% | 64.4% | 295
2018 | 52    | 99.9%  | 96.4% | 82.1% | 855
2019 | 52    | 100.0% | 32.7% | 58.4% | 862
2020 | 31    | 100.0% | 64.8% | 47.3% | 884

Key patterns:

  • CMKT: 99.5%+ every year. Bitcoin/Ethereum dominate value weights, making CMKT robust to small-coin differences.
  • CSMB: 91–97% in 2015–2018, drops to 33% in 2019 due to three anomaly weeks (see below).
  • CMOM: Best in 2018 (82.1%) when the universe is largest. Limited by thin double-sort cells in 2014–2015 (2–4 coins per cell).

Time-Series Comparison

Our factor returns (red) overlaid on LTW reference (blue). CoinMarketCap era only.

Universe Size

Eligible coins per year. 'Unique' = distinct coins appearing at least once; 'Avg/week' = average eligible coins per week.
Year | Our unique | Our avg/week | Paper unique | Difference
2014 | 123        | 32           | 109          | 14
2015 | 77         | 34           | 77           | 0
2016 | 154        | 67           | 155          | -1
2017 | 760        | 295          | 795          | -35
2018 | 1580       | 855          | 1559         | 21
2019 | 1394       | 862          | 1085         | 309
2020 | 1743       | 988          | 665          | 1078

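Unique coin counts match the paper closely from 2015 through 2018 (exact in 2015, off by one in 2016, within 5% in 2017, within 2% in 2018). Our counts are higher by 14 coins in 2014, 309 in 2019, and 1,078 in 2020. The average per week is well below the unique count in every year because many coins only intermittently pass the $1M market cap filter. The close match in 2015–2018 is consistent with using the same data source (CoinMarketCap) through different API endpoints.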

CSMB Divergence Analysis

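The CSMB correlation drops to 33% in 2019, driven by three specific weeks between May and July 2019. One week in 2014 exceeds the same threshold and is listed for completeness: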

Weeks with > 30 percentage point CSMB divergence
Week       | Our CSMB | LTW CSMB | Difference | N coins
2019-07-30 | 2.9%     | 131.9%   | -129.0pp   | 874
2019-05-21 | -0.7%    | 75.5%    | -76.2pp    | 903
2019-06-04 | 9.5%     | 53.6%    | -44.1pp    | 895
2014-04-09 | -21.1%   | 15.5%    | -36.6pp    | 29
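A screen like the one behind this table can be sketched as follows. The function name and the dict layout are assumptions for illustration. The four flagged rows are copied from the table, and the `2019-01-01` entry is a made-up week added only to show a case that is not flagged.

```python
THRESHOLD_PP = 30.0  # divergence threshold in percentage points, from the table title


def divergent_weeks(ours, ltw, threshold_pp=THRESHOLD_PP):
    """Return (week, difference in pp) where |ours - ltw| exceeds the threshold.

    ours, ltw: dicts mapping an ISO date string to a weekly return in percent.
    """
    flagged = []
    for week in sorted(ours.keys() & ltw.keys()):
        diff = round(ours[week] - ltw[week], 1)
        if abs(diff) > threshold_pp:
            flagged.append((week, diff))
    return flagged


ours_pct = {"2019-07-30": 2.9, "2019-05-21": -0.7, "2019-06-04": 9.5,
            "2014-04-09": -21.1, "2019-01-01": 1.0}   # last entry is hypothetical
ltw_pct = {"2019-07-30": 131.9, "2019-05-21": 75.5, "2019-06-04": 53.6,
           "2014-04-09": 15.5, "2019-01-01": 1.5}     # last entry is hypothetical

for week, diff in divergent_weeks(ours_pct, ltw_pct):
    print(week, f"{diff:+.1f}pp")
```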

In these weeks, Liu’s CSMB implies a Small-group return of 130–150%, while our Small-group VW return is only 10–20%. This suggests Liu’s data contains a Small-cap coin with extreme returns (500%+) and significant value weight that is either absent from our CoinMarketCap retrieval or carries different market cap data there.

The coin TRONCLASSIC (TRXC) appears in our data with returns of 3,992% (Jul 2019) and 1,350% (May 2019), but its market cap ($1.3M) gives it negligible VW weight.
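A rough back-of-the-envelope check shows why both an extreme return and a meaningful weight are needed. The sketch below assumes, purely for illustration, that the whole 129.0pp gap on 2019-07-30 comes from one coin in the Small leg and that everything else matches; `required_weight` is a hypothetical helper, not part of the pipeline.

```python
def required_weight(gap_pp, coin_return_pct):
    """Value weight one coin needs so that weight * return equals the gap."""
    return gap_pp / coin_return_pct


gap = 129.0  # pp, CSMB difference for 2019-07-30 from the table above
for ret in (500.0, 3992.0):  # the 500% floor and TRXC's July 2019 return
    print(f"return {ret:,.0f}% -> required weight {required_weight(gap, ret):.1%}")
```

Under these assumptions, a coin returning 500% would need about a quarter of the Small group's market cap, and one returning 3,992% would still need roughly 3%.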

We cross-checked both our data sources (crypto_listings and crypto_history) for these anomaly weeks. No coin in either dataset has the combination of extreme return and significant Small-group market cap weight needed to match Liu’s CSMB. The crypto_listings endpoint (survivorship-bias free, 947 coins in the week examined) has broader coverage than crypto_history (695 coins). Where the two overlap, returns are identical.

Conclusion: The CSMB divergence in these 3 weeks is likely caused by different price or market cap data in Liu’s original CoinMarketCap retrieval (performed circa 2019–2020), which may reflect API revisions, different data vintage, or a coin whose data has since been removed from CoinMarketCap. This is an inherent limitation of retrospective data retrieval and cannot be resolved without access to Liu’s coin-level data.

CMOM Divergence Analysis

Average number of coins per double-sort cell by year. With < 5 coins per cell, single-coin movements dominate factor returns.
Year | Small-High | Small-Low | Big-High | Big-Low
2014 | 2.6        | 3.8       | 3.1      | 1.9
2015 | 3.7        | 3.3       | 2.3      | 3.0
2016 | 5.9        | 6.7       | 5.8      | 5.9
2017 | 26.5       | 28.5      | 27.3     | 20.8
2018 | 71.2       | 93.4      | 80.4     | 55.0
2019 | 78.3       | 92.1      | 79.1     | 58.0
2020 | 87.8       | 104.2     | 91.5     | 71.9

CMOM correlation improves markedly as cell counts increase. In 2014, cells average 2–4 coins; by 2018, they average 55–93 coins. The 82.1% correlation in 2018 suggests that the methodology is correct once the universe is large enough for the double sort to be well-diversified.
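To make the cell structure concrete, here is a minimal sketch of a 2x3 double sort. The specific rule used (median size split, then 30/40/30 momentum breakpoints within each size half, and the factor as the average of the two High cells minus the average of the two Low cells) is an assumption in the Fama-French style, and the dict keys `cap`, `mom`, and `ret` are hypothetical names.

```python
def vw_return(cell):
    """Value-weighted return of a list of coin dicts with 'cap' and 'ret' keys."""
    total = sum(c["cap"] for c in cell)
    return sum(c["cap"] * c["ret"] for c in cell) / total


def double_sort_cells(coins):
    """Assign coins to the four corner cells of a 2x3 size-by-momentum sort.

    Assumed rule: split at the median market cap, then take the bottom and
    top 30% by momentum within each size half (the middle 40% is unused).
    """
    by_size = sorted(coins, key=lambda c: c["cap"])
    half = len(by_size) // 2
    cells = {}
    for size_name, group in (("Small", by_size[:half]), ("Big", by_size[half:])):
        by_mom = sorted(group, key=lambda c: c["mom"])
        k = len(by_mom) * 3 // 10  # coins in each 30% tail
        cells[f"{size_name}-Low"] = by_mom[:k]
        cells[f"{size_name}-High"] = by_mom[len(by_mom) - k:]
    return cells


def cmom(cells):
    """Average of the two High cells minus the average of the two Low cells."""
    high = (vw_return(cells["Small-High"]) + vw_return(cells["Big-High"])) / 2
    low = (vw_return(cells["Small-Low"]) + vw_return(cells["Big-Low"])) / 2
    return high - low
```

With only a few dozen eligible coins, any rule of this kind leaves a handful of coins per cell, so one coin's return can move the whole factor. An empty cell would make `vw_return` divide by zero, which a production version would need to handle.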

Scatter Plots

Scatter plots of our factor returns vs. LTW reference (weekly, CoinMarketCap era). The 45-degree line is shown in grey.

Data Quality Filters

Our pipeline applies two data quality filters to handle CoinMarketCap reporting errors that would not appear in CoinGecko data (which Liu et al. switched to from mid-2020):

1. Weekly Return Cap (9,900%)

Individual coin weekly returns are capped at 9,900% (100x). This handles CMC price reporting glitches such as SquidGrow (2,597,962% weekly return on 2024-09-16) caused by erroneous price/supply data. Approximately 870 coin-weeks are affected across the full sample.
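A minimal sketch of the cap, assuming returns are stored as decimals (so 9,900% is 99.0); the constant and function names are illustrative, not the pipeline's.

```python
RETURN_CAP = 99.0  # 9,900% as a decimal return, i.e. a 100x price multiple


def cap_weekly_return(ret, cap=RETURN_CAP):
    """Clip a weekly return at the cap; returns below the cap pass through."""
    return min(ret, cap)


print(cap_weekly_return(25979.62))  # SquidGrow's 2,597,962% is clipped to 99.0
print(cap_weekly_return(0.5))       # an ordinary 50% return is unchanged
```

Capping keeps the coin-week in the sample rather than dropping it, so the coin still contributes its value weight to the portfolio.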

2. Implied Supply Filter

Coin-weeks where the implied circulating supply (market_cap / price) exceeds 10^16 tokens are excluded. This catches cases like Innovative Bioresearch Classic (INNBCL), which CMC reported with a price of $0.00000001 and a market cap of $126 billion (implying 1.26 x 10^19 tokens). Without this filter, INNBCL’s $126B value weight and its eventual capped return of 9,900% destroy the CMKT correlation (99.7% → 3.3%). Only ~210 coin-weeks are affected; legitimate high-supply coins like Shiba Inu (supply ~ 5 x 10^14) pass the filter.
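The filter reduces to one division and one comparison. The sketch below uses the INNBCL figures quoted above. The Shiba Inu price and market cap are illustrative values chosen only to reproduce a supply of about 5 x 10^14, and rejecting non-positive prices is an added assumption.

```python
MAX_IMPLIED_SUPPLY = 1e16  # tokens


def implied_supply(market_cap, price):
    """Circulating supply implied by a reported market cap and price."""
    return market_cap / price


def passes_supply_filter(market_cap, price, max_supply=MAX_IMPLIED_SUPPLY):
    """True if the coin-week is kept, False if it is excluded."""
    if price <= 0:  # assumption: treat non-positive prices as invalid
        return False
    return implied_supply(market_cap, price) <= max_supply


print(passes_supply_filter(126e9, 1e-8))  # INNBCL: about 1.26e19 tokens, excluded
print(passes_supply_filter(5e9, 1e-5))    # illustrative Shiba-like case: about 5e14, kept
```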

Conclusion

The factor construction methodology is correctly implemented:

  • CMKT: 99.8% correlation with the published reference in the CoinMarketCap era, a near-perfect replication
  • CSMB: 91–97% correlation in 2015–2018; lower in 2019 due to three anomaly weeks where Liu’s data contains a Small-cap coin with extreme returns not present in our retrieval
  • CMOM: 82.1% in 2018 (best year); limited by thin double-sort cells in early years

Remaining divergence is driven by data source differences (different CoinMarketCap API endpoints/vintages), not methodology errors. The exact Liu et al. factors are available as factors_liu.parquet on Hugging Face.

References

Liu, Y., Tsyvinski, A., & Wu, X. (2022). Common risk factors in cryptocurrency. The Journal of Finance, 77(2), 1133–1177. https://doi.org/10.1111/jofi.13119