Methodology
Reference
The factor construction follows Liu et al. (2022). Additional methodological details are drawn from Borri et al. (2026).
Exact Liu et al. (2022) Methodology
The baseline factors (factors_liu.parquet) replicate the exact procedure described in Section II of Liu et al. (2022). Every parameter below is taken directly from the paper. Where the paper is silent, or where we simplify (the risk-free rate), this is noted.
Data
- Source: CoinMarketCap — daily close price, volume, and market capitalisation (USD)
- Sample period: January 2014 – present (the paper covers January 2014 – July 2020)
- Survivorship-bias free: includes delisted coins via the `crypto_listings()` endpoint of the `crypto2` R package
Week Definition
Each calendar year is divided into exactly 52 weeks (Liu et al. 2022, Section II.A):
- Week 1 = January 1–7
- Weeks 2–51 = 7 days each
- Week 52 = remaining 8 days (9 days in leap years)
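The Python sketch below illustrates the week mapping. It is not the project's code, and `sharp_year_week` is a hypothetical name. Integer division by 7 assigns days 1–364 to weeks 1–52. Capping at 52 then folds the leftover day (two in leap years) into week 52, which gives week 52 its 8 (or 9) days.

```python
from datetime import date


def sharp_year_week(d: date) -> int:
    """Map a calendar date to its sharp-year week number (1..52).

    Week 1 is Jan 1-7 and weeks 2-51 are 7 days each; week 52 absorbs
    the remaining 8 days (9 in leap years).
    """
    day_of_year = d.timetuple().tm_yday  # 1 = Jan 1
    week = (day_of_year - 1) // 7 + 1    # 1..53 before folding
    return min(week, 52)                 # fold day 365 (and 366) into week 52


# Usage: in 2021, week 52 runs from Dec 24 to Dec 31.
print(sharp_year_week(date(2021, 12, 23)), sharp_year_week(date(2021, 12, 24)))  # 51 52
```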
Eligibility Screen
Each week, a coin must satisfy:
- Non-missing price, market capitalisation, and return
- Lagged market capitalisation \(\geq\) $1,000,000
- No minimum age filter (not specified in the paper)
- No stablecoin exclusion (not specified in the paper for the original 2014–2020 sample)
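As a hedged illustration, the screen reduces to a per-coin-week predicate. The sketch assumes that missing values arrive as `None` or NaN. The name `is_eligible` is hypothetical.

```python
import math
from typing import Optional

MIN_LAGGED_MCAP = 1_000_000  # USD


def is_eligible(price: Optional[float], mcap: Optional[float],
                ret: Optional[float], lagged_mcap: Optional[float]) -> bool:
    """Weekly eligibility screen as described above.

    No minimum-age filter and no stablecoin exclusion are applied.
    """
    fields = (price, mcap, ret, lagged_mcap)
    # Any missing (None) or NaN field disqualifies the coin-week.
    if any(v is None or math.isnan(v) for v in fields):
        return False
    return lagged_mcap >= MIN_LAGGED_MCAP
```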
CMKT — Crypto Market Factor
Value-weighted return of all eligible coins, minus the risk-free rate:
\[\text{CMKT}_t = \sum_{i} w_{i,t}\, r_{i,t} \;-\; R_{f,t}\]
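where \(w_{i,t} = \text{MC}_{i,t-1} / \sum_j \text{MC}_{j,t-1}\) is the lagged market-cap weight of coin \(i\) (\(\text{MC}\) = market capitalisation) and \(R_{f,t}\) = 1-month T-bill rate (negligible at weekly frequency; set to 0).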
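The minimal sketch below computes CMKT for a single week. It is illustrative Python, not the project's implementation, and `cmkt` is a hypothetical name. It assumes that the inputs are already restricted to eligible coins.

```python
from typing import Sequence


def cmkt(returns: Sequence[float], lagged_mcaps: Sequence[float],
         rf: float = 0.0) -> float:
    """Value-weighted market return minus the risk-free rate.

    Weights are lagged market caps normalised to sum to one; rf defaults
    to 0, matching the weekly convention described above.
    """
    total = sum(lagged_mcaps)
    weights = [m / total for m in lagged_mcaps]
    return sum(w * r for w, r in zip(weights, returns)) - rf
```

For example, two coins with lagged caps of $3M and $1M and returns of +10% and −5% give 0.75 · 0.10 + 0.25 · (−0.05) = 0.0625.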
CSMB — Crypto Size Factor
Each week, eligible coins are sorted by lagged market capitalisation into three groups using 30/40/30 breakpoints:
- Small = bottom 30% by market cap
- Neutral = middle 40% (excluded from the factor)
- Big = top 30% by market cap
Returns are value-weighted within each group.
\[\text{CSMB}_t = R_{\text{Small},t} - R_{\text{Big},t}\]
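One week of CSMB can be sketched in Python as follows. The sketch assumes that the 30th and 70th percentile breakpoints use linear interpolation via `statistics.quantiles(..., method="inclusive")`. The paper's exact percentile convention and tie handling may differ. All function names are hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def breakpoints_30_70(values: Sequence[float]) -> Tuple[float, float]:
    """30th and 70th percentiles (linear interpolation, an assumption)."""
    cuts = quantiles(values, n=10, method="inclusive")  # 9 decile cut points
    return cuts[2], cuts[6]


def vw_return(pairs: List[Tuple[float, float]]) -> float:
    """Value-weighted return of (return, lagged_mcap) pairs."""
    return sum(r * m for r, m in pairs) / sum(m for _, m in pairs)


def csmb(returns: Sequence[float], lagged_mcaps: Sequence[float]) -> float:
    p30, p70 = breakpoints_30_70(lagged_mcaps)
    pairs = list(zip(returns, lagged_mcaps))
    small = [(r, m) for r, m in pairs if m <= p30]  # bottom 30%
    big = [(r, m) for r, m in pairs if m >= p70]    # top 30%
    # The middle 40% (Neutral) never enters the factor.
    return vw_return(small) - vw_return(big)
```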
CMOM — Crypto Momentum Factor
CMOM uses a Fama-French 2×3 independent double sort on size and momentum.
Momentum signal (\(r_{3,0}\)): cumulative return over the prior 3 weeks, ending at the formation date: \[\text{mom}_{i,t} = \frac{P_{i,t-1}}{P_{i,t-4}} - 1\]
where \(P_{i,t}\) = end-of-week price. In our weekly panel, this is lag(price_eow, 1) / lag(price_eow, 4) - 1.
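The Python sketch below mirrors the lag expression. It assumes a gap-free list of end-of-week prices indexed by week. `mom_r30` is a hypothetical name.

```python
from typing import Optional, Sequence


def mom_r30(prices_eow: Sequence[float], t: int) -> Optional[float]:
    """r_{3,0} formed for week t: P[t-1] / P[t-4] - 1 (3-week return, no skip).

    Equivalent to lag(price_eow, 1) / lag(price_eow, 4) - 1 evaluated at week t.
    """
    if t < 4:
        return None  # not enough history for the 3-week window
    return prices_eow[t - 1] / prices_eow[t - 4] - 1
```

With weekly prices of 100, 110, 121, 133.1, 146.41, the signal for week 4 is 133.1 / 100 − 1 ≈ 0.331. The week-4 price itself is not used, because it belongs to the holding period.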
Double sort: coins are independently classified into:
- Size groups (30/40/30 by lagged market cap): Small / Neutral / Big
- Momentum groups (30/40/30 by \(r_{3,0}\)): Loser / Neutral / Winner
Neutral groups in both dimensions are excluded. Four corner portfolios remain:
| | Loser (L) | Winner (H) |
|---|---|---|
| Small (S) | SL | SH |
| Big (B) | BL | BH |
Returns are value-weighted within each cell.
\[\text{CMOM}_t = \frac{1}{2}(R_{SH,t} + R_{BH,t}) - \frac{1}{2}(R_{SL,t} + R_{BL,t})\]
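The following Python sketch shows the independent double sort for one week. It is not the project's code. It makes three assumptions: breakpoints are computed with linear interpolation, both sorts use all eligible coins, and all four corner cells are non-empty. If any corner cell is empty, the function raises `KeyError`. All names are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Sequence, Tuple


def breakpoints_30_70(values: Sequence[float]) -> Tuple[float, float]:
    cuts = quantiles(values, n=10, method="inclusive")
    return cuts[2], cuts[6]


def bucket(x: float, lo: float, hi: float, labels: str) -> str:
    """labels[0] at or below lo, labels[2] at or above hi, else labels[1]."""
    if x <= lo:
        return labels[0]
    if x >= hi:
        return labels[2]
    return labels[1]


def vw_return(pairs: List[Tuple[float, float]]) -> float:
    return sum(r * m for r, m in pairs) / sum(m for _, m in pairs)


def cmom(returns: Sequence[float], lagged_mcaps: Sequence[float],
         mom: Sequence[float]) -> float:
    s_lo, s_hi = breakpoints_30_70(lagged_mcaps)  # size breakpoints
    m_lo, m_hi = breakpoints_30_70(mom)           # momentum breakpoints
    cells: Dict[str, List[Tuple[float, float]]] = {}
    for r, m, g in zip(returns, lagged_mcaps, mom):
        size = bucket(m, s_lo, s_hi, "SNB")  # Small / Neutral / Big
        grp = bucket(g, m_lo, m_hi, "LNH")   # Loser / Neutral / Winner
        if size == "N" or grp == "N":
            continue  # neutral groups are excluded in both dimensions
        cells.setdefault(size + grp, []).append((r, m))
    R = {cell: vw_return(pairs) for cell, pairs in cells.items()}
    return 0.5 * (R["SH"] + R["BH"]) - 0.5 * (R["SL"] + R["BL"])
```

The sort is independent because each coin's size group and momentum group are assigned from separate breakpoints over the same universe. Neither sort is conditioned on the other.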
Summary of Liu et al. Parameters
| Parameter | Value |
|---|---|
| Weighting | Value-weighted (lagged market cap) |
| Size breakpoints | 30% / 40% / 30% |
| Momentum breakpoints | 30% / 40% / 30% |
| Momentum signal | \(r_{3,0}\) (3-week return, no skip) |
| CMOM structure | 2×3 Fama-French independent double sort |
| Market cap filter | \(\geq\) $1,000,000 (lagged) |
| Risk-free rate | 1-month T-bill (set to 0 at weekly frequency) |
| Calendar | Sharp year (52 weeks, Jan 1–7 = week 1) |
| Stablecoin exclusion | None |
| Winsorisation | None documented |
Data Quality
Our data comes from CoinMarketCap, which — unlike CoinGecko (used by Liu et al. in their updated factor file) — does not apply exchange-level outlier detection. We therefore apply two filters:
Return Cap
Individual coin weekly returns are capped at 9,900% (a 100× price increase). This handles CMC price/supply reporting glitches (e.g., SquidGrow: a 2,597,962% weekly return). The cap affects ~870 coin-weeks; no legitimate crypto asset gains 100× in a single week.
Implied Supply Filter
Coin-weeks where market_cap / price exceeds \(10^{16}\) tokens are excluded. This catches erroneous circulating-supply data (e.g., INNBCL: a price of $0.00000001 with a $126B market cap, implying \(1.3 \times 10^{19}\) tokens). The filter affects ~210 coin-weeks. Legitimate high-supply coins like Shiba Inu (\(\sim 5 \times 10^{14}\) tokens) are not affected.
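Both filters reduce to one-line checks, sketched below in Python. The sketch assumes that returns are simple returns in decimal form, so 9,900% becomes 99.0. Treating non-positive prices as bad data is an added assumption not stated above. The names are hypothetical.

```python
RETURN_CAP = 99.0          # 9,900% in decimal terms, i.e. a 100x price move
MAX_IMPLIED_SUPPLY = 1e16  # tokens


def cap_return(r: float) -> float:
    """Cap a weekly simple return at +9,900%."""
    return min(r, RETURN_CAP)


def passes_supply_filter(market_cap: float, price: float) -> bool:
    """Keep a coin-week only if market_cap / price <= 1e16 tokens."""
    if price <= 0:
        return False  # assumption: non-positive prices are treated as bad data
    return market_cap / price <= MAX_IMPLIED_SUPPLY
```

With the INNBCL figures above, 126e9 / 1e-8 ≈ 1.26e19 tokens exceeds the threshold, so that coin-week is dropped.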
Early Sample Period
Before July 2014, the CoinMarketCap universe contains fewer than 30 eligible coins. Factor estimates are noisy and dominated by Bitcoin, Litecoin, and a handful of other early coins. We recommend starting analyses from July 2014. Plots on this website use this cutoff.
The specification multiverse
Rather than report one “preferred” set of numbers, we construct the factors across a multiverse of defensible specification choices and report the distribution of results. The full results table (one row per specification, with factor means and Fama-MacBeth pricing statistics) is on the Data page; the findings are summarised in the Results.
Specification axes
| Axis | Options | Note |
|---|---|---|
| Data source | CoinMarketCap, CoinGecko | treated as a specification choice, not a fixed input |
| Breakpoint universe | All coins, Top-100, $100M floor | the headline axis (see below) |
| Evaluation universe | Full cross-section, Investable | which assets the model is asked to price |
| Gap-handling | Consecutive-week, Naive | see below |
| Investable-momentum | On, Off | see below |
| Weighting | Value-weighted, Equal-weighted | |
| Size breakpoints | 2 (median), 3, 5 (quintile), 10 (decile) | |
| Momentum lookback | 1, 2, 3, 4 weeks | |
| Calendar | Sharp year, Mon … Sun (7 weekday starts) | calendar/frequency are alignment boundaries — not cross-priced |
| Exclusions | None, Stablecoins, +Wrapped/derivatives | |
| Delisting returns | Off, On | |
The headline weekly grid comprises ~6,900 internally consistent “worlds” (each builds both the factors and the test assets with the same choices).
Breakpoint universe (the NYSE-breakpoint analog)
Crypto has no listing threshold — anyone can mint a token — so if size/momentum breakpoints are computed over all coins, the bottom deciles are pure microcap dust and the sort is dominated by economically irrelevant assets. Following the Fama-French convention of computing breakpoints on NYSE stocks only, we compute the cut points on an investable reference set (top-N by market cap, or a market-cap floor) and then assign all coins. This prices the full cross-section (no cherry-picking) with economically meaningful, snapshot-stable breakpoints. It is the single most consequential choice — see Results.
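The Python sketch below illustrates the idea. Breakpoints come from the reference set only (top-N by market cap, or a market-cap floor), and then every coin is assigned with a binary search. It is not the project's code. The interpolation method and the names `reference_cuts` and `assign_all` are assumptions. Passing neither `top_n` nor `floor` reproduces the "All coins" option.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Optional, Sequence


def reference_cuts(mcaps: Sequence[float], n_groups: int = 10,
                   top_n: Optional[int] = 100,
                   floor: Optional[float] = None) -> List[float]:
    """Breakpoints computed on an investable reference set only."""
    if floor is not None:
        reference = [m for m in mcaps if m >= floor]     # e.g. $100M floor
    else:
        reference = sorted(mcaps, reverse=True)[:top_n]  # e.g. top-100 (None = all)
    return quantiles(reference, n=n_groups, method="inclusive")


def assign_all(mcaps: Sequence[float], cuts: List[float]) -> List[int]:
    """Assign every coin, not just the reference set, to groups 1..n."""
    return [bisect_right(cuts, m) + 1 for m in mcaps]
```

With 1,000 coins and top-100 decile breakpoints, every coin below the reference set's first cut lands in group 1. The upper groups therefore stay economically meaningful, and the microcap tail is still priced.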
Gap-handling (consecutive-week returns)
Under naive gap handling, a coin with a data gap produces a multi-week return (price / previous available price) that is mislabelled as a one-week return. These spurious extremes land in the size/momentum tails. The consecutive-week rule uses a return only when the prior calendar week actually exists.
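A minimal Python contrast of the two rules follows. It assumes that prices are keyed by a running integer week index that stays consecutive across year boundaries. The function names are hypothetical.

```python
from typing import Dict


def consecutive_week_returns(prices: Dict[int, float]) -> Dict[int, float]:
    """Return for week w only when week w-1 has a price (consecutive-week rule)."""
    return {w: p / prices[w - 1] - 1 for w, p in prices.items() if w - 1 in prices}


def naive_returns(prices: Dict[int, float]) -> Dict[int, float]:
    """Price over previous available price, even across gaps (for contrast)."""
    weeks = sorted(prices)
    return {w: prices[w] / prices[v] - 1 for v, w in zip(weeks, weeks[1:])}
```

With prices {1: 100, 2: 110, 5: 220}, the naive rule reports a 100% "weekly" return for week 5 that actually spans weeks 3–5. The consecutive-week rule keeps only the week-2 return.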
Investable-momentum
Momentum is measured only over weeks in which the coin was continuously investable (≥ $1M throughout the lookback). Otherwise a coin that just crossed the size threshold on a one-off pump enters the winner portfolio with about-to-reverse “momentum”.
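As a hedged sketch, investable momentum adds a market-cap check to the plain lookback return. The code assumes that "throughout the lookback" means market cap ≥ $1M at every end-of-week observation from week t−1−lookback to week t−1. The project's exact window may differ. The name `investable_momentum` is hypothetical.

```python
from typing import Optional, Sequence

MIN_MCAP = 1_000_000  # USD investability threshold


def investable_momentum(prices_eow: Sequence[float], mcaps_eow: Sequence[float],
                        t: int, lookback: int = 3) -> Optional[float]:
    """Lookback return formed for week t, or None if not investable throughout.

    Assumption: market cap must be >= $1M at every end-of-week observation
    from t-1-lookback through t-1.
    """
    start = t - 1 - lookback
    if start < 0:
        return None
    if any(mcaps_eow[s] < MIN_MCAP for s in range(start, t)):
        return None  # e.g. a coin that only just crossed $1M on a pump
    return prices_eow[t - 1] / prices_eow[start] - 1
```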
Data Pipeline
- Retrieval: Daily cryptocurrency snapshots from CoinMarketCap via `crypto2::crypto_listings()`. Survivorship-bias free.
- Processing: Returns computed from end-of-week prices. Return cap (9,900%) and implied supply filter applied. Aggregated to weekly (sharp-year or Monday-Monday) and monthly panels.
- Liu exact factors: `compute_factors_liu()` with hardcoded 30/40/30 breakpoints → `factors_liu.parquet`
- Variant factors: `compute_factors()` with parametric breakpoints across all 576 combinations
- Portfolios: `compute_portfolios()` (decile sorts on size and momentum)
- Export: All files uploaded to Hugging Face as CSV/Parquet.
Citation
If you use these data in your research, please cite Stoeckl (2026) and Liu et al. (2022).