Methodology

Reference

The factor construction follows Liu et al. (2022). Additional methodological details are taken from Borri et al. (2026).

Exact Liu et al. (2022) Methodology

The baseline factors (factors_liu.parquet) replicate the exact procedure described in Section II of Liu et al. (2022). Every parameter below is taken directly from the paper.

Data

Source: CoinMarketCap — daily close price, volume, and market capitalisation (USD)
Sample period: January 2014 – present (the paper covers January 2014 – July 2020)
Survivorship-bias free: includes delisted coins via the crypto_listings() function of the crypto2 R package

Week Definition

Each calendar year is divided into exactly 52 weeks (Liu et al. 2022, Section II.A):

Week 1 = January 1–7
Weeks 2–51 = 7 days each
Week 52 = remaining 8 days (9 days in leap years)
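The week rule above can be sketched in a few lines. This is an illustrative Python version (the pipeline itself is written in R); the function name `sharp_year_week` is hypothetical and not part of the published code.

```python
from datetime import date


def sharp_year_week(d: date) -> int:
    """Map a calendar date to its week number (1..52) within its own year."""
    day_of_year = d.timetuple().tm_yday  # 1-based: January 1 -> 1
    # Days 1-7 -> week 1, days 8-14 -> week 2, and so on. Everything after
    # day 357 (the remaining 8 or 9 days) is folded into week 52.
    return min((day_of_year - 1) // 7 + 1, 52)
```

For example, `sharp_year_week(date(2019, 12, 24))` and `sharp_year_week(date(2019, 12, 31))` both fall in week 52, while December 23, 2019 is still in week 51.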

Eligibility Screen

Each week, a coin must satisfy:

Non-missing price, market capitalisation, and return
Lagged market capitalisation ≥ $1,000,000
No minimum age filter (not specified in the paper)
No stablecoin exclusion (not specified in the paper for the original 2014–2020 sample)
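As a minimal sketch, the screen reduces to a single predicate per coin-week. The function and argument names below are hypothetical, and treating `None` and NaN as "missing" is an assumption about how gaps are encoded.

```python
import math

MIN_LAGGED_MCAP_USD = 1_000_000  # threshold from the eligibility screen


def _present(x) -> bool:
    """True if a value is neither None nor NaN (assumed encodings of 'missing')."""
    return x is not None and not math.isnan(x)


def is_eligible(price, market_cap, ret, lagged_market_cap) -> bool:
    """Apply the weekly screen: complete data and lagged cap at or above the floor."""
    if not all(_present(v) for v in (price, market_cap, ret, lagged_market_cap)):
        return False
    return lagged_market_cap >= MIN_LAGGED_MCAP_USD
```

Note that no age or stablecoin condition appears here, mirroring the baseline screen.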

CMKT — Crypto Market Factor

Value-weighted return of all eligible coins, minus the risk-free rate:

\[\text{CMKT}_t = \sum_{i} w_{i,t}\, r_{i,t} \;-\; R_{f,t}\]

where $w_{i,t}$ = lagged market cap weight and $R_{f,t}$ = 1-month T-bill rate (negligible at weekly frequency; set to 0).
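A small sketch of this calculation follows, assuming the weights are lagged market caps normalised to sum to one across eligible coins. The function names are illustrative only.

```python
def value_weighted_return(returns, lagged_caps):
    """Weighted average of returns, with weights proportional to lagged market cap."""
    total = sum(lagged_caps)
    if total <= 0:
        raise ValueError("lagged market caps must sum to a positive number")
    return sum(cap / total * ret for ret, cap in zip(returns, lagged_caps))


def cmkt(returns, lagged_caps, risk_free=0.0):
    """Market factor: value-weighted return minus the risk-free rate (0 by default)."""
    return value_weighted_return(returns, lagged_caps) - risk_free
```

With two coins returning 10% and -5% and lagged caps of 3M and 1M, the weights are 0.75 and 0.25, so `cmkt([0.10, -0.05], [3e6, 1e6])` is approximately 0.0625.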

CSMB — Crypto Size Factor

Each week, eligible coins are sorted by lagged market capitalisation into three groups using 30/40/30 breakpoints:

Small = bottom 30% by market cap
Neutral = middle 40% (excluded from the factor)
Big = top 30% by market cap

Returns are value-weighted within each group.

\[\text{CSMB}_t = R_{\text{Small},t} - R_{\text{Big},t}\]
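The sort can be sketched as below. Two details are assumptions rather than statements about the published code: breakpoints use linearly interpolated percentiles, and coins exactly on a breakpoint go to the Small (≤ 30th percentile) or Big (≥ 70th percentile) group.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list (q in [0, 1])."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def vw_return(group):
    """Value-weighted return of a list of (lagged_cap, weekly_return) pairs."""
    total = sum(cap for cap, _ in group)
    return sum(cap / total * ret for cap, ret in group)


def csmb(coins):
    """Size factor from (lagged_cap, weekly_return) pairs of eligible coins."""
    caps = sorted(cap for cap, _ in coins)
    p30, p70 = percentile(caps, 0.30), percentile(caps, 0.70)
    small = [(c, r) for c, r in coins if c <= p30]  # bottom 30%
    big = [(c, r) for c, r in coins if c >= p70]    # top 30%
    # Coins strictly between the two breakpoints are Neutral and are dropped.
    return vw_return(small) - vw_return(big)
```

For ten coins with lagged caps 1 to 10, the three smallest form Small and the three largest form Big; if the former all return 10% and the latter 2%, the factor is roughly 0.08.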

CMOM — Crypto Momentum Factor

CMOM uses a Fama-French 2×3 independent double sort on size and momentum.

Momentum signal ($r_{3,0}$): cumulative return over the prior 3 weeks, ending at the formation date:

\[\text{mom}_{i,t} = \frac{P_{i,t-1}}{P_{i,t-4}} - 1\]

where $P_{i,t}$ = end-of-week price. In our weekly panel, this is lag(price_eow, 1) / lag(price_eow, 4) - 1.
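For readers who prefer code to lag notation, here is an illustrative Python equivalent of the lag expression. It assumes prices are stored in a list indexed by consecutive week number, with `None` marking a missing week; the function name is hypothetical.

```python
def momentum_signal(prices_eow, t, lookback=3):
    """Return P[t-1] / P[t-1-lookback] - 1, or None if either price is unavailable."""
    start = t - 1 - lookback  # t-4 for the default 3-week lookback
    if start < 0:
        return None  # not enough history before the formation date
    p_end, p_start = prices_eow[t - 1], prices_eow[start]
    if p_end is None or p_start is None or p_start <= 0:
        return None
    return p_end / p_start - 1
```

The price in week `t` itself never enters the signal, so the return being sorted on and the return being predicted do not overlap.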

Double sort: coins are independently classified into:

Size groups (30/40/30 by lagged market cap): Small / Neutral / Big
Momentum groups (30/40/30 by $r_{3,0}$): Loser / Neutral / Winner

Neutral groups in both dimensions are excluded. Four corner portfolios remain:

	Loser (L)	Winner (H)
Small (S)	SL	SH
Big (B)	BL	BH

Returns are value-weighted within each cell.

\[\text{CMOM}_t = \frac{1}{2}(R_{SH,t} + R_{BH,t}) - \frac{1}{2}(R_{SL,t} + R_{BL,t})\]
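The double sort and the factor formula can be sketched together. As before, interpolated percentiles and the handling of ties at the breakpoints are assumptions, the dictionary keys are hypothetical, and raising an error on an empty corner cell is one possible choice rather than the documented behaviour.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list (q in [0, 1])."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def vw_return(cell):
    """Value-weighted return of a list of coin dicts, weighted by lagged cap."""
    if not cell:
        raise ValueError("empty corner portfolio")
    total = sum(c["cap"] for c in cell)
    return sum(c["cap"] / total * c["ret"] for c in cell)


def cmom(coins):
    """coins: dicts with 'cap' (lagged market cap), 'mom' (signal), 'ret' (weekly return)."""
    caps = sorted(c["cap"] for c in coins)
    moms = sorted(c["mom"] for c in coins)
    cap_lo, cap_hi = percentile(caps, 0.30), percentile(caps, 0.70)
    mom_lo, mom_hi = percentile(moms, 0.30), percentile(moms, 0.70)

    cells = {"SL": [], "SH": [], "BL": [], "BH": []}
    for c in coins:
        # Independent sorts: each dimension is classified on its own breakpoints.
        size = "S" if c["cap"] <= cap_lo else ("B" if c["cap"] >= cap_hi else None)
        mom = "L" if c["mom"] <= mom_lo else ("H" if c["mom"] >= mom_hi else None)
        if size and mom:  # drop anything Neutral in either dimension
            cells[size + mom].append(c)

    r = {name: vw_return(cell) for name, cell in cells.items()}
    return 0.5 * (r["SH"] + r["BH"]) - 0.5 * (r["SL"] + r["BL"])
```

Because the sorts are independent, corner cells can be thin or empty in small cross-sections, which is one reason the early sample is noisy.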

Summary of Liu et al. Parameters

Parameter	Value
Weighting	Value-weighted (lagged market cap)
Size breakpoints	30% / 40% / 30%
Momentum breakpoints	30% / 40% / 30%
Momentum signal	$r_{3,0}$ (3-week return, no skip)
CMOM structure	2×3 Fama-French independent double sort
Market cap filter	≥ $1,000,000 (lagged)
Risk-free rate	1-month T-bill (set to 0 at weekly frequency)
Calendar	Sharp year (52 weeks, Jan 1–7 = week 1)
Stablecoin exclusion	None
Winsorisation	None documented

Data Quality

Our data comes from CoinMarketCap, which — unlike CoinGecko (used by Liu et al. in their updated factor file) — does not apply exchange-level outlier detection. We therefore apply two filters:

Return Cap

Individual coin weekly returns are capped at 9,900% (a 100× price multiple). This handles CoinMarketCap price/supply reporting glitches (e.g., SquidGrow: 2,597,962% weekly return). The cap affects ~870 coin-weeks; no legitimate crypto asset gains 100× in a single week.
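In decimal terms the cap is 99.0, since a 9,900% return corresponds to a price multiple of 100. A one-line sketch, with hypothetical names:

```python
RETURN_CAP = 99.0  # 9,900% as a decimal return, i.e. price_t / price_{t-1} = 100


def cap_return(weekly_return: float) -> float:
    """Truncate a weekly return at the upper cap; lower values pass through unchanged."""
    return min(weekly_return, RETURN_CAP)
```

The SquidGrow example (2,597,962%, or 25,979.62 as a decimal) would be replaced by 99.0, while ordinary returns are untouched. Only the upper tail is capped in this sketch.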

Implied Supply Filter

Coin-weeks where market_cap / price exceeds $10^{16}$ tokens are excluded. This catches erroneous circulating supply data (e.g., INNBCL: a price of USD 0.00000001 with a USD 126B market cap, implying $1.3 \times 10^{19}$ tokens). The filter affects ~210 coin-weeks. Legitimate high-supply coins like Shiba Inu ($\sim 5 \times 10^{14}$ tokens) are not affected.
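The filter amounts to one division and one comparison. In the sketch below the function name is hypothetical, and rejecting non-positive prices is an added assumption (the implied supply is undefined in that case).

```python
MAX_IMPLIED_SUPPLY = 1e16  # tokens; coin-weeks above this are dropped


def passes_supply_filter(market_cap: float, price: float) -> bool:
    """Keep a coin-week only if market_cap / price is at most 1e16 tokens."""
    if price <= 0:
        return False  # assumption: implied supply cannot be computed, so exclude
    return market_cap / price <= MAX_IMPLIED_SUPPLY
```

Using the INNBCL figures, `passes_supply_filter(126e9, 1e-8)` is `False` because the implied supply is about 1.26 × 10^19. A coin with an implied supply near 5 × 10^14, such as a 5B market cap at a price of 0.00001 (illustrative numbers), passes.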

Early Sample Period

Before July 2014, the CoinMarketCap universe contains fewer than 30 eligible coins. Factor estimates are noisy and dominated by Bitcoin, Litecoin, and a handful of other early coins. We recommend starting analyses from July 2014. Plots on this website use this cutoff.

The specification multiverse

Rather than report one “preferred” set of numbers, we construct the factors across a multiverse of defensible specification choices and report the distribution of results. The full results table (one row per specification, with factor means and Fama-MacBeth pricing statistics) is on the Data page; the findings are summarised on the Results page.

Specification axes

Axis	Options	Note
Data source	CoinMarketCap, CoinGecko	treated as a specification choice, not a fixed input
Breakpoint universe	All coins, Top-100, $100M floor	the headline axis (see below)
Evaluation universe	Full cross-section, Investable	which assets the model is asked to price
Gap-handling	Consecutive-week, Naive	see below
Investable-momentum	On, Off	see below
Weighting	Value-weighted, Equal-weighted
Size breakpoints	2 (median), 3, 5 (quintile), 10 (decile)
Momentum lookback	1, 2, 3, 4 weeks
Calendar	Sharp year, Mon … Sun (7 weekday starts)	calendar/frequency are alignment boundaries — not cross-priced
Exclusions	None, Stablecoins, +Wrapped/derivatives
Delisting returns	Off, On

The headline weekly grid is ~6,900 internally-consistent “worlds” (each builds both the factors and the test assets with the same choices).

Breakpoint universe (the NYSE-breakpoint analog)

Crypto has no listing threshold — anyone can mint a token — so if size/momentum breakpoints are computed over all coins, the bottom deciles are pure microcap dust and the sort is dominated by economically irrelevant assets. Following the Fama-French convention of computing breakpoints on NYSE stocks only, we compute the cut points on an investable reference set (top-N by market cap, or a market-cap floor) and then assign all coins. This prices the full cross-section (no cherry-picking) with economically meaningful, snapshot-stable breakpoints. It is the single most consequential choice — see Results.
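To make the mechanics concrete, the sketch below computes 30/70 cut points on a top-N reference set and then classifies every coin against them. The function names, the interpolated percentile, and the toy universe are all illustrative assumptions.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list (q in [0, 1])."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def reference_breakpoints(caps, top_n=100, qs=(0.30, 0.70)):
    """Cut points computed only on the top_n largest coins by market cap."""
    reference = sorted(sorted(caps, reverse=True)[:top_n])
    return [percentile(reference, q) for q in qs]


def assign_size_group(cap, breakpoints):
    """Classify any coin (inside or outside the reference set) against the cut points."""
    lo, hi = breakpoints
    if cap <= lo:
        return "Small"
    return "Big" if cap >= hi else "Neutral"


# Toy universe: 1,000 dust coins plus 10 large coins.
caps = [1.0] * 1000 + [100.0 * k for k in range(1, 11)]
all_coin_bp = reference_breakpoints(caps, top_n=len(caps))  # both cut points sit in the dust
top10_bp = reference_breakpoints(caps, top_n=10)            # cut points among the large coins
```

In the toy universe, all-coin breakpoints both equal the dust value, so the sort cannot separate the ten large coins from each other. The top-10 breakpoints land near 370 and 730, and the dust coins are still assigned (to Small) rather than dropped.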

Gap-handling (consecutive-week returns)

Under naive handling, a coin with a data gap produces a multi-week return that is mislabelled as a one-week return (price / previous-available-price). These spurious extremes land in the size/momentum tails. The consecutive-week rule uses a return only when the price for the immediately preceding calendar week actually exists.
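The difference between the two rules is easiest to see on a price series with a hole in it. The sketch assumes weeks are identified by consecutive integers and that missing weeks are simply absent from the dictionary; both are illustrative conventions.

```python
def weekly_returns(prices_by_week, consecutive_only=True):
    """Compute returns from a {week_index: price} mapping.

    With consecutive_only=True, a return is produced only when the previous
    calendar week is present; otherwise the previous available price is used.
    """
    weeks = sorted(prices_by_week)
    out = {}
    for prev, cur in zip(weeks, weeks[1:]):
        if consecutive_only and cur - prev != 1:
            continue  # gap: skip rather than report a multi-week return
        out[cur] = prices_by_week[cur] / prices_by_week[prev] - 1
    return out
```

For prices `{1: 100.0, 2: 110.0, 5: 220.0}`, the naive rule reports a 100% "weekly" return in week 5 that actually spans three weeks, while the consecutive-week rule reports no return for week 5 at all.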

Investable-momentum

Momentum is measured only over weeks in which the coin was continuously investable (≥ $1M throughout the lookback). Otherwise, a coin that has just crossed the size threshold on a one-off pump enters the winner portfolio with “momentum” that is about to reverse.
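One way to express this rule is to return no signal unless every week in the lookback window clears the market-cap floor. Exactly which weeks make up the window is an assumption here (weeks t-4 through t-1 for a 3-week lookback), and the function name is hypothetical.

```python
def investable_momentum(prices, mcaps, t, lookback=3, floor=1_000_000):
    """Momentum signal that is defined only if the coin stayed above the floor.

    prices and mcaps are lists indexed by consecutive week number; None marks
    a missing observation. Returns None when the signal is not defined.
    """
    start = t - 1 - lookback
    if start < 0:
        return None
    for week in range(start, t):  # weeks start .. t-1, inclusive
        if mcaps[week] is None or mcaps[week] < floor:
            return None  # not continuously investable over the lookback
        if prices[week] is None:
            return None
    return prices[t - 1] / prices[start] - 1
```

A coin whose market cap was 0.5M for most of the window and jumped to 2M only in the final week gets no signal, so it cannot enter the winner portfolio on the strength of the jump alone.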

Data Pipeline

Retrieval: Daily cryptocurrency snapshots from CoinMarketCap via crypto2::crypto_listings(). Survivorship-bias free.
Processing: Returns computed from end-of-week prices. Return cap (9,900%) and implied supply filter applied. Aggregated to weekly (sharp-year or Monday-Monday) and monthly panels.
Liu exact factors: compute_factors_liu() with hardcoded 30/40/30 breakpoints → factors_liu.parquet
Variant factors: compute_factors() with parametric breakpoints across all 576 combinations
Portfolios: compute_portfolios() — decile sorts on size and momentum
Export: All files uploaded to Hugging Face as CSV/Parquet.

Citation

If you use these data in your research, please cite Stoeckl (2026) and Liu et al. (2022).

References

Borri, N., Liu, Y., Tsyvinski, A., & Wu, X. (2026). Cryptocurrency as an investable asset class: Coming of age. https://arxiv.org/abs/2510.14435

Liu, Y., Tsyvinski, A., & Wu, X. (2022). Common risk factors in cryptocurrency. The Journal of Finance, 77(2), 1133–1177. https://doi.org/10.1111/jofi.13119

Stoeckl, S. (2026). Open crypto asset pricing. University of Liechtenstein. https://huggingface.co/datasets/sstoeckl/opencryptoassetpricing