
factrix.metrics.quantile

Quantile analysis for cross-sectional panels.

All spread series are time-indexed (date, value) and can be fed into any series/ tool.

Notes

Pipeline. Per-date long-short spread on quantile groups (cross-section step), then a non-overlapping t-test on the spread series.

Input. DataFrame with date, asset_id, factor, forward_return.

Output. Spread series, long/short alpha decomposition.

factrix.metrics.quantile.compute_spread_series

compute_spread_series(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', tie_policy: str = 'ordinal') -> DataFrame

Per-date long-short spread series (non-overlapping).

Top bucket = highest factor rank; bottom bucket = lowest. Labels use top_return / bottom_return rather than q1_return / q5_return because the index of the extreme bucket depends on n_groups: at n_groups=10 the top bucket is Q10, not Q5.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Panel with date, asset_id, factor, forward_return. | required |
| n_groups | int | Number of quantile groups. | 5 |
| tie_policy | str | See _assign_quantile_groups. "ordinal" (default) keeps balanced bucket sizes; "average" keeps tied assets in the same bucket (preferred for low-cardinality factors). | 'ordinal' |
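The difference between the two tie policies can be illustrated with a small stdlib-only sketch. This is not factrix's _assign_quantile_groups; the function name assign_groups and the rank-to-bucket mapping are assumptions chosen only to show why "ordinal" yields balanced buckets while "average" keeps tied assets together.

```python
from collections import defaultdict


def assign_groups(values, n_groups=5, tie_policy="ordinal"):
    """Hypothetical helper: map factor values to buckets 0..n_groups-1."""
    n = len(values)
    # Stable sort: tied values keep their input order, giving ordinal ranks 1..n.
    order = sorted(range(n), key=lambda i: values[i])
    ordinal = [0.0] * n
    for pos, i in enumerate(order, start=1):
        ordinal[i] = float(pos)

    if tie_policy == "ordinal":
        ranks = ordinal
    elif tie_policy == "average":
        # Tied values all receive the mean of their ordinal ranks.
        sums, counts = defaultdict(float), defaultdict(int)
        for i, v in enumerate(values):
            sums[v] += ordinal[i]
            counts[v] += 1
        ranks = [sums[v] / counts[v] for v in values]
    else:
        raise ValueError(f"unknown tie_policy: {tie_policy!r}")

    # Assumed mapping from rank to bucket (illustrative only).
    return [min(int((r - 1) * n_groups / n), n_groups - 1) for r in ranks]


# A low-cardinality factor: only two distinct values across ten assets.
factor = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
ordinal_groups = assign_groups(factor, tie_policy="ordinal")
average_groups = assign_groups(factor, tie_policy="average")
print(ordinal_groups)
print(average_groups)
```

Under "ordinal" the tied assets are spread across all five buckets purely by input order, so the top and bottom buckets carry no real factor information here; under "average" each tie block stays in one bucket at the cost of leaving some buckets empty.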

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with date, spread, top_return, bottom_return, universe_return. |

Notes

Per non-overlapping date t:

top_return[t]    = mean_{i in Q_top} return[i, t]
bottom_return[t] = mean_{i in Q_bot} return[i, t]
spread[t]        = top_return[t] - bottom_return[t]

factrix sub-samples to non-overlapping dates (stride forward_periods) before bucketing rather than re-balancing on an overlapping panel. This keeps the spread series free of the MA(h-1) autocorrelation that overlapping h-period returns induce (h = forward_periods), so downstream non-overlap t-tests are valid without heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors.
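As a rough illustration of the formulas above, the following stdlib-only sketch sub-samples dates with a stride and computes the per-date top, bottom, and universe means. The function spread_series_sketch, the tuple-based input, and the simple equal-size bucketing are assumptions for illustration; they are not the library's implementation and ignore tie_policy.

```python
from collections import defaultdict
from statistics import mean


def spread_series_sketch(rows, forward_periods=5, n_groups=5):
    """rows: iterable of (date, asset_id, factor, forward_return) tuples."""
    by_date = defaultdict(list)
    for date, _asset, factor, ret in rows:
        by_date[date].append((factor, ret))

    # Non-overlap sub-sampling: keep every forward_periods-th date.
    sampled_dates = sorted(by_date)[::forward_periods]

    out = []
    for date in sampled_dates:
        ranked = sorted(by_date[date], key=lambda fr: fr[0])  # ascending factor
        size = len(ranked) // n_groups  # assumed: equal-size extreme buckets
        if size == 0:
            continue  # too few assets to form buckets on this date
        bottom = mean(r for _, r in ranked[:size])
        top = mean(r for _, r in ranked[-size:])
        out.append({
            "date": date,
            "spread": top - bottom,
            "top_return": top,
            "bottom_return": bottom,
            "universe_return": mean(r for _, r in ranked),
        })
    return out


# Toy panel: 10 dates x 10 assets, return rises linearly with the factor.
rows = [(d, a, float(a), 0.01 * a) for d in range(10) for a in range(10)]
series = spread_series_sketch(rows, forward_periods=5, n_groups=5)
for row in series:
    print(row["date"], round(row["spread"], 4))
```

With a stride of 5 over 10 dates only two rebalance dates survive, which is the price paid for a spread series whose observations do not share return windows.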

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import compute_spread_series
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> spreads = compute_spread_series(panel, forward_periods=5, n_groups=5)
>>> set(spreads.columns) >= {"date", "spread", "top_return", "bottom_return"}
True

factrix.metrics.quantile.compute_group_returns

compute_group_returns(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', tie_policy: str = 'ordinal') -> DataFrame

Mean forward return per quantile bucket (for monotonicity charts).

Formula
  1. Sample every forward_periods-th date (non-overlapping).
  2. Per sampled date, assign each asset to a quantile group 0..n_groups-1 by factor (see _assign_quantile_groups for tie_policy semantics).
  3. For each group g: mean_return[g] = mean of return_col over all (date, asset) pairs where _group=g. This is equal-weighted across all observations in the bucket, not per-date means averaged afterwards; use compute_spread_series if you want the latter.

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with group, mean_return, sorted ascending by group. Group 0 = lowest factor rank, n_groups-1 = highest. |

Notes

mean_return[g] = mean of return_col over all (date, asset) pairs where _group=g, i.e. equal-weighted across all observations in the bucket, pooled across dates. Use compute_spread_series if you want per-date bucket means averaged afterwards (the aggregation order used for information-coefficient (IC) and IR statistics); the two differ when bucket cardinality varies across dates.
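A small numeric sketch makes the difference between the two aggregation orders concrete. The data below are made up for illustration: one bucket holds a single asset on the first date and three assets on the second.

```python
from statistics import mean

# Returns of the assets that fall in one bucket, keyed by sampled date.
bucket_returns = {
    "d1": [0.10],              # one asset in the bucket on this date
    "d2": [0.02, 0.02, 0.02],  # three assets on this date
}

# Pooled: every (date, asset) observation gets equal weight.
pooled = mean(r for rets in bucket_returns.values() for r in rets)

# Per-date then averaged: every date gets equal weight.
per_date = mean(mean(rets) for rets in bucket_returns.values())

print(round(pooled, 4), round(per_date, 4))
```

The pooled mean is pulled toward the date with more observations, while the per-date average treats both dates equally; with constant bucket sizes the two coincide.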

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import compute_group_returns
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> groups = compute_group_returns(panel, forward_periods=5, n_groups=5)
>>> set(groups.columns) >= {"group", "mean_return"}
True

factrix.metrics.quantile.quantile_spread

quantile_spread(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, _precomputed_series: DataFrame | None = None, tie_policy: str = 'ordinal') -> MetricOutput

Long-short spread (per-period mean).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| _precomputed_series | DataFrame \| None | If provided, skip recomputing compute_spread_series. | None |
| tie_policy | str | Bucketing tie-break policy, see _assign_quantile_groups. When _precomputed_series is passed, this only affects the tie_ratio diagnostic, since the series itself was already built. | 'ordinal' |

Returns:

| Type | Description |
| --- | --- |
| MetricOutput | MetricOutput with per-period mean spread, t-stat from non-overlapping periods. |

Notes

t = mean(spread) / (std(spread) / sqrt(n)) on the non-overlap spread series. H0: E[spread] = 0. Long/short alpha decomposition runs the same t-test on top_return - universe_return and universe_return - bottom_return so callers can attribute the spread to long-side vs short-side excess.
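The test statistic and the long/short decomposition can be sketched in a few lines of stdlib Python. The toy series are invented, and the use of the sample standard deviation (ddof=1) is an assumption here; this is not the library's code.

```python
from math import sqrt
from statistics import mean, stdev


def t_stat(series):
    """One-sample t against H0: mean = 0, using the sample std (ddof=1)."""
    n = len(series)
    return mean(series) / (stdev(series) / sqrt(n))


# Toy non-overlap series (one value per sampled date).
top = [0.03, 0.01, 0.02, 0.04]
bottom = [0.00, 0.00, -0.01, 0.01]
universe = [0.01, 0.005, 0.01, 0.02]

spread = [t - b for t, b in zip(top, bottom)]
long_leg = [t - u for t, u in zip(top, universe)]      # top - universe
short_leg = [u - b for u, b in zip(universe, bottom)]  # universe - bottom

spread_t = t_stat(spread)
print("spread t:", round(spread_t, 3))
print("long  t:", round(t_stat(long_leg), 3))
print("short t:", round(t_stat(short_leg), 3))
```

Because (top - universe) + (universe - bottom) = top - bottom on every date, the two leg means always sum to the mean spread, which is what lets the decomposition attribute the spread to each side.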

factrix performs the t-test on the non-overlap series rather than applying Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors to an overlapping series. The two approaches are sibling routes; the overlap variants live alongside ic_newey_west.
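To see why an overlapping series would need a correction, the simulation below builds h-period returns from i.i.d. one-period returns and compares the lag-1 autocorrelation of the overlapping and stride-sampled series. It is a self-contained sketch with simulated data and does not use any factrix function.

```python
import random
from statistics import mean


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


random.seed(0)
h = 5  # plays the role of forward_periods
one_period = [random.gauss(0.0, 0.01) for _ in range(5000)]

# Overlapping h-period returns: consecutive windows share h-1 observations.
overlapping = [sum(one_period[i:i + h]) for i in range(len(one_period) - h + 1)]

# Non-overlap stride: keep every h-th window, so windows share nothing.
non_overlapping = overlapping[::h]

rho_overlap = lag1_autocorr(overlapping)
rho_stride = lag1_autocorr(non_overlapping)
print(round(rho_overlap, 3), round(rho_stride, 3))
```

For i.i.d. inputs the overlapping series has a theoretical lag-1 autocorrelation of (h-1)/h, while the stride-sampled series is close to uncorrelated, so a plain t-test applies to the latter.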

References

Hansen-Hodrick (1980): overlapping-return autocorrelation, which motivates the non-overlap stride.

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import quantile_spread
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> result = quantile_spread(panel, forward_periods=5, n_groups=5)
>>> result.name
'quantile_spread'

factrix.metrics.quantile.quantile_spread_vw

quantile_spread_vw(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', weight_col: str = 'market_cap', tie_policy: str = 'ordinal', lag_weights: bool = True) -> MetricOutput

Value-weighted long-short spread — alpha concentration diagnostic.

Formula (per non-overlapping date \(t\)):

\[ \begin{aligned} \text{vw}_b[t] &= \frac{\sum_{i \in b} w_{i,t-1} \cdot \text{return}_{i, t \to t+h}}{\sum_{i \in b} w_{i,t-1}}, \quad b \in \{\text{bottom}, \text{top}\} \\ \text{spread}[t] &= \text{vw}_{\text{top}}[t] - \text{vw}_{\text{bottom}}[t] \\ \text{value} &= \mathrm{mean}_t\, \text{spread}[t], \quad t = \sqrt{n} \cdot \text{value} / \mathrm{std}(\text{spread}), \quad \text{DDOF}=1 \end{aligned} \]

Weights are lagged by one sampled period per asset by default (lag_weights=True): a portfolio rebalanced at date t uses the market-cap observed at the previous rebalance, not at t. Pairing contemporaneous market_cap[t] with forward_return[t→t+h] is a classic look-ahead trap — market cap measured on date t embeds news that the t→t+h return has not yet realized.

Pass lag_weights=False only when the caller has already supplied a lagged weight column (e.g., prior-month-end cap) and wants the function to treat it as observed at t.
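The effect of the lag can be sketched with two assets observed on three sampled (rebalance) dates. The helper vw_returns and the dictionary layout are hypothetical, and dropping the first sampled date when no lagged weight exists is an assumption of this sketch rather than documented behaviour.

```python
def vw_returns(caps, rets, lag_weights=True):
    """Value-weighted return per sampled date for one bucket.

    caps / rets: dict asset -> list of values, one per sampled date.
    """
    n_periods = len(next(iter(caps.values())))
    start = 1 if lag_weights else 0  # assumed: no lagged weight on first date
    out = []
    for t in range(start, n_periods):
        w_idx = t - 1 if lag_weights else t  # previous rebalance vs same date
        num = sum(caps[a][w_idx] * rets[a][t] for a in caps)
        den = sum(caps[a][w_idx] for a in caps)
        out.append(num / den)
    return out


# Asset A's cap doubles at each rebalance; asset B's cap is flat.
caps = {"A": [100.0, 200.0, 400.0], "B": [100.0, 100.0, 100.0]}
rets = {"A": [0.02, 0.10, 0.03], "B": [0.00, 0.00, 0.00]}

lagged = vw_returns(caps, rets, lag_weights=True)
unlagged = vw_returns(caps, rets, lag_weights=False)
print([round(x, 4) for x in lagged])
print([round(x, 4) for x in unlagged])
```

In this toy case the unlagged weights tilt toward the asset whose cap has just grown, so the two weighted returns differ on every shared date; the lagged version only uses caps known at the previous rebalance.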

Compare with the equal-weighted quantile_spread: if the VW spread is much smaller (e.g., < 1/3 of EW), the alpha is driven by small-cap assets and may not survive capacity / liquidity constraints.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Panel with date, asset_id, factor, forward_return, market_cap (or whatever weight_col names). | required |
| weight_col | str | Column for value weighting (default market_cap). | 'market_cap' |
| lag_weights | bool | When True (default), shift weight_col by 1 period per asset (on the non-overlap-sampled frame) before weighting. When False, use weights as supplied. | True |

Returns:

| Type | Description |
| --- | --- |
| MetricOutput | MetricOutput with per-period mean VW spread, t-stat, and p-value. Short-circuits if weight_col is missing or post-sampling n < MIN_PORTFOLIO_PERIODS_HARD. |

Notes

Per non-overlapping date t, per bucket b in {bot, top}:

vw_b[t] = sum_{i in b} w[i, t-1] * return[i, t -> t+h]
          / sum_{i in b} w[i, t-1]
spread[t] = vw_top[t] - vw_bot[t]
value = mean_t spread[t];  t = sqrt(n) * value / std(spread)

factrix lags weights by one sampled period per asset by default (not one raw bar) so the lag aligns with the rebalance stride; contemporaneous weight × forward_return would embed look-ahead bias from market-cap moves that the forward return has not yet realized.

References

Hou-Xue-Zhang (2020): ~65% of anomalies fail \(|t| \geq 1.96\) once microcaps are mitigated via NYSE breakpoints and value weighting jointly.

Examples:

>>> import polars as pl
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import quantile_spread_vw
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... ).with_columns(pl.lit(1e6).alias("market_cap"))
>>> result = quantile_spread_vw(panel, forward_periods=5, n_groups=5)
>>> result.name
'quantile_spread_vw'

Use cases

  • Compute per-date long-short spread series


    Build the per-date spread = top_return - bottom_return series (with top_return, bottom_return, universe_return) on a non-overlap-sampled panel. Pre-step for quantile_spread; also feeds spanning_alpha and any series/ tool.

  • Mean-spread significance, equal-weighted


    Test \(H_0: \mathbb{E}[\text{spread}] = 0\) on the non-overlap spread series, with the long-vs-short alpha decomposition (top - universe, universe - bottom) attached so callers can attribute the spread to long-side vs short-side excess.

  • Value-weighted spread for capacity diagnostics


    quantile_spread_vw weights each bucket by lagged market_cap (or any caller-supplied weight_col=). When the VW spread is much smaller than the EW spread, the alpha is concentrated in small names and may not survive capacity / liquidity constraints. Hou-Xue-Zhang (2020) report that ~65% of anomalies fail \(|t| \geq 1.96\) once microcaps are mitigated via NYSE breakpoints and value weighting.

  • Per-bucket mean returns for monotonicity charts


    compute_group_returns returns the pooled mean forward return per quantile bucket: the chart input that shows whether returns rise monotonically across buckets, before any formal monotonicity test.

Choosing a function

| Goal | Function |
| --- | --- |
| Per-date long-short spread table for downstream inspection / slicing | compute_spread_series |
| Per-bucket pooled mean return (decile-curve chart input) | compute_group_returns |
| Mean-spread significance, equal-weighted, non-overlap \(t\) (default) | quantile_spread |
| Mean-spread significance, value-weighted (capacity / size-concentration check) | quantile_spread_vw |

Worked example — per-date spread then EW vs VW significance

compute_spread_series → quantile_spread → quantile_spread_vw on a synthetic cross-sectional panel

import factrix as fx
from factrix.metrics.quantile import (
    compute_spread_series, quantile_spread, quantile_spread_vw,
)
from factrix.preprocess import compute_forward_return

raw   = fx.datasets.make_cs_panel(
    n_assets=200, n_dates=500, ic_target=0.08,
    with_market_cap=True, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)

spread_df = compute_spread_series(panel, forward_periods=5, n_groups=5)
print(spread_df.head())
# ┌────────────┬──────────┬───────────────┬─────────────────┬──────────────────┐
# │ date       ┆ spread   ┆ top_return    ┆ bottom_return   ┆ universe_return  │
# ├────────────┼──────────┼───────────────┼─────────────────┼──────────────────┤
# │ 2024-01-01 ┆  0.0042  ┆  0.0061       ┆  0.0019         ┆  0.0040          │
# │  ...       ┆  ...     ┆  ...          ┆  ...            ┆  ...             │
# └────────────┴──────────┴───────────────┴─────────────────┴──────────────────┘

ew = quantile_spread(panel, forward_periods=5, n_groups=5,
                     _precomputed_series=spread_df)
print(ew.value, ew.stat, ew.metadata["long_alpha"], ew.metadata["short_alpha"])
# 0.0041  4.92  0.0019  0.0022   (approximate)

vw = quantile_spread_vw(panel, forward_periods=5, n_groups=5,
                        weight_col="market_cap")
print(vw.value, vw.stat)
# 0.0017  2.10   (approximate — VW < EW signals small-cap concentration)

See also