factrix.metrics.quantile

Quantile analysis for cross-sectional panels.

All spread series are time-indexed (date, value) and can be fed into any series/ tool.

Notes

Pipeline. Per-date long-short spread on quantile groups (cross-section step), then a non-overlapping t-test on the spread series.

Input. DataFrame with date, asset_id, factor, forward_return.

Output. Spread series, long/short alpha decomposition.

factrix.metrics.quantile.compute_spread_series

compute_spread_series(data: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_cols: Sequence[str] = ('factor',), return_col: str = 'forward_return', tie_policy: str = 'ordinal') -> dict[str, DataFrame]

Per-date long-short spread series (non-overlapping).

Top bucket = highest factor rank; bottom bucket = lowest. Labels use top_return / bottom_return rather than q1_return / q5_return because the index of the top bucket depends on n_groups: at n_groups=10 the top bucket is Q10, not Q5.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel with `date, asset_id, factor, forward_return`.	required
`forward_periods`	`int`	Number of periods forward.	`5`
`n_groups`	`int`	Number of quantile groups.	`5`
`factor_cols`	`Sequence[str]`	Factor column names to score. All factors run in a single polars query (one `with_columns` + one `group_by("date").agg(...)` + one `collect`) regardless of `n_assets`. The `n_assets == 1` case is just the general path specialised — no fast/slow path divergence.	`('factor',)`
`return_col`	`str`	Forward-return column shared across factors.	`'forward_return'`
`tie_policy`	`str`	See `_assign_quantile_groups`. `"ordinal"` (default) keeps balanced bucket sizes; `"average"` keeps tied assets in the same bucket — prefer for low-cardinality factors.	`'ordinal'`
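
The difference between the two tie policies can be illustrated with a small pure-Python sketch. The helper names (`ordinal_ranks`, `average_ranks`, `to_bucket`) and the rank-to-bucket rule are assumptions made for illustration; they are not taken from `_assign_quantile_groups`.

```python
def ordinal_ranks(values):
    """Rank 0..n-1; ties are broken by original position."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def average_ranks(values):
    """Tied values share the mean of the ordinal ranks they span."""
    ordinal = ordinal_ranks(values)
    by_value = {}
    for v, r in zip(values, ordinal):
        by_value.setdefault(v, []).append(r)
    return [sum(by_value[v]) / len(by_value[v]) for v in values]


def to_bucket(ranks, n_groups):
    """Assumed rule: bucket = floor(rank * n_groups / n), capped at n_groups-1."""
    n = len(ranks)
    return [min(int(r * n_groups / n), n_groups - 1) for r in ranks]


factor = [0, 0, 0, 0, 1, 1]  # low-cardinality factor with heavy ties
ordinal_buckets = to_bucket(ordinal_ranks(factor), n_groups=3)
average_buckets = to_bucket(average_ranks(factor), n_groups=3)
print(ordinal_buckets)  # [0, 0, 1, 1, 2, 2]
print(average_buckets)  # [0, 0, 0, 0, 2, 2]
```

Under these assumptions the ordinal ranking yields three buckets of two assets each but splits the tied zeros across buckets 0 and 1, while the average ranking keeps tied assets together at the cost of an empty middle bucket.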

Returns:

Type	Description
`dict[str, DataFrame]`	Dict keyed by factor column name; each value is a DataFrame with `date, spread, top_return, bottom_return, universe_return`.

Notes

Per non-overlapping date t:

top_return[t]    = mean_{i in Q_top} return[i, t]
bottom_return[t] = mean_{i in Q_bot} return[i, t]
spread[t]        = top_return[t] - bottom_return[t]

factrix uses non-overlap sub-sampling (stride forward_periods) before bucketing, not overlapping panel re-balancing. This keeps the spread series free of MA(h-1) autocorrelation, so downstream non-overlap t-tests are valid without heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors.
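
The per-date computation can be sketched in plain Python. This is a minimal illustration and not the factrix implementation: the function name `spread_series_sketch`, the dict-based panel layout, and the ordinal bucketing rule are all assumptions.

```python
from statistics import fmean


def spread_series_sketch(panel, forward_periods, n_groups):
    """panel: {date: [(factor, forward_return), ...]} with sortable dates."""
    dates = sorted(panel)[::forward_periods]  # non-overlap stride sampling
    rows = []
    for date in dates:
        obs = sorted(panel[date])  # ascending by factor
        n = len(obs)
        # Assumed ordinal bucketing: position in the sorted order -> bucket.
        buckets = [min(pos * n_groups // n, n_groups - 1) for pos in range(n)]
        top = fmean(r for (_, r), b in zip(obs, buckets) if b == n_groups - 1)
        bottom = fmean(r for (_, r), b in zip(obs, buckets) if b == 0)
        universe = fmean(r for _, r in obs)
        rows.append({"date": date, "spread": top - bottom, "top_return": top,
                     "bottom_return": bottom, "universe_return": universe})
    return rows


# Toy panel: 6 dates, 4 assets, return equals 1% of the factor value.
panel = {d: [(f, 0.01 * f) for f in (1, 2, 3, 4)] for d in range(6)}
rows = spread_series_sketch(panel, forward_periods=3, n_groups=2)
print([row["date"] for row in rows])  # [0, 3]
print(rows[0]["spread"])
```

With a stride of 3, only dates 0 and 3 are kept, so consecutive forward-return windows in the resulting series do not share any bars.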

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import compute_spread_series
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> spreads = compute_spread_series(panel, forward_periods=5, n_groups=5)
>>> spread_df = spreads["factor"]
>>> set(spread_df.columns) >= {"date", "spread", "top_return", "bottom_return"}
True

factrix.metrics.quantile.compute_group_returns

compute_group_returns(data: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', tie_policy: str = 'ordinal') -> DataFrame

Mean forward return per quantile bucket (for monotonicity charts).

Formula

Sample dates every forward_periods rows (non-overlapping).
Per sampled date, assign each asset to a quantile group 0..n_groups-1 by factor (see _assign_quantile_groups for tie_policy semantics).
For each group g: mean_return[g] = mean of return_col across all (date, asset) observations where _group=g. This is equal-weighted across all observations in the bucket, not per-date then averaged; use compute_spread_series if you want the latter.

Returns:

Type	Description
`DataFrame`	DataFrame with `group, mean_return` sorted ascending by group. Group 0 = lowest factor rank, n_groups-1 = highest.

Notes

mean_return[g] = mean over (date, asset) where _group=g of return_col — equal-weighted across all observations in the bucket pooled across dates. Use compute_spread_series if you want per-date bucket means averaged afterwards (the IC/IR-style aggregation order); the two differ when bucket cardinality moves across dates.
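
A small arithmetic sketch shows how the two aggregation orders diverge when bucket cardinality changes across dates. The numbers are made up for illustration and are not library output.

```python
from statistics import fmean

# Top-bucket returns on two sampled dates; the bucket holds 1 asset on the
# first date and 3 on the second (illustrative numbers only).
bucket_returns = {
    "d1": [0.10],
    "d2": [0.02, 0.02, 0.02],
}

# Pooled: every (date, asset) observation gets equal weight.
pooled = fmean(r for rs in bucket_returns.values() for r in rs)

# Per-date then averaged: every date gets equal weight.
per_date = fmean(fmean(rs) for rs in bucket_returns.values())

print(round(pooled, 4), round(per_date, 4))  # 0.04 0.06
```

The pooled mean is pulled toward the date with more assets in the bucket; when every date has the same bucket size the two numbers coincide.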

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import compute_group_returns
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> groups = compute_group_returns(panel, forward_periods=5, n_groups=5)
>>> set(groups.columns) >= {"group", "mean_return"}
True

factrix.metrics.quantile.quantile_spread

quantile_spread(data: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_cols: Sequence[str] = ('factor',), tie_policy: str = 'ordinal', inference: NonOverlapping | NeweyWest = NON_OVERLAPPING, *, _precomputed_series: dict[str, DataFrame] | None = None) -> dict[str, MetricResult]

Long-short spread (per-period mean).

Parameters:

Name	Type	Description	Default
`inference`	`NonOverlapping \| NeweyWest`	Headline significance method on the per-date spread. `fx.inference.NON_OVERLAPPING` (default) runs the OLS t-test on the non-overlap stride subsample; `fx.inference.NEWEY_WEST` keeps every date and absorbs the MA(h-1) overlap in a HAC SE. On a small cross-section (`n_assets < 30`) the heavy-tail block bootstrap takes precedence over either (see Notes).	`NON_OVERLAPPING`
`_precomputed_series`	`dict[str, DataFrame] \| None`	If provided, skip recomputing `compute_spread_series`.	`None`
`tie_policy`	`str`	Bucketing tie-break policy, see `_assign_quantile_groups`. When `_precomputed_series` is passed, this only affects the `tie_ratio` diagnostic — the series itself was already built.	`'ordinal'`

Returns:

Type	Description
`dict[str, MetricResult]`	Dict keyed by factor column name; each MetricResult holds the per-period mean spread and the t-stat from the chosen `inference`.

Notes

t = mean(spread) / (std(spread) / sqrt(n)) on the non-overlap spread series. H0: E[spread] = 0. The Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) route is the sibling that keeps the full overlapping series instead of striding — select it via inference=fx.inference.NEWEY_WEST. Because HAC corrects autocorrelation rather than heavy tails, the small-cross-section block bootstrap still wins when it fires and the requested HAC is flagged inference_overridden in metadata.
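
The two inference routes can be compared with a short sketch. The plain t-statistic follows the formula above; the HAC variant uses the textbook Newey-West estimator with Bartlett weights. The function names and the lag argument are assumptions for illustration and do not describe factrix internals (in particular, how factrix chooses the lag length is not stated here).

```python
import math
from statistics import fmean, stdev


def t_stat(spread):
    """t = mean / (std / sqrt(n)), sample std with ddof=1."""
    n = len(spread)
    return fmean(spread) / (stdev(spread) / math.sqrt(n))


def newey_west_t(spread, lags):
    """t-stat of the mean using a Bartlett-kernel HAC variance."""
    n = len(spread)
    mean = fmean(spread)
    dev = [x - mean for x in spread]

    def gamma(lag):
        # Sample autocovariance at the given lag (1/n normalisation).
        return sum(dev[i] * dev[i - lag] for i in range(lag, n)) / n

    long_run_var = gamma(0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1)  # Bartlett weight
        long_run_var += 2.0 * weight * gamma(lag)
    return mean / math.sqrt(long_run_var / n)


spread = [0.01, 0.03, 0.02, 0.04, 0.00, 0.02]
print(t_stat(spread))
print(newey_west_t(spread, lags=0))
print(newey_west_t(spread, lags=2))
```

With `lags=0` the HAC statistic collapses to the plain t-statistic up to the ddof convention (a factor of sqrt(n / (n - 1))); positive lags add weighted autocovariances to the variance, which is how the MA(h-1) overlap is absorbed when every date is kept.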

Long/short alpha decomposition stays a descriptive OLS t-test on top_return - universe_return and universe_return - bottom_return regardless of inference — it attributes the spread to long-side vs short-side excess, it is not the headline H0.
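
The decomposition is additive per date: (top - universe) + (universe - bottom) = top - bottom. A minimal sketch with made-up rows (not library output) makes the identity concrete; the variable names `long_alpha` and `short_alpha` are chosen here for illustration.

```python
from statistics import fmean

# Illustrative per-date rows (made-up numbers).
rows = [
    {"top_return": 0.006, "bottom_return": 0.002, "universe_return": 0.004},
    {"top_return": 0.005, "bottom_return": -0.001, "universe_return": 0.003},
    {"top_return": 0.001, "bottom_return": 0.000, "universe_return": 0.001},
]

long_excess = [r["top_return"] - r["universe_return"] for r in rows]
short_excess = [r["universe_return"] - r["bottom_return"] for r in rows]
spread = [r["top_return"] - r["bottom_return"] for r in rows]

long_alpha = fmean(long_excess)    # long-side excess over the universe
short_alpha = fmean(short_excess)  # short-side excess under the universe

# The two legs add up to the mean spread.
print(long_alpha + short_alpha, fmean(spread))
```

In this toy data the short leg contributes more than the long leg, which is the kind of attribution the decomposition is meant to surface.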

References

Hansen-Hodrick (1980): overlapping-return autocorrelation, motivating the non-overlap stride.

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import quantile_spread
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> result = quantile_spread(panel, forward_periods=5, n_groups=5)
>>> result["factor"].name == ""
True

factrix.metrics.quantile.quantile_spread_vw

quantile_spread_vw(data: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', weight_col: str = 'market_cap', tie_policy: str = 'ordinal', lag_weights: bool = True) -> MetricResult

Value-weighted long-short spread — alpha concentration diagnostic.

Formula (per non-overlapping date \(t\)):

\[ \begin{aligned} \text{vw}_b[t] &= \frac{\sum_{i \in b} w_{i,t-1} \cdot \text{return}_{i, t \to t+h}}{\sum_{i \in b} w_{i,t-1}}, \quad b \in \{\text{bottom}, \text{top}\} \\ \text{spread}[t] &= \text{vw}_{\text{top}}[t] - \text{vw}_{\text{bottom}}[t] \\ \text{value} &= \mathrm{mean}_t\, \text{spread}[t], \quad t = \sqrt{n} \cdot \text{value} / \mathrm{std}(\text{spread}), \quad \text{DDOF}=1 \end{aligned} \]

Weights are lagged by one sampled period per asset by default (lag_weights=True): a portfolio rebalanced at date t uses the market-cap observed at the previous rebalance, not at t. Pairing contemporaneous market_cap[t] with forward_return[t→t+h] is a classic look-ahead trap — market cap measured on date t embeds news that the t→t+h return has not yet realized.

Pass lag_weights=False only when the caller has already supplied a lagged weight column (e.g., prior-month-end cap) and wants the function to treat it as observed at t.

Compare with equal-weighted quantile_spread: if the VW spread is much smaller (e.g., < 1/3 of EW), the alpha is driven by small-cap assets and may not survive capacity / liquidity constraints.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel with `date, asset_id, factor, forward_return, market_cap` (or whatever `weight_col` names).	required
`weight_col`	`str`	Column for value weighting (default `market_cap`).	`'market_cap'`
`lag_weights`	`bool`	When True (default), shift `weight_col` by 1 period per asset (on the non-overlap-sampled frame) before weighting. When False, use weights as supplied.	`True`

Returns:

Type	Description
`MetricResult`	MetricResult with per-period mean VW spread, t-stat, and p-value. Short-circuits if `weight_col` is missing or post-sampling n < `MIN_PORTFOLIO_PERIODS_HARD`.

Notes

Per non-overlapping date t, per bucket b in {bot, top}:

vw_b[t] = sum_{i in b} w[i, t-1] * return[i, t -> t+h]
          / sum_{i in b} w[i, t-1]
spread[t] = vw_top[t] - vw_bot[t]
value = mean_t spread[t];  t = sqrt(n) * value / std(spread)

factrix lags weights by one sampled period per asset by default (not one raw bar) so the lag aligns with the rebalance stride; contemporaneous weight × forward_return would embed look-ahead bias from market-cap moves that the forward return has not yet realized.
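
The effect of lagging can be seen in a small sketch for a single bucket. The data layout and the helper `vw_mean` are assumptions for illustration, and the numbers are made up; in this sketch the first sampled date has no lagged weight and is simply skipped.

```python
def vw_mean(returns, weights):
    """Weighted mean of returns with the given weights."""
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)


# Sampled (non-overlap) rebalance dates for one bucket with two assets.
# caps[k] is the market cap observed at sampled date k; rets[k] is the
# forward return starting at sampled date k.
caps = [
    {"A": 100.0, "B": 300.0},  # sampled date 0
    {"A": 200.0, "B": 200.0},  # sampled date 1 (A doubled, B shrank)
]
rets = [
    {"A": 0.02, "B": 0.01},
    {"A": 0.04, "B": 0.00},
]

k = 1  # first sampled date with a lagged weight available
assets = ["A", "B"]
lagged = vw_mean([rets[k][a] for a in assets],
                 [caps[k - 1][a] for a in assets])
contemporaneous = vw_mean([rets[k][a] for a in assets],
                          [caps[k][a] for a in assets])
print(lagged, contemporaneous)
```

Here the contemporaneous weighting gives asset A twice the weight it had at the previous rebalance, so the bucket return doubles relative to the lagged version. The gap between the two numbers is what the lag is designed to remove.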

References

Hou-Xue-Zhang (2020): ~65% of anomalies fail \(|t| \geq 1.96\) once microcaps are mitigated via NYSE breakpoints and value weighting jointly.

Examples:

>>> import polars as pl
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import quantile_spread_vw
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... ).with_columns(pl.lit(1e6).alias("market_cap"))
>>> result = quantile_spread_vw(panel, forward_periods=5, n_groups=5)
>>> result.name == ""
True

Use cases

Compute per-date long-short spread series

Build the per-date spread = top_return - bottom_return series (with top_return, bottom_return, universe_return) on a non-overlap-sampled panel. Pre-step for quantile_spread; also feeds spanning_alpha and any series/ tool.

Mean-spread significance, equal-weighted

Test \(H_0: \mathbb{E}[\text{spread}] = 0\) on the non-overlap spread series, with the long-vs-short alpha decomposition (top - universe, universe - bottom) attached so callers can attribute the spread to long-side vs short-side excess.

Value-weighted spread for capacity diagnostics

quantile_spread_vw weights each bucket by lagged market_cap (or any caller-supplied weight_col=). When the VW spread is much smaller than the EW spread, the alpha is concentrated in small names and may not survive capacity / liquidity constraints. Hou-Xue-Zhang (2020) found that ~65% of anomalies fail \(|t| \geq 1.96\) once microcaps are mitigated via NYSE breakpoints and value weighting jointly.

Per-bucket mean returns for monotonicity charts

compute_group_returns returns the pooled mean forward return per quantile bucket: the chart input that shows whether returns rise monotonically across buckets, before any formal monotonicity test.

Choosing a function

Goal	Function
Per-date long-short spread table for downstream inspection / slicing	`compute_spread_series`
Per-bucket pooled mean return (decile-curve chart input)	`compute_group_returns`
Mean-spread significance, equal-weighted, non-overlap \(t\) (default)	`quantile_spread`
Mean-spread significance, value-weighted (capacity / size-concentration check)	`quantile_spread_vw`

Worked example: per-date spread then EW vs VW significance

compute_spread_series → quantile_spread → quantile_spread_vw on a synthetic cross-sectional panel

import factrix as fx
from factrix.metrics.quantile import (
    compute_spread_series, quantile_spread, quantile_spread_vw,
)
from factrix.preprocess import compute_forward_return

raw   = fx.datasets.make_cs_panel(
    n_assets=200, n_dates=500, ic_target=0.08,
    with_market_cap=True, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)

# compute_spread_series returns dict[str, DataFrame] keyed by factor column
spread_series = compute_spread_series(panel, forward_periods=5, n_groups=5)
spread_df = spread_series["factor"]
print(spread_df.head())
# ┌────────────┬──────────┬───────────────┬─────────────────┬──────────────────┐
# │ date       ┆ spread   ┆ top_return    ┆ bottom_return   ┆ universe_return  │
# ├────────────┼──────────┼───────────────┼─────────────────┼──────────────────┤
# │ 2024-01-01 ┆  0.0042  ┆  0.0061       ┆  0.0019         ┆  0.0040          │
# │  ...       ┆  ...     ┆  ...          ┆  ...            ┆  ...             │
# └────────────┴──────────┴───────────────┴─────────────────┴──────────────────┘

# quantile_spread returns dict[str, MetricResult] keyed by factor column
ew = quantile_spread(panel, forward_periods=5, n_groups=5,
                     _precomputed_series=spread_series)["factor"]
print(ew.value, ew.stat, ew.metadata["long_alpha"], ew.metadata["short_alpha"])
# 0.0041  4.92  0.0019  0.0022   (approximate)

vw = quantile_spread_vw(panel, forward_periods=5, n_groups=5,
                        weight_col="market_cap")
print(vw.value, vw.stat)
# 0.0017  2.10   (approximate — VW < EW signals small-cap concentration)
