factrix.metrics.quantile ¶
Quantile analysis for cross-sectional panels.
All spread series are time-indexed (date, value) and can be fed
into any series/ tool.
Notes
Pipeline. Per-date long-short spread on quantile groups (cross-section step), then a non-overlapping t-test on the spread series.
Input. DataFrame with date, asset_id, factor, forward_return.
Output. Spread series, long/short alpha decomposition.
factrix.metrics.quantile.compute_spread_series ¶
compute_spread_series(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', tie_policy: str = 'ordinal') -> DataFrame
Per-date long-short spread series (non-overlapping).
Top bucket = highest factor rank; bottom bucket = lowest. Labels use
top_return / bottom_return rather than q1_return /
q5_return because the index of the extreme bucket depends on n_groups: at
n_groups=10 the bottom is Q10, not Q5.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Panel with | required |
| n_groups | int | Number of quantile groups. | 5 |
| tie_policy | str | See | 'ordinal' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with |
Notes
Per non-overlapping date t:
top_return[t] = mean_{i in Q_top} return[i, t]
bottom_return[t] = mean_{i in Q_bot} return[i, t]
spread[t] = top_return[t] - bottom_return[t]
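The three formulas above can be sketched for a single date in plain Python. This is a minimal illustration, not factrix's implementation: the helper name `spread_for_date`, the near-equal bucketing by sorted rank, and the toy rows are all assumptions made for the example.

```python
from statistics import mean


def spread_for_date(rows, n_groups=5):
    """Hypothetical helper, not part of factrix.

    rows: list of (factor, forward_return) pairs for one date.
    Returns (top_return, bottom_return, spread).
    """
    ordered = sorted(rows, key=lambda r: r[0])  # ascending factor
    n = len(ordered)
    # Assumed bucketing rule: rank k goes to group floor(k * n_groups / n).
    groups = [k * n_groups // n for k in range(n)]
    bottom = [r[1] for r, g in zip(ordered, groups) if g == 0]
    top = [r[1] for r, g in zip(ordered, groups) if g == n_groups - 1]
    top_return, bottom_return = mean(top), mean(bottom)
    return top_return, bottom_return, top_return - bottom_return


# Ten assets on one date; the toy return rises linearly with the factor.
rows = [(f, 0.001 * f) for f in range(10)]
top_return, bottom_return, spread = spread_for_date(rows, n_groups=5)
```

With ten assets and five groups, each bucket holds two assets, so the top bucket averages the two highest-factor returns and the bottom bucket the two lowest.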
factrix uses non-overlap sub-sampling (stride forward_periods)
before bucketing, not overlapping panel re-balancing. With horizon
h = forward_periods, this keeps the spread series free of MA(h-1)
autocorrelation, so downstream non-overlap t-tests are valid without a
heteroskedasticity-and-autocorrelation-consistent (HAC) correction.
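The autocorrelation argument can be checked with a small simulation. The sketch below uses i.i.d. Gaussian one-bar returns as a stand-in for the data (an assumption for illustration only) and compares the lag-1 autocorrelation of h-bar returns sampled at every bar with the same returns sampled every h bars. The helper `lag1_autocorr` is hypothetical, not a factrix function.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


random.seed(0)
h = 5
one_bar = [random.gauss(0.0, 1.0) for _ in range(20_000)]

# h-bar return starting at every bar: consecutive windows share h-1 bars.
overlapping = [sum(one_bar[i:i + h]) for i in range(len(one_bar) - h + 1)]
# The same windows taken every h bars: no shared bars between neighbours.
non_overlapping = overlapping[::h]

rho_overlap = lag1_autocorr(overlapping)     # theory: (h - 1) / h = 0.8
rho_stride = lag1_autocorr(non_overlapping)  # theory: 0
```

For i.i.d. inputs the overlapping series has theoretical lag-k autocorrelation (h - k) / h for k < h, which is the MA(h-1) structure that the stride removes.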
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import compute_spread_series
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> spreads = compute_spread_series(panel, forward_periods=5, n_groups=5)
>>> set(spreads.columns) >= {"date", "spread", "top_return", "bottom_return"}
True
factrix.metrics.quantile.compute_group_returns ¶
compute_group_returns(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', tie_policy: str = 'ordinal') -> DataFrame
Mean forward return per quantile bucket (for monotonicity charts).
Formula
- Sample dates every forward_periods rows (non-overlapping).
- Per sampled date, assign each asset to a quantile group 0..n_groups-1 by factor (see _assign_quantile_groups for tie_policy semantics).
- For each group g: mean_return[g] = mean across (date, asset) where _group=g of return_col. (Equal-weighted across all obs in the bucket, not per-date then averaged; use compute_spread_series if you want the latter.)
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with |
| DataFrame | Group 0 = lowest factor rank, n_groups-1 = highest. |
Notes
mean_return[g] = mean over (date, asset) where _group=g of
return_col, equal-weighted across all observations in the
bucket pooled across dates. Use compute_spread_series if you
want per-date bucket means averaged afterwards (the aggregation order
used for the information coefficient (IC) and information ratio (IR));
the two differ when bucket cardinality moves across dates.
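A two-date toy case shows how the two aggregation orders diverge when a bucket's size changes. The numbers and dates below are made up for illustration and do not come from factrix.

```python
from statistics import mean

# Returns of one bucket on two sampled dates: four assets on the
# first date, but only one asset on the second.
bucket_returns = {
    "2024-01-01": [0.01, 0.01, 0.01, 0.01],
    "2024-01-08": [0.06],
}

# Pooled: every (date, asset) observation carries equal weight.
pooled = mean([r for rs in bucket_returns.values() for r in rs])

# Per-date then averaged: every date carries equal weight.
per_date = mean([mean(rs) for rs in bucket_returns.values()])
```

Here the pooled mean is 0.02 while the per-date average is 0.035, because the single-asset date counts for one fifth of the pooled mean but one half of the per-date average.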
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import compute_group_returns
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> groups = compute_group_returns(panel, forward_periods=5, n_groups=5)
>>> set(groups.columns) >= {"group", "mean_return"}
True
factrix.metrics.quantile.quantile_spread ¶
quantile_spread(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, _precomputed_series: DataFrame | None = None, tie_policy: str = 'ordinal') -> MetricOutput
Long-short spread (per-period mean).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| _precomputed_series | DataFrame \| None | If provided, skip recomputing | None |
| tie_policy | str | Bucketing tie-break policy, see | 'ordinal' |
Returns:
| Type | Description |
|---|---|
| MetricOutput | MetricOutput with per-period mean spread, t-stat from non-overlapping periods. |
Notes
t = mean(spread) / (std(spread) / sqrt(n)) on the non-overlap
spread series. H0: E[spread] = 0. Long/short alpha decomposition
runs the same t-test on top_return - universe_return and
universe_return - bottom_return so callers can attribute the
spread to long-side vs short-side excess.
factrix performs the t-test on the non-overlap series rather than
applying Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors to an overlapping series; the two approaches are
sibling routes, and the overlap variants live alongside ic_newey_west.
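The t-statistic and the long/short decomposition described in these notes can be sketched on a toy non-overlap series. The numbers are illustrative, the helper `t_stat` is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption, since the DDOF convention factrix applies is documented elsewhere.

```python
import math
from statistics import mean, stdev


def t_stat(series):
    """One-sample t for H0: mean = 0 (sample std with ddof=1, an assumption)."""
    n = len(series)
    return mean(series) / (stdev(series) / math.sqrt(n))


# Toy per-date bucket and universe returns on non-overlapping dates.
top = [0.012, 0.008, 0.015, 0.004, 0.011, 0.009]
bottom = [0.003, 0.005, 0.001, 0.006, 0.002, 0.004]
universe = [0.007, 0.006, 0.009, 0.005, 0.006, 0.007]

spread = [a - b for a, b in zip(top, bottom)]
long_excess = [a - u for a, u in zip(top, universe)]        # top - universe
short_excess = [u - b for u, b in zip(universe, bottom)]    # universe - bottom

t_spread = t_stat(spread)
t_long = t_stat(long_excess)
t_short = t_stat(short_excess)

# The two legs add back up to the spread on every date.
assert all(
    abs(s - (lo + sh)) < 1e-12
    for s, lo, sh in zip(spread, long_excess, short_excess)
)
```

Because the two legs sum to the spread date by date, their means also sum to the mean spread, which is what lets a caller attribute the spread to the long side or the short side.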
References
Hansen-Hodrick (1980): overlapping-return autocorrelation, motivating the non-overlap stride.
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import quantile_spread
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> result = quantile_spread(panel, forward_periods=5, n_groups=5)
>>> result.name
'quantile_spread'
factrix.metrics.quantile.quantile_spread_vw ¶
quantile_spread_vw(df: DataFrame, forward_periods: int = 5, n_groups: int = 5, factor_col: str = 'factor', return_col: str = 'forward_return', weight_col: str = 'market_cap', tie_policy: str = 'ordinal', lag_weights: bool = True) -> MetricOutput
Value-weighted long-short spread — alpha concentration diagnostic.
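Formula (per non-overlapping date \(t\), per bucket \(b \in \{\text{bot}, \text{top}\}\)):
\[
\text{vw}_b[t] = \frac{\sum_{i \in b} w[i, t-1] \cdot \text{return}[i, t \to t+h]}{\sum_{i \in b} w[i, t-1]}, \qquad \text{spread}[t] = \text{vw}_{\text{top}}[t] - \text{vw}_{\text{bot}}[t]
\]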
Weights are lagged by one sampled period per asset by default
(lag_weights=True): a portfolio rebalanced at date t uses the
market-cap observed at the previous rebalance, not at t. Pairing
contemporaneous market_cap[t] with forward_return[t→t+h] is
a classic look-ahead trap — market cap measured on date t embeds
news that the t→t+h return has not yet realized.
Pass lag_weights=False only when the caller has already
supplied a lagged weight column (e.g., prior-month-end cap) and
wants the function to treat it as observed at t.
Compare with the equal-weighted quantile_spread: if the VW spread is much
smaller (e.g., < 1/3 of EW), the alpha is driven by small-cap assets
and may not survive capacity / liquidity constraints.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Panel with | required |
| weight_col | str | Column for value weighting (default | 'market_cap' |
| lag_weights | bool | When True (default), shift | True |
Returns:
| Type | Description |
|---|---|
| MetricOutput | MetricOutput with per-period mean VW spread, t-stat, and p-value. |
| MetricOutput | Short-circuits if |
| MetricOutput | |
Notes
Per non-overlapping date t, per bucket b in {bot, top}:
vw_b[t] = sum_{i in b} w[i, t-1] * return[i, t -> t+h]
/ sum_{i in b} w[i, t-1]
spread[t] = vw_top[t] - vw_bot[t]
value = mean_t spread[t]; t = sqrt(n) * value / std(spread)
factrix lags weights by one sampled period per asset by default
(not one raw bar) so the lag aligns with the rebalance stride;
contemporaneous weight × forward_return would embed look-ahead
bias from market-cap moves that the forward return has not yet
realized.
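The lagged-weight rule can be sketched for one bucket over three sampled dates. Everything below is illustrative: the helper `vw_return`, the toy caps and returns, and the choice to skip the first sampled date (which has no previous cap) are assumptions, not a description of how factrix handles that edge.

```python
def vw_return(returns, weights):
    """Weighted mean of returns; weights need not sum to one."""
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, returns)) / total


# One bucket, two assets, three sampled (rebalance) dates k = 0, 1, 2.
# caps[asset][k]: market cap observed on sampled date k.
caps = {"A": [100.0, 110.0, 110.0], "B": [100.0, 100.0, 104.0]}
# fwd[asset][k]: forward return from sampled date k to k + 1.
fwd = {"A": [0.10, 0.00, 0.02], "B": [0.00, 0.04, 0.02]}

assets = sorted(caps)
vw_lagged = []
# Start at k = 1: the first sampled date has no previous cap to lag from.
for k in range(1, 3):
    w_prev = [caps[a][k - 1] for a in assets]  # w[i, t-1]
    r_now = [fwd[a][k] for a in assets]        # return[i, t -> t+h]
    vw_lagged.append(vw_return(r_now, w_prev))

# For contrast: same-date weights at k = 1 (the pairing the text warns about).
w_now = [caps[a][1] for a in assets]
vw_contemporaneous = vw_return([fwd[a][1] for a in assets], w_now)
```

At k = 1 the lagged weights are equal, so the bucket return is the simple average 0.02, while the same-date weights tilt toward asset A and give a different number. The lag is taken over sampled dates, matching the rebalance stride rather than the raw bar frequency.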
References
Hou-Xue-Zhang (2020): ~65% of anomalies fail \(|t| \geq 1.96\) once microcaps are mitigated via NYSE breakpoints and value weighting jointly.
Examples:
>>> import polars as pl
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.quantile import quantile_spread_vw
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... ).with_columns(pl.lit(1e6).alias("market_cap"))
>>> result = quantile_spread_vw(panel, forward_periods=5, n_groups=5)
>>> result.name
'quantile_spread_vw'
Use cases¶
- Compute per-date long-short spread series
  Build the per-date spread = top_return - bottom_return series (with top_return, bottom_return, universe_return) on a non-overlap-sampled panel. Pre-step for quantile_spread; also feeds spanning_alpha and any series/ tool.
- Mean-spread significance, equal-weighted
  Test \(H_0: \mathbb{E}[\text{spread}] = 0\) on the non-overlap spread series, with the long-vs-short alpha decomposition (top - universe, universe - bottom) attached so callers can attribute the spread to long-side vs short-side excess.
- Value-weighted spread for capacity diagnostics
  quantile_spread_vw weights each bucket by lagged market_cap (or any caller-supplied weight_col=). When the VW spread is much smaller than the EW spread, the alpha is concentrated in small names and may not survive capacity / liquidity constraints; Hou-Xue-Zhang (2020) found that ~65% of anomalies fail \(|t| \geq 1.96\) once microcaps are mitigated via NYSE breakpoints and value weighting.
- Per-bucket mean returns for monotonicity charts
  compute_group_returns returns the pooled mean forward return per quantile bucket: the chart input that shows whether returns rise monotonically across deciles, before any formal monotonicity test.
Choosing a function¶
| Goal | Function |
|---|---|
| Per-date long-short spread table for downstream inspection / slicing | compute_spread_series |
| Per-bucket pooled mean return (decile-curve chart input) | compute_group_returns |
| Mean-spread significance, equal-weighted, non-overlap \(t\) (default) | quantile_spread |
| Mean-spread significance, value-weighted (capacity / size-concentration check) | quantile_spread_vw |
Worked example — per-date spread then EW vs VW significance¶
compute_spread_series → quantile_spread → quantile_spread_vw on a synthetic cross-sectional panel
import factrix as fx
from factrix.metrics.quantile import (
compute_spread_series, quantile_spread, quantile_spread_vw,
)
from factrix.preprocess import compute_forward_return
raw = fx.datasets.make_cs_panel(
n_assets=200, n_dates=500, ic_target=0.08,
with_market_cap=True, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)
spread_df = compute_spread_series(panel, forward_periods=5, n_groups=5)
print(spread_df.head())
# ┌────────────┬──────────┬───────────────┬─────────────────┬──────────────────┐
# │ date ┆ spread ┆ top_return ┆ bottom_return ┆ universe_return │
# ├────────────┼──────────┼───────────────┼─────────────────┼──────────────────┤
# │ 2024-01-01 ┆ 0.0042 ┆ 0.0061 ┆ 0.0019 ┆ 0.0040 │
# │ ... ┆ ... ┆ ... ┆ ... ┆ ... │
# └────────────┴──────────┴───────────────┴─────────────────┴──────────────────┘
ew = quantile_spread(panel, forward_periods=5, n_groups=5,
_precomputed_series=spread_df)
print(ew.value, ew.stat, ew.metadata["long_alpha"], ew.metadata["short_alpha"])
# 0.0041 4.92 0.0019 0.0022 (approximate)
vw = quantile_spread_vw(panel, forward_periods=5, n_groups=5,
weight_col="market_cap")
print(vw.value, vw.stat)
# 0.0017 2.10 (approximate — VW < EW signals small-cap concentration)
See also¶
- monotonicity
  Decile-curve direction-of-monotonicity test on the same buckets.
- spanning_alpha
  Does this spread series carry alpha after controlling for base factor spreads? Consumes compute_spread_series output directly.
- notional_turnover / breakeven_cost / net_spread
  Implementation feasibility on the same Q1/Qn long-short portfolio.
- by_slice
  Axis-agnostic slice dispatcher for per-slice spread summaries.
- Statistical methods
  Non-overlap \(t\) vs Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) on overlapping spreads, DDOF convention.
- Individual × Continuous landing
  Adjacent metrics in the same cell.