factrix.metrics.concentration
Top-bucket concentration analysis for cross-sectional panels.
Measures whether top-bucket (long-leg) alpha is concentrated in a few stocks or broadly distributed, using the inverse of the Herfindahl-Hirschman index (HHI).
Notes
Pipeline. Per-date inverse HHI on top-bucket weights
(cross-section step) → per-date ratio series, then a non-overlapping
sample; across-time one-sided t-test against H₀: ratio ≥ 0.5.
Input. DataFrame with date, asset_id, factor, forward_return.
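The non-overlapping sample step in the pipeline can be pictured as keeping every `forward_periods`-th date, so that the forward-return windows of the kept dates do not share days. The sketch below assumes a plain stride from the first date; the helper name `non_overlapping` is hypothetical and factrix's actual sampling rule may differ.

```python
from datetime import date, timedelta


def non_overlapping(dates, forward_periods):
    """Keep every `forward_periods`-th date from a sorted list (hypothetical helper)."""
    if forward_periods < 1:
        raise ValueError("forward_periods must be >= 1")
    ordered = sorted(dates)
    # A stride equal to the horizon means the kept windows do not overlap.
    return ordered[::forward_periods]


# Twelve consecutive calendar days stand in for trading dates.
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(12)]
kept = non_overlapping(dates, forward_periods=5)
print([d.isoformat() for d in kept])
# ['2024-01-01', '2024-01-06', '2024-01-11']
```

With a 5-period horizon, 12 dates shrink to 3 observations, which is why the across-time test works on far fewer points than the raw panel has dates.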
factrix.metrics.concentration.top_concentration
top_concentration(df: DataFrame, forward_periods: int = 5, q_top: float = 0.2, factor_col: str = 'factor', return_col: str = 'forward_return', weight_by: ConcentrationWeight = 'abs_factor') -> MetricOutput
Top-bucket concentration via the inverse of the Herfindahl-Hirschman index (HHI).
Per date, selects the top q_top fraction of stocks by factor rank, computes
the HHI of their weights, and returns 1/HHI as the effective number of
independent bets.
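For a single cross-section, the selection and 1/HHI steps described above might look like the following. This is a minimal stdlib sketch, not factrix's implementation: the helpers `top_bucket` and `inverse_hhi` are hypothetical, the cutoff is a simple floor of `n * q_top`, and ties at the boundary are ignored.

```python
import math


def inverse_hhi(weights):
    """Effective number of bets: 1 / sum of squared normalised weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    hhi = sum((w / total) ** 2 for w in weights)
    return 1.0 / hhi


def top_bucket(factors, q_top):
    """Return the largest `q_top` fraction of factor values (hypothetical cutoff rule)."""
    n_top = max(1, math.floor(len(factors) * q_top))
    return sorted(factors, reverse=True)[:n_top]


# One cross-section of ten factor values; q_top=0.4 keeps the four largest.
factors = [3.0, 1.0, 1.0, 1.0, 0.5, 0.2, 0.1, -0.3, -0.8, -1.5]
bucket = top_bucket(factors, q_top=0.4)
n_eff = inverse_hhi([abs(f) for f in bucket])
print(bucket, n_eff)
```

Here the bucket is `[3.0, 1.0, 1.0, 1.0]`: one name carries half the total weight, so the effective number of bets comes out at about 3 rather than 4. Equal weights would give exactly the bucket size.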
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Panel with | required |
| q_top | float | Fraction of top-ranked stocks to include (default 0.2 = top 20%). | 0.2 |
| weight_by | ConcentrationWeight | HHI weight convention. | 'abs_factor' |
Returns:
| Type | Description |
|---|---|
| MetricOutput | MetricOutput with value = mean(1/HHI) across dates. Higher = more diversified top bucket. |
Notes
Per non-overlap date \(t\) with top-bucket members \(Q^{\mathrm{top}}(t)\)
(size \(n^{\mathrm{top}}\)), define weights \(w_i\) by weight_by
and form the Herfindahl
\(\mathrm{HHI}_t = \sum_i (w_i / \sum_j w_j)^2\). Effective
independent bets \(n^{\mathrm{eff}}_t = 1 / \mathrm{HHI}_t\).
Per-date diversification ratio
\(r_t = n^{\mathrm{eff}}_t / n^{\mathrm{top}}\) is averaged and tested
one-sided against \(H_0: \mathbb{E}[r] \geq 0.5\): rejecting flags
concentration.
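The across-time test on the ratio series can be sketched as an ordinary one-sample t statistic. The version below assumes the sample standard deviation (ddof = 1) and the sign convention `(mean - 0.5) / standard error`, so strongly negative values are evidence against \(H_0: \mathbb{E}[r] \geq 0.5\). Neither assumption is confirmed by this page, and the sign of the `stat` field factrix reports may differ. The helper `ratio_t_stat` and the ratios are hypothetical.

```python
import math
import statistics


def ratio_t_stat(ratios, null_value=0.5):
    """t statistic of the mean ratio against `null_value` (sample std, ddof=1)."""
    n = len(ratios)
    if n < 2:
        raise ValueError("need at least two dates")
    mean = statistics.fmean(ratios)
    std_err = statistics.stdev(ratios) / math.sqrt(n)
    return (mean - null_value) / std_err


# Hypothetical per-date diversification ratios r_t = n_eff / n_top.
ratios = [0.42, 0.38, 0.45, 0.40, 0.35]
t = ratio_t_stat(ratios)
print(round(statistics.fmean(ratios), 2), t < 0)
# 0.4 True
```

A mean ratio of 0.40 sits below the 0.5 threshold, and with this little dispersion the statistic is several standard errors below zero, which is the situation the text describes as flagging concentration.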
factrix uses rank(method="average") for the top-bucket cutoff
— tie_policy from Config does not apply because HHI measures
concentration among the selected stocks, not their bucketing.
tie_ratio is still recorded in metadata as a data-quality
diagnostic (high tie_ratio → unstable membership across
re-rankings).
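To see why ties matter for the cutoff, it helps to look at what average ranking does to tied factor values. The sketch below is a stdlib re-implementation of average ranks for illustration only; `average_rank` is a hypothetical helper, and it does not reproduce how factrix computes `tie_ratio`.

```python
def average_rank(values):
    """Ascending ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


factors = [0.9, 0.5, 0.5, 0.5, 0.1]
ranks = average_rank(factors)
print(ranks)
# [5.0, 3.0, 3.0, 3.0, 1.0]
```

Three of the five names share rank 3.0, so any cutoff falling inside that run cannot separate them on rank alone. Heavily tied cross-sections are the case in which bucket membership becomes sensitive to re-ranking.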
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.concentration import top_concentration
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> result = top_concentration(panel, forward_periods=5, q_top=0.2)
>>> result.name
'top_concentration'
Use cases
- Signal concentration in the long leg
  weight_by="abs_factor" (default): Herfindahl-Hirschman index (HHI) on \(|\text{factor}|\) inside the top-\(q\) bucket. Answers "how concentrated is the signal itself in the long leg". Conservative; depends only on factor values, not on realised returns.
- Realised risk concentration
  weight_by="alpha_contribution": HHI on \(|\text{sign}(\text{factor}) \cdot \text{forward\_return}|\). Captures whether the long leg's realised return is dominated by a few outliers. Because the weight is an absolute value, a single big winner and a single big loser both register as concentration (the right framing for risk, not for signed-alpha attribution).
- Diversification test, one-sided
  value = mean(1/HHI) (effective number of independent bets); stat is a one-sided \(t\) against \(H_0: \mathbb{E}[r] \geq 0.5\) where \(r_t = n^{\text{eff}}_t / n^{\text{top}}\). Rejecting flags the long leg as concentrated relative to the bucket cardinality.
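The two weighting conventions can disagree sharply on the same bucket. The toy comparison below uses a hypothetical five-name top bucket with nearly equal factor values and one outsized forward return; the helpers and data are illustrative assumptions, not factrix internals.

```python
def inverse_hhi(weights):
    """Effective number of bets: 1 / sum of squared normalised weights."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


# Hypothetical top bucket: (factor, forward_return) pairs for five names.
bucket = [(2.0, 0.01), (1.9, 0.01), (1.8, 0.20), (1.7, -0.01), (1.6, 0.01)]

abs_factor = [abs(f) for f, _ in bucket]
alpha_contribution = [abs(sign(f) * r) for f, r in bucket]

n_top = len(bucket)
ratio_signal = inverse_hhi(abs_factor) / n_top
ratio_risk = inverse_hhi(alpha_contribution) / n_top
print(ratio_signal)  # close to 1: signal weights are nearly even
print(ratio_risk)    # well below 0.5: one return dominates
```

The signal-based ratio stays near 1 because the factor values are similar, while the return-based ratio falls below the 0.5 threshold because a single name accounts for most of the absolute return. The small loser contributes the same weight as the small winners, as the absolute value implies.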
Worked example: top-bucket HHI on a synthetic panel
top_concentration with both weighting modes
import factrix as fx
from factrix.metrics.concentration import top_concentration
from factrix.preprocess import compute_forward_return

raw = fx.datasets.make_cs_panel(
    n_assets=500, n_dates=500, ic_target=0.08, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)

# Signal-level concentration (no return dependence)
sig = top_concentration(panel, forward_periods=5, q_top=0.2,
                        weight_by="abs_factor")
print(sig.value, sig.metadata["ratio_eff_to_total"], sig.stat)
# 78.4 0.78 -2.40 (approximate; ratio > 0.5 -> diversified)

# Realised-return concentration (risk framing)
risk = top_concentration(panel, forward_periods=5, q_top=0.2,
                         weight_by="alpha_contribution")
print(risk.value, risk.metadata["ratio_eff_to_total"], risk.stat)
# 41.2 0.41 3.10 (approximate; ratio < 0.5 -> outlier-driven)
See also
- quantile_spread / quantile_spread_vw
  The long-short spread on the same top / bottom buckets. Pair the EW-vs-VW spread gap with concentration to disentangle small-cap vs few-name alpha.
- notional_turnover / breakeven_cost
  Implementation feasibility on the same long-short construction.
- Statistical methods
  One-sided \(t\) on the per-date diversification ratio, DDOF convention, sample-size guards.
- Metric applicability reference
  When this metric applies and the sample-size guards that gate it (MIN_PORTFOLIO_PERIODS_HARD / MIN_PORTFOLIO_PERIODS_WARN).
- Individual × Continuous landing
  Adjacent metrics in the same cell.