factrix.metrics.concentration

Top-bucket concentration analysis for cross-sectional panels.

Measures whether top-bucket (long-leg) alpha is concentrated in a few stocks or broadly distributed, using the inverse of the Herfindahl-Hirschman index (HHI).

Notes

Pipeline. The cross-section step computes the inverse HHI of the top-bucket weights on each date, giving a per-date ratio series. That series is sampled at non-overlapping dates, and the across-time step runs a one-sided t-test against H₀: ratio ≥ 0.5.
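The non-overlapping sampling step can be sketched as below. This is a minimal illustration, not factrix's implementation: the helper name `non_overlapping_dates` is hypothetical, and starting the stride at the first available date is an assumption (the library may anchor the stride differently).

```python
from datetime import date, timedelta


def non_overlapping_dates(dates, forward_periods):
    """Keep every `forward_periods`-th date so that consecutive
    forward-return windows do not share any periods.

    Hypothetical helper; the stride starts at the earliest date (assumption).
    """
    if forward_periods < 1:
        raise ValueError("forward_periods must be >= 1")
    ordered = sorted(set(dates))
    return ordered[::forward_periods]


# Twelve consecutive dates, 5-period forward returns.
all_dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(12)]
kept = non_overlapping_dates(all_dates, forward_periods=5)
print(kept)  # positions 0, 5 and 10 of the sorted dates
```

Only the retained dates would feed the across-time test, which keeps the per-date ratios closer to independent observations.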

Input. DataFrame with date, asset_id, factor, forward_return.

factrix.metrics.concentration.top_concentration

top_concentration(df: DataFrame, forward_periods: int = 5, q_top: float = 0.2, factor_col: str = 'factor', return_col: str = 'forward_return', weight_by: ConcentrationWeight = 'abs_factor') -> MetricOutput

Top-bucket concentration via Herfindahl-Hirschman index (HHI) inverse.

For each date, selects the top q_top fraction of stocks by factor rank, computes the HHI of their weights, and returns 1/HHI as the effective number of independent bets.

Parameters:

Name Type Description Default
df DataFrame

Panel with date, asset_id, factor (and forward_return if weight_by="alpha_contribution").

required
q_top float

Fraction of top-ranked stocks to include (default 0.2 = top 20%).

0.2
weight_by ConcentrationWeight

HHI weight convention.

- "abs_factor" (default): weight by |factor|. Answers "how concentrated is the signal itself in the top bucket". Conservative, signal-level.
- "alpha_contribution": weight by the magnitude of each name's realised contribution, |sign(factor) · forward_return|. Captures risk concentration: whether the top bucket's realised return is dominated by a few outliers. Note the absolute value: a single big winner and a single big loser both register as concentration, which is the right framing for risk but NOT for signed-alpha attribution. If you need the latter, apply HHI downstream on the signed sign(factor) · forward_return series yourself.

'abs_factor'

Returns:

Type Description
MetricOutput

MetricOutput with value = mean(1/HHI) across dates. Higher = more diversified top bucket.

Notes

For each non-overlapping date \(t\) with top-bucket members \(Q^{\mathrm{top}}(t)\) (size \(n^{\mathrm{top}}\)), define weights \(w_i\) according to weight_by and form the Herfindahl index \(\mathrm{HHI}_t = \sum_i (w_i / \sum_j w_j)^2\). The effective number of independent bets is \(n^{\mathrm{eff}}_t = 1 / \mathrm{HHI}_t\). The per-date diversification ratio \(r_t = n^{\mathrm{eff}}_t / n^{\mathrm{top}}\) is averaged across dates and tested one-sided against \(H_0: \mathbb{E}[r] \geq 0.5\); rejecting \(H_0\) flags the top bucket as concentrated.
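The per-date arithmetic above can be checked with a small NumPy sketch. The function name `effective_bets` is hypothetical and not part of factrix; taking absolute values inside the function is a defensive assumption, since both documented weight conventions are already non-negative.

```python
import numpy as np


def effective_bets(weights):
    """Return (n_eff, ratio) for one date's top-bucket weights.

    n_eff = 1 / HHI, ratio = n_eff / n_top. Hypothetical helper.
    """
    w = np.abs(np.asarray(weights, dtype=float))  # defensive: weights should be >= 0
    total = w.sum()
    if total == 0.0:
        raise ValueError("all weights are zero; HHI is undefined")
    shares = w / total                # w_i / sum_j w_j
    hhi = float(np.sum(shares ** 2))  # HHI_t
    n_eff = 1.0 / hhi                 # effective number of independent bets
    return n_eff, n_eff / w.size      # ratio r_t = n_eff / n_top


# Four equally weighted names: HHI = 1/4, so n_eff = 4 and r = 1.
equal = effective_bets([1.0, 1.0, 1.0, 1.0])

# One dominant name: HHI = 7/12, so n_eff = 12/7 and r = 3/7 < 0.5.
skewed = effective_bets([9.0, 1.0, 1.0, 1.0])
print(equal, skewed)
```

Equal weights give the upper bound \(r_t = 1\), while a single name carrying all the weight gives the lower bound \(r_t = 1 / n^{\mathrm{top}}\).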

factrix uses rank(method="average") for the top-bucket cutoff. tie_policy from Config does not apply, because HHI measures concentration among the selected stocks, not their bucketing. tie_ratio is still recorded in metadata as a data-quality diagnostic: a high tie_ratio indicates unstable membership across re-rankings.
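To see how average ranks interact with ties at the cutoff, here is a hedged pandas sketch. The helper `top_bucket` and the exact cutoff rule (percentile rank strictly above 1 − q_top) are assumptions for illustration; factrix's internal selection rule and DataFrame backend may differ.

```python
import pandas as pd


def top_bucket(factor: pd.Series, q_top: float = 0.2) -> pd.Series:
    """Select names whose average percentile rank exceeds 1 - q_top.

    Hypothetical helper; the strict-inequality cutoff is an assumption.
    """
    if not 0.0 < q_top <= 1.0:
        raise ValueError("q_top must be in (0, 1]")
    # Tied values share the mean of the ranks they would otherwise occupy.
    pct_rank = factor.rank(method="average", pct=True)
    return factor[pct_rank > 1.0 - q_top]


# Ten names; the three largest values are tied at 8.
factor = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 8, 8], dtype=float)
selected = top_bucket(factor, q_top=0.2)

# The tied names share average rank 9 (percentile 0.9), so all three
# enter the bucket even though 20% of ten names is nominally two.
print(len(selected))  # 3
```

Because tied names enter or leave the bucket together, heavy ties can change bucket size from date to date, which is one reason a tie diagnostic is worth monitoring.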

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.concentration import top_concentration
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> result = top_concentration(panel, forward_periods=5, q_top=0.2)
>>> result.name
'top_concentration'

Use cases

  • Signal concentration in the long leg


    weight_by="abs_factor" (default) — Herfindahl-Hirschman index (HHI) on \(|\text{factor}|\) inside the top-\(q\) bucket. Answers "how concentrated is the signal itself in the long leg". Conservative; depends only on factor values, not on realised returns.

  • Realised risk concentration


    weight_by="alpha_contribution" — HHI on \(|\text{sign}(\text{factor}) \cdot \text{forward\_return}|\). Captures whether the long leg's realised return is dominated by a few outliers. Because of the absolute value, a single big winner and a single big loser both register as concentration (the right framing for risk, not for signed-alpha attribution).

  • Diversification test, one-sided


    value = mean(1/HHI) (effective number of independent bets); stat is a one-sided \(t\) against \(H_0: \mathbb{E}[r] \geq 0.5\) where \(r_t = n^{\text{eff}}_t / n^{\text{top}}\). Rejecting flags the long leg as concentrated relative to the bucket cardinality.
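The one-sided test in the last item can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: the helper name `concentration_t` is hypothetical, the sample standard deviation uses ddof = 1, and the sign is chosen so that positive values point toward concentration (mean ratio below 0.5). The sign convention of the stat reported by factrix should be checked against its statistical-methods reference.

```python
import math
import statistics


def concentration_t(ratios, threshold=0.5):
    """One-sample t-statistic for H0: E[r] >= threshold.

    Hypothetical helper. Positive values indicate a mean ratio below the
    threshold, i.e. evidence of concentration (sign convention assumed).
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("need at least two per-date ratios")
    mean = statistics.fmean(ratios)
    sd = statistics.stdev(ratios)  # sample standard deviation (ddof = 1)
    if sd == 0.0:
        raise ValueError("zero variance; t-statistic is undefined")
    return (threshold - mean) / (sd / math.sqrt(n))


# Per-date diversification ratios r_t on non-overlapping dates.
ratios = [0.40, 0.45, 0.35, 0.42, 0.38]
t_stat = concentration_t(ratios)
print(t_stat > 0)  # True: the mean ratio (0.40) sits below 0.5
```

A p-value would come from the Student t distribution with n − 1 degrees of freedom; that step is omitted here to keep the sketch dependency-free.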

Worked example — top-bucket HHI on a synthetic panel

top_concentration with both weighting modes

import factrix as fx
from factrix.metrics.concentration import top_concentration
from factrix.preprocess import compute_forward_return

raw   = fx.datasets.make_cs_panel(
    n_assets=500, n_dates=500, ic_target=0.08, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)

# Signal-level concentration (no return dependence)
sig = top_concentration(panel, forward_periods=5, q_top=0.2,
                        weight_by="abs_factor")
print(sig.value, sig.metadata["ratio_eff_to_total"], sig.stat)
# 78.4  0.78  -2.40   (approximate; ratio > 0.5 -> diversified)

# Realised-return concentration (risk framing)
risk = top_concentration(panel, forward_periods=5, q_top=0.2,
                         weight_by="alpha_contribution")
print(risk.value, risk.metadata["ratio_eff_to_total"], risk.stat)
# 41.2  0.41  3.10   (approximate; ratio < 0.5 -> outlier-driven)

See also

  • quantile_spread / quantile_spread_vw


    The long-short spread on the same top / bottom buckets — pair the EW-vs-VW spread gap with concentration to disentangle small-cap vs few-name alpha.

    api/metrics/quantile →

  • notional_turnover / breakeven_cost


    Implementation feasibility on the same long-short construction.

    api/metrics/tradability →

  • Statistical methods


    One-sided \(t\) on the per-date diversification ratio, DDOF convention, sample-size guards.

    reference/statistical-methods →

  • Metric applicability reference


    When this metric applies and the sample-size guards that gate it (MIN_PORTFOLIO_PERIODS_HARD / MIN_PORTFOLIO_PERIODS_WARN).

    reference/metric-applicability →

  • Individual × Continuous landing


    Adjacent metrics in the same cell.

    api/metrics/individual-continuous →