factrix.metrics.concentration

Top-bucket concentration analysis for cross-sectional panels.

Measures whether top-bucket (long-leg) alpha is concentrated in a few stocks or broadly distributed, using the inverse Herfindahl-Hirschman index (HHI).

Notes

Pipeline. Per date, compute the inverse HHI of the top-bucket weights (cross-section step) to obtain a per-date ratio series; keep a non-overlapping sample of dates; then run an across-time one-sided t-test against H₀: ratio ≥ 0.5.
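The non-overlapping step can be pictured as keeping every `forward_periods`-th date, so that the retained ratios do not share a forward-return window. The following is a minimal sketch in plain Python, assuming the sample starts at the first date and strides by `forward_periods`. The helper name `non_overlapping` is hypothetical, and the sampling rule factrix actually applies may differ.

```python
def non_overlapping(dates, forward_periods):
    """Keep every `forward_periods`-th date from a sorted sequence (assumed rule)."""
    if forward_periods < 1:
        raise ValueError("forward_periods must be >= 1")
    return list(dates)[::forward_periods]


dates = list(range(12))            # stand-in for 12 sorted trading dates
kept = non_overlapping(dates, 5)   # dates whose 5-period windows do not overlap
print(kept)  # [0, 5, 10]
```

With a 5-period forward return, 180 dates would leave 36 ratios for the across-time test under this rule.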

Input. DataFrame with date, asset_id, factor, forward_return.

factrix.metrics.concentration.top_concentration

top_concentration(df: DataFrame, forward_periods: int = 5, q_top: float = 0.2, factor_col: str = 'factor', return_col: str = 'forward_return', weight_by: ConcentrationWeight = 'abs_factor') -> MetricOutput

Top-bucket concentration via Herfindahl-Hirschman index (HHI) inverse.

Per date, selects the top q_top fraction of stocks by factor rank, computes the HHI of their normalised weights, and returns 1/HHI as the effective number of independent bets.
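To make the 1/HHI quantity concrete, here is a small standalone sketch of the calculation on a single bucket of weights. The function name `effective_bets` is illustrative only and is not part of factrix.

```python
def effective_bets(weights):
    """Return 1/HHI for non-negative weights, normalised to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    hhi = sum((w / total) ** 2 for w in weights)
    return 1.0 / hhi


print(effective_bets([1.0, 1.0, 1.0, 1.0]))   # 4.0: equal weights, every name counts
print(effective_bets([10.0, 0.0, 0.0, 0.0]))  # 1.0: one name carries all the weight
```

The result always lies between 1 (a single dominant name) and the bucket size (perfectly equal weights).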

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Panel with `date, asset_id, factor` (and `forward_return` if `weight_by="alpha_contribution"`).	required
`q_top`	`float`	Fraction of top-ranked stocks to include (default 0.2 = top 20%).	`0.2`
`weight_by`	`ConcentrationWeight`	HHI weight convention. - `"abs_factor"` (default): weight by `|factor|`. Answers "how concentrated is the signal itself in the top bucket". Conservative, signal-level. - `"alpha_contribution"`: weight by the magnitude of each name's realised contribution `|sign(factor) · forward_return|`. Captures risk concentration: whether the top bucket's realised return is dominated by a few outliers. Note the absolute value: a single big winner and a single big loser both register as concentration, which is the right framing for risk but NOT for signed-alpha attribution. If you need the latter, apply HHI downstream on the signed `sign(factor) · forward_return` series yourself.	`'abs_factor'`

Returns:

Type	Description
`MetricOutput`	MetricOutput with value = mean(1/HHI) across dates. Higher = more diversified top bucket.

Notes

For each non-overlapping date \(t\) with top-bucket members \(Q^{\mathrm{top}}(t)\) (size \(n^{\mathrm{top}}\)), define weights \(w_i\) according to weight_by and form the Herfindahl index \(\mathrm{HHI}_t = \sum_i (w_i / \sum_j w_j)^2\). The effective number of independent bets is \(n^{\mathrm{eff}}_t = 1 / \mathrm{HHI}_t\). The per-date diversification ratio \(r_t = n^{\mathrm{eff}}_t / n^{\mathrm{top}}\) is averaged across dates and tested one-sided against \(H_0: \mathbb{E}[r] \geq 0.5\); rejecting \(H_0\) flags concentration.
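The across-time test can be sketched as a one-sample t-statistic on the per-date ratios. This is a rough illustration only. It assumes a plain i.i.d. standard error, and it picks the sign so that large positive values count as evidence against \(H_0\); the sign convention and any variance correction used by factrix are not specified here. The name `diversification_t` is hypothetical.

```python
import math
import statistics


def diversification_t(ratios, null_value=0.5):
    """Return (mean ratio, t-statistic) for H0: E[r] >= null_value.

    Assumed convention: t = (null_value - mean) / se, so a large positive t
    means the mean ratio sits well below the null, i.e. concentration.
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("need at least two per-date ratios")
    mean = statistics.fmean(ratios)
    se = statistics.stdev(ratios) / math.sqrt(n)  # i.i.d. standard error (assumption)
    return mean, (null_value - mean) / se


# Toy per-date ratios r_t = n_eff_t / n_top, all below 0.5
ratios = [0.42, 0.38, 0.45, 0.40, 0.35]
mean, t = diversification_t(ratios)
print(round(mean, 2), t > 0)  # 0.4 True
```

A ratio series centred above 0.5 would give a negative statistic under this convention and would not reject \(H_0\).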

factrix uses rank(method="average") for the top-bucket cutoff — tie_policy from Config does not apply because HHI measures concentration among the selected stocks, not their bucketing. tie_ratio is still recorded in metadata as a data-quality diagnostic (high tie_ratio → unstable membership across re-rankings).
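For readers unfamiliar with average ranking, the sketch below reimplements the usual `method="average"` semantics in plain Python: tied values share the mean of the positions they occupy. It is illustrative only and does not reproduce how factrix turns ranks into bucket membership.

```python
def average_rank(values):
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


factor = [0.1, 0.5, 0.5, 0.9, 0.3]
print(average_rank(factor))  # [1.0, 3.5, 3.5, 5.0, 2.0]
```

The two names tied at 0.5 receive the same rank of 3.5, so a cutoff falling between positions 3 and 4 cannot separate them. This is why heavy ties make top-bucket membership unstable.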

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.concentration import top_concentration
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> result = top_concentration(panel, forward_periods=5, q_top=0.2)
>>> result.name
'top_concentration'

Use cases

Signal concentration in the long leg

weight_by="abs_factor" (default): Herfindahl-Hirschman index (HHI) on \(|\text{factor}|\) inside the top-\(q\) bucket. Answers "how concentrated is the signal itself in the long leg". Conservative; depends only on factor values, not on realised returns.

Realised risk concentration

weight_by="alpha_contribution": HHI on \(|\text{sign}(\text{factor}) \cdot \text{forward\_return}|\). Captures whether the long leg's realised return is dominated by a few outliers. Absolute value: a single big winner and a single big loser both register as concentration (the right framing for risk, not for signed-alpha attribution).

Diversification test, one-sided

value = mean(1/HHI) (effective number of independent bets); stat is a one-sided \(t\) against \(H_0: \mathbb{E}[r] \geq 0.5\) where \(r_t = n^{\text{eff}}_t / n^{\text{top}}\). Rejecting flags the long leg as concentrated relative to the bucket cardinality.
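The difference between the two weighting conventions shows up on a toy four-name bucket. The sketch below uses invented numbers and a hypothetical helper `inverse_hhi`; it mirrors the weight formulas quoted above but is not factrix code.

```python
def inverse_hhi(weights):
    """1/HHI of non-negative weights after normalising them to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return 1.0 / sum((w / total) ** 2 for w in weights)


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


# Four top-bucket names with identical signals but uneven realised returns
factor = [2.0, 2.0, 2.0, 2.0]
forward_return = [0.08, -0.08, 0.01, 0.01]

abs_factor_w = [abs(f) for f in factor]
alpha_w = [abs(sign(f) * r) for f, r in zip(factor, forward_return)]

n_top = len(factor)
print(inverse_hhi(abs_factor_w) / n_top)  # 1.0: signal is spread evenly
print(inverse_hhi(alpha_w) / n_top)       # about 0.62: two names dominate
```

The big winner and the big loser contribute equally to the second ratio because of the absolute value, even though their signed contributions would cancel.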

Worked example: top-bucket HHI on a synthetic panel

top_concentration with both weighting modes

import factrix as fx
from factrix.metrics.concentration import top_concentration
from factrix.preprocess import compute_forward_return

raw   = fx.datasets.make_cs_panel(
    n_assets=500, n_dates=500, ic_target=0.08, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)

# Signal-level concentration (no return dependence)
sig = top_concentration(panel, forward_periods=5, q_top=0.2,
                        weight_by="abs_factor")
print(sig.value, sig.metadata["ratio_eff_to_total"], sig.stat)
# 78.4  0.78  -2.40   (approximate; ratio > 0.5 -> diversified)

# Realised-return concentration (risk framing)
risk = top_concentration(panel, forward_periods=5, q_top=0.2,
                         weight_by="alpha_contribution")
print(risk.value, risk.metadata["ratio_eff_to_total"], risk.stat)
# 41.2  0.41  3.10   (approximate; ratio < 0.5 -> outlier-driven)
