factrix.metrics.clustering

Event clustering diagnostic for event signals.

When events cluster on the same dates, the independence assumption underlying the CAAR t-test is violated, potentially inflating the test statistic. The Herfindahl-Hirschman Index (HHI) on event dates quantifies this concentration.

Only meaningful for multi-asset panels (N > 1). For single-asset event studies, clustering across assets is not applicable.
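
To illustrate why same-date clustering matters, the following is a minimal simulation sketch using only the Python standard library. It is not part of factrix: `t_stat` and `rejection_rate` are hypothetical helpers, and the data-generating process (a unit-variance date-level shock plus unit-variance idiosyncratic noise, true mean zero) is an assumption chosen for illustration. It compares how often a plain one-sample t-test exceeds |t| > 2 when events share one date against when each event has its own date.

```python
import random
import statistics


def t_stat(xs):
    """One-sample t-statistic of the mean against zero."""
    n = len(xs)
    return statistics.fmean(xs) / (statistics.stdev(xs) / n ** 0.5)


def rejection_rate(clustered, n_events=50, n_sims=500, seed=0):
    """Share of simulations with |t| > 2 when the true mean is zero."""
    rng = random.Random(seed)
    rejections = 0
    for _sim in range(n_sims):
        # Clustered: every event falls on one date and shares its shock.
        # Independent: each event draws its own date-level shock.
        shared = rng.gauss(0.0, 1.0)
        returns = []
        for _event in range(n_events):
            date_shock = shared if clustered else rng.gauss(0.0, 1.0)
            returns.append(date_shock + rng.gauss(0.0, 1.0))
        if abs(t_stat(returns)) > 2.0:
            rejections += 1
    return rejections / n_sims


print("independent:", rejection_rate(clustered=False))
print("clustered:  ", rejection_rate(clustered=True))
```

Under these assumptions the independent case should reject at roughly the nominal rate, while the clustered case rejects far more often: the shared shock moves the sample mean but is invisible to the sample standard deviation, so the standard error is understated.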

Notes

Pipeline. Static cross-section: a single HHI is computed once over the event-date histogram, with no time-axis aggregation and no formal H₀ (it is a descriptive concentration index).

factrix.metrics.clustering.clustering_diagnostic

clustering_diagnostic(df: DataFrame, *, factor_col: str = 'factor', cluster_window: int = 3) -> MetricOutput

Event clustering Herfindahl index on event dates.

Computes \(\mathrm{HHI} = \sum_d s_d^2\) where \(s_d = (\text{events on date } d) / (\text{total events})\). The HHI ranges from \(1/D\) (events spread uniformly over \(D\) event dates) to \(1.0\) (all events on one date).

High HHI → events concentrate in few dates → cross-event independence assumption violated → CAAR \(t\)-stat may be inflated.
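
The formula above can be sketched in a few lines of standard-library Python. This is an illustrative reimplementation, not the factrix source: `event_date_hhi` is a hypothetical helper that takes a flat list of event dates (one entry per event) rather than the panel DataFrame that `clustering_diagnostic` expects.

```python
from collections import Counter


def event_date_hhi(event_dates):
    """Herfindahl-Hirschman index of events across dates (hypothetical helper)."""
    counts = Counter(event_dates)          # events per date
    total = sum(counts.values())           # total number of events
    if total == 0:
        raise ValueError("need at least one event")
    # s_d = (events on date d) / (total events); HHI = sum of squared shares
    return sum((n / total) ** 2 for n in counts.values())


uniform = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
clustered = ["2024-01-02"] * 3 + ["2024-01-03"]

print(event_date_hhi(uniform))    # 0.25  (= 1/D with D = 4)
print(event_date_hhi(clustered))  # 0.625 (= 0.75**2 + 0.25**2)
```

The uniform case hits the lower bound \(1/D\); moving three of the four events onto one date raises the index toward \(1.0\).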

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Panel with date, asset_id, factor. | required |
| cluster_window | int | Not used in HHI calculation but preserved for future block-bootstrap clustering adjustment. | 3 |

Returns:

| Type | Description |
|---|---|
| MetricOutput | MetricOutput with value=HHI; metadata includes effective_n_dates and concentration ratio. |

Notes

\(\mathrm{HHI} = \sum_d s_d^2\) where \(s_d = (\text{events on date } d) / \text{total}\); ranges from \(1/D\) (uniform across \(D\) event dates) to \(1.0\) (all events on a single date). effective_n_dates \(= 1 / \mathrm{HHI}\); hhi_normalized \(= (\mathrm{HHI} - 1/D) / (1 - 1/D)\) rescales to \([0, 1]\).
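
As a worked illustration of these formulas (hypothetical counts, not factrix output), take four events spread over \(D = 3\) dates with counts \((2, 1, 1)\):

\[
s = \left(\tfrac{2}{4},\ \tfrac{1}{4},\ \tfrac{1}{4}\right), \qquad
\mathrm{HHI} = 0.5^2 + 0.25^2 + 0.25^2 = 0.375
\]

\[
\frac{1}{\mathrm{HHI}} = \frac{1}{0.375} \approx 2.67, \qquad
\frac{\mathrm{HHI} - 1/D}{1 - 1/D} = \frac{0.375 - 1/3}{1 - 1/3} = 0.0625
\]

So effective_n_dates is about 2.67 and hhi_normalized is 0.0625. Since \(\mathrm{HHI} \leq 1\), the normalized value never exceeds the raw HHI. The normalization is undefined when \(D = 1\), and this page does not say how factrix handles that case.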

factrix reports HHI as a descriptive concentration index with no formal \(H_0\), because the natural follow-up correction (cross-sectional dependence in CAAR / BMP) is delegated to bmp_test(kolari_pynnonen_adjust=True).

Examples:

>>> import factrix as fx
>>> from factrix.metrics.clustering import clustering_diagnostic
>>> panel = fx.datasets.make_event_panel(n_assets=50, n_dates=400, seed=0)
>>> result = clustering_diagnostic(panel)
>>> result.name
'clustering_hhi'

Use cases

  • Gate the CAAR independence assumption


    Read value (the HHI on the event-date histogram) and metadata["effective_n_dates"] \(= 1 / \mathrm{HHI}\). A high HHI means events concentrate on few dates, so the cross-event independence assumed by caar's \(t\)-test is violated and the statistic may be inflated.

  • Trigger the Kolari-Pynnönen adjustment


    When hhi_normalized is high (\(\geq 0.3\) is the threshold the BMP docstring calls out), switch on bmp_test(kolari_pynnonen_adjust=True) to absorb same-date shock sharing in the \(z\) statistic.

Worked example — HHI on event dates

clustering_diagnostic on a synthetic event panel

import factrix as fx
from factrix.metrics.clustering import clustering_diagnostic
from factrix.metrics.caar import bmp_test

panel = fx.datasets.make_event_panel(
    n_assets=200, n_dates=500, event_rate=0.02,
    cluster_dates=True, seed=2024,
)

diag = clustering_diagnostic(panel)
print(diag.value,
      diag.metadata["effective_n_dates"],
      diag.metadata["hhi_normalized"])
# Prints HHI, effective_n_dates (= 1 / HHI), and hhi_normalized.

# When hhi_normalized >= 0.3, reach for the K-P adjustment:
if diag.metadata["hhi_normalized"] >= 0.3:
    z = bmp_test(panel, kolari_pynnonen_adjust=True)

See also