factrix.metrics.clustering
Event clustering diagnostic for event signals.
When events cluster on the same dates, the independence assumption underlying the CAAR t-test is violated, potentially inflating the test statistic. The Herfindahl-Hirschman Index (HHI) on event dates quantifies this concentration.
Only meaningful for multi-asset panels (N > 1). For single-asset event studies, clustering across assets is not applicable.
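To make the inflation mechanism concrete, the following toy simulation compares a naive one-sample \(t\)-statistic when every event has its own date against the case where 100 events share only 4 dates. It is a simplified sketch, not factrix's CAAR implementation: `naive_t_stats`, the shock model (one Gaussian shock per date plus Gaussian idiosyncratic noise), and all parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics


def naive_t_stats(n_events, n_dates, n_trials=500, shock_sd=1.0, seed=0):
    """Simulate naive t-stats of mean abnormal return under a true mean of zero.

    Hypothetical helper: events are assigned round-robin to n_dates, and all
    events on the same date share one common date-level shock.
    """
    rng = random.Random(seed)
    t_stats = []
    for _ in range(n_trials):
        date_shocks = [rng.gauss(0.0, shock_sd) for _ in range(n_dates)]
        returns = [
            date_shocks[i % n_dates] + rng.gauss(0.0, 1.0)
            for i in range(n_events)
        ]
        mean = statistics.fmean(returns)
        # The naive standard error treats all events as independent.
        std_err = statistics.stdev(returns) / math.sqrt(n_events)
        t_stats.append(mean / std_err)
    return t_stats


independent = naive_t_stats(n_events=100, n_dates=100)  # one event per date
clustered = naive_t_stats(n_events=100, n_dates=4)      # 25 events per date

# Under independence the spread of the t-stat is close to 1; with shared
# date shocks it is much wider, so |t| > 2 occurs far too often.
print(statistics.pstdev(independent), statistics.pstdev(clustered))
```

In this setup the true mean abnormal return is zero in both cases, so any extra spread in the clustered \(t\)-statistics comes purely from the shared date shocks that the naive standard error ignores.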
Notes
Pipeline. Static cross-section — single HHI computed once over the event-date histogram; no time-axis aggregation, no formal H₀ (descriptive concentration index).
factrix.metrics.clustering.clustering_diagnostic
clustering_diagnostic(df: DataFrame, *, factor_col: str = 'factor', cluster_window: int = 3) -> MetricOutput
Event clustering Herfindahl index on event dates.
Computes \(\mathrm{HHI} = \sum_d s_d^2\) where \(s_d = (\text{events on date } d) / (\text{total events})\). The index ranges from \(1/D\) (events spread uniformly across \(D\) event dates) to \(1.0\) (all events on one date).
High HHI → events concentrate in few dates → cross-event independence assumption violated → CAAR \(t\)-stat may be inflated.
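The formula can be sketched in a few lines of plain Python. This is an illustration of the definition above, not factrix's implementation; `event_date_hhi` is a hypothetical helper that takes one date label per event.

```python
from collections import Counter


def event_date_hhi(event_dates):
    """Return the sum of squared date shares of the event count."""
    counts = Counter(event_dates)  # events per date
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one event")
    return sum((n / total) ** 2 for n in counts.values())


# Four events on four distinct dates: the uniform lower bound 1/D with D = 4.
uniform = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
# Four events on a single date: the upper bound.
single = ["2024-01-02"] * 4

print(event_date_hhi(uniform))  # 0.25
print(event_date_hhi(single))   # 1.0
```

The two inputs reproduce the bounds stated above: \(1/D\) for a uniform spread and \(1.0\) when every event falls on the same date.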
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Panel with | required |
| cluster_window | int | Not used in HHI calculation but preserved for future block-bootstrap clustering adjustment. | 3 |
Returns:
| Type | Description |
|---|---|
| MetricOutput | MetricOutput with value=HHI, metadata includes effective_n_dates and concentration ratio. |
Notes
\(\mathrm{HHI} = \sum_d s_d^2\) where
\(s_d = (\text{events on date } d) / \text{total}\); ranges from
\(1/D\) (uniform across \(D\) event dates) to \(1.0\) (all events on
a single date).
effective_n_dates \(= 1 / \mathrm{HHI}\);
hhi_normalized \(= (\mathrm{HHI} - 1/D) / (1 - 1/D)\) rescales
to \([0, 1]\).
factrix reports HHI as a descriptive concentration index — no
formal \(H_0\) — because the natural follow-up correction
(cross-sectional dependence in CAAR / BMP) is delegated to
bmp_test(kolari_pynnonen_adjust=True).
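As a worked instance of these definitions, the sketch below derives all three quantities from per-date event counts. `concentration_summary` is a hypothetical helper, and the handling of \(D = 1\) (where the normalization divides by zero) is an assumption made here; how factrix treats that case is not stated in this section.

```python
def concentration_summary(counts_per_date):
    """Compute HHI, effective_n_dates and hhi_normalized from event counts."""
    counts = [n for n in counts_per_date if n > 0]  # D counts event dates only
    if not counts:
        raise ValueError("need at least one event")
    total = sum(counts)
    d = len(counts)
    hhi = sum((n / total) ** 2 for n in counts)
    if d == 1:
        # Assumption: a single event date is treated as fully concentrated.
        hhi_normalized = 1.0
    else:
        hhi_normalized = (hhi - 1 / d) / (1 - 1 / d)
    return {
        "hhi": hhi,
        "effective_n_dates": 1.0 / hhi,
        "hhi_normalized": hhi_normalized,
    }


# Ten events over three dates with shares 0.5, 0.3 and 0.2.
summary = concentration_summary([5, 3, 2])
print(f"{summary['hhi']:.4f}")                # 0.3800
print(f"{summary['effective_n_dates']:.4f}")  # 2.6316
print(f"{summary['hhi_normalized']:.4f}")     # 0.0700
```

Here \(\mathrm{HHI} = 0.25 + 0.09 + 0.04 = 0.38\), so the ten events behave like roughly 2.6 equally weighted dates, and the normalized index \((0.38 - 1/3) / (1 - 1/3) = 0.07\) sits close to the uniform end of \([0, 1]\).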
Examples:
Use cases
- Gate the CAAR independence assumption
  Read value (the Herfindahl-Hirschman index (HHI) on the event-date histogram) and metadata["effective_n_dates"] \(= 1 / \mathrm{HHI}\). High HHI → events concentrate in few dates → the cross-event independence assumed by caar's \(t\)-test is violated and the statistic may be inflated.
- Trigger the Kolari-Pynnönen adjustment
  When hhi_normalized is high (\(\geq 0.3\) is the threshold the BMP docstring calls out), switch on bmp_test(kolari_pynnonen_adjust=True) to absorb same-date shock sharing in the \(z\) statistic.
Worked example: HHI on event dates
clustering_diagnostic on a synthetic event panel
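```python
import factrix as fx
from factrix.metrics.clustering import clustering_diagnostic
from factrix.metrics.caar import bmp_test

# Synthetic multi-asset event panel with events clustered on shared dates.
panel = fx.datasets.make_event_panel(
    n_assets=200, n_dates=500, event_rate=0.02,
    cluster_dates=True, seed=2024,
)

diag = clustering_diagnostic(panel)
print(
    diag.value,
    diag.metadata["effective_n_dates"],
    diag.metadata["hhi_normalized"],
)
# Prints HHI, effective_n_dates (= 1 / HHI) and hhi_normalized.
# The exact values depend on the synthetic panel; note that
# hhi_normalized can never exceed HHI itself.

# If hhi_normalized >= 0.3, reach for the K-P adjustment:
if diag.metadata["hhi_normalized"] >= 0.3:
    z = bmp_test(panel, kolari_pynnonen_adjust=True)
```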
See also
- caar / bmp_test
  The downstream tests whose independence assumption this metric gates. bmp_test(kolari_pynnonen_adjust=True) is the formal correction.
- signal_density
  Inverse firing frequency; pair with clustering_hhi since bars-per-event ignores temporal concentration.
- Metric applicability reference
  Confounded-event handling and within-asset event clustering notes.
- Individual × Sparse landing
  Adjacent event-study metrics in the same cell.