stats
Inference-method instances and standalone statistical helpers. The
public surface is what the screening functions (bhy / bhy_hierarchical) and
the slice-test functions (slice_pairwise_test / slice_joint_test)
accept on their estimator= kwarg, plus a small set of false discovery rate (FDR) and
bootstrap utilities for callers who
want to drive inference outside the dispatch chain.
The numerical implementations live in the private factrix._stats
package; nothing under _stats is part of the public API.
Estimator catalogue
Estimator is the base Protocol: each
instance names which inference path downstream code reads from
FactorProfile.stats. HACEstimator(Estimator) is the sub-protocol
that adds a cell-internal compute(series, *, forward_periods) ->
InferenceResult for estimators that apply heteroskedasticity-and-autocorrelation-consistent (HAC) inference to a series mean; pass instances to
AnalysisConfig.estimator= for evaluate-time inference dispatch.
Default-constructed instances live in
factrix.stats._ESTIMATOR_REGISTRY and surface through
list_estimators(scope, signal) (top-level factrix export).
| Class | Protocol | Algorithm family | Emits | Applicable to | Use when |
|---|---|---|---|---|---|
| NeweyWest | HACEstimator | Newey-West (NW) Bartlett HAC | (T_NW, P_NW) | every cell | Default; drives primary_p on every PANEL / TIMESERIES procedure. |
| HansenHodrick | HACEstimator | Hansen-Hodrick (HH) rectangular HAC | (T_HH, P_HH) | (INDIVIDUAL, CONTINUOUS) only | Overlapping forward returns on information coefficient (IC) PANEL / FM PANEL: the MA(h-1) overlap structure has a closed-form rectangular-kernel SE. Pass via AnalysisConfig.individual_continuous(estimator=HansenHodrick()) to drive primary_p from the HH path instead of NW. |
| WaldNWCluster | Estimator | Cluster-Wald χ² (NW HAC + 1-way cluster on slice) | (WALD_NWCL, P_WALD_NWCL) | (INDIVIDUAL, CONTINUOUS) | Slice test on a stacked per-date metric panel (#176 functions). |
| WaldTwoWayCluster | Estimator | Cluster-Wald χ² (Cameron-Gelbach-Miller two-way cluster on (date, asset)) | (WALD_TWOWAY, P_WALD_TWOWAY) | (INDIVIDUAL, CONTINUOUS) | Reserved interface: raw asset-date panel path. No function consumes it until factor_decomposition lands later. |
| BlockBootstrap | Estimator | Politis-Romano stationary or Künsch fixed block bootstrap; Politis-White auto block length | (P_BOOT,) | (INDIVIDUAL, CONTINUOUS) | Paired-diff slice test when distributional assumptions of the cluster-Wald path are uncomfortable (heavy tails, persistent shocks). |
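To make the difference between the two HAC rows concrete, here is a minimal pure-Python sketch of a HAC t-test on a series mean with either Bartlett (Newey-West style) or rectangular (Hansen-Hodrick style) lag weights. The function name hac_mean_test, the explicit lags argument, the normal reference distribution, and the fall-back-to-lag-0 clamp are assumptions made for illustration; they are not taken from factrix._stats, whose lag selection and clamping rules are not described on this page.

```python
import math


def hac_mean_test(series, lags, kernel="bartlett"):
    """HAC t-test of H0: mean == 0. Returns (t_stat, two_sided_p, variance_clamped)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]

    def autocov(j):
        # Biased (1/n) sample autocovariance at lag j.
        return sum(dev[t] * dev[t - j] for t in range(j, n)) / n

    long_run_var = autocov(0)
    for j in range(1, lags + 1):
        # Bartlett weights decay linearly; the rectangular kernel weights every lag by 1.
        weight = 1.0 - j / (lags + 1) if kernel == "bartlett" else 1.0
        long_run_var += 2.0 * weight * autocov(j)

    # The rectangular kernel does not guarantee a positive estimate.
    # Assumption for this sketch: fall back to the lag-0 variance and flag it.
    clamped = long_run_var <= 0.0
    if clamped:
        long_run_var = autocov(0)

    t_stat = mean / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # two-sided, standard normal
    return t_stat, p_value, clamped


# Illustrative per-date IC values (made up for the example).
ic = [0.04, 0.06, 0.01, 0.05, 0.03, 0.07, 0.02, 0.05]
t_nw, p_nw, _ = hac_mean_test(ic, lags=2, kernel="bartlett")
# forward_periods = 5 implies an MA(4) overlap, so the rectangular kernel uses 4 lags.
t_hh, p_hh, hh_clamped = hac_mean_test(ic, lags=4, kernel="rectangular")
```

The Bartlett weights keep the long-run variance non-negative by construction, which is one reason a Bartlett-kernel estimator is a natural default, while the rectangular kernel matches the MA(h-1) overlap exactly but may need a guard such as the clamp shown here.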
WaldTwoWayCluster is a reserved interface
The class ships in #153 so the (WALD_TWOWAY, P_WALD_TWOWAY) StatCode
pair has a stable home, but no function populates profile.stats
with P_WALD_TWOWAY until factor_decomposition lands. Calling
bhy(estimator=WaldTwoWayCluster()) against a profile produced
by evaluate() raises a missing-stat error pointing at the
precondition.
Picking an Estimator¶
| Question | Estimator |
|---|---|
| Default single-series significance on evaluate() output | NeweyWest |
| Overlapping forward returns (forward_periods > 1) on IC PANEL / FM PANEL | HansenHodrick |
| Slice contrast on per-date IC / FM (regime, sector, decile) | WaldNWCluster |
| Slice paired-diff on heavy-tailed / persistent series, distributional assumptions uncomfortable | BlockBootstrap |
| Raw asset-date panel inference (factor × slice interaction) | WaldTwoWayCluster (reserved) |
Pass an instance to a screening function to override the default
primary_p lookup:
import factrix as fx
from factrix.stats import HansenHodrick

# IC PANEL with overlapping forward_periods=5 → use HH instead of NW.
# `profiles` holds the FactorProfile objects produced by evaluate().
survivors = fx.multi_factor.bhy(profiles, estimator=HansenHodrick())
BlockBootstrap is the only Estimator whose constructor takes
configuration:
from factrix.stats import BlockBootstrap

# Stationary scheme, B=999, automatic block length, fixed seed.
est = BlockBootstrap(
    block_length="auto",   # Politis-White (2004) spectral plug-in
    n_resamples=999,
    scheme="stationary",   # or "fixed" for Künsch (1989) deterministic blocks
    rng_seed=42,           # None → system entropy; realised seed written to metadata
)
The scheme is metadata, not a separate StatCode — both schemes
emit P_BOOT. Two BlockBootstrap instances with different scheme
are distinct Estimators from a function's perspective; the function writes
the resolved scheme + block length + seed into
FactorProfile.metadata[StatCode.P_BOOT].
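As a rough illustration of what the stationary scheme with a fixed seed amounts to, the sketch below draws Politis-Romano resamples (geometric block lengths, circular wrap) and turns them into a two-sided bootstrap p-value for a series mean. The names stationary_bootstrap_indices and bootstrap_mean_p, the user-supplied mean_block_length (instead of the Politis-White automatic choice), and the recentring rule are assumptions for this example; the private factrix implementation may differ.

```python
import random


def stationary_bootstrap_indices(n, mean_block_length, rng):
    """One Politis-Romano resample: blocks with geometric lengths, wrapping circularly."""
    restart_prob = 1.0 / mean_block_length
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < restart_prob:
            idx.append(rng.randrange(n))      # start a new block at a random position
        else:
            idx.append((idx[-1] + 1) % n)     # extend the current block, wrapping at the end
    return idx


def bootstrap_mean_p(values, n_resamples=999, mean_block_length=5.0, seed=42):
    """Two-sided bootstrap p-value for H0: mean == 0, using recentred resampled means."""
    rng = random.Random(seed)                 # fixed seed makes the p-value reproducible
    n = len(values)
    observed = sum(values) / n
    exceed = 0
    for _ in range(n_resamples):
        idx = stationary_bootstrap_indices(n, mean_block_length, rng)
        boot_mean = sum(values[i] for i in idx) / n
        # Recentre on the observed mean so the resampled statistic mimics the null.
        if abs(boot_mean - observed) >= abs(observed):
            exceed += 1
    # The +1 terms keep the empirical p-value strictly positive.
    return (exceed + 1) / (n_resamples + 1)
```

Swapping the geometric restart for a deterministic block length would give a fixed-block scheme; the p-value construction is unchanged, which is consistent with both schemes sharing one StatCode.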
StatCode pairs
StatCode is the canonical naming for the scalar statistics the
procedures populate on profile.stats. The shape is
<KIND>_<ALGO> — KIND names the test statistic (T for Student-t /
asymptotic normal, J for Hansen J / χ², WALD for Wald χ²); ALGO
names the inference algorithm or SE family (NW, HH, NWCL,
DC, GMM for the generalized method of moments, …).
| Pair | What it is |
|---|---|
| (T_NW, P_NW) | Newey-West HAC t-statistic + p: the primary_p source, populated by default by the metric that evaluate() runs. |
| (T_HH, P_HH) | Hansen-Hodrick rectangular-kernel HAC t + p; emitted only when forward_periods > 1. |
| (WALD_NWCL, P_WALD_NWCL) | Cluster-Wald χ² + p under NW HAC + 1-way slice cluster; emitted by the slice-test functions. |
| (WALD_TWOWAY, P_WALD_TWOWAY) | Cluster-Wald χ² + p under two-way cluster on (date, asset); reserved. |
| (P_BOOT,) | Block-bootstrap empirical p; singleton, no parametric test statistic to publish. |
| (J_GMM, P_GMM) | Hansen (1982) GMM J-statistic + right-tail p (1 - χ²_df.cdf(J)) on a moment-condition system. Populated by factrix.stats.GMM (#191); see Estimator alternatives for usage. |
Diagnostic StatCodes (FACTOR_ADF_*, RESID_LJUNG_BOX_*,
EVENT_HHI_VALUE) follow a different naming axis; see
StatCode for the full
enum.
FDR / bootstrap utilities
Standalone helpers that don't go through the Estimator dispatch chain:
- bhy_adjust(p_values, fdr=0.05, *, n_tests=None): Benjamini-Yekutieli step-up rejection mask. Returns np.ndarray[bool] aligned to input order.
- bhy_adjusted_p(p_values, *, n_tests=None): per-hypothesis Benjamini-Hochberg-Yekutieli (BHY)-adjusted p-values (clipped at 1).
- stationary_bootstrap_resamples(values, n_bootstrap, …): Politis-Romano (1994) resamples; emits the value matrix directly.
- bootstrap_mean_ci(values, *, n_bootstrap, ci, …): stationary-bootstrap CI for a statistic (default mean; pass statistic= for Sharpe / median / skew).
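For readers who want to see the arithmetic behind the first two helpers, the following is a self-contained sketch of the Benjamini-Yekutieli step-up rule and the matching adjusted p-values. The names by_reject_mask and by_adjusted_p are stand-ins chosen for this example, the functions return plain lists rather than NumPy arrays, and treating n_tests as a replacement for the family size m is an assumption about what that parameter means.

```python
def harmonic_factor(m):
    """Benjamini-Yekutieli correction c(m) = sum_{k=1..m} 1/k."""
    return sum(1.0 / k for k in range(1, m + 1))


def by_reject_mask(p_values, fdr=0.05, n_tests=None):
    """Step-up rejection mask, aligned to input order."""
    m = len(p_values) if n_tests is None else n_tests
    c_m = harmonic_factor(m)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    # Step-up: find the largest rank k whose sorted p-value clears k / (m * c(m)) * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / (m * c_m) * fdr:
            cutoff_rank = rank
    mask = [False] * len(p_values)
    for i in order[:cutoff_rank]:
        mask[i] = True
    return mask


def by_adjusted_p(p_values, n_tests=None):
    """Adjusted p-values: running minimum of p * m * c(m) / rank from the largest p down, clipped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    c_m = harmonic_factor(m)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0                      # starting at 1 performs the clipping
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.001, 0.04, 0.03, 0.5]
mask = by_reject_mask(p, fdr=0.05)
adj = by_adjusted_p(p)
```

The two views agree by construction: a hypothesis is rejected at level fdr exactly when its adjusted p-value is at most fdr.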
The family-wise error rate (FWER) procedures (Holm step-down / Bonferroni / Romano-Wolf
bootstrap step-down) live as private helpers under
factrix._stats.multiple_testing; they ship in #153 and are
consumed by the slice-test functions in #176 — the function's
default-selection logic picks Holm for time-disjoint slices and
Romano-Wolf for date-shared slices.
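The two closed-form FWER procedures named above are short enough to sketch directly. The functions below are illustrative stand-ins written for this page, not the private helpers in factrix._stats.multiple_testing, and the Romano-Wolf bootstrap step-down is omitted because it needs the joint resampling distribution.

```python
def bonferroni_reject_mask(p_values, alpha=0.05):
    """Reject every hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject_mask(p_values, alpha=0.05):
    """Holm step-down: walk p-values in ascending order, stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    mask = [False] * m
    for step, i in enumerate(order):
        # The (step + 1)-th smallest p-value is compared with alpha / (m - step).
        if p_values[i] > alpha / (m - step):
            break
        mask[i] = True
    return mask


slice_p = [0.01, 0.02, 0.03]
bonf = bonferroni_reject_mask(slice_p)   # only p <= 0.05 / 3 survives
holm = holm_reject_mask(slice_p)         # thresholds relax to 0.05/3, 0.05/2, 0.05/1
```

Holm rejects everything Bonferroni rejects and sometimes more, while controlling FWER at the same level.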
Estimator protocol
Three-layer protocol: base Estimator for family-function selection
(bhy(profiles, estimator=...)); HACEstimator(Estimator) for
evaluate-time HAC-on-mean dispatch (AnalysisConfig.estimator=);
MomentEstimator(Estimator) for over-identifying-restriction tests
on a multivariate moment system (AnalysisConfig.moment_estimator=).
@runtime_checkable
class Estimator(Protocol):
    @property
    def name(self) -> str: ...
    @property
    def description(self) -> str: ...
    def applicable_to(self, scope: FactorScope, signal: Signal) -> bool: ...
    def emits_for(
        self,
        scope: FactorScope,
        signal: Signal,
        metric: Metric | None,
    ) -> StatCode: ...

@runtime_checkable
class HACEstimator(Estimator, Protocol):
    @property
    def min_periods(self) -> int: ...
    def compute(
        self,
        series: np.ndarray,
        *,
        forward_periods: int,
    ) -> InferenceResult: ...

@runtime_checkable
class MomentEstimator(Estimator, Protocol):
    @property
    def min_periods(self) -> int: ...
    def compute(
        self,
        moments: np.ndarray,   # (T, K) moment matrix
        *,
        forward_periods: int,  # overlap horizon; floors the LRCov bandwidth
    ) -> GMMResult: ...
NeweyWest and HansenHodrick implement HACEstimator; GMM
implements MomentEstimator; the slice-test instances
(WaldNWCluster / WaldTwoWayCluster / BlockBootstrap) implement
only the selection base since their compute paths are multivariate
(cross-asset / cross-slice) rather than mean-on-series or
moment-system. A slope-axis HAC sub-protocol (TS β / TS Dummy) is
tracked separately rather than overloading HACEstimator.compute.
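Because the protocols are runtime_checkable, conformance is structural: any object exposing the required members passes an isinstance check, with no inheritance needed. The sketch below demonstrates this with a simplified stand-in protocol. EstimatorLike, ToyBootstrap and NotAnEstimator are hypothetical names, and scope, signal, metric and the emitted code are plain strings here rather than the FactorScope / Signal / Metric / StatCode types.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class EstimatorLike(Protocol):
    """Simplified stand-in for the selection base; real signatures use factrix types."""

    @property
    def name(self) -> str: ...
    @property
    def description(self) -> str: ...
    def applicable_to(self, scope: str, signal: str) -> bool: ...
    def emits_for(self, scope: str, signal: str, metric: Optional[str]) -> str: ...


class ToyBootstrap:
    """Selection-only instance: satisfies the base protocol and has no compute()."""

    name = "toy_bootstrap"
    description = "Illustrative selection-only estimator"

    def applicable_to(self, scope, signal):
        return (scope, signal) == ("INDIVIDUAL", "CONTINUOUS")

    def emits_for(self, scope, signal, metric):
        return "P_BOOT"


class NotAnEstimator:
    """Missing description, applicable_to and emits_for, so the check fails."""

    name = "incomplete"


candidates = [ToyBootstrap(), NotAnEstimator()]
# Keep only objects that conform structurally and apply to the requested cell.
usable = [
    c.name
    for c in candidates
    if isinstance(c, EstimatorLike) and c.applicable_to("INDIVIDUAL", "CONTINUOUS")
]
```

Note that a runtime isinstance check only verifies that the members exist; it does not validate their signatures or return types.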
InferenceResult
HACEstimator.compute returns a frozen dataclass carrying the
inference layer of a FactorProfile (the procedure stitches descriptive
stats such as MEAN on top):
@dataclass(frozen=True, slots=True)
class InferenceResult:
    stat: float                    # t-statistic
    p: float                       # two-sided p-value
    stat_name: StatCode            # T_NW / T_HH / ...
    p_name: StatCode               # P_NW / P_HH / ...
    metadata: Mapping[str, Any]    # {"nw_lags": k} / {"kernel": ..., "variance_clamped": bool}
    warnings: frozenset[WarningCode]
GMMResult
MomentEstimator.compute returns a frozen dataclass for the
over-identifying-restriction test on a moment-condition system:
@dataclass(frozen=True, slots=True)
class GMMResult:
    j_stat: float                  # Hansen J statistic
    df: int                        # n_moments - n_params
    overid_p: float                # 1 - χ²_df.cdf(j_stat)
    n_moments: int
    n_params: int                  # 0 in current release (pure overid)
    metadata: Mapping[str, Any]    # {"weight_matrix_iter": 2, "weight_singular": False, ...}
    warnings: frozenset[WarningCode]
Unlike InferenceResult, GMMResult has no stat_name / p_name field: the type
itself implies the (StatCode.J_GMM, StatCode.P_GMM) pair, and cell
procedures key FactorProfile.stats accordingly.
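The relationship between j_stat, df and overid_p can be checked by hand. The sketch below computes the right-tail chi-square probability for integer degrees of freedom using the standard closed-form series, so it needs only the math module. chi2_sf and overid_p are hypothetical helper names for this example; the library presumably relies on a numerical chi-square routine instead.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability 1 - cdf(x) of a chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{r=1..(df-1)/2} x^(r-1/2) / (2r-1)!!
    total = 0.0
    term = math.sqrt(x)                   # r = 1 term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)           # next term divides by the next odd number
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def overid_p(j_stat, n_moments, n_params=0):
    """Return (df, right-tail p) for an over-identifying-restriction J statistic."""
    df = n_moments - n_params
    if df < 1:
        raise ValueError("the test needs more moment conditions than parameters")
    return df, chi2_sf(j_stat, df)


# Four moment conditions, no estimated parameters (pure over-identification).
df, p_gmm = overid_p(9.487729, n_moments=4)
```

With n_params equal to 0, df is simply the number of moment conditions, so every added moment raises the critical value the J statistic must exceed.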
get_estimator(name) -> Estimator
Registry lookup helper used by AnalysisConfig.from_dict to
rehydrate cfg.estimator / cfg.moment_estimator from their
serialized name strings. Raises UnknownEstimatorError if name
is not registered; the error message lists every available estimator.