Input contract — the panel must satisfy the four-column floor documented in Panel schema.

factrix.evaluate

evaluate(panel: Any, config: AnalysisConfig | None = None, /, *, factor_col: str = 'factor') -> FactorProfile

Evaluate one factor against its forward returns and return a FactorProfile.

The profile carries primary_p (the headline p-value for downstream false discovery rate (FDR) control), the cell-specific statistics, sample-size diagnostics, warnings, and the identity / context tuple used by multi-factor aggregators (bhy / partial_conjunction / bhy_hierarchical).

All factrix-raised errors inherit from FactrixError.

Dispatch lore — cell schema, Mode, multi-factor cost

Dispatch is explicit. No auto-fallback when the panel shape does not match the cell. The one exception: Common × Continuous at N == 1 auto-routes to the TIMESERIES single-series path (profile.mode == "TIMESERIES") so single-asset macro factors still flow through.

Required columns per cell. Every cell floors its INPUT_SCHEMA at the same four columns; optional columns activate additional standalone metrics and short-circuit gracefully (NaN + reason) when absent.

| Cell | Required | Optional column → enables |
| --- | --- | --- |
| Individual × Continuous (ic, fama_macbeth) | date, asset_id, factor, forward_return | market_cap (or any name passed as weight_col=) → quantile_spread_vw value-weighting |
| Individual × Sparse (event studies) | date, asset_id, factor, forward_return | price → event_around_return, mfe_mae_summary (degrade gracefully if absent) |
| Common × Continuous (broadcast macro factor) | date, asset_id, factor, forward_return | |
| Common × Sparse (broadcast event dummy) | date, asset_id, factor, forward_return | |

forward_return is part of the input contract — attach it via compute_forward_return before the call so the horizon is explicit and aligned with config.forward_periods.
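A pre-flight check for the four-column floor can be sketched in plain Python. The helper name missing_floor_columns is hypothetical and not part of factrix; it inspects column names only and says nothing about dtypes or horizon alignment.

```python
# Hypothetical helper, not part of factrix: checks column names only.
REQUIRED_COLUMNS = ("date", "asset_id", "factor", "forward_return")


def missing_floor_columns(columns, factor_col="factor"):
    """Return the required column names absent from `columns`, in floor order."""
    # The signal column may carry a custom name (see factor_col=).
    required = [factor_col if name == "factor" else name for name in REQUIRED_COLUMNS]
    present = set(columns)
    return [name for name in required if name not in present]


print(missing_floor_columns(["date", "asset_id", "factor"]))
# → ['forward_return']
print(missing_floor_columns(["date", "asset_id", "alpha", "forward_return"], factor_col="alpha"))
# → []
```

Running a check like this before evaluate makes a missing forward_return visible early, which is the most likely omission when compute_forward_return has not been applied yet.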

Mode — PANEL vs TIMESERIES. Derived at evaluate-time from N = panel["asset_id"].n_unique() and surfaced on profile.mode:

| profile.mode | When | Inference |
| --- | --- | --- |
| "PANEL" | N ≥ 2 cross-sectional / event cells | per-date statistic → time-series mean with Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) |
| "TIMESERIES" | Common × Continuous with N == 1 | single-series ordinary least squares (OLS) with plain SE; HAC only on stage-2 aggregation |

Full conventions: Timeseries-mode conventions. Sample-guard contract: Panel vs timeseries.
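The Mode rule above can be sketched as a small pure-Python function. The name derive_mode and the string arguments for scope and signal are assumptions made for illustration; what factrix does for other cells at N == 1 is only described here through ModeAxisError, so the sketch raises a plain ValueError as a stand-in.

```python
# Hypothetical sketch of the documented Mode rule; not factrix source code.
def derive_mode(asset_ids, scope, signal):
    """Map the number of distinct assets to 'PANEL' or 'TIMESERIES'."""
    n_assets = len(set(asset_ids))
    if n_assets == 0:
        raise ValueError("panel has no assets")
    if n_assets >= 2:
        return "PANEL"
    # N == 1: only Common x Continuous is documented to auto-route.
    if (scope, signal) == ("common", "continuous"):
        return "TIMESERIES"
    # Stand-in for the library's own error on unsupported cells.
    raise ValueError(f"no documented N == 1 route for {scope} x {signal}")


print(derive_mode(["A", "B", "A"], "individual", "continuous"))
# → PANEL
print(derive_mode(["SPX", "SPX", "SPX"], "common", "continuous"))
# → TIMESERIES
```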

Multi-factor cost. Each call repeats the per-date cross-section work (sort / group-by / rank / Herfindahl-Hirschman index (HHI)) on its own, so cost scales as O(n_factors × per_date_cost). There is no shared-pass primitive; bhy controls FDR but does not reduce the per-signal evaluation cost.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| panel | Any | Long-format panel satisfying the four-column floor (date, asset_id, factor, forward_return). See Panel schema for the canonical contract and dtype semantics. | required |
| config | AnalysisConfig \| None | Validated AnalysisConfig selecting the dispatch cell (Scope × Signal × Metric). Construct via one of the four factories on the class. | None |
| factor_col | str | Name of the signal column on panel (default "factor"). Renamed to "factor" internally before dispatch. Looping over candidates with different factor_col= values is the canonical multi-factor pattern. | 'factor' |

Returns:

| Type | Description |
| --- | --- |
| FactorProfile | FactorProfile with primary_p, stats, warnings, info_notes, mode, n_obs, n_assets, plus identity = (factor_id, forward_periods) and context = {universe_id, regime_id, ...}. |

Raises:

| Type | Description |
| --- | --- |
| MissingConfigError | evaluate(panel) called without an AnalysisConfig. Recovery: call suggest_config. |
| IncompatibleAxisError | config axes form an illegal cell. |
| ModeAxisError | Legal cell has no procedure under the derived Mode. Carries .suggested_fix: AnalysisConfig \| None with the nearest-legal config. |
| InsufficientSampleError | T below the procedure's MIN_PERIODS_HARD floor. Carries .actual_periods / .required_periods. |
| ValueError | factor_col not present on panel, or both "factor" and factor_col present with differing values (ambiguous which is the signal; drop the unused column). |
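The two ValueError conditions for factor_col can be illustrated with a toy resolver over a dict of column lists. The function resolve_factor_col is hypothetical and only mirrors the rule as stated; how factrix compares the two columns internally is not shown in this reference.

```python
# Hypothetical sketch of the factor_col rule; `columns` maps name -> list of values.
def resolve_factor_col(columns, factor_col="factor"):
    """Return the signal values, raising ValueError in the two documented cases."""
    if factor_col not in columns:
        raise ValueError(f"{factor_col!r} not present on panel")
    if factor_col != "factor" and "factor" in columns:
        # Both columns exist: ambiguous only when their values differ.
        if columns["factor"] != columns[factor_col]:
            raise ValueError(
                "both 'factor' and factor_col present with differing values; "
                "drop the unused column"
            )
    return columns[factor_col]


panel = {"alpha": [0.1, 0.2], "forward_return": [0.01, -0.02]}
print(resolve_factor_col(panel, factor_col="alpha"))
# → [0.1, 0.2]
```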

Examples:

Single-factor inference on a cross-sectional panel:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> raw = fx.datasets.make_cs_panel(n_assets=100, n_dates=250)
>>> panel = compute_forward_return(raw, forward_periods=5)
>>> cfg = fx.AnalysisConfig.individual_continuous(forward_periods=5)
>>> profile = fx.evaluate(panel, cfg)

Non-default signal column name:

>>> panel_renamed = panel.rename({"factor": "alpha"})
>>> profile = fx.evaluate(panel_renamed, cfg, factor_col="alpha")

Multi-factor screening with FDR — see factrix.multi_factor.bhy.
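For intuition about the FDR stage, here is a textbook Benjamini-Yekutieli step-up procedure applied to a list of p-values such as the primary_p values collected from several profiles. This assumes bhy denotes a Benjamini-Hochberg-Yekutieli style correction; the function by_reject is a hypothetical stand-in and does not reproduce the signature or behaviour of factrix.multi_factor.bhy.

```python
# Textbook Benjamini-Yekutieli step-up; a sketch, not the factrix implementation.
def by_reject(p_values, alpha=0.05):
    """Return one reject flag per p-value, in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic-sum correction that makes the procedure valid under dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / (m * c_m).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


primary_ps = [0.001, 0.20, 0.004, 0.03]  # made-up p-values for illustration
print(by_reject(primary_ps))
# → [True, False, True, False]
```

With four candidates the thresholds are 0.006, 0.012, 0.018 and 0.024, so the third-smallest p-value (0.03) fails and only the two smallest are rejected.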


Use cases

  • Single-factor significance


    One panel + one AnalysisConfig → one FactorProfile carrying primary_p and the cell-specific statistics.

  • Batch screening with false discovery rate (FDR)


    Loop evaluate over candidate signal columns and feed the resulting list of profiles to bhy for false-discovery-rate control. See Batch screening.

  • Cross-cell apples-to-apples


    Swap the AnalysisConfig factory to compare information coefficient (IC) rank-ordering against Fama-MacBeth λ on the same panel, or individual-asset factors against broadcast macro factors. Return shape is identical across cells.

  • TIMESERIES auto-routing


    Common × Continuous with N == 1 auto-routes to single-series ordinary least squares (OLS) with Newey-West heteroskedasticity-and-autocorrelation-consistent (HAC) SE, so single-asset macro factors flow through the same entry point without a parallel code path.

Worked example — single-factor smoke test

Synthetic panel → evaluate → read primary_p + diagnose()

A full runnable example that complements the doctest snippets in Examples above, with realistic console output and a diagnose() dump.

import factrix as fx
from factrix.preprocess import compute_forward_return

raw   = fx.datasets.make_cs_panel(
    n_assets=100, n_dates=500, ic_target=0.08, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)

cfg     = fx.AnalysisConfig.individual_continuous(
    metric=fx.Metric.IC, forward_periods=5,
)
profile = fx.evaluate(panel, cfg)

print("primary_p =", round(profile.primary_p, 4))
# → primary_p = 0.0

print(profile.diagnose())
# {'identity': {'factor_id': 'factor', 'forward_periods': 5},
#  'context': {},
#  'cell':     {'scope': 'individual', 'signal': 'continuous',
#               'metric': 'ic', 'mode': 'panel'},
#  'n_obs':    494, 'n_pairs': 49400, 'n_periods': 494, 'n_assets': 100,
#  'primary_p':     2.13e-40,
#  'primary_stat':  14.60,
#  'primary_stat_name': 't_nw',
#  'warnings': [], 'info_notes': [],
#  'stats':    {'mean': 0.0722, 't_nw': 14.60, 'p_nw': 2.13e-40},
#  'metadata': {'t_nw': {'nw_lags': 5}, 'p_nw': {'nw_lags': 5}}}

Config recipes — one per dispatch cell

Minimum-viable AnalysisConfig for each of the four cells. The evaluate(panel, cfg) call site is identical; only cfg changes.

Rank predictive ordering — Spearman IC + Newey-West (NW) HAC.

cfg = fx.AnalysisConfig.individual_continuous(
    metric=fx.Metric.IC, forward_periods=5,
)
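What the IC recipe computes can be sketched without the library: a Spearman rank correlation per date, then a Newey-West t-statistic on the resulting IC series. The helper names, the Bartlett kernel, and the caller-supplied lag count are assumptions for illustration; the exact estimator and lag rule used by factrix are not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; constant inputs are not handled in this sketch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_ic(factor, forward_return):
    """Cross-sectional rank correlation for a single date."""
    return pearson(average_ranks(factor), average_ranks(forward_return))


def newey_west_t(series, lags):
    """t-statistic of the mean using a Bartlett-kernel long-run variance."""
    t = len(series)
    mean = sum(series) / t
    dev = [x - mean for x in series]
    lrv = sum(d * d for d in dev) / t  # lag-0 autocovariance
    for lag in range(1, lags + 1):
        gamma = sum(dev[i] * dev[i - lag] for i in range(lag, t)) / t
        lrv += 2.0 * (1.0 - lag / (lags + 1.0)) * gamma
    return mean / math.sqrt(lrv / t)


print(spearman_ic([0.1, 0.4, 0.2, 0.9], [0.01, 0.03, 0.02, 0.05]))
# → 1.0
```

Applying spearman_ic to each date's cross-section yields one IC per date; newey_west_t on that series gives the HAC-adjusted test of whether the mean IC differs from zero.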

Unit-of-exposure premium — Fama-MacBeth λ.

cfg = fx.AnalysisConfig.individual_continuous(
    metric=fx.Metric.FM, forward_periods=5,
)
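The Fama-MacBeth λ in the recipe above follows the textbook two-step idea: fit a cross-sectional regression of forward_return on factor for each date, then average the slopes over time. The sketch below uses hypothetical helper names and a plain dict as input, and it omits the standard-error treatment; it illustrates the estimator's shape rather than factrix's implementation.

```python
# Textbook two-step Fama-MacBeth point estimate; a sketch, not factrix source.
def cross_section_slope(factor, forward_return):
    """OLS slope (with intercept) of forward_return on factor for one date."""
    n = len(factor)
    mx = sum(factor) / n
    my = sum(forward_return) / n
    sxx = sum((x - mx) ** 2 for x in factor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(factor, forward_return))
    return sxy / sxx


def fama_macbeth_lambda(panel_by_date):
    """Average the per-date slopes; input maps date -> (factor, forward_return)."""
    slopes = [
        cross_section_slope(factor, ret)
        for _, (factor, ret) in sorted(panel_by_date.items())
    ]
    return sum(slopes) / len(slopes), slopes


# Made-up two-date panel with three assets per date.
toy = {
    "2024-01-02": ([1.0, 2.0, 3.0], [0.01, 0.02, 0.03]),
    "2024-01-03": ([1.0, 2.0, 3.0], [0.03, 0.06, 0.09]),
}
lam, per_date = fama_macbeth_lambda(toy)
print(round(lam, 4), [round(s, 4) for s in per_date])
# → 0.02 [0.01, 0.03]
```

Unlike the rank-based IC, the slope is expressed in return units per unit of factor exposure, which is why the two metrics answer different questions on the same panel.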

Event study with sparse {0, R} event triggers (R is any real magnitude; {0, 1} for a pure event flag is the simplest form). Attach a price column on the panel to also get event_around_return / mfe_mae_summary in the profile.

cfg = fx.AnalysisConfig.individual_sparse(forward_periods=5)

Broadcast macro factor (e.g. VIX). With N == 1 on the panel, evaluate auto-routes to single-series OLS with NW HAC SE (profile.mode == "TIMESERIES").

cfg = fx.AnalysisConfig.common_continuous(forward_periods=5)

Broadcast event dummy (FOMC, index rebalance).

cfg = fx.AnalysisConfig.common_sparse(forward_periods=5)

Per-cell required / optional columns and the PANEL ↔ TIMESERIES Mode derivation are documented in the Dispatch lore admonition above.

Next steps

  • Batch screening guide


    Wires evaluate into the multi-factor FDR pipeline: loop over candidates while preserving identity / context; choose between bhy / partial_conjunction / bhy_hierarchical; mixed-cell batches; primary_p vs stats at the FDR stage.

    Read the guide →

  • Panel schema


    New to the input contract? Start here for the four-column floor (date, asset_id, factor, forward_return), dtype semantics, and optional columns that activate extra metrics.

    Read the schema →

See also

  • Timeseries-mode conventions


    The N == 1 auto-routing rules and SE conventions for single-series paths.

    reference/ts-mode-conventions →

  • Panel vs timeseries sample guard


    Sample-size floors and the InsufficientSampleError recovery path.

    guides/panel-timeseries →

  • run_metrics — descriptive twin


    Computes the same statistics but makes no FDR claim. Use when you want the numbers without the inference framing.

    api/run-metrics →