Glossary
factrix names its dispatch axes after the structure of the data
(Scope × Signal) and the regression layout (Mode). The literature
this borrows from uses overlapping but not identical vocabulary —
this glossary maps the factrix terms to their nearest industry
equivalents and flags collisions where the same word means different
things in factrix vs Alphalens / Barra / standard panel-econometrics.
Scope axis — INDIVIDUAL vs COMMON¶
INDIVIDUAL¶
Per-asset factor values: each (date, asset_id) pair has its own
factor reading. The cross-section is the unit of inference (per-date
information coefficient (IC), FM λ, CAAR per event date). This is the regime most equity factor
research operates in.
Industry equivalents:
- Alphalens: "characteristic" / "factor". Alphalens panels are always per-asset signals, so the term is implicit.
- Standard cross-sectional asset pricing: "stock characteristic"
(size, book-to-market, momentum). Same regime; factrix routes these
through
Individual × Continuous. - MSCI / Axioma: "factor exposure" of asset
ito factork.
COMMON¶
A factor that does not vary across assets: on any given date every
asset_id carries the same factor value. The panel layout is
unchanged (still long-format (date, asset_id) rows) — scope is a
factor attribute, not a data shape. Macro factors (VIX, USD index,
term spread, monetary-policy shocks) and broadcast event dummies
(FOMC announcement days) live here.
Industry equivalents and collisions:
- Macro / global factor literature: "common factor" = a single
series that drives multiple assets. Matches factrix
COMMON. - Barra "common factor": a shared risk factor estimated from
cross-sectional regressions on a panel of asset characteristics
(Barra style / size / value / momentum factors). Estimation-driven,
per-asset latent loadings. Collides with factrix
COMMON, which is a given broadcast series with no estimation step on the factor itself. - Alphalens: no direct equivalent. Closest is "group-neutral" /
benchmark series, but Alphalens does not natively dispatch to a
COMMONcell.
When porting Barra terminology in: a Barra "common factor" with raw
cross-sectional loadings is a factrix INDIVIDUAL × CONTINUOUS
problem (the loadings are the per-asset characteristic). A Barra
"common factor" treated as a single series shared across assets is a
factrix COMMON × CONTINUOUS problem.
Signal axis — CONTINUOUS vs SPARSE¶
CONTINUOUS¶
Factor takes a real-valued reading on every observation in the panel. Most equity characteristics; macro factors; ML predictor scores.
SPARSE¶
Factor is non-zero only on event dates and zero elsewhere; the panel
encodes a discrete event arrival process. The general schema is
{0, R} where R is any real value (positive, negative, or any
magnitude); the simplest form is {0, 1} for a pure event flag.
See compute_caar's input-form table for the resulting estimator
distinction.
Industry equivalents:
- MacKinlay (1997) event-study vocabulary: factrix's
SPARSEcell is the event-study cell; the factor column is the announcement indicator. factrix'sfactor != 0filter is the event-window selector. - Alphalens: no direct equivalent — Alphalens assumes continuous characteristics. Event-study workflows live outside Alphalens.
Mode axis — PANEL vs TIMESERIES¶
Mode is derived from data at evaluate-time, not configured:
N ≥ 2→PANEL.N == 1→TIMESERIES.
PANEL¶
Multi-asset panel; cross-section participates in inference (per-date aggregation, cross-asset SE).
Industry equivalents: panel data; longitudinal data; pooled cross-section / time-series. The econometrics literature usually says "panel" with no further qualification.
TIMESERIES¶
Single-asset series. Time axis is the only sample axis; cross-section is degenerate.
Industry equivalents: time-series regression; single-asset ordinary least squares (OLS).
Note that factrix TIMESERIES mode does not mean Alphalens'
"time-series of cross-sectional means" (that is per-date aggregation
of a PANEL); the two collide on the word time-series and disagree
on what is aggregated. See
Timeseries-mode conventions for the
specific contracts that apply when factrix routes here.
Other terms¶
the metric evaluate() runs¶
Every (scope, signal, metric, mode) dispatch cell maps to exactly one
metric callable that evaluate() runs internally
— ic, fama_macbeth, caar, or ts_beta — and that metric's
p-value becomes FactorProfile.primary_p.
Defined by the
factrix._registry._DISPATCH_REGISTRY. Other metrics applicable to
the same cell are called separately via
list_metrics and the per-metric callable;
they do not enter the FactorProfile.
The cell-keyed reverse-index lives at Metric applicability § Cell to evaluate-metric.
Legacy synonyms (pre-#144 docs and source comments): procedure-canonical,
cell-canonical, primary metric, canonical metric. New content uses the
behavioral phrasing above; legacy occurrences are phased out via the ratchet
test in tests/test_docs_terminology_ratchet.py so the count can only decrease.
factor¶
The signal column. Same role as Alphalens "factor", Barra "exposure",
or generic "alpha score" / "predictor". The procedure-level
INPUT_SCHEMA consumes a column literally named factor; pass
evaluate(..., factor_col="alpha") to use a differently-named
column without renaming the panel first
(Panel schema § factor_col=).
forward_return¶
The forward return column consumed by every dispatch cell. factrix's
compute_forward_return(df, forward_periods=N) produces a
per-period return with t+1 entry:
Two non-textbook choices to internalize:
t+1entry, nott: signal attis computed from data up to and includingprice[t], so trading atprice[t]would assume same-bar execution. Entry att+1enforces a causal boundary and cleanly separates the return window from the BMP estimation window.- Divided by
N: result is expressed per period rather than as the cumulativeN-period return, so factor evaluations at differentforward_periodsare directly comparable. If you need the cumulativeN-period number, multiply byN.
Only simple returns are implemented; no log-return option. factrix
takes forward_return as input rather than computing it inside
evaluate() — attach it with
compute_forward_return before dispatch so
the horizon N is explicit and aligned with
AnalysisConfig.forward_periods.
Distinct from "spot return" (contemporaneous one-period return) and "realised return" (ex-post return for risk attribution).
signal_horizon (datasets only)¶
A property of the synthetic signal embedded in make_cs_panel /
make_event_panel — the horizon at which the generator's nominal IC
is realised. Aligning AnalysisConfig.forward_periods with
signal_horizon is the regime where the synthetic IC / drift is
calibrated; mismatched horizons induce IC decay (synthetic) and bias
(in TIMESERIES mode — see
Timeseries-mode conventions § Non-overlap).
non-overlapping returns¶
Sampling every h-th date so consecutive observations are
independent under the h-period forecasting null. factrix's
Individual × Continuous cell defaults to this for ic and caar;
Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) is the alternative on the full overlapping series. See
Statistical methods § HAC SE.
estimation_window¶
Per-asset pre-event sample used to fit the abnormal-return baseline
for bmp_test and corrado_rank_test. See
Metric applicability § estimation_window.
Multiple testing¶
FDR — False Discovery Rate¶
The expected proportion of false positives among rejected nulls, in
contrast to FWER (Family-Wise Error Rate, the probability of any
false positive). For a screening rule that rejects R of m
hypotheses and produces V false positives, FDR ≡ E[V / max(R, 1)].
factrix's primary screening primitive is BHY (Benjamini-Hochberg-Yekutieli),
which controls FDR at a user-chosen q under arbitrary dependence.
See Statistical methods and
bhy.
BHY — Benjamini-Hochberg-Yekutieli¶
Step-up FDR-controlling procedure that allows arbitrary dependence
between p-values — including violations of the positive regression
dependence on a subset (PRDS) condition required by Benjamini-Hochberg
1995, at the cost of a log(m) factor. Mathematically implements
Benjamini & Yekutieli (2001). factrix's
default multiplicity-correction method because financial p-values are
typically not independent. For the BH / BY / BHY naming convention, see
Statistical methods § Multiple-testing under dependence.
family¶
The set of hypotheses jointly entered into a single FDR / FWER
controlling procedure. Once a hypothesis joins a family, the
procedure's FDR claim only holds within that family. Family choice
is a contract decision: re-running BHY on a filtered subset of
survivors does not preserve FDR ≤ q. See
Cross-function reference § expand_over
for the sample-restriction vs hypothesis-dimension split.
Survivors¶
Result type from the screening functions (bhy, partial_conjunction,
bhy_hierarchical). Carries the post-correction adjusted q-value
(adj_q) per identity plus a boolean survivor[i] ↔ adj_q[i] ≤ q
duality so downstream functions (compare(survivors)) can render
leaderboards without re-applying the threshold.