Where factrix fits

This page expands the design philosophy, walks through the pipeline and internals, draws scope boundaries, compares against same-purpose peers, shows adjacent-tool integration, and discloses honest weaknesses.

1. What factrix is

factrix is a factor inference surface: given a candidate factor and a forward return, it answers the question "is the predictive power real?" and returns a structured profile of evidence rather than applying one uniform formula to every factor.

Three factor types each get a mainstream test fitted to their data-generating process:

Cross-sectional factors — Information Coefficient (IC) and Fama-MacBeth (FM), both with Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors and a Hansen-Hodrick lag floor for overlapping forward returns.
Event factors — Cumulative Average Abnormal Return (CAAR) on the event-date series with calendar-aware non-overlap inference, plus overlap and clustering diagnostics for crowded event calendars.
Common factors — a factor whose realisation is shared across the cross-section in a given period (Fama-French market / size / value, or a macro variable). factrix tests these as a panel exposure across the asset cross-section (n_assets >= 2); a single asset has no cross-section to aggregate over, so these metrics raise on n_assets == 1 data.
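
As a concrete reading of the cross-sectional case, the sketch below computes a per-date rank IC (the Spearman correlation between factor value and forward return) from long-format rows, using only the standard library. It assumes the rank variant of IC, which this page does not confirm as the factrix default; the helper names `average_ranks`, `pearson`, and `rank_ic_by_date` are illustrative and are not factrix API.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_ic_by_date(rows):
    """rows: iterable of (date, asset_id, factor, forward_return) tuples."""
    by_date = {}
    for date, _asset, factor, fwd in rows:
        factors, returns = by_date.setdefault(date, ([], []))
        factors.append(factor)
        returns.append(fwd)
    return {
        d: pearson(average_ranks(f), average_ranks(r))
        for d, (f, r) in sorted(by_date.items())
    }


rows = [
    ("2024-01-02", "A", 0.1, 0.01),
    ("2024-01-02", "B", 0.5, 0.03),
    ("2024-01-02", "C", 0.9, 0.02),
    ("2024-01-03", "A", 0.2, -0.01),
    ("2024-01-03", "B", 0.4, 0.00),
    ("2024-01-03", "C", 0.8, 0.04),
]
ic_series = rank_ic_by_date(rows)
```

The output is one IC value per date. Inference on the mean of that series is where the HAC standard errors and the lag floor come in.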

Each type also runs a multi-metric diagnostic battery — never collapsed into a single score. This is a deliberate design choice, not an oversight. A composite score becomes its own optimisation target the moment it ships (Goodhart 1984), and weighted aggregation across heterogeneous nulls implicitly prices each null (DeMiguel-Garlappi-Uppal 2009; Harvey 2017). See design notes §1 and §7 for the full citation chain.

2. Where factrix sits

2.1 Ecosystem pipeline

factrix is Stage 1 of a multi-stage workflow. It is not a competitor to portfolio construction, backtesting, or execution tools — it sits upstream of them and produces the input they assume.

flowchart LR
    DATA[Raw data] --> CONSTR[Factor construction<br/>zipline Pipeline · self-roll]
    CONSTR --> FX[<b>factrix inference</b><br/>Stage 1 — kill fakes]
    FX --> PORT[Strategy construction<br/>skfolio · PyPortfolioOpt · riskfolio-lib]
    PORT --> BT[Backtest<br/>vectorbt · zipline-reloaded · bt]
    BT --> LIVE[Live trading<br/>lumibot · nautilus_trader]
    classDef here fill:#3670A0,color:#fff,stroke:#234060,stroke-width:2px;
    class FX here

2.2 factrix internals

Inside factrix the call graph is small. Three inputs enter evaluate(): long-format data (the (date, asset_id, factor, forward_return) floor), a metrics={label: metric()} dict, and factor_cols. Each metric instance carries a MetricSpec whose cell (FactorScope × FactorDensity × DataStructure) decides whether it applies to the detected data shape. The DAG executor runs batchable stage-1 producers once across the factor batch and per-factor consumers once per factor, then returns a list[EvaluationResult], one per factor, each holding a read-only Mapping[str, MetricResult] of per-metric outputs plus a flat list[Warning]. The list flows into multi_factor.bhy(results, metrics=[...]) for cross-test false discovery rate (FDR) control, which returns the surviving factors.

flowchart LR
    DATA[Long data<br/>date · asset_id · factor · forward_return] --> EVAL[evaluate<br/>metrics · factor_cols]
    METRICS[metrics dict<br/>label → metric instance] --> EVAL
    EVAL --> DISP{Cell-routed<br/>applicability}
    DISP -->|individual · dense| CS[IC · FM beta<br/>+ diagnostics]
    DISP -->|individual · sparse| EV[CAAR<br/>+ diagnostics]
    DISP -->|common · dense| CO[common_beta<br/>+ diagnostics]
    DISP -->|common · sparse| DUM[CAAR<br/>+ event diagnostics]
    CS --> RES[EvaluationResult<br/>metrics · warnings · cell]
    EV --> RES
    CO --> RES
    DUM --> RES
    RES --> BHY[multi_factor.bhy<br/>FDR within family]
    BHY --> SURV[Surviving factors]

The dispatch arrow is the single line that distinguishes factrix from peers that apply one uniform formula across factor types (see §4). For the field-order walk of the EvaluationResult shown in the graph — metrics, warnings, cell, and the rest — see Reading results.
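
A minimal sketch of cell-routed applicability, reduced to two of the three axes (scope and density). Everything in it is hypothetical: the `Scope`, `Density`, and `Spec` names, the detection heuristics (identical values across assets on every date means common; mostly-zero values means sparse), and the 0.5 threshold are assumptions made for illustration. They do not describe factrix's actual MetricSpec or its shape detection.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    INDIVIDUAL = "individual"
    COMMON = "common"


class Density(Enum):
    DENSE = "dense"
    SPARSE = "sparse"


@dataclass(frozen=True)
class Spec:
    name: str
    cells: frozenset  # (Scope, Density) pairs the metric applies to


# Routing table mirroring the four arrows in the flowchart above.
SPECS = [
    Spec("ic", frozenset({(Scope.INDIVIDUAL, Density.DENSE)})),
    Spec("fm_beta", frozenset({(Scope.INDIVIDUAL, Density.DENSE)})),
    Spec("caar", frozenset({(Scope.INDIVIDUAL, Density.SPARSE),
                            (Scope.COMMON, Density.SPARSE)})),
    Spec("common_beta", frozenset({(Scope.COMMON, Density.DENSE)})),
]


def detect_cell(rows):
    """rows: (date, asset_id, factor) triples for one factor column."""
    by_date = {}
    for date, _asset, value in rows:
        by_date.setdefault(date, []).append(value)
    # Assumed heuristic: a common factor has one shared value per date.
    common = all(len(set(vals)) == 1 for vals in by_date.values())
    # Assumed heuristic: an event-style factor is zero most of the time.
    flat = [v for vals in by_date.values() for v in vals]
    sparse = sum(v == 0 for v in flat) / len(flat) > 0.5
    return (Scope.COMMON if common else Scope.INDIVIDUAL,
            Density.SPARSE if sparse else Density.DENSE)


def applicable(rows):
    cell = detect_cell(rows)
    return [spec.name for spec in SPECS if cell in spec.cells]


dense_individual = [("d1", "A", 0.1), ("d1", "B", 0.5),
                    ("d2", "A", 0.3), ("d2", "B", -0.2)]
common_dense = [("d1", "A", 1.2), ("d1", "B", 1.2),
                ("d2", "A", -0.4), ("d2", "B", -0.4)]
sparse_individual = [("d1", "A", 0.0), ("d1", "B", 1.0),
                     ("d2", "A", 0.0), ("d2", "B", 0.0)]
```

The point of the design is that the metric declares where it applies and the data decides which cell it is in, so the caller never has to pick the test by hand.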

3. Out of scope (use these libraries instead)

What factrix deliberately does not do, and the canonical tool for each. This is a commitment, not a TODO. To expand factrix scope, update ARCHITECTURE.md Invariants first.

Out of scope	Use instead
Portfolio optimisation (MVO / HRP / risk parity)	skfolio, PyPortfolioOpt, riskfolio-lib, cvxpy
ML signal layer	xgboost + shap
Regime detection methodology (HMM / threshold)	hmmlearn, self-roll
Structural break detection (Chow / Bai-Perron)	ruptures
GARCH / wild-bootstrap SE	arch
Persistent-predictor auto-correction (IVX / Stambaugh)	arch, R `ivx` (factrix flags via augmented Dickey-Fuller (ADF); does not auto-correct)
Backtest / execution / slippage / margin	vectorbt, bt, zipline-reloaded, backtrader
Intraday / HFT (tick-level)	dedicated tooling
Cross-factor signal combiner	self-roll, scikit-learn
Composite factor scoring across dimensions	AlphaEval (different design philosophy — see §4.4)
Deflated / probabilistic / Haircut Sharpe	mlfinlab (commercial); roadmap gap for factrix — see §7
Cross-sectional factor construction DSL	zipline-reloaded Pipeline; factrix consumes Pipeline output
Returns-level tear-sheet (downstream of factrix)	pyfolio-reloaded

3.1 Rationale for the controversial rows

Three rows surprise readers most often. Their rationale is anchored in design notes rather than restated here.

Composite factor scoring — rejected because a single weighted score becomes its own target the instant it ships (Goodhart 1984), and weighted aggregation across heterogeneous nulls implicitly prices each null without disclosing the price. factrix exposes per-metric pass/fail and keeps the user in the inference loop. See design notes §1.

ML signal layer — out of scope as a deliberate boundary. The signal-generation problem is well served by xgboost + shap, and folding model fit into factrix would change the page's hero claim from "inference on a hypothesised factor" to "inference on a fitted model" — those need different statistical machinery (cross-validation schemes, leakage tests). qlib already covers the integrated pipeline; we leave that branch to qlib.

Persistent-predictor flagging only, not auto-correction — when the cross-sectional or time-series predictor is highly persistent (ADF p-value above the configured threshold; default 0.10), factrix raises PERSISTENT_REGRESSOR and notes that the beta estimate may carry Stambaugh (1999) bias. It does not silently swap in IVX (Phillips-Magdalinos), Stambaugh-correction, or sign-restricted inference, because the right correction depends on the researcher's economic prior — IVX assumes a near-unit-root predictor; Stambaugh requires a specified innovation-correlation sign. Auto-correcting would mask the modelling choice. Reach for arch or R ivx when the flag fires.
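
To give a sense of why the flag matters, the sketch below evaluates the first-order Stambaugh (1999) bias of an OLS predictive slope, (σ_uv / σ_v²) × E[ρ̂ − ρ], using the Kendall approximation E[ρ̂ − ρ] ≈ −(1 + 3ρ)/T. The function name and the input values are illustrative assumptions; factrix itself only raises the flag and does not compute this quantity.

```python
def stambaugh_bias(rho, n_obs, cov_uv, var_v):
    """Approximate bias of the OLS slope in r[t+1] = a + b*x[t] + u[t+1],
    where x[t+1] = c + rho*x[t] + v[t+1].

    cov_uv is the covariance between the return shock u and the predictor
    shock v; var_v is the variance of v.
    """
    rho_bias = -(1.0 + 3.0 * rho) / n_obs  # small-sample bias of the AR(1) estimate
    return (cov_uv / var_v) * rho_bias


# Illustrative numbers only: a very persistent predictor observed for 240
# periods, with return and predictor shocks strongly negatively correlated.
bias = stambaugh_bias(rho=0.98, n_obs=240, cov_uv=-0.9e-4, var_v=1.0e-4)
```

With negatively correlated shocks the bias is positive, so the estimated beta overstates predictability. The size depends on ρ and on the shock covariance, which is part of why the correction is left to the researcher.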

4. Same-purpose peers

Six peers occupy the factor-evaluation / hypothesis-test space. Each subsection follows the same shape: positioning, where the peer wins, where factrix wins, and the user profile that should pick the peer instead. Side-by-side code snippets are not included yet; they will land once the migration examples are validated against the same input panel both ways.

4.1 alphalens-reloaded

Positioning — alphalens-reloaded is the canonical pandas tear-sheet for cross-sectional factors; the get_clean_factor_and_forward_returns → create_full_tear_sheet flow is the vocabulary most working quants recognise on sight. factrix targets the same hypothesis-test slot but extends past CS-only and past pandas-bound performance.

Where alphalens wins

Tear-sheet vocabulary every quant recognises; fastest path to a publishable chart pack from a notebook.
pandas-native — drops into existing notebooks without a polars conversion step.
Six years of community examples and Stack Overflow answers.

Where factrix wins

IC inference uses NW HAC with a Hansen-Hodrick lag floor for overlapping forward returns; alphalens applies a naive scipy.stats.ttest_1samp on the IC time series (source), which typically understates the standard error, and so overstates significance, when forward windows overlap.
Multiple-testing correction (Benjamini-Hochberg-Yekutieli (BHY)) is built into the screening surface; alphalens has no batch-level FDR control by design.
Type-routed dispatch — alphalens is CS-only by design; factrix also covers event and common-factor hypotheses without the user re-implementing the test machinery.

When to pick alphalens instead — you have a single CS factor, your existing toolchain is pandas-only, and the tear-sheet vocabulary matters more than the inference rigour.
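
The gap between the two approaches to IC inference can be sketched with the standard library. The block compares a naive one-sample t-statistic with a Newey-West (Bartlett kernel) t-statistic whose lag count is floored at `forward_periods - 1`, the Hansen-Hodrick overlap length. The automatic bandwidth rule `floor(4 * (T/100)^(2/9))`, the way the floor is combined with it, and the toy IC series are assumptions for illustration, not a description of factrix internals.

```python
import math


def autocov(x, lag):
    """Sample autocovariance with a 1/n divisor (keeps the Bartlett sum non-negative)."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n)) / n


def naive_t(x):
    """Plain one-sample t-statistic that treats observations as independent."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / (n - 1)
    return m / math.sqrt(var / n)


def nw_t(x, forward_periods):
    """Newey-West t-statistic of the mean, with a Hansen-Hodrick lag floor."""
    n = len(x)
    auto_lag = math.floor(4 * (n / 100) ** (2 / 9))  # assumed rule of thumb
    lags = max(auto_lag, forward_periods - 1)        # overlap-length floor
    long_run_var = autocov(x, 0)
    for lag in range(1, lags + 1):
        weight = 1 - lag / (lags + 1)                # Bartlett kernel weight
        long_run_var += 2 * weight * autocov(x, lag)
    return (sum(x) / n) / math.sqrt(long_run_var / n), lags


# Toy IC series: positive mean plus a slowly varying component, standing in
# for the autocorrelation that overlapping 10-period forward returns induce.
ic = [0.02 + 0.05 * math.sin(2 * math.pi * t / 40) for t in range(120)]

t_naive = naive_t(ic)
t_hac, lags = nw_t(ic, forward_periods=10)
```

On a positively autocorrelated series like this one, the HAC statistic is smaller than the naive one because the long-run variance counts the covariance between neighbouring observations that the naive test ignores.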

4.2 qlib factor layer

Positioning — qlib is a full alpha → model → backtest → live platform. Its factor-evaluation surface (qlib.contrib.eva.alpha) is a thin utility under that platform, not the product.

Where qlib wins

Industrial-scale data layer with caching and an integrated backtest engine — pick qlib when you want one tool for the whole pipeline.
Alpha158 / Alpha360 baselines and RD-Agent integration give an ML-first research workflow.
Largest active community among peers in this list.

Where factrix wins

qlib's calc_ic / calc_all_ic apply uniform IC + Rank-IC across every factor regardless of type (source). factrix dispatches IC, FM, CAAR, or ts-β by factor type.
factrix is decoupled from any data store, signal-mining DSL, or backtest engine. You can drop it into an existing pipeline; qlib expects you to adopt its data layout.
factrix ships per-metric NW HAC, BHY FDR, and persistent-predictor flagging as first-class outputs of evaluate(); in qlib these live in scattered helper functions or are absent.

When to pick qlib instead — you want one integrated platform covering data → factor → ML model → backtest → live, and the opinionated qlib data store is acceptable.

4.3 linearmodels

Positioning — linearmodels (Kevin Sheppard, statsmodels core) is the reference Python implementation of panel econometrics: HAC kernels (Bartlett / Parzen / QS with auto-bandwidth), clustered SE, and a correctly-implemented Fama-MacBeth second-stage SE. It is a primitive, not a framework.

Where linearmodels wins

Best-in-class HAC and clustered SE coverage; correct FM second-stage variance (most homebrew loops are wrong by a constant factor).
Maintained by an econometrics-credible author with frequent releases.

Where factrix wins

linearmodels is a panel-econometrics toolkit; it has no factor tear-sheet, no IC surface, no event-study path, no batch multiple-testing layer. The user must already have a panel and know which test to run.
factrix uses arch for HAC kernels (Kevin Sheppard maintains both arch and linearmodels; their HAC implementations are functionally equivalent) and treats FM as one routed metric among several. You are not asked to assemble the workflow.

When to pick linearmodels instead — you only need correct Fama-MacBeth standard errors on a panel you have already constructed, and you do not need IC, CAAR, BHY, or any of the inference surfaces.
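
For orientation, the two-pass logic behind a Fama-MacBeth estimate can be sketched in a few lines: one cross-sectional regression per date, then inference on the time series of slopes. This is a deliberately plain version that uses the i.i.d. standard error of the slope series; it is neither the linearmodels implementation nor the factrix one, and it omits the HAC adjustment discussed elsewhere on this page. All names are illustrative.

```python
from statistics import mean, stdev


def cs_slope(factor, returns):
    """OLS slope of returns on factor (with intercept) for one date."""
    mf, mr = mean(factor), mean(returns)
    sxx = sum((f - mf) ** 2 for f in factor)
    sxy = sum((f - mf) * (r - mr) for f, r in zip(factor, returns))
    return sxy / sxx


def fama_macbeth(panel):
    """panel: {date: (factor_values, forward_returns)}.

    Returns the mean slope and its t-statistic across dates.
    """
    slopes = [cs_slope(f, r) for _date, (f, r) in sorted(panel.items())]
    premium = mean(slopes)
    std_err = stdev(slopes) / len(slopes) ** 0.5
    return premium, premium / std_err


# Toy panel: three dates, three assets, slopes of 0.01, 0.02 and 0.03.
panel = {
    "2024-01": ([1.0, 2.0, 3.0], [0.01, 0.02, 0.03]),
    "2024-02": ([1.0, 2.0, 3.0], [0.02, 0.04, 0.06]),
    "2024-03": ([1.0, 2.0, 3.0], [0.03, 0.06, 0.09]),
}
premium, t_stat = fama_macbeth(panel)
```

The second pass is where the standard error is decided, so any autocorrelation in the slope series has to be handled there.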

4.4 AlphaEval

Positioning — AlphaEval (repo) is a post-processing ranker for formula-mined alphas: input qlib expression-DSL formulas, compose them via WeightCalculator, score the composite across five dimensions including an OpenAI-LLM-as-judge for "financial logic". Its target user mines 1000 GP/GA formulas and needs to rank them.

This is the design path factrix considered and rejected with literature backing — see design notes §1 and §7.

Where AlphaEval wins

Purpose-built for formula-mining workflows; LLM-judged "financial logic" dimension is unique.
Composite ranker is the right tool when the input is a pool of mined alphas rather than a small set of hypothesised factors.

Where factrix wins

factrix evaluates one factor (or batch with FDR) against an explicit null and surfaces per-metric pass/fail rather than a weighted scalar. The two libraries answer different questions.
Composite scoring becomes its own optimisation target the moment it ships (Goodhart 1984); per-metric inference keeps the null distributions distinct.

When to pick AlphaEval instead — you mine formula alphas with GP/GA and need to rank thousands by an aggregate score for downstream selection.

4.5 eventstudy

Positioning — eventstudy is the only dedicated event-study Python package. It implements the standard parametric / BMP / Patell tests on event windows. The package self-describes as alpha-quality with a frequently-changing API.

Where eventstudy wins

BMP / Patell standardised tests for event-window inference are shipped with vocabulary that matches MacKinlay (1997).

Where factrix wins

factrix integrates event-date CAAR with calendar-aware non-overlap inference plus overlap and clustering diagnostics; eventstudy treats events in isolation.
Event inference lives in the same EvaluationResult shape as CS and common-factor inferences; one pipeline screens all three with shared FDR control.

When to pick eventstudy instead — you only do M&A or earnings event studies in isolation and do not need integration with cross-sectional or macro work.
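
One simple way to picture non-overlap inference on event windows is sketched below: per asset, keep an event only if its window starts at least `window` periods after the previously kept one, then average the cumulative abnormal returns of the survivors. The greedy filter, the helper names, and the toy numbers are assumptions for illustration; the page does not specify how factrix's calendar-aware logic resolves overlaps.

```python
from statistics import mean, stdev


def drop_overlaps(event_idx, window):
    """Greedy filter: keep events whose windows do not overlap a kept event."""
    kept, last = [], None
    for i in sorted(event_idx):
        if last is None or i - last >= window:
            kept.append(i)
            last = i
    return kept


def caar(abnormal, events, window):
    """abnormal: {asset: [abnormal return per period]};
    events: {asset: [window start indices]}.

    Returns (CAAR, t-statistic, number of events used).
    """
    cars = []
    for asset, idx in events.items():
        series = abnormal[asset]
        for start in drop_overlaps(idx, window):
            if start + window <= len(series):  # skip truncated windows
                cars.append(sum(series[start:start + window]))
    estimate = mean(cars)
    t_stat = estimate / (stdev(cars) / len(cars) ** 0.5)
    return estimate, t_stat, len(cars)


abnormal = {
    "A": [0.0, 0.01, 0.02, 0.0, 0.0, 0.03, 0.01, 0.0],
    "B": [0.0, 0.0, 0.02, 0.02, 0.0, 0.0, 0.0, 0.0],
}
events = {"A": [1, 2, 5], "B": [2]}  # A's event at index 2 overlaps the one at 1

estimate, t_stat, n_used = caar(abnormal, events, window=2)
```

Dropping the overlapping event keeps the cumulative returns from sharing periods, which is what the independence assumption behind the t-statistic needs.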

4.6 mlfinlab

Positioning — mlfinlab is the López de Prado reference implementation of deflated / probabilistic / Haircut Sharpe, PBO via combinatorial CV, BHY-adjusted p-values, and a structural-break suite (Chow / CUSUM / SADF). It went commercial in 2022; the public PyPI package was removed and the public repo has been dormant since 2021-12.

Where mlfinlab wins

The only library shipping deflated / probabilistic / Haircut Sharpe end-to-end. If your firm has the licence, this is the shortest path to those metrics.

Where factrix wins

Open source (Apache-2.0); pip install factrix works.
Active maintenance with tagged releases and reviewable PR narratives during the pre-1.0 line.
Deflated Sharpe / PSR is on the factrix roadmap and is the highest-priority OSS gap; see §7.

When to pick mlfinlab instead — your firm pays for the licence and you need deflated Sharpe today.
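
For readers who only need the probabilistic Sharpe ratio (PSR) building block in the meantime, the published formula is short enough to sketch with the standard library. This is not factrix code and not the mlfinlab implementation; `psr` is a hypothetical helper, the return series is made up, and the deflated variant (which adjusts the benchmark for the number of trials) is not shown.

```python
from statistics import NormalDist, mean, pstdev


def psr(returns, sr_benchmark=0.0):
    """Probability that the true per-period Sharpe ratio exceeds sr_benchmark.

    Uses the sample skewness and (non-excess) kurtosis of the returns to
    adjust the standard error of the estimated Sharpe ratio.
    """
    n = len(returns)
    mu, sd = mean(returns), pstdev(returns)
    sr = mu / sd
    z = [(r - mu) / sd for r in returns]
    skew = mean(v ** 3 for v in z)
    kurt = mean(v ** 4 for v in z)  # equals 3 for a normal distribution
    denom = (1 - skew * sr + (kurt - 1) / 4 * sr ** 2) ** 0.5
    return NormalDist().cdf((sr - sr_benchmark) * (n - 1) ** 0.5 / denom)


returns = [0.01, -0.005, 0.02, 0.0, 0.015, -0.01, 0.005, 0.01]
p_vs_zero = psr(returns)              # benchmark Sharpe of 0
p_vs_high = psr(returns, 5.0)         # an unrealistically high benchmark
```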

5. Adjacent tools and integration

The tools below are not peers; they sit upstream, downstream, or underneath factrix. The point of this section is to make the integration surface explicit so factrix reads as a citizen of the ecosystem rather than a walled garden.

5.1 Per-tool role

Tool	Role relative to factrix
zipline-reloaded Pipeline	Upstream CS factor construction DSL; factrix consumes Pipeline output
arch	Reference HAC kernel implementation; factrix depends on it, does not reimplement
statsmodels	General econometrics primitives (regression / time-series); used internally
empyrical-reloaded	Low-level return-stat primitives (Sharpe, Sortino, drawdown); dependency layer
pyfolio-reloaded	Downstream returns-level tear-sheet; consumes strategy P&L, not factor signal
vectorbt	Stage 3 parameter-grid backtest engine; pairs with factrix BHY for honest workflow
skfolio / PyPortfolioOpt / riskfolio-lib	Stage 2 strategy construction (portfolio optimisation); consume factrix-validated factors as input

5.2 Integration sketches

Factor construction → factrix: zipline Pipeline outputs a pandas DataFrame with a MultiIndex (date, asset), which converts to the polars panel factrix expects in two lines.

import polars as pl
import factrix as fx
from factrix.metrics import ic
from factrix.preprocess import compute_forward_return

# zipline_out: pandas DataFrame with MultiIndex (date, asset),
# columns include the factor value and the realised return.
panel = pl.from_pandas(zipline_out.reset_index())
panel = compute_forward_return(panel, forward_periods=5)

results = fx.evaluate(panel, factor_cols=["factor"], metrics={"ic": ic()})
ic_p = results["factor"].metrics["ic"].p_value

factrix → Stage 2: surviving factors after BHY feed a portfolio optimiser.

import factrix as fx
from factrix.metrics import ic

# Each panel carries its factor under a distinct column name
# ("momentum_12" / "value" / ...); evaluate stamps each result's
# `factor` from factor_cols so identities stay unique without manual surgery.
results = []
for name, p in panels.items():
    res = fx.evaluate(p, factor_cols=[name], metrics={"ic": ic()})
    results.extend(res.values())

bhy_ic = fx.multi_factor.bhy(results, metrics=["ic"], q=0.05)["ic"]

# bhy_ic.survivors is a list[EvaluationResult]; pass the underlying factor
# panels to skfolio / PyPortfolioOpt / riskfolio-lib as Stage 2 input
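
For readers who want to see what the BHY step does to a list of p-values, here is a standard-library sketch of the Benjamini-Hochberg-Yekutieli step-up rule. `bhy_reject` is a hypothetical helper, not the body of `fx.multi_factor.bhy`; how factrix defines a family or handles several metrics at once is not modelled.

```python
def bhy_reject(p_values, q=0.05):
    """Return one boolean per input p-value: True if the null is rejected.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= k * q / (m * c_m), where c_m = 1 + 1/2 + ... + 1/m, and
    reject the k smallest.
    """
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            cutoff = rank  # keep the largest rank that passes
    rejected = set(order[:cutoff])
    return [i in rejected for i in range(m)]


# Five candidate factors, p-values in arbitrary order.
p_values = [0.6, 0.001, 0.041, 0.008, 0.039]
survives = bhy_reject(p_values, q=0.05)
```

The harmonic factor c_m is what separates BHY from plain Benjamini-Hochberg: it makes the thresholds stricter so that FDR control holds under arbitrary dependence between the tests.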

The integration story matters because it answers the implicit "what do I do with the inference" question — factrix is an intermediate stage, not an endpoint.

6. When factrix is NOT the right tool

If you are not at the inference / screening stage, factrix is the wrong tool. The chart below routes you to the canonical alternative for each adjacent stage. Construction (upstream) is intentionally not branched: readers reach this chart asking "given that I have a factor, what do I do next?", not "how do I build one?".

flowchart TD
    A[What stage of the alpha pipeline?] --> F[Inference / screening on a factor]
    A --> W[Optimise weights for trusted factors]
    A --> E[Backtest or deploy a strategy]
    A --> R[Returns-level tear-sheet on a P&L series]
    F --> FX[<b>factrix</b>]
    W --> WX[skfolio · PyPortfolioOpt · riskfolio-lib]
    E --> EX[zipline-reloaded · backtrader · bt · vectorbt · nautilus_trader]
    R --> RX[pyfolio-reloaded · QuantStats]

7. Honest weaknesses

This section is the disclosure surface. It is meant to be cited verbatim by skeptical readers — soft-pedalling the gaps would be self-defeating once they read the source.

7.1 Capability matrix and roadmap

Capability	factrix today	Closest peer	Status / roadmap
CS IC/IR tear-sheet	yes	alphalens (legacy, pandas)	parity on visualisation vocabulary
Event CAAR	yes	eventstudy (alpha-quality) / linearmodels (manual)	event-date CAAR with non-overlap inference out of the box
Macro panel	yes	linearmodels (manual)	packaged macro-factor evaluation surface
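Multi-test FDR (BHY)	yes	mlfinlab (commercial-gated)	only OSS factor-screening surface with BHY built in since the mlfinlab paywall; statsmodels multipletests (method="fdr_by") covers the bare p-value adjustment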
NW HAC	yes	linearmodels / arch	depend on `arch`, do not reimplement
Type-routed mainstream metric (CS / Event / Macro)	yes	none	factrix's core differentiation
Deflated Sharpe / PSR / PBO	no	mlfinlab (commercial-gated)	roadmap priority — most painful OSS gap
ML pipeline integration	no (out of scope)	qlib	document interop; leave to qlib
Live trading / execution	no (out of scope)	lumibot / nautilus_trader	document boundary; leave out

7.2 Non-capability weaknesses

Smaller community compared with alphalens / qlib — factrix is a newer project with fewer Stack Overflow answers. Expect to read source for edge cases that alphalens has been asked about for six years.
No published replication of a canonical anomaly study yet — a factor-zoo skeptic will ask whether factrix's conclusions agree with the published record on a known-good factor. That replication is on the roadmap and is a credibility gap until it ships.

Next steps

If this page resolved the fit question and you want to run factrix:

Quickstart — 30-second example from a raw panel to a p_value readout.
Concepts — the three-axis taxonomy and the metric dispatch underneath the routing examples above.
Choosing a metric — research-question to metric mapping for the five scenarios in §2.

If you want to compare factrix to a specific peer before installing, start with the matching §4 subsection; the §7.1 capability matrix is the densest side-by-side summary on this page.

8. Citations

The methodological choices on this page anchor in the following sources. The full bibliography lives in reference/bibliography.md; this section names the most load-bearing ones.

Goodhart (1984) — Monetary Theory and Practice. Origin of the Goodhart's-law argument used by design-notes §1.
DeMiguel, Garlappi & Uppal (2009) — "Optimal versus naive diversification." Review of Financial Studies. Equal-weight beats optimised under estimation error; cited in the no-composite position.
Harvey (2017) — "Presidential address: The scientific outlook in financial economics." Journal of Finance. Argues for pre-registered per-metric audit trails over unified statistics.
Bailey & López de Prado (2014) — "The Deflated Sharpe Ratio." Journal of Portfolio Management. Roadmap target for the §7 PSR row. (Not yet in bibliography; will land alongside the Deflated-Sharpe roadmap work.)
Harvey, Liu & Zhu (2016) — "…and the cross-section of expected returns." Review of Financial Studies. Multiple-testing inflation in the factor zoo; motivates BHY.
Hou, Xue & Zhang (2020) — "Replicating anomalies." Review of Financial Studies. The replication context for the §7.2 weakness disclosure.
Brown & Warner (1985) — "Using daily stock returns: The case of event studies." Journal of Financial Economics. The canonical event-study reference behind CAAR.
MacKinlay (1997) — "Event studies in economics and finance." Journal of Economic Literature. Vocabulary used in §1 and §4.5.