factrix.metrics.fama_macbeth
Fama-MacBeth regression — FM-canonical metric for the
Individual × Continuous cell.
compute_fm_betas: per-date cross-sectional ordinary least squares (OLS) → (date, beta) DataFrame.
fama_macbeth: Newey-West t-test on the beta series.
pooled_ols: pooled OLS with clustered SE by date.
beta_sign_consistency: fraction of periods in which beta carries the expected sign.
Notes
Pipeline. Per-date cross-sectional OLS slope \(\lambda\) (cross-section step) → time series of \(\lambda\), then Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) \(t\) on its mean; pooled OLS variant clusters SE by date.
References
- Fama & MacBeth (1973), "Risk, Return, and Equilibrium: Empirical Tests."
- Newey & West (1987), "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix."
- Petersen (2009), "Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches."
factrix.metrics.fama_macbeth.compute_fm_betas
compute_fm_betas(df: DataFrame, *, factor_col: str = 'factor', return_col: str = 'forward_return') -> DataFrame
Per-date cross-sectional ordinary least squares (OLS): \(R_i = \alpha + \beta \cdot \text{Signal}_i + \varepsilon\).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Long panel with | required |
| factor_col | str | Column carrying the factor exposure. | 'factor' |
| return_col | str | Column carrying the forward return. | 'forward_return' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with finite OLS solution; dates with fewer than 3 observations or a singular design are dropped). |
Notes
Per date \(t\), solve the cross-sectional OLS
\(R_{i,t} = \alpha_t + \beta_t \cdot \text{Signal}_{i,t} + \varepsilon_{i,t}\)
and emit the slope \(\beta_t\). The output series feeds the
stage-2 Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) \(t\)-test in
fama_macbeth.
factrix drops dates with fewer than 3 cross-sectional observations or a singular design rather than coercing to NaN — this keeps stage-2 a clean t-test on a finite, well-defined series with no NaN propagation in the NW kernel.
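The drop-rather-than-NaN rule can be illustrated with a minimal standard-library sketch. This is not factrix's implementation: the function name `per_date_slopes`, the tuple-based row layout, and the exact-zero variance test for a singular design are all assumptions made for illustration.

```python
from collections import defaultdict


def per_date_slopes(rows, min_obs=3):
    """Return [(date, beta)] from (date, signal, ret) rows via per-date OLS.

    Hypothetical helper: dates with fewer than `min_obs` rows or with zero
    signal variance (a singular design) are dropped, not emitted as NaN.
    """
    by_date = defaultdict(list)
    for date, signal, ret in rows:
        by_date[date].append((signal, ret))

    out = []
    for date in sorted(by_date):
        obs = by_date[date]
        n = len(obs)
        if n < min_obs:
            continue  # too few cross-sectional observations
        x_bar = sum(x for x, _ in obs) / n
        y_bar = sum(y for _, y in obs) / n
        sxx = sum((x - x_bar) ** 2 for x, _ in obs)
        if sxx == 0.0:
            continue  # constant signal: the slope is not identified
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in obs)
        out.append((date, sxy / sxx))  # OLS slope with an intercept
    return out


rows = [
    ("2024-01-01", 0.0, 0.01), ("2024-01-01", 1.0, 0.03), ("2024-01-01", 2.0, 0.05),
    ("2024-01-02", 0.0, 0.00), ("2024-01-02", 1.0, 0.01),  # only 2 rows: dropped
    ("2024-01-03", 1.0, 0.02), ("2024-01-03", 1.0, 0.04), ("2024-01-03", 1.0, 0.00),  # constant signal: dropped
]
betas = per_date_slopes(rows)
```

Only the first date survives, so a downstream t-test sees a series of finite slopes with no placeholders to filter.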
References
- Fama & MacBeth (1973). "Risk, Return, and Equilibrium: Empirical Tests." Journal of Political Economy, 81(3), 607–636. The per-date cross-sectional regression at stage 1 of the FM procedure.
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.fama_macbeth import compute_fm_betas
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> beta_df = compute_fm_betas(panel)
>>> set(beta_df.columns) >= {"date", "beta"}
True
factrix.metrics.fama_macbeth.fama_macbeth
fama_macbeth(beta_df: DataFrame, *, newey_west_lags: int | None = None, forward_periods: int | None = None, is_estimated_factor: bool = False, factor_return_var: float | None = None) -> MetricOutput
Newey-West t-test on FM beta series. \(H_0: \mathrm{mean}(\beta) = 0\).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| beta_df | DataFrame | DataFrame with | required |
| newey_west_lags | int \| None | Number of Newey-West (NW) lags. Defaults to \(\lfloor T^{1/3} \rfloor\). | None |
| forward_periods | int \| None | Overlap horizon of the regression's forward return. When set, the NW bandwidth is floored at | None |
| is_estimated_factor | bool | Set True when the Implementation: Shanken (1992) single-factor special case: the NW SE is scaled by \(\sqrt{1 + \hat\lambda^2/\sigma^2_f}\) (Shanken's general multi-factor multiplicative term \(1 + \lambda'\Sigma_f^{-1}\lambda\) collapses to \(1 + \hat\lambda^2/\sigma^2_f\) when there is one factor). factrix's simplification omits the additive \(+\sigma^2_f/T\) term of the full Shanken variance and is therefore only honest for large \(T\). Note: | False |
| factor_return_var | float \| None | \(\sigma^2_f\), the time-series variance of the factor-mimicking portfolio return. Prefer supplying this when you have a spread-portfolio return series (the long-short spread actually traded on the signal). When | None |
Notes
Stage 2 of FM:
\(\overline{\beta} = \mathrm{mean}_t\,\beta_t\);
\(t = \overline{\beta} / \mathrm{SE}_{\mathrm{NW}}(\beta)\)
with kernel lag
\(L = \max(\lfloor T^{1/3} \rfloor,\, h - 1)\).
With is_estimated_factor=True, the
Shanken (1992) single-factor correction scales
SE by \(\sqrt{1 + \overline{\beta}^2 / \sigma^2_f}\).
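As a rough numerical illustration of the scaling just described, the sketch below applies the single-factor multiplier to an SE that is already known. The function name `shanken_scaled_se` and the input values are hypothetical; this is not factrix's internal code.

```python
from math import sqrt


def shanken_scaled_se(se_nw, beta_bar, factor_return_var):
    """Scale an NW standard error by sqrt(1 + beta_bar^2 / sigma_f^2).

    Hypothetical helper for the single-factor multiplicative term only;
    the additive sigma_f^2 / T term of the full Shanken variance is omitted.
    """
    if factor_return_var <= 0:
        raise ValueError("factor_return_var must be positive")
    return se_nw * sqrt(1.0 + beta_bar ** 2 / factor_return_var)


# Illustrative numbers: mean beta 0.01, NW SE 0.002, sigma_f^2 = 0.0004.
se_adj = shanken_scaled_se(se_nw=0.002, beta_bar=0.01, factor_return_var=0.0004)
t_raw = 0.01 / 0.002
t_adj = 0.01 / se_adj
```

With these inputs the multiplier is \(\sqrt{1.25} \approx 1.118\), so the t-statistic shrinks from 5.0 to roughly 4.47. The correction matters most when \(\overline{\beta}\) is large relative to the factor's volatility.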
factrix uses the Andrews (1991) \(T^{1/3}\) bandwidth floored against the Hansen-Hodrick overlap horizon rather than the Newey-West (1994) data-adaptive plug-in — simpler, deterministic, and adequate at typical research \(T\). factrix's simplification of the Shanken variance omits the additive \(+\sigma^2_f / T\) term, so the correction is honest only for large \(T\).
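The bandwidth rule and the Bartlett-kernel SE of the mean can be sketched in plain Python. This assumes the textbook Newey-West form (autocovariances divided by \(T\), weights \(1 - j/(L+1)\)); factrix's exact small-sample scaling is not stated here and may differ. All function names are hypothetical.

```python
from math import sqrt


def floor_cuberoot(n):
    """Largest integer k with k**3 <= n (guards against float error in n ** (1/3))."""
    k = round(n ** (1.0 / 3.0))
    while k ** 3 > n:
        k -= 1
    while (k + 1) ** 3 <= n:
        k += 1
    return k


def fm_bandwidth(n_periods, forward_periods=None):
    """L = max(floor(T^(1/3)), h - 1); the overlap floor applies only when h is given."""
    lag = floor_cuberoot(n_periods)
    if forward_periods is not None:
        lag = max(lag, forward_periods - 1)
    return lag


def nw_se_of_mean(series, lags):
    """Newey-West (Bartlett) standard error of the sample mean of `series`."""
    t = len(series)
    mean = sum(series) / t
    u = [b - mean for b in series]
    lrv = sum(v * v for v in u) / t  # gamma_0
    for j in range(1, min(lags, t - 1) + 1):
        gamma_j = sum(u[i] * u[i - j] for i in range(j, t)) / t
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j  # Bartlett weight
    return sqrt(lrv / t)


betas = [1.0, 2.0, 3.0, 4.0]
se_lag0 = nw_se_of_mean(betas, lags=0)
se_lag1 = nw_se_of_mean(betas, lags=1)
lag_180_h5 = fm_bandwidth(180, forward_periods=5)
```

The integer cube-root helper is there because `64 ** (1 / 3)` evaluates slightly below 4 in floating point, which would silently drop a lag if passed straight to `floor`.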
References
- Fama & MacBeth (1973). "Risk, Return, and Equilibrium: Empirical Tests." Journal of Political Economy, 81(3), 607–636. Two-stage λ procedure underlying this test.
- Newey & West (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica, 55(3), 703–708. HAC variance estimator.
- Andrews (1991). "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation." Econometrica, 59(3), 817–858. Optimal Bartlett growth rate.
- Hansen & Hodrick (1980). "Forward Exchange Rates as Optimal Predictors of Future Spot Rates." Journal of Political Economy, 88(5), 829–853. Overlap horizon flooring the kernel.
- Shanken (1992). "On the Estimation of Beta-Pricing Models." Review of Financial Studies, 5(1), 1–33. Errors-in-variables correction for FM stage-2 t when the regressor is itself estimated.
- Kan & Zhang (1999). "Two-Pass Tests of Asset Pricing Models with Useless Factors." Journal of Finance, 54(1), 203–235. Useless-factor diagnostic; cited as cautionary background on factor validity beyond the EIV inflation that is_estimated_factor addresses.
Examples:
Chain from compute_fm_betas output:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.fama_macbeth import compute_fm_betas, fama_macbeth
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> beta_df = compute_fm_betas(panel)
>>> result = fama_macbeth(beta_df, forward_periods=5)
>>> result.name
'fm_beta'
factrix.metrics.fama_macbeth.pooled_ols
pooled_ols(df: DataFrame, *, factor_col: str = 'factor', return_col: str = 'forward_return', cluster_col: str = 'date', two_way_cluster_col: str | None = None) -> MetricOutput
Pooled ordinary least squares (OLS) with clustered SE — robustness check against FM.
Clustering on date alone catches contemporaneous cross-sectional dependence but misses asset-level persistence; on asset alone the reverse. Petersen (2009) shows panel data usually has both — single-way clusters understate SE by 20-50% in that regime.
FM and single-way share the same point estimate under a balanced
panel but typically disagree on SE; when \(\hat\beta\) and FM
\(\hat\lambda\) have opposite signs, profile.diagnose()
flags an FM/pooled sign-mismatch — a red flag for misspecification.
Short-circuits when \(N < 10\) (no regression is run), and returns stat=None with \(p = 1.0\) when the effective cluster count \(G < 3\) (the clustered SE is undefined with fewer than 3 clusters).
Formula
Point estimate:
\[(\hat\alpha, \hat\beta)' = (X'X)^{-1} X'R\]
where \(X = [1, \text{Signal}]\) and the returns \(R\) are stacked across all \((\text{date}, \text{asset})\) observations.
Single-way clustered sandwich SE (default, cluster on cluster_col):
\[V = c \, (X'X)^{-1} \Big( \sum_{g=1}^{G} X_g' \hat\varepsilon_g \hat\varepsilon_g' X_g \Big) (X'X)^{-1}\]
where \(X_g\) and \(\hat\varepsilon_g\) are the design rows and OLS residuals of cluster \(g\), with finite-sample correction \(c = \tfrac{G}{G-1} \cdot \tfrac{N-1}{N-K}\) (\(N\) observations, \(K = 2\) regressors), \(\mathrm{SE}(\hat\beta) = \sqrt{V_{1,1}}\), \(t = \hat\beta / \mathrm{SE}\), \(\mathrm{df} = G - 1\).
Two-way clustered sandwich SE (when two_way_cluster_col is set; Cameron-Gelbach-Miller (2011) / Petersen (2009)):
\[V_{AB} = V_A + V_B - V_{A \cap B}\]
where \(V_A\), \(V_B\), \(V_{A \cap B}\) are single-way variances clustered on \(A\), on \(B\), and on the intersection cells \((A, B)\). Each component uses its own finite-sample correction. \(\mathrm{df} = \min(G_A, G_B) - 1\) (Thompson (2011)).
Notes
Pool (date, asset) rows and run a single OLS R = alpha +
beta * Signal + eps with the appropriate cluster-robust
sandwich covariance described above. Single-way: df = G - 1
with G the number of clusters; two-way:
df = min(G_A, G_B) - 1 per Thompson (2011).
factrix reports stat = None (rather than 0) when G < 3
because the cluster-robust variance is undefined with too few
clusters; falling back to a homoskedastic SE in that regime
would silently break the panel-correlation guarantee that
motivated using clustered SE in the first place.
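The single-way and two-way clustered variances described for pooled_ols can be sketched with the standard library. This is an illustration, not factrix's code: the helper name `slope_and_cluster_var` and the toy data are hypothetical. It uses the fact that, for one regressor plus an intercept, the slope's sandwich entry reduces to cluster sums of \((x_i - \bar{x})\,\hat\varepsilon_i\) divided by \(S_{xx}^2\).

```python
from collections import defaultdict
from math import sqrt


def slope_and_cluster_var(x, y, clusters, n_params=2):
    """Pooled OLS slope and its one-way cluster-robust variance.

    Hypothetical helper. Applies c = G/(G-1) * (N-1)/(N-K) and returns
    (beta, variance, number_of_clusters).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha = y_bar - beta * x_bar

    score_sum = defaultdict(float)  # per-cluster sum of (x - x_bar) * residual
    for xi, yi, g in zip(x, y, clusters):
        score_sum[g] += (xi - x_bar) * (yi - alpha - beta * xi)

    n_groups = len(score_sum)
    c = n_groups / (n_groups - 1) * (n - 1) / (n - n_params)
    var = c * sum(s * s for s in score_sum.values()) / sxx ** 2
    return beta, var, n_groups


# Toy unbalanced panel: 3 dates, 3 assets, 6 (date, asset) rows.
x = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
y = [-1.0, 0.0, 2.0, 0.0, 0.0, 1.0]
dates = ["d1", "d1", "d2", "d2", "d3", "d3"]
assets = ["a", "b", "b", "c", "c", "a"]

beta, v_date, g_date = slope_and_cluster_var(x, y, dates)
_, v_asset, g_asset = slope_and_cluster_var(x, y, assets)
_, v_cell, _ = slope_and_cluster_var(x, y, list(zip(dates, assets)))

v_two_way = v_date + v_asset - v_cell  # V_A + V_B - V_{A∩B}
se_one_way = sqrt(v_date)
df_two_way = min(g_date, g_asset) - 1
```

In small samples the two-way combination can come out negative because the intersection term is subtracted, so a production routine would need a guard before taking the square root. The sketch leaves that out.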
References
- Petersen (2009). "Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches." Review of Financial Studies, 22(1), 435–480. Comparison of FM, clustered, and two-way SE under firm/time correlation.
- Cameron, Gelbach & Miller (2011). "Robust Inference With Multiway Clustering." Journal of Business & Economic Statistics, 29(2), 238–249. Two-way clustering formula V_AB = V_A + V_B − V_{A∩B}.
- Thompson (2011). "Simple Formulas for Standard Errors that Cluster by Both Firm and Time." Journal of Financial Economics, 99(1), 1–10. Finite-sample df correction min(G_A, G_B) − 1.
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.fama_macbeth import pooled_ols
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> result = pooled_ols(panel)
>>> result.name
'pooled_beta'
factrix.metrics.fama_macbeth.beta_sign_consistency
beta_sign_consistency(beta_df: DataFrame, *, expected_sign: int = 1) -> MetricOutput
Fraction of FM per-date \(\beta\)s carrying the expected sign — value \(= \mathrm{mean}_t \mathbb{1}\{\mathrm{sign}(\beta_t) = s^\star\}\).
\(\beta_t\) is the per-date ordinary least squares (OLS) \(\beta\) from compute_fm_betas.
Range \([0, 1]\); \(1.0\) = \(\beta\) always has the expected sign across
periods. Unlike ts_beta_sign_consistency (which symmetrizes via
\(\max(p, 1-p)\) where \(p\) is the positive-sign fraction), this one is directional —
you must supply the a-priori expected sign. Typical use: paired with
a prior on factor direction to check stability.
Short-circuits to NaN when no non-null \(\beta\) observations exist.
Notes
value \(= \mathrm{mean}_t \mathbb{1}\{\mathrm{sign}(\beta_t) = s^\star\}\)
over the FM per-date beta series. Range \([0, 1]\); \(1.0\) = beta
always agrees with the prior. Descriptive (no formal \(H_0\));
pair with fama_macbeth for inferential significance.
factrix splits this directional check from the symmetric
ts_beta_sign_consistency so the two answer different
questions: this one requires the caller to commit to a prior
sign; the symmetric variant tests cross-asset agreement only.
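The difference between the directional and symmetric statistics is easiest to see side by side. The sketch below uses hypothetical helper names and assumes that an exactly-zero beta counts as a miss in both variants; the source does not say how factrix treats zeros.

```python
def directional_sign_consistency(betas, expected_sign=1):
    """Fraction of non-null betas whose sign equals `expected_sign` (+1 or -1)."""
    vals = [b for b in betas if b is not None and b == b]  # drop None and NaN
    if not vals:
        return float("nan")  # nothing to evaluate
    hits = sum(1 for b in vals if (b > 0) - (b < 0) == expected_sign)
    return hits / len(vals)


def symmetric_sign_consistency(betas):
    """max(p, 1 - p), with p the fraction of positive betas."""
    vals = [b for b in betas if b is not None and b == b]
    if not vals:
        return float("nan")
    p = sum(1 for b in vals if b > 0) / len(vals)
    return max(p, 1.0 - p)


betas = [0.2, -0.1, 0.3, 0.4]
as_expected = directional_sign_consistency(betas, expected_sign=1)
against_prior = directional_sign_consistency(betas, expected_sign=-1)
symmetric = symmetric_sign_consistency(betas)
```

On this series the symmetric value is 0.75 whichever direction the caller believed in, while the directional value drops to 0.25 under the wrong prior. That asymmetry is the reason for keeping the two checks separate.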
Examples:
Chain from compute_fm_betas output:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.fama_macbeth import (
... compute_fm_betas,
... beta_sign_consistency,
... )
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> beta_df = compute_fm_betas(panel)
>>> result = beta_sign_consistency(beta_df, expected_sign=1)
>>> result.name
'beta_sign_consistency'
Use cases
- Compute per-date FM beta series
  Stage 1 of Fama-MacBeth: per-date cross-sectional ordinary least squares (OLS) slope \(\beta_t\) in \(R_{i,t} = \alpha_t + \beta_t \cdot \text{Signal}_{i,t} + \varepsilon_{i,t}\). Pre-step for fama_macbeth and the descriptive beta_sign_consistency check.
- Mean-\(\beta\) significance, Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC)
  Stage 2 of Fama-MacBeth: \(t\)-test on \(\mathbb{E}[\beta_t] = 0\) with Newey-West HAC SE, bandwidth \(\max(\lfloor T^{1/3} \rfloor, h-1)\). Default inferential test for the Individual × Continuous cell.
- Errors-in-variables correction for estimated signals
  Set is_estimated_factor=True (with factor_return_var= where the factor-mimicking-portfolio return series is available) to apply the Shanken (1992) single-factor EIV correction (the multi-factor multiplicative term collapses to \(1 + \hat\lambda^2/\sigma^2_f\)). Required when the Signal column is itself estimated, for example a rolling beta, PCA score, or ML prediction.
- Pooled OLS robustness check
  pooled_ols runs a single regression across the stacked panel with cluster-robust SE (one-way on date, or two-way with two_way_cluster_col). When pooled \(\hat\beta\) and FM \(\hat\lambda\) disagree in sign, profile.diagnose() flags the mismatch as a red flag for misspecification.
Choosing a function
| Goal | Function |
|---|---|
| Per-date FM beta table for downstream inspection / slicing | compute_fm_betas |
| Mean-\(\beta\) significance with NW HAC SE (default Stage 2) | fama_macbeth |
| Pooled OLS with cluster-robust SE (one-way on date, or two-way) | pooled_ols |
| Directional stability — fraction of periods with the expected \(\beta\) sign | beta_sign_consistency |
Worked example: per-date FM beta then NW HAC significance
compute_fm_betas → fama_macbeth on a synthetic cross-sectional panel
import factrix as fx
from factrix.metrics.fama_macbeth import compute_fm_betas, fama_macbeth
from factrix.preprocess import compute_forward_return
raw = fx.datasets.make_cs_panel(
n_assets=100, n_dates=500, ic_target=0.08, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)
beta_df = compute_fm_betas(panel)
print(beta_df.head())
# ┌────────────┬───────────┐
# │ date ┆ beta │
# ├────────────┼───────────┤
# │ 2024-01-01 ┆ 0.0091 │
# │ 2024-01-02 ┆ 0.0077 │
# │ ... ┆ ... │
# └────────────┴───────────┘
out = fama_macbeth(beta_df, forward_periods=5)
print(out.value, out.stat, out.metadata["p_value"])
# 0.0084 6.10 1.3e-09 (approximate)
See also
- by_slice
  Axis-agnostic slice dispatcher for per-slice FM beta summaries.
- slice_pairwise_test / slice_joint_test
  Cross-slice inference (Wald \(\chi^2\) + Holm / Romano-Wolf adjusted \(p\)).
- Statistical methods
  NW HAC SE, Andrews bandwidth, Hansen-Hodrick overlap floor, and the Shanken (1992) single-factor EIV correction.
- Metric applicability reference
  When this metric applies and the sample-size guards that gate it (MIN_FM_PERIODS_HARD / MIN_FM_PERIODS_WARN).
- Individual × Continuous landing
  Adjacent metrics in the same cell.