Helpers for attaching forward_return to a raw panel. The output
panel — (date, asset_id, factor, forward_return) — is the canonical
input to evaluate.
factrix.preprocess.compute_forward_return
Step 1: Compute per-period forward return per asset.
forward_return = (price[t+1+N] / price[t+1] - 1) / N
Entry at t+1 (next bar after signal), exit at t+1+N.
WHY t+1 entry: The signal at t is computed using data up to and including price[t]. Using price[t] as both signal input and entry price assumes you can trade at the same price used to generate the signal — unrealistic in practice. Entry at t+1 enforces a strict causal boundary: signal → wait → trade → measure.
This also keeps the return window cleanly separated from the estimation window in event studies (BMP test), eliminating the need for ad-hoc shift corrections.
Dividing by N normalizes returns to a per-period basis, so results for different forward_periods are directly comparable in scale (see Notes for the scope boundary).
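The formula above can be sketched for a single asset in plain Python. This is a minimal illustration of the t+1 entry and the ÷N scaling, not factrix's implementation; the helper name `forward_return_sketch` and the toy price list are hypothetical.

```python
def forward_return_sketch(prices, forward_periods=5):
    """Per-period forward return for one asset's price series (illustrative only)."""
    n = forward_periods
    out = []
    for t in range(len(prices)):
        entry = t + 1          # trade on the bar after the signal at t
        exit_ = t + 1 + n      # hold for n rows of the time axis
        if exit_ >= len(prices):
            out.append(None)   # window runs past the end of the series
        else:
            out.append((prices[exit_] / prices[entry] - 1) / n)
    return out


prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
fr = forward_return_sketch(prices, forward_periods=2)
# fr[0] uses prices[1] as entry and prices[3] as exit; prices[0] is never traded.
```

In this sketch the last N + 1 positions have no forward return, because the entry bar or the exit bar would fall past the end of the series.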
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Must contain | required |
| forward_periods | int | Holding horizon in rows of the time axis, not calendar time (default 5). On a daily panel this is 5 trading days; on a weekly panel, 5 weeks; on 1-min bars, 5 minutes. Frequency is the caller's responsibility. | 5 |
Returns:
| Type | Description |
|---|---|
| DataFrame | Input DataFrame with |
| DataFrame | Rows where forward return is null (end of series) are dropped. |
Notes
The ÷N per-period normalization is a scale choice with three caveats the caller should know:
- Arithmetic, not summed-log-return. This is the arithmetic per-period mean of a simple return, not the academic-standard direct long-horizon regression of summed log returns on the predictor (the latter is linear-additive across horizons by construction).
- Compounding bias. Compounding at the arithmetic mean is an upward-biased estimator of cumulative wealth; the bias grows with N and per-bar return variance. Negligible for rank-based information coefficient (IC); not negligible for signed-return mean and t-tests at large N.
- Scale, not inference. ÷N aligns the scale across horizons; it does not address the inference problem. Overlap is handled by heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors (see :class:factrix.stats.NeweyWest); across-horizon selection is handled by the family-wise error rate (FWER) correction in :func:factrix.multi_factor.bhy. The three concerns (scale, overlap, cross-horizon selection) are addressed at separate layers; overlap and across-horizon dependence share a common source in the persistent regressor, but each requires its own tool.
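The direction of the compounding caveat can be seen in a small deterministic sketch. Assume a fixed total simple return R earned over N bars: compounding the ÷N value back over N bars gives (1 + R/N)^N, which by Bernoulli's inequality is at least 1 + R whenever R/N ≥ -1. The helper `compounding_gap` is hypothetical, and the example shows only the sign of the gap in this setup, not the estimator bias analysed by Jacquier, Kane & Marcus, which also depends on per-bar return variance.

```python
def compounding_gap(total_return, n):
    """Wealth implied by compounding the per-period value, minus actual wealth."""
    per_period = total_return / n          # the ÷N-normalized return
    compounded = (1 + per_period) ** n     # wealth if per_period were compounded n times
    actual = 1 + total_return              # wealth actually realized over the window
    return compounded - actual


# Same 10% total return, spread over increasingly long horizons.
gaps = [compounding_gap(0.10, n) for n in (1, 5, 20)]
```

With N = 1 the gap is zero; for this positive total return it is positive and widens as N grows, so a ÷N value should be read as a scale-aligned quantity rather than as a compoundable per-bar rate.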
References
- Fama & French (1988). "Dividend Yields and Expected Stock Returns." Journal of Financial Economics, 22(1), 3–25. Direct summed-log-return long-horizon regression: the academic-standard alternative to factrix's ÷N.
- Jacquier, Kane & Marcus (2003). "Geometric or Arithmetic Mean: A Reconsideration." Financial Analysts Journal, 59(6), 46–53. Compounding bias of the arithmetic mean and the unbiased horizon-weighted blend.
- Boudoukh, Richardson & Whitelaw (2008). "The Myth of Long-Horizon Predictability." Review of Financial Studies, 21(4), 1577–1605. Documents that across-horizon regression statistics share information through the persistent regressor — separate from any per-period scaling choice, and the reason inference across horizons is not addressed by normalization.
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> raw = fx.datasets.make_cs_panel(n_assets=20, n_dates=120)
>>> panel = compute_forward_return(raw, forward_periods=5)
>>> "forward_return" in panel.columns
True
>>> panel["forward_return"].null_count() == 0
True
The output panel is the canonical input to fx.evaluate: