Helpers for attaching forward_return to a raw panel. The output panel — (date, asset_id, factor, forward_return) — is the canonical input to evaluate.

factrix.preprocess.compute_forward_return

compute_forward_return(df: DataFrame, forward_periods: int = 5) -> DataFrame

Step 1: Compute per-period forward return per asset.

forward_return = (price[t+1+N] / price[t+1] - 1) / N

Entry at t+1 (next bar after signal), exit at t+1+N.

WHY t+1 entry: The signal at t is computed using data up to and including price[t]. Using price[t] as both signal input and entry price assumes you can trade at the same price used to generate the signal — unrealistic in practice. Entry at t+1 enforces a strict causal boundary: signal → wait → trade → measure.

This also keeps the return window cleanly separated from the estimation window in event studies (BMP test), eliminating the need for ad-hoc shift corrections.

Dividing by N normalizes returns to a per-period basis, so results for different forward_periods values sit on a comparable scale (see Notes for what this normalization does and does not cover).
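The formula and the t+1 entry rule can be sketched for a single asset in plain Python. This is an illustration only, not factrix's implementation: `forward_return_sketch` is a hypothetical helper, and it marks rows whose exit bar does not exist with `None` rather than dropping them.

```python
from typing import List, Optional


def forward_return_sketch(prices: List[float], n: int = 5) -> List[Optional[float]]:
    """Per-period forward return for one asset's prices (hypothetical helper).

    For the signal at row t: enter at row t+1, exit at row t+1+n,
    then divide the simple return over that window by n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    out: List[Optional[float]] = []
    for t in range(len(prices)):
        entry, exit_ = t + 1, t + 1 + n
        if exit_ >= len(prices):
            out.append(None)  # exit bar falls past the end of the series
        else:
            out.append((prices[exit_] / prices[entry] - 1.0) / n)
    return out


prices = [100.0, 101.0, 102.0, 104.0, 103.0, 105.0]
returns = forward_return_sketch(prices, n=2)
# Row 0: enter at prices[1] = 101, exit at prices[3] = 104.
print(returns[0])    # (104 / 101 - 1) / 2
print(returns[-3:])  # the last n + 1 = 3 rows have no exit price
```

In this sketch the last n + 1 rows of the series have no defined forward return, which is consistent with the documented behavior of dropping null rows at the end of each series.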

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Must contain `date`, `asset_id`, `price`. Must already be sorted with regular spacing per asset on the time axis; this function shifts by row count and does not inspect `date`.	required
`forward_periods`	`int`	Holding horizon in rows of the time axis, not calendar time (default 5). On a daily panel this is 5 trading days; on a weekly panel, 5 weeks; on 1-min bars, 5 minutes. Frequency is the caller's responsibility.	`5`
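Because the shift is by row count, a missing bar silently stretches the holding window in calendar terms. A caller-side check along the following lines may help catch that before calling the function. It is a sketch in plain Python: `find_calendar_gaps` is a hypothetical helper, not part of factrix, and it assumes the panel's own sorted union of dates is the reference calendar. It only detects missing rows; it does not verify sort order.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple


def find_calendar_gaps(rows: Iterable[Tuple[date, str]]) -> Dict[str, List[date]]:
    """Per asset, list calendar dates missing between its first and last row.

    Assumption: the reference calendar is the sorted union of all dates
    that appear anywhere in the panel.
    """
    by_asset: Dict[str, Set[date]] = defaultdict(set)
    for d, asset in rows:
        by_asset[asset].add(d)
    calendar = sorted(set().union(*by_asset.values())) if by_asset else []
    gaps: Dict[str, List[date]] = {}
    for asset, seen in by_asset.items():
        first, last = min(seen), max(seen)
        missing = [d for d in calendar if first <= d <= last and d not in seen]
        if missing:
            gaps[asset] = missing
    return gaps


rows = [
    (date(2024, 1, 2), "A"), (date(2024, 1, 3), "A"), (date(2024, 1, 4), "A"),
    (date(2024, 1, 2), "B"), (date(2024, 1, 4), "B"),  # B has no 2024-01-03 row
]
print(find_calendar_gaps(rows))
```

An asset flagged here would have some of its forward returns measured over a longer calendar span than its neighbors for the same `forward_periods`; whether to fill, drop, or accept that is left to the caller.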

Returns:

Type	Description
`DataFrame`	Input DataFrame with `forward_return` column appended. Rows where forward return is null (end of series) are dropped.

Notes

The ÷N per-period normalization is a scale choice with three caveats the caller should know:

Arithmetic, not summed-log-return. This is the arithmetic per-period mean of a simple return, not the academic-standard direct long-horizon regression of summed log returns on the predictor (the latter is linear-additive across horizons by construction).
Compounding bias. Compounding at the arithmetic mean is an upward-biased estimator of cumulative wealth; the bias grows with N and per-bar return variance. Negligible for rank-based information coefficient (IC); not negligible for signed-return mean and t-tests at large N.
Scale, not inference. ÷N aligns the scale across horizons; it does not address the inference problem. Overlap between consecutive return windows is handled by heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors (see `factrix.stats.NeweyWest`); across-horizon selection is handled by the family-wise error rate (FWER) correction in `factrix.multi_factor.bhy`. The three concerns (scale, overlap, cross-horizon selection) are addressed at separate layers. Overlap and across-horizon dependence share a common source in the persistent regressor, but each requires its own tool.
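The first two caveats can be made concrete with a small deterministic example. The sketch below compares the ÷N arithmetic per-period figure with the constant per-period rate that compounds exactly to the same window return, then recompounds the arithmetic figure to show the overshoot. Both helper names are hypothetical, and the example only illustrates the scale effect; it does not reproduce the estimation-error analysis in the references.

```python
def per_period_arithmetic(total_return: float, n: int) -> float:
    """The ÷N scaling: simple n-period return divided by n."""
    return total_return / n


def per_period_geometric(total_return: float, n: int) -> float:
    """Constant per-period rate that compounds exactly to total_return."""
    return (1.0 + total_return) ** (1.0 / n) - 1.0


total = 0.10  # a 10% simple return over the whole window
for n in (1, 5, 20):
    arith = per_period_arithmetic(total, n)
    geom = per_period_geometric(total, n)
    # Compounding the arithmetic figure over n periods exceeds the
    # actual window return whenever n > 1 and total != 0.
    recompounded = (1.0 + arith) ** n - 1.0
    print(n, round(arith, 6), round(geom, 6), round(recompounded, 6))
```

For n = 1 the two figures coincide; for larger n the arithmetic figure sits above the geometric one and its recompounded value overshoots the true window return, which is why the ÷N number is best read as a scale convention and not as a compoundable rate.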

References

Fama & French (1988). "Dividend Yields and Expected Stock Returns." Journal of Financial Economics, 22(1), 3–25. Direct summed-log-return long-horizon regression — the academic-standard alternative to factrix's ÷N.
Jacquier, Kane & Marcus (2003). "Geometric or Arithmetic Mean: A Reconsideration." Financial Analysts Journal, 59(6), 46–53. Compounding bias of the arithmetic mean and the unbiased horizon-weighted blend.
Boudoukh, Richardson & Whitelaw (2008). "The Myth of Long-Horizon Predictability." Review of Financial Studies, 21(4), 1577–1605. Documents that across-horizon regression statistics share information through the persistent regressor — separate from any per-period scaling choice, and the reason inference across horizons is not addressed by normalization.

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> raw = fx.datasets.make_cs_panel(n_assets=20, n_dates=120)
>>> panel = compute_forward_return(raw, forward_periods=5)
>>> "forward_return" in panel.columns
True
>>> panel["forward_return"].null_count() == 0
True

The output panel is the canonical input to fx.evaluate:

>>> cfg = fx.AnalysisConfig.individual_continuous(forward_periods=5)
>>> profile = fx.evaluate(panel, cfg)
>>> isinstance(profile, fx.FactorProfile)
True