
Synthetic panel generators for examples, tests, and documentation. Both emit raw canonical-column panels (date, asset_id, price, factor); attach forward_return via factrix.preprocess.compute_forward_return before passing to evaluate.

The dataset's signal_horizon is a property of the generated synthetic signal, not a pipeline parameter. When AnalysisConfig.forward_periods == signal_horizon the pipeline realizes the nominal information coefficient (IC) / drift; other horizons realize a decayed signal.
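As a rough guide to how much decay to expect at a mismatched horizon, the sketch below assumes iid per-bar returns and a factor built only from the signal_horizon forward return. Under those assumptions, two forward-return windows that start at the same bar share min(h, H) of their max(h, H) bars, so their correlation, and hence the expected IC, scales by √(min(h, H) / max(h, H)). The helper name expected_ic is hypothetical and not part of factrix; realized values will also carry sampling noise.

```python
import math


def expected_ic(ic_target: float, signal_horizon: int, forward_periods: int) -> float:
    """Approximate IC at `forward_periods` for a signal built at `signal_horizon`.

    Assumption (not taken from factrix): per-bar returns are iid, so the two
    forward-return windows overlap on min(h, H) of their max(h, H) bars.
    """
    h, H = forward_periods, signal_horizon
    overlap = min(h, H) / max(h, H)
    return ic_target * math.sqrt(overlap)


# Compare a shorter, a matched, and a longer measurement horizon.
for h in (1, 5, 20):
    print(h, expected_ic(0.04, signal_horizon=5, forward_periods=h))
```

At the matched horizon the helper returns ic_target unchanged; at forward_periods = 20 with signal_horizon = 5 it returns half of ic_target.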

factrix.datasets.make_cs_panel

make_cs_panel(*, n_assets: int = 50, n_dates: int = 252, ic_target: float = 0.04, signal_horizon: int = 5, seed: int = 42, start_date: str = _DEFAULT_START) -> DataFrame

Synthetic cross-sectional panel with a calibrated target information coefficient (IC).

Construction
  1. Per-asset volatility σ_i ~ U[0.01, 0.03]; daily arithmetic returns are ε_{t,i} ~ N(0, σ_i).
  2. Prices p[t,i] = 100 · cumprod(1 + ε).
  3. Signal-horizon forward return fr[t] = (p[t+1+H]/p[t+1] − 1) / H where H = signal_horizon.
  4. Factor is a cross-sectional mixture of standardized forward return and iid noise:

    factor[t] = ρ · z(fr[t]) + √(1−ρ²) · z(η[t])

where ρ = clip(ic_target, −0.99, 0.99) and z is the plain Gaussian (not MAD) z-score, so the identity Corr(factor, fr) = ρ holds exactly per date at horizon H. factrix's ic_mean uses Spearman rank IC, which tracks Pearson ρ tightly at small |ρ| but is not identical; realized ic_mean at |ic_target| ≳ 0.2 may diverge by a few basis points.
  5. The last H+1 dates have no defined forward return; factor values there are pure noise and are dropped along with the null forward returns once compute_forward_return runs.
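The construction steps above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the function names sketch_cs_panel, zscore and pearson are hypothetical, σ_i is assumed to be a standard deviation, and the output is nested lists rather than a DataFrame.

```python
import math
import random


def zscore(xs):
    """Plain z-score (mean and population standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    return sum(a * b for a, b in zip(zscore(xs), zscore(ys))) / len(xs)


def sketch_cs_panel(n_assets=50, n_dates=60, ic_target=0.04, signal_horizon=5, seed=42):
    rng = random.Random(seed)
    H = signal_horizon
    rho = max(-0.99, min(0.99, ic_target))
    noise_weight = math.sqrt(1.0 - rho * rho)

    # Step 1: per-asset volatility (assumed to be a standard deviation).
    sigma = [rng.uniform(0.01, 0.03) for _ in range(n_assets)]

    # Step 2: prices[t][i] = 100 * cumprod(1 + eps).
    prices, level = [], [100.0] * n_assets
    for _ in range(n_dates):
        level = [p * (1.0 + rng.gauss(0.0, s)) for p, s in zip(level, sigma)]
        prices.append(level)

    factor, fwd = [], []
    # Step 5: the last H+1 dates have no forward return, so they are skipped here.
    for t in range(n_dates - H - 1):
        # Step 3: per-bar forward return from bar t+1 to bar t+1+H.
        fr = [(prices[t + 1 + H][i] / prices[t + 1][i] - 1.0) / H
              for i in range(n_assets)]
        # Step 4: mix standardized forward return with standardized iid noise.
        eta = [rng.gauss(0.0, 1.0) for _ in range(n_assets)]
        factor.append([rho * a + noise_weight * b
                       for a, b in zip(zscore(fr), zscore(eta))])
        fwd.append(fr)
    return factor, fwd


factor, fwd = sketch_cs_panel(ic_target=0.04)
per_date_ic = [pearson(f, r) for f, r in zip(factor, fwd)]
print(sum(per_date_ic) / len(per_date_ic))
```

In this sketch η is not orthogonalized against fr within each date, so the per-date Pearson correlation scatters around ρ rather than matching it to machine precision.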

Parameters:

Name Type Description Default
n_assets int

Cross-sectional width.

50
n_dates int

Number of calendar dates (daily index, includes weekends — factrix doesn't prescribe a calendar).

252
ic_target float

Target per-date Pearson CS correlation between factor and forward return at signal_horizon. Realized per-date IC after fx.evaluate will fall within a couple of standard errors of this value. Overlapping forward returns reduce the number of effectively independent dates by a factor of signal_horizon, so s.e. ≈ 1 / √((n_dates / signal_horizon) · n_assets).

0.04
signal_horizon int

Horizon (in bars) at which the synthetic signal lives — a property of the generated data, not a pipeline parameter. Pipelines measuring at AnalysisConfig.forward_periods == signal_horizon realize the nominal IC; different horizons realize a decayed IC (correct physics for a signal with a natural time-scale, not a bug).

5
seed int

RNG seed.

42
start_date str

ISO date for the first row.

_DEFAULT_START
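
To get a feel for the standard-error approximation quoted under ic_target, the snippet below evaluates it for the default arguments. The helper name approx_ic_standard_error is hypothetical; it only restates the formula above and is not a factrix function.

```python
import math


def approx_ic_standard_error(n_dates: int, n_assets: int, signal_horizon: int) -> float:
    """s.e. ~ 1 / sqrt((n_dates / signal_horizon) * n_assets), as quoted above."""
    effective_dates = n_dates / signal_horizon  # overlap shrinks independent dates
    return 1.0 / math.sqrt(effective_dates * n_assets)


se = approx_ic_standard_error(n_dates=252, n_assets=50, signal_horizon=5)
print(round(se, 4))  # 0.0199
```

With the defaults the approximate standard error is about half of ic_target = 0.04, so a single default-sized panel can realize an IC noticeably away from the target; increasing n_dates or n_assets tightens it.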

Returns:

Type Description
DataFrame

Long DataFrame with date, asset_id, price, factor and date dtype pl.Datetime("ms"). Attach forward_return (e.g. via factrix.preprocess.compute_forward_return) before passing to fx.evaluate.

Examples:

>>> import factrix as fx
>>> raw = fx.datasets.make_cs_panel(n_assets=20, n_dates=120)
>>> set(raw.columns) == {"date", "asset_id", "price", "factor"}
True
>>> raw["asset_id"].n_unique() == 20
True

Attach a forward return before evaluating:

>>> from factrix.preprocess import compute_forward_return
>>> panel = compute_forward_return(raw, forward_periods=5)
>>> "forward_return" in panel.columns
True

factrix.datasets.make_event_panel

make_event_panel(*, n_assets: int = 50, n_dates: int = 252, event_rate: float = 0.02, post_event_drift_bps: float = 10.0, signal_horizon: int = 5, seed: int = 42, start_date: str = _DEFAULT_START) -> DataFrame

Synthetic event-signal panel — sparse {0, R} schema, emitted here as the canonical signed ternary factor ∈ {-1, 0, +1}.

Construction
  1. Baseline returns as in make_cs_panel.
  2. Independent Bernoulli(event_rate) per (t, i); sign ±1 with equal probability (this generator's chosen magnitude under the broader {0, R} sparse schema). Non-event cells get 0.
  3. Post-event drift: for each event with sign s, add s · post_event_drift_bps / 1e4 / signal_horizon to the signal_horizon bars t+2 .. t+1+H of that asset — the exact window a pipeline measuring forward return at the same horizon will see. Drift magnitude is small (≈ bps-per-day) so the event signal is discoverable but not trivial.
  4. Prices are cumulated after drift injection.
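
The four steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: sketch_event_panel is a hypothetical name, σ_i is assumed to be a standard deviation, drift windows that run past the last date are simply truncated, and the output is nested lists rather than a DataFrame.

```python
import random


def sketch_event_panel(n_assets=50, n_dates=252, event_rate=0.02,
                       post_event_drift_bps=10.0, signal_horizon=5, seed=42):
    rng = random.Random(seed)
    H = signal_horizon
    per_bar_drift = post_event_drift_bps / 1e4 / H  # total drift spread over H bars

    # Step 1: baseline returns, one volatility per asset.
    sigma = [rng.uniform(0.01, 0.03) for _ in range(n_assets)]
    returns = [[rng.gauss(0.0, s) for s in sigma] for _ in range(n_dates)]

    factor = [[0.0] * n_assets for _ in range(n_dates)]
    drift = [[0.0] * n_assets for _ in range(n_dates)]
    for t in range(n_dates):
        for i in range(n_assets):
            # Step 2: independent Bernoulli(event_rate) event with a random sign.
            if rng.random() < event_rate:
                sign = rng.choice((-1.0, 1.0))
                factor[t][i] = sign
                # Step 3: add drift to bars t+2 .. t+1+H (truncated at the panel end).
                for k in range(t + 2, min(t + 2 + H, n_dates)):
                    drift[k][i] += sign * per_bar_drift

    # Step 4: cumulate prices only after the drift has been injected.
    prices, level = [], [100.0] * n_assets
    for t in range(n_dates):
        level = [p * (1.0 + r + d)
                 for p, r, d in zip(level, returns[t], drift[t])]
        prices.append(level)
    return factor, drift, prices


factor, drift, prices = sketch_event_panel(n_assets=20, n_dates=120, event_rate=0.05)
n_events = sum(1 for row in factor for v in row if v != 0.0)
print(n_events)
```

Overlapping events on the same asset add their drifts, which is why the drift array is accumulated with += rather than assigned.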

Suitable for AnalysisConfig.individual_sparse() / AnalysisConfig.common_sparse().

Parameters:

Name Type Description Default
n_assets int

Cross-sectional width.

50
n_dates int

Number of calendar dates.

252
event_rate float

Per-cell event probability (≈ expected events per asset per date).

0.02
post_event_drift_bps float

Total drift in basis points injected across the signal_horizon bars of the forward-return window (bars t+2 .. t+1+H).

10.0
signal_horizon int

Horizon (in bars) over which post-event drift is distributed — a property of the generated data, not a pipeline parameter. Pipelines measuring at AnalysisConfig.forward_periods == signal_horizon realize the nominal drift; different horizons realize a weakened or diluted signal (correct physics for a signal with a natural time-scale, not a bug).

5
seed int

RNG seed.

42
start_date str

ISO date for the first row.

_DEFAULT_START

Returns:

Type Description
DataFrame

Long DataFrame with date, asset_id, price, factor. Factor is Float64 with values in {-1.0, 0.0, +1.0}. Attach forward_return (e.g. via factrix.preprocess.compute_forward_return) before passing to fx.evaluate.

Examples:

>>> import factrix as fx
>>> raw = fx.datasets.make_event_panel(n_assets=20, n_dates=120, event_rate=0.05)
>>> set(raw["factor"].unique().to_list()) <= {-1.0, 0.0, 1.0}
True

Pair with the sparse factory:

>>> from factrix.preprocess import compute_forward_return
>>> panel = compute_forward_return(raw, forward_periods=5)
>>> cfg = fx.AnalysisConfig.individual_sparse(forward_periods=5)
>>> cfg.signal is fx.Signal.SPARSE
True