Synthetic panel generators for examples, tests, and documentation. Both emit raw canonical-column panels (date, asset_id, price, factor); attach forward_return via factrix.preprocess.compute_forward_return before passing to evaluate.

The dataset's signal_horizon is a property of the generated synthetic signal, not a pipeline parameter. When AnalysisConfig.forward_periods == signal_horizon the pipeline realizes the nominal information coefficient (IC) / drift; other horizons realize a decayed signal.
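
The horizon dependence can be illustrated without factrix at all. The following is a minimal NumPy sketch, not the library's implementation: the helper names `forward_return` and `mean_cs_corr`, the single shared volatility of 0.02, and the exaggerated ρ = 0.5 are all assumptions chosen so the effect is visible in a small sample. It builds a factor against the H = 5 forward return and then measures the mean per-date correlation at three measurement horizons.

```python
import numpy as np

def forward_return(price, h):
    # fr[t] = (p[t+1+h] / p[t+1] - 1) / h, defined for the first n_dates - (h + 1) dates.
    return (price[1 + h:] / price[1:-h] - 1.0) / h

def zscore(x):
    # Plain cross-sectional z-score (per date, across assets).
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def mean_cs_corr(a, b):
    # Average over dates of the per-date Pearson correlation across assets.
    return float((zscore(a) * zscore(b)).mean(axis=1).mean())

rng = np.random.default_rng(0)
n_dates, n_assets, H, rho = 300, 200, 5, 0.5  # illustrative values only

eps = rng.normal(0.0, 0.02, size=(n_dates, n_assets))
price = 100.0 * np.cumprod(1.0 + eps, axis=0)

# The signal "lives" at horizon H: the factor is mixed against fr at H.
fr_H = forward_return(price, H)
noise = rng.normal(size=fr_H.shape)
factor = rho * zscore(fr_H) + np.sqrt(1.0 - rho**2) * zscore(noise)

# Measure the same factor against forward returns at other horizons.
realized = {}
for h in (1, 5, 20):
    fr_h = forward_return(price, h)
    n = min(len(fr_h), len(factor))
    realized[h] = mean_cs_corr(factor[:n], fr_h[:n])

print(realized)
```

In this sketch the correlation measured at h = 5 should sit near ρ, while h = 1 and h = 20 should come out noticeably lower, because those forward returns only partly overlap the window the factor was built on.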

factrix.datasets.make_cs_panel

make_cs_panel(*, n_assets: int = 50, n_dates: int = 252, ic_target: float = 0.04, signal_horizon: int = 5, seed: int = 42, start_date: str = _DEFAULT_START) -> DataFrame

Synthetic cross-sectional panel with a calibrated target information coefficient (IC).

Construction

1. Per-asset volatility σ_i ~ U[0.01, 0.03]; daily arithmetic returns are ε_{t,i} ~ N(0, σ_i).
2. Prices p[t,i] = 100 · cumprod(1 + ε).
3. Signal-horizon forward return fr[t] = (p[t+1+H]/p[t+1] − 1) / H where H = signal_horizon.
4. Factor is a cross-sectional mixture of standardized forward return and iid noise:

   factor[t] = ρ · z(fr[t]) + √(1−ρ²) · z(η[t])

   where ρ = clip(ic_target, −0.99, 0.99) and z is plain Gaussian (not MAD) z-score so the identity Corr(factor, fr) = ρ holds exactly per date at horizon H. Factorlib's ic_mean uses Spearman rank IC, which tracks Pearson ρ tightly at small |ρ| but is not identical; realized ic_mean at |ic_target| ≳ 0.2 may diverge by a few bp.
5. The last H+1 dates have no defined forward return; factor values there are pure noise and will be dropped along with the null forward returns once compute_forward_return runs.
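
The construction steps above can be sketched in NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the code behind `make_cs_panel`: the function name `sketch_cs_panel` is hypothetical, it returns wide arrays rather than a long DataFrame, the second argument of N(0, σ_i) is read as a standard deviation, and the noise η is drawn independently without being orthogonalized against the forward return, so in this sketch the per-date sample correlation matches ρ only approximately.

```python
import numpy as np

def zscore(x):
    # Plain Gaussian z-score per date (row), across assets (columns).
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def sketch_cs_panel(n_assets=50, n_dates=252, ic_target=0.04,
                    signal_horizon=5, seed=42):
    # Hypothetical helper: returns (price, fr, factor) as wide arrays.
    rng = np.random.default_rng(seed)
    H = signal_horizon

    # Step 1: per-asset volatility and daily arithmetic returns.
    sigma = rng.uniform(0.01, 0.03, size=n_assets)
    eps = rng.normal(0.0, sigma, size=(n_dates, n_assets))

    # Step 2: prices from compounded returns.
    price = 100.0 * np.cumprod(1.0 + eps, axis=0)

    # Step 3: fr[t] = (p[t+1+H] / p[t+1] - 1) / H for the first n_dates - (H + 1) dates.
    fr = (price[1 + H:] / price[1:n_dates - H] - 1.0) / H
    n_valid = fr.shape[0]

    # Step 4: mix standardized forward return with iid noise.
    rho = float(np.clip(ic_target, -0.99, 0.99))
    eta = rng.normal(size=(n_dates, n_assets))
    factor = zscore(eta)  # Step 5: the last H + 1 dates stay pure noise (assumed scaling).
    factor[:n_valid] = rho * zscore(fr) + np.sqrt(1.0 - rho**2) * zscore(eta[:n_valid])
    return price, fr, factor

# Usage: an exaggerated ic_target makes the per-date correlation easy to see.
price, fr, factor = sketch_cs_panel(n_assets=200, n_dates=100, ic_target=0.5, seed=0)
per_date = [np.corrcoef(factor[t], fr[t])[0, 1] for t in range(len(fr))]
print(round(float(np.mean(per_date)), 2))
```

With 100 dates and H = 5, only the first 94 rows of `factor` carry signal; the remaining 6 correspond to the dates with no defined forward return.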

Parameters:

Name	Type	Description	Default
`n_assets`	`int`	Cross-sectional width.	`50`
`n_dates`	`int`	Number of calendar dates (daily index, includes weekends — factrix doesn't prescribe a calendar).	`252`
`ic_target`	`float`	Target per-date Pearson CS correlation between factor and forward return at `signal_horizon`. Realized per-date IC after `fx.evaluate` will fall near this, within a couple of standard errors. Overlapping forward returns reduce the number of effectively independent dates by a factor of `signal_horizon`, so s.e. ≈ `1 / √((n_dates / signal_horizon) · n_assets)`.	`0.04`
`signal_horizon`	`int`	Horizon (in bars) at which the synthetic signal lives — a property of the generated data, not a pipeline parameter. Pipelines measuring at `AnalysisConfig.forward_periods == signal_horizon` realize the nominal IC; different horizons realize a decayed IC (correct physics for a signal with a natural time-scale, not a bug).	`5`
`seed`	`int`	RNG seed.	`42`
`start_date`	`str`	ISO date for the first row.	`_DEFAULT_START`
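
As a worked example of the standard-error approximation quoted for `ic_target`, the snippet below plugs in the default arguments. The function name `approx_ic_standard_error` is hypothetical and only restates the formula from the table; it is not part of factrix.

```python
import math

def approx_ic_standard_error(n_dates, signal_horizon, n_assets):
    # Overlapping forward returns leave roughly n_dates / signal_horizon
    # independent dates, each contributing n_assets observations.
    effective_dates = n_dates / signal_horizon
    return 1.0 / math.sqrt(effective_dates * n_assets)

se = approx_ic_standard_error(n_dates=252, signal_horizon=5, n_assets=50)
print(round(se, 4))  # 0.0199
```

At the defaults the approximate standard error is about 0.02, so the default `ic_target` of 0.04 sits roughly two standard errors from zero. Under this approximation, a realized IC anywhere from about 0 to 0.08 is within two standard errors of the target.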

Returns:

Type	Description
`DataFrame`	Long DataFrame with `date, asset_id, price, factor` and `date` dtype `pl.Datetime("ms")`. Attach `forward_return` (e.g. via `factrix.preprocess.compute_forward_return`) before passing to `fx.evaluate`.

Examples:

>>> import factrix as fx
>>> raw = fx.datasets.make_cs_panel(n_assets=20, n_dates=120)
>>> set(raw.columns) == {"date", "asset_id", "price", "factor"}
True
>>> raw["asset_id"].n_unique() == 20
True

Attach a forward return before evaluating:

>>> from factrix.preprocess import compute_forward_return
>>> panel = compute_forward_return(raw, forward_periods=5)
>>> "forward_return" in panel.columns
True

factrix.datasets.make_event_panel

make_event_panel(*, n_assets: int = 50, n_dates: int = 252, event_rate: float = 0.02, post_event_drift_bps: float = 10.0, signal_horizon: int = 5, seed: int = 42, start_date: str = _DEFAULT_START) -> DataFrame

Synthetic event-signal panel — sparse {0, R} schema, emitted here as the canonical signed ternary factor ∈ {-1, 0, +1}.

Construction

1. Baseline returns as in make_cs_panel.
2. Independent Bernoulli(event_rate) per (t, i); sign ±1 with equal probability (this generator's chosen magnitude under the broader {0, R} sparse schema). Non-event cells get 0.
3. Post-event drift: for each event with sign s, add s · post_event_drift_bps / 1e4 / signal_horizon to the signal_horizon bars t+2 .. t+1+H of that asset, which is the exact window a pipeline measuring forward return at the same horizon will see. Drift magnitude is small (≈ bps-per-day) so the event signal is discoverable but not trivial.
4. Prices are cumulated after drift injection.
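
The drift-injection steps above can be sketched in NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the code behind `make_event_panel`: the function name `sketch_event_panel` is hypothetical, it returns wide arrays rather than a long DataFrame, and events near the end of the sample simply have their drift window truncated.

```python
import numpy as np

def sketch_event_panel(n_assets=50, n_dates=252, event_rate=0.02,
                       post_event_drift_bps=10.0, signal_horizon=5, seed=42):
    # Hypothetical helper: returns (price, factor) as wide arrays.
    rng = np.random.default_rng(seed)
    H = signal_horizon

    # Baseline returns with per-asset volatility, as in the cross-sectional generator.
    sigma = rng.uniform(0.01, 0.03, size=n_assets)
    ret = rng.normal(0.0, sigma, size=(n_dates, n_assets))

    # Independent Bernoulli events with a random sign; non-event cells are 0.
    is_event = rng.random((n_dates, n_assets)) < event_rate
    sign = rng.choice([-1.0, 1.0], size=(n_dates, n_assets))
    factor = np.where(is_event, sign, 0.0)

    # Spread the total drift evenly over bars t+2 .. t+1+H of the same asset.
    per_bar = post_event_drift_bps / 1e4 / H
    for t, i in zip(*np.nonzero(factor)):
        lo, hi = t + 2, min(t + 2 + H, n_dates)  # truncated at the sample end
        ret[lo:hi, i] += factor[t, i] * per_bar

    # Prices are cumulated only after the drift has been injected.
    price = 100.0 * np.cumprod(1.0 + ret, axis=0)
    return price, factor

price, factor = sketch_event_panel()
print(price.shape, sorted(set(np.unique(factor).tolist())))
```

At the defaults the per-bar increment is 10 / 1e4 / 5 = 0.0002, i.e. 2 bps per bar for 5 bars, which is small relative to daily volatility of 1% to 3%.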

Suitable for AnalysisConfig.individual_sparse() / AnalysisConfig.common_sparse().

Parameters:

Name	Type	Description	Default
`n_assets`	`int`	Cross-sectional width.	`50`
`n_dates`	`int`	Number of calendar dates.	`252`
`event_rate`	`float`	Per-cell event probability (≈ expected events per asset per date).	`0.02`
`post_event_drift_bps`	`float`	Total drift in basis points injected across the `signal_horizon` bars of the forward-return window (bars `t+2 .. t+1+H`).	`10.0`
`signal_horizon`	`int`	Horizon (in bars) over which post-event drift is distributed — a property of the generated data, not a pipeline parameter. Pipelines measuring at `AnalysisConfig.forward_periods == signal_horizon` realize the nominal drift; different horizons realize a weakened or diluted signal (correct physics for a signal with a natural time-scale, not a bug).	`5`
`seed`	`int`	RNG seed.	`42`
`start_date`	`str`	ISO date for the first row.	`_DEFAULT_START`

Returns:

Type	Description
`DataFrame`	Long DataFrame with `date, asset_id, price, factor`. Factor is `Float64` with values in `{-1.0, 0.0, +1.0}`. Attach `forward_return` (e.g. via `factrix.preprocess.compute_forward_return`) before passing to `fx.evaluate`.

Examples:

>>> import factrix as fx
>>> raw = fx.datasets.make_event_panel(n_assets=20, n_dates=120, event_rate=0.05)
>>> set(raw["factor"].unique().to_list()) <= {-1.0, 0.0, 1.0}
True

Pair with the sparse factory:

>>> from factrix.preprocess import compute_forward_return
>>> panel = compute_forward_return(raw, forward_periods=5)
>>> cfg = fx.AnalysisConfig.individual_sparse(forward_periods=5)
>>> cfg.signal is fx.Signal.SPARSE
True