Synthetic panel generators for examples, tests, and documentation.
Both emit raw canonical-column panels (date, asset_id, price,
factor); attach forward_return via
factrix.preprocess.compute_forward_return before
passing to evaluate.
The dataset's signal_horizon is a property of the generated
synthetic signal, not a pipeline parameter. When
AnalysisConfig.forward_periods == signal_horizon the pipeline
realizes the nominal information coefficient (IC) / drift; other horizons realize a decayed
signal.
factrix.datasets.make_cs_panel
make_cs_panel(*, n_assets: int = 50, n_dates: int = 252, ic_target: float = 0.04, signal_horizon: int = 5, seed: int = 42, start_date: str = _DEFAULT_START) -> DataFrame
Synthetic cross-sectional panel with a calibrated target information coefficient (IC).
Construction
1. Per-asset volatility σ_i ~ U[0.01, 0.03]; daily arithmetic returns are ε_{t,i} ~ N(0, σ_i).
2. Prices p[t,i] = 100 · cumprod(1 + ε).
3. Signal-horizon forward return fr[t] = (p[t+1+H]/p[t+1] − 1) / H, where H = signal_horizon.
4. Factor is a cross-sectional mixture of standardized forward return and iid noise:
   factor[t] = ρ · z(fr[t]) + √(1−ρ²) · z(η[t])
   where ρ = clip(ic_target, −0.99, 0.99) and z is the plain
   Gaussian (not MAD) z-score, so the identity
   Corr(factor, fr) = ρ holds exactly per date at horizon
   H. Factorlib's ic_mean uses Spearman rank IC, which
   tracks Pearson ρ tightly at small |ρ| but is not
   identical; realized ic_mean at |ic_target| ≳ 0.2 may
   diverge by a few bp.
5. The last H+1 dates have no defined forward return; factor
values there are pure noise and will be dropped along with
the null forward returns once compute_forward_return runs.
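The construction above can be sketched for a single date's cross-section in plain Python. This is a minimal, dependency-free illustration and not the library's implementation: the names zscore, pearson, and make_cross_section are hypothetical. With independently drawn noise the sample correlation between z(fr) and z(η) is not exactly zero, so the sketch residualises the noise against z(fr) before re-standardising. That orthogonalisation is an assumption made here to obtain the identity exactly in-sample; the source does not say whether factrix does the same.

```python
import math
import random


def zscore(xs):
    """Plain z-score (mean and population std), not a MAD-based one."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    zx, zy = zscore(xs), zscore(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def make_cross_section(n_assets=50, horizon=5, ic_target=0.04, seed=42):
    """Return (factor, forward_return) for date t = 0 of a toy panel."""
    rng = random.Random(seed)
    rho = max(-0.99, min(0.99, ic_target))  # clip(ic_target, -0.99, 0.99)

    fwd = []
    for _ in range(n_assets):
        sigma = rng.uniform(0.01, 0.03)  # per-asset volatility
        # Prices for dates 0 .. horizon+1, i.e. 100 * cumprod(1 + eps).
        prices, level = [], 100.0
        for _ in range(horizon + 2):
            level *= 1.0 + rng.gauss(0.0, sigma)
            prices.append(level)
        # fr[0] = (p[1+H] / p[1] - 1) / H
        fwd.append((prices[1 + horizon] / prices[1] - 1.0) / horizon)

    z_fr = zscore(fwd)
    z_noise = zscore([rng.gauss(0.0, 1.0) for _ in range(n_assets)])

    # Assumption of this sketch: remove the part of the noise that is
    # correlated with z_fr, then standardise again.
    beta = sum(e * f for e, f in zip(z_noise, z_fr)) / n_assets
    z_eta = zscore([e - beta * f for e, f in zip(z_noise, z_fr)])

    weight = math.sqrt(1.0 - rho ** 2)
    factor = [rho * f + weight * e for f, e in zip(z_fr, z_eta)]
    return factor, fwd


factor, fwd = make_cross_section()
realized_ic = pearson(factor, fwd)  # equals rho up to floating-point error
```

Without the residualisation step, the realized per-date correlation would equal ρ only up to sampling noise of order 1/√n_assets.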
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_assets | int | Cross-sectional width. | 50 |
| n_dates | int | Number of calendar dates (daily index, includes weekends; factrix doesn't prescribe a calendar). | 252 |
| ic_target | float | Target per-date Pearson CS correlation between factor and forward return at … | 0.04 |
| signal_horizon | int | Horizon (in bars) at which the synthetic signal lives; a property of the generated data, not a pipeline parameter. Pipelines measuring at … | 5 |
| seed | int | RNG seed. | 42 |
| start_date | str | ISO date for the first row. | _DEFAULT_START |
Returns:
| Type | Description |
|---|---|
| DataFrame | Long DataFrame with date, asset_id, price, and factor columns. Attach forward_return (e.g. via factrix.preprocess.compute_forward_return) before passing to evaluate. |
Examples:
>>> import factrix as fx
>>> raw = fx.datasets.make_cs_panel(n_assets=20, n_dates=120)
>>> set(raw.columns) == {"date", "asset_id", "price", "factor"}
True
>>> raw["asset_id"].n_unique() == 20
True
Attach a forward return before evaluating:
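The signature of factrix.preprocess.compute_forward_return is not shown in this section, so the following is only a dependency-free sketch of the quantity it is expected to attach, assuming it mirrors the generator's formula (entry at t+1, exit at t+1+H, divided by H). The helper attach_forward_return and the list-of-dicts panel are hypothetical stand-ins for the library call and the DataFrame.

```python
from collections import defaultdict


def attach_forward_return(rows, forward_periods):
    """Return copies of `rows` with a 'forward_return' key added.

    rows: list of dicts with 'date', 'asset_id', and 'price' keys.
    The value is (p[t+1+H] / p[t+1] - 1) / H per asset, or None on the
    last H+1 dates of each asset, where no exit price exists.
    """
    by_asset = defaultdict(list)
    for row in rows:
        by_asset[row["asset_id"]].append(row)

    out = []
    for asset_rows in by_asset.values():
        asset_rows.sort(key=lambda r: r["date"])  # ISO date strings sort too
        prices = [r["price"] for r in asset_rows]
        for t, row in enumerate(asset_rows):
            entry, exit_ = t + 1, t + 1 + forward_periods
            fr = None
            if exit_ < len(prices):
                fr = (prices[exit_] / prices[entry] - 1.0) / forward_periods
            out.append({**row, "forward_return": fr})
    return out


# Toy panel: one asset whose price grows 1% per bar over 10 dates.
rows = [{"date": d, "asset_id": "A", "price": 100.0 * 1.01 ** d} for d in range(10)]
panel = attach_forward_return(rows, forward_periods=5)

# Rows without a defined forward return are dropped before evaluation.
usable = [r for r in panel if r["forward_return"] is not None]
```

With 10 dates and a horizon of 5, the last 6 rows carry no forward return, which matches the H+1 trailing dates described in the construction notes.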
factrix.datasets.make_event_panel
make_event_panel(*, n_assets: int = 50, n_dates: int = 252, event_rate: float = 0.02, post_event_drift_bps: float = 10.0, signal_horizon: int = 5, seed: int = 42, start_date: str = _DEFAULT_START) -> DataFrame
Synthetic event-signal panel — sparse {0, R} schema, emitted
here as the canonical signed ternary factor ∈ {-1, 0, +1}.
Construction
1. Baseline returns as in make_cs_panel.
2. Independent Bernoulli(event_rate) per (t, i); sign ±1 with equal probability (this generator's chosen magnitude under the broader {0, R} sparse schema). Non-event cells get 0.
3. Post-event drift: for each event with sign s, add s · post_event_drift_bps / 1e4 / signal_horizon to the signal_horizon bars t+2 .. t+1+H of that asset, which is the exact window a pipeline measuring forward return at the same horizon will see. Drift magnitude is small (≈ bps-per-day) so the event signal is discoverable but not trivial.
4. Prices are cumulated after drift injection.
Suitable for AnalysisConfig.individual_sparse() /
AnalysisConfig.common_sparse().
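The drift window described in the construction can be illustrated with a noise-free, single-asset sketch. It is written in plain Python under stated assumptions and is not the library's code: inject_event_drift and forward_return are hypothetical helpers, baseline returns are set to zero so that only the injected drift is visible, and the forward return uses the same (p[t+1+H]/p[t+1] − 1) / H convention as make_cs_panel.

```python
def inject_event_drift(returns, events, drift_bps, horizon):
    """Add s * drift_bps / 1e4 / horizon to bars t+2 .. t+1+horizon.

    returns: per-bar arithmetic returns for one asset.
    events:  signed ternary factor values (-1, 0, +1), same length.
    """
    out = list(returns)
    per_bar = drift_bps / 1e4 / horizon
    for t, sign in enumerate(events):
        if sign == 0:
            continue
        # Bars beyond the end of the sample are simply skipped.
        for k in range(t + 2, min(t + 2 + horizon, len(out))):
            out[k] += sign * per_bar
    return out


def forward_return(prices, t, periods):
    """Per-bar forward return with entry at t+1 and exit at t+1+periods."""
    return (prices[t + 1 + periods] / prices[t + 1] - 1.0) / periods


n_dates, horizon = 30, 5
events = [0.0] * n_dates
events[3] = 1.0  # a single positive event at t = 3

drifted = inject_event_drift([0.0] * n_dates, events, drift_bps=10.0, horizon=horizon)

# Prices are cumulated after drift injection.
prices, level = [], 100.0
for r in drifted:
    level *= 1.0 + r
    prices.append(level)

matched = forward_return(prices, 3, horizon)      # measured at the signal horizon
doubled = forward_return(prices, 3, 2 * horizon)  # measured at twice the horizon
```

Measured at the signal horizon, the event recovers roughly the injected 2 bps per bar; measured at twice the horizon, the same total drift is spread over twice as many bars and the per-bar figure is about half, which is the decay the module introduction refers to.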
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_assets | int | Cross-sectional width. | 50 |
| n_dates | int | Number of calendar dates. | 252 |
| event_rate | float | Per-cell event probability (≈ expected events per asset per date). | 0.02 |
| post_event_drift_bps | float | Total drift in basis points injected across the … | 10.0 |
| signal_horizon | int | Horizon (in bars) over which post-event drift is distributed; a property of the generated data, not a pipeline parameter. Pipelines measuring at … | 5 |
| seed | int | RNG seed. | 42 |
| start_date | str | ISO date for the first row. | _DEFAULT_START |
Returns:
| Type | Description |
|---|---|
| DataFrame | Long DataFrame with date, asset_id, price, and factor columns; factor is the signed ternary event indicator in {-1, 0, +1}. Attach forward_return before passing to evaluate. |
Examples:
>>> import factrix as fx
>>> raw = fx.datasets.make_event_panel(n_assets=20, n_dates=120, event_rate=0.05)
>>> set(raw["factor"].unique().to_list()) <= {-1.0, 0.0, 1.0}
True
Pair with the sparse factory: