factrix.metrics.ic
IC (Information Coefficient) computation for cross-sectional panels.
Notes
Pipeline. Per-date Spearman rank IC (cross-section step) → IC time series, then a t-test on its mean, either on a non-overlapping subsample or with Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors. Per-slice analysis (regime, universe, sector, ...) reuses the same IC series via by_slice.
Input. DataFrame with date, asset_id, factor, forward_return.
Output. Time-indexed IC series (date, ic, plus the per-date tie_ratio diagnostic) that can be fed
into any series/ tool (oos, trend, significance, hit_rate).
factrix.metrics.ic.compute_ic
compute_ic(df: DataFrame, factor_col: str = 'factor', return_col: str = 'forward_return') -> DataFrame
Per-date Spearman rank information coefficient (IC).
Parameters:
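| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Panel with date, asset_id, and the factor / forward-return columns named by factor_col and return_col. | required |
| factor_col | str | Name of the factor column. | 'factor' |
| return_col | str | Name of the forward-return column. | 'forward_return' |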
Returns:
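| Type | Description |
|---|---|
| DataFrame | DataFrame with columns date, ic, and tie_ratio. Dates with fewer than MIN_ASSETS_PER_DATE_IC assets are dropped. tie_ratio is the per-date factor tie ratio \(1 - n_{\mathrm{unique}} / n\) in \([0, 1]\), where \(n\) is the number of assets on that date and \(n_{\mathrm{unique}}\) the number of distinct factor values. |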
Notes
Per-date Spearman IC is
\(\mathrm{IC}_t = \mathrm{corr}(\mathrm{rank}(f_t), \mathrm{rank}(r_t))\)
over the cross-section at date \(t\); tied values receive their average
rank (method="average").
At high tie rates Spearman \(\rho\) on average ranks is biased
relative to the tie-corrected formula (Kendall-Stuart §31). The
per-date factor tie_ratio is surfaced alongside ic so
downstream callers can detect bucketed / categorical signals
without re-inspecting the input; ic / ic_newey_west /
ic_ir aggregate it as the median across dates and stash it in
MetricOutput.metadata["tie_ratio"]. When the median exceeds
TIE_RATIO_WARN_THRESHOLD (0.3) those aggregators also emit a
UserWarning: treat the IC magnitude as a lower bound and
consider a tie-corrected correlation or a continuous transform
of the factor.
factrix drops dates whose cross-section has fewer than
MIN_ASSETS_PER_DATE_IC assets — undersized panels yield
rank-correlation estimates with degenerate variance.
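The per-date steps above (average ranks, Pearson correlation of ranks, tie_ratio, and the minimum-asset filter) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not factrix's implementation: the helper names (average_ranks, pearson, spearman_ic, tie_ratio, compute_ic_sketch) are hypothetical, MIN_ASSETS is a made-up stand-in for MIN_ASSETS_PER_DATE_IC, and returning NaN for a constant cross-section is a choice of this sketch only.

```python
from collections import defaultdict
from typing import Iterable, Sequence

# Hypothetical stand-in for MIN_ASSETS_PER_DATE_IC; the real value is not stated here.
MIN_ASSETS = 5


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values all receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant input: correlation undefined (sketch's choice)
    return sxy / (sxx * syy) ** 0.5


def spearman_ic(factor: Sequence[float], fwd: Sequence[float]) -> float:
    """IC_t = corr(rank(f_t), rank(r_t)) with average ranks for ties."""
    return pearson(average_ranks(factor), average_ranks(fwd))


def tie_ratio(factor: Sequence[float]) -> float:
    """1 - n_unique / n over one date's factor values."""
    return 1 - len(set(factor)) / len(factor)


def compute_ic_sketch(rows: Iterable[tuple], min_assets: int = MIN_ASSETS) -> list[tuple]:
    """rows: (date, asset_id, factor, forward_return); returns (date, ic, tie_ratio)."""
    by_date: dict = defaultdict(lambda: ([], []))
    for date, _asset, fv, rv in rows:
        by_date[date][0].append(fv)
        by_date[date][1].append(rv)
    out = []
    for date in sorted(by_date):
        f, r = by_date[date]
        if len(f) < min_assets:
            continue  # undersized cross-section: the date is dropped
        out.append((date, spearman_ic(f, r), tie_ratio(f)))
    return out


# "d1": a bucketed factor (6 assets, 3 distinct values); "d2": only 2 assets.
rows = [("d1", f"a{i}", fv, rv) for i, (fv, rv) in enumerate(
    zip([1, 1, 2, 2, 3, 3], [0.01, 0.03, 0.02, 0.05, 0.04, 0.06]))]
rows += [("d2", "a0", 1.0, 0.10), ("d2", "a1", 2.0, 0.20)]
result = compute_ic_sketch(rows)
print(result)  # only "d1" survives; its tie_ratio is 0.5
```

Half the factor values on "d1" are duplicates, so its tie_ratio of 0.5 would exceed the 0.3 warning threshold described above.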
References
Grinold 1989: \(\mathrm{IR} \approx \mathrm{IC} \times \sqrt{\mathrm{breadth}}\) motivates IC as the canonical signal-quality measure. The appraisal-ratio single-asset ancestor is Treynor-Black 1973; the breadth identity itself is Grinold's generalisation.
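The Fundamental Law is simple arithmetic. The following sketch uses a hypothetical helper grinold_ir and made-up inputs to show the square-root scaling that makes breadth matter alongside IC; "breadth" is read as the number of independent bets, an assumption the approximation relies on.

```python
import math


def grinold_ir(ic: float, breadth: float) -> float:
    """Fundamental Law of Active Management approximation: IR ≈ IC * sqrt(breadth)."""
    return ic * math.sqrt(breadth)


# Illustrative inputs only (not from any dataset): IC = 0.05 over 400 independent bets.
ir_400 = grinold_ir(0.05, 400)    # 0.05 * 20 ≈ 1.0
ir_1600 = grinold_ir(0.05, 1600)  # 4x the breadth only doubles IR: ≈ 2.0
print(round(ir_400, 6), round(ir_1600, 6))
```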
Examples:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=120, seed=0),
... forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> set(ic_df.columns) >= {"date", "ic", "tie_ratio"}
True
factrix.metrics.ic.ic
ic(ic_df: DataFrame, forward_periods: int = 5) -> MetricOutput
Information coefficient (IC) mean significance: is mean IC significantly different from zero?
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ic_df | DataFrame | Output of compute_ic. | required |
| forward_periods | int | Sampling interval for non-overlapping dates. | 5 |
Returns:
| Type | Description |
|---|---|
| MetricOutput | MetricOutput with value=mean IC, t_stat from non-overlapping sampling. |
Notes
Given the per-date IC series \(\mathrm{IC}_t\), significance is
\(t = \mathrm{mean}(\mathrm{IC}) / (\mathrm{std}(\mathrm{IC}) / \sqrt{n})\)
computed on a non-overlapping subsample (every
forward_periods-th date). \(H_0: \mathbb{E}[\mathrm{IC}] = 0\).
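A stdlib sketch of this test on a plain list of IC values follows. The stride's starting offset (the first date) and the sample standard deviation with ddof = 1 are assumptions of the sketch, and ic_t_nonoverlap is a hypothetical name, not factrix API.

```python
import math
import statistics
from typing import Sequence


def ic_t_nonoverlap(ic_series: Sequence[float], forward_periods: int = 5) -> tuple[float, float, int]:
    """t = mean(IC) / (std(IC) / sqrt(n)) on every forward_periods-th IC value."""
    sub = list(ic_series)[::forward_periods]  # starts at the first date (assumed offset)
    n = len(sub)
    mean = statistics.mean(sub)
    sd = statistics.stdev(sub)  # sample std with ddof=1 (assumed convention)
    return mean, mean / (sd / math.sqrt(n)), n


# Toy series with forward_periods=2: only positions 0, 2, 4 are kept -> [0.1, 0.3, 0.2].
ic_series = [0.1, 0.9, 0.3, 0.9, 0.2, 0.9]
mean, t, n = ic_t_nonoverlap(ic_series, forward_periods=2)
print(n, round(mean, 4), round(t, 4))  # 3 0.2 3.4641
```

The discarded values (0.9 here) never enter the statistic, which is the price of the non-overlap design: \(n\) shrinks by a factor of forward_periods.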
factrix uses non-overlapping subsampling rather than Newey-West heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors
for the default ic test, which sidesteps the forward_periods - 1 lag floor that
overlapping forward returns impose on a HAC estimator; the HAC route is offered separately
as ic_newey_west for callers who prefer to keep every sample.
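Where the lag floor comes from can be made explicit under a simplifying assumption made only for this derivation: single-period returns \(r_t\) are i.i.d. with variance \(\sigma^2\), and \(h\) = forward_periods. Then \(R_t^{(h)}\) and \(R_{t+j}^{(h)}\) share the \(h - j\) returns \(r_{t+j+1}, \dots, r_{t+h}\), so the overlapping series is MA(\(h - 1\)). Sampling every \(h\)-th date removes the overlap entirely, while a HAC estimator must reach at least lag \(h - 1\). The per-date IC series built on these returns is expected to inherit dependence of similar length.

```latex
% Assumption: r_t i.i.d. with variance \sigma^2; h = forward_periods.
R_t^{(h)} = \sum_{i=1}^{h} r_{t+i}, \qquad
\operatorname{Cov}\bigl(R_t^{(h)}, R_{t+j}^{(h)}\bigr) =
\begin{cases}
(h - j)\,\sigma^2, & 0 \le j < h,\\
0, & j \ge h,
\end{cases}
\qquad
\operatorname{Corr}\bigl(R_t^{(h)}, R_{t+j}^{(h)}\bigr) = 1 - \frac{j}{h} \quad (0 \le j < h).
```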
References
Grinold 1989: IC as the canonical signal-quality measure under the Fundamental Law of Active Management. Hansen-Hodrick 1980: K-period overlapping returns carry MA(K-1) autocorrelation — the motivation for the non-overlap stride used here.
Examples:
Chain from compute_ic output:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic, ic
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> result = ic(ic_df, forward_periods=5)
>>> result.name
'ic'
factrix.metrics.ic.ic_newey_west
ic_newey_west(ic_df: DataFrame, forward_periods: int = 5) -> MetricOutput
Information coefficient (IC) mean significance via Newey-West heteroskedasticity-and-autocorrelation-consistent (HAC) \(t\)-test on the overlapping series.
Sibling of ic(): same null hypothesis (\(H_0\): mean IC = 0), but
keeps every observation and absorbs the autocorrelation induced by
overlapping forward_periods-day returns through HAC standard
errors rather than dropping samples.
Notes
\(t = \mathrm{mean}(\mathrm{IC}) / \mathrm{SE}_{\mathrm{NW}}(\mathrm{IC})\)
on the full overlapping IC series. Lag selection:
\(L = \max(\lfloor T^{1/3} \rfloor, h - 1)\) (with \(h\) = forward_periods)
— the Andrews (1991) Bartlett growth rate, floored against the
Hansen-Hodrick MA(\(h-1\)) overlap horizon so the kernel covers
the induced dependence.
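A stdlib sketch of a Bartlett-kernel Newey-West \(t\)-statistic using the lag rule above. The \(1/T\) autocovariance normalisation and the Bartlett weights \(1 - j/(L + 1)\) are the textbook Newey-West (1987) choices and are assumptions about factrix's internals; nw_lag and nw_t_stat are hypothetical names.

```python
import math
from typing import Sequence


def nw_lag(T: int, forward_periods: int) -> int:
    """L = max(floor(T^(1/3)), h - 1), with an integer cube root."""
    # Float cube roots can land just below an exact integer root, so count up instead.
    k = 0
    while (k + 1) ** 3 <= T:
        k += 1
    return max(k, forward_periods - 1)


def nw_t_stat(x: Sequence[float], forward_periods: int = 5) -> tuple[float, float, int]:
    """mean(x) / SE_NW(mean) on the full (overlapping) series."""
    T = len(x)
    L = nw_lag(T, forward_periods)
    m = sum(x) / T
    d = [v - m for v in x]

    def gamma(j: int) -> float:
        # Lag-j autocovariance, normalised by T.
        return sum(d[t] * d[t - j] for t in range(j, T)) / T

    # Long-run variance: gamma_0 + 2 * sum_j w_j * gamma_j with Bartlett weights.
    lrv = gamma(0) + 2 * sum((1 - j / (L + 1)) * gamma(j) for j in range(1, L + 1))
    se = math.sqrt(lrv / T)
    return m, m / se, L


# Toy series: T = 8 gives floor(8^(1/3)) = 2 and h - 1 = 2, so L = 2.
x = [0, 2, 0, 2, 0, 2, 0, 2]
m, t, L = nw_t_stat(x, forward_periods=3)
print(round(m, 4), round(t, 4), L)  # 1.0 4.899 2
```

The Bartlett weights keep the long-run variance non-negative, which is one reason this kernel is the usual default.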
factrix uses the Andrews fixed-rate rule rather than the Newey-West (1994) data-adaptive bandwidth — simpler, deterministic across reruns, and adequate at the typical \(T\) of factor research.
References
Newey-West 1987: HAC variance estimator.
Andrews 1991: optimal Bartlett growth rate
\(T^{1/3}\) underlying the default lag rule.
Hansen-Hodrick 1980: forward_periods - 1
floor for overlapping returns.
Newey-West 1994: data-adaptive lag-selection
alternative; cited as background.
Examples:
Chain from compute_ic output:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic, ic_newey_west
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> result = ic_newey_west(ic_df, forward_periods=5)
>>> result.name
'ic_newey_west'
factrix.metrics.ic.ic_ir
ic_ir(ic_df: DataFrame) -> MetricOutput
\(\mathrm{ICIR} = \mathrm{mean}(\mathrm{IC}) / \mathrm{std}(\mathrm{IC})\).
Signed ratio — positive when information coefficient (IC) is consistently positive, negative when consistently negative. Analogous to a Sharpe ratio for the factor signal.
This is a descriptive statistic, not a hypothesis test (t_stat=None).
For significance testing, use ic().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ic_df | DataFrame | Output of compute_ic. | required |
Returns:
| Type | Description |
|---|---|
| MetricOutput | MetricOutput with value=IC_IR (signed), t_stat=None. |
Notes
\(\mathrm{ICIR} = \mathrm{mean}(\mathrm{IC}) / \mathrm{std}(\mathrm{IC})\)
over the per-date IC series — a Sharpe-style ratio describing
time-series stability of the signal. Reported as a descriptive
statistic; no inference is attached because the overlap-robust
significance tests on \(\mathrm{mean}(\mathrm{IC})\) live in ic
(non-overlapping subsample) and ic_newey_west (heteroskedasticity-and-autocorrelation-consistent, HAC).
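A stdlib sketch of the ratio and of its link to a naive \(t\)-statistic follows. ic_ir_sketch is a hypothetical name, and the sample standard deviation (ddof = 1) is an assumed convention.

```python
import math
import statistics
from typing import Sequence


def ic_ir_sketch(ic_series: Sequence[float]) -> float:
    """Signed ICIR = mean(IC) / std(IC); descriptive only, no t_stat attached."""
    return statistics.mean(ic_series) / statistics.stdev(ic_series)  # ddof=1 assumed


pos = ic_ir_sketch([0.1, 0.3, 0.2])     # ≈ 2.0: consistently positive signal
neg = ic_ir_sketch([-0.1, -0.3, -0.2])  # ≈ -2.0: the sign is preserved
# Algebraically, ICIR * sqrt(n) equals the naive i.i.d. t on the full series;
# that naive t ignores overlap, which is why inference is left to ic / ic_newey_west.
naive_t = pos * math.sqrt(3)
print(round(pos, 4), round(neg, 4), round(naive_t, 4))
```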
References
Grinold 1989: ICIR is the time-stability normalisation that completes the IR decomposition.
Examples:
Chain from compute_ic output:
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic, ic_ir
>>> panel = compute_forward_return(
... fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
... forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> result = ic_ir(ic_df)
>>> result.name
'ic_ir'
Use cases
- Compute per-date information coefficient (IC): build the per-date Spearman IC series (with tie_ratio diagnostics) from a long-format panel before any inferential test. Pre-step for ic / ic_newey_west / ic_ir.
- Mean-IC significance, non-overlapping: test \(H_0: \mathbb{E}[\mathrm{IC}] = 0\) on the subsample of every forward_periods-th date to avoid the autocorrelation induced by overlapping forward returns. Default for the IC cell.
- Mean-IC significance, heteroskedasticity-and-autocorrelation-consistent (HAC): same null, but keep every overlapping observation and absorb the induced MA dependence through a Newey-West HAC standard error. This trades a larger effective sample for a dependence on the kernel and lag choice.
- IC stability (signed IR): mean(IC) / std(IC) over the per-date series, a Sharpe-style descriptive statistic for signal time-stability. No inference attached.
Choosing a function
| Goal | Function |
|---|---|
| Per-date IC table for downstream inspection / slicing | compute_ic |
| Mean-IC significance, deterministic non-overlap subsample (default) | ic |
| Mean-IC significance, keep every overlap and use HAC SE | ic_newey_west |
| Time-stability ratio (no inference) | ic_ir |
All four are invoked indirectly via evaluate(panel, AnalysisConfig.individual_continuous(metric=Metric.IC))
— they're documented here for callers who want the standalone numerical
output without the inference framing.
Worked example: per-date IC then mean significance
compute_ic → ic_newey_west on a synthetic cross-sectional panel
```python
import factrix as fx
from factrix.metrics.ic import compute_ic, ic_newey_west
from factrix.preprocess import compute_forward_return
raw = fx.datasets.make_cs_panel(
    n_assets=100, n_dates=500, ic_target=0.08, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)
ic_df = compute_ic(panel)
print(ic_df.head())
# ┌────────────┬───────────┬───────────┐
# │ date       ┆ ic        ┆ tie_ratio │
# ├────────────┼───────────┼───────────┤
# │ 2024-01-01 ┆ 0.083     ┆ 0.000     │
# │ 2024-01-02 ┆ 0.071     ┆ 0.000     │
# │ ...        ┆ ...       ┆ ...       │
# └────────────┴───────────┴───────────┘
out = ic_newey_west(ic_df, forward_periods=5)
print(out.value, out.stat, out.metadata["p_value"])
# 0.0722 14.60 2.13e-40
```
Cross-slice IC analysis
For per-slice IC summaries (regime / universe / sector / ...), use
by_slice on an IC frame joined with slice labels.
For inferential contrasts (pairwise Wald χ² + Holm / Romano-Wolf
adjusted p), use slice_pairwise_test. The
metric-specific regime_ic callable and by_regime dispatcher were
removed in v0.12.0; see the
Slice analysis guide.
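by_slice's signature is not reproduced in this section, so the following is only a stdlib illustration of the underlying idea: join each date's IC with a slice label (the regime labels and IC values here are made up) and summarise per slice. It is not by_slice's API and omits the cross-slice inference that slice_pairwise_test adds.

```python
import statistics
from collections import defaultdict

# Hypothetical per-date IC values and regime labels.
ic_by_date = {"2024-01-01": 0.08, "2024-01-02": 0.05, "2024-01-03": -0.02, "2024-01-04": 0.01}
regime = {"2024-01-01": "bull", "2024-01-02": "bull", "2024-01-03": "bear", "2024-01-04": "bear"}


def per_slice_mean_ic(ic_by_date: dict, labels: dict) -> dict:
    """Return {slice: (mean IC, number of dates)} over dates that carry a label."""
    groups: dict = defaultdict(list)
    for date, value in ic_by_date.items():
        if date in labels:  # inner join: unlabelled dates are skipped
            groups[labels[date]].append(value)
    return {s: (statistics.mean(v), len(v)) for s, v in sorted(groups.items())}


summary = per_slice_mean_ic(ic_by_date, regime)
print(summary)  # per-slice (mean IC, n_dates), keys sorted: bear, bull
```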
See also
- by_slice: axis-agnostic slice dispatcher for per-slice IC summaries.
- slice_pairwise_test / slice_joint_test: cross-slice inference (Wald χ² + Holm / Romano-Wolf adjusted p).
- Slice analysis guide: slicing and cross-slice inference end-to-end.
- Metric applicability reference: when this metric applies and the sample-size guards that gate it.
- Statistical methods: HAC SE, false discovery rate (FDR), robust-scale, and unit-root disciplines that govern the inference.
- Individual × Continuous landing: adjacent metrics in the same cell.