factrix.metrics.ic

IC (Information Coefficient) computation for cross-sectional panels.

Notes

Pipeline. Per-date Spearman rank IC (cross-section step) → IC time series, then a t-test on its mean: either a plain t on a non-overlapping subsample (ic) or a Newey-West (NW) heteroskedasticity-and-autocorrelation-consistent (HAC) t on the full overlapping series (ic_newey_west); the regime variant slices the same pipeline.

Input. DataFrame with date, asset_id, factor, forward_return.

Output. Time-indexed IC series (date, ic) that can be fed into any series/ tool (oos, trend, significance, hit_rate).

factrix.metrics.ic.compute_ic

compute_ic(df: DataFrame, factor_col: str = 'factor', return_col: str = 'forward_return') -> DataFrame

Per-date Spearman rank information coefficient (IC).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | DataFrame | Panel with date, asset_id, factor_col, return_col. | required |

Returns:

| Type | Description |
|------|-------------|
| DataFrame | DataFrame with columns date, ic, tie_ratio sorted by date. Dates with fewer than MIN_ASSETS_PER_DATE_IC assets are dropped. tie_ratio is the per-date factor tie density \(1 - n_{\mathrm{unique}} / n\) in \([0, 1]\). |

Notes

Per-date Spearman IC is \(\mathrm{IC}_t = \mathrm{corr}(\mathrm{rank}(f_t), \mathrm{rank}(r_t))\) over the cross-section at date \(t\); rank ties are broken with average rank (method="average").
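The formula above can be sketched for a single date in plain Python. This is a minimal illustration, not factrix's implementation: the helper names (average_ranks, pearson, spearman_ic) and the five-asset cross-section are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman_ic(factor, forward_return):
    """Spearman IC for one date: Pearson correlation of the average ranks."""
    return pearson(average_ranks(factor), average_ranks(forward_return))


# One hypothetical cross-section: five assets, one tie in the factor.
factor = [0.1, 0.4, 0.4, 0.9, -0.2]
forward_return = [0.01, 0.03, 0.02, 0.05, -0.01]
ic_t = spearman_ic(factor, forward_return)
print(average_ranks(factor))  # [2.0, 3.5, 3.5, 5.0, 1.0]
print(round(ic_t, 4))         # 0.9747
```

The two tied factor values share rank 3.5, which is what method="average" refers to.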

At high tie rates Spearman \(\rho\) on average ranks is biased relative to the tie-corrected formula (Kendall-Stuart §31). The per-date factor tie_ratio is surfaced alongside ic so downstream callers can detect bucketed / categorical signals without re-inspecting the input; ic / ic_newey_west / ic_ir aggregate it as the median across dates and stash it in MetricOutput.metadata["tie_ratio"]. When the median exceeds TIE_RATIO_WARN_THRESHOLD (0.3) those aggregators also emit a UserWarning: treat the IC magnitude as a lower bound and consider a tie-corrected correlation or a continuous transform of the factor.
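A rough sketch of the tie diagnostic described above, assuming the per-date density is computed as \(1 - n_{\mathrm{unique}} / n\) and aggregated by the median. The function names are hypothetical and the warning text is illustrative; only the 0.3 threshold is taken from the note.

```python
import warnings
from statistics import median

TIE_RATIO_WARN_THRESHOLD = 0.3  # value quoted in the note above


def tie_ratio(factor_values):
    """Tie density 1 - n_unique / n for one date's factor cross-section."""
    n = len(factor_values)
    return 1.0 - len(set(factor_values)) / n


def median_tie_ratio(per_date_ratios):
    """Median across dates; emits a UserWarning above the threshold."""
    med = median(per_date_ratios)
    if med > TIE_RATIO_WARN_THRESHOLD:
        warnings.warn(
            f"median tie_ratio {med:.2f} exceeds {TIE_RATIO_WARN_THRESHOLD}",
            UserWarning,
        )
    return med


continuous = [0.11, 0.52, 0.37, 0.90, 0.25, 0.64]  # no repeated values
bucketed = [1, 1, 2, 2, 3, 3]                      # tercile labels
print(tie_ratio(continuous))  # 0.0
print(tie_ratio(bucketed))    # 0.5
```

A bucketed signal like the tercile labels lands well above the threshold, while a continuous factor sits at zero.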

factrix drops dates whose cross-section has fewer than MIN_ASSETS_PER_DATE_IC assets — undersized panels yield rank-correlation estimates with degenerate variance.

References

Grinold 1989: \(\mathrm{IR} \approx \mathrm{IC} \times \sqrt{\mathrm{breadth}}\) motivates IC as the canonical signal-quality measure. The appraisal-ratio single-asset ancestor is Treynor-Black 1973; the breadth identity itself is Grinold's generalisation.
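As a quick arithmetic illustration of the breadth identity, with hypothetical numbers and a helper name that is not part of factrix:

```python
from math import sqrt


def fundamental_law_ir(ic, breadth):
    """Grinold (1989) approximation: IR ~ IC * sqrt(breadth)."""
    return ic * sqrt(breadth)


# A modest IC of 0.05 applied across 400 independent bets.
ir = fundamental_law_ir(ic=0.05, breadth=400)  # 0.05 * 20
print(round(ir, 6))  # 1.0
```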

Examples:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=120, seed=0),
...     forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> set(ic_df.columns) >= {"date", "ic", "tie_ratio"}
True

factrix.metrics.ic.ic

ic(ic_df: DataFrame, forward_periods: int = 5) -> MetricOutput

Information coefficient (IC) mean significance: is mean IC significantly different from zero?

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| ic_df | DataFrame | Output of compute_ic(). | required |
| forward_periods | int | Sampling interval for non-overlapping dates. | 5 |

Returns:

| Type | Description |
|------|-------------|
| MetricOutput | MetricOutput with value=mean IC, t_stat from non-overlapping sampling. |

Notes

Given the per-date IC series \(\mathrm{IC}_t\), significance is \(t = \mathrm{mean}(\mathrm{IC}) / (\mathrm{std}(\mathrm{IC}) / \sqrt{n})\) computed on a non-overlapping subsample (every forward_periods-th date). \(H_0: \mathbb{E}[\mathrm{IC}] = 0\).
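The statistic above can be sketched in a few lines. This assumes the stride starts at the first date and that std is the sample standard deviation (ddof=1); neither detail is confirmed by this page, and the function name and IC values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def non_overlapping_t(ic_series, forward_periods):
    """t-stat of mean IC on every forward_periods-th observation."""
    sample = ic_series[::forward_periods]  # assumes the series is sorted by date
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two non-overlapping observations")
    return mean(sample) / (stdev(sample) / sqrt(n))  # stdev uses ddof=1


ic_series = [0.05, 0.02, 0.08, -0.01, 0.04, 0.06, 0.03, 0.07, 0.00, 0.05, 0.01]
# With forward_periods=5 only positions 0, 5, 10 are kept: 0.05, 0.06, 0.01.
t_stat = non_overlapping_t(ic_series, forward_periods=5)
print(round(t_stat, 4))  # 2.6186
```

Eleven daily observations shrink to three independent ones, which is the cost of the non-overlap stride.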

factrix uses non-overlapping resampling rather than Newey-West heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors for the default ic test. Subsampling removes the overlap-induced autocorrelation outright, so no HAC lag has to be chosen (overlapping forward returns force that lag to at least forward_periods - 1); the HAC route is offered separately as ic_newey_west for callers who prefer to keep every sample.

References

Grinold 1989: IC as the canonical signal-quality measure under the Fundamental Law of Active Management. Hansen-Hodrick 1980: K-period overlapping returns carry MA(K-1) autocorrelation — the motivation for the non-overlap stride used here.
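The MA(K-1) pattern cited above can be checked with a small simulation. The sketch works on overlapping sums of i.i.d. one-period returns rather than on an IC series, and every name in it is hypothetical; it only illustrates why a stride of K removes the dependence.

```python
import random


def rolling_sum(xs, k):
    """Overlapping k-period sums: one value per start date."""
    return [sum(xs[i:i + k]) for i in range(len(xs) - k + 1)]


def autocorr(xs, lag):
    """Sample autocorrelation at the given lag."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs)
    cov = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag))
    return cov / var


rng = random.Random(0)
one_period = [rng.gauss(0.0, 1.0) for _ in range(50_000)]  # i.i.d. returns
k = 5
overlapping = rolling_sum(one_period, k)

# For i.i.d. inputs the theoretical autocorrelation is (k - lag) / k up to
# lag k - 1 and zero from lag k onward, i.e. an MA(k - 1) pattern.
for lag in range(1, k + 2):
    expected = max(k - lag, 0) / k
    print(lag, round(autocorr(overlapping, lag), 3), expected)

# Taking every k-th value removes the overlap, so lag-1 dependence vanishes.
non_overlapping = overlapping[::k]
print(round(autocorr(non_overlapping, 1), 3))
```

The printed sample values should sit close to 0.8, 0.6, 0.4, 0.2 and then near zero, while the strided series shows no meaningful lag-1 autocorrelation.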

Examples:

Chain from compute_ic() output:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic, ic
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> result = ic(ic_df, forward_periods=5)
>>> result.name
'ic'

factrix.metrics.ic.ic_newey_west

ic_newey_west(ic_df: DataFrame, forward_periods: int = 5) -> MetricOutput

Information coefficient (IC) mean significance via Newey-West heteroskedasticity-and-autocorrelation-consistent (HAC) \(t\)-test on the overlapping series.

Sibling of ic(): same null hypothesis (\(H_0\): mean IC = 0), but keeps every observation and absorbs the autocorrelation induced by overlapping forward_periods-day returns through HAC standard errors rather than dropping samples.

Notes

\(t = \mathrm{mean}(\mathrm{IC}) / \mathrm{SE}_{\mathrm{NW}}(\mathrm{IC})\) on the full overlapping IC series. Lag selection: \(L = \max(\lfloor T^{1/3} \rfloor, h - 1)\) (with \(h\) = forward_periods) — the Andrews (1991) Bartlett growth rate, floored against the Hansen-Hodrick MA(\(h-1\)) overlap horizon so the kernel covers the induced dependence.
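A minimal sketch of the lag rule and a Bartlett-kernel HAC standard error for the mean. It assumes autocovariances are normalised by \(T\) and applies no small-sample correction; factrix's exact conventions may differ, and all function names here are hypothetical.

```python
from math import sqrt


def int_cbrt(n):
    """Largest integer r with r**3 <= n (avoids float error on perfect cubes)."""
    r = round(n ** (1 / 3))
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r


def nw_lag(n_obs, forward_periods):
    """L = max(floor(T^(1/3)), h - 1), the rule stated above."""
    return max(int_cbrt(n_obs), forward_periods - 1)


def newey_west_se_of_mean(xs, lags):
    """Bartlett-weighted HAC standard error of the sample mean."""
    n = len(xs)
    m = sum(xs) / n
    d = [x - m for x in xs]

    def gamma(j):
        # Lag-j autocovariance, normalised by n (an assumption of this sketch).
        return sum(d[t] * d[t - j] for t in range(j, n)) / n

    long_run_var = gamma(0)
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1)  # Bartlett kernel weight
        long_run_var += 2.0 * weight * gamma(j)
    return sqrt(long_run_var / n)


def newey_west_t(ic_series, forward_periods):
    """mean(IC) / SE_NW(IC) on the full overlapping series."""
    lags = nw_lag(len(ic_series), forward_periods)
    return (sum(ic_series) / len(ic_series)) / newey_west_se_of_mean(ic_series, lags)


print(nw_lag(180, 5))  # 5: floor(180^(1/3)) = 5 beats the floor h - 1 = 4
print(nw_lag(60, 21))  # 20: the overlap floor h - 1 dominates
print(newey_west_se_of_mean([1.0, 2.0, 3.0, 4.0], lags=1))  # 0.625
```

The second call shows the case the floor exists for: with a short series and a long horizon, \(T^{1/3}\) alone would leave most of the MA(\(h-1\)) dependence outside the kernel.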

factrix uses the Andrews fixed-rate rule rather than the Newey-West (1994) data-adaptive bandwidth — simpler, deterministic across reruns, and adequate at the typical \(T\) of factor research.

References

Newey-West 1987: HAC variance estimator. Andrews 1991: optimal Bartlett growth rate \(T^{1/3}\) underlying the default lag rule. Hansen-Hodrick 1980: forward_periods - 1 floor for overlapping returns. Newey-West 1994: data-adaptive lag-selection alternative; cited as background.

Examples:

Chain from compute_ic() output:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic, ic_newey_west
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> result = ic_newey_west(ic_df, forward_periods=5)
>>> result.name
'ic_newey_west'

factrix.metrics.ic.ic_ir

ic_ir(ic_df: DataFrame) -> MetricOutput

\(\mathrm{ICIR} = \mathrm{mean}(\mathrm{IC}) / \mathrm{std}(\mathrm{IC})\).

Signed ratio — positive when information coefficient (IC) is consistently positive, negative when consistently negative. Analogous to a Sharpe ratio for the factor signal.

This is a descriptive statistic, not a hypothesis test (t_stat=None). For significance testing, use ic().

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| ic_df | DataFrame | Output of compute_ic(). | required |

Returns:

| Type | Description |
|------|-------------|
| MetricOutput | MetricOutput with value=IC_IR (signed), t_stat=None. |

Notes

\(\mathrm{ICIR} = \mathrm{mean}(\mathrm{IC}) / \mathrm{std}(\mathrm{IC})\) over the per-date IC series: a Sharpe-style ratio describing time-series stability of the signal. Reported as a descriptive statistic; no inference is attached because the significance tests on \(\mathrm{mean}(\mathrm{IC})\) live in ic (non-overlapping \(t\)) and ic_newey_west (HAC \(t\)).
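The ratio can be sketched directly from a list of per-date ICs. The use of the sample standard deviation (ddof=1) is an assumption, and the function name and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ic_information_ratio(ic_series):
    """Signed mean(IC) / std(IC); std is the sample standard deviation."""
    return mean(ic_series) / stdev(ic_series)


ic_series = [0.02, 0.04, 0.06]
icir = ic_information_ratio(ic_series)
icir_flipped = ic_information_ratio([-x for x in ic_series])
print(round(icir, 6), round(icir_flipped, 6))  # 2.0 -2.0

# Multiplying by sqrt(n) gives the plain t-statistic on the same sample,
# which is why ICIR on its own carries no significance statement.
plain_t = icir * sqrt(len(ic_series))
```

Flipping the sign of every IC flips the sign of the ratio, matching the "signed" description above.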

References

Grinold 1989: ICIR is the time-stability normalisation that completes the IR decomposition.

Examples:

Chain from compute_ic() output:

>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> from factrix.metrics.ic import compute_ic, ic_ir
>>> panel = compute_forward_return(
...     fx.datasets.make_cs_panel(n_assets=80, n_dates=180, seed=0),
...     forward_periods=5,
... )
>>> ic_df = compute_ic(panel)
>>> result = ic_ir(ic_df)
>>> result.name
'ic_ir'

Use cases

  • Compute per-date information coefficient (IC)


    Build the per-date Spearman IC series (with tie_ratio diagnostics) from a long-format panel before any inferential test. Pre-step for ic / ic_newey_west / ic_ir.

  • Mean-IC significance, non-overlapping


    Test \(H_0: \mathbb{E}[\mathrm{IC}] = 0\) on the every-forward_periods subsample to avoid the autocorrelation induced by overlapping forward returns. Default for the IC cell.

  • Mean-IC significance, heteroskedasticity-and-autocorrelation-consistent (HAC)


    Same null, but keep every overlapping observation and absorb the induced MA dependence through a Newey-West HAC standard error. Trades a larger effective sample for sensitivity to the kernel and lag choice.

  • IC stability (signed IR)


    mean(IC) / std(IC) over the per-date series — a Sharpe-style descriptive statistic for signal time-stability. No inference attached.

Choosing a function

| Goal | Function |
|------|----------|
| Per-date IC table for downstream inspection / slicing | compute_ic |
| Mean-IC significance, deterministic non-overlap subsample (default) | ic |
| Mean-IC significance, keep every overlap and use HAC SE | ic_newey_west |
| Time-stability ratio (no inference) | ic_ir |

All four are invoked indirectly via evaluate(panel, AnalysisConfig.individual_continuous(metric=Metric.IC)) — they're documented here for callers who want the standalone numerical output without the inference framing.

Worked example — per-date IC then mean significance

compute_ic → ic_newey_west on a synthetic cross-sectional panel

import factrix as fx
from factrix.metrics.ic import compute_ic, ic_newey_west
from factrix.preprocess import compute_forward_return

raw   = fx.datasets.make_cs_panel(
    n_assets=100, n_dates=500, ic_target=0.08, seed=2024,
)
panel = compute_forward_return(raw, forward_periods=5)

ic_df = compute_ic(panel)
print(ic_df.head())
# ┌────────────┬───────────┬───────────┐
# │ date       ┆ ic        ┆ tie_ratio │
# ├────────────┼───────────┼───────────┤
# │ 2024-01-01 ┆ 0.083     ┆ 0.000     │
# │ 2024-01-02 ┆ 0.071     ┆ 0.000     │
# │ ...        ┆ ...       ┆ ...       │
# └────────────┴───────────┴───────────┘

out = ic_newey_west(ic_df, forward_periods=5)
print(out.value, out.stat, out.metadata["p_value"])
# 0.0722  14.60  2.13e-40

Cross-slice IC analysis

For per-slice IC summaries (regime / universe / sector / ...), use by_slice on an IC frame joined with slice labels. For inferential contrasts (pairwise Wald χ² + Holm / Romano-Wolf adjusted p), use slice_pairwise_test. The metric-specific regime_ic callable and by_regime dispatcher were removed in v0.12.0; see the Slice analysis guide.

See also