
factrix.multi_factor.partial_conjunction

partial_conjunction(profiles: Iterable[FactorProfile], *, min_pass: int, expand_over: Sequence[str], n_conditions: int | None = None, estimator: Estimator | None = None, q: float = 0.05) -> Survivors

Partial conjunction screening: keep identities that are significant in at least min_pass of m expanded conditions, with false discovery rate (FDR) control.

For claims of the form "factor X is significant in universes A and B", the naive set(survivors_A) & set(survivors_B) does not preserve FDR (Benjamini & Bogomolov 2014). The partial conjunction test (Benjamini & Heller 2008) provides a contract-bearing path: per identity, combine the m per-condition p-values into a single PC p-value, then run Benjamini-Hochberg-Yekutieli (BHY) across identities.

The PC p-value formula (Bonferroni-style, BH2008): for k = min_pass, p_PC = (m - k + 1) * p_((k)) capped at 1, where p_((k)) is the k-th smallest p-value across the identity's m conditions. k = m reduces to max(p) (full conjunction); k = 1 reduces to m * min(p) (Bonferroni-union, forbidden here — see below).

Parameters:

Name Type Description Default
profiles Iterable[FactorProfile]

Iterable of FactorProfile objects. Conditions per identity come from expand_over; multiple profiles sharing an identity must differ on at least one expand_over key (the standard _resolve_family uniqueness check).

required
min_pass int

k in "k of m" — minimum number of conditions required to be significant. Must be >= 2; min_pass=1 is union semantics (FDR ≈ 2q under independence) and raises with a pointer to bhy(expand_over=...).

required
expand_over Sequence[str]

Required, non-empty. Context keys defining the condition axis. Identity dimensions (factor_id / forward_periods) are rejected by the family-resolution layer (#160 anti-shopping defense).

required
n_conditions int | None

Strict-mode declaration. None (lenient) lets m be inferred per identity from the data; an int requires every identity to have exactly that many conditions and raises on mismatch (paper-grade — surfaces data gaps fail-loud).

None
estimator Estimator | None

Optional inference-method override (#170). None uses each profile's primary_p.

None
q float

Nominal FDR target for the BHY step-up over PC p-values. Default 0.05.

0.05

Returns:

Type Description
Survivors

Survivors in input order (deduplicated to one row per surviving identity, using the first profile of that identity as representative). adj_p is the BHY-adjusted PC p-value; pc_p is the raw PC p-value; n_tests[identity] is the condition count m; n_passed_uncorr[i] is the count of raw p-values strictly below q for survivor i.

Raises:

Type Description
UserInputError

min_pass < 2 (with min_pass=1 flagged as union semantics); expand_over empty or None; n_conditions < min_pass; strict-mode n_conditions mismatch with actual condition count; identity with fewer than min_pass conditions; family-resolution invariants (unknown expand_over key, identity-shadowing, duplicate partition key).

Examples:

Three candidate factors evaluated in two regions; survive if significant in at least 2 of 2 regions:

>>> import dataclasses
>>> import zlib
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> cfg = fx.AnalysisConfig.individual_continuous(forward_periods=5)
>>> profiles = [
...     dataclasses.replace(
...         fx.evaluate(
...             compute_forward_return(
...                 fx.datasets.make_cs_panel(
...                     n_assets=100, n_dates=250,
...                     # crc32 is stable across runs; built-in hash() of str is salted per process
...                     seed=zlib.crc32(f"{fid}/{region}".encode()) % 1000,
...                 ),
...                 forward_periods=5,
...             ),
...             cfg,
...         ),
...         factor_id=fid,
...         context={"region": region},
...     )
...     for fid in ("alpha_1", "alpha_2", "alpha_3")
...     for region in ("US", "EU")
... ]
>>> survivors = fx.multi_factor.partial_conjunction(
...     profiles, min_pass=2, expand_over=["region"]
... )

Contract-bearing screening for the "factor X is significant in \(k\) of \(m\) conditions" claim. Replaces the notebook idiom set(survivors_a) & set(survivors_b), which does not preserve false discovery rate (FDR) (Benjamini & Bogomolov 2014), with the partial conjunction test of Benjamini & Heller (2008).

import factrix as fx

# "Momentum is significant in BOTH large-cap AND small-cap universes"
profiles = [
    fx.evaluate(panel_large, cfg, factor_col="mom"),
    fx.evaluate(panel_small, cfg, factor_col="mom"),
    # ... + value, quality, etc. one profile per (factor, universe) cell
]
survivors = fx.multi_factor.partial_conjunction(
    profiles,
    min_pass=2,
    n_conditions=2,
    expand_over=["universe_id"],
    q=0.05,
)

Versus bhy(expand_over=...) — same data, different question

Both functions accept expand_over=, but the survivor unit and the question answered differ. This is the single most common source of confusion; pick the row that matches your claim.

| Function | Survivor unit | Question | Example claim |
| --- | --- | --- | --- |
| bhy(profiles, expand_over=["universe_id"]) | (factor, universe) pair | "Where is this factor significant?" | "Momentum is significant in large_cap; value is significant in small_cap" |
| partial_conjunction(profiles, min_pass=2, expand_over=["universe_id"]) | factor identity | "Which factors are significant in at least min_pass conditions (here, both universes)?" | "Momentum is significant across both universes" |

In other words: bhy treats each universe as its own hypothesis and expands the family; partial_conjunction treats each universe as a condition the factor must pass jointly and aggregates back to one hypothesis per factor.
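
The difference in survivor unit can be illustrated without factrix at all. The sketch below uses made-up raw p-values keyed by (factor, universe); the dictionary and variable names are hypothetical, and forward_periods is left out of the identity for brevity. It only shows how the hypothesis family is shaped in each case, not how either function is implemented.

```python
from collections import defaultdict

# Hypothetical raw p-values, one per (factor_id, universe_id) cell.
raw_p = {
    ("mom", "large_cap"): 0.004,
    ("mom", "small_cap"): 0.010,
    ("value", "large_cap"): 0.300,
    ("value", "small_cap"): 0.002,
}

# bhy(expand_over=...) view: every (factor, universe) cell is its own hypothesis.
bhy_family = sorted(raw_p)

# partial_conjunction view: cells are grouped back to one hypothesis per factor.
by_factor = defaultdict(list)
for (factor, _universe), p in raw_p.items():
    by_factor[factor].append(p)

# With k = m = 2 the Bonferroni-style PC p-value is simply max(p).
pc_family = {factor: max(ps) for factor, ps in by_factor.items()}

print(len(bhy_family), "hypotheses under bhy")
print(len(pc_family), "hypotheses under partial_conjunction")
```

Here value has one very small p-value (small_cap), which can make it a bhy survivor for that cell, while its PC p-value is driven by the weaker large_cap result.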

When not to reach for partial_conjunction

| Real intent | Reach for | Why |
| --- | --- | --- |
| "At least one condition is significant" | bhy(profiles, expand_over=[...]) | min_pass=1 is union semantics, so FDR inflates to ~2q. partial_conjunction raises rather than implement this. |
| Rank candidates (no FDR control) | compare | compare is a view, not a filter. |
| Sensitivity to estimator / sample choice | robustness (#178) | Conditions there are methods, not data slices. |
| Cross-slice metric difference (descriptive) | by_slice | Returns per-slice metric values; no inference. |
| Cross-slice metric difference (inferential, slice-pairs) | slice_pairwise_test / slice_joint_test | Tests whether the slices' metric series differ, not whether the factor is jointly significant. |

Strict vs lenient mode

n_conditions is the contract knob.

| Mode | When | Behavior |
| --- | --- | --- |
| Strict (n_conditions=int) | Paper-grade; you know the design (e.g. exactly 2 universes, exactly 4 horizons) | Identity with any condition count other than n_conditions raises. Data gaps surface fail-loud. |
| Lenient (n_conditions=None) | EDA / prototyping; condition count varies by identity | m inferred per identity from the data; only requires m >= min_pass. |

# Strict: 2 universes required for every factor; missing one raises.
fx.multi_factor.partial_conjunction(
    profiles, min_pass=2, n_conditions=2, expand_over=["universe_id"]
)

# Lenient: "at least 3 of however many horizons each factor has".
fx.multi_factor.partial_conjunction(
    profiles, min_pass=3, expand_over=["fwd_period"]
)
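
The strict/lenient contract can be sketched as a small per-identity check. This is an illustration of the rules stated above, not factrix's internal code: the function name resolve_condition_count is hypothetical, and ValueError stands in for factrix's UserInputError.

```python
def resolve_condition_count(observed_m, min_pass, n_conditions=None):
    """Return the m to use for one identity, or raise if the contract is violated.

    Hypothetical helper; ValueError is a stand-in for UserInputError.
    """
    if n_conditions is not None:
        # Strict mode: the declared design must be satisfiable and matched exactly.
        if n_conditions < min_pass:
            raise ValueError(
                f"n_conditions={n_conditions} < min_pass={min_pass} is unsatisfiable"
            )
        if observed_m != n_conditions:
            raise ValueError(
                f"expected exactly {n_conditions} conditions, found {observed_m}"
            )
        return n_conditions
    # Lenient mode: m is whatever the data provides, as long as k of m is possible.
    if observed_m < min_pass:
        raise ValueError(
            f"identity has {observed_m} conditions, fewer than min_pass={min_pass}"
        )
    return observed_m


m_strict = resolve_condition_count(2, min_pass=2, n_conditions=2)
m_lenient = resolve_condition_count(5, min_pass=3)
```

A factor missing one of two declared universes would fail the strict check, which is the data-gap signal strict mode is meant to surface.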

How the math works

Per identity, the \(m\) per-condition \(p\)-values are reduced to a single PC \(p\)-value (Bonferroni-style, BH2008):

\[ p_{\text{PC}}^{(k/m)} = \min\bigl(1,\; (m - k + 1) \cdot p_{(k)}\bigr) \]

where \(p_{(k)}\) is the \(k\)-th smallest of the \(m\) \(p\)-values and \(k = \texttt{min\_pass}\). Two corner cases worth knowing:

  • \(k = m\) (full conjunction) → \(p_{\text{PC}} = \max(p)\). Reject only when even the worst condition is significant.
  • \(k = 1\) (union) → \(p_{\text{PC}} = m \cdot \min(p)\). Bonferroni-corrected minimum. Forbidden here — the surface raises with a pointer to bhy(expand_over=...), where the family-level FDR inflation is explicit rather than hidden in a "robust across" claim.
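
The formula and its two corner cases can be checked with a few lines of plain Python. This is a minimal sketch of the Bonferroni-style combiner as written above; the function name pc_pvalue is an assumption for illustration and is not a factrix API. It accepts k = 1 only to show the corner case, which partial_conjunction itself rejects.

```python
def pc_pvalue(pvals, k):
    """Bonferroni-style partial conjunction p-value for 'at least k of m'."""
    m = len(pvals)
    if not 1 <= k <= m:
        raise ValueError(f"need 1 <= k <= m, got k={k}, m={m}")
    p_k = sorted(pvals)[k - 1]  # k-th smallest p-value
    return min(1.0, (m - k + 1) * p_k)


pvals = [0.001, 0.020, 0.300]
full_conjunction = pc_pvalue(pvals, 3)  # k = m: equals max(p)
two_of_three = pc_pvalue(pvals, 2)      # (3 - 2 + 1) * 0.020
union = pc_pvalue(pvals, 1)             # m * min(p), shown for reference only
```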

The PC \(p\)-values are then fed to a standard Benjamini-Hochberg-Yekutieli (BHY) step-up across identities, controlling group-level FDR ≤ q. The harmonic dependence correction \(c(n) = \sum_{i=1}^{n} 1/i\), with \(n\) the number of identities, is applied because PC \(p\)-values across identities do not generally satisfy positive regression dependence on a subset (PRDS): identities that share underlying panels have an unknown joint dependence structure, so the conservative choice is the default.
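
For readers who want to see the step-up concretely, the sketch below computes BHY-adjusted p-values in plain Python. It is a textbook formulation written for illustration, not factrix's implementation; the name bhy_adjust and the input values are assumptions.

```python
def bhy_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values, returned in input order."""
    n = len(pvals)
    harmonic = sum(1.0 / i for i in range(1, n + 1))  # dependence correction
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n * harmonic / rank)
        adjusted[idx] = running_min
    return adjusted


pc_pvals = [0.001, 0.04, 0.5]  # hypothetical PC p-values, one per identity
adj = bhy_adjust(pc_pvals)
q = 0.05
survivor_idx = [i for i, a in enumerate(adj) if a <= q]
```

With these inputs the second identity has a raw PC p-value below q but does not survive once the harmonic correction is applied, which is the price of making no dependence assumption.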

BH2008 also presents a Simes-style PC combiner, which is less conservative under PRDS. factrix ships only the Bonferroni-style combiner because it is valid without dependence assumptions; a Simes path may be added later if a use case demands it.
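
For comparison only, the Simes-style combiner from BH2008 can be sketched as follows. factrix does not ship this path; both function names below are hypothetical, and the Simes result is valid only under the dependence conditions discussed in BH2008.

```python
def pc_bonferroni(pvals, k):
    """Bonferroni-style PC p-value: (m - k + 1) * p_(k), capped at 1."""
    m = len(pvals)
    if not 1 <= k <= m:
        raise ValueError(f"need 1 <= k <= m, got k={k}, m={m}")
    return min(1.0, (m - k + 1) * sorted(pvals)[k - 1])


def pc_simes(pvals, k):
    """Simes-style PC p-value: Simes' test applied to the m - k + 1 largest p-values."""
    m = len(pvals)
    if not 1 <= k <= m:
        raise ValueError(f"need 1 <= k <= m, got k={k}, m={m}")
    tail = sorted(pvals)[k - 1:]  # p_(k), ..., p_(m)
    n = m - k + 1
    # j is 0-based, so the j-th tail element has rank j + 1 within the tail.
    return min(1.0, min(n / (j + 1) * p for j, p in enumerate(tail)))


pvals = [0.001, 0.020, 0.021]
bonf = pc_bonferroni(pvals, 2)
simes = pc_simes(pvals, 2)
```

The first term of the Simes minimum is exactly the Bonferroni-style value, so the Simes p-value can never be larger; it is smaller when the remaining p-values are also small, as in this example.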

Survivors output

partial_conjunction returns the same Survivors container as bhy, populated with PC-specific metadata:

| Field | Meaning |
| --- | --- |
| profiles | One representative profile per surviving identity (the first profile of that identity in input order) |
| adj_p | BHY-adjusted PC \(p\)-value; survivor iff adj_p <= q |
| pc_p | Raw PC \(p\)-value (pre-BHY) |
| min_pass | The \(k\) you passed |
| n_tests | Keyed by identity tuple (factor_id, forward_periods) → actual \(m\) used |
| n_passed_uncorr | Per-identity count of raw \(p < q\). Descriptive: flags borderline (n_passed_uncorr == min_pass) and data-gap cases at a glance. The cutoff is your q, so the count moves with q; using it to override adj_p survivor selection is the anti-shopping failure mode this function exists to prevent. |
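
To see why n_passed_uncorr is descriptive rather than a selection rule, the sketch below recomputes the count for one identity at several cutoffs. The helper name and the p-values are hypothetical; the strict inequality follows the definition given above.

```python
def count_passed_uncorrected(pvals, q):
    """Number of raw per-condition p-values strictly below q."""
    return sum(p < q for p in pvals)


raw_pvals = [0.012, 0.030, 0.049, 0.200]  # one identity, four conditions
counts_by_q = {q: count_passed_uncorrected(raw_pvals, q) for q in (0.01, 0.05, 0.10)}
```

The same identity moves from zero to three "passes" as q is loosened, with no multiplicity correction involved, so the count should be read alongside adj_p and never in place of it.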

Validation summary

| Trigger | Outcome |
| --- | --- |
| min_pass < 2 | UserInputError. min_pass == 1 additionally points at bhy(expand_over=...). |
| expand_over empty / None | UserInputError: the function is undefined without a condition axis. |
| expand_over names an identity field (factor_id / forward_periods) | UserInputError (#160 anti-shopping defense, same as bhy). |
| n_conditions < min_pass | UserInputError (unsatisfiable). |
| Strict mode: identity's condition count \(\neq\) n_conditions | UserInputError; surfaces missing-universe / missing-horizon data gaps. |
| Identity with condition count \(<\) min_pass (lenient) | UserInputError. |
| Duplicate (identity, expand_over_values) partition key | UserInputError (family-resolution invariant). |

References

  • [BH2008] Benjamini, Y. & Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics, 64(4), 1215–1222.
  • [BB2014] Benjamini, Y. & Bogomolov, M. (2014). Selective inference on multiple families of hypotheses. JRSS-B, 76(1).
  • [HY2014] Heller, R. & Yekutieli, D. (2014). Replicability analysis for genome-wide association studies. AOAS, 8(1).