factrix.multi_factor.partial_conjunction ¶

partial_conjunction(profiles: Iterable[FactorProfile], *, min_pass: int, expand_over: Sequence[str], n_conditions: int | None = None, estimator: Estimator | None = None, q: float = 0.05) -> Survivors

Partial conjunction screening: keep identities that are significant in at least min_pass of m expanded conditions, with false discovery rate (FDR) control.

For "factor X is significant in universes A and B" style claims, the naive set(survivors_A) & set(survivors_B) does not preserve FDR (Benjamini & Bogomolov 2014). The partial conjunction test (Benjamini & Heller 2008) provides a contract-bearing path: per identity, combine the m per-condition p-values into a single PC p-value, then run Benjamini-Hochberg-Yekutieli (BHY) across identities.

The PC p-value formula (Bonferroni-style, BH2008): for k = min_pass, p_PC = min(1, (m - k + 1) * p_(k)), where p_(k) is the k-th smallest p-value across the identity's m conditions. k = m reduces to max(p) (full conjunction); k = 1 reduces to m * min(p) (Bonferroni-union, which is forbidden here; see below).

Parameters:

Name	Type	Description	Default
`profiles`	`Iterable[FactorProfile]`	Iterable of `FactorProfile` objects. Conditions per identity come from `expand_over`; multiple profiles sharing an identity must differ on at least one `expand_over` key (the standard `_resolve_family` uniqueness check).	required
`min_pass`	`int`	`k` in "k of m" — minimum number of conditions required to be significant. Must be `>= 2`; `min_pass=1` is union semantics (FDR ≈ `2q` under independence) and raises with a pointer to `bhy(expand_over=...)`.	required
`expand_over`	`Sequence[str]`	Required, non-empty. Context keys defining the condition axis. Identity dimensions (`factor_id` / `forward_periods`) are rejected by the family-resolution layer (#160 anti-shopping defense).	required
`n_conditions`	`int \| None`	Strict-mode declaration. `None` (lenient) lets `m` be inferred per identity from the data; an `int` requires every identity to have exactly that many conditions and raises on mismatch (paper-grade — surfaces data gaps fail-loud).	`None`
`estimator`	`Estimator \| None`	Optional inference-method override (#170). `None` uses each profile's `primary_p`.	`None`
`q`	`float`	Nominal FDR target for the BHY step-up over PC p-values. Default `0.05`.	`0.05`

Returns:

Type	Description
`Survivors`	`Survivors` in input order, deduplicated to one row per surviving identity, using the first profile of that identity as representative. `adj_p` is the BHY-adjusted PC p-value; `pc_p` is the raw PC p-value; `n_tests[identity]` is the condition count `m`; `n_passed_uncorr[i]` is the count of raw p-values strictly below `q` for survivor `i`.

Raises:

Type	Description
`UserInputError`	`min_pass < 2` (with `min_pass=1` flagged as union semantics); `expand_over` empty or `None`; `n_conditions < min_pass`; strict-mode `n_conditions` mismatch with actual condition count; identity with fewer than `min_pass` conditions; family-resolution invariants (unknown `expand_over` key, identity-shadowing, duplicate partition key).

Examples:

Three candidate factors evaluated in two regions; survive if significant in at least 2 of 2 regions:

>>> import dataclasses
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> cfg = fx.AnalysisConfig.individual_continuous(forward_periods=5)
>>> profiles = [
...     dataclasses.replace(
...         fx.evaluate(
...             compute_forward_return(
...                 fx.datasets.make_cs_panel(
...                     n_assets=100, n_dates=250,
...                     seed=sum(map(ord, fid + region)) % 1000,  # deterministic; hash() of str is salted per process
...                 ),
...                 forward_periods=5,
...             ),
...             cfg,
...         ),
...         factor_id=fid,
...         context={"region": region},
...     )
...     for fid in ("alpha_1", "alpha_2", "alpha_3")
...     for region in ("US", "EU")
... ]
>>> survivors = fx.multi_factor.partial_conjunction(
...     profiles, min_pass=2, expand_over=["region"]
... )

partial_conjunction provides contract-bearing screening for claims of the form "factor X is significant in \(k\) of \(m\) conditions". It replaces the notebook idiom set(survivors_a) & set(survivors_b), which does not preserve the false discovery rate (FDR) (Benjamini & Bogomolov 2014), with the partial conjunction test of Benjamini & Heller (2008).

import factrix as fx

# "Momentum is significant in BOTH large-cap AND small-cap universes"
profiles = [
    fx.evaluate(panel_large, cfg, factor_col="mom"),
    fx.evaluate(panel_small, cfg, factor_col="mom"),
    # ... + value, quality, etc. one profile per (factor, universe) cell
]
survivors = fx.multi_factor.partial_conjunction(
    profiles,
    min_pass=2,
    n_conditions=2,
    expand_over=["universe_id"],
    q=0.05,
)

Versus `bhy(expand_over=...)` — same data, different question¶

Both functions accept expand_over=, but the survivor unit and the question answered differ. This is the single most common source of confusion; pick the row that matches your claim.

Function	Survivor unit	Question	Example claim
`bhy(profiles, expand_over=["universe_id"])`	`(factor, universe)` pair	"Where is this factor significant?"	"Momentum is significant in `large_cap`; value is significant in `small_cap`"
`partial_conjunction(profiles, min_pass=2, expand_over=["universe_id"])`	`factor` identity	"Which factors are significant in at least `min_pass` of the conditions?"	"Momentum is significant across both universes"

In other words: bhy treats each universe as its own hypothesis and expands the family; partial_conjunction treats each universe as a condition the factor must pass jointly and aggregates back to one hypothesis per factor.
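The difference in survivor unit can be made concrete with a minimal pure-Python sketch. It is an illustration only, not factrix code: the `raw_p` dict and its values are hypothetical, and the snippet merely counts how many hypotheses each approach would carry into the BHY step.

```python
from collections import defaultdict

# Hypothetical raw p-values, one per (factor, universe) cell.
raw_p = {
    ("mom", "large_cap"): 0.004,
    ("mom", "small_cap"): 0.012,
    ("value", "large_cap"): 0.300,
    ("value", "small_cap"): 0.008,
    ("quality", "large_cap"): 0.020,
    ("quality", "small_cap"): 0.450,
}

# bhy-style expansion: every (factor, universe) cell is its own hypothesis.
expanded_family = sorted(raw_p)

# partial_conjunction-style aggregation: cells collapse to one hypothesis per factor.
by_factor = defaultdict(list)
for (factor, _universe), p in raw_p.items():
    by_factor[factor].append(p)
aggregated_family = sorted(by_factor)

print(len(expanded_family), "hypotheses when each universe expands the family")
print(len(aggregated_family), "hypotheses when universes are conditions of one factor")
```

With three factors and two universes, the expanded family holds six hypotheses while the aggregated family holds three, each backed by two condition p-values.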

When not to reach for `partial_conjunction`¶

Real intent	Reach for	Why
"At least one condition is significant"	`bhy(profiles, expand_over=[...])`	`min_pass=1` is union semantics, under which FDR inflates to ~2q. `partial_conjunction` raises rather than implement this.
Rank candidates (no FDR control)	`compare`	`compare` is a view, not a filter.
Sensitivity to estimator / sample choice	`robustness` (#178)	Conditions there are methods, not data slices.
Cross-slice metric difference (descriptive)	`by_slice`	Returns per-slice metric values; no inference.
Cross-slice metric difference (inferential, slice-pairs)	`slice_pairwise_test` / `slice_joint_test`	Tests whether the slices' metric series differ, not whether the factor is jointly significant.

Strict vs lenient mode¶

n_conditions is the contract knob.

Mode	When	Behavior
Strict (`n_conditions=int`)	Paper-grade; you know the design (e.g. exactly 2 universes, exactly 4 horizons)	Identity with any condition count other than `n_conditions` raises. Data gaps surface fail-loud.
Lenient (`n_conditions=None`)	EDA / prototyping; condition count varies by identity	`m` inferred per identity from the data; only requires `m >= min_pass`.

# Strict: 2 universes required for every factor; missing one raises.
fx.multi_factor.partial_conjunction(
    profiles, min_pass=2, n_conditions=2, expand_over=["universe_id"]
)

# Lenient: "at least 3 of however many horizons each factor has".
fx.multi_factor.partial_conjunction(
    profiles, min_pass=3, expand_over=["fwd_period"]
)
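The two modes differ only in how the per-identity condition count is checked. The following is a rough sketch of that check under stated assumptions: `infer_condition_counts` is a hypothetical helper, its input is simply one identity label per profile, and `ValueError` stands in for factrix's `UserInputError`. It is not the library's implementation.

```python
from collections import Counter


def infer_condition_counts(identities, min_pass, n_conditions=None):
    """Return {identity: m}, raising ValueError on a contract violation.

    `identities` lists one identity label per profile (a hypothetical input
    shape); ValueError stands in for factrix's UserInputError.
    """
    counts = Counter(identities)
    for identity, m in counts.items():
        # Strict mode: every identity must have exactly n_conditions conditions.
        if n_conditions is not None and m != n_conditions:
            raise ValueError(f"{identity}: expected {n_conditions} conditions, found {m}")
        # Both modes: an identity cannot pass k conditions if it has fewer than k.
        if m < min_pass:
            raise ValueError(f"{identity}: {m} conditions, need at least {min_pass}")
    return dict(counts)


# "value" is missing one of three horizons.
identities = ["mom", "mom", "mom", "value", "value"]

# Lenient: m is inferred per identity and may differ between identities.
print(infer_condition_counts(identities, min_pass=2))

# Strict: the gap in "value" surfaces as an error instead of a smaller m.
try:
    infer_condition_counts(identities, min_pass=2, n_conditions=3)
except ValueError as exc:
    print(f"strict mode rejected the input: {exc}")
```

The design point is that lenient mode silently changes the test being run for identities with gaps (a smaller \(m\)), whereas strict mode refuses to proceed.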

How the math works¶

Per identity, the \(m\) per-condition \(p\)-values are reduced to a single PC \(p\)-value (Bonferroni-style, BH2008):

\[ p_{\text{PC}}^{(k/m)} = \min\bigl(1,\; (m - k + 1) \cdot p_{(k)}\bigr) \]

where \(p_{(k)}\) is the \(k\)-th smallest of the \(m\) \(p\)-values and \(k = \texttt{min\_pass}\). Two corner cases worth knowing:

\(k = m\) (full conjunction) → \(p_{\text{PC}} = \max(p)\). Reject only when even the worst condition is significant.
\(k = 1\) (union) → \(p_{\text{PC}} = \min(1,\; m \cdot \min(p))\), the Bonferroni-corrected minimum. Forbidden here: the function raises with a pointer to bhy(expand_over=...), where the family-level FDR inflation is explicit rather than hidden in a "robust across" claim.
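The formula and its two corner cases can be checked with a few lines of plain Python. This is a minimal sketch of the Bonferroni-style combiner as stated above; `pc_p_value` is a hypothetical name, not a factrix function, and the \(k = 1\) call is shown only to illustrate the formula.

```python
def pc_p_value(p_values, k):
    """Bonferroni-style partial conjunction p-value for 'at least k of m'."""
    m = len(p_values)
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= m")
    p_k = sorted(p_values)[k - 1]  # k-th smallest p-value
    return min(1.0, (m - k + 1) * p_k)


p = [0.01, 0.02, 0.50]  # hypothetical per-condition p-values for one identity

print(pc_p_value(p, 3))  # full conjunction: equals max(p)
print(pc_p_value(p, 2))  # 2 of 3: (3 - 2 + 1) * second-smallest p-value
print(pc_p_value(p, 1))  # union corner case: m * min(p)
```

Note how a single weak condition (0.50) dominates the full conjunction, while the 2-of-3 claim depends only on the second-smallest p-value.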

The PC \(p\)-values are then fed to a standard Benjamini-Hochberg-Yekutieli (BHY) step-up across identities, controlling group-level FDR ≤ q. The harmonic dependence correction \(c(n) = \sum_{i=1}^{n} 1/i\), with \(n\) the number of identities, is applied because PC \(p\)-values across identities do not generally satisfy positive regression dependence on a subset (PRDS): shared underlying panels make the joint distribution unknown, so the conservative choice is the default.
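As an illustration of the step-up stage, here is a small self-contained sketch of BHY-adjusted p-values. It follows the textbook procedure (Benjamini-Hochberg with the harmonic correction) and is not taken from factrix; `bhy_adjust` and the input values are hypothetical.

```python
def bhy_adjust(p_values):
    """BHY-adjusted p-values, returned in the input order."""
    n = len(p_values)
    harmonic = sum(1.0 / i for i in range(1, n + 1))  # dependence correction
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n * harmonic / rank)
        adjusted[idx] = running_min
    return adjusted


pc_p = [0.001, 0.04, 0.30]  # hypothetical PC p-values, one per identity
adj_p = bhy_adjust(pc_p)
q = 0.05
survivors = [i for i, a in enumerate(adj_p) if a <= q]
print(adj_p, survivors)
```

In this toy input the second identity has a raw PC p-value below `q` but does not survive once the multiplicity and dependence corrections are applied.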

BH2008 also presents a Simes-style PC combiner, which is less conservative under PRDS. factrix ships only the Bonferroni-style combiner because it is valid without dependence assumptions; a Simes path may be added later if a use case demands it.
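For comparison only, the two combiners can be sketched side by side. The Simes-style version below assumes the usual reading of BH2008, namely Simes' test applied to the \(m - k + 1\) largest p-values; both function names are hypothetical and neither is part of factrix.

```python
def bonferroni_pc_p(p_values, k):
    """Bonferroni-style PC p-value: (m - k + 1) * p_(k), capped at 1."""
    m = len(p_values)
    return min(1.0, (m - k + 1) * sorted(p_values)[k - 1])


def simes_pc_p(p_values, k):
    """Simes-style PC p-value: Simes' test on p_(k), ..., p_(m)."""
    m = len(p_values)
    tail = sorted(p_values)[k - 1:]  # the m - k + 1 largest p-values
    n_tail = m - k + 1
    return min(1.0, min(n_tail * p / i for i, p in enumerate(tail, start=1)))


p = [0.010, 0.020, 0.025]  # hypothetical per-condition p-values
print(bonferroni_pc_p(p, 2), simes_pc_p(p, 2))
```

The first term of the Simes minimum is exactly the Bonferroni-style value, so the Simes-style p-value can never be larger; its validity, however, rests on the dependence assumption discussed above.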

Survivors output¶

partial_conjunction returns the same Survivors container as bhy, populated with PC-specific metadata:

Field	Meaning
`profiles`	One representative profile per surviving identity (the first profile of that identity in input order)
`adj_p`	BHY-adjusted PC \(p\)-value; survivor iff `adj_p <= q`
`pc_p`	Raw PC \(p\)-value (pre-BHY)
`min_pass`	The \(k\) you passed
`n_tests`	Keyed by identity tuple `(factor_id, forward_periods)` → actual \(m\) used
`n_passed_uncorr`	Per-identity count of raw \(p < q\). Descriptive — flags borderline (`n_passed_uncorr == min_pass`) and data-gap cases at a glance. Cutoff is your `q`, so the count moves with `q` — using it to override `adj_p` survivor selection is the anti-shopping failure mode this function exists to prevent.
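A short sketch shows why `n_passed_uncorr` is descriptive rather than a selection rule: the count is defined relative to `q`, so it shifts as `q` changes. The helper below is a hypothetical re-implementation of the definition in the table, using made-up p-values.

```python
def n_passed_uncorr(p_values, q):
    """Count raw p-values strictly below q (descriptive only)."""
    return sum(p < q for p in p_values)


raw_p = [0.004, 0.030, 0.070]  # hypothetical conditions for one identity

# The same identity "passes" a different number of conditions at each q.
for q in (0.01, 0.05, 0.10):
    print(q, n_passed_uncorr(raw_p, q))
```

Survivor status should therefore be read from `adj_p`, with `n_passed_uncorr` used only to spot borderline or gap-affected identities.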

Validation summary¶

Trigger	Outcome
`min_pass < 2`	`UserInputError`. `min_pass == 1` additionally points at `bhy(expand_over=...)`.
`expand_over` empty / `None`	`UserInputError` — the function is undefined without a condition axis.
`expand_over` names an identity field (`factor_id` / `forward_periods`)	`UserInputError` (#160 anti-shopping defense — same as `bhy`).
`n_conditions < min_pass`	`UserInputError` (unsatisfiable).
Strict mode: identity's condition count \(\neq\) `n_conditions`	`UserInputError` — surfaces missing-universe / missing-horizon data gaps.
Identity with condition count \(<\) `min_pass` (lenient)	`UserInputError`.
Duplicate `(identity, expand_over_values)` partition key	`UserInputError` (family-resolution invariant).
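The argument-level rows of the table can be summarised in a rough sketch. Everything here is an assumption made for illustration: `validate_args` and `IDENTITY_FIELDS` are hypothetical names, `ValueError` stands in for `UserInputError`, and the order of checks in factrix may differ.

```python
IDENTITY_FIELDS = {"factor_id", "forward_periods"}


def validate_args(min_pass, expand_over, n_conditions=None):
    """Raise ValueError (a stand-in for UserInputError) on invalid arguments."""
    if min_pass < 2:
        hint = "; for union semantics use bhy(expand_over=...)" if min_pass == 1 else ""
        raise ValueError(f"min_pass must be >= 2{hint}")
    if not expand_over:
        raise ValueError("expand_over must name at least one context key")
    shadowed = IDENTITY_FIELDS.intersection(expand_over)
    if shadowed:
        raise ValueError(f"expand_over names identity fields: {sorted(shadowed)}")
    if n_conditions is not None and n_conditions < min_pass:
        raise ValueError("n_conditions < min_pass is unsatisfiable")


validate_args(min_pass=2, expand_over=["universe_id"], n_conditions=2)  # passes

try:
    validate_args(min_pass=1, expand_over=["universe_id"])
except ValueError as exc:
    print(exc)
```

The data-dependent rows (condition-count mismatches and duplicate partition keys) need the profiles themselves and are therefore outside this argument-only sketch.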

References¶

[BH2008] Benjamini, Y. & Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics, 64(4), 1215–1222.
[BB2014] Benjamini, Y. & Bogomolov, M. (2014). Selective inference on multiple families of hypotheses. Journal of the Royal Statistical Society: Series B, 76(1), 297–318.
[HY2014] Heller, R. & Yekutieli, D. (2014). Replicability analysis for genome-wide association studies. The Annals of Applied Statistics, 8(1), 481–498.