factrix.multi_factor.partial_conjunction
partial_conjunction(profiles: Iterable[FactorProfile], *, min_pass: int, expand_over: Sequence[str], n_conditions: int | None = None, estimator: Estimator | None = None, q: float = 0.05) -> Survivors
Partial conjunction screening: filter identities significant in
at least min_pass of m expanded conditions, false discovery rate (FDR)-controlled.
For "factor X is significant in universes A and B" style claims,
the naive set(survivors_A) & set(survivors_B) does not preserve FDR
(Benjamini & Bogomolov, 2014). The partial conjunction test
(Benjamini & Heller, 2008) provides a contract-bearing path: per
identity, combine the m per-condition p-values into a single
PC p-value, then run Benjamini-Hochberg-Yekutieli (BHY) across identities.
The PC p-value formula (Bonferroni-style, BH2008): for k =
min_pass, p_PC = (m - k + 1) * p_((k)) capped at 1, where
p_((k)) is the k-th smallest p-value across the identity's
m conditions. k = m reduces to max(p) (full conjunction);
k = 1 reduces to m * min(p) (Bonferroni-union, forbidden
here — see below).
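The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not factrix's implementation: `pc_p_value` is a hypothetical helper name, and it deliberately accepts `min_pass=1` so the corner cases can be inspected, even though `partial_conjunction` itself rejects that value.

```python
def pc_p_value(p_values, min_pass):
    """Bonferroni-style partial conjunction p-value (hypothetical helper).

    p_values: the m per-condition p-values of one identity.
    min_pass: k, the number of conditions that must be significant.
    """
    m = len(p_values)
    if not 1 <= min_pass <= m:
        raise ValueError("min_pass must satisfy 1 <= min_pass <= m")
    # k-th smallest p-value (1-indexed k, hence the -1).
    p_k = sorted(p_values)[min_pass - 1]
    # (m - k + 1) * p_(k), capped at 1.
    return min(1.0, (m - min_pass + 1) * p_k)


# Illustrative numbers only.
p = [0.30, 0.01, 0.02]
full_conjunction = pc_p_value(p, 3)   # k = m reduces to max(p)
two_of_three = pc_p_value(p, 2)       # 2 * 0.02
```

With `k = m` the multiplier is 1, so the result is simply the largest p-value; smaller `k` trades a lower order statistic for a larger multiplier.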
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| profiles | Iterable[FactorProfile] | Iterable of FactorProfile objects, one per (identity, condition) cell. | required |
| min_pass | int | Minimum number of conditions (k) in which an identity must be significant. Must be at least 2. | required |
| expand_over | Sequence[str] | Required, non-empty. Context keys defining the condition axis. Identity dimensions (factor_id, forward_periods) are rejected. | required |
| n_conditions | int \| None | Strict-mode declaration: when set, every identity must have exactly this many conditions. | None |
| estimator | Estimator \| None | Optional inference-method override (#170). | None |
| q | float | Nominal FDR target for the BHY step-up over PC p-values. | 0.05 |
Returns:
| Type | Description |
|---|---|
| Survivors | Survivors container with one representative profile per surviving identity (using the first profile of that identity as representative), plus per-identity metadata: the condition count and the number of raw p-values strictly below q. |
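Raises:
| Type | Description |
|---|---|
| UserInputError | Raised when the arguments or the profile set fail validation, for example min_pass < 2 or an empty expand_over. |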
Examples:
Three candidate factors evaluated in two regions; survive if significant in at least 2 of 2 regions:
>>> import dataclasses
>>> import zlib
>>> import factrix as fx
>>> from factrix.preprocess import compute_forward_return
>>> cfg = fx.AnalysisConfig.individual_continuous(forward_periods=5)
>>> profiles = [
...     dataclasses.replace(
...         fx.evaluate(
...             compute_forward_return(
...                 fx.datasets.make_cs_panel(
...                     n_assets=100, n_dates=250,
...                     # crc32 is stable across runs; hash() of a str is salted per process
...                     seed=zlib.crc32(f"{fid}:{region}".encode()) % 1000,
...                 ),
...                 forward_periods=5,
...             ),
...             cfg,
...         ),
...         factor_id=fid,
...         context={"region": region},
...     )
...     for fid in ("alpha_1", "alpha_2", "alpha_3")
...     for region in ("US", "EU")
... ]
>>> survivors = fx.multi_factor.partial_conjunction(
...     profiles, min_pass=2, expand_over=["region"]
... )
Contract-bearing screening for the "factor X is significant in \(k\) of
\(m\) conditions" claim. Replaces the notebook idiom
set(survivors_a) & set(survivors_b), which does not preserve false discovery rate (FDR)
(Benjamini & Bogomolov 2014),
with the partial conjunction test of
Benjamini & Heller (2008).
import factrix as fx
# Assumes panel_large, panel_small, and cfg are already defined.
# "Momentum is significant in BOTH large-cap AND small-cap universes"
profiles = [
fx.evaluate(panel_large, cfg, factor_col="mom"),
fx.evaluate(panel_small, cfg, factor_col="mom"),
# ... + value, quality, etc. one profile per (factor, universe) cell
]
survivors = fx.multi_factor.partial_conjunction(
profiles,
min_pass=2,
n_conditions=2,
expand_over=["universe_id"],
q=0.05,
)
Versus bhy(expand_over=...): same data, different question
Both functions accept expand_over=, but the survivor unit and the
question answered differ. This is the single most common source of
confusion; pick the row that matches your claim.
| Function | Survivor unit | Question | Example claim |
|---|---|---|---|
| bhy(profiles, expand_over=["universe_id"]) | (factor, universe) pair | "Where is this factor significant?" | "Momentum is significant in large_cap; value is significant in small_cap" |
| partial_conjunction(profiles, min_pass=2, expand_over=["universe_id"]) | factor identity | "Which factors are significant across all conditions?" | "Momentum is significant across both universes" |
In other words: bhy treats each universe as its own hypothesis and
expands the family; partial_conjunction treats each universe as a
condition the factor must pass jointly and aggregates back to one
hypothesis per factor.
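The difference in family shape can be sketched with plain dictionaries. The p-values below are made up for illustration, and the variable names are hypothetical; the snippet only mimics how the two functions frame the same cells, it does not call factrix.

```python
from collections import defaultdict

# Hypothetical raw p-values per (factor, universe) cell; illustrative numbers only.
cell_p = {
    ("mom", "large_cap"): 0.004,
    ("mom", "small_cap"): 0.012,
    ("value", "large_cap"): 0.300,
    ("value", "small_cap"): 0.008,
}

# bhy(expand_over=...) view: every cell is its own hypothesis.
bhy_family = list(cell_p)

# partial_conjunction view: cells are regrouped per factor,
# leaving one hypothesis per factor identity.
by_factor = defaultdict(list)
for (factor, _universe), p in cell_p.items():
    by_factor[factor].append(p)

# With min_pass equal to the number of universes (full conjunction),
# the per-factor PC p-value is the largest of its cell p-values.
pc_family = {factor: max(ps) for factor, ps in by_factor.items()}
```

Here the expanded family has four hypotheses, while the partial conjunction family has two, one per factor.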
When not to reach for partial_conjunction
| Real intent | Reach for | Why |
|---|---|---|
| "At least one condition is significant" | bhy(profiles, expand_over=[...]) | min_pass=1 is union semantics: FDR inflates to ~2q. partial_conjunction raises rather than implement this. |
| Rank candidates (no FDR control) | compare | compare is a view, not a filter. |
| Sensitivity to estimator / sample choice | robustness (#178) | Conditions there are methods, not data slices. |
| Cross-slice metric difference (descriptive) | by_slice | Returns per-slice metric values; no inference. |
| Cross-slice metric difference (inferential, slice-pairs) | slice_pairwise_test / slice_joint_test | Tests whether the slices' metric series differ, not whether the factor is jointly significant. |
Strict vs lenient mode
n_conditions is the contract knob.
| Mode | When | Behavior |
|---|---|---|
| Strict (n_conditions=int) | Paper-grade; you know the design (e.g. exactly 2 universes, exactly 4 horizons) | Any identity whose condition count differs from n_conditions raises, so data gaps fail loudly. |
| Lenient (n_conditions=None) | EDA / prototyping; condition count varies by identity | m inferred per identity from the data; only requires m >= min_pass. |
# Strict: 2 universes required for every factor; missing one raises.
fx.multi_factor.partial_conjunction(
profiles, min_pass=2, n_conditions=2, expand_over=["universe_id"]
)
# Lenient: "at least 3 of however many horizons each factor has".
fx.multi_factor.partial_conjunction(
profiles, min_pass=3, expand_over=["fwd_period"]
)
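The two modes can be sketched as a check over per-identity condition counts. This is an assumption-laden illustration: `resolve_condition_counts` is a hypothetical helper, and it raises the built-in `ValueError` as a stand-in for factrix's `UserInputError`.

```python
def resolve_condition_counts(counts, min_pass, n_conditions=None):
    """Return the m used per identity, or raise on a contract violation.

    counts: dict mapping identity -> number of conditions observed.
    n_conditions: None for lenient mode, an int for strict mode.
    ValueError stands in for factrix's UserInputError in this sketch.
    """
    for identity, m in counts.items():
        # Strict mode: every identity must match the declared design exactly.
        if n_conditions is not None and m != n_conditions:
            raise ValueError(
                f"{identity}: expected {n_conditions} conditions, found {m}"
            )
        # Both modes: k of m is unsatisfiable when m < k.
        if m < min_pass:
            raise ValueError(
                f"{identity}: {m} conditions is fewer than min_pass={min_pass}"
            )
    return dict(counts)


# Illustrative counts: "value" is missing one universe.
observed = {"mom": 2, "value": 1, "quality": 2}
complete = {"mom": 4, "quality": 3}
lenient_m = resolve_condition_counts(complete, min_pass=3)
```

In strict mode the same `complete` input would raise for `n_conditions=4`, because `quality` has only three conditions.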
How the math works
Per identity, the \(m\) per-condition \(p\)-values are reduced to a single PC \(p\)-value (Bonferroni-style, BH2008):
\[
p_{\text{PC}} = \min\bigl\{1,\; (m - k + 1)\, p_{(k)}\bigr\}
\]
where \(p_{(k)}\) is the \(k\)-th smallest of the \(m\) \(p\)-values and \(k = \texttt{min\_pass}\). Two corner cases worth knowing:
- \(k = m\) (full conjunction) → \(p_{\text{PC}} = \max(p)\). Reject only when even the worst condition is significant.
- \(k = 1\) (union) → \(p_{\text{PC}} = m \cdot \min(p)\). Bonferroni-corrected
minimum. Forbidden here — the surface raises with a pointer to
bhy(expand_over=...), where the family-level FDR inflation is explicit rather than hidden in a "robust across" claim.
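As a worked illustration with made-up numbers, take \(m = 3\) conditions with sorted \(p\)-values \(p_{(1)} = 0.01\), \(p_{(2)} = 0.02\), \(p_{(3)} = 0.30\) and apply \(p_{\text{PC}} = (m - k + 1)\, p_{(k)}\) for each \(k\):

\[
\begin{aligned}
k = 3:&\quad p_{\text{PC}} = (3 - 3 + 1) \cdot p_{(3)} = 0.30 \\
k = 2:&\quad p_{\text{PC}} = (3 - 2 + 1) \cdot p_{(2)} = 2 \cdot 0.02 = 0.04 \\
k = 1:&\quad p_{\text{PC}} = (3 - 1 + 1) \cdot p_{(1)} = 3 \cdot 0.01 = 0.03
\end{aligned}
\]

The \(k = 1\) line is shown only to complete the pattern; it is the union case that the function refuses to compute.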
The PC \(p\)-values are then fed to a standard Benjamini-Hochberg-Yekutieli (BHY) step-up across
identities, controlling group-level FDR ≤ q. The harmonic dependence
correction \(c(n) = \sum_{i=1}^{n} 1/i\), with \(n\) the number of identities tested, is applied because PC \(p\)-values across
identities do not generally satisfy positive regression dependence on a subset (PRDS): sharing underlying panels makes the
joint distribution unknown, so the conservative choice is the default.
BH2008 also presents a Simes-style PC combiner, which is less conservative under PRDS. factrix ships only the Bonferroni-style — it is uniformly valid without dependence assumptions; a Simes path may be added later if a use case demands it.
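The step-up with the harmonic correction can be sketched as a textbook Benjamini-Yekutieli adjustment. This is a generic illustration, not factrix's code: `bhy_adjust` is a hypothetical name, and whether the library computes `adj_p` exactly this way is an assumption.

```python
def bhy_adjust(p_values):
    """Benjamini-Yekutieli adjusted p-values, returned in input order.

    Hypothetical helper; p_values holds one PC p-value per identity.
    """
    n = len(p_values)
    if n == 0:
        return []
    # Harmonic correction: sum of 1/i for i = 1..n.
    c_n = sum(1.0 / i for i in range(1, n + 1))
    # Indices sorted by ascending p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * n * c_n / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Illustrative PC p-values for three identities.
pc_p = [0.01, 0.04, 0.03]
adj = bhy_adjust(pc_p)
survivors = [i for i, a in enumerate(adj) if a <= 0.05]
```

For these three values the correction factor is 1 + 1/2 + 1/3, so even the smallest PC p-value of 0.01 adjusts to 0.055 and nothing survives at q = 0.05, which shows how conservative the default is on small families.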
Survivors output
partial_conjunction returns the same Survivors
container as bhy, populated with PC-specific metadata:
| Field | Meaning |
|---|---|
| profiles | One representative profile per surviving identity (the first profile of that identity in input order) |
| adj_p | BHY-adjusted PC \(p\)-value; survivor iff adj_p <= q |
| pc_p | Raw PC \(p\)-value (pre-BHY) |
| min_pass | The \(k\) you passed |
| n_tests | Keyed by identity tuple (factor_id, forward_periods) → actual \(m\) used |
| n_passed_uncorr | Per-identity count of raw \(p < q\). Descriptive: flags borderline (n_passed_uncorr == min_pass) and data-gap cases at a glance. The cutoff is your q, so the count moves with q; using it to override adj_p survivor selection is the anti-shopping failure mode this function exists to prevent. |
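The descriptive count can be sketched as follows. The helper name `count_passed_uncorr` and the input layout (a dict keyed by identity tuple) are assumptions for illustration, not the layout factrix uses internally.

```python
def count_passed_uncorr(p_by_identity, q=0.05):
    """Count raw p-values strictly below q for each identity (hypothetical helper)."""
    return {
        identity: sum(p < q for p in p_values)
        for identity, p_values in p_by_identity.items()
    }


# Illustrative raw p-values keyed by (factor_id, forward_periods).
raw = {
    ("mom", 5): [0.004, 0.012, 0.030],
    ("value", 5): [0.300, 0.008, 0.020],
}
min_pass = 2
passed = count_passed_uncorr(raw, q=0.05)
# Identities that only just meet min_pass on raw p-values deserve a closer look.
borderline = [identity for identity, n in passed.items() if n == min_pass]
```

The count is a reading aid only: survivor selection stays with `adj_p <= q`.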
Validation summary
| Trigger | Outcome |
|---|---|
| min_pass < 2 | UserInputError. min_pass == 1 additionally points at bhy(expand_over=...). |
| expand_over empty / None | UserInputError: the function is undefined without a condition axis. |
| expand_over names an identity field (factor_id / forward_periods) | UserInputError (#160 anti-shopping defense, same as bhy). |
| n_conditions < min_pass | UserInputError (unsatisfiable). |
| Strict mode: identity's condition count \(\neq\) n_conditions | UserInputError: surfaces missing-universe / missing-horizon data gaps. |
| Identity with condition count \(<\) min_pass (lenient) | UserInputError. |
| Duplicate (identity, expand_over_values) partition key | UserInputError (family-resolution invariant). |
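The argument-level rows of the table can be sketched as a standalone pre-flight check. Everything here is illustrative: `validate_args` and `IDENTITY_FIELDS` are hypothetical names, the messages are invented, and the built-in `ValueError` stands in for factrix's `UserInputError`.

```python
# Identity fields named in the table above; assumed to be the full list.
IDENTITY_FIELDS = ("factor_id", "forward_periods")


def validate_args(min_pass, expand_over, n_conditions=None):
    """Mirror the argument-level checks (hypothetical helper).

    Raises ValueError as a stand-in for factrix's UserInputError.
    """
    if min_pass < 2:
        hint = " (for min_pass == 1, use bhy(expand_over=...))" if min_pass == 1 else ""
        raise ValueError(f"min_pass must be at least 2{hint}")
    if not expand_over:
        raise ValueError("expand_over must name at least one context key")
    clash = [key for key in expand_over if key in IDENTITY_FIELDS]
    if clash:
        raise ValueError(f"expand_over cannot name identity fields: {clash}")
    if n_conditions is not None and n_conditions < min_pass:
        raise ValueError("n_conditions < min_pass is unsatisfiable")


def is_valid(**kwargs):
    """Return True when validate_args accepts the arguments."""
    try:
        validate_args(**kwargs)
    except ValueError:
        return False
    return True


ok = is_valid(min_pass=2, expand_over=["universe_id"], n_conditions=2)
```

The per-identity checks (condition counts and duplicate partition keys) need the profiles themselves and are not covered by this sketch.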
References
- [BH2008] Benjamini, Y. & Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics, 64(4), 1215–1222.
- [BB2014] Benjamini, Y. & Bogomolov, M. (2014). Selective inference on multiple families of hypotheses. JRSS-B, 76(1).
- [HY2014] Heller, R. & Yekutieli, D. (2014). Replicability analysis for genome-wide association studies. AOAS, 8(1).