How causal composition across 50 million claims predicted that CD28 upregulates PD-L1 — before the paper was published.
In December 2024, a team reported in Cancer Cell that CD28, previously known only as a T-cell co-stimulatory receptor, is expressed inside cancer cells, where it binds and stabilizes PD-L1 mRNA, driving immune evasion. The finding was unexpected: none of the databases we ingested contained a claim that CD28 upregulates PD-L1 in cancer cells.
The paper: "Inhibiting intracellular CD28 in cancer cells enhances antitumor immunity and overcomes anti-PD-1 resistance via targeting PD-L1" — Cancer Cell, December 2024 (PMID: 39672166)
Our reference database contains 49,926,718 claims from 30+ public databases (STRING, CTD, Reactome, DrugBank, DisGeNET, PrimeKG, and more), all ingested before this paper was published.
Running db.predict("gene_940") (CD28) with causal composition:
CD28 --[upregulates]--> CD274 (PD-L1)
12 supporting paths via 12 independent intermediaries
1 opposing path
Consensus: 92%
Evidence (co-regulation through shared chemical responses):
CD28 -[upregulates]-> Olaparib -[upregulates]-> PD-L1
CD28 -[downregulates]-> Nifedipine -[downregulates]-> PD-L1
CD28 -[upregulates]-> Phorbol 12-myristate 13-acetate -[upregulates]-> PD-L1
CD28 -[upregulates]-> Lipopolysaccharide -[upregulates]-> PD-L1
... 8 more independent paths
Query time: 2,723ms on 50M claims
No LLM was involved. No text was read. The prediction emerged purely from the structure of the knowledge graph: across 12 independent chemical intermediaries, the CD28 leg and the PD-L1 leg point in the same direction (both up or both down), so each path composes to "CD28 upregulates PD-L1".
Traditional knowledge graphs store facts: "Gene A interacts with Gene B." Attest stores claims with causal predicates: "Compound X upregulates Gene A" (from CTD), "Compound X upregulates Gene B" (from CTD).
Causal composition follows these directed edges through intermediaries and applies biochemical logic:
| Hop 1 | Hop 2 | Composed |
|---|---|---|
| upregulates | upregulates | upregulates |
| downregulates | downregulates | upregulates (double negative) |
| upregulates | downregulates | downregulates |
| inhibits | inhibits | activates (double negative) |
When multiple independent intermediaries agree on the composed direction, the prediction has convergent evidence. CD28 → PD-L1 had 12:1 consensus — 12 independent compounds agree on upregulation, only 1 suggests downregulation.
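The composition table and the consensus count can be sketched in a few lines of Python. This is a minimal illustration, not Attest's implementation: the names `compose`, `tally`, and `consensus` are hypothetical, the fallback for mixed predicate families is an assumption (the table only covers same-family pairs), and the consensus formula (supporting paths divided by all paths) is an assumption that happens to reproduce the reported 92% for 12:1.

```python
from collections import Counter

# (family, sign) for each predicate that appears in the composition table.
PREDICATES = {
    "upregulates": ("regulation", +1),
    "downregulates": ("regulation", -1),
    "activates": ("activity", +1),
    "inhibits": ("activity", -1),
}
NAMES = {
    ("regulation", +1): "upregulates",
    ("regulation", -1): "downregulates",
    ("activity", +1): "activates",
    ("activity", -1): "inhibits",
}


def compose(hop1: str, hop2: str) -> str:
    """Compose two hops by multiplying their signs (two negatives give a positive)."""
    family1, sign1 = PREDICATES[hop1]
    family2, sign2 = PREDICATES[hop2]
    # Assumption: mixed families fall back to the regulation vocabulary.
    family = family1 if family1 == family2 else "regulation"
    return NAMES[(family, sign1 * sign2)]


def tally(paths):
    """Count composed directions over (intermediary, hop1, hop2) paths."""
    return Counter(compose(hop1, hop2) for _, hop1, hop2 in paths)


def consensus(supporting: int, opposing: int) -> float:
    """Fraction of all paths that support the predicted direction (assumed formula)."""
    return supporting / (supporting + opposing)


# The four intermediaries named in the CD28 evidence listing.
named_paths = [
    ("Olaparib", "upregulates", "upregulates"),
    ("Nifedipine", "downregulates", "downregulates"),
    ("Phorbol 12-myristate 13-acetate", "upregulates", "upregulates"),
    ("Lipopolysaccharide", "upregulates", "upregulates"),
]
print(tally(named_paths))         # all four compose to "upregulates"
print(f"{consensus(12, 1):.0%}")  # 12 supporting vs 1 opposing
```

Treating each predicate as a sign keeps the rule table implicit: any pair of hops with matching signs composes to a positive effect, and any mismatched pair composes to a negative one.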
We validated predictions for TP53 against published literature (7 gene targets with known ground truth):
| Prediction | Literature | Verdict |
|---|---|---|
| TP53 → CDKN1A (upregulates) | Textbook biology | Correct |
| TP53 → BAX (upregulates) | Textbook biology | Correct |
| TP53 → IL6 (upregulates) | Known p53 target | Correct |
| TP53 → CCN2 (upregulates) | JCI 2011, mechanism known | Correct |
| TP53 → DUSP10 (upregulates) | IJMS 2019, CRC genotoxic stress | Correct |
| TP53 → THBD (upregulates) | p53 actually represses THBD | Wrong direction |
| TP53 → BMP2 (upregulates) | Context-dependent, leaning wrong | Wrong direction |
5/7 correct = 71% precision. False positives occur when the same gene has opposite effects in different tissues (p53 activates some targets but represses others depending on cellular context). We filter the worst offenders using contradictory leg detection: when the source gene both upregulates and downregulates the same intermediary, that intermediary is context-dependent and excluded from predictions.
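One way to picture contradictory leg detection is as a filter over the first-hop claims. The sketch below is an assumption about how such a filter could look, not the actual Attest code; the function names and the `compound_A`/`compound_B`/`compound_C` identifiers are placeholders invented for the example.

```python
from collections import defaultdict

OPPOSITE = {"upregulates": "downregulates", "downregulates": "upregulates"}


def contradictory_intermediaries(first_hop_claims):
    """Return intermediaries for which the source gene has claims in both directions."""
    directions = defaultdict(set)
    for intermediary, predicate in first_hop_claims:
        directions[intermediary].add(predicate)
    return {
        intermediary
        for intermediary, preds in directions.items()
        if any(OPPOSITE.get(p) in preds for p in preds)
    }


def usable_intermediaries(first_hop_claims):
    """Drop context-dependent intermediaries before composing paths."""
    bad = contradictory_intermediaries(first_hop_claims)
    return sorted({m for m, _ in first_hop_claims} - bad)


# Toy first-hop claims (source gene -> intermediary); names are placeholders.
claims = [
    ("compound_A", "upregulates"),
    ("compound_B", "upregulates"),
    ("compound_B", "downregulates"),  # contradictory leg: excluded
    ("compound_C", "downregulates"),
]
print(contradictory_intermediaries(claims))  # {'compound_B'}
print(usable_intermediaries(claims))         # ['compound_A', 'compound_C']
```

The filter only looks at the source-side leg, which matches the description above: an intermediary is dropped when the source gene itself is reported to move it in both directions.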
We tested whether the graph could predict findings from recent high-impact papers, using only data that predates each publication:
| Paper | Finding | Attest prediction | Verdict |
|---|---|---|---|
| Cancer Cell 2024 | CD28 upregulates PD-L1 | upregulates (12:1) | Correct |
| Nat Commun 2025 | KRAS downregulates BRCA1 | downregulates (19:17) | Correct |
| Nat Commun 2024 | PRMT5 activates FUS | upregulates (25:17) | Correct |
| Cell Death Differ 2025 | DYRK2 inhibits USP28 | downregulates (3:2) | Close |
| Nature 2022 | ADAR1 inhibits ZBP1 | upregulates (9:0) | Wrong |
The ADAR1 error is instructive: ADAR1 and ZBP1 are both interferon-stimulated genes (co-upregulated by the same stimuli), but ADAR1 actually inhibits ZBP1 post-transcriptionally via RNA editing. Co-regulation evidence cannot capture post-transcriptional inhibition.
After loading SemMedDB (35.8M literature-extracted predications from PubMed), the evidence quality stack now catches this error automatically:
The principle: NLP-extracted predications have a ~30% directional error rate. With 1-2 claims, a single extraction error can flip the predicted direction. With 100+ claims, the correct direction dominates the errors. directional_confidence() requires a minimum of 3 independent sources before trusting any directional prediction.
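The arithmetic behind this principle can be checked with a simple majority-vote model. The sketch below is not the implementation of directional_confidence(), whose internals are not shown here. It assumes each claim is wrong independently with probability 0.30 and that a strict majority decides the direction, which is an idealization (extractions from similar sentences may share errors). The function name `majority_correct_probability` is hypothetical.

```python
from math import comb


def majority_correct_probability(n_claims: int, error_rate: float = 0.30) -> float:
    """P(a strict majority of n independent claims has the correct direction)."""
    p_correct = 1.0 - error_rate
    needed = n_claims // 2 + 1  # ties count as "not trusted"
    return sum(
        comb(n_claims, k) * p_correct**k * (1.0 - p_correct) ** (n_claims - k)
        for k in range(needed, n_claims + 1)
    )


for n in (1, 3, 11, 101):
    print(f"{n:>3} claims -> {majority_correct_probability(n):.3f}")
```

Under these assumptions a single claim is right 70% of the time, three claims give a correct majority about 78% of the time, and the probability approaches 1 as the number of claims grows into the hundreds.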
from attestdb import AttestDB

db = AttestDB.open_read_only("reference.attest")

# Discover novel predictions for any entity
predictions = db.predict("gene_940")  # CD28
for p in predictions[:5]:
    print(f"{p.predicted_predicate} -> {p.target}")
    print(f"  {p.supporting_paths} supporting, {p.opposing_paths} opposing")
    print(f"  gap: {p.is_gap}, consensus: {p.consensus:.0%}")

# Test a specific hypothesis: CD28 upregulates CD274 (PD-L1)
verdict = db.what_if(
    ("gene_940", "gene"),
    ("upregulates", "relation"),
    ("gene_29126", "gene"),
)
print(verdict.verdict)      # "plausible"
print(verdict.explanation)  # "12 causal path(s) supporting"
Both operations are also available as MCP tools (attest_predict, attest_what_if) for AI-native workflows, out of 77 MCP tools in total.
We ran db.predict("gene_7157") (TP53) and validated the top 8 predictions — relationships with zero direct claims in the database, predicted purely from causal composition through 12-16 independent intermediaries:
| Prediction | Paths | Literature | Verdict |
|---|---|---|---|
| TP53 → TYMS | 12 | p53 represses TYMS promoter by >95% (1997) | Textbook |
| TP53 → EIF4EBP1 | 12 | p53→AMPK→mTOR→4E-BP1 axis | Textbook |
| TP53 → VIM | 13 | p53 suppresses vimentin via miR-200c | Textbook |
| TP53 → GJA1 | 13 | Mutant p53 degrades Connexin 43 (2022) | Emerging |
| TP53 → SATB1 | 12 | p53 binds SATB1 promoter (2024) | Emerging |
| TP53 → PDHX | 16 | Indirect via PDK2, not PDHX directly | Indirect |
| TP53 → WIF1 | 12 | Plausible (p53 antagonizes Wnt), no direct evidence | Novel |
| TP53 → PFN1 | 16 | Known PFN1→p53, reverse direction not published | Novel |
5/8 confirmed, 0 contradicted. TYMS is the standout: a textbook p53 target (discovered 1997, extensively validated) that had zero directional claims in our 85M-claim database — yet 12 independent causal composition paths recovered it in 2.2 seconds.
We ran predict() on EGFR, BRCA1, and KRAS in addition to TP53 — generating 1,149 predictions across 4 genes in under 2 minutes total. Validated top predictions from each:
| Gene | Prediction | Paths | Evidence | Verdict |
|---|---|---|---|---|
| KRAS | → SUZ12 | 16 | Cancer Cell 2016 — PRC2 barrier to KRAS-driven EMT | Confirmed |
| KRAS | → GNPDA1 | 18 | KRAS drives hexosamine pathway (Cell 2012) | Confirmed |
| EGFR | → CCNB2 | 16 | EGFR signaling drives G2/M cyclins | Confirmed |
| EGFR | → MCOLN1 | 21 | EGFR→mTOR→TRPML1 lysosomal axis | Plausible |
| BRCA1 | → STUB1 | 20 | Both E3 ligases in breast cancer UPS | Plausible |
| KRAS | → CSRP1 | 20 | No direct evidence; CSRP2 has MAPK links | Novel |
| BRCA1 | → CSRP1 | 19 | No direct evidence — same target, independent gene | Novel |
Across all 17 validated predictions (the table lists a selection), 8 were confirmed and 0 contradicted (47% precision). CSRP1 is the most interesting novel finding: it was independently predicted by both KRAS (20 paths) and BRCA1 (19 paths) through different intermediaries. The BRCA1→CSRP1 prediction was computationally validated: anticorrelation in TCGA (n=1,218, ρ=−0.42, p=10^-52), independently replicated in METABRIC (n=1,980, ρ=−0.22, p=10^-24).