Corroboration, contradiction, causal prediction, temporal decay, gap analysis — these aren't features bolted onto a database. They're consequences of storing claims instead of facts.
A claim is a 7-tuple: who asserted what, about which entities, with what confidence, when, in which namespace.
Claims are immutable and append-only. The system never modifies or deletes a claim. New information creates new claims. Retraction creates a tombstone — the original is preserved for audit.
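A minimal sketch of the append-only behaviour, assuming one plausible reading of the 7-tuple (source, subject, predicate, object, confidence, timestamp, namespace). The names `Claim`, `Tombstone`, and `ClaimLog`, and all field names, are illustrative assumptions rather than the system's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Claim:
    source: str        # who asserted it
    subject: str       # first entity
    predicate: str     # what is asserted
    obj: str           # second entity
    confidence: float
    timestamp: float   # when it was asserted
    namespace: str


@dataclass(frozen=True)
class Tombstone:
    retracted_index: int  # position of the retracted claim in the log
    source: str
    timestamp: float


class ClaimLog:
    """Append-only log: entries are added, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[Union[Claim, Tombstone]] = []

    def append(self, claim: Claim) -> int:
        self._entries.append(claim)
        return len(self._entries) - 1

    def retract(self, index: int, source: str, timestamp: float) -> int:
        # Retraction appends a tombstone; the original claim stays in place.
        if not isinstance(self._entries[index], Claim):
            raise ValueError("only claims can be retracted")
        self._entries.append(Tombstone(index, source, timestamp))
        return len(self._entries) - 1

    def active(self) -> List[Claim]:
        retracted = {e.retracted_index for e in self._entries
                     if isinstance(e, Tombstone)}
        return [e for i, e in enumerate(self._entries)
                if isinstance(e, Claim) and i not in retracted]

    def history(self) -> Tuple[Union[Claim, Tombstone], ...]:
        # Full audit trail, including retracted claims and their tombstones.
        return tuple(self._entries)
```

In this sketch a retracted claim disappears from `active()` but remains in `history()`, which mirrors the audit guarantee described above.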
Every claim carries two cryptographic identifiers. This is the structural trick that makes corroboration and deduplication automatic.
claim_id is the claim's unique identity. No two distinct claims share a claim_id. It answers: who said what, and when?
content_id is the claim's semantic identity: the fact it asserts, regardless of who said it or when. All claims asserting the same triple share a content_id.
Three labs publish "Drug X activates Gene A." Each gets a unique claim_id. But they share a content_id — because they're asserting the same fact. That shared ID is how the engine counts independent sources automatically.
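The three-labs example can be sketched with two hashes. This assumes SHA-256 and a particular choice of hashed fields (the triple alone for content_id; source, triple, and timestamp for claim_id). The source says only that the identifiers are cryptographic, so the hash function, the field selection, and the separator are all assumptions.

```python
import hashlib


def _digest(*parts: str) -> str:
    # Join with a separator that is unlikely to appear in field values,
    # so ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def content_id(subject: str, predicate: str, obj: str) -> str:
    # Semantic identity: depends only on the asserted triple.
    return _digest(subject, predicate, obj)


def claim_id(source: str, subject: str, predicate: str, obj: str,
             timestamp: str) -> str:
    # Unique identity: also depends on who asserted it and when.
    return _digest(source, subject, predicate, obj, timestamp)


labs = [("Lab 1", "2024-01-05"), ("Lab 2", "2024-02-11"), ("Lab 3", "2024-03-20")]
claim_ids = {claim_id(lab, "Drug X", "activates", "Gene A", ts) for lab, ts in labs}
content_ids = {content_id("Drug X", "activates", "Gene A") for _ in labs}
```

Here `claim_ids` holds three distinct values while `content_ids` holds one, so grouping by content_id collects every assertion of the same fact.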
When multiple independent sources assert the same fact, the engine boosts confidence logarithmically in the number of sources. One source gives no boost. Two sources give 1.3x. Four give 1.6x. At eight, the boost reaches its cap of 1.7x.
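The quoted multipliers are consistent with a boost of 1 + 0.3 · log2(n), capped at 1.7. That formula is inferred from the figures in the text and is not stated by the source, so treat the sketch below as one curve that fits those numbers; `corroboration_boost` is a hypothetical name.

```python
import math


def corroboration_boost(n_sources: int, cap: float = 1.7) -> float:
    """Logarithmic boost inferred from the quoted figures (an assumption)."""
    if n_sources < 1:
        return 1.0
    return min(1.0 + 0.3 * math.log2(n_sources), cap)


# Boost for 1, 2, 3, 4, and 8 independent sources, rounded to two decimals.
boosts = {n: round(corroboration_boost(n), 2) for n in (1, 2, 3, 4, 8)}
```

Under this assumption, eight sources would give 1.9x before the cap, which is why the cap matters at that point.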
"Independent" is real deduplication, not a count of rows. Claims sharing a DOI, PMID, or overlapping provenance chain are grouped into one source. Five papers citing the same upstream study count as one source, not five.
Predicates have three algebraic properties that enable the engine to reason about claims without domain-specific knowledge.
Some predicates are opposites: activates ↔ inhibits, causes ↔ prevents, promotes ↔ suppresses.
If both (S, P, O) and (S, opposite(P), O) exist, the engine flags a contradiction and counts the evidence on each side.
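The opposition check can be sketched as a lookup table plus a counter. For brevity this version counts raw triples on each side; the engine described here would count independent sources instead. `find_contradictions` and its return shape are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

OPPOSITES: Dict[str, str] = {
    "activates": "inhibits",
    "causes": "prevents",
    "promotes": "suppresses",
}
# Make the relation symmetric: inhibits -> activates, and so on.
OPPOSITES.update({v: k for k, v in OPPOSITES.items()})


def find_contradictions(triples: Iterable[Triple]) -> List[dict]:
    counts = Counter(triples)
    seen = set()
    found = []
    for (s, p, o), n in counts.items():
        opp = OPPOSITES.get(p)
        if opp is None:
            continue  # predicate has no opposite
        n_opp = counts.get((s, opp, o), 0)
        key = (s, o, frozenset((p, opp)))
        if n_opp and key not in seen:  # report each opposing pair once
            seen.add(key)
            found.append({"subject": s, "object": o,
                          "evidence": {p: n, opp: n_opp}})
    return found


triples = [("Drug X", "activates", "Gene A")] * 3 + [("Drug X", "inhibits", "Gene A")]
contradictions = find_contradictions(triples)
```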
Directional predicates compose like multiplication of signs. This is what powers predict(): the engine walks 2-hop causal chains and composes predicates algebraically to discover novel relationships.
| First hop | Second hop | Composed result | Logic |
|---|---|---|---|
| activates | activates | activates | positive × positive = positive |
| activates | inhibits | inhibits | positive × negative = negative |
| inhibits | activates | inhibits | negative × positive = negative |
| inhibits | inhibits | activates | negative × negative = positive |
| prevents | prevents | causes | double negative |
Some predicates are symmetric: if A interacts_with B, then B interacts_with A. Symmetric predicates don't compose: they represent undirected associations, not causal chains.
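The composition rules can be sketched as sign multiplication within a predicate family. The table above only shows hops from the same family (activates/inhibits, causes/prevents), so this sketch declines to compose mixed-family hops; that restriction, the family names, and this standalone `predict` signature are assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# predicate -> (family, sign); symmetric predicates are deliberately absent.
FAMILIES: Dict[str, Tuple[str, int]] = {
    "activates": ("activation", +1), "inhibits": ("activation", -1),
    "causes": ("causation", +1), "prevents": ("causation", -1),
    "promotes": ("promotion", +1), "suppresses": ("promotion", -1),
}
BY_FAMILY_SIGN = {v: k for k, v in FAMILIES.items()}


def compose(first: str, second: str) -> Optional[str]:
    if first not in FAMILIES or second not in FAMILIES:
        return None  # symmetric or unknown predicates do not compose
    fam1, sign1 = FAMILIES[first]
    fam2, sign2 = FAMILIES[second]
    if fam1 != fam2:
        return None  # mixed families are not defined by the table
    return BY_FAMILY_SIGN[(fam1, sign1 * sign2)]


def predict(entity: str, triples: Iterable[Triple]) -> List[Triple]:
    triples = list(triples)
    known_pairs = {(s, o) for s, _, o in triples}
    predictions = set()
    for s1, p1, mid in triples:
        if s1 != entity:
            continue
        for s2, p2, target in triples:
            if s2 != mid or target == entity:
                continue
            composed = compose(p1, p2)
            # Keep only relationships that are not already asserted.
            if composed and (entity, target) not in known_pairs:
                predictions.add((entity, composed, target))
    return sorted(predictions)


graph = [
    ("Drug X", "activates", "Gene A"),
    ("Gene A", "inhibits", "Protein B"),
    ("Gene A", "interacts_with", "Gene C"),  # symmetric: never composed
]
```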
A small pharmacology scenario showing how the capabilities compose.
Three independent labs report that Drug X activates Gene A. Two studies report Gene A inhibits Protein B.
The three papers share a content_id because they assert the same triple.
Corroboration boost: 1.48x (3 independent sources).
A fourth paper says Drug X inhibits Gene A. The engine detects that opposite(activates) = inhibits on the same entity pair, which is a contradiction. Evidence ratio: 3 vs 1.
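predict("Drug X") walks 2-hop causal chains and composes predicates: Drug X activates Gene A and Gene A inhibits Protein B, so the engine predicts that Drug X inhibits Protein B (positive × negative = negative).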
Snapshot at last week: only "activates" exists. what_if("Drug X activates Gene A") returns supported.
Snapshot at today: the contradiction exists. The same query returns contested.
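The two snapshot results can be reproduced with a timestamp filter over an unchanged claim list. This is a simplified sketch: the hypothesis is passed as a tuple rather than a string, timestamps are day numbers, and the "unsupported" label is a placeholder that does not come from the source.

```python
from typing import List, Tuple

TimedClaim = Tuple[str, str, str, int]  # subject, predicate, object, day

OPPOSITES = {"activates": "inhibits", "inhibits": "activates"}


def what_if(hypothesis: Tuple[str, str, str],
            claims: List[TimedClaim], as_of: int) -> str:
    subject, predicate, obj = hypothesis
    # A snapshot is just the claims asserted on or before `as_of`.
    visible = [c for c in claims if c[3] <= as_of]
    supporting = sum(1 for s, p, o, _ in visible
                     if (s, p, o) == (subject, predicate, obj))
    opposing = sum(1 for s, p, o, _ in visible
                   if (s, p, o) == (subject, OPPOSITES.get(predicate), obj))
    if supporting and opposing:
        return "contested"
    if supporting:
        return "supported"
    return "unsupported"  # placeholder label, not from the source


claims = [
    ("Drug X", "activates", "Gene A", 1),
    ("Drug X", "activates", "Gene A", 2),
    ("Drug X", "activates", "Gene A", 3),
    ("Drug X", "inhibits", "Gene A", 10),  # the fourth paper
]
last_week = what_if(("Drug X", "activates", "Gene A"), claims, as_of=7)
today = what_if(("Drug X", "activates", "Gene A"), claims, as_of=10)
```

The function only reads `claims`, so evaluating a hypothesis leaves the log untouched.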
No ML model. No training data. No statistical inference. The prediction falls directly out of the composition table. The contradiction falls out of the opposition relation. The corroboration falls out of the dual identity system. Every capability is a consequence of the data structure.
Confidence decays exponentially at query time. The stored claim is never modified.
Half-lives are configurable per predicate. Operational facts (has_status) have a 30-day half-life. Durable science (inhibits, binds) has a 730-day half-life. For a fact corroborated by 50 old sources and 1 fresh source, the fresh source may dominate effective confidence, without anyone deleting or updating anything.
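Exponential decay with a half-life is the standard formula: effective = stored × 0.5^(age / half_life). The sketch below applies it at read time. The function name and the 365-day default for unlisted predicates are assumptions; the 30-day and 730-day values come from the text.

```python
HALF_LIFE_DAYS = {
    "has_status": 30.0,   # operational facts
    "inhibits": 730.0,    # durable science
    "binds": 730.0,
}


def effective_confidence(stored_confidence: float, predicate: str,
                         age_days: float,
                         default_half_life: float = 365.0) -> float:
    """Decay is computed on read; the stored value is never rewritten."""
    half_life = HALF_LIFE_DAYS.get(predicate, default_half_life)
    return stored_confidence * 0.5 ** (age_days / half_life)


fresh = effective_confidence(0.8, "has_status", age_days=0)
one_half_life = effective_confidence(0.8, "has_status", age_days=30)
two_half_lives = effective_confidence(0.8, "inhibits", age_days=1460)
```

After one half-life the stored 0.8 reads as 0.4, and after two it reads as 0.2.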
Each capability emerges from a structural property of the claim. They compose arbitrarily because they all operate on the same underlying 7-tuple.
Independent sources asserting the same fact strengthen confidence.
Opposing predicates on the same entity pair, weighted by evidence count.
2-hop composition discovers novel relationships via predicate algebra.
Recent claims outweigh old ones at query time, without mutation.
Trace any conclusion to its source data. Retraction cascades through derivation chains.
Detect unknown relationships between known entities — the most actionable missing knowledge.
what_if() evaluates a hypothesis against existing evidence without modifying anything.
Query the knowledge base at any point in history. All capabilities work on snapshots.
PageRank, betweenness, community detection — all derived from the claim log.
Identify inflection points where the rate of knowledge accumulation changes.
Normalize variants, merge cross-system identifiers, union-find alias groups.
Disjoint partitions with RBAC. Each tenant sees a complete but independent claim space.
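The alias handling mentioned in the list above (normalizing variants and merging cross-system identifiers with union-find) can be sketched as follows. The `AliasGroups` class, its normalization rule (trim, lowercase, collapse whitespace), and the example identifiers are hypothetical.

```python
from typing import Dict


class AliasGroups:
    """Union-find over normalized entity names."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    @staticmethod
    def normalize(name: str) -> str:
        # Assumed normalization: trim, lowercase, collapse whitespace.
        return " ".join(name.strip().lower().split())

    def find(self, name: str) -> str:
        key = self.normalize(name)
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[key] != root:
            self._parent[key], key = root, self._parent[key]
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_entity(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)


aliases = AliasGroups()
aliases.union("Gene A", "GENEA")      # spelling variant
aliases.union("GENEA", "EX:0001")     # hypothetical cross-system identifier
```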
These capabilities are individually useful. Their real power is in composition.
"Find predictions where each step is independently corroborated, and the predicted relationship doesn't already exist." This is the core drug repurposing pattern: Gene A activates Protein B (8 papers) and Protein B inhibits Disease C (3 trials). The predicted Gene A → Disease C is high-confidence because each step is well-sourced.
"When did this controversy start, and has subsequent evidence resolved it?" Compare snapshots to trace the timeline: at T1 only one side exists; at T2 the opposition appears; at T3 new corroboration shifts the evidence ratio. Provenance tracing identifies which sources drove each phase.
"Run predictions within a tenant's data, respecting sensitivity levels." A pharmaceutical company's predictions draw only from their own namespace plus public claims. Predictions requiring restricted data from another tenant are simply invisible — the algebra operates on a reduced but correct claim space.
Traditional databases store facts and trust them. A claim-native database stores assertions about facts — each carrying provenance, confidence, and a timestamp.
This inversion lets the system reason about why it believes something (provenance), how strongly (confidence × corroboration), whether that belief is contested (contradiction detection), what it might imply (causal composition), and when the belief changed (temporal analysis).
A traditional database could implement any one of these as a feature. But the claim-native model enables arbitrary composition because all 12 capabilities operate on the same underlying structure — the immutable, timestamped, provenanced claim.