A row stores data. An edge stores a link.
A claim stores evidence.

Every database has an atomic unit. In Attest, it's a claim — an assertion with a source, confidence, and timestamp attached. This changes what the database can do.

Relational row
    ('api-gateway', 'depends_on', 'redis')
    A fact. No source, no confidence. If it's wrong, DELETE it — no trace.

Graph edge
    (api-gateway)-[:DEPENDS_ON]->(redis)
    A relationship. Says nothing about who established it or when.

Attest claim
    k8s-manifest-v2.3 asserts: api-gateway depends_on redis (confidence: 1.0 · 2024-01-15)
    Evidence. Source is structural, not metadata. Retractable. Corroborable. Auditable.

Why the Primitive Matters

A row gives you a fact. An edge gives you a relationship. A claim gives you a fact with a reason to believe it. That reason — the source, the confidence, the timestamp — is what makes retraction, corroboration, and time-travel possible. Without it, you're just storing data and hoping it's true.
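The anatomy of a claim described above can be sketched as a plain record. The field names here are illustrative, not Attest's actual schema:

```python
import time
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Claim:
    """Toy claim record: a fact plus the reason to believe it."""
    subject: str
    predicate: str
    object: str
    source_id: str                 # where we learned this
    confidence: float              # how certain (0.0 - 1.0)
    timestamp_ns: int = field(default_factory=time.time_ns)

claim = Claim("api-gateway", "depends_on", "redis",
              source_id="k8s-manifest-v2.3", confidence=1.0)
assert 0.0 <= claim.confidence <= 1.0
```

Freezing the dataclass mirrors the immutability described later: new knowledge means new records, never mutation.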

Claim Lifecycle

Every fact in Attest follows this lifecycle — from ingestion through potential retraction and recovery via corroboration.

  1. Ingest — Source asserts a claim with provenance
  2. Corroborate — Independent sources confirm the same fact
  3. Retract — Source is wrong; its claims are tombstoned
  4. Cascade — Downstream claims auto-degrade
  5. Survive — Corroborated facts remain valid

This is what "self-correcting" means. When a source turns out to be wrong, the engine traces the impact automatically. Facts with independent support survive. Facts that depended solely on the bad source are degraded. Nothing is deleted — everything is auditable.
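The survival rule can be walked through with a toy model (this is an illustration of the semantics, not Attest's engine): two sources corroborate one fact, while a third fact rests solely on a source that later proves wrong.

```python
# Two sources corroborate "gw->redis"; "restart fixes gw" has one source.
claims = [
    {"fact": "gw->redis", "source": "k8s-manifest"},
    {"fact": "gw->redis", "source": "chat:incident-42"},   # corroboration
    {"fact": "restart fixes gw", "source": "runbook-v1"},  # single source
]

def status(fact, claims, retracted):
    """A fact degrades only when every supporting source is retracted."""
    sources = [c["source"] for c in claims if c["fact"] == fact]
    if all(s in retracted for s in sources):
        return "degraded"
    return "valid"

retracted = {"runbook-v1"}  # step 3: the runbook is retracted
assert status("gw->redis", claims, retracted) == "valid"            # step 5: survives
assert status("restart fixes gw", claims, retracted) == "degraded"  # step 4: cascades
```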

Claims

A claim is the smallest unit of knowledge in Attest:

db.ingest(
    subject=("api-gateway", "service"),        # What entity
    predicate=("depends_on", "depends_on"),     # What relationship
    object=("redis", "service"),                # To what entity
    provenance={                                # Where this came from
        "source_type": "config_management",
        "source_id": "k8s-manifest-v2.3",
    },
    confidence=1.0,                              # How certain (0.0 - 1.0)
)

This isn't just a labeled edge in a graph. It's a record that says: "The Kubernetes manifest v2.3 asserts that api-gateway depends on redis, with full confidence." The source is part of the data, not metadata.

Claims are immutable. Once recorded, they're never modified. New information creates new claims. If a source is wrong, a retraction creates a tombstone — the original claim is preserved for audit.

Provenance

Every claim must have a source. This isn't a best practice — it's enforced by the engine. Writes without provenance are rejected. This means you can always answer two questions: "Where did we learn this?" and "Should we still trust it?"
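The "no provenance, no write" rule can be sketched in a few lines. This toy `ingest` function and its `ValueError` are illustrative, not Attest's actual API surface:

```python
# Toy enforcement: a write without a source is rejected outright.
def ingest(store, subject, predicate, obj, provenance=None, confidence=1.0):
    if not provenance or "source_id" not in provenance:
        raise ValueError("write rejected: every claim needs a source")
    store.append({"s": subject, "p": predicate, "o": obj,
                  "prov": provenance, "conf": confidence})

store = []
try:
    ingest(store, "api-gateway", "depends_on", "redis")  # no provenance
except ValueError:
    pass
assert store == []  # nothing was written

ingest(store, "api-gateway", "depends_on", "redis",
       provenance={"source_id": "k8s-manifest-v2.3"})
assert len(store) == 1
```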

Source types describe the kind of source:

Source Type             Examples
config_management       Kubernetes manifests, Terraform configs, org charts
chat_extraction         ChatGPT conversations, Claude sessions, Slack threads
experiment_log          ML experiment results, A/B test outcomes
monitoring              Datadog, PagerDuty, Prometheus alerts
database_import         Bulk imports from external databases
human_annotation        Manual entries by domain experts
experimental            Lab results, assay data
literature_extraction   Findings from papers and documents
clinical_trial          Clinical study results

The source_id identifies the specific source: a paper DOI, a K8s manifest version, a Slack channel and thread, an experiment run ID. Combined with the source type, this gives you a complete audit trail for every fact in the database.

Corroboration

When the same fact shows up from multiple independent sources, that's a stronger signal than a single source saying it. Attest tracks this automatically.

# A Kubernetes manifest says api-gateway depends on Redis
claim_a = db.ingest(..., provenance={"source_id": "k8s-manifest", ...})

# An incident response chat independently confirms it
claim_b = db.ingest(..., provenance={"source_id": "chat:incident-42:turn:0", ...})

# Both claims point to the same fact — corroboration is tracked
group = db.claims_by_content_id(claim_a.content_id)
print(f"{len(group)} independent sources confirm this")

This is why claim-native matters. In a traditional database, you'd have two rows with the same data — a duplicate. In Attest, you have one fact with two sources — corroboration. The difference becomes critical during retraction.
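One plausible way a content_id could key corroboration is a stable hash of the normalized triple, so the same fact from different sources lands in the same group. How Attest actually computes content_id is not shown here; this is a sketch of the idea:

```python
import hashlib

def content_id(subject: str, predicate: str, obj: str) -> str:
    """Stable identity for a fact: hash of the normalized triple."""
    canonical = "|".join(s.strip().lower() for s in (subject, predicate, obj))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

a = content_id("api-gateway", "depends_on", "redis")  # from the manifest
b = content_id("API-Gateway", "depends_on", "Redis")  # from the chat
assert a == b  # same fact, two sources: one group, not a duplicate
```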

Retraction and Self-Correction

Sources can be wrong. Runbooks go stale, papers get retracted, configs change. In a traditional database, you'd delete the bad data and hope nothing depended on it.

Attest handles this structurally:

  • Simple retraction — marks the source's claims as retracted, creates an audit trail
  • Cascade retraction — also marks any downstream claims that cited the retracted source as degraded
  • Corroboration survives — if other independent sources support the same fact, it stays valid
# A runbook turns out to be outdated
cascade = db.retract_cascade("runbook-redis-v1", reason="Outdated procedure")
print(f"Retracted: {cascade.source_retract.retracted_count}")
print(f"Downstream degraded: {cascade.degraded_count}")

Nothing is deleted. The original claims are preserved. They're just marked so that queries know to treat them differently. This is what "self-correcting" means — the engine handles the consequences of bad data automatically.

Autonomous Learning (Autodidact)

Retraction handles bad data. But what about missing data? Attest can close knowledge gaps automatically through a background daemon called the autodidact.

The loop is simple: detect gaps in the knowledge graph, search for evidence (PubMed, Semantic Scholar, Perplexity, or any custom source), extract claims from what it finds, and ingest validated results. Then repeat.

  • Gap detection — scans for single-source entities, low-confidence claims, and explicit gaps
  • Evidence registry — built-in sources auto-register based on available API keys; free sources (PubMed, Semantic Scholar) always available
  • Cost caps — dual budget enforcement (call count + dollar amount per day) prevents runaway spending; conservative defaults ($1/day)
  • Event-driven triggers — source retractions and new inquiries wake the daemon immediately instead of waiting for the next timer tick
  • Negative results — when a search finds nothing, it records the dead end so future cycles skip it
db.enable_autodidact(interval=1800, max_cost_per_day=2.00)

# Check what it'll cost before committing
estimate = db.autodidact_cost_estimate(cycles=48)
print(f"Est. monthly: ${estimate['cost_per_month_capped']}")

# Monitor progress
status = db.autodidact_status()
print(f"Claims learned: {status.total_claims_ingested}")
print(f"Cost today: ${status.estimated_cost_today:.3f}")

Without any API keys or search functions, the daemon still runs as a continuous gap detector — it populates the task queue for external agents to pick up. With free sources only, it researches and ingests at zero marginal cost (only the LLM extraction call costs money).
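The detect/search/extract/ingest loop above can be sketched as a single cycle. All names here (`autodidact_cycle`, `search_fn`, `dead_ends`) are hypothetical stand-ins, not Attest's internals:

```python
def autodidact_cycle(gaps, search_fn, ingest_fn, dead_ends):
    """One pass: research each gap, ingest findings, record dead ends."""
    learned = 0
    for gap in gaps:
        if gap in dead_ends:       # negative result recorded earlier: skip
            continue
        evidence = search_fn(gap)
        if not evidence:
            dead_ends.add(gap)     # record the dead end for future cycles
            continue
        for claim in evidence:
            ingest_fn(claim)
            learned += 1
    return learned

store, dead_ends = [], set()
fake_search = {"redis": [("redis", "is_a", "cache")], "kafka": []}
n = autodidact_cycle(["redis", "kafka"], fake_search.get, store.append, dead_ends)
assert n == 1 and "kafka" in dead_ends
```

A real cycle would also check the budget caps before each search call; that bookkeeping is omitted here.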

Time Travel

Every claim carries a timestamp. You can query the knowledge base as it existed at any point in the past:

import time

# What did we know yesterday?
yesterday = time.time_ns() - 86_400 * 10**9
snapshot = db.at(yesterday)
frame = snapshot.query("api-gateway", depth=1)

This is possible because claims are immutable and append-only. New knowledge doesn't overwrite old knowledge — it layers on top. "What was known about the auth service when we decided to migrate it?" is a query, not a forensic investigation.
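With append-only, timestamped claims, "as of T" is a filter rather than a restore operation. A minimal model of the idea:

```python
import time

now = time.time_ns()
claims = [
    {"fact": "auth-service depends_on postgres",
     "ts": now - 2 * 86_400 * 10**9},                 # learned two days ago
    {"fact": "auth-service depends_on cockroachdb",
     "ts": now},                                      # learned today
]

def as_of(claims, t_ns):
    """Snapshot = every claim recorded at or before t_ns."""
    return [c["fact"] for c in claims if c["ts"] <= t_ns]

yesterday = now - 86_400 * 10**9
assert as_of(claims, yesterday) == ["auth-service depends_on postgres"]
assert len(as_of(claims, now)) == 2   # new knowledge layers on top
```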

The Extraction Pipeline

Most knowledge isn't structured. It's in conversations, documents, and Slack threads. Attest has a built-in extraction pipeline that turns unstructured text into claims:

  1. Parse — Break the input into messages or sections
  2. Group — Pair user questions with assistant answers
  3. Extract — Identify structured claims in the text
  4. Curate — Filter contradictions and low-quality claims
  5. Ingest — Store each claim with provenance tracing to the source conversation and turn

Every extracted claim carries its provenance: which conversation, which turn, which extraction method. You can always trace back to the original text.

Mode          API Key?   When to use
"heuristic"   No         Explicit relational text ("X depends on Y"). Fast and free.
"llm"         Yes        Nuanced or implicit relationships. 9 providers supported.
"smart"       Yes        Large volumes. Heuristic first, LLM only for new content. Saves cost.

Vocabularies

A vocabulary tells Attest what kinds of entities and relationships exist in your domain. It enforces type constraints — so a service can depend_on another service, but not on a feature.

Vocabulary   Entity Types                                     Relationships                                   Domain
bio          gene, protein, compound, disease, pathway, ...   binds, inhibits, treats, associated_with, ...   Biomedical research
devops       service, incident, alert, team, runbook, ...     depends_on, triggers, monitors, owns, ...       Infrastructure
ml           model, dataset, feature, experiment, ...         trained_on, outperforms, uses_feature, ...      ML experiments
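The type-constraint enforcement described above can be sketched as a lookup. The schema shape (`predicate -> (subject_type, object_type)`) is an assumption for illustration, not Attest's vocabulary format:

```python
# Toy vocabulary: each predicate constrains the entity types it may connect.
DEVOPS_VOCAB = {
    "depends_on": ("service", "service"),
    "monitors":   ("alert", "service"),
}

def check(vocab, predicate, subject_type, object_type):
    """Accept a claim only if the predicate allows this type pair."""
    return vocab.get(predicate) == (subject_type, object_type)

assert check(DEVOPS_VOCAB, "depends_on", "service", "service")
assert not check(DEVOPS_VOCAB, "depends_on", "service", "feature")  # rejected
```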

You can register multiple vocabularies on the same database, or define your own.

Concurrency note: Attest uses a single-writer model (like SQLite). Concurrent reads are fine; use the REST API or MCP server for multi-process writes.

Why Claims, Not Rows or Edges

A claim (subject, predicate, object, source, confidence, timestamp) encodes strictly more information than a row, an edge, or an embedding vector. The graph, vectors, documents, and audit trail are all derived from claims — not stored separately. Retract a source and the edges disappear. Corroborate and edges strengthen. No separate systems to sync.
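The "derived, not stored" idea can be modeled as edges being a view over claims: filter out retracted sources and the graph follows automatically. This is a toy model assuming nothing about Attest's internals:

```python
claims = [
    ("api-gateway", "depends_on", "redis", "k8s-manifest"),
    ("api-gateway", "depends_on", "redis", "chat:incident-42"),
    ("billing", "depends_on", "stripe-api", "runbook-v1"),
]

def edges(claims, retracted=frozenset()):
    """Graph edges are computed from live claims, never stored separately."""
    return {(s, p, o) for s, p, o, src in claims if src not in retracted}

assert len(edges(claims)) == 2  # corroborating claims dedupe into one edge
assert ("billing", "depends_on", "stripe-api") not in edges(
    claims, retracted={"runbook-v1"})  # edge vanished with its only source
```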

What you need        Conventional                         With Attest
Relationships        Graph DB (Neo4j, etc.)               Derived from claim triples
Semantic search      Vector DB (Pinecone, Weaviate)       Embeddings computed from graph structure
Evidence / context   Document store (S3, Elasticsearch)   evidence_text in claim payloads
Audit trail          Separate audit log                   Provenance is structural — every claim carries its source
Deduplication        ETL pipelines                        content_id groups corroborating sources automatically
Correction           DELETE and hope                      Retraction cascades, corroborated facts survive

No ETL between systems. No sync failures. No “the graph says X but the vector DB returns Y.” One primitive that handles everything an LLM-embedded system needs to learn, remember, and self-correct.