No marketing benchmarks. Every result comes from automated tests in CI. The test files, datasets, and Rust harness are all in the repo. Run them yourself.
How Attest compares to conventional approaches on the metrics that matter for knowledge systems.
| Metric | Attest | Graph DB | Vector DB | Relational DB |
|---|---|---|---|---|
| Write throughput | 1.3M claims/sec | ~50K edges/sec (Neo4j batch) | ~10K vectors/sec | ~100K rows/sec |
| Point query | 8 µs | ~1 ms | ~5 ms (ANN) | ~0.1 ms (indexed) |
| Provenance | Required on every write | Optional property | None | Optional column |
| Contradiction handling | Native — both claims coexist | Custom schema | N/A | Custom schema |
| Source retraction | One call, cascade + audit | Custom logic | Delete + re-embed | Custom logic |
| Time travel | Free (append-only log) | Snapshot restore | No | Temporal tables (Postgres 16+) |
| Infrastructure | `pip install attestdb` | Server required | Server or cloud | Server required |
Graph/Vector/Relational figures are based on public documentation. Attest numbers come from automated benchmarks in this repo.
Attest ships a custom Rust storage engine — append-only claim log with maintained indexes, file locking, and CRC32 crash recovery.
| Operation | Performance | Notes |
|---|---|---|
| Claim ingestion | 1.3M claims/sec | Append-only log with maintained indexes |
| Entity query | 8 µs | In-memory adjacency lookup |
| BFS traversal (depth 2) | 15 µs | Full subgraph extraction |
| Adjacency list build (1K claims) | 223 µs | Cold start from claim log |
Benchmark source: `rust/attest-store/benches/store_bench.rs` — 1,000 pre-built claims, 100 entities, in-memory store. `black_box()` prevents compiler optimization.

# Rust microbenchmarks
$ cd rust && cargo bench

# Python performance tests
$ uv run pytest tests/integration/test_performance.py -v
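The recovery scheme described above (append-only log with CRC32 framing) can be sketched in a few lines of Python. The record layout here — length prefix, payload, CRC32 trailer — is illustrative only, not Attest's actual on-disk format:

```python
import struct
import zlib


def append_record(f, payload: bytes) -> None:
    """Frame a record as [length][payload][crc32] and append it."""
    f.write(struct.pack("<I", len(payload)))
    f.write(payload)
    f.write(struct.pack("<I", zlib.crc32(payload)))


def recover(f) -> list:
    """Scan the log from the start; stop at the first truncated or
    corrupt record, so a torn write at the tail never loses earlier data."""
    records = []
    while True:
        header = f.read(4)
        if len(header) < 4:
            break
        (length,) = struct.unpack("<I", header)
        payload = f.read(length)
        crc_raw = f.read(4)
        if len(payload) < length or len(crc_raw) < 4:
            break  # torn write at the tail: truncate here
        (crc,) = struct.unpack("<I", crc_raw)
        if crc != zlib.crc32(payload):
            break  # corruption: everything before this point is intact
        records.append(payload)
    return records
```

The key property is that a crash mid-write corrupts at most the final record, which the CRC check detects and discards on recovery.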
The curator triages incoming claims: store, skip, or flag for review. We test this against a set of 250 expert-labeled claims.
| Metric | Result | Target |
|---|---|---|
| Overall accuracy | 98% | >80% |
| False positive rate | <1% | — |
| False negative rate | <2% | — |
This is the heuristic curator (no LLM). It runs offline, with zero API calls. LLM-backed curators can achieve higher accuracy on nuanced claims but require an API key.
$ uv run pytest tests/eval/test_curator_accuracy.py -v
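A heuristic triage pass of this shape can be sketched as follows. The rules and thresholds below are illustrative assumptions, not the curator's actual rule set:

```python
def triage(claim: dict) -> str:
    """Toy heuristic curator: return 'store', 'skip', or 'flag'.
    Rules and thresholds are illustrative only."""
    if not claim.get("source"):
        return "skip"                      # no provenance: reject outright
    conf = claim.get("confidence", 0.0)
    if conf < 0.2:
        return "skip"                      # too weak to keep
    if claim.get("contradicts_existing") and conf < 0.8:
        return "flag"                      # contradiction needs human review
    return "store"
```

Because every rule is a plain predicate, this style of curator runs offline with zero API calls, which is why it can be tested deterministically in CI.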
Given 80% of a real biomedical knowledge graph (Hetionet), can the system predict the withheld 20%? This tests whether structural embeddings capture real biomedical relationships — not just text similarity.
| Metric | Result | Target |
|---|---|---|
| Edge recovery (recall) | 17.35% | >15% |
| Method | Damped random walk on D^(-1/2) A D^(-1/2) | — |
| Dataset | Hetionet ego network (~200 entities, ~5K edges) | — |
$ uv run pytest tests/eval/test_hetionet_holdout.py -m slow -s
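The scoring method in the table — a damped walk over the symmetrically normalized adjacency D^(-1/2) A D^(-1/2) — can be sketched with NumPy. The damping factor and walk depth here are assumptions, not the eval's tuned values:

```python
import numpy as np


def walk_scores(A: np.ndarray, alpha: float = 0.5, steps: int = 3) -> np.ndarray:
    """Damped random-walk proximity: S = sum_k alpha^k (D^-1/2 A D^-1/2)^k.
    Higher S[i, j] means nodes i and j are more likely to share an edge."""
    deg = A.sum(axis=1)
    d = np.zeros_like(deg)
    d[deg > 0] = deg[deg > 0] ** -0.5
    N = d[:, None] * A * d[None, :]      # symmetric normalization
    S = np.zeros_like(A)
    P = np.eye(len(A))
    for k in range(1, steps + 1):
        P = P @ N                        # k-step walk probabilities
        S += alpha ** k * P              # longer paths count for less
    return S
```

Held-out edge recovery then amounts to ranking unobserved pairs by `S` and checking how many withheld edges appear near the top.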
2-hop causal predicate composition across 85.7M claims from 30+ databases. Holdout evaluation: remove 20% of causal edges per gene, predict from remaining 80%.
| Metric | Result | Target |
|---|---|---|
| Holdout recall (20 genes) | 14.1% (554/3,938) | — |
| Enrichment over random | 4,340× | — |
| Co-occurrence baseline | 58.6% | — |
| Literature validation | 8/17 confirmed, 0 contradicted | — |
| Causal edge query | 0–2 ms | — |
| predict() latency | 2–16 s (50 intermediaries) | — |
| Novel finding validated | BRCA1→CSRP1 anticorrelation (ρ=−0.42, 4,183 patients) | — |
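The 2-hop composition evaluated above — gene A causes B, B causes C, therefore propose A→C — can be sketched over a toy claim set. The predicate name and tuple shape are illustrative:

```python
from collections import defaultdict


def compose_2hop(claims):
    """Given (subject, predicate, object) claims, propose 2-hop causal
    candidates: A -[causes]-> B -[causes]-> C yields candidate (A, C)."""
    out = defaultdict(set)
    for s, p, o in claims:
        if p == "causes":
            out[s].add(o)
    candidates = set()
    for a, mids in out.items():
        for b in mids:
            for c in out.get(b, ()):
                if c != a and c not in mids:   # skip self-loops and known edges
                    candidates.add((a, c))
    return candidates
```

Holdout evaluation then removes a fraction of direct causal edges and checks whether the composed candidates recover them.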
Attest computes embeddings from graph topology (SVD on the normalized adjacency), not from text. "Aspirin" lands near "inflammation" because the two are connected in the graph, not because the words co-occur in a corpus.
Comparison based on public documentation review:
| Capability | Attest | Vector DB + Metadata |
|---|---|---|
| Embedding source | Graph topology (SVD) | Text (sentence transformers) |
| Update cost | O(recompute SVD) — seconds | Re-embed changed docs — minutes to hours |
| Link prediction | Built-in — 17.35% recall | Not a feature |
| Contradiction detection | Structural — opposite predicates | Not possible with cosine similarity |
| Provenance on results | Every result traces to source claims | Metadata if you added it |
| Infrastructure | Zero — embedded, single file | Separate vector DB service |
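Topology-derived embeddings of the kind compared above (truncated SVD of the normalized adjacency) can be sketched with NumPy. The dimension and normalization details are assumptions, not Attest's exact pipeline:

```python
import numpy as np


def graph_embeddings(A: np.ndarray, dim: int = 8) -> np.ndarray:
    """Embed nodes from structure alone: truncated SVD of D^-1/2 A D^-1/2.
    Nodes with similar neighborhoods land near each other — no text needed."""
    deg = A.sum(axis=1)
    d = np.zeros_like(deg)
    d[deg > 0] = deg[deg > 0] ** -0.5
    N = d[:, None] * A * d[None, :]
    U, s, _ = np.linalg.svd(N)
    k = min(dim, len(s))
    return U[:, :k] * s[:k]              # rows are node embeddings
```

This is also why the update cost in the table is "recompute the SVD" rather than "re-embed every document": the input is the adjacency matrix, not a corpus.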
Attest’s connectors aren’t just data loaders — they run a full extraction pipeline:
Three lines to ingest from any source with full provenance:
db = AttestDB("knowledge.attest")
conn = db.connect("slack", token="xoxb-...", channels=["#research"])
result = conn.run(db)  # extracts, validates, ingests with provenance
Without Attest, you’d build each of these yourself:
| Step | What you build | Attest handles it |
|---|---|---|
| Fetch | Slack/Teams/Gmail API pagination | 30 connectors |
| Extract | LLM prompt engineering for claims | ingest_text() / ingest_chat() |
| Normalize | Unicode NFKD, Greek letters, dedup | Locked normalization (Python + Rust) |
| Validate | Custom schema + rules | 13 validation rules on every write |
| Provenance | Custom source tracking | Structural — required on every claim |
| Contradictions | Custom logic | Opposite predicates + confidence |
| Embeddings | Separate vector DB call | Auto-computed from graph topology |
The Python and Rust layers must produce identical results — same entity IDs, same claim hashes, same content IDs. We verify this with 118 golden test vectors covering entity normalization, hashing, chain hashes, and confidence scoring.
| What's tested | Vectors | Status |
|---|---|---|
| Entity normalization (Unicode, Greek, whitespace) | 51 | Bit-identical |
| Hashing (claim ID + content ID, SHA-256) | 20 | Bit-identical |
| Chain hash (Merkle audit chain) | 13 | Bit-identical |
| Confidence scoring (Tier-1) | 26 | Bit-identical |
# Generate vectors from Python
$ uv run python scripts/generate_golden_vectors.py

# Verify in Rust
$ cd rust && cargo test
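A golden-vector check of this kind boils down to: normalize, hash, compare against a stored expected value. The sketch below is illustrative — the normalization steps and hash layout are assumptions, not Attest's locked spec:

```python
import hashlib
import unicodedata


def normalize_entity(name: str) -> str:
    """Illustrative normalization: NFKD-fold, collapse whitespace,
    lowercase. (Attest's locked spec may differ.)"""
    folded = unicodedata.normalize("NFKD", name)
    return " ".join(folded.lower().split())


def claim_id(subject: str, predicate: str, obj: str) -> str:
    """Deterministic SHA-256 over normalized fields. Any two language
    implementations that agree on this function emit identical IDs,
    which is exactly what golden vectors pin down."""
    parts = [normalize_entity(subject), predicate, normalize_entity(obj)]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

A golden vector is then just a (input, expected hex digest) pair checked byte-for-byte in both Python and Rust.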
Traditional graph database benchmarks (LDBC SNB, etc.) measure query throughput and traversal latency. Those benchmarks don't test the things that make Attest different, because no other database does them.
Retract a source and every downstream claim is automatically marked as degraded. Corroborated facts survive.
Same fact from two independent sources? The engine tracks it as corroboration, not a duplicate.
Query the knowledge base as it existed at any past timestamp. Append-only claim log makes this free.
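Time travel over an append-only log is just replay-up-to-timestamp. A minimal sketch — the entry shape and op names here are hypothetical, not Attest's actual log format:

```python
def state_at(log, ts):
    """Rebuild the live claim set as of `ts` by replaying the log.
    `log` is a time-ordered list of (timestamp, op, claim_id) entries;
    ops 'assert' and 'retract' are illustrative."""
    live = set()
    for entry_ts, op, claim in log:
        if entry_ts > ts:
            break                    # append-only, hence time-ordered
        if op == "assert":
            live.add(claim)
        elif op == "retract":
            live.discard(claim)
    return live
```

Because nothing is ever overwritten, answering "what did we know last Tuesday" costs one sequential scan and no snapshots.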
Every fact traces back to its source. No claim exists without provenance — the engine rejects it.
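The retraction semantics above — retract a source, degrade the claims only it supported, keep independently corroborated facts — can be sketched as follows. The data model (a map from content ID to supporting sources) is illustrative:

```python
def retract_source(claims, source_id):
    """claims: {content_id: set of supporting source_ids}. Removing a
    source degrades only the facts it alone supported; facts backed by
    another independent source survive."""
    degraded = []
    for content_id, sources in claims.items():
        sources.discard(source_id)
        if not sources:
            degraded.append(content_id)  # no independent support left
    return degraded
```

The returned list is the cascade: everything downstream that lost its last source, which is what the audit trail records.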
| Suite | Tests | Runtime |
|---|---|---|
| Python unit + integration | 976 | ~60s |
| Rust unit + golden vectors | 124 | ~3s |
| Eval (Hetionet, curator accuracy) | 6 | ~9 min |
# Run everything except slow eval tests
$ uv run pytest tests/unit/ tests/integration/ -q

# Run Rust tests
$ cd rust && cargo test

# Run full eval suite (slow, downloads data)
$ uv run pytest tests/eval/ -m slow -s
Most tools store facts. Attest stores claims — with provenance, confidence, and contradiction handling built into the engine. Here's how that changes what's possible.
This comparison is based on public documentation review. Where we've tested a system directly, we note it. Capabilities marked as "possible with custom code" mean the core engine doesn't provide it out of the box.
| Capability | Attest | Mem0 | Letta / MemGPT | Zep / Graphiti | LangGraph | Neo4j | PostgreSQL | Vector DBs |
|---|---|---|---|---|---|---|---|---|
| Provenance on every write | Required — engine rejects writes without source | No | No | Partial — conversation-level | No | Optional property | Optional column | No |
| Contradictions coexist | Native — both claims stored with confidence | Overwrites | Overwrites | Based on public docs: last-write-wins | No — checkpoint overwrites state | Possible with custom schema | Possible with custom schema | N/A — no structured facts |
| Source retraction | One call — corroborated facts survive, cascade audit | No | No | No | No | Custom logic | Custom logic | Delete + re-embed |
| Multi-source corroboration | Automatic — content_id grouping + confidence boost | No | No | Based on public docs: not built-in | No | Custom queries | Custom queries | No |
| Confidence tracking | Per-claim, Tier-1 + Tier-2 scoring | No | No | Edge weights | No | Property | Column | Similarity score only |
| Impact analysis | `db.impact(source_id)` | No | No | No | No | Custom Cypher | Custom SQL | No |
| Knowledge drift | `db.drift(days=30)` | No | No | No | No | Custom queries | Custom queries | No |
| Time-travel queries | `db.at(timestamp)` | No | No | Based on public docs: not built-in | Checkpoint history — but no structured time-travel queries | No (needs temporal graphs extension) | Possible with temporal tables | No |
| Audit trail | `db.audit(claim_id)` — full chain | No | No | Partial | No | Custom queries | Custom queries | No |
| Zero infrastructure | `pip install attestdb` — embedded | Hosted service | Server required | Server required | Server required (LangGraph Platform) | Server required | Server required | Varies — some embedded |
Attest is not a general-purpose database, a vector store, or an LLM memory layer. It's a claim-native database — purpose-built for the case where knowledge comes from multiple sources, contradicts itself, and needs to be retracted or corrected over time.
If your use case is "store text and retrieve it by similarity," a vector database is simpler. If your use case is "model a fixed graph schema," Neo4j is battle-tested. If your use case is "conversational memory for a chatbot," Mem0 or Zep may be a better fit.
But if you need to know who said what, when, and how confident they were — and you need the system to handle the case where a source turns out to be wrong — that's what Attest was built for.