Setup
Open or create an Attest database. The primary entry point for all operations.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | str | — | File path for the database |
| embedding_dim | int \| None | 768 | Embedding vector dimension. None disables embedding index |
| strict | bool | False | Raise on validation warnings instead of logging |
Returns
AttestDB — database handle (also usable as context manager)
Example
import attestdb

# Open or create a database
db = attestdb.open("my_knowledge.db")

# Context manager closes automatically
with attestdb.open("my_knowledge.db") as db:
    db.ingest(...)
One-line setup: create a database, register vocabularies, and configure the curator in a single call.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | str | "attest.db" | Database file path |
| vocabs | list[str] \| None | None | Vocabularies to register: "bio", "devops", "ml" |
| curator | str | "heuristic" | Curator mode: "heuristic" or a provider name |
| embedding_dim | int \| None | None | Embedding vector dimension. None disables embedding index |
Returns
AttestDB — fully configured database handle
Example
import attestdb

db = attestdb.quickstart("bio.db", vocabs=["bio"], curator="gemini")
| Method | Description |
|---|---|
| db.configure_curator(model="heuristic", api_key=None) | Set the curator. "heuristic" (offline) or a provider name (see LLM Providers). |
| db.register_vocabulary(namespace, vocab) | Register entity types, predicates, and constraints for a domain. |
| db.register_predicate(predicate_id, constraints) | Register a single predicate with subject/object type constraints. |
| db.register_payload_schema(schema_id, schema) | Register a JSON schema for payload validation on a predicate type. |
| db.close() | Close the database. Also works as a context manager: with attestdb.open(...) as db: |
Ingestion
Add a single claim with full provenance. The atomic write operation — every claim must have a source.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| subject | tuple[str, str] | — | Entity name and type, e.g. ("redis", "service") |
| predicate | tuple[str, str] | — | Relationship and class, e.g. ("depends_on", "dependency") |
| object | tuple[str, str] | — | Target entity name and type |
| provenance | dict | — | Must include source_type and source_id |
| confidence | float \| None | None | 0.0–1.0 confidence score (auto-assigned if omitted) |
| payload | dict \| None | None | Arbitrary structured data attached to the claim |
| timestamp | int \| None | None | Unix timestamp in nanoseconds (auto-generated if omitted) |
| external_ids | dict \| None | None | External ID mappings for subject/object entities |
Returns
str — the claim_id (SHA-256 hash)
Example
claim_id = db.ingest(
    subject=("api-gateway", "service"),
    predicate=("depends_on", "dependency"),
    object=("redis", "service"),
    provenance={"source_type": "k8s_manifest", "source_id": "deploy/prod"},
    confidence=0.95,
)
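Because the claim_id is a SHA-256 hash, writes are content-addressed: the same claim re-ingested produces the same id, which is what makes duplicates detectable. A minimal sketch of how such an id might be derived (illustrative only; the exact fields and canonicalization Attest hashes are assumptions here):

```python
import hashlib
import json

def claim_id(subject, predicate, obj, provenance):
    """Hypothetical content-addressed id: hash the canonical claim fields."""
    canonical = json.dumps(
        {"s": subject, "p": predicate, "o": obj, "prov": provenance},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

a = claim_id(("api-gateway", "service"), ("depends_on", "dependency"),
             ("redis", "service"),
             {"source_type": "k8s_manifest", "source_id": "deploy/prod"})
b = claim_id(("api-gateway", "service"), ("depends_on", "dependency"),
             ("redis", "service"),
             {"source_type": "k8s_manifest", "source_id": "deploy/prod"})
assert a == b    # deterministic: re-ingesting the same claim yields the same id
print(len(a))    # 64 hex characters
```

Sorting the keys before hashing is what keeps the id stable regardless of field order.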
Bulk-ingest many claims at once. Faster than individual ingest() calls — warms caches and skips per-claim corroboration tracking.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| claims | list[ClaimInput] | — | List of claim input objects |
Returns
BatchResult — .ingested (int), .duplicates (int), .errors (list)
Example
from attestdb import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "dependency"),
        object=("redis", "service"),
        provenance={"source_type": "config", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)
print(f"Ingested {result.ingested}, skipped {result.duplicates} dupes")
Extract claims from unstructured text using LLM-powered extraction, then ingest them.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| text | str | — | Raw text to extract claims from |
| source_id | str | "" | Identifier for the text source |
| use_curator | bool | True | Triage extracted claims through the curator |
Returns
list — extracted and ingested claims
Example
db.ingest_text(
    "Redis is used as the primary cache for the API gateway. "
    "The gateway also depends on PostgreSQL for persistent storage.",
    source_id="architecture_doc_v2",
)
Extract claims from a conversation. Accepts OpenAI/Anthropic message format ([{role, content}, ...]).
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| messages | list[dict] | — | Chat messages in [{role, content}] format |
| conversation_id | str | "" | Optional conversation identifier |
| platform | str | "generic" | Platform hint: "generic", "chatgpt", "claude" |
| use_curator | bool | True | Triage extracted claims through the curator |
| extraction | str | "llm" | "llm", "heuristic", or "smart" |
Returns
ChatIngestionResult — per-turn breakdown of extracted claims
Example
messages = [
{"role": "user", "content": "What cache does our API use?"},
{"role": "assistant", "content": "The API gateway uses Redis for caching."},
]
result = db.ingest_chat(messages, extraction="heuristic")
Extract claims from a Slack workspace export ZIP. Optionally filter by channel name.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | str | — | Path to the Slack export ZIP file |
| bot_ids | set[str] \| None | None | Only treat these bot IDs as assistant. None = all bots |
| channels | list[str] \| None | None | Only process these channels. None = all channels |
| use_curator | bool | True | Triage extracted claims through the curator |
| extraction | str | "llm" | "llm", "heuristic", or "smart" |
Returns
list[ChatIngestionResult] — one result per channel/thread with bot interaction
Example
results = db.ingest_slack(
    "slack_export.zip",
    channels=["engineering", "incidents"],
    extraction="smart",
)
Create a connector for an external data source. Returns a Connector instance — call .run(db) to fetch and ingest. 30 connectors available: slack, teams, gmail, gdocs, gdrive, zoho, postgres, mysql, mssql, notion, confluence, sharepoint, csv, sqlite, github, jira, linear, hubspot, salesforce, zendesk, servicenow, pagerduty, http, airtable, mongodb, elasticsearch, s3, google_sheets, box, dsi.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| name | str | — | Connector name (e.g. "slack", "postgres") |
| save | bool | False | Persist credentials to encrypted token store (requires cryptography) |
| **kwargs | — | — | Connector-specific options (token, dsn, mapping, etc.) |
Returns
Connector — call .run(db) to execute
Examples
# Slack: live channel history
conn = db.connect("slack", token="xoxb-...", channels=["general"])
result = conn.run(db)

# PostgreSQL: map query columns to claims
conn = db.connect(
    "postgres",
    dsn="postgresql://user:pass@host/db",
    query="SELECT gene, relation, target FROM interactions",
    mapping={"subject": "gene", "predicate": "relation", "object": "target"},
)
result = conn.run(db)

# Notion: ingest pages as text
conn = db.connect("notion", api_key="ntn_...", save=True)
result = conn.run(db)
| Method | Description |
|---|---|
| db.ingest_chat_file(path, platform="auto", use_curator=True, extraction="llm") | Extract claims from a file: ChatGPT export ZIP, JSON conversation, or plain text. |
| db.curate(claims, agent_id="default") | Run claims through the curator before ingesting. Returns stored/skipped/flagged. |
Extraction modes
| Mode | API Key? | When to use |
|---|---|---|
"heuristic" | No | Explicit relational text ("X depends on Y"). Fast and free. |
"llm" | Yes | Nuanced or implicit relationships. Deeper understanding. |
"smart" | Yes | Large volumes. Heuristic first, LLM only for new content. Saves cost. |
Querying
Get a full picture of an entity: relationships, narrative summary, confidence scores, contradictions, and topic membership.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| focal_entity | str | — | Entity name or ID to query |
| depth | int | 2 | BFS traversal depth |
| min_confidence | float | 0.0 | Minimum confidence threshold for relationships |
| exclude_source_types | list[str] \| None | None | Source types to exclude from results |
| max_claims | int | 500 | Maximum claims to consider |
| max_tokens | int | 4000 | Token budget for narrative generation |
| llm_narrative | bool | False | Use LLM for narrative generation instead of templates |
| confidence_threshold | float | 0.0 | Hard filter on relationship confidence |
| predicate_types | list[str] \| None | None | Only include these predicate types |
Returns
ContextFrame — focal entity, relationships, claim count, narrative, confidence range, contradictions, topic membership
Example
frame = db.query("redis")
print(frame.focal_entity.name, "—", frame.claim_count, "claims")
print(frame.narrative)
for rel in frame.direct_relationships:
    print(f"  {rel.predicate} → {rel.target.name} ({rel.confidence:.0%})")
Semantic search via embeddings. Returns the top-k closest claims by vector similarity.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| query_embedding | list[float] | — | Query vector (must match embedding_dim) |
| top_k | int | 10 | Number of results to return |
Returns
list[tuple[str, float]] — list of (claim_id, distance) pairs
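Under the hood this kind of search is a nearest-neighbour problem: measure the distance from the query vector to every stored embedding and keep the k smallest. A brute-force sketch of that mechanic (illustrative; a real embedding index avoids the full scan):

```python
import math

def top_k_closest(query, embeddings, k=10):
    """Brute-force nearest neighbours: (claim_id, L2 distance), smallest first."""
    scored = [(cid, math.dist(query, vec)) for cid, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

store = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.9, 0.1]}
print(top_k_closest([1.0, 0.0], store, k=2))  # c1 first (distance 0.0), then c3
```

Smaller distance means closer, which is why results sort ascending.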
Same as query() but also returns timing and candidate counts for performance profiling.
Returns
tuple[ContextFrame, QueryProfile] — the frame plus profiling data (elapsed_ms, total_candidates, after_scoring)
Example
frame, profile = db.explain("redis")
print(f"Query took {profile.elapsed_ms:.1f}ms, {profile.total_candidates} candidates")
Find the top-k paths between two entities with per-hop edge details, sorted by total confidence.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_a | str | — | Start entity |
| entity_b | str | — | End entity |
| max_depth | int | 3 | Maximum hops to search |
| top_k | int | 5 | Number of paths to return |
Returns
list[PathResult] — each with .steps (list of PathStep), .total_confidence, .length
Example
paths = db.find_paths("api-gateway", "postgresql")
for p in paths:
    hops = " → ".join(s.entity_id for s in p.steps)
    print(f"{hops} (confidence: {p.total_confidence:.2f})")
| Method | Description |
|---|---|
| db.resolve(entity_id) | Resolve an entity name to its canonical normalized ID. |
| db.get_entity(entity_id) | Get entity summary: name, type, claim count. Returns EntitySummary \| None. |
| db.claims_for(entity_id, predicate_type=None, source_type=None, min_confidence=0.0) | Get raw claims for an entity. Filter by predicate, source, or confidence. Returns list[Claim]. |
| db.claims_by_content_id(content_id) | Get all claims about the same fact (corroboration group). Returns list[Claim]. |
| db.list_entities(entity_type=None, min_claims=0) | List all entities. Filter by type or minimum claim count. Returns list[EntitySummary]. |
| db.path_exists(entity_a, entity_b, max_depth=3) | Check if two entities are connected. Returns bool. |
| db.raw_query(query, params=None) | Escape hatch: run a raw query against the storage engine. Returns list[list]. |
| db.get_embedding(claim_id) | Retrieve the stored embedding vector for a claim. Returns list[float] \| None. |
Understanding
Health metrics for the knowledge base: single-source entities, source distribution, knowledge density, gap counts.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| stale_threshold | int | 0 | Age in days to consider an entity stale (0 = disabled) |
| expected_patterns | dict \| None | None | Expected predicate patterns for gap detection |
Returns
QualityReport — total_claims, total_entities, single_source_entity_count, avg_claims_per_entity, source_type_distribution, predicate_distribution
Example
report = db.quality_report()
print(f"{report.total_entities} entities, {report.total_claims} claims")
print(f"{report.single_source_entity_count} single-source entities")
print(f"Avg claims/entity: {report.avg_claims_per_entity:.1f}")
Quantified health score (0–100) with weighted metrics: multi-source ratio, corroboration, freshness, source diversity, and confidence trend.
Returns
KnowledgeHealth — health_score (0–100), multi_source_ratio, corroboration_ratio, freshness_score, source_diversity, confidence_trend, knowledge_density
Example
health = db.knowledge_health()
print(f"Health: {health.health_score:.0f}/100")
print(f"Multi-source ratio: {health.multi_source_ratio:.0%}")
print(f"Corroboration: {health.corroboration_ratio:.0%}")
Health score breakdown
| Metric | Weight | What it measures |
|---|---|---|
| Multi-source ratio | 30% | Fraction of entities backed by more than one source |
| Corroboration ratio | 25% | Fraction of claims independently confirmed |
| Freshness | 20% | Recency of claims (30-day half-life decay) |
| Source diversity | 15% | Number of distinct source types |
| Confidence trend | 10% | Whether confidence is improving over time |
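With the weights above, the overall score is a weighted sum of the five component metrics scaled to 0–100. A sketch of that arithmetic, including the 30-day half-life freshness decay the table describes (the component formulas are assumptions based on the table, not Attest's exact implementation):

```python
def freshness(age_days):
    """30-day half-life decay: a 30-day-old claim scores 0.5."""
    return 0.5 ** (age_days / 30)

def health_score(multi_source, corroboration, freshness_score, diversity, trend):
    """Weighted sum of five 0-1 metrics, scaled to 0-100 (weights from the table)."""
    return 100 * (
        0.30 * multi_source
        + 0.25 * corroboration
        + 0.20 * freshness_score
        + 0.15 * diversity
        + 0.10 * trend
    )

print(round(freshness(30), 2))  # 0.5 — one half-life old
print(round(health_score(0.8, 0.6, freshness(15), 0.5, 1.0)))
```

A perfect knowledge base (all metrics at 1.0) scores 100; the weights sum to 100%.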
Vocabulary-driven gap identification. Compares each entity's relationship profile against expected predicate patterns for its type.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| expected_patterns | dict[str, set[str]] | — | Map of entity_type → expected predicates, e.g. {"gene": {"associated_with", "interacts_with"}} |
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 1 | Minimum claims for an entity to be checked |
Returns
list[GapResult] — entities missing expected relationships
Example
gaps = db.find_gaps({
    "gene": {"associated_with", "interacts_with", "expressed_in"},
    "drug": {"treats", "targets"},
})
for gap in gaps:
    print(f"{gap.entity_id} missing: {gap.missing_predicate_types}")
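At its core this check is a set difference: the predicates expected for the entity's type minus the predicates actually observed on that entity. A minimal sketch of that logic (illustrative only):

```python
def missing_predicates(entity_type, observed, expected_patterns):
    """Predicates an entity of this type is expected to have but doesn't."""
    expected = expected_patterns.get(entity_type, set())
    return sorted(expected - observed)

patterns = {"gene": {"associated_with", "interacts_with", "expressed_in"}}

# A gene with only one observed relationship has two expected ones missing.
print(missing_predicates("gene", {"interacts_with"}, patterns))
# ['associated_with', 'expressed_in']
```

Entity types with no registered pattern produce no gaps, so unpatterned domains stay quiet.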
Predict potential connections between currently unlinked entities using embedding similarity and common-neighbor scoring.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 2 | Minimum claims for entities to be considered |
| max_depth | int | 3 | Maximum graph distance to search |
| top_k | int | 50 | Number of bridge predictions to return |
| max_degree | int \| None | None | Exclude high-degree hub entities |
Returns
list[BridgePrediction] — predicted connections with confidence scores
Find entities with reliability concerns: single-source dependencies, stale data, or wide confidence spreads.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 2 | Minimum claims for an entity to be checked |
| stale_threshold | int | 0 | Days after which data is considered stale |
| quality_spread | float | 0.3 | Max confidence range before flagging |
Returns
list[EntityConfidenceAlert] — entities with provenance or confidence issues
| Method | Description |
|---|---|
| db.schema() | What entity types, predicates, and patterns exist, with counts. Returns SchemaDescriptor. |
| db.stats() | Entity count, claim count, index size. Returns dict. |
Topology
| Method | Description |
|---|---|
| db.compute_topology(resolutions=None, min_community_size=3) | Run Leiden community detection on the claim graph. |
| db.topics(level=None) | Get topic hierarchy from last topology computation. Returns list[TopicNode]. |
| db.density_map() | Density metrics per topic: claim count, source diversity. Returns list[DensityMapEntry]. |
| db.cross_domain_bridges(top_k=20) | Find entities connecting different knowledge domains. Returns list[CrossDomainBridge]. |
| db.query_topic(topic_id) | Get all entities in a specific topic. |
| db.generate_structural_embeddings(dim=64) | SVD-based graph embeddings for all entities. Returns entity count. |
| db.generate_weighted_structural_embeddings(dim=64) | Confidence-weighted SVD graph embeddings. Returns entity count. |
| db.get_adjacency_list() | Build in-memory adjacency list from all claim edges. Returns dict[str, set[str]]. |
| db.get_weighted_adjacency() | Weighted adjacency with per-edge confidence and sources. Returns dict. |
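Adjacency structures like these underpin reachability checks such as path_exists(). The pattern, sketched in plain Python (illustrative, not Attest's internals): build an undirected adjacency list from claim edges, then breadth-first search with a depth cap.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency list from (subject, object) claim edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def path_exists(adj, start, goal, max_depth=3):
    """BFS, refusing to expand nodes beyond max_depth hops from start."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return True
        if depth < max_depth:
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return False

adj = build_adjacency([("api-gateway", "redis"), ("redis", "sentinel")])
print(path_exists(adj, "api-gateway", "sentinel"))               # True (2 hops)
print(path_exists(adj, "api-gateway", "sentinel", max_depth=1))  # False
```

The depth cap is what keeps reachability queries cheap on dense graphs.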
Provenance & Trust
Retract all claims from a source and mark anything that depended on them as degraded. The nuclear option for bad sources.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| source_id | str | — | Source to retract |
| reason | str | — | Human-readable reason for retraction |
Returns
CascadeResult — .source_retract (RetractResult), .degraded_claim_ids (list), .degraded_count (int)
Example
result = db.retract_cascade("unreliable_vendor", "Data quality issues found in audit")
print(f"Retracted {result.source_retract.retracted_count} claims")
print(f"Degraded {result.degraded_count} downstream dependents")
Time-travel: query the knowledge base as it was at a specific point in time. Returns a read-only snapshot view.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| timestamp | int | — | Unix timestamp in nanoseconds |
Returns
AttestDBSnapshot — read-only view supporting query(), claims_for(), list_entities()
Example
import time

# View knowledge base as of yesterday
yesterday = int((time.time() - 86400) * 1_000_000_000)
snapshot = db.at(yesterday)
frame = snapshot.query("redis")
print(f"Yesterday: {frame.claim_count} claims")
If this source is retracted, how many claims and entities are affected? Preview the blast radius before acting.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| source_id | str | — | Source to analyze |
Returns
ImpactReport — direct_claims, downstream_claims, affected_entities, claim_ids
Example
report = db.impact("vendor_api_v2")
print(f"Direct: {report.direct_claims}, Downstream: {report.downstream_claims}")
print(f"Affects {len(report.affected_entities)} entities")
| Method | Description |
|---|---|
| db.retract(source_id, reason) | Mark all claims from a source as retracted (tombstoned). Returns RetractResult. |
| db.trace_downstream(claim_id) | See what claims depend on a specific claim. Returns DownstreamNode tree. |
| db.audit(claim_id) | Full provenance chain: who said it, corroborators, dependents. Returns AuditTrail. |
| db.drift(days=30) | How has knowledge changed? New claims, new entities, retracted sources. Returns DriftReport. |
Intelligence
Methods that answer questions only a provenance-tracking database can answer.
Find entities backed by only a single source, knowledge gaps, and low-confidence areas.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| min_claims | int | 5 | Minimum claims for an entity to be flagged as single-source |
Returns
BlindspotMap — single_source_entities, knowledge_gaps, low_confidence_areas
Example
blind = db.blindspots()
print(f"{len(blind.single_source_entities)} entities rely on a single source")
for entity in blind.single_source_entities[:5]:
    print(f"  {entity}")
How many independent sources agree about an entity? Returns agreement ratio, claims by source, and corroborated content IDs.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| topic | str | — | Entity name to analyze consensus for |
Returns
ConsensusReport — total_claims, unique_sources, avg_confidence, agreement_ratio, claims_by_source, corroborated_content_ids
Example
report = db.consensus("redis")
print(f"Agreement: {report.agreement_ratio:.0%} across {report.unique_sources} sources")
Per-source corroboration and retraction rates. Pass a source_id for one source, or omit for all sources.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| source_id | str \| None | None | Specific source to check, or None for all |
Returns
dict — per-source metrics: total_claims, active, retracted, degraded, corroboration_rate, retraction_rate
Example
reliability = db.source_reliability()
for src, metrics in reliability.items():
    print(f"{src}: {metrics['corroboration_rate']:.0%} corroborated")
What-if analysis: would this claim corroborate existing knowledge? Does it fill a gap between known entities?
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| claim | ClaimInput | — | The hypothetical claim to test |
Returns
HypotheticalReport — would_corroborate, existing_corroborations, fills_gap, content_id, related_entities
Example
from attestdb import ClaimInput

report = db.hypothetical(ClaimInput(
    subject=("redis", "service"),
    predicate=("depends_on", "dependency"),
    object=("sentinel", "service"),
    provenance={"source_type": "test", "source_id": "test"},
))
print(f"Would corroborate: {report.would_corroborate}")
print(f"Fills gap: {report.fills_gap}")
Discover novel regulatory predictions via causal composition. Follows causal edges through intermediaries and composes predicates (inhibits + inhibits = activates). Returns predictions ranked by convergent evidence — genuine gaps first. No LLM calls. Validated at 47% precision across 4 genes (8/17 confirmed, 0 contradicted).
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_id | str | — | Entity to predict relationships for |
| max_intermediaries | int | 100 | Maximum intermediary entities to explore in BFS |
| min_paths | int | 3 | Minimum independent paths for a prediction |
| directional_only | bool | False | Exclude "regulates" — only directional predicates (activates, inhibits, etc.) |
| entity_aliases | dict | None | Entity ID alias map for cross-database dedup (from build_entity_aliases()) |
Returns
list[Prediction] — target, predicted_predicate, supporting_paths, opposing_paths, consensus, is_gap, evidence
Example
predictions = db.predict("gene_7157")  # TP53
for p in predictions[:5]:
    print(f"{p.predicted_predicate} -> {p.target}")
    print(f"  {p.supporting_paths} supporting, gap={p.is_gap}")
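The causal composition described above behaves like sign multiplication along a path. A toy sketch of composing predicates hop by hop (only the inhibits + inhibits = activates rule comes from the text; the rest of the sign mapping is an assumption for illustration):

```python
# Map directional predicates to signs; composing a path multiplies the signs.
SIGN = {"activates": +1, "inhibits": -1}
NAME = {+1: "activates", -1: "inhibits"}

def compose(path_predicates):
    """Composed predicate for a causal path, e.g. inhibits + inhibits -> activates."""
    sign = 1
    for pred in path_predicates:
        sign *= SIGN[pred]
    return NAME[sign]

print(compose(["inhibits", "inhibits"]))               # activates
print(compose(["activates", "inhibits", "inhibits"]))  # activates
```

Two suppressions cancel out, which is why an inhibitor of an inhibitor is predicted to activate.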
Test a hypothesis against the knowledge graph. Returns causal evidence for/against with multi-hop composition paths, contradiction detection, gap analysis, and follow-up suggestions. No LLM calls.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| subject | tuple | — | (entity_id, entity_type) |
| predicate | tuple | — | (predicate_id, predicate_type) |
| object | tuple | — | (entity_id, entity_type) |
| confidence | float | 0.6 | Hypothetical confidence level |
Returns
SandboxVerdict — verdict (supported/contradicted/plausible/insufficient_data), confidence_score, direct + indirect evidence, gaps, follow-ups
Example
verdict = db.what_if(
    ("gene_940", "gene"),
    ("upregulates", "relation"),
    ("gene_29126", "gene"),
)
print(verdict.verdict)      # "plausible"
print(verdict.explanation)  # "12 causal path(s) supporting"
| Method | Description |
|---|---|
| db.fragile(max_sources=1, min_age_days=0) | Find claims backed by few independent sources. Returns list[Claim]. |
| db.stale(days=90) | Find claims not updated within the given period. Returns list[Claim]. |
Reason
Methods that generate new knowledge from graph structure — hypothesis testing, proactive discovery, and analogical reasoning.
Proactive hypothesis generation from graph structure. Three signals: bridge predictions (ensemble-scored pairs with composed predicates), cross-domain insights (topology bridge entities), and chain completion (2-hop pairs missing direct connections). Pure computation, no LLM.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| top_k | int | 10 | Maximum discoveries to return |
Returns
list[Discovery] — hypothesis, predicted_predicate, confidence, novelty_score, evidence_summary, supporting_paths, suggested_action
Example
for d in db.discover(top_k=5):
    print(d.hypothesis)
    print(f"  {d.predicted_predicate} (conf={d.confidence:.2f}, novelty={d.novelty_score:.2f})")
    print(f"  → {d.suggested_action}")
Find structural analogies: A:B :: C:D. Uses structural embeddings to find entities similar to A and B, then predicts the C:D pair and relationship.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_a | str | — | Source entity A |
| entity_b | str | — | Source entity B (connected to A) |
| top_k | int | 5 | Maximum analogies to return |
Returns
list[Analogy] — entity_a, entity_b, entity_c, entity_d, predicted_predicate, score, explanation
Example
for a in db.analogies("BRCA1", "apoptosis"):
    print(f"{a.entity_c} : {a.entity_d} (score={a.score:.2f})")
    print(f"  {a.explanation}")
Evaluate a natural-language hypothesis against the knowledge base. Parses entities, finds multi-hop evidence chains, and returns a verdict with supporting/contradicting evidence.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| hypothesis | str | — | Natural language hypothesis to test |
Returns
HypothesisVerdict — verdict (supported/contradicted/partial/unsupported), verdict_confidence, supporting_chains, contradicting_chains, confidence_gaps, suggested_next_steps
Example
verdict = db.test_hypothesis("aspirin reduces inflammation via COX-2")
print(f"{verdict.verdict} (confidence={verdict.verdict_confidence:.2f})")
for chain in verdict.supporting_chains:
    print(f"  {chain.summary}")
| Method | Description |
|---|---|
| db.evolution(entity_id, since=None) | Knowledge evolution over time: new connections, confidence changes, source diversification. Returns EvolutionReport. |
| db.trace(entity_a, entity_b, max_depth=4) | Source-overlap-discounted reasoning chains between two entities. Returns list[ReasoningChain]. |
| db.close_gaps(hypothesis=None, top_k=5) | Hypothesis-driven gap closing: test hypothesis, research confidence gaps, re-test. Returns CloseGapsReport. |
| db.suggest_investigations(top_k=10) | Unified prioritized investigation recommendations synthesized from all insight signals. Returns list[Investigation]. |
Crown Jewels
Features impossible with any other database. These exploit Attest's unique combination of timestamps, provenance, confidence, corroboration grouping, and contradiction detection.
Knowledge diff — like git diff for knowledge. Shows what beliefs formed, strengthened, weakened, or contradicted between two time periods. No other database tracks how beliefs evolve over time.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| since | str \| int | — | Start of period — ISO string or nanosecond int |
| until | str \| int \| None | None | End of period (None = now) |
Returns
KnowledgeDiff — new_beliefs, strengthened, weakened, new_contradictions, new_entities, new_sources, total_new_claims, summary
Example
diff = db.diff(since="2025-01-01")
print(diff.summary)
# "47 new beliefs; 12 strengthened; 3 new contradictions; 8 new sources"
for b in diff.new_beliefs[:5]:
    print(f"  {b.subject} {b.predicate} {b.object} (conf={b.confidence_after:.2f})")
Self-healing contradictions. Finds all opposing claims (via OPPOSITE_PREDICATES), scores evidence quality on each side (corroboration, source diversity, recency, confidence), and picks winners. Optionally ingests resolution meta-claims. No other database can reason about its own conflicts.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| top_k | int | 10 | Maximum contradictions to return |
| auto_resolve | bool | False | Ingest resolution meta-claims for clear winners |
| use_llm | bool | False | Use LLM for ambiguous cases |
Returns
ContradictionReport — total_found, resolved, ambiguous, analyses (with evidence weights and margins), claims_added
Example
report = db.resolve_contradictions(auto_resolve=True)
print(f"Found {report.total_found}, resolved {report.resolved}")
for a in report.analyses:
    print(f"  {a.subject} ↔ {a.object}: {a.resolution} (margin={a.margin:.2f})")
Counterfactual what-if analysis — compute cascading effects without modifying the database. "What if this paper is retracted? 47 claims affected, 3 drug mechanisms break." No other database can simulate scenarios on its own integrity.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| retract_source | str \| None | None | Source ID to simulate retracting |
| add_claim | ClaimInput \| None | None | Claim to simulate adding |
| remove_entity | str \| None | None | Entity to simulate removing |
Returns
SimulationReport — claims_affected, claims_removed, claims_degraded, entities_now_orphaned, connection_losses, confidence_shifts, risk_score, risk_level, summary
Example
# What if our main source is wrong?
sim = db.simulate(retract_source="paper_2024_nature")
print(f"{sim.claims_removed} claims affected, risk: {sim.risk_level}")
for loss in sim.connection_losses:
    print(f"  {loss.entity_a} ↔ {loss.entity_b}: lost {loss.lost_predicates}")
Knowledge compilation — generate a structured research brief with citations, confidence levels, contradictions, and gaps. An automated literature review from the graph. No other database can produce a structured document with provenance-tracked evidence chains.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| topic | str | — | Topic to compile a brief for |
| max_entities | int | 50 | Maximum entities to include |
| use_llm | bool | False | Use LLM for narrative generation |
Returns
KnowledgeBrief — sections (title, key_findings, citations, contradictions, gaps), executive_summary, total_entities, total_claims_cited, strongest_findings, weakest_areas
Example
brief = db.compile("sickle cell treatment")
print(brief.executive_summary)
for section in brief.sections:
    print(f"\n## {section.title}")
    for f in section.key_findings:
        print(f"  • {f}")
Full provenance-traced reasoning chain between two entities. Traces the best path with source citations at every hop, flags contradictions, computes reliability, and generates a human-readable narrative.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_a | str | — | Start entity |
| entity_b | str | — | End entity |
| max_depth | int | 4 | Maximum hops to search |
| use_llm | bool | False | Use LLM for narrative |
Returns
Explanation — connected, steps (with source_summary, evidence_text), chain_confidence, narrative, alternative_paths, source_count
Example
exp = db.explain_why("aspirin", "inflammation")
print(exp.narrative)
# Connection: aspirin → inflammation (2 hops, confidence=0.72)
#   1. aspirin —[inhibits]→ cox-2 (conf=0.90) [2 source(s): paper_1, trial_5]
#   2. cox-2 —[promotes]→ inflammation (conf=0.80) [1 source(s): textbook]
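In the sample narrative, the chain confidence of 0.72 is the product of the per-hop confidences (0.90 × 0.80). A one-line sketch of that aggregation (a plain product is an assumption consistent with the numbers shown; Attest may apply additional discounting):

```python
import math

def chain_confidence(hop_confidences):
    """Multiply per-hop confidences: a chain is only as strong as all its links."""
    return math.prod(hop_confidences)

print(round(chain_confidence([0.90, 0.80]), 2))  # 0.72
```

Because each hop multiplies in a factor below 1.0, longer chains naturally score lower, which matches the intuition that indirect evidence is weaker.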
Predict next connections for an entity. Uses 2-hop structural analysis and historical growth patterns to predict which entities are most likely to become connected next.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_id | str | — | Entity to forecast for |
| top_k | int | 10 | Maximum predictions |
Returns
Forecast — predictions (target_entity, predicted_predicate, score, reason, evidence_entities), growth_rate, trajectory
Example
fc = db.forecast("BRCA1")
print(f"Trajectory: {fc.trajectory}, {fc.growth_rate:.1f} connections/month")
for p in fc.predictions[:5]:
    print(f"  → {p.target_entity} via {p.predicted_predicate} (score={p.score:.2f})")
Diff two knowledge bases. Shows what each knows that the other doesn’t, shared beliefs, entity coverage gaps, and contradictions between them. No other database can structurally compare two knowledge bases.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| other | AttestDB | — | Another database to compare against |
Returns
MergeReport — self_unique_beliefs, other_unique_beliefs, shared_beliefs, conflicts, self_unique_entities, other_unique_entities, summary
Example
team_a = attestdb.open("team_a.db")
team_b = attestdb.open("team_b.db")
report = team_a.merge_report(team_b)
print(f"Team A knows {report.self_unique_beliefs} things Team B doesn't")
print(f"{len(report.conflicts)} disagreements")
Research
Close the loop: detect knowledge gaps, research answers via LLM, and ingest the results — automatically.
Plug in any external source (web search, PubMed, Slack) via the search_fn callback.
Full gap-closing loop: detect blindspots, formulate questions, research each via LLM, ingest validated claims, and measure improvement.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| max_questions | int | 20 | Max questions to generate and research |
| use_curator | bool | True | Triage discovered claims through the curator |
| search_fn | callable \| None | None | Optional fn(question) → text for external search |
Returns
InvestigationReport — questions_generated, questions_researched, claims_ingested, blindspot_before, blindspot_after
Example
report = db.investigate(max_questions=10)
print(f"Researched {report.questions_researched} questions")
print(f"Ingested {report.claims_ingested} new claims")
print(f"Blindspots: {report.blindspot_before} → {report.blindspot_after}")
Research a single question. The LLM generates structured claims, which are validated and ingested.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
question | str | — | Natural-language research question |
entity_id | str | None | None | Optional focal entity |
entity_type | str | "" | Optional entity type hint |
predicate_hint | str | "" | Optional predicate to hint at |
Returns
ResearchResult — claims_ingested, claims_rejected, inquiry_resolved, source
Example
result = db.research_question(
    "What databases does the API gateway depend on?",
    entity_id="api-gateway",
    entity_type="service",
)
print(f"Ingested {result.claims_ingested} claims")
How it works
| Step | What happens |
|---|---|
| 1. Detect | blindspots() + find_gaps() + find_confidence_alerts() identify weak areas |
| 2. Question | Each gap becomes a natural-language research question, registered as an inquiry |
| 3. Research | LLM generates structured claims for each question (or search_fn provides external text) |
| 4. Ingest | Claims are validated and ingested with source_type="llm_research" |
| 5. Resolve | Matching inquiries are auto-resolved via the inquiry_matched event |
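Step 5 can be observed from application code by subscribing to the inquiry_matched event (see Events below). A minimal sketch: the callback below simply records resolved inquiries, and the final call simulates the event for illustration — the commented lines show where the real registration and research loop would go.

```python
resolved = []

def on_match(inquiry_id, claim_id, **kw):
    # Record which open questions the research loop answered.
    resolved.append((inquiry_id, claim_id))

# With an open AttestDB handle:
# db.on("inquiry_matched", on_match)
# db.investigate(max_questions=10)

on_match("inq-1", "claim-9")  # simulated event for illustration
```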
Pluggable search
# Use any external source as the research backend
def pubmed_search(question: str) -> str:
    # Call PubMed, web search, internal wiki, etc.
    return fetch_abstracts(question)

report = db.investigate(max_questions=10, search_fn=pubmed_search)
print(f"Researched {report.questions_researched} questions")
print(f"Ingested {report.claims_ingested} new claims")
print(f"Blindspots: {report.blindspot_before} → {report.blindspot_after}")
Research questions
| Method | Description |
|---|---|
db.ingest_inquiry(question, subject, object, predicate_hint="") | Register a question you want answered. Returns inquiry claim_id. |
db.open_inquiries() | List all unanswered questions. Returns list[Claim]. |
db.check_inquiry_matches(subject_id=None, object_id=None, predicate_id=None) | Check if new claims match open questions. Returns list[str]. |
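A sketch of the inquiry workflow. The question-building helper is ordinary Python; the commented calls assume an open AttestDB handle and the method signatures listed above, and the entity names are hypothetical.

```python
def inquiry_question(subject: str, predicate: str) -> str:
    # Phrase a knowledge gap as a natural-language question.
    return f"Which entities does '{subject}' relate to via '{predicate}'?"

q = inquiry_question("api-gateway", "depends_on")

# With an open AttestDB handle:
# inquiry_id = db.ingest_inquiry(q, "api-gateway", None, predicate_hint="depends_on")
# pending = db.open_inquiries()                         # unanswered questions
# hits = db.check_inquiry_matches(subject_id="api-gateway")
```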
Autonomous self-learning (Autodidact)
Enable a background daemon that runs the detect → research → ingest loop continuously. Built-in evidence sources (PubMed, Semantic Scholar) auto-register. Paid sources (Perplexity, Serper) register when their API key is present. Dual budget caps (call count + dollar amount) prevent runaway costs.
| Method | Description |
|---|---|
db.enable_autodidact(interval=3600, max_cost_per_day=1.00, sources="auto") | Start the self-learning daemon. Runs gap detection and research on a timer. |
db.disable_autodidact() | Stop the daemon. |
db.autodidact_status() | Current status: cycles, claims ingested, cost today, budget state. Returns AutodidactStatus. |
db.autodidact_run_now() | Trigger an immediate research cycle. |
db.autodidact_cost_estimate(cycles=24) | Dry-run cost projection without executing. Returns cost breakdown dict. |
db.autodidact_history(limit=10) | Recent cycle reports with per-cycle costs. Returns list[CycleReport]. |
Example
# Enable the daemon: 30-minute cycles, $2/day cost cap
db.enable_autodidact(interval=1800, max_cost_per_day=2.00)

# Check estimated costs before committing
estimate = db.autodidact_cost_estimate(cycles=48)
print(f"Est. daily cost: ${estimate['cost_per_day_capped']}")
print(f"Est. monthly: ${estimate['cost_per_month_capped']}")

# Monitor
status = db.autodidact_status()
print(f"Cycles: {status.cycle_count}, Claims learned: {status.total_claims_ingested}")
print(f"Cost today: ${status.estimated_cost_today:.3f} / ${status.max_cost_per_day}")

# Stop when done
db.disable_autodidact()
Built-in evidence sources
| Priority | Source | Cost | API Key |
|---|---|---|---|
| 0 | Perplexity Sonar | ~$0.001/query | PERPLEXITY_API_KEY |
| 1 | PubMed (NCBI) | Free | None required |
| 2 | Semantic Scholar | Free | None required |
| 3 | Serper (Google) | ~$0.001/query | SERPER_API_KEY |
With sources="auto" (the default), free sources always register. Paid sources register
only when their API key is in the environment. Pass search_fn=my_fn to use your own source instead.
Backup & Restore
| Method | Description |
|---|---|
db.snapshot(dest_path) | Copy the database to a backup directory. Returns the destination path. |
AttestDB.restore(src_path, dest_path) | Restore a database from a snapshot. Returns an open AttestDB. |
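A minimal sketch of a timestamped backup routine. The path-building helper is plain Python; the commented calls assume an open AttestDB handle and the snapshot/restore signatures above, and the directory names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

def backup_path(base_dir: str, name: str = "attest") -> Path:
    # Timestamped destination, e.g. backups/attest-20250101T120000.db
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return Path(base_dir) / f"{name}-{stamp}.db"

# With an open AttestDB handle:
# dest = db.snapshot(str(backup_path("backups")))
# restored = AttestDB.restore(dest, "restored.db")
```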
Events
Subscribe to lifecycle events. Callbacks run synchronously after the operation completes. Errors in callbacks are logged, never propagated — your pipeline keeps running.
| Method | Description |
|---|---|
db.on(event, callback) | Register a callback for a lifecycle event. |
db.off(event, callback) | Remove a registered callback. |
Events
| Event | Kwargs | Fires when |
|---|---|---|
"claim_ingested" | claim_id, claim_input | After each ingest() call |
"claim_corroborated" | content_id, count | A newly ingested claim matches an existing one |
"source_retracted" | source_id, reason, claim_ids | After retract() |
"inquiry_matched" | inquiry_id, claim_id | A newly ingested claim answers an open inquiry |
Example
def on_new_claim(claim_id, claim_input, **kw):
    print(f"New claim: {claim_id}")

def on_corroboration(content_id, count, **kw):
    print(f"Corroborated! {count} independent sources")

db.on("claim_ingested", on_new_claim)
db.on("claim_corroborated", on_corroboration)
Agent Integration
Two ways for external agents to read and write Attest — choose based on your agent framework.
MCP Server (Model Context Protocol)
For Claude Desktop, Claude Code, and any MCP-compatible agent. Ships as a CLI tool.
$ pip install attestdb[mcp]
$ ATTEST_DB_PATH=my.db attest-mcp
Exposes 26 tools (ingest_claim, query_entity, search_entities,
knowledge_health, retract_source, attest_impact, attest_blindspots,
attest_consensus, attest_investigate, etc.) and 2 resources
(attest://entities, attest://schema) over stdio transport.
Claude Desktop configuration
{
"mcpServers": {
"attest": {
"command": "attest-mcp",
"env": {
"ATTEST_DB_PATH": "my_knowledge.db"
}
}
}
}
REST API
For web-based agents, custom integrations, or any HTTP client.
| Method | Path | Description |
|---|---|---|
POST | /api/v1/claims | Ingest a single claim |
POST | /api/v1/claims/batch | Bulk-ingest claims |
POST | /api/v1/claims/text | Extract claims from text |
GET | /api/v1/entities | List entities |
GET | /api/v1/entities/{id} | Get entity summary |
GET | /api/v1/entities/{id}/claims | Claims about an entity |
GET | /api/v1/entities/{id}/context | Full context frame |
GET | /api/v1/paths/{a}/{b} | Find paths between entities |
POST | /api/v1/retract | Retract a source |
GET | /api/v1/schema | Schema descriptor |
GET | /api/v1/stats | Database statistics |
GET | /api/v1/health | Knowledge health metrics |
GET | /api/v1/quality | Quality report |
GET | /api/v1/insights/bridges | Bridge predictions |
GET | /api/v1/insights/gaps | Confidence alerts |
Example
# Ingest a claim via REST
curl -X POST http://localhost:8877/api/v1/claims \
  -H "Content-Type: application/json" \
  -d '{
    "subject": ["api-gateway", "service"],
    "predicate": ["depends_on", "dependency"],
    "object": ["redis", "service"],
    "source_type": "k8s_manifest",
    "source_id": "deploy/prod"
  }'

# Query an entity
curl http://localhost:8877/api/v1/entities/redis/context

# Check knowledge health
curl http://localhost:8877/api/v1/health
LLM Providers
Set the environment variable for your provider, then configure:
| Provider | Environment Variable | Configure |
|---|---|---|
| Gemini (recommended) | GOOGLE_API_KEY | db.configure_curator("gemini") |
| Together | TOGETHER_API_KEY | db.configure_curator("together") |
| OpenAI | OPENAI_API_KEY | db.configure_curator("openai") |
| DeepSeek | DEEPSEEK_API_KEY | db.configure_curator("deepseek") |
| Grok | GROK_API_KEY | db.configure_curator("grok") |
| OpenRouter | OPENROUTER_API_KEY | db.configure_curator("openrouter") |
| Groq | GROQ_API_KEY | db.configure_curator("groq") (currently unavailable) |
| Anthropic | ANTHROPIC_API_KEY | db.configure_curator("anthropic") |
| GLM | GLM_API_KEY | db.configure_curator("glm") |
No API key? Use "heuristic" mode — it works entirely offline.
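Provider selection can be driven by the environment. A minimal sketch, using a subset of the table above ("groq" omitted while unavailable): pick the first provider whose key is set, otherwise fall back to offline heuristic mode. The final commented line shows the intended hand-off to configure_curator.

```python
import os

# Subset of the provider table above.
PROVIDER_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "together": "TOGETHER_API_KEY",
}

def pick_curator(env=None) -> str:
    # First provider whose API key is present; else offline heuristic mode.
    env = os.environ if env is None else env
    for name, var in PROVIDER_KEYS.items():
        if env.get(var):
            return name
    return "heuristic"

# db.configure_curator(pick_curator())
```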
Data Types
Adding a claim
db.ingest(
    subject=("name", "type"),            # e.g. ("api-gateway", "service")
    predicate=("relationship", "class"), # e.g. ("depends_on", "depends_on")
    object=("name", "type"),             # e.g. ("redis", "service")
    provenance={
        "source_type": "...",  # What kind of source
        "source_id": "...",    # Identifies the specific source
    },
    confidence=0.9,   # 0.0 to 1.0 (optional)
    payload={...},    # Any structured data (optional)
)
Query result
frame = db.query("redis")
frame.focal_entity          # EntitySummary: name, type, claim_count
frame.claim_count           # Number of claims about it
frame.direct_relationships  # list[Relationship]: predicate, target, confidence
frame.narrative             # Human-readable summary
frame.contradictions        # list[Contradiction]: conflicting claims
frame.confidence_range      # tuple[float, float]: min and max confidence
frame.topic_membership      # list[str]: community IDs (if topology computed)
Batch input
from attestdb import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "depends_on"),
        object=("redis", "service"),
        provenance={"source_type": "config_management", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)