Every method maps to one lifecycle stage

Claims flow through seven stages. The API is organized around this lifecycle — every method has a clear purpose in how knowledge enters, gets queried, reasons about itself, self-corrects, and grows.

1. Ingest: ingest(), ingest_batch(), ingest_text(), ingest_chat(), ingest_slack(), connect()
2. Query: query(), explain(), search(), resolve(), path_exists()
3. Understand: schema(), quality_report(), knowledge_health(), find_gaps(), find_bridges(), consensus(), blindspots()
4. Reason: test_hypothesis(), discover(), analogies(), evolution(), trace()
5. Correct: retract(), retract_cascade(), trace_downstream(), at(), impact(), audit(), drift()
6. Track: ingest_inquiry(), check_inquiry_matches(), open_inquiries(), fragile(), stale(), source_reliability(), hypothetical(), predict(), what_if()
7. Research: investigate(), close_gaps(), suggest_investigations()

Setup

attestdb.open(path, embedding_dim=768, strict=False)

Open or create an Attest database. The primary entry point for all operations.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | required | File path for the database |
| embedding_dim | int \| None | 768 | Embedding vector dimension. None disables embedding index |
| strict | bool | False | Raise on validation warnings instead of logging |

Returns

AttestDB — database handle (also usable as context manager)

Example

import attestdb

# Open or create a database
db = attestdb.open("my_knowledge.db")

# Context manager closes automatically
with attestdb.open("my_knowledge.db") as db:
    db.ingest(...)

attestdb.quickstart(path="attest.db", vocabs=None, curator="heuristic", embedding_dim=None)

One-line setup: create a database, register vocabularies, and configure the curator in a single call.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | "attest.db" | Database file path |
| vocabs | list[str] \| None | None | Vocabularies to register: "bio", "devops", "ml" |
| curator | str | "heuristic" | Curator mode: "heuristic" or a provider name |
| embedding_dim | int \| None | None | Embedding vector dimension. None disables embedding index |

Returns

AttestDB — fully configured database handle

Example

import attestdb

db = attestdb.quickstart("bio.db", vocabs=["bio"], curator="gemini")

| Method | Description |
| --- | --- |
| db.configure_curator(model="heuristic", api_key=None) | Set the curator. "heuristic" (offline) or a provider name (see LLM Providers). |
| db.register_vocabulary(namespace, vocab) | Register entity types, predicates, and constraints for a domain. |
| db.register_predicate(predicate_id, constraints) | Register a single predicate with subject/object type constraints. |
| db.register_payload_schema(schema_id, schema) | Register a JSON schema for payload validation on a predicate type. |
| db.close() | Close the database. Also works as a context manager: with attestdb.open(...) as db: |

Ingestion

db.ingest(subject, predicate, object, provenance, confidence=None, payload=None, timestamp=None, external_ids=None)

Add a single claim with full provenance. The atomic write operation — every claim must have a source.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| subject | tuple[str, str] | required | Entity name and type, e.g. ("redis", "service") |
| predicate | tuple[str, str] | required | Relationship and class, e.g. ("depends_on", "dependency") |
| object | tuple[str, str] | required | Target entity name and type |
| provenance | dict | required | Must include source_type and source_id |
| confidence | float \| None | None | 0.0–1.0 confidence score (auto-assigned if omitted) |
| payload | dict \| None | None | Arbitrary structured data attached to the claim |
| timestamp | int \| None | None | Unix timestamp in nanoseconds (auto-generated if omitted) |
| external_ids | dict \| None | None | External ID mappings for subject/object entities |

Returns

str — the claim_id (SHA-256 hash)

Example

claim_id = db.ingest(
    subject=("api-gateway", "service"),
    predicate=("depends_on", "dependency"),
    object=("redis", "service"),
    provenance={"source_type": "k8s_manifest", "source_id": "deploy/prod"},
    confidence=0.95,
)

db.ingest_batch(claims)

Bulk-ingest many claims at once. Faster than individual ingest() calls — warms caches and skips per-claim corroboration tracking.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| claims | list[ClaimInput] | required | List of claim input objects |

Returns

BatchResult — .ingested (int), .duplicates (int), .errors (list)

Example

from attestdb import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "dependency"),
        object=("redis", "service"),
        provenance={"source_type": "config", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)
print(f"Ingested {result.ingested}, skipped {result.duplicates} dupes")

db.ingest_text(text, source_id="", use_curator=True)

Extract claims from unstructured text using LLM-powered extraction, then ingest them.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | required | Raw text to extract claims from |
| source_id | str | "" | Identifier for the text source |
| use_curator | bool | True | Triage extracted claims through the curator |

Returns

list — extracted and ingested claims

Example

db.ingest_text(
    "Redis is used as the primary cache for the API gateway. "
    "The gateway also depends on PostgreSQL for persistent storage.",
    source_id="architecture_doc_v2",
)

db.ingest_chat(messages, conversation_id="", platform="generic", use_curator=True, extraction="llm")

Extract claims from a conversation. Accepts OpenAI/Anthropic message format ([{role, content}, ...]).

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| messages | list[dict] | required | Chat messages in [{role, content}] format |
| conversation_id | str | "" | Optional conversation identifier |
| platform | str | "generic" | Platform hint: "generic", "chatgpt", "claude" |
| use_curator | bool | True | Triage extracted claims through the curator |
| extraction | str | "llm" | "llm", "heuristic", or "smart" |

Returns

ChatIngestionResult — per-turn breakdown of extracted claims

Example

messages = [
    {"role": "user", "content": "What cache does our API use?"},
    {"role": "assistant", "content": "The API gateway uses Redis for caching."},
]
result = db.ingest_chat(messages, extraction="heuristic")

db.ingest_slack(path, channels=None, bot_ids=None, use_curator=True, extraction="llm")

Extract claims from a Slack workspace export ZIP. Optionally filter by channel name.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | required | Path to the Slack export ZIP file |
| bot_ids | set[str] \| None | None | Only treat these bot IDs as assistant. None = all bots |
| channels | list[str] \| None | None | Only process these channels. None = all channels |
| use_curator | bool | True | Triage extracted claims through the curator |
| extraction | str | "llm" | "llm", "heuristic", or "smart" |

Returns

list[ChatIngestionResult] — one result per channel/thread with bot interaction

Example

results = db.ingest_slack(
    "slack_export.zip",
    channels=["engineering", "incidents"],
    extraction="smart",
)

db.connect(name, *, save=False, **kwargs)

Create a connector for an external data source. Returns a Connector instance — call .run(db) to fetch and ingest. 30 connectors available: slack, teams, gmail, gdocs, gdrive, zoho, postgres, mysql, mssql, notion, confluence, sharepoint, csv, sqlite, github, jira, linear, hubspot, salesforce, zendesk, servicenow, pagerduty, http, airtable, mongodb, elasticsearch, s3, google_sheets, box, dsi.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Connector name (e.g. "slack", "postgres") |
| save | bool | False | Persist credentials to encrypted token store (requires cryptography) |
| **kwargs | | | Connector-specific options (token, dsn, mapping, etc.) |

Returns

Connector — call .run(db) to execute

Examples

# Slack: live channel history
conn = db.connect("slack", token="xoxb-...", channels=["general"])
result = conn.run(db)

# PostgreSQL: map query columns to claims
conn = db.connect("postgres",
    dsn="postgresql://user:pass@host/db",
    query="SELECT gene, relation, target FROM interactions",
    mapping={"subject": "gene", "predicate": "relation", "object": "target"},
)
result = conn.run(db)

# Notion: ingest pages as text
conn = db.connect("notion", api_key="ntn_...", save=True)
result = conn.run(db)

| Method | Description |
| --- | --- |
| db.ingest_chat_file(path, platform="auto", use_curator=True, extraction="llm") | Extract claims from a file: ChatGPT export ZIP, JSON conversation, or plain text. |
| db.curate(claims, agent_id="default") | Run claims through the curator before ingesting. Returns stored/skipped/flagged. |

Extraction modes

| Mode | API key? | When to use |
| --- | --- | --- |
| "heuristic" | No | Explicit relational text ("X depends on Y"). Fast and free. |
| "llm" | Yes | Nuanced or implicit relationships. Deeper understanding. |
| "smart" | Yes | Large volumes. Heuristic first, LLM only for new content. Saves cost. |

Querying

db.query(focal_entity, depth=2, min_confidence=0.0, exclude_source_types=None, max_claims=500, max_tokens=4000, llm_narrative=False, confidence_threshold=0.0, predicate_types=None)

Get a full picture of an entity: relationships, narrative summary, confidence scores, contradictions, and topic membership.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| focal_entity | str | required | Entity name or ID to query |
| depth | int | 2 | BFS traversal depth |
| min_confidence | float | 0.0 | Minimum confidence threshold for relationships |
| exclude_source_types | list[str] \| None | None | Source types to exclude from results |
| max_claims | int | 500 | Maximum claims to consider |
| max_tokens | int | 4000 | Token budget for narrative generation |
| llm_narrative | bool | False | Use LLM for narrative generation instead of templates |
| confidence_threshold | float | 0.0 | Hard filter on relationship confidence |
| predicate_types | list[str] \| None | None | Only include these predicate types |

Returns

ContextFrame — focal entity, relationships, claim count, narrative, confidence range, contradictions, topic membership

Example

frame = db.query("redis")
print(frame.focal_entity.name, "—", frame.claim_count, "claims")
print(frame.narrative)
for rel in frame.direct_relationships:
    print(f"  {rel.predicate} → {rel.target.name} ({rel.confidence:.0%})")

db.explain(focal_entity, depth=2, min_confidence=0.0)

Same as query() but also returns timing and candidate counts for performance profiling.

Returns

tuple[ContextFrame, QueryProfile] — the frame plus profiling data (elapsed_ms, total_candidates, after_scoring)

Example

frame, profile = db.explain("redis")
print(f"Query took {profile.elapsed_ms:.1f}ms, {profile.total_candidates} candidates")

db.find_paths(entity_a, entity_b, max_depth=3, top_k=5)

Find the top-k paths between two entities with per-hop edge details, sorted by total confidence.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| entity_a | str | required | Start entity |
| entity_b | str | required | End entity |
| max_depth | int | 3 | Maximum hops to search |
| top_k | int | 5 | Number of paths to return |

Returns

list[PathResult] — each with .steps (list of PathStep), .total_confidence, .length

Example

paths = db.find_paths("api-gateway", "postgresql")
for p in paths:
    hops = " → ".join(s.entity_id for s in p.steps)
    print(f"{hops}  (confidence: {p.total_confidence:.2f})")

| Method | Description |
| --- | --- |
| db.resolve(entity_id) | Resolve an entity name to its canonical normalized ID. |
| db.get_entity(entity_id) | Get entity summary: name, type, claim count. Returns EntitySummary \| None. |
| db.claims_for(entity_id, predicate_type=None, source_type=None, min_confidence=0.0) | Get raw claims for an entity. Filter by predicate, source, or confidence. Returns list[Claim]. |
| db.claims_by_content_id(content_id) | Get all claims about the same fact (corroboration group). Returns list[Claim]. |
| db.list_entities(entity_type=None, min_claims=0) | List all entities. Filter by type or minimum claim count. Returns list[EntitySummary]. |
| db.path_exists(entity_a, entity_b, max_depth=3) | Check if two entities are connected. Returns bool. |
| db.raw_query(query, params=None) | Escape hatch: run a raw query against the storage engine. Returns list[list]. |
| db.get_embedding(claim_id) | Retrieve the stored embedding vector for a claim. Returns list[float] \| None. |

Understanding

db.quality_report(stale_threshold=0, expected_patterns=None)

Health metrics for the knowledge base: single-source entities, source distribution, knowledge density, gap counts.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| stale_threshold | int | 0 | Age in days to consider an entity stale (0 = disabled) |
| expected_patterns | dict \| None | None | Expected predicate patterns for gap detection |

Returns

QualityReport — total_claims, total_entities, single_source_entity_count, avg_claims_per_entity, source_type_distribution, predicate_distribution

Example

report = db.quality_report()
print(f"{report.total_entities} entities, {report.total_claims} claims")
print(f"{report.single_source_entity_count} single-source entities")
print(f"Avg claims/entity: {report.avg_claims_per_entity:.1f}")

db.knowledge_health()

Quantified health score (0–100) with weighted metrics: multi-source ratio, corroboration, freshness, source diversity, and confidence trend.

Returns

KnowledgeHealth — health_score (0–100), multi_source_ratio, corroboration_ratio, freshness_score, source_diversity, confidence_trend, knowledge_density

Example

health = db.knowledge_health()
print(f"Health: {health.health_score:.0f}/100")
print(f"Multi-source ratio: {health.multi_source_ratio:.0%}")
print(f"Corroboration: {health.corroboration_ratio:.0%}")

Health score breakdown

| Metric | Weight | What it measures |
| --- | --- | --- |
| Multi-source ratio | 30% | Fraction of entities backed by more than one source |
| Corroboration ratio | 25% | Fraction of claims independently confirmed |
| Freshness | 20% | Recency of claims (30-day half-life decay) |
| Source diversity | 15% | Number of distinct source types |
| Confidence trend | 10% | Whether confidence is improving over time |

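Given those weights, the composite score is a plain weighted sum scaled to 100. A sketch of the arithmetic (metric keys borrowed from the KnowledgeHealth fields; the exact normalization inside attestdb may differ):

```python
# Weights from the breakdown table above; each metric is assumed
# to be normalized to 0-1 before weighting.
WEIGHTS = {
    "multi_source_ratio": 0.30,
    "corroboration_ratio": 0.25,
    "freshness_score": 0.20,
    "source_diversity": 0.15,
    "confidence_trend": 0.10,
}

def health_score(metrics: dict[str, float]) -> float:
    return 100 * sum(WEIGHTS[k] * metrics.get(k, 0.0) for k in WEIGHTS)

score = health_score({
    "multi_source_ratio": 0.6,
    "corroboration_ratio": 0.4,
    "freshness_score": 0.8,
    "source_diversity": 0.5,
    "confidence_trend": 1.0,
})
print(round(score, 1))  # 61.5
```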
db.find_gaps(expected_patterns, entity_type=None, min_claims=1)

Vocabulary-driven gap identification. Compares each entity's relationship profile against expected predicate patterns for its type.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| expected_patterns | dict[str, set[str]] | required | Map of entity_type → expected predicates, e.g. {"gene": {"associated_with", "interacts_with"}} |
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 1 | Minimum claims for an entity to be checked |

Returns

list[GapResult] — entities missing expected relationships

Example

gaps = db.find_gaps({
    "gene": {"associated_with", "interacts_with", "expressed_in"},
    "drug": {"treats", "targets"},
})
for gap in gaps:
    print(f"{gap.entity_id} missing: {gap.missing_predicate_types}")

db.find_bridges(entity_type=None, min_claims=2, max_depth=3, top_k=50, max_degree=None)

Predict potential connections between currently-unlinked entities using embedding similarity and common-neighbor scoring.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 2 | Minimum claims for entities to be considered |
| max_depth | int | 3 | Maximum graph distance to search |
| top_k | int | 50 | Number of bridge predictions to return |
| max_degree | int \| None | None | Exclude high-degree hub entities |

Returns

list[BridgePrediction] — predicted connections with confidence scores
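The common-neighbor signal behind find_bridges() can be sketched standalone. A minimal illustration over an adjacency map (attestdb additionally blends in embedding similarity; this helper is illustrative, not part of the API):

```python
from itertools import combinations

def common_neighbor_scores(adj: dict[str, set[str]]) -> list[tuple[str, str, int]]:
    """Rank unlinked entity pairs by how many neighbors they share."""
    scores = []
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already directly linked, so not a bridge candidate
        shared = len(adj[a] & adj[b])
        if shared:
            scores.append((a, b, shared))
    return sorted(scores, key=lambda s: -s[2])

adj = {
    "redis": {"api-gateway", "worker"},
    "postgres": {"api-gateway", "worker", "etl"},
    "api-gateway": {"redis", "postgres"},
    "worker": {"redis", "postgres"},
    "etl": {"postgres"},
}
print(common_neighbor_scores(adj))  # unlinked pairs, most shared neighbors first
```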

db.find_confidence_alerts(entity_type=None, min_claims=2, stale_threshold=0, quality_spread=0.3)

Find entities with reliability concerns: single-source dependencies, stale data, or wide confidence spreads.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 2 | Minimum claims for an entity to be checked |
| stale_threshold | int | 0 | Days after which data is considered stale |
| quality_spread | float | 0.3 | Max confidence range before flagging |

Returns

list[EntityConfidenceAlert] — entities with provenance or confidence issues
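The quality_spread check is the simplest of the three alert signals: an entity is flagged when the confidence range across its claims exceeds the threshold. A sketch of that test in isolation (the helper is illustrative, not part of the API):

```python
def wide_spread(confidences: list[float], quality_spread: float = 0.3) -> bool:
    """Flag an entity whose claims disagree strongly about confidence."""
    return len(confidences) >= 2 and max(confidences) - min(confidences) > quality_spread

print(wide_spread([0.95, 0.90, 0.40]))  # True: spread of 0.55 exceeds 0.3
print(wide_spread([0.80, 0.75]))        # False: spread of 0.05 is within bounds
```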

| Method | Description |
| --- | --- |
| db.schema() | What entity types, predicates, and patterns exist, with counts. Returns SchemaDescriptor. |
| db.stats() | Entity count, claim count, index size. Returns dict. |

Topology

| Method | Description |
| --- | --- |
| db.compute_topology(resolutions=None, min_community_size=3) | Run Leiden community detection on the claim graph. |
| db.topics(level=None) | Get topic hierarchy from the last topology computation. Returns list[TopicNode]. |
| db.density_map() | Density metrics per topic: claim count, source diversity. Returns list[DensityMapEntry]. |
| db.cross_domain_bridges(top_k=20) | Find entities connecting different knowledge domains. Returns list[CrossDomainBridge]. |
| db.query_topic(topic_id) | Get all entities in a specific topic. |
| db.generate_structural_embeddings(dim=64) | SVD-based graph embeddings for all entities. Returns entity count. |
| db.generate_weighted_structural_embeddings(dim=64) | Confidence-weighted SVD graph embeddings. Returns entity count. |
| db.get_adjacency_list() | Build in-memory adjacency list from all claim edges. Returns dict[str, set[str]]. |
| db.get_weighted_adjacency() | Weighted adjacency with per-edge confidence and sources. Returns dict. |
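The shape get_adjacency_list() returns is worth picturing: every claim edge links subject and object in both directions. A sketch of that construction from bare (subject, object) pairs (illustrative only; the real method reads the claim store):

```python
from collections import defaultdict

def build_adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency: each claim edge links both endpoints."""
    adj: dict[str, set[str]] = defaultdict(set)
    for subject, obj in edges:
        adj[subject].add(obj)
        adj[obj].add(subject)
    return dict(adj)

adj = build_adjacency([("api-gateway", "redis"), ("api-gateway", "postgres")])
print(adj["api-gateway"])  # neighbors reachable from api-gateway
```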

Provenance & Trust

db.retract_cascade(source_id, reason)

Retract all claims from a source and mark anything that depended on them as degraded. The nuclear option for bad sources.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| source_id | str | required | Source to retract |
| reason | str | required | Human-readable reason for retraction |

Returns

CascadeResult — .source_retract (RetractResult), .degraded_claim_ids (list), .degraded_count (int)

Example

result = db.retract_cascade("unreliable_vendor", "Data quality issues found in audit")
print(f"Retracted {result.source_retract.retracted_count} claims")
print(f"Degraded {result.degraded_count} downstream dependents")

db.at(timestamp)

Time-travel: query the knowledge base as it was at a specific point in time. Returns a read-only snapshot view.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| timestamp | int | required | Unix timestamp in nanoseconds |

Returns

AttestDBSnapshot — read-only view supporting query(), claims_for(), list_entities()

Example

import time

# View knowledge base as of yesterday
yesterday = int((time.time() - 86400) * 1_000_000_000)
snapshot = db.at(yesterday)
frame = snapshot.query("redis")
print(f"Yesterday: {frame.claim_count} claims")

db.impact(source_id)

If this source is retracted, how many claims and entities are affected? Preview the blast radius before acting.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| source_id | str | required | Source to analyze |

Returns

ImpactReport — direct_claims, downstream_claims, affected_entities, claim_ids

Example

report = db.impact("vendor_api_v2")
print(f"Direct: {report.direct_claims}, Downstream: {report.downstream_claims}")
print(f"Affects {len(report.affected_entities)} entities")

| Method | Description |
| --- | --- |
| db.retract(source_id, reason) | Mark all claims from a source as retracted (tombstoned). Returns RetractResult. |
| db.trace_downstream(claim_id) | See what claims depend on a specific claim. Returns a DownstreamNode tree. |
| db.audit(claim_id) | Full provenance chain: who said it, corroborators, dependents. Returns AuditTrail. |
| db.drift(days=30) | How has knowledge changed? New claims, new entities, retracted sources. Returns DriftReport. |
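trace_downstream() is, at heart, a transitive walk over dependency edges. A sketch of that traversal over a toy claim-dependency map (the map's shape here is assumed purely for illustration):

```python
from collections import deque

def downstream(claim_id: str, derived_from: dict[str, list[str]]) -> set[str]:
    """All claims that transitively depend on claim_id (BFS over dependency edges)."""
    seen: set[str] = set()
    queue = deque([claim_id])
    while queue:
        current = queue.popleft()
        for dependent in derived_from.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

deps = {"c1": ["c2", "c3"], "c2": ["c4"]}
print(sorted(downstream("c1", deps)))  # ['c2', 'c3', 'c4']
```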

Intelligence

Methods that answer questions only a provenance-tracking database can answer.

db.blindspots(min_claims=5)

Find entities backed by only a single source, knowledge gaps, and low-confidence areas.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| min_claims | int | 5 | Minimum claims for an entity to be flagged as single-source |

Returns

BlindspotMap — single_source_entities, knowledge_gaps, low_confidence_areas

Example

blind = db.blindspots()
print(f"{len(blind.single_source_entities)} entities rely on a single source")
for entity in blind.single_source_entities[:5]:
    print(f"  {entity}")

db.consensus(topic)

How many independent sources agree about an entity? Returns agreement ratio, claims by source, and corroborated content IDs.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| topic | str | required | Entity name to analyze consensus for |

Returns

ConsensusReport — total_claims, unique_sources, avg_confidence, agreement_ratio, claims_by_source, corroborated_content_ids

Example

report = db.consensus("redis")
print(f"Agreement: {report.agreement_ratio:.0%} across {report.unique_sources} sources")

db.source_reliability(source_id=None)

Per-source corroboration and retraction rates. Pass a source_id for one source, or omit for all sources.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| source_id | str \| None | None | Specific source to check, or None for all |

Returns

dict — per-source metrics: total_claims, active, retracted, degraded, corroboration_rate, retraction_rate

Example

reliability = db.source_reliability()
for src, metrics in reliability.items():
    print(f"{src}: {metrics['corroboration_rate']:.0%} corroborated")

db.hypothetical(claim)

What-if analysis: would this claim corroborate existing knowledge? Does it fill a gap between known entities?

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| claim | ClaimInput | required | The hypothetical claim to test |

Returns

HypotheticalReport — would_corroborate, existing_corroborations, fills_gap, content_id, related_entities

Example

from attestdb import ClaimInput

report = db.hypothetical(ClaimInput(
    subject=("redis", "service"),
    predicate=("depends_on", "dependency"),
    object=("sentinel", "service"),
    provenance={"source_type": "test", "source_id": "test"},
))
print(f"Would corroborate: {report.would_corroborate}")
print(f"Fills gap: {report.fills_gap}")

db.predict(entity_id, max_intermediaries=100, min_paths=3, directional_only=False, entity_aliases=None)

Discover novel regulatory predictions via causal composition. Follows causal edges through intermediaries and composes predicates (inhibits + inhibits = activates). Returns predictions ranked by convergent evidence — genuine gaps first. No LLM calls. Validated at 47% precision across 4 genes (8/17 confirmed, 0 contradicted).

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| entity_id | str | required | Entity to predict relationships for |
| max_intermediaries | int | 100 | Maximum intermediary entities to explore in BFS |
| min_paths | int | 3 | Minimum independent paths for a prediction |
| directional_only | bool | False | Exclude "regulates" — only directional predicates (activates, inhibits, etc.) |
| entity_aliases | dict \| None | None | Entity ID alias map for cross-database dedup (from build_entity_aliases()) |

Returns

list[Prediction] — target, predicted_predicate, supporting_paths, opposing_paths, consensus, is_gap, evidence

Example

predictions = db.predict("gene_7157")  # TP53
for p in predictions[:5]:
    print(f"{p.predicted_predicate} -> {p.target}")
    print(f"  {p.supporting_paths} supporting, gap={p.is_gap}")

db.what_if(subject, predicate, object, confidence=0.6)

Test a hypothesis against the knowledge graph. Returns causal evidence for/against with multi-hop composition paths, contradiction detection, gap analysis, and follow-up suggestions. No LLM calls.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| subject | tuple | required | (entity_id, entity_type) |
| predicate | tuple | required | (predicate_id, predicate_type) |
| object | tuple | required | (entity_id, entity_type) |
| confidence | float | 0.6 | Hypothetical confidence level |

Returns

SandboxVerdict — verdict (supported/contradicted/plausible/insufficient_data), confidence_score, direct + indirect evidence, gaps, follow-ups

Example

verdict = db.what_if(
    ("gene_940", "gene"),
    ("upregulates", "relation"),
    ("gene_29126", "gene"),
)
print(verdict.verdict)       # "plausible"
print(verdict.explanation)   # "12 causal path(s) supporting"

| Method | Description |
| --- | --- |
| db.fragile(max_sources=1, min_age_days=0) | Find claims backed by few independent sources. Returns list[Claim]. |
| db.stale(days=90) | Find claims not updated within the given period. Returns list[Claim]. |
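Both are cutoff filters; staleness in particular is just timestamp arithmetic in the library's nanosecond convention. A sketch of the cutoff (illustrative helper, not part of the API):

```python
import time

def is_stale(claim_timestamp_ns: int, days: int = 90) -> bool:
    """True when a nanosecond claim timestamp is older than `days`."""
    cutoff_ns = int((time.time() - days * 86_400) * 1_000_000_000)
    return claim_timestamp_ns < cutoff_ns

one_year_ago_ns = int((time.time() - 365 * 86_400) * 1_000_000_000)
print(is_stale(one_year_ago_ns))  # True: well past the 90-day default
```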

Reason

Methods that generate new knowledge from graph structure — hypothesis testing, proactive discovery, and analogical reasoning.

db.discover(top_k=10)

Proactive hypothesis generation from graph structure. Three signals: bridge predictions (ensemble-scored pairs with composed predicates), cross-domain insights (topology bridge entities), and chain completion (2-hop pairs missing direct connections). Pure computation, no LLM.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| top_k | int | 10 | Maximum discoveries to return |

Returns

list[Discovery] — hypothesis, predicted_predicate, confidence, novelty_score, evidence_summary, supporting_paths, suggested_action

Example

for d in db.discover(top_k=5):
    print(d.hypothesis)
    print(f"  {d.predicted_predicate} (conf={d.confidence:.2f}, novelty={d.novelty_score:.2f})")
    print(f"  → {d.suggested_action}")

db.analogies(entity_a, entity_b, top_k=5)

Find structural analogies: A:B :: C:D. Uses structural embeddings to find entities similar to A and B, then predicts the C:D pair and relationship.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| entity_a | str | required | Source entity A |
| entity_b | str | required | Source entity B (connected to A) |
| top_k | int | 5 | Maximum analogies to return |

Returns

list[Analogy] — entity_a, entity_b, entity_c, entity_d, predicted_predicate, score, explanation

Example

for a in db.analogies("BRCA1", "apoptosis"):
    print(f"{a.entity_c} : {a.entity_d} (score={a.score:.2f})")
    print(f"  {a.explanation}")

db.test_hypothesis(hypothesis)

Evaluate a natural-language hypothesis against the knowledge base. Parses entities, finds multi-hop evidence chains, and returns a verdict with supporting/contradicting evidence.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| hypothesis | str | required | Natural language hypothesis to test |

Returns

HypothesisVerdict — verdict (supported/contradicted/partial/unsupported), verdict_confidence, supporting_chains, contradicting_chains, confidence_gaps, suggested_next_steps

Example

verdict = db.test_hypothesis("aspirin reduces inflammation via COX-2")
print(f"{verdict.verdict} (confidence={verdict.verdict_confidence:.2f})")
for chain in verdict.supporting_chains:
    print(f"  {chain.summary}")

| Method | Description |
| --- | --- |
| db.evolution(entity_id, since=None) | Knowledge evolution over time: new connections, confidence changes, source diversification. Returns EvolutionReport. |
| db.trace(entity_a, entity_b, max_depth=4) | Source-overlap-discounted reasoning chains between two entities. Returns list[ReasoningChain]. |
| db.close_gaps(hypothesis=None, top_k=5) | Hypothesis-driven gap closing: test a hypothesis, research confidence gaps, re-test. Returns CloseGapsReport. |
| db.suggest_investigations(top_k=10) | Unified, prioritized investigation recommendations synthesized from all insight signals. Returns list[Investigation]. |
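The "source-overlap discounting" in trace() is worth making concrete: a chain whose hops all cite the same source should score below one backed by independent sources. A sketch of one plausible discount (the exact formula attestdb uses is not documented here, so this one is an assumption):

```python
import math

def chain_confidence(hops: list[tuple[float, str]]) -> float:
    """Product of hop confidences, discounted when hops reuse sources."""
    product = math.prod(conf for conf, _ in hops)
    distinct_sources = len({src for _, src in hops})
    return product * distinct_sources / len(hops)  # factor is 1.0 when all sources differ

independent = [(0.9, "paper_1"), (0.8, "trial_5")]
overlapping = [(0.9, "paper_1"), (0.8, "paper_1")]
print(round(chain_confidence(independent), 2))  # 0.72: no discount
print(round(chain_confidence(overlapping), 2))  # 0.36: one source, two hops
```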

Crown Jewels

Features impossible with any other database. These exploit Attest's unique combination of timestamps, provenance, confidence, corroboration grouping, and contradiction detection.

db.diff(since, until=None)

Knowledge diff — like git diff for knowledge. Shows what beliefs formed, strengthened, weakened, or contradicted between two time periods. No other database tracks how beliefs evolve over time.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| since | str \| int | required | Start of period — ISO string or nanosecond int |
| until | str \| int \| None | None | End of period (None = now) |

Returns

KnowledgeDiff — new_beliefs, strengthened, weakened, new_contradictions, new_entities, new_sources, total_new_claims, summary

Example

diff = db.diff(since="2025-01-01")
print(diff.summary)
# "47 new beliefs; 12 strengthened; 3 new contradictions; 8 new sources"
for b in diff.new_beliefs[:5]:
    print(f"  {b.subject} {b.predicate} {b.object} (conf={b.confidence_after:.2f})")

db.resolve_contradictions(top_k=10, auto_resolve=False, use_llm=False)

Self-healing contradictions. Finds all opposing claims (via OPPOSITE_PREDICATES), scores evidence quality on each side (corroboration, source diversity, recency, confidence), and picks winners. Optionally ingests resolution meta-claims. No other database can reason about its own conflicts.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| top_k | int | 10 | Maximum contradictions to return |
| auto_resolve | bool | False | Ingest resolution meta-claims for clear winners |
| use_llm | bool | False | Use LLM for ambiguous cases |

Returns

ContradictionReport — total_found, resolved, ambiguous, analyses (with evidence weights and margins), claims_added

Example

report = db.resolve_contradictions(auto_resolve=True)
print(f"Found {report.total_found}, resolved {report.resolved}")
for a in report.analyses:
    print(f"  {a.subject} ↔ {a.object}: {a.resolution} (margin={a.margin:.2f})")

db.simulate(retract_source=None, add_claim=None, remove_entity=None)

Counterfactual what-if analysis — compute cascading effects without modifying the database. "What if this paper is retracted? 47 claims affected, 3 drug mechanisms break." No other database can simulate scenarios on its own integrity.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| retract_source | str \| None | None | Source ID to simulate retracting |
| add_claim | ClaimInput \| None | None | Claim to simulate adding |
| remove_entity | str \| None | None | Entity to simulate removing |

Returns

SimulationReport — claims_affected, claims_removed, claims_degraded, entities_now_orphaned, connection_losses, confidence_shifts, risk_score, risk_level, summary

Example

# What if our main source is wrong?
sim = db.simulate(retract_source="paper_2024_nature")
print(f"{sim.claims_removed} claims affected, risk: {sim.risk_level}")
for loss in sim.connection_losses:
    print(f"  {loss.entity_a} ↔ {loss.entity_b}: lost {loss.lost_predicates}")

db.compile(topic, max_entities=50, use_llm=False)

Knowledge compilation — generate a structured research brief with citations, confidence levels, contradictions, and gaps. An automated literature review from the graph. No other database can produce a structured document with provenance-tracked evidence chains.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| topic | str | required | Topic to compile a brief for |
| max_entities | int | 50 | Maximum entities to include |
| use_llm | bool | False | Use LLM for narrative generation |

Returns

KnowledgeBrief — sections (title, key_findings, citations, contradictions, gaps), executive_summary, total_entities, total_claims_cited, strongest_findings, weakest_areas

Example

brief = db.compile("sickle cell treatment")
print(brief.executive_summary)
for section in brief.sections:
    print(f"\n## {section.title}")
    for f in section.key_findings:
        print(f"  • {f}")

db.explain_why(entity_a, entity_b, max_depth=4, use_llm=False)

Full provenance-traced reasoning chain between two entities. Traces the best path with source citations at every hop, flags contradictions, computes reliability, and generates a human-readable narrative.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| entity_a | str | required | Start entity |
| entity_b | str | required | End entity |
| max_depth | int | 4 | Maximum hops to search |
| use_llm | bool | False | Use LLM for narrative |

Returns

Explanation — connected, steps (with source_summary, evidence_text), chain_confidence, narrative, alternative_paths, source_count

Example

exp = db.explain_why("aspirin", "inflammation")
print(exp.narrative)
# Connection: aspirin → inflammation (2 hops, confidence=0.72)
#   1. aspirin —[inhibits]→ cox-2 (conf=0.90) [2 source(s): paper_1, trial_5]
#   2. cox-2 —[promotes]→ inflammation (conf=0.80) [1 source(s): textbook]

db.forecast(entity_id, top_k=10)

Predict next connections for an entity. Uses 2-hop structural analysis and historical growth patterns to predict which entities are most likely to become connected next.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| entity_id | str | required | Entity to forecast for |
| top_k | int | 10 | Maximum predictions |

Returns

Forecast — predictions (target_entity, predicted_predicate, score, reason, evidence_entities), growth_rate, trajectory

Example

fc = db.forecast("BRCA1")
print(f"Trajectory: {fc.trajectory}, {fc.growth_rate:.1f} connections/month")
for p in fc.predictions[:5]:
    print(f"  → {p.target_entity} via {p.predicted_predicate} (score={p.score:.2f})")

db.merge_report(other)

Diff two knowledge bases. Shows what each knows that the other doesn’t, shared beliefs, entity coverage gaps, and contradictions between them. No other database can structurally compare two knowledge bases.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| other | AttestDB | required | Another database to compare against |

Returns

MergeReport — self_unique_beliefs, other_unique_beliefs, shared_beliefs, conflicts, self_unique_entities, other_unique_entities, summary

Example

team_a = attestdb.open("team_a.db")
team_b = attestdb.open("team_b.db")
report = team_a.merge_report(team_b)
print(f"Team A knows {report.self_unique_beliefs} things Team B doesn't")
print(f"{len(report.conflicts)} disagreements")

Research

Close the loop: detect knowledge gaps, research answers via LLM, and ingest the results — automatically. Plug in any external source (web search, PubMed, Slack) via the search_fn callback.

db.investigate(max_questions=20, use_curator=True, search_fn=None)

Full gap-closing loop: detect blindspots, formulate questions, research each via LLM, ingest validated claims, and measure improvement.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| max_questions | int | 20 | Max questions to generate and research |
| use_curator | bool | True | Triage discovered claims through the curator |
| search_fn | callable \| None | None | Optional fn(question) → text for external search |

Returns

InvestigationReport — questions_generated, questions_researched, claims_ingested, blindspot_before, blindspot_after

Example

report = db.investigate(max_questions=10)
print(f"Researched {report.questions_researched} questions")
print(f"Ingested {report.claims_ingested} new claims")
print(f"Blindspots: {report.blindspot_before} → {report.blindspot_after}")
db.research_question(question, entity_id=None, entity_type="", predicate_hint="")

Research a single question. The LLM generates structured claims, which are validated and ingested.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| question | str | | Natural-language research question |
| entity_id | str \| None | None | Optional focal entity |
| entity_type | str | "" | Optional entity type hint |
| predicate_hint | str | "" | Optional predicate to hint at |

Returns

ResearchResult — claims_ingested, claims_rejected, inquiry_resolved, source

Example

result = db.research_question(
    "What databases does the API gateway depend on?",
    entity_id="api-gateway",
    entity_type="service",
)
print(f"Ingested {result.claims_ingested} claims")

How it works

| Step | What happens |
| --- | --- |
| 1. Detect | blindspots() + find_gaps() + find_confidence_alerts() identify weak areas |
| 2. Question | Each gap becomes a natural-language research question, registered as an inquiry |
| 3. Research | LLM generates structured claims for each question (or search_fn provides external text) |
| 4. Ingest | Claims are validated and ingested with source_type="llm_research" |
| 5. Resolve | Matching inquiries are auto-resolved via the inquiry_matched event |
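
The five steps amount to a detect → question → research → ingest loop. A generic skeleton with injected callables (the helper and stub names here are illustrative, not part of the API):

```python
def gap_closing_loop(detect_fn, research_fn, ingest_fn, max_questions=20):
    """Orchestrate one investigate()-style pass.

    detect_fn()           -> list of gap descriptions
    research_fn(question) -> list of candidate claims
    ingest_fn(claim)      -> True if the claim was accepted
    """
    questions = [f"What fills the gap: {gap}?" for gap in detect_fn()]
    ingested = 0
    for question in questions[:max_questions]:
        for claim in research_fn(question):
            if ingest_fn(claim):
                ingested += 1
    return {"questions_researched": min(len(questions), max_questions),
            "claims_ingested": ingested}

# Toy run with stub callables
result = gap_closing_loop(
    detect_fn=lambda: ["redis has no owner", "auth-service has no runbook"],
    research_fn=lambda q: [("claim-for", q)],
    ingest_fn=lambda c: True,
)
```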

Pluggable search

# Use any external source as the research backend
def pubmed_search(question: str) -> str:
    # Call PubMed, web search, internal wiki, etc.
    return fetch_abstracts(question)

report = db.investigate(max_questions=10, search_fn=pubmed_search)
print(f"Researched {report.questions_researched} questions")
print(f"Ingested {report.claims_ingested} new claims")
print(f"Blindspots: {report.blindspot_before} → {report.blindspot_after}")

Research questions

| Method | Description |
| --- | --- |
| db.ingest_inquiry(question, subject, object, predicate_hint="") | Register a question you want answered. Returns inquiry claim_id. |
| db.open_inquiries() | List all unanswered questions. Returns list[Claim]. |
| db.check_inquiry_matches(subject_id=None, object_id=None, predicate_id=None) | Check if new claims match open questions. Returns list[str]. |
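
Conceptually, inquiry matching compares each open question's pinned slots against a new claim, treating unset slots as wildcards. A toy illustration (not the real matcher):

```python
def matches_inquiry(inquiry, claim):
    """Return True if `claim` answers `inquiry`.

    Each inquiry pins some of (subject, object, predicate); a None slot
    is a wildcard. This mirrors the optional subject_id / object_id /
    predicate_id filters of check_inquiry_matches().
    """
    return all(want is None or want == got
               for want, got in zip(inquiry, claim))

open_inquiries = [
    ("api-gateway", None, "depends_on"),  # what does the gateway depend on?
    (None, "postgres", None),             # anything involving postgres?
]
new_claim = ("api-gateway", "redis", "depends_on")
answered = [inq for inq in open_inquiries if matches_inquiry(inq, new_claim)]
```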

Autonomous self-learning (Autodidact)

Enable a background daemon that runs the detect → research → ingest loop continuously. Built-in evidence sources (PubMed, Semantic Scholar) auto-register. Paid sources (Perplexity, Serper) register when their API key is present. Dual budget caps (call count + dollar amount) prevent runaway costs.

| Method | Description |
| --- | --- |
| db.enable_autodidact(interval=3600, max_cost_per_day=1.00, sources="auto") | Start the self-learning daemon. Runs gap detection and research on a timer. |
| db.disable_autodidact() | Stop the daemon. |
| db.autodidact_status() | Current status: cycles, claims ingested, cost today, budget state. Returns AutodidactStatus. |
| db.autodidact_run_now() | Trigger an immediate research cycle. |
| db.autodidact_cost_estimate(cycles=24) | Dry-run cost projection without executing. Returns cost breakdown dict. |
| db.autodidact_history(limit=10) | Recent cycle reports with per-cycle costs. Returns list[CycleReport]. |

Example

# Run a research cycle every 30 minutes with a $2/day cost cap
db.enable_autodidact(interval=1800, max_cost_per_day=2.00)

# Check estimated costs before committing
estimate = db.autodidact_cost_estimate(cycles=48)
print(f"Est. daily cost: ${estimate['cost_per_day_capped']}")
print(f"Est. monthly: ${estimate['cost_per_month_capped']}")

# Monitor
status = db.autodidact_status()
print(f"Cycles: {status.cycle_count}, Claims learned: {status.total_claims_ingested}")
print(f"Cost today: ${status.estimated_cost_today:.3f} / ${status.max_cost_per_day}")

# Stop when done
db.disable_autodidact()
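
The dual budget caps (call count plus dollar spend) can be pictured as a simple gate that is checked before every paid call. An illustrative sketch, not the daemon's actual accounting:

```python
class Budget:
    """Stop research once either cap is hit: max calls or max dollars."""

    def __init__(self, max_calls_per_day, max_cost_per_day):
        self.max_calls = max_calls_per_day
        self.max_cost = max_cost_per_day
        self.calls = 0
        self.spent = 0.0

    def allow(self, estimated_cost):
        if self.calls >= self.max_calls:
            return False
        return self.spent + estimated_cost <= self.max_cost

    def record(self, cost):
        self.calls += 1
        self.spent += cost

budget = Budget(max_calls_per_day=100, max_cost_per_day=1.00)
queries_made = 0
while budget.allow(0.30):   # exaggerated per-query cost, for illustration
    budget.record(0.30)
    queries_made += 1
```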

Built-in evidence sources

| Priority | Source | Cost | API Key |
| --- | --- | --- | --- |
| 0 | Perplexity Sonar | ~$0.001/query | PERPLEXITY_API_KEY |
| 1 | PubMed (NCBI) | Free | None required |
| 2 | Semantic Scholar | Free | None required |
| 3 | Serper (Google) | ~$0.001/query | SERPER_API_KEY |

With sources="auto" (the default), free sources always register. Paid sources register only when their API key is in the environment. Pass search_fn=my_fn to use your own source instead.
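
The sources="auto" behavior can be sketched as an environment check over the table above (the source names and registry shape here are assumed, for illustration only):

```python
import os

SOURCES = [
    # (priority, name, required_env_key; None means the source is free)
    (0, "perplexity", "PERPLEXITY_API_KEY"),
    (1, "pubmed", None),
    (2, "semantic_scholar", None),
    (3, "serper", "SERPER_API_KEY"),
]

def auto_register(env=os.environ):
    """Free sources always register; paid ones only if their key is set."""
    return [name for _prio, name, key in sorted(SOURCES)
            if key is None or key in env]

# With no paid keys set, only the free sources register
registered = auto_register(env={})
```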

Backup & Restore

| Method | Description |
| --- | --- |
| db.snapshot(dest_path) | Copy the database to a backup directory. Returns the destination path. |
| AttestDB.restore(src_path, dest_path) | Restore a database from a snapshot. Returns an open AttestDB. |
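
Since snapshot and restore operate on the single database file, the pattern is a plain file copy. A stand-alone illustration of the round trip (not the library's implementation):

```python
import shutil
import tempfile
from pathlib import Path

def snapshot(db_path, backup_dir):
    """Copy the database file into a backup directory; return the copy's path."""
    dest = Path(backup_dir) / Path(db_path).name
    shutil.copy2(db_path, dest)
    return dest

def restore(snapshot_path, dest_path):
    """Copy a snapshot back into place; return the restored path."""
    shutil.copy2(snapshot_path, dest_path)
    return Path(dest_path)

# Round-trip demo against a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "my_knowledge.db"
    db.write_bytes(b"claims...")
    backups = Path(tmp) / "backups"
    backups.mkdir()
    copy = snapshot(db, backups)
    restored = restore(copy, Path(tmp) / "restored.db")
    restored_bytes = restored.read_bytes()
```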

Events

Subscribe to lifecycle events. Callbacks run synchronously after the operation completes. Errors in callbacks are logged, never propagated — your pipeline keeps running.

| Method | Description |
| --- | --- |
| db.on(event, callback) | Register a callback for a lifecycle event. |
| db.off(event, callback) | Remove a registered callback. |

Events

| Event | Kwargs | Fires when |
| --- | --- | --- |
| "claim_ingested" | claim_id, claim_input | After each ingest() call |
| "claim_corroborated" | content_id, count | A newly ingested claim matches an existing one |
| "source_retracted" | source_id, reason, claim_ids | After retract() |
| "inquiry_matched" | inquiry_id, claim_id | A newly ingested claim answers an open inquiry |

Example

def on_new_claim(claim_id, claim_input, **kw):
    print(f"New claim: {claim_id}")

def on_corroboration(content_id, count, **kw):
    print(f"Corroborated! {count} independent sources")

db.on("claim_ingested", on_new_claim)
db.on("claim_corroborated", on_corroboration)
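
The never-propagated guarantee is easy to picture: the dispatcher wraps each callback in a try/except. An illustrative sketch of that pattern, not the library's code:

```python
import logging

class EventBus:
    """Minimal on/off/emit dispatcher that never lets a callback crash the caller."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, callback):
        self._handlers.setdefault(event, []).append(callback)

    def off(self, event, callback):
        self._handlers.get(event, []).remove(callback)

    def emit(self, event, **kwargs):
        for cb in self._handlers.get(event, []):
            try:
                cb(**kwargs)
            except Exception:
                # Log and keep going; other callbacks still run
                logging.exception("callback for %r failed", event)

bus = EventBus()
seen = []
bus.on("claim_ingested", lambda claim_id, **kw: seen.append(claim_id))
bus.on("claim_ingested", lambda **kw: 1 / 0)   # buggy callback: logged, not raised
bus.emit("claim_ingested", claim_id="c-123", claim_input={})
```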

Agent Integration

Two ways for external agents to read and write Attest — choose based on your agent framework.

Local-first security model: Attest follows the SQLite deployment model — the database file is your security boundary. The MCP server and REST API are designed for localhost access and do not include authentication. For multi-user or network deployments, place them behind a reverse proxy with your auth layer (OAuth, API keys, mTLS, etc.).

MCP Server (Model Context Protocol)

For Claude Desktop, Claude Code, and any MCP-compatible agent. Ships as a CLI tool.

$ pip install attestdb[mcp]
$ ATTEST_DB_PATH=my.db attest-mcp

Exposes 26 tools (ingest_claim, query_entity, search_entities, knowledge_health, retract_source, attest_impact, attest_blindspots, attest_consensus, attest_investigate, etc.) and 2 resources (attest://entities, attest://schema) over stdio transport.

Claude Desktop configuration

{
  "mcpServers": {
    "attest": {
      "command": "attest-mcp",
      "env": {
        "ATTEST_DB_PATH": "my_knowledge.db"
      }
    }
  }
}

REST API

For web-based agents, custom integrations, or any HTTP client.

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/claims | Ingest a single claim |
| POST | /api/v1/claims/batch | Bulk-ingest claims |
| POST | /api/v1/claims/text | Extract claims from text |
| GET | /api/v1/entities | List entities |
| GET | /api/v1/entities/{id} | Get entity summary |
| GET | /api/v1/entities/{id}/claims | Claims about an entity |
| GET | /api/v1/entities/{id}/context | Full context frame |
| GET | /api/v1/paths/{a}/{b} | Find paths between entities |
| POST | /api/v1/retract | Retract a source |
| GET | /api/v1/schema | Schema descriptor |
| GET | /api/v1/stats | Database statistics |
| GET | /api/v1/health | Knowledge health metrics |
| GET | /api/v1/quality | Quality report |
| GET | /api/v1/insights/bridges | Bridge predictions |
| GET | /api/v1/insights/gaps | Confidence alerts |

Example

# Ingest a claim via REST
curl -X POST http://localhost:8877/api/v1/claims \
  -H "Content-Type: application/json" \
  -d '{
    "subject": ["api-gateway", "service"],
    "predicate": ["depends_on", "dependency"],
    "object": ["redis", "service"],
    "source_type": "k8s_manifest",
    "source_id": "deploy/prod"
  }'

# Query an entity
curl http://localhost:8877/api/v1/entities/redis/context

# Check knowledge health
curl http://localhost:8877/api/v1/health
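
The same calls can be made from Python with only the standard library. A small helper sketch assuming the default localhost port used above (the helper names are illustrative):

```python
import json
import urllib.request

BASE = "http://localhost:8877/api/v1"

def build_claim_payload(subject, predicate, object_, source_type, source_id):
    """Assemble the JSON body expected by POST /api/v1/claims."""
    return {
        "subject": list(subject),
        "predicate": list(predicate),
        "object": list(object_),
        "source_type": source_type,
        "source_id": source_id,
    }

def ingest_claim(**kwargs):
    """POST a claim; assumes the REST server is running locally."""
    body = json.dumps(build_claim_payload(**kwargs)).encode()
    req = urllib.request.Request(
        f"{BASE}/claims", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```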

LLM Providers

Set the environment variable for your provider, then configure:

| Provider | Environment Variable | Configure |
| --- | --- | --- |
| Gemini (recommended) | GOOGLE_API_KEY | db.configure_curator("gemini") |
| Together | TOGETHER_API_KEY | db.configure_curator("together") |
| OpenAI | OPENAI_API_KEY | db.configure_curator("openai") |
| DeepSeek | DEEPSEEK_API_KEY | db.configure_curator("deepseek") |
| Grok | GROK_API_KEY | db.configure_curator("grok") |
| OpenRouter | OPENROUTER_API_KEY | db.configure_curator("openrouter") |
| Groq | GROQ_API_KEY | db.configure_curator("groq") (currently unavailable) |
| Anthropic | ANTHROPIC_API_KEY | db.configure_curator("anthropic") |
| GLM | GLM_API_KEY | db.configure_curator("glm") |

No API key? Use "heuristic" mode — it works entirely offline.
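
A common pattern is to pick whichever provider has a key available and fall back to offline heuristic mode. A sketch (the preference order here is arbitrary, not a recommendation):

```python
import os

PROVIDERS = [
    ("gemini", "GOOGLE_API_KEY"),
    ("together", "TOGETHER_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
]

def pick_curator(env=os.environ):
    """Return the first provider whose API key is set, else offline mode."""
    for name, key in PROVIDERS:
        if env.get(key):
            return name
    return "heuristic"

# db.configure_curator(pick_curator())
```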

Data Types

Adding a claim

db.ingest(
    subject=("name", "type"),          # e.g. ("api-gateway", "service")
    predicate=("relationship", "class"), # e.g. ("depends_on", "depends_on")
    object=("name", "type"),             # e.g. ("redis", "service")
    provenance={
        "source_type": "...",           # What kind of source
        "source_id": "...",             # Identifies the specific source
    },
    confidence=0.9,                    # 0.0 to 1.0 (optional)
    payload={...},                     # Any structured data (optional)
)

Query result

frame = db.query("redis")
frame.focal_entity           # EntitySummary: name, type, claim_count
frame.claim_count            # Number of claims about it
frame.direct_relationships   # list[Relationship]: predicate, target, confidence
frame.narrative              # Human-readable summary
frame.contradictions         # list[Contradiction]: conflicting claims
frame.confidence_range       # tuple[float, float]: min and max confidence
frame.topic_membership       # list[str]: community IDs (if topology computed)

Batch input

from attestdb import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "depends_on"),
        object=("redis", "service"),
        provenance={"source_type": "config_management", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)