Setup
Open or create an Attest database. The primary entry point for all operations.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | str | — | File path for the database |
| embedding_dim | int \| None | 768 | Embedding vector dimension. None disables embedding index |
| strict | bool | False | Raise on validation warnings instead of logging |
Returns
AttestDB — database handle (also usable as context manager)
Example
import attestdb

# Open or create a database
db = attestdb.open("my_knowledge.db")

# Context manager closes automatically
with attestdb.open("my_knowledge.db") as db:
    db.ingest(...)
One-line setup: create a database, register vocabularies, and configure the curator in a single call.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | str | "attest.db" | Database file path |
| vocabs | list[str] \| None | None | Vocabularies to register: "bio", "devops", "ml" |
| curator | str | "heuristic" | Curator mode: "heuristic" or a provider name |
| embedding_dim | int \| None | None | Embedding vector dimension. None disables embedding index |
Returns
AttestDB — fully configured database handle
Example
import attestdb

db = attestdb.quickstart("bio.db", vocabs=["bio"], curator="gemini")
| Method | Description |
|---|---|
| db.configure_curator(model="heuristic", api_key=None) | Set the curator. "heuristic" (offline) or a provider name (see LLM Providers). |
| db.register_vocabulary(namespace, vocab) | Register entity types, predicates, and constraints for a domain. |
| db.register_predicate(predicate_id, constraints) | Register a single predicate with subject/object type constraints. |
| db.register_payload_schema(schema_id, schema) | Register a JSON schema for payload validation on a predicate type. |
| db.close() | Close the database. Also works as a context manager: with attestdb.open(...) as db: |
Ingestion
Add a single claim with full provenance. The atomic write operation — every claim must have a source.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| subject | tuple[str, str] | — | Entity name and type, e.g. ("redis", "service") |
| predicate | tuple[str, str] | — | Relationship and class, e.g. ("depends_on", "dependency") |
| object | tuple[str, str] | — | Target entity name and type |
| provenance | dict | — | Must include source_type and source_id |
| confidence | float \| None | None | 0.0–1.0 confidence score (auto-assigned if omitted) |
| payload | dict \| None | None | Arbitrary structured data attached to the claim |
| timestamp | int \| None | None | Unix timestamp in nanoseconds (auto-generated if omitted) |
| external_ids | dict \| None | None | External ID mappings for subject/object entities |
Returns
str — the claim_id (SHA-256 hash)
Example
claim_id = db.ingest(
    subject=("api-gateway", "service"),
    predicate=("depends_on", "dependency"),
    object=("redis", "service"),
    provenance={"source_type": "k8s_manifest", "source_id": "deploy/prod"},
    confidence=0.95,
)
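Because the claim_id is a SHA-256 hash, writes are content-addressed: the same claim re-ingested produces the same id, which is what makes duplicates detectable. A minimal sketch of how such an id might be derived (illustrative only; the exact fields and canonicalization Attest hashes are assumptions here):

```python
import hashlib
import json

def claim_id(subject, predicate, obj, provenance):
    """Hypothetical content-addressed id: hash the canonical claim fields."""
    canonical = json.dumps(
        {"s": subject, "p": predicate, "o": obj, "prov": provenance},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

a = claim_id(("api-gateway", "service"), ("depends_on", "dependency"),
             ("redis", "service"),
             {"source_type": "k8s_manifest", "source_id": "deploy/prod"})
b = claim_id(("api-gateway", "service"), ("depends_on", "dependency"),
             ("redis", "service"),
             {"source_type": "k8s_manifest", "source_id": "deploy/prod"})
assert a == b    # deterministic: re-ingesting the same claim yields the same id
print(len(a))    # 64 hex characters
```

Sorting the keys before hashing is what keeps the id stable regardless of field order.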
Bulk-ingest many claims at once. Faster than individual ingest() calls — warms caches and skips per-claim corroboration tracking.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| claims | list[ClaimInput] | — | List of claim input objects |
Returns
BatchResult — .ingested (int), .duplicates (int), .errors (list)
Example
from attestdb import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "dependency"),
        object=("redis", "service"),
        provenance={"source_type": "config", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)
print(f"Ingested {result.ingested}, skipped {result.duplicates} dupes")
Extract claims from unstructured text using LLM-powered extraction, then ingest them.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| text | str | — | Raw text to extract claims from |
| source_id | str | "" | Identifier for the text source |
| use_curator | bool | True | Triage extracted claims through the curator |
Returns
list — extracted and ingested claims
Example
db.ingest_text(
    "Redis is used as the primary cache for the API gateway. "
    "The gateway also depends on PostgreSQL for persistent storage.",
    source_id="architecture_doc_v2",
)
Extract claims from a conversation. Accepts OpenAI/Anthropic message format ([{role, content}, ...]).
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| messages | list[dict] | — | Chat messages in [{role, content}] format |
| conversation_id | str | "" | Optional conversation identifier |
| platform | str | "generic" | Platform hint: "generic", "chatgpt", "claude" |
| use_curator | bool | True | Triage extracted claims through the curator |
| extraction | str | "llm" | "llm", "heuristic", or "smart" |
Returns
ChatIngestionResult — per-turn breakdown of extracted claims
Example
messages = [
{"role": "user", "content": "What cache does our API use?"},
{"role": "assistant", "content": "The API gateway uses Redis for caching."},
]
result = db.ingest_chat(messages, extraction="heuristic")
Extract claims from a Slack workspace export ZIP. Optionally filter by channel name.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| path | str | — | Path to the Slack export ZIP file |
| bot_ids | set[str] \| None | None | Only treat these bot IDs as assistant. None = all bots |
| channels | list[str] \| None | None | Only process these channels. None = all channels |
| use_curator | bool | True | Triage extracted claims through the curator |
| extraction | str | "llm" | "llm", "heuristic", or "smart" |
Returns
list[ChatIngestionResult] — one result per channel/thread with bot interaction
Example
results = db.ingest_slack(
    "slack_export.zip",
    channels=["engineering", "incidents"],
    extraction="smart",
)
Create a connector for an external data source. Returns a Connector instance — call .run(db) to fetch and ingest. 30 connectors available: slack, teams, gmail, gdocs, gdrive, zoho, postgres, mysql, mssql, notion, confluence, sharepoint, csv, sqlite, github, jira, linear, hubspot, salesforce, zendesk, servicenow, pagerduty, http, airtable, mongodb, elasticsearch, s3, google_sheets, box, dsi.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| name | str | — | Connector name (e.g. "slack", "postgres") |
| save | bool | False | Persist credentials to encrypted token store (requires cryptography) |
| **kwargs | — | — | Connector-specific options (token, dsn, mapping, etc.) |
Returns
Connector — call .run(db) to execute
Examples
# Slack: live channel history
conn = db.connect("slack", token="xoxb-...", channels=["general"])
result = conn.run(db)

# PostgreSQL: map query columns to claims
conn = db.connect(
    "postgres",
    dsn="postgresql://user:pass@host/db",
    query="SELECT gene, relation, target FROM interactions",
    mapping={"subject": "gene", "predicate": "relation", "object": "target"},
)
result = conn.run(db)

# Notion: ingest pages as text
conn = db.connect("notion", api_key="ntn_...", save=True)
result = conn.run(db)
| Method | Description |
|---|---|
| db.ingest_chat_file(path, platform="auto", use_curator=True, extraction="llm") | Extract claims from a file: ChatGPT export ZIP, JSON conversation, or plain text. |
| db.curate(claims, agent_id="default") | Run claims through the curator before ingesting. Returns stored/skipped/flagged. |
Extraction modes
| Mode | API Key? | When to use |
|---|---|---|
"heuristic" | No | Explicit relational text ("X depends on Y"). Fast and free. |
"llm" | Yes | Nuanced or implicit relationships. Deeper understanding. |
"smart" | Yes | Large volumes. Heuristic first, LLM only for new content. Saves cost. |
Querying
Get a full picture of an entity: relationships, narrative summary, confidence scores, contradictions, and topic membership.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| focal_entity | str | — | Entity name or ID to query |
| depth | int | 2 | BFS traversal depth |
| min_confidence | float | 0.0 | Minimum confidence threshold for relationships |
| exclude_source_types | list[str] \| None | None | Source types to exclude from results |
| max_claims | int | 500 | Maximum claims to consider |
| max_tokens | int | 4000 | Token budget for narrative generation |
| llm_narrative | bool | False | Use LLM for narrative generation instead of templates |
| confidence_threshold | float | 0.0 | Hard filter on relationship confidence |
| predicate_types | list[str] \| None | None | Only include these predicate types |
Returns
ContextFrame — focal entity, relationships, claim count, narrative, confidence range, contradictions, topic membership
Example
frame = db.query("redis")
print(frame.focal_entity.name, "—", frame.claim_count, "claims")
print(frame.narrative)
for rel in frame.direct_relationships:
    print(f"  {rel.predicate} → {rel.target.name} ({rel.confidence:.0%})")
Semantic search via embeddings. Returns the top-k closest claims by vector similarity.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| query_embedding | list[float] | — | Query vector (must match embedding_dim) |
| top_k | int | 10 | Number of results to return |
Returns
list[tuple[str, float]] — list of (claim_id, distance) pairs
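Under the hood this kind of search is a nearest-neighbour problem: measure the distance from the query vector to every stored embedding and keep the k smallest. A brute-force sketch of that mechanic (illustrative; a real embedding index avoids the full scan):

```python
import math

def top_k_closest(query, embeddings, k=10):
    """Brute-force nearest neighbours: (claim_id, L2 distance), smallest first."""
    scored = [(cid, math.dist(query, vec)) for cid, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

store = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.9, 0.1]}
print(top_k_closest([1.0, 0.0], store, k=2))  # c1 first (distance 0.0), then c3
```

Smaller distance means closer, which is why results sort ascending.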
Same as query() but also returns timing and candidate counts for performance profiling.
Returns
tuple[ContextFrame, QueryProfile] — the frame plus profiling data (elapsed_ms, total_candidates, after_scoring)
Example
frame, profile = db.explain("redis")
print(f"Query took {profile.elapsed_ms:.1f}ms, {profile.total_candidates} candidates")
Find the top-k paths between two entities with per-hop edge details, sorted by total confidence.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_a | str | — | Start entity |
| entity_b | str | — | End entity |
| max_depth | int | 3 | Maximum hops to search |
| top_k | int | 5 | Number of paths to return |
Returns
list[PathResult] — each with .steps (list of PathStep), .total_confidence, .length
Example
paths = db.find_paths("api-gateway", "postgresql")
for p in paths:
    hops = " → ".join(s.entity_id for s in p.steps)
    print(f"{hops} (confidence: {p.total_confidence:.2f})")
| Method | Description |
|---|---|
| db.resolve(entity_id) | Resolve an entity name to its canonical normalized ID. |
| db.get_entity(entity_id) | Get entity summary: name, type, claim count. Returns EntitySummary \| None. |
| db.claims_for(entity_id, predicate_type=None, source_type=None, min_confidence=0.0) | Get raw claims for an entity. Filter by predicate, source, or confidence. Returns list[Claim]. |
| db.claims_by_content_id(content_id) | Get all claims about the same fact (corroboration group). Returns list[Claim]. |
| db.list_entities(entity_type=None, min_claims=0) | List all entities. Filter by type or minimum claim count. Returns list[EntitySummary]. |
| db.path_exists(entity_a, entity_b, max_depth=3) | Check if two entities are connected. Returns bool. |
| db.raw_query(query, params=None) | Escape hatch: run a raw query against the storage engine. Returns list[list]. |
| db.get_embedding(claim_id) | Retrieve the stored embedding vector for a claim. Returns list[float] \| None. |
Understanding
Health metrics for the knowledge base: single-source entities, source distribution, knowledge density, gap counts.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| stale_threshold | int | 0 | Age in days to consider an entity stale (0 = disabled) |
| expected_patterns | dict \| None | None | Expected predicate patterns for gap detection |
Returns
QualityReport — total_claims, total_entities, single_source_entity_count, avg_claims_per_entity, source_type_distribution, predicate_distribution
Example
report = db.quality_report()
print(f"{report.total_entities} entities, {report.total_claims} claims")
print(f"{report.single_source_entity_count} single-source entities")
print(f"Avg claims/entity: {report.avg_claims_per_entity:.1f}")
Quantified health score (0–100) with weighted metrics: multi-source ratio, corroboration, freshness, source diversity, and confidence trend.
Returns
KnowledgeHealth — health_score (0–100), multi_source_ratio, corroboration_ratio, freshness_score, source_diversity, confidence_trend, knowledge_density
Example
health = db.knowledge_health()
print(f"Health: {health.health_score:.0f}/100")
print(f"Multi-source ratio: {health.multi_source_ratio:.0%}")
print(f"Corroboration: {health.corroboration_ratio:.0%}")
Health score breakdown
| Metric | Weight | What it measures |
|---|---|---|
| Multi-source ratio | 30% | Fraction of entities backed by more than one source |
| Corroboration ratio | 25% | Fraction of claims independently confirmed |
| Freshness | 20% | Recency of claims (30-day half-life decay) |
| Source diversity | 15% | Number of distinct source types |
| Confidence trend | 10% | Whether confidence is improving over time |
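With the weights above, the overall score is a weighted sum of the five component metrics scaled to 0–100. A sketch of that arithmetic, including the 30-day half-life freshness decay the table describes (the component formulas are assumptions based on the table, not Attest's exact implementation):

```python
def freshness(age_days):
    """30-day half-life decay: a 30-day-old claim scores 0.5."""
    return 0.5 ** (age_days / 30)

def health_score(multi_source, corroboration, freshness_score, diversity, trend):
    """Weighted sum of five 0-1 metrics, scaled to 0-100 (weights from the table)."""
    return 100 * (
        0.30 * multi_source
        + 0.25 * corroboration
        + 0.20 * freshness_score
        + 0.15 * diversity
        + 0.10 * trend
    )

print(round(freshness(30), 2))  # 0.5 — one half-life old
print(round(health_score(0.8, 0.6, freshness(15), 0.5, 1.0)))
```

A perfect knowledge base (all metrics at 1.0) scores 100; the weights sum to 100%.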
Vocabulary-driven gap identification. Compares each entity's relationship profile against expected predicate patterns for its type.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| expected_patterns | dict[str, set[str]] | — | Map of entity_type → expected predicates, e.g. {"gene": {"associated_with", "interacts_with"}} |
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 1 | Minimum claims for an entity to be checked |
Returns
list[GapResult] — entities missing expected relationships
Example
gaps = db.find_gaps({
    "gene": {"associated_with", "interacts_with", "expressed_in"},
    "drug": {"treats", "targets"},
})
for gap in gaps:
    print(f"{gap.entity_id} missing: {gap.missing_predicate_types}")
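At its core this check is a set difference: the predicates expected for the entity's type minus the predicates actually observed on that entity. A minimal sketch of that logic (illustrative only):

```python
def missing_predicates(entity_type, observed, expected_patterns):
    """Predicates an entity of this type is expected to have but doesn't."""
    expected = expected_patterns.get(entity_type, set())
    return sorted(expected - observed)

patterns = {"gene": {"associated_with", "interacts_with", "expressed_in"}}

# A gene with only one observed relationship has two expected ones missing.
print(missing_predicates("gene", {"interacts_with"}, patterns))
# ['associated_with', 'expressed_in']
```

Entity types with no registered pattern produce no gaps, so unpatterned domains stay quiet.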
Predict potential connections between currently unlinked entities using embedding similarity and common-neighbor scoring.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 2 | Minimum claims for entities to be considered |
| max_depth | int | 3 | Maximum graph distance to search |
| top_k | int | 50 | Number of bridge predictions to return |
| max_degree | int \| None | None | Exclude high-degree hub entities |
Returns
list[BridgePrediction] — predicted connections with confidence scores
Find entities with reliability concerns: single-source dependencies, stale data, or wide confidence spreads.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_type | str \| None | None | Filter to a specific entity type |
| min_claims | int | 2 | Minimum claims for an entity to be checked |
| stale_threshold | int | 0 | Days after which data is considered stale |
| quality_spread | float | 0.3 | Max confidence range before flagging |
Returns
list[EntityConfidenceAlert] — entities with provenance or confidence issues
| Method | Description |
|---|---|
| db.schema() | What entity types, predicates, and patterns exist, with counts. Returns SchemaDescriptor. |
| db.stats() | Entity count, claim count, index size. Returns dict. |
Topology
| Method | Description |
|---|---|
| db.compute_topology(resolutions=None, min_community_size=3) | Run Leiden community detection on the claim graph. |
| db.topics(level=None) | Get topic hierarchy from last topology computation. Returns list[TopicNode]. |
| db.density_map() | Density metrics per topic: claim count, source diversity. Returns list[DensityMapEntry]. |
| db.cross_domain_bridges(top_k=20) | Find entities connecting different knowledge domains. Returns list[CrossDomainBridge]. |
| db.query_topic(topic_id) | Get all entities in a specific topic. |
| db.generate_structural_embeddings(dim=64) | SVD-based graph embeddings for all entities. Returns entity count. |
| db.generate_weighted_structural_embeddings(dim=64) | Confidence-weighted SVD graph embeddings. Returns entity count. |
| db.get_adjacency_list() | Build in-memory adjacency list from all claim edges. Returns dict[str, set[str]]. |
| db.get_weighted_adjacency() | Weighted adjacency with per-edge confidence and sources. Returns dict. |
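Adjacency structures like these underpin reachability checks such as path_exists(). The pattern, sketched in plain Python (illustrative, not Attest's internals): build an undirected adjacency list from claim edges, then breadth-first search with a depth cap.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency list from (subject, object) claim edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def path_exists(adj, start, goal, max_depth=3):
    """BFS, refusing to expand nodes beyond max_depth hops from start."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return True
        if depth < max_depth:
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return False

adj = build_adjacency([("api-gateway", "redis"), ("redis", "sentinel")])
print(path_exists(adj, "api-gateway", "sentinel"))               # True (2 hops)
print(path_exists(adj, "api-gateway", "sentinel", max_depth=1))  # False
```

The depth cap is what keeps reachability queries cheap on dense graphs.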
Provenance & Trust
Retract all claims from a source and mark anything that depended on them as degraded. The nuclear option for bad sources.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| source_id | str | — | Source to retract |
| reason | str | — | Human-readable reason for retraction |
Returns
CascadeResult — .source_retract (RetractResult), .degraded_claim_ids (list), .degraded_count (int)
Example
result = db.retract_cascade("unreliable_vendor", "Data quality issues found in audit")
print(f"Retracted {result.source_retract.retracted_count} claims")
print(f"Degraded {result.degraded_count} downstream dependents")
Time-travel: query the knowledge base as it was at a specific point in time. Returns a read-only snapshot view.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| timestamp | int | — | Unix timestamp in nanoseconds |
Returns
AttestDBSnapshot — read-only view supporting query(), claims_for(), list_entities()
Example
import time

# View knowledge base as of yesterday
yesterday = int((time.time() - 86400) * 1_000_000_000)
snapshot = db.at(yesterday)
frame = snapshot.query("redis")
print(f"Yesterday: {frame.claim_count} claims")
If this source is retracted, how many claims and entities are affected? Preview the blast radius before acting.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| source_id | str | — | Source to analyze |
Returns
ImpactReport — direct_claims, downstream_claims, affected_entities, claim_ids
Example
report = db.impact("vendor_api_v2")
print(f"Direct: {report.direct_claims}, Downstream: {report.downstream_claims}")
print(f"Affects {len(report.affected_entities)} entities")
| Method | Description |
|---|---|
| db.retract(source_id, reason) | Mark all claims from a source as retracted (tombstoned). Returns RetractResult. |
| db.trace_downstream(claim_id) | See what claims depend on a specific claim. Returns DownstreamNode tree. |
| db.audit(claim_id) | Full provenance chain: who said it, corroborators, dependents. Returns AuditTrail. |
| db.drift(days=30) | How has knowledge changed? New claims, new entities, retracted sources. Returns DriftReport. |
Intelligence
Methods that answer questions only a provenance-tracking database can answer.
Find entities backed by only a single source, knowledge gaps, and low-confidence areas.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| min_claims | int | 5 | Minimum claims for an entity to be flagged as single-source |
Returns
BlindspotMap — single_source_entities, knowledge_gaps, low_confidence_areas
Example
blind = db.blindspots()
print(f"{len(blind.single_source_entities)} entities rely on a single source")
for entity in blind.single_source_entities[:5]:
    print(f"  {entity}")
How many independent sources agree about an entity? Returns agreement ratio, claims by source, and corroborated content IDs.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| topic | str | — | Entity name to analyze consensus for |
Returns
ConsensusReport — total_claims, unique_sources, avg_confidence, agreement_ratio, claims_by_source, corroborated_content_ids
Example
report = db.consensus("redis")
print(f"Agreement: {report.agreement_ratio:.0%} across {report.unique_sources} sources")
Per-source corroboration and retraction rates. Pass a source_id for one source, or omit for all sources.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| source_id | str \| None | None | Specific source to check, or None for all |
Returns
dict — per-source metrics: total_claims, active, retracted, degraded, corroboration_rate, retraction_rate
Example
reliability = db.source_reliability()
for src, metrics in reliability.items():
    print(f"{src}: {metrics['corroboration_rate']:.0%} corroborated")
What-if analysis: would this claim corroborate existing knowledge? Does it fill a gap between known entities?
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| claim | ClaimInput | — | The hypothetical claim to test |
Returns
HypotheticalReport — would_corroborate, existing_corroborations, fills_gap, content_id, related_entities
Example
from attestdb import ClaimInput

report = db.hypothetical(ClaimInput(
    subject=("redis", "service"),
    predicate=("depends_on", "dependency"),
    object=("sentinel", "service"),
    provenance={"source_type": "test", "source_id": "test"},
))
print(f"Would corroborate: {report.would_corroborate}")
print(f"Fills gap: {report.fills_gap}")
Discover novel regulatory predictions via causal composition. Follows causal edges through intermediaries and composes predicates (inhibits + inhibits = activates). Returns predictions ranked by convergent evidence — genuine gaps first. No LLM calls. Validated at 47% precision across 4 genes (8/17 confirmed, 0 contradicted).
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_id | str | — | Entity to predict relationships for |
| max_intermediaries | int | 100 | Maximum intermediary entities to explore in BFS |
| min_paths | int | 3 | Minimum independent paths for a prediction |
| directional_only | bool | False | Exclude "regulates" — only directional predicates (activates, inhibits, etc.) |
| entity_aliases | dict | None | Entity ID alias map for cross-database dedup (from build_entity_aliases()) |
Returns
list[Prediction] — target, predicted_predicate, supporting_paths, opposing_paths, consensus, is_gap, evidence
Example
predictions = db.predict("gene_7157")  # TP53
for p in predictions[:5]:
    print(f"{p.predicted_predicate} -> {p.target}")
    print(f"  {p.supporting_paths} supporting, gap={p.is_gap}")
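The causal composition described above behaves like sign multiplication along a path. A toy sketch of composing predicates hop by hop (only the inhibits + inhibits = activates rule comes from the text; the rest of the sign mapping is an assumption for illustration):

```python
# Map directional predicates to signs; composing a path multiplies the signs.
SIGN = {"activates": +1, "inhibits": -1}
NAME = {+1: "activates", -1: "inhibits"}

def compose(path_predicates):
    """Composed predicate for a causal path, e.g. inhibits + inhibits -> activates."""
    sign = 1
    for pred in path_predicates:
        sign *= SIGN[pred]
    return NAME[sign]

print(compose(["inhibits", "inhibits"]))               # activates
print(compose(["activates", "inhibits", "inhibits"]))  # activates
```

Two suppressions cancel out, which is why an inhibitor of an inhibitor is predicted to activate.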
Test a hypothesis against the knowledge graph. Returns causal evidence for/against with multi-hop composition paths, contradiction detection, gap analysis, and follow-up suggestions. No LLM calls.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| subject | tuple | — | (entity_id, entity_type) |
| predicate | tuple | — | (predicate_id, predicate_type) |
| object | tuple | — | (entity_id, entity_type) |
| confidence | float | 0.6 | Hypothetical confidence level |
Returns
SandboxVerdict — verdict (supported/contradicted/plausible/insufficient_data), confidence_score, direct + indirect evidence, gaps, follow-ups
Example
verdict = db.what_if(
    ("gene_940", "gene"),
    ("upregulates", "relation"),
    ("gene_29126", "gene"),
)
print(verdict.verdict)      # "plausible"
print(verdict.explanation)  # "12 causal path(s) supporting"
| Method | Description |
|---|---|
| db.fragile(max_sources=1, min_age_days=0) | Find claims backed by few independent sources. Returns list[Claim]. |
| db.stale(days=90) | Find claims not updated within the given period. Returns list[Claim]. |
Reason
Methods that generate new knowledge from graph structure — hypothesis testing, proactive discovery, and analogical reasoning.
Proactive hypothesis generation from graph structure. Three signals: bridge predictions (ensemble-scored pairs with composed predicates), cross-domain insights (topology bridge entities), and chain completion (2-hop pairs missing direct connections). Pure computation, no LLM.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| top_k | int | 10 | Maximum discoveries to return |
Returns
list[Discovery] — hypothesis, predicted_predicate, confidence, novelty_score, evidence_summary, supporting_paths, suggested_action
Example
for d in db.discover(top_k=5):
    print(d.hypothesis)
    print(f"  {d.predicted_predicate} (conf={d.confidence:.2f}, novelty={d.novelty_score:.2f})")
    print(f"  → {d.suggested_action}")
Find structural analogies: A:B :: C:D. Uses structural embeddings to find entities similar to A and B, then predicts the C:D pair and relationship.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_a | str | — | Source entity A |
| entity_b | str | — | Source entity B (connected to A) |
| top_k | int | 5 | Maximum analogies to return |
Returns
list[Analogy] — entity_a, entity_b, entity_c, entity_d, predicted_predicate, score, explanation
Example
for a in db.analogies("BRCA1", "apoptosis"):
    print(f"{a.entity_c} : {a.entity_d} (score={a.score:.2f})")
    print(f"  {a.explanation}")
Evaluate a natural-language hypothesis against the knowledge base. Parses entities, finds multi-hop evidence chains, and returns a verdict with supporting/contradicting evidence.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| hypothesis | str | — | Natural language hypothesis to test |
Returns
HypothesisVerdict — verdict (supported/contradicted/partial/unsupported), verdict_confidence, supporting_chains, contradicting_chains, confidence_gaps, suggested_next_steps
Example
verdict = db.test_hypothesis("aspirin reduces inflammation via COX-2")
print(f"{verdict.verdict} (confidence={verdict.verdict_confidence:.2f})")
for chain in verdict.supporting_chains:
    print(f"  {chain.summary}")
| Method | Description |
|---|---|
| db.evolution(entity_id, since=None) | Knowledge evolution over time: new connections, confidence changes, source diversification. Returns EvolutionReport. |
| db.trace(entity_a, entity_b, max_depth=4) | Source-overlap-discounted reasoning chains between two entities. Returns list[ReasoningChain]. |
| db.close_gaps(hypothesis=None, top_k=5) | Hypothesis-driven gap closing: test hypothesis, research confidence gaps, re-test. Returns CloseGapsReport. |
| db.suggest_investigations(top_k=10) | Unified prioritized investigation recommendations synthesized from all insight signals. Returns list[Investigation]. |
Crown Jewels
Features impossible with any other database. These exploit Attest's unique combination of timestamps, provenance, confidence, corroboration grouping, and contradiction detection.
Knowledge diff — like git diff for knowledge. Shows what beliefs formed, strengthened, weakened, or contradicted between two time periods. No other database tracks how beliefs evolve over time.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| since | str \| int | — | Start of period — ISO string or nanosecond int |
| until | str \| int \| None | None | End of period (None = now) |
Returns
KnowledgeDiff — new_beliefs, strengthened, weakened, new_contradictions, new_entities, new_sources, total_new_claims, summary
Example
diff = db.diff(since="2025-01-01")
print(diff.summary)
# "47 new beliefs; 12 strengthened; 3 new contradictions; 8 new sources"
for b in diff.new_beliefs[:5]:
    print(f"  {b.subject} {b.predicate} {b.object} (conf={b.confidence_after:.2f})")
Self-healing contradictions. Finds all opposing claims (via OPPOSITE_PREDICATES), scores evidence quality on each side (corroboration, source diversity, recency, confidence), and picks winners. Optionally ingests resolution meta-claims. No other database can reason about its own conflicts.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| top_k | int | 10 | Maximum contradictions to return |
| auto_resolve | bool | False | Ingest resolution meta-claims for clear winners |
| use_llm | bool | False | Use LLM for ambiguous cases |
Returns
ContradictionReport — total_found, resolved, ambiguous, analyses (with evidence weights and margins), claims_added
Example
report = db.resolve_contradictions(auto_resolve=True)
print(f"Found {report.total_found}, resolved {report.resolved}")
for a in report.analyses:
    print(f"  {a.subject} ↔ {a.object}: {a.resolution} (margin={a.margin:.2f})")
Counterfactual what-if analysis — compute cascading effects without modifying the database. "What if this paper is retracted? 47 claims affected, 3 drug mechanisms break." No other database can simulate scenarios on its own integrity.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| retract_source | str \| None | None | Source ID to simulate retracting |
| add_claim | ClaimInput \| None | None | Claim to simulate adding |
| remove_entity | str \| None | None | Entity to simulate removing |
Returns
SimulationReport — claims_affected, claims_removed, claims_degraded, entities_now_orphaned, connection_losses, confidence_shifts, risk_score, risk_level, summary
Example
# What if our main source is wrong?
sim = db.simulate(retract_source="paper_2024_nature")
print(f"{sim.claims_removed} claims affected, risk: {sim.risk_level}")
for loss in sim.connection_losses:
    print(f"  {loss.entity_a} ↔ {loss.entity_b}: lost {loss.lost_predicates}")
Knowledge compilation — generate a structured research brief with citations, confidence levels, contradictions, and gaps. An automated literature review from the graph. No other database can produce a structured document with provenance-tracked evidence chains.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| topic | str | — | Topic to compile a brief for |
| max_entities | int | 50 | Maximum entities to include |
| use_llm | bool | False | Use LLM for narrative generation |
Returns
KnowledgeBrief — sections (title, key_findings, citations, contradictions, gaps), executive_summary, total_entities, total_claims_cited, strongest_findings, weakest_areas
Example
brief = db.compile("sickle cell treatment")
print(brief.executive_summary)
for section in brief.sections:
    print(f"\n## {section.title}")
    for f in section.key_findings:
        print(f"  • {f}")
Full provenance-traced reasoning chain between two entities. Traces the best path with source citations at every hop, flags contradictions, computes reliability, and generates a human-readable narrative.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_a | str | — | Start entity |
| entity_b | str | — | End entity |
| max_depth | int | 4 | Maximum hops to search |
| use_llm | bool | False | Use LLM for narrative |
Returns
Explanation — connected, steps (with source_summary, evidence_text), chain_confidence, narrative, alternative_paths, source_count
Example
exp = db.explain_why("aspirin", "inflammation")
print(exp.narrative)
# Connection: aspirin → inflammation (2 hops, confidence=0.72)
#   1. aspirin —[inhibits]→ cox-2 (conf=0.90) [2 source(s): paper_1, trial_5]
#   2. cox-2 —[promotes]→ inflammation (conf=0.80) [1 source(s): textbook]
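In the sample narrative, the chain confidence of 0.72 is the product of the per-hop confidences (0.90 × 0.80). A one-line sketch of that aggregation (a plain product is an assumption consistent with the numbers shown; Attest may apply additional discounting):

```python
import math

def chain_confidence(hop_confidences):
    """Multiply per-hop confidences: a chain is only as strong as all its links."""
    return math.prod(hop_confidences)

print(round(chain_confidence([0.90, 0.80]), 2))  # 0.72
```

Because each hop multiplies in a factor below 1.0, longer chains naturally score lower, which matches the intuition that indirect evidence is weaker.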
Predict next connections for an entity. Uses 2-hop structural analysis and historical growth patterns to predict which entities are most likely to become connected next.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| entity_id | str | — | Entity to forecast for |
| top_k | int | 10 | Maximum predictions |
Returns
Forecast — predictions (target_entity, predicted_predicate, score, reason, evidence_entities), growth_rate, trajectory
Example
fc = db.forecast("BRCA1")
print(f"Trajectory: {fc.trajectory}, {fc.growth_rate:.1f} connections/month")
for p in fc.predictions[:5]:
    print(f"  → {p.target_entity} via {p.predicted_predicate} (score={p.score:.2f})")
Diff two knowledge bases. Shows what each knows that the other doesn’t, shared beliefs, entity coverage gaps, and contradictions between them. No other database can structurally compare two knowledge bases.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| other | AttestDB | — | Another database to compare against |
Returns
MergeReport — self_unique_beliefs, other_unique_beliefs, shared_beliefs, conflicts, self_unique_entities, other_unique_entities, summary
Example
team_a = attestdb.open("team_a.db")
team_b = attestdb.open("team_b.db")
report = team_a.merge_report(team_b)
print(f"Team A knows {report.self_unique_beliefs} things Team B doesn't")
print(f"{len(report.conflicts)} disagreements")
Research
Close the loop: detect knowledge gaps, research answers via LLM, and ingest the results — automatically.
Plug in any external source (web search, PubMed, Slack) via the search_fn callback.
Full gap-closing loop: detect blindspots, formulate questions, research each via LLM, ingest validated claims, and measure improvement.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| max_questions | int | 20 | Max questions to generate and research |
| use_curator | bool | True | Triage discovered claims through the curator |
| search_fn | callable \| None | None | Optional fn(question) → text for external search |
Returns
InvestigationReport — questions_generated, questions_researched, claims_ingested, blindspot_before, blindspot_after
Example
report = db.investigate(max_questions=10)
print(f"Researched {report.questions_researched} questions")
print(f"Ingested {report.claims_ingested} new claims")
print(f"Blindspots: {report.blindspot_before} → {report.blindspot_after}")
Research a single question. The LLM generates structured claims, which are validated and ingested.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
question | str | — | Natural-language research question |
entity_id | str | None | None | Optional focal entity |
entity_type | str | "" | Optional entity type hint |
predicate_hint | str | "" | Optional predicate to hint at |
Returns
ResearchResult — claims_ingested, claims_rejected, inquiry_resolved, source
Example
result = db.research_question(
    "What databases does the API gateway depend on?",
    entity_id="api-gateway",
    entity_type="service",
)
print(f"Ingested {result.claims_ingested} claims")
How it works
| Step | What happens |
|---|---|
| 1. Detect | blindspots() + find_gaps() + find_confidence_alerts() identify weak areas |
| 2. Question | Each gap becomes a natural-language research question, registered as an inquiry |
| 3. Research | LLM generates structured claims for each question (or search_fn provides external text) |
| 4. Ingest | Claims are validated and ingested with source_type="llm_research" |
| 5. Resolve | Matching inquiries are auto-resolved via the inquiry_matched event |
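Step 5 can be observed from application code by subscribing to the inquiry_matched event (see Events below). A minimal sketch: the callback below simply records resolved inquiries, and the final call simulates the event for illustration — the commented lines show where the real registration and research loop would go.

```python
resolved = []

def on_match(inquiry_id, claim_id, **kw):
    # Record which open questions the research loop answered.
    resolved.append((inquiry_id, claim_id))

# With an open AttestDB handle:
# db.on("inquiry_matched", on_match)
# db.investigate(max_questions=10)

on_match("inq-1", "claim-9")  # simulated event for illustration
```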
Pluggable search
# Use any external source as the research backend
def pubmed_search(question: str) -> str:
    # Call PubMed, web search, internal wiki, etc.
    return fetch_abstracts(question)

report = db.investigate(max_questions=10, search_fn=pubmed_search)
print(f"Researched {report.questions_researched} questions")
print(f"Ingested {report.claims_ingested} new claims")
print(f"Blindspots: {report.blindspot_before} → {report.blindspot_after}")
Research questions
| Method | Description |
|---|---|
db.ingest_inquiry(question, subject, object, predicate_hint="") | Register a question you want answered. Returns inquiry claim_id. |
db.open_inquiries() | List all unanswered questions. Returns list[Claim]. |
db.check_inquiry_matches(subject_id=None, object_id=None, predicate_id=None) | Check if new claims match open questions. Returns list[str]. |
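A sketch of the inquiry workflow. The question-building helper is ordinary Python; the commented calls assume an open AttestDB handle and the method signatures listed above, and the entity names are hypothetical.

```python
def inquiry_question(subject: str, predicate: str) -> str:
    # Phrase a knowledge gap as a natural-language question.
    return f"Which entities does '{subject}' relate to via '{predicate}'?"

q = inquiry_question("api-gateway", "depends_on")

# With an open AttestDB handle:
# inquiry_id = db.ingest_inquiry(q, "api-gateway", None, predicate_hint="depends_on")
# pending = db.open_inquiries()                         # unanswered questions
# hits = db.check_inquiry_matches(subject_id="api-gateway")
```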
Autonomous self-learning (Autodidact)
Enable a background daemon that runs the detect → research → ingest loop continuously. Built-in evidence sources (PubMed, Semantic Scholar) auto-register. Paid sources (Perplexity, Serper) register when their API key is present. Dual budget caps (call count + dollar amount) prevent runaway costs.
| Method | Description |
|---|---|
db.enable_autodidact(interval=3600, max_cost_per_day=1.00, sources="auto") | Start the self-learning daemon. Runs gap detection and research on a timer. |
db.disable_autodidact() | Stop the daemon. |
db.autodidact_status() | Current status: cycles, claims ingested, cost today, budget state. Returns AutodidactStatus. |
db.autodidact_run_now() | Trigger an immediate research cycle. |
db.autodidact_cost_estimate(cycles=24) | Dry-run cost projection without executing. Returns cost breakdown dict. |
db.autodidact_history(limit=10) | Recent cycle reports with per-cycle costs. Returns list[CycleReport]. |
Example
# Enable the daemon: 30-minute cycles, $2/day cost cap
db.enable_autodidact(interval=1800, max_cost_per_day=2.00)

# Check estimated costs before committing
estimate = db.autodidact_cost_estimate(cycles=48)
print(f"Est. daily cost: ${estimate['cost_per_day_capped']}")
print(f"Est. monthly: ${estimate['cost_per_month_capped']}")

# Monitor
status = db.autodidact_status()
print(f"Cycles: {status.cycle_count}, Claims learned: {status.total_claims_ingested}")
print(f"Cost today: ${status.estimated_cost_today:.3f} / ${status.max_cost_per_day}")

# Stop when done
db.disable_autodidact()
Built-in evidence sources
| Priority | Source | Cost | API Key |
|---|---|---|---|
| 0 | Perplexity Sonar | ~$0.001/query | PERPLEXITY_API_KEY |
| 1 | PubMed (NCBI) | Free | None required |
| 2 | Semantic Scholar | Free | None required |
| 3 | Serper (Google) | ~$0.001/query | SERPER_API_KEY |
With sources="auto" (the default), free sources always register. Paid sources register
only when their API key is in the environment. Pass search_fn=my_fn to use your own source instead.
Backup & Restore
| Method | Description |
|---|---|
db.snapshot(dest_path) | Copy the database to a backup directory. Returns the destination path. |
AttestDB.restore(src_path, dest_path) | Restore a database from a snapshot. Returns an open AttestDB. |
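A minimal sketch of a timestamped backup routine. The path-building helper is plain Python; the commented calls assume an open AttestDB handle and the snapshot/restore signatures above, and the directory names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

def backup_path(base_dir: str, name: str = "attest") -> Path:
    # Timestamped destination, e.g. backups/attest-20250101T120000.db
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return Path(base_dir) / f"{name}-{stamp}.db"

# With an open AttestDB handle:
# dest = db.snapshot(str(backup_path("backups")))
# restored = AttestDB.restore(dest, "restored.db")
```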
Events
Subscribe to lifecycle events. Callbacks run synchronously after the operation completes. Errors in callbacks are logged, never propagated — your pipeline keeps running.
| Method | Description |
|---|---|
db.on(event, callback) | Register a callback for a lifecycle event. |
db.off(event, callback) | Remove a registered callback. |
Events
| Event | Kwargs | Fires when |
|---|---|---|
"claim_ingested" | claim_id, claim_input | After each ingest() call |
"claim_corroborated" | content_id, count | A newly ingested claim matches an existing one |
"source_retracted" | source_id, reason, claim_ids | After retract() |
"inquiry_matched" | inquiry_id, claim_id | A newly ingested claim answers an open inquiry |
Example
def on_new_claim(claim_id, claim_input, **kw):
    print(f"New claim: {claim_id}")

def on_corroboration(content_id, count, **kw):
    print(f"Corroborated! {count} independent sources")

db.on("claim_ingested", on_new_claim)
db.on("claim_corroborated", on_corroboration)
Agent Integration
Two ways for external agents to read and write Attest — choose based on your agent framework.
MCP Server (Model Context Protocol)
For Claude Desktop, Claude Code, and any MCP-compatible agent. Ships as a CLI tool.
$ pip install attestdb[mcp]
$ ATTEST_DB_PATH=my.db attest-mcp
Exposes 26 tools (ingest_claim, query_entity, search_entities,
knowledge_health, retract_source, attest_impact, attest_blindspots,
attest_consensus, attest_investigate, etc.) and 2 resources
(attest://entities, attest://schema) over stdio transport.
Claude Desktop configuration
{
"mcpServers": {
"attest": {
"command": "attest-mcp",
"env": {
"ATTEST_DB_PATH": "my_knowledge.db"
}
}
}
}
REST API
For web-based agents, custom integrations, or any HTTP client.
| Method | Path | Description |
|---|---|---|
POST | /api/v1/claims | Ingest a single claim |
POST | /api/v1/claims/batch | Bulk-ingest claims |
POST | /api/v1/claims/text | Extract claims from text |
GET | /api/v1/entities | List entities |
GET | /api/v1/entities/{id} | Get entity summary |
GET | /api/v1/entities/{id}/claims | Claims about an entity |
GET | /api/v1/entities/{id}/context | Full context frame |
GET | /api/v1/paths/{a}/{b} | Find paths between entities |
POST | /api/v1/retract | Retract a source |
GET | /api/v1/schema | Schema descriptor |
GET | /api/v1/stats | Database statistics |
GET | /api/v1/health | Knowledge health metrics |
GET | /api/v1/quality | Quality report |
GET | /api/v1/insights/bridges | Bridge predictions |
GET | /api/v1/insights/gaps | Confidence alerts |
Example
# Ingest a claim via REST
curl -X POST http://localhost:8877/api/v1/claims \
  -H "Content-Type: application/json" \
  -d '{
    "subject": ["api-gateway", "service"],
    "predicate": ["depends_on", "dependency"],
    "object": ["redis", "service"],
    "source_type": "k8s_manifest",
    "source_id": "deploy/prod"
  }'

# Query an entity
curl http://localhost:8877/api/v1/entities/redis/context

# Check knowledge health
curl http://localhost:8877/api/v1/health
LLM Providers
Set the environment variable for your provider, then configure:
| Provider | Environment Variable | Configure |
|---|---|---|
| Gemini (recommended) | GOOGLE_API_KEY | db.configure_curator("gemini") |
| Together | TOGETHER_API_KEY | db.configure_curator("together") |
| OpenAI | OPENAI_API_KEY | db.configure_curator("openai") |
| DeepSeek | DEEPSEEK_API_KEY | db.configure_curator("deepseek") |
| Grok | GROK_API_KEY | db.configure_curator("grok") |
| OpenRouter | OPENROUTER_API_KEY | db.configure_curator("openrouter") |
| Groq | GROQ_API_KEY | db.configure_curator("groq") (currently unavailable) |
| Anthropic | ANTHROPIC_API_KEY | db.configure_curator("anthropic") |
| GLM | GLM_API_KEY | db.configure_curator("glm") |
No API key? Use "heuristic" mode — it works entirely offline.
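Provider selection can be driven by the environment. A minimal sketch, using a subset of the table above ("groq" omitted while unavailable): pick the first provider whose key is set, otherwise fall back to offline heuristic mode. The final commented line shows the intended hand-off to configure_curator.

```python
import os

# Subset of the provider table above.
PROVIDER_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "together": "TOGETHER_API_KEY",
}

def pick_curator(env=None) -> str:
    # First provider whose API key is present; else offline heuristic mode.
    env = os.environ if env is None else env
    for name, var in PROVIDER_KEYS.items():
        if env.get(var):
            return name
    return "heuristic"

# db.configure_curator(pick_curator())
```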
Data Types
Adding a claim
db.ingest(
    subject=("name", "type"),            # e.g. ("api-gateway", "service")
    predicate=("relationship", "class"), # e.g. ("depends_on", "depends_on")
    object=("name", "type"),             # e.g. ("redis", "service")
    provenance={
        "source_type": "...",  # What kind of source
        "source_id": "...",    # Identifies the specific source
    },
    confidence=0.9,   # 0.0 to 1.0 (optional)
    payload={...},    # Any structured data (optional)
)
Query result
frame = db.query("redis")
frame.focal_entity          # EntitySummary: name, type, claim_count
frame.claim_count           # Number of claims about it
frame.direct_relationships  # list[Relationship]: predicate, target, confidence
frame.narrative             # Human-readable summary
frame.contradictions        # list[Contradiction]: conflicting claims
frame.confidence_range      # tuple[float, float]: min and max confidence
frame.topic_membership      # list[str]: community IDs (if topology computed)
Batch input
from attestdb import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "depends_on"),
        object=("redis", "service"),
        provenance={"source_type": "config_management", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)