A background daemon that detects knowledge gaps, searches evidence sources, extracts claims via LLM, and ingests validated findings — all with built-in cost controls so it never runs away.
Autodidact runs a continuous loop: detect gaps in your knowledge graph, generate research questions, search evidence sources, extract structured claims via LLM, and ingest validated findings. Each cycle closes blindspots and raises confidence on under-evidenced entities.
Scans for single-source entities, low-confidence claims, unresolved questions, and entities with no incoming evidence. Uses `blindspots()` and inquiry tracking.
Generates targeted questions for each gap. Prioritizes by entity importance (claim count, centrality) and gap severity (single-source vs. low-confidence).
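The prioritization described above could be sketched as a simple scoring function. This is an illustration only, not Autodidact's actual ranking: the `Gap` dataclass, the severity weights, and the `priority` formula are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical severity weights; the real weighting is not documented here.
SEVERITY = {"no_evidence": 3.0, "single_source": 2.0, "low_confidence": 1.0}


@dataclass
class Gap:
    entity: str
    gap_type: str      # one of the keys in SEVERITY
    claim_count: int   # how many claims mention the entity
    centrality: float  # graph centrality, assumed to lie in [0, 1]


def priority(gap: Gap) -> float:
    """Combine entity importance with gap severity (illustrative formula)."""
    importance = gap.claim_count * (1.0 + gap.centrality)
    return importance * SEVERITY[gap.gap_type]


def top_gaps(gaps: list[Gap], max_questions: int = 5) -> list[Gap]:
    """Return the highest-priority gaps, capped like max_questions_per_cycle."""
    return sorted(gaps, key=priority, reverse=True)[:max_questions]
```

Under these assumed weights, a well-connected entity backed by a single source outranks an obscure entity with no evidence at all, because importance multiplies severity.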
Queries configured sources — PubMed, Semantic Scholar, Perplexity, Serper, or custom search functions. Each source returns documents with full provenance.
LLM extracts structured claims from evidence documents. Optional curator validation filters low-quality extractions before ingestion.
Accepted claims are ingested with full provenance chain. Negative results are recorded so the same dead ends aren’t explored again.
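Recording negative results pairs naturally with the `negative_result_limit` setting, which skips an entity after N consecutive negative results. A minimal sketch of that bookkeeping follows; the `NegativeResultTracker` class and its method names are hypothetical and do not reflect Autodidact's internals.

```python
class NegativeResultTracker:
    """Counts consecutive empty searches per entity (illustrative only)."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self._streak: dict[str, int] = {}

    def record(self, entity: str, found_evidence: bool) -> None:
        # A successful search resets the streak; an empty one extends it.
        if found_evidence:
            self._streak[entity] = 0
        else:
            self._streak[entity] = self._streak.get(entity, 0) + 1

    def should_skip(self, entity: str) -> bool:
        # Skip once the entity has hit the limit of consecutive negatives.
        return self._streak.get(entity, 0) >= self.limit
```

Counting consecutive rather than total negatives means one successful search makes the entity eligible for research again.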
```python
from attestdb import AttestDB

db = AttestDB("my.attest")

# Enable with defaults: 1-hour cycle, $1/day cap, 100 LLM calls/day
status = db.enable_autodidact()
print(f"Next cycle at: {status.next_cycle_at}")

# Check what it will cost
estimate = db.autodidact_cost_estimate(cycles=24)
print(f"Estimated daily cost: ${estimate['daily']:.2f}")

# Check current status
status = db.autodidact_status()
print(f"Cycles completed: {status.cycle_count}")
print(f"Claims ingested: {status.total_claims_ingested}")
print(f"Cost today: ${status.estimated_cost_today:.2f}")

# Review history
for report in db.autodidact_history(limit=5):
    print(f"Cycle {report.cycle_number}: {report.claims_ingested} claims, "
          f"{report.negative_results} negatives, ${report.estimated_cost:.3f}")

# Trigger an immediate cycle (doesn't wait for timer)
db.autodidact_run_now()

# Disable when done
db.disable_autodidact()
```
`enable_autodidact()` accepts fine-grained controls for budget, scope, and behavior:
| Parameter | Default | Description |
|---|---|---|
| `interval` | `3600` | Seconds between cycles |
| `max_llm_calls_per_day` | `100` | Hard cap on LLM API calls per day |
| `max_questions_per_cycle` | `5` | Research questions generated per cycle |
| `max_cost_per_day` | `1.00` | USD cost cap per day. Daemon pauses when reached. |
| `sources` | `"auto"` | Evidence sources: `"auto"`, `"pubmed"`, `"semantic_scholar"`, `"perplexity"`, `"serper"` |
| `search_fn` | `None` | Custom search function `(query) → list[dict]` |
| `connectors` | `None` | Connector names to use as evidence sources |
| `gap_types` | `None` | Filter gap types: `["single_source", "low_confidence", "no_evidence"]` |
| `entity_types` | `None` | Restrict to entity types: `["gene", "disease", "drug"]` |
| `use_curator` | `True` | Run curator validation on extracted claims |
| `jitter` | `0.1` | Random jitter fraction on cycle interval |
| `negative_result_limit` | `3` | Skip entity after N consecutive negative results |
| `enabled_triggers` | `None` | Event triggers: `["timer", "retraction", "inquiry"]` |
| `trigger_cooldown` | `60.0` | Seconds between event-triggered cycles |
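To make the `jitter` parameter concrete, the sketch below shows one plausible way a jitter fraction could spread cycle start times. It assumes symmetric jitter around `interval`; the documentation does not specify the exact distribution, and `next_delay` is a hypothetical helper.

```python
import random


def next_delay(interval: float = 3600, jitter: float = 0.1, rng=random) -> float:
    """Return the wait before the next cycle, spread by +/- jitter * interval.

    Assumes symmetric, uniformly distributed jitter (an assumption, not
    documented behavior).
    """
    return interval * (1.0 + rng.uniform(-jitter, jitter))
```

With the defaults, each delay falls between 3240 and 3960 seconds, which keeps several daemons from hitting the same evidence source at the same moment.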
Autodidact can search multiple evidence providers. Free sources require no API key.
NCBI’s biomedical literature database. 36M+ abstracts. No API key required (uses E-utilities). Best for biomedical and life science domains.
AI2’s academic search engine. 200M+ papers across all domains. No API key required. Good for broad academic research.
AI-powered web search with citations. Returns synthesized answers with source URLs. Requires `PERPLEXITY_API_KEY`.
Google search API. Fast web-scale search for general knowledge. Requires `SERPER_API_KEY`.
Pass a `search_fn` callable that takes a query string and returns a list of result dicts with `title`, `text`, and `url` fields.
```python
# Use free sources only (no API key needed)
db.enable_autodidact(sources="pubmed")

# Use multiple sources
db.enable_autodidact(sources="auto")  # auto-detects available API keys

# Custom search function
def my_search(query):
    return [{"title": "...", "text": "...", "url": "..."}]

db.enable_autodidact(search_fn=my_search)
```
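As a slightly fuller illustration of the `search_fn` contract, here is a sketch of a custom search function over a small in-memory corpus. The `CORPUS` contents, the `example.com` URLs, and the keyword-overlap matching are all made up for the example; only the return shape (a list of dicts with `title`, `text`, and `url`) comes from the description above.

```python
# Hypothetical in-memory corpus; in practice this might be a wiki or document store.
CORPUS = [
    {
        "title": "Cas9 specificity notes",
        "text": "Off-target cleavage observed in hepatocytes.",
        "url": "https://example.com/notes/1",
    },
    {
        "title": "Lab onboarding",
        "text": "Where to find the pipettes.",
        "url": "https://example.com/notes/2",
    },
]


def local_search(query: str) -> list[dict]:
    """Return corpus entries that share at least one word with the query."""
    terms = {t.lower() for t in query.split()}
    results = []
    for doc in CORPUS:
        combined = f"{doc['title']} {doc['text']}"
        words = {w.strip(".,").lower() for w in combined.split()}
        if terms & words:
            # Return exactly the fields the search_fn contract asks for.
            results.append(
                {"title": doc["title"], "text": doc["text"], "url": doc["url"]}
            )
    return results
```

A function like this would be passed as `db.enable_autodidact(search_fn=local_search)`.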
Autodidact will never exceed your configured cost cap. When the daily budget is exhausted, the daemon pauses until the next day. Every cycle reports its cost.
```python
# Conservative: $0.50/day, 50 LLM calls, 30-minute cycles
db.enable_autodidact(
    interval=1800,
    max_cost_per_day=0.50,
    max_llm_calls_per_day=50,
    max_questions_per_cycle=3,
)

# Check cost estimate before enabling
estimate = db.autodidact_cost_estimate(cycles=24)
print(f"Per cycle: ${estimate['per_cycle']:.3f}")
print(f"Daily (24 cycles): ${estimate['daily']:.2f}")
print(f"Monthly: ${estimate['monthly']:.2f}")
```
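The interplay of the two daily caps can be pictured with a small budget guard. This is a sketch under stated assumptions, not the daemon's actual accounting: `DailyBudget` and its methods are hypothetical, and it assumes each LLM call is checked against both caps before it is made.

```python
from dataclasses import dataclass


@dataclass
class DailyBudget:
    max_cost_per_day: float = 1.00
    max_llm_calls_per_day: int = 100
    cost_today: float = 0.0
    calls_today: int = 0

    def can_spend(self, estimated_cost: float) -> bool:
        """True if one more LLM call of this cost stays within both caps."""
        within_calls = self.calls_today + 1 <= self.max_llm_calls_per_day
        within_cost = self.cost_today + estimated_cost <= self.max_cost_per_day
        return within_calls and within_cost

    def record(self, cost: float) -> None:
        """Account for a completed LLM call."""
        self.calls_today += 1
        self.cost_today += cost

    def reset(self) -> None:
        """Start a new day with fresh counters."""
        self.cost_today = 0.0
        self.calls_today = 0
```

Checking before spending, rather than after, is what lets a cap act as a hard limit instead of a soft target.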
Beyond the timer-based cycle, autodidact can react to events in the knowledge graph:
| Trigger | When it fires | What it does |
|---|---|---|
| `timer` | Every `interval` seconds | Standard cycle: gap detection → research → ingest |
| `retraction` | After `db.retract()` | Searches for replacement evidence for retracted claims |
| `inquiry` | After `db.register_inquiry()` | Immediately researches the registered question |
```python
# Enable all triggers
db.enable_autodidact(
    enabled_triggers=["timer", "retraction", "inquiry"],
    trigger_cooldown=60.0,  # min seconds between event-triggered cycles
)

# Register a question; this triggers immediate research if the inquiry trigger is enabled
db.register_inquiry("What are the off-target effects of CRISPR-Cas9 in hepatocytes?")
```
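The effect of `trigger_cooldown` can be sketched as a simple rate limiter. The `TriggerCooldown` class below is hypothetical, and it assumes events arriving during the cooldown are dropped; the documentation does not say whether the daemon drops or queues them.

```python
import time


class TriggerCooldown:
    """Allows at most one event-triggered cycle per cooldown window."""

    def __init__(self, cooldown: float = 60.0, clock=time.monotonic):
        self.cooldown = cooldown
        self._clock = clock       # injectable so the logic can be tested
        self._last_fired = None   # time of the last accepted event

    def try_fire(self) -> bool:
        """Return True if an event may start a cycle now."""
        now = self._clock()
        if self._last_fired is not None and now - self._last_fired < self.cooldown:
            return False  # still cooling down (assumed: the event is dropped)
        self._last_fired = now
        return True
```

A monotonic clock is used so that system clock adjustments cannot shorten or lengthen the window.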
Every cycle produces a CycleReport with detailed metrics:
| Field | Type | Description |
|---|---|---|
| `cycle_number` | int | Sequential cycle counter |
| `started_at` | float | Unix timestamp when cycle began |
| `finished_at` | float | Unix timestamp when cycle ended |
| `tasks_generated` | int | Research questions created |
| `tasks_researched` | int | Questions that received evidence |
| `claims_ingested` | int | New claims added to the knowledge graph |
| `claims_rejected` | int | Claims filtered by curator |
| `negative_results` | int | Searches that found no relevant evidence |
| `llm_calls` | int | Total LLM API calls in this cycle |
| `estimated_cost` | float | Estimated USD cost of this cycle |
| `blindspot_before` | int | Blindspot count at cycle start |
| `blindspot_after` | int | Blindspot count at cycle end |
| `trigger` | str | `"timer"`, `"retraction"`, or `"inquiry"` |
| `errors` | list | Any errors encountered during the cycle |
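These fields lend themselves to a few derived metrics. The sketch below uses a stand-in `CycleReport` dataclass that mirrors the documented fields so the example runs on its own; the `summarize` helper and the metric names it returns are assumptions, not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class CycleReport:
    """Stand-in mirroring the documented fields (not the library's class)."""
    cycle_number: int
    started_at: float
    finished_at: float
    tasks_generated: int = 0
    tasks_researched: int = 0
    claims_ingested: int = 0
    claims_rejected: int = 0
    negative_results: int = 0
    llm_calls: int = 0
    estimated_cost: float = 0.0
    blindspot_before: int = 0
    blindspot_after: int = 0
    trigger: str = "timer"
    errors: list = field(default_factory=list)


def summarize(report) -> dict:
    """Derive duration, blindspots closed, acceptance rate, and cost per claim."""
    reviewed = report.claims_ingested + report.claims_rejected
    return {
        "duration_s": report.finished_at - report.started_at,
        "blindspots_closed": report.blindspot_before - report.blindspot_after,
        "acceptance_rate": report.claims_ingested / reviewed if reviewed else None,
        "cost_per_claim": (
            report.estimated_cost / report.claims_ingested
            if report.claims_ingested
            else None
        ),
    }
```

Because `summarize` only reads attributes, it should work equally on the reports returned by `db.autodidact_history()`, assuming they expose the fields listed above.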
The enterprise API exposes autodidact via 6 endpoints. All require an API key with admin role.
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/autodidact/enable` | POST | Enable autodidact with configuration |
| `/api/v1/autodidact/disable` | POST | Disable autodidact |
| `/api/v1/autodidact/status` | GET | Current status, cycle count, cost |
| `/api/v1/autodidact/run-now` | POST | Trigger an immediate cycle |
| `/api/v1/autodidact/history` | GET | Past cycle reports |
| `/api/v1/autodidact/cost-estimate` | GET | Project costs for N cycles |
```bash
# Enable via REST API
curl -X POST https://api.attestdb.com/api/v1/autodidact/enable \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"interval": 3600, "max_cost_per_day": 1.0}'

# Check status
curl https://api.attestdb.com/api/v1/autodidact/status \
  -H 'Authorization: Bearer YOUR_API_KEY'

# Get cost estimate for 24 cycles
curl 'https://api.attestdb.com/api/v1/autodidact/cost-estimate?cycles=24' \
  -H 'Authorization: Bearer YOUR_API_KEY'
```
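The same enable call can be built from Python using only the standard library. This sketch constructs the request without sending it; `build_enable_request` is a hypothetical helper, and the response format is not documented here, so no response parsing is shown.

```python
import json
import urllib.request

BASE_URL = "https://api.attestdb.com/api/v1/autodidact"


def build_enable_request(api_key: str, **config) -> urllib.request.Request:
    """Build (but do not send) the POST request that enables autodidact."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/enable",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_enable_request("YOUR_API_KEY", interval=3600, max_cost_per_day=1.0)
# To send it: urllib.request.urlopen(req)
# (requires network access and an API key with the admin role)
```

Keeping request construction separate from sending makes the payload easy to inspect before any call reaches the API.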
Autodidact is also accessible via 5 MCP tools for AI agent integration:
| Tool | Description |
|---|---|
| `autodidact_enable` | Enable with configuration parameters |
| `autodidact_disable` | Stop the background daemon |
| `autodidact_status` | Current status and metrics |
| `autodidact_run_now` | Trigger an immediate cycle |
| `autodidact_history` | Past cycle reports with metrics |