Autodidact - Attest

How It Works

Autodidact runs a continuous loop: detect gaps in your knowledge graph, generate research questions, search evidence sources, extract structured claims via LLM, and ingest validated findings. Each cycle closes blindspots and raises confidence on under-evidenced entities.

1. Gap Detection

Scans for single-source entities, low-confidence claims, unresolved questions, and entities with no incoming evidence. Uses blindspots() and inquiry tracking.

2. Research Questions

Generates targeted questions for each gap. Prioritizes by entity importance (claim count, centrality) and gap severity (single-source vs. low-confidence).

3. Evidence Search

Queries configured sources - PubMed, Semantic Scholar, Perplexity, Serper, or custom search functions. Each source returns documents with full provenance.

4. Claim Extraction

An LLM extracts structured claims from the evidence documents. Optional curator validation filters out low-quality extractions before ingestion.

5. Validated Ingestion

Accepted claims are ingested with their full provenance chain. Negative results are recorded so the same dead ends aren’t explored again.
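
The five steps above can be sketched as a single pass over a toy in-memory claim list. This is an illustrative outline under stated assumptions, not AttestDB's implementation: `detect_gaps`, `run_cycle`, the dict-based claims, the 0.5 confidence threshold, and the stubbed `search` and `extract` callables are all hypothetical names and values introduced here.

```python
from collections import Counter

def detect_gaps(claims, min_confidence=0.5):
    """Map each under-evidenced entity to a gap type (hypothetical rules)."""
    sources, lowest = {}, {}
    for c in claims:
        sources.setdefault(c["entity"], set()).add(c["source"])
        lowest[c["entity"]] = min(lowest.get(c["entity"], 1.0), c["confidence"])
    gaps = {}
    for entity, srcs in sources.items():
        if len(srcs) == 1:
            gaps[entity] = "single_source"
        elif lowest[entity] < min_confidence:
            gaps[entity] = "low_confidence"
    return gaps

def run_cycle(claims, search, extract, negatives,
              max_questions=5, negative_result_limit=3):
    """One cycle: detect gaps, rank them, search, extract, ingest."""
    gaps = detect_gaps(claims)
    counts = Counter(c["entity"] for c in claims)
    # Skip entities that already produced too many consecutive dead ends.
    candidates = [e for e in gaps if negatives[e] < negative_result_limit]
    # Importance proxy: entities with more claims are researched first.
    ranked = sorted(candidates, key=lambda e: (-counts[e], e))[:max_questions]
    report = {"tasks_generated": len(ranked), "claims_ingested": 0,
              "negative_results": 0}
    for entity in ranked:
        question = f"What independent evidence exists for {entity}?"
        new_claims = [cl for doc in search(question) for cl in extract(entity, doc)]
        if not new_claims:
            negatives[entity] += 1          # remember the dead end
            report["negative_results"] += 1
            continue
        negatives.pop(entity, None)         # a hit resets the streak
        claims.extend(new_claims)
        report["claims_ingested"] += len(new_claims)
    return report

# Toy data: BRCA1 has two sources; TP53 and XYZ1 have one each.
claims = [
    {"entity": "BRCA1", "source": "paper-a", "confidence": 0.9},
    {"entity": "BRCA1", "source": "paper-b", "confidence": 0.8},
    {"entity": "TP53", "source": "paper-a", "confidence": 0.9},
    {"entity": "XYZ1", "source": "paper-c", "confidence": 0.4},
]

def fake_search(question):
    # Stand-in for an evidence source: only TP53 has anything to find.
    if "TP53" in question:
        return [{"title": "TP53 review", "text": "...",
                 "url": "https://example.org/tp53"}]
    return []

def fake_extract(entity, doc):
    # Stand-in for LLM extraction: one claim per document.
    return [{"entity": entity, "source": doc["url"], "confidence": 0.7}]

negatives = Counter()
report = run_cycle(claims, fake_search, fake_extract, negatives)
print(report)
```

In this toy run, TP53 gains a second source and drops out of the gap list, while XYZ1 accumulates a negative result and is skipped once it reaches the limit.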

Quick Start

from attestdb import AttestDB

db = AttestDB("my.attest")

# Enable with defaults: 1-hour cycle, $1/day cap, 100 LLM calls/day
status = db.enable_autodidact()
print(f"Next cycle at: {status.next_cycle_at}")

# Check what it will cost
estimate = db.autodidact_cost_estimate(cycles=24)
print(f"Estimated daily cost: ${estimate['daily']:.2f}")

# Check current status
status = db.autodidact_status()
print(f"Cycles completed: {status.cycle_count}")
print(f"Claims ingested: {status.total_claims_ingested}")
print(f"Cost today: ${status.estimated_cost_today:.2f}")

# Review history
for report in db.autodidact_history(limit=5):
    print(f"Cycle {report.cycle_number}: {report.claims_ingested} claims, "
          f"{report.negative_results} negatives, ${report.estimated_cost:.3f}")

# Trigger an immediate cycle (doesn't wait for timer)
db.autodidact_run_now()

# Disable when done
db.disable_autodidact()

Configuration

enable_autodidact() accepts fine-grained controls for budget, scope, and behavior:

Parameter	Default	Description
`interval`	`3600`	Seconds between cycles
`max_llm_calls_per_day`	`100`	Hard cap on LLM API calls per day
`max_questions_per_cycle`	`5`	Research questions generated per cycle
`max_cost_per_day`	`1.00`	USD cost cap per day. Daemon pauses when reached.
`sources`	`"auto"`	Evidence sources: `"auto"`, `"pubmed"`, `"semantic_scholar"`, `"perplexity"`, `"serper"`
`search_fn`	`None`	Custom search function `(query) → list[dict]`
`connectors`	`None`	Connector names to use as evidence sources
`gap_types`	`None`	Filter gap types: `["single_source", "low_confidence", "no_evidence"]`
`entity_types`	`None`	Restrict to entity types: `["gene", "disease", "drug"]`
`use_curator`	`True`	Run curator validation on extracted claims
`jitter`	`0.1`	Random jitter fraction on cycle interval
`negative_result_limit`	`3`	Skip entity after N consecutive negative results
`enabled_triggers`	`None`	Event triggers: `["timer", "retraction", "inquiry"]`
`trigger_cooldown`	`60.0`	Seconds between event-triggered cycles
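
To make `interval`, `jitter`, and `negative_result_limit` concrete, here is a minimal sketch of how such settings could be interpreted. It assumes jitter is applied symmetrically (the delay is scaled by a random factor between `1 - jitter` and `1 + jitter`); the source only says "random jitter fraction", so the exact formula is an assumption. `next_delay` and `should_skip` are hypothetical helpers, not library functions.

```python
import random

def next_delay(interval=3600, jitter=0.1, rng=random):
    """Seconds until the next cycle, assuming symmetric jitter.

    With the defaults this falls between 3240 and 3960 seconds.
    """
    return interval * (1 + rng.uniform(-jitter, jitter))

def should_skip(consecutive_negatives, negative_result_limit=3):
    """True once an entity has hit the consecutive-negative limit."""
    return consecutive_negatives >= negative_result_limit

rng = random.Random(42)  # seeded only to make the demo repeatable
delays = [next_delay(rng=rng) for _ in range(5)]
print([round(d) for d in delays])
print(should_skip(2), should_skip(3))
```

Jitter of this kind is commonly used so that several daemons started at the same moment do not query external sources in lockstep.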

Evidence Sources

Autodidact can search multiple evidence providers. Free sources require no API key.

Source	Access	Description
PubMed	Free	NCBI’s biomedical literature database. 36M+ abstracts. No API key required (uses E-utilities). Best for biomedical and life science domains.
Semantic Scholar	Free	AI2’s academic search engine. 200M+ papers across all domains. No API key required. Good for broad academic research.
Perplexity	API key	AI-powered web search with citations. Returns synthesized answers with source URLs. Requires `PERPLEXITY_API_KEY`.
Serper	API key	Google search API. Fast web-scale search for general knowledge. Requires `SERPER_API_KEY`.
Custom	Any	Pass a `search_fn` callable that takes a query string and returns a list of result dicts with `title`, `text`, and `url` fields.

# Use a free source only (no API key needed)
db.enable_autodidact(sources="pubmed")

# Let autodidact choose sources based on which API keys are available
db.enable_autodidact(sources="auto")

# Custom search function
def my_search(query):
    return [{"title": "...", "text": "...", "url": "..."}]

db.enable_autodidact(search_fn=my_search)
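
As a slightly fuller illustration of the custom-source contract (a query string in, a list of dicts with `title`, `text`, and `url` out), the sketch below searches a small local corpus by keyword overlap. The corpus, the `local_search` name, the `limit` parameter, and the scoring rule are hypothetical; only the input and output shape come from the description above.

```python
import re

# Hypothetical local corpus; in practice this could be a wiki export or notes.
CORPUS = [
    {"title": "CRISPR-Cas9 off-target activity",
     "text": "Off-target cleavage was observed in cultured hepatocytes.",
     "url": "https://example.org/notes/1"},
    {"title": "Statin safety review",
     "text": "No hepatotoxicity signal was found.",
     "url": "https://example.org/notes/2"},
]

def _tokens(text):
    """Lowercase alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def local_search(query, corpus=CORPUS, limit=5):
    """Return documents sharing keywords with the query, best match first."""
    terms = {t for t in _tokens(query) if len(t) > 2}  # drop very short words
    scored = []
    for doc in corpus:
        score = len(terms & _tokens(doc["title"] + " " + doc["text"]))
        if score:
            scored.append((score, doc))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps corpus order on ties
    return [{"title": d["title"], "text": d["text"], "url": d["url"]}
            for _, d in scored[:limit]]

results = local_search("CRISPR off-target effects in hepatocytes?")
print([r["url"] for r in results])
```

A function like this would be passed as `search_fn=local_search`. Returning an empty list for unmatched queries lets the caller record a negative result rather than fail.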

Budget Controls

Autodidact will never exceed your configured cost cap. When the daily budget is exhausted, the daemon pauses until the next day. Every cycle reports its cost.

# Conservative: $0.50/day, 50 LLM calls, 30-minute cycles
db.enable_autodidact(
    interval=1800,
    max_cost_per_day=0.50,
    max_llm_calls_per_day=50,
    max_questions_per_cycle=3,
)

# Check the cost estimate (a 30-minute interval means 48 cycles per day)
estimate = db.autodidact_cost_estimate(cycles=48)
print(f"Per cycle: ${estimate['per_cycle']:.3f}")
print(f"Daily (48 cycles): ${estimate['daily']:.2f}")
print(f"Monthly: ${estimate['monthly']:.2f}")
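
The pause-and-resume behavior described above can be sketched with a small budget tracker. This is not AttestDB's internal accounting: `DailyBudget` and its methods are hypothetical, and the sketch assumes the cap is checked before each cycle using that cycle's estimated cost and LLM call count.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional

@dataclass
class DailyBudget:
    max_cost_per_day: float = 1.00
    max_llm_calls_per_day: int = 100
    day: Optional[dt.date] = None
    cost_today: float = 0.0
    calls_today: int = 0

    def _roll(self, today):
        # Reset the counters when the calendar day changes.
        if self.day != today:
            self.day, self.cost_today, self.calls_today = today, 0.0, 0

    def can_spend(self, cost, calls, today):
        """True if a cycle with this cost and call count fits under both caps."""
        self._roll(today)
        return (self.cost_today + cost <= self.max_cost_per_day
                and self.calls_today + calls <= self.max_llm_calls_per_day)

    def record(self, cost, calls, today):
        self._roll(today)
        self.cost_today += cost
        self.calls_today += calls

budget = DailyBudget(max_cost_per_day=0.50, max_llm_calls_per_day=50)
day1 = dt.date(2024, 1, 1)
cycles_run = 0
for _ in range(5):                       # five cycles are attempted
    if not budget.can_spend(0.20, 20, day1):
        break                            # paused until the next day
    budget.record(0.20, 20, day1)
    cycles_run += 1

cycles_per_day = 86400 // 1800           # a 30-minute interval gives 48 cycles
print(cycles_run, cycles_per_day)
```

Checking the cap before spending, rather than after, is what keeps the total at or below the limit in this sketch.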

Event Triggers

Beyond the timer-based cycle, autodidact can react to events in the knowledge graph:

Trigger	When it fires	What it does
`timer`	Every `interval` seconds	Standard cycle: gap detection → research → ingest
`retraction`	After `db.retract()`	Searches for replacement evidence for retracted claims
`inquiry`	After `db.register_inquiry()`	Immediately researches the registered question

# Enable all triggers
db.enable_autodidact(
    enabled_triggers=["timer", "retraction", "inquiry"],
    trigger_cooldown=60.0,  # min seconds between event-triggered cycles
)

# Register a question - triggers immediate research if inquiry trigger is enabled
db.register_inquiry("What are the off-target effects of CRISPR-Cas9 in hepatocytes?")
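
How `enabled_triggers` and `trigger_cooldown` might interact can be sketched as a simple gate. The `TriggerGate` class is hypothetical. It assumes the cooldown applies only to event triggers (`retraction`, `inquiry`) and that timer cycles are governed by `interval` alone, which matches the "seconds between event-triggered cycles" description but is not confirmed in detail by the source.

```python
class TriggerGate:
    """Decide whether a trigger should start a cycle (illustrative only)."""

    def __init__(self, enabled_triggers=("timer",), trigger_cooldown=60.0):
        self.enabled = set(enabled_triggers)
        self.cooldown = trigger_cooldown
        self.last_event_cycle = None  # time of the last event-triggered cycle

    def should_fire(self, trigger, now):
        if trigger not in self.enabled:
            return False
        if trigger == "timer":
            return True  # assumed: the timer is paced by `interval`, not the cooldown
        if (self.last_event_cycle is not None
                and now - self.last_event_cycle < self.cooldown):
            return False  # too soon after the previous event-triggered cycle
        self.last_event_cycle = now
        return True

gate = TriggerGate(["timer", "retraction", "inquiry"], trigger_cooldown=60.0)
decisions = [
    gate.should_fire("inquiry", now=0.0),      # first event: fires
    gate.should_fire("retraction", now=30.0),  # inside the cooldown: suppressed
    gate.should_fire("inquiry", now=60.0),     # cooldown elapsed: fires
]
print(decisions)
```

A cooldown like this keeps a burst of retractions or inquiries from starting a separate cycle for each event.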

Cycle Reports

Every cycle produces a CycleReport with detailed metrics:

Field	Type	Description
`cycle_number`	`int`	Sequential cycle counter
`started_at`	`float`	Unix timestamp when cycle began
`finished_at`	`float`	Unix timestamp when cycle ended
`tasks_generated`	`int`	Research questions created
`tasks_researched`	`int`	Questions that received evidence
`claims_ingested`	`int`	New claims added to the knowledge graph
`claims_rejected`	`int`	Claims filtered by curator
`negative_results`	`int`	Searches that found no relevant evidence
`llm_calls`	`int`	Total LLM API calls in this cycle
`estimated_cost`	`float`	Estimated USD cost of this cycle
`blindspot_before`	`int`	Blindspot count at cycle start
`blindspot_after`	`int`	Blindspot count at cycle end
`trigger`	`str`	`"timer"`, `"retraction"`, or `"inquiry"`
`errors`	`list`	Any errors encountered during the cycle
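
The fields above lend themselves to a few derived metrics. The sketch below uses a stand-in dataclass that mirrors the table (`CycleReportSketch` is not the library's class) plus a hypothetical `summarize` helper. The acceptance rate assumes every extracted claim ends up either ingested or rejected.

```python
from dataclasses import dataclass, field

@dataclass
class CycleReportSketch:
    """Stand-in with the same field names as the table above."""
    cycle_number: int
    started_at: float
    finished_at: float
    tasks_generated: int = 0
    tasks_researched: int = 0
    claims_ingested: int = 0
    claims_rejected: int = 0
    negative_results: int = 0
    llm_calls: int = 0
    estimated_cost: float = 0.0
    blindspot_before: int = 0
    blindspot_after: int = 0
    trigger: str = "timer"
    errors: list = field(default_factory=list)

def summarize(report):
    """Derive duration, blindspots closed, and curator acceptance rate."""
    extracted = report.claims_ingested + report.claims_rejected
    return {
        "duration_s": report.finished_at - report.started_at,
        "blindspots_closed": report.blindspot_before - report.blindspot_after,
        "acceptance_rate": report.claims_ingested / extracted if extracted else None,
    }

example = CycleReportSketch(
    cycle_number=1, started_at=100.0, finished_at=112.5,
    claims_ingested=6, claims_rejected=2,
    blindspot_before=10, blindspot_after=7,
)
summary = summarize(example)
print(summary)
```

Because the helper only reads attributes, the same function should work on any report object that exposes these field names.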

REST API

The enterprise API exposes autodidact through six endpoints. All of them require an API key with the admin role.

Endpoint	Method	Description
`/api/v1/autodidact/enable`	POST	Enable autodidact with configuration
`/api/v1/autodidact/disable`	POST	Disable autodidact
`/api/v1/autodidact/status`	GET	Current status, cycle count, cost
`/api/v1/autodidact/run-now`	POST	Trigger an immediate cycle
`/api/v1/autodidact/history`	GET	Past cycle reports
`/api/v1/autodidact/cost-estimate`	GET	Project costs for N cycles

# Enable via REST API
curl -X POST https://api.attestdb.com/api/v1/autodidact/enable \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"interval": 3600, "max_cost_per_day": 1.0}'

# Check status
curl https://api.attestdb.com/api/v1/autodidact/status \
  -H 'Authorization: Bearer YOUR_API_KEY'

# Get cost estimate for 24 cycles
curl 'https://api.attestdb.com/api/v1/autodidact/cost-estimate?cycles=24' \
  -H 'Authorization: Bearer YOUR_API_KEY'
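
The same requests can be built from Python using only the standard library. The sketch below constructs the request objects without sending them. `build_request` is a hypothetical helper, and the response format is not documented here, so parsing is left out.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.attestdb.com/api/v1/autodidact"

def build_request(method, path, api_key, payload=None, query=None):
    """Build (but do not send) an authenticated request for one endpoint."""
    url = f"{BASE_URL}/{path}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)

enable_req = build_request(
    "POST", "enable", "YOUR_API_KEY",
    payload={"interval": 3600, "max_cost_per_day": 1.0},
)
estimate_req = build_request(
    "GET", "cost-estimate", "YOUR_API_KEY", query={"cycles": 24},
)
print(enable_req.get_method(), enable_req.full_url)
print(estimate_req.get_method(), estimate_req.full_url)
```

To send a request, pass it to `urllib.request.urlopen(enable_req)` with a real key. Expect an HTTP error if the key lacks the admin role.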

MCP Tools

Autodidact is also accessible via 5 MCP tools for AI agent integration:

Tool	Description
`autodidact_enable`	Enable with configuration parameters
`autodidact_disable`	Stop the background daemon
`autodidact_status`	Current status and metrics
`autodidact_run_now`	Trigger an immediate cycle
`autodidact_history`	Past cycle reports with metrics
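
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request, so a call to one of these tools might look like the message built below. The tool names come from the table above. The argument names are an assumption: the sketch supposes they mirror the `enable_autodidact()` parameters, which the source does not state. `tool_call` is a hypothetical helper.

```python
import json

def tool_call(request_id, name, arguments=None):
    """Build a JSON-RPC 2.0 `tools/call` message for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }

# Assumed argument names, mirroring the Python configuration parameters.
enable_msg = tool_call(1, "autodidact_enable",
                       {"interval": 3600, "max_cost_per_day": 1.0})
status_msg = tool_call(2, "autodidact_status")
print(json.dumps(enable_msg, indent=2))
```

In practice an MCP client library builds and sends these messages itself; the raw form is shown only to make the tool interface concrete.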
