Autodidact
Autonomous Self-Learning

A background daemon that detects knowledge gaps, searches evidence sources, extracts claims via LLM, and ingests validated findings, with built-in caps on daily cost and LLM calls so spending stays bounded.

How It Works

Autodidact runs a continuous loop: detect gaps in your knowledge graph, generate research questions, search evidence sources, extract structured claims via LLM, and ingest validated findings. Each cycle aims to close blindspots and raise confidence on under-evidenced entities; searches that find nothing are recorded as negative results.

1. Gap Detection

Scans for single-source entities, low-confidence claims, unresolved questions, and entities with no incoming evidence. Uses blindspots() and inquiry tracking.
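
As a rough illustration of three of these gap categories, the sketch below classifies entities from a toy list of claims. The `Claim` tuple, the 0.5 confidence threshold, and `find_gaps` are hypothetical and not part of AttestDB; the real detector relies on blindspots() and inquiry tracking, whose internals are not described here.

```python
from collections import defaultdict
from typing import NamedTuple


class Claim(NamedTuple):
    # Hypothetical, simplified claim record used only for this sketch.
    entity: str
    source: str
    confidence: float


def find_gaps(entities, claims, min_confidence=0.5):
    """Return {entity: gap_type} for entities that need more evidence."""
    by_entity = defaultdict(list)
    for claim in claims:
        by_entity[claim.entity].append(claim)

    gaps = {}
    for entity in entities:
        entity_claims = by_entity.get(entity, [])
        if not entity_claims:
            gaps[entity] = "no_evidence"
        elif len({c.source for c in entity_claims}) == 1:
            gaps[entity] = "single_source"
        elif max(c.confidence for c in entity_claims) < min_confidence:
            gaps[entity] = "low_confidence"
    return gaps


# Toy data, made up for the example.
claims = [
    Claim("BRCA1", "source-a", 0.9),
    Claim("BRCA1", "source-b", 0.8),
    Claim("TP53", "source-c", 0.9),
    Claim("EGFR", "source-d", 0.3),
    Claim("EGFR", "source-e", 0.2),
]
gaps = find_gaps(["BRCA1", "TP53", "EGFR", "KRAS"], claims)
```

Here BRCA1 is well supported and produces no gap, while the other three entities each fall into one category.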

2. Research Questions

Generates targeted questions for each gap. Prioritizes by entity importance (claim count, centrality) and gap severity (single-source vs. low-confidence).
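
One way such a ranking could work is sketched below. The severity weights, the importance formula, and the `Gap` and `pick_questions` names are all assumptions made for illustration; the documentation only states that importance and severity both feed into the ordering, not how they are combined.

```python
from dataclasses import dataclass

# Hypothetical severity weights. Which gap type counts as more severe,
# and by how much, is an assumption of this sketch.
SEVERITY = {"no_evidence": 3.0, "single_source": 2.0, "low_confidence": 1.0}


@dataclass
class Gap:
    entity: str
    gap_type: str
    claim_count: int
    centrality: float  # assumed to be normalized to [0, 1]


def priority(gap: Gap) -> float:
    # Importance grows with both claim count and centrality; the +1 terms
    # keep entities with zero claims or zero centrality from scoring 0.
    importance = (1 + gap.claim_count) * (1.0 + gap.centrality)
    return SEVERITY[gap.gap_type] * importance


def pick_questions(gaps, max_questions_per_cycle=5):
    """Return the highest-priority gaps, capped at the per-cycle limit."""
    ranked = sorted(gaps, key=priority, reverse=True)
    return ranked[:max_questions_per_cycle]


# Toy data, made up for the example.
gaps = [
    Gap("KRAS", "single_source", claim_count=3, centrality=0.2),
    Gap("TP53", "single_source", claim_count=12, centrality=0.8),
    Gap("EGFR", "low_confidence", claim_count=20, centrality=0.5),
]
top_two = pick_questions(gaps, max_questions_per_cycle=2)
```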

3. Evidence Search

Queries configured sources — PubMed, Semantic Scholar, Perplexity, Serper, or custom search functions. Each source returns documents with full provenance.

4. Claim Extraction

LLM extracts structured claims from evidence documents. Optional curator validation filters low-quality extractions before ingestion.

5. Validated Ingestion

Accepted claims are ingested with full provenance chain. Negative results are recorded so the same dead ends aren’t explored again.
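
The dead-end bookkeeping can be pictured with a small tracker like the one below. This is a sketch, not the library's implementation: `NegativeResultTracker` is a hypothetical name, and resetting the streak when evidence is found is an assumption based on the word "consecutive" in the negative_result_limit setting.

```python
from collections import Counter


class NegativeResultTracker:
    """Counts consecutive empty searches per entity (illustrative only)."""

    def __init__(self, limit=3):
        self.limit = limit
        self._streak = Counter()

    def record(self, entity, found_evidence):
        if found_evidence:
            # Assumption: any successful search clears the streak.
            self._streak.pop(entity, None)
        else:
            self._streak[entity] += 1

    def should_skip(self, entity):
        # Counter returns 0 for entities that were never recorded.
        return self._streak[entity] >= self.limit


tracker = NegativeResultTracker(limit=3)
for _ in range(3):
    tracker.record("KRAS", found_evidence=False)
skip_kras = tracker.should_skip("KRAS")  # three misses in a row
```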

Quick Start

from attestdb import AttestDB

db = AttestDB("my.attest")

# Enable with defaults: 1-hour cycle, $1/day cap, 100 LLM calls/day
status = db.enable_autodidact()
print(f"Next cycle at: {status.next_cycle_at}")

# Check what it will cost
estimate = db.autodidact_cost_estimate(cycles=24)
print(f"Estimated daily cost: ${estimate['daily']:.2f}")

# Check current status
status = db.autodidact_status()
print(f"Cycles completed: {status.cycle_count}")
print(f"Claims ingested: {status.total_claims_ingested}")
print(f"Cost today: ${status.estimated_cost_today:.2f}")

# Review history
for report in db.autodidact_history(limit=5):
    print(f"Cycle {report.cycle_number}: {report.claims_ingested} claims, "
          f"{report.negative_results} negatives, ${report.estimated_cost:.3f}")

# Trigger an immediate cycle (doesn't wait for timer)
db.autodidact_run_now()

# Disable when done
db.disable_autodidact()

Configuration

enable_autodidact() accepts fine-grained controls for budget, scope, and behavior:

| Parameter | Default | Description |
|---|---|---|
| interval | 3600 | Seconds between cycles |
| max_llm_calls_per_day | 100 | Hard cap on LLM API calls per day |
| max_questions_per_cycle | 5 | Research questions generated per cycle |
| max_cost_per_day | 1.00 | USD cost cap per day. Daemon pauses when reached. |
| sources | "auto" | Evidence sources: "auto", "pubmed", "semantic_scholar", "perplexity", "serper" |
| search_fn | None | Custom search function: (query) → list[dict] |
| connectors | None | Connector names to use as evidence sources |
| gap_types | None | Filter gap types: ["single_source", "low_confidence", "no_evidence"] |
| entity_types | None | Restrict to entity types, e.g. ["gene", "disease", "drug"] |
| use_curator | True | Run curator validation on extracted claims |
| jitter | 0.1 | Random jitter fraction on cycle interval |
| negative_result_limit | 3 | Skip entity after N consecutive negative results |
| enabled_triggers | None | Event triggers: ["timer", "retraction", "inquiry"] |
| trigger_cooldown | 60.0 | Seconds between event-triggered cycles |
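
To make the interval and jitter settings concrete, the sketch below computes a randomized delay. It assumes the jitter is applied symmetrically and uniformly around the interval, which the table does not specify; `next_delay` is a hypothetical helper, not an AttestDB function.

```python
import random


def next_delay(interval=3600, jitter=0.1, rng=random):
    """Seconds until the next cycle, spread by +/- jitter * interval.

    Assumption: uniform, symmetric jitter. With the defaults this yields
    a delay between 3240 and 3960 seconds.
    """
    spread = interval * jitter
    return interval + rng.uniform(-spread, spread)


# A seeded generator makes the example reproducible.
rng = random.Random(0)
delays = [next_delay(rng=rng) for _ in range(1000)]
```

Jitter of this kind keeps several daemons started at the same time from hitting the evidence sources in lockstep.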

Evidence Sources

Autodidact can search multiple evidence providers. Free sources require no API key.

PubMed (Free)

NCBI’s biomedical literature database. 36M+ abstracts. No API key required (uses E-utilities). Best for biomedical and life science domains.

Semantic Scholar (Free)

AI2’s academic search engine. 200M+ papers across all domains. No API key required. Good for broad academic research.

Perplexity (API key)

AI-powered web search with citations. Returns synthesized answers with source URLs. Requires PERPLEXITY_API_KEY.

Serper (API key)

Google search API. Fast web-scale search for general knowledge. Requires SERPER_API_KEY.

Custom (Any)

Pass a search_fn callable that takes a query string and returns a list of result dicts with title, text, and url fields.

# Use free sources only (no API key needed)
db.enable_autodidact(sources="pubmed")

# Use multiple sources
db.enable_autodidact(sources="auto")  # auto-detects available API keys

# Custom search function
def my_search(query):
    return [{"title": "...", "text": "...", "url": "..."}]

db.enable_autodidact(search_fn=my_search)
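
For a slightly fuller picture of a custom source, here is a sketch of a keyword search over an in-memory corpus that returns dicts with the title, text, and url fields described above. The corpus, the scoring rule, and the `local_search` name are invented for the example; a real function would typically query an internal index or API.

```python
# Made-up documents standing in for an internal knowledge base.
DOCS = [
    {
        "title": "Off-target editing notes",
        "text": "CRISPR-Cas9 off-target edits were observed in cultured hepatocytes.",
        "url": "https://example.org/notes/1",
    },
    {
        "title": "Assay protocol",
        "text": "Standard ELISA protocol for cytokine measurement.",
        "url": "https://example.org/notes/2",
    },
]


def local_search(query, docs=DOCS, limit=5):
    """Rank documents by how often the query terms occur in them."""
    terms = [t for t in query.lower().split() if len(t) > 2]
    scored = []
    for doc in docs:
        haystack = f"{doc['title']} {doc['text']}".lower()
        score = sum(haystack.count(term) for term in terms)
        if score:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Return only the three fields the custom-source contract mentions.
    return [
        {"title": d["title"], "text": d["text"], "url": d["url"]}
        for _, d in scored[:limit]
    ]


results = local_search("CRISPR hepatocytes")
```

A function like this would be passed as search_fn=local_search in the same way as my_search above.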

Budget Controls

Autodidact will not exceed your configured daily cost cap (max_cost_per_day). When the daily budget is exhausted, the daemon pauses until the next day. Every cycle reports its estimated cost.

# Conservative: $0.50/day, 50 LLM calls, 30-minute cycles
db.enable_autodidact(
    interval=1800,
    max_cost_per_day=0.50,
    max_llm_calls_per_day=50,
    max_questions_per_cycle=3,
)

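# Check the cost estimate before enabling.
# Note: 24 cycles is one day at the default 3600 s interval;
# at interval=1800 a full day is 48 cycles.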
estimate = db.autodidact_cost_estimate(cycles=24)
print(f"Per cycle: ${estimate['per_cycle']:.3f}")
print(f"Daily (24 cycles): ${estimate['daily']:.2f}")
print(f"Monthly: ${estimate['monthly']:.2f}")
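
The pause-and-resume behavior can be sketched as a small guard that is consulted before each LLM call. This illustrates the idea only: `DailyBudget` is a hypothetical class, and checking the cap before spending (rather than after) and resetting on calendar-date change are assumptions.

```python
from datetime import date


class DailyBudget:
    """Tracks spend and call count for the current day (illustrative only)."""

    def __init__(self, max_cost_per_day=1.00, max_llm_calls_per_day=100):
        self.max_cost = max_cost_per_day
        self.max_calls = max_llm_calls_per_day
        self._day = None
        self.cost = 0.0
        self.calls = 0

    def _roll_over(self, today):
        # Reset the counters when the date changes.
        if today != self._day:
            self._day, self.cost, self.calls = today, 0.0, 0

    def can_spend(self, estimated_cost, today=None):
        """True if one more call of this cost stays within both caps."""
        self._roll_over(today or date.today())
        return (self.calls + 1 <= self.max_calls
                and self.cost + estimated_cost <= self.max_cost)

    def record(self, cost, today=None):
        self._roll_over(today or date.today())
        self.cost += cost
        self.calls += 1


budget = DailyBudget(max_cost_per_day=0.50, max_llm_calls_per_day=50)
day_one = date(2025, 1, 1)
budget.record(0.25, today=day_one)
budget.record(0.25, today=day_one)
paused = not budget.can_spend(0.25, today=day_one)            # cap reached
resumed = budget.can_spend(0.25, today=date(2025, 1, 2))      # new day
```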

Event Triggers

Beyond the timer-based cycle, Autodidact can react to events in the knowledge graph:

| Trigger | When it fires | What it does |
|---|---|---|
| timer | Every interval seconds | Standard cycle: gap detection → research → ingest |
| retraction | After db.retract() | Searches for replacement evidence for retracted claims |
| inquiry | After db.register_inquiry() | Immediately researches the registered question |

# Enable all triggers
db.enable_autodidact(
    enabled_triggers=["timer", "retraction", "inquiry"],
    trigger_cooldown=60.0,  # min seconds between event-triggered cycles
)

# Register a question — triggers immediate research if inquiry trigger is enabled
db.register_inquiry("What are the off-target effects of CRISPR-Cas9 in hepatocytes?")
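
The interaction between enabled_triggers and trigger_cooldown might look like the gate below. This is a sketch under assumptions: `TriggerGate` is a hypothetical class, events arriving during the cooldown are simply dropped here (the real daemon may queue or defer them), and timer cycles are assumed to be governed by interval rather than by the cooldown.

```python
import time


class TriggerGate:
    """Decides whether a trigger may start a cycle (illustrative only)."""

    def __init__(self, enabled_triggers=("timer",), trigger_cooldown=60.0,
                 clock=time.monotonic):
        self.enabled = set(enabled_triggers)
        self.cooldown = trigger_cooldown
        self._clock = clock
        self._last_event_run = None

    def should_run(self, trigger):
        if trigger not in self.enabled:
            return False
        if trigger == "timer":
            return True  # assumption: timer cycles ignore the cooldown
        now = self._clock()
        if (self._last_event_run is not None
                and now - self._last_event_run < self.cooldown):
            return False  # still cooling down from the last event cycle
        self._last_event_run = now
        return True


# A fake clock keeps the example deterministic.
fake_now = [0.0]
gate = TriggerGate(["timer", "inquiry"], trigger_cooldown=60.0,
                   clock=lambda: fake_now[0])
first = gate.should_run("inquiry")      # runs immediately
fake_now[0] = 30.0
second = gate.should_run("inquiry")     # within cooldown
fake_now[0] = 61.0
third = gate.should_run("inquiry")      # cooldown elapsed
```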

Cycle Reports

Every cycle produces a CycleReport with detailed metrics:

| Field | Type | Description |
|---|---|---|
| cycle_number | int | Sequential cycle counter |
| started_at | float | Unix timestamp when cycle began |
| finished_at | float | Unix timestamp when cycle ended |
| tasks_generated | int | Research questions created |
| tasks_researched | int | Questions that received evidence |
| claims_ingested | int | New claims added to the knowledge graph |
| claims_rejected | int | Claims filtered by curator |
| negative_results | int | Searches that found no relevant evidence |
| llm_calls | int | Total LLM API calls in this cycle |
| estimated_cost | float | Estimated USD cost of this cycle |
| blindspot_before | int | Blindspot count at cycle start |
| blindspot_after | int | Blindspot count at cycle end |
| trigger | str | "timer", "retraction", or "inquiry" |
| errors | list | Any errors encountered during the cycle |
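
These fields lend themselves to a few derived metrics. The helper below is a sketch that reads the documented attribute names from any report-like object; `summarize` is a hypothetical function, and treating ingested plus rejected as the total number of extracted claims is an assumption.

```python
from types import SimpleNamespace


def summarize(report):
    """Derive duration, blindspots closed, and claim ratios from a report."""
    # Assumption: every extracted claim is either ingested or rejected.
    extracted = report.claims_ingested + report.claims_rejected
    return {
        "duration_s": report.finished_at - report.started_at,
        "blindspots_closed": report.blindspot_before - report.blindspot_after,
        "acceptance_rate": (report.claims_ingested / extracted
                            if extracted else None),
        "cost_per_claim": (report.estimated_cost / report.claims_ingested
                           if report.claims_ingested else None),
    }


# Stand-in object with made-up values, used only to exercise the helper.
sample = SimpleNamespace(
    started_at=1000.0, finished_at=1042.5,
    claims_ingested=6, claims_rejected=2,
    estimated_cost=0.03,
    blindspot_before=40, blindspot_after=37,
)
summary = summarize(sample)
```

With a live database the same helper could be applied to each report returned by db.autodidact_history().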

REST API

The enterprise API exposes Autodidact via 6 endpoints. All require an API key with the admin role.

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/autodidact/enable | POST | Enable autodidact with configuration |
| /api/v1/autodidact/disable | POST | Disable autodidact |
| /api/v1/autodidact/status | GET | Current status, cycle count, cost |
| /api/v1/autodidact/run-now | POST | Trigger an immediate cycle |
| /api/v1/autodidact/history | GET | Past cycle reports |
| /api/v1/autodidact/cost-estimate | GET | Project costs for N cycles |

# Enable via REST API
curl -X POST https://api.attestdb.com/api/v1/autodidact/enable \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"interval": 3600, "max_cost_per_day": 1.0}'

# Check status
curl https://api.attestdb.com/api/v1/autodidact/status \
  -H 'Authorization: Bearer YOUR_API_KEY'

# Get cost estimate for 24 cycles
curl 'https://api.attestdb.com/api/v1/autodidact/cost-estimate?cycles=24' \
  -H 'Authorization: Bearer YOUR_API_KEY'
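
The same calls can be made from Python with only the standard library. The sketch below mirrors the curl examples above; `build_request` and `send` are hypothetical helper names, and the assumption that responses are JSON bodies is not confirmed by the endpoint table.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.attestdb.com/api/v1/autodidact"


def build_request(method, path, api_key, payload=None, query=None):
    """Build (but do not send) an authenticated request for one endpoint."""
    url = f"{BASE_URL}/{path}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request, timeout=30):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


enable_req = build_request(
    "POST", "enable", "YOUR_API_KEY",
    payload={"interval": 3600, "max_cost_per_day": 1.0},
)
estimate_req = build_request(
    "GET", "cost-estimate", "YOUR_API_KEY", query={"cycles": 24},
)
# send(enable_req) would perform the actual HTTP call.
```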

MCP Tools

Autodidact is also accessible via 5 MCP tools for AI agent integration:

| Tool | Description |
|---|---|
| autodidact_enable | Enable with configuration parameters |
| autodidact_disable | Stop the background daemon |
| autodidact_status | Current status and metrics |
| autodidact_run_now | Trigger an immediate cycle |
| autodidact_history | Past cycle reports with metrics |