Fifteen verticals, same engine

Each cookbook is a complete, runnable script. No API keys required. From sales pipelines to clinical trials, every domain runs on the same claim-native engine.

Org

Company Knowledge

Connect Slack + Gmail + Google Docs via OAuth. Ingest your company's knowledge, explore it in a graph, and ask questions - all from a browser.

Biology

Drug Repurposing

Ingest literature + ChatGPT + Slack. Bridge prediction finds candidates - 4/10 validated in published research.

DevOps

Incident Knowledge

Model architecture from K8s manifests, capture incident patterns from Slack. "What breaks if Redis goes down?" - answered with sources.

Experiment Lineage

Track models, datasets, and features with provenance. "What was tried for churn prediction?" - every answer traces to an experiment run.

Research

Autonomous Research

Detect blindspots, generate research questions, call LLMs or external search, ingest answers - knowledge heals itself.

Sales

Sales & Marketing

Prospect pipeline, competitive intel with contradictions, A/B content testing, referral chains, cross-product attribution.

Product

Product & Engineering

Service architecture, ADR contradictions, bug regressions, tech debt tracking, deploy history with rollback chains.

Incident

Incident Response

Cascading outage with competing root cause theories, SLO violations, runbook effectiveness, oncall escalation.

Finance

Finance & Risk

Contradicting VaR models, counterparty ratings from Moody’s/S&P/Fitch, limit breaches, stress test scenarios.

Legal

Legal & Compliance

Contract networks, contradicting audit findings, regulation gap detection, obligation deadlines, jurisdictional complexity.

Supply Chain

Demand forecast contradictions, expired certifications, defect recall cascades, multi-tier supplier risk.

Healthcare

Clinical Decision Support

Contradicting treatment guidelines (ADA vs NICE), drug interactions, Phase 2 vs Phase 3 trial contradictions.

Security

Threat Intelligence

APT attribution conflicts, IOC corroboration across feeds, kill chain timelines, vulnerability lifecycle tracking.

HR & Talent

Skill calibration gaps, interviewer disagreements, attrition risk signals, cross-team SPOF detection.

Academia

Research & Academia

Replication crisis modeling, citation networks, hypothesis evolution, dataset provenance with error cascades.

Consensus

Multi-LLM Consensus

Use your existing ChatGPT, Claude, and Gemini subscriptions. Cross-pollinate answers, converge on the best response. No API keys needed.

Org

Organization Knowledge Console

Your company's most valuable knowledge lives in Slack threads, email chains, and Google Docs that nobody can find six months later. This cookbook shows how to connect those sources via OAuth, ingest everything into a Attest knowledge base, and explore it from a browser dashboard.

What it covers

Install Attest and launch the Streamlit explorer
Connect Slack workspace via bot token (read-only)
Connect Gmail & Google Docs via OAuth token
Smart extraction: heuristic pre-scan, LLM only for novel content (~$0.01–0.05 for a full workspace on Gemini)
Auto-detect domain context from Slack channel names
Interactive graph visualization with pyvis (people, teams, projects, decisions, tools)
Natural-language Ask: "Who is responsible for the auth migration?" - answered with claim evidence
Quality dashboard: health score, knowledge gaps, bridge predictions, confidence alerts
Source retraction: disconnect or retract a source, corroborated facts survive

Quick start

$ pip install attestdb requests streamlit pyvis
$ python examples/quickstart.py

Add SLACK_BOT_TOKEN and GOOGLE_API_KEY to .env, then run. The quickstart connects to Slack, extracts claims, and opens localhost:8501 with the Streamlit explorer:

Ask → Type a question, get matched entities with evidence
Explore → Browse entities, filter by type, see interactive graph

How it works under the hood

import attestdb

db = attestdb.quickstart("my_company.db", vocabs=["devops"])

# Connect live Slack
conn = db.connect("slack", token="xoxb-...")
result = conn.run(db)
print(f"Ingested {result.claims_ingested} claims")

# Or ingest text directly (from an email, a doc, anything)
db.ingest_text(
    "The auth team decided to migrate from JWT to session tokens. "
    "Sarah proposed this and the platform team approved it.",
    source_id="eng-all-hands-2024-02",
)

# Query what you know
frame = db.query("auth-service", depth=2)
print(frame.narrative)

# Find who knows what
for rel in frame.direct_relationships:
    print(f"  {rel.predicate} → {rel.target.name} ({rel.n_independent_sources} sources)")

Org vocabulary

Attest includes built-in vocabularies (bio, devops, ml) and discovers new entity types and predicates automatically. The DevOps vocabulary includes types and predicates designed for company knowledge:

Entity Types	Predicate Types
`person`, `team`, `channel`, `document`, `email_thread`	`authored`, `mentioned`, `decided`, `discussed_in`
`project`, `decision`, `tool`, `process`, `meeting`	`responsible_for`, `member_of`, `reports_to`, `depends_on`
	`related_to`, `proposed`, `approved`, `uses`

Cost model

Smart extraction on Groq (free tier, ~$0.05/M tokens):

Source	Items	Smart Skip	Est. Cost
10K Slack messages	10,000	60%	$0.02
2K email threads	2,000	50%	$0.025
50 Google Docs	50	30%	$0.004
100 NL queries	100	-	$0.01
Total			$0.06

Heuristic mode (no LLM) is completely free. Good for structured text like meeting notes and project updates.

Dashboard pages

Dashboard

Stats overview: entity/claim counts, source breakdown, health score, quick actions.

Entity Explorer

Search with HTMX typeahead, filter by type and min claims, click through to detail pages.

Graph

Full-screen Cytoscape.js canvas. Node size = claims, edge thickness = confidence. Focus on any entity.

Quality

Knowledge health: multi-source %, confidence alerts, bridge predictions, knowledge gaps.

Ingestion

Pick a source, choose extraction mode, watch SSE progress. Job history with costs.

Ask

Natural-language questions with grounded answers. Evidence panel shows exact claims used.

Biology

Biomedical Research Team

A cancer biology lab uses ChatGPT and Slack daily. This cookbook captures claims from published literature, ChatGPT research sessions, and Slack channel discussions into a unified knowledge base. Then it discovers novel drug targets, finds knowledge gaps, and tracks research questions.

What it covers

Batch ingest structured claims from PubMed, PDB, Reactome, and clinical trials
Corroboration - same fact confirmed by independent sources
Extract claims from a ChatGPT research session on triple-negative breast cancer
Import Slack channels (brca-project, journal-club) from a workspace export
Unified query: BRCA1 aggregates literature + chat + Slack sources
Bridge prediction for drug repurposing candidates
Research question tracking with inquiry matching
Source retraction with provenance cascade

Key code

# One-line setup
db = attestdb.quickstart("cancer_lab.db", vocabs=["bio"], curator="heuristic")

# Ingest from multiple sources
db.ingest_batch(literature_claims)                           # PubMed, PDB, Reactome
db.ingest_chat(chatgpt_session, extraction="heuristic")      # ChatGPT conversation
db.ingest_slack("lab_slack.zip", extraction="heuristic")     # Slack workspace

# Query unified knowledge
frame = db.query("BRCA1", depth=2)
print(frame.narrative)

# Discover hidden connections
db.generate_structural_embeddings(dim=32)
bridges = db.find_bridges(top_k=10)

# Track research questions
db.ingest_inquiry(question="Can Olaparib treat breast cancer?",
    subject=("Olaparib", "compound"), object=("Breast Cancer", "disease"))

Sample output

  Knowledge base: 21 entities, 29 claims
  BRCA1: 24 claims, 23 relationships
    --[associated_with]--> Breast Cancer (disease, conf=0.91)
    --[binds]--> RAD51 (protein, conf=0.85)
    --[involved_in]--> DNA Repair Pathway (pathway, conf=0.70)
  Bridge predictions:
    olaparib <-> talazoparib (similarity=0.816)
  Total runtime: 1.98s

Copy full runnable script

#!/usr/bin/env python3
"""Biomedical Research Cookbook - No API keys required. Runs in ~2 seconds."""

from __future__ import annotations
import json, logging, os, sys, tempfile, time, zipfile
logging.disable(logging.WARNING)
import attestdb
from attestdb import ClaimInput

def section(title):
    print(f"\n{'─'*60}\n  {title}\n{'─'*60}\n")

def main():
    start = time.perf_counter()
    with tempfile.TemporaryDirectory() as tmp:
        db = attestdb.quickstart(os.path.join(tmp, "cancer_lab"), vocabs=["bio"], curator="heuristic")
        section("Ingest published literature")
        literature_claims = [
            ClaimInput(subject=("BRCA1","gene"), predicate=("associated_with","associated_with"),
                       object=("Breast Cancer","disease"), provenance={"source_type":"database_import","source_id":"pmid:20301425"}, confidence=0.95),
            ClaimInput(subject=("BRCA2","gene"), predicate=("associated_with","associated_with"),
                       object=("Breast Cancer","disease"), provenance={"source_type":"database_import","source_id":"pmid:20301425"}, confidence=0.93),
            ClaimInput(subject=("BRCA1","gene"), predicate=("associated_with","associated_with"),
                       object=("Ovarian Cancer","disease"), provenance={"source_type":"database_import","source_id":"pmid:24677121"}, confidence=0.91),
            ClaimInput(subject=("BRCA1","protein"), predicate=("binds","binds"),
                       object=("RAD51","protein"), provenance={"source_type":"experimental","source_id":"pdb:1n0w"}, confidence=0.92),
            ClaimInput(subject=("TP53","protein"), predicate=("interacts","interacts"),
                       object=("BRCA1","protein"), provenance={"source_type":"experimental","source_id":"pmid:19837678"}, confidence=0.88),
            ClaimInput(subject=("BRCA1","gene"), predicate=("involved_in","involved_in"),
                       object=("DNA Repair Pathway","pathway"), provenance={"source_type":"database_import","source_id":"reactome:R-HSA-73894"}, confidence=0.94),
            ClaimInput(subject=("RAD51","protein"), predicate=("involved_in","involved_in"),
                       object=("Homologous Recombination","pathway"), provenance={"source_type":"database_import","source_id":"reactome:R-HSA-5693532"}, confidence=0.96),
            ClaimInput(subject=("Tamoxifen","compound"), predicate=("treats","treats"),
                       object=("Breast Cancer","disease"), provenance={"source_type":"clinical_trial","source_id":"nct:00003140"}, confidence=0.97),
            ClaimInput(subject=("Olaparib","compound"), predicate=("treats","treats"),
                       object=("Ovarian Cancer","disease"), provenance={"source_type":"clinical_trial","source_id":"nct:01874353"}, confidence=0.89),
            ClaimInput(subject=("Olaparib","compound"), predicate=("inhibits","inhibits"),
                       object=("PARP1","protein"), provenance={"source_type":"experimental","source_id":"pmid:16912195"}, confidence=0.95),
            ClaimInput(subject=("PARP1","protein"), predicate=("involved_in","involved_in"),
                       object=("DNA Repair Pathway","pathway"), provenance={"source_type":"database_import","source_id":"reactome:R-HSA-73894"}, confidence=0.93),
            ClaimInput(subject=("Tamoxifen","compound"), predicate=("inhibits","inhibits"),
                       object=("Estrogen Receptor","protein"), provenance={"source_type":"experimental","source_id":"pmid:15928335"}, confidence=0.93),
            ClaimInput(subject=("Estrogen Receptor","protein"), predicate=("associated_with","associated_with"),
                       object=("Breast Cancer","disease"), provenance={"source_type":"experimental","source_id":"pmid:18202748"}, confidence=0.94),
        ]
        batch = db.ingest_batch(literature_claims)
        print(f"  Ingested {batch.ingested} claims from literature")
        # Corroboration
        db.ingest(subject=("BRCA1","gene"), predicate=("associated_with","associated_with"),
                  object=("Breast Cancer","disease"), provenance={"source_type":"database_import","source_id":"disgenet:C0006142"}, confidence=0.90)
        print(f"  Corroboration: {len(db.claims_by_content_id(db.claims_for('brca1')[0].content_id))} sources confirm BRCA1 ~ Breast Cancer")

        section("Extract from ChatGPT")
        result = db.ingest_chat([
            {"role":"user","content":"What genes are mutated in triple-negative breast cancer?"},
            {"role":"assistant","content":"BRCA1 is associated with Triple Negative Breast Cancer. TP53 is associated with Triple Negative Breast Cancer. PIK3CA is associated with Triple Negative Breast Cancer. PTEN is associated with Triple Negative Breast Cancer."},
            {"role":"user","content":"Treatment options for BRCA-mutated TNBC?"},
            {"role":"assistant","content":"Olaparib treats Triple Negative Breast Cancer. Talazoparib treats Triple Negative Breast Cancer. Talazoparib inhibits PARP1. Carboplatin treats Triple Negative Breast Cancer. Pembrolizumab treats Triple Negative Breast Cancer."},
        ], conversation_id="tnbc-research", extraction="heuristic")
        print(f"  {result.claims_ingested} claims from ChatGPT session")

        section("Import Slack channels")
        slack_zip = os.path.join(tmp, "lab_slack.zip")
        with zipfile.ZipFile(slack_zip, "w") as zf:
            zf.writestr("channels.json", json.dumps([{"id":"C1","name":"brca-project"},{"id":"C2","name":"journal-club"}]))
            zf.writestr("users.json", json.dumps([{"id":"U1","name":"sarah","profile":{"real_name":"Sarah"}}]))
            zf.writestr("brca-project/2024-01-20.json", json.dumps([
                {"type":"message","user":"U1","text":"PALB2 connection to BRCA2?","ts":"1705700000.000"},
                {"type":"message","bot_id":"B1","text":"PALB2 binds BRCA2. PALB2 is associated with Breast Cancer. PALB2 is involved in Homologous Recombination.","ts":"1705700060.000"},
            ]))
            zf.writestr("journal-club/2024-01-22.json", json.dumps([
                {"type":"message","user":"U1","text":"ATM inhibitor paper","ts":"1705900000.000"},
                {"type":"message","bot_id":"B1","text":"ATM is involved in DNA Repair Pathway. AZD0156 inhibits ATM. ATM interacts with BRCA1.","ts":"1705900060.000"},
            ]))
        slack_results = db.ingest_slack(slack_zip, extraction="heuristic")
        print(f"  {sum(r.claims_ingested for r in slack_results)} claims from Slack")

        section("Query unified knowledge")
        stats = db.stats()
        print(f"  {stats['entity_count']} entities, {stats['total_claims']} claims")
        frame = db.query("BRCA1", depth=2)
        print(f"  BRCA1: {frame.claim_count} claims, {len(frame.direct_relationships)} relationships")
        for rel in frame.direct_relationships[:8]:
            print(f"    --[{rel.predicate}]--> {rel.target.name} ({rel.target.entity_type}, conf={rel.confidence:.2f})")

        section("Discover hidden connections")
        db.generate_structural_embeddings(dim=32)
        for b in db.find_bridges(top_k=5):
            print(f"    {b.entity_a} <-> {b.entity_b} (similarity={b.similarity:.3f})")
        gaps = db.find_gaps({"gene":{"associated_with","involved_in"},"compound":{"treats","inhibits"}}, min_claims=1)
        print(f"  {len(gaps)} knowledge gaps found")

        section("Research questions")
        db.ingest_inquiry(question="Can Olaparib treat breast cancer?", subject=("Olaparib","compound"), object=("Breast Cancer","disease"), predicate_hint="treats")
        db.ingest(subject=("Olaparib","compound"), predicate=("treats","treats"), object=("Breast Cancer","disease"),
                  provenance={"source_type":"clinical_trial","source_id":"nct:02000622"}, confidence=0.88)
        print(f"  Inquiry matches: {len(db.check_inquiry_matches(subject_id='Olaparib', object_id='Breast Cancer'))}")

        section("Source retraction")
        cascade = db.retract_cascade("pmid:20301425", reason="Data fabrication")
        print(f"  Retracted: {cascade.source_retract.retracted_count} claims")
        print(f"  Downstream degraded: {cascade.degraded_count}")
        active = [c for c in db.claims_for("brca1") if c.status.name == "ACTIVE"]
        print(f"  BRCA1 still has {len(active)} active claims (corroboration preserved)")
        db.close()
    print(f"\n  Total runtime: {time.perf_counter() - start:.2f}s")
    return 0

if __name__ == "__main__":
    sys.exit(main())

DevOps

DevOps Knowledge Base

An infrastructure team models service dependencies from Kubernetes manifests, captures incident response claims from ChatGPT debugging sessions, and pulls tribal knowledge from Slack ops channels. Query "what depends on Redis?" - answered with every source that says so.

What it covers

Model architecture from K8s manifests (dependencies, ownership, monitoring)
Extract from an incident response ChatGPT session (Redis outage blast radius)
Import ops-incidents and architecture Slack channels
Dependency query: "What depends on Redis Cache?"
Impact analysis: "Who gets paged if Postgres goes down?"
Path analysis between monitoring and databases
Post-incident research question tracking

Key code

db = attestdb.quickstart("infra.db", vocabs=["devops"], curator="heuristic")

# Model architecture
db.ingest_batch(architecture_claims)     # K8s manifests, org chart, monitoring
db.ingest_chat(incident_chat, ...)       # Post-incident ChatGPT sessions
db.ingest_slack("ops_slack.zip", ...)     # Slack ops channels

# Impact analysis
redis_frame = db.query("Redis Cache", depth=2)
has_path = db.path_exists("PagerDuty Alert", "PostgreSQL", max_depth=3)

Copy full runnable script

#!/usr/bin/env python3
"""DevOps Knowledge Base Cookbook - No API keys required. Runs in ~2 seconds."""

from __future__ import annotations
import json, logging, os, sys, tempfile, time, zipfile
logging.disable(logging.WARNING)
import attestdb
from attestdb import ClaimInput

def section(title):
    print(f"\n{'─'*60}\n  {title}\n{'─'*60}\n")

def main():
    start = time.perf_counter()
    with tempfile.TemporaryDirectory() as tmp:
        db = attestdb.quickstart(os.path.join(tmp, "infra_kb"), vocabs=["devops"], curator="heuristic")

        section("Model service architecture")
        batch = db.ingest_batch([
            ClaimInput(subject=("API Gateway","service"), predicate=("depends_on","depends_on"), object=("Auth Service","service"),
                       provenance={"source_type":"config_management","source_id":"k8s:api-gateway"}, confidence=1.0),
            ClaimInput(subject=("API Gateway","service"), predicate=("depends_on","depends_on"), object=("User Service","service"),
                       provenance={"source_type":"config_management","source_id":"k8s:api-gateway"}, confidence=1.0),
            ClaimInput(subject=("API Gateway","service"), predicate=("depends_on","depends_on"), object=("Redis Cache","service"),
                       provenance={"source_type":"config_management","source_id":"k8s:api-gateway"}, confidence=1.0),
            ClaimInput(subject=("Auth Service","service"), predicate=("depends_on","depends_on"), object=("PostgreSQL","service"),
                       provenance={"source_type":"config_management","source_id":"k8s:auth-service"}, confidence=1.0),
            ClaimInput(subject=("Auth Service","service"), predicate=("depends_on","depends_on"), object=("Redis Cache","service"),
                       provenance={"source_type":"config_management","source_id":"k8s:auth-service"}, confidence=1.0),
            ClaimInput(subject=("User Service","service"), predicate=("depends_on","depends_on"), object=("PostgreSQL","service"),
                       provenance={"source_type":"config_management","source_id":"k8s:user-service"}, confidence=1.0),
            ClaimInput(subject=("Datadog Agent","service"), predicate=("monitors","monitors"), object=("API Gateway","service"),
                       provenance={"source_type":"monitoring","source_id":"datadog:config"}, confidence=0.95),
            ClaimInput(subject=("Datadog Agent","service"), predicate=("monitors","monitors"), object=("PostgreSQL","service"),
                       provenance={"source_type":"monitoring","source_id":"datadog:config"}, confidence=0.95),
            ClaimInput(subject=("PagerDuty Alert","alert"), predicate=("monitors","monitors"), object=("Redis Cache","service"),
                       provenance={"source_type":"monitoring","source_id":"pagerduty:redis-alert"}, confidence=0.9),
            ClaimInput(subject=("Platform Team","team"), predicate=("owns","owns"), object=("API Gateway","service"),
                       provenance={"source_type":"config_management","source_id":"org-chart"}, confidence=1.0),
            ClaimInput(subject=("Platform Team","team"), predicate=("owns","owns"), object=("Auth Service","service"),
                       provenance={"source_type":"config_management","source_id":"org-chart"}, confidence=1.0),
            ClaimInput(subject=("Data Team","team"), predicate=("owns","owns"), object=("PostgreSQL","service"),
                       provenance={"source_type":"config_management","source_id":"org-chart"}, confidence=1.0),
            ClaimInput(subject=("Platform Team","team"), predicate=("owns","owns"), object=("Redis Cache","service"),
                       provenance={"source_type":"config_management","source_id":"org-chart"}, confidence=1.0),
        ])
        print(f"  {batch.ingested} architecture claims")

        section("Incident response chat")
        result = db.ingest_chat([
            {"role":"user","content":"Redis Cache outage at 3am. API Gateway returning 502s. Blast radius?"},
            {"role":"assistant","content":"API Gateway depends on Redis Cache for session caching. Auth Service depends on Redis Cache for token storage. PagerDuty Alert monitors Redis Cache. The runbook RB-REDIS-001 mitigates Redis Cache outages."},
            {"role":"user","content":"How to prevent this?"},
            {"role":"assistant","content":"Redis Sentinel monitors Redis Cache for failover. Platform Team owns Redis Cache. Datadog Agent monitors API Gateway. Consider a circuit breaker in API Gateway that falls back to PostgreSQL."},
        ], conversation_id="inc-2024-redis-outage", extraction="heuristic")
        print(f"  {result.claims_ingested} claims from incident chat")

        section("Slack ops channels")
        slack_zip = os.path.join(tmp, "ops_slack.zip")
        with zipfile.ZipFile(slack_zip, "w") as zf:
            zf.writestr("channels.json", json.dumps([{"id":"C1","name":"ops-incidents"},{"id":"C2","name":"architecture"}]))
            zf.writestr("users.json", json.dumps([{"id":"U1","name":"eng","profile":{"real_name":"Engineer"}}]))
            zf.writestr("ops-incidents/2024-02-01.json", json.dumps([
                {"type":"message","user":"U1","text":"PostgreSQL connection limits again","ts":"1706800000.000"},
                {"type":"message","bot_id":"B1","text":"Auth Service depends on PostgreSQL. User Service depends on PostgreSQL. Data Team owns PostgreSQL.","ts":"1706800060.000"},
            ]))
            zf.writestr("architecture/2024-02-05.json", json.dumps([
                {"type":"message","user":"U1","text":"Redis to KeyDB migration impact?","ts":"1707100000.000"},
                {"type":"message","bot_id":"B1","text":"Redis Cache is used by API Gateway for rate limiting. Auth Service depends on Redis Cache for JWT blacklisting.","ts":"1707100060.000"},
            ]))
        slack_results = db.ingest_slack(slack_zip, extraction="heuristic")
        print(f"  {sum(r.claims_ingested for r in slack_results)} claims from Slack")

        section("What depends on Redis?")
        redis_frame = db.query("Redis Cache", depth=2)
        print(f"  Redis Cache: {redis_frame.claim_count} claims")
        for rel in redis_frame.direct_relationships:
            print(f"    {rel.predicate} <-- {rel.target.name}")

        section("Postgres blast radius")
        pg_frame = db.query("PostgreSQL", depth=2)
        deps = [r for r in pg_frame.direct_relationships if r.predicate == "depends_on"]
        print(f"  {len(deps)} services depend on PostgreSQL")
        print(f"  PagerDuty -> PostgreSQL path: {db.path_exists('PagerDuty Alert', 'PostgreSQL', max_depth=3)}")

        section("Post-incident questions")
        db.ingest_inquiry(question="Does Redis Sentinel monitor Redis Cache?",
                          subject=("Redis Sentinel","service"), object=("Redis Cache","service"), predicate_hint="monitors")
        print(f"  Open inquiries: {len(db.open_inquiries())}")

        final = db.stats()
        print(f"\n  Final: {final['entity_count']} entities, {final['total_claims']} claims")
        db.close()
    print(f"  Runtime: {time.perf_counter() - start:.2f}s")
    return 0

if __name__ == "__main__":
    sys.exit(main())

ML Experiment Tracker

An ML team tracks experiments, model comparisons, and feature engineering decisions. Register experiment results as claims with provenance, extract insights from model discussion chats, and query "what was tried for churn prediction?" - every answer traces back to its source.

What it covers

Register experiments: models, datasets, features, performance comparisons
Extract from a model comparison ChatGPT session (XGBoost vs LightGBM vs Neural Net)
Import Slack #ml-experiments channel with ensemble results
Query "What was tried for churn prediction?" - shows all models trained on dataset
Feature importance across experiments
Model lineage tracing (XGBoost V2 derived from V1)
Knowledge gap detection (models missing evaluated_on)

Key code

db = attestdb.quickstart("ml.db", vocabs=["ml"], curator="heuristic")

# Track experiments with performance payloads
db.ingest(
    subject=("XGBoost V2", "model"),
    predicate=("trained_on", "trained_on"),
    object=("Churn Dataset Q4", "dataset"),
    provenance={"source_type": "experiment_log", "source_id": "exp:003"},
    payload={"accuracy": 0.87, "auc": 0.90},
)

# Query what was tried
frame = db.query("Churn Dataset Q4", depth=2)

# Model lineage
claims = db.claims_for("xgboost v2", predicate_type="derived_from")

Copy full runnable script

#!/usr/bin/env python3
"""ML Experiment Tracker Cookbook - No API keys required. Runs in ~2 seconds."""

from __future__ import annotations
import json, logging, os, sys, tempfile, time, zipfile
logging.disable(logging.WARNING)
import attestdb
from attestdb import ClaimInput

def section(title):
    print(f"\n{'─'*60}\n  {title}\n{'─'*60}\n")

def main():
    start = time.perf_counter()
    with tempfile.TemporaryDirectory() as tmp:
        db = attestdb.quickstart(os.path.join(tmp, "ml_tracker"), vocabs=["ml"], curator="heuristic")

        section("Register experiments")
        batch = db.ingest_batch([
            ClaimInput(subject=("XGBoost V1","model"), predicate=("trained_on","trained_on"), object=("Churn Dataset Q4","dataset"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-001"}, confidence=1.0, payload={"accuracy":0.82,"auc":0.85}),
            ClaimInput(subject=("XGBoost V1","model"), predicate=("uses_feature","uses_feature"), object=("Tenure Months","feature"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-001"}, confidence=0.95),
            ClaimInput(subject=("XGBoost V1","model"), predicate=("uses_feature","uses_feature"), object=("Monthly Charges","feature"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-001"}, confidence=0.92),
            ClaimInput(subject=("XGBoost V1","model"), predicate=("uses_feature","uses_feature"), object=("Contract Type","feature"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-001"}, confidence=0.88),
            ClaimInput(subject=("Random Forest V1","model"), predicate=("trained_on","trained_on"), object=("Churn Dataset Q4","dataset"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-002"}, confidence=1.0, payload={"accuracy":0.79,"auc":0.81}),
            ClaimInput(subject=("XGBoost V1","model"), predicate=("outperforms","outperforms"), object=("Random Forest V1","model"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-compare-001"}, confidence=0.88),
            ClaimInput(subject=("XGBoost V2","model"), predicate=("trained_on","trained_on"), object=("Churn Dataset Q4","dataset"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-003"}, confidence=1.0, payload={"accuracy":0.87,"auc":0.90}),
            ClaimInput(subject=("XGBoost V2","model"), predicate=("derived_from","derived_from"), object=("XGBoost V1","model"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-003"}, confidence=1.0),
            ClaimInput(subject=("XGBoost V2","model"), predicate=("uses_feature","uses_feature"), object=("Support Tickets 30D","feature"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-003"}, confidence=0.96),
            ClaimInput(subject=("XGBoost V2","model"), predicate=("uses_feature","uses_feature"), object=("NPS Score","feature"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-003"}, confidence=0.91),
            ClaimInput(subject=("XGBoost V2","model"), predicate=("outperforms","outperforms"), object=("XGBoost V1","model"),
                       provenance={"source_type":"experiment_log","source_id":"exp:churn-compare-002"}, confidence=0.92),
            ClaimInput(subject=("Neural Net V1","model"), predicate=("trained_on","trained_on"), object=("Fraud Dataset 2024","dataset"),
                       provenance={"source_type":"experiment_log","source_id":"exp:fraud-001"}, confidence=1.0, payload={"accuracy":0.94,"f1":0.72}),
            ClaimInput(subject=("Neural Net V1","model"), predicate=("uses_feature","uses_feature"), object=("Transaction Amount","feature"),
                       provenance={"source_type":"experiment_log","source_id":"exp:fraud-001"}, confidence=0.97),
            ClaimInput(subject=("Neural Net V1","model"), predicate=("uses_feature","uses_feature"), object=("Merchant Category","feature"),
                       provenance={"source_type":"experiment_log","source_id":"exp:fraud-001"}, confidence=0.85),
            ClaimInput(subject=("Fraud Dataset 2024","dataset"), predicate=("derived_from","derived_from"), object=("Fraud Dataset 2023","dataset"),
                       provenance={"source_type":"experiment_log","source_id":"data:fraud-v2"}, confidence=1.0),
        ])
        print(f"  {batch.ingested} experiment claims")

        section("ChatGPT model discussion")
        result = db.ingest_chat([
            {"role":"user","content":"XGBoost V2 is at 0.90 AUC. Should we try LightGBM?"},
            {"role":"assistant","content":"XGBoost V2 outperforms Random Forest V1. LightGBM Churn V1 was trained on Churn Dataset Q4 with 0.88 AUC. XGBoost V2 outperforms LightGBM Churn V1. Key features: Support Tickets 30D and NPS Score."},
            {"role":"user","content":"What about fraud detection?"},
            {"role":"assistant","content":"Neural Net V1 was evaluated on Fraud Dataset 2024 with 0.72 F1. Isolation Forest V1 was trained on Fraud Dataset 2024 as baseline. Neural Net V1 outperforms Isolation Forest V1 on precision."},
        ], conversation_id="ml-review-weekly", extraction="heuristic")
        print(f"  {result.claims_ingested} claims from chat")

        section("Slack #ml-experiments")
        slack_zip = os.path.join(tmp, "ml_slack.zip")
        with zipfile.ZipFile(slack_zip, "w") as zf:
            zf.writestr("channels.json", json.dumps([{"id":"C1","name":"ml-experiments"}]))
            zf.writestr("users.json", json.dumps([{"id":"U1","name":"ds","profile":{"real_name":"Data Scientist"}}]))
            zf.writestr("ml-experiments/2024-03-01.json", json.dumps([
                {"type":"message","user":"U1","text":"Trying ensemble for churn","ts":"1709300000.000"},
                {"type":"message","bot_id":"B1","text":"Stacked Ensemble V1 was trained on Churn Dataset Q4. Stacked Ensemble V1 outperforms XGBoost V2 with 0.92 AUC. Stacked Ensemble V1 uses feature Contract Type. Stacked Ensemble V1 was derived from XGBoost V2.","ts":"1709300060.000"},
            ]))
        slack_results = db.ingest_slack(slack_zip, extraction="heuristic")
        print(f"  {sum(r.claims_ingested for r in slack_results)} claims from Slack")

        section("What was tried for churn prediction?")
        churn_frame = db.query("Churn Dataset Q4", depth=2)
        print(f"  Models trained on Churn Dataset Q4:")
        for rel in churn_frame.direct_relationships:
            if rel.predicate == "trained_on":
                print(f"    {rel.target.name}")

        section("Feature importance")
        for feat in db.list_entities(entity_type="feature"):
            claims = db.claims_for(feat.id, predicate_type="uses_feature")
            if claims:
                models = set(c.subject.display_name or c.subject.id for c in claims)
                print(f"    {feat.name}: {len(models)} model(s)")

        section("Model lineage")
        for c in db.claims_for("xgboost v2", predicate_type="derived_from"):
            print(f"    XGBoost V2 derived from {c.object.display_name or c.object.id}")

        final = db.stats()
        print(f"\n  Final: {final['entity_count']} entities, {final['total_claims']} claims")
        db.close()
    print(f"  Runtime: {time.perf_counter() - start:.2f}s")
    return 0

if __name__ == "__main__":
    sys.exit(main())

Research

Autonomous Research Loop

Attest can detect blindspots, missing relationships, and low-confidence areas - but surfacing them passively isn't enough. The autonomous research loop closes the gap: detect problems, generate questions, call an LLM or external search, ingest the answers, and measure improvement. Your knowledge base heals itself.

What it covers

Detect single-source entities, missing predicates, and low-confidence areas
Formulate natural-language research questions from detected gaps
Research each question using any of 9 LLM providers (Gemini, Together, OpenAI, DeepSeek, Grok, and more)
Plug in external search (PubMed, web search, Slack) via a simple callback
Auto-ingest extracted claims with provenance tracking
Inquiry auto-resolution - open questions close when matching evidence arrives
Full-cycle investigate() measures blindspot reduction

Key code

# One call does it all: detect → question → research → ingest
db.configure_curator("gemini")  # or any of the 9 providers
report = db.investigate(max_questions=10)

print(f"Questions generated: {report.questions_generated}")
print(f"Claims ingested:    {report.claims_ingested}")
print(f"Blindspot before:   {report.blindspot_before}")
print(f"Blindspot after:    {report.blindspot_after}")

# Or research a single question
result = db.research_question(
    question="What pathways involve BRCA1?",
    entity_id="brca1",
    entity_type="gene",
)
print(f"New claims: {result.claims_ingested}")

Pluggable search

Pass any search_fn(question) → str to use external sources instead of (or alongside) the LLM. The callback returns raw text; Attest extracts structured claims from it.

import httpx

def pubmed_search(question: str) -> str:
    """Search PubMed and return abstracts."""
    resp = httpx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": question, "retmax": 5, "retmode": "json"})
    ids = resp.json()["esearchresult"]["idlist"]
    # Fetch abstracts...
    return abstracts_text

# Use PubMed as the research source
report = db.investigate(max_questions=5, search_fn=pubmed_search)

# Or use Slack as a source
from attestdb import Researcher
researcher = Researcher(db, model="gemini")
slack_fn = lambda q: researcher.ask_slack(q, channel="#research", bot_token="xoxb-...")
report = db.investigate(search_fn=slack_fn)

How it works

Step	What happens
1. Detect	`blindspots()` + `find_gaps()` + `find_confidence_alerts()` identify weak areas
2. Question	`formulate_questions()` converts each gap into a natural-language research question and registers it as an inquiry
3. Research	Each question is sent to the LLM (or `search_fn`) with existing context from `db.query()`
4. Extract	LLM response is parsed into structured claims via `TextExtractor`
5. Ingest	Claims are ingested with `source_type="llm_research"` provenance; matching inquiries auto-resolve
6. Measure	`InvestigationReport` shows blindspot reduction, claims ingested, inquiries resolved

Sample output

  Blindspots before: 12 single-source entities
  Generated 10 research questions
  Researched 10 questions:
    gene_0: 2 claims ingested (LLM found inhibition + disease association)
    gene_1: 1 claim ingested  (search_fn found pathway membership)
    gene_2: 3 claims ingested (LLM found protein interactions)
    ...
  Total claims ingested: 18
  Inquiries resolved: 4
  Blindspots after: 5 single-source entities
  Improvement: 58% blindspot reduction

Sales

Sales & Marketing Intelligence

A B2B SaaS growth team models their prospect pipeline, competitive landscape, and content performance as claims. Contradictions surface naturally - "won against Schrodinger" vs "lost to Schrodinger" from different reps - and multi-source corroboration separates signal from noise.

What it covers

Prospect pipeline: companies, contacts, ICP scoring from CRM + enrichment tools
Competitive intelligence with contradictions (win/loss from different reps)
A/B content variants with engagement payloads (CTR, conversions)
Referral chains and lead attribution across touchpoints
Deal lifecycle with stage progression and temporal signals
Cross-product attribution: which content drives which deals

Key code

db = attestdb.open(":memory:", embedding_dim=None)
db.register_vocabulary("entity_type", {"prospect": True, "company": True, "competitor": True})
db.register_vocabulary("predicate_type", {"works_at": True, "outperforms": True})

# CRM says we won; field rep says we lost - contradiction preserved with provenance
db.ingest(subject=("AttestDB","product"), predicate=("won_against","won_against"),
         object=("Schrodinger","competitor"), provenance={"source_type":"crm","source_id":"deal:vertex"})
db.ingest(subject=("AttestDB","product"), predicate=("lost_to","lost_to"),
         object=("Schrodinger","competitor"), provenance={"source_type":"field_report","source_id":"rep:sarah"})

# Who knows whom at the prospect?
frame = db.query("Dr. Sarah Chen", depth=2)
blindspots = db.blindspots()

View full script on GitHub

Product

Product & Engineering

An engineering org models service ownership, feature lifecycle, and architecture decisions as claims. ADRs that get reversed, bugs that regress, tech debt that accumulates - all tracked with full provenance.

What it covers

Team ownership and service dependency graph
Architecture Decision Records with contradictions (PostgreSQL → DynamoDB reversal)
Bug lifecycle with regression detection
Feature stages across sprints with rollback tracking
Deploy history with canary/rollback chains
Tech debt inventory with priority scoring

Key code

# ADR reversal: decided PostgreSQL, then reversed to DynamoDB
db.ingest(subject=("ADR-007","adr"), predicate=("decided","decided"),
         object=("PostgreSQL","technology"), provenance={"source_type":"adr","source_id":"adr-007-v1"})
db.ingest(subject=("ADR-007","adr"), predicate=("decided","decided"),
         object=("DynamoDB","technology"), provenance={"source_type":"adr","source_id":"adr-007-v2"})

# Bug regression: fixed, then reappeared
frame = db.query("BUG-103: Feature store cache invalidation", depth=2)
# Shows: fixed_by → PR-2190, regressed_in → v2.4.0, caused_by → config drift

View full script on GitHub

Incident

Incident Response

An SRE team tracks a cascading outage across microservices. Three competing root cause theories emerge during the incident - L1 says database, L2 says network, postmortem finds connection pool - all preserved with timestamps and source attribution.

What it covers

Service topology with dependency chains
SLO definitions with multi-source corroboration
Full incident lifecycle: detection → triage → 3 competing theories → resolution
Oncall rotation and escalation paths
Deploy history with rollback triggers
Runbook effectiveness tracking (used, worked, failed)
Metric time series as claim payloads

Key code

# Three competing root causes - each from a different source
db.ingest(subject=("INC-2025-0042","incident"), predicate=("root_cause","root_cause"),
         object=("database timeout","theory"), provenance={"source_type":"oncall","source_id":"l1:triage"},
         confidence=0.6)
db.ingest(subject=("INC-2025-0042","incident"), predicate=("root_cause","root_cause"),
         object=("connection pool exhaustion","theory"),
         provenance={"source_type":"postmortem","source_id":"postmortem:042"}, confidence=0.95)

# What really happened?
frame = db.query("INC-2025-0042", depth=2)

View full script on GitHub

Finance

Finance & Risk Management

A trading desk models risk with multiple VaR methodologies that disagree. Historical VaR says $2.3M, Monte Carlo says $3.1M, Parametric says $1.8M. Counterparty ratings from Moody’s, S&P, and Fitch contradict. Every number has provenance.

What it covers

Portfolio structure: desks, traders, positions with mark-to-market
Contradicting risk models: Historical VaR vs Monte Carlo vs Parametric
Counterparty credit ratings from 3 agencies (corroboration + disagreement)
Audit findings with remediation tracking and regressions
Market risk regime shifts (pre/post rate hike)
Stress test scenarios with P&L impact payloads
Limit breach escalation chains

Key code

# Three models, three different VaR numbers
for model, var in [("Historical", 2.3), ("Monte Carlo", 3.1), ("Parametric", 1.8)]:
    db.ingest(subject=(f"{model} VaR Model","risk_model"),
             predicate=("computes_var","computes_var"),
             object=("UST Macro Book","trading_book"),
             provenance={"source_type":"risk_system","source_id":f"var:{model.lower()}"},
             payload={"var_1d_95": var})

# Counterparty: Moody's says A2, S&P says BBB+ - who's right?
frame = db.query("Deutsche Bank", depth=1)

View full script on GitHub

Legal

Legal & Compliance

A corporate legal team tracks contracts, regulatory obligations, and audit findings. Internal audit rates password policy as “low risk” while external auditors flag it as “high risk” - the contradiction is preserved with full attribution so GRC can investigate.

What it covers

Contract networks: parties, obligations, cross-references, renewal deadlines
Regulatory landscape: GDPR, SOX, HIPAA with policy-to-regulation mappings
Contradicting audit findings (internal “low” vs external “high”)
Policy decisions with approval chains and reversals
Matter tracking with counsel assignments
Obligation deadlines with temporal signals
Risk register with contradicting assessor scores
US vs EU jurisdictional interpretation differences

Key code

# Internal says low risk, external says high risk - both are claims
db.ingest(subject=("CTRL-001 Password Policy","control"),
         predicate=("assessed_as","assessed_as"), object=("low","risk_level"),
         provenance={"source_type":"internal_audit","source_id":"audit:q1-2025"})
db.ingest(subject=("CTRL-001 Password Policy","control"),
         predicate=("assessed_as","assessed_as"), object=("high","risk_level"),
         provenance={"source_type":"external_audit","source_id":"kpmg:soc2-2025"})

frame = db.query("CTRL-001 Password Policy", depth=2)
# Shows both assessments with their provenance chains

View full script on GitHub

Supply Chain

A manufacturing company tracks multi-tier supplier networks, quality certifications, and demand forecasts. Sales predicts 10K units, the ML model says 7K, historical average says 12.5K - all three forecasts are claims with provenance, not just numbers in a spreadsheet.

What it covers

Multi-tier supplier network (Tier 1/2 relationships)
Quality certifications with corroboration and expiry tracking
Demand forecast contradictions (sales vs ML model vs historical)
Defect tracing with recall cascade chains
Shipment tracking with temporal progression
Route optimization with conflicting logistics quotes
Supplier risk assessment from multiple sources

Key code

# Three forecasts, three different numbers
db.ingest(subject=("Sales Team Forecast","forecast"),
         predicate=("forecasts_demand","forecasts_demand"),
         object=("Power-MOSFET-Module","product"),
         provenance={"source_type":"sales_team","source_id":"forecast:q3-sales"},
         payload={"units": 10000, "period": "Q3-2025"})

# Defect cascade: component → subassembly → finished product
frame = db.query("DEF-2025-0042", depth=2)

View full script on GitHub

Healthcare

Clinical Decision Support

A hospital system models treatment protocols, drug interactions, and clinical trial evidence. The ADA recommends metformin first-line for Type 2 Diabetes; NICE guidelines disagree and prefer lifestyle intervention first. Both are claims - clinicians see both with full provenance.

What it covers

Treatment protocols from ADA vs NICE (contradicting guidelines)
Drug interaction database with multi-source corroboration
Clinical trial evidence: Phase 2 vs Phase 3 contradictions
Diagnostic pathways with symptom-to-biomarker chains
Contraindication tracking (15+ drug-condition pairs)
Patient outcome tracking with regime shifts
Specialist referral network
Guideline version evolution (v3 → v4 updates)

Key code

# ADA says metformin first; NICE says lifestyle first
db.ingest(subject=("Metformin","drug"), predicate=("first_line_for","first_line_for"),
         object=("Type 2 Diabetes","condition"),
         provenance={"source_type":"guideline","source_id":"ada:2024-standards"}, confidence=0.95)
db.ingest(subject=("Lifestyle Intervention","therapy"), predicate=("first_line_for","first_line_for"),
         object=("Type 2 Diabetes","condition"),
         provenance={"source_type":"guideline","source_id":"nice:ng28-2024"}, confidence=0.92)

# What does the evidence say about Type 2 Diabetes treatment?
frame = db.query("Type 2 Diabetes", depth=2)

View full script on GitHub

Security

Threat Intelligence

A SOC team correlates threat actor profiles, IOCs from multiple feeds, and vulnerability lifecycle data. When CrowdStrike attributes an attack to PHANTOM BEAR but Mandiant says SANDSTORM COLLECTIVE, both attributions are preserved - confidence scores and source provenance help analysts decide.

What it covers

Threat actor profiles with MITRE ATT&CK TTP mapping
IOC correlation across feeds (corroboration + gaps)
Competing vulnerability severity assessments
Full kill chain timeline (Operation Nightfall)
Asset exposure tracking across the network
Mitigation effectiveness (verified vs bypassed)
Alert triage with false positive rate tracking
Malware analysis with sandbox corroboration

Key code

# Two threat intel providers, two different attributions
db.ingest(subject=("Operation Nightfall","campaign"),
         predicate=("attributed_to","attributed_to"),
         object=("PHANTOM BEAR","threat_actor"),
         provenance={"source_type":"threat_intel","source_id":"crowdstrike:apt-report-2025"},
         confidence=0.85)
db.ingest(subject=("Operation Nightfall","campaign"),
         predicate=("attributed_to","attributed_to"),
         object=("SANDSTORM COLLECTIVE","threat_actor"),
         provenance={"source_type":"threat_intel","source_id":"mandiant:m-trends-2025"},
         confidence=0.72)

frame = db.query("PHANTOM BEAR", depth=2)

View full script on GitHub

HR & Talent Management

An HR team models org structure, skill inventories, and performance reviews. When an employee self-rates as “expert” in Python but peer review says “intermediate”, both assessments are claims. The calibration gap surfaces automatically.

What it covers

Org structure: departments, teams, reporting lines
Multi-source skill assessments (self, manager, peer - with contradictions)
Performance reviews with temporal Q1 → Q2 progression
Hiring pipeline with conflicting interviewer assessments
Cross-team project dependencies and SPOF detection
Training and certifications with expiry tracking
Attrition risk signals from multiple HR systems
Compensation equity tracking

Key code

# Self-assessment vs peer review: contradiction preserved
db.ingest(subject=("Jordan Whitfield","employee"), predicate=("has_skill","has_skill"),
         object=("Python","skill"),
         provenance={"source_type":"self_assessment","source_id":"skills:jordan:2024"},
         payload={"level": "expert"})
db.ingest(subject=("Jordan Whitfield","employee"), predicate=("has_skill","has_skill"),
         object=("Python","skill"),
         provenance={"source_type":"peer_review","source_id":"review:priya:jordan:2024"},
         payload={"level": "intermediate"})

frame = db.query("Jordan Whitfield", depth=1)
# Shows both skill claims - the calibration gap is visible

View full script on GitHub

Academia

Research & Academia

A research group tracks the replication crisis in their field. Lab B replicates a finding; Lab C contradicts it with a larger sample. The citation network, hypothesis evolution, and dataset provenance are all first-class claims with confidence scores.

What it covers

Citation networks with corroboration hubs
Replication crisis: replicated by Lab B, contradicted by Lab C
Method comparisons with context-dependent results
Grant and collaboration networks
Dataset provenance with error cascade tracking
Conference and journal quality metrics
Hypothesis evolution (proposed → supported → revised)

Key code

# Original finding, replication, and contradiction - all claims
db.ingest(subject=("RNA folding pathway F1 is crowding-dependent","finding"),
         predicate=("reported_by","reported_by"),
         object=("Yamamoto Lab","lab"),
         provenance={"source_type":"publication","source_id":"doi:10.1234/rna.2024.001"})
db.ingest(subject=("RNA folding pathway F1 is crowding-dependent","finding"),
         predicate=("replicated_by","replicated_by"),
         object=("Okonkwo Lab","lab"),
         provenance={"source_type":"publication","source_id":"doi:10.1234/rna.2024.047"})
db.ingest(subject=("RNA folding pathway F1 is crowding-dependent","finding"),
         predicate=("contradicted_by","contradicted_by"),
         object=("Chen Lab","lab"),
         provenance={"source_type":"publication","source_id":"doi:10.1234/rna.2025.003"},
         confidence=0.88)

frame = db.query("RNA folding pathway F1 is crowding-dependent", depth=2)
# Shows replication + contradiction with full provenance

View full script on GitHub

API

New API Methods - Knowledge Intelligence

Nine new methods that answer questions only a provenance-tracking database can answer. Every example below works on any Attest database - no API keys, no external services.

Impact Analysis

What happens if a source turns out to be wrong?

# "If we retract paper_42, what breaks?"
report = db.impact("paper_42")
print(f"Direct claims: {report.direct_claims}")
print(f"Downstream claims: {report.downstream_claims}")
print(f"Affected entities: {report.affected_entities}")

Blindspot Detection

Where is your knowledge vulnerable?

# Find entities backed by only a single source
blindspots = db.blindspots(min_claims=5)
print(f"Single-source entities: {blindspots.single_source_entities}")
print(f"Low-confidence areas: {len(blindspots.low_confidence_areas)}")

Consensus Analysis

Do multiple sources agree?

# "How much agreement is there about BRCA1?"
report = db.consensus("BRCA1")
print(f"Sources: {report.unique_sources}, Agreement: {report.agreement_ratio:.0%}")
print(f"Claims by source: {report.claims_by_source}")

Fragile Claims

What's your weakest knowledge?

# Find claims backed by only one source
fragile = db.fragile(max_sources=1)
for c in fragile[:5]:
    print(f"  {c.subject.id} -[{c.predicate.id}]→ {c.object.id} ({c.provenance.source_type})")

Stale Knowledge

What hasn't been updated recently?

# Claims older than 90 days without corroboration
stale = db.stale(days=90)
print(f"{len(stale)} claims need refreshing")

Audit Trail

Full provenance chain for any claim.

# "Where did this claim come from? What depends on it?"
trail = db.audit(claim_id)
print(f"Source: {trail.source_type}:{trail.source_id}")
print(f"Corroborating claims: {len(trail.corroborating_claims)}")
print(f"Downstream dependents: {trail.downstream_dependents}")

Knowledge Drift

How is your knowledge changing over time?

# "What changed in the last 30 days?"
drift = db.drift(days=30)
print(f"New claims: {drift.new_claims}, New entities: {drift.new_entities}")
print(f"Retracted: {drift.retracted_claims}")
print(f"Confidence delta: {drift.confidence_delta:+.3f}")

Source Reliability

Which sources can you trust?

# Per-source corroboration and retraction rates
reliability = db.source_reliability()
for source_id, metrics in reliability.items():
    print(f"  {source_id}: {metrics['corroboration_rate']:.0%} corroborated, "
          f"{metrics['retraction_rate']:.0%} retracted")

What-If Analysis

Would a hypothetical claim add value?

from attestdb import ClaimInput

# "If someone claims Olaparib treats breast cancer, what happens?"
hyp = ClaimInput(
    subject=("Olaparib", "compound"),
    predicate=("treats", "treats"),
    object=("Breast Cancer", "disease"),
    provenance={"source_type": "review", "source_id": "hypothetical"},
)
report = db.hypothetical(hyp)
print(f"Would corroborate: {report.would_corroborate}")
print(f"Fills gap: {report.fills_gap}")
print(f"Related entities: {report.related_entities}")

Consensus

Multi-LLM Consensus

Get the best answer to any question by sending it to ChatGPT, Claude, and Gemini simultaneously - using your existing subscriptions. Cross-pollinate their answers, then have an independent judge score and synthesize the best response. No API keys required for the primary providers.

Install

Option A: One-line install (macOS)

$ curl -fsSL https://attestdb.com/install.sh | bash

This installs AttestDB + Playwright + Chromium in an isolated venv at ~/.attestdb/venv.

Option B: pip

$ pip install 'attestdb[browser]' attest-py
$ playwright install chromium

Launch

$ attest chat

This opens a Chromium browser with three tabs: ChatGPT, Claude, and Gemini.

First run only: log into each chat app in the browser window. Your sessions are saved to ~/.attestdb/browser-profile/ and remembered for all future runs. After that, attest chat opens already logged in.

How it works

You type a question in the terminal
All three providers receive it simultaneously in their browser tabs
Responses are collected and printed in your terminal as they arrive
/share - Each provider sees the others' answers and revises
/tournament - Automatic loop: providers see each other's responses, revise, and score each other (1–10) every round until they all agree on the best answer
/consensus - Quick one-shot judge via free API (Gemini/Groq)
AttestDB learns which providers excel at which task types over time

Commands

`/share`	One round: each provider sees the others' responses and revises
`/tournament`	Loop until consensus: providers revise + score each other every round
`/tournament 3`	Same, max 3 rounds (default: 5)
`/consensus`	Quick judge via free API (Gemini/Groq)
`/providers`	List active providers and message counts
`/quit`	Exit and close browser
`@filename`	Inline file contents (supports globs: `@*.py`)

Judge API (optional, free)

For best results, add a free Gemini API key for the judging/scoring layer. Without it, one of the browser providers acts as judge (still works, just less impartial).

# .env (optional - Gemini API is free)
GOOGLE_API_KEY=your-key-from-aistudio.google.com/apikey

API mode

If you prefer API-only (no browser), use --mode api with provider API keys:

$ attest chat --mode api --providers gemini,openai,deepseek

Python API

Use the consensus engine programmatically (requires API keys):

import attestdb

db = attestdb.AttestDB("research.db")

result = db.agent_consensus(
    question="What is the mechanism of action of metformin?",
    max_rounds=3,
)

print(f"Consensus ({result.confidence:.0%} confidence):")
print(result.consensus)
print(f"Converged: {result.converged} in {result.rounds} rounds")