Google Docs

Text

A text connector that fetches Google Docs through the Drive and Docs APIs. It extracts each document's content and ingests it through db.ingest_text() to produce structured, provenanced claims.

How it works

- Lists documents through the Drive API, filtering on the MIME type application/vnd.google-apps.document
- Fetches each document through the Docs API and recursively extracts text from paragraphs, headings, and tables
- Ingests each non-empty document through db.ingest_text(text, source_id="gdocs:{doc_id}")
- Sleeps 0.1s between API calls to avoid exhausting the API quota
- Silently skips empty documents; logs fetch errors and collects them in result.errors
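
The listing and extraction steps above can be sketched roughly as follows. This is an illustrative outline, not the connector's actual source: the function names `list_doc_ids` and `extract_text` and the injected `fetch_json` helper are hypothetical. The Drive endpoint, the `q` MIME filter, and the `paragraph` / `textRun` / `table` field names follow the public Drive v3 and Docs v1 response formats.

```python
import time

DOC_MIME = "application/vnd.google-apps.document"
DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def list_doc_ids(fetch_json, max_docs=50, delay=0.1):
    """Collect up to max_docs Google Docs file IDs, following Drive pagination.

    fetch_json(url, params) is a caller-supplied (hypothetical) helper that
    performs an authorized GET and returns the decoded JSON response.
    """
    ids, page_token = [], None
    while len(ids) < max_docs:
        params = {
            "q": f"mimeType='{DOC_MIME}'",
            "pageSize": min(100, max_docs - len(ids)),
            "fields": "nextPageToken, files(id, name)",
        }
        if page_token:
            params["pageToken"] = page_token
        page = fetch_json(DRIVE_FILES_URL, params)
        ids.extend(f["id"] for f in page.get("files", []))
        page_token = page.get("nextPageToken")
        if not page_token:
            break
        time.sleep(delay)  # pause between calls to stay under quota
    return ids[:max_docs]


def extract_text(elements):
    """Recursively gather text from a list of Docs API structural elements."""
    parts = []
    for element in elements:
        if "paragraph" in element:  # headings are paragraphs with a named style
            for run in element["paragraph"].get("elements", []):
                parts.append(run.get("textRun", {}).get("content", ""))
        elif "table" in element:
            for row in element["table"].get("tableRows", []):
                for cell in row.get("tableCells", []):
                    # each cell holds its own list of structural elements
                    parts.append(extract_text(cell.get("content", [])))
    return "".join(parts)
```

Given a document returned by the Docs API, the text would be obtained with `extract_text(document["body"]["content"])`; a document whose extracted text is empty would then be skipped rather than ingested.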

Connector type

This is a text connector. It calls db.ingest_text() to extract claims from document content using heuristic or LLM-powered extraction. Configure an LLM provider with db.configure_curator() for deeper extraction.

Prerequisites

- Enable the Google Docs API and Google Drive API in the Google Cloud Console
- Create OAuth2 credentials and obtain an access token with the following scopes:
  - https://www.googleapis.com/auth/drive.readonly
  - https://www.googleapis.com/auth/documents.readonly
- Install the requests package:
```
pip install requests
```

Authentication

The connector requires a Google OAuth2 access token. Tokens are short-lived (typically 1 hour). Use a refresh token flow or the Google Auth library to obtain a fresh access token before connecting.
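
One way to obtain a fresh access token is to exchange a stored refresh token at Google's OAuth2 token endpoint. The sketch below uses only the standard library and assumes you already hold a client ID, client secret, and refresh token. The helper name `refresh_access_token` and its `opener` parameter are illustrative and not part of the connector.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URI = "https://oauth2.googleapis.com/token"


def refresh_access_token(client_id, client_secret, refresh_token,
                         opener=urllib.request.urlopen):
    """Exchange a refresh token for a short-lived access token.

    opener is injectable so the HTTP call can be replaced in tests.
    """
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    request = urllib.request.Request(TOKEN_URI, data=body, method="POST")
    with opener(request) as response:
        payload = json.load(response)
    # The response also carries "expires_in" (seconds until expiry).
    return payload["access_token"]
```

The returned string can then be passed as token to db.connect(). Because the token expires, refreshing immediately before each run is the simplest approach.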

Pass the token as token in db.connect(). The factory maps it to the access_token parameter on the underlying GDocsConnector.

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| `access_token` | Yes | (none) | Google OAuth2 access token with `drive.readonly` and `documents.readonly` scopes |
| `max_docs` | No | `50` | Maximum number of documents to fetch from Drive |
| `save` | No | `False` | Encrypt and persist the token to disk (requires `cryptography` package) |

Basic usage

```python
conn = db.connect("gdocs", token="ya29.a0...")
result = conn.run(db)
```

Limited docs with save

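```python
conn = db.connect(
    "gdocs",
    token="ya29.a0...",
    max_docs=20,  # fetch at most 20 documents from Drive
    save=True,    # encrypt and persist the token (requires cryptography)
)
result = conn.run(db)
```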

Environment variable

```python
import os

conn = db.connect(
    "gdocs",
    token=os.environ["GOOGLE_TOKEN"],  # raises KeyError if the variable is unset
    max_docs=100,
)
result = conn.run(db)
```