Google Docs

Text

Text connector that fetches Google Docs via the Drive and Docs APIs. Each document's content is extracted and ingested through db.ingest_text() to produce structured, provenanced claims.

How it works

Connector type

This is a text connector. It calls db.ingest_text() to extract claims from document content using heuristic or LLM-powered extraction. Configure an LLM provider with db.configure_curator() for deeper extraction.
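The fetch-and-extract step described above can be illustrated with a short sketch. It is not the connector's actual implementation: the helper names (list_document_ids, fetch_document, extract_plain_text) are hypothetical, and the sketch assumes the public Drive v3 files.list and Docs v1 documents.get REST endpoints, keeping only paragraph text runs.

```python
from typing import Any, Dict, List


def list_document_ids(access_token: str, max_docs: int = 50) -> List[str]:
    """List up to max_docs Google Docs file IDs via Drive v3 (single page only)."""
    import requests  # imported lazily so the pure helper below works without it

    resp = requests.get(
        "https://www.googleapis.com/drive/v3/files",
        headers={"Authorization": f"Bearer {access_token}"},
        params={
            # Restrict the listing to native Google Docs that are not in the trash.
            "q": "mimeType='application/vnd.google-apps.document' and trashed=false",
            "pageSize": min(max_docs, 1000),
            "fields": "files(id,name)",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [f["id"] for f in resp.json().get("files", [])][:max_docs]


def fetch_document(access_token: str, document_id: str) -> Dict[str, Any]:
    """Fetch one document resource from the Docs v1 API."""
    import requests

    resp = requests.get(
        f"https://docs.googleapis.com/v1/documents/{document_id}",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def extract_plain_text(document: Dict[str, Any]) -> str:
    """Concatenate the paragraph text runs of a Docs API document resource."""
    parts: List[str] = []
    for element in document.get("body", {}).get("content", []):
        paragraph = element.get("paragraph")
        if paragraph is None:
            # Tables, section breaks, and similar elements are skipped in this sketch.
            continue
        for item in paragraph.get("elements", []):
            text_run = item.get("textRun")
            if text_run is not None:
                parts.append(text_run.get("content", ""))
    return "".join(parts)
```

The string returned by extract_plain_text is the kind of input that would then be handed to db.ingest_text(); how the connector itself batches or annotates documents is not specified here.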

Prerequisites

  1. Enable the Google Docs API and Google Drive API in the Google Cloud Console
  2. Create OAuth2 credentials and obtain an access token with the following scopes:
    • https://www.googleapis.com/auth/drive.readonly
    • https://www.googleapis.com/auth/documents.readonly
  3. Install the requests package:
    pip install requests
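Before connecting, it can help to confirm that a token was actually granted both scopes listed above. The sketch below is an illustration only: missing_scopes is a hypothetical helper, and it assumes you already have the space-separated scope string that Google returns in the scope field of an OAuth2 token response.

```python
from typing import Iterable, Set

# The two read-only scopes listed in the prerequisites.
REQUIRED_SCOPES = frozenset({
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/documents.readonly",
})


def missing_scopes(granted: str, required: Iterable[str] = REQUIRED_SCOPES) -> Set[str]:
    """Return the required scopes absent from a space-separated scope string."""
    return set(required) - set(granted.split())
```

An empty result means both scopes are present; anything else names the scopes to add to the consent request.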

Authentication

The connector requires a Google OAuth2 access token. Tokens are short-lived (typically 1 hour). Use a refresh token flow or the Google Auth library to obtain a fresh access token before connecting.
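One way to obtain a fresh access token is the standard OAuth2 refresh-token grant against Google's token endpoint. The following is a minimal sketch, not part of the connector: refresh_access_token is a hypothetical helper, the client ID, client secret, and refresh token are assumed to come from your own OAuth2 client, and the post parameter exists only so the HTTP call can be swapped out.

```python
from typing import Any, Callable, Optional

# Google's OAuth2 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def refresh_access_token(
    client_id: str,
    client_secret: str,
    refresh_token: str,
    post: Optional[Callable[..., Any]] = None,
) -> str:
    """Exchange a refresh token for a new short-lived access token."""
    if post is None:
        import requests  # imported lazily; only needed for the real HTTP call

        post = requests.post
    resp = post(
        TOKEN_URL,
        data={
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
            "grant_type": "refresh_token",
        },
        timeout=30,
    )
    resp.raise_for_status()
    # The response JSON also carries expires_in (seconds), unused in this sketch.
    return resp.json()["access_token"]
```

The returned string can then be passed as token to db.connect("gdocs", ...). Because tokens expire, refreshing immediately before each run is simpler than tracking expiry times.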

Pass the token as token in db.connect(). The factory maps it to the access_token parameter on the underlying GDocsConnector.

Parameter     Required  Default  Description
access_token  Yes       -        Google OAuth2 access token with drive.readonly and documents.readonly scopes
max_docs      No        50       Maximum number of documents to fetch from Drive
save          No        False    Encrypt and persist the token to disk (requires the cryptography package)

Basic usage

conn = db.connect("gdocs", token="ya29.a0...")
result = conn.run(db)

Limited docs with save

conn = db.connect("gdocs",
    token="ya29.a0...",
    max_docs=20,
    save=True,
)
result = conn.run(db)

Environment variable

import os

# Read the token from the environment instead of hard-coding it.
conn = db.connect("gdocs",
    token=os.environ["GOOGLE_TOKEN"],
    max_docs=100,
)
result = conn.run(db)