Google Drive

Text

The Google Drive connector is a text connector that lists files from Google Drive, exports or downloads their text content, and ingests claims via db.ingest_text().

How it works

Lists files using the Drive API v3 with optional folder and MIME type filters
Google Workspace files (Docs, Sheets, Slides) are exported as text/plain
Regular text files (.txt, .md, .csv) are downloaded directly
Binary formats (PDF, DOCX) are downloaded and passed through text extraction
Each file is ingested as a separate text document via db.ingest_text()
Provenance tracks the Drive file ID and file name as source identifiers
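
The listing step above can be sketched as follows. This is a minimal illustration of how the folder and MIME type filters might map onto a Drive API v3 `files.list` query, not the connector's actual code. The helper name `build_list_params` and the choice of `fields` are assumptions made for this example; the endpoint and query syntax are those of the public Drive API.

```python
from typing import Optional, Sequence

# Public Drive API v3 endpoint for listing files.
DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def build_list_params(
    folder_id: Optional[str] = None,
    mime_types: Optional[Sequence[str]] = None,
    page_size: int = 50,
) -> dict:
    """Build query parameters for a files.list request (hypothetical helper)."""
    # Always skip files that are in the trash.
    clauses = ["trashed = false"]
    if folder_id:
        # Restrict results to direct children of one folder.
        clauses.append(f"'{folder_id}' in parents")
    if mime_types:
        # Match any of the requested MIME types.
        alternatives = " or ".join(f"mimeType = '{m}'" for m in mime_types)
        clauses.append(f"({alternatives})")
    return {
        "q": " and ".join(clauses),
        "pageSize": page_size,
        # Request only the fields needed for ingestion and provenance.
        "fields": "nextPageToken, files(id, name, mimeType)",
    }
```

The resulting dictionary could be passed as `params` to `requests.get(DRIVE_FILES_URL, ...)` together with an `Authorization: Bearer <token>` header, following `nextPageToken` until enough files have been collected.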

Supported file types

File type	Method	Notes
Google Docs	Export as text/plain	Full text including headings and tables
Google Sheets	Export as text/plain	Tab-separated values per sheet
Google Slides	Export as text/plain	Slide text in order
Text files (.txt, .md, .csv)	Direct download	Raw content
Other formats	Download + extract	Depends on extraction mode
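
As a rough sketch of the table above, the choice between export and direct download can be expressed as a small dispatch on the file's MIME type. The function names `fetch_request` and `needs_extraction` are hypothetical and only illustrate the idea; the URLs are the standard Drive API v3 export and media download endpoints.

```python
from typing import Dict, Tuple

DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"

# Google Workspace types that must be exported rather than downloaded.
WORKSPACE_TYPES = {
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
}


def fetch_request(file_id: str, mime_type: str) -> Tuple[str, Dict[str, str]]:
    """Return the (url, params) pair used to fetch a file's content."""
    if mime_type in WORKSPACE_TYPES:
        # Workspace files have no raw bytes; ask Drive to export them as text.
        return f"{DRIVE_FILES_URL}/{file_id}/export", {"mimeType": "text/plain"}
    # Everything else is downloaded as stored.
    return f"{DRIVE_FILES_URL}/{file_id}", {"alt": "media"}


def needs_extraction(mime_type: str) -> bool:
    """True for downloaded formats that are not already plain text."""
    return mime_type not in WORKSPACE_TYPES and not mime_type.startswith("text/")
```

Under these assumptions, a PDF (`application/pdf`) would be downloaded with `alt=media` and then passed through text extraction, while a Google Doc would be exported and ingested as is.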

1 Enable the Google Drive API

Go to the Google Cloud Console, select your project, and enable the Google Drive API.

2 Create OAuth credentials

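Create OAuth 2.0 credentials (or a service account) and obtain an access token with the drive.readonly scope (https://www.googleapis.com/auth/drive.readonly). A service account can only read files that are shared with it directly; to read files owned by users in a Google Workspace domain, use domain-wide delegation.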
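
Before connecting, it can help to confirm that the token carries the expected scope. The sketch below queries Google's OAuth 2.0 tokeninfo endpoint and checks the space-separated `scope` field in the response. The helper names `fetch_tokeninfo` and `has_scope` are made up for this example and are not part of the connector.

```python
import json
import urllib.parse
import urllib.request

TOKENINFO_URL = "https://oauth2.googleapis.com/tokeninfo"
READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def has_scope(tokeninfo: dict, scope: str = READONLY_SCOPE) -> bool:
    """Check whether a tokeninfo response lists the given scope."""
    # The endpoint returns granted scopes as one space-separated string.
    return scope in tokeninfo.get("scope", "").split()


def fetch_tokeninfo(access_token: str) -> dict:
    """Ask Google which scopes an access token was granted (network call)."""
    query = urllib.parse.urlencode({"access_token": access_token})
    with urllib.request.urlopen(f"{TOKENINFO_URL}?{query}") as resp:
        return json.load(resp)
```

A typical check would be `has_scope(fetch_tokeninfo(os.environ["GOOGLE_ACCESS_TOKEN"]))`. Note that a token granted a broader scope such as full `drive` access would fail this exact-match check even though it can read files, so adjust the comparison if that case matters.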

3 Install requests

$ pip install requests

4 Connect

import os

# Assumes `db` is an already-initialized database object.
conn = db.connect("gdrive",
    access_token=os.environ["GOOGLE_ACCESS_TOKEN"],
)
result = conn.run(db)

Parameter	Required	Default	Description
`access_token`	Yes	—	OAuth 2.0 access token with drive.readonly scope
`folder_id`	No	All files	Limit to a specific Drive folder by ID
`mime_types`	No	All types	List of MIME types to filter (e.g. `["application/vnd.google-apps.document"]`)
`max_files`	No	`50`	Maximum number of files to process
`extraction`	No	`heuristic`	Text extraction mode: `heuristic` or `llm`
`save`	No	`False`	Encrypt and persist the access token for later runs

Basic usage

conn = db.connect("gdrive",
    access_token=os.environ["GOOGLE_ACCESS_TOKEN"],
)
result = conn.run(db)
print(f"Ingested {result.claims_ingested} claims from {result.files_processed} files")

With folder filter

conn = db.connect("gdrive",
    access_token=os.environ["GOOGLE_ACCESS_TOKEN"],
    folder_id="1A2B3C4D5E6F",
    mime_types=["application/vnd.google-apps.document"],
    max_files=100,
)
result = conn.run(db)

With save (persist encrypted token)

conn = db.connect("gdrive",
    access_token=os.environ["GOOGLE_ACCESS_TOKEN"],
    extraction="llm",
    save=True,
)
result = conn.run(db)