Gmail to Vector Embeddings with PGVector and Ollama
This workflow automatically retrieves emails from a Gmail account, stores the email content in structured form in a PostgreSQL database, and uses an Ollama embedding model to convert the text into vector embeddings, which are then stored in a PGVector table. It supports batch import of historical emails and real-time monitoring of new emails, processes attachments automatically, and makes email data easier to organize and search. It suits businesses and individuals who need to locate and analyze information across large volumes of email.

Workflow Name
Gmail to Vector Embeddings with PGVector and Ollama
Key Features and Highlights
This workflow automates the extraction of emails from a Gmail inbox and structures the email content for storage in a PostgreSQL database. It uses Ollama’s nomic-embed-text model to convert email text into vector embeddings, which are stored in PostgreSQL through the PGVector extension to enable content-based similarity search. The workflow supports bulk import of historical emails as well as real-time monitoring of new emails, processes attachments automatically, and can fetch emails in batches based on time intervals.
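To illustrate what content-based similarity search means here, the following minimal sketch ranks stored vectors against a query vector by cosine similarity. It is pure Python with toy 3-dimensional vectors; the message IDs and the function names are hypothetical, and in the workflow itself this comparison would be done inside PostgreSQL by PGVector rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, stored, top_k=3):
    """Rank (email_id, vector) pairs by similarity to query_vec, best first."""
    scored = [(email_id, cosine_similarity(query_vec, vec)) for email_id, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: real embeddings have hundreds of dimensions.
stored = [
    ("msg-invoice", [0.9, 0.1, 0.0]),
    ("msg-meeting", [0.0, 1.0, 0.0]),
    ("msg-receipt", [0.8, 0.2, 0.1]),
]
top = rank_by_similarity([1.0, 0.0, 0.0], stored, top_k=2)
```

Emails whose vectors point in nearly the same direction as the query score close to 1, which is why semantically related messages surface even when they share no exact keywords.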
Core Problems Addressed
- Automates the organization and storage of large volumes of email data, removing the inefficiency and inconsistency of manual management
- Transforms unstructured email text into structured data and vector representations for fast retrieval and intelligent analysis
- Enables similarity search based on email content, so relevant messages can be found even without exact keyword matches
- Provides unified management for both bulk import of historical emails and real-time synchronization of new emails
Use Cases
- Archiving, searching, and analyzing large volumes of emails for enterprises or individuals
- Building knowledge bases, intelligent search, and data mining based on email content
- Quickly locating relevant email content in customer service and sales tracking scenarios
- Preparing foundational email data for AI applications such as chatbots and intelligent Q&A systems
Main Workflow Steps
- Gmail Trigger: Polls the Gmail inbox every minute for new emails, with support for attachment downloads
- Bulk Historical Email Retrieval: Fetches historical emails in bulk by specifying a time range
- Email Field Extraction: Extracts key information including email body text, sender, recipients, CC, subject, and attachments
- Structured Storage: Saves extracted email metadata into the emails_metadata table in PostgreSQL
- Text Splitting: Recursively splits the email body text to ensure quality input for vectorization
- Vector Embedding Generation: Calls Ollama’s nomic-embed-text model to generate vector representations of the email text
- Vector Storage: Stores the generated vector embeddings in the emails_embeddings table of PGVector, associating them with email IDs and thread IDs
- Conditional Routing: Differentiates processing flows for manual and automatic triggers to ensure flexible execution
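The text splitting and ID association steps above can be sketched roughly as follows. This is a simplified stand-in for a recursive character splitter, not the n8n node's actual implementation: it tries coarse separators first (blank lines), falls back to finer ones (newlines, spaces), and hard-cuts only as a last resort. The chunk size, the email dictionary shape, and the metadata keys email_id and thread_id are assumptions made for illustration.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed offsets, dropping whitespace-only pieces.
        cuts = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        return [c for c in cuts if c.strip()]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current.strip():
        chunks.append(current)
    return chunks


def chunks_with_metadata(email, chunk_size=200):
    """Attach the (hypothetical) email_id and thread_id keys to every chunk."""
    return [
        {"text": chunk, "metadata": {"email_id": email["id"], "thread_id": email["thread_id"]}}
        for chunk in recursive_split(email["body"], chunk_size)
    ]


email = {
    "id": "18c0ffee",       # hypothetical Gmail message ID
    "thread_id": "18c0aaaa",  # hypothetical Gmail thread ID
    "body": "Hi team,\n\nThe invoice for March is attached.\n\n" + "word " * 100,
}
records = chunks_with_metadata(email, chunk_size=80)
```

Keeping the email and thread IDs on every chunk is what lets a similarity hit on a single chunk be traced back to the full message and its conversation.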
Involved Systems and Services
- Gmail (email collection and triggering)
- PostgreSQL (structured data storage)
- PGVector (vector database extension for storing embeddings)
- Ollama (runs the nomic-embed-text model for text embedding generation)
- n8n Automation Platform (workflow orchestration and execution)
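Outside of n8n, the interaction between Ollama and PGVector might look roughly like the sketch below. It assumes a local Ollama server on its default port and uses Ollama's /api/embeddings endpoint with a model and prompt payload. The column names text, metadata, and embedding in emails_embeddings are assumptions, not taken from the workflow, and the SQL strings are meant to be passed to a PostgreSQL driver of your choice.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text, model="nomic-embed-text", url=OLLAMA_URL):
    """Build (but do not send) the HTTP request asking Ollama for an embedding."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text):
    """Send the request and return the embedding as a list of floats."""
    with urllib.request.urlopen(build_embedding_request(text)) as response:
        return json.load(response)["embedding"]


def to_pgvector_literal(vector):
    """Format a list of numbers as a pgvector text literal such as [1.0,0.5]."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Hypothetical column layout; adjust to the actual emails_embeddings schema.
INSERT_SQL = (
    "INSERT INTO emails_embeddings (text, metadata, embedding) "
    "VALUES (%s, %s::jsonb, %s::vector)"
)
# <=> is pgvector's cosine distance operator: smaller means more similar.
SEARCH_SQL = (
    "SELECT text, metadata FROM emails_embeddings "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)

request = build_embedding_request("Where is the March invoice?")
```

Separating request construction from sending keeps the sketch inspectable without a running Ollama server; embed() is the only function that touches the network.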
Target Users and Value
- IT Operations and Data Engineers: Automate email archiving and management so that email becomes a queryable data asset
- Data Scientists and AI Developers: Quickly obtain structured email data and corresponding vectors as a foundation for machine learning and intelligent applications
- Business Managers and Office Staff: Improve email retrieval efficiency and optimize customer communication and internal information management
- Any user who needs intelligent analysis, search, and archiving of large volumes of email content
This workflow combines email processing with vector embeddings to provide structured storage and similarity-based retrieval of email content. Because it supports both batch import over time intervals and real-time monitoring of new emails, the same pipeline covers historical backfill and ongoing synchronization.
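As a rough illustration of batch import over time intervals, the sketch below cuts a date range into fixed-size windows and turns each into a Gmail search query using the after: and before: operators. The window length and function names are hypothetical, and how the workflow itself sizes its batches is not specified here.

```python
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield consecutive (window_start, window_end) pairs covering start to end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current < end:
        upper = min(current + timedelta(days=days), end)
        yield current, upper
        current = upper


def gmail_query(window_start, window_end):
    """Build a Gmail search string for one window."""
    return f"after:{window_start:%Y/%m/%d} before:{window_end:%Y/%m/%d}"


windows = list(date_windows(date(2024, 1, 1), date(2024, 1, 20), days=7))
queries = [gmail_query(lo, hi) for lo, hi in windows]
```

How Gmail treats messages exactly on a window boundary depends on time zone handling, so a backfill built this way should deduplicate by message ID before inserting.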