[1/3 - Anomaly Detection] [1/2 - KNN Classification] Batch Upload Dataset to Qdrant (Crops Dataset)

This workflow bulk-imports a crop image dataset from Google Cloud Storage, generates multimodal feature embeddings, and batch-uploads the resulting vectors and associated metadata to the Qdrant vector database, automatically creating the collection and its indexes so the data structure stays consistent. Built for anomaly detection scenarios, it excludes images of a held-out category so they can be used later for model testing and validation. It suits agricultural image classification, anomaly detection, and large-scale image data management, improving both the efficiency and accuracy of data preparation.

Tags

Vector DB, Qdrant

Workflow Name

[1/3 - Anomaly Detection] [1/2 - KNN Classification] Batch Upload Dataset to Qdrant (Crops Dataset)

Key Features and Highlights

This workflow enables batch importing of crop image datasets from Google Cloud Storage, performs multimodal feature embedding on the images, and uploads the generated vectors along with associated metadata in batches to the Qdrant vector database. It supports automatic creation of Qdrant collections and indexes, ensuring a standardized data structure and an efficient upload process. Designed specifically for anomaly detection scenarios, it excludes images of a chosen category (e.g., “tomato”), reserving them for testing and facilitating subsequent training and validation of anomaly detection models.

Core Problems Addressed

  • Automates batch import and processing of large-scale image datasets, eliminating tedious manual operations.
  • Unifies vectorized representation of image data to facilitate subsequent vector-based similarity search and classification.
  • Checks for the existence of Qdrant collections and dynamically creates them to prevent errors from duplicate creation.
  • Generates unique UUIDs as Qdrant point IDs to guarantee data point uniqueness.
  • Supports batch-wise processing and uploading to improve upload efficiency and reduce API pressure.
  • Creates payload indexes to optimize query performance based on metadata fields such as crop_name.

Application Scenarios

  • Agricultural image classification and anomaly detection: vectorized storage of various crop images to support downstream anomaly recognition and classification tasks.
  • Any machine learning or AI applications requiring image data conversion to vectors and storage in vector databases.
  • Batch processing and management of large-scale image datasets.
  • Construction of vector search systems based on Qdrant.

Main Workflow Steps

  1. Manually trigger the workflow start.
  2. Configure Qdrant cloud connection variables, including cluster URL, collection name, embedding vector dimension, and batch size.
  3. Check if the specified Qdrant collection exists; if not, create the collection and establish payload indexes.
  4. Retrieve the list of crop images from the specified Google Cloud Storage bucket and prefix path.
  5. Construct publicly accessible image URLs and extract crop names from the file paths.
  6. Exclude images in the “tomato” category, reserving them for anomaly detection testing.
  7. Split image data into batches according to batch size and generate unique UUIDs for each data point.
  8. Format data to comply with the input requirements of the Voyage AI multimodal embedding API.
  9. Call the Voyage multimodal embedding API to obtain vector representations of the images.
  10. Batch upload the generated vectors along with corresponding metadata to the Qdrant collection.
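Steps 5 through 10 can be sketched in Python. Everything specific here is an assumption for illustration: the bucket name, the folder-per-crop path layout, the batch size, and the embedding/upsert call shapes (shown as comments) are placeholders, not values taken from the workflow.

```python
import uuid

BUCKET = "crops-dataset"   # hypothetical bucket name
BATCH_SIZE = 4             # small batches reduce pressure on the embedding API

def public_url(bucket: str, object_path: str) -> str:
    # Public GCS objects are reachable at this well-known URL pattern.
    return f"https://storage.googleapis.com/{bucket}/{object_path}"

def crop_from_path(object_path: str) -> str:
    # Assumes paths like "crops/cucumber/img_001.jpg": the parent folder names the crop.
    return object_path.split("/")[-2]

def point_id(object_path: str) -> str:
    # A deterministic UUIDv5 per object path keeps re-uploads idempotent.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, object_path))

def batched(items: list, size: int):
    for i in range(0, len(items), size):
        yield items[i:i + size]

paths = ["crops/cucumber/img_001.jpg", "crops/tomato/img_002.jpg",
         "crops/wheat/img_003.jpg"]   # placeholder object listing

# Hold back "tomato" so it can act as the unseen class during anomaly testing.
keep = [p for p in paths if crop_from_path(p) != "tomato"]

for batch in batched(keep, BATCH_SIZE):
    # Shape each item roughly the way a multimodal embedding API expects
    # (one content list per input); the exact schema depends on the provider.
    inputs = [{"content": [{"type": "image_url",
                            "image_url": public_url(BUCKET, p)}]} for p in batch]
    # vectors = voyage.multimodal_embed(inputs, model="voyage-multimodal-3").embeddings
    # qdrant.upsert(collection_name="crops", points=[
    #     PointStruct(id=point_id(p), vector=v,
    #                 payload={"crop_name": crop_from_path(p),
    #                          "image_url": public_url(BUCKET, p)})
    #     for p, v in zip(batch, vectors)])
```

The deterministic UUID is a design choice worth noting: reruns of the workflow then overwrite existing points instead of duplicating them, whereas random UUIDs (as the description implies) make every run insert fresh points.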

Involved Systems or Services

  • Google Cloud Storage: Storage and retrieval of image datasets.
  • Qdrant Cloud: Vector database for storing and retrieving image embedding vectors.
  • Voyage AI Multimodal Embedding API: Converts images into high-dimensional vector representations.
  • n8n Automation Platform: Orchestrates node execution and manages the entire workflow.

Target Users and Value

  • AI engineers and data scientists: Quickly build image vectorization and storage pipelines to accelerate anomaly detection and classification model development.
  • Agri-tech companies: Enable intelligent analysis and anomaly detection of crop images.
  • Machine learning R&D teams: Batch process and manage large-scale image data to improve data preprocessing efficiency.
  • Vector database users: Demonstrate end-to-end vector data management by integrating data storage, vector generation, and batch uploading.

This workflow template features a clear structure and is easily adaptable to other image datasets by simply replacing the storage bucket path and collection configuration. Through batch processing and automated integration, it significantly simplifies image vector data preparation, laying a solid foundation for subsequent machine learning tasks.

Recommend Templates

Stackby Data Write and Read Automation Process

This workflow enables the automatic writing of a data entry to a specified table in the Stackby database through a manual trigger, followed by an immediate retrieval of all data entries from that table. With this automation process, users can avoid cumbersome manual operations, significantly improving the efficiency and accuracy of data management. It is suitable for teams and individuals who need to frequently update and query data. This process effectively reduces operational complexity and is applicable to various automated office scenarios.

Stackby Automation, n8n Integration

Google Sheets Auto Export and Sync to Dropbox

This workflow automatically reads data from Google Sheets and converts it into XLS format files, which are then uploaded to Dropbox cloud storage. It is triggered every 15 minutes to ensure timely and stable data synchronization. By automating the process, it reduces the cumbersome steps of manual exporting and uploading, thereby improving work efficiency and ensuring real-time sharing and backup of files for the team. This is particularly suitable for teams in finance, sales, and other areas that require frequent updates and sharing of spreadsheets.

Google Sheets, Dropbox Sync

Export SQL Table Data to CSV File

This workflow can automatically read data from specified tables in a Microsoft SQL database and convert it into a CSV file. Users can easily complete the data export by simply clicking the "Execute Workflow" button, making it suitable for data analysts, business personnel, and IT operations. By automating the process, it simplifies the traditional manual export procedure, improves efficiency and accuracy, reduces human errors, and facilitates subsequent data analysis and management.

SQL Export, CSV Convert

PostgreSQL Export to CSV

This workflow is designed to simplify the process of exporting data from a PostgreSQL database to CSV format. Users only need to manually trigger the workflow, and the system will automatically execute the query and generate a CSV file, facilitating data backup, sharing, and analysis. This process effectively addresses the cumbersome issues of manual exporting and format conversion, improving the efficiency and accuracy of data processing, making it suitable for various application scenarios such as data analysts, product managers, and developers.

PostgreSQL Export, CSV Conversion

Box Folder Event Trigger

The main function of this workflow is to monitor "move" and "download" events in a specified folder on the Box cloud storage platform in real time. Once relevant actions are detected, the system automatically triggers subsequent processing workflows, such as sending notifications or data synchronization. This process ensures that users can quickly respond to changes in the status of critical folders, improving work efficiency and reducing manual monitoring costs. It is suitable for users such as enterprise IT administrators and project managers who require automated file management.

Box Trigger, Folder Watch

SQLite MCP Server Database Management Workflow

This workflow implements automated management of a local database by building an SQLite-based MCP server, including secure create, read, update, and delete (CRUD) operations. Users can remotely execute database operations through the MCP client, ensuring the security and compliance of these operations. Additionally, the workflow provides a description and query functionality for the database table structure, supports intelligent routing of requests, and simplifies business processes. It is suitable for internal data management, intelligent analysis, and integration with AI assistants, facilitating digital transformation.

SQLite Management, MCP Protocol

Automated Product Label Generation and Printing Workflow

This workflow automatically receives Webhook requests to gather and integrate detailed information about products and their rolls, generating complete product label data that supports fast and accurate printing. It effectively reduces manual input and data omissions, improving the efficiency and accuracy of label generation. It is suitable for the bulk printing needs of the apparel, textile, and manufacturing industries, optimizing warehouse management and e-commerce shipping processes, thereby enhancing overall business performance.

Product Tags, Auto Print

Create a Table and Insert Data into It

The main function of this workflow is to automate the creation and insertion of data into tables in the QuestDB database. Users can trigger the system with a simple click, which will execute the table creation and data insertion operations, simplifying the complex processes of traditional database operations. This workflow is particularly suitable for development and testing environments, as it can quickly initialize the database table structure, automate data entry, reduce operational risks, and improve work efficiency.

QuestDB, Database Automation