[1/3 - Anomaly Detection] [1/2 - KNN Classification] Batch Upload Dataset to Qdrant (Crops Dataset)

This workflow bulk-imports a crop image dataset from Google Cloud Storage, embeds each image with a multimodal embedding model, and batch-uploads the resulting vectors and their metadata to the Qdrant vector database. It creates the Qdrant collection and its payload indexes automatically when they do not exist yet. Because it is built for anomaly detection, it excludes one crop category from the upload so that those images can be used later to test anomaly detection models. It suits agricultural image classification, anomaly detection, and the management of large image datasets.

Workflow Diagram
[Figure: workflow diagram]

Key Features and Highlights

This workflow batch-imports crop images from Google Cloud Storage, generates a multimodal embedding for each image, and uploads the vectors together with their metadata to Qdrant in batches. It creates the Qdrant collection and payload indexes automatically when they are missing, so the collection schema and upload process stay consistent across runs. Because it targets anomaly detection, it excludes the “tomato” category from the upload and reserves those images for later testing and validation of anomaly detection models.

Core Problems Addressed

  • Automates the import and processing of large image datasets, removing tedious manual steps.
  • Represents every image as a vector in the same embedding space, which enables vector similarity search and classification.
  • Checks whether the Qdrant collection exists and creates it only when it does not, avoiding errors from attempting to create a duplicate collection.
  • Generates a UUID for each image to use as its Qdrant point ID, so every point has a unique identifier.
  • Processes and uploads data in batches, which speeds up uploads and reduces the number of API calls.
  • Creates payload indexes on metadata fields such as crop_name to speed up queries that filter on those fields.
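The existence check and the collection and index creation described above can be sketched as plain Qdrant REST requests, each returned as an (HTTP method, URL, JSON body) tuple instead of being sent. This is a minimal sketch under assumptions: the cluster URL, collection name, 1024-dimension vector size, Cosine distance, and `keyword` schema for `crop_name` are illustrative values, and `QdrantConfig`, `existence_check`, and `setup_requests` are hypothetical helper names, not part of the original workflow. Verify the endpoints against the Qdrant API reference for your server version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QdrantConfig:
    # Hypothetical placeholder values; substitute your own cluster and settings.
    cluster_url: str = "https://YOUR-CLUSTER.cloud.qdrant.io"
    collection_name: str = "agricultural-crops"
    vector_size: int = 1024  # assumed; must match the embedding model's output size
    distance: str = "Cosine"
    batch_size: int = 4


# (HTTP method, URL, JSON body or None)
Request = tuple[str, str, Optional[dict]]


def existence_check(cfg: QdrantConfig) -> Request:
    # Qdrant answers this endpoint with {"result": {"exists": true|false}, ...}.
    return ("GET", f"{cfg.cluster_url}/collections/{cfg.collection_name}/exists", None)


def setup_requests(cfg: QdrantConfig, exists: bool) -> list[Request]:
    """Requests to send after the existence check; empty if the collection exists."""
    if exists:
        return []
    base = f"{cfg.cluster_url}/collections/{cfg.collection_name}"
    return [
        # Create the collection with a single unnamed dense vector per point.
        ("PUT", base, {"vectors": {"size": cfg.vector_size, "distance": cfg.distance}}),
        # Keyword payload index on crop_name to speed up filtering by crop.
        ("PUT", f"{base}/index", {"field_name": "crop_name", "field_schema": "keyword"}),
    ]
```

Qdrant Cloud also expects an `api-key` header on each request. Keeping request construction separate from sending means the branching logic can be checked without a live cluster.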

Application Scenarios

  • Agricultural image classification and anomaly detection: store vectors of various crop images to support downstream anomaly recognition and classification tasks.
  • Any machine learning or AI application that needs to convert images to vectors and store them in a vector database.
  • Batch processing and management of large image datasets.
  • Building vector search systems on Qdrant.

Main Workflow Steps

  1. Trigger the workflow manually.
  2. Configure the Qdrant Cloud connection variables: cluster URL, collection name, embedding vector dimension, and batch size.
  3. Check if the specified Qdrant collection exists; if not, create the collection and establish payload indexes.
  4. Retrieve the list of crop images from the specified Google Cloud Storage bucket and prefix path.
  5. Construct publicly accessible image URLs and extract crop names from the file paths.
  6. Filter out images belonging to the “tomato” category (for anomaly detection testing).
  7. Split image data into batches according to batch size and generate unique UUIDs for each data point.
  8. Format data to comply with the input requirements of the Voyage AI multimodal embedding API.
  9. Call the Voyage multimodal embedding API to obtain vector representations of the images.
  10. Batch upload the generated vectors along with corresponding metadata to the Qdrant collection.
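Step 4 (listing the images under a bucket prefix) can be sketched with the Google Cloud Storage JSON API `objects.list` endpoint, which returns an `items` array and, when more results exist, a `nextPageToken`. This is a minimal sketch, assuming the bucket allows anonymous listing; private buckets also need an OAuth bearer token. `list_url`, `iter_object_names`, and `fetch_json` are hypothetical helper names. The fetch function is passed in so pagination can be exercised without network access.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def list_url(bucket: str, prefix: str, page_token: Optional[str] = None) -> str:
    # objects.list endpoint of the GCS JSON API, filtered by prefix.
    params = {"prefix": prefix}
    if page_token:
        params["pageToken"] = page_token
    return f"https://storage.googleapis.com/storage/v1/b/{quote(bucket, safe='')}/o?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    # Unauthenticated GET; only works for buckets that permit public listing.
    with urlopen(url) as resp:
        return json.load(resp)


def iter_object_names(
    bucket: str, prefix: str, fetch: Callable[[str], dict] = fetch_json
) -> Iterator[str]:
    """Yield every object name under the prefix, following nextPageToken."""
    token: Optional[str] = None
    while True:
        page = fetch(list_url(bucket, prefix, token))
        for item in page.get("items", []):
            name = item["name"]
            if not name.endswith("/"):  # skip zero-byte "folder" placeholder objects
                yield name
        token = page.get("nextPageToken")
        if not token:
            break
```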
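Steps 5 and 6 (building public URLs, extracting the crop name, and dropping the tomato images) reduce to a few string operations. This sketch assumes object names follow a `<prefix>/<crop_name>/<file>` layout, so the crop name is the parent folder; that layout, the `EXCLUDED_CROPS` set, and the helper names `public_url`, `crop_name`, and `to_records` are hypothetical. Publicly readable GCS objects are served at `https://storage.googleapis.com/<bucket>/<object>`.

```python
from urllib.parse import quote

# Crops held out of the upload so they can be used for anomaly-detection testing.
EXCLUDED_CROPS = {"tomato"}


def public_url(bucket: str, object_name: str) -> str:
    # quote() keeps "/" but percent-encodes spaces and other unsafe characters.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def crop_name(object_name: str) -> str:
    # Assumed layout: <prefix>/<crop_name>/<file>, so the crop is the parent folder.
    parts = object_name.split("/")
    if len(parts) < 2:
        raise ValueError(f"no crop folder in {object_name!r}")
    return parts[-2]


def to_records(bucket: str, object_names: list[str]) -> list[dict]:
    """Map object names to {"url", "crop_name"} records, skipping excluded crops."""
    records = []
    for name in object_names:
        crop = crop_name(name)
        if crop.lower() in EXCLUDED_CROPS:
            continue
        records.append({"url": public_url(bucket, name), "crop_name": crop})
    return records
```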
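Step 7 (splitting records into batches and giving each one a unique ID) can be sketched as a generator. It attaches a random version-4 UUID string to every record, since Qdrant accepts UUID strings as point IDs. The record shape (plain dicts) and the helper name `batches` are assumptions for illustration.

```python
import uuid
from typing import Iterator


def batches(records: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records, each with a new "id"."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(records), batch_size):
        # Copy each record and add a random UUID4 string to use as its Qdrant point ID.
        yield [{**rec, "id": str(uuid.uuid4())} for rec in records[start:start + batch_size]]
```

Random UUIDs mean that re-running the workflow inserts new points rather than overwriting old ones. A deterministic ID such as `uuid.uuid5` over the image URL would make re-runs idempotent, if that behavior is preferred.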
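Steps 8 and 9 (formatting a batch for the Voyage AI multimodal embedding API and reading the vectors back) can be sketched as building the JSON request body and parsing the response. This assumes the `https://api.voyageai.com/v1/multimodalembeddings` endpoint, the `voyage-multimodal-3` model name, and an `inputs` list whose items carry a `content` list of `image_url` parts; `voyage_request` and `embeddings_from_response` are hypothetical names, so confirm the exact field names in Voyage's API reference before relying on them.

```python
VOYAGE_URL = "https://api.voyageai.com/v1/multimodalembeddings"


def voyage_request(batch: list[dict], model: str = "voyage-multimodal-3") -> dict:
    """Build the JSON body for one batch; each record needs a "url" key."""
    return {
        "model": model,
        # One input per image; each input is a list of content parts.
        "inputs": [
            {"content": [{"type": "image_url", "image_url": rec["url"]}]}
            for rec in batch
        ],
    }


def embeddings_from_response(resp: dict) -> list[list[float]]:
    # Sort by "index" so embeddings line up with the order of the inputs.
    return [d["embedding"] for d in sorted(resp["data"], key=lambda d: d["index"])]
```

The body is sent as a POST to `VOYAGE_URL` with an `Authorization: Bearer <API key>` header.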
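Step 10 (uploading vectors with their metadata) can be sketched as building one Qdrant upsert request per batch, pairing each record with its embedding. This assumes records carry `id`, `crop_name`, and `url` keys; the payload key `image_url` and the helper name `upsert_request` are hypothetical. `wait=true` asks Qdrant to return only after the points are applied.

```python
from typing import Sequence


def upsert_request(
    cluster_url: str,
    collection: str,
    batch: list[dict],
    vectors: Sequence[list[float]],
) -> tuple[str, str, dict]:
    """Return (method, URL, JSON body) for upserting one batch of points."""
    if len(batch) != len(vectors):
        raise ValueError("each record needs exactly one vector")
    points = [
        {
            "id": rec["id"],
            "vector": vec,
            # crop_name matches the payload index; image_url is a hypothetical field name.
            "payload": {"crop_name": rec["crop_name"], "image_url": rec["url"]},
        }
        for rec, vec in zip(batch, vectors)
    ]
    return "PUT", f"{cluster_url}/collections/{collection}/points?wait=true", {"points": points}
```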

Involved Systems or Services

  • Google Cloud Storage: Storage and retrieval of image datasets.
  • Qdrant Cloud: Vector database for storing and retrieving image embedding vectors.
  • Voyage AI Multimodal Embedding API: Converts images into high-dimensional vector representations.
  • n8n Automation Platform: Orchestrates node execution and manages the entire workflow.

Target Users and Value

  • AI engineers and data scientists: Quickly build image vectorization and storage pipelines to accelerate anomaly detection and classification model development.
  • Agri-tech companies: Enable intelligent analysis and anomaly detection of crop images.
  • Machine learning R&D teams: Batch process and manage large-scale image data to improve data preprocessing efficiency.
  • Vector database users: An end-to-end example of vector data management that combines data storage, embedding generation, and batch upload.

This workflow template is clearly structured and can be adapted to other image datasets by replacing the storage bucket path and the collection configuration. By automating batch embedding and upload, it simplifies the preparation of image vectors for downstream machine learning tasks.