[2/3] Set up Medoids (2 Types) for Anomaly Detection (Crops Dataset)
This workflow supports clustering analysis on agricultural crop image datasets. It automatically selects a representative center point (medoid) for each cluster and computes a threshold score for it, in preparation for subsequent anomaly detection. It combines a traditional distance matrix method with multimodal text-image embeddings to locate cluster centers and derive their thresholds. It suits agricultural applications such as pest and disease identification and anomaly warnings.
![[2/3] Set up Medoids (2 Types) for Anomaly Detection (Crops Dataset) Workflow diagram](/_next/image?url=https%3A%2F%2Fimg.n8ntemplates.dev%2Fcdn-cgi%2Fimage%2Fwidth%3D1024%2Cheight%3D640%2Cquality%3D85%2Cformat%3Dauto%2Cfit%3Dcover%2Conerror%3Dredirect%2Ftemplates%2Fset-up-medoids-2-types-anomaly-detection-crops-0b7245.png&w=3840&q=75)
Key Features and Highlights
This workflow establishes a representative center point (medoid) and a threshold score for each cluster in a crop image dataset, which later serve as the basis for anomaly detection. It uses two approaches to locate cluster centers and set thresholds for each crop cluster: traditional medoid selection based on a distance matrix, and semantic matching with a multimodal embedding model. By integrating the Qdrant vector database with the Voyage AI multimodal embedding API, the workflow automates both the clustering analysis and the threshold calculation.
Core Problems Addressed
- How to accurately identify the representative medoid of each cluster to serve as a baseline for anomaly detection.
- How to establish reasonable threshold scores to differentiate normal samples from anomalies.
- How to make cluster centers more representative by combining image and text information through multimodal embeddings, rather than relying on a single feature representation.
- How to provide an automated, reusable process that can be adapted to different crop categories and to other image datasets.
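To make the threshold question above concrete, here is a minimal sketch in plain Python. It assumes the threshold of a cluster is the lowest cosine similarity between its medoid and any of its members, and that a new point counts as anomalous when it falls below the threshold of every cluster. The function names, the toy vectors, and the decision rule are illustrative assumptions, not taken from the workflow itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster_threshold(medoid, members):
    """Lowest similarity between the medoid and any member of its cluster."""
    return min(cosine_similarity(medoid, m) for m in members)

def is_anomaly(vector, clusters):
    """clusters is a list of (medoid_vector, threshold) pairs.

    Assumed rule: a point is anomalous if it is less similar to every
    medoid than that medoid's own least similar cluster member.
    """
    return all(
        cosine_similarity(vector, medoid) < threshold
        for medoid, threshold in clusters
    )

# Toy 2-D "embeddings" for a single hypothetical crop cluster.
members = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3]]
medoid = members[1]
threshold = cluster_threshold(medoid, members)
clusters = [(medoid, threshold)]

far_point = [0.0, 1.0]      # points in a very different direction
near_point = [0.95, 0.05]   # close to the medoid
print(is_anomaly(far_point, clusters), is_anomaly(near_point, clusters))
```

With real embeddings the vectors would come from the vector database, and the threshold would be read from the medoid's stored payload rather than recomputed.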
Application Scenarios
- Anomaly detection in agricultural crop images, such as pest and disease identification or abnormal growth alerts.
- Any scenario requiring identification of cluster medoids and preparation for anomaly detection based on multimodal image and text data.
- Large-scale vector data management and analysis using the Qdrant vector database.
Main Workflow Steps
- Manually trigger the workflow to initialize variables and parameters.
- Retrieve the total number of points and clustering information for the crop dataset from Qdrant, obtaining crop categories and their data volumes.
- For each crop category, execute the following:
  - Distance Matrix Method: Call Qdrant’s distance matrix API to compute the similarity matrix within the cluster, then use SciPy sparse matrix computations to determine the most representative point as the medoid.
  - Multimodal Embedding Method: Generate a text vector from a hardcoded textual description via the Voyage AI multimodal embedding API, then query Qdrant with this text vector to find the image point that best matches the description and use it as the medoid.
- Mark the identified medoids back in Qdrant with the tags “is_medoid” and “is_text_anchor_medoid” respectively.
- Calculate a threshold score for each medoid from its cosine similarity to the least similar point in its cluster, and write the threshold back to the medoid point in Qdrant for use in anomaly detection.
- Complete medoid and threshold setup for all categories, providing a data foundation for subsequent anomaly detection workflows.
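The distance matrix step above can be sketched as follows. The sketch assumes the matrix arrives in an offsets layout (a list of point ids plus parallel lists of row offsets, column offsets, and similarity scores) and picks the point whose column of scores has the largest sum. To stay dependency-free it uses plain Python; in SciPy the same accumulation could be done by building a `scipy.sparse.coo_matrix` from the three lists and summing along an axis. The function name `pick_medoid` and the toy data are hypothetical.

```python
from collections import defaultdict

def pick_medoid(ids, offsets_row, offsets_col, scores):
    """Return the id of the point with the highest total similarity.

    ids         : point ids; offsets index into this list
    offsets_row : row index of each stored similarity
    offsets_col : column index of each stored similarity
    scores      : similarity of point `row` to point `col`
    """
    if not (len(offsets_row) == len(offsets_col) == len(scores)):
        raise ValueError("offset and score lists must have equal length")
    if not scores:
        raise ValueError("empty similarity matrix")

    # Column sums of the sparse matrix: how similar all other sampled
    # points are to each candidate, in total.
    totals = defaultdict(float)
    for col, score in zip(offsets_col, scores):
        totals[col] += score

    best_offset = max(totals, key=totals.get)
    return ids[best_offset]

# Toy cluster of three points; point 102 is close to both neighbours.
ids = [101, 102, 103]
offsets_row = [0, 0, 1, 1, 2, 2]
offsets_col = [1, 2, 0, 2, 1, 0]
scores = [0.9, 0.2, 0.9, 0.8, 0.8, 0.2]

medoid_id = pick_medoid(ids, offsets_row, offsets_col, scores)
print(medoid_id)
```

Summing similarities is one simple way to define "most representative"; other criteria, such as minimizing the largest distance to any member, would select a different kind of center.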
Involved Systems and Services
- Qdrant Cloud: Vector database storing crop image vectors and clustering data, offering distance matrix and vector search APIs.
- Voyage AI API: Provides multimodal text-image embedding services for medoid localization based on textual descriptions.
- n8n Automation Workflow Platform: Orchestrates API calls and coordinates data flow and processing.
- Python Code Node (SciPy): Performs sparse distance matrix computations to assist in medoid determination.
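As a rough illustration of how these services fit together, the sketch below builds the JSON bodies such a workflow might send. It only constructs dictionaries and makes no network calls. The field layouts reflect my understanding of the public Qdrant and Voyage AI API references and should be checked against the current documentation. The payload key `crop_name`, the threshold key suffix, the model name default, and all function names are assumptions made for illustration.

```python
import json

def matrix_request(crop_name, sample, limit):
    """Body for a distance-matrix request limited to one crop.

    Assumes points carry a payload field named 'crop_name' (hypothetical).
    """
    return {
        "sample": sample,   # how many points to sample from the cluster
        "limit": limit,     # neighbours kept per sampled point
        "filter": {
            "must": [{"key": "crop_name", "match": {"value": crop_name}}]
        },
    }

def text_embedding_request(description, model="voyage-multimodal-3"):
    """Body for embedding a text description with a multimodal model."""
    return {
        "inputs": [{"content": [{"type": "text", "text": description}]}],
        "model": model,
    }

def mark_medoid_request(point_id, flag, threshold=None):
    """Body for tagging a point as a medoid, optionally with its threshold.

    `flag` is 'is_medoid' or 'is_text_anchor_medoid'; the threshold key
    name built from it is a hypothetical naming choice.
    """
    payload = {flag: True}
    if threshold is not None:
        payload[f"{flag}_cluster_threshold"] = threshold
    return {"payload": payload, "points": [point_id]}

body = mark_medoid_request(42, "is_text_anchor_medoid", threshold=0.31)
print(json.dumps(body, indent=2))
```

Keeping the request bodies in small functions like these makes it easier to reuse the same flow for another dataset by changing only the filter key and the text descriptions.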
Target Users and Value
- Data scientists and machine learning engineers: Quickly build anomaly detection preprocessing pipelines based on vector databases.
- Agricultural technology researchers: Establish infrastructure for automated anomaly recognition in crop images.
- AI developers and automation engineers: Leverage the n8n platform for multi-system integration and automated data processing.
- Industry users requiring anomaly detection based on cluster centers, especially in complex multimodal data scenarios.
This workflow is the intermediate step of a three-part solution for crop image anomaly detection: it follows the data upload workflow and precedes the anomaly detection workflow. By pairing traditional distance matrix analysis in Qdrant with semantic matching through a multimodal embedding API, it gives each cluster two independently chosen centers and a threshold for each, which the anomaly detection workflow then builds on.