[2/3] Set up medoids (2 types) for anomaly detection (crops dataset)
This workflow sets up cluster representatives (medoids) and threshold scores for a crop image dataset using two methods, laying the groundwork for anomaly detection. It uses vector database APIs and Python sparse matrix libraries to determine each cluster's center and threshold. The approach applies to scenarios such as smart agricultural monitoring and preprocessing for machine learning models, and it automates a clustering analysis that would otherwise be done by hand.
![[2/3] Set up medoids (2 types) for anomaly detection (crops dataset) Workflow diagram](/_next/image?url=https%3A%2F%2Fimg.n8ntemplates.dev%2Fcdn-cgi%2Fimage%2Fwidth%3D1024%2Cheight%3D640%2Cquality%3D85%2Cformat%3Dauto%2Cfit%3Dcover%2Conerror%3Dredirect%2Ftemplates%2Fset-up-medoids-2-types-anomaly-detection-crops-d44f9c.png&w=3840&q=75)
Key Features and Highlights
This workflow establishes representative points (medoids) and cluster threshold scores for crop image datasets using two methods: a distance matrix approach and a multimodal embedding model approach. The results feed the subsequent anomaly detection step. Cluster centers and thresholds are computed with the Qdrant vector database API combined with sparse matrix computations in Python’s SciPy library.
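As a rough illustration of the distance matrix approach, the sketch below assumes the matrix response arrives in an offsets layout: parallel lists `offsets_row`, `offsets_col`, and `scores`, plus a list of point `ids`. Those field names and the helper `medoid_from_offsets` are assumptions made for this example, not code taken from the workflow. The sketch treats the point with the largest summed similarity to its neighbours as the medoid.

```python
import numpy as np
from scipy.sparse import coo_matrix


def medoid_from_offsets(ids, offsets_row, offsets_col, scores):
    """Return the id whose summed similarity to its listed neighbours is largest.

    Assumed input layout (hypothetical): scores[k] is the cosine similarity
    between ids[offsets_row[k]] and ids[offsets_col[k]].
    """
    n = len(ids)
    # Build an n x n sparse similarity matrix; pairs that are not listed stay at 0.
    sim = coo_matrix((scores, (offsets_row, offsets_col)), shape=(n, n))
    # Row sums give each point's total similarity to its neighbours.
    totals = np.asarray(sim.sum(axis=1)).ravel()
    return ids[int(np.argmax(totals))]


# Toy data: three points of one crop class.
ids = ["a", "b", "c"]
offsets_row = [0, 0, 1, 1, 2, 2]
offsets_col = [1, 2, 0, 2, 0, 1]
scores = [0.9, 0.2, 0.9, 0.8, 0.2, 0.8]

medoid_id = medoid_from_offsets(ids, offsets_row, offsets_col, scores)
```

In this toy data, point `"b"` is close to both other points, so it has the largest row sum and is picked as the medoid.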
Core Problem Addressed
The workflow identifies the most “central” sample (medoid) of each crop category and a boundary threshold around it, so that subsequent anomaly detection rests on well-founded cluster representatives and thresholds rather than ad hoc cutoffs.
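The boundary threshold can be pictured as the lowest similarity between a medoid and any member of its own category. The sketch below computes that value with plain NumPy on in-memory vectors; the function name `threshold_score` and the toy vectors are assumptions for illustration, and the workflow itself obtains the least similar point through the vector database rather than locally.

```python
import numpy as np


def threshold_score(medoid_vec, member_vecs):
    """Lowest cosine similarity between the medoid and any cluster member."""
    m = np.asarray(medoid_vec, dtype=float)
    X = np.asarray(member_vecs, dtype=float)
    # Normalise so that the dot product equals cosine similarity.
    m = m / np.linalg.norm(m)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ m
    # The least similar member defines the edge of the cluster.
    return float(sims.min())


# Toy data: the medoid and two members of the same crop class.
medoid = [1.0, 0.0]
members = [[1.0, 0.0], [1.0, 1.0]]
threshold = threshold_score(medoid, members)
```

Under this reading, a new image whose similarity to the medoid falls below `threshold` lies outside everything seen for that category, which is what makes it a candidate anomaly.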
Application Scenarios
- Intelligent agricultural monitoring and anomaly detection: Detecting abnormal crop growth, pests, and diseases through image data
- Clustering analysis preprocessing for machine learning models: Providing accurate representative points and thresholds for downstream models
- Any multi-class image or multimodal dataset stored in a vector database that needs per-class cluster centers and thresholds
Main Workflow Steps
- Manually trigger the workflow, initializing variables such as Qdrant cluster URL and collection name
- Retrieve the total number of data points and distribution information of crop categories within the collection
- Split the data by crop category and call Qdrant’s distance matrix API to obtain the similarity matrix between the points of each category
- Distance matrix method: use SciPy sparse matrix computations on the cosine similarity scores to identify each category's medoid
- Text embedding method: embed a textual description of each crop with the Voyage multimodal embedding model and use that embedding to identify a second medoid
- Mark the medoid points obtained from both methods back into the Qdrant database with distinct payload tags
- Calculate the furthest (least similar) point from each medoid within its category to determine clustering threshold scores
- Save the threshold scores into the payload of the corresponding medoid points in the Qdrant database
- Complete the configuration of cluster representative points and thresholds, preparing for subsequent anomaly detection
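The text embedding and tagging steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the text vector is taken as already embedded (the Voyage call is not shown), the nearest image is found locally with NumPy instead of through a vector search, and the payload key `is_text_anchor_medoid` is a hypothetical tag name. The request body follows the general shape of a set-payload call (a `payload` object plus a list of `points`), but it is only constructed here, not sent.

```python
import numpy as np


def text_anchored_medoid(text_vec, ids, image_vecs):
    """Return the id of the image vector most cosine-similar to the text vector."""
    t = np.asarray(text_vec, dtype=float)
    X = np.asarray(image_vecs, dtype=float)
    # Normalise so that the dot product equals cosine similarity.
    t = t / np.linalg.norm(t)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return ids[int(np.argmax(X @ t))]


def set_payload_body(point_id, payload):
    """Build a request body that attaches `payload` to a single point."""
    return {"payload": payload, "points": [point_id]}


# Toy data: three image embeddings of one crop class and one text embedding.
ids = [11, 12, 13]
image_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
text_vec = [0.5, 0.9]  # assumed output of the text embedding step

medoid_id = text_anchored_medoid(text_vec, ids, image_vecs)
# Hypothetical tag name; the second method would use a different key.
body = set_payload_body(medoid_id, {"is_text_anchor_medoid": True})
```

Using a separate payload key per method keeps the two medoid sets distinguishable in the same collection, so a later step can compare them or pick one.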
Systems and Services Involved
- Qdrant Cloud: Hosted vector database for storing and querying vector data, providing distance matrix and vector search APIs
- Voyage AI API: Multimodal embedding model interface converting textual descriptions into vectors
- Python SciPy Library: Numerical computations on sparse matrices and medoid determination
- n8n Automation Platform: Integrates triggers, HTTP requests, and code execution nodes to automate the entire workflow
Target Users and Value
- Data Scientists and Machine Learning Engineers: Users needing efficient cluster center and threshold setting within vector databases
- Agricultural Technology Professionals: Researchers and practitioners conducting anomaly detection and analysis based on crop image data
- Automation and Workflow Designers: Users aiming to build complex heterogeneous API calls and data processing pipelines
- Multimodal Data Analysis Developers: Practitioners combining text and image data for clustering analysis
By automating the setup of cluster representative points and thresholds, this workflow gives anomaly detection a reproducible baseline for each class, and the same procedure can be reused for preprocessing other image and multimodal datasets.