[2/3] Set up Medoids (2 Types) for Anomaly Detection (Crops Dataset)

This workflow prepares an agricultural crop image dataset for anomaly detection. For each crop cluster, it selects a representative center point (medoid) and computes a threshold score for use by the subsequent anomaly detection workflow. It finds medoids in two ways: from a distance matrix computed over the cluster, and by matching a text description against the images with a multimodal text-image embedding model. Typical agricultural uses include pest and disease identification and anomaly warnings.

Tags

Anomaly Detection, Multimodal Clustering

Key Features and Highlights

This workflow establishes a representative center point (medoid) and a threshold score for each cluster in a crop image dataset, as the foundation for subsequent anomaly detection. It uses two approaches to locate cluster centers: traditional medoid selection based on distance matrices, and semantic matching with a multimodal embedding model. It integrates the Qdrant vector database with the Voyage AI multimodal embedding API to automate both the clustering analysis and the threshold calculation.

Core Problems Addressed

  • How to accurately identify the representative medoid of each cluster to serve as a baseline for anomaly detection.
  • How to establish reasonable threshold scores to differentiate normal samples from anomalies.
  • How to overcome the limitations of relying on single-feature representations by combining image and text multimodal information for improved representativeness of cluster centers.
  • How to provide an automated, reusable process that adapts to different crop categories and other image datasets.
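
As a rough illustration of the threshold idea, the sketch below derives a threshold for one cluster as the cosine similarity between its medoid and its least similar member, then flags a new vector as anomalous when it falls below the threshold of every medoid. The function names, the tiny 2-D vectors, and the "below every threshold" rule are assumptions made for illustration; the workflow itself stores its vectors in Qdrant, and the actual detection logic lives in the follow-up workflow.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def cluster_threshold(medoid, members):
    """Similarity between the medoid and the least similar cluster member."""
    return min(cosine_similarity(medoid, member) for member in members)


def is_anomaly(vector, medoids):
    """Assumed rule: anomalous if below the threshold of every medoid.

    `medoids` maps a cluster label to a (medoid_vector, threshold) pair.
    """
    return all(
        cosine_similarity(vector, medoid) < threshold
        for medoid, threshold in medoids.values()
    )


# Hypothetical 2-D example data, not taken from the crops dataset.
tomato_medoid = [1.0, 0.0]
tomato_members = [[1.0, 0.0], [0.8, 0.6]]
threshold = cluster_threshold(tomato_medoid, tomato_members)
medoids = {"tomato": (tomato_medoid, threshold)}
```

With this data, a vector such as `[0.0, 1.0]` is flagged by `is_anomaly`, while `[1.0, 0.1]` is not.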

Application Scenarios

  • Anomaly detection in agricultural crop images, such as pest and disease identification or abnormal growth alerts.
  • Any scenario requiring identification of cluster medoids and preparation for anomaly detection based on multimodal image and text data.
  • Large-scale vector data management and analysis using the Qdrant vector database.

Main Workflow Steps

  1. Manually trigger the workflow to initialize variables and parameters.
  2. Retrieve the total number of points and clustering information for the crop dataset from Qdrant, obtaining crop categories and their data volumes.
  3. For each crop category, execute the following:
    • Distance Matrix Method: Call Qdrant’s distance matrix API to compute the pairwise similarity matrix within the cluster, then use SciPy sparse matrix computations to select the most representative point as the medoid.
    • Multimodal Embedding Method: Generate text vectors from hardcoded textual descriptions via the Voyage AI multimodal embedding API, then query Qdrant with these text vectors to find the image point best matching the description as the medoid.
  4. Mark the identified medoids in Qdrant with the tags “is_medoid” (distance matrix method) and “is_text_anchor_medoid” (multimodal embedding method).
  5. Calculate a threshold score for each medoid from its cosine similarity to the least similar point in its cluster, and write the threshold back to the medoid point for use in anomaly detection.
  6. Complete medoid and threshold setup for all categories, providing a data foundation for subsequent anomaly detection workflows.
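
The distance matrix method in step 3 can be sketched as follows. This is a standard-library stand-in for the SciPy sparse matrix computation, not the workflow's actual code node: it sums each row of the sparse similarity matrix and picks the point with the largest total. It assumes the matrix arrives in an "offsets" layout with parallel lists `offsets_row`, `offsets_col`, and `scores` plus a list of point `ids`, which is how Qdrant's distance matrix endpoint is understood to respond; the function name and the example values are hypothetical.

```python
def find_medoid(ids, offsets_row, offsets_col, scores):
    """Return the id of the point most similar to the rest of its cluster.

    Each (offsets_row[i], offsets_col[i], scores[i]) triple is one non-zero
    entry of a sparse similarity matrix whose rows and columns index `ids`.
    """
    if not ids:
        raise ValueError("cluster has no points")
    # Row sums of the sparse matrix: total similarity of each point to others.
    row_sums = [0.0] * len(ids)
    for row, _col, score in zip(offsets_row, offsets_col, scores):
        row_sums[row] += score
    # The medoid is the point whose row sum is largest.
    best_row = max(range(len(ids)), key=row_sums.__getitem__)
    return ids[best_row]


# Hypothetical three-point cluster; the ids and scores are made up.
ids = [10, 20, 30]
offsets_row = [0, 0, 1, 1, 2, 2]
offsets_col = [1, 2, 0, 2, 0, 1]
scores = [0.9, 0.8, 0.9, 0.2, 0.8, 0.2]
medoid_id = find_medoid(ids, offsets_row, offsets_col, scores)  # 10
```

Point 10 wins here because its similarities to the other two points (0.9 and 0.8) add up to more than the totals for points 20 and 30. In a SciPy version, the same triples would typically be loaded into a sparse COO matrix and reduced with a row sum and an argmax.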

Involved Systems and Services

  • Qdrant Cloud: Vector database storing crop image vectors and clustering data, offering distance matrix and vector search APIs.
  • Voyage AI API: Provides multimodal text-image embedding services for medoid localization based on textual descriptions.
  • n8n Automation Workflow Platform: Orchestrates API calls and coordinates data flow and processing.
  • Python Code Node (SciPy): Performs sparse distance matrix computations to assist in medoid determination.
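
For orientation, the request bodies exchanged with these services for the multimodal embedding method might look roughly like the sketch below. It only builds JSON-serializable dictionaries and sends nothing. The body shapes follow the public Voyage AI multimodal embeddings and Qdrant query and set-payload REST APIs as best understood here, so treat them as assumptions to check against the current API references. The model name, the `crop_name` payload field, the example description, and all function names are hypothetical and do not come from the workflow itself.

```python
def voyage_text_embedding_request(description, model="voyage-multimodal-3"):
    """Body for embedding one text description (assumed request shape)."""
    return {
        "inputs": [{"content": [{"type": "text", "text": description}]}],
        "model": model,
        "input_type": "query",
    }


def qdrant_text_anchor_query(text_vector, crop_label, label_field="crop_name"):
    """Body for finding the image point closest to the text vector.

    The filter restricts the search to one crop category; `label_field`
    is a hypothetical payload key.
    """
    return {
        "query": text_vector,
        "filter": {
            "must": [{"key": label_field, "match": {"value": crop_label}}]
        },
        "limit": 1,
        "with_payload": True,
    }


def qdrant_mark_medoid(point_id, flag="is_text_anchor_medoid"):
    """Body for tagging the selected point as a medoid."""
    return {"payload": {flag: True}, "points": [point_id]}


# Hypothetical values for illustration only.
embed_body = voyage_text_embedding_request("a photo of a ripe tomato")
query_body = qdrant_text_anchor_query([0.1, 0.2, 0.3], "tomato")
mark_body = qdrant_mark_medoid(42)
```

In n8n these bodies would normally be sent from HTTP Request nodes using stored credentials, with the vector returned by the embedding call passed into the query body.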

Target Users and Value

  • Data scientists and machine learning engineers: Quickly build anomaly detection preprocessing pipelines based on vector databases.
  • Agricultural technology researchers: Establish infrastructure for automated anomaly recognition in crop images.
  • AI developers and automation engineers: Leverage the n8n platform for multi-system integration and automated data processing.
  • Industry users requiring anomaly detection based on cluster centers, especially in complex multimodal data scenarios.

This workflow is the intermediate step between the data upload and anomaly detection workflows. Together, the three form a complete solution for crop image anomaly detection. It combines traditional distance matrix analysis with semantic matching, using Qdrant and a multimodal embedding API, to identify cluster centers and set the thresholds that the anomaly detection workflow relies on.
