[2/3] Set up medoids (2 types) for anomaly detection (crops dataset)

This workflow sets up cluster representative points (medoids) and threshold scores for a crop image dataset using two methods, laying the groundwork for anomaly detection. It combines vector database APIs with Python sparse matrix computations to determine cluster centers and thresholds. Typical uses include agricultural monitoring and preprocessing for machine learning models.

Anomaly Detection, Cluster Centroid

Key Features and Highlights

This workflow determines a representative point (medoid) and a clustering threshold score for each crop class in an image dataset using two methods: the distance matrix approach and the multimodal embedding model approach. These lay the foundation for subsequent anomaly detection. Cluster centers and thresholds are computed by combining the Qdrant vector database API with sparse matrix computations in Python's SciPy library.
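The distance matrix approach can be sketched as follows. This is a minimal illustration, not the workflow's actual code node: it assumes the matrix arrives in a sparse "offsets" layout (parallel lists of row indices, column indices, and similarity scores, plus a list of point ids), and it picks as medoid the point whose summed similarity to its sampled neighbours is highest. The function name `medoid_from_offsets` and the sample data are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix


def medoid_from_offsets(result: dict):
    """Return the id of the point with the largest total similarity.

    `result` is assumed to hold four parallel fields:
    ids, offsets_row, offsets_col, scores (cosine similarities).
    """
    ids = result["ids"]
    n = len(ids)
    # Build an n x n sparse similarity matrix from the (row, col, score) triples.
    sim = coo_matrix(
        (result["scores"], (result["offsets_row"], result["offsets_col"])),
        shape=(n, n),
    )
    # Sum each row: total similarity of a point to its sampled neighbours.
    row_totals = np.asarray(sim.sum(axis=1)).ravel()
    return ids[int(np.argmax(row_totals))]


# Hypothetical three-point class; point 20 is the most similar to the others.
sample = {
    "ids": [10, 20, 30],
    "offsets_row": [0, 0, 1, 1, 2, 2],
    "offsets_col": [1, 2, 0, 2, 0, 1],
    "scores": [0.9, 0.2, 0.9, 0.8, 0.2, 0.8],
}
medoid_id = medoid_from_offsets(sample)
```

Keeping the matrix sparse matters when only a limited number of neighbours per point is returned, since most of the n x n entries are then absent.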

Core Problem Addressed

The workflow addresses how to identify the “central” sample (medoid) of each crop category in the dataset, along with its boundary threshold, so that subsequent anomaly detection rests on well-founded cluster representatives and thresholds.
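To make the medoid-plus-threshold idea concrete, here is a small pure-Python sketch. It assumes cosine similarity as the score and takes the threshold to be the similarity between the medoid and the least similar point of its class. The `is_anomaly` rule (flag anything less similar than the threshold) is an assumption about how a downstream step might use the stored score, and all names and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_score(medoid_vector, class_vectors):
    """Similarity between the medoid and the farthest point in its class."""
    return min(cosine_similarity(medoid_vector, v) for v in class_vectors)


def is_anomaly(vector, medoid_vector, threshold):
    """Assumed rule: a point less similar to the medoid than the threshold is anomalous."""
    return cosine_similarity(vector, medoid_vector) < threshold


# Hypothetical 2-D vectors for one crop class.
medoid = [1.0, 0.0]
members = [[1.0, 0.0], [1.0, 1.0], [0.8, 0.6]]
threshold = threshold_score(medoid, members)
```

With a multi-class dataset, a new image would be compared against every class medoid and treated as anomalous only if it falls below the threshold for all of them; that extension is left out here.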

Application Scenarios

Intelligent agricultural monitoring and anomaly detection: Detecting abnormal crop growth, pests, and diseases through image data
Clustering analysis preprocessing for machine learning models: Providing accurate representative points and thresholds for downstream models
Any multi-class image or multimodal data scenario requiring cluster center and threshold determination based on vector databases

Main Workflow Steps

Manually trigger the workflow, initializing variables such as Qdrant cluster URL and collection name
Retrieve the total number of data points and distribution information of crop categories within the collection
Split data by each crop category and call Qdrant’s distance matrix API to obtain similarity matrices between points
Use SciPy sparse matrix computations to identify each category's medoid from the cosine similarity scores (distance matrix method)
Use the Voyage multimodal embedding model to embed each crop's textual description and identify medoids via the text embedding method
Mark the medoid points obtained from both methods back into the Qdrant database with distinct payload tags
Find the farthest (least similar) point from each medoid within its category and use its score as the clustering threshold score
Save the threshold scores into the payload of the corresponding medoid points in the Qdrant database
Complete the configuration of cluster representative points and thresholds, preparing for subsequent anomaly detection
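The HTTP calls in the steps above could be prepared roughly as shown below. This sketch only builds URLs and JSON bodies and sends nothing. The endpoint paths follow the Qdrant REST API as I understand it (`points/search/matrix/offsets` for the distance matrix, `points/payload` for setting payload) and should be checked against the Qdrant API reference for your version. The payload keys `crop_name`, `is_medoid`, and `is_medoid_cluster_threshold`, and the cluster URL and collection name, are hypothetical placeholders.

```python
import json


def matrix_request(cluster_url, collection, crop_name, sample, limit):
    """Build the request for a per-category distance matrix (assumed endpoint)."""
    url = f"{cluster_url}/collections/{collection}/points/search/matrix/offsets"
    body = {
        "sample": sample,  # how many points of the category to sample
        "limit": limit,    # how many neighbours to return per sampled point
        # Restrict the matrix to one crop category (hypothetical payload key).
        "filter": {"must": [{"key": "crop_name", "match": {"value": crop_name}}]},
    }
    return url, body


def set_payload_request(cluster_url, collection, point_id, payload):
    """Build the request that tags a medoid point with extra payload fields."""
    url = f"{cluster_url}/collections/{collection}/points/payload"
    body = {"payload": payload, "points": [point_id]}
    return url, body


# Hypothetical values for illustration only.
base = "https://example-cluster.qdrant.io"
m_url, m_body = matrix_request(base, "crops", "tomato", sample=100, limit=99)
p_url, p_body = set_payload_request(
    base, "crops", 20, {"is_medoid": True, "is_medoid_cluster_threshold": 0.71}
)
print(json.dumps(m_body, indent=2))
```

Using a separate payload key for each of the two medoid types keeps the distance matrix medoid and the text embedding medoid distinguishable when both are stored in the same collection.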

Systems and Services Involved

Qdrant Cloud: Hosted vector database for storing and querying vector data, providing distance matrix and vector search APIs
Voyage AI API: Multimodal embedding model interface converting textual descriptions into vectors
Python SciPy Library: Sparse matrix computations used to determine medoids
n8n Automation Platform: Integrates triggers, HTTP requests, and code execution nodes to automate the entire workflow

Target Users and Value

Data Scientists and Machine Learning Engineers: Users needing efficient cluster center and threshold setting within vector databases
Agricultural Technology Professionals: Researchers and practitioners conducting anomaly detection and analysis based on crop image data
Automation and Workflow Designers: Users aiming to build complex heterogeneous API calls and data processing pipelines
Multimodal Data Analysis Developers: Practitioners combining text and image data for clustering analysis

By automating the setup of cluster representative points and thresholds, this workflow gives anomaly detection a consistent baseline and can be adapted to the preprocessing needs of other image and multimodal datasets.
