Testing Multiple Local LLMs with LM Studio
This workflow automates the testing and performance evaluation of multiple local large language models (LLMs). It integrates with an LM Studio server and dynamically invokes each available model to generate text. Custom prompts guide the models toward outputs that meet specific readability standards, and built-in text analysis metrics quantify the quality of every response. Results are saved automatically to Google Sheets for later comparison and tracking, making multi-model testing faster and more consistent.

Key Features and Highlights
By integrating with the LM Studio server, this workflow automates the testing and performance evaluation of multiple local LLMs. Each available model is invoked in turn to generate a response to the same prompt. A customizable system prompt steers the models toward outputs that meet a specific readability standard (e.g., a 5th-grade reading level). Built-in text analysis metrics, including word count, sentence count, average sentence length, and readability scores, are calculated automatically, and the final results are appended to Google Sheets for convenient batch comparison and data tracking.
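The readability score referenced here, and named explicitly in the workflow steps below, is typically the Flesch-Kincaid grade level:

FK grade = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59

A score near 5.0 corresponds to roughly a 5th-grade reading level, which is the kind of target the system prompt expresses.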
Core Problems Addressed
- Automated management and unified invocation of multiple models
- Quantitative analysis and comparison of language model output quality
- Real-time statistics on key indicators such as readability and response time
- Effective storage and visualization of test result data
Application Scenarios
- Performance comparison testing of local models by language model R&D teams
- Evaluation of text readability and clarity in education or content creation domains
- Quick side-by-side comparison of multiple LLMs by product managers and data analysts
- Any automated workflow requiring batch generation and analysis of text outputs
Main Workflow Steps
- LM Studio Environment Setup: Download, install, and configure the LM Studio server, loading the LLM models to be tested.
- Retrieve Model List: Dynamically fetch the IDs of all models loaded on the server via an HTTP request (first sketch after this list).
- Trigger on Incoming Chat Messages: Listen for external chat message inputs to serve as test prompts.
- Add System Prompts: Automatically inject guiding instructions to ensure concise and readable model outputs.
- Invoke Models for Response Generation: Run text generation individually for each model (second sketch after this list).
- Record Timestamps: Capture request start and end times to calculate response latency.
- Text Metrics Analysis: Execute custom code nodes to compute word count, sentence count, average sentence length, average word length, and Flesch-Kincaid readability scores (third sketch after this list).
- Data Preparation and Storage: Organize the results and automatically append each test run as a row to a Google Sheets spreadsheet.
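As a concrete illustration of the model-list step, here is a minimal TypeScript sketch, assuming LM Studio's OpenAI-compatible server is running at its default address (http://localhost:1234); the endpoint and response shape follow the standard /v1/models convention, and the function name is our own.

```typescript
// Fetch the IDs of all models currently loaded in LM Studio.
// Assumes the server is reachable at the default http://localhost:1234.
const BASE_URL = "http://localhost:1234/v1";

interface ModelList {
  data: { id: string }[];
}

async function listModelIds(): Promise<string[]> {
  const res = await fetch(`${BASE_URL}/models`);
  if (!res.ok) throw new Error(`LM Studio returned HTTP ${res.status}`);
  const body = (await res.json()) as ModelList;
  return body.data.map((m) => m.id);
}

listModelIds().then((ids) => console.log("Models under test:", ids));
```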
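The invocation and timing steps can be sketched the same way: one chat-completion request per model ID, with a readability-focused system prompt and timestamps taken around the call. The prompt text and field names here are illustrative assumptions, not taken from the workflow itself.

```typescript
// One timed chat-completion request against a single model.
// SYSTEM_PROMPT is a hypothetical example of a readability instruction.
const SYSTEM_PROMPT =
  "Answer concisely, in plain language suitable for a 5th-grade reading level.";

async function timedCompletion(modelId: string, userMessage: string) {
  const startTime = Date.now(); // request start timestamp
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: modelId,
      messages: [
        { role: "system", content: SYSTEM_PROMPT },
        { role: "user", content: userMessage },
      ],
    }),
  });
  const body = await res.json();
  const endTime = Date.now(); // request end timestamp
  return {
    modelId,
    text: body.choices[0].message.content as string,
    latencyMs: endTime - startTime, // response latency for the results sheet
  };
}
```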
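And finally, a sketch of the metric calculations a custom code node might perform. The syllable counter is a deliberately naive vowel-group heuristic, a common approximation for Flesch-Kincaid; the workflow's actual implementation may differ in detail.

```typescript
// Compute the text statistics recorded for each model response.
function analyzeText(text: string) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  // Split sentences on terminal punctuation; a rough but serviceable rule.
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);

  // Naive syllable estimate: count groups of consecutive vowels per word.
  const countSyllables = (word: string): number => {
    const groups = word.toLowerCase().match(/[aeiouy]+/g);
    return Math.max(1, groups ? groups.length : 0);
  };
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);

  const wordCount = words.length;
  const sentenceCount = sentences.length;
  const avgSentenceLength = wordCount / Math.max(1, sentenceCount);
  const avgWordLength =
    words.reduce((sum, w) => sum + w.length, 0) / Math.max(1, wordCount);

  // Flesch-Kincaid grade level (see the formula above).
  const fleschKincaidGrade =
    0.39 * (wordCount / Math.max(1, sentenceCount)) +
    11.8 * (syllables / Math.max(1, wordCount)) -
    15.59;

  return { wordCount, sentenceCount, avgSentenceLength, avgWordLength, fleschKincaidGrade };
}
```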
Involved Systems or Services
- LM Studio: Local LLM server for loading and managing multiple language models
- n8n: Automation platform for scheduling triggers, invoking models, and processing data
- Google Sheets: Online spreadsheet service for storing and reviewing test result data (a sample row shape is sketched below)
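For reference, the record appended per response might look like the following; all column names here are hypothetical, since the actual sheet layout is defined by the user in the Google Sheets node.

```typescript
// Hypothetical shape of one row appended to the results spreadsheet.
interface ResultRow {
  timestamp: string;          // ISO time of the test run
  modelId: string;            // model under test
  prompt: string;             // chat message that triggered the run
  response: string;           // generated text
  latencyMs: number;          // end minus start timestamp
  wordCount: number;
  sentenceCount: number;
  avgSentenceLength: number;  // words per sentence
  avgWordLength: number;      // characters per word
  fleschKincaidGrade: number; // readability score
}
```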
Target Users and Value
- AI Researchers and Developers: Conveniently compare performance and output quality of various locally deployed LLMs
- Content Creators and Editors: Assess the readability of generated text to refine how content is written
- Data Analysts and Product Managers: Obtain detailed model response metrics to support decision-making
- Educators: Verify that AI-generated text meets specific reading-level standards
- Automation Engineers: Enhance model testing efficiency and reduce manual operations through automated workflows
By structuring and automating the testing process, this workflow removes much of the complexity of local multi-model testing and provides an objective basis for comparing performance and text quality, helping teams iterate on and optimize their language model applications more quickly.