Testing Multiple Local LLMs with LM Studio

This workflow automates the testing and performance analysis of multiple locally hosted large language models. By dynamically retrieving the list of loaded models and applying a standardized system prompt, users can easily compare how different models perform on the same task. The workflow records request and response times, conducts multi-dimensional text analysis, and stores the structured results in Google Sheets for subsequent management and comparison. It also supports flexible parameter configuration to meet diverse testing needs, improving both the efficiency and the rigor of model evaluation.

Tags

Local LLM Test, Performance Analysis

Workflow Name

Testing Multiple Local LLMs with LM Studio

Key Features and Highlights

This workflow enables fully automated end-to-end testing and performance analysis of multiple large language models (LLMs) deployed locally. Highlights include:

  • Dynamically retrieving and iteratively invoking all loaded models on the local LM Studio server.
  • Standardizing model outputs via a unified system prompt to facilitate comparative evaluation across different models on specific tasks.
  • Automatically capturing request-send and response-receive timestamps to calculate response latency.
  • Multi-dimensional text analysis, including word count, sentence count, average sentence length, average word length, and Flesch-Kincaid readability scoring.
  • Structuring and automatically saving test results to Google Sheets for convenient aggregation and comparative analysis.
  • Flexible configuration supporting parameters such as temperature, Top P, and presence penalty to meet diverse testing requirements.
  • User guidance through annotations and prompts for quick LM Studio server setup and workflow parameter updates.

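LM Studio exposes an OpenAI-compatible API (by default at `http://localhost:1234/v1`), where `GET /v1/models` returns a `{ data: [{ id: ... }] }` listing. The sketch below illustrates the fan-out pattern the workflow uses: extract the loaded model IDs and build one chat-completion request body per model with a shared system prompt. The system prompt text and sampling values are illustrative placeholders, not the template's actual configuration.

```javascript
// Sketch: parse LM Studio's /v1/models response and build one
// chat-completion request body per loaded model. The system prompt
// and sampling parameters are hypothetical examples.
const SYSTEM_PROMPT = "Answer concisely in plain English."; // placeholder

// /v1/models returns { data: [{ id: "model-id", ... }, ...] }
function extractModelIds(modelsResponse) {
  return modelsResponse.data.map((m) => m.id);
}

function buildChatRequest(modelId, userText) {
  return {
    model: modelId,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userText },
    ],
    temperature: 0.7,      // configurable per test run
    top_p: 0.9,
    presence_penalty: 0.0,
  };
}

// Example: fan one prompt out across every loaded model.
const sampleListing = { data: [{ id: "llama-3.1-8b" }, { id: "qwen2.5-7b" }] };
const requests = extractModelIds(sampleListing).map((id) =>
  buildChatRequest(id, "Summarize the water cycle in two sentences.")
);
```

Each request body would then be POSTed to `/v1/chat/completions`; keeping the system prompt and sampling parameters identical across models is what makes the resulting outputs comparable.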
Core Problems Addressed

  • Complexity in managing and testing multiple models: Automatically fetches and iterates through all local models, simplifying the testing process.
  • Lack of standardized evaluation for output text quality and readability: Built-in multi-dimensional text analysis algorithms provide quantitative metrics.
  • Dispersed and hard-to-manage test data: Automatically syncs test results to Google Sheets for centralized data management.
  • Difficulty in debugging and reproducing results: Precisely records request and response times to facilitate performance monitoring and issue diagnosis.

Application Scenarios

  • AI researchers and developers comparing performance across different local LLMs.
  • Machine learning engineers tuning local language model parameters and evaluating output quality.
  • Education and content creation sectors assessing readability and conciseness of model-generated text.
  • Enterprises deploying private LLM services for continuous quality monitoring and optimization.
  • Automated workflows requiring batch testing and analysis of model responses.

Main Workflow Steps

  1. LM Studio Server Configuration: Install LM Studio, load required models, and update the server IP address in the workflow.
  2. Chat Message Trigger: Listen for incoming text via webhook to initiate the workflow.
  3. Retrieve Local Model List: Call LM Studio API to obtain the list of currently active model IDs.
  4. Iterative Model Testing: Sequentially send requests to each model using a unified system prompt to standardize response style.
  5. Timestamp Collection: Record request start and end times to compute response latency.
  6. Text Response Analysis: Execute code nodes to calculate word count, sentence count, average sentence length, average word length, and readability scores.
  7. Data Preparation and Saving: Organize all test parameters and analysis results, then automatically append them to a Google Sheets spreadsheet.
  8. Result Review and Reuse: Users can directly view detailed model test reports within Google Sheets.
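Steps 5 and 6 can be sketched as a single JavaScript code node of the kind n8n runs. This is an illustrative approximation, not the template's exact code: the syllable counter is a rough vowel-group heuristic, so the Flesch-Kincaid grade it produces is approximate.

```javascript
// Sketch of steps 5-6: compute latency from the recorded timestamps and
// run multi-dimensional text analysis on a model's reply.
function countSyllables(word) {
  // Rough heuristic: count groups of consecutive vowels.
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return groups ? groups.length : 1;
}

function analyzeResponse(text, requestMs, responseMs) {
  const words = text.match(/[A-Za-z'-]+/g) || [];
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const wordCount = words.length;
  const sentenceCount = Math.max(sentences.length, 1);
  const syllables = words.reduce((n, w) => n + countSyllables(w), 0);
  return {
    wordCount,
    sentenceCount,
    avgSentenceLength: wordCount / sentenceCount,
    avgWordLength:
      words.reduce((n, w) => n + w.length, 0) / Math.max(wordCount, 1),
    // Flesch-Kincaid Grade Level:
    // 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    fleschKincaidGrade:
      0.39 * (wordCount / sentenceCount) +
      11.8 * (syllables / Math.max(wordCount, 1)) -
      15.59,
    latencyMs: responseMs - requestMs, // from step 5's timestamps
  };
}

const stats = analyzeResponse("The cat sat. The dog ran away.", 1000, 1850);
```

One row of these metrics per model per prompt is what gets appended to Google Sheets in step 7, making side-by-side comparison straightforward.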

Involved Systems and Services

  • LM Studio: Locally deployed language model server providing model listings and conversational interfaces.
  • n8n: Automation workflow platform responsible for process control and node orchestration.
  • Google Sheets: Cloud-based spreadsheet service used for storing and managing test data.
  • Webhook: Receives external chat messages to trigger workflow execution.
  • JavaScript Code Nodes: Perform multi-dimensional semantic and readability analysis on text.

Target Users and Value

  • AI Researchers and Data Scientists: Facilitate rapid evaluation and comparison of multiple local models’ text generation quality.
  • Machine Learning Engineers: Assist in debugging and optimizing model parameters to improve performance.
  • Content Review and Editing Teams: Quantify text readability to ensure outputs meet target audience reading levels.
  • Enterprise Technical Teams: Enable automated testing and performance monitoring of private LLM services.
  • Educational and Training Institutions: Assess whether model outputs are suitable for different educational stages.

By providing comprehensive automated testing and analytical capabilities, this workflow significantly lowers the barrier and workload for evaluating multiple local models, enhances efficiency in model selection and optimization, and delivers scientific, systematic insights into model performance for a wide range of users.

Recommended Templates

Telegram RAG PDF

This workflow receives PDF files via Telegram, automatically splits them, and converts the content into vectors stored in the Pinecone database, supporting vector-based intelligent Q&A. Users can conveniently query document information in the chat window, significantly improving the speed and accuracy of knowledge acquisition. It is suitable for scenarios such as enterprise document management, customer support, and education and training, greatly enhancing information retrieval efficiency and user experience.

Telegram Q&A, Vector Search

Pyragogy AI Village - Orchestrazione Master (Deep Architecture V2)

This workflow is an intelligent orchestration system that efficiently processes and optimizes content using a multi-agent architecture. It dynamically schedules various AI agents, such as content summarization, review, and guidance instructions, in conjunction with human oversight to ensure high-quality output. The system supports content version management and automatic synchronization to GitHub, creating a closed-loop knowledge management process that is suitable for complex document generation and review, enhancing the efficiency of content production and quality assurance in enterprises. This process achieves a perfect combination of intelligence and human supervision.

Multi-Agent Orchestration, Content Automation

[AI/LangChain] Output Parser 4

This workflow utilizes a powerful language model to automatically process natural language requests and generate structured and standardized output data. Its key highlight is the integration of an automatic output correction parser, which can intelligently correct outputs that do not meet expectations, thereby ensuring the accuracy and consistency of the data. Additionally, the workflow defines a strict JSON Schema for output validation, addressing the issue of lack of structure in traditional language model outputs. This significantly reduces the costs associated with manual verification and correction, making it suitable for various automated tasks that require high-quality data.

Structured Output, Auto Correction

Intelligent Text Fact-Checking Assistant

The Intelligent Text Fact-Checking Assistant efficiently splits the input text sentence by sentence and conducts fact-checking, using a customized AI model to quickly identify and correct erroneous information. This tool generates structured reports that list incorrect statements and provide an overall accuracy assessment, helping content creators, editorial teams, and research institutions enhance the accuracy and quality control of their texts. It addresses the time-consuming and labor-intensive issues of traditional manual review and is applicable in various fields such as news, academia, and content moderation.

fact check, text split

RAG AI Agent with Milvus and Cohere

This workflow integrates a vector database and a multilingual embedding model to achieve intelligent document processing and a question-answering system. It can automatically monitor and process PDF files in Google Drive, extract text, and generate vectors, supporting efficient semantic retrieval and intelligent responses. Users can quickly access a vast amount of document information, enhancing the management and query efficiency of multilingual content. It is suitable for scenarios such as enterprise knowledge bases, customer service robots, and automatic indexing and querying in specialized fields.

Vector Search, Smart Q&A

Multi-Agent Conversation

This workflow enables simultaneous conversations between users and multiple AI agents, supporting personalized configurations for each agent's name, instructions, and language model. Users can mention specific agents using @, allowing the system to dynamically invoke multiple agents, avoiding the creation of duplicate nodes, and supporting multi-turn dialogue memory to enhance the coherence of interactions. It is suitable for scenarios such as intelligent Q&A, decision support, and education and training, meeting complex and diverse interaction needs.

Multi-agent, Multi-turn Dialogue

Intelligent Q&A and Citation Generation Based on File Content

This workflow achieves efficient information retrieval and intelligent Q&A by automatically downloading specified files from Google Drive and splitting their content into manageable text blocks. Users can ask questions through a chat interface, and the system quickly searches for relevant content using a vector database and OpenAI models, generating accurate answers along with citations. This process significantly enhances the efficiency of document information acquisition and the credibility of answers, making it suitable for various scenarios such as academic research, enterprise knowledge management, and customer support.

Intelligent QA, Vector Search

Daily Cartoon (w/ AI Translate)

This workflow automatically retrieves "Calvin and Hobbes" comics daily, extracts image links, and uses AI to translate the comic dialogues into English and Korean. Finally, the comics, complete with original text and translations, are automatically pushed to a Discord channel, allowing users to access the latest content in real time. This process eliminates the hassle of manually visiting websites and enables intelligent sharing of multilingual comics, making it suitable for comic enthusiasts, content operators, and language learners.

comic scraping, AI translation