Agent with Custom HTTP Request

This workflow combines a ReAct AI agent with the OpenAI GPT-4 model to scrape and process web content automatically. When a user sends a chat message, the agent generates the HTTP request parameters, fetches the page at the specified URL, strips noise from the HTML, and returns the result as Markdown. It supports full and simplified scraping modes, and when a request fails it returns an error message with suggestions for adjusting the request. The workflow suits content monitoring, information collection, and AI question-answering systems, where it reduces manual retrieval work.

Web Scraping, Content Cleaning

Workflow Name

Agent with Custom HTTP Request

Key Features and Highlights

The workflow uses a ReAct AI Agent backed by the OpenAI GPT-4 model to process chat messages sent by the user. The agent generates query parameters in the format the HTTP request expects, fetches the page at the specified URL, and removes noise from the page HTML. The cleaned content is then converted to Markdown for output. Two extraction modes are supported (full and simplified), and failed requests produce feedback with suggested adjustments.
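As a rough illustration of the parameter step, the sketch below parses an agent-generated query string into request settings. The query-string format, the function name parse_tool_query, and the default length limit are assumptions made for illustration; only the url and method parameters come from the workflow description.

```python
from urllib.parse import parse_qs

# Hypothetical default; the real limit is whatever the workflow's CONFIG node sets.
DEFAULT_MAX_LENGTH = 10_000


def parse_tool_query(query: str, max_length: int = DEFAULT_MAX_LENGTH) -> dict:
    """Turn an agent-generated query string into request settings."""
    # parse_qs maps each key to a list of values; keep the first value of each.
    fields = {key: values[0] for key, values in parse_qs(query.lstrip("?")).items()}

    url = fields.get("url", "")
    if not url.startswith(("http://", "https://")):
        raise ValueError("query must include an absolute http(s) 'url' parameter")

    # Assumption: anything other than "simplified" falls back to full mode.
    method = "simplified" if fields.get("method") == "simplified" else "full"
    return {"url": url, "method": method, "max_length": max_length}
```

A call such as parse_tool_query("?url=https://example.com/docs&method=simplified") yields the URL, the mode, and the length limit in one dictionary. A target URL that carries its own query string would need to be percent-encoded first, because parse_qs splits on every & it sees.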

Core Problems Addressed

Automates web content scraping and intelligent extraction of relevant information, eliminating the complexity and inefficiency of manual webpage parsing.
Utilizes an AI Agent to guide the construction of request parameters, reducing API call complexity and error rates.
Cleans redundant scripts, styles, multimedia tags, and other irrelevant elements from webpages to minimize noise.
Implements content length restrictions to prevent overly long content from affecting subsequent processing and storage.
Offers a simplified mode to further compress content size, catering to diverse application scenarios.
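The cleaning, simplified-mode, and length-limit ideas above can be sketched roughly as follows. This is a minimal regex-based illustration, not the workflow's actual node logic: the list of noise tags, the function names, and the error wording are all assumptions, and a production version would more likely use a real HTML parser.

```python
import re

# Assumed list of tags whose entire content is treated as noise.
NOISE_TAGS = ("script", "style", "noscript", "iframe", "video", "audio", "svg")


def clean_html(html: str, simplified: bool = False) -> str:
    """Strip noisy elements; in simplified mode also blank link and image URLs."""
    for tag in NOISE_TAGS:
        # Remove the opening tag, everything inside it, and the closing tag.
        html = re.sub(rf"<{tag}\b[^>]*>.*?</{tag}\s*>", "", html,
                      flags=re.DOTALL | re.IGNORECASE)
    # Drop HTML comments as well.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    if simplified:
        # Keep the attribute but empty its value so long URLs do not add length.
        html = re.sub(r'\b(href|src)\s*=\s*"[^"]*"', r'\1=""', html,
                      flags=re.IGNORECASE)
    return html.strip()


def enforce_limit(content: str, max_length: int) -> str:
    """Return the content unchanged, or an error message if it is too long."""
    if len(content) > max_length:
        return (f"ERROR: page content is {len(content)} characters, "
                f"over the limit of {max_length}.")
    return content
```

Blanking URLs instead of deleting the tags keeps link text and image positions readable while still shrinking the output, which is one plausible reading of what the simplified mode does.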

Application Scenarios

Automated workflows requiring web content scraping and intelligent organization, such as content monitoring, information gathering, and pre-processing for data analysis.
AI Q&A systems that enhance response accuracy by integrating real-time web data.
Developers or business users needing quick access to concise webpage text for further processing or display.
Automated customer service or knowledge management systems that automatically update webpage content summaries in the backend.

Main Workflow Steps

Listen for user-triggered manual chat messages (On new manual Chat Message).
Use the ReAct AI Agent to process input and generate HTTP request parameters (e.g., URL and method).
Parse query parameters and set content length limits (CONFIG node).
Send HTTP requests to retrieve webpage HTML.
Check for request errors; if any, generate appropriate error messages.
Extract the content inside the webpage’s <body> tag.
Remove scripts, styles, multimedia tags, and other irrelevant elements.
Based on the method parameter, decide whether to simplify content by removing links and image URLs.
Convert the cleaned HTML into Markdown format.
Verify content length and return an error message if it exceeds limits.
Output the final page content for downstream use.
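The fetch and error-check steps above might look roughly like the sketch below. The function names, the error wording, and the choice to pass the fetcher in as an argument are assumptions for illustration; treating <body> as the tag whose content is kept is also an assumption.

```python
import re
from typing import Callable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_get(url: str) -> str:
    """Default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_body(html: str) -> str:
    """Return the content inside <body>, or the whole document if none is found."""
    match = re.search(r"<body\b[^>]*>(.*?)</body\s*>", html,
                      flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else html.strip()


def fetch_page(url: str, fetcher: Callable[[str], str] = http_get) -> str:
    """Fetch a page and return its body, or a readable error message on failure."""
    try:
        html = fetcher(url)
    except HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be handled first.
        return f"ERROR: {url} returned HTTP {err.code}. Check the URL and try again."
    except (URLError, ValueError) as err:
        return f"ERROR: could not reach {url} ({err}). Check the URL format and try again."
    return extract_body(html)
```

Returning the error as text instead of raising lets the agent read the message and retry with adjusted parameters, which matches the feedback behavior described in the steps. Passing the fetcher as an argument also makes the function easy to exercise without network access.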

Involved Systems or Services

OpenAI GPT-4 (invoked via OpenAI Chat Model node)
Custom HTTP requests (n8n built-in HTTP Request node)
n8n Langchain plugin (ReAct AI Agent and related tool nodes)
Markdown conversion node (converts HTML to Markdown)

Target Users and Value

Automation developers and technical professionals building intelligent content collection and processing tools.
Content operators and data analysts seeking rapid access to structured webpage text.
AI application developers enhancing intelligent Q&A and knowledge bases with real-time web data.
Enterprises and teams aiming to improve information acquisition efficiency and reduce manual intervention.

By pairing an AI agent with web scraping, this workflow automates content acquisition from building the request through delivering cleaned Markdown. It can serve as a building block for larger information service platforms.

Recommended Templates

News Extraction

This workflow automatically scrapes the latest content from specified news websites, extracting the publication time, title, and body of the news articles. It then uses AI technology to generate summaries and key technical keywords for each news item, ultimately storing the organized data in a database. This process enables efficient monitoring and analysis of news sources without RSS feeds, making it suitable for various scenarios such as media monitoring, market research, and content management, significantly enhancing the efficiency and accuracy of information retrieval.

News Collection, Smart Summary

News Extraction

This workflow automatically scrapes the latest articles from specified news websites without relying on RSS subscriptions. On a regular schedule it extracts article links, publication dates, titles, and body content, then uses the GPT-4 model to generate brief summaries and extract key technical keywords. The structured data is stored in a NocoDB database for later retrieval and analysis. This makes news monitoring and content management more efficient for businesses, media teams, and data analysts.

news scraping, smart summary

Open Deep Research - AI-Powered Autonomous Research Workflow

This workflow utilizes AI language models and various data sources to achieve automated deep information retrieval and research report generation. After the user inputs a query, the system generates precise search keywords, conducts web searches using SerpAPI, and combines content analysis with Jina AI, ultimately integrating the results into a structured research report. This process enhances research efficiency, ensures the coherence and accuracy of information extraction, and is applicable in scenarios such as academic research, market research, content creation, and corporate decision-making, helping users quickly obtain high-quality materials.

AI Research, Deep Study

Make OpenAI Citation for File Retrieval RAG

This workflow integrates an intelligent assistant with vector storage to answer questions from retrieved documents and automatically add citations to the retrieved content. Users can format the output as Markdown or HTML, producing professional documents with dynamic citation numbers that improve the credibility and traceability of the information. It is suitable for fields such as research, education, and law, and it addresses missing citations and stray characters in answers, helping users generate standardized documents efficiently.

File Search, Auto Citation

Load Prompts from GitHub Repo and Auto-Populate n8n Expressions

This workflow is capable of automatically loading text prompt files from a specified GitHub repository, extracting and replacing variable placeholders, and generating complete prompt content for use by AI models. It features a variable validation mechanism to ensure that all required variables are correctly assigned, preventing errors and improving efficiency. Additionally, by integrating the Ollama chat model and LangChain AI Agent, it achieves full-process automation from text prompts to intelligent responses, making it suitable for various scenarios that require dynamic content generation.

Prompt Management, AI Text Generation

Daily AI News Translation & Summary with GPT-4 and Telegram Delivery

This workflow automatically fetches the latest artificial intelligence news from mainstream news APIs at a scheduled time every day. It then filters, summarizes, and translates the news into Traditional Chinese using advanced AI models. Finally, the organized news summaries are promptly pushed to designated Telegram chat groups or channels, helping users efficiently access cutting-edge AI information. This solution addresses the cumbersome issues of manual searching and translation, ensuring the timeliness and continuity of information, making it suitable for various AI industry professionals and general users.

AI News, Auto Translation

SearchApi Youtube Video Summary

This workflow automatically extracts the transcription text from a YouTube video by inputting the video ID and performs intelligent summarization. After obtaining the text using the SearchApi, it undergoes multiple steps of splitting and content merging, combined with the OpenAI GPT-4 model to generate a concise summary. This process effectively addresses the challenge of quickly extracting key information from long videos, making it suitable for content creators, educators, and market researchers, significantly improving the efficiency and accuracy of information retrieval.

Video Summary, Smart Transcription

Image to License Plate Number

This workflow can automatically identify and extract license plate numbers from uploaded vehicle images, directly returning clean license plate characters, eliminating the need for manual input by users. By integrating advanced large language models, it significantly improves the efficiency and accuracy of license plate recognition, streamlining the traditional license plate extraction process. It is applicable in various scenarios such as traffic management, parking lots, and logistics monitoring, helping users achieve rapid automated collection of vehicle information, enhance management intelligence, and save time and labor costs.

License Plate Recognition, Large Language Model