Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls
This workflow processes the CSV file exported from a Screaming Frog crawl and generates an `llms.txt` file suitable for AI use. It handles exports in multiple languages, filters URLs automatically, and can optionally apply an AI text classifier so that only high-quality, relevant pages are included. Users upload the export and receive a structured file that supports AI model training and website content optimization while reducing manual data preparation. The final file can be downloaded or saved directly to cloud storage.

Workflow Name
Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls
Key Features and Highlights
This workflow automatically generates AI training-ready `llms.txt` text files based on CSV exports from Screaming Frog website crawls. It supports automatic field adaptation in multilingual environments and includes flexible, extensible URL filtering criteria. An optional AI text classifier can be employed for intelligent content filtering to ensure high quality and relevance of the generated files. The resulting `llms.txt` files can be directly downloaded within the n8n interface or seamlessly integrated for automatic upload and storage on cloud drives such as Google Drive and OneDrive.
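One way to picture the multilingual field adaptation is as a lookup from canonical field names to the column headers an export might use. The Python sketch below is an illustration only, not the workflow's actual node code: the function name `normalize_row`, the table `FIELD_ALIASES`, and every header name in it (the non-English ones especially) are assumptions and should be checked against a real export.

```python
# Canonical field name -> header names that may appear in an export.
# NOTE: all header names here are assumptions for illustration; the
# non-English aliases in particular are unverified placeholders.
FIELD_ALIASES = {
    "url": ["Address", "Adresse"],
    "title": ["Title 1", "Titre 1"],
    "description": ["Meta Description 1", "Méta Description 1"],
}


def normalize_row(row):
    """Return a dict keyed by canonical field names, whatever the export language."""
    normalized = {}
    for canonical, aliases in FIELD_ALIASES.items():
        # Take the first alias that has a non-empty value; default to "".
        normalized[canonical] = next(
            (row[alias] for alias in aliases if row.get(alias)), ""
        )
    return normalized
```

Downstream steps can then read `url`, `title`, and `description` without caring which language the crawl was exported in; supporting another language means adding aliases to the table rather than changing the logic.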
Core Problems Addressed
Traditional web crawl data is often disorganized and unsuitable for direct use in training large language models (LLMs). This workflow automates the cleaning and filtering of high-quality, indexable page information from websites, producing structured text files that are easy for machine learning models to understand. It significantly reduces manual filtering and formatting efforts while improving the accuracy and efficiency of training data preparation.
Use Cases
- SEO specialists and content strategists needing to quickly generate website content index files to aid content optimization and discovery
- AI developers training custom language models using website crawl data
- Digital marketing teams organizing website structure and content descriptions for automated reporting and analysis
- Multilingual website content management, supporting languages such as French, Italian, German, Spanish, and more
Main Workflow Steps
- Form Trigger: Enter the website name and a brief description, and upload the Screaming Frog-exported `internal_html.csv` file
- Data Extraction: Parse the CSV file to extract seven key fields including URL, title, description, and status code
- URL Filtering: Keep only pages that return status code 200, are indexable by search engines, and have an HTML content type
- (Optional) Text Classification: Enable an AI text classifier to intelligently distinguish high-quality content from others based on URL, title, description, and word count
- Format Setting: Generate a text line for each record in the format `- [Title](URL): Description`; omit the colon and description if no description is available
- Content Aggregation: Combine all qualifying lines into the complete `llms.txt` file content
- File Generation and Download: Produce the final text file, which can be downloaded directly or automatically saved to cloud storage via a replaceable upload node
Involved Systems or Services
- Screaming Frog SEO Spider (website crawler with CSV export)
- n8n Automation Platform (workflow engine)
- OpenAI GPT-4o-mini (optional AI text classification model)
- Cloud Storage Services (e.g., Google Drive, OneDrive; users must configure and replace the upload node accordingly)
Target Users and Value Proposition
- Website administrators and SEO experts: Quickly organize website content structure and improve SEO content filtering efficiency
- AI engineers and data scientists: Build high-quality training corpora to enhance language model performance
- Content operations and digital marketing professionals: Automate content directory generation to support content management and optimization decisions
- Multilingual website operation teams: Automatically adapt fields across languages, simplifying workflows without language barriers
With this workflow, users only need to upload a simple Screaming Frog export file to effortlessly obtain a structured `llms.txt` file, greatly enhancing the convenience and accuracy of applying AI to website content.