ETL Pipeline
This workflow implements an automated ETL data pipeline that fetches tweets on a specific topic from Twitter on a schedule, performs sentiment analysis, and stores the data in MongoDB and Postgres databases. The analysis results are filtered and pushed to a Slack channel, so the team receives important information without having to monitor social media manually. This improves data processing efficiency and supports quick responses to market dynamics and brand reputation management.

Workflow Name
ETL Pipeline
Key Features and Highlights
This workflow implements an automated ETL (Extract-Transform-Load) data pipeline that fetches tweets on a specific topic from Twitter, performs sentiment analysis, stores the data in both MongoDB and Postgres databases, and finally pushes tweets with positive sentiment to a designated Slack channel. It combines social media data collection, natural language processing, and multi-database storage in a single scheduled workflow, supporting ongoing monitoring and team collaboration.
Core Problems Addressed
- Automatically fetches the latest tweets tagged with #OnThisDay on Twitter, eliminating the need for manual social media monitoring.
- Utilizes Google Cloud Natural Language API for sentiment analysis, quickly identifying the emotional tone and intensity of tweets.
- Implements dual data storage (MongoDB and Postgres) to meet diverse business requirements for data structure and querying.
- Filters tweets with a positive sentiment score (greater than 0) through conditional logic and automatically pushes them to Slack, enhancing team responsiveness.
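To make "emotional tone and intensity" concrete: the Google Cloud Natural Language API reports a score from -1.0 (negative) to 1.0 (positive) and a magnitude of 0.0 or more that reflects overall emotional strength. The following is a minimal sketch of how the two values could be read together. The function name `describe_sentiment` and the 1.0 magnitude cut-off are illustrative assumptions, not part of the workflow.

```python
def describe_sentiment(score: float, magnitude: float) -> str:
    """Return a rough label for a sentiment result.

    score: -1.0 (negative) to 1.0 (positive).
    magnitude: 0.0 or more, overall emotional strength.
    The 1.0 magnitude cut-off is an illustrative assumption only.
    """
    if score > 0:
        tone = "positive"
    elif score < 0:
        tone = "negative"
    else:
        tone = "neutral"
    # Hypothetical threshold separating strong from mild emotion.
    intensity = "strong" if magnitude >= 1.0 else "mild"
    return f"{intensity} {tone}"


if __name__ == "__main__":
    print(describe_sentiment(0.6, 1.5))
    print(describe_sentiment(-0.2, 0.3))
```

The tone split at 0 mirrors the workflow's own conditional check; the intensity label is only a reading aid.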
Use Cases
- Social media sentiment monitoring and analysis
- Brand reputation management and crisis alerting
- Marketing campaign performance tracking
- Data-driven content recommendation and insights
- Internal corporate information flow and collaboration alerts
Main Workflow Steps
- Scheduled Trigger (Cron): Automatically initiates data extraction daily at 6 AM.
- Twitter Search: Retrieves the latest 3 tweets containing the #OnThisDay hashtag.
- Store in MongoDB: Saves raw tweet texts into a MongoDB collection.
- Sentiment Analysis: Calls Google Cloud Natural Language API to score sentiment and analyze intensity of tweet texts.
- Set Data Fields: Extracts and encapsulates sentiment score, magnitude, and tweet text.
- Store in Postgres: Inserts the processed data into the tweets table in the Postgres database.
- Conditional Check: Evaluates whether the sentiment score is greater than 0 (indicating positive sentiment).
- Information Push: If the sentiment is positive, automatically sends the tweet and sentiment data to a specified Slack channel; otherwise, nothing is pushed.
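The transform and filter steps above can be sketched outside n8n as plain Python. This is a hedged illustration, not the workflow's actual node configuration: the function names, the record keys (`text`, `score`, `magnitude`), and the Slack message layout are assumptions, and the `analyze` argument is a stand-in for the call to the Google Cloud Natural Language API, whose sentiment response carries a `documentSentiment` object with `score` and `magnitude`. Fetching tweets, database writes, and the actual Slack post are left out.

```python
from typing import Any, Callable, Dict, List


def set_data_fields(tweet_text: str, api_response: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the tweet text, sentiment score, and magnitude."""
    sentiment = api_response["documentSentiment"]
    return {
        "text": tweet_text,
        "score": sentiment["score"],
        "magnitude": sentiment["magnitude"],
    }


def is_positive(record: Dict[str, Any]) -> bool:
    """Conditional check: a score greater than 0 counts as positive."""
    return record["score"] > 0


def slack_message(record: Dict[str, Any]) -> str:
    """Build the notification text (the layout is a hypothetical choice)."""
    return (
        f"Score: {record['score']}, Magnitude: {record['magnitude']}\n"
        f"{record['text']}"
    )


def run_pipeline(
    tweets: List[str],
    analyze: Callable[[str], Dict[str, Any]],
) -> List[str]:
    """Return the Slack messages that would be sent for the given tweets."""
    messages = []
    for text in tweets:
        record = set_data_fields(text, analyze(text))
        if is_positive(record):
            messages.append(slack_message(record))
    return messages


if __name__ == "__main__":
    # Canned responses stand in for the real sentiment API.
    canned = {
        "What a great day in history": {"documentSentiment": {"score": 0.8, "magnitude": 0.9}},
        "A sad anniversary": {"documentSentiment": {"score": -0.5, "magnitude": 0.5}},
    }
    for message in run_pipeline(list(canned), lambda t: canned[t]):
        print(message)
```

Passing the analyzer in as a function keeps the filter logic testable without network access or API credentials.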
Involved Systems and Services
- Twitter API: For real-time social media data extraction.
- MongoDB: NoSQL database used to store raw tweet texts.
- Google Cloud Natural Language API: Provides text sentiment analysis services.
- Postgres Database: Relational database used to store structured tweet and sentiment data.
- Slack: Team communication tool used to push analysis result notifications.
- n8n Automation Platform: Connects and orchestrates various system nodes to achieve workflow automation.
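As a rough illustration of the structured storage side, the sketch below inserts processed records into a tweets table. It uses Python's built-in sqlite3 module with an in-memory database purely as a runnable stand-in for Postgres, and the column names (`text`, `score`, `magnitude`) are assumed from the fields the workflow sets, not taken from the actual table definition.

```python
import sqlite3
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def store_records(conn: sqlite3.Connection, records: List[Record]) -> int:
    """Insert processed tweets and return the total row count.

    The schema is an assumption based on the fields the workflow sets.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "text TEXT NOT NULL, score REAL NOT NULL, magnitude REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO tweets (text, score, magnitude) "
        "VALUES (:text, :score, :magnitude)",
        records,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
    rows = store_records(
        connection,
        [{"text": "What a great day in history", "score": 0.8, "magnitude": 0.9}],
    )
    print(rows)
```

Against a real Postgres instance the same idea applies, but the driver, placeholder syntax, and column types would differ.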
Target Users and Value
- Social media analysts and marketing teams: Automatically acquire and analyze trending tweets to quickly respond to market dynamics.
- Data engineers and developers: Demonstrates multi-source data integration and automated workflow construction.
- Business managers and decision-makers: Gain insights into customer feedback and public opinion trends through sentiment data to support decision-making.
- Content planners and PR personnel: Monitor brand-related tweets in real time and adjust strategies promptly.
- Any organization or team that needs to transform social media data into structured intelligence.
This ETL pipeline workflow automates the entire loop, from data acquisition and sentiment analysis through storage to team notification, improving data processing efficiency and turning raw tweets into information the team can act on.