Remove PII from CSV Files (Automated Sensitive Data Cleanup for CSV Files)

This workflow monitors a Google Drive folder for new CSV files, downloads them, and extracts their content. An AI model identifies the columns that contain personally identifiable information (PII), and a custom code node removes those columns. The sanitized CSV files are then uploaded back to Google Drive. Automating these steps reduces the manual effort and error rate of data anonymization, helps users comply with regulations on handling sensitive data, and lowers the risk of privacy breaches. It is suited to corporate data sharing and legal compliance work.

Key Features and Highlights

This workflow monitors a specified Google Drive folder and, when a new CSV file is uploaded, downloads it and extracts its contents. An OpenAI model identifies the columns that contain Personally Identifiable Information (PII), and a custom code node then removes those columns from the data. Finally, the sanitized CSV file is uploaded to a designated Google Drive folder. The result is AI-assisted PII detection and fast anonymization of CSV files.

Core Problem Addressed

Manually identifying and removing PII during data sharing and processing is time-consuming and error-prone. This workflow uses AI to detect PII columns automatically, which speeds up data anonymization and reduces manual mistakes. It helps organizations and individuals comply with data privacy regulations and lowers the risk of privacy breaches.

Use Cases

  • Automated sensitive data cleanup before enterprise data sharing
  • Customer data anonymization to meet legal compliance requirements
  • Privacy protection prior to data analysis and processing
  • Batch processing of large volumes of CSV files and automation workflow development

Main Process Steps

  1. Google Drive Trigger: Watch a specified folder for newly uploaded CSV files.
  2. Retrieve Filename and Download File: Automatically extract the filename and download the file content.
  3. Extract File Content: Parse the tabular data from the CSV file.
  4. Invoke OpenAI Model: Intelligently analyze the header row to identify columns containing PII.
  5. Merge Data: Combine file content, filename, and identified PII column information.
  6. Sensitive Data Cleanup Code Node: Remove PII columns based on detection results and generate the anonymized CSV data.
  7. Upload to Google Drive: Upload the sanitized file with a new filename back to the specified folder, completing the fully automated workflow.
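
The cleanup in step 6 and the renaming in step 7 can be sketched as follows. This is a minimal illustration in Python, not the workflow's actual code node: the function names `remove_pii_columns` and `sanitized_filename`, the case-insensitive name matching, and the `_no_pii` suffix are all assumptions made for the example.

```python
import csv
import io
import os


def remove_pii_columns(csv_text: str, pii_columns: list[str]) -> str:
    """Return csv_text with every column named in pii_columns dropped.

    Hypothetical helper: matching is case-insensitive and ignores
    surrounding whitespace, which is an assumption of this sketch.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""

    header = rows[0]
    blocked = {name.strip().lower() for name in pii_columns}

    # Indexes of the columns that are NOT flagged as PII.
    keep = [i for i, name in enumerate(header)
            if name.strip().lower() not in blocked]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Skip indexes that a short (ragged) row does not have.
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


def sanitized_filename(original: str, suffix: str = "_no_pii") -> str:
    """Insert a suffix before the file extension (suffix is an assumption)."""
    stem, ext = os.path.splitext(original)
    return f"{stem}{suffix}{ext}"
```

For example, calling `remove_pii_columns` on a file with the columns `name`, `email`, and `city` and the detected list `["name", "email"]` leaves only the `city` column. Using the `csv` module rather than splitting on commas keeps quoted fields that contain commas intact.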

Systems and Services Involved

  • Google Drive: File monitoring, downloading, and uploading operations.
  • OpenAI GPT-4o-mini Model: Intelligent identification of PII columns within CSV files.
  • n8n Custom Code Node: Removal of sensitive columns and CSV format conversion.
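
To show how the model step and the code node might fit together, the sketch below builds a prompt from the header row and parses a comma-separated reply. It is an illustration under stated assumptions, not the workflow's real prompt: the prompt wording, the reply format, and the names `read_header`, `build_pii_prompt`, and `parse_pii_reply` are hypothetical, and the call to the model itself is deliberately left out.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the first row of the CSV, or an empty list if there is none."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def build_pii_prompt(header: list[str]) -> str:
    """Build a prompt listing the column names (wording is an assumption)."""
    columns = ", ".join(header)
    return (
        f"The following are column names from a CSV file: {columns}. "
        "Reply with a comma-separated list of the column names that are "
        "likely to contain personally identifiable information, "
        "and nothing else."
    )


def parse_pii_reply(reply: str, header: list[str]) -> list[str]:
    """Keep only reply entries that match a real column name.

    Assumes a comma-separated reply, so column names that themselves
    contain commas are not handled by this sketch.
    """
    known = {name.strip().lower(): name for name in header}
    found: list[str] = []
    for token in reply.split(","):
        key = token.strip().strip("\"'`").lower()
        if key in known and known[key] not in found:
            found.append(known[key])
    return found
```

Filtering the reply against the actual header means a column name the model invents or misspells is ignored rather than passed on to the cleanup step.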

Target Users and Value

  • Data Security and Compliance Teams: Automate sensitive data anonymization to reduce manual workload.
  • Data Analysts and Data Engineers: Quickly obtain clean datasets free of sensitive information.
  • Enterprise IT and Automation Developers: Build data processing workflows compliant with GDPR and other privacy regulations.
  • Any organization or individual that needs to batch-process and share CSV data more efficiently and securely.