Agent Skills: PII Sanitizer

Detects and redacts Personally Identifiable Information (PII) like emails, phone numbers, and credit cards. Use when cleaning logs, datasets, or communications to comply with GDPR/CCPA privacy standards.

Uncategorized

ID: jorgealves/agent_skills/pii-sanitizer

Author

jorgealves

https://github.com/jorgealves

Repository

jorgealves/agent_skills

License: GPL-3.0

Install this agent skill in your local environment:

pnpm dlx add-skill https://github.com/jorgealves/agent_skills/tree/HEAD/pii-sanitizer

Skill Files

pii-sanitizer/SKILL.md

Skill Metadata

Name: pii-sanitizer
Description: Detects and redacts Personally Identifiable Information (PII) like emails, phone numbers, and credit cards. Use when cleaning logs, datasets, or communications to comply with GDPR/CCPA privacy standards.

PII Sanitizer

Purpose and Intent

The pii-sanitizer is a data protection tool that identifies and masks Personally Identifiable Information (PII) in datasets, logs, and communications to support compliance with privacy regulations such as GDPR and CCPA.
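The skill's own implementation is not shown on this page, so the following is only a minimal sketch of pattern-based redaction for the three PII types named above. The `PII_PATTERNS` table, the `sanitize` function, and the `[REDACTED_...]` placeholder format are assumptions made for illustration, and the patterns are deliberately simple.

```python
import re

# Hypothetical, deliberately simple patterns; production-grade detection
# needs broader coverage (international formats, names, addresses, ...).
# Order matters: card numbers are replaced before the looser phone pattern.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    "PHONE": re.compile(
        r"(?<!\w)\+?\d{1,3}[ .-]?\(?\d{2,4}\)?[ .-]?\d{3,4}[ .-]?\d{3,4}(?!\w)"
    ),
}


def sanitize(text: str) -> str:
    """Replace every match of each pattern with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


print(sanitize("Contact alice@example.com or +1 555-123-4567"))
```

Typed placeholders keep the sanitized text readable: a reviewer can still see that an email or phone number was present without seeing its value.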

When to Use

Log Scrubbing: Clean application logs before sending them to centralized logging platforms (e.g., ELK, Datadog).
Dataset Preparation: Sanitize production data before using it in staging or training environments.
Customer Support: Mask sensitive info in support tickets before sharing them with engineering teams.
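For the log-scrubbing case, one possible approach is to redact inside the logging pipeline itself, so that raw values never reach the handler that ships logs to a central platform. The sketch below uses Python's standard `logging.Filter`. The `RedactingFilter` class and `EMAIL_RE` pattern are hypothetical names, and only emails are covered to keep the example short.

```python
import logging
import re

# Hypothetical pattern; a real deployment would cover more PII types.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactingFilter(logging.Filter):
    """Redact emails from a record before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so PII passed via %-style args is covered.
        message = record.getMessage()
        record.msg = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
        record.args = ()  # args are already merged into msg
        return True  # keep the record, now sanitized


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("login failed for %s", "bob@example.com")
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers are sanitized as well.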

When NOT to Use

Encryption: This is a redaction tool, not an encryption tool. It removes data irreversibly; it does not secure data for later retrieval.
Structured Database Migration: Although it can handle some structured input, specialized ETL tools are better suited to large-scale database sanitization.

Error Conditions and Edge Cases

False Positives: Strings that resemble PII (such as internal serial numbers) may be redacted by mistake.
Ambiguous Context: "Rose" could be a name (PII) or a flower; in such cases the tool may err on the side of caution and redact it.
Encoding Issues: Ensure input text is UTF-8 to avoid detection failures on special characters.
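One common way to reduce false positives on card-like digit strings, such as the serial numbers mentioned above, is to verify the Luhn checksum before redacting. Whether the skill does this is not stated; the `luhn_valid` helper below is a hypothetical sketch of that extra check.

```python
import string


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch in string.digits]
    if not 13 <= len(digits) <= 19:  # typical payment-card lengths
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # well-known test card number
print(luhn_valid("1234 5678 9012 3456"))  # card-shaped, but fails the checksum
```

A checksum filter only helps with card numbers. Ambiguous words such as "Rose" still require contextual detection.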

Security and Data-Handling Considerations

Zero Retention: Input data must never be saved to disk.
Local Processing: Running this within a secure perimeter is strongly recommended, so that sensitive raw data never leaves the local environment.
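The zero-retention requirement can be approximated by processing input as a stream held only in memory, with no temporary files or caches. The sketch below is an assumption about how that might look; `sanitize_lines`, `sanitize_stream`, and `EMAIL_RE` are hypothetical names, and only emails are redacted for brevity.

```python
import io
import re
from typing import Iterable, Iterator, TextIO

# Hypothetical pattern; extend with other PII types as needed.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sanitize_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield sanitized lines one at a time; nothing is buffered or stored."""
    for line in lines:
        yield EMAIL_RE.sub("[REDACTED_EMAIL]", line)


def sanitize_stream(source: TextIO, sink: TextIO) -> None:
    """Copy `source` to `sink`, redacting each line as it passes through."""
    for clean_line in sanitize_lines(source):
        sink.write(clean_line)


# In-memory demonstration; in a pipeline this could be sys.stdin / sys.stdout.
raw = io.StringIO("user=carol@example.com\nstatus=ok\n")
out = io.StringIO()
sanitize_stream(raw, out)
print(out.getvalue(), end="")
```

Because each line is handled independently, raw input exists only briefly in process memory and only the sanitized output is passed downstream.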