MASK Action
The MASK action in ByteSizer allows users to anonymize or obfuscate sensitive data fields using a variety of masking techniques. It operates on multiple columns of a Dask DataFrame, with each column configurable to use a different masking method.
How It Works
The MASK action is configured in the workflow YAML file. Example:
- id: mask
  action: MASK
  parameters:
    fields:
      name:
        type: faker
        provider: name
      email:
        type: hashlib
        algo: sha256
      ssn:
        type: ff3
        key: "0123456789abcdef0123456789abcdef"
        tweak: "abcdef9876543210"
        radix: 10
      phone:
        type: pseudonym
        salt: "mysalt"
      token:
        type: fernet
        key: ""
Supported Masking Techniques
1. Hashlib
Description: Hashes the value using standard hashing algorithms.
Library: hashlib
algo: Algorithm (e.g., sha256, sha512, md5)
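As a rough illustration of what hashing a single value involves, the sketch below uses Python's standard hashlib module. The function name `hash_value` and the choice to pass missing values through unchanged are assumptions made for this example, not ByteSizer's actual behavior.

```python
import hashlib


def hash_value(value, algo="sha256"):
    """Return the hex digest of `value` under the named hashlib algorithm."""
    if value is None:
        # Assumption: missing values are left as-is rather than hashed.
        return None
    hasher = hashlib.new(algo)  # raises ValueError for an unknown algorithm
    hasher.update(str(value).encode("utf-8"))
    return hasher.hexdigest()


digest = hash_value("alice@example.com", "sha256")
print(digest)       # hex string
print(len(digest))  # a SHA-256 digest is 64 hex characters
```

Plain hashing is deterministic and unsalted, so low-entropy values such as email addresses or phone numbers can be recovered by hashing candidate values and comparing. md5 is also no longer collision-resistant, so sha256 or sha512 is the safer choice.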
2. Fernet
Description: Encrypts the value using Fernet symmetric encryption.
Library: cryptography (Fernet)
key: URL-safe Base64-encoded 32-byte key
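A Fernet key is 32 random bytes in URL-safe Base64, which comes to 44 characters. The sketch below shows one way to produce and loosely sanity-check such a key using only the standard library. The helper names `generate_fernet_key` and `looks_like_fernet_key` are hypothetical and are not part of ByteSizer or the cryptography package.

```python
import base64
import os


def generate_fernet_key():
    """Return 32 random bytes encoded as URL-safe Base64 text."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(key):
    """Loose check: the key must decode to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key)
    except ValueError:  # also covers binascii.Error (bad padding)
        return False
    return len(raw) == 32


key = generate_fernet_key()
print(len(key))                    # 44 characters
print(looks_like_fernet_key(key))  # True
print(looks_like_fernet_key(""))   # False: an empty key decodes to 0 bytes
```

Unlike hashing, Fernet encryption is reversible by anyone holding the key, so the key is better supplied from a secret store than committed in the workflow file.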
3. Faker
Description: Replaces the value with realistic fake data such as names, addresses, emails.
Library: Faker
provider: Fake data type (e.g., name, address, email)
4. FF3 (Format-Preserving Encryption)
Description: Applies format-preserving encryption: the output keeps the same length and character set as the input, which suits fixed-format fields such as SSNs or card numbers.
Library: FFX/FF3
key: Hex-encoded encryption key
tweak: Hex-encoded tweak value
radix: Numerical base (default: 10)
5. Pseudonym
Description: Deterministic pseudonymization using salted hashing.
salt: Optional salt string
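The exact construction behind the pseudonym technique is not specified here. A minimal sketch, assuming the salt is simply prepended to the value before a SHA-256 hash, could look like the following. The function name `pseudonymize` and the salt-then-value layout are assumptions for illustration only.

```python
import hashlib


def pseudonymize(value, salt=""):
    """Map `value` to a stable token: the same value and salt give the same token."""
    # Assumption: salt and value are joined by plain concatenation.
    material = (salt + str(value)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


a = pseudonymize("555-0100", salt="mysalt")
b = pseudonymize("555-0100", salt="mysalt")
c = pseudonymize("555-0100", salt="othersalt")
print(a == b)  # True: deterministic, so joins across tables still line up
print(a == c)  # False: changing the salt changes every pseudonym
```

Because the mapping is deterministic, the same phone number always yields the same pseudonym, which preserves joins and group-bys. Keeping the salt secret is what stops someone from rebuilding the mapping by hashing guessed values.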
Summary
The MASK action supports five masking techniques that can be mixed per column within a single workflow step. It can be tailored for:
- Generating anonymized test data
- Obfuscating PII for analytics
- Supporting GDPR/CCPA compliance efforts