Data Desensitization Implementation Use Case
Data desensitization is a mandatory step when processing, storing, or sharing datasets that contain personally identifiable information (PII), and is required by many compliance regulations. You can submit your specific masking-rule requirements to ChatGPT to quickly generate desensitization code in your preferred programming language.
Below is a sample implementation generated for a common PII desensitization scenario: full names are hashed for irreversibility, phone numbers retain the first 6 digits with the remaining characters masked, and email addresses keep the first 2 characters of the username before masking:
import hashlib
from typing import Tuple

# Optional fixed salt to prevent rainbow-table attacks on hashed name values
HASH_SALT = b"2024_pii_anonymization_salt"

def anonymize_pii_fields(full_name: str, mobile_num: str, email: str) -> Tuple[str, str, str]:
    # Hash full name with salt for irreversible anonymization
    name_bytes = full_name.encode() + HASH_SALT
    hashed_full_name = hashlib.sha256(name_bytes).hexdigest()
    # Mask mobile number, retaining the first 6 digits
    masked_mobile = mobile_num[:6] + "*" * (len(mobile_num) - 6)
    # Mask email username, keeping the first 2 characters
    uname, domain = email.split("@", 1)
    masked_uname = uname[:2] + "*" * (len(uname) - 2)
    masked_email = f"{masked_uname}@{domain}"
    return hashed_full_name, masked_mobile, masked_email
# Raw user PII dataset
user_records = [
    {"full_name": "Alice", "mobile": "1234567890", "email_addr": "alice@example.com"},
    {"full_name": "Bob", "mobile": "2345678901", "email_addr": "bob@example.com"},
    {"full_name": "Charlie", "mobile": "3456789012", "email_addr": "charlie@example.com"},
]
# Process and print anonymized records
for record in user_records:
    name, phone, email = record["full_name"], record["mobile"], record["email_addr"]
    anon_name, anon_phone, anon_email = anonymize_pii_fields(name, phone, email)
    print(f"Hashed Name: {anon_name}, Masked Phone: {anon_phone}, Masked Email: {anon_email}")
This implementation provides irreversible anonymization for name fields via salted hashing, while retaining partial readability for phone and email fields so that records can still be distinguished. You can adjust the masking rules (e.g., the number of retained digits, or the hash algorithm) to match your internal compliance policies.
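One way to make those adjustments without editing the function body each time is to factor the rules into parameterized helpers. The sketch below is a hypothetical refactoring, not part of the generated code; the names `mask_value` and `hash_value` and their parameters are my own:

```python
import hashlib

def mask_value(value: str, keep: int, mask_char: str = "*") -> str:
    # Keep the first `keep` characters and mask the rest.
    # Values shorter than `keep` are fully masked rather than exposed.
    if len(value) <= keep:
        return mask_char * len(value)
    return value[:keep] + mask_char * (len(value) - keep)

def hash_value(value: str, salt: bytes, algorithm: str = "sha256") -> str:
    # hashlib.new() accepts any digest name supported by the local build,
    # so the algorithm can be swapped (e.g., "sha512") via configuration.
    return hashlib.new(algorithm, value.encode() + salt).hexdigest()

print(mask_value("1234567890", keep=6))   # 123456****
print(hash_value("Alice", b"demo_salt"))
```

Note the short-value guard in `mask_value`: the inline slicing in the main example would multiply the mask character by a negative count for inputs shorter than the retained prefix, silently returning the value unmasked.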
Sample execution output (hash values shown are illustrative):
Hashed Name: 7a2f9d4c8e1b3a0f5c7e9b2d4f6a8c0e1b3d5f7a9c2e4b6d8f0a2c4e6b8d0f2a, Masked Phone: 123456****, Masked Email: al***@example.com
Hashed Name: 9d3f8e2b7a1c0f4d6e8b2a4c6f8a0c2d4e6b8f0a2c4e6b8d0f2a4c6e8b0d2f4a, Masked Phone: 234567****, Masked Email: bo*@example.com
Hashed Name: 1e4d7c3a6f0b9e5d7f9a3c5e7b9d1f3a5c7e9b2d4f6a8c0e1b3d5f7a9c2e4b6d, Masked Phone: 345678****, Masked Email: ch*****@example.com
LLM responses are non-deterministic by design, so you may receive slightly different code structures or logic for identical prompts. This variation is expected; what matters is verifying that the output aligns with your specified desensitization requirements.
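Because regenerated code can vary, a small assertion suite that pins down the required masking behavior lets you re-validate any new version automatically. A minimal sketch, reproducing the function above so it runs standalone:

```python
import hashlib

HASH_SALT = b"2024_pii_anonymization_salt"

def anonymize_pii_fields(full_name, mobile_num, email):
    # Same logic as the generated implementation above
    hashed = hashlib.sha256(full_name.encode() + HASH_SALT).hexdigest()
    masked_mobile = mobile_num[:6] + "*" * (len(mobile_num) - 6)
    uname, domain = email.split("@", 1)
    masked_email = uname[:2] + "*" * (len(uname) - 2) + "@" + domain
    return hashed, masked_mobile, masked_email

# Pin the required behavior so regenerated code can be re-checked
name, phone, mail = anonymize_pii_fields("Alice", "1234567890", "alice@example.com")
assert len(name) == 64 and name != "Alice"   # irreversible 64-char hex digest
assert phone == "123456****"                 # first 6 digits retained
assert mail == "al***@example.com"           # first 2 username chars retained
print("all masking rules satisfied")
```

Running these checks after each regeneration turns the informal requirement ("output aligns with the spec") into something mechanical.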