cloakd.app
A developer-first privacy API that scrubs PII, faces, financial data, and confidential information from any file type before it reaches an AI model. EU-native. GDPR-by-design.
Architecture
cloakd.app runs as a single Docker container behind nginx on a Hetzner VPS in Frankfurt. All processing happens in-memory or in /tmp. No input files are persisted. Output files auto-delete after 1 hour.
Processing Modes
| Mode | File Types | Response |
|---|---|---|
| Synchronous | Text, PDF, DOCX, Images | Immediate RedactResponse with redacted file/text |
| Asynchronous | Video, Audio | Immediate JobStatus with job_id — poll for result |
Providers & Engines
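cloakd.app combines multiple specialized engines to detect and redact different types of sensitive data: Microsoft Presidio with spaCy NLP for text PII, OpenCV for face detection, Tesseract OCR for text inside images, pdfplumber, python-docx and reportlab for reading and rebuilding documents, moviepy for video, and AssemblyAI for audio transcription and PII redaction.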
Pricing
Simple, transparent pricing. No hidden fees. All plans include every file type and redaction engine. EU-hosted, GDPR-compliant.
- 50 requests / month
- All file types
- All redaction engines
- Community support
- Audio / video redaction
- Priority processing
- Unlimited requests
- All file types
- All redaction engines
- Audio + video redaction
- Email support
- Priority processing
- Unlimited requests
- All file types
- All redaction engines
- Audio + video redaction
- Priority support
- Priority processing
- Everything in Pro
- SLA guarantee
- Dedicated infrastructure
- On-premise deployment
- Custom entity types
- Volume discounts
Authentication
All redaction requests require a Bearer token in the Authorization header. API keys are created via the admin endpoint using the master key.
Two types of keys
| Key Type | Purpose | Who Uses It |
|---|---|---|
| Master API Key | Create new API keys via POST /v1/keys | You (admin only) |
| API Key | Call POST /v1/redact and other endpoints | Your customers / developers |
API keys are hashed with bcrypt before storage. The plaintext key is returned only once at creation time and cannot be retrieved again.
API keys are stored in a SQLite database (cloakd.db by default, configurable via DB_PATH). Each key record tracks: owner_email, created_at, request_count, is_active.
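The following is a minimal sketch of the issue-once, store-only-the-hash pattern described above, not cloakd.app's actual code. The helper names are hypothetical, the `secrets.token_urlsafe(32)` key format is an assumption (it yields 43-character keys like the one returned by POST /v1/keys), and PBKDF2 from Python's standard library stands in for bcrypt so the sketch runs without third-party packages.

```python
import hashlib
import hmac
import secrets

# Assumption: PBKDF2-HMAC-SHA256 stands in for bcrypt, which the service
# actually uses but which is not part of the standard library.
ITERATIONS = 100_000


def issue_api_key() -> tuple[str, str]:
    """Return (plaintext_key, stored_hash). Only stored_hash is persisted."""
    plaintext = secrets.token_urlsafe(32)  # hypothetical key format
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", plaintext.encode(), salt, ITERATIONS)
    return plaintext, f"{salt.hex()}${digest.hex()}"


def verify_api_key(candidate: str, stored: str) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


if __name__ == "__main__":
    key, stored_hash = issue_api_key()
    print("Show this to the user once:", key)
    print(verify_api_key(key, stored_hash))        # True
    print(verify_api_key(key + "x", stored_hash))  # False
```

Because only the salted hash is kept, a lost key has to be replaced with a new one; there is nothing to decrypt.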
POST /v1/redact
The core endpoint. Accepts a file or text, detects and redacts sensitive data according to the request options, and returns the cleaned output with a redaction report.
Option A: File Upload (multipart)
Request
POST /v1/redact
Authorization: Bearer <api_key>
Content-Type: multipart/form-data
-- Form fields --
file: <binary file data>
options: '{"redact_pii": true, "redact_faces": true, "redact_financial": true, "replacement_style": "mask", "language": "en"}'
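A standard-library sketch of Option A follows. It only builds the request; the base URL `https://cloakd.app`, the helper name `build_file_redact_request`, and the hand-rolled multipart encoding are assumptions, and most clients would let an HTTP library handle the encoding instead.

```python
import json
import mimetypes
import urllib.request
import uuid

BASE_URL = "https://cloakd.app"  # assumption: replace with your deployment's base URL


def build_file_redact_request(
    api_key: str, filename: str, data: bytes, options: dict
) -> urllib.request.Request:
    """Build a multipart POST /v1/redact request (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        # Form field "file": the raw bytes of the upload
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {file_type}\r\n\r\n").encode("utf-8"),
        data,
        b"\r\n",
        # Form field "options": the options object serialized as a JSON string
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="options"\r\n\r\n'
         f"{json.dumps(options)}\r\n").encode("utf-8"),
        # Closing boundary
        f"--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/v1/redact",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Send it with `urllib.request.urlopen(req)`. For video and audio uploads, the body that comes back is a JobStatus rather than a RedactResponse.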
Option B: Plain Text (JSON body)
Request
POST /v1/redact
Authorization: Bearer <api_key>
Content-Type: application/json
{
"text": "Contact Hans Mueller at hans@firma.de, IBAN DE89370400440532013000",
"options": {
"redact_pii": true,
"redact_financial": true,
"replacement_style": "mask",
"language": "en"
}
}
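The same call with a JSON body, again as a hypothetical standard-library sketch: the base URL and helper names are assumptions, and `redact_text` only reads the `redacted_text` field of the synchronous response.

```python
import json
import urllib.request

BASE_URL = "https://cloakd.app"  # assumption: replace with your deployment's base URL


def build_text_redact_request(api_key: str, text: str, **options) -> urllib.request.Request:
    """Build a JSON POST /v1/redact request (Option B). Hypothetical helper."""
    body = json.dumps({"text": text, "options": options}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/redact",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def redact_text(api_key: str, text: str, **options) -> str:
    """Send the request and return redacted_text from the synchronous response."""
    req = build_text_redact_request(api_key, text, **options)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["redacted_text"]
```

For example, `redact_text("YOUR_API_KEY", "Contact Hans Mueller at hans@firma.de", replacement_style="label")`. Options that are left out fall back to the server-side defaults.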
Options
| Field | Type | Default | Description |
|---|---|---|---|
| redact_pii | bool | true | Redact text-based PII (names, emails, phone numbers, etc.) |
| redact_faces | bool | false | Blur detected faces in images and video |
| redact_financial | bool | true | Redact IBANs and credit card numbers |
| replacement_style | string | "mask" | mask, label, or synthetic |
| language | string | "en" | en or de |
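Catching bad options on the client avoids a round trip that would end in a 422. The dataclass below mirrors the table's types, defaults, and allowed values; the class itself is hypothetical, and the server's validation remains authoritative.

```python
from dataclasses import asdict, dataclass

# Allowed values transcribed from the Options table.
REPLACEMENT_STYLES = {"mask", "label", "synthetic"}
LANGUAGES = {"en", "de"}


@dataclass(frozen=True)
class RedactOptions:
    """Client-side mirror of the documented options and defaults (hypothetical)."""
    redact_pii: bool = True
    redact_faces: bool = False
    redact_financial: bool = True
    replacement_style: str = "mask"
    language: str = "en"

    def __post_init__(self):
        # Reject non-bool flags (e.g. "yes" or 1) before they reach the API.
        for name in ("redact_pii", "redact_faces", "redact_financial"):
            if not isinstance(getattr(self, name), bool):
                raise TypeError(f"{name} must be a bool")
        if self.replacement_style not in REPLACEMENT_STYLES:
            raise ValueError(f"replacement_style must be one of {sorted(REPLACEMENT_STYLES)}")
        if self.language not in LANGUAGES:
            raise ValueError(f"language must be one of {sorted(LANGUAGES)}")


# Example: the dict can be used as the "options" object in either request format.
payload_options = asdict(RedactOptions(redact_faces=True, language="de"))
```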
Response (synchronous)
Response 200
{
"status": "success",
"file_type": "text/plain",
"redacted_file_url": null,
"redacted_text": "Contact [PERSON] at [EMAIL_ADDRESS], IBAN [IBAN_CODE]",
"report": {
"total_entities_found": 3,
"entities": [
{ "type": "PERSON", "count": 1, "risk": "high", "replacement": "[PERSON]" },
{ "type": "EMAIL_ADDRESS", "count": 1, "risk": "high", "replacement": "[EMAIL_ADDRESS]" },
{ "type": "IBAN_CODE", "count": 1, "risk": "high", "replacement": "[IBAN_CODE]" }
],
"processing_time_ms": 6,
"engines_used": ["text"]
}
}
Response (async — video/audio)
Response 200
{
"job_id": "a1b2c3d4e5f6...",
"status": "pending",
"progress": null,
"result": null,
"error": null
}
Error Codes
| Code | Meaning |
|---|---|
| 401 | Missing or invalid API key |
| 413 | File exceeds 100 MB limit |
| 415 | Unsupported file type |
| 422 | Invalid options JSON or missing required fields |
| 429 | Rate limit exceeded |
| 500 | Processing error |
GET /v1/jobs/{job_id}
Poll the status of an asynchronous redaction job (video or audio files).
Response 200
{
"job_id": "a1b2c3d4e5f6...",
"status": "completed", // pending | processing | completed | failed
"progress": 100, // 0-100
"result": {
"status": "success",
"file_type": "video/mp4",
"redacted_file_url": "/v1/files/abc123.mp4",
"report": { ... }
},
"error": null
}
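A polling loop for this endpoint might look like the sketch below. `fetch_status` is a hypothetical callable you supply (for example a thin urllib wrapper around GET /v1/jobs/{job_id}) that returns the parsed JSON body; the interval and timeout defaults are assumptions.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 15.0,
    timeout_s: float = 1800.0,
) -> dict:
    """Poll until the job completes and return its "result" object.

    Raises RuntimeError if the job fails and TimeoutError if it is still
    pending or processing when timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATES:
            if job["status"] == "failed":
                raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
            return job["result"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        time.sleep(interval_s)
```

The generous default interval is deliberate: GET /v1/jobs/{job_id} is rate limited, so a tight loop on a long video can exhaust the hourly budget (100 requests per hour by default).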
POST /v1/keys
Create a new API key. Requires the master API key for authentication.
Request
POST /v1/keys
Authorization: Bearer <master_api_key>
Content-Type: application/json
{ "owner_email": "developer@company.com" }
Response 200
{
"api_key": "JeQ-0V1_V3ZhioVnb8m3bCiHQX2HD9NSRa-6GSgF97g",
"message": "Store this key securely -- it cannot be retrieved again."
}
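Key creation from a script, sketched with the standard library. The base URL and helper names are assumptions; the master key can come from wherever you keep it, for example the MASTER_API_KEY environment variable the server itself reads.

```python
import json
import urllib.request

BASE_URL = "https://cloakd.app"  # assumption: replace with your deployment's base URL


def build_create_key_request(master_key: str, owner_email: str) -> urllib.request.Request:
    """Build POST /v1/keys, authenticated with the master key (hypothetical helper)."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/keys",
        data=json.dumps({"owner_email": owner_email}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
    )


def create_api_key(master_key: str, owner_email: str) -> str:
    """Create a key and return its plaintext, which is only ever returned this once."""
    req = build_create_key_request(master_key, owner_email)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["api_key"]
```

Key creation also passes through the stricter nginx cloakd_keys zone (5 requests per minute per IP), so bulk provisioning should be paced.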
GET /health
Response 200
{ "status": "healthy", "version": "0.1.0" }
Entity Types
cloakd.app detects these entity types using Microsoft Presidio with spaCy NLP; faces are detected separately with OpenCV:
| Entity | Example | Risk |
|---|---|---|
| PERSON | Hans Mueller, John Smith | High |
| EMAIL_ADDRESS | hans@firma.de | High |
| PHONE_NUMBER | +49 170 1234567 | Medium |
| IBAN_CODE | DE89370400440532013000 | High |
| CREDIT_CARD | 4111 1111 1111 1111 | High |
| NRP | Nationality, religious or political group | High |
| LOCATION | Frankfurt, Berlin | Medium |
| DATE_TIME | March 19, 2026 | Low |
| MEDICAL_LICENSE | Medical license numbers | High |
| IP_ADDRESS | 192.168.1.1 | Medium |
| FACE | Detected via OpenCV | High |
IBAN_CODE and CREDIT_CARD can be excluded by setting redact_financial: false.
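Because every entity in the redaction report carries a risk level, a client can use the report to decide, for example, whether a document needs human review before it is forwarded to a model. These helpers are hypothetical; note that the response uses lowercase risk values ("high") while the table above capitalizes them.

```python
from collections import Counter


def summarize_report(report: dict) -> dict:
    """Sum entity counts per risk level from a /v1/redact report."""
    totals = Counter()
    for entity in report["entities"]:
        totals[entity["risk"]] += entity["count"]
    return dict(totals)


def has_high_risk(report: dict) -> bool:
    """True if the report contains at least one "high" risk entity."""
    return summarize_report(report).get("high", 0) > 0
```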
Replacement Styles
mask (default)
Replaces with entity type label
Hans Mueller -> [PERSON]
hans@firma.de -> [EMAIL_ADDRESS]
DE893704... -> [IBAN_CODE]
label
Indexed labels, consistent per value
Hans Mueller -> <PERSON_1>
Anna Schmidt -> <PERSON_2>
Hans Mueller -> <PERSON_1>
synthetic
Realistic fake data via Faker (locale-aware)
Hans Mueller -> Thomas Weber
hans@firma.de -> lisa@example.net
DE893704... -> DE12500105170648489890
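The mask and label styles can be reproduced on already-detected spans with a few lines of Python. This is a sketch, not the service's implementation: detection is out of scope, spans are assumed non-overlapping, per-entity-type numbering for label is an assumption consistent with the example above, and synthetic is omitted because it needs Faker.

```python
def apply_replacements(text: str, spans: list[tuple[int, int, str]], style: str = "mask") -> str:
    """Replace (start, end, entity_type) spans using the "mask" or "label" style."""
    if style not in ("mask", "label"):
        raise ValueError("this sketch only covers 'mask' and 'label'")
    labels: dict[tuple[str, str], str] = {}  # (entity_type, original value) -> label
    next_index: dict[str, int] = {}          # per-type counter for the label style
    out, cursor = [], 0
    for start, end, entity_type in sorted(spans):
        value = text[start:end]
        if style == "mask":
            replacement = f"[{entity_type}]"
        else:
            key = (entity_type, value)
            if key not in labels:  # first sighting of this value gets the next index
                next_index[entity_type] = next_index.get(entity_type, 0) + 1
                labels[key] = f"<{entity_type}_{next_index[entity_type]}>"
            replacement = labels[key]
        out.append(text[cursor:start])  # unchanged text before the span
        out.append(replacement)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Keeping the value-to-label map for the whole request is what makes the label style consistent: a repeated name always maps back to the same placeholder, so a model can still reason about who did what.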
Supported File Types
| Category | Extensions | MIME Type | Processing | Mode |
|---|---|---|---|---|
| Text | .txt .html | text/plain, text/html | Presidio NER | Sync |
| PDF | .pdf | application/pdf | pdfplumber + Presidio + reportlab | Sync |
| Word | .docx | application/vnd.openxml... | python-docx + Presidio | Sync |
| Images | .jpg .png .webp .bmp | image/* | OpenCV + Tesseract + Presidio | Sync |
| Video | .mp4 .mov .avi | video/* | Frame-by-frame OpenCV + moviepy | Async |
| Audio | .mp3 .wav .m4a .flac | audio/* | AssemblyAI STT + PII redaction | Async |
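A client can choose how to handle the response before uploading. The mapping below is transcribed from the table; the helper is hypothetical and matches on extension only, whereas the server presumably detects the type from content (libmagic is installed in the container), so treat the result as a hint.

```python
from pathlib import Path

# Extension -> (category, processing mode), transcribed from the table above.
FILE_TYPES = {
    **{ext: ("Text", "sync") for ext in (".txt", ".html")},
    ".pdf": ("PDF", "sync"),
    ".docx": ("Word", "sync"),
    **{ext: ("Images", "sync") for ext in (".jpg", ".png", ".webp", ".bmp")},
    **{ext: ("Video", "async") for ext in (".mp4", ".mov", ".avi")},
    **{ext: ("Audio", "async") for ext in (".mp3", ".wav", ".m4a", ".flac")},
}


def processing_mode(filename: str) -> str:
    """Return "sync" or "async"; raise for types the API would reject with 415."""
    ext = Path(filename).suffix.lower()
    if ext not in FILE_TYPES:
        raise ValueError(f"unsupported file type: {ext or filename}")
    return FILE_TYPES[ext][1]
```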
Video Redaction
Video is processed asynchronously. Submit the file, receive a job_id, and poll GET /v1/jobs/{job_id} for progress.
Audio Redaction
Audio is processed via the AssemblyAI API, which provides both speech-to-text and PII redaction.
AssemblyAI PII Policies
| Policy | What It Detects |
|---|---|
| person_name | Spoken names |
| phone_number | Phone numbers |
| email_address | Email addresses |
| credit_card_number | Credit card numbers |
| banking_information | Bank accounts, routing numbers |
| us_social_security_number | US Social Security numbers |
PII in audio is replaced with hash substitution in the transcript and beeped out in the audio file.
Audio redaction requires the ASSEMBLYAI_API_KEY environment variable. Without it, video frames are still redacted but the audio track is left unprocessed.
Deployment
Docker Container
- Base image: python:3.11-slim (Debian Trixie)
- System packages: libmagic, tesseract-ocr (EN + DE), OpenCV libs, ffmpeg
- Python packages: FastAPI, Presidio, spaCy, OpenCV, moviepy, AssemblyAI SDK
- spaCy models: en_core_web_lg (~560MB), de_core_news_lg (~540MB)
- RAM usage: ~758 MB
Rate Limiting
| Zone | Scope | Limit |
|---|---|---|
| API requests | Per API key (or IP fallback) | 100/hour (configurable) |
| nginx: cloakd_api | Per IP | 30 req/min, burst 20 |
| nginx: cloakd_keys | Per IP | 5 req/min, burst 3 |
The application-level limit is keyed on the SHA-256 hash of the Bearer token and falls back to the client IP when no token is provided. Requests over the limit receive 429.
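The keying rule above, plus a simple fixed-window counter, sketched in Python. The key:/ip: prefixes, the fixed-window algorithm, and the in-memory store are assumptions for illustration; the documentation does not say which limiting algorithm the service uses.

```python
import hashlib
import time
from collections import defaultdict
from typing import Optional


def rate_limit_key(authorization: Optional[str], client_ip: str) -> str:
    """SHA-256 of the Bearer token, or the client IP when no token is sent."""
    if authorization and authorization.startswith("Bearer "):
        token = authorization[len("Bearer "):]
        return "key:" + hashlib.sha256(token.encode()).hexdigest()
    return "ip:" + client_ip


class FixedWindowLimiter:
    """Hypothetical in-memory fixed-window limiter (default: 100 requests/hour)."""

    def __init__(self, limit: int = 100, window_s: int = 3600):
        self.limit = limit
        self.window_s = window_s
        # (key, window number) -> requests seen; old windows are never evicted here
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Count the request; False means the caller should receive HTTP 429."""
        now = time.time() if now is None else now
        window = int(now // self.window_s)
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit
```

Hashing the token means the limiter never holds plaintext keys in memory or logs. A multi-process deployment would need a shared store with expiry instead of a per-process dict.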
Security
HTTP Headers
| Header | Value |
|---|---|
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload |
| X-Frame-Options | DENY |
| X-Content-Type-Options | nosniff |
| X-XSS-Protection | 1; mode=block |
| Referrer-Policy | strict-origin-when-cross-origin |
| X-Processing-Time-Ms | Per-request wall-clock processing time |
Data Handling
- Input files are never persisted -- processed in-memory or /tmp, deleted immediately
- Output files auto-delete after 1 hour
- API keys are stored as bcrypt hashes -- plaintext cannot be recovered
- TLS 1.2/1.3 only, strong cipher suites
- Exploit paths (.php, .asp, .jsp, dotfiles) blocked at nginx
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| MASTER_API_KEY | Yes | -- | Admin key for creating API keys |
| PORT | No | 8001 | Port the API listens on |
| ASSEMBLYAI_API_KEY | No | -- | Required for redacting audio files and the audio tracks of videos |
| STORAGE_BACKEND | No | local | local for MVP; s3 or r2 planned |
| STORAGE_PATH | No | /tmp/cloakd_output | Directory for redacted output files |
| RATE_LIMIT_PER_HOUR | No | 100 | Max requests per API key per hour |
| DB_PATH | No | cloakd.db | SQLite database path |
| SENTRY_DSN | No | -- | Sentry error tracking DSN |
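A typed loader for these variables, as a hypothetical sketch: field names mirror the table, defaults are copied from it, and audio_redaction_enabled encodes the documented dependency of audio redaction on ASSEMBLYAI_API_KEY.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the cloakd.app environment variables (hypothetical loader)."""
    master_api_key: str
    port: int = 8001
    assemblyai_api_key: Optional[str] = None
    storage_backend: str = "local"
    storage_path: str = "/tmp/cloakd_output"
    rate_limit_per_hour: int = 100
    db_path: str = "cloakd.db"
    sentry_dsn: Optional[str] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # MASTER_API_KEY is the only required variable; fail fast without it.
        if not env.get("MASTER_API_KEY"):
            raise RuntimeError("MASTER_API_KEY is required")
        return cls(
            master_api_key=env["MASTER_API_KEY"],
            port=int(env.get("PORT", "8001")),
            assemblyai_api_key=env.get("ASSEMBLYAI_API_KEY") or None,
            storage_backend=env.get("STORAGE_BACKEND", "local"),
            storage_path=env.get("STORAGE_PATH", "/tmp/cloakd_output"),
            rate_limit_per_hour=int(env.get("RATE_LIMIT_PER_HOUR", "100")),
            db_path=env.get("DB_PATH", "cloakd.db"),
            sentry_dsn=env.get("SENTRY_DSN") or None,
        )

    @property
    def audio_redaction_enabled(self) -> bool:
        # Audio redaction depends on AssemblyAI credentials being present.
        return self.assemblyai_api_key is not None
```

Passing a plain dict to from_env makes the loader easy to test without touching the real process environment.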
All Endpoints
| Method | Path | Auth | Rate Limited | Description |
|---|---|---|---|---|
| POST | /v1/redact | API Key | Yes | Redact PII from file or text |
| GET | /v1/jobs/{job_id} | None | Yes | Poll async job status |
| POST | /v1/keys | Master Key | Yes | Create new API key |
| GET | /v1/files/{id} | None | No | Download redacted output file |
| GET | /health | None | No | Health check |