cloakd.app

A developer-first privacy API that scrubs PII, faces, financial data, and confidential information from any file type before it reaches an AI model. EU-native. GDPR-by-design.

Live at cloakd.app | v0.1.0 MVP | EU-hosted (Hetzner DE)

Architecture

cloakd.app runs as a single Docker container behind nginx on a Hetzner VPS in Frankfurt. All processing happens in-memory or in /tmp. No input files are persisted. Output files auto-delete after 1 hour.

Client Request
      |
      v
  [ nginx ]    ---- SSL termination, rate limiting, security headers
      |             port 443 -> 127.0.0.1:8001
      v
 [ FastAPI ]   ---- Auth (Bearer token), file size check, MIME detection
      |
      v
 [ Router ]    ---- python-magic MIME detection, routes to correct engine
      |
  +---+---+
  |       |
 sync   async
  |       |
  v       v
[Engine] [Background Job]
  |       |
  v       v
Result   Poll GET /v1/jobs/{id}

Processing Modes

Mode	File Types	Response
Synchronous	Text, PDF, DOCX, Images	Immediate `RedactResponse` with redacted file/text
Asynchronous	Video, Audio	Immediate `JobStatus` with `job_id` — poll for result

Providers & Engines

cloakd.app combines multiple specialized engines to detect and redact different types of sensitive data:

Microsoft Presidio

Text PII Detection & Anonymization

spaCy NLP

Named Entity Recognition (EN + DE models)

OpenCV

Face Detection & Video Frame Processing

Tesseract OCR

Text Extraction from Images

AssemblyAI

Speech-to-Text + Audio PII Redaction

Faker

Synthetic Data Generation (DE/EN)

How They Connect

Text / PDF / DOCX
  |
  +--> spaCy NLP (en_core_web_lg / de_core_news_lg)
  +--> Presidio AnalyzerEngine (entity detection)
  +--> Presidio AnonymizerEngine (replacement)
  +--> Faker (if synthetic mode)

Images
  |
  +--> OpenCV Haar Cascade (face detection)
  +--> OpenCV GaussianBlur (face blurring)
  +--> Tesseract OCR (text extraction)
  +--> Presidio (PII in extracted text)
  +--> OpenCV filled rectangle (blackout PII regions)

Video
  |
  +--> OpenCV VideoCapture (frame extraction)
  +--> Image pipeline (per frame)
  +--> moviepy (audio extraction)
  +--> AssemblyAI (audio PII redaction)
  +--> OpenCV VideoWriter (reassembly)

Audio
  |
  +--> AssemblyAI Transcription API
  +--> 6 PII redaction policies
  +--> Redacted audio file + transcript

Pricing

Simple, transparent pricing. No hidden fees. All plans include every file type and redaction engine. EU-hosted, GDPR-compliant.

Free

€0

forever

50 requests / month
All file types
All redaction engines
Community support
Audio / video redaction
Priority processing

Pay-as-you-go

€0.01 / req

no monthly commitment

Unlimited requests
All file types
All redaction engines
Audio + video redaction
Email support
Priority processing

Pro

€29 / mo

unlimited requests

Unlimited requests
All file types
All redaction engines
Audio + video redaction
Priority support
Priority processing

Enterprise

Custom

talk to us

Everything in Pro
SLA guarantee
Dedicated infrastructure
On-premise deployment
Custom entity types
Volume discounts

All plans include Text, PDF, DOCX, image redaction with Presidio + spaCy + OpenCV. German and English language support. Mask, label, and synthetic replacement styles. GDPR-compliant EU hosting.

Authentication

All redaction requests require a Bearer token in the Authorization header. API keys are created via the admin endpoint using the master key.

Two types of keys

Key Type	Purpose	Who Uses It
Master API Key	Create new API keys via `POST /v1/keys`	You (admin only)
API Key	Call `POST /v1/redact` and other endpoints	Your customers / developers

API keys are hashed with bcrypt before storage. The plaintext key is returned only once at creation time and cannot be retrieved again.

Key storage: Keys are stored in a local SQLite database (cloakd.db). Each key tracks: owner_email, created_at, request_count, is_active.
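The key store described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the actual cloakd.app schema: the table name `api_keys`, the `key_hash` column, and the helper functions are assumptions. Only the four tracked fields come from the text. The hash is passed in as an opaque string, since bcrypt hashing happens before storage.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: only owner_email, created_at, request_count and
# is_active are named in the documentation; the rest is assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS api_keys (
    id            INTEGER PRIMARY KEY,
    key_hash      TEXT    NOT NULL,           -- bcrypt hash, never the plaintext key
    owner_email   TEXT    NOT NULL,
    created_at    TEXT    NOT NULL,           -- ISO 8601 timestamp in UTC
    request_count INTEGER NOT NULL DEFAULT 0,
    is_active     INTEGER NOT NULL DEFAULT 1  -- 1 = active, 0 = revoked
)
"""


def open_db(path: str = "cloakd.db") -> sqlite3.Connection:
    """Open the database and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_key(conn: sqlite3.Connection, key_hash: str, owner_email: str) -> int:
    """Insert an already-hashed key and return its row id."""
    cur = conn.execute(
        "INSERT INTO api_keys (key_hash, owner_email, created_at) VALUES (?, ?, ?)",
        (key_hash, owner_email, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def record_request(conn: sqlite3.Connection, key_id: int) -> None:
    """Increment the usage counter of an active key."""
    conn.execute(
        "UPDATE api_keys SET request_count = request_count + 1 "
        "WHERE id = ? AND is_active = 1",
        (key_id,),
    )
    conn.commit()
```

Passing `":memory:"` to `open_db` gives a throwaway database that is convenient for experiments.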

POST /v1/redact

The core endpoint. Accepts a file or text, detects and redacts all sensitive data, returns the cleaned output with a redaction report.

Option A: File Upload (multipart)

Request

POST /v1/redact
Authorization: Bearer <api_key>
Content-Type: multipart/form-data

-- Form fields --
file:     <binary file data>
options:  '{"redact_pii": true, "redact_faces": true, "redact_financial": true, "replacement_style": "mask", "language": "en"}'
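As a rough illustration of the upload format above, the following sketch assembles the multipart body by hand using only the Python standard library. The base URL `https://cloakd.app` and the helper name `build_upload_request` are assumptions made for this example. The function only builds the request; pass the result to `urllib.request.urlopen` to send it.

```python
import json
import uuid
import urllib.request

API_BASE = "https://cloakd.app"  # assumption: base URL of the hosted API


def build_upload_request(
    api_key: str, filename: str, data: bytes, options: dict
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST /v1/redact request."""
    boundary = uuid.uuid4().hex
    # The "options" field travels as a JSON-encoded string.
    options_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="options"\r\n\r\n'
        f"{json.dumps(options)}\r\n"
    ).encode("utf-8")
    # The "file" field carries the raw bytes of the upload.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = options_part + file_header + data + b"\r\n" + closing
    return urllib.request.Request(
        f"{API_BASE}/v1/redact",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

An HTTP client library that handles multipart encoding would shorten this considerably; the manual version is shown only to make the wire format visible.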

Option B: Plain Text (JSON body)

Request

POST /v1/redact
Authorization: Bearer <api_key>
Content-Type: application/json

{
  "text": "Contact Hans Mueller at hans@firma.de, IBAN DE89370400440532013000",
  "options": {
    "redact_pii": true,
    "redact_financial": true,
    "replacement_style": "mask",
    "language": "en"
  }
}
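The JSON request above could be issued from Python roughly as follows. This is a sketch using only the standard library; the base URL `https://cloakd.app` and the function names are assumptions, while the path, headers, and payload shape follow the request shown above.

```python
import json
import urllib.request

API_BASE = "https://cloakd.app"  # assumption: base URL of the hosted API


def build_text_request(api_key: str, text: str, **options) -> urllib.request.Request:
    """Build the JSON-body variant of POST /v1/redact without sending it."""
    payload = {"text": text, "options": options}
    return urllib.request.Request(
        f"{API_BASE}/v1/redact",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def redact_text(api_key: str, text: str, **options) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    request = build_text_request(api_key, text, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as `redact_text(key, "Contact Hans Mueller", replacement_style="mask", language="en")` would then return the response body as a dictionary; `urllib` raises `urllib.error.HTTPError` for non-2xx status codes.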

Options

Field	Type	Default	Description
`redact_pii`	bool	`true`	Redact text-based PII (names, emails, phones, etc.)
`redact_faces`	bool	`false`	Blur detected faces in images and video
`redact_financial`	bool	`true`	Redact IBANs and credit card numbers
`replacement_style`	string	`"mask"`	`mask`, `label`, or `synthetic`
`language`	string	`"en"`	`en` or `de`

Response (synchronous)

Response 200

{
  "status": "success",
  "file_type": "text/plain",
  "redacted_file_url": null,
  "redacted_text": "Contact [PERSON] at [EMAIL_ADDRESS], IBAN [IBAN_CODE]",
  "report": {
    "total_entities_found": 3,
    "entities": [
      { "type": "PERSON", "count": 1, "risk": "high", "replacement": "[PERSON]" },
      { "type": "EMAIL_ADDRESS", "count": 1, "risk": "high", "replacement": "[EMAIL_ADDRESS]" },
      { "type": "IBAN_CODE", "count": 1, "risk": "high", "replacement": "[IBAN_CODE]" }
    ],
    "processing_time_ms": 6,
    "engines_used": ["text"]
  }
}

Response (async — video/audio)

Response 200

{
  "job_id": "a1b2c3d4e5f6...",
  "status": "pending",
  "progress": null,
  "result": null,
  "error": null
}

Error Codes

Code	Meaning
`401`	Missing or invalid API key
`413`	File exceeds 100 MB limit
`415`	Unsupported file type
`422`	Invalid options JSON or missing required fields
`429`	Rate limit exceeded
`500`	Processing error

GET /v1/jobs/{job_id}

Poll the status of an asynchronous redaction job (video or audio files).

Response 200

{
  "job_id": "a1b2c3d4e5f6...",
  "status": "completed",      // pending | processing | completed | failed
  "progress": 100,              // 0-100
  "result": {
    "status": "success",
    "file_type": "video/mp4",
    "redacted_file_url": "/v1/files/abc123.mp4",
    "report": { ... }
  },
  "error": null
}
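A client-side polling loop for this endpoint might look like the sketch below. The function name, the 2-second interval, and the 10-minute timeout are illustrative assumptions; the status values and the `result` and `error` fields come from the response above. The HTTP call is injected as `fetch_status` so the loop itself stays independent of any particular HTTP client.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,    # assumed polling interval
    timeout_s: float = 600.0,   # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes or fails; return the job's "result" dict."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)  # expected to return the JobStatus JSON as a dict
        if job["status"] in TERMINAL_STATES:
            if job["status"] == "failed":
                raise RuntimeError(job.get("error") or "redaction job failed")
            return job["result"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

In practice `fetch_status` would issue `GET /v1/jobs/{job_id}` and decode the JSON body.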

POST /v1/keys

Create a new API key. Requires the master API key for authentication.

Request

POST /v1/keys
Authorization: Bearer <master_api_key>
Content-Type: application/json

{ "owner_email": "developer@company.com" }

Response 200

{
  "api_key": "JeQ-0V1_V3ZhioVnb8m3bCiHQX2HD9NSRa-6GSgF97g",
  "message": "Store this key securely -- it cannot be retrieved again."
}

GET /health

Response 200

{ "status": "healthy", "version": "0.1.0" }

Entity Types

cloakd.app detects these PII entity types using Microsoft Presidio + spaCy NLP:

Entity	Example	Risk
`PERSON`	Hans Mueller, John Smith	High
`EMAIL_ADDRESS`	hans@firma.de	High
`PHONE_NUMBER`	+49 170 1234567	Medium
`IBAN_CODE`	DE89370400440532013000	High
`CREDIT_CARD`	4111 1111 1111 1111	High
`NRP`	Nationality, religious, or political group	High
`LOCATION`	Frankfurt, Berlin	Medium
`DATE_TIME`	March 19, 2026	Low
`MEDICAL_LICENSE`	Medical license numbers	High
`IP_ADDRESS`	192.168.1.1	Medium
`FACE`	Detected via OpenCV	High

Financial entities IBAN_CODE and CREDIT_CARD can be excluded by setting redact_financial: false.

Replacement Styles

mask (default)

Replaces with entity type label

Hans Mueller   -> [PERSON]
hans@firma.de  -> [EMAIL_ADDRESS]
DE893704...    -> [IBAN_CODE]

label

Indexed labels, consistent per value

Hans Mueller   -> <PERSON_1>
Anna Schmidt   -> <PERSON_2>
Hans Mueller   -> <PERSON_1>

synthetic

Realistic fake data via Faker (locale-aware)

Hans Mueller   -> Thomas Weber
hans@firma.de  -> lisa@example.net
DE893704...    -> DE12500105170648489890
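To make the difference between mask and label concrete, here is a small sketch that applies both styles to already-detected spans. It is not the Presidio AnonymizerEngine code path; the `Span` type and `apply_style` function are hypothetical names used only for illustration. The synthetic style is left out because it depends on Faker.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int        # inclusive character offset
    end: int          # exclusive character offset
    entity_type: str  # e.g. "PERSON"


def apply_style(text: str, spans: list[Span], style: str = "mask") -> str:
    """Replace each detected span using the mask or label style."""
    labels: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    replacements = []
    # Walk left to right so label indices follow reading order.
    for span in sorted(spans):
        value = text[span.start:span.end]
        if style == "mask":
            token = f"[{span.entity_type}]"
        elif style == "label":
            key = (span.entity_type, value)
            if key not in labels:
                # First time this value is seen: assign the next index.
                counters[span.entity_type] = counters.get(span.entity_type, 0) + 1
                labels[key] = f"<{span.entity_type}_{counters[span.entity_type]}>"
            token = labels[key]
        else:
            raise ValueError(f"unsupported style: {style}")
        replacements.append((span, token))
    # Substitute right to left so earlier offsets stay valid.
    for span, token in reversed(replacements):
        text = text[:span.start] + token + text[span.end:]
    return text
```

With the label style, repeated values map to the same token, which keeps references traceable in the redacted text without revealing the original value.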

Supported File Types

Category	Extensions	MIME Type	Processing	Mode
Text	.txt, .html	`text/plain`, `text/html`	Presidio NER	Sync
PDF	.pdf	`application/pdf`	pdfplumber + Presidio + reportlab	Sync
Word	.docx	`application/vnd.openxml...`	python-docx + Presidio	Sync
Images	.jpg .png .webp .bmp	`image/*`	OpenCV + Tesseract + Presidio	Sync
Video	.mp4 .mov .avi	`video/*`	Frame-by-frame OpenCV + moviepy	Async
Audio	.mp3 .wav .m4a .flac	`audio/*`	AssemblyAI STT + PII redaction	Async

File type detection: MIME types are detected from file content via libmagic, not from file extensions. This prevents spoofing.
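The idea behind content-based detection can be illustrated with a few hard-coded magic-byte signatures. This is a deliberately simplified stand-in for libmagic, which knows far more formats; the signature list and function names are assumptions for this sketch. The sync/async split follows the Mode column of the table above.

```python
# A handful of well-known file signatures (magic bytes at offset 0).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"ID3", "audio/mpeg"),
]


def sniff_mime(head: bytes) -> str:
    """Guess a MIME type from the first bytes of a file, ignoring its name."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    # ISO base media files (MP4, MOV, M4A) carry "ftyp" at byte offset 4.
    # Simplification: this sketch reports all of them as video/mp4.
    if head[4:8] == b"ftyp":
        return "video/mp4"
    # WAV files are RIFF containers tagged "WAVE" at byte offset 8.
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio/x-wav"
    return "application/octet-stream"


def processing_mode(mime: str) -> str:
    """Video and audio go to the async job queue; everything else is sync."""
    return "async" if mime.startswith(("video/", "audio/")) else "sync"
```

Because only the content is inspected, a PDF uploaded as `photo.jpg` is still identified as `application/pdf`.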

Text & Document Redaction

Input (text / PDF / DOCX)
      |
      v
Extract Text
  PDF:  pdfplumber (page by page)
  DOCX: python-docx (paragraph by paragraph)
      |
      v
Presidio AnalyzerEngine
  NLP model: en_core_web_lg or de_core_news_lg
  Detects 10 entity types with confidence scores
      |
      v
Presidio AnonymizerEngine
  Applies selected replacement style (mask/label/synthetic)
      |
      v
Rebuild Document
  PDF:  reportlab (one text block per page)
  DOCX: replace paragraph runs, preserve formatting
      |
      v
Output + Entity Report

Image Redaction

        Input Image (JPG, PNG, WebP, BMP)
                     |
            +--------+--------+
            |                 |
            v                 v
     Face Detection      OCR + PII Detection
     OpenCV Haar         Tesseract -> Presidio
     Cascade             10 entity types
            |                 |
            v                 v
     Gaussian Blur       Black Fill Rect
     (99x99, sigma=30)   (over PII regions)
            +--------+--------+
                     |
                     v
        Output Image + Entity Report

Video Redaction

Video is processed asynchronously. Submit the file, receive a job_id, and poll GET /v1/jobs/{job_id} for progress.

             Input Video (MP4, MOV, AVI)
                         |
            +------------+------------+
            |                         |
            v                         v
     Video Frames                Audio Track
     OpenCV VideoCapture         moviepy extraction
            |                         |
            v                         v
     Per-Frame Processing        AssemblyAI
     Faces: every frame          Speech-to-text PII
     OCR: every 30th frame       Audio beep/redaction
            |                         |
            v                         v
     VideoWriter (mp4v)          Redacted Audio
            +------------+------------+
                         |
                         v
                 Mux (libx264/aac)
                         |
                         v
           Output Video + Entity Report
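The per-frame schedule in the pipeline above (faces on every frame, OCR on every 30th) can be sketched independently of OpenCV. The function below is a hypothetical illustration in which the detectors are passed in as callables. It assumes that PII boxes found by OCR are reused for the frames in between; the documentation does not state how intermediate frames are handled.

```python
from typing import Callable, Iterable, Iterator

Box = tuple[int, int, int, int]  # x, y, width, height

OCR_INTERVAL = 30  # from the pipeline: OCR runs on every 30th frame


def plan_frames(
    frames: Iterable,
    detect_faces: Callable[[object], list[Box]],
    detect_pii_text: Callable[[object], list[Box]],
    ocr_interval: int = OCR_INTERVAL,
) -> Iterator[tuple[int, list[Box], list[Box]]]:
    """Yield (frame index, face boxes to blur, PII boxes to black out)."""
    pii_boxes: list[Box] = []
    for index, frame in enumerate(frames):
        if index % ocr_interval == 0:
            # OCR is the expensive step, so it only runs on sampled frames.
            pii_boxes = detect_pii_text(frame)
        # Assumption: the most recent OCR boxes carry over until the next sample.
        yield index, detect_faces(frame), pii_boxes
```

Sampling OCR trades some accuracy on fast-changing text for a large reduction in processing time, since on-screen text usually persists across many frames.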

Audio Redaction

Audio is processed via the AssemblyAI API, which provides both speech-to-text and PII redaction.

AssemblyAI PII Policies

Policy	What It Detects
`person_name`	Spoken names
`phone_number`	Phone numbers
`email_address`	Email addresses
`credit_card_number`	Credit card numbers
`banking_information`	Bank accounts, routing numbers
`us_social_security_number`	SSN / national ID numbers

PII in audio is replaced with hash substitution in the transcript and beeped out in the audio file.

Requires AssemblyAI API key: Redacting audio files and the audio track of videos requires the ASSEMBLYAI_API_KEY environment variable. Without it, video frames are still redacted but the audio track is left unprocessed.

Deployment

Hetzner VPS (46.225.177.32 / Frankfurt, DE)
|
+-- /opt/erpforgeai/      ERPforgeAI.de (port 8000, native Python)
|
+-- /opt/cloakd/          cloakd.app (port 8001, Docker)
|   +-- current/          symlink to active release
|   +-- releases/         versioned releases (keep last 5)
|   +-- shared/           .env + database + venv
|   +-- backups/          daily SQLite backups (30 days)
|
+-- nginx
    erpforgeai.de --> 127.0.0.1:8000
    cloakd.app    --> 127.0.0.1:8001

Docker Container

Base image: python:3.11-slim (Debian Trixie)
System packages: libmagic, tesseract-ocr (EN + DE), OpenCV libs, ffmpeg
Python packages: FastAPI, Presidio, spaCy, OpenCV, moviepy, AssemblyAI SDK
spaCy models: en_core_web_lg (~560MB), de_core_news_lg (~540MB)
RAM usage: ~758 MB

Rate Limiting

Zone	Scope	Limit
API requests	Per API key (or IP fallback)	100/hour (configurable)
nginx: cloakd_api	Per IP	30 req/min, burst 20
nginx: cloakd_keys	Per IP	5 req/min, burst 3

The rate limit key is the SHA-256 hash of the Bearer token. If no token is provided, the client IP is used instead. Requests that exceed the limit return 429.
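The key derivation described above, together with a simple hourly counter, might be sketched as follows. The fixed-window counting scheme and the names `rate_limit_key` and `HourlyLimiter` are assumptions for illustration; the documentation specifies only the key (SHA-256 of the token, IP fallback) and the default of 100 requests per hour.

```python
import hashlib
import time
from collections import defaultdict
from typing import Callable, Optional


def rate_limit_key(authorization: Optional[str], client_ip: str) -> str:
    """Return the SHA-256 hex digest of the Bearer token, or the client IP."""
    if authorization and authorization.startswith("Bearer "):
        token = authorization[len("Bearer "):].strip()
        if token:
            return hashlib.sha256(token.encode("utf-8")).hexdigest()
    return client_ip


class HourlyLimiter:
    """Fixed-window counter (an assumed algorithm, chosen for simplicity)."""

    def __init__(self, limit: int = 100, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.clock = clock
        self.counts: dict = defaultdict(int)

    def allow(self, key: str) -> bool:
        """Count one request; False means the caller should respond with 429."""
        window = int(self.clock() // 3600)  # index of the current one-hour window
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit
```

Hashing the token means the limiter never holds plaintext keys in memory longer than a single request. A production version would also evict counters from past windows, which this sketch does not do.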

Security

HTTP Headers

Header	Value
`Strict-Transport-Security`	max-age=63072000; includeSubDomains; preload
`X-Frame-Options`	DENY
`X-Content-Type-Options`	nosniff
`X-XSS-Protection`	1; mode=block
`Referrer-Policy`	strict-origin-when-cross-origin
`X-Processing-Time-Ms`	(per-request wall clock time)

Data Handling

Input files are never persisted -- processed in-memory or /tmp, deleted immediately
Output files auto-delete after 1 hour
API keys are stored as bcrypt hashes -- plaintext cannot be recovered
TLS 1.2/1.3 only, strong cipher suites
Exploit paths (.php, .asp, .jsp, dotfiles) blocked at nginx
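The one-hour retention of output files could be enforced by a periodic cleanup along these lines. This is a sketch under assumptions: the function name and the use of file modification time as the age reference are illustrative, and the default directory is taken from the documented STORAGE_PATH default.

```python
import os
import time
from typing import Optional

MAX_AGE_SECONDS = 3600  # output files auto-delete after 1 hour


def purge_expired(
    storage_path: str = "/tmp/cloakd_output",
    max_age_s: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> list[str]:
    """Delete files older than max_age_s and return their names, sorted."""
    now = time.time() if now is None else now
    removed = []
    with os.scandir(storage_path) as entries:
        # Materialize the listing first so deletions do not affect iteration.
        for entry in list(entries):
            if entry.is_file() and now - entry.stat().st_mtime > max_age_s:
                os.remove(entry.path)
                removed.append(entry.name)
    return sorted(removed)
```

Running such a function from a scheduler every few minutes keeps the worst-case lifetime of an output file close to the one-hour target.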

Environment Variables

Variable	Required	Default	Description
`MASTER_API_KEY`	Yes	--	Admin key for creating API keys
`PORT`	No	`8001`	Port the API listens on
`ASSEMBLYAI_API_KEY`	No	--	Required for redacting audio files and video audio tracks
`STORAGE_BACKEND`	No	`local`	local for MVP; s3 or r2 planned
`STORAGE_PATH`	No	`/tmp/cloakd_output`	Directory for redacted output files
`RATE_LIMIT_PER_HOUR`	No	`100`	Max requests per API key per hour
`DB_PATH`	No	`cloakd.db`	SQLite database path
`SENTRY_DSN`	No	--	Sentry error tracking DSN
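One way to load these variables with their documented defaults is shown below. The `Settings` dataclass and `load_settings` function are hypothetical names for this sketch; the variable names, defaults, and the required status of MASTER_API_KEY come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    master_api_key: str
    port: int = 8001
    assemblyai_api_key: Optional[str] = None
    storage_backend: str = "local"
    storage_path: str = "/tmp/cloakd_output"
    rate_limit_per_hour: int = 100
    db_path: str = "cloakd.db"
    sentry_dsn: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, applying the documented defaults."""
    master = env.get("MASTER_API_KEY")
    if not master:
        # The only variable marked as required in the table.
        raise RuntimeError("MASTER_API_KEY is required")
    return Settings(
        master_api_key=master,
        port=int(env.get("PORT", "8001")),
        assemblyai_api_key=env.get("ASSEMBLYAI_API_KEY") or None,
        storage_backend=env.get("STORAGE_BACKEND", "local"),
        storage_path=env.get("STORAGE_PATH", "/tmp/cloakd_output"),
        rate_limit_per_hour=int(env.get("RATE_LIMIT_PER_HOUR", "100")),
        db_path=env.get("DB_PATH", "cloakd.db"),
        sentry_dsn=env.get("SENTRY_DSN") or None,
    )
```

Failing fast on a missing MASTER_API_KEY surfaces misconfiguration at startup rather than on the first admin request.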

All Endpoints

Method	Path	Auth	Rate Limited	Description
POST	`/v1/redact`	API Key	Yes	Redact PII from file or text
GET	`/v1/jobs/{job_id}`	None	Yes	Poll async job status
POST	`/v1/keys`	Master Key	Yes	Create new API key
GET	`/v1/files/{id}`	None	No	Download redacted output file
GET	`/health`	None	No	Health check