cloakd.app

A developer-first privacy API that scrubs PII, faces, financial data, and confidential information from any file type before it reaches an AI model. EU-native. GDPR-by-design.

Live at cloakd.app | v0.1.0 MVP | EU-hosted (Hetzner DE)

Architecture

cloakd.app runs as a single Docker container behind nginx on a Hetzner VPS in Frankfurt. All processing happens in-memory or in /tmp. No input files are persisted. Output files auto-delete after 1 hour.
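The one-hour retention rule for output files can be pictured as a periodic sweep over the output directory. The following is a minimal sketch under stated assumptions, not the service's actual cleanup code: the function name `delete_expired_outputs` and the use of file modification time as the age signal are both hypothetical.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 60 * 60  # one hour, matching the retention period described above


def delete_expired_outputs(output_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files in output_dir whose mtime is older than max_age seconds.

    Returns the sorted names of the files that were removed.
    (Hypothetical helper; age is measured by modification time as an assumption.)
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(output_dir).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A sweep like this could be run from a background task or a cron job; how often it runs bounds how long a file can outlive the one-hour limit.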

Client Request
    |
    v
[ nginx ]    ---- SSL termination, rate limiting, security headers
    |        port 443 -> 127.0.0.1:8001
    v
[ FastAPI ]  ---- Auth (Bearer token), file size check, MIME detection
    |
    v
[ Router ]   ---- python-magic MIME detection, routes to correct engine
    |
----+----
|       |
sync    async
|       |
v       v
[Engine]  [Background Job]
|       |
v       v
Result  Poll GET /v1/jobs/{id}

Processing Modes

Mode | File Types | Response
Synchronous | Text, PDF, DOCX, Images | Immediate RedactResponse with redacted file/text
Asynchronous | Video, Audio | Immediate JobStatus with job_id; poll for result
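The sync/async split can be expressed as a small dispatch on the detected MIME type. This is an illustrative sketch only: the function name `processing_mode` is hypothetical, and the DOCX check matches on the `application/vnd.openxml` prefix rather than a full type string.

```python
def processing_mode(mime_type):
    """Return "sync" or "async" for a detected MIME type.

    Raises ValueError for types outside the supported set
    (the API reports those as 415).
    """
    if mime_type.startswith(("video/", "audio/")):
        return "async"
    if (
        mime_type in ("text/plain", "text/html", "application/pdf")
        or mime_type.startswith("image/")
        or mime_type.startswith("application/vnd.openxml")  # DOCX family
    ):
        return "sync"
    raise ValueError(f"unsupported file type: {mime_type}")
```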

Providers & Engines

cloakd.app combines multiple specialized engines to detect and redact different types of sensitive data:

Microsoft Presidio: Text PII Detection & Anonymization
spaCy NLP: Named Entity Recognition (EN + DE models)
OpenCV: Face Detection & Video Frame Processing
Tesseract OCR: Text Extraction from Images
AssemblyAI: Speech-to-Text + Audio PII Redaction
Faker: Synthetic Data Generation (DE/EN)

How They Connect

Text / PDF / DOCX
  |
  +--> spaCy NLP (en_core_web_lg / de_core_news_lg)
  +--> Presidio AnalyzerEngine (entity detection)
  +--> Presidio AnonymizerEngine (replacement)
  +--> Faker (if synthetic mode)

Images
  |
  +--> OpenCV Haar Cascade (face detection)
  +--> OpenCV GaussianBlur (face blurring)
  +--> Tesseract OCR (text extraction)
  +--> Presidio (PII in extracted text)
  +--> OpenCV fillRect (blackout PII regions)

Video
  |
  +--> OpenCV VideoCapture (frame extraction)
  +--> Image pipeline (per frame)
  +--> moviepy (audio extraction)
  +--> AssemblyAI (audio PII redaction)
  +--> OpenCV VideoWriter (reassembly)

Audio
  |
  +--> AssemblyAI Transcription API
  +--> 6 PII redaction policies
  +--> Redacted audio file + transcript

Pricing

Simple, transparent pricing. No hidden fees. All plans include every file type and redaction engine. EU-hosted, GDPR-compliant.

Free
0
forever
  • 50 requests / month
  • All file types
  • All redaction engines
  • Community support
  • Audio / video redaction
  • Priority processing
Pay-as-you-go
0.01 / req
no monthly commitment
  • Unlimited requests
  • All file types
  • All redaction engines
  • Audio + video redaction
  • Email support
  • Priority processing
Enterprise
Custom
talk to us
  • Everything in Pay-as-you-go
  • SLA guarantee
  • Dedicated infrastructure
  • On-premise deployment
  • Custom entity types
  • Volume discounts
All plans include Text, PDF, DOCX, image redaction with Presidio + spaCy + OpenCV. German and English language support. Mask, label, and synthetic replacement styles. GDPR-compliant EU hosting.

Authentication

All redaction requests require a Bearer token in the Authorization header. API keys are created via the admin endpoint using the master key.

Two types of keys

Key Type | Purpose | Who Uses It
Master API Key | Create new API keys via POST /v1/keys | You (admin only)
API Key | Call POST /v1/redact and other endpoints | Your customers / developers

API keys are hashed with bcrypt before storage. The plaintext key is returned only once at creation time and cannot be retrieved again.
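The "return once, store only a hash" pattern can be sketched with the standard library. Note the substitution: this example uses `hashlib.scrypt` as a stand-in for bcrypt, because bcrypt is a third-party package; both are salted, deliberately slow hashes. The function names are hypothetical, and the 32-byte token length is an assumption.

```python
import hashlib
import hmac
import secrets


def _hash_key(plaintext, salt):
    # Stand-in for bcrypt (assumption): scrypt with a per-key random salt.
    return hashlib.scrypt(plaintext.encode("utf-8"), salt=salt, n=2**12, r=8, p=1)


def create_api_key():
    """Return (plaintext_key, salt, digest).

    Only salt and digest would be persisted; the plaintext is shown to the
    caller once and then discarded.
    """
    plaintext = secrets.token_urlsafe(32)  # 43 URL-safe characters
    salt = secrets.token_bytes(16)
    return plaintext, salt, _hash_key(plaintext, salt)


def verify_api_key(candidate, salt, digest):
    """Check a presented key against the stored digest in constant time."""
    return hmac.compare_digest(_hash_key(candidate, salt), digest)
```

Because only the digest is stored, a lost key cannot be recovered; the practical remedy is to issue a new one.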

Key storage: Keys are stored in a local SQLite database (cloakd.db). Each key record tracks owner_email, created_at, request_count, and is_active.

POST /v1/redact

The core endpoint. Accepts a file or text, detects and redacts sensitive data, and returns the cleaned output with a redaction report.

Option A: File Upload (multipart)

Request
POST /v1/redact
Authorization: Bearer <api_key>
Content-Type: multipart/form-data

-- Form fields --
file:     <binary file data>
options:  '{"redact_pii": true, "redact_faces": true, "redact_financial": true, "replacement_style": "mask", "language": "en"}'

Option B: Plain Text (JSON body)

Request
POST /v1/redact
Authorization: Bearer <api_key>
Content-Type: application/json

{
  "text": "Contact Hans Mueller at hans@firma.de, IBAN DE89370400440532013000",
  "options": {
    "redact_pii": true,
    "redact_financial": true,
    "replacement_style": "mask",
    "language": "en"
  }
}
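For reference, the JSON variant of the request can be assembled with Python's standard library. This is a sketch under assumptions: the base URL `https://cloakd.app` and the helper name `build_redact_request` are not confirmed by this page, and the function only builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://cloakd.app"  # assumed base URL


def build_redact_request(api_key, text, **options):
    """Build (but do not send) a POST /v1/redact request with a JSON body."""
    body = json.dumps({"text": text, "options": options}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/redact",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would then be a matter of `urllib.request.urlopen(req)` followed by `json.load(...)` on the response.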

Options

Field | Type | Default | Description
redact_pii | bool | true | Redact text-based PII (names, emails, phones, etc.)
redact_faces | bool | false | Blur detected faces in images and video
redact_financial | bool | true | Redact IBANs and credit card numbers
replacement_style | string | "mask" | mask, label, or synthetic
language | string | "en" | en or de
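On the client side, the options and their defaults map naturally onto a small validated structure. The class below is a hypothetical sketch (the name `RedactOptions` and the method `to_form_field` are not part of the API); it mirrors the documented defaults and serializes to the JSON string that the multipart `options` field expects.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RedactOptions:
    """Client-side mirror of the documented option defaults (hypothetical)."""

    redact_pii: bool = True
    redact_faces: bool = False
    redact_financial: bool = True
    replacement_style: str = "mask"
    language: str = "en"

    def __post_init__(self):
        # Reject values the API would answer with 422.
        if self.replacement_style not in {"mask", "label", "synthetic"}:
            raise ValueError(f"invalid replacement_style: {self.replacement_style!r}")
        if self.language not in {"en", "de"}:
            raise ValueError(f"invalid language: {self.language!r}")

    def to_form_field(self):
        """Serialize as a JSON string for the multipart `options` form field."""
        return json.dumps(asdict(self))
```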

Response (synchronous)

Response 200
{
  "status": "success",
  "file_type": "text/plain",
  "redacted_file_url": null,
  "redacted_text": "Contact [PERSON] at [EMAIL_ADDRESS], IBAN [IBAN_CODE]",
  "report": {
    "total_entities_found": 3,
    "entities": [
      { "type": "PERSON", "count": 1, "risk": "high", "replacement": "[PERSON]" },
      { "type": "EMAIL_ADDRESS", "count": 1, "risk": "high", "replacement": "[EMAIL_ADDRESS]" },
      { "type": "IBAN_CODE", "count": 1, "risk": "high", "replacement": "[IBAN_CODE]" }
    ],
    "processing_time_ms": 6,
    "engines_used": ["text"]
  }
}

Response (async — video/audio)

Response 200
{
  "job_id": "a1b2c3d4e5f6...",
  "status": "pending",
  "progress": null,
  "result": null,
  "error": null
}

Error Codes

Code | Meaning
401 | Missing or invalid API key
413 | File exceeds 100 MB limit
415 | Unsupported file type
422 | Invalid options JSON or missing required fields
429 | Rate limit exceeded
500 | Processing error

GET /v1/jobs/{job_id}

Poll the status of an asynchronous redaction job (video or audio files).

Response 200
{
  "job_id": "a1b2c3d4e5f6...",
  "status": "completed",      // pending | processing | completed | failed
  "progress": 100,              // 0-100
  "result": {
    "status": "success",
    "file_type": "video/mp4",
    "redacted_file_url": "/v1/files/abc123.mp4",
    "report": { ... }
  },
  "error": null
}
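A client typically wraps this endpoint in a polling loop that stops on a terminal status. The sketch below assumes nothing about the HTTP layer: `fetch_status` is a hypothetical callable that performs GET /v1/jobs/{job_id} and returns the decoded JSON, and the interval and timeout values are arbitrary.

```python
import time

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_status, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status(job_id) until the job is completed or failed.

    Returns the final JobStatus dict; raises TimeoutError if the job is
    still pending or processing once the timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

On `completed`, the caller would read `result["redacted_file_url"]` and download the file; on `failed`, the `error` field carries the reason.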

POST /v1/keys

Create a new API key. Requires the master API key for authentication.

Request
POST /v1/keys
Authorization: Bearer <master_api_key>
Content-Type: application/json

{ "owner_email": "developer@company.com" }

Response 200
{
  "api_key": "JeQ-0V1_V3ZhioVnb8m3bCiHQX2HD9NSRa-6GSgF97g",
  "message": "Store this key securely -- it cannot be retrieved again."
}

GET /health

Response 200
{ "status": "healthy", "version": "0.1.0" }

Entity Types

cloakd.app detects these PII entity types using Microsoft Presidio + spaCy NLP:

Entity | Example | Risk
PERSON | Hans Mueller, John Smith | High
EMAIL_ADDRESS | hans@firma.de | High
PHONE_NUMBER | +49 170 1234567 | Medium
IBAN_CODE | DE89370400440532013000 | High
CREDIT_CARD | 4111 1111 1111 1111 | High
NRP | Nationality, religious or political group | High
LOCATION | Frankfurt, Berlin | Medium
DATE_TIME | March 19, 2026 | Low
MEDICAL_LICENSE | Medical license numbers | High
IP_ADDRESS | 192.168.1.1 | Medium
FACE | Detected via OpenCV | High
Financial entities: IBAN_CODE and CREDIT_CARD can be excluded by setting redact_financial: false.
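The effect of the two text-related flags on the entity list can be sketched as a simple filter. This is an illustration, not the service's code: the function name is hypothetical, and it assumes that `redact_pii` governs the non-financial entities while `redact_financial` governs IBAN_CODE and CREDIT_CARD.

```python
TEXT_ENTITIES = [
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "IBAN_CODE", "CREDIT_CARD",
    "NRP", "LOCATION", "DATE_TIME", "MEDICAL_LICENSE", "IP_ADDRESS",
]
FINANCIAL_ENTITIES = {"IBAN_CODE", "CREDIT_CARD"}


def entities_to_detect(redact_pii=True, redact_financial=True):
    """Return the text entity types to look for under the given flags."""
    selected = []
    for entity in TEXT_ENTITIES:
        if entity in FINANCIAL_ENTITIES:
            wanted = redact_financial
        else:
            wanted = redact_pii  # assumption: redact_pii covers everything else
        if wanted:
            selected.append(entity)
    return selected
```

A list like this could be handed to Presidio through the `entities` argument of `AnalyzerEngine.analyze`, which restricts detection to the named types.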

Replacement Styles

mask (default)

Replaces with entity type label

Hans Mueller   -> [PERSON]
hans@firma.de  -> [EMAIL_ADDRESS]
DE893704...    -> [IBAN_CODE]

label

Indexed labels, consistent per value

Hans Mueller   -> <PERSON_1>
Anna Schmidt   -> <PERSON_2>
Hans Mueller   -> <PERSON_1>
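The "consistent per value" behaviour of the label style can be reproduced in a few lines. The sketch below is an assumption about how such labels might be assigned, not the actual implementation: `apply_label_style` is a hypothetical name, and detections are given as non-overlapping `(start, end, entity_type)` character spans.

```python
def apply_label_style(text, detections):
    """Replace each detected span with an indexed label such as <PERSON_1>.

    The same (entity_type, value) pair always receives the same index,
    and indices are assigned in reading order.
    """
    labels = {}    # (entity_type, value) -> label
    counters = {}  # entity_type -> highest index used so far
    for start, end, etype in sorted(detections):
        key = (etype, text[start:end])
        if key not in labels:
            counters[etype] = counters.get(etype, 0) + 1
            labels[key] = f"<{etype}_{counters[etype]}>"
    # Replace from right to left so earlier offsets remain valid.
    for start, end, etype in sorted(detections, reverse=True):
        text = text[:start] + labels[(etype, text[start:end])] + text[end:]
    return text
```

Stable labels keep a redacted document readable for a downstream model: it can still tell that two mentions refer to the same person without seeing the name.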

synthetic

Realistic fake data via Faker (locale-aware)

Hans Mueller   -> Thomas Weber
hans@firma.de  -> lisa@example.net
DE893704...    -> DE12500105170648489890

Supported File Types

Category | Extensions | MIME Type | Processing | Mode
Text | .txt, .html | text/plain, text/html | Presidio NER | Sync
PDF | .pdf | application/pdf | pdfplumber + Presidio + reportlab | Sync
Word | .docx | application/vnd.openxml... | python-docx + Presidio | Sync
Images | .jpg .png .webp .bmp | image/* | OpenCV + Tesseract + Presidio | Sync
Video | .mp4 .mov .avi | video/* | Frame-by-frame OpenCV + moviepy | Async
Audio | .mp3 .wav .m4a .flac | audio/* | AssemblyAI STT + PII redaction | Async
File type detection: MIME types are detected from file content via libmagic, not from file extensions. This prevents spoofing.
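To illustrate why content-based detection resists spoofing, here is a deliberately simplified stand-in for libmagic that checks only a handful of well-known file signatures. The table and the function name `sniff_mime` are illustrative assumptions; the real service relies on libmagic, which recognizes far more formats.

```python
# Leading "magic" bytes for a few formats (simplified stand-in for libmagic).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
]


def sniff_mime(data):
    """Return a MIME type based on the first bytes of data, or None if unknown."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None
```

A file named `photo.jpg` that begins with `%PDF-` is therefore treated as a PDF regardless of its extension. With python-magic installed, the equivalent call would be `magic.from_buffer(data, mime=True)`.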

Text & Document Redaction

Input (text / PDF / DOCX)
  |
  v
Extract Text
  PDF:  pdfplumber (page by page)
  DOCX: python-docx (paragraph by paragraph)
  |
  v
Presidio AnalyzerEngine
  NLP model: en_core_web_lg or de_core_news_lg
  Detects 10 entity types with confidence scores
  |
  v
Presidio AnonymizerEngine
  Applies selected replacement style (mask/label/synthetic)
  |
  v
Rebuild Document
  PDF:  reportlab (one text block per page)
  DOCX: replace paragraph runs, preserve formatting
  |
  v
Output + Entity Report

Image Redaction

Input Image (JPG, PNG, WebP, BMP)
              |
     +--------+--------+
     |                 |
     v                 v
Face Detection    OCR + PII Detection
OpenCV Haar       Tesseract -> Presidio
Cascade           10 entity types
     |                 |
     v                 v
Gaussian Blur     Black Fill Rect
(99x99, sigma=30) (over PII regions)
     +--------+--------+
              |
              v
Output Image + Entity Report

Video Redaction

Video is processed asynchronously. Submit the file, receive a job_id, and poll GET /v1/jobs/{job_id} for progress.

Input Video (MP4, MOV, AVI)
             |
+------------+------------+
|                         |
v                         v
Video Frames              Audio Track
OpenCV VideoCapture       moviepy extraction
|                         |
v                         v
Per-Frame Processing      AssemblyAI
Faces: every frame        Speech-to-text PII
OCR: every 30th frame     Audio beep/redaction
|                         |
v                         v
VideoWriter (mp4v)        Redacted Audio
+------------+------------+
             |
             v
Mux (libx264/aac)
             |
             v
Output Video + Entity Report
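The per-frame schedule in the diagram (faces on every frame, OCR on every 30th) can be written as a small generator. This is a sketch with assumptions: the name `frame_plan` is hypothetical, and frame 0 is taken to be the first OCR frame. How PII regions found by OCR are handled on the frames in between is not specified here.

```python
OCR_INTERVAL = 30  # OCR runs on every 30th frame, per the diagram above


def frame_plan(total_frames, ocr_interval=OCR_INTERVAL):
    """Yield (frame_index, run_face_detection, run_ocr) for each frame."""
    for index in range(total_frames):
        # Faces are checked on every frame; OCR only on interval boundaries.
        yield index, True, index % ocr_interval == 0
```

Running OCR less often than face detection trades some text coverage for throughput, since OCR is usually the slower of the two steps.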

Audio Redaction

Audio is processed via the AssemblyAI API, which provides both speech-to-text and PII redaction.

AssemblyAI PII Policies

Policy | What It Detects
person_name | Spoken names
phone_number | Phone numbers
email_address | Email addresses
credit_card_number | Credit card numbers
banking_information | Bank accounts, routing numbers
us_social_security_number | SSN / national ID numbers

PII in audio is replaced with hash substitution in the transcript and beeped out in the audio file.

Requires AssemblyAI API key: Redacting audio files and the audio track of videos requires the ASSEMBLYAI_API_KEY environment variable. Without it, video frames are still redacted but the audio track is left unprocessed.

Deployment

Hetzner VPS (46.225.177.32 / Frankfurt, DE)
|
+-- /opt/erpforgeai/     ERPforgeAI.de (port 8000, native Python)
|
+-- /opt/cloakd/         cloakd.app (port 8001, Docker)
|   +-- current/         symlink to active release
|   +-- releases/        versioned releases (keep last 5)
|   +-- shared/          .env + database + venv
|   +-- backups/         daily SQLite backups (30 days)
|
+-- nginx
    erpforgeai.de --> 127.0.0.1:8000
    cloakd.app    --> 127.0.0.1:8001

Docker Container

Rate Limiting

Zone | Scope | Limit
API requests | Per API key (or IP fallback) | 100/hour (configurable)
nginx: cloakd_api | Per IP | 30 req/min, burst 20
nginx: cloakd_keys | Per IP | 5 req/min, burst 3

The rate-limit key is the SHA-256 hash of the Bearer token, falling back to the client IP if no token is provided. Exceeding the limit returns 429.
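The application-level limit can be sketched as a key derivation step plus a counter. The key derivation follows the description above; the counting strategy is an assumption, since this page does not say whether the service uses a fixed window, a sliding window, or a token bucket. A fixed window is shown because it is the simplest, and all names are hypothetical.

```python
import hashlib
import time


def rate_limit_key(authorization_header, client_ip):
    """SHA-256 hex digest of the Bearer token, or the client IP if no token is present."""
    prefix = "Bearer "
    if authorization_header and authorization_header.startswith(prefix):
        token = authorization_header[len(prefix):].strip()
        if token:
            return hashlib.sha256(token.encode("utf-8")).hexdigest()
    return client_ip


class FixedWindowLimiter:
    """Allow at most `limit` requests per key per `window` seconds (in-memory)."""

    def __init__(self, limit=100, window=3600):
        self.limit = limit
        self.window = window
        self._counts = {}  # key -> (window_start, count)

    def allow(self, key, now=None):
        """Return True and count the request, or False if the caller should get 429."""
        now = time.time() if now is None else now
        start, count = self._counts.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired; start a new one
        if count >= self.limit:
            return False
        self._counts[key] = (start, count + 1)
        return True
```

Hashing the token means the limiter's in-memory table never holds plaintext API keys.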

Security

HTTP Headers

Header | Value
Strict-Transport-Security | max-age=63072000; includeSubDomains; preload
X-Frame-Options | DENY
X-Content-Type-Options | nosniff
X-XSS-Protection | 1; mode=block
Referrer-Policy | strict-origin-when-cross-origin
X-Processing-Time-Ms | (per-request wall clock time)

Data Handling

Environment Variables

Variable | Required | Default | Description
MASTER_API_KEY | Yes | -- | Admin key for creating API keys
PORT | No | 8001 | Port the API listens on
ASSEMBLYAI_API_KEY | No | -- | Required for audio/video audio redaction
STORAGE_BACKEND | No | local | local for MVP; s3 or r2 planned
STORAGE_PATH | No | /tmp/cloakd_output | Directory for redacted output files
RATE_LIMIT_PER_HOUR | No | 100 | Max requests per API key per hour
DB_PATH | No | cloakd.db | SQLite database path
SENTRY_DSN | No | -- | Sentry error tracking DSN

All Endpoints

Method | Path | Auth | Rate Limited | Description
POST | /v1/redact | API Key | Yes | Redact PII from file or text
GET | /v1/jobs/{job_id} | None | Yes | Poll async job status
POST | /v1/keys | Master Key | Yes | Create new API key
GET | /v1/files/{id} | None | No | Download redacted output file
GET | /health | None | No | Health check