🔒 Privacy Tiers for Document AI — Three Pipeline Configurations

Privacy-First AI Pipelines

One OCR engine. Three trust models. A practical guide to choosing how much of your document processing stays on your machine.


The Core Insight

Most document AI tools work like this:

📄 Your file → ☁️ Cloud API → 🤖 LLM

Your raw file — image, PDF, scan — travels to a third-party server before any processing happens. You're trusting the provider's infrastructure, their logging policy, and every hop in between.

macos-vision-mcp changes the boundary:

📄 Your file → 🍎 Apple Vision (local) → 📝 Extracted text → [your choice what happens next]

The file never leaves your Mac. What you decide to do with the extracted text is where your three options diverge — and where the privacy trade-offs actually live.


The Three Pipelines

Pipeline 1 — Fully Local

📄 File
  │
  ▼
🍎 macos-vision-mcp
   Apple Vision OCR
   (on-device, Neural Engine)
  │
  ▼
🦙 Local LLM (Ollama)
   mistral-nemo / llama3 / etc.
  │
  ▼
✅ Result — nothing left the machine

How it works: macos-vision-mcp extracts text locally via Apple Vision. The extracted text is passed directly to a local Ollama model for formatting, summarising, or answering questions. No network requests are made at any stage.
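As a minimal sketch of the second hop, assuming Ollama's local REST endpoint (`POST http://localhost:11434/api/generate`) and an `ocrText` string already returned by macos-vision-mcp (the prompt wording here is illustrative):

```typescript
// Build the request body for Ollama's local /api/generate endpoint.
// Nothing here leaves the machine: Ollama listens on localhost by default.
interface OllamaRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

function buildOllamaRequest(ocrText: string, model = "mistral-nemo"): OllamaRequest {
  return {
    model,
    prompt: `Reformat the following OCR output as clean Markdown:\n\n${ocrText}`,
    stream: false, // wait for the full response; single OCR pages are small
  };
}

// Sending it is one local HTTP call (requires a running Ollama daemon):
// const res = await fetch("http://localhost:11434/api/generate", {
//   method: "POST",
//   body: JSON.stringify(buildOllamaRequest(ocrText)),
// });
```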

Privacy guarantee: Absolute, within your local system. No file content, no extracted text, no metadata touches any external server. The only thing that leaves the process boundary is the final response rendered in your terminal or Claude Desktop interface.

Performance reality: The macos-vision vs Tesseract benchmark ran exactly this configuration — Apple Vision OCR feeding mistral-nemo on Ollama as the downstream formatter. Mean latency was 25.3 seconds per page, with ~95% of that time spent in Ollama, not in OCR. Apple Vision's native API access is essentially zero-overhead from Node.js; the bottleneck is the local model, not the extraction step.

When to use this:

- Documents too sensitive to leave the machine in any form: medical records, legal files, financial statements.
- Offline or air-gapped environments where no cloud call is possible.
- Tasks a local model handles well: formatting, cleanup, simple extraction and summarisation.
The honest trade-off: Local models are weaker than frontier cloud models. mistral-nemo handles formatting and simple extraction well; it will struggle with nuanced reasoning, multi-document synthesis, or anything that requires the kind of world-knowledge that large frontier models carry. If your task requires Claude-level reasoning, Pipeline 1 is not the right fit — unless you can run a large enough local model.


Pipeline 2 — Local OCR, Cloud Reasoning

📄 File
  │
  ▼
🍎 macos-vision-mcp
   Apple Vision OCR
   (on-device, Neural Engine)
  │
  ▼
☁️ Cloud LLM
   (Claude / GPT-4 / Gemini)
  │
  ▼
✅ Result — file stayed local, text went to cloud

How it works: OCR runs locally. The extracted plain-text representation of the document is sent to a cloud LLM for reasoning, summarisation, or Q&A. The original file — pixels, layout, fonts, embedded metadata — never leaves the machine.
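A sketch of what actually crosses the network boundary in this pipeline. The payload shape follows a generic chat-style API; the field and model names are illustrative, not a specific provider's identifiers:

```typescript
// Only the extracted string crosses the network boundary in Pipeline 2.
function buildCloudPayload(ocrText: string) {
  return {
    model: "frontier-model", // illustrative; substitute your provider's model id
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: `Summarise the key obligations in this document:\n\n${ocrText}`,
      },
    ],
  };
}

// Note what is NOT in the payload: no file bytes, no page geometry,
// no scanner metadata. Just the plain text Apple Vision produced.
```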

Privacy guarantee: Partial but meaningful. This is a significant improvement over uploading the file directly:

- The raw artifact never leaves the machine: pixels, layout, fonts, embedded metadata, whatever the scanner attached.
- The cloud provider receives only a plain-text representation, so visual features such as stamps, watermarks, and page geometry are never exposed.
However, the extracted text still contains all PII in cleartext. Names, account numbers, diagnoses, addresses — whatever Apple Vision read off the page arrives at the cloud provider verbatim. If the document is sensitive, the cloud provider's logging and retention policies matter.

When to use this:

- Documents whose text content is acceptable to share but whose raw file should stay local.
- Tasks that need frontier-model reasoning quality: nuanced Q&A, multi-document synthesis.
The honest trade-off: The privacy win here is real but often overstated. If a document contains a Social Security Number, that SSN will appear in the text sent to the cloud just as clearly as it appeared in the original PDF. The file format is protected; the information is not. Use this pipeline when you care about the raw file not leaving your machine, not when you need to prevent the content from reaching a provider.


Pipeline 3 — Local OCR + Pseudonymisation, Cloud Reasoning

📄 File
  │
  ▼
🍎 macos-vision-mcp
   Apple Vision OCR
   (on-device, Neural Engine)
  │
  ▼
🕵️ pseudonym-mcp
   PII → reversible tokens
   [PERSON:1], [SSN:1], [CREDIT_CARD:1]...
   (on-device, Regex + optional Ollama NER)
  │
  ▼
☁️ Cloud LLM
   sees tokens, not real values
  │
  ▼
🔓 pseudonym-mcp
   unmask_text()
   tokens → original values
  │
  ▼
✅ Result with real names restored

How it works: OCR runs locally. The extracted text is passed through pseudonym-mcp before any cloud call — structured PII (SSNs, card numbers, IBANs, email addresses, phone numbers) is replaced by deterministic tokens via regex; names and organisations are masked via a local Ollama NER model if available. The cloud LLM reasons over a pseudonymised document. The response is unmasked locally before being shown to you.
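The masking idea can be sketched in a few lines. This is a minimal illustration of deterministic, reversible tokenisation, not pseudonym-mcp's actual implementation, which covers many more PII types:

```typescript
// Minimal sketch of reversible, deterministic pseudonymisation.
const patterns: Record<string, RegExp> = {
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
  EMAIL: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g,
};

function maskText(text: string) {
  const mapping = new Map<string, string>(); // token -> original value
  const seen = new Map<string, string>();    // original value -> token
  let masked = text;
  for (const [label, re] of Object.entries(patterns)) {
    let counter = 0;
    masked = masked.replace(re, (value) => {
      if (!seen.has(value)) {
        // Deterministic: the same value always maps to the same token.
        const token = `[${label}:${++counter}]`;
        seen.set(value, token);
        mapping.set(token, value);
      }
      return seen.get(value)!;
    });
  }
  return { masked, mapping };
}

function unmaskText(text: string, mapping: Map<string, string>): string {
  let out = text;
  for (const [token, value] of mapping) out = out.split(token).join(value);
  return out;
}
```

The mapping never leaves the machine; only `masked` is sent to the cloud, and the response is run through `unmaskText` locally.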

This is the pipeline described in the Obsidian Vault guide and the OpenClaw messaging guide.

Privacy guarantee: The strongest available when using a cloud LLM. The cloud provider receives a document where identifiable values have been replaced with opaque tokens. It can reason about structure, obligations, dates, patterns, and relationships — but it does not see the real names or numbers involved.

What each layer protects:

| Layer | What stays local |
|---|---|
| macos-vision-mcp | Raw file: pixels, layout, fonts, metadata, embedded artifacts |
| pseudonym-mcp | PII values: names, SSNs, card numbers, IBAN, PESEL, email, phone |
| Cloud LLM | Receives: pseudonymised text, structural context, document meaning |

When to use this:

- PII-heavy documents that still need frontier-model reasoning: contracts, medical reports, bank statements.
- Workflows where identifiable values must not reach a third-party provider, but cloud-level quality is required.

Side-by-Side Comparison

|   | Pipeline 1 | Pipeline 2 | Pipeline 3 |
|---|---|---|---|
| Raw file reaches cloud | ❌ Never | ❌ Never | ❌ Never |
| Extracted text reaches cloud | ❌ Never | ✅ Yes (cleartext) | ⚠️ Yes (pseudonymised) |
| PII values reach cloud | ❌ Never | ✅ Yes | ❌ Masked |
| LLM reasoning quality | ⚠️ Local model | ✅ Frontier | ✅ Frontier |
| Latency | ~25 s/page | Fast | Fast + small local overhead |
| External dependencies | Ollama only | Cloud API key | Cloud API key + optional Ollama |
| Best for | Maximum privacy | Metadata protection | PII protection + cloud quality |

The Boundary That All Three Share

Regardless of which pipeline you use, macos-vision-mcp enforces one guarantee that no cloud-upload approach can match: the raw document never leaves your machine.

This matters more than it might seem. A scanned medical report is not just its text. It is a TIFF-embedded image, a page geometry, a document structure, potentially a watermark or stamp, and whatever metadata the scanner attached. Sending that to a cloud OCR API hands over the full artifact. Apple Vision reads it on your Neural Engine and returns a structured text representation. That's the extraction boundary — and it's a hard one.

What you choose to do with the extracted text is a separate decision, with separate trade-offs. The three pipelines above are the principal configurations. Most real workflows fall into one of them, or combine them: local LLM for initial triage, cloud LLM for the final reasoning step, with pseudonymisation in between.


⚠️ The OCR Quality Caveat

There is a non-obvious interaction between OCR quality and privacy that is worth naming explicitly.

The macos-vision vs Tesseract benchmark showed that Apple Vision has a bimodal error distribution on a 50-PDF academic corpus: excellent on clean body text (16/50 files with CER < 5%), but with a brittle tail on stylized display typography (13/50 files with CER > 50%). When OCR fails catastrophically on a page, names and numbers are misread as gibberish.

This affects Pipeline 3 in a specific way: pseudonym-mcp can only mask values it recognises. If Kowalski becomes Kowaloki in the OCR output, the NER model will not flag it as a person name and it will not be tokenised before reaching the cloud. OCR errors create gaps in the pseudonymisation layer that are invisible to the user.
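A toy example makes the failure mode concrete. Here a hard-coded list of known names stands in for the NER model; the point is that any recogniser, regex or model, can only mask strings it actually recognises:

```typescript
// Toy illustration: a recogniser can only mask strings it recognises.
// A known-names list stands in for the NER model here.
const knownNames = new Set(["Kowalski"]);

function maskNames(text: string): string {
  return text
    .split(/\b/) // split at word boundaries, keeping separators
    .map((word) => (knownNames.has(word) ? "[PERSON:1]" : word))
    .join("");
}

const clean = maskNames("Mr Kowalski signed the contract.");
// OCR misread: "Kowaloki" is not recognised, so it would reach the cloud verbatim.
const garbled = maskNames("Mr Kowaloki signed the contract.");
```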

Practical implication: For Pipeline 3 on documents with unusual fonts, handwriting, or stylized layouts, verify OCR quality before treating the pseudonymisation pass as reliable. The benchmark's latency data is also relevant here: at 25.3 s/page mean, it is feasible to include a local review step for high-stakes documents without making the workflow prohibitively slow.


Quick Setup Reference

# Add both MCP servers to Claude Code
claude mcp add macos-vision-mcp -- npx -y macos-vision-mcp
claude mcp add pseudonym-mcp -- npx -y pseudonym-mcp --engines hybrid

# For Pipeline 1 / Pipeline 3 NER: pull a local model
ollama pull mistral-nemo   # formatter
ollama pull llama3         # NER for pseudonym-mcp

For Claude Desktop, add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "macos-vision-mcp": {
      "command": "npx",
      "args": ["-y", "macos-vision-mcp"]
    },
    "pseudonym-mcp": {
      "command": "npx",
      "args": ["-y", "pseudonym-mcp", "--engines", "hybrid"]
    }
  }
}