Send a PDF on WhatsApp. OCR it locally. Mask PII. Ask Claude. Zero personal data reaches the cloud.
People send sensitive documents over WhatsApp, Telegram, and Slack every day: medical reports, bank statements, NDAs, lease agreements, payslips. When you forward them to an AI assistant, the document hits a cloud server before any reasoning happens.
Most setups look like this:
You → (document with real names, SSNs, card numbers) → Cloud Gateway → LLM API
The gap is at the gateway. Even if you trust the LLM provider, your raw document crosses multiple hops: infrastructure you don't control and logs you can't audit.
This guide closes that gap using three components that all run on your Mac.
| Component | What it does | Key technology |
|---|---|---|
| OpenClaw | Receives messages from WhatsApp / Telegram / Slack, routes them to a local agent | Local-first gateway, self-hosted |
| macos-vision-mcp | Extracts text from images and PDFs sent via messaging | Apple Vision framework, fully offline |
| pseudonym-mcp | Replaces PII with reversible tokens before anything reaches the cloud | Regex NER + local Ollama, fully offline |
OpenClaw acts as the local control plane. Your messages arrive there first. The two MCP servers plug in as tools available to its agent, so no data leaves your Mac until PII is already masked.
WhatsApp / Telegram / Slack
        │
        ▼
OpenClaw (local gateway)
        │
        ├── plain text ──────────► pseudonym-mcp
        │                          mask_text()
        │                               │
        └── file / image ──► macos-vision-mcp
                             extract_text()
                                  │
                                  └──► pseudonym-mcp
                                       mask_text()
                                            │
                                            ▼
                                  [PERSON:1], [SSN:1],
                                  [CREDIT_CARD:1], [PESEL:1]...
                                            │
                                            ▼
                                     Cloud LLM API
                                  (Claude / GPT-4 / Gemini)
                                            │
                                      response with
                                       tokens only
                                            │
                                            ▼
                                      pseudonym-mcp
                                      unmask_text()
                                            │
                                            ▼
                                   Real names restored
                                            │
                                            ▼
                                    Reply in your app
The LLM reasons about structure, obligations, and meaning, never about real identities. The unmask step happens locally before the reply is sent back.
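The routing shown in the diagram can be sketched in a few lines of Python. This is only an illustration of the data flow, not OpenClaw's actual code: in the real setup the agent decides which MCP tools to call. `Message`, `handle_message`, and the four `fake_*` stand-ins are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    text: str
    attachment_path: Optional[str] = None  # set when a file or image is sent


def handle_message(msg, extract_text, mask_text, unmask_text, ask_llm):
    """Route one message: OCR if needed, mask, ask the LLM, unmask locally."""
    raw = msg.text
    if msg.attachment_path is not None:
        # File or image branch: text is extracted locally before masking.
        raw = f"{msg.text}\n\n{extract_text(msg.attachment_path)}"
    masked, mapping = mask_text(raw)
    reply = ask_llm(masked)  # the only call that leaves the machine
    return unmask_text(reply, mapping)


# Hypothetical stand-ins for the OCR tool, the masking tool, and the cloud model.
def fake_extract(path):
    return "Tenant: Jane Doe"


def fake_mask(text):
    return text.replace("Jane Doe", "[PERSON:1]"), {"[PERSON:1]": "Jane Doe"}


def fake_unmask(text, mapping):
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


sent_to_cloud = []


def fake_llm(prompt):
    sent_to_cloud.append(prompt)  # record what would cross the network
    return "The tenant is [PERSON:1]."


reply = handle_message(
    Message("Summarise this lease", "lease.pdf"),
    fake_extract, fake_mask, fake_unmask, fake_llm,
)
```

Passing the four steps in as functions keeps the sketch testable without any network access, and `sent_to_cloud` makes it easy to check that only tokenised text would have left the machine.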
npm install -g openclaw@latest
openclaw onboard --install-daemon
The wizard guides you through connecting a channel (WhatsApp, Telegram, Slack…) and choosing a cloud model (Claude, GPT-4, Gemini).
openclaw mcp set macos-vision-mcp '{"command": "npx", "args": ["macos-vision-mcp"]}'
openclaw mcp set pseudonym-mcp '{"command": "npx", "args": ["pseudonym-mcp", "--engines", "hybrid"]}'
Verify both are registered:
openclaw mcp list
Alternative: edit ~/.openclaw/openclaw.json directly:
{
  "mcp": {
    "servers": {
      "macos-vision-mcp": {
        "command": "npx",
        "args": ["macos-vision-mcp"]
      },
      "pseudonym-mcp": {
        "command": "npx",
        "args": ["pseudonym-mcp", "--engines", "hybrid"]
      }
    }
  }
}
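If you prefer to script the edit rather than type the JSON by hand, a small helper along these lines could merge the two entries while keeping any other keys in the file. It is a sketch only: `add_mcp_servers` is a hypothetical helper, and it assumes the config is plain JSON with the `mcp.servers` layout shown above.

```python
import json
from pathlib import Path

# Server entries copied from the JSON in this guide.
MCP_SERVERS = {
    "macos-vision-mcp": {"command": "npx", "args": ["macos-vision-mcp"]},
    "pseudonym-mcp": {
        "command": "npx",
        "args": ["pseudonym-mcp", "--engines", "hybrid"],
    },
}


def add_mcp_servers(config_path):
    """Merge the two server entries into the config file, keeping other keys."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    # Create "mcp" and "servers" only if they are missing.
    servers = config.setdefault("mcp", {}).setdefault("servers", {})
    servers.update(MCP_SERVERS)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_mcp_servers("~/.openclaw/openclaw.json")` would apply it to the path used in this guide. Back up the file first, and run `openclaw mcp list` afterwards to confirm both servers are registered.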
ollama pull llama3
Skip this if you only need regex-based masking: SSN, credit cards, PESEL, IBAN, phone, and email are covered without Ollama.
Your lawyer forwards an NDA over WhatsApp. Instead of opening a web-based AI tool and pasting the content:
Forward the PDF to your OpenClaw chat and ask:
Extract text from the attached file, mask all PII, then summarise
the key obligations, deadlines, and termination clauses.
Restore real names in the final answer.
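What happens: macos-vision-mcp extracts the text on your Mac, pseudonym-mcp replaces PII with tokens, the cloud LLM summarises the tokenised text, and pseudonym-mcp restores the real names locally before the reply is sent back.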
Your doctor sends a scan of a cardiology report:
Summarise this report in plain language and suggest questions
I should prepare for my next appointment.
Your doctor's name, your SSN, the diagnosis: all tokenised locally. The cloud provider never processes Protected Health Information. No BAA required.
You drop a screenshot of your bank statement into a private Slack channel connected to OpenClaw:
Extract the transactions from this image, mask all card numbers
and account holders, then group them by category (food, transport,
subscriptions) and give me a monthly total.
Card numbers pass through as [CREDIT_CARD:1]. Account names pass as [PERSON:1]. The LLM categorises patterns, not your financial identity.
OpenClaw routes a session across multiple messages. You can chain masking across files and keep the token mapping consistent:
# Message 1
Extract and mask the text from invoice_jan.pdf, and remember the session.
# Message 2
Do the same for invoice_feb.pdf using the same session.
# Message 3
Which supplier charged the most across both months?
Restore names in the answer.
[PERSON:1] and [ORG:1] remain stable across all messages in the session, so the LLM can reason about patterns and relationships without ever knowing real identities.
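To see why stable tokens matter, here is a minimal Python sketch of a session-scoped mapping. It is not pseudonym-mcp's implementation: `MaskingSession` is a hypothetical class, and the hard-coded supplier pattern stands in for the Ollama NER step purely so the example runs offline.

```python
import re


class MaskingSession:
    """Keeps one value-to-token table for every document masked in a session."""

    def __init__(self, patterns):
        self.patterns = patterns  # kind -> compiled regex
        self.token_for = {}       # (kind, value) -> token
        self.value_for = {}       # token -> original value
        self.counters = {}        # kind -> last number handed out

    def _token(self, kind, value):
        key = (kind, value)
        if key not in self.token_for:
            number = self.counters.get(kind, 0) + 1
            self.counters[kind] = number
            token = f"[{kind}:{number}]"
            self.token_for[key] = token
            self.value_for[token] = value
        return self.token_for[key]

    def mask(self, text):
        for kind, pattern in self.patterns.items():
            text = pattern.sub(
                lambda m, k=kind: self._token(k, m.group(0)), text
            )
        return text

    def unmask(self, text):
        # Unknown tokens are left untouched rather than raising.
        return re.sub(
            r"\[[A-Z_]+:\d+\]",
            lambda m: self.value_for.get(m.group(0), m.group(0)),
            text,
        )


# Hypothetical supplier names; a real setup would detect these with NER.
session = MaskingSession({"ORG": re.compile(r"Acme Ltd|Globex GmbH")})
jan = session.mask("Invoice from Acme Ltd: 1200. Invoice from Globex GmbH: 800.")
feb = session.mask("Invoice from Globex GmbH: 950. Invoice from Acme Ltd: 400.")
answer = session.unmask("[ORG:2] charged the most across both months.")
```

Because the table lives on the session object, the same supplier gets the same token in both invoices even though the order of appearance differs, which is what lets the model compare totals across months.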
You photograph a handwritten meeting note and send it:
Transcribe this note and extract the action items with owners and deadlines.
Apple Vision handles handwriting recognition natively. Owners are masked before the LLM sees them, then restored in the action-item list.
pseudonym-mcp ships two prompt templates that chain the full pipeline automatically. They work the same way inside OpenClaw as in Claude Desktop.
pseudonymize_task: inline text
/pseudonymize_task text="Meeting with Jan Kowalski (PESEL: 90010112318). Contract: 45 000 zł." task="Extract action items"
PII in the text is replaced with tokens such as [PERSON:1] and [PESEL:1] before the task runs. Optional lang argument: en (default) or pl.
privacy_scan_file: file or image path
Requires macos-vision-mcp alongside pseudonym-mcp.
/privacy_scan_file filePath="/path/to/document.pdf" task="Summarise key obligations"
Tokens (--lang en, default):
| Token | Covers |
|---|---|
| [PERSON:1] | Full names (via Ollama NER) |
| [ORG:1] | Organisation names (via Ollama NER) |
| [SSN:1] | Social Security Numbers, with area-number validation |
| [CREDIT_CARD:1] | 13-19 digit card numbers, with Luhn checksum |
| [EMAIL:1] | Email addresses |
| [PHONE:1] | US phone formats |
Polish users: --lang pl adds PESEL, Polish IBAN, and Polish phone formats.
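The two validation steps named in the table, the Luhn checksum and the SSN area-number check, are standard and easy to sketch in Python. The function names below are hypothetical and this is not pseudonym-mcp's code. The group and serial checks in `ssn_plausible` are an extra assumption beyond the area-number rule the table mentions.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum (13 to 19 digits)."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def ssn_plausible(ssn: str) -> bool:
    """Reject SSN-shaped strings whose area number was never issued."""
    match = re.fullmatch(r"(\d{3})-(\d{2})-(\d{4})", ssn)
    if match is None:
        return False
    area, group, serial = match.groups()
    # Areas 000, 666, and 900-999 are never assigned.
    if area in ("000", "666") or int(area) >= 900:
        return False
    # Assumed extra checks: all-zero group or serial is also invalid.
    return group != "00" and serial != "0000"
```

Checks like these cut false positives: a 16-digit order number that fails Luhn, or a 9-digit reference starting with 666, stays unmasked instead of being mistaken for a card number or SSN.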
| Regulation | Who it affects | How the pipeline helps |
|---|---|---|
| HIPAA | Healthcare providers, patients | PHI never reaches a non-BAA cloud provider |
| PCI DSS 3.4 | Anyone handling card data | Card numbers masked before LLM transit |
| GDPR Art. 44 | EU users & businesses | No personal data transferred cross-border |
| SOC 2 | SaaS & enterprise | Demonstrates that PII does not leave your trust boundary |
Note: Pseudonymisation does not equal anonymisation; the data remains personal in your local system. However, it substantially reduces risk and helps demonstrate compliance with accountability principles.
This pipeline is a risk-reduction tool, not a guarantee of zero data exposure.
Pseudonymisation is a compromise by design: it replaces identifiable values with tokens, but the surrounding context (sentence structure, topic, document type, dates, amounts) is still sent to the cloud. A sufficiently determined party with access to LLM logs and additional context could potentially re-identify individuals from that surrounding content alone.
No tool, including macos-vision-mcp and pseudonym-mcp, can provide a 100% guarantee that personal data will never leak or be inferred. Edge cases exist.
Use this pipeline as one layer of a broader privacy strategy, not as a substitute for legal advice, a BAA, or a formal data protection assessment. If you're handling data subject to strict regulatory requirements (HIPAA, GDPR Article 9 special categories, classified information), consult a qualified professional before relying on any automated pseudonymisation tool.
# 1. Install OpenClaw
npm install -g openclaw@latest
openclaw onboard --install-daemon
# 2. Add MCP servers
openclaw mcp set macos-vision-mcp '{"command": "npx", "args": ["macos-vision-mcp"]}'
openclaw mcp set pseudonym-mcp '{"command": "npx", "args": ["pseudonym-mcp", "--engines", "hybrid"]}'
# 3. Optional: full NER
ollama pull llama3
# 4. Verify
openclaw mcp list
Or as ~/.openclaw/openclaw.json:
{
  "mcp": {
    "servers": {
      "macos-vision-mcp": {
        "command": "npx",
        "args": ["macos-vision-mcp"]
      },
      "pseudonym-mcp": {
        "command": "npx",
        "args": ["pseudonym-mcp", "--engines", "hybrid"]
      }
    }
  }
}