FastAPI, RabbitMQ, Keycloak, Postgres, MinIO, Docker

AI Document Anonymization Platform

Production-style event-driven microservices platform using FastAPI, RabbitMQ, Keycloak, OCR, NER and RBAC.

Event-driven
Outbox Pattern
JWT + RBAC
OCR + NER
Dockerized

Pipeline Snapshot

10

Core Services

3

Security Layers

7

Smoke Checks

API Gateway validates JWT and enforces RBAC before any internal workload is accepted. Service-to-service trust relies on propagated identity headers and controlled network boundaries.

System Topology

Interactive Architecture Flow

Gateway authentication and role checks control ingress, while outbox-backed event publication drives asynchronous OCR and NER pipelines.

Authentication | Gateway / API | Messaging | Workers | Object Storage | Database
Runtime State: Awaiting authenticated job submission

Execution Graph

Pipeline orchestration as a live control-plane walkthrough

A cinematic runtime view of the anonymization DAG, from authenticated ingress through outbox publication, OCR extraction, entity detection, and review-ready artifacts.

Control State

Idle

Awaiting authenticated job submission

Execution

0/10

Sequenced pipeline checkpoints

Completion

0%

Progress snapshot for the DAG walkthrough

Runtime Progress

Replay the sequence to inspect each stage

Waiting for trigger
0% complete (Ingress to Review API)

Orchestration Surface

Control-plane view of sequential state transitions, event publication, and downstream worker execution.

Standby

Security Model

Authentication and RBAC Guardrails

Authentication terminates at the gateway boundary. Internal services trust only propagated identity headers from validated requests.

Keycloak OIDC Authentication

Users authenticate through Keycloak. Access tokens carry issuer, subject and role claims.

JWT Validation at API Gateway

Gateway validates signature, expiry and audience before forwarding to internal services.
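The three checks named above (signature, expiry, audience) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the platform's gateway code: Keycloak normally signs tokens with an asymmetric algorithm such as RS256 and a real gateway would use a maintained JWT library with the realm's public keys, whereas this sketch swaps in HS256 with a shared secret to stay self-contained. The names `sign_token` and `validate_token` and the audience value used with them are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Helper for the demo only: build an HS256-signed token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def validate_token(token: str, secret: bytes, audience: str, now=None) -> dict:
    """Check signature, expiry and audience; return the claims or raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None

    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # never trust an unexpected algorithm
        raise ValueError("unexpected algorithm")

    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")

    claims = json.loads(_b64url_decode(payload_b64))
    current_time = time.time() if now is None else now
    if claims.get("exp", 0) <= current_time:
        raise ValueError("token expired")

    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # "aud" may be a string or a list
    if audience not in audiences:
        raise ValueError("wrong audience")
    return claims
```

Only a token that passes all three checks yields claims; any failure raises before the request would be forwarded to an internal service.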

Role-Based Access Control

Route-level policies enforce uploader, reviewer and admin permissions at ingress.

Trusted Identity Propagation

Gateway injects trusted `X-User-Sub` and role context for internal service auditing.
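One way to picture the propagation step is a function that rebuilds the header set for the internal hop. This is a sketch under assumptions: `X-User-Sub` comes from the description above, while the `X-User-Roles` header name and the function `build_internal_headers` are hypothetical. Roles are read from `realm_access.roles`, which is where Keycloak access tokens place realm roles by default.

```python
def build_internal_headers(incoming: dict, claims: dict) -> dict:
    """Return headers for the internal request, based on validated token claims."""
    # Drop any client-supplied X-User-* headers so identity cannot be spoofed
    # from outside the gateway.
    forwarded = {
        name: value
        for name, value in incoming.items()
        if not name.lower().startswith("x-user-")
    }
    roles = claims.get("realm_access", {}).get("roles", [])
    forwarded["X-User-Sub"] = claims["sub"]
    forwarded["X-User-Roles"] = ",".join(sorted(roles))  # hypothetical header name
    return forwarded
```

Stripping inbound `X-User-*` headers before injecting the trusted values is what lets internal services rely on them, provided those services are reachable only through the gateway.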

Role Matrix

Policy enforced at API Gateway before internal routing.

Role      Upload  Review  Admin
uploader  Allow   Deny    Deny
reviewer  Deny    Allow   Deny
admin     Allow   Allow   Allow
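The matrix translates directly into a lookup table. The sketch below is illustrative only: the role and action names mirror the matrix, while `ROLE_MATRIX` and `is_allowed` are hypothetical names, and a user holding several roles is assumed to be allowed if any one of them grants the action.

```python
# Allowed actions per role, copied from the role matrix.
ROLE_MATRIX = {
    "uploader": {"upload"},
    "reviewer": {"review"},
    "admin": {"upload", "review", "admin"},
}


def is_allowed(roles, action: str) -> bool:
    """Return True if any of the user's roles grants the action; unknown roles grant nothing."""
    return any(action in ROLE_MATRIX.get(role, set()) for role in roles)
```

For example, `is_allowed(["uploader"], "review")` is `False`, while `is_allowed(["admin"], "review")` is `True`.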

Engineering Notes

Platform Design Decisions

The system emphasizes reliability, controlled trust boundaries and observable asynchronous workflows.

Outbox Pattern

Document and event records are persisted atomically before asynchronous publication.
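A minimal sketch of the pattern follows, assuming a two-table layout that is not taken from the project. It uses an in-memory SQLite database as a stand-in for Postgres so that it runs without dependencies; the table names, column names and the `publish` callback are hypothetical. The `document.uploaded` event name is the one listed in the validated scenarios.

```python
import json
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE documents (id TEXT PRIMARY KEY, filename TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def save_document(conn: sqlite3.Connection, document_id: str, filename: str) -> None:
    """Write the document row and its event row in one transaction."""
    with conn:  # commits on success, rolls back both inserts on any exception
        conn.execute(
            "INSERT INTO documents (id, filename) VALUES (?, ?)",
            (document_id, filename),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("document.uploaded", json.dumps({"document_id": document_id})),
        )


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Publish pending events in insertion order and mark each one as published."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # e.g. a broker publish call
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a row is marked only after `publish` returns, a crash between the two steps leads to a re-publish on the next relay run. This gives at-least-once delivery, so consumers should tolerate duplicate events.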

Event-Driven Architecture

RabbitMQ decouples ingestion and downstream compute workloads with durable messaging.

Async Workers

OCR and NER workers consume jobs from the broker independently of the gateway, so they can be scaled without adding load to the request path.
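The decoupling can be illustrated with two chained workers. This sketch uses in-process `queue.Queue` objects and threads as a stand-in for RabbitMQ queues and separate worker services, and the `fake_ocr` and `fake_ner` handlers are placeholders rather than real OCR or NER. All names here are hypothetical.

```python
import queue
import threading

STOP = None  # sentinel that tells a worker to shut down


def run_worker(inbox: queue.Queue, handler, outbox: queue.Queue) -> None:
    """Consume messages until STOP arrives, forwarding each result downstream."""
    while True:
        message = inbox.get()
        if message is STOP:
            outbox.put(STOP)  # let the next stage shut down as well
            return
        outbox.put(handler(message))


def fake_ocr(event: dict) -> dict:
    # Placeholder for OCR: pretend this text was extracted from the document.
    return {"document_id": event["document_id"], "text": "Fichier de test"}


def fake_ner(event: dict) -> dict:
    # Placeholder for NER: count capitalized words as "entities".
    entities = [word for word in event["text"].split() if word[:1].isupper()]
    return {"document_id": event["document_id"], "entities": len(entities)}


def run_pipeline(document_ids) -> list:
    uploaded, ocr_done, ner_done = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=run_worker, args=(uploaded, fake_ocr, ocr_done)),
        threading.Thread(target=run_worker, args=(ocr_done, fake_ner, ner_done)),
    ]
    for worker in workers:
        worker.start()
    for document_id in document_ids:  # the producer only enqueues and moves on
        uploaded.put({"document_id": document_id})
    uploaded.put(STOP)
    for worker in workers:
        worker.join()

    results = []
    while True:
        item = ner_done.get()
        if item is STOP:
            return results
        results.append(item)
```

The producer never calls a worker directly; it only enqueues. In the same way, the gateway is not held up by OCR or NER processing time.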

Artifact Registration

Each generated artifact is registered with metadata that links to its storage location, and its job state is traceable.

MinIO Object Storage

Binary documents and extracted artifacts are versioned and retained in object storage.

Dockerized Local Platform

Services, broker and storage are orchestrated in reproducible local environments.

Smoke-Tested End-to-End

Critical user and worker paths are validated with deterministic smoke scenarios.

Keycloak + Gateway Security

Identity issuance and request authorization remain centralized and auditable.

Artifacts

Generated Outputs

Pipeline stages produce concrete intermediate artifacts and review-ready responses, not opaque background tasks.

Uploaded PDF

{
  "document_id": "doc_8f19",
  "filename": "fichier_de_test.pdf",
  "size_bytes": 294212,
  "content_type": "application/pdf",
  "stored_at": "s3://documents/doc_8f19/original.pdf"
}

OCR Output

{
  "artifact": "ocr_json",
  "document_id": "doc_8f19",
  "language": "fr",
  "pages": 2,
  "extract": "Fichier de test ..."
}

Detected Entities

MISC: "Fichier de test" (confidence 0.98)
MISC: "Fichier" (confidence 0.94)

Review Response

{
  "document_id": "doc_8f19",
  "status": "ready_for_review",
  "entities": 2,
  "artifacts": {
    "ner_json": "s3://artifacts/doc_8f19/ner.json",
    "ocr_json": "s3://artifacts/doc_8f19/ocr.json"
  }
}
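A response of this shape could be assembled as sketched below. The function name `build_review_response` is hypothetical, and the `s3://artifacts/<document_id>/<kind>.json` key layout is inferred from the sample values above rather than confirmed as the project's convention.

```python
def build_review_response(document_id: str, entity_count: int, kinds=("ner", "ocr")) -> dict:
    """Build a review payload whose artifact links follow the sample's key layout."""
    return {
        "document_id": document_id,
        "status": "ready_for_review",
        "entities": entity_count,
        "artifacts": {
            # Assumed layout: one JSON artifact per processing stage.
            f"{kind}_json": f"s3://artifacts/{document_id}/{kind}.json"
            for kind in kinds
        },
    }
```

Calling `build_review_response("doc_8f19", 2)` reproduces the sample payload shown above.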

Verification

Validated Scenarios

Core security and processing paths are smoke-tested to verify role boundaries and artifact lifecycles.

Upload through API Gateway with JWT: Pass
JWT validated against Keycloak: Pass
RBAC enforced at gateway: Pass
document.uploaded published: Pass
OCR artifact persisted: Pass
NER artifact persisted: Pass
Review entities exposed: Pass
Role restrictions verified: Pass

This project demonstrates platform engineering concerns across security, messaging, async processing, storage and API design.

Amun Data Consulting - Conseil Stratégique