The CortexOps control plane is made up of the following independent microservices.
Purpose: Telemetry Ingestion
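Inputs: Kubernetes API watch streams, Prometheus metrics.
Outputs: Normalized protobuf events published to NATS JetStream.
Dependencies: Kubernetes API, NATS JetStream.
Failure handling: If the Kubernetes API is unreachable, the service retries while serving from its local cache. If NATS is down, backpressure is applied so that buffered events cannot grow until the process runs out of memory (OOM).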
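The backpressure behavior described above can be sketched with a bounded buffer. This is a minimal illustration and not the service's actual code: `BoundedEventBuffer`, `offer`, `drain`, and the `publish` callback are hypothetical names, and a real publisher would go through a NATS client rather than a plain callable.

```python
from collections import deque
from typing import Callable, Deque


class BoundedEventBuffer:
    """Hypothetical in-process buffer between the watch stream and the publisher."""

    def __init__(self, max_events: int) -> None:
        self._max_events = max_events
        self._events: Deque[dict] = deque()

    def offer(self, event: dict) -> bool:
        """Enqueue an event, or return False when full so the caller stops reading upstream."""
        if len(self._events) >= self._max_events:
            return False  # backpressure signal: memory use stays bounded
        self._events.append(event)
        return True

    def drain(self, publish: Callable[[dict], None]) -> int:
        """Publish buffered events in order; stop at the first connection failure."""
        sent = 0
        while self._events:
            try:
                publish(self._events[0])  # peek first so a failed publish loses nothing
            except ConnectionError:
                break
            self._events.popleft()
            sent += 1
        return sent
```

A caller that receives `False` from `offer` would pause its watch loop and retry later, which is what keeps memory bounded while the broker is unavailable.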
Purpose: Incident Grouping
Inputs: Event streams from NATS JetStream.
Outputs: Correlated incident objects sent to the RCA Service.
Dependencies: NATS, Topology Service, PostgreSQL.
Failure handling: Correlation uses deterministic heuristic scoring. If the Topology Service is slow, correlation degrades gracefully by using cached relationships.
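One way to picture deterministic scoring with a cached topology fallback is shown below. The weights, the 120-second window, the event fields (`ts`, `namespace`, `resource`), and the names `CachedTopology` and `correlation_score` are all assumptions made for illustration; the source does not specify the actual heuristics.

```python
from typing import Callable, Dict, Set


class CachedTopology:
    """Hypothetical wrapper that remembers the last good answer per resource."""

    def __init__(self, fetch_neighbors: Callable[[str], Set[str]]) -> None:
        self._fetch = fetch_neighbors
        self._cache: Dict[str, Set[str]] = {}

    def neighbors(self, resource: str) -> Set[str]:
        try:
            result = self._fetch(resource)
        except TimeoutError:
            # Degrade gracefully: use the cached relationships, or none at all.
            return self._cache.get(resource, set())
        self._cache[resource] = result
        return result


def correlation_score(a: dict, b: dict, topology: CachedTopology,
                      window_s: float = 120.0) -> int:
    """Return an integer score from 0 to 10; same inputs always give the same score."""
    score = 0
    if abs(a["ts"] - b["ts"]) <= window_s:
        score += 5  # events close together in time
    if a["namespace"] == b["namespace"]:
        score += 2  # same namespace
    if b["resource"] in topology.neighbors(a["resource"]):
        score += 3  # directly related in the topology graph
    return score
```

Integer weights are used here so that the score is exactly reproducible, which is the property "deterministic" implies.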
Purpose: Blast Radius Analysis
Inputs: State change events from Kubernetes.
Outputs: Answers to graph queries (ancestors, descendants).
Dependencies: PostgreSQL (for asynchronous persistence).
Failure handling: The graph is held in memory. If the pod crashes, the service restores the graph from PostgreSQL snapshots on restart.
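The ancestor and descendant queries, together with snapshot and restore, can be sketched as follows. This is an illustrative model only: `TopologyGraph` and its methods are hypothetical names, and the snapshot is a JSON string standing in for whatever format is actually written to PostgreSQL.

```python
import json
from collections import defaultdict
from typing import Dict, Set


class TopologyGraph:
    """Hypothetical in-memory dependency graph with parent and child indexes."""

    def __init__(self) -> None:
        self._children: Dict[str, Set[str]] = defaultdict(set)
        self._parents: Dict[str, Set[str]] = defaultdict(set)

    def add_edge(self, parent: str, child: str) -> None:
        self._children[parent].add(child)
        self._parents[child].add(parent)

    @staticmethod
    def _reach(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        """Iterative traversal; the seen set keeps it safe on cyclic graphs."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def descendants(self, node: str) -> Set[str]:
        return self._reach(node, self._children)  # the blast radius of `node`

    def ancestors(self, node: str) -> Set[str]:
        return self._reach(node, self._parents)

    def snapshot(self) -> str:
        """Serialize all edges; a real service would persist this asynchronously."""
        edges = sorted((p, c) for p, cs in self._children.items() for c in cs)
        return json.dumps(edges)

    @classmethod
    def restore(cls, payload: str) -> "TopologyGraph":
        graph = cls()
        for parent, child in json.loads(payload):
            graph.add_edge(parent, child)
        return graph
```

Because persistence is asynchronous, a restored graph can lag behind the last in-memory state; replaying recent state change events after `restore` would be one way to close that gap.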
Purpose: Advisory Root Cause Generation
Inputs: Incidents, Qdrant vectors (historical docs).
Outputs: Advisory RCAReport object.
Dependencies: Qdrant, external LLM provider.
Failure handling: If the LLM is unavailable, the service falls back to deterministic rules. The AI is advisory only and never mutates state.
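The fallback to deterministic rules might look like the sketch below. The fields on `RCAReport`, the example rules, and the `ask_llm` callable are assumptions made for illustration; only the report's name and its advisory nature come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RCAReport:
    """Hypothetical shape of the advisory report; frozen so it cannot be altered later."""
    incident_id: str
    summary: str
    source: str  # "llm" or "rules"
    advisory: bool = True


def rule_based_summary(signals: List[str]) -> str:
    """Deterministic fallback; these rules are illustrative examples only."""
    if "OOMKilled" in signals:
        return "Container exceeded its memory limit."
    if "ImagePullBackOff" in signals:
        return "Container image could not be pulled."
    return "No matching rule; manual triage needed."


def generate_report(incident_id: str, signals: List[str],
                    ask_llm: Callable[[List[str]], str]) -> RCAReport:
    """Prefer the LLM summary, but never fail just because the provider is down."""
    try:
        return RCAReport(incident_id, ask_llm(signals), source="llm")
    except (ConnectionError, TimeoutError):
        return RCAReport(incident_id, rule_based_summary(signals), source="rules")
```

Recording the `source` on each report lets downstream reviewers see whether a suggestion came from the model or from the rules.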
Purpose: Workflow Execution & Governance
Inputs: RCAReport, human approval.
Outputs: Infrastructure mutations via the Kubernetes API.
Dependencies: Temporal, OPA, Kubernetes API.
Failure handling: If OPA denies the action, execution is aborted. If the mutation fails, Temporal triggers the compensating (rollback) transaction.
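The control flow of approval, policy check, mutation, and compensation can be modeled in a few lines. This sketch deliberately uses plain Python callables rather than the Temporal SDK or the OPA API; `execute_remediation`, `policy_allows`, `apply`, `compensate`, and the returned status strings are hypothetical names chosen for illustration.

```python
from typing import Callable


def execute_remediation(action: dict, approved: bool,
                        policy_allows: Callable[[dict], bool],
                        apply: Callable[[dict], None],
                        compensate: Callable[[dict], None]) -> str:
    """Gate a mutation behind approval and policy; roll back if it fails."""
    if not approved:
        return "awaiting_approval"  # no human sign-off yet, so nothing runs
    if not policy_allows(action):
        return "denied"  # policy denial aborts before any mutation
    try:
        apply(action)  # stands in for the Kubernetes API mutation
    except RuntimeError:
        compensate(action)  # stands in for the rollback transaction
        return "rolled_back"
    return "applied"
```

In a workflow engine, the same ordering would normally be expressed as separate steps so that each one can be retried and audited independently.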