Strict boundaries and safety guarantees for autonomous infrastructure operations.
CortexOps strictly separates AI-driven analysis from execution.
The LLM analyzes telemetry and proposes a root cause and a suggested fix. It has zero direct access to the Kubernetes API.
The workflow engine receives the proposal, evaluates it against OPA policy, requests human approval, and only then executes the change.
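The separation described here can be sketched as a small gate function. This is a minimal illustration, not CortexOps code: `Proposal`, `run_proposal`, the status strings, and the three callables are hypothetical names. The point it shows is that the LLM's output is plain data, and only the workflow engine holds the callable that touches the cluster.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    """Plain data produced by the LLM; it carries no cluster credentials."""
    root_cause: str
    action: str
    namespace: str


def run_proposal(
    proposal: Proposal,
    policy_allows: Callable[[Proposal], bool],
    human_approves: Callable[[Proposal], bool],
    execute: Callable[[Proposal], None],
) -> str:
    """Gate a proposal through policy and approval before executing it."""
    if not policy_allows(proposal):
        return "REJECTED_BY_POLICY"
    if not human_approves(proposal):
        return "REJECTED_BY_HUMAN"
    execute(proposal)  # reached only after both gates pass
    return "EXECUTED"


# Usage with stand-in gates (all hypothetical).
executed = []
proposal = Proposal("OOMKilled pods", "raise_memory_limit", "payments")
status = run_proposal(proposal, lambda p: True, lambda p: True, executed.append)
blocked = run_proposal(proposal, lambda p: False, lambda p: True, executed.append)
```

Because `execute` is injected by the engine, a rejected proposal never reaches it, regardless of what the model suggested.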
Correlation heuristics and remediation actions are fully deterministic. Given the same set of telemetry events, CortexOps always produces the same incident grouping and triggers the same workflow path. No probabilistic model sits in the critical execution path; the LLM's proposal is an input that must pass policy evaluation and human approval, never an executor.
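One way to get this property is to derive each incident key purely from event content and to sort the result, so arrival order and duplicates cannot change the grouping. The sketch below assumes a hypothetical event shape of `(event_id, service, symptom)` and a hypothetical grouping rule; the actual CortexOps heuristics are not specified here.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (event_id, service, symptom)
Event = Tuple[str, str, str]


def group_incidents(events: Iterable[Event]) -> Dict[str, List[str]]:
    """Group events into incidents using only the events' own content."""
    groups = defaultdict(set)  # a set drops redelivered duplicates
    for event_id, service, symptom in events:
        # The key depends only on content, never on time or arrival order.
        key = hashlib.sha256(f"{service}|{symptom}".encode()).hexdigest()[:12]
        groups[key].add(event_id)
    # Sort keys and members so the output is identical on every run.
    return {key: sorted(groups[key]) for key in sorted(groups)}


events = [
    ("e1", "checkout", "latency"),
    ("e2", "checkout", "latency"),
    ("e3", "db", "oom"),
]
first = group_incidents(events)
# Same events, reversed, with one duplicate delivery.
second = group_incidents(list(reversed(events)) + [events[0]])
```

Here `first` and `second` are equal, including key order, even though the second call saw the events in a different order and one of them twice.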
If a network partition causes NATS JetStream to redeliver an event, or if a Temporal worker crashes and is restarted, the system guarantees that mutations are not applied twice. Idempotency keys and deterministic state machines prevent unintended side effects.
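The idempotency-key idea can be illustrated with an in-memory sketch. `IdempotentExecutor` and the key format are hypothetical; a real deployment would need a durable store and an atomic check-and-record step, which this example does not attempt.

```python
from typing import Callable, Dict


class IdempotentExecutor:
    """Applies each mutation at most once per idempotency key."""

    def __init__(self) -> None:
        # In-memory only; assumed to be a durable store in practice.
        self._results: Dict[str, str] = {}

    def apply(self, key: str, mutation: Callable[[], str]) -> str:
        if key in self._results:
            # Redelivery or retry: return the recorded result, do not re-run.
            return self._results[key]
        result = mutation()
        self._results[key] = result
        return result


applied = []


def scale_up() -> str:
    applied.append("scale_up")
    return "scaled to 5 replicas"


executor = IdempotentExecutor()
key = "incident-42/step-1"  # hypothetical key: incident plus workflow step
first_result = executor.apply(key, scale_up)
retry_result = executor.apply(key, scale_up)  # simulated redelivery
```

The second call returns the stored result without invoking `scale_up` again, so `applied` holds a single entry.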
Every `Execute()` call must pass Open Policy Agent (OPA) validation. Policies are written in Rego and distributed across the cluster. Examples include blocking operations on the `kube-system` namespace and requiring two-person approval for StatefulSet scale-downs.
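The two example policies can be mirrored in a few lines. In CortexOps these rules are written in Rego and evaluated by OPA; the Python below is only a stand-in for the decision logic, and `ExecuteRequest`, its fields, and `deny_reasons` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExecuteRequest:
    """Hypothetical input document for a policy decision."""
    namespace: str
    kind: str
    operation: str
    approvers: Tuple[str, ...] = ()


def deny_reasons(req: ExecuteRequest) -> List[str]:
    """Return every reason the request is denied; an empty list means allow."""
    reasons = []
    if req.namespace == "kube-system":
        reasons.append("operations on kube-system are blocked")
    is_scale_down = req.kind == "StatefulSet" and req.operation == "scale_down"
    # Count distinct approvers so one person cannot approve twice.
    if is_scale_down and len(set(req.approvers)) < 2:
        reasons.append("StatefulSet scale-down requires two distinct approvers")
    return reasons


allowed = deny_reasons(
    ExecuteRequest("payments", "StatefulSet", "scale_down", ("alice", "bob"))
)
denied = deny_reasons(
    ExecuteRequest("kube-system", "StatefulSet", "scale_down", ("alice", "alice"))
)
```

Collecting all denial reasons, rather than stopping at the first, makes the rejection easier to act on: `denied` reports both the namespace block and the missing second approver.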
Remediation is not fire-and-forget. The workflow pauses in a `VERIFYING` state to analyze post-patch telemetry. If the system does not stabilize, a pre-calculated compensation transaction is executed to roll back the cluster to its previous state.
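The verify-then-compensate flow might look like the following sketch. Only the `VERIFYING` state comes from the description above; `RESOLVED`, `ROLLED_BACK`, the `remediate` function, and the rule that one stable check within a fixed number of checks counts as stabilized are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    VERIFYING = "VERIFYING"
    RESOLVED = "RESOLVED"        # hypothetical terminal state
    ROLLED_BACK = "ROLLED_BACK"  # hypothetical terminal state


def remediate(
    apply_patch: Callable[[], None],
    is_stable: Callable[[], bool],
    compensate: Callable[[], None],
    checks: int = 3,
) -> List[State]:
    """Apply a patch, verify it, and roll back if the system never stabilizes."""
    apply_patch()
    history = [State.VERIFYING]
    for _ in range(checks):
        if is_stable():  # stands in for analyzing post-patch telemetry
            history.append(State.RESOLVED)
            return history
    # The compensation is prepared before the patch, so rollback needs no planning.
    compensate()
    history.append(State.ROLLED_BACK)
    return history


calls = []
failed = remediate(
    lambda: calls.append("patch"),
    lambda: False,
    lambda: calls.append("rollback"),
)
succeeded = remediate(lambda: None, lambda: True, lambda: calls.append("rollback"))
```

In the failing run the compensation fires exactly once after the verification window closes; in the successful run it is never called.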