CortexOps leverages Temporal to provide durable, replay-safe, and highly-available remediation pipelines. Never lose state during an infrastructure failure.
Every step of a remediation workflow is persisted. If the orchestrator crashes during a node restart, the workflow resumes exactly where it left off upon recovery.
Workflows must pass Open Policy Agent (OPA) checks before modifying cluster state. High-risk actions automatically pause and request human-in-the-loop approval.
All automation scripts are wrapped in exponential backoffs with jitter. Safe, deterministic retries guarantee that temporary network blips don't break the recovery process.