Rules

container_cpu_usage_is_high (last evaluation: 9.189s ago, evaluation time: 5.218ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: POD_CPU_IS_HIGH expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90 for: 1m labels: severity: critical annotations: description: Container {{ $labels.container }} CPU usage inside POD {{ $labels.pod}} is high in {{ $labels.namespace}} summary: POD {{ $labels.pod}} CPU Usage is high in {{ $labels.namespace}} ok 9.195s ago 5.204ms
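
For reference, the same rule laid out as it would appear in a standard Prometheus rules file. Only the group name and rule content are taken from the listing above; the surrounding file layout is the generic rules-file format, not a file path from this deployment:

    groups:
      - name: container_cpu_usage_is_high
        rules:
          - alert: POD_CPU_IS_HIGH
            expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90
            for: 1m
            labels:
              severity: critical
            annotations:
              description: Container {{ $labels.container }} CPU usage inside POD {{ $labels.pod }} is high in {{ $labels.namespace }}
              summary: POD {{ $labels.pod }} CPU Usage is high in {{ $labels.namespace }}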

container_memory_usage_is_high (last evaluation: 19.587s ago, evaluation time: 19.04ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: POD_MEMORY_USAGE_IS_HIGH expr: (sum by(container, pod, namespace) (container_memory_working_set_bytes{container!=""}) / sum by(container, pod, namespace) (container_spec_memory_limit_bytes > 0) * 100) > 80 for: 1m labels: severity: critical annotations: description: |- Container Memory usage is above 80% VALUE = {{ $value }} LABELS = {{ $labels }} summary: Container {{ $labels.container }} Memory usage inside POD {{ $labels.pod}} is high in {{ $labels.namespace}} ok 19.587s ago 19.03ms
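
The memory expression is the container working set as a percentage of its memory limit; containers without a limit report container_spec_memory_limit_bytes as 0 and are filtered out by the > 0 clause. The same PromQL, spread across lines for readability (logic unchanged):

    (
      sum by(container, pod, namespace) (container_memory_working_set_bytes{container!=""})
        /
      sum by(container, pod, namespace) (container_spec_memory_limit_bytes > 0)
      * 100
    ) > 80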

node_cpu_greater_than_80 (last evaluation: 54.452s ago, evaluation time: 914.2us)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: NODE_CPU_IS_HIGH expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90 for: 1m labels: severity: critical annotations: description: node {{ $labels.kubernetes_node }} cpu is high summary: node cpu is greater than 80 percent ok 54.453s ago 894.6us

node_disk_space_too_low (last evaluation: 49.573s ago, evaluation time: 1.376ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: NODE_DISK_SPACE_IS_LOW expr: (100 * ((node_filesystem_avail_bytes{fstype!="rootfs",mountpoint="/"}) / (node_filesystem_size_bytes{fstype!="rootfs",mountpoint="/"}))) < 10 for: 1m labels: severity: critical annotations: description: node {{ $labels.node }} disk space is only {{ printf "%0.2f" $value }}% free. summary: node disk space remaining is less than 10 percent ok 49.574s ago 1.36ms

node_down (last evaluation: 4.013s ago, evaluation time: 631.6us)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: NODE_DOWN expr: up{component="node-exporter"} == 0 for: 3m labels: severity: warning annotations: description: '{{ $labels.job }} job failed to scrape instance {{ $labels.instance }} for more than 3 minutes. Node seems to be down' summary: Node {{ $labels.kubernetes_node }} is down ok 4.013s ago 616.6us

node_memory_left_lessser_than_10 (last evaluation: 11.173s ago, evaluation time: 16.39ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: NODE_MEMORY_LESS_THAN_10% expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 for: 1m labels: severity: critical annotations: description: node {{ $labels.kubernetes_node }} memory left is low summary: node memory left is less than 10 percent ok 11.173s ago 16.37ms

Front50-cache (last evaluation: 32.729s ago, evaluation time: 338.6us)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: front50:storageServiceSupport:cacheAge__value expr: front50:storageServiceSupport:cacheAge__value > 300000 for: 2m labels: severity: warning annotations: description: front50 cacheAge for {{$labels.pod}} in namespace {{$labels.namespace}} has value = {{$value}} summary: front50 cacheAge too high ok 32.729s ago 325.5us

autopilot-component-jvm-errors (last evaluation: 7.087s ago, evaluation time: 3.464ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: jvm-memory-filling-up-for-oes-audit-client expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="auditclient"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="auditclient"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} ok 7.088s ago 785.9us
alert: jvm-memory-filling-up-for-oes-autopilot expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="autopilot"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="autopilot"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} ok 7.087s ago 510.5us
alert: jvm-memory-filling-up-for-oes-dashboard expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="dashboard"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="dashboard"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} ok 7.087s ago 483us
alert: jvm-memory-filling-up-for-oes-platform expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="platform"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="platform"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} ok 7.086s ago 540us
alert: jvm-memory-filling-up-for-oes-sapor expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="sapor"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="sapor"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} ok 7.086s ago 454.4us
alert: jvm-memory-filling-up-for-oes-visibility expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="visibility"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="visibility"})) * 100 > 90 for: 5m labels: severity: warning annotations: description: |- JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%) VALUE = {{ $value }} summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} ok 7.086s ago 650.5us
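
The six rules above are identical except for the component selector. As a sketch only (not the configuration shown here, and assuming only these components publish jvm_memory_* metrics under app="oes"), the same coverage could come from one rule that keeps component in the grouping and drops the per-component selector:

    - alert: jvm-memory-filling-up-for-oes
      expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap"})) * 100 > 90
      for: 5m
      labels:
        severity: warning
      annotations:
        description: JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%) VALUE = {{ $value }}
        summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}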

autopilot-component-latency-too-high (last evaluation: 11.173s ago, evaluation time: 6.645ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: oes-audit-client-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="auditclient"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="auditclient"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 11.174s ago 1.091ms
alert: oes-autopilot-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="autopilot"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="autopilot"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 11.173s ago 614.9us
alert: oes-dashboard-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="dashboard"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="dashboard"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 11.172s ago 504.1us
alert: oes-platform-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="platform"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="platform"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 11.172s ago 2.734ms
alert: oes-sapor-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="sapor"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="sapor"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 11.169s ago 1.097ms
alert: oes-visibility-latency-too-high expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="visibility"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="visibility"}[2m])) > 0.5 for: 2m labels: severity: warning annotations: description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }} summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high ok 11.169s ago 569.5us

autopilot-scrape-target-is-down (last evaluation: 35.43s ago, evaluation time: 1.189ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: oes-audit-client-scrape-target-is-down expr: up{component="auditclient"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-audit-client scrape target is down ok 35.431s ago 389.4us
alert: oes-autopilot-scrape-target-is-down expr: up{component="autopilot"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-autopilot scrape target is down ok 35.43s ago 162.1us
alert: oes-dashboard-scrape-target-is-down expr: up{component="dashboard"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-dashboard scrape target is down ok 35.431s ago 149.9us
alert: oes-platform-scrape-target-is-down expr: up{component="platform"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-platform scrape target is down ok 35.431s ago 155.5us
alert: oes-sapor-scrape-target-is-down expr: up{component="sapor"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-sapor scrape target is down ok 35.431s ago 150.5us
alert: oes-visibility-scrape-target-is-down expr: up{component="visibility"} == 0 labels: severity: critical annotations: description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down summary: oes-visibility scrape target is down ok 35.431s ago 147.3us

igor-needs-attention (last evaluation: 36.574s ago, evaluation time: 401.3us)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: igor-needs-attention expr: igor:pollingMonitor:itemsOverThreshold__value > 0 labels: severity: critical annotations: description: Igor in namespace {{$labels.namespace}} needs human help summary: Igor needs attention ok 36.574s ago 388us

jvm-too-high (last evaluation: 39.135s ago, evaluation time: 3.802ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: clouddriver-rw-pod-may-be-evicted-soon expr: (sum by(instance, area) (clouddriver_rw:jvm:memory:used__value) / sum by(instance, area) (clouddriver_rw:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Clouddriver-rw JVM memory too high ok 39.135s ago 963.8us
alert: clouddriver-ro-pod-may-be-evicted-soon expr: (sum by(instance, area) (clouddriver_ro:jvm:memory:used__value) / sum by(instance, area) (clouddriver_ro:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Clouddriver-ro JVM memory too high ok 39.135s ago 360.7us
alert: clouddriver-caching-pod-may-be-evicted-soon expr: (sum by(instance, area) (clouddriver_caching:jvm:memory:used__value) / sum by(instance, area) (clouddriver_caching:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Clouddriver-caching JVM memory too high ok 39.134s ago 212.3us
alert: gate-pod-may-be-evicted-soon expr: (sum by(instance, area) (gate:jvm:memory:used__value) / sum by(instance, area) (gate:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: gate JVM memory too high ok 39.135s ago 1.2ms
alert: orca-pod-may-be-evicted-soon expr: (sum by(instance, area) (orca:jvm:gc:liveDataSize__value) / sum by(instance, area) (orca:jvm:gc:maxDataSize__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: orca JVM memory too high ok 39.134s ago 198.8us
alert: igor-pod-may-be-evicted-soon expr: (sum by(instance, area) (igor:jvm:memory:used__value) / sum by(instance, area) (igor:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: igor JVM memory too high ok 39.134s ago 197.5us
alert: echo-scheduler-pod-may-be-evicted-soon expr: (sum by(instance, area) (echo_scheduler:jvm:memory:used__value) / sum by(instance, area) (echo_scheduler:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: echo-scheduler JVM memory too high ok 39.134s ago 213.7us
alert: echo-worker-pod-may-be-evicted-soon expr: (sum by(instance, area) (echo_worker:jvm:memory:used__value) / sum by(instance, area) (echo_worker:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: echo-worker JVM memory too high ok 39.134s ago 170.8us
alert: front50-pod-may-be-evicted-soon expr: (sum by(instance, area) (front50:jvm:memory:used__value) / sum by(instance, area) (front50:jvm:memory:max__value)) > 0.9 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon summary: Front50 JVM memory too high ok 39.134s ago 174.2us

kube-api-server-is-down (last evaluation: 6.55s ago, evaluation time: 532.7us)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: kube-api-server-down expr: up{job="kubernetes-apiservers"} == 0 for: 2m labels: severity: critical annotations: description: Kubernetes API Server service went down LABELS = {{ $labels }} summary: Kube API Server job {{ $labels.job }} is down ok 6.55s ago 519.1us

kubernetes-api-server-experiencing-high-error-rate (last evaluation: 47.454s ago, evaluation time: 33.73ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: kube-api-server-errors expr: sum(rate(apiserver_request_total{code=~"^(?:5..)$",job="kubernetes-apiservers"}[2m])) / sum(rate(apiserver_request_total{job="kubernetes-apiservers"}[2m])) * 100 > 3 for: 2m labels: severity: critical annotations: description: |- Kubernetes API server is experiencing high error rate VALUE = {{ $value }} LABELS = {{ $labels }} summary: Kubernetes API server errors (instance {{ $labels.instance }}) ok 47.455s ago 33.72ms
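
Spread across lines, the error-rate check is the share of API-server requests returning a 5xx code over the last two minutes, expressed as a percentage and compared against 3% (same PromQL, logic unchanged):

    sum(rate(apiserver_request_total{code=~"^(?:5..)$",job="kubernetes-apiservers"}[2m]))
      /
    sum(rate(apiserver_request_total{job="kubernetes-apiservers"}[2m]))
      * 100 > 3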

latency-too-high (last evaluation: 36.225s ago, evaluation time: 7.14ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: clouddriver-ro-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__total{service="spin-clouddriver-ro"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__count_total{service="spin-clouddriver-ro"}[5m])) > 1 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.225s ago 2.761ms
alert: clouddriver-rw-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__total{service="spin-clouddriver-rw"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__count_total{service="spin-clouddriver-rw"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.223s ago 423.5us
alert: clouddriver-caching-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__total{service="spin-clouddriver-caching"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__count_total{service="spin-clouddriver-caching"}[5m])) > 5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.223s ago 399.1us
alert: clouddriver_ro_deck-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__total{service="spin-clouddriver-ro-deck"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__count_total{service="spin-clouddriver-ro-deck"}[5m])) > 5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.222s ago 388.3us
alert: gate-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__total{service="spin-gate"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__count_total{service="spin-gate"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.222s ago 373.9us
alert: orca-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__total{service="spin-orca"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__count_total{service="spin-orca"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.222s ago 393.8us
alert: igor-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__total{service="spin-igor"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__count_total{service="spin-igor"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.222s ago 458us
alert: echo_scheduler-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__total{service="spin-echo-scheduler"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__count_total{service="spin-echo-scheduler"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.222s ago 363us
alert: echo_worker-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__total{service="spin-echo-worker"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__count_total{service="spin-echo-worker"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.221s ago 363.4us
alert: front50-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__total{service="spin-front50"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__count_total{service="spin-front50"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.221s ago 382.3us
alert: fiat-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__total{service="spin-fiat"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__count_total{service="spin-fiat"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.221s ago 363.2us
alert: rosco-latency-too-high expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__total{service="spin-rosco"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__count_total{service="spin-rosco"}[5m])) > 0.5 for: 15m labels: severity: warning annotations: description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }} summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high ok 36.221s ago 400us

orca-queue-issue (last evaluation: 32.298s ago, evaluation time: 1.398ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: orca-queue-depth-high expr: (sum by(instance) (orca:queue:ready:depth__value{namespace!=""})) > 10 labels: severity: warning annotations: description: Orca queue ready depth for {{$labels.instance}} is {{$value}} summary: Orca queue depth is high ok 32.299s ago 1.097ms
alert: orca-queue-lag-high expr: sum by(instance, service, namespace) (rate(orca:controller:invocations__total[2m])) / sum by(instance, service, namespace) (rate(orca:controller:invocations__count_total[2m])) > 0.5 labels: severity: warning annotations: description: Service {{$labels.service}} in namespace {{$labels.namespace}} has Lag value of {{$value}} summary: Orca queue lag is high ok 32.298s ago 282.2us

prometheus-job-down (last evaluation: 11.111s ago, evaluation time: 392.6us)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: prometheus-job-is-down expr: up{job="prometheus"} == 0 for: 5m labels: severity: warning annotations: description: Default Prometheus Job is Down LABELS = {{ $labels }} summary: The Default Prometheus Job is Down (job {{ $labels.job}}) ok 11.112s ago 384.7us

spinnaker-service-is-down (last evaluation: 37.045s ago, evaluation time: 4.667ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: clouddriver-rw-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-rw"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-rw Spinnaker service is down ok 37.045s ago 726.7us
alert: clouddriver-ro-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-ro Spinnaker service is down ok 37.045s ago 462.2us
alert: clouddriver-caching-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-caching"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-caching Spinnaker service is down ok 37.044s ago 408.1us
alert: clouddriver-ro-deck-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro-deck"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Clouddriver-ro-deck Spinnaker service is down ok 37.044s ago 289.8us
alert: gate-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-gate"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Gate Spinnaker service is down ok 37.044s ago 1.153ms
alert: orca-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-orca"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Orca Spinnaker service is down ok 37.043s ago 438.8us
alert: igor-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-igor"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Igor Spinnaker service is down ok 37.043s ago 227.3us
alert: echo-scheduler-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-scheduler"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Echo-Scheduler Spinnaker service is down ok 37.043s ago 171.6us
alert: echo-worker-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-worker"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Echo-worker Spinnaker service is down ok 37.043s ago 157us
alert: front50-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-front50"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Front50 Spinnaker service is down ok 37.042s ago 136.8us
alert: fiat-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-fiat"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Fiat Spinnaker service is down ok 37.043s ago 192.8us
alert: rosco-is-down expr: up{job="opsmx_spinnaker_metrics",service="spin-rosco"} == 0 labels: severity: critical annotations: description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding summary: Rosco Spinnaker service is down ok 37.042s ago 158.7us
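
Every rule above carries a severity label of either warning or critical. A minimal Alertmanager routing sketch that splits notifications on that label; the receiver names and this config are illustrative assumptions, not part of the deployment shown here:

    route:
      receiver: default-notifications
      routes:
        - matchers:
            - severity="critical"
          receiver: oncall-pager
        - matchers:
            - severity="warning"
          receiver: team-chat
    receivers:
      - name: default-notifications
      - name: oncall-pager
      - name: team-chat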

volume-is-almost-full (< 10% left) (last evaluation: 44.065s ago, evaluation time: 2.743ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: pvc-storage-full expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10 for: 2m labels: severity: warning annotations: description: |- Volume is almost full (< 10% left) VALUE = {{ $value }} LABELS = {{ $labels }} summary: Kubernetes Volume running out of disk space for (persistentvolumeclaim {{ $labels.persistentvolumeclaim }} in namespace {{$labels.namespace}}) ok 44.065s ago 2.73ms
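
Rule files like these can be syntax-checked before Prometheus loads them; a sketch using promtool, where the file path is an assumption about how the rules are stored:

    # Validate rule-file syntax; on success promtool reports how many rules it found
    promtool check rules /etc/prometheus/rules/spinnaker-alerts.yaml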