Rules
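
These are the alerting rule groups loaded by the monitoring Prometheus instance. Each group below lives in a rule file that Prometheus references from its configuration; the following is a minimal sketch of that layout, with illustrative file and path names (the actual file names are not shown on this page):

# prometheus.yml (fragment); the rule file path below is illustrative
rule_files:
  - /etc/prometheus/rules/spinnaker-alerts.yaml

# /etc/prometheus/rules/spinnaker-alerts.yaml (fragment)
groups:
  - name: container_cpu_usage_is_high
    rules:
      - alert: POD_CPU_IS_HIGH
        expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90
        for: 1m
        labels:
          severity: critical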

container_cpu_usage_is_high (last evaluation: 51.191s ago, evaluation time: 497.8us)

- alert: POD_CPU_IS_HIGH
  expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90
  for: 1m
  labels:
    severity: critical
  annotations:
    description: Container {{ $labels.container }} CPU usage inside POD {{ $labels.pod }} is high in {{ $labels.namespace }}
    summary: POD {{ $labels.pod }} CPU Usage is high in {{ $labels.namespace }}
  # state: ok, last evaluation: 51.196s ago, evaluation time: 477.8us
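
The expression above can be exercised without waiting for a real CPU spike by using promtool's rule unit tests. A minimal sketch, assuming the rule lives in a file named spinnaker-alerts.yaml and using made-up series labels (container "app", pod "demo-pod"):

# cpu-alert-test.yaml; run with: promtool test rules cpu-alert-test.yaml
rule_files:
  - spinnaker-alerts.yaml        # assumed rule file name
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # counter grows by 60 CPU-seconds per minute => ~100% CPU usage
      - series: 'container_cpu_usage_seconds_total{container="app",pod="demo-pod",namespace="default"}'
        values: '0+60x10'
    alert_rule_test:
      - eval_time: 5m
        alertname: POD_CPU_IS_HIGH
        exp_alerts:
          - exp_labels:
              severity: critical
              container: app
              pod: demo-pod
              namespace: default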

container_memory_usage_is_high (last evaluation: 1.763s ago, evaluation time: 676.6us)

- alert: POD_MEMORY_USAGE_IS_HIGH
  expr: (sum by(container, pod, namespace) (container_memory_working_set_bytes{container!=""}) / sum by(container, pod, namespace) (container_spec_memory_limit_bytes > 0) * 100) > 80
  for: 1m
  labels:
    severity: critical
  annotations:
    description: |-
      Container Memory usage is above 80%
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
    summary: Container {{ $labels.container }} Memory usage inside POD {{ $labels.pod }} is high in {{ $labels.namespace }}
  # state: ok, last evaluation: 1.764s ago, evaluation time: 662.4us
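
Note that the denominator filters on container_spec_memory_limit_bytes > 0, so containers that declare no memory limit never produce a ratio and are never alerted on by this rule. An illustrative pod spec with the limit set (names and sizes are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # placeholder
  namespace: default
spec:
  containers:
    - name: app
      image: nginx:1.25     # placeholder
      resources:
        requests:
          memory: 256Mi
        limits:
          memory: 512Mi     # exported by cAdvisor as container_spec_memory_limit_bytes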

node_cpu_greater_than_80 (last evaluation: 36.454s ago, evaluation time: 5.436ms)

- alert: NODE_CPU_IS_HIGH
  expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
  for: 1m
  labels:
    severity: critical
  annotations:
    description: node {{ $labels.kubernetes_node }} CPU usage is high
    summary: node CPU usage is greater than 90 percent
  # state: ok, last evaluation: 36.455s ago, evaluation time: 5.423ms

node_disk_space_too_low (last evaluation: 31.577s ago, evaluation time: 1.428ms)

- alert: NODE_DISK_SPACE_IS_LOW
  expr: (100 * ((node_filesystem_avail_bytes{fstype!="rootfs",mountpoint="/"}) / (node_filesystem_size_bytes{fstype!="rootfs",mountpoint="/"}))) < 10
  for: 1m
  labels:
    severity: critical
  annotations:
    description: node {{ $labels.node }} disk space is only {{ printf "%0.2f" $value }}% free.
    summary: node disk space remaining is less than 10 percent
  # state: ok, last evaluation: 31.577s ago, evaluation time: 1.416ms

node_down (last evaluation: 46.014s ago, evaluation time: 440.4us)

- alert: NODE_DOWN
  expr: up{component="node-exporter"} == 0
  for: 3m
  labels:
    severity: warning
  annotations:
    description: '{{ $labels.job }} job failed to scrape instance {{ $labels.instance }} for more than 3 minutes. Node seems to be down.'
    summary: Node {{ $labels.kubernetes_node }} is down
  # state: ok, last evaluation: 46.014s ago, evaluation time: 429.1us
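
This rule keys on up{component="node-exporter"} and on a kubernetes_node label, neither of which Prometheus adds on its own; they come from relabeling in the node-exporter scrape job. A sketch of relabel rules that would produce such labels, assuming the node-exporter pods carry a component=node-exporter Kubernetes label (the actual scrape job in this installation may differ):

- job_name: node-exporter                      # illustrative job name
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # keep only pods labeled component=node-exporter
    - source_labels: [__meta_kubernetes_pod_label_component]
      regex: node-exporter
      action: keep
    # copy the pod label so the series becomes up{component="node-exporter"}
    - source_labels: [__meta_kubernetes_pod_label_component]
      target_label: component
    # record the node the pod runs on, used by the alert summary
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: kubernetes_node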

node_memory_left_lessser_than_10 (last evaluation: 56.729s ago, evaluation time: 1.754ms)

- alert: NODE_MEMORY_LESS_THAN_10%
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  for: 1m
  labels:
    severity: critical
  annotations:
    description: node {{ $labels.kubernetes_node }} memory left is low
    summary: node memory left is less than 10 percent
  # state: ok, last evaluation: 56.729s ago, evaluation time: 1.739ms

Front50-cache (last evaluation: 14.77s ago, evaluation time: 305.1us)

- alert: front50:storageServiceSupport:cacheAge__value
  expr: front50:storageServiceSupport:cacheAge__value > 300000
  for: 2m
  labels:
    severity: warning
  annotations:
    description: front50 cacheAge for {{$labels.pod}} in namespace {{$labels.namespace}} has value = {{$value}}
    summary: front50 cacheAge too high
  # state: ok, last evaluation: 14.77s ago, evaluation time: 291.9us
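
The colon-separated name front50:storageServiceSupport:cacheAge__value indicates a recording rule rather than a raw metric, so a rule along the following lines must exist elsewhere in the configuration. The source metric name below is an assumption for illustration only; the metric actually exposed by the Spinnaker monitoring endpoint may be named differently:

groups:
  - name: spinnaker-recording-rules               # illustrative group name
    rules:
      - record: front50:storageServiceSupport:cacheAge__value
        expr: storageServiceSupport_cacheAge{service="spin-front50"}   # assumed source metric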

autopilot-component-jvm-errors (last evaluation: 49.09s ago, evaluation time: 3.243ms)

- alert: jvm-memory-filling-up-for-oes-audit-client
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="auditclient"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="auditclient"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}
  # state: ok, last evaluation: 49.091s ago, evaluation time: 788.1us

- alert: jvm-memory-filling-up-for-oes-autopilot
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="autopilot"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="autopilot"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}
  # state: ok, last evaluation: 49.09s ago, evaluation time: 540.2us

- alert: jvm-memory-filling-up-for-oes-dashboard
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="dashboard"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="dashboard"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}
  # state: ok, last evaluation: 49.09s ago, evaluation time: 470.1us

- alert: jvm-memory-filling-up-for-oes-platform
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="platform"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="platform"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}
  # state: ok, last evaluation: 49.09s ago, evaluation time: 485.5us

- alert: jvm-memory-filling-up-for-oes-sapor
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="sapor"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="sapor"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}
  # state: ok, last evaluation: 49.089s ago, evaluation time: 471us

- alert: jvm-memory-filling-up-for-oes-visibility
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="visibility"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="visibility"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}
  # state: ok, last evaluation: 49.089s ago, evaluation time: 446.5us

autopilot-component-latency-too-high (last evaluation: 55.826s ago, evaluation time: 6.223ms)

- alert: oes-audit-client-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="auditclient"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="auditclient"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high
  # state: ok, last evaluation: 55.826s ago, evaluation time: 1.264ms

- alert: oes-autopilot-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="autopilot"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="autopilot"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high
  # state: ok, last evaluation: 55.825s ago, evaluation time: 797.4us

- alert: oes-dashboard-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="dashboard"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="dashboard"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high
  # state: ok, last evaluation: 55.825s ago, evaluation time: 705us

- alert: oes-platform-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="platform"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="platform"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high
  # state: ok, last evaluation: 55.825s ago, evaluation time: 1.923ms

- alert: oes-sapor-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="sapor"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="sapor"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high
  # state: ok, last evaluation: 55.823s ago, evaluation time: 867.6us

- alert: oes-visibility-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="visibility"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="visibility"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high
  # state: ok, last evaluation: 55.823s ago, evaluation time: 626.2us
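
These expressions alert on average request latency (sum of request seconds divided by request count over 2 minutes). If the OES services also export http_server_requests_seconds_bucket histograms, a percentile-based variant is possible; the following is a sketch for one component, assuming histogram buckets are actually scraped (this rule is not part of the group above):

- alert: oes-autopilot-latency-p95-too-high       # illustrative variant
  expr: |
    histogram_quantile(0.95,
      sum by(le, kubernetes_pod_name, component, kubernetes_namespace)
        (rate(http_server_requests_seconds_bucket{component="autopilot"}[5m]))) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: 95th percentile latency of {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is above 0.5s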

autopilot-scrape-target-is-down (last evaluation: 17.446s ago, evaluation time: 2.479ms)

- alert: oes-audit-client-scrape-target-is-down
  expr: up{component="auditclient"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-audit-client scrape target is down
  # state: ok, last evaluation: 17.446s ago, evaluation time: 385.9us

- alert: oes-autopilot-scrape-target-is-down
  expr: up{component="autopilot"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-autopilot scrape target is down
  # state: ok, last evaluation: 17.446s ago, evaluation time: 184us

- alert: oes-dashboard-scrape-target-is-down
  expr: up{component="dashboard"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-dashboard scrape target is down
  # state: ok, last evaluation: 17.446s ago, evaluation time: 682us

- alert: oes-platform-scrape-target-is-down
  expr: up{component="platform"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-platform scrape target is down
  # state: ok, last evaluation: 17.445s ago, evaluation time: 284.9us

- alert: oes-sapor-scrape-target-is-down
  expr: up{component="sapor"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-sapor scrape target is down
  # state: ok, last evaluation: 17.445s ago, evaluation time: 243.1us

- alert: oes-visibility-scrape-target-is-down
  expr: up{component="visibility"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-visibility scrape target is down
  # state: ok, last evaluation: 17.445s ago, evaluation time: 660.2us

igor-needs-attention (last evaluation: 18.577s ago, evaluation time: 328.7us)

- alert: igor-needs-attention
  expr: igor:pollingMonitor:itemsOverThreshold__value > 0
  labels:
    severity: critical
  annotations:
    description: Igor in namespace {{$labels.namespace}} needs human help
    summary: Igor needs attention
  # state: ok, last evaluation: 18.578s ago, evaluation time: 316.4us

jvm-too-high (last evaluation: 21.138s ago, evaluation time: 2.722ms)

- alert: clouddriver-rw-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (clouddriver_rw:jvm:memory:used__value) / sum by(instance, area) (clouddriver_rw:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Clouddriver-rw JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 510.3us

- alert: clouddriver-ro-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (clouddriver_ro:jvm:memory:used__value) / sum by(instance, area) (clouddriver_ro:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Clouddriver-ro JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 214.2us

- alert: clouddriver-caching-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (clouddriver_caching:jvm:memory:used__value) / sum by(instance, area) (clouddriver_caching:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Clouddriver-caching JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 189.2us

- alert: gate-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (gate:jvm:memory:used__value) / sum by(instance, area) (gate:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: gate JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 401.8us

- alert: orca-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (orca:jvm:gc:liveDataSize__value) / sum by(instance, area) (orca:jvm:gc:maxDataSize__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: orca JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 355us

- alert: igor-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (igor:jvm:memory:used__value) / sum by(instance, area) (igor:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: igor JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 228.5us

- alert: echo-scheduler-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (echo_scheduler:jvm:memory:used__value) / sum by(instance, area) (echo_scheduler:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: echo-scheduler JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 250.5us

- alert: echo-worker-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (echo_worker:jvm:memory:used__value) / sum by(instance, area) (echo_worker:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: echo-worker JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 246.1us

- alert: front50-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (front50:jvm:memory:used__value) / sum by(instance, area) (front50:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Front50 JVM memory too high
  # state: ok, last evaluation: 21.138s ago, evaluation time: 270.4us

kube-api-server-is-down (last evaluation: 48.559s ago, evaluation time: 1.081ms)

- alert: kube-api-server-down
  expr: up{job="kubernetes-apiservers"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    description: Kubernetes API Server service went down LABELS = {{ $labels }}
    summary: Kube API Server job {{ $labels.job }} is down
  # state: ok, last evaluation: 48.559s ago, evaluation time: 1.065ms

kubernetes-api-server-experiencing-high-error-rate (last evaluation: 29.457s ago, evaluation time: 760.2us)

- alert: kube-api-server-errors
  expr: sum(rate(apiserver_request_total{code=~"^(?:5..)$",job="kubernetes-apiservers"}[2m])) / sum(rate(apiserver_request_total{job="kubernetes-apiservers"}[2m])) * 100 > 3
  for: 2m
  labels:
    severity: critical
  annotations:
    description: |-
      Kubernetes API server is experiencing high error rate
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
    summary: Kubernetes API server errors (instance {{ $labels.instance }})
  # state: ok, last evaluation: 29.457s ago, evaluation time: 744.1us
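
The same 5xx ratio can be captured once as a recording rule and reused by the alert, which keeps the alert expression short and makes the error percentage easy to graph. A sketch, with an illustrative recording-rule name:

groups:
  - name: kube-apiserver-recording                         # illustrative group name
    rules:
      - record: apiserver:request_errors:percent_rate2m    # illustrative rule name
        expr: sum(rate(apiserver_request_total{code=~"^(?:5..)$",job="kubernetes-apiservers"}[2m])) / sum(rate(apiserver_request_total{job="kubernetes-apiservers"}[2m])) * 100
      - alert: kube-api-server-errors
        expr: apiserver:request_errors:percent_rate2m > 3
        for: 2m
        labels:
          severity: critical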

latency-too-high (last evaluation: 18.237s ago, evaluation time: 4.595ms)

- alert: clouddriver-ro-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__total{service="spin-clouddriver-ro"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__count_total{service="spin-clouddriver-ro"}[5m])) > 1
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.237s ago, evaluation time: 532.7us

- alert: clouddriver-rw-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__total{service="spin-clouddriver-rw"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__count_total{service="spin-clouddriver-rw"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.237s ago, evaluation time: 306.5us

- alert: clouddriver-caching-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__total{service="spin-clouddriver-caching"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__count_total{service="spin-clouddriver-caching"}[5m])) > 5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.236s ago, evaluation time: 1.543ms

- alert: clouddriver_ro_deck-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__total{service="spin-clouddriver-ro-deck"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__count_total{service="spin-clouddriver-ro-deck"}[5m])) > 5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 282.9us

- alert: gate-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__total{service="spin-gate"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__count_total{service="spin-gate"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 244.3us

- alert: orca-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__total{service="spin-orca"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__count_total{service="spin-orca"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 231.4us

- alert: igor-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__total{service="spin-igor"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__count_total{service="spin-igor"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 233.4us

- alert: echo_scheduler-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__total{service="spin-echo-scheduler"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__count_total{service="spin-echo-scheduler"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 229.8us

- alert: echo_worker-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__total{service="spin-echo-worker"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__count_total{service="spin-echo-worker"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 233.7us

- alert: front50-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__total{service="spin-front50"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__count_total{service="spin-front50"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 229.5us

- alert: fiat-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__total{service="spin-fiat"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__count_total{service="spin-fiat"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 232us

- alert: rosco-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__total{service="spin-rosco"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__count_total{service="spin-rosco"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the Service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high
  # state: ok, last evaluation: 18.235s ago, evaluation time: 229.7us

orca-queue-issue (last evaluation: 14.307s ago, evaluation time: 848.1us)

- alert: orca-queue-depth-high
  expr: (sum by(instance) (orca:queue:ready:depth__value{namespace!=""})) > 10
  labels:
    severity: warning
  annotations:
    description: Orca queue ready depth for instance {{$labels.instance}} is {{$value}}
    summary: Orca queue depth is high
  # state: ok, last evaluation: 14.308s ago, evaluation time: 486.8us

- alert: orca-queue-lag-high
  expr: sum by(instance, service, namespace) (rate(orca:controller:invocations__total[2m])) / sum by(instance, service, namespace) (rate(orca:controller:invocations__count_total[2m])) > 0.5
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} has Lag value of {{$value}}
    summary: Orca queue lag is high
  # state: ok, last evaluation: 14.307s ago, evaluation time: 342.7us

prometheus-job-down (last evaluation: 55.549s ago, evaluation time: 377.1us)

- alert: prometheus-job-is-down
  expr: up{job="prometheus"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    description: Default Prometheus Job is Down LABELS = {{ $labels }}
    summary: The Default Prometheus Job is Down (job {{ $labels.job }})
  # state: ok, last evaluation: 55.549s ago, evaluation time: 355.6us

spinnaker-service-is-down (last evaluation: 19.051s ago, evaluation time: 2.281ms)

- alert: clouddriver-rw-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-rw"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-rw Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 349.4us

- alert: clouddriver-ro-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-ro Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 176.3us

- alert: clouddriver-caching-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-caching"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-caching Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 209.9us

- alert: clouddriver-ro-deck-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro-deck"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-ro-deck Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 146.8us

- alert: gate-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-gate"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Gate Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 179.9us

- alert: orca-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-orca"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Orca Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 141.5us

- alert: igor-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-igor"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Igor Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 161.4us

- alert: echo-scheduler-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-scheduler"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Echo-Scheduler Spinnaker service is down
  # state: ok, last evaluation: 19.051s ago, evaluation time: 238us

- alert: echo-worker-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-worker"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Echo-worker Spinnaker service is down
  # state: ok, last evaluation: 19.05s ago, evaluation time: 181.8us

- alert: front50-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-front50"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Front50 Spinnaker service is down
  # state: ok, last evaluation: 19.05s ago, evaluation time: 143.7us

- alert: fiat-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-fiat"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Fiat Spinnaker service is down
  # state: ok, last evaluation: 19.05s ago, evaluation time: 129.3us

- alert: rosco-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-rosco"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Rosco Spinnaker service is down
  # state: ok, last evaluation: 19.05s ago, evaluation time: 127.2us
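
Every rule in this document carries a severity label of either critical or warning, which is the natural key for Alertmanager routing. A minimal routing sketch; the receiver names are placeholders, not taken from any real configuration:

route:
  receiver: default-notifications          # placeholder receivers throughout
  group_by: ['alertname', 'namespace']
  routes:
    - matchers:
        - severity = "critical"
      receiver: oncall-pager
    - matchers:
        - severity = "warning"
      receiver: team-chat
receivers:
  - name: default-notifications
  - name: oncall-pager
  - name: team-chat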

volume-is-almost-full (< 10% left) (last evaluation: 26.089s ago, evaluation time: 499.7us)

- alert: pvc-storage-full
  expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10
  for: 2m
  labels:
    severity: warning
  annotations:
    description: |-
      Volume is almost full (< 10% left)
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
    summary: Kubernetes Volume running out of disk space for (persistentvolumeclaim {{ $labels.persistentvolumeclaim }} in namespace {{$labels.namespace}})
  # state: ok, last evaluation: 26.089s ago, evaluation time: 477.2us
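
If the cluster runs the Prometheus Operator, a group like this one can be delivered as a PrometheusRule object instead of a static rule file, and the operator will load it into Prometheus automatically. A sketch; the object name, namespace, and group name below are placeholders:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-storage-alerts                 # placeholder
  namespace: monitoring                    # placeholder
spec:
  groups:
    - name: volume-is-almost-full          # placeholder group name
      rules:
        - alert: pvc-storage-full
          expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: Kubernetes Volume running out of disk space for persistentvolumeclaim {{ $labels.persistentvolumeclaim }}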