This awesome collection is a place to store articles, blogs, tools and strategies for using a monitoring and observability stack in your project.
General
Monitoring Stacks
- ELK: A combination of four main components. Installer: Guide, Helm and Docker
  - Beats: lightweight, single-purpose data shippers that can send data from hundreds or thousands of machines to either Logstash or Elasticsearch.
  - Elasticsearch: a distributed, RESTful search engine that stores all of the collected data.
  - Kibana: a web interface for searching and visualizing logs.
  - Logstash: the data-processing component of the Elastic Stack, which transforms incoming data and sends it to Elasticsearch.
- Grafana Stack: The Grafana ecosystem for monitoring your applications, containers, nodes and more.
  - Agent: alloy, agent
  - Dashboard: FlameGraph (Pyroscope), gitana
  - Installer: Helm, Docker and Guide
  - Logging: fluentd, fluent-bit, loki, promtail
  - Metrics: prometheus, cadvisor, node_exporter, thanos, mimir
  - Profiling: pyroscope
  - Traces: jaeger, OpenTelemetry, tempo
  - SLO: sloth
- VictoriaMetrics Stack: A newer monitoring solution covering both metrics and logs.
  - VictoriaMetrics: a fast, cost-effective and scalable monitoring solution and time series database.
  - VictoriaLogs: an open-source, user-friendly log database from VictoriaMetrics.
- dcgm: NVIDIA Data Center GPU Manager (DCGM), for managing and monitoring GPUs in cluster environments.
- zabbix: Real-time monitoring of IT components and services, such as networks, servers, VMs, applications and the cloud.
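This sketch shows how the ELK pieces connect. Beats (here Filebeat) tails a log file and ships each line to Logstash. The log path and the `logstash` hostname are assumptions made for illustration. Port 5044 is the port conventionally used by the Logstash Beats input.

```yaml
# filebeat.yml: minimal sketch, assuming a host named "logstash" (hypothetical)
# runs a Beats input on port 5044
filebeat.inputs:
  - type: filestream
    id: nginx-access                # unique id required for filestream inputs
    paths:
      - /var/log/nginx/access.log   # hypothetical log path

# Ship events to Logstash, which processes them and forwards them to Elasticsearch
output.logstash:
  hosts: ["logstash:5044"]
```

A Logstash pipeline completes the path into Elasticsearch, where Kibana can query the data. That pipeline needs a `beats { port => 5044 }` input and an `elasticsearch` output.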
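The next sketch covers the Grafana logging path: a minimal Promtail configuration that tails local log files and pushes them to Loki. The `loki` hostname and the `/var/log/*log` glob are assumptions. Port 3100 is Loki's default HTTP port, and `/loki/api/v1/push` is its push endpoint.

```yaml
# promtail-config.yml: minimal sketch, assuming Loki is reachable at "loki:3100"
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Promtail records how far it has read each file here
positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs              # label used to select these logs in Grafana
          __path__: /var/log/*log   # files to tail (hypothetical glob)
```

Grafana Alloy is the newer collector in this ecosystem and can take over the same role. The Promtail format is used here only because it is compact.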
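VictoriaMetrics accepts data over the Prometheus remote-write protocol. An existing Prometheus can therefore forward its samples with a `remote_write` block in `prometheus.yml`. This sketch assumes a single-node VictoriaMetrics on its default port 8428, reachable at the hypothetical hostname `victoriametrics`.

```yaml
# prometheus.yml fragment: forward scraped samples to single-node VictoriaMetrics
# ("victoriametrics" is a hypothetical hostname)
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
```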
Product Error Analytics
- openreplay: Session replay and analytics tool you can self-host. Ideal for reproducing issues, co-browsing with users and optimizing your product.
- Sentry: Developer-first error tracking and performance monitoring. Website
Repositories
- awesome-monitoring: List of tools for monitoring and analyzing everything
- awesome-observability: Awesome observability page
Technique Articles
- Medium - Observability Series: A Step-by-Step Guide to Logs, Traces, and Metrics
- Grafana - Private data source connect (PDC)
Technology Articles
- Medium - Grafana Alloy & OpenTelemetry
- Medium - SLOs should be easy, say hi to Sloth
- Medium - Observability 2.0 with AWS OpenTelemetry Collector
- DevOps Cube - Prometheus Architecture: Complete Breakdown of Key Components
- Medium - 6 Best Free OnCall Software in 2024, Open-Source and SaaS
- Medium - 11 Automation Scripts for Prometheus Configurations.
Web Analytics
- Plausible: Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.
- umami: Umami is a simple, fast, privacy-focused alternative to Google Analytics.
Docker Collections
The place to store and reuse Dockerfile and docker-compose files used for monitoring clusters.
Grafana, Prometheus and Exporter
# Author: XeusNguyen - NTMA for Anomaly Detection
# Github: https://github.com/Xeus-Territory/ntma_anomaly/blob/main/Infrastructure/docker/get-data-metric-compose.yaml
version: '3'   # obsolete in Docker Compose v2, which ignores it with a warning

volumes:
  prometheus_data: {}
  grafana_data: {}
  alertmanager_data: {}

networks:
  monitoring:
    # must exist beforehand, e.g. created with: docker network create monitoring
    external: true

services:
  prometheus:
    image: prom/prometheus:v2.37.6
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 500M
    container_name: prometheus
    restart: unless-stopped
    healthcheck:
      test: wget --quiet --tries=1 --spider http://localhost:9090
      interval: 30s
      timeout: 10s
      retries: 5
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'   # enables POST /-/reload to reload the config
      - '--storage.tsdb.path=/prometheus'
    volumes:
      - prometheus_data:/prometheus
      - ./conf/monitoring/prometheus/:/etc/prometheus/
    ports:
      - 9090:9090
    labels:
      org.label-schema.group: "monitoring"
    networks:
      - "monitoring"
  alertmanager:
    image: prom/alertmanager:v0.25.0
    deploy:
      resources:
        limits:
          cpus: '0.10'
          memory: 100M
    container_name: alert-manager
    restart: unless-stopped
    healthcheck:
      test: wget --quiet --tries=1 --spider http://localhost:9093
      interval: 30s
      timeout: 10s
      retries: 5
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    ports:
      - 9093:9093
    volumes:
      - alertmanager_data:/alertmanager
      - ./conf/monitoring/alertmanager/:/etc/alertmanager/
    depends_on:
      - prometheus
    labels:
      org.label-schema.group: "monitoring"
    networks:
      - "monitoring"
  grafana:
    image: grafana/grafana:9.4.3
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 500M
    container_name: grafana
    restart: unless-stopped
    healthcheck:
      test: wget --quiet --tries=1 --spider http://localhost:3000
      interval: 30s
      timeout: 10s
      retries: 5
    ports:
      - 3000:3000
    environment:
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./conf/monitoring/grafana/provisioning:/etc/grafana/provisioning
    labels:
      org.label-schema.group: "monitoring"
    networks:
      - "monitoring"
  nginxlog_exporter:
    image: quay.io/martinhelmich/prometheus-nginxlog-exporter:v1.10.0
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 200M
    container_name: nginxlog-exporter
    command:
      - '--config-file=/etc/prometheus-nginxlog-exporter.yml'
    ports:
      - 4040:4040
    volumes:
      - ./log/access.log:/mnt/nginxlogs/access.log
      - ./conf/nginxlog/nginxlog_exporter.yml:/etc/prometheus-nginxlog-exporter.yml
    labels:
      org.label-schema.group: "monitoring"
    networks:
      - "monitoring"
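The compose file mounts `./conf/monitoring/prometheus/` into the Prometheus container and loads `prometheus.yml` from it. Below is a minimal sketch of that file. It scrapes Prometheus itself and the nginx log exporter, and it forwards alerts to Alertmanager. The sketch assumes targets are reached by container name on the `monitoring` network. The job names and the `rules/` path are hypothetical.

```yaml
# ./conf/monitoring/prometheus/prometheus.yml: minimal sketch
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Send firing alerts to the Alertmanager container (container_name: alert-manager)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alert-manager:9093"]

# Alerting rules (hypothetical path inside the mounted directory)
rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: nginxlog-exporter
    static_configs:
      - targets: ["nginxlog-exporter:4040"]
```

Prometheus is started with `--web.enable-lifecycle`, so edits to this file can be applied without restarting the container. Send `curl -X POST http://localhost:9090/-/reload` to trigger the reload.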
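At startup, Grafana loads provisioning files from the mounted `./conf/monitoring/grafana/provisioning` directory. Here is a sketch of a datasource file that registers the Prometheus service as the default datasource. It goes in the `datasources/` subdirectory; the filename itself is arbitrary.

```yaml
# ./conf/monitoring/grafana/provisioning/datasources/prometheus.yml (sketch)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend queries Prometheus on the user's behalf
    url: http://prometheus:9090   # container_name of the Prometheus service
    isDefault: true
```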