Monitoring Polkadot Nodes with Prometheus and Alertmanager
Requirements
A Linux server (Ubuntu/Debian) with a running Polkadot node
Open ports: 9100 (node_exporter), 9090 (Prometheus), 9093 (Alertmanager), 9615 (Polkadot metrics endpoint)
Root or sudo access
A Telegram bot token from telepush.dev
Automatic Installation
source <(curl -s https://raw.githubusercontent.com/validexisinfra/polkadot/main/install-alertmanager.sh)
Manual Installation
Install Node Exporter
Node Exporter collects server-level metrics such as CPU, memory, disk, and more.
cd $HOME
sudo wget $(curl -s https://api.github.com/repos/prometheus/node_exporter/releases/latest | grep "tag_name" | awk '{print "https://github.com/prometheus/node_exporter/releases/download/" substr($2, 2, length($2)-3) "/node_exporter-" substr($2, 3, length($2)-4) ".linux-amd64.tar.gz"}')
sudo tar xvf node_exporter-*.tar.gz
sudo cp ./node_exporter-*.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter
sudo rm -rf ./node_exporter*
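The download one-liner above derives the release URL from the latest tag reported by the GitHub API. As a rough sketch of what the awk expression assembles, assuming a tag of the form v1.8.2 (the helper name node_exporter_url and the example tag are illustrative and not part of the original commands):

```shell
# Hypothetical helper: build the node_exporter tarball URL from a release tag.
# The tag keeps its leading "v" in the path, while the file name drops it.
node_exporter_url() {
  local tag="$1"            # e.g. v1.8.2 (illustrative tag)
  local version="${tag#v}"  # strip the leading "v", giving 1.8.2
  printf 'https://github.com/prometheus/node_exporter/releases/download/%s/node_exporter-%s.linux-amd64.tar.gz\n' "$tag" "$version"
}
```

Pinning a known tag this way, instead of always fetching the latest release, can make repeated installs reproducible.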
Create a systemd service that runs Node Exporter as the dedicated node_exporter user created above:
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable node_exporter.service
sudo systemctl start node_exporter.service
sudo systemctl status node_exporter.service
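Once the service reports active, it can help to confirm that the exporter is actually serving metrics. A minimal sketch, assuming Node Exporter listens on its default port 9100; has_metric is a hypothetical helper name, and node_cpu_seconds_total is one of the CPU metrics Node Exporter exposes:

```shell
# has_metric NAME: read Prometheus text-format metrics on stdin and
# succeed if at least one sample line for NAME is present.
has_metric() {
  local name="$1"
  # Sample lines start with the metric name followed by '{' or a space;
  # lines starting with '#' are HELP/TYPE comments and do not match.
  grep -Eq "^${name}(\{| )"
}

# Example call (run on the server itself):
#   curl -s http://localhost:9100/metrics | has_metric node_cpu_seconds_total && echo "node_exporter OK"
```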
Install Prometheus
Download and install
curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \
| grep browser_download_url | grep linux-amd64.tar.gz \
| cut -d '"' -f 4 | wget -qi -
tar xvf prometheus-*.tar.gz
cd prometheus-*.linux-amd64
sudo cp prometheus promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus /var/lib/prometheus
if [ -d "consoles" ]; then
sudo cp -r consoles /etc/prometheus/
fi
if [ -d "console_libraries" ]; then
sudo cp -r console_libraries /etc/prometheus/
fi
Create a dedicated user and set ownership
sudo id -u prometheus &>/dev/null || sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
Prometheus Configuration
sudo tee /etc/prometheus/prometheus.yml > /dev/null <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - 'rules.yml'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'polkadot_node'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9615']
EOF
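Create Prometheus systemd service
The heredoc delimiter is quoted ('EOF') so that the shell writes $MAINPID to the unit file literally and systemd expands it at reload time.
sudo tee /etc/systemd/system/prometheus.service > /dev/null <<'EOF'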
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--storage.tsdb.retention.time 30d \
--web.enable-admin-api
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
EOF
Create Alert Rules
cd /etc/prometheus
sudo tee rules.yml > /dev/null <<EOF
groups:
  - name: alert_rules
    rules:
      - alert: PolkadotNodeSyncLag
        expr: (max(substrate_block_height{status="best"}) by (instance) - max(substrate_block_height{status="finalized"}) by (instance)) > 20
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 finality lagging"
          description: "On polkadot-1 the finalized block has been more than 20 blocks behind the best block for 5 minutes."
      - alert: NodeDown
        expr: up{job="polkadot_node"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 down"
          description: "Node polkadot-1 has been down for more than 1 minute."
      - alert: HighDiskUsage
        expr: (node_filesystem_avail_bytes{job="node_exporter", fstype!="tmpfs", fstype!="sysfs", fstype!="proc"} / node_filesystem_size_bytes{job="node_exporter", fstype!="tmpfs", fstype!="sysfs", fstype!="proc"}) * 100 < 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High disk usage on polkadot-1"
          description: "Less than 2% of disk space is available (usage above 98%) on polkadot-1."
      - alert: PolkadotNodeNotSyncing
        expr: substrate_sub_libp2p_sync_is_major_syncing{job="polkadot_node"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 still in major sync"
          description: "Node polkadot-1 has reported major syncing (still catching up with the chain) for more than 5 minutes."
      - alert: PolkadotNodeHighCPUUsage
        expr: rate(process_cpu_seconds_total{job="polkadot_node"}[5m]) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on polkadot-1"
          description: "The Polkadot process has used more than 80% of one CPU core on polkadot-1 for more than 5 minutes."
EOF
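To make two of the thresholds concrete: HighDiskUsage fires when available space divided by filesystem size, times 100, drops below 2, and PolkadotNodeSyncLag fires when the best block is more than 20 ahead of the finalized block. The sketch below reproduces that arithmetic in shell with made-up numbers; free_pct and finality_lag are illustrative helpers, not Prometheus functions:

```shell
# free_pct AVAIL SIZE: percentage of the filesystem still available,
# i.e. avail / size * 100, printed with two decimals.
free_pct() {
  awk -v avail="$1" -v size="$2" 'BEGIN { printf "%.2f\n", avail / size * 100 }'
}

# finality_lag BEST FINALIZED: distance between best and finalized block heights.
finality_lag() {
  echo $(( $1 - $2 ))
}
```

With 5 GB available on a 500 GB disk, free_pct 5 500 prints 1.00, which is below 2, so the disk alert would fire once the condition has held for the 5 minute "for" window. Likewise, finality_lag 1000 975 prints 25, which exceeds the threshold of 20.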
Start and Enable the Prometheus Service
sudo chown prometheus:prometheus rules.yml
sudo systemctl daemon-reload
sudo systemctl enable prometheus.service
sudo systemctl start prometheus.service
sudo systemctl status prometheus.service
Install Alertmanager
Download and install
cd ~
sudo wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
sudo tar xvf alertmanager-0.24.0.linux-amd64.tar.gz
sudo rm alertmanager-0.24.0.linux-amd64.tar.gz
sudo mkdir /etc/alertmanager /var/lib/prometheus/alertmanager
cd alertmanager-0.24.0.linux-amd64
sudo cp alertmanager amtool /usr/local/bin/
sudo cp alertmanager.yml /etc/alertmanager
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/prometheus/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/{alertmanager,amtool}
Configuration
Example configuration (replace <YOUR_TOKEN> with your Telepush token):
sudo tee /etc/alertmanager/alertmanager.yml > /dev/null <<EOF
route:
  group_by: ['alertname', 'instance', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'telepush'
receivers:
  - name: 'telepush'
    webhook_configs:
      - url: 'https://telepush.dev/api/inlets/alertmanager/<YOUR_TOKEN>'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
EOF
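Create Alertmanager service
The unit file is written through an unquoted heredoc, so the shell substitutes $IP_ADDRESS when the file is created. Set it to your server's address first; the hostname -I lookup below is one way to get it, and you can replace it with a fixed IP if the server has several interfaces. The service runs as the alertmanager user created above and keeps its state in /var/lib/prometheus/alertmanager.
IP_ADDRESS="$(hostname -I | awk '{print $1}')"
sudo tee /etc/systemd/system/alertmanager.service > /dev/null <<EOF
[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml --storage.path=/var/lib/prometheus/alertmanager --web.external-url=http://$IP_ADDRESS:9093 --cluster.advertise-address='0.0.0.0:9093'
[Install]
WantedBy=multi-user.target
EOF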
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
sudo systemctl restart prometheus.service
sudo systemctl restart alertmanager.service
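Before relying on real alerts, it can be useful to push a synthetic alert through Alertmanager and check that it arrives in Telegram. The sketch below assumes Alertmanager is reachable on localhost:9093 and uses its v2 API endpoint /api/v2/alerts; the helper name test_alert_payload and the alert name and label values are made up for this test:

```shell
# test_alert_payload: print a one-element JSON array in the shape the
# Alertmanager v2 API accepts on POST /api/v2/alerts.
# The alert name, labels and summary are illustrative values.
test_alert_payload() {
  cat <<'JSON'
[{"labels":{"alertname":"TestAlert","severity":"warning","instance":"polkadot-1"},"annotations":{"summary":"Test alert from the setup guide"}}]
JSON
}

# Example call (run on the server itself):
#   test_alert_payload | curl -s -X POST -H 'Content-Type: application/json' \
#     --data-binary @- http://localhost:9093/api/v2/alerts
```

With the route settings above, the notification should be sent after the 30 second group_wait.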
Grafana
Install Required Dependencies
sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
Add the Grafana Repository
echo "deb https://packages.grafana.com/enterprise/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
Create a User for Grafana
getent group grafana > /dev/null || sudo groupadd --system grafana
id -u grafana &>/dev/null || sudo useradd -m -s /bin/bash -g grafana grafana
Install Grafana Enterprise
# Install additional utilities
sudo apt-get install -y adduser libfontconfig1
# Download and install Grafana Enterprise
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_9.3.2_amd64.deb
sudo dpkg -i grafana-enterprise_9.3.2_amd64.deb
Start and Enable the Grafana Server
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server
Importing Dashboards into Grafana
To set up dashboards in Grafana, you need the dashboards' JSON files. These files can be either:
Downloaded from a public source such as Grafana's Dashboard Library
Created manually in Grafana
If you're using Grafana's library, search for the dashboard by its ID and download the JSON file. You can then import it via:
Grafana UI → Dashboards → Import → Upload JSON file
You can also download our predefined Polkadot node dashboard here:
Polkadot_Dashboard.json
Final Checks
Access Prometheus:
http://<your-server-ip>:9090
Access Alertmanager:
http://<your-server-ip>:9093
Access Grafana:
http://<your-server-ip>:3000
Verify that your Polkadot node metrics (:9615) and alerts are visible
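These checks can also be scripted from the server itself. The following is a rough sketch rather than part of the original setup: status_label and check_url are hypothetical helper names. It assumes the default ports used in this guide, the /-/healthy endpoints of Prometheus and Alertmanager, and Grafana's /api/health endpoint:

```shell
# status_label CODE: map an HTTP status code to "ok" (200) or "fail"
# (anything else, including curl's 000 when no connection could be made).
status_label() {
  case "$1" in
    200) echo ok ;;
    *)   echo fail ;;
  esac
}

# check_url NAME URL: fetch URL with a 5 second timeout and print "NAME: ok" or "NAME: fail".
check_url() {
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$2" || true)"
  echo "$1: $(status_label "$code")"
}

# Example calls:
#   check_url prometheus    http://localhost:9090/-/healthy
#   check_url alertmanager  http://localhost:9093/-/healthy
#   check_url grafana       http://localhost:3000/api/health
#   check_url polkadot      http://localhost:9615/metrics
#   check_url node_exporter http://localhost:9100/metrics
```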