📘 Monitoring Polkadot Nodes with Prometheus and Alertmanager

Last updated 22 days ago

Requirements

  • A Linux server (Ubuntu/Debian) with a running Polkadot node

  • Open ports: 9100 (node_exporter), 9090 (Prometheus), 9093 (Alertmanager), 9615 (Polkadot metrics endpoint), 3000 (Grafana)

  • Root or sudo access

  • A Telegram bot token from telepush.dev


Automatic Installation

source <(curl -s https://raw.githubusercontent.com/validexisinfra/polkadot/main/install-alertmanager.sh)

Manual Installation

Install Node Exporter

Node Exporter collects server-level metrics such as CPU, memory, disk, and more.

cd $HOME
sudo wget $(curl -s https://api.github.com/repos/prometheus/node_exporter/releases/latest | grep "tag_name" | awk '{print "https://github.com/prometheus/node_exporter/releases/download/" substr($2, 2, length($2)-3) "/node_exporter-" substr($2, 3, length($2)-4) ".linux-amd64.tar.gz"}')
sudo tar xvf node_exporter-*.tar.gz
sudo cp ./node_exporter-*.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter
sudo rm -rf ./node_exporter*
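
The wget one-liner above derives the download URL from the tag_name field of the GitHub API response. As a rough sketch of what the awk expression does, the same logic can be wrapped in a function and fed a sample line. The function name node_exporter_url and the sample tag v1.8.2 are illustrative assumptions, not values taken from this guide.

```shell
# Hypothetical helper: reads GitHub API JSON on stdin and prints the
# tarball URL built from the "tag_name" line.
node_exporter_url() {
  grep '"tag_name"' | awk '{
    tag = substr($2, 2, length($2) - 3)   # drop the surrounding quotes and trailing comma
    ver = substr($2, 3, length($2) - 4)   # same, and also drop the leading "v"
    print "https://github.com/prometheus/node_exporter/releases/download/" tag "/node_exporter-" ver ".linux-amd64.tar.gz"
  }'
}

# Sample input line (illustrative tag, not necessarily the latest release)
echo '  "tag_name": "v1.8.2",' | node_exporter_url
```

Running the function on the API output before passing it to wget is a cheap way to confirm that the URL looks right.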

Create the systemd service (the dedicated node_exporter user was created in the previous step):

sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
  Description=Node Exporter
  Wants=network-online.target
  After=network-online.target
[Service] 
  User=node_exporter
  Group=node_exporter
  Type=simple
  ExecStart=/usr/local/bin/node_exporter
[Install]
  WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter.service
sudo systemctl start node_exporter.service

sudo systemctl status node_exporter.service
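
Besides the systemd status, it can help to confirm that the exporter actually serves metrics. The sketch below is one possible check; the helper name has_metric is hypothetical, and the sample line only imitates the Prometheus text format so the example can run without a live exporter.

```shell
# Hypothetical helper: succeeds if the exposition text on stdin contains
# at least one sample of the named metric.
has_metric() {
  grep -q "^$1[{ ]"
}

# Usage against the live exporter (requires node_exporter to be running):
#   curl -s http://localhost:9100/metrics | has_metric node_cpu_seconds_total && echo "node_exporter OK"

# Offline demonstration with a sample line in the Prometheus text format
printf 'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67\n' \
  | has_metric node_cpu_seconds_total && echo "metric found"
```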

Install Prometheus

Download and install

curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \
| grep browser_download_url | grep linux-amd64.tar.gz \
| cut -d '"' -f 4 | wget -qi -

tar xvf prometheus-*.tar.gz
cd prometheus-*.linux-amd64

sudo cp prometheus promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus /var/lib/prometheus

if [ -d "consoles" ]; then
    sudo cp -r consoles /etc/prometheus/
fi

if [ -d "console_libraries" ]; then
    sudo cp -r console_libraries /etc/prometheus/
fi

Create a dedicated user and set ownership

sudo id -u prometheus &>/dev/null || sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

Prometheus Configuration

sudo tee /etc/prometheus/prometheus.yml > /dev/null <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'polkadot_node'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9615']
EOF

Create Prometheus systemd service

sudo tee /etc/systemd/system/prometheus.service > /dev/null <<EOF
[Unit]
  Description=Prometheus Monitoring
  Wants=network-online.target
  After=network-online.target
[Service]
  User=prometheus
  Group=prometheus
  Type=simple
  ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time 30d \
  --web.enable-admin-api
  ExecReload=/bin/kill -HUP \$MAINPID
[Install]
  WantedBy=multi-user.target
EOF
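
The $MAINPID variable is meant to be expanded by systemd at runtime, not by the shell that writes the unit file. A heredoc with an unquoted delimiter (<<EOF) lets the shell expand unescaped variables first, which leaves an empty value in the file. The following sketch compares the three ways of writing the line; it only prints strings and does not touch any unit file.

```shell
unset MAINPID   # mimic an interactive shell, where MAINPID is not defined

# Unquoted delimiter, unescaped variable: the shell expands it to nothing
unquoted=$(cat <<EOF
ExecReload=/bin/kill -HUP $MAINPID
EOF
)

# Unquoted delimiter, escaped variable: the literal text survives
escaped=$(cat <<EOF
ExecReload=/bin/kill -HUP \$MAINPID
EOF
)

# Quoted delimiter: nothing inside the heredoc is expanded
quoted=$(cat <<'EOF'
ExecReload=/bin/kill -HUP $MAINPID
EOF
)

printf 'unquoted: [%s]\n' "$unquoted"
printf 'escaped:  [%s]\n' "$escaped"
printf 'quoted:   [%s]\n' "$quoted"
```

After writing the unit, grep MAINPID /etc/systemd/system/prometheus.service shows which form ended up in the file.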

Create Alert Rules

cd /etc/prometheus
sudo tee rules.yml > /dev/null <<EOF
groups:
  - name: alert_rules
    rules:
      - alert: PolkadotNodeSyncLag
        expr: (max(substrate_block_height{status="best"}) by (instance) - max(substrate_block_height{status="finalized"}) by (instance)) > 20
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 finality lagging"
          description: "The finalized block on polkadot-1 has been more than 20 blocks behind its best block for 5 minutes."
      - alert: NodeDown
        expr: up{job="polkadot_node"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 down"
          description: "Node polkadot-1 has been down for more than 1 minute."
      - alert: HighDiskUsage
        expr: (node_filesystem_avail_bytes{job="node_exporter", fstype!="tmpfs", fstype!="sysfs", fstype!="proc"} / node_filesystem_size_bytes{job="node_exporter", fstype!="tmpfs", fstype!="sysfs", fstype!="proc"}) * 100 < 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High disk usage on polkadot-1"
          description: "Disk usage is above 98% on polkadot-1."
      - alert: PolkadotNodeNotSyncing
        expr: substrate_sub_libp2p_sync_is_major_syncing{job="polkadot_node"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 stuck in major sync"
          description: "Node polkadot-1 has been in major sync (still catching up with the chain) for more than 5 minutes."
      - alert: PolkadotNodeHighCPUUsage
        expr: rate(process_cpu_seconds_total{job="polkadot_node"}[5m]) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on polkadot-1"
          description: "The Polkadot process has used more than 80% of one CPU core on polkadot-1 for more than 5 minutes."
EOF
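
The PolkadotNodeSyncLag expression subtracts the finalized block height from the best block height. To see the same number outside Prometheus, a small parser over the node's metrics output is enough. This is only a sketch: the helper name finality_gap, the chain="polkadot" label, and the sample heights are assumptions made for illustration.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints best height minus finalized height; exits 1 if either is missing.
finality_gap() {
  awk '
    /^substrate_block_height/ && /status="best"/      { best = $NF }
    /^substrate_block_height/ && /status="finalized"/ { fin  = $NF }
    END {
      if (best == "" || fin == "") exit 1
      print best - fin
    }
  '
}

# Usage against the live node (requires the metrics endpoint on :9615):
#   curl -s http://localhost:9615/metrics | finality_gap

# Offline demonstration with made-up block heights
printf '%s\n' \
  'substrate_block_height{chain="polkadot",status="best"} 21000045' \
  'substrate_block_height{chain="polkadot",status="finalized"} 21000042' \
  | finality_gap
```

A result that stays above 20 for five minutes is what the rule treats as critical.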

Start and Enable the Prometheus Service

sudo chown prometheus:prometheus /etc/prometheus/rules.yml

sudo systemctl daemon-reload
sudo systemctl enable prometheus.service
sudo systemctl start prometheus.service

sudo systemctl status prometheus.service
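
If the status output reports a failed start, validating the configuration is a reasonable first step. The promtool binary installed earlier provides a check config subcommand, which also checks the rule files the config references. The wrapper below is a hypothetical convenience, not part of the original setup; its name and return codes are assumptions.

```shell
# Hypothetical wrapper around "promtool check config".
# Returns 1 if the file is not readable, 2 if promtool is not installed,
# otherwise promtool's own exit status.
check_prom_config() {
  cfg=${1:-/etc/prometheus/prometheus.yml}
  if [ ! -r "$cfg" ]; then
    echo "config not readable: $cfg" >&2
    return 1
  fi
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 2
  fi
  promtool check config "$cfg"
}

# Usage, before starting or restarting the service:
#   check_prom_config /etc/prometheus/prometheus.yml
```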

Install Alertmanager

Download and install

cd ~
sudo wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
sudo tar xvf alertmanager-0.24.0.linux-amd64.tar.gz
sudo rm alertmanager-0.24.0.linux-amd64.tar.gz
sudo mkdir -p /etc/alertmanager /var/lib/prometheus/alertmanager
cd alertmanager-0.24.0.linux-amd64
sudo cp alertmanager amtool /usr/local/bin/
sudo cp alertmanager.yml /etc/alertmanager
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/prometheus/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/{alertmanager,amtool}

Configuration

Example configuration (replace <YOUR_TOKEN> with your Telepush token):

sudo tee /etc/alertmanager/alertmanager.yml > /dev/null <<EOF
route:
  group_by: ['alertname', 'instance', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'telepush'
receivers:
  - name: 'telepush'
    webhook_configs:
      - url: 'https://telepush.dev/api/inlets/alertmanager/<YOUR_TOKEN>'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
EOF
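
A common slip is to start Alertmanager with the placeholder still in the webhook URL. The sketch below shows one way to catch that; the helper name has_token_placeholder is hypothetical, and the demonstration runs on a temporary file rather than the real configuration.

```shell
# Hypothetical helper: succeeds while the file still contains the literal
# <YOUR_TOKEN> placeholder.
has_token_placeholder() {
  grep -q '<YOUR_TOKEN>' "$1"
}

# Usage on the real file:
#   has_token_placeholder /etc/alertmanager/alertmanager.yml && echo "token not set yet"

# Offline demonstration on a temporary copy of the webhook line
tmp=$(mktemp)
echo "      - url: 'https://telepush.dev/api/inlets/alertmanager/<YOUR_TOKEN>'" > "$tmp"
has_token_placeholder "$tmp" && echo "placeholder still present"
rm -f "$tmp"
```

The amtool binary copied earlier also offers amtool check-config /etc/alertmanager/alertmanager.yml for a syntax check of the file.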

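Create Alertmanager service

The heredoc below expands $IP_ADDRESS from your current shell, so set that variable to the server's public IP address before running it; otherwise the external URL is written as http://:9093.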

sudo tee /etc/systemd/system/alertmanager.service > /dev/null <<EOF
[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml --storage.path /var/lib/prometheus/alertmanager --web.external-url=http://$IP_ADDRESS:9093 --cluster.advertise-address='0.0.0.0:9093'
[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
sudo systemctl restart prometheus.service
sudo systemctl restart alertmanager.service
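
To confirm that notifications reach Telegram without waiting for a real incident, a test alert can be posted to Alertmanager's v2 API. The sketch below only builds the JSON payload; the helper name test_alert_payload and the label values are illustrative assumptions, and the values must not contain double quotes because no JSON escaping is performed.

```shell
# Hypothetical helper: prints a one-alert JSON array for POST /api/v2/alerts.
# Arguments: alertname, severity, instance, summary.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"%s"}}]\n' \
    "$1" "$2" "$3" "$4"
}

# Print the payload for inspection
test_alert_payload TestAlert warning polkadot-1 "Manual test alert"

# To actually send it (requires Alertmanager listening on localhost:9093):
#   test_alert_payload TestAlert warning polkadot-1 "Manual test alert" \
#     | curl -s -X POST -H 'Content-Type: application/json' \
#         --data @- http://localhost:9093/api/v2/alerts
```

With the route settings above, the message should arrive after the 30s group_wait.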

Grafana

Install Required Dependencies

sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -

Add the Grafana Repository

echo "deb https://packages.grafana.com/enterprise/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list

Create a User for Grafana

sudo groupadd --system grafana
sudo useradd -m -s /bin/bash -g grafana grafana

Install Grafana Enterprise

#Install Additional Utilities
sudo apt-get install -y adduser libfontconfig1
#Download and Install Grafana Enterprise
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_9.3.2_amd64.deb
sudo dpkg -i grafana-enterprise_9.3.2_amd64.deb

Start and Enable the Grafana Server

sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server

Importing Dashboards into Grafana

To set up dashboards in Grafana, you need the JSON files of the dashboards. These files can either be:

  • Downloaded from a public source like Grafana's Dashboard Library

  • Created manually by you directly in Grafana

If you're using Grafana's library, search for the dashboard by its ID and download the JSON file. Once downloaded, you can import the JSON file into Grafana via:

Grafana UI → Dashboards → Import → Upload JSON file

Final Checks

  • Access Prometheus: http://<your-server-ip>:9090

  • Access Alertmanager: http://<your-server-ip>:9093

  • Access Grafana: http://<your-server-ip>:3000

  • Verify that your Polkadot node metrics (:9615) and alerts are visible
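
The checks above can also be scripted. Prometheus and Alertmanager expose /-/healthy, and Grafana exposes /api/health. The sketch below lists those endpoints for a given host; the helper name health_urls is hypothetical, and the ports are the ones used throughout this guide.

```shell
# Hypothetical helper: prints the health endpoint of each service for a host.
health_urls() {
  host=${1:-localhost}
  echo "http://$host:9090/-/healthy"    # Prometheus
  echo "http://$host:9093/-/healthy"    # Alertmanager
  echo "http://$host:3000/api/health"   # Grafana
}

health_urls localhost

# To probe them (requires the services to be running):
#   for url in $(health_urls localhost); do
#     curl -fsS -o /dev/null "$url" && echo "OK   $url" || echo "FAIL $url"
#   done
```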

✅ You can also download our predefined Polkadot node dashboard here: 👉 Polkadot_Dashboard.json