# 📘 Monitoring Polkadot Nodes with Prometheus and Alertmanager

### *Requirements*

* A Linux server (Ubuntu/Debian) with a running **Polkadot node**
* Open ports: `9100` (node\_exporter), `9090` (Prometheus), `9093` (Alertmanager), `9615` (Polkadot metrics endpoint), `3000` (Grafana)
* Root or sudo access
* A Telegram bot token from [telepush.dev](https://telepush.dev/)

***

## ***Automatic Installation***

```bash
source <(curl -s https://raw.githubusercontent.com/validexisinfra/polkadot/main/install-alertmanager.sh)
```

## ***Manual Installation***

### *Install Node Exporter*

Node Exporter collects server-level metrics such as CPU, memory, disk, and more.

```bash
cd $HOME
sudo wget $(curl -s https://api.github.com/repos/prometheus/node_exporter/releases/latest | grep "tag_name" | awk '{print "https://github.com/prometheus/node_exporter/releases/download/" substr($2, 2, length($2)-3) "/node_exporter-" substr($2, 3, length($2)-4) ".linux-amd64.tar.gz"}')
sudo tar xvf node_exporter-*.tar.gz
sudo cp ./node_exporter-*.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter
sudo rm -rf ./node_exporter*
```
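
The `awk` program in the download command turns the `tag_name` line of the GitHub API response into a release URL. To see what it produces without touching the network, it can be fed a sample line. The tag `v1.8.2` below is only an example value, and `node_exporter_url` is a hypothetical wrapper name.

```shell
# node_exporter_url: build the tarball URL from a "tag_name" line on stdin,
# using the same awk expression as the download command
node_exporter_url() {
  awk '{print "https://github.com/prometheus/node_exporter/releases/download/" substr($2, 2, length($2)-3) "/node_exporter-" substr($2, 3, length($2)-4) ".linux-amd64.tar.gz"}'
}

# Example input line (the version is illustrative)
echo '  "tag_name": "v1.8.2",' | node_exporter_url
```

The first `substr` keeps the tag including its leading `v` for the directory part, the second drops the `v` for the file name.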

Create the systemd service that runs as the `node_exporter` user created above:

```bash
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
  Description=Node Exporter
  Wants=network-online.target
  After=network-online.target
[Service] 
  User=node_exporter
  Group=node_exporter
  Type=simple
  ExecStart=/usr/local/bin/node_exporter
[Install]
  WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter.service
sudo systemctl start node_exporter.service

sudo systemctl status node_exporter.service
```
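
Once the service reports `active (running)`, one way to confirm that the exporter is actually serving data is to read a single sample from its metrics endpoint. The sketch below assumes the default listen address `localhost:9100`; `metric_value` is a hypothetical helper name, and the sample line is made up for the offline demonstration.

```shell
# metric_value NAME: print the value of the first un-labelled sample called NAME
# from Prometheus exposition text on stdin (helper name is illustrative)
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Against a live exporter (assumes the default port 9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1

# Offline demonstration with a made-up sample
printf '# TYPE node_load1 gauge\nnode_load1 0.42\n' | metric_value node_load1
```

An empty result from the live command would suggest that the exporter is not reachable or that the metric name differs on your system.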

### *Install Prometheus*

#### *Download and install*

```bash
curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \
| grep browser_download_url | grep linux-amd64.tar.gz \
| cut -d '"' -f 4 | wget -qi -

tar xvf prometheus-*.tar.gz
cd prometheus-*.linux-amd64

sudo cp prometheus promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus /var/lib/prometheus

if [ -d "consoles" ]; then
    sudo cp -r consoles /etc/prometheus/
fi

if [ -d "console_libraries" ]; then
    sudo cp -r console_libraries /etc/prometheus/
fi
```

Create a dedicated user and set ownership:

```bash
sudo id -u prometheus &>/dev/null || sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
```

#### *Prometheus Configuration*

```bash
sudo tee /etc/prometheus/prometheus.yml > /dev/null <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'polkadot_node'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9615']
EOF
```
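
The `polkadot_node` job only produces useful data if the node's endpoint on `localhost:9615` exposes the metric names that the alert rules in this guide query. A small sketch for checking that, assuming the endpoint is reachable from the server; `missing_metrics` is a hypothetical helper name and the sample line is made up.

```shell
# missing_metrics NAME...: read exposition text on stdin and print every NAME
# that has no sample in it (helper name is illustrative)
missing_metrics() {
  local scrape name
  scrape=$(cat)
  for name in "$@"; do
    printf '%s\n' "$scrape" | grep -Eq "^${name}(\{| )" || echo "$name"
  done
}

# Against a live node (assumes the metrics endpoint on localhost:9615):
#   curl -s http://localhost:9615/metrics | missing_metrics \
#     substrate_block_height substrate_sub_libp2p_sync_is_major_syncing process_cpu_seconds_total

# Offline demonstration with a made-up sample: only the second name is absent
printf 'substrate_block_height{status="best"} 100\n' |
  missing_metrics substrate_block_height process_cpu_seconds_total
```

Any name printed by the live command is one that an alert expression would never match, so the corresponding rule would stay silent.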

#### *Create Prometheus systemd service*

```bash
# Quote the heredoc delimiter so the shell writes $MAINPID literally instead of expanding it
sudo tee /etc/systemd/system/prometheus.service > /dev/null <<'EOF'
[Unit]
  Description=Prometheus Monitoring
  Wants=network-online.target
  After=network-online.target
[Service]
  User=prometheus
  Group=prometheus
  Type=simple
  ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time 30d \
  --web.enable-admin-api
  ExecReload=/bin/kill -HUP $MAINPID
[Install]
  WantedBy=multi-user.target
EOF
```
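
`$MAINPID` in `ExecReload` is a placeholder that systemd fills in, so it has to reach the unit file literally. Whether the shell expands variables inside a heredoc depends on whether its delimiter is quoted. The sketch below shows the difference on a throwaway string; it sets `MAINPID` to an empty value to mimic an interactive shell where the variable is not defined.

```shell
MAINPID=""   # stand-in for "not defined in the interactive shell"

# Unquoted delimiter: the shell expands $MAINPID before the text is written
unquoted=$(cat <<EOF
ExecReload=/bin/kill -HUP $MAINPID
EOF
)

# Quoted delimiter: the text is written exactly as typed
quoted=$(cat <<'EOF'
ExecReload=/bin/kill -HUP $MAINPID
EOF
)

echo "unquoted: [$unquoted]"
echo "quoted:   [$quoted]"
```

After writing the unit, `grep ExecReload /etc/systemd/system/prometheus.service` should show the literal text `$MAINPID`; if the line ends after `-HUP`, the variable was expanded by the shell.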

### *Create Alert Rules*

```bash
cd /etc/prometheus
sudo tee rules.yml > /dev/null <<EOF
groups:
  - name: alert_rules
    rules:
      - alert: PolkadotNodeSyncLag
        expr: (max(substrate_block_height{status="best"}) by (instance) - max(substrate_block_height{status="finalized"}) by (instance)) > 20
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Finality lagging on polkadot-1"
          description: "The best block on polkadot-1 has been more than 20 blocks ahead of the finalized block for 5 minutes."
      - alert: NodeDown
        expr: up{job="polkadot_node"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 down"
          description: "Node polkadot-1 has been down for more than 1 minute."
      - alert: HighDiskUsage
        expr: (node_filesystem_avail_bytes{job="node_exporter", fstype!="tmpfs", fstype!="sysfs", fstype!="proc"} / node_filesystem_size_bytes{job="node_exporter", fstype!="tmpfs", fstype!="sysfs", fstype!="proc"}) * 100 < 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High disk usage on polkadot-1"
          description: "Disk usage is above 98% on polkadot-1."
      - alert: PolkadotNodeNotSyncing
        expr: substrate_sub_libp2p_sync_is_major_syncing{job="polkadot_node"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node polkadot-1 still in major sync"
          description: "Node polkadot-1 has reported a major sync (catching up with the chain) for more than 5 minutes."
      - alert: PolkadotNodeHighCPUUsage
        expr: rate(process_cpu_seconds_total{job="polkadot_node"}[5m]) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on polkadot-1"
          description: "The Polkadot process on polkadot-1 has used more than 0.8 CPU cores (80% of one core) for more than 5 minutes."
EOF
```
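
To make the thresholds concrete, the arithmetic behind `PolkadotNodeSyncLag` and `HighDiskUsage` can be reproduced by hand. This is a sketch with hypothetical helper names and made-up numbers (including the `chain` label), not part of the installation.

```shell
# finality_lag: best minus finalized block height, read from exposition text on stdin
finality_lag() {
  awk '/^substrate_block_height\{/ {
         if ($0 ~ /status="best"/)      best = $NF
         if ($0 ~ /status="finalized"/) fin  = $NF
       }
       END { print best - fin }'
}

# avail_percent AVAIL_BYTES SIZE_BYTES: free space as an integer percentage
avail_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Made-up samples: a lag of 25 blocks exceeds the rule's threshold of 20
printf '%s\n' \
  'substrate_block_height{chain="polkadot",status="best"} 1000' \
  'substrate_block_height{chain="polkadot",status="finalized"} 975' |
  finality_lag

# 15 GB free on a 1000 GB disk is 1 %, below the rule's 2 % threshold
avail_percent 15000000000 1000000000000
```

In both cases the rule only fires once the condition has held for the full `for: 5m` window.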

#### ***Start and Enable the Prometheus Service***

```bash
sudo chown prometheus:prometheus /etc/prometheus/rules.yml

sudo systemctl daemon-reload
sudo systemctl enable prometheus.service
sudo systemctl start prometheus.service

sudo systemctl status prometheus.service
```
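
If the status check shows a failure, a syntax error in `prometheus.yml` or `rules.yml` is a common cause. Both files can be validated with `promtool`, which was copied to `/usr/local/bin/` earlier. A minimal sketch; the function name and the `PROMTOOL` override variable are illustrative and not part of Prometheus.

```shell
# check_prometheus_files [RULES] [CONFIG]: validate the rule file, then the main config.
# PROMTOOL can point at a different binary; it defaults to promtool on PATH.
check_prometheus_files() {
  local promtool="${PROMTOOL:-promtool}"
  "$promtool" check rules  "${1:-/etc/prometheus/rules.yml}" &&
  "$promtool" check config "${2:-/etc/prometheus/prometheus.yml}"
}

# Typical use after editing either file:
#   check_prometheus_files && sudo systemctl restart prometheus.service
```

Chaining the restart behind the check means a broken file leaves the running instance untouched.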

### *Install Alertmanager*

#### *Download and install*

```bash
cd ~
sudo wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
sudo tar xvf alertmanager-0.24.0.linux-amd64.tar.gz
sudo rm alertmanager-0.24.0.linux-amd64.tar.gz
sudo mkdir -p /etc/alertmanager /var/lib/prometheus/alertmanager
cd alertmanager-0.24.0.linux-amd64
sudo cp alertmanager amtool /usr/local/bin/
sudo cp alertmanager.yml /etc/alertmanager/alertmanager.yml
```

Create a dedicated user and set ownership:

```bash
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/prometheus/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/{alertmanager,amtool}
```

#### *Configuration*

Example configuration (replace `YOUR_TOKEN`):

```bash
sudo tee /etc/alertmanager/alertmanager.yml > /dev/null <<EOF
route:
  group_by: ['alertname', 'instance', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'telepush'
receivers:
  - name: 'telepush'
    webhook_configs:
      - url: 'https://telepush.dev/api/inlets/alertmanager/<YOUR_TOKEN>'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
EOF
```

#### *Create Alertmanager service*

```bash
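# $IP_ADDRESS is expanded by the shell when the unit file is written, so set it first
IP_ADDRESS="<your-server-ip>"
sudo tee /etc/systemd/system/alertmanager.service > /dev/null <<EOF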
[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml --storage.path /var/lib/prometheus/alertmanager --web.external-url=http://$IP_ADDRESS:9093 --cluster.advertise-address='0.0.0.0:9093'
[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
```

Restart both services so that they pick up the final configuration:

```bash
sudo systemctl restart prometheus.service
sudo systemctl restart alertmanager.service
```
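
To test the notification path end to end without waiting for a real incident, a hand-made alert can be posted to Alertmanager's v2 API. The sketch below only builds and prints the JSON payload; the helper name, the alert name `TestAlert`, and the `manual-test` instance label are made up for illustration.

```shell
# test_alert_payload NAME SEVERITY: print a one-alert JSON array for the Alertmanager v2 API
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"manual-test"},"annotations":{"summary":"Manual test alert"}}]\n' "$1" "$2"
}

# Show the payload
test_alert_payload TestAlert warning

# Send it to a local Alertmanager (assumes the default port 9093):
#   curl -s -X POST -H 'Content-Type: application/json' \
#     -d "$(test_alert_payload TestAlert warning)" http://localhost:9093/api/v2/alerts
```

With `group_wait: 30s` from the configuration above, the notification should reach the Telegram chat roughly half a minute after the request is accepted.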

## *Grafana*

#### ***Install Required Dependencies***

```bash
sudo apt-get install -y apt-transport-https software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
```

#### ***Add the Grafana Repository***

```bash
echo "deb https://packages.grafana.com/enterprise/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
```

#### ***Create a User for Grafana***

```bash
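# Create the group first: on Debian/Ubuntu useradd creates a same-named group by default,
# which would make a later groupadd fail
getent group grafana > /dev/null || sudo groupadd --system grafana
id -u grafana &>/dev/null || sudo useradd -m -s /bin/bash -g grafana grafana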
```

#### ***Install Grafana Enterprise***

```bash
# Install additional utilities
sudo apt-get install -y adduser libfontconfig1
```

```bash
# Download and install Grafana Enterprise
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_9.3.2_amd64.deb
sudo dpkg -i grafana-enterprise_9.3.2_amd64.deb
```

#### ***Start and Enable the Grafana Server***

```bash
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server
```

#### *Importing Dashboards into Grafana*

To set up dashboards in Grafana, you need the JSON files of the dashboards. These files can either be:

* Downloaded from a public source like Grafana's Dashboard Library
* Created manually in Grafana

If you're using Grafana's library, search for the dashboard by its ID and download the JSON file.\
Once downloaded, you can import the JSON file into Grafana via:

> **Grafana UI → Dashboards → Import → Upload JSON file**

✅ You can also download our predefined Polkadot node dashboard here:\
👉 [`Polkadot_Dashboard.json`](https://github.com/validexisinfra/Polkadot/blob/main/Polkadot_Dashboard.json)

### *Final Checks*

* Access Prometheus: `http://<your-server-ip>:9090`
* Access Alertmanager: `http://<your-server-ip>:9093`
* Access Grafana: `http://<your-server-ip>:3000`
* Verify that your Polkadot node metrics (`:9615`) and alerts are visible
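
The checks above can also be scripted. The following sketch lists one health or metrics URL per component and probes each with `curl`. The helper names are hypothetical, and the paths assume the default endpoints (`/-/healthy` for Prometheus and Alertmanager, `/api/health` for Grafana, `/metrics` for the two exporters).

```shell
# monitoring_urls HOST: print one health or metrics URL per component of this guide
monitoring_urls() {
  local host="$1"
  printf 'http://%s:%s\n' \
    "$host" "9100/metrics" \
    "$host" "9615/metrics" \
    "$host" "9090/-/healthy" \
    "$host" "9093/-/healthy" \
    "$host" "3000/api/health"
}

# check_monitoring HOST: report OK or FAIL per URL (needs curl and the services running)
check_monitoring() {
  local url
  for url in $(monitoring_urls "$1"); do
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}

# List the URLs that would be probed
monitoring_urls localhost

# Run the probes on the server itself:
#   check_monitoring localhost
```

Running `check_monitoring` with the public IP from another machine additionally tests that the firewall lets the ports through.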

