Proxmox - Multiple Cluster Networks
Warning
Ensure that all nodes are in a healthy state (pvecm status gives a quick overview of quorum and membership). Strange things happen when there are broken nodes.
Log into any single node. We only make file changes on one node; the file we will work on is synced across all nodes.
Back up the config file
Make a backup of the config file, e.g. cp /etc/pve/corosync.conf ~/corosync.conf.bkp (the roll-back step below restores from this path).
Modify config file
Edit the /etc/pve/corosync.conf file
The contents should look something like:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: jnb1srvdscocsnecdx2prx04
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 10.10.2.2
  }
  .
  .
  .
  node {
    name: jnb1srvdscocsnecdx2prx05
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 10.10.2.5
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: oxideproxint
  config_version: 8
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
ring0_addr
- ring0 represents a dedicated cluster comms network. We will add ring1 and ring2 for internal comms on the iSCSI A and iSCSI B networks respectively.
- Add the ring1_addr & ring2_addr IP address entries, respective to each node's iSCSI A and iSCSI B IP addresses. For example, ring1_addr on jnb1srvdscocsnecdx2prx04 will get the address that was configured for jnb1srvdscocsnecdx2prx04's iSCSI A network: 10.30.1.2.
- We will need to inform the cluster that the new ring interfaces are available by adding interface entries at the end of the file under totem, and increment config_version (8 becomes 9 below) so corosync picks up the change.
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: jnb1srvdscocsnecdx2prx04
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 10.10.2.2
    ring1_addr: 10.30.1.2
    ring2_addr: 10.30.2.2
  }
  .
  .
  .
  node {
    name: jnb1srvdscocsnecdx2prx05
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 10.10.2.5
    ring1_addr: 10.30.1.5
    ring2_addr: 10.30.2.5
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: oxideproxint
  config_version: 9
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  interface {
    linknumber: 2
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
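Corosync only applies a changed file when config_version is higher than the version it is currently running, which is why the example above bumps it from 8 to 9. A minimal sketch of scripting that bump, demonstrated on a scratch sample file (on a node you would run it against a copy of /etc/pve/corosync.conf):

```shell
# Create a minimal sample so this runs anywhere; on a real node this would
# be a copy of /etc/pve/corosync.conf.
cat > /tmp/corosync.sample <<'EOF'
totem {
  cluster_name: oxideproxint
  config_version: 8
}
EOF

# awk finds the config_version line and rewrites the trailing number as value + 1.
awk '/config_version:/ { sub(/[0-9]+$/, $2 + 1) } { print }' \
    /tmp/corosync.sample > /tmp/corosync.bumped

grep config_version /tmp/corosync.bumped   # now reads: config_version: 9
```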
Restart corosync services
You can now restart the corosync service on THIS node, to confirm it restarts successfully: systemctl restart corosync
Roll-back
If the service doesn't start, run cp ~/corosync.conf.bkp /etc/pve/corosync.conf && systemctl restart corosync to restore the backup.
Restart corosync on all nodes
Warning
One-at-a-time: the service restarts must be run on each node sequentially, not in parallel.
SSH into each node and restart the corosync services, one after the other.
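A sketch of that sequential rollout as a small shell function (the node names are placeholders, and root SSH access to each node is assumed). The systemctl is-active check gates each iteration, so a node whose corosync fails to come back stops the rollout before the next node is touched:

```shell
# Restart corosync on each node strictly one at a time.
restart_corosync_everywhere() {
  for node in "$@"; do
    echo "restarting corosync on $node"
    ssh "root@$node" 'systemctl restart corosync' || return 1
    # Do not move on until this node's corosync reports active again.
    ssh "root@$node" 'systemctl is-active --quiet corosync' || return 1
  done
}

# Usage (placeholder node names):
# restart_corosync_everywhere jnb1srvdscocsnecdx2prx04 jnb1srvdscocsnecdx2prx05
```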
Confirm networks
You can view the new networks in the GUI.
Go to Datacenter -> Cluster -> Cluster Nodes
You should now see 3 'links'. These are the three cluster networks.
Show all network interface statuses
Command: corosync-cfgtool -s (prints per-link, per-node connection status)
Example response (vmbr5 on nodeid 1 intentionally downed):
Local node ID 2, transport knet
LINK ID 0 udp
  addr = 10.10.2.12
  status:
    nodeid: 1: disconnected
    nodeid: 2: localhost
    nodeid: 6: connected
    nodeid: 7: connected
    nodeid: 8: connected
    nodeid: 9: connected
    nodeid: 10: connected
    nodeid: 11: connected
    nodeid: 12: connected
    nodeid: 13: connected
LINK ID 1 udp
  addr = 10.13.1.12
  status:
    nodeid: 1: connected
    nodeid: 2: localhost
    nodeid: 6: connected
    nodeid: 7: connected
    nodeid: 8: connected
    nodeid: 9: connected
    nodeid: 10: connected
    nodeid: 11: connected
    nodeid: 12: connected
    nodeid: 13: connected
LINK ID 2 udp
  addr = 10.14.1.12
  status:
    nodeid: 1: connected
    nodeid: 2: localhost
    nodeid: 6: connected
    nodeid: 7: connected
    nodeid: 8: connected
    nodeid: 9: connected
    nodeid: 10: connected
    nodeid: 11: connected
    nodeid: 12: connected
    nodeid: 13: connected
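Scanning that output by eye gets tedious on a ten-node cluster. The sketch below filters link-status output of this shape down to only the disconnected link/node pairs; it is demonstrated on a saved sample here, but on a node you could feed the live status output straight into the awk:

```shell
# Saved sample of link-status output (abridged) so the sketch runs anywhere;
# on a node you would feed in the live command output instead.
cat > /tmp/cfgtool.sample <<'EOF'
LINK ID 0 udp
  addr = 10.10.2.12
  status:
    nodeid: 1: disconnected
    nodeid: 2: localhost
    nodeid: 6: connected
EOF

# Remember the current link number; report any nodeid marked disconnected.
awk '/LINK ID/ { link = $3 }
     /disconnected/ { sub(":", "", $2); print "link " link " node " $2 " disconnected" }' \
    /tmp/cfgtool.sample > /tmp/cfgtool.down

cat /tmp/cfgtool.down   # link 0 node 1 disconnected
```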