Pacemaker and Corosync Cluster

See:

pcs (Pacemaker Configuration System):

Installation and configuration

On all nodes

/etc/hosts

192.168.56.21  node2
192.168.56.22  node1
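
The same two entries have to exist on every node. A minimal sketch of adding them idempotently, assuming a hypothetical helper named add_host_entry (not part of any package) that takes the file as an argument so it can be tried on a copy first:

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Hypothetical helper: appends "IP  NAME" to FILE unless NAME already
# appears as a hostname on a non-comment line.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Skip comment lines, then look for NAME in any field after the address.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        return 0    # already declared, nothing to do
    fi
    printf '%s  %s\n' "$ip" "$name" >> "$file"
}
```

For example, as root on each node: add_host_entry /etc/hosts 192.168.56.21 node2, then add_host_entry /etc/hosts 192.168.56.22 node1. Running it a second time leaves the file unchanged.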

yum install -y corosync pcs #pacemaker

pcs (Pacemaker Configuration System)

yum install corosync pcs #pacemaker
 
 
 
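# Set the hacluster password (interactive alternative: passwd hacluster)
# --stdin is specific to the RHEL/CentOS passwd; replace "passwd" with a real password
echo "passwd" | passwd hacluster --stdin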
 
systemctl start pcsd.service
systemctl enable pcsd.service
 
#unset http_proxy
export NO_PROXY=localhost,127.0.0.1,node1,node2
pcs cluster auth node1 node2
 
pcs cluster setup --start --name my_cluster node1 node2
#pcs cluster setup --name my_cluster node1 node2
 
pcs cluster start node1
#pcs cluster auth
 
pcs cluster start --all
pcs cluster enable --all
 
 
pcs resource create myvip IPaddr2 ip=192.168.56.100
pcs status resources
 
pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
 
# http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_prevent_resources_from_moving_after_recovery.html
pcs resource defaults resource-stickiness=100
 
# pcs resource
# pcs resource defaults

pcs status
pcs resource --full
pcs resource

/etc/corosync/corosync.conf

totem {
    version: 2
    secauth: off
    cluster_name: my_cluster
    transport: udpu
}
 
nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
 
    node {
        ring0_addr: node2
        nodeid: 2
    }
}
 
quorum {
    provider: corosync_votequorum
    two_node: 1
}
 
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
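
The layout above can be reproduced for other node names. A minimal sketch, assuming a hypothetical function gen_corosync_conf that prints the file to standard output; it only covers the directives shown above and adds two_node only when exactly two nodes are given:

```shell
#!/bin/sh
# gen_corosync_conf CLUSTER_NAME NODE...
# Hypothetical helper: prints a corosync.conf with the same directives as
# the example above (udpu transport, secauth off, votequorum).
gen_corosync_conf() {
    cluster=$1
    shift
    printf 'totem {\n    version: 2\n    secauth: off\n'
    printf '    cluster_name: %s\n    transport: udpu\n}\n\n' "$cluster"
    printf 'nodelist {\n'
    id=1
    for node in "$@"; do
        printf '    node {\n        ring0_addr: %s\n        nodeid: %d\n    }\n' "$node" "$id"
        id=$((id + 1))
    done
    printf '}\n\n'
    printf 'quorum {\n    provider: corosync_votequorum\n'
    # two_node is only meaningful for a cluster of exactly two nodes
    if [ "$#" -eq 2 ]; then
        printf '    two_node: 1\n'
    fi
    printf '}\n\n'
    printf 'logging {\n    to_logfile: yes\n'
    printf '    logfile: /var/log/cluster/corosync.log\n    to_syslog: yes\n}\n'
}
```

Usage: gen_corosync_conf my_cluster node1 node2 > /tmp/corosync.conf, then compare the result with the file that pcs wrote before replacing anything.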

Another example configuration file

/etc/corosync/corosync.conf

totem {
	version: 2
	secauth: off
	cluster_name: mycluster-1
	transport: udpu
	token: 5000
}
 
nodelist {
  node {
        ring0_addr: mycluster-a1
        nodeid: 1
       }
  node {
        ring0_addr: mycluster-b1
        nodeid: 2
       }
}
 
quorum {
	provider: corosync_votequorum
	auto_tie_breaker: 0
	two_node: 1
}
 
logging {
	to_syslog: yes
}

Security

On node 1

corosync-keygen -l

Copy the key to node 2

scp /etc/corosync/authkey node2:/etc/corosync/authkey

sed -i -e 's/secauth: off/secauth: on/' /etc/corosync/corosync.conf
pcs cluster sync
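
The sed call above only matches the exact string "secauth: off". A minimal sketch of a more tolerant variant, assuming a hypothetical helper enable_secauth that keeps a backup copy and reports whether the directive really ended up enabled:

```shell
#!/bin/sh
# enable_secauth FILE
# Hypothetical helper: switches "secauth: off" to "secauth: on" in FILE,
# whatever the indentation, and keeps the previous version as FILE.bak.
# Returns 0 only if the resulting file contains "secauth: on".
enable_secauth() {
    conf=$1
    cp -p "$conf" "$conf.bak" || return 1
    # Keep the original indentation and spacing (captured in \1)
    sed 's/^\([[:space:]]*secauth:[[:space:]]*\)off[[:space:]]*$/\1on/' \
        "$conf.bak" > "$conf" || return 1
    grep -Eq '^[[:space:]]*secauth:[[:space:]]*on$' "$conf"
}
```

Usage: enable_secauth /etc/corosync/corosync.conf && pcs cluster sync. The backup in /etc/corosync/corosync.conf.bak allows a quick rollback if the cluster does not come back up.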

Failover

pcs status resources

 myvip  (ocf::heartbeat:IPaddr2):       Started node1

pcs cluster standby node1

pcs status resources

 myvip  (ocf::heartbeat:IPaddr2):       Started node2
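
To check the failover from a script rather than by eye, the output above can be parsed. A minimal sketch, assuming a hypothetical filter resource_node and the output format shown above (name, agent, state, node); other pcs versions may print the resources differently:

```shell
#!/bin/sh
# resource_node NAME
# Hypothetical filter: reads "pcs status resources" output on stdin and
# prints the node on which resource NAME is in the Started state.
# Prints nothing if the resource is stopped, failed or unknown.
resource_node() {
    awk -v r="$1" '$1 == r && $(NF - 1) == "Started" { print $NF }'
}
```

Usage: pcs status resources | resource_node myvip should print node2 while node1 is in standby.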

Reactivating node1

pcs cluster unstandby node1

Moving the myvip VIP back to node1

pcs resource move myvip node1

Resetting the counter

pcs resource relocate clear

Checking the configuration

crm_verify -LV
 
crm_verify -LVVV

Miscellaneous

Logs

tail -f /var/log/cluster/corosync.log

Web interface

Account: hacluster

Problems

Connection to cluster failed: Transport endpoint is not connected

Not needed / no problem when using pcs

crm_mon -1

Connection to cluster failed: Transport endpoint is not connected

systemctl restart pacemaker

crm_mon -1

Last updated: Fri Oct  7 11:59:55 2016          Last change: Fri Oct  7 11:03:51 2016
Stack: unknown
Current DC: NONE
0 nodes and 0 resources configured

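To avoid having to start pacemaker manually after corosync

Note: this service stanza is only read by the Corosync 1.x plugin. There, ver: 0 makes corosync launch pacemaker itself, while ver: 1 still requires starting pacemaker separately. With Corosync 2.x (the votequorum configurations above), enable the service instead with systemctl enable pacemaker, or use pcs cluster enable --all.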

/etc/corosync/corosync.conf

service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  1
}

Clearing the "Failed Actions"

pcs status
Cluster name: my_cluster
Last updated: Fri Oct 14 16:40:14 2016          Last change: Fri Oct 14 16:36:04 2016 by root via cibadmin on node1
Stack: corosync
Current DC: node2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Node node1: UNCLEAN (online)
Online: [ node2 ]

Full list of resources:

 myvip  (ocf::heartbeat:IPaddr2):       Started node1
 plopa  (stonith:fence_ssh):    Started node2
 clusterfs      (ocf::heartbeat:Filesystem):     ORPHANED FAILED node1

Failed Actions:
* clusterfs_stop_0 on node1 'unknown error' (1): call=38, status=Timed Out, exitreason='none',
    last-rc-change='Fri Oct 14 14:43:54 2016', queued=0ms, exec=60003ms


PCSD Status:
  node1: Online
  node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: deactivating/enabled
  pcsd: active/enabled

Source: https://supportex.net/blog/2011/09/cleaning-failed-actions-pacemakercorosync-cluster-setup/

Solution

#crm_resource -P
pcs resource cleanup
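
Before cleaning up, it can help to list exactly which operations failed and on which node. A minimal sketch, assuming a hypothetical filter failed_actions and the "Failed Actions:" layout of the pcs status output shown above:

```shell
#!/bin/sh
# failed_actions
# Hypothetical filter: reads "pcs status" output on stdin and prints one
# "OPERATION_KEY NODE" pair per entry of the "Failed Actions:" section.
failed_actions() {
    awk '
        /^Failed Actions:/     { in_section = 1; next }
        in_section && NF == 0  { in_section = 0 }     # blank line ends the section
        in_section && $1 == "*" && $3 == "on" { print $2, $4 }
    '
}
```

Usage: pcs status | failed_actions. With the status above it prints clusterfs_stop_0 node1, after which the cleanup can be limited to that resource with pcs resource cleanup clusterfs.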

Problem: node2: Unable to connect to node2 ([Errno 111] Connection refused)

journalctl -u pacemaker.service -f -p warning
-- Logs begin at ven. 2017-06-09 18:23:40 CEST. --
juin 12 11:28:23 pc-2 attrd[170467]:    error: Cluster connection failed
juin 12 11:28:23 pc-2 pacemakerd[170463]:    error: Managed process 170467 (attrd) dumped core
juin 12 11:28:23 pc-2 pacemakerd[170463]:    error: The attrd process (170467) terminated with signal 11 (core=1)
juin 12 11:28:23 pc-2 cib[170464]:    error: Could not connect to the Cluster Process Group API: 11
juin 12 11:28:23 pc-2 cib[170464]:     crit: Cannot sign in to the cluster... terminating
juin 12 11:28:23 pc-2 pacemakerd[170463]:  warning: The cib process (170464) can no longer be respawned, shutting the cluster down.
juin 12 11:28:23 pc-2 crmd[170469]:  warning: Couldn't complete CIB registration 1 times... pause and retry
juin 12 11:28:23 pc-2 crmd[170469]:  warning: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
juin 12 11:28:23 pc-2 pacemakerd[170463]:    error: The attrd process (170470) terminated with signal 15 (core=0)
juin 12 11:28:33 pc-2 stonith-ng[170465]:    error: Could not connect to the CIB service: Transport endpoint is not connected (-107)

On both nodes
pcs cluster stop --all
node2: Unable to connect to node2 ([Errno 111] Connection refused)

lsof -Pn -i TCP:2224

pacemakerd[170758]:    error: The attrd process (170765) terminated with signal 15 (core=0)

systemctl status corosync
juin 12 12:39:02 pc-1 corosync[116364]:  [MAIN  ] Denied connection attempt from 189:189
juin 12 12:39:02 pc-1 corosync[116364]:  [QB    ] Invalid IPC credentials (116364-116387-2).
juin 12 12:39:02 pc-1 corosync[116364]:  [MAIN  ] Denied connection attempt from 189:189
juin 12 12:39:02 pc-1 corosync[116364]:  [QB    ] Invalid IPC credentials (116364-116384-2).


ps -C cib,stonithd,pcsd,lrmd,attrd,pengine,crmd
# Only pcsd

ss -lx |grep @cib_
# No unix socket open
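
The "Denied connection attempt from 189:189" lines give the uid:gid of the process that corosync rejected, which is a useful lead. A minimal sketch, assuming a hypothetical filter denied_ids that extracts the distinct pairs from the log; which account they map to is an assumption to verify on the node itself:

```shell
#!/bin/sh
# denied_ids
# Hypothetical filter: reads corosync log lines on stdin and prints each
# distinct "uid:gid" pair found in "Denied connection attempt" messages.
denied_ids() {
    sed -n 's/.*Denied connection attempt from \([0-9][0-9]*\):\([0-9][0-9]*\).*/\1:\2/p' |
        sort -u
}
```

Usage: journalctl -u corosync | denied_ids, then getent passwd 189 and getent group 189 to see which user and group the ids belong to. If they turn out to be the cluster account and group, the daemons started by pacemaker are the ones being refused on the corosync IPC socket.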