{{tag>Brouillon Cluster}}
= Pacemaker and Corosync cluster
See:
* [[cluster_script_de_test_stonith_fencing]]
* http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-intro-architecture.html
* https://wiki.debian.org/Debian-HA/ClustersFromScratch
* http://blogduyax.madyanne.fr/haute-disponibilite-avec-corosync-et-pacemaker.html
* https://www.sebastien-han.fr/blog/2011/07/04/introduction-au-cluster-sous-linux/
* https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/ch-startup-HAAA.html
* resource-agents package
pcs (Pacemaker Configuration System):
* https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html
* https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/pdf/High_Availability_Add-On_Administration/Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Administration-en-US.pdf
* https://www.digitalocean.com/community/tutorials/how-to-create-a-high-availability-setup-with-pacemaker-corosync-and-floating-ips-on-centos-7
* https://skcave.wordpress.com/2014/11/04/creating-high-availability-cluster-with-centos-7/
* https://www.digitalocean.com/community/tutorials/how-to-set-up-an-apache-active-passive-cluster-using-pacemaker-on-centos-7
* http://bigthinkingapplied.com/creating-a-linux-cluster-in-red-hatcentos-7/
== Installation and configuration
=== On all nodes
''/etc/hosts''
192.168.56.21 node2
192.168.56.22 node1
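A quick sanity check that both names resolve and that the nodes can reach each other (assuming the entries above):
ping -c 1 node1
ping -c 1 node2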
yum install -y corosync pcs #pacemaker
== pcs (Pacemaker Configuration System)
yum install corosync pcs #pacemaker
Set the same password for the **hacluster** user on every node:
#passwd hacluster
echo "passwd" | passwd hacluster --stdin
systemctl start pcsd.service
systemctl enable pcsd.service
If a proxy is configured in the environment, make sure pcs can reach pcsd on the nodes directly:
#unset http_proxy
export NO_PROXY=localhost,127.0.0.1,node1,node2
pcs cluster auth node1 node2
pcs cluster setup --start --name my_cluster node1 node2
#pcs cluster setup --name my_cluster node1 node2
pcs cluster start node1
#pcs cluster auth
pcs cluster start --all
pcs cluster enable --all
pcs resource create myvip IPaddr2 ip=192.168.56.100
pcs status resources
pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
# http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_prevent_resources_from_moving_after_recovery.html
pcs resource defaults resource-stickiness=100
# pcs resource
# pcs resource defaults
pcs status
pcs resource --full
pcs resource
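To confirm the VIP is actually plumbed on the node currently hosting the resource (a quick check, assuming the 192.168.56.100 address created above):
ip -4 addr show | grep 192.168.56.100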
''/etc/corosync/corosync.conf''
totem {
    version: 2
    secauth: off
    cluster_name: my_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
Another example configuration file:
''/etc/corosync/corosync.conf''
totem {
    version: 2
    secauth: off
    cluster_name: mycluster-1
    transport: udpu
    token: 5000
}

nodelist {
    node {
        ring0_addr: mycluster-a1
        nodeid: 1
    }
    node {
        ring0_addr: mycluster-b1
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 0
    two_node: 1
}

logging {
    to_syslog: yes
}
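Once corosync is running with one of these configurations, ring status and cluster membership can be checked with the standard corosync tools:
corosync-cfgtool -s
corosync-cmapctl | grep members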
=== Security
==== On node 1
corosync-keygen -l
Copy the key to node 2:
scp /etc/corosync/authkey node2:/etc/corosync/authkey
sed -i -e 's/secauth: off/secauth: on/' /etc/corosync/corosync.conf
pcs cluster sync
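For the new secauth setting to take effect, corosync must be restarted on all nodes, for example (assuming a short outage is acceptable):
pcs cluster stop --all
pcs cluster start --all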
=== Failover
pcs status resources
myvip (ocf::heartbeat:IPaddr2): Started node1
pcs cluster standby node1
pcs status resources
myvip (ocf::heartbeat:IPaddr2): Started node2
Bring node1 back online:
pcs cluster unstandby node1
Move the VIP **myvip** back to node1:
pcs resource move myvip node1
Reset the counters:
pcs resource relocate clear
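Note that **pcs resource move** works by adding a location constraint pinning the resource; once the resource is back where it belongs, that constraint can be removed (syntax of recent pcs versions, to be checked against the installed one):
pcs resource clear myvip
pcs constraint list --full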
=== Configuration check
crm_verify -LV
crm_verify -LVVV
== Miscellaneous
Logs
tailf /var/log/cluster/corosync.log
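On systemd-based nodes the same information is also available from the journal:
journalctl -u corosync -u pacemaker -f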
Web interface
* https://192.168.56.21:2224
* https://192.168.56.22:2224
Log in with the **hacluster** account.
== Troubleshooting
=== Connection to cluster failed: Transport endpoint is not connected
**Not necessary / not an issue when using pcs**
crm_mon -1
Connection to cluster failed: Transport endpoint is not connected
systemctl restart pacemaker
crm_mon -1
Last updated: Fri Oct 7 11:59:55 2016 Last change: Fri Oct 7 11:03:51 2016
Stack: unknown
Current DC: NONE
0 nodes and 0 resources configured
To avoid having to start **pacemaker** manually after **corosync**, add the following to the configuration:
''/etc/corosync/corosync.conf''
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}
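Note: this service block comes from the Corosync 1.x plugin mechanism; on CentOS 7 with Corosync 2.x the simpler option is to enable the pacemaker systemd unit so it starts with the node:
systemctl enable pacemaker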
=== Clearing the "Failed Actions"
pcs status
Cluster name: my_cluster
Last updated: Fri Oct 14 16:40:14 2016 Last change: Fri Oct 14 16:36:04 2016 by root via cibadmin on node1
Stack: corosync
Current DC: node2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Node node1: UNCLEAN (online)
Online: [ node2 ]
Full list of resources:
myvip (ocf::heartbeat:IPaddr2): Started node1
plopa (stonith:fence_ssh): Started node2
clusterfs (ocf::heartbeat:Filesystem): ORPHANED FAILED node1
Failed Actions:
* clusterfs_stop_0 on node1 'unknown error' (1): call=38, status=Timed Out, exitreason='none',
last-rc-change='Fri Oct 14 14:43:54 2016', queued=0ms, exec=60003ms
PCSD Status:
node1: Online
node2: Online
Daemon Status:
corosync: active/enabled
pacemaker: deactivating/enabled
pcsd: active/enabled
Source: https://supportex.net/blog/2011/09/cleaning-failed-actions-pacemakercorosync-cluster-setup/
Solution:
#crm_resource -P
pcs resource cleanup
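Cleanup can also be limited to a single resource, for instance the failed clusterfs resource from the status above:
pcs resource cleanup clusterfs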
=== Problem: node2: Unable to connect to node2 ([Errno 111] Connection refused)
journalctl -u pacemaker.service -f -p warning
-- Logs begin at ven. 2017-06-09 18:23:40 CEST. --
juin 12 11:28:23 pc-2 attrd[170467]: error: Cluster connection failed
juin 12 11:28:23 pc-2 pacemakerd[170463]: error: Managed process 170467 (attrd) dumped core
juin 12 11:28:23 pc-2 pacemakerd[170463]: error: The attrd process (170467) terminated with signal 11 (core=1)
juin 12 11:28:23 pc-2 cib[170464]: error: Could not connect to the Cluster Process Group API: 11
juin 12 11:28:23 pc-2 cib[170464]: crit: Cannot sign in to the cluster... terminating
juin 12 11:28:23 pc-2 pacemakerd[170463]: warning: The cib process (170464) can no longer be respawned, shutting the cluster down.
juin 12 11:28:23 pc-2 crmd[170469]: warning: Couldn't complete CIB registration 1 times... pause and retry
juin 12 11:28:23 pc-2 crmd[170469]: warning: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
juin 12 11:28:23 pc-2 pacemakerd[170463]: error: The attrd process (170470) terminated with signal 15 (core=0)
juin 12 11:28:33 pc-2 stonith-ng[170465]: error: Could not connect to the CIB service: Transport endpoint is not connected (-107)
On both nodes:
pcs cluster stop --all
node2: Unable to connect to node2 ([Errno 111] Connection refused)
lsof -Pn -i TCP:2224
pacemakerd[170758]: error: The attrd process (170765) terminated with signal 15 (core=0)
systemctl status corosync
juin 12 12:39:02 pc-1 corosync[116364]: [MAIN ] Denied connection attempt from 189:189
juin 12 12:39:02 pc-1 corosync[116364]: [QB ] Invalid IPC credentials (116364-116387-2).
juin 12 12:39:02 pc-1 corosync[116364]: [MAIN ] Denied connection attempt from 189:189
juin 12 12:39:02 pc-1 corosync[116364]: [QB ] Invalid IPC credentials (116364-116384-2).
ps -C cib,stonithd,pcsd,lrmd,attrd,pengine,crmd
# Only pcsd
ss -lx |grep @cib_
# No unix socket open
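No definitive fix is recorded here; a reasonable next step (a sketch, not a verified solution for this case) is to check that corosync itself is healthy and that corosync.conf and authkey are identical on both nodes before restarting the Pacemaker daemons:
corosync-cfgtool -s
md5sum /etc/corosync/corosync.conf /etc/corosync/authkey
systemctl restart pacemaker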