{{tag>Brouillon Cluster}}

= Pacemaker and Corosync Cluster

See:
  * [[cluster_script_de_test_stonith_fencing]]
  * http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-intro-architecture.html
  * https://wiki.debian.org/Debian-HA/ClustersFromScratch
  * http://blogduyax.madyanne.fr/haute-disponibilite-avec-corosync-et-pacemaker.html
  * https://www.sebastien-han.fr/blog/2011/07/04/introduction-au-cluster-sous-linux/
  * https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/ch-startup-HAAA.html
  * The ''resource-agents'' package

Pcs (Pacemaker Configuration System):
  * https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html
  * https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/pdf/High_Availability_Add-On_Administration/Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Administration-en-US.pdf
  * https://www.digitalocean.com/community/tutorials/how-to-create-a-high-availability-setup-with-pacemaker-corosync-and-floating-ips-on-centos-7
  * https://skcave.wordpress.com/2014/11/04/creating-high-availability-cluster-with-centos-7/
  * https://www.digitalocean.com/community/tutorials/how-to-set-up-an-apache-active-passive-cluster-using-pacemaker-on-centos-7
  * http://bigthinkingapplied.com/creating-a-linux-cluster-in-red-hatcentos-7/

== Installation and configuration

=== On all nodes

''/etc/hosts''

  192.168.56.21 node2
  192.168.56.22 node1

  yum install -y corosync pcs #pacemaker

== PCS: Pacemaker Configuration System

  yum install corosync pcs #pacemaker
  #passwd hacluster
  echo "passwd" | passwd hacluster --stdin
  systemctl start pcsd.service
  systemctl enable pcsd.service
  #unset http_proxy
  export NO_PROXY=localhost,127.0.0.1,node1,node2
  pcs cluster auth node1 node2
  pcs cluster setup --start --name my_cluster node1 node2
  #pcs cluster setup --name my_cluster node1 node2
  pcs cluster start node1
  #pcs cluster auth
  pcs cluster start --all
  pcs cluster enable --all
  pcs resource create myvip IPaddr2 ip=192.168.56.100
  pcs status resources
  pcs property set stonith-enabled=false
  # pcs property set no-quorum-policy=ignore
  # http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_prevent_resources_from_moving_after_recovery.html
  pcs resource defaults resource-stickiness=100
  # pcs resource
  # pcs resource defaults
  pcs status
  pcs resource --full
  pcs resource

''/etc/corosync/corosync.conf''

  totem {
      version: 2
      secauth: off
      cluster_name: my_cluster
      transport: udpu
  }
  nodelist {
      node {
          ring0_addr: node1
          nodeid: 1
      }
      node {
          ring0_addr: node2
          nodeid: 2
      }
  }
  quorum {
      provider: corosync_votequorum
      two_node: 1
  }
  logging {
      to_logfile: yes
      logfile: /var/log/cluster/corosync.log
      to_syslog: yes
  }

Another example of ''/etc/corosync/corosync.conf''

  totem {
      version: 2
      secauth: off
      cluster_name: mycluster-1
      transport: udpu
      token: 5000
  }
  nodelist {
      node {
          ring0_addr: mycluster-a1
          nodeid: 1
      }
      node {
          ring0_addr: mycluster-b1
          nodeid: 2
      }
  }
  quorum {
      provider: corosync_votequorum
      auto_tie_breaker: 0
      two_node: 1
  }
  logging {
      to_syslog: yes
  }

=== Security

==== On node 1

  corosync-keygen -l

Copy the key to node 2, then enable authentication:

  scp /etc/corosync/authkey node2:/etc/corosync/authkey
  sed -i -e 's/secauth: off/secauth: on/' /etc/corosync/corosync.conf
  pcs cluster sync

=== Failover

  pcs status resources
   myvip  (ocf::heartbeat:IPaddr2):       Started node1
  pcs cluster standby node1
  pcs status resources
   myvip  (ocf::heartbeat:IPaddr2):       Started node2

Bring node1 back into service:

  pcs cluster unstandby node1
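To follow these failover tests without reading the whole ''pcs status'' output, a small helper can be run on either node. This is only a sketch: the script name is hypothetical, and it assumes the resource is called **myvip** and carries the address 192.168.56.100 as configured above.

  #!/bin/bash
  # where_is_vip.sh - hypothetical helper, not part of the pcs tooling
  VIP=192.168.56.100
  # Ask Pacemaker which node currently hosts the resource
  crm_resource --resource myvip --locate
  # Check whether the address is actually plumbed on this node
  if ip -4 addr show | grep -q "$VIP"; then
      echo "VIP $VIP is configured on this node"
  else
      echo "VIP $VIP is not on this node"
  fi
  # Check reachability from here
  ping -c 1 -W 1 "$VIP" >/dev/null && echo "VIP answers ping" || echo "VIP does not answer ping"

Running it before and after ''pcs cluster standby node1'' makes the move visible from the command line.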
Move the VIP **myvip** back to node1:

  pcs resource move myvip node1

Reset the counters:

  pcs resource relocate clear

=== Configuration check

  crm_verify -LV
  crm_verify -LVVV

== Miscellaneous

Logs:

  tailf /var/log/cluster/corosync.log

Web interface (log in with the **hacluster** account):
  * https://192.168.56.21:2224
  * https://192.168.56.22:2224

== Troubleshooting

=== Connection to cluster failed: Transport endpoint is not connected

**Not needed / no issue when pcs is used**

  crm_mon -1
  Connection to cluster failed: Transport endpoint is not connected

  systemctl restart pacemaker

  crm_mon -1
  Last updated: Fri Oct  7 11:59:55 2016          Last change: Fri Oct  7 11:03:51 2016
  Stack: unknown
  Current DC: NONE
  0 nodes and 0 resources configured

To avoid having to start **pacemaker** manually after **corosync**:

''/etc/corosync/corosync.conf''

  service {
      # Load the Pacemaker Cluster Resource Manager
      name: pacemaker
      ver: 1
  }

=== Clearing the "Failed Actions"

  pcs status
  Cluster name: my_cluster
  Last updated: Fri Oct 14 16:40:14 2016          Last change: Fri Oct 14 16:36:04 2016 by root via cibadmin on node1
  Stack: corosync
  Current DC: node2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
  2 nodes and 2 resources configured

  Node node1: UNCLEAN (online)
  Online: [ node2 ]

  Full list of resources:

   myvip      (ocf::heartbeat:IPaddr2):       Started node1
   plopa      (stonith:fence_ssh):            Started node2
   clusterfs  (ocf::heartbeat:Filesystem):    ORPHANED FAILED node1

  Failed Actions:
  * clusterfs_stop_0 on node1 'unknown error' (1): call=38, status=Timed Out, exitreason='none',
      last-rc-change='Fri Oct 14 14:43:54 2016', queued=0ms, exec=60003ms

  PCSD Status:
    node1: Online
    node2: Online

  Daemon Status:
    corosync: active/enabled
    pacemaker: deactivating/enabled
    pcsd: active/enabled

Source: https://supportex.net/blog/2011/09/cleaning-failed-actions-pacemakercorosync-cluster-setup/

Solution:

  #crm_resource -P
  pcs resource cleanup

=== Problem: node2: Unable to connect to node2 ([Errno 111] Connection refused)

  journalctl -u pacemaker.service -f -p warning
  -- Logs begin at ven. 2017-06-09 18:23:40 CEST. --
  juin 12 11:28:23 pc-2 attrd[170467]: error: Cluster connection failed
  juin 12 11:28:23 pc-2 pacemakerd[170463]: error: Managed process 170467 (attrd) dumped core
  juin 12 11:28:23 pc-2 pacemakerd[170463]: error: The attrd process (170467) terminated with signal 11 (core=1)
  juin 12 11:28:23 pc-2 cib[170464]: error: Could not connect to the Cluster Process Group API: 11
  juin 12 11:28:23 pc-2 cib[170464]: crit: Cannot sign in to the cluster... terminating
  juin 12 11:28:23 pc-2 pacemakerd[170463]: warning: The cib process (170464) can no longer be respawned, shutting the cluster down.
  juin 12 11:28:23 pc-2 crmd[170469]: warning: Couldn't complete CIB registration 1 times... pause and retry
  juin 12 11:28:23 pc-2 crmd[170469]: warning: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
  juin 12 11:28:23 pc-2 pacemakerd[170463]: error: The attrd process (170470) terminated with signal 15 (core=0)
  juin 12 11:28:33 pc-2 stonith-ng[170465]: error: Could not connect to the CIB service: Transport endpoint is not connected (-107)

On both nodes:

  pcs cluster stop --all
  node2: Unable to connect to node2 ([Errno 111] Connection refused)

  lsof -Pn -i TCP:2224

  pacemakerd[170758]: error: The attrd process (170765) terminated with signal 15 (core=0)
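When pcs reports ''Connection refused'' like this, it helps to compare the daemon state on both nodes at once. A minimal sketch, assuming root SSH access and the node names node1/node2 used on this page (the script name is hypothetical):

  #!/bin/bash
  # check_ha_daemons.sh - hypothetical helper, not a pcs command
  for n in node1 node2; do
      echo "== $n =="
      # State of the three services involved
      ssh root@"$n" systemctl is-active pcsd corosync pacemaker
      # Is pcsd actually listening on TCP 2224?
      ssh root@"$n" 'ss -tln | grep -w 2224 || echo "nothing listening on 2224"'
  done

On the failing node this shows straight away whether the problem is pcsd itself (port 2224 closed) or the corosync/pacemaker layer below it.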
Check corosync:

  systemctl status corosync
  juin 12 12:39:02 pc-1 corosync[116364]: [MAIN ] Denied connection attempt from 189:189
  juin 12 12:39:02 pc-1 corosync[116364]: [QB   ] Invalid IPC credentials (116364-116387-2).
  juin 12 12:39:02 pc-1 corosync[116364]: [MAIN ] Denied connection attempt from 189:189
  juin 12 12:39:02 pc-1 corosync[116364]: [QB   ] Invalid IPC credentials (116364-116384-2).

  ps -C cib,stonithd,pcsd,lrmd,attrd,pengine,crmd
  # Only pcsd is running

  ss -lx | grep @cib_
  # No unix socket open
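At this point the Pacemaker daemons are simply not running on the node. One possible recovery step, sketched for a single node, is to restart the stack in order and re-run the same checks; this does not by itself explain the ''Invalid IPC credentials'' messages above, it only brings the daemons back up once the cause has been dealt with:

  # Stop whatever is left, then bring corosync up before pacemaker
  systemctl stop pacemaker corosync
  systemctl start corosync
  systemctl status corosync
  systemctl start pacemaker
  ps -C cib,stonithd,lrmd,attrd,pengine,crmd   # the Pacemaker daemons should now be present
  crm_mon -1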