{{tag>Brouillon Supervision Nagios}}

# Nagios monitoring notes

See **Zabbix**

See:
* [[Exemple simple de conf Nagios]]
* [Stop using Nagios - Andy Sykes](https://www.youtube.com/watch?v=Q9BagdHGopg)
* https://guihot.fr/assets/doc/documentation_supervision.pdf
* https://www.ibisc.univ-evry.fr/~petit/Enseignement/AdminSystem/Administration-reseau-avancee/2010-2011-administration-reseau/nagios.pdf
* https://www.doc-developpement-durable.org/file/Projets-informatiques/cours-&-manuels-informatiques/Nagios/La%20supervision%20avec%20Nagios-Centreon.pdf
* https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/monitoring-publicservices.html
* https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/toc.html
* https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/objectdefinitions.html#service

See also:
* Icinga
* Shinken

## Administration

### Clearing the history of data reported by the Nagios probes

~~~bash
/etc/init.d/nagios stop
rm /usr/local/nagios/var/retention.dat
rm /usr/local/nagios/var/objects.cache
/etc/init.d/nagios start
~~~

Instead of systematically deleting these files before starting Nagios, the following setting can be changed:

''nagios.cfg''
~~~bash
#retain_state_information=1
retain_state_information=0
~~~

## Configuration

See also:
* https://community.icinga.com/t/help-please-adapting-disk-thresholds-per-host/6253
* [Add Host to HostGroup?
Or add HostGroup to Host?](https://support.nagios.com/forum/viewtopic.php?t=44093)
* https://wiki.monitoring-fr.org/nagios/nagios-debutant/templates-hostgroups-pivots.html
* https://wiki.monitoring-fr.org/icinga/start.html

### Configuration example

Example with [[Supervision - Sonde Nagios - Mémoire Linux|check_snmp_mem_cpu.sh]]

''/usr/local/nagios/etc/objects/servers.cfg''
~~~bash
define service {
    service_description     Memory
    hostgroup_name          WEB_APP1
    check_command           check_snmp_mem_cpu!mem!80!90
    max_check_attempts      1
    normal_check_interval   1
    retry_check_interval    1
    check_period            24x7
    notification_interval   2000
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          support
    #event_handler          trigger_memory
}
~~~

''/usr/local/nagios/etc/objects/commands.cfg''
~~~bash
define command {
    command_name    check_snmp_mem_cpu
    command_line    $USER1$/check_snmp_mem.sh -H $HOSTADDRESS$ -t $ARG1$ -w $ARG2$ -c $ARG3$
}
~~~

### Monitoring services with no real associated host

See:
* https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/monitoring-publicservices.html

See also:
* https://stackoverflow.com/questions/13434107/how-to-configure-non-host-service-in-nagios
* https://support.nagios.com/forum/viewtopic.php?t=49739

A service must be attached to a host before it can be used. In some cases a phantom host has to be created to carry the service.

Dummy

''commands.cfg''
~~~bash
# 'check_dummy' command definition
# NOTE: This command always returns an 'OK' result no matter what.
define command {
    command_name    check_dummy
    command_line    $USER1$/check_dummy 0
}
~~~

''remotes.cfg''
~~~c
define host {
    host_name           generic
    use                 generic-host
    check_command       check_dummy!0   ; Always returns OK
    max_check_attempts  1
    contact_groups      admins
}

define service {
    service_description plop
    use                 generic-service
    host_name           generic
    check_command       check_plop!80
}
~~~

### Example host, hostgroup and service configuration

~~~c
define host {
    use         physical-host
    host_name   busy-host.example.com
    alias       busy-host.example.com
    address     10.43.16.1
    hostgroups  linux,centos,ldap,http,busy
}

define host {
    use         physical-host
    host_name   normal-host.example.com
    alias       normal-host.example.com
    address     10.43.1.1
    hostgroups  linux,centos,dns,proxy,ldap,hp,http,puppetmaster
}

define service {
    use                 generic-service
    hostgroup_name      linux,!busy
    service_description Load
    check_command       check_snmp_load
}

define service {
    use                 generic-service
    hostgroup_name      busy
    service_description Load
    check_command       check_snmp_load_busy
}
~~~

### Host configuration

### Service configuration

''etc/objects/servers.cfg''
~~~c
define service {
    use                     generic-service
    hostgroup_name          linux-remotes-servers
    service_description     Total Processes
    max_check_attempts      3   ; Re-check the service up to 3 times in order to determine its final (hard) state
    retry_check_interval    1   ; Re-check the service every minute until a hard state can be determined
    check_command           check_snmp_host!procs!400!900
    flap_detection_enabled  0
}
~~~

Exclusion

~~~c
define service {
    service_description CPU Stats
    servicegroups       sysres
    use                 generic
    hostgroup_name      linux
    host_name           !server1
    check_command       check_iostat
}
~~~
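
The dummy-host example above attaches a service to `check_plop!80`, but these notes do not define that plugin. As a rough sketch only, the function below shows the general shape such a check could take, following the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The name `plop_status`, its three arguments and the `PLOP` output label are hypothetical and not taken from any existing plugin.

```bash
#!/usr/bin/env bash
# Hypothetical threshold check, NOT the real check_plop (which is not shown in these notes).
# It follows the Nagios plugin convention: one status line on stdout plus an exit code.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# plop_status VALUE WARN CRIT
# Prints a status line with performance data and returns the matching plugin code.
plop_status() {
    local value=${1-} warn=${2-} crit=${3-}
    local arg

    # Anything that is not a non-negative integer yields UNKNOWN
    for arg in "$value" "$warn" "$crit"; do
        case $arg in
            ''|*[!0-9]*)
                echo "PLOP UNKNOWN - usage: VALUE WARN CRIT (non-negative integers)"
                return "$STATE_UNKNOWN"
                ;;
        esac
    done

    if [ "$value" -ge "$crit" ]; then
        echo "PLOP CRITICAL - value=$value | plop=$value;$warn;$crit"
        return "$STATE_CRITICAL"
    elif [ "$value" -ge "$warn" ]; then
        echo "PLOP WARNING - value=$value | plop=$value;$warn;$crit"
        return "$STATE_WARNING"
    fi

    echo "PLOP OK - value=$value | plop=$value;$warn;$crit"
    return "$STATE_OK"
}
```

Calling `plop_status 85 80 90` prints a WARNING line and returns 1. Wiring this into Nagios would still require a `define command` entry for `check_plop` and some way of obtaining the measured value, neither of which is specified in these notes.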