Notifications not coming through

rgouette · May 12, 2017, 3:08pm

Issue: I'm not receiving notifications

I can successfully receive an e-mail if I run this on the command line:

/usr/bin/mail -s "NAGIOS HOST ALERT on $HOSTNAME$" rgouette@butlerbros.com

But my command.cfg configuration below refuses to send an e-mail when I put a service into a critical condition and force a check:

# 'notify-service-by-email' command definition
define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }

I even tried my explicit e-mail address instead of the macro, and still no e-mail was sent.

Here is the debug log output for the forced check (using my explicit e-mail address):

[pid=19321] Processing: '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" rgouette@butlerbros.com'
[1494614719.490217] [2048.1] [pid=19321]   Done.  Final output: '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: CUSTOM\n\nService: C:\ Drive Space\nHost: IT-APP01\nAddress: 10.0.0.53\nState: CRITICAL\n\nDate/Time: Fri May 12 14:45:19 EDT 2017\n\nAdditional Info:\n\nc:\ - total: 59.90 Gb - used: 51.10 Gb (85%) - free 8.80 Gb (15%)\n" | /usr/bin/mail -s "** CUSTOM Service Alert: IT-APP01/C:\ Drive Space is CRITICAL **" rgouette@butlerbros.com'

The $CONTACTEMAIL$ macro does expand into my address when I use it.
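One way to narrow this down is to replay the expanded command from the debug log by hand. The sketch below rebuilds the message body from the values in that log and prints it instead of mailing it. The drive-letter text is simplified to keep backslashes out of the example, it uses the shell's printf rather than /usr/bin/printf, and the account name "nagios" in the commented send step is an assumption about which user the Nagios daemon runs as.

```shell
# Values copied from the "Final output" line of the debug log.
# "C:\ Drive Space" is shortened to "C Drive Space" to avoid backslash
# escapes in this sketch.
body=$(printf "%b" "***** Nagios *****\n\nNotification Type: CUSTOM\n\nService: C Drive Space\nHost: IT-APP01\nAddress: 10.0.0.53\nState: CRITICAL\n\nDate/Time: Fri May 12 14:45:19 EDT 2017\n\nAdditional Info:\n\ntotal: 59.90 Gb - used: 51.10 Gb (85%) - free 8.80 Gb (15%)\n")
subject="** CUSTOM Service Alert: IT-APP01/C Drive Space is CRITICAL **"

# Inspect what would be handed to mail.
printf '%s\n' "Subject: $subject" "$body"

# To actually send it, pipe the body into mail as the Nagios user
# (the user name "nagios" is an assumption):
#   printf '%s\n' "$body" | sudo -u nagios /usr/bin/mail -s "$subject" rgouette@butlerbros.com
```

If the message arrives when sent this way but not from Nagios itself, the difference is more likely in when Nagios decides to notify than in the command line.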

Here's the hosts .cfg section for that service:

define service{
	use			generic-service
	host_name		IT-APP01
	service_description	C:\ Drive Space
	check_command		check_nt!USEDDISKSPACE!-l c -w 9 -c 10
	servicegroups		C_DRIVE_SPACE
	}

and here's the generic service, as defined in services.cfg:

# Generic service definition template - This is NOT a real service, just a template!

 define service{
         name                            generic-service 	; The 'name' of this service template
         active_checks_enabled           1       		; Active service checks are enabled
         passive_checks_enabled          1    		   	; Passive service checks are enabled/accepted
         parallelize_check               1       		; Active service checks should be parallelized (disabling this can lead to major performance problems)
         obsess_over_service             1       		; We should obsess over this service (if necessary)
         check_freshness                 0       		; Default is to NOT check service 'freshness'
         notifications_enabled           1       		; Service notifications are enabled
         event_handler_enabled           1       		; Service event handler is enabled
         flap_detection_enabled          1       		; Flap detection is enabled
         process_perf_data               1       		; Process performance data
         retain_status_information       1       		; Retain status information across program restarts
         retain_nonstatus_information    1       		; Retain non-status information across program restarts
         is_volatile                     0       		; The service is not volatile
         check_period                    24x7			; The service can be checked at any time of the day
         max_check_attempts              3			; Re-check the service up to 3 times in order to determine its final (hard) state
         normal_check_interval           10			; Check the service every 10 minutes under normal conditions
         retry_check_interval            2			; Re-check the service every two minutes until a hard state can be determined
         contact_groups                  admins			; Notifications get sent out to everyone in the 'admins' group
 	 notification_options		 c,r		; Send notifications about critical and recovery events only
         notification_interval           60			; Re-notify about service problems every hour
         notification_period             24x7			; Notifications can be sent out at any time
         register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
         }

nagios.cfg seems to be in order as far as references...

Nagios version: Nagios® Core™ 4.2.0
running Postfix

I'm just at a loss as to where the misconfiguration might be.

Thanks,
rich

jim_mcnamara · May 12, 2017, 10:58pm

Can I posit that you need to thoroughly check the Postfix logs? By default, Postfix logs go to

 /var/log/mail*

I think your problem is not in the service definition; rather, Postfix is treating something as garbage that is not. You may have to tweak the service, but something is not right with mail, IMO.
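A rough sketch of that log check: search the mail logs for the recipient address and read the delivery status on whatever lines turn up. The helper name find_mail_log_lines is made up for illustration, and the exact log file name under /var/log varies by distribution, so the glob is only a starting point.

```shell
# Hypothetical helper: print every line mentioning an address
# in the given log files.
find_mail_log_lines() {
    addr=$1
    shift
    # -F: treat the address as a fixed string, -h: omit file names
    grep -h -F -- "$addr" "$@" 2>/dev/null
}

# Look for delivery attempts to the contact address.
find_mail_log_lines rgouette@butlerbros.com /var/log/mail* \
    || echo "no log lines found for that address"
```

No matching lines at all right after a forced check would suggest that Nagios never handed a message to Postfix in the first place. Running mailq as well shows whether anything is stuck in the queue.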

rgouette · May 15, 2017, 11:32am

OK, thanks Jim.
I think I was spacing out by not examining those as well.

Rich

---------- Post updated at 11:32 AM ---------- Previous update was at 11:30 AM ----------

Oh, interesting.
I just received a RECOVERY e-mail for a down Windows server, but only because
I FORCED a check.
Any ideas are welcome.

I'll dig in more..
R

jim_mcnamara · May 15, 2017, 11:52am

Is that Windows server the primary outbound mail server? This whole thing really sounds more like e-mail is having issues. Larger sites have a single mail server (or two).

What does the environment variable mailhost on the problem Postfix box say? You might have to su to the Postfix process owner first. What is in resolv.conf (guessing Linux), assuming it exists? I have seen mailhost defined in there as well.
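Those checks can be gathered in one place, roughly as below. The function name show_mail_routing is made up for illustration, and the postconf parameters shown (relayhost, myhostname, mydomain) are only the ones that seemed most relevant to where outbound mail is sent; they are an addition to the questions above, not a replacement for them.

```shell
# Hypothetical helper: print the settings that decide where outbound
# mail is routed on this box.
show_mail_routing() {
    # The environment variable asked about above; it is often unset.
    echo "mailhost=${mailhost:-<unset>} MAILHOST=${MAILHOST:-<unset>}"

    # Postfix's own view of its relay and host name, if postconf exists.
    if command -v postconf >/dev/null 2>&1; then
        postconf relayhost myhostname mydomain || true
    else
        echo "postconf not found"
    fi

    # Resolver configuration, if present.
    if [ -r /etc/resolv.conf ]; then
        cat /etc/resolv.conf
    else
        echo "no readable /etc/resolv.conf"
    fi
}

show_mail_routing
```

Run it once as your own user and once after switching to the Postfix process owner, then compare the two outputs.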