How to set up monitoring for mailq count?

Hello,

I have Postfix running on a Red Hat 7.6 server. All client servers relay email to that server, which forwards it to Microsoft Exchange, and from there it reaches our inboxes.
Last week, due to some bad scripts, mailq grew to over 200k messages and that choked the server. All emails were being delivered with long delays. It was cleared and fixed, but management wants me to put some kind of monitoring in place in case such an incident happens again.
The Postfix service is monitored by the HPOV tool as standard monitoring. But the ask here is: how will we know if the mail count is rising to an alarming level? It is difficult to determine whether 10 is a good number for mailq, or 1000, or 10,000. Depending on the load or the day of the month, the mail count can be high or low. For example, some reports are generated every Friday night, so counts will be high then. Similarly, month-end reports will increase the count temporarily.

Is there any different way I should monitor it?

Any suggestions please?

Thanks

500 is a good number to start with. That many queued messages does not take up much space, and all file systems are still fast with that many directory entries.

Thank you, this should be a good number to consider.

I suggest you monitor the count for 1-2 months with some reasonable first-try alarm threshold (500?) while also collecting the actual values. After those 1-2 months you will have clear data on what the normal workload of your server is in which timeframes, and you can readjust your threshold based on that.
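
To make the "collect the actual values" part concrete, here is a minimal sketch in Python, not a tested production check. It assumes the standard Postfix mailq summary line ("-- N Kbytes in M Requests." or "Mail queue is empty"). The log path, the function names and the main() wrapper are hypothetical, and 500 is just the first-try threshold mentioned in this thread.

```python
import re
import subprocess
import time

ALARM_THRESHOLD = 500  # first-try value from this thread; readjust once data exists


def parse_queue_count(mailq_output: str) -> int:
    """Return the number of queued messages reported by Postfix mailq."""
    text = mailq_output.strip()
    if "Mail queue is empty" in text:
        return 0
    lines = text.splitlines()
    last_line = lines[-1] if lines else ""
    # Summary line looks like: "-- 12 Kbytes in 3 Requests."
    match = re.search(r"in (\d+) Requests?\b", last_line)
    if match is None:
        raise ValueError(f"unrecognised mailq summary line: {last_line!r}")
    return int(match.group(1))


def sample_queue(log_path: str = "/var/tmp/mailq_count.csv") -> int:
    """Run mailq once, append 'epoch_seconds,count' to a CSV file, return the count."""
    # log_path is a hypothetical location; pick one that suits your server.
    output = subprocess.run(
        ["mailq"], capture_output=True, text=True, check=True
    ).stdout
    count = parse_queue_count(output)
    with open(log_path, "a") as log:
        log.write(f"{int(time.time())},{count}\n")
    return count


def main() -> None:
    """One polling cycle: record the value and print an alarm line if over threshold."""
    count = sample_queue()
    if count > ALARM_THRESHOLD:
        print(f"ALARM: {count} messages queued (threshold {ALARM_THRESHOLD})")
```

Calling main() from cron every few minutes would give you both the alarm and the history file. Keep in mind that listing a very large queue with mailq is itself slow, so a long polling interval is safer.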

That's a good idea.
Thanks

It's been a while since I worked with HPOV and I don't quite remember all the HPOV lingo, but...
On other network management tools one could set up a so-called "baseline" collection. Once baselining is set for a particular "metric" and the system is properly "trained", one can set the threshold against the baseline as a percentage of "deviation from the norm". Say you set "20% over the baseline": an SNMP trap will be sent when the baseline is 50 and your current collected value is 62.
This makes the thresholding quite a bit more dynamic than the "trial-and-error" approach of setting static threshold levels...
There are different types of thresholds as well: burst (fires once crossed), duration (10 minutes OR 3 polling cycles over/under), etc.

Once again, I don't know/remember if HPOV has anything similar to the above...
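
In case it helps, the baseline idea above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how HPOV or any other tool implements it: the baseline is a plain rolling mean of recent normal values, and the class name, parameters and defaults are all made up.

```python
from collections import deque
from statistics import mean


class BaselineAlarm:
    """Fire when a value stays more than deviation_pct over a rolling-mean baseline."""

    def __init__(self, window=12, deviation_pct=20.0, sustain_polls=3):
        self.history = deque(maxlen=window)  # recent "normal" values
        self.deviation_pct = deviation_pct   # e.g. 20 means 20% over the baseline
        self.sustain_polls = sustain_polls   # 1 = burst, >1 = duration threshold
        self.breaches = 0                    # consecutive polls over the limit

    def update(self, value):
        """Feed one polled value; return True when the alarm should fire."""
        if len(self.history) < self.history.maxlen:
            self.history.append(value)       # still "training" the baseline
            return False
        baseline = mean(self.history)
        limit = baseline * (1 + self.deviation_pct / 100)
        if value > limit:
            self.breaches += 1
        else:
            self.breaches = 0
            self.history.append(value)       # only normal values move the baseline
        return self.breaches >= self.sustain_polls
```

With a baseline of 50 and 20% deviation the limit is 60, so a value of 62 counts as a breach, and with sustain_polls=3 the alarm fires on the third consecutive breach. A single rolling mean will not know about the Friday night or month-end peaks, so a real baseline would probably need to be kept per time slot.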

Understood. Thanks, good idea