/var/adm/messages not updating

This is Solaris 10.

I have devices sending syslog, but the /var/adm/messages file is not updating anymore. It stopped after I tried to change where the messages are logged, moving them to a SAN drive. Here's what I did:

1) Made a backup of syslog.conf
2) Edited the file to change the 2 lines sending messages to /var/adm/messages to another path (maintaining the correct tabs, etc)
3) Refreshed the service using the "kill -HUP <pid>" command
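
Regarding step 2: Solaris syslogd expects tabs between the selector and action fields, so a quick check after editing can catch a separator that was typed with spaces only. A minimal sketch, assuming a POSIX awk (on Solaris 10 that likely means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk); check_syslog_tabs is a hypothetical helper name, not a Solaris tool:

```shell
# check_syslog_tabs FILE
# Hypothetical helper: report rule lines that contain no tab at all.
# Blank lines, comments, and the bare m4 wrapper lines ("ifdef(...," and ")")
# are skipped. Returns non-zero if any suspicious line is found.
check_syslog_tabs() {
    awk '
        /^[ \t]*$/            { next }   # blank line
        /^#/                  { next }   # comment
        /^ifdef\(/ || /^\)/   { next }   # m4 wrapper lines
        !/\t/ {
            printf "%s:%d: no tab between selector and action: %s\n", FILENAME, NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

It could be run as `check_syslog_tabs /etc/syslog.conf` before sending the HUP. It only looks for a missing tab; it does not validate facility names, m4 quoting, or whether the target path exists.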

If I try to restart the service using the command "svcadm restart svc:/system/system-log:default", I see this error in the messages file:

Mar 31 14:02:29 tnsp03350 syslogd: going down on signal 15
Mar 31 14:02:29 tnsp03350 syslogd: Unable to bind syslog port for 0.0.0.0.2.2

Does this mean anything?

I can use the logger command to successfully write to the file.

I also rotated the messages file by renaming it, creating a new one, then refreshing the service - that hasn't helped either.

Any help would be appreciated.

Thanks!

Please post the contents of syslog.conf before and after the change.

If (and only if) the new syslog.conf is valid, are you in a position to reboot the server?

Btw, there are established techniques for dealing with the size of /var/adm/messages. Giving it space to grow unbounded is not one of them. Renaming the file is not one either: you are dealing with an open file from an active process.
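
To illustrate the open-file point, here is a small demonstration that runs entirely in a scratch directory and does not touch syslogd or any real log. It is only a sketch: file descriptor 3 stands in for the daemon's open log, and rename_demo is a made-up name.

```shell
# rename_demo: show that a writer holding an open descriptor keeps writing
# to the renamed file, not to a new file created under the old name.
# The body runs in a subshell so the cd and fd 3 do not leak to the caller.
rename_demo() (
    demo_dir=$(mktemp -d) || exit 1
    cd "$demo_dir" || exit 1

    exec 3>>messages            # the "daemon" opens its log for append
    echo "before rename" >&3

    mv messages messages.0      # rotate by renaming
    : > messages                # create a fresh, empty file with the old name

    echo "after rename" >&3     # still lands in messages.0
    exec 3>&-                   # close; a real daemon would reopen on HUP

    echo "messages.0 lines: $(wc -l < messages.0 | tr -d ' ')"
    echo "messages bytes: $(wc -c < messages | tr -d ' ')"

    cd / && rm -rf "$demo_dir"
)

rename_demo
# prints:
#   messages.0 lines: 2
#   messages bytes: 0
```

Both lines end up in messages.0 and the new messages file stays empty until the writer closes and reopens its log, which is why the rename is normally followed by a HUP.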

On most systems the system logging process will fail if the log file tries to exceed 2 GB or if /var/adm runs out of disc space.

How big is the original /var/adm/messages file which may have prompted you to try to relocate the log?

Make sure that the syslogd daemon has sufficient permission to write to the file you created in place of /var/adm/messages.

Thanks,
Deepak

Was any recent activity, such as OS hardening, done prior to this problem? What are the directory and file permissions? Are they the default values?

The messages file was over 600 MB. Plenty of room on disk.

I submitted a request to have the box rebooted.

The syslog.conf file below is what's in place right now. Not sure if this is original - Splunk was installed on this box so it may have changed it.

After I backed it up, the only changes I made to this file were to the 2 /var/adm/messages paths. Here is the original; my change was simply a new path, with the tabs retained.

#ident  "@(#)syslog.conf        1.5     98/12/14 SMI"   /* SunOS 5.0 */
#
# Copyright (c) 1991-1998 by Sun Microsystems, Inc.
# All rights reserved.
#
# syslog configuration file.
#
# This file is processed by m4 so be careful to quote (`') names
# that match m4 reserved words.  Also, within ifdef's, arguments
# containing commas must be quoted.
#
*.err;kern.notice;auth.notice                   /dev/sysmsg
*.err;kern.debug;daemon.notice;mail.crit        /var/adm/messages
*.alert;kern.err;daemon.err                     operator
*.alert                                         root
*.emerg                                         *
# if a non-loghost machine chooses to have authentication messages
# sent to the loghost machine, un-comment out the following line:
#auth.notice                    ifdef(`LOGHOST', /var/log/authlog, @loghost)
mail.debug                      ifdef(`LOGHOST', /var/log/syslog, @loghost)
#
# non-loghost machines will use the following lines to cause "user"
# log messages to be logged locally.
#
ifdef(`LOGHOST', ,
user.err                                        /dev/sysmsg
user.err                                        /var/adm/messages
user.alert                                      `root, operator'
user.emerg                                      *
)

From googling, I read that one of the correct ways to do this was to rename or copy the messages file, then restart the service which is what I did. This is one of the links I used as a guide, although there were many:

Solaris System Admin tips: /var/adm/messages

I'm sure there was some hardening done, but that would have been done before I got access to the box. SYSLOG WAS working, right up until the point where I tried to make the change.

Here are the permissions of the files in the folder:

drwxrwxr-x 5 adm adm 5 Feb 10 15:13 acct
-rw------- 1 uucp bin 0 Aug 25 2008 aculog
drwxr-xr-x 2 adm adm 2 Mar 3 2009 exacct
-r--r--r-- 1 root root 14302092 Apr 1 15:40 lastlog
drwxr-xr-x 2 adm adm 2 Mar 3 2009 log
-rw-r--r-- 1 root root 0 Mar 31 15:38 messages
-rw-r--r-- 1 root root 502826 Mar 26 03:04 messages.0
-rw-r--r-- 1 root root 6971261 Mar 19 03:07 messages.1
-rw-r--r-- 1 root root 618895 Mar 11 03:09 messages.2
-rw-r--r-- 1 root root 1330218 Mar 4 03:00 messages.3
drwxr-xr-x 2 root sys 2 Mar 3 2009 pool
drwxrwxr-x 2 adm sys 2 Mar 3 2009 sa
-r-------- 1 root root 110 Mar 18 22:04 setpass.log
drwxr-xr-x 2 root sys 2 Mar 3 2009 sm.bin
-rw-rw-rw- 1 root bin 0 Aug 25 2008 spellhist
drwxr-xr-x 2 root sys 2 Mar 3 2009 streams
-rw------- 1 root nhbw13t 4493 Apr 1 08:55 sudo.log
-rw------- 1 root root 216 Mar 25 15:37 sulog
-rw-r--r-- 1 root root 0 Feb 10 15:45 syslog
-rw-r--r-- 1 root bin 3348 Apr 1 12:40 utmpx
-rw-r--r-- 1 root root 0 Mar 3 2009 vold.log
drwxr-xr-x 2 root sys 6 Mar 8 14:35 vx
-rw-r--r-- 1 adm adm 630540 Apr 1 15:40 wtmpx

Thanks for the input guys!


Sorry for the formatting of the syslog.conf in the message - I'm not sure how to get it to look like the original. I attached it instead.

Try redirecting the errors to a central server:

The Blog of Ben Rockwood

Please give me the output from:

du -h /var/log

df -k /var/log

Do you have separate mount points: one for /var and one for /root?

Is there any syslogd process still running? If so, what is it doing? What's the output from "pfiles [syslogd PID]"? What does the output from "truss -vall -d -o /some/output/file -p [syslogd PID]" show?

If it is still running, and your syslog.conf file is correct, sending "kill -HUP [syslogd PID]" should cause it to reread the syslog.conf file.

How, exactly, did you create the new "/var/adm/messages" file? The normal way to create a new log file when one already exists is to rename the existing file, create a new file using something like "touch", then "kill -HUP" the syslogd daemon.

Hehe, funny that you posted this link. I used it as a reference too, and actually saved it because I thought it was well written.

Here is the output for the 2 commands:

tnsp03350 > du -h /var/log
112K /var/log/VRTSpbx
3K /var/log/swupas
27K /var/log/webconsole/console
28K /var/log/webconsole
1K /var/log/pool
700K /var/log

tnsp03350 > df -k /var/log
Filesystem 1024-blocks Used Available Capacity Mounted on
rpool/ROOT/iZFS/var 12582912 1494089 11088799 12% /var

Here is our filesystem:

tnsp03350 > df -h
Filesystem Size Used Available Capacity Mounted on
rpool/ROOT/iZFS 134G 8.1G 106G 8% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 36G 1.7M 36G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
sharefs 0K 0K 0K 0% /etc/dfs/sharetab
fd 0K 0K 0K 0% /dev/fd
rpool/ROOT/iZFS/var 12G 1.4G 11G 12% /var
swap 36G 32K 36G 1% /tmp
swap 36G 72K 36G 1% /var/run
swap 36G 0K 36G 0% /dev/vx/dmp
swap 36G 0K 36G 0% /dev/vx/rdmp
rpool/home 2.0G 141K 2.0G 1% /export/home
rpool/crash 16G 21K 16G 1% /var/crash
rpool/cores 2.0G 52M 1.9G 3% /var/crash/cores
rpool/ROOT/iZFS/var/tmp
2.0G 24K 2.0G 1% /var/tmp
rpool/ROOT/iZFS/zones
24G 18K 24G 1% /zones
/dev/vx/dsk/datadg/idrsplunk
1.1T 4.0G 1.1T 1% /opt/shared/data/idrsplunk

The syslogd daemon is in fact running. I can use "logger" at the command line to get messages in it.

Here is the output from pfiles <pid>:

tnsp03350 > pfiles 29879
29879: /usr/sbin/syslogd
Current rlimit: 65536 file descriptors
0: S_IFDIR mode:0755 dev:256,65538 ino:3 uid:0 gid:0 size:43
O_RDONLY
/
1: S_IFDIR mode:0755 dev:256,65538 ino:3 uid:0 gid:0 size:43
O_RDONLY
/
2: S_IFDIR mode:0755 dev:256,65538 ino:3 uid:0 gid:0 size:43
O_RDONLY
/
3: S_IFDOOR mode:0444 dev:334,0 ino:57 uid:0 gid:0 size:0
O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[415]
/var/run/name_service_door
4: S_IFCHR mode:0600 dev:325,0 ino:50855940 uid:0 gid:3 rdev:97,0
O_WRONLY|O_APPEND|O_NOCTTY|O_LARGEFILE
/devices/pseudo/sysmsg@0:sysmsg
5: S_IFREG mode:0644 dev:256,65539 ino:24998 uid:0 gid:0 size:0
O_WRONLY|O_APPEND|O_NOCTTY|O_LARGEFILE
/var/adm/messages
6: S_IFREG mode:0644 dev:256,65539 ino:25027 uid:0 gid:3 size:2134
O_WRONLY|O_APPEND|O_NOCTTY|O_LARGEFILE
/var/log/syslog
8: S_IFCHR mode:0000 dev:325,0 ino:2980 uid:0 gid:0 rdev:21,6
O_RDONLY
/devices/pseudo/log@0:log
9: S_IFDOOR mode:0777 dev:333,0 ino:0 uid:0 gid:0 size:0
O_RDWR FD_CLOEXEC door to syslogd[29879]

I have used "kill -HUP <pid>" many times; it hasn't helped.

I now don't recall whether I copied or renamed the original file and then refreshed/restarted syslog. I found articles that said either method was acceptable.

The box hasn't been rebooted yet so I'll let you know if that helps at all.


Sorry, forgot to post the output of "truss -vall -d -o /some/output/file -p [syslogd PID]". I'm not sure if this stops on its own or not - I stopped it after a couple of minutes:

Base time stamp: 1270211337.1279 [ Fri Apr 2 08:28:57 EDT 2010 ]
/28: lwp_park(0x00000000, 0) (sleeping...)
/12: pollsys(0x0003A5F0, 0, 0x00000000, 0x00000000) (sleeping...)
/9: pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) (sleeping...)
/9: fd=8 ev=POLLIN rev=POLLIN
/10: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
/8: lwp_park(0x00000000, 0) (sleeping...)
/1: sigtimedwait(0xFFBFFC68, 0xFFBFFBE8, 0x00000000) (sleeping...)
/1: sigmask = 0x0000F007 0 0 0
/27: lwp_park(0x00000000, 0) (sleeping...)
/31: lwp_park(0x00000000, 0) (sleeping...)
/30: lwp_park(0x00000000, 0) (sleeping...)
/26: lwp_park(0x00000000, 0) (sleeping...)
/11: lwp_park(0x00000000, 0) (sleeping...)
/29: lwp_park(0x00000000, 0) (sleeping...)
/13: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
/10: 63.8727 door_return(0x00000000, 0, 0x00000000, 0) = 0
/9: 63.8728 pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) = 1
/9: fd=8 ev=POLLIN rev=POLLIN
/9: 63.8732 getmsg(8, 0xFE8B7B20, 0xFE8B7F30, 0xFE8B7F3C) = 0
/9: ctl: maxlen=24 len=24 buf=0xFE8B7B08: "\0 ,\0\0\0DD\010"..
/9: dat: maxlen=1024 len=113 buf=0xFE8B7B2C: " A p r 2 0"..
/9: flags: 0x0000
/9: 63.8735 lwp_unpark(8) = 0
/8: 63.8735 lwp_park(0x00000000, 0) = 0
/9: 63.8756 pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) = 1
/9: fd=8 ev=POLLIN rev=POLLIN
/10: 63.8756 door_return(0x00000000, 0, 0x00000000, 0) = 0
/9: 63.8757 getmsg(8, 0xFE8B7B20, 0xFE8B7F30, 0xFE8B7F3C) = 0
/9: ctl: maxlen=24 len=24 buf=0xFE8B7B08: "\0 ,\0\0\0DD\010"..
/9: dat: maxlen=1024 len=113 buf=0xFE8B7B2C: " A p r 2 0"..
/9: flags: 0x0000
/9: 63.8760 lwp_unpark(8) = 0
/8: 63.8760 lwp_park(0x00000000, 0) = 0
/9: 63.8822 pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) = 1
/9: fd=8 ev=POLLIN rev=POLLIN
/10: 63.8822 door_return(0x00000000, 0, 0x00000000, 0) = 0
/9: 63.8823 getmsg(8, 0xFE8B7B20, 0xFE8B7F30, 0xFE8B7F3C) = 0
/9: ctl: maxlen=24 len=24 buf=0xFE8B7B08: "\0 ,\0\0\0DD\010"..
/9: dat: maxlen=1024 len=92 buf=0xFE8B7B2C: " A p r 2 0"..
/9: flags: 0x0000
/9: 63.8826 lwp_unpark(8) = 0
/8: 63.8826 lwp_park(0x00000000, 0) = 0
/8: 63.8828 lwp_unpark(31) = 0
/31: 63.8828 lwp_park(0x00000000, 0) = 0
/31: 63.8830 write(6, " A p r 2 0 8 : 3 0".., 101) = 101
/10: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
/8: lwp_park(0x00000000, 0) (sleeping...)
/31: lwp_park(0x00000000, 0) (sleeping...)
/9: pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) (sleeping...)
/9: fd=8 ev=POLLIN rev=POLLIN
/9: 65.6017 pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) = 1
/9: fd=8 ev=POLLIN rev=POLLIN
/10: 65.6017 door_return(0x00000000, 0, 0x00000000, 0) = 0
/9: 65.6019 getmsg(8, 0xFE8B7B20, 0xFE8B7F30, 0xFE8B7F3C) = 0
/9: ctl: maxlen=24 len=24 buf=0xFE8B7B08: "\0 ,\0\0\0DD\010"..
/9: dat: maxlen=1024 len=113 buf=0xFE8B7B2C: " A p r 2 0"..
/9: flags: 0x0000
/9: 65.6022 lwp_unpark(8) = 0
/8: 65.6022 lwp_park(0x00000000, 0) = 0
/9: pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) (sleeping...)
/9: fd=8 ev=POLLIN rev=POLLIN
/10: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
/8: lwp_park(0x00000000, 0) (sleeping...)
/10: 93.9441 door_return(0x00000000, 0, 0x00000000, 0) = 0
/9: 93.9441 pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) = 1
/9: fd=8 ev=POLLIN rev=POLLIN
/9: 93.9445 getmsg(8, 0xFE8B7B20, 0xFE8B7F30, 0xFE8B7F3C) = 0
/9: ctl: maxlen=24 len=24 buf=0xFE8B7B08: "\0 ,\0\0\0DD\010"..
/9: dat: maxlen=1024 len=111 buf=0xFE8B7B2C: " A p r 2 0"..
/9: flags: 0x0000
/9: 93.9448 lwp_unpark(8) = 0
/8: 93.9448 lwp_park(0x00000000, 0) = 0
/8: lwp_park(0x00000000, 0) (sleeping...)
/9: pollsys(0x00032FC4, 1, 0x00000000, 0x00000000) (sleeping...)
/9: fd=8 ev=POLLIN rev=POLLIN
/10: door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)

And what about read/write permissions?

I had a problem where applications did not have permission to write to /var/log.

**Content deleted by reborg.

Never advise anyone to use chmod in this manner.

Well, according to the pfiles output, /var/adm/messages is open as fd 5.

But the truss output only shows writes being done to fd 6, which is /var/log/syslog.
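
That cross-check between the two outputs can be scripted. A rough sketch, assuming the pfiles and truss output were each saved to a file; fd_paths and writes_per_fd are hypothetical helper names, and the parsing relies only on the output format shown in this thread (on Solaris, nawk may be needed in place of awk):

```shell
# fd_paths FILE: from saved pfiles output, print "fd N -> /path" for every
# descriptor that has a path line. Descriptors without one (e.g. some doors)
# are skipped.
fd_paths() {
    awk '
        /^[ \t]*[0-9]+: S_IF/ { sub(/:/, "", $1); fd = $1; next }
        /^[ \t]*\// && fd != "" { print "fd " fd " -> " $1; fd = "" }
    ' "$1"
}

# writes_per_fd FILE: from saved truss output, count write() calls per
# descriptor. The leading whitespace requirement avoids matching pwrite().
writes_per_fd() {
    sed -n 's/.*[[:space:]]write(\([0-9][0-9]*\),.*/\1/p' "$1" |
        sort -n | uniq -c |
        awk '{ printf "fd %s: %s write(s)\n", $2, $1 }'
}
```

A descriptor that appears in the fd_paths list but never in the writes_per_fd list is open but received no writes during the trace window, which is the situation described here for fd 5.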

The gid on /var/adm/messages is 0, but the gid on /var/log/syslog is 3. Since the files are already open, I don't see how that can make any difference, but it's really easy to fix with chown or chgrp and then HUP the syslogd process. It might be revealing to have the truss output of the process when it gets HUP'd, too.

The only other thing I can think of is a typo or other mistake somewhere in syslog.conf. I know from experience that the way syslogd processes that file is very unforgiving.

Compare /etc/hosts and /etc/nsswitch.conf with those on a working server.

Hey guys - thanks for all of your responses with this. The server was rebooted over the weekend and that took care of it. The messages file is now being populated correctly.