Unidentified File on /var Directory

Hi All,

I'm having a problem with the /var directory, which keeps growing. Here's the output of the bdf and du commands:

# uname -a
HP-UX rppmis1 B.11.11 U 9000/800 1153414645 unlimited-user license
 
# bdf /var
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol8    2613248 2139568  473392   82% /var
 
# du -skx /var
1615552 /var

The bdf and du commands report different amounts of used space: 2,139,568 - 1,615,552 = 524,016 KB. Can someone tell me how to find the file(s) taking up this space?
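For anyone who wants to repeat the comparison, here is a minimal sketch of the subtraction. The function name usage_gap is made up for illustration, and the two figures are simply the ones from the bdf and du output above.

```shell
#!/bin/sh
# usage_gap USED_KB DU_KB: print how many KB the filesystem counts as used
# that du cannot attribute to any visible file.
# (usage_gap is a hypothetical helper name, not a system command.)
usage_gap() {
    echo $(( $1 - $2 ))
}

# Figures copied from the bdf and du output above.
gap=$(usage_gap 2139568 1615552)
echo "unaccounted: ${gap} KB"

# On a live system the inputs could be collected like this, assuming bdf
# prints the /var entry on a single line after the header:
#   used=$(bdf /var | awk 'NR==2 {print $3}')
#   du_kb=$(du -skx /var | awk '{print $1}')
```

A gap of a few MB is normal; a gap of several hundred MB usually deserves a closer look.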

Here's the breakdown of the /var directory:

# du -skx /var/* | sort -k2
16      /var/X11
518344  /var/adm
408     /var/asx
4520    /var/dmi
152     /var/dt
0       /var/home
6160    /var/hp
0       /var/ifor
0       /var/lost+found
18320   /var/mail
0       /var/news
0       /var/obam
86048   /var/opt
0       /var/parmgr
118832  /var/patches
11376   /var/preserve
14336   /var/rbootd
24      /var/run
16160   /var/sam
1032    /var/spool
16      /var/statmon
768752  /var/stm
49872   /var/tmp
712     /var/tombstones
0       /var/uucp
304     /var/vx
96      /var/yp

FYI, this is what I've done so far to reduce the size of the /var directory:

  • nulled log files such as /var/adm/wtmp
  • trimmed /var/mail/oracle using the vi editor
  • deleted files such as /var/preserve/Ex*
  • deleted the /var/adm/crash/crash.5 directory and its contents
  • deleted an application log file in /var/tmp. Initially I tried to null it with the "cat /dev/null > trasym.ulma" command, but that didn't work, so I deleted it. After I deleted it, the file reappeared with the same size as before the deletion. After several deletions it finally disappeared. Unfortunately I forgot to check with the "fuser" command before deleting the file to make sure the process using it had already stopped.

Appreciate any feedback from all. Thanks

I would always prefer to:-

du -k | sort -n

.... to get the biggest at the bottom. You can also:-

ls -l | sort -nk 5

.... to get the biggest files in the current directory.

Does that direct you anywhere?

I see that /var/preserve is quite large. This is normally where editor recovery files are left. Perhaps they could be pruned. There is also mail waiting to be read. Have a look in /var/mail and for each large file, get the user to read their mail. These are normally the output from cron jobs, so perhaps you have something that works, but issues messages that you need to take care of.

I hope that this helps, but feel free to ask more if you still need help.

Robin
Liverpool/Blackburn
UK

There's no file missing. du reports the disk usage of files and directories. But there's more to it: file system infrastructure and metadata, such as super blocks, also consume disk space. In addition, the OS reserves some space for emergency root access (I'm not sure whether this holds true for data file systems). All of this is included in what df reports, so it is no surprise that the two figures differ.


How much load do you have on your box, and how much does midaemon consume?

Now, the file you deleted will only release its space if it was not in use... and since it was growing, I doubt that was the case...

I forgot to say something else, quite important. If you have removed a large file (or files with a large total size) and there was no change in the usage, then it could be that the file was in use. All you will have done is remove the entry from the directory it was in. The filesystem as a whole will still have the blocks marked as used until the process that has it open ends or closes it.
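To see this behaviour for yourself, here is a small sketch using a scratch file. The path /tmp/open_demo.$$ is made up for the demonstration and has nothing to do with the files in this thread; the shell itself plays the part of the process holding the file open.

```shell
#!/bin/sh
# Scratch file for the demo; this path is hypothetical, not from the thread.
demo=/tmp/open_demo.$$
echo "still here" > "$demo"

exec 3< "$demo"     # hold the file open on descriptor 3, as a running logger would
rm "$demo"          # removes only the directory entry

# The name is gone from the directory...
if [ -e "$demo" ]; then name_state="present"; else name_state="gone"; fi

# ...but the data is still readable through the open descriptor.
content=$(cat <&3)

echo "name: $name_state, data: $content"

exec 3<&-           # closing the last open descriptor lets the blocks be freed
```

Until that last descriptor is closed, du no longer sees the file but the filesystem still counts its blocks as used, which is the kind of gap described above.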

You can use the fuser command to look for deleted files that are still in use on a filesystem. Which flags to use depends on your OS.

You are in the HP-UX thread, but I'm only on 11.11, so my fuser does not have this as an option. On other platforms, you could use something like:-

fuser -duV /var

Not sure if this helps, but I thought I should bring it up.

Robin

Hi vbe,
It ranges between 1.00 and 2.00 on a daily basis. Here's today's load:

System: rppmis1                                       Mon Sep 30 08:47:20 2013
Load averages: 0.58, 0.93, 1.16
706 processes: 697 sleeping, 9 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.61  17.6%   1.4%  40.9%  40.1%   0.0%   0.0%   0.0%   0.0%
 1    0.54   5.5%   1.4%  11.5%  81.6%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.58  11.5%   1.4%  26.2%  60.9%   0.0%   0.0%   0.0%   0.0%
Memory: 1282912K (172324K) real, 3158708K (471108K) virtual, 23280K free  Page# 1/65
CPU TTY     PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
 1   ?    28672 autoprod 154 20   107M  1276K sleep    1:19 26.65 26.61 oracleMES
 1   ?     8633 oracle   231 20   107M  1184K run      0:02 71.36  9.94 oracleMES
 1   ?    23987 oracle   148 22  1764K   160K sleep   48:35  5.77  5.76 tar
 0   ?     7446 autoprod 154 20 24380K  3500K sleep   15:28  3.89  3.89 eff
 0   ?     7529 autoprod 154 20   107M  1128K sleep   11:49  2.84  2.83 oracleMES
 1   ?    13271 autoprod 154 20   108M  1096K sleep   23:54  2.70  2.70 oracleMES
 1   ?     4602 autoprod 154 20   107M  1452K sleep    0:10  1.83  1.82 oracleMES
 1   ?    13190 autoprod 154 20 38172K  4708K sleep   11:19  1.00  1.00 eff
 0 pts/ts 19475 budik    168 20  9816K  1292K sleep    0:23  0.89  0.88 top
 1   ?       37 root     152 20  1888K  1888K run     39:41  0.68  0.68 vxfsd
# 
# ps -ef | grep midaemon
    root  8663  8620  1 08:47:27 pts/tj    0:00 grep midaemon

As you can see, midaemon isn't running.

Hi rbatte1,
Yes, I removed a single large file of about 1.5 GB from the /var/tmp directory. It was an application log file and I thought it was safe to delete. It took several deletions before the file was completely gone. After that I ran the bdf command and saw no change in usage. I thought this was the root cause, but the file only stayed away for a while: the application has since generated the log file again, though at a smaller size.

Besides this file, I also deleted some other files. I was panicking at the time, so I didn't properly record which files I deleted from which directories. Maybe one or several of them caused this problem. My current conclusion is that I have to find the processes that are still accessing these files and kill them in order to release the space.

I'm using HP-UX 11.11 too, but the fuser command only supports the following options:

-c   Display the use of a mount point and any file beneath that
     mount point.  Each file must be a file system mount point.
-f   Display the use of the named file only, not the files
     beneath it if it is a mounted file system.
-u   Display the login user name in parentheses following each
     process ID.
-k   Send the SIGKILL signal to each process using each file.

Correct me if I'm wrong, since English isn't my native language :) Based on the options above, the fuser command cannot be used to find a process that is using a deleted file.

Is there any other command that I can use for this purpose?

There is an article on the HP ITRC site. You will have to be registered to have a look at it though.

https://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?spf_p.tpst=kbDocDisplay&spf_p.prp_kbDocDisplay=wsrp-navigationalState%3DdocId%253Demr_na-c01055283-2%257CdocLocale%253Den%257CcalledBy%253D&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken

I hope that it helps. I don't have the crashinfo tool available on my HP-UX server. It seems that you have to request it from HP support, according to this article:-

Viewing Open Sockets and Ports with Crashinfo | HP Support

I hope that this helps.

Robin
Liverpool/Blackburn
UK

/var/patches is certainly not needed.
Check whether its files have been accessed in the last 365 days with

find /var/patches -type f -atime -365 -exec ls -lu {} \;

"rotate" the /var/adm/syslog/syslog.log with

/sbin/init.d/syslogd stop
/sbin/init.d/syslogd start

On October 8th 2013, the %used reported by the bdf command had already reached 89%. I escalated this issue to the HP Call Center, and they suggested stopping the diagnostic service, deleting or moving the files in /var/stm/logs/os, and starting the diagnostic service again. This helped to reduce %used to 61%. They also asked me to update Event Monitoring System (EMS) and onlinediag to the latest versions to resolve the difference in disk usage reported by the bdf and du commands. I decided not to update them since %used had dropped to 61%.

On October 10th 2013, %used rose to 62%, and the next day to 63%. This was not good because it was growing again. Fortunately I found the "The /var filesystem is full." thread on the HP forum (sorry, I can't post the URL because I have fewer than 5 posts, but you can google it with the "hp-ux /var full" keywords). So I downloaded the lsof utility, installed it and ran it. Finally I managed to find the processes that were holding deleted files open. After I killed them, %used dropped to 38%.

Many thanks to all.
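For anyone landing here with the same problem, this is a sketch of the kind of lsof call that finds such files. It is not necessarily the exact command I ran. It assumes lsof is installed and that the build for your platform reports link counts; the wrapper name find_deleted_open is made up for illustration.

```shell
#!/bin/sh
# find_deleted_open FS: list open files on filesystem FS whose link count is
# below 1, i.e. files that were removed but are still held open by a process.
# (find_deleted_open is a hypothetical wrapper name. It assumes lsof is
# installed; link-count reporting may not be available on every platform.)
find_deleted_open() {
    fs=${1:-/var}
    if ! command -v lsof >/dev/null 2>&1; then
        echo "lsof not found in PATH" >&2
        return 127
    fi
    # +L1 selects open files with a link count below 1; the "a" ANDs that
    # with the filesystem argument so only files on FS are listed.
    lsof +aL1 "$fs"
}
```

Run it as root with "find_deleted_open /var" and look at the PID and size columns. The space comes back once the owning process closes the file, so stopping or restarting that process cleanly is usually enough before resorting to kill.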