i-nodes - out of disk space on /tmp

Usage:

We run test build jobs that log in to our AIX machines and create many small files in /tmp. When these jobs complete, they delete the temporary files they created.

Issue:

After approximately a week, /tmp appears to become full. Issuing the command "df -g /tmp" shows that there is free disk space (almost the entire disk is free), but "%Iused" displays 99%, which prevents us from writing any new files to /tmp.

Workaround:
The only solution I've been able to find is to reboot all of our AIX machines once a week.
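To catch the problem before /tmp fills completely, the %Iused figure can be read in a script. A minimal monitoring sketch, assuming a df that supports -P and -i (on AIX, "df -g /tmp" reports the same %Iused column, and the field position may differ); the 90% threshold is an arbitrary example:

```shell
#!/bin/sh
# Warn before /tmp runs out of inodes. With "df -Pi", column 5 of the
# data line is the inode-usage percentage on many systems; adjust the
# field number for your df output.
used=$(df -Pi /tmp | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
case "$used" in
    ''|*[!0-9]*)
        echo "could not parse inode usage for /tmp" ;;
    *)
        if [ "$used" -ge 90 ]; then
            echo "WARNING: /tmp inode usage at ${used}%"
        else
            echo "/tmp inode usage at ${used}%"
        fi ;;
esac
```

Run from cron, this would have flagged the 99% condition days before writes started failing.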

Environment:

Operating System       Model Series       CPU Type
AIX 5300-07-03-0811    JS20 Blade         PowerPC_970FX
AIX 5200-08-00         JS21 Blade         PowerPC_970MP
AIX 5200-08-00         JS21 Blade         PowerPC_970MP
AIX 6100-01-01-0823    JS21 Blade         PowerPC_970MP
AIX 5200-08-CSP-0000   JS21 Blade         PowerPC_970MP
AIX 6100-01-01-0823    JS22 Blade         PowerPC_POWER6
AIX 6100-01-01-0823    JS22 Blade         PowerPC_POWER6
AIX 5300-07-00-0000    JS22 Blade         PowerPC_POWER6
AIX 6100-01-01-0823    JS22 Blade         PowerPC_POWER6
AIX 6100-01-01-0823    JS22 Blade         PowerPC_POWER6
AIX 6100-01-01-0823    JS22 Blade         PowerPC_POWER6
AIX 6100-01-01-0823    JS22 Blade         PowerPC_POWER6
AIX 5200-08-CSP-0000   eServer pSeries    PowerPC_POWER4
AIX 6100-01-01-0823    JS22 Blade         PowerPC_POWER6
AIX 5200-08-01-0000    eServer pSeries    PowerPC_POWER4
AIX 5200-09-04-0000    JS21 Blade         PowerPC_970FX
AIX 5300-04-03         JS21 Blade         PowerPC_970FX
AIX 5300-06-03-0732    eServer p5         PowerPC_POWER5
AIX 5300-04-03         eServer p5         PowerPC_POWER5
AIX 5300-04-CSP-0000   JS21 Blade         PowerPC_970FX

More Information:

When I run the command "ncheck -i /tmp" while there are no files in /tmp:

# pwd
/tmp
# ls -la
total 216
drwxrwsrwt 2 sys sys 102912 Nov 02 08:12 .
drwxr-xr-x 26 root system 4096 Nov 02 07:51 ..
#

ncheck responds with over 25,000 lines of files, which look like this:

664822 /bld_17450_aix_ppc-64/test/java/jre/bin/shared_test/copyright.o
664832 /bld_17450_aix_ppc-64/test/java/jre/bin/shared_test/jvmti/.
664808 /bld_17450_aix_ppc-64/test/java/jre/bin/shared_test/main.c
664809 /bld_17450_aix_ppc-64/test/java/jre/bin/shared_test/main.h
664842 /bld_17450_aix_ppc-64/test/java/jre/bin/shared_test/main.o
664810 /bld_17450_aix_ppc-64/test/java/jre/bin/shared_test/makefile

These are, in fact, files and directories that our test jobs created and then deleted.

Is my issue related to some feature of AIX that allows for file recovery?
If so, is there a way to disable this on /tmp?

Or is there a way to release these used i-nodes, without rebooting the system?

The files are being deleted even while a process is still holding them open.

If I am right, you are using rm -f (the force option) to delete the files.

To avoid the reboots, run:

fuser -dV /tmp

That will show you the list of deleted files still being held open by a process.

You can then get the corresponding process IDs and kill them.
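That cleanup step can be scripted. A sketch under two assumptions: "-d" is the AIX flag for listing deleted-but-held files (as used above), and fuser follows the usual convention of writing PIDs to stdout and descriptive text to stderr. Verify both on your release, and review the list before enabling the kill:

```shell
#!/bin/sh
# List (and optionally kill) processes still holding deleted files
# under /tmp. fuser writes PIDs to stdout and everything else to
# stderr, so redirecting stderr leaves a clean PID list.
pids=$(fuser -d /tmp 2>/dev/null)
for p in $pids; do
    ps -p "$p" -o pid= -o comm=   # show what we are about to kill
    # kill "$p"                   # uncomment once the list looks right
done
```

Running this from the end of the build jobs themselves would release the inodes as soon as each job finishes.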

Yes, you are correct: the jobs do issue "rm -rf" to clean out /tmp when they are done.

Unfortunately, I have already rebooted all of these systems today, so "fuser" returned nothing.

I would have to assume that running "fuser" before the reboots would have returned many processes still active.

Perhaps I'll have to write a script to terminate these processes, or tell the software developers to "fix" their code :slight_smile:

Thanks for the help!

When a file is created by a process, it is assigned an inode number, and as it is filled with content it gets disk space allocated.

It is possible to delete a file from one process while it is open (and being written to) by another. An "ls" or similar command will no longer show this file, yet the disk space (as well as the inode) occupied by the file remains occupied as long as the process is running. When the process is killed, the inode as well as the disk space is relinquished immediately.

In your case, tell the software developers that their scripting is bad and/or their software is even worse, because somewhere they must be opening files which they do not close. Not cleaning up, that is, releasing the resources you allocate, is as bad a behavior in software development as it is in housekeeping.
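The behavior described above can be demonstrated in a few portable shell lines, with nothing AIX-specific involved (the file name is just an example):

```shell
#!/bin/sh
f=/tmp/held.$$
exec 3>"$f"        # open the file for writing on descriptor 3
echo data >&3      # the inode now has content
rm -f "$f"         # unlink: the directory entry disappears...
ls "$f" 2>/dev/null || echo "not visible in the directory"
# ...but the inode and its data blocks stay allocated until the last
# open descriptor is closed (or the holding process exits):
exec 3>&-
```

Between the rm and the final exec, "ls /tmp" shows nothing, yet the inode counted by %Iused is still in use, which is exactly the pattern the build jobs are producing at scale.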

Historically speaking, this is one of the worse pranks you could play on your "favourite" systems administrator: create a file in /tmp (preferably named with a nonprinting character, "0x255" for instance) and write to it from an inconspicuously named background process. Then delete the file from the command line while the process is running. Wait until /tmp fills up and watch your sysadmin go nuts trying to find the cause, because /tmp seems to be empty and even the list of open file handles will (because of the nonprinting character) not reveal the culprit at first glance.

Ahh, I forgot: a reboot spoils the party, therefore do this on a production system where a restart is not so easy to manage.

bakunin

I found a workaround: running /usr/sbin/slibclean will clean up the system.

I also changed the access rights on this file so that a non-root user can issue the command.

-r-Sr-sr-x 1 root system 1860 Jun 21 2004 /usr/sbin/slibclean
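For reference, the mode string in that listing corresponds to octal 6455: setuid and setgid set, owner read-only (hence the capital S, setuid without owner execute), group read/execute with setgid, other read/execute. A sketch that reproduces it on a scratch file rather than touching the real binary:

```shell
#!/bin/sh
# Demonstrate the permission bits from the listing on a throwaway file.
demo=/tmp/slibclean-demo.$$
touch "$demo"
chmod 6455 "$demo"   # setuid + setgid + 455 -> -r-Sr-sr-x
ls -l "$demo"
rm -f "$demo"
```

Note that handing non-root users a setuid-root binary is a security trade-off worth reviewing; on AIX, granting the privilege through sudo or a role would be a narrower alternative.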

"The slibclean command unloads all object files with load and use counts of 0. It can also be used to remove object files that are no longer used from both the shared library region and in the shared library and kernel text regions by removing object files that are no longer required."

Thanks for the help!