File system full, but not really.

Hey all,

What do you think mostly happened in the following situation?

I have a Red Hat 5.5 server. Someone, somehow, managed to create two .nfs000.... files that together totaled over a terabyte. I removed them and thought things were back to normal. Then users started complaining, via a desktop popup they have, that /home was full. I ssh'ed to the server and ran df, and sure enough it reported 100% in use. But du reported only 109 GB in use, on a 1.3 TB filesystem. I cleaned up a few things and monitored usage with df, but the freed space disappeared very quickly and df again reported 100%.

So I rebooted. After the reboot, df reported only 20% in use, and du agreed. During the reboot I noticed the hard drive lights flashing wildly, which suggested to me that the RAID 5 array was rebuilding itself or running some sort of consistency check. The system took a while to come up, and I was getting pdflush timeouts before nash actually started.

Those .nfs files obviously filled up the filesystem, but why wasn't the system able to get its disk space back after I removed them?

Thanks.

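Removing a file's name (its directory entry, which points to the inode) doesn't mean the file is really gone. As long as a process still has an open handle to that file, it can keep reading from and writing to it. The space is reclaimed only when the last open handle is closed. That is also why the reboot fixed things: shutting down closed every open handle, so the kernel could finally free the space.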
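A minimal Python sketch of this behaviour, assuming Linux and a local (non-NFS) temp directory: it creates a file, unlinks it while the descriptor is still open, and shows that the descriptor still works. The function name `demo_unlinked_file_stays_usable` is hypothetical, made up for illustration. (On an NFS client the kernel instead renames a file that is unlinked while open to a hidden .nfsXXXX name until it is closed, so the link count would not drop to 0 there.)

```python
import os
import tempfile

def demo_unlinked_file_stays_usable(payload: bytes = b"still here"):
    """Unlink a file while it is open and show the handle keeps working.

    Returns (link_count_after_unlink, bytes_read_back)."""
    fd, path = tempfile.mkstemp()           # create a scratch file and keep it open
    try:
        os.unlink(path)                     # remove its only name (directory entry)
        os.write(fd, payload)               # writing through the open handle still works
        link_count = os.fstat(fd).st_nlink  # 0: no name points at the inode any more
        os.lseek(fd, 0, os.SEEK_SET)        # rewind to the start of the file
        data = os.read(fd, len(payload))    # the data is still there
        return link_count, data
    finally:
        os.close(fd)  # last handle closed: only now may the kernel free the blocks

if __name__ == "__main__":
    print(demo_unlinked_file_stays_usable())
```

Between `os.unlink` and `os.close`, the file's blocks still count as used even though no path leads to it.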

This also explains the difference between df and du. df asks the filesystem itself (through the kernel) how many blocks and inodes are used and free, while du walks the directory tree and adds up the sizes of the files it can find by name. If a file has been removed but is still held open, its space is still marked as used in the filesystem (which df sees), but du can no longer find it.
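To make the df/du distinction concrete, here is a rough Python approximation of each side, assuming Linux, where `st_blocks` is counted in 512-byte units. `du_bytes` and `df_used_bytes` are hypothetical helper names; the real du also counts directories and supports many more options, so treat this as a sketch rather than a reimplementation.

```python
import os

def du_bytes(root: str) -> int:
    """Rough du: walk the tree and sum the blocks of every non-directory entry
    reachable by name. Hard links are counted once; directories are skipped."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished while we were walking
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for an inode we already counted
            seen.add(key)
            total += st.st_blocks * 512  # Linux reports st_blocks in 512-byte units
    return total

def df_used_bytes(path: str) -> int:
    """Rough df: ask the filesystem containing `path` how many blocks are in use,
    whether or not any directory entry still points at them."""
    sv = os.statvfs(path)
    return (sv.f_blocks - sv.f_bfree) * sv.f_frsize
```

Run both against the same mount point (for example /home). A deleted-but-open file still counts toward `df_used_bytes` but drops out of `du_bytes`, which is the kind of gap described in the question.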

Next time, use lsof or fuser (both available on Linux) to find out which process and user still have the file open.
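On Linux, `lsof +L1` lists open files whose link count is below one, i.e. files that have been deleted but are still open. The sketch below shows roughly where that information comes from, assuming Linux procfs: each /proc/<pid>/fd entry is a symlink, and the kernel appends " (deleted)" to the target of a descriptor whose file has been unlinked. `find_deleted_open_files` and `DeletedOpenFile` are hypothetical names; run it as root to see every user's processes.

```python
import os
from typing import List, NamedTuple

# Linux appends this marker when you readlink() an fd whose file was unlinked.
DELETED_SUFFIX = " (deleted)"

class DeletedOpenFile(NamedTuple):
    pid: int     # process still holding the file open
    uid: int     # owner of that process
    fd: int      # descriptor number inside the process
    size: int    # bytes still held by the unlinked file
    target: str  # original path plus the " (deleted)" marker

def find_deleted_open_files(proc: str = "/proc") -> List[DeletedOpenFile]:
    """Scan /proc/<pid>/fd for descriptors pointing at unlinked files.
    Linux only; without root you only see processes you may inspect."""
    hits = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid_dir = os.path.join(proc, entry)
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            uid = os.stat(pid_dir).st_uid
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # permission denied, or the process already exited
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(DELETED_SUFFIX):
                    continue
                # stat() follows the /proc link to the unlinked inode itself
                size = os.stat(link).st_size
            except OSError:
                continue  # descriptor closed while we were looking
            hits.append(DeletedOpenFile(int(entry), uid, int(fd), size, target))
    return hits

if __name__ == "__main__":
    for h in find_deleted_open_files():
        print(f"pid={h.pid} uid={h.uid} fd={h.fd} size={h.size} {h.target}")
```

To turn a uid into a login name, `pwd.getpwuid(uid).pw_name` from the standard library works for users known to the system.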

Thanks, Pludi, that sounds very reasonable. If it happens again I'll let you know so we can confirm your theory. The user who owned the file had no processes running, and I wasn't able to trace the PID and PPID associations back to anything that caught my eye. Thanks for reminding me of lsof and fuser. I've heard of those commands but, unfortunately, haven't gotten into the habit of using them. My bad. I'll start today.