ZFS does not release space even after deleting application log files in a non-global zone

Hi Guys,

I have a non-global zone that runs an Apache application. There is a ZFS filesystem where the app saves its logs. Even after deleting the log files I don't see the space being freed up. There are no snapshots or anything at all.

Zpool info
NAME      SIZE  ALLOC  FREE  CAP  HEALTH  ALTROOT
adpl203  9.94G  9.75G  190M  98%  ONLINE  -

ZFS info
NAME                    USED  AVAIL  REFER  MOUNTPOINT
adpl203                9.75G  31.4M    31K  /adpl203
adpl203/data           9.75G  31.4M    31K  /adpl203/data
adpl203/data/ajb_home  9.75G  31.4M  9.75G  /export/home/ajb

zfs list -o space
NAME                   AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
adpl203                29.7M  9.75G         0     31K              0      9.75G
adpl203/data           29.7M  9.75G         0     31K              0      9.75G
adpl203/data/ajb_home  29.7M  9.75G         0   9.75G              0          0

df -h

adpl203/data           9.8G   31K  31M    1%  /adpl203/data
adpl203/data/ajb_home  9.8G  9.8G  31M  100%  /export/home/ajb

Can anyone please help me find a solution!

The log files are likely still open by Apache, so even though they are unlinked, their data is still stored on the file system. This would happen with file systems other than ZFS too. You should have blanked their content before removing them, as with:

> apache.log
rm apache.log

If that's the case, bouncing Apache should free the space.
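The effect is easy to reproduce in a shell. In the sketch below (scratch file, not the actual Apache log), the shell itself plays the role of the daemon by holding a descriptor open; the space only comes back once the descriptor is closed. The /proc/<pid>/fd layout is available on both Solaris and Linux:

```shell
f=$(mktemp)                                           # stand-in for the log file
exec 3>>"$f"                                          # the shell now holds it open, like a daemon
dd if=/dev/zero bs=1024 count=1024 >&3 2>/dev/null    # write ~1 MiB of "log" data
rm "$f"                                               # unlink: the name is gone...
ls -l /proc/$$/fd/3                                   # ...but the descriptor still references the data
exec 3>&-                                             # only now is the space actually released
```

Bouncing Apache is just a heavyweight way of doing that last step for every descriptor the daemon holds.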

Thanks for responding @jlliagre and @radoulov (bouncing Apache, or unmounting and remounting the ZFS filesystem, are last-resort options for me)...

I usually truncate the file after making a copy, but someone before me deleted the file directly. Now, when I do du -skh /export/home/ajb it shows around 70M, but df and zfs list show about 9.7G, which is 100%. I have seen this discrepancy between du and df on UFS, for which I use the following solution.
Sorry, it won't let me post a link to the source; to find it, please google for
Discrepancies Between df And du Outputs

1. Find the file which has been unlinked, through the procfs interface:

# find /proc/*/fd \( -type f -a ! -size 0 -a -links 0 \) -print | xargs \ls -li
 415975 --w-------   0 user  group  2125803025 Oct 15 23:59 /proc/1252/fd/3


2. Note the PID of the process holding it, taken from the /proc path (here, 1252), to get more detail about it.

3. Check to see if you can understand what the content of the unlinked file is:
# tail /proc/1252/fd/3
-------------------------------------------------------------------------------
2008-10-15 23:59:32.002116 - [MSG] BBG_Transmitter_class.cc, line 792 (thread 25087:4)
[4060] Sent a heartbeat
-------------------------------------------------------------------------------
BBG_Transmitter_class.cc: [4111] No activity detected. Send a Heartbeat message
-------------------------------------------------------------------------------
2008-10-15 23:59:32.134829 - [MSG] BBG_Transmitter_class.cc, line 1138 (thread 25087:4)
[4065] Heartbeat acknowledged by Bloomberg

4. You can correlate the size of the removed, but still referenced, file with the space accounted for by the du and df tools:
# df -k /path/to
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d5       6017990 5874616   83219    99%    /path/to
# du -sk /path/to
3791632 /path/to
# echo "(5874616-3791632)*1024" | bc
2132975616

5. So we have now found the ~2GB log file that is still held open (used) by a process. There are two solutions to get the space back:
- Truncate the unlinked file (quick workaround).
- Simply restart the corresponding program properly (better option).
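The quick workaround can be sketched as follows on systems where /proc/<pid>/fd exposes the open file (Linux does; Solaris has the same layout, though shell redirection onto those entries may behave differently there). The shell below stands in for the daemon; against a real daemon you would substitute the PID and fd number found in step 1, e.g. /proc/1252/fd/3:

```shell
f=$(mktemp)               # stand-in for the deleted log file
exec 3>>"$f"              # descriptor the "daemon" still holds (append mode)
printf 'old log data\n' >&3
rm "$f"                   # the file is unlinked but its space is still in use
: > "/proc/$$/fd/3"       # truncate it through procfs; for a real daemon use
                          # its PID/fd, e.g. /proc/1252/fd/3
wc -c < "/proc/$$/fd/3"   # prints 0: the space has been released
exec 3>&-
```

Because the descriptor here was opened in append mode, the writer keeps working normally after the truncation; as the post below explains, a non-append descriptor can undo the gain.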

But I don't know if that'll work for a ZFS filesystem... Do you know of a procedure like the one above where I can truncate the unlinked files which are holding up the space?

Well, as always - it depends, but a graceful restart of Apache usually has a limited impact.

Truncating the unlinked file will certainly work with ZFS too.

Thanks for the input guys, we went the safest route by bouncing Apache and it cleared the issue. Regarding truncating the unlinked file, I will keep that in mind and try it next time.

Truncating an open file is not guaranteed to work. The mode of the file descriptor matters, and so does the underlying file system's ability to support sparse files. What matters most in this case is whether the file was opened with the O_APPEND flag. If it wasn't, then the next time the process that has the file open writes to it, the data goes to whatever the current file offset is for that open descriptor. If the underlying file system doesn't support sparse files and that offset is 6 GB into the file, the file you just truncated is suddenly 6 GB in size again.

In short, if the file is created like this:

daemon.process >> log.file

You should be OK if you truncate that log file.

If it's

daemon.process > log.file

You can't be sure that truncating the log file will stick.

This isn't really a simple issue - the specific OS matters, the specific file system matters, the specific flags set on the open file descriptor matter, and the actions the process performs on the open file descriptor matter.

Try it:

while :; do date; sleep 1; done >> /some/file &

Let the file grow to a good size, then

cp /dev/null /some/file

The truncation sticks, because the writer's descriptor has O_APPEND set and it keeps writing at the new end of the file. Now kill the writer, restart it with

while :; do date; sleep 1; done > /some/file &

and truncate again: ls reports the old size as soon as the writer's next write lands at its saved offset.
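The same experiment can be run deterministically in a single shell, using the shell's own descriptors on a scratch file:

```shell
f=$(mktemp)

exec 3>>"$f"          # like 'daemon >> log.file': O_APPEND is set
printf 'aaaa\n' >&3
: > "$f"              # truncate while the descriptor stays open
printf 'bb\n' >&3     # O_APPEND: the write lands at the new end of file
wc -c < "$f"          # prints 3: the truncation stuck
exec 3>&-

exec 4>"$f"           # like 'daemon > log.file': no O_APPEND
printf 'aaaa\n' >&4   # the descriptor's offset is now 5
: > "$f"              # truncate again
printf 'bb\n' >&4     # the write resumes at offset 5...
wc -c < "$f"          # prints 8: the file grew back, with a 5-byte hole
exec 4>&-
rm "$f"
```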

Truncating the file is guaranteed to work, i.e. to release the disk space, in the OP's context: the question clearly mentions that the underlying file system is ZFS. In any case, all file systems supported on Solaris support sparse files, with the exception of FAT32 (pcfs).
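For reference, a sparse file is easy to produce and to recognize by comparing the apparent size (ls -l) with the blocks actually allocated (du). A sketch, assuming GNU dd's behavior of extending the output file to the seek offset:

```shell
f=$(mktemp)
# Seek 10 MiB into the file without writing any data: the result is a
# 10 MiB hole, not 10 MiB of allocated blocks.
dd if=/dev/null of="$f" bs=1 seek=10485760 2>/dev/null
ls -l "$f"    # apparent size: 10485760 bytes
du -k "$f"    # allocated space: (close to) zero
rm "$f"
```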