Force unmount of a SWAP filesystem left over from bad Live Upgrade

Hello,

Long story short, we built an alternate boot env. back in July and applied the latest CPU to about 15 servers. Of the 15 servers, 7 servers failed to start the zones after the luactivate / reboot. The zones failed to rename from <zone>-<boot_env> back to <zone>. This is fixed in patch 121430-95.

Anyway, we got that mess fixed up (or so we thought), but now I'm trying to apply the Oct CPU and am running into an issue. I can build a new boot env. I can even run apply-prereq, but I cannot patch the BE.

It appears there is a zone/lu mount that is causing the problem.

# ./installpatchset -B Oct2017CPU --s10patchset
ERROR: Failed to determine zone configuration for target boot environment.

       Please ensure all LU boot environments are unmounted, and altroots other
       than the target altroot are unmounted prior to executing this script.

# df -h | grep -i swap | grep lu
swap                   309G     0K   309G     0%    /zoneroot/spahwfm1/lu

# umount /zoneroot/spahwfm1/lu
umount: /zoneroot/spahwfm1/lu busy

# fuser -cu /zoneroot/spahwfm1/lu
/zoneroot/spahwfm1/lu:

# umount -f /zoneroot/spahwfm1/lu
umount: Operation not supported
umount: cannot unmount /zoneroot/spahwfm1/lu

We rebooted one of the servers; the lu mount did not come back, and we could then complete the installcluster / installpatchset.

Is there any way for me to force an unmount of this zone lu SWAP?

I don't think I'll be able to request that we reboot 6 production servers just to patch them and reboot them again.

HELP!

EDIT: Two things I want to mention: no, /etc/zones/index (checked in the global zone and in the BE) does not list the global zone as configured instead of installed, and no, I am not missing zone packages. I know this is related to the zone lu SWAP.
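(For reference, the checks I mean were along these lines, run against both the running system's copy of the index and the BE's copy; the zone name is from our setup:)

# grep spahwfm1 /etc/zones/index
# zoneadm list -cv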

On Solaris 10 local zone swap space is inherited from the global zone. So the swap you are trying to manipulate should show up there.

This swap space is obviously still in use so you probably need to tell the system to stop using it (provided the system has enough swap space elsewhere to keep running).
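First, from the global zone, it's worth confirming what swap areas are configured and how much free swap you have elsewhere, e.g.:

# /usr/sbin/swap -l
# /usr/sbin/swap -s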

Tell the OS to stop using this swap:

# /usr/sbin/swap -d <path to swap>

Reference:
Removing a Swap File From Use - System Administration Guide: Devices and File Systems

What O/S version is it exactly?
What filesystem type, UFS or ZFS?

Thanks for the reply!

This is a Solaris 10 server with a ZFS root file system.

The swap is in the global zone; that is where I am seeing it. I checked the (single) zone running on the box and it doesn't appear to be using that swap volume.

This swap is not listed by the swap command.

From the global zone:

# df -h | grep -i swap
swap                   308G   1.8M   308G     1%    /etc/svc/volatile
swap                   308G   1.1M   308G     1%    /tmp
swap                   308G   112K   308G     1%    /var/run
swap                   308G     0K   308G     0%    /zoneroot/spahwfm1/lu

# swap -l
swapfile             dev  swaplo blocks   free
/dev/zvol/dsk/rpool/swap 256,1      16 268435440 268435440

# swap -d /zoneroot/spahwfm1/lu
/zoneroot/spahwfm1/lu: Is a directory

From the zone:

# swap -l
swapfile             dev  swaplo blocks   free
/dev/swap           4294967295,4294967295     16 33554432 31264976

# df -h | grep -i swap
swap                    16G   1.1G    15G     7%    /etc/svc/volatile
swap                    16G   1.1G    15G     7%    /tmp
swap                    16G   1.1G    15G     7%    /var/run

Hmmmmm...............

You posted that swap -d reports "/zoneroot/spahwfm1/lu: Is a directory".

Is it really a directory? Can you check that out? If it is, is there anything in it?

Yes, it is an empty directory:

[root@spahwfm1gz:/zoneroot/spahwfm1] # ls -lh
total 46
drwxr-xr-x  14 root     sys           55 Oct 25 18:30 dev
drwxr-xr-x   2 root     root         117 Aug 24 14:12 lu
drwxr-xr-x  50 root     root          59 Oct 26 09:18 root
[root@spahwfm1gz:/zoneroot/spahwfm1] # cd lu
[root@spahwfm1gz:/zoneroot/spahwfm1/lu] # ls -la
total 19
drwxr-xr-x   2 root     root         117 Aug 24 14:12 .
drwx------   5 root     root           5 May  5 20:30 ..
[root@spahwfm1gz:/zoneroot/spahwfm1/lu] # du -hs
   8K   .
[root@spahwfm1gz:/zoneroot/spahwfm1/lu] #

I also cannot remove it; I was hoping to then be able to brute-force an umount.

[root@spahwfm1gz:/zoneroot/spahwfm1/lu] # cd ..
[root@spahwfm1gz:/zoneroot/spahwfm1] # rmdir lu
rmdir: directory "lu": Directory is a mount point or in use
[root@spahwfm1gz:/zoneroot/spahwfm1] # rm -Rf lu/
rm: Unable to remove directory lu/: Device busy

From the global zone, can you try:

# /usr/sbin/swap -d /zoneroot/spahwfm1/lu

using the full path of the swap command, in case you've got another swap command in your path.

Just covering all options. Still thinking about this one.

Thanks! No luck, unfortunately.

[root@spahwfm1gz:/] # /usr/sbin/swap -d /zoneroot/spahwfm1/lu
/zoneroot/spahwfm1/lu: Is a directory

Can you go into that directory /zoneroot/spahwfm1/lu again and use some other 'ls' switches like '-b' (unprintable character display) and '-i' (print inode numbers) to check again whether there's a hidden file in there.

Global says it's swapping to somewhere in there.

Also, is this swap area listed in /etc/vfstab to mount at boot time?

It's not in vfstab, and I see nothing in the folder. What is frustrating is that I know this 'lu' swap mount will be gone on reboot, and then installpatchset will work properly. This is a leftover remnant of a bad Live Upgrade.

[root@spahwfm1gz:/] # cd /zoneroot/spahwfm1/lu
[root@spahwfm1gz:/zoneroot/spahwfm1/lu] # ls -b
[root@spahwfm1gz:/zoneroot/spahwfm1/lu] # ls -i
[root@spahwfm1gz:/zoneroot/spahwfm1/lu] # cat /etc/vfstab | grep -i swap
/dev/zvol/dsk/rpool/swap        -       -       swap    -       no      -
swap    -       /tmp    tmpfs   -       yes     -
[root@spahwfm1gz:/zoneroot/spahwfm1/lu] #

I've been messing about with this here on my own system and trying to work this out as we go along!

It appears that swap files created with 'mkfile' (for example) are not supported on a ZFS filesystem. That, I guess, means this swap area comes from a disk pool. Can you list what pools you have and see if you can find it? Just for good measure, try listing the pools from within the local zone too (although that shouldn't work).
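Something like this from the global zone should show the pools and any ZFS volumes being used for swap (rpool/swap is just the usual default name; yours may differ):

# zpool list
# zfs list -t volume
# zfs get volsize rpool/swap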

OK, here is the "zfs list" output before and after an ludelete of the new BE (not that it should make a difference, except to simplify the output):

[root@spahwfm1gz:/] # zfs list
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
rpool                                    173G   375G   106K  /rpool
rpool/ROOT                              25.4G   375G    31K  legacy
rpool/ROOT/April2017CPU                 25.4G   375G  22.7G  /
rpool/ROOT/April2017CPU@Oct2017CPU      2.32M      -  22.7G  -
rpool/ROOT/April2017CPU/var             2.62G   375G  2.59G  /var
rpool/ROOT/April2017CPU/var@Oct2017CPU  30.5M      -  2.59G  -
rpool/ROOT/Oct2017CPU                   6.10M   375G  22.7G  /
rpool/ROOT/Oct2017CPU/var               5.84M   375G  2.59G  /var
rpool/dump                              4.51G   375G  4.50G  -
rpool/export                              68K   375G    32K  /export
rpool/export/home                         36K   375G    36K  /export/home
rpool/swap                               132G   379G   128G  -
rpool/zoneroot                          10.9G   375G    32K  /zoneroot
rpool/zoneroot/spahwfm1                 10.9G   375G  10.8G  /zoneroot/spahwfm1
rpool/zoneroot/spahwfm1@Oct2017CPU      24.3M      -  10.8G  -
rpool/zoneroot/spahwfm1-Oct2017CPU       213K   375G  10.8G  /zoneroot/spahwfm1-Oct2017CPU

[root@spahwfm1gz:/] # ludelete Oct2017CPU
WARNING: Deleting ZFS dataset <rpool/ROOT/Oct2017CPU>.
WARNING: Deleting ZFS dataset <rpool/ROOT/Oct2017CPU/var>.
WARNING: Deleting ZFS dataset <rpool/zoneroot/spahwfm1-Oct2017CPU>.
Updating boot environment configuration database.
Updating boot environment description database on all BEs.
Updating all boot environment configuration databases.

[root@spahwfm1gz:/] # zfs list
NAME                          USED  AVAIL  REFER  MOUNTPOINT
rpool                         173G   375G   106K  /rpool
rpool/ROOT                   25.3G   375G    31K  legacy
rpool/ROOT/April2017CPU      25.3G   375G  22.7G  /
rpool/ROOT/April2017CPU/var  2.59G   375G  2.59G  /var
rpool/dump                   4.51G   375G  4.50G  -
rpool/export                   68K   375G    32K  /export
rpool/export/home              36K   375G    36K  /export/home
rpool/swap                    132G   379G   128G  -
rpool/zoneroot               10.8G   375G    32K  /zoneroot
rpool/zoneroot/spahwfm1      10.8G   375G  10.8G  /zoneroot/spahwfm1

And in the zone

[root@spahwfm1gz:/] # zlogin spahwfm1
[Connected to zone 'spahwfm1' pts/1]
spahwfm1 # zfs list
no datasets available

Here is the mount -v output for swap from the global zone:

[root@spahwfm1gz:/] # mount -v | grep -i swap
swap on /etc/svc/volatile type tmpfs read/write/setuid/devices/rstchown/xattr/dev=5cc0001 on Tue Jul 25 05:40:41 2017
swap on /tmp type tmpfs read/write/setuid/devices/rstchown/xattr/dev=5cc0002 on Tue Jul 25 05:41:11 2017
swap on /var/run type tmpfs read/write/setuid/devices/rstchown/xattr/dev=5cc0003 on Tue Jul 25 05:41:11 2017
swap on /zoneroot/spahwfm1/lu type tmpfs read/write/setuid/devices/rstchown/xattr/dev=5cc000a on Tue Jul 25 05:42:07 2017
swap on /zoneroot/spahwfm1/root/etc/svc/volatile type tmpfs read/write/setuid/nodevices/rstchown/xattr/zone=spahwfm1/dev=5cc000b on Tue Jul 25 06:46:53 2017
swap on /zoneroot/spahwfm1/root/tmp type tmpfs read/write/setuid/nodevices/rstchown/xattr/zone=spahwfm1/dev=5cc000c on Tue Jul 25 06:46:56 2017
swap on /zoneroot/spahwfm1/root/var/run type tmpfs read/write/setuid/nodevices/rstchown/xattr/zone=spahwfm1/dev=5cc000d on Tue Jul 25 06:46:56 2017

I'm puzzled by this one.

Yes, I'm puzzled too.

Does anything happen if you:

# swap -d <each of the above>

from the global zone. Clutching at straws now!

Nope

[root@spahwfm1gz:/] # swap -d /zoneroot/spahwfm1/root/var/run
/zoneroot/spahwfm1/root/var/run: Is a directory
[root@spahwfm1gz:/] # swap -d /zoneroot/spahwfm1/root/tmp
/zoneroot/spahwfm1/root/tmp: Is a directory
[root@spahwfm1gz:/] # swap -d /zoneroot/spahwfm1/root/etc/svc/volatile
/zoneroot/spahwfm1/root/etc/svc/volatile: Is a directory
[root@spahwfm1gz:/] # swap -d /zoneroot/spahwfm1/lu
/zoneroot/spahwfm1/lu: Is a directory

It seems like each tmpfs mount is identified by a unique dev= value.

For that lu mount it is dev=5cc000a. I'm wondering if there is a way to use that to unmount the device?
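(For what it's worth, that dev= value maps back to the entry in /etc/mnttab, so it at least confirms which mount it belongs to, but I don't see a supported way to unmount by device number:)

# grep dev=5cc000a /etc/mnttab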

I doubt you can use that to umount the device.

We know that during Live Upgrade the current BE and the new BE share the same swap, so, in a crash situation, it could be either BE holding the swap open and preventing deletion. It seems that you have to 'back out' the crashed procedure, but that's not simple.

I'm trying to get a handle on that.

This is where I'm at with my reading right now:
Getting Rid of Pesky Live Upgrade Boot Environments | Oracle Solaris Tips and Tricks Blog

Some of the directories, e.g. /var/run and /tmp, look familiar.
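One more thing that might be worth a look (just a sketch; these are the standard LU control files, assuming they still exist on your system): check whether the old LU configuration still references that zone or mount point, e.g.:

# cat /etc/lutab
# ls /etc/lu
# grep -l spahwfm1 /etc/lu/ICF.* 2>/dev/null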

I don't think there is any going back, unfortunately. When the problem occurred we had to rename ZFS filesystems and change their mountpoint and canmount options. Finally we had to promote the zoneroot filesystem. So we ended up removing /etc/lutab, /etc/lu/ICF.*, /etc/lu/INODE.*, and /etc/lu/vtoc.*.

Basically, the only BE on the system (until I recently created one for the Oct patches) is the currently running one.

I've inventoried the servers today; of the 7 that had the Live Upgrade "problem", only 4 have this stale mount. I've asked the customer to reboot those servers, which will remove the mount point and allow me to proceed.

I just feel like there must be a way without rebooting to accomplish this...

I just wanted to post a follow-up. The customer rebooted the 4 servers today, and the zone 'lu' swap mount is now gone. I can now apply the patch set. I still feel like there must have been a way to accomplish this without a reboot, though. Thanks!