The immortal aioserver

When I shut down an Oracle server, I see this error message at exit:

umount /oracle/
umount: error unmounting /dev/oracle: Device busy

lsof and fuser report nothing, but

ps aux|grep oracle

reports this:

oracle    5964026  0,0  0,0  448  448      - A      apr 21  0:00 aioserver
oracle   10289224  0,0  0,0  448  448      - A    19:09:27  0:00 aioserver
oracle   11075692  0,0  0,0  448  448      - A    19:09:27  0:00 aioserver
oracle   11468902  0,0  0,0  448  448      - A    19:09:27  0:00 aioserver
oracle   13631648  0,0  0,0  448  448      - A    19:09:27  0:00 aioserver
oracle    3604680  0,0  0,0  448  448      - A    19:09:27  0:00 aioserver

I tried to kill them with kill -15 and also kill -9, but they are still alive.

Using pstree, I see this:

 |--= 1966178 root aioPpool
 |--= 2228302 root aioLpool
 |--= 3604680 root aioserver
 |--= 5964026 root aioserver
 |--= 10289224 root aioserver
 |--= 11075692 root aioserver
 |--= 11468902 root aioserver
 \--= 13631648 root aioserver

The question is: how can I kill those processes so that I can unmount /oracle?
Thanks

Why do you think that these processes use /oracle?

Is ASM in play here? Although you have shut down your user database instances, is ASM perhaps still running? Whatever it is, there should be an equivalent stop for the start that was issued.

What do Oracle say? You are surely paying them for the software, so make them earn the money :wink:

Robin

If fuser does not show what is accessing the FS, maybe use lsof to see what keeps its hands on it.

I suppose I should check that you are using the correct flags for fuser: without any, it tests only the file or directory you name, so processes using files in subdirectories will not be seen.

Try fuser -c /oracle to see if that gives more information.

What OS and version are you using? The flags can vary between them.

Robin

The -c option might find what fuser alone does not find.

fuser /oracle
fuser -c /oracle

Of course, lsof examines and finds the most.
--
The aioserver processes might be kernel threads, which cannot be killed.
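The checks suggested above (fuser without flags, fuser -c, and lsof +D) can be run in one go against a single mount point. A minimal sketch, assuming fuser and lsof are in PATH; `check_mount_users` is a hypothetical helper name, and a missing tool is reported rather than treated as fatal. Keep in mind that neither tool can see something like a kernel extension holding the filesystem.

```shell
#!/bin/sh
# Run the fuser/lsof checks against one mount point, labelling each section.
# Assumes fuser and lsof are in PATH; a missing tool is reported, not fatal.
# check_mount_users is a hypothetical helper name, not a system command.
check_mount_users() {
    mp=$1
    echo "== fuser $mp =="
    if command -v fuser >/dev/null 2>&1; then
        fuser "$mp" 2>&1 || true        # only the named file or directory itself
        echo "== fuser -c $mp =="
        fuser -c "$mp" 2>&1 || true     # any process using a file on that filesystem
    else
        echo "fuser: not installed"
    fi
    echo "== lsof +D $mp =="
    if command -v lsof >/dev/null 2>&1; then
        lsof +D "$mp" 2>&1 || true      # recursive scan of the directory tree
    else
        echo "lsof: not installed"
    fi
}

# Usage: check_mount_users /oracle
```

The `|| true` guards are there because both tools exit non-zero when they find nothing, which is exactly the case reported in this thread.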

All these processes belong to user oracle, whose home directory is /oracle.


lsof reports nothing.

That is a wrong assumption. These are kernel processes: they are created and killed by the kernel.

Can you show the output of the df and mount commands?

df output

/dev/hd4        1,3G  218M  1,1G  18% /
/dev/hd2        9,3G  6,5G  2,8G  70% /usr
/dev/hd9var     3,4G  417M  3,0G  13% /var
/dev/hd3        6,7G   23M  6,7G   1% /tmp
/dev/fwdump     128M  348K  128M   1% /var/adm/ras/platform
/dev/hd1        8,2G   27M  8,1G   1% /home
/dev/hd11admin  128M  380K  128M   1% /admin
/proc              -     -     0    - /proc
/dev/hd10opt    4,3G  2,0G  2,3G  46% /opt
/dev/livedump   256M  368K  256M   1% /var/adm/ras/livedump
/dev/hd12data    54G  8,3G   46G  16% /var/data
/dev/oracle      50G   14G   37G  27% /oracle
/dev/rpmrepo     50G  1,6G   49G   4% /var/rpmrepo


mount

 node       mounted        mounted over    vfs       date        options      
-------- ---------------  ---------------  ------ ------------ --------------- 
         /dev/hd4         /                jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/hd2         /usr             jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/hd9var      /var             jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/hd3         /tmp             jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/fwdump      /var/adm/ras/platform jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/hd1         /home            jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/hd11admin   /admin           jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /proc            /proc            procfs ott 21 15:30 rw              
         /dev/hd10opt     /opt             jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/livedump    /var/adm/ras/livedump jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/hd12data    /var/data        jfs2   ott 21 15:30 rw,log=/dev/hd8 
         /dev/oracle      /oracle          jfs2   ott 21 15:30 rw,log=/dev/loglv00
         /dev/rpmrepo     /var/rpmrepo     jfs2   ott 21 15:30 rw,log=/dev/loglv00
         /etc/auto_nfs    /media/nfs       autofs ott 21 15:31 ignore 

As far as I can see, it looks OK. If you're sure, you can try umount -f /oracle.

I have seen something strange.
One minute ago:

oracle    8454150  0,0  0,0  448  448      - A    15:32:25  0:00 aioserver
oracle    9306142  0,0  0,0  448  448      - A    15:32:24  0:00 aioserver
oracle    8323106  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle    7667764  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle    8126490  0,0  0,0  448  448      - A    15:32:24  0:00 aioserver
oracle    9764906  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle   10223680  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   10158154  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   10944596  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   10879078  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   10092596  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle    9895982  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle    9830444  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle   10027058  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle    9961520  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle    7602428  0,0  0,0  448  448      - A    15:32:25  0:00 aioserver
oracle   13041806  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12976268  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13107344  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13238420  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13172882  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12714116  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12648578  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12779654  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12910730  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12845192  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13697186  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13631648  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13762724  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13893800  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13828262  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13369496  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13303958  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13435034  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13566110  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   13500572  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12583040  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11337826  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11468900  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11599976  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11534438  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11075670  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11010136  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11141208  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11272284  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11206746  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12517502  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12452020  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11731056  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11665514  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11796610  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
root      2359382  0,0  0,0  448  448      - A    15:30:44  0:00 aioLpool
root      2424936  0,0  0,0  448  448      - A    15:30:45  0:00 aioPpool
oracle    4784350  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle    5439488  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver

Now

oracle    4784350  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   11206746  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle    9895982  0,0  0,0  448  448      - A    15:32:34  0:00 aioserver
oracle   12517502  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12910730  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
oracle   12976268  0,0  0,0  448  448      - A    15:37:29  0:00 aioserver
root      2424936  0,0  0,0  448  448      - A    15:30:45  0:00 aioPpool
root      2359382  0,0  0,0  448  448      - A    15:30:44  0:00 aioLpool

The immortal processes are these 8; they never die.
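Comparing snapshots like the two above by eye is error-prone. A minimal sketch that counts the aio kernel processes by name in `ps aux` output, assuming (as in the listings above) that the command name is the last field on each line; `count_aio` is a hypothetical helper name. Fed the second snapshot, it reports six aioserver processes plus one aioLpool and one aioPpool, i.e. the eight that never die.

```shell
#!/bin/sh
# Count aio kernel processes per command name in `ps aux` output read from stdin.
# Assumes the command name is the last whitespace-separated field on each line.
# count_aio is a hypothetical helper, not a system command.
count_aio() {
    awk '$NF ~ /^aio(server|Lpool|Ppool)$/ { n[$NF]++; total++ }
         END { for (c in n) print c, n[c]; print "total", total + 0 }' | sort
}

# Usage on a live system:   ps aux | count_aio
# Usage on a saved listing: count_aio < snapshot.txt
```

Matching the exact command name avoids the usual `grep oracle` pitfalls (the grep process itself, or unrelated oracle processes, showing up in the result).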

Please issue lsof +D /oracle and show the output in code tags, thanks.

And do not try to kill those aioservers. Once started, they stay active even if they are not needed. They have a starting amount and a maximum amount: if the system needs more of them than the starting amount, it spawns new ones until the maximum is reached but, as said, they stay in the process list and do not terminate. If they are not needed, they just sit idle and do not really cost any performance.
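The "starting amount" and "max amount" behaviour described above can be illustrated with a toy model. The two ps snapshots earlier in the thread show the pool shrinking from more than fifty aioservers back to eight, so this sketch also assumes an idle timeout after which the extra servers fall back to the floor (the thread mentions an aio_server_inactivity tunable). It is an illustration under those assumptions, not the AIX kernel's actual algorithm; `simulate_pool` and its arguments are hypothetical.

```shell
#!/bin/sh
# Toy model of a server pool with a floor, a ceiling and an idle timeout.
# Illustration only: this is NOT the AIX aioserver algorithm, and the
# arguments are hypothetical stand-ins for min/max servers and inactivity.
# Usage: simulate_pool MIN MAX IDLE_TICKS < demand
#   demand: one line per tick, "<tick> <servers_needed>"
simulate_pool() {
    awk -v min="$1" -v max="$2" -v idle="$3" '
        BEGIN { min += 0; max += 0; idle += 0; size = min; last_peak = 0 }
        {
            need = $2 + 0
            if (need > size)              # demand exceeds pool: spawn, capped at max
                size = (need < max) ? need : max
            if (need > min)               # remember the last tick that needed extras
                last_peak = $1
            if ($1 - last_peak >= idle)   # extras idle long enough: back to the floor
                size = min
            print $1, size
        }'
}

# Example: printf '0 2\n1 53\n2 0\n' | simulate_pool 8 60 5
```

A single pool-wide idle timer keeps the model short; a real implementation would more likely track idleness per server.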

If you want to see them dead, try changing some system parameters :wink:

$ ioo -L | grep posix_aio
posix_aio_active          0             0                    boolean           S
posix_aio_maxreqs         128K   128K   128K   4K     1M     numeric           D
posix_aio_maxservers      30     30     30     1      20000  numeric           D
posix_aio_minservers      3      3      3      0      20000  numeric           D
posix_aio_server_inactivity
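When experimenting with these tunables, it helps to read back the current value before and after a change (for example one made with `ioo -o tunable=value`). A minimal sketch that extracts the CUR column for one tunable, assuming the `ioo -L` column layout shown above (name first, current value second); `tunable_cur` is a hypothetical helper name. A line with no values, like the truncated posix_aio_server_inactivity line above, yields no output and a non-zero exit status.

```shell
#!/bin/sh
# Print the current (CUR) value of one tunable from `ioo -L` output on stdin.
# Assumes the layout shown above: NAME CUR DEF BOOT MIN MAX UNIT TYPE ...
# tunable_cur is a hypothetical helper, not an AIX command.
tunable_cur() {
    awk -v name="$1" '$1 == name && NF > 1 { print $2; found = 1 }
                      END { exit found ? 0 : 1 }'
}

# Usage: ioo -L | tunable_cur posix_aio_minservers
```

For the listing above, `posix_aio_minservers` gives 3 and `posix_aio_maxservers` gives 30.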

I have already played with ioo: as a test, I set both posix_aio_server_inactivity and aio_server_inactivity to 1, but nothing changed.


lsof +D reports nothing.

While this is very true (in fact, "aio" stands for "Asynchronous I/O" and the processes are controlled by tuning parameters), it is most probably a bad idea to do so on a database system. If memory serves, Oracle always asked for asynchronous I/O to be switched on during the installation, and the performance of the db-writer process suffered greatly when it was switched off.

Anyway, the "aioserver" processes are definitely not responsible for preventing the unmount of the filesystem, so killing them would have no positive effect even if it succeeded (which, given that they are kernel processes, is highly unlikely).

By the way, the number of these main processes depends on the number of logical CPUs (LCPUs) the system has. I suppose your system has 8 CPUs configured, which is why you always see a minimum of 8 processes running.

I hope this helps understanding these processes.

bakunin

If you want to switch aio off, you have to remove some files such as /usr/lib/drivers/aio.ext, set some restricted tunables such as aio_fastpath, and then reboot the server. But I think your DBA will be very unhappy about it, and you need IBM's permission to change a restricted tunable.

There are things other than processes (I am thinking of kernel extensions, for example) that can keep something open.

a) Assuming you have an Oracle support license, ask them what their experience is with something like this. Perhaps there is an Oracle kernel extension that neither fuser nor lsof can see and report back to you.

b) The same goes for AIX: open a PMR. Support will probably want a snap, and maybe even a perfpmr, to get an impression of details that would be hard to show or discuss in a forum.

c) The ps command is probably a poor way to diagnose this. I suspect all you are shown is the process list in user space, not the status of the threads that 'work' in kernel space. Note that the ps man page lists different possible values for a process and for a kernel thread:

          S
            (-l and l flags) The state of the process or kernel thread:

            For processes:
              O    Nonexistent
              A    Active
              W    Swapped
              I    Idle (waiting for startup)
              Z    Canceled
              T    Stopped

            For kernel threads:
              O    Nonexistent
              R    Running
              S    Sleeping
              W    Swapped
              Z    Canceled
              T    Stopped

As you have shown us the 'A' value, this implies it is a process (one that could be doing something), but you have not shown us the (kernel) threads.
Note also that when this column is blank (empty), the process/thread is actually 'running'. In other words, 'Active' != 'Running'.
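To make the 'Active' vs 'Running' distinction concrete, the state letters quoted above can be encoded as a small lookup. A minimal sketch covering only the letters in that excerpt (it is not a complete list of AIX ps states); `decode_state` is a hypothetical helper. For the listings in this thread, `decode_state process A` prints Active, which by itself says nothing about whether any kernel thread is actually running.

```shell
#!/bin/sh
# Decode the ps state letters quoted from the man page excerpt above.
# Usage: decode_state process|thread LETTER
# Covers only the quoted letters; anything else is reported as unknown.
decode_state() {
    case "$1:$2" in
        process:O|thread:O) echo "Nonexistent" ;;
        process:A)          echo "Active" ;;
        process:W|thread:W) echo "Swapped" ;;
        process:I)          echo "Idle (waiting for startup)" ;;
        process:Z|thread:Z) echo "Canceled" ;;
        process:T|thread:T) echo "Stopped" ;;
        thread:R)           echo "Running" ;;
        thread:S)           echo "Sleeping" ;;
        *)                  echo "unknown"; return 1 ;;
    esac
}
```

Note that R (Running) only appears in the kernel-thread table, which is why the process-level 'A' in the listings above cannot answer the question on its own.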

Re: aio servers: the processes you see are there just as a minServers setting controls (i.e., starts) several httpd processes in advance, so that they are already 'active' (loaded) and ready to be used. Just because an aioserver process exists does not mean it is doing anything; most of the time they are "waiting for work".

The Oracle logs, I would hope, could report whether any I/O (scheduled or requested via aio_read() or aio_write()) is pending. So, in closing: the reason AIX is not permitting you to unmount a filesystem previously used (exclusively) by an application probably lies in the application domain.
AIX, as any solid OS would, has one or more built-in safeguards to maintain system integrity. In short, don't blame the messenger.

Note: IF AIX is at fault (forbidding an unmount when everything has been closed cleanly), please, please tell us and make us wiser, i.e., less gullible/naive, people!

  • Ideally, you can replicate this 'locking' situation on a test server. By running your umount command under /usr/bin/truss (on either test or production), you may learn at least which system call is reporting an unsafe (not OK) status, and from that you may also be able to establish which error status it returns (see /usr/include/sys/errno.h for possible values and meanings).
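Once you have a truss log of the failing umount, only the failing calls are of interest. A minimal sketch, assuming the log was written with something like `truss -f -o /tmp/umount.truss umount /oracle` and that failures are marked with `Err#<number> <NAME>`, as Solaris/AIX-style truss output does (verify this against your own log); `truss_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "syscall: Err#<n> NAME" for every failing call in a truss log on stdin.
# Assumes failures are marked "Err#<n> <NAME>" and that truss -f prefixes
# each line with "<pid>:"; check both against your own log first.
# truss_errors is a hypothetical helper name, not part of truss.
truss_errors() {
    awk 'index($0, "Err#") > 0 {
        call = $0
        sub(/\(.*/, "", call)              # keep the text before the first "("
        sub(/^[0-9]+:[ \t]*/, "", call)    # drop the pid prefix added by truss -f
        err = $0
        sub(".*Err#", "Err#", err)         # keep "Err#<n> NAME" and what follows
        print call ": " err
    }'
}

# Usage: truss_errors < /tmp/umount.truss
```

The error name printed next to `Err#` can then be looked up in the errno header mentioned above.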