Deadlocked App

Hello All -

We have a legacy C program running non-stop on one of our servers, often with several instances running at once. Fairly regularly, one of the instances will stop writing to the log file and simply deadlock/hang. I then have to 'kill' it myself.

When I attach gdb to one of the hung processes and enter the 'where' command, I invariably get something like the following:

(gdb) where
#0 0x009f0402 in ?? ()
#1 0x00bdf1ce in __lll_mutex_lock_wait () from /lib/libc.so.6
#2 0x00b86abf in _L_mutex_lock_1965 () from /lib/libc.so.6
#3 0x00000000 in ?? ()

Does anyone recognise this? I'm sure it's indicative of a bug in the app, but I'm not sure how to track it down. Any suggestions would be very welcome.

Mark.

A mutex (mutual exclusion semaphore) is a gatekeeper for cooperation between processes (or between threads within one process). It allows one and only one of them at a time to have access to a shared resource, such as a region of memory.

When one process sets (owns) a mutex, the other processes that want the protected resource cooperate by waiting on the mutex until it becomes free. Then one of them can get it. If the process that owns the mutex dies, or does not play fair and never releases the mutex, the other processes stay in a wait state forever.

It is also possible for two processes to each set a mutex that the other process needs, then wait to get the other process's mutex without releasing their own, so neither process can go anywhere.
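One common way to rule out that second case is to make every piece of code take its locks in the same order. The following is a minimal sketch, assuming POSIX threads (build with -pthread); lock_a, lock_b, lock_pair, unlock_pair, worker and run_ordered_demo are hypothetical names, and ordering by address is just one possible convention. The two workers ask for the locks in opposite order, which would be the classic deadlock recipe, but lock_pair always acquires the lower-addressed mutex first.

```c
#include <pthread.h>
#include <stdint.h>

/* Two hypothetical locks that some operations need to hold together. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

/* Acquire both mutexes in a fixed global order (lowest address first),
 * regardless of the order in which the caller names them. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    if ((uintptr_t)x > (uintptr_t)y) {
        pthread_mutex_t *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(x);
    pthread_mutex_lock(y);
}

/* Release order does not matter for correctness. */
static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}

struct job {
    pthread_mutex_t *first;
    pthread_mutex_t *second;
    int rounds;
};

/* Each worker repeatedly takes both locks and updates the shared counter. */
static void *worker(void *arg)
{
    struct job *j = arg;
    int i;

    for (i = 0; i < j->rounds; i++) {
        lock_pair(j->first, j->second);
        shared_counter++;
        unlock_pair(j->first, j->second);
    }
    return NULL;
}

/* Starts two workers that name the locks in opposite order and
 * returns the final counter value once both have finished. */
static long run_ordered_demo(int rounds)
{
    struct job j1 = { &lock_a, &lock_b, rounds };
    struct job j2 = { &lock_b, &lock_a, rounds };
    pthread_t t1, t2;

    shared_counter = 0;
    pthread_create(&t1, NULL, worker, &j1);
    pthread_create(&t2, NULL, worker, &j2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

If lock_pair simply locked its arguments in the order given, the two workers could each end up holding one mutex while waiting for the other, and run_ordered_demo would never return.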

This is what you are seeing: forever waiting. Since more than one instance freezes, processes locking each other out is probably what is happening.

It's a programming error.

Thanks for the quick response Jim.

I'm sure it is a programming error, and probably mine. But could you give me any indication how I might track it down?

The application certainly wasn't programmed to support interprocess communication, so could the mutex problem as you explained it be down to an external library (such as MySQL) or the code used to call it (i.e. the MySQL API)?

Thanks,

Mark.

I'm shaky on MySQL, but other databases provide table-level and sometimes record-level locking.
Assuming MySQL does too, check whether you are doing something like 'SELECT stuf from mytable for update;', which exclusively locks the records selected. (It does in Oracle, which I do understand.)

Databases also provide for exclusive access to a whole resource. An Oracle example:
'LOCK TABLE mytable IN EXCLUSIVE MODE;' locks the entire table against changes by any other Oracle session (other sessions can still query it).

If you can translate this concept to MySQL terms, that's very likely the place to start looking.