How to see the actual filename that a file descriptor is pointing to for a given process ID?

kchinnam · December 7, 2010, 9:54am

Using the lsof command I was able to list all regular files opened for writing.
The output for a specific process ID shows some file descriptors (FD 45, shown as 45w in the second output) pointing to a real file. But most of the FDs (1, 2, 5 in the first output and 31, 29, 10 in the second output) point to the root of the disk itself.

I want to know what it means if FDs are not pointing to real files.
We are using Solaris 8. Is there a way to see the real files that all these FDs (1, 2, 5, 31, 29, 10) are pointing to?
We don't have the 'DTrace' or 'procfiles' commands available on Solaris 8.

 
$> lsof -T | awk '$4 ~ /[1-9]*w$/ && $5 ~ /REG/ && /2492/' | more
COMMAND     PID     USER   FD   TYPE        DEVICE   SIZE/OFF      NODE NAME
java       2492  usr1    1w  VREG          85,3    7948151   6190579 /export (/dev/md/dsk/d3)
java       2492  usr1    2w  VREG          85,3    7948151   6190579 /export (/dev/md/dsk/d3)
java       2492  usr1    5w  VREG          85,3    2962640   3441379 /export (/dev/md/dsk/d3)

 
$> lsof -T -u usr1 | awk '$4 ~ /.*[wu]/ && $5 ~ /REG/'
COMMAND     PID     USER   FD   TYPE        DEVICE   SIZE/OFF      NODE NAME
stcms.exe 26244 usr1   31w  VREG         314,6    33555456   9127033 /export/apps/PATH1 (g1:/vol/vol4/share1)
stcms.exe 26244 usr1   29w  VREG         314,6    33555456   9127028 /export/apps/PATH1 (g1:/vol/vol4/share1)
stcms.exe 26244 usr1   10w  VREG         314,6    33555456   9127021 /export/apps/PATH1 (g1:/vol/vol4/share1)
stcms.exe 26204 usr1   45w  VREG         314,6    33555456   4646191 /export/apps/PATH1/PATH2/file1.dbs
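
The awk patterns in the two commands above are fairly loose: `[1-9]*` also matches an empty string, `/.*[wu]/` is unanchored (so an FD value such as `cwd` satisfies it), and `/2492/` matches that number anywhere on the line, not only in the PID column. A tighter filter might look like the sketch below. The helper name `filter_write_fds` and the last two sample rows are made up for illustration, and `awk -v` assumes a POSIX awk (on Solaris 8 that is probably `nawk` or `/usr/xpg4/bin/awk`, which should be checked on the actual host).

```shell
# filter_write_fds PID
# Reads lsof output on stdin and keeps rows where the PID column equals PID
# exactly, the FD column is a number followed by w (write) or u (read/write),
# and the TYPE column contains REG. (Hypothetical helper, not part of lsof.)
filter_write_fds() {
    awk -v pid="$1" '$2 == pid && $4 ~ /^[0-9]+[wu]$/ && $5 ~ /REG/'
}

# Sample rows: the first two come from the output above; the last two are
# invented to show what the tighter patterns reject.
sample='java       2492  usr1    1w  VREG  85,3  7948151  6190579 /export (/dev/md/dsk/d3)
java       2492  usr1    5w  VREG  85,3  2962640  3441379 /export (/dev/md/dsk/d3)
java       2492  usr1   cwd  VDIR  85,3      512     2492 /export (/dev/md/dsk/d3)
java      12492  usr1    7w  VREG  85,3     1024  1234567 /export (/dev/md/dsk/d3)'

matches=$(printf '%s\n' "$sample" | filter_write_fds 2492)
printf '%s\n' "$matches"
```

In real use the function would sit at the end of the pipeline, for example `lsof -T | filter_write_fds 2492`.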

DGPickett · December 7, 2010, 11:06am

A directory is an inode that a process can open and examine using dirent, just like a flat file except for the inode status, the inability to shrink, and the internal structure: pairs of an int inode number and a null-terminated char name[]. I get a ton of hits from fuser $HOME. It does seem a bit sloppy to leave them open, but there it is.

jlliagre · December 7, 2010, 11:58am

The most common cause of lsof or similar being unable to resolve the real pathname a file descriptor is pointing to is that the file has been removed after being opened, but isn't closed yet.

kchinnam · December 7, 2010, 1:15pm

jlliagre, I think you are right. Where can I find reference information that explains what you said?
If the files are just waiting to be closed, why are the same file descriptors still pointing to the base disk location even after a few minutes? It's as if they are there for some purpose.

DGPickett · December 7, 2010, 1:22pm

Oh, you think it is the old */lost+found/* or .nfs* file for the one deleted but still open (an open file only needs an inode, not a name)! I'd think lsof would find the real residual file, not pretend it was some directory. The man page says that sometimes you get the mount point, not the path:

Man Page for lsof (Linux Section 8) - The UNIX and Linux Forums

       NODE      is the node number of a local file;
 
          or the inode number of an NFS file in the server host;
 
          or the Internet protocol type - e. g, ``TCP'';
 
          or ``STR'' for a stream;
 
          or ``CCITT'' for an HP-UX x.25 socket;
 
          or the IRQ or inode number of a Linux AX.25 socket device.
 
       NAME      is  the name of the mount point and file system on which the
          file resides;
 
          or the name of a file specified in the names    option    (after
          any symbolic links have been resolved);
 
          or the name of a character special or block special device;
 
          or  the  local  and  remote  Internet addresses of a network
          file; the local host name or IP  number  is  followed  by  a
          colon  (':'),  the  port,  ``->'',  and  the two-part remote
          address; IP addresses may be reported as numbers  or    names,
          depending  on  the +|-M, -n, and -P options; colon-separated
          IPv6    numbers  are  enclosed    in   square   brackets;   IPv4
          INADDR_ANY  and  IPv6 IN6_IS_ADDR_UNSPECIFIED addresses, and
          zero port numbers are represented by an  asterisk  ('*');  a
          UDP  destination  address  may  be followed by the amount of
          time elapsed since the last packet was sent to the  destina-
          tion;  TCP, UDP and UDPLITE remote addresses may be followed
          by  TCP/TPI  information  in    parentheses  -    state    (e.g.,
          ``(ESTABLISHED)'',  ``(Unbound)''),  queue sizes, and window
          sizes (not all dialects) - in a fashion similar to what net-
          stat(1)  reports;  see  the  -T  option  description    or the
          description of the TCP/TPI field in OUTPUT  FOR  OTHER  PRO-
          GRAMS  for more information on state, queue size, and window
          size;
 
          or the address or name of a  UNIX  domain  socket,  possibly
          including a stream clone device name, a file system object's
          path name, local and foreign kernel addresses,  socket  pair
          information, and a bound vnode address;
 
          or the local and remote mount point names of an NFS file;
 
          or ``STR'', followed by the stream name;
 
          or  a  stream  character device name, followed by ``->'' and
          the stream name or a list of stream module names,  separated
          by ``->'';
 
          or ``STR:'' followed by the SCO OpenServer stream device and
          module names, separated by ``->'';
 
          or system directory name, `` -- '', and as  many  components
          of the path name as lsof can find in the kernel's name cache
          for selected dialects (See the KERNEL NAME CACHE section for
          more information.);
 
          or ``PIPE->'', followed by a Solaris kernel pipe destination
          address;
 
          or ``COMMON:'', followed by  the  vnode  device  information
          structure's device name, for a Solaris common vnode;
 
          or  the  address family, followed by a slash (`/'), followed
          by fourteen comma-separated  bytes  of  a  non-Internet  raw
          socket address;
 
          or  the  HP-UX  x.25    local address, followed by the virtual
          connection number (if any), followed by the  remote  address
          (if any);
 
          or ``(dead)'' for disassociated Tru64 UNIX files - typically
          terminal files that have been  flagged  with    the  TIOCNOTTY
          ioctl and closed by daemons;
 
          or ``rd=<offset>'' and ``wr=<offset>'' for the values of the
          read and write offsets of a FIFO;
 
          or ``clone n:/dev/event'' for SCO OpenServer file clones  of
          the /dev/event device, where n is the minor device number of
          the file;
 
          or ``(socketpair: n)'' for a Solaris 2.6, 8, 9  or  10  UNIX
          domain  socket,  created by the socketpair(3N) network func-
          tion;
 
          or ``no PCB'' for socket files that do not have  a  protocol
          block  associated  with  them,  optionally  followed    by ``,
          CANTSENDMORE'' if sending on the socket has  been  disabled,
          or  ``,  CANTRCVMORE''  if  receiving on the socket has been
          disabled (e.g., by the shutdown(2) function);
 
          or the local and remote addresses of a Linux IPX socket file
          in  the  form <net>:[<node>:]<port>, followed in parentheses
          by the transmit and receive queue sizes, and the  connection
          state;
 
          or  ``dgram''  or ``stream'' for the type UnixWare 7.1.1 and
          above in-kernel UNIX domain sockets,    followed  by  a  colon
          (':')  and  the  local path name when available, followed by
          ``->'' and the remote path name or kernel socket address  in
          hexadecimal when available

kchinnam · December 7, 2010, 1:37pm

DGPickett,
I am aware of the .nfs and 'lost+found' situation, but what I am asking is not specifically about that. If you have Unix/Linux, try the following command:

 
$>lsof -u <app_user> -T 2>/dev/null | awk '$4 ~ /[1-9]*[wu]/ && $5 ~ /REG/'

If a lot of rows in your output look something like the one below, then my question is: what are all these file descriptors (e.g. 11 in 11u) doing? Why are they not pointing to real files? Why are they hanging around forever, as if they are needed by something?

 
COMMAND     PID     USER   FD   TYPE        DEVICE   SIZE/OFF      NODE NAME
<command>  4017 <app_user>   11u  VREG          85,3      64521  2590790 /export (/dev/md/dsk/d3)

jlliagre · December 7, 2010, 1:55pm

You misunderstand what I am referring to. A process can remove (unlink) a file while it is still open for reading and/or writing. The file contents stay on disk as long as a process has the file open. Only the process(es) that had the file open before it was unlinked can still access its data. This can last for hours, days or indefinitely.
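
A minimal sketch of that behaviour, using nothing but the shell: a scratch file is opened on a descriptor, its name is removed, and the data is then read back through the descriptor. The descriptor number 3 and the scratch file are arbitrary choices for the demo, and `mktemp` is assumed to be available (it may not be on an older Solaris).

```shell
tmp=$(mktemp)                 # scratch file; mktemp is assumed to exist
echo "still here" > "$tmp"

exec 3< "$tmp"                # open the file for reading on descriptor 3
rm "$tmp"                     # unlink: the name is gone, the inode is not

# Record whether the path still exists after the unlink.
if [ -e "$tmp" ]; then name_state="present"; else name_state="gone"; fi

read -r line <&3              # the data is still reachable through fd 3
exec 3<&-                     # closing the last descriptor lets the space go

echo "name: $name_state, data read via fd 3: $line"
```

While descriptor 3 is open, a tool that has to map the inode back to a directory entry has no name to report, which is consistent with lsof falling back to the mount point.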

Yes, files are almost always used for some purpose.
I don't recall whether Solaris 8 already had this feature, but you can try:

file /proc/pid#/fd/fd#

with pid# being the process ID and fd# the file descriptor number (e.g. 26204 and 45 for your sample output).

kchinnam · December 7, 2010, 2:36pm

I just tried the 'file' command. It simply says whether the content is 'data' or 'ascii text'.

I don't have access to the 'procfiles' command. If someone has access to that command, I would like to see what it reports for FDs that point to the disk but not to the actual file.

jlliagre · December 7, 2010, 2:50pm

So you have the process, the file descriptor and the file content. What are you missing?

And what is this "procfiles" command you are referring to? If it is the AIX one, /usr/proc/bin/pfiles is the equivalent on Solaris.

DGPickett · December 7, 2010, 3:41pm

Solaris also has "find /export -mount -ls", which displays the inode number at the front of each line, so you can identify the file name from the inode number if it is still linked into any directory (slow, but it works).

And if you think it is being written, there is "truss -wall -p" to show you what it is writing.

kchinnam · December 7, 2010, 10:34pm

I made some progress. The answer to my question is this command:

 
$> find / -inum <NODE> -print 2>/dev/null 
/export/<full_path>/<filename>
/proc/<PID#>/fd/<FD#>
/proc/<PID#>/object/ufs.85.3.<INODE#>

Following command simply proves relation between FD and PID.

Now my problem is that this one find command took close to 3 minutes to finish.
I am planning to do this for all FDs of all PIDs on the system.

Is there a way to get the result quicker?
How can I get only the actual filename and not the other two entries from /proc/<PID#>/..?
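
One way to approach both questions, sketched under assumptions: start find at the mount point that lsof reports in the NAME column (here /export) instead of at /, add `-xdev` so it stays on that one file system and never descends into /proc, and pass all inode numbers of interest in a single traversal instead of one run per inode. The helper name `find_by_inodes` and the demo files are hypothetical; `-xdev` and `-inum` are standard find primaries, but their availability on a given Solaris 8 box should be verified.

```shell
# find_by_inodes DIR INODE [INODE...]
# One traversal of DIR, staying on DIR's file system (-xdev), printing
# "inode path" for every entry whose inode number is in the list.
# (Hypothetical helper; DIR should be the mount point that lsof reports.)
find_by_inodes() {
    dir=$1; shift
    [ "$#" -gt 0 ] || return 0
    count=$#
    for ino in "$@"; do
        set -- "$@" -o -inum "$ino"    # append "-o -inum N" for each inode
    done
    shift "$count"                      # drop the original inode list
    shift                               # drop the leading -o
    find "$dir" -xdev \( "$@" \) -exec ls -id {} \; 2>/dev/null
}

# Self-contained demo on a scratch directory (file names are illustrative).
demo_dir=$(mktemp -d)
echo a > "$demo_dir/file1.dbs"
echo b > "$demo_dir/file2.dbs"
ino1=$(ls -i "$demo_dir/file1.dbs" | awk '{print $1}')
ino2=$(ls -i "$demo_dir/file2.dbs" | awk '{print $1}')

result=$(find_by_inodes "$demo_dir" "$ino1" "$ino2")
printf '%s\n' "$result"
rm -rf "$demo_dir"
```

With the numbers from the first lsof output, the call would be `find_by_inodes /export 6190579 3441379`. Inode numbers are only unique within one file system, so each mount point needs its own run, and a file with several hard links prints once per link.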

jlliagre · December 8, 2010, 3:23am

So you mean lsof doesn't show the filename for a given inode number pair, but find found a real pathname for the same inode?

DGPickett · December 8, 2010, 8:22am

I do not have lsof to play with this month, so I am working from memory, but I thought it used to show me paths. I can see that the path info can be difficult or expensive to acquire in some cases, especially the deleted directory entry case where there may be none, so I can live with an inode-device display for speed.

If this is your own process, you can truss it from the start and know what each fd is, and what it is up to.

jim_mcnamara · December 8, 2010, 8:51am

It is a security feature, for one thing. It prevents literally any other process (one that is not part of the process tree, if fds are shared parent->child) from opening the file, getting the filename, or reading or writing the file contents.

kchinnam · December 8, 2010, 11:12am

I did some more testing. The "find / -inum <INODE> -print" command sometimes finds actual file names that do not show up in the 'lsof' output, but for many inodes it still returns empty output (although 100% of the time it shows an entry under /proc/<PID>/fd/<FD>).

Whatever the intent and concept behind this hiding may be, it is giving me a headache.

jlliagre · December 8, 2010, 11:52am

There is no intent or concept. "lsof", at least on Solaris, is an unsupported third party hack. As far as I know, it directly accesses undocumented kernel structures from /dev/kmem.

What is the problem you are trying to solve by investigating your processes' open files?

kchinnam · December 8, 2010, 11:38pm

For a manual disaster recovery project (to a different data center), we are trying to understand the scope.
What if we take a full backup of the disks used by running apps while they are still running?
Which runtime files might be affected? This is just to get an idea, in case stopping all apps
is not feasible due to downtime and other ill effects.

jlliagre · December 9, 2010, 3:26am

That seems quite risky depending on the application, especially if the backup of evolving files isn't strictly synchronous.

achenle · December 9, 2010, 6:35am

Quite the understatement.

If the plan is to take some sort of snapshot of the raw disk underlying an active and mounted file system, there's no guarantee at all that the snapshot would be consistent or even mountable.

If the plan is to use something like ZFS snapshots, that would at least produce a consistent file system on the remote system(s). Then there would "only" be problems at the application level.

DGPickett · December 10, 2010, 3:26pm

The files under /proc are imaginary, but the flat files have names that disclose their inode and device. If you cannot "find" a path name (there may be several links), then there are no links, which is to say the file has been removed but is still open in one or more processes. For instance, tmpfile() does this so that if your process crashes, the file space is released. Then, your only clue to the original path is the inode contents or the fd number, which implies an order of opening.
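
The "no links left" case can be reproduced on a local file system with a short sketch: search for a file by inode number before and after removing its only name while a descriptor keeps it open. The scratch directory, file name and descriptor number 4 are arbitrary demo choices, and `mktemp -d` is assumed to be available.

```shell
demo_dir=$(mktemp -d)
echo "scratch data" > "$demo_dir/scratch"
inode=$(ls -i "$demo_dir/scratch" | awk '{print $1}')

exec 4< "$demo_dir/scratch"          # keep the file open on descriptor 4

before=$(find "$demo_dir" -xdev -inum "$inode" -print)
rm "$demo_dir/scratch"               # remove the only link to the inode
after=$(find "$demo_dir" -xdev -inum "$inode" -print)

echo "before unlink: ${before:-<no path>}"
echo "after unlink:  ${after:-<no path>}"

exec 4<&-                            # last reference gone, inode can be freed
rmdir "$demo_dir"
```

On an NFS client the outcome can differ, because the client typically renames a removed-but-open file to a `.nfs*` name instead of dropping the directory entry right away.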