GPFS file system corruption issue

Hi Administrators,

I have an issue with a GPFS file system. It contains bad directory entries, which produce errors like

ls: 0653-341 The file <filename> does not exist.

when we run "ls -ltr" on this directory.

So we took the file system offline and followed the steps below.

root@profrd06:/root]# umount /opt/minotaur/Data/Shared_CFDX
root@profrd06:/root]# mmfsck /dev/gpfslv
Checking "gpfslv"
Checking inodes
Checking inode map file
Checking directories and files

Error in directory inode 461241:  DirEntryBad
Directory entry "GSM_20080821022010_MSC140547280551273.EDR" is not an allocated inode.
Patching will delete the directory entry.
Remove directory entry? y
Directory entry "GSM_20080821022510_MSC140547280641273.EDR" is not an allocated inode.
Patching will delete the directory entry.
Remove directory entry? y
 
Checking log files
Checking extended attributes file
Checking allocation summary file
Checking policy file
Checking filesets metadata
Checking file reference counts

File inode 1191665 is not referenced by any directory.
Reattach inode to lost+found? y

File inode 1196881 is not referenced by any directory.
Reattach inode to lost+found? y

The rest of the output was similar to the above. However, it removed all the files that had bad entries.

The corruption was caused by the old software version on the IBM servers, so IBM's typical workaround is to apply patches. The installation is 2 years old, so a lot of patching would be required, and we would be doing it blind on production servers. So we can't go for that option (it is not recommended).

Kindly let me know if anyone is familiar with an alternative. The details are below:

<username@hostname>  $ uname -a
AIX <hostname> 3 5 00C0B4204C00

Kindly let me know if something else is required.

Hi,
Try:

mmstartup

Then try ls and other commands.

Hi, mmstartup just restarts the file system. My issue is a little different; please go through the previous post.

If the fsck output contains only the two types of errors listed below, then all data blocks are probably intact and there is no data loss. But we'll need to look at the entire fsck output to be sure.

error type 1 : Directory entry ... is not an allocated inode
error type 2 : File inode 1191665 is not referenced by any directory

We need to check whether these errors are related to each other. If they are, then the inodes moved to lost+found are the ones whose directory entries were removed by fsck. In that case, you could check the contents of lost+found, and if you can identify the entries in it, they can be restored.
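
As a first rough check of whether the two error types line up, the saved fsck output can be summarised with standard tools. This is only a sketch: it assumes the mmfsck transcript was saved to a text file (the file name below is hypothetical), and the function names are made up for illustration. It matches the two message texts quoted in this thread and nothing else.

```shell
#!/bin/sh
# Sketch only: summarise a saved mmfsck transcript.
# Assumes the two message formats quoted in this thread.

# Count removed directory entries and orphaned inodes in the log.
fsck_summary() {
    log="$1"
    # grep -c prints 0 and returns non-zero when nothing matches, hence "|| true"
    removed=$(grep -c 'is not an allocated inode' "$log" || true)
    orphans=$(grep -c 'is not referenced by any directory' "$log" || true)
    echo "removed_entries=$removed orphaned_inodes=$orphans"
}

# Print the names of the directory entries that fsck removed, one per line.
removed_names() {
    sed -n 's/^Directory entry "\(.*\)" is not an allocated inode\.$/\1/p' "$1"
}
```

For example, `fsck_summary /tmp/mmfsck.out` prints both counts on one line, and `removed_names /tmp/mmfsck.out` gives the list of file names to look for. Equal counts are only a hint that the errors are related, not proof; the files in lost+found still have to be identified individually.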

In the meantime, I'm checking whether a GPFS fsck expert can look at this and tell us more.

Edit: I checked with an fsck expert who works on GPFS, and he confirmed that there should be no data loss (as stated above). The detached entries should be available in lost+found, from where they can be used for corrections.
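
To get an overview of what was reattached, the files in lost+found can be listed with their sizes, which helps when matching them against the removed file names. This is a sketch under assumptions: the default path assumes lost+found sits directly under the mount point shown earlier in the thread, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: list regular files in lost+found with their size in bytes.
# The default directory is an assumption based on the mount point in this thread.
list_lost_found() {
    dir="${1:-/opt/minotaur/Data/Shared_CFDX/lost+found}"
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (also covers an empty directory)
        [ -f "$f" ] || continue
        size=$(wc -c < "$f" | tr -d ' ')
        printf '%s\t%s bytes\n' "$f" "$size"
    done
}
```

Calling `list_lost_found` with no argument uses the assumed default path; pass a different directory as the first argument if lost+found is located elsewhere.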

You are a very lucky man to have two weeks to repair the file system.
Retrieve the directory from the archive.

I'd even suggest recreating the filesystem completely anew first. Whatever has gone wrong in this FS should not be allowed to go on any further.

Create the filesystem(s) anew on new LUNs, restore the contents from backup and remount. This should require only a very short change window, if you plan your activities carefully.

I hope this helps.

bakunin

Mount the file system and check whether you have any snapshots:

mmlssnapshot /dev/gpfslv -d

Example output:

Snapshots in file system fs1: [data and metadata in KB]
Directory  SnapId  Status  Created                   Data  Metadata
snap1      1       Valid   Fri Oct 17 10:56:22 2003  0     512

Set the snapshot directory name to .problem and restore the file system from the snapshot:

mmsnapdir /dev/gpfslv -s .problem
mmrestorefs /dev/gpfslv snap1

Look for your missing directory and any other missing files in the .problem directory.
After that you can do:

rm -rf .problem

I hope this helps.