File system is bad

Hi all, we have a 280R with Solaris 10 11/06 and Oracle 10 installed.
For an unknown reason the system became unstable, and after a reboot (init 6)
we got a message telling us to run fsck manually. We ran the check many times and now get this result:

# fsck
/dev/md/dsk/d0 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? y

** /dev/md/dsk/d0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
187189 files, 5282453 used, 7310544 free (272312 frags, 879779 blocks, 2.2% fragmentation)
***** FILE SYSTEM IS BAD *****

***** PLEASE RERUN FSCK *****
/dev/md/dsk/d5 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? y

** /dev/md/dsk/d5
** Currently Mounted on /oracle
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
17664 files, 1923832 used, 8150704 free (47200 frags, 1012938 blocks, 0.5% fragmentation)
***** FILE SYSTEM IS BAD *****

***** PLEASE RERUN FSCK *****
/dev/dsk/c1t1d0s6 IS CURRENTLY MOUNTED READ/WRITE.
CONTINUE? y

** /dev/dsk/c1t1d0s6
** Currently Mounted on /mint
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILESYSTEM MAY STILL BE INCONSISTENT.
2 files, 9 used, 10074527 free (15 frags, 1259314 blocks, 0.0% fragmentation)
***** FILE SYSTEM IS BAD *****

Apparently the system runs fine, but obviously something is still wrong.
Any suggestions?
Thanks in advance
Petrucci

Hi,
are there any messages in /var/adm/messages?

And if you check the / file system, please boot from CD or network.
Then use:
fsck -Y -o f
This checks the file system regardless of the superblock clean flag.
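A sketch of that procedure (the raw metadevice path is assumed from the mounted device shown in the thread; verify it on your system before running anything):

```shell
# Sketch only -- run in single-user mode after booting from install media,
# so / is NOT mounted while it is being checked:
#   ok boot cdrom -s      (or: ok boot net -s)

# Force-check the root metadevice via its raw device node.
# -Y answers yes to every prompt; -o f forces the check even if
# the superblock's clean flag says the file system is fine.
fsck -Y -o f /dev/md/rdsk/d0
```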
Best regards
jeorg

Hi
thanks for the reply.
Nothing special in the messages, but after a reboot (no activity on this machine until the reboot) and another fsck we got this output:

UNREF FILE I=17916 OWNER=oracle MODE=100600
SIZE=2 MTIME=Oct 29 10:48 2007
RECONNECT? yes

UNREF FILE I=35452 OWNER=oracle MODE=100640
SIZE=112 MTIME=Oct 29 11:17 2007
RECONNECT? yes

** Phase 5 - Check Cylinder Groups

CORRECT BAD CG SUMMARIES? yes

CORRECTED SUMMARY FOR CG 3
FILE BITMAP WRONG
FIX? yes

FRAG BITMAP WRONG (CORRECTED)
CORRECTED SUMMARY FOR CG 6
FILE BITMAP WRONG (CORRECTED)
FRAG BITMAP WRONG (CORRECTED)
FILESYSTEM MAY STILL BE INCONSISTENT.
17664 files, 1923832 used, 8150704 free (47200 frags, 1012938 blocks, 0.5% fragmentation)

***** FILE SYSTEM WAS MODIFIED *****
***** FILE SYSTEM IS BAD *****

***** PLEASE RERUN FSCK *****

I think the problem could be one of the hard disks.

Hi,
Keep in mind that you have to keep running fsck until it reports no more errors.
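For example, a minimal loop (sketch only; assumes the file system is unmounted and checked through its raw device, and that fsck exits non-zero while inconsistencies remain):

```shell
# Repeat fsck until it exits clean. The file system must be
# unmounted -- never loop fsck over a mounted read/write fs.
until fsck -y /dev/md/rdsk/d5; do
    echo "fsck reported errors; running it again..."
done
```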

Sorry, but several people make this mistake.

And you are right, it can be a physical error!

Also, you are running fsck on a metadevice, so it could be a DiskSuite problem.
In this situation I normally break the mirror (if it is a mirror) and run fsck on each affected submirror of d0. After the file system check, I suggest repairing the mirror and booting again.
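A hedged sketch of that mirror-break procedure (the submirror names d10 and d20 are hypothetical; check the real layout first with metastat):

```shell
metastat d0                  # list the real submirrors of d0 first

metadetach d0 d20            # break off one submirror (name assumed)
fsck -y /dev/md/rdsk/d20     # check the detached half via its raw device
# ...for the half still holding a mounted /, check it from a
#    CD or network boot instead...

metattach d0 d20             # reattach; the resync starts automatically
```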

Best regards
joerg

Hi,

We have installed 6 servers with Solaris 10 release 11/06, and after metattach finished syncing the mirror disks we got the exact same error described in this topic on all 6 servers. We ran fsck as suggested, several times, and it always reports FILE SYSTEM IS BAD.

Do you know of any news or solutions for this problem? I think it is not a hardware problem, but perhaps missing Solaris patches, or additional procedure steps needed for this version.

Regards
/Juan

The way you build your mirror and the argument order of your metattach command can affect your data.

c0t1d0s3 d13 ---> data disk
c0t2d0s3 d23 ---> empty disk

metainit d30 -m d23
metattach d30 d13

If you sync an empty disk onto one that contains data, you will end up with an empty mirrored file system. I'm not sure whether this caused your problem, but you say all 6 servers have the same problem? Maybe you would like to recap and review what you did.
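For comparison, the safe order makes the data disk the initial submirror, so the resync copies data onto the empty disk rather than the other way around (same example device names as above):

```shell
# Correct direction: the one-way mirror starts on the DATA submirror,
# and the empty disk is attached afterwards.
metainit d30 -m d13      # d13 holds the existing data
metattach d30 d23        # d23 (empty) is now synced FROM d13
```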