Hi team,
I'm new to Solaris and I'm French, so my English is not very good, but I'll try to explain my problem.
I have a Sun Fire X4170 server running Solaris 10.
Since last week I have not been able to access /vol1. Below are the warning messages displayed while the server boots:
WARNING: /pci@0,0/pci8086,340a@3/pci108e,286@0/disk@1,0 (sd2):
Error for Command: read Error Level: Fatal
Requested Block: 167762 Error Block: 167762
Vendor: Sun Serial Number:
Sense Key: Hardware Error
ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
/dev/rdsk/c0t1d0s0: CANNOT READ: DISK BLOCK 135632: I/O error
/dev/rdsk/c0t1d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
THE FOLLOWING FILE SYSTEM(S) HAD AN UNEXPECTED INCONSISTENCY: /dev/rdsk/c0t1d0s0 (/vol1)
fsckall failed with exit code 1.
WARNING - Unable to repair one or more filesystems.
Run fsck manually (fsck filesystem...).
mount: Please run fsck and try again
svc:/system/filesystem/local:default: WARNING: /sbin/mountall -l failed: exit status 1
Reading ZFS config: done.
Dec 19 00:11:05 svc.startd[7]: svc:/system/filesystem/local:default: Method "/lib/svc/method/fs-local" failed with exit status 95.
Dec 19 00:11:05 svc.startd[7]: system/filesystem/local:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
MYSERVER console login:
I'm stuck and don't know what I can do to fix this problem.
Can someone please help me resolve it?
Hi Bartus,
Thank you for your reply. Please find below everything you asked for:
#echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <Sun -STK RAID INT -V1.0 cyl 36348 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@0,0
1. c0t1d0 <DEFAULT cyl 54627 alt 2 hd 255 sec 126>
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@1,0
Specify disk (enter its number): Specify disk (enter its number):
#
# cat /etc/vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c0t0d0s0 - - swap - no -
/dev/dsk/c0t0d0s1 /dev/rdsk/c0t0d0s1 / ufs 1 no -
/dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /usr ufs 1 no -
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /var ufs 1 no -
/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /opt ufs 2 yes -
/dev/dsk/c0t1d0s0 /dev/rdsk/c0t1d0s0 /vol1 ufs 2 yes -
/dev/dsk/c0t1d0s1 /dev/rdsk/c0t1d0s1 /vol2 ufs 2 yes -
/devices - /devices devfs - no -
sharefs - /etc/dfs/sharetab sharefs - no -
ctfs - /system/contract ctfs - no -
objfs - /system/object objfs - no -
swap - /tmp tmpfs - yes -
#
# mount
/ on /dev/dsk/c0t0d0s1 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=800041 on Thu Dec 19 00:01:27 2013
/devices on /devices read/write/setuid/devices/dev=47c0000 on Thu Dec 19 00:01:11 2013
/system/contract on ctfs read/write/setuid/devices/dev=4800001 on Thu Dec 19 00:01:11 2013
/proc on proc read/write/setuid/devices/dev=4840000 on Thu Dec 19 00:01:11 2013
/etc/mnttab on mnttab read/write/setuid/devices/dev=4880001 on Thu Dec 19 00:01:11 2013
/etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=48c0001 on Thu Dec 19 00:01:11 2013
/system/object on objfs read/write/setuid/devices/dev=4900001 on Thu Dec 19 00:01:11 2013
/etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=4940001 on Thu Dec 19 00:01:11 2013
/usr on /dev/dsk/c0t0d0s3 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=800043 on Thu Dec 19 00:01:27 2013
/lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1 read/write/setuid/devices/dev=800043 on Thu Dec 19 00:01:27 2013
/dev/fd on fd read/write/setuid/devices/dev=4ac0001 on Thu Dec 19 00:01:27 2013
/var on /dev/dsk/c0t0d0s4 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=800044 on Thu Dec 19 00:01:29 2013
/tmp on swap read/write/setuid/devices/xattr/dev=48c0002 on Thu Dec 19 00:01:29 2013
/var/run on swap read/write/setuid/devices/xattr/dev=48c0003 on Thu Dec 19 00:01:29 2013
/opt on /dev/dsk/c0t0d0s5 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=800045 on Thu Dec 19 00:11:05 2013
/vol2 on /dev/dsk/c0t1d0s1 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=800081 on Thu Dec 19 00:11:05 2013
#
# metastat -a
metastat: MYSERVER: there are no existing databases
#
The disk c0t1d0 (kernel driver instance sd2) is broken:
it has an unreadable sector, 167762.
Replace the disk!
The new disk must get identical (or similar) partitions and new filesystems, and the data for /vol1 and /vol2 must be restored from the last backup.
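If it helps to see whether only this one block is affected or whether more are failing, the warnings can be summarised with a small shell helper. This is only a sketch: the function name error_blocks is made up for illustration, and it assumes the kernel warnings were also written to /var/adm/messages, which is the usual place on Solaris 10 but may differ on your system.

```shell
# Hypothetical helper: print each distinct "Error Block" number
# found in kernel warning text read from standard input.
error_blocks() {
    sed -n 's/.*Error Block: *\([0-9][0-9]*\).*/\1/p' | sort -un
}

# Assumed usage on the server:
#   error_blocks < /var/adm/messages
```

A single number that keeps repeating points at one bad spot; a growing list would suggest the disk is deteriorating.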
Hi,
Is there no other way to fix this, or to repair this sector?
I'm sorry to say that we have no backup of this volume.
And I don't know whether the disks are configured as RAID 5, in which case I could simply swap in another disk.
You have a simple disk, c0t1d0, but I don't think that this necessarily means you have a failed device. The filesystem /vol2 is on the same disk c0t1d0s1, and that has mounted okay.
Can you run fsck on the command line? Something like this, I think:-
In format, select c0t1d0 and run inquiry to confirm it's a simple disk.
Then run analyze on it, using the non-destructive read test.
It will 'repair' bad sectors, i.e. tell the controller to remap them to spare sectors. The contents of the 'repaired' sectors are unknown, so run fsck (as Robin suggested) to at least ensure file system integrity.
My assumption is that this is a simple device, based on the following output that was provided.
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <Sun -STK RAID INT -V1.0 cyl 36348 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@0,0
1. c0t1d0 <DEFAULT cyl 54627 alt 2 hd 255 sec 126>
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@1,0
The STK RAID device is c0t0d0; the problem is on c0t1d0.
This appears to be either a disk with some bad blocks or a corrupt filesystem. The whole disk is not broken (yet).
To cerco,
You say you have 8 physical disks. That's good to know, but how are they arranged? Are perhaps 7 in a RAID and one is not? From the cylinder numbers, it would almost suggest that you have 5 in a RAID at target zero and 3 as a simple LUN (no protection) at target one, but I can't be sure on the numbers.
I'm guessing that there must be a management tool for the array somewhere, hopefully not part of the server OS, else how would one boot first time to allocate the array? What does that tell you about the arrangement of the disks/LUNs?
What output do you get from the suggestion to analyse the LUN from MadeInGermany? It will take a while to run. It might just be that we have to use fsck and read an alternate superblock, but let's not go that way just yet. It's probably best to find out what we can first before taking action.
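For reference only, in case it does come to the alternate superblock route: a rough sketch of how the backup superblock locations could be listed. The function name backup_superblocks is hypothetical, and the sketch assumes that /vol1 was created with default newfs parameters, since newfs -N only reports where the backups would be for the parameters it is given. The -N matters: it prints the layout without creating anything, whereas newfs without -N would destroy the filesystem.

```shell
# Hypothetical helper: read `newfs -N` output on stdin and print the
# backup superblock numbers listed after the "super-block backups" line.
backup_superblocks() {
    sed -e '1,/super-block backups/d' |
        tr -s ', \t' '\n\n\n' |
        grep '^[0-9][0-9]*$'
}

# Assumed usage on the server (read-only, nothing is written):
#   newfs -N /dev/rdsk/c0t1d0s0 | backup_superblocks
# Then fsck could be pointed at one of the numbers printed, for example 32:
#   fsck -F ufs -o b=32 /dev/rdsk/c0t1d0s0
```

If the primary superblock is not the damaged part, this will not change the outcome, so it is worth looking at the analyze results first.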
Hi MadeInGermany,
Sorry, but I don't quite understand what you are asking me to do. Could you please tell me exactly which commands I must type to get what you need?
Solaris is not my strong point.
I think you need to start the format utility, without the echo | on the front. Just run format on the command line.
It will take you into an interactive disk management session. Select option 1 which should be for the disk in question, c0t1d0 and it should present you a menu of actions you can take. It's been too many years (when Solaris 2.6 was current) to recall what to pick next (it might even be analyze on that menu) but if it's not obvious, paste the menu into the thread to jog my memory. Make sure you pick the read-only / non-destructive test.
@rbatte1 My mistake, you are correct, that's a single disk.
@cerco, you can run "iostat -En" to get some information about your disk. You can also try running "fsck -y /dev/rdsk/c0t1d0s0" to see if it manages to fix your file system.
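As a rough sketch of what to look for in the iostat output: each disk gets a line with Soft Errors, Hard Errors and Transport Errors counters, and a non-zero Hard Errors count is the one that matters here. The helper name hard_error_disks is made up, and it assumes the counters sit on one line that starts with the device name and is separated by spaces.

```shell
# Hypothetical helper: read `iostat -En` output on stdin and print
# "<device> <hard error count>" for every device with hard errors.
hard_error_disks() {
    sed -n 's/^\([^ ][^ ]*\) *Soft Errors: *[0-9][0-9]* *Hard Errors: *\([0-9][0-9]*\).*/\1 \2/p' |
        awk '$2 > 0'
}

# Assumed usage on the server:
#   iostat -En | hard_error_disks
# Then, with /vol1 not mounted (it is not, since the mount failed at boot):
#   fsck -y /dev/rdsk/c0t1d0s0
```

If c0t1d0 shows up with a count that keeps rising between runs, fsck alone is unlikely to be enough.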
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <Sun -STK RAID INT -V1.0 cyl 36348 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@0,0
1. c0t1d0 <DEFAULT cyl 54627 alt 2 hd 255 sec 126>
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@1,0
Specify disk (enter its number): 1
selecting c0t1d0
[disk formatted]
Warning: Current Disk has mounted partitions.
/dev/dsk/c0t1d0s0 is normally mounted on /vol1 according to /etc/vfstab. Please remove this entry to use this device.
/dev/dsk/c0t1d0s1 is currently mounted on /vol2. Please see umount(1M).
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
fdisk - run the fdisk program
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format>
Hi rbatte1,
I have done that, but the result is the same. After rerunning fsck more than 4 times, I still get the same message.
Must I keep rerunning fsck again and again?