Need help scripting the repair option of the Solaris format tool

Hi,

I posted my problem in the Solaris forum, but I think what I need is a script that does the job. I am not a scripting person, so I need help here.

I am on Solaris 10, in failsafe mode. When I run the format command manually, it shows me the menu; I pick "repair" and press Enter. It then waits for input at "Enter absolute block number of defect:", where I give the block number, starting with 39594646. It asks for confirmation, I answer y and press Enter. The repair takes some time and then returns to the format> prompt. I need to repeat this for every block from 39594646 to 585912500, so repairing manually is not practical. I also need to answer n if a block is not defective.

Can somebody help me with this kind of script?

# format c0t0d0
selecting c0t0d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format>
format> repair
Enter absolute block number of defect: 39594665
Ready to repair defect, continue? y
Repairing hard error on block 39594665 (3167/11/290)...ok.

format> repair
Enter absolute block number of defect: 39594666
This block doesn't appear to be bad.  Repair it anyway? n
format>

If I understand you correctly, you want to automate the input to the format tool.

You could try expect, which is made for exactly this purpose. There should be a man page for it, and you will also find many examples of expect on the Net.

Note that expect is basically Tcl extended with a special library, so you are effectively doing Tcl programming, not shell programming. Keep this in mind when you run into trouble with the syntax of the commands: in that case, search the Net not only for "expect" but also for "Tcl".

I also recommend taking 15 minutes or so to get familiar with the Tcl language. It is simple, but not always obvious if you come from other languages.

From experience: if more than 10 sectors are bad, you had better go for a new disk.

I am not knowledgeable about scripting, but I will try to see if I can figure out Tcl.

I understand that the disk is bad, though iostat shows no errors. But we don't have a backup of this server, so I want to retain as much data as possible before replacing the disk. I already got a new disk from Oracle. Here is the actual issue, which I described in this post: http://www.unix.com/solaris/271114-can-i-run-repair-lot-blocks-single-command.html#post302992204

I would probably do two things first:

  • Get a backup of the data and the structure.
  • Run the analyze option and let it run at least long enough to list the errors. Make sure you choose a non-destructive sub-option (read, for example).

Make sure you replace the disk before you have a failure.

Robin

On your other (now closed) thread you say:

This would make me see red. This shouldn't happen. I'd be checking whether the disks are over-heating, are the fans running, etc.

AFAIK a disk read test will not correct anything, only verify readability. That's what read means.

The system will boot into safe mode but not into multi-user.
I'm assuming the filesystem is UFS (?), and there is a little-documented fsck option which checks literally everything on a filesystem:

# fsck -n -o full <filesystem device node>

Note that -n ensures no modifications are attempted, while -o full checks everything and lists files with errors. Make sure you capture the output, as it may be long and the check may take many hours. You should then know which files are affected and which ones to restore.

I appreciate that my comments above do not give you a coherent strategy for a fix but you might find them helpful. Hopefully you will get other input from other members.

Run a new backup as soon as you can.


You can feed format from stdin.
For example:

echo "
0
partition
print
" | format
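The same answers can also be given with a here-document instead of a multi-line echo string; each line between the EOF markers is read by format as one answer. A minimal sketch, assuming the disk is number 0 in format's disk list; show_partitions is only a hypothetical name:

```sh
#!/bin/sh
# Sketch: same input as the echo example, as a here-document.
# Each line up to the EOF marker is one answer that format reads:
# disk number 0, then the "partition" menu, then "print".
# show_partitions is a hypothetical helper name.
show_partitions() {
    format <<EOF
0
partition
print
EOF
}
```

After defining it, run show_partitions; or drop the function wrapper and run the format <<EOF ... EOF part directly.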

Now, if your disk is the first one (0), create a script called "repair_sector":

#!/bin/sh
for arg
do
  echo "
0
repair
$arg
y
" | format
done

You can then run it with your absolute sector numbers as arguments.
Untested, of course!
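Before letting format loose on a disk, it can help to see exactly what the script will type. Below is a sketch of the same script with two hypothetical additions that are not part of the original: a DISK variable for the disk number (default 0) and a DRYRUN switch that only prints the answers instead of piping them into format:

```sh
#!/bin/sh
# Sketch of repair_sector with two hypothetical additions:
#   DISK   - disk number in format's selection list (default 0)
#   DRYRUN - if set to anything, only print the answers, do not run format
DISK=${DISK:-0}

# Print the lines format reads for one repair: disk number,
# the "repair" menu item, the block number, and "y" to confirm.
repair_answers() {
    echo "$DISK"
    echo repair
    echo "$1"
    echo y
}

# One format run per block number given as an argument.
for blk
do
    if [ -n "$DRYRUN" ]; then
        repair_answers "$blk"
    else
        repair_answers "$blk" | format
    fi
done
```

For example, DRYRUN=1 /tmp/repair_sector 39594665 should print 0, repair, 39594665 and y on separate lines without touching the disk.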


hicksd8: fsck is not helping; I tried it with different options and it comes up clean.

rbatte1: I can run analyze > read and I can see which blocks are corrupted. But there is a large number of blocks that need to be fixed with the repair option in format, so I am looking for that script.
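If analyze > read already lists the bad blocks, a safer approach than walking the whole range is to repair only those blocks. Answering y to a block that is not bad makes format repair it anyway (see the "Repair it anyway?" prompt in the first post), so good blocks should not be fed to a script that always answers y. A minimal sketch, assuming disk 0 and a hand-made list file with one absolute block number per line; repair_list and /tmp/badblocks.txt are hypothetical names:

```sh
#!/bin/sh
# Sketch: repair only the blocks listed in a file, one absolute
# block number per line. repair_list is a hypothetical name.
DISK=${DISK:-0}

repair_list() {
    while read blk
    do
        case $blk in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        echo "repairing block $blk" >&2
        # format reads its answers from this pipe, not from the list file
        { echo "$DISK"; echo repair; echo "$blk"; echo y; } | format
    done < "$1"
}

# Only run when a list file is given, e.g. /tmp/badblocks.txt
if [ $# -gt 0 ]; then
    repair_list "$1"
fi
```

Saved as, say, /tmp/repair_list (again a hypothetical name), it would be run as /tmp/repair_list /tmp/badblocks.txt.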

MadeInGermany: Are these two separate scripts, or should I run both, one after the other, as they are?

The first one is just a demonstration that it works.
It does nothing but print the partition table.
You can type it at (or paste it into) the command line.

Yes, it might, because it's not being told to check everything. -o full should check that every last byte is readable and report errors.

hicksd8: Tried fsck, no luck.

# fsck -o f /dev/rdsk/c0t0d0s0
** /dev/rdsk/c0t0d0s0
** Last Mounted on /a
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
91388 files, 4328341 used, 5989667 free (3747 frags, 748240 blocks, 0.0% fragmentation)
#

MadeInGermany: Am I missing something here? It just returned to the prompt.

# cat /tmp/repair_sector
#!/bin/sh
for arg
do
  echo "
0
repair
$arg
y
" | format
done
# /tmp/repair_sector
#

It does not accept the "full" option:

# fsck -o full /dev/rdsk/c0t0d0s0
ufs usage: fsck [-F ufs] [-m] [-n] [-V] [-v] [-y] [-o p,b=#,w,f] [special ....]
# 

You must give the sectors as arguments.
Say you want to repair absolute sectors 39594665 and 39594666:

/tmp/repair_sector 39594665 39594666

fsck is useless here. It does not repair defective sectors, it repairs logical errors in the filesystem.

Got it.
Can I give a range here, like repair from 39594646 to 585912500?
The example above repairs only two blocks. I tried something, but it didn't work:

# for i in {39594646..585912500}
> do
> echo $i
> /tmp/repair_sector $i
> done
{39594646..585912500}

format> Enter absolute block number of defect: `{39594646..585912500}' is not an integer.
Enter absolute block number of defect: `y' is not an integer.
Enter absolute block number of defect: Enter absolute block number of defect:
#

In a bash shell you can test the argument expansion with:

echo {39594646..585912500}

Likewise, you can run:

/tmp/repair_sector {39594646..585912500}

I think we missed something:

# echo {39594646..585912500}
{39594646..585912500}
# /tmp/repair_sector {39594646..585912500}
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
          /pci@0/pci@0/pci@2/scsi@0/sd@0,0
       1. c0t1d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
          /pci@0/pci@0/pci@2/scsi@0/sd@1,0
Specify disk (enter its number): Specify disk (enter its number): selecting c0t0d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> Enter absolute block number of defect: `{39594646..585912500}' is not an integer.
Enter absolute block number of defect: `y' is not an integer.
Enter absolute block number of defect: Enter absolute block number of defect:
#
# echo $SHELL
/sbin/sh
# 
# echo {seq 39594646......585912500}
{seq 39594646......585912500}
# 

I'm not sure, but I am in failsafe mode, so I may have limited features.

What's your shell? As MadeInGermany said, the {...} construct is available in (recent) bash.
And to use seq the way you want, use command substitution: $(...).

EDIT: I see your shell is sh, so {...} won't work.

There is normally no seq on Solaris.
Try starting a bash shell with

/bin/bash

or

/a/bin/bash

I mounted c0t1d0s0 on /a and set the first line of /tmp/repair_sector to #!/a/bin/bash,
but no luck.

Is there any other way I should do this in failsafe mode?
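Why the #! change cannot help: in /tmp/repair_sector {39594646..585912500} the brace expansion is done by the shell the command is typed into (here /sbin/sh), before the script even starts, so the script's own interpreter line never sees the braces. Even in bash, that range is roughly 546 million numbers, far more than fits in one argument list, and the script would also answer y for every good block in it. For short ranges of blocks known to be bad, a loop using only Bourne-shell features could look like the sketch below; block_range is a hypothetical helper, and expr is used because the Solaris 10 /sbin/sh has no $((...)) arithmetic:

```sh
#!/bin/sh
# Sketch: print every integer from $1 to $2, one per line, as a
# replacement for {first..last}, which /sbin/sh does not support.
# expr does the arithmetic because the Bourne shell lacks $((...)).
block_range() {
    i=$1
    while [ "$i" -le "$2" ]
    do
        echo "$i"
        i=`expr "$i" + 1`
    done
}

# Example (small range of known-bad blocks only):
# /tmp/repair_sector `block_range 39594665 39594670`
```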