Missing ASM Disks in Solaris 11.3 LDOM

Hi Guys,

Just a quick question; hopefully someone has seen this before and will be able to enlighten me.

I have been doing some Infrastructure Verification Testing, and one of the tests was booting the primary domain from alternate disks. This all went well; however, on restarting one of the LDOMs, which has 10 Oracle ASM disks assigned to it, I get the following:

{0} ok boot -s
Boot device: /virtual-devices@100/channel-devices@200/disk@0  File and args: -s
SunOS Release 5.11 Version 11.3 64-bit
Copyright (c) 1983, 2018, Oracle and/or its affiliates. All rights reserved.
WARNING: forceload of drv/oracleafd failed
Booting to milestone "milestone/single-user:default".
NOTICE: AFDK-00001: Module load succeeded. Build information:  (LOW DEBUG) - USM_12.2.0.1.0ACFSRU_SOLARIS.SPARC64_171115 built on 2017/12/15 13:50:23.
NOTICE:

NOTICE: vdisk@2 disk access failed
NOTICE: vdisk@3 disk access failed
NOTICE: vdisk@4 disk access failed
NOTICE: vdisk@5 disk access failed
NOTICE: vdisk@6 disk access failed
NOTICE: vdisk@7 disk access failed
NOTICE: vdisk@8 disk access failed
NOTICE: vdisk@9 disk access failed
NOTICE: vdisk@10 disk access failed
NOTICE: vdisk@11 disk access failed
Hostname: fdbkirnhht01
Requesting System Maintenance Mode
SINGLE USER MODE

Enter user name for system maintenance (control-d to bypass): root
Enter root password (control-d to bypass):
single-user privilege assigned to root on /dev/console.
Entering System Maintenance Mode

Oct 22 13:35:54 su: 'su root' succeeded for root on /dev/console

Attempting to run echo | format, or just plain old format, comes back with the "Searching for disks..." message and hangs.

I've asked the DBAs to start the Oracle stack after bringing the system to run level 3, and they are unable to see the disks either. This has got to be something fundamental, but at the moment it is beyond me; although I've had hits on the error, Google hasn't been much help.

This is a test system, so I can afford some time to look for issues. I do have a full support package with Oracle and could raise a support call, but I'd rather ask here in case anyone is familiar with this issue.

Regards

Gull04

Check the output of the following commands:

ldm list -l <ldom in question>
ldm list-services # omit output for working ldoms...
ldm list-spconfig 

This will tell us whether those devices are present in the VDS and in the LDOM, and which configuration is currently saved to the SP.
I suspect the LDOM in question has been modified (disks added) without the configuration being saved to the SP.

If you have a previously saved configuration in XML somewhere (you really should save it to files at regular intervals), run

ldm list-constraints -x <ldom in question> > somefile.xml

and diff it against a past working one.
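
For example, the configuration could be captured to the SP and to XML at regular intervals along these lines (a sketch only; <ldom in question>, the config name and the file paths are placeholders):

ldm add-spconfig config-$(date +%d%m%Y)
ldm list-constraints -x <ldom in question> > /var/tmp/<ldom in question>-$(date +%d%m%Y).xml
diff /var/tmp/<ldom in question>-<old date>.xml /var/tmp/<ldom in question>-$(date +%d%m%Y).xml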

Regards
Peasant.

Hi Peasant,

I did think that this would be a configuration issue; however, I did a kind of belt-and-braces thing with the LDOMs. Anyway, here is the output.

root@fvskirsun01:~# ldm ls -l fdbkirnhht01
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
fdbkirnhht01     active     -t----  5000    48    96G      2.1%  2.1%  6m

SOFTSTATE
OpenBoot Running

UUID
    30de72c2-fc4c-42bb-9c50-907e07a78647

MAC
    00:14:4f:fa:80:23

HOSTID
    0x84fa9c6d

CONTROL
    failure-policy=ignore
    extended-mapin-space=on
    cpu-arch=native
    rc-add-policy=
    shutdown-group=15
    perf-counters=htstrand
    boot-policy=warning
    effective-max-pagesize=2GB
    hardware-max-pagesize=2GB

DEPENDENCY
    master=

CORE
    CID    CPUSET
    8      (64, 65, 66, 67, 68, 69, 70, 71)
    9      (72, 73, 74, 75, 76, 77, 78, 79)
    10     (80, 81, 82, 83, 84, 85, 86, 87)
    11     (88, 89, 90, 91, 92, 93, 94, 95)
    12     (96, 97, 98, 99, 100, 101, 102, 103)
    13     (104, 105, 106, 107, 108, 109, 110, 111)

VCPU
    VID    PID    CID    UTIL NORM STRAND
    0      64     8      100% 100%   100%
    1      65     8      0.0% 0.0%   100%
    2      66     8      0.0% 0.0%   100%
    3      67     8      0.0% 0.0%   100%
    4      68     8      0.0% 0.0%   100%
    5      69     8      0.0% 0.0%   100%
    6      70     8      0.0% 0.0%   100%
    7      71     8      0.0% 0.0%   100%
    8      72     9      0.0% 0.0%   100%
    9      73     9      0.0% 0.0%   100%
    10     74     9      0.0% 0.0%   100%
    11     75     9      0.0% 0.0%   100%
    12     76     9      0.0% 0.0%   100%
    13     77     9      0.0% 0.0%   100%
    14     78     9      0.0% 0.0%   100%
    15     79     9      0.0% 0.0%   100%
    16     80     10     0.0% 0.0%   100%
    17     81     10     0.0% 0.0%   100%
    18     82     10     0.0% 0.0%   100%
    19     83     10     0.0% 0.0%   100%
    20     84     10     0.0% 0.0%   100%
    21     85     10     0.0% 0.0%   100%
    22     86     10     0.0% 0.0%   100%
    23     87     10     0.0% 0.0%   100%
    24     88     11     0.0% 0.0%   100%
    25     89     11     0.0% 0.0%   100%
    26     90     11     0.0% 0.0%   100%
    27     91     11     0.0% 0.0%   100%
    28     92     11     0.0% 0.0%   100%
    29     93     11     0.0% 0.0%   100%
    30     94     11     0.0% 0.0%   100%
    31     95     11     0.0% 0.0%   100%
    32     96     12     0.0% 0.0%   100%
    33     97     12     0.0% 0.0%   100%
    34     98     12     0.0% 0.0%   100%
    35     99     12     0.0% 0.0%   100%
    36     100    12     0.0% 0.0%   100%
    37     101    12     0.0% 0.0%   100%
    38     102    12     0.0% 0.0%   100%
    39     103    12     0.0% 0.0%   100%
    40     104    13     0.0% 0.0%   100%
    41     105    13     0.0% 0.0%   100%
    42     106    13     0.0% 0.0%   100%
    43     107    13     0.0% 0.0%   100%
    44     108    13     0.0% 0.0%   100%
    45     109    13     0.0% 0.0%   100%
    46     110    13     0.0% 0.0%   100%
    47     111    13     0.0% 0.0%   100%

MEMORY
    RA               PA               SIZE
    0x10000000       0x210000000      7936M
    0x400000000      0x600000000      8G
    0x800000000      0xa00000000      8G
    0xc00000000      0xe00000000      8G
    0x1000000000     0x1200000000     8G
    0x1400000000     0x1600000000     8G
    0x1800000000     0x1a00000000     8G
    0x1c00000000     0x1e00000000     8G
    0x2000000000     0x2200000000     8G
    0x2400000000     0x2600000000     8G
    0x2800000000     0x2a00000000     8G
    0x2c00000000     0x2e00000000     8G
    0x3000000000     0x3200000000     256M

CONSTRAINT

VARIABLES
    auto-boot?=false

NETWORK
    NAME         SERVICE                MACADDRESS          PVID|PVLAN|VIDs
    ----         -------                ----------          ---------------
    vnet1        primary-vsw0@primary   00:14:4f:f9:ec:3c   1|--|--

    NAME         SERVICE                MACADDRESS          PVID|PVLAN|VIDs
    ----         -------                ----------          ---------------
    vnet0        primary-vsw0@primary   00:14:4f:f8:29:a2   1|--|--

DISK
    NAME         VOLUME                 TOUT ID   DEVICE  SERVER         MPGROUP
    volume1      volume1@fdbkirnhht01-vds0      0    disk@0  primary
    db_vol2      nhht_db_vol2@fdbkirnhht01-vds0      1    disk@1  primary
    db_vol3      nhht_db_vol3@fdbkirnhht01-vds0      12   disk@12 primary
    ba_vol2      nhht_ba_vol2@fdbkirnhht01-vds0      13   disk@13 primary
    asm_01       nhht_asm_01@fdbkirnhht01-vds0      2    disk@2  primary
    asm_02       nhht_asm_02@fdbkirnhht01-vds0      3    disk@3  primary
    asm_03       nhht_asm_03@fdbkirnhht01-vds0      4    disk@4  primary
    asm_04       nhht_asm_04@fdbkirnhht01-vds0      5    disk@5  primary
    asm_05       nhht_asm_05@fdbkirnhht01-vds0      6    disk@6  primary
    asm_06       nhht_asm_06@fdbkirnhht01-vds0      7    disk@7  primary
    asm_07       nhht_asm_07@fdbkirnhht01-vds0      8    disk@8  primary
    asm_08       nhht_asm_08@fdbkirnhht01-vds0      9    disk@9  primary
    asm_09       nhht_asm_09@fdbkirnhht01-vds0      10   disk@10 primary
    asm_10       nhht_asm_10@fdbkirnhht01-vds0      11   disk@11 primary

VCONS
    NAME         SERVICE                PORT   LOGGING
    fdbkirnhht01 primary-vcc0@primary   5000   on

root@fvskirsun01:~# ldm ls-services
VCC
    NAME         LDOM         PORT-RANGE
    primary-vcc0 primary      5000-5100

VSW
    NAME         LDOM         MACADDRESS          NET-DEV   DVID|PVID|VIDs
    ----         ----         ----------          -------   --------------
    primary-vsw0 primary      00:14:4f:fa:51:60   aggr0     1|1|--

    primary-vsw1 primary      00:14:4f:fa:0f:6e   aggr1     1|1|--

VDS
    NAME         LDOM         VOLUME         OPTIONS          MPGROUP        DEVICE
    fdbkirnhht01-vds0 primary      volume1                                        /dev/dsk/c0t60050768018086B6E800000000000457d0s0
                              nhht_db_vol2                                   /dev/rdsk/c0t60050768018086B6E800000000000459d0s0
                              nhht_db_vol3                                   /dev/rdsk/c0t60050768018086B6E800000000000464d0s0
                              nhht_ba_vol2                                   /dev/rdsk/c0t60050768018086B6E800000000000465d0s0
                              nhht_asm_01                                    /dev/rdsk/c0t60050768018086B6E80000000000045Ad0s0
                              nhht_asm_02                                    /dev/rdsk/c0t60050768018086B6E80000000000045Bd0s0
                              nhht_asm_03                                    /dev/rdsk/c0t60050768018086B6E80000000000045Cd0s0
                              nhht_asm_04                                    /dev/rdsk/c0t60050768018086B6E80000000000045Dd0s0
                              nhht_asm_05                                    /dev/rdsk/c0t60050768018086B6E80000000000045Ed0s0
                              nhht_asm_06                                    /dev/rdsk/c0t60050768018086B6E80000000000045Fd0s0
                              nhht_asm_07                                    /dev/rdsk/c0t60050768018086B6E800000000000460d0s0
                              nhht_asm_08                                    /dev/rdsk/c0t60050768018086B6E800000000000462d0s0
                              nhht_asm_09                                    /dev/rdsk/c0t60050768018086B6E800000000000461d0s0
                              nhht_asm_10                                    /dev/rdsk/c0t60050768018086B6E800000000000463d0s0
    primary-vds0 primary      nhh_db_vol1                                    /dev/rdsk/c0t60050768018086B6E8000000000004A0d0s0
                              nhh_db_vol2                                    /dev/rdsk/c0t60050768018086B6E8000000000004A1d0s0
                              nhh_asm_01                                     /dev/rdsk/c0t60050768018086B6E8000000000004A2d0s0
                              nhh_asm_02                                     /dev/rdsk/c0t60050768018086B6E8000000000004A3d0s0
                              nhh_asm_03                                     /dev/rdsk/c0t60050768018086B6E8000000000004A4d0s0
                              nhh_asm_04                                     /dev/rdsk/c0t60050768018086B6E8000000000004A5d0s0
                              nhh_asm_05                                     /dev/rdsk/c0t60050768018086B6E8000000000004A6d0s0
                              nhh_asm_06                                     /dev/rdsk/c0t60050768018086B6E8000000000004A7d0s0
                              nhh_asm_07                                     /dev/rdsk/c0t60050768018086B6E8000000000004A8d0s0
                              nhh_asm_08                                     /dev/rdsk/c0t60050768018086B6E8000000000004A9d0s0
                              nhh_asm_09                                     /dev/rdsk/c0t60050768018086B6E8000000000004AAd0s0
                              nhh_asm_10                                     /dev/rdsk/c0t60050768018086B6E8000000000004ABd0s0
                              nhh_db_vol3                                    /dev/rdsk/c0t60050768018086B6E8000000000004ADd0s0
                              nhh_ba_vol2                                    /dev/rdsk/c0t60050768018086B6E8000000000004ACd0s0
    recovery     primary      boot                                           /export/home/e415243/sol_11.iso
                              disk0                                          /dev/rdsk/c0t5000CCA022484178d0s1
                              disk1                                          /dev/rdsk/c0t5000CCA022484178d0s0
    faskirkweb03-vds0 primary      volume1                                        /dev/rdsk/c0t5000CCA022482DF4d0s0
                              boot                                           /usr/spbin/isos/sol-11_3.iso

root@fvskirsun01:~# ldm ls-spconfig
factory-default
initial
27072018
10102018
19102018 [next poweron]
root@fvskirsun01:~#

There are a couple of other things that don't quite add up here. For example, from the {0} ok prompt I can see this:

{0} ok show-disks
a) /reboot-memory@0
b) /virtual-devices@100/channel-devices@200/disk@b
c) /virtual-devices@100/channel-devices@200/disk@a
d) /virtual-devices@100/channel-devices@200/disk@9
e) /virtual-devices@100/channel-devices@200/disk@8
f) /virtual-devices@100/channel-devices@200/disk@7
g) /virtual-devices@100/channel-devices@200/disk@6
h) /virtual-devices@100/channel-devices@200/disk@5
i) /virtual-devices@100/channel-devices@200/disk@4
j) /virtual-devices@100/channel-devices@200/disk@3
m) MORE SELECTIONS
q) NO SELECTION
Enter Selection, q to quit: m


a) /virtual-devices@100/channel-devices@200/disk@2
b) /virtual-devices@100/channel-devices@200/disk@d
c) /virtual-devices@100/channel-devices@200/disk@c
d) /virtual-devices@100/channel-devices@200/disk@1
e) /virtual-devices@100/channel-devices@200/disk@0
f) /iscsi-hba/disk
q) NO SELECTION
Enter Selection, q to quit: q
{0} ok
{0} ok
{0} ok show-devs
/cpu@2f
/cpu@2e
/cpu@2d
/cpu@2c
/cpu@2b
/cpu@2a
/cpu@29
/cpu@28
/cpu@27
/cpu@26
/cpu@25
/cpu@24
/cpu@23
/cpu@22
/cpu@21
/cpu@20
/cpu@1f
/cpu@1e
/cpu@1d
/cpu@1c
/cpu@1b
/cpu@1a
/cpu@19
/cpu@18
/cpu@17
/cpu@16
/cpu@15
/cpu@14
/cpu@13
/cpu@12
/cpu@11
/cpu@10
/cpu@f
/cpu@e
/cpu@d
/cpu@c
/cpu@b
/cpu@a
/cpu@9
/cpu@8
/cpu@7
/cpu@6
/cpu@5
/cpu@4
/cpu@3
/cpu@2
/cpu@1
/cpu@0
/virtual-devices@100
/reboot-memory@0
/iscsi-hba
/virtual-memory
/memory@m0,10000000
/aliases
/options
/openprom
/chosen
/packages
/virtual-devices@100/channel-devices@200
/virtual-devices@100/console@1
/virtual-devices@100/random-number-generator@e
/virtual-devices@100/flashprom@0
/virtual-devices@100/channel-devices@200/virtual-domain-service@0
/virtual-devices@100/channel-devices@200/pciv-communication@0
/virtual-devices@100/channel-devices@200/disk@b
/virtual-devices@100/channel-devices@200/disk@a
/virtual-devices@100/channel-devices@200/disk@9
/virtual-devices@100/channel-devices@200/disk@8
/virtual-devices@100/channel-devices@200/disk@7
/virtual-devices@100/channel-devices@200/disk@6
/virtual-devices@100/channel-devices@200/disk@5
/virtual-devices@100/channel-devices@200/disk@4
/virtual-devices@100/channel-devices@200/disk@3
/virtual-devices@100/channel-devices@200/disk@2
/virtual-devices@100/channel-devices@200/disk@d
/virtual-devices@100/channel-devices@200/disk@c
/virtual-devices@100/channel-devices@200/disk@1
/virtual-devices@100/channel-devices@200/disk@0
/virtual-devices@100/channel-devices@200/network@1
/virtual-devices@100/channel-devices@200/network@0
/iscsi-hba/disk
/openprom/client-services
/packages/vnet-helper-pkg
/packages/vdisk-helper-pkg
/packages/obp-tftp
/packages/kbd-translator
/packages/SUNW,asr
/packages/dropins
/packages/terminal-emulator
/packages/disk-label
/packages/deblocker
/packages/SUNW,builtin-drivers
{0} ok

There are a number of anomalies from within the OS as well - there are probably more, but the obvious ones are these.

The format command just hangs, even in single user, as does devfsadm; and if you go anywhere in the /dev tree, even a simple ls doesn't want to play.

Regards

Gull04

Hmmmm....... this sounds weird. I haven't had much experience with ASM, so I don't have a stock answer, but perhaps some pointers might help, especially for someone as experienced as you.

Some of this you'll already know, so apologies for that.

In my experience, maintenance mode is caused by being unable to mount /usr. Of course, if the O/S couldn't access root (/) it wouldn't boot at all, but it also needs access to /usr to avoid dropping into maintenance mode. So where is your /usr? On what disk(s)?
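
You can check quickly, for instance (just a sketch; on a ZFS-root Solaris 11 box /usr is normally part of the root dataset rather than a separate filesystem):

df -k / /usr
zfs list -o name,mountpoint -r rpool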

Inability to access disk(s) can be caused by the device node files (/dev/dsk, /dev/rdsk, /dev/cfg) being missing, so I would check them. Yes, to do that I would use 'format' to tell me the correct device node(s) by discovering them.

ASM uses device maps in /devices I believe. Do they look right?

Does running 'format' in expert mode (-e) show you anything different?

Are the missing disks on a SAN? If so, have the storage boys made them visible to this box?

As you suggest, I would use 'devfsadm' to recreate the missing device nodes but, of course, for that to work, the disks must already be physically visible.
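
Something like this, once the disks are physically visible again (standard devfsadm/format options, shown only as a sketch):

devfsadm -Cv          # -C cleans up dangling /dev links, -v reports what gets rebuilt
echo | format -e      # expert mode; lists the disks it finds and exits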

These are just my thoughts, but I would guess that this issue is at the O/S level and not directly related to ASM.

On the primary domain, look at those disks via format.
Print the partition tables - are those LUNs labelled?

Did you perhaps partition your s0 slice starting from cylinder 0?

I see you did not use options=slice when issuing ldm add-vdsdev, yet you are exporting slices!
This is wrong and should be corrected - see the example below.
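
For example, one of the ASM vdisks could be re-exported along these lines (an untested sketch; the volume, disk and domain names are taken from your ldm output above, and the vdisk has to be removed before its vdsdev):

ldm remove-vdisk asm_01 fdbkirnhht01
ldm remove-vdsdev nhht_asm_01@fdbkirnhht01-vds0
ldm add-vdsdev options=slice /dev/rdsk/c0t60050768018086B6E80000000000045Ad0s0 nhht_asm_01@fdbkirnhht01-vds0
ldm add-vdisk asm_01 nhht_asm_01@fdbkirnhht01-vds0 fdbkirnhht01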

Regards
Peasant

Hi Folks,

I have an update to this problem. It's not fixed yet, and it's going to take a while to do that, as the following explanation will show. For anyone who is interested, I have identified the problem as incorrectly partitioned disks on the test server; the production and DR servers were built correctly, with the slices starting from cylinder 1. Peasant hit the nail on the head - I hadn't checked that, because the build document I handed over did actually specify that the slice must start from cylinder 1.
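
For reference, with the geometry shown in the format output below (64 heads x 256 sectors, i.e. 16384 blocks per cylinder), the build requirement could have been met at build time with something along these lines (a destructive sketch, only for fresh LUNs - emphatically not for the damaged ones):

prtvtoc /dev/rdsk/c0t<new ASM LUN>d0s2
# slice 0, tag unassigned (0), flag wm (00), starting at cylinder 1 (sector 16384),
# sized to the remaining 38397 cylinders (38397 x 16384 = 629096448 blocks)
fmthard -d 0:0:00:16384:629096448 /dev/rdsk/c0t<new ASM LUN>d0s2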

So on Production/DR we have;

Specify disk (enter its number)[11]: 12
selecting c0t600507680C8082780000000000000494d0
[disk formatted]
format> pa


PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit
partition> pr
Current partition table (original):
Total disk cylinders available: 38398 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       1 - 38397      299.98GB    (38397/0/0) 629096448
  1       swap    wu       0                0         (0/0/0)             0
  2     backup    wu       0 - 38397      299.98GB    (38398/0/0) 629112832
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6        usr    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0

partition> quit


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show disk ID
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> quit
root@fvssphsun01:~#

And on the Test Server we have;

Specify disk (enter its number)[18]: 20
selecting c0t60050768018086B6E80000000000045Bd0
[disk formatted]
Disk not labeled.  Label it now? no
format> pa


PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit
partition> pr
Current partition table (default):
Total disk cylinders available: 38398 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 -    15      128.00MB    (16/0/0)       262144
  1       swap    wu      16 -    31      128.00MB    (16/0/0)       262144
  2     backup    wu       0 - 38397      299.98GB    (38398/0/0) 629112832
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6        usr    wm      32 - 38397      299.73GB    (38366/0/0) 628588544
  7 unassigned    wm       0                0         (0/0/0)             0

partition> quit


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show disk ID
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> current
Current Disk = c0t60050768018086B6E80000000000045Bd0
<IBM-2145-0000 cyl 38398 alt 2 hd 64 sec 256>
/scsi_vhci/ssd@g60050768018086b6e80000000000045b

format> verify
Warning: Could not read primary label.

Warning: Check the current partitioning and 'label' the disk or use the
         'backup' command.

Backup label contents:

Volume name = <        >
ascii name  = <IBM-2145-0000 cyl 38398 alt 2 hd 64 sec 256>
pcyl        = 38400
ncyl        = 38398
acyl        =    2
nhead       =   64
nsect       =  256
Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0 - 38397      299.98GB    (38398/0/0) 629112832
  1 unassigned    wu       0                0         (0/0/0)             0
  2     backup    wu       0 - 38397      299.98GB    (38398/0/0) 629112832
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0

format>

As you may be able to gather from the above, the disk on the test server has lost its primary label, which has been overwritten by ASM. The original backup label is still there, as shown by the verify command output.

I attempted to duplicate these disks to a set of files in a zpool, where I was going to work on them, but dd'ing the disks was a problem even using the raw devices.

root@fvskirsun01:~# zpool create asm_recovery c0t60050768018086B6E800000000000560d0s0
root@fvskirsun01:~# zpool list
NAME          SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
asm_recovery  796G   153K  796G   0%  1.00x  ONLINE  -
rpool         556G   177G  379G  31%  1.00x  ONLINE  -
root@fvskirsun01:~# zfs list | grep asm
asm_recovery                     86.5K   784G    31K  /asm_recovery
root@fvskirsun01:~# dd if=/dev/rdsk/c0t60050768018086B6E80000000000045Ad0s0 of=/asm_recovery/asm_01 bs=1024k
dd: /dev/rdsk/c0t60050768018086B6E80000000000045Ad0s0: open: I/O error
root@fvskirsun01:~# dd if=/dev/rdsk/c0t60050768018086B6E80000000000045Ad0s2 of=/asm_recovery/asm_01 bs=1024k
dd: /dev/rdsk/c0t60050768018086B6E80000000000045Ad0s2: open: I/O error

I have spoken with a couple of people, and the general consensus is that this situation is non-recoverable, but I have a couple of other things to try - I'll be familiarising myself with some low-level stuff that I haven't worked with for years. If I do manage to resolve the problem, I'll let you all know.

Many thanks Dennis and Peasant

Gull04

Use options=slice if you are exporting slices, as mentioned!
Or export the entire device (s2) without options=slice, then partition s0 inside the LDOM (or outside) and give that to ASM - a sketch of this follows below.
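
A rough sketch of that second approach (untested; the volume, disk and domain names are simply reused from the listing earlier in the thread):

# on the primary domain: export the whole LUN (s2), no options=slice
ldm add-vdsdev /dev/rdsk/c0t60050768018086B6E80000000000045Ad0s2 nhht_asm_01@fdbkirnhht01-vds0
ldm add-vdisk asm_01 nhht_asm_01@fdbkirnhht01-vds0 fdbkirnhht01
# inside the LDOM: run format against the new vdisk, create s0 starting at cylinder 1,
# label the disk, and hand the resulting s0 device to ASM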

Never give s2 to anything except zpools on VTOC-labelled disks.
Given s2 (the full SCSI LUN), or the whole device with no slice at all, ZFS will repartition the disk(s) properly and create the zpool with an EFI label.
When using s2 or no slice at all, ZFS will also be endianness-aware, so you will be able to import the zpool on both Solaris architectures (SPARC or x86).

Also, clusters that do SCSI reservation and fencing (like Solaris Cluster) will require s2 inside the LDOM, or else it will not work.

As for recovering, well, I have no good news about that.
Perhaps some hackery could be done if you still had the original, intact disks, without creating zpools out of the ASM disk(s).
Why did you do that ... it's like having a bad UFS filesystem and creating ZFS on top of it to recover it; it just doesn't make sense :slight_smile:

The hackery would go something like this, from the primary domain (I haven't tried it, so get back with the results :slight_smile: ):

1. Present a slightly bigger LUN, label it, and create a slice 0 matching the size of the entire old LUN exactly.
2. Using dd, copy the entire (broken) s2 disk onto the s0 slice of the newly presented and labelled LUN.
3. At the end you would have a labelled disk whose s0 slice contains the ASM label.
4. See whether ASM now recognises those LUNs as proper ones when presented to the LDOM with options=slice - a rough sketch follows this list.
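
Something along these lines (again untested; <old LUN> stands for one of the broken ASM LUNs and <new LUN> for the hypothetical slightly larger one):

prtvtoc /dev/rdsk/c0t<old LUN>d0s2        # note the old LUN's total block count
format -e c0t<new LUN>d0                  # label it and create s0 with exactly that many blocks
dd if=/dev/rdsk/c0t<old LUN>d0s2 of=/dev/rdsk/c0t<new LUN>d0s0 bs=1024k
ldm add-vdsdev options=slice /dev/rdsk/c0t<new LUN>d0s0 <new volume>@fdbkirnhht01-vds0
ldm add-vdisk <new disk name> <new volume>@fdbkirnhht01-vds0 fdbkirnhht01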

In my opinion, it is just easier and less error-prone to recreate from scratch, restore a backup and continue with the standby DB.
I presume you are replicating with database tools rather than the storage back end, since if you were using storage replication you would have properly labelled disks on the DR site.

Regards
Peasant.

Hi Peasant,

The reason for the failure, as you'll have deduced, is that the disks assigned to ASM were incorrectly configured; the original build document instructed that cylinder 0 be skipped for the ASM slice. However, this wasn't done, and when the server had to be rebooted Solaris threw a flaky when it couldn't read the label - no surprise there.

The disks assigned to ASM are the raw devices, and the ASM header covers the first 8 sectors of the disk - which is where the label and some of the bitmap sit.

The zpool that I created was to try a quick and dirty fix as follows.

Use dd to copy the raw device directly to a file inside the asm_recovery zpool; my intention was then to label the original LUNs and restart the LDOM without ASM and AFD. At that point I would also have presented the dd files to the LDOM and used kfed to sort the disks out.
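
Roughly along these lines (a sketch of the intent rather than something that actually ran; the image file name is arbitrary and $GRID_HOME stands for wherever the Grid Infrastructure binaries live):

# on the primary domain: image the broken LUN to a file in the recovery pool
dd if=/dev/rdsk/c0t60050768018086B6E80000000000045Ad0s2 of=/asm_recovery/asm_01.img bs=1024k
# wherever Grid Infrastructure is installed: inspect the ASM disk header in the copy
$GRID_HOME/bin/kfed read /asm_recovery/asm_01.img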

I wanted to run a comparison between the raw dd files and the ASM disks before dumping the database and blowing the disks away; however, things have moved on here and I probably won't have time to do this. I was looking at using mdb to get round the label problem (this exists at the LUN level, so it is a problem in the primary domain as well) so that I could dd the disks to the target files.

In all fairness, using s2 would not have yielded a different result; as the earlier post shows, slice 2 also starts at cylinder 0, so ASM would have killed the label just the same. I have spoken with two Sun (now Oracle) people - both with two-digit employee numbers - and the chances of sorting this out are slim.

The long and the short of this is that the person who built the system missed the requirement to have the ASM slices start at cylinder 1. His experience of ASM before this was limited to using soft partitions in SVM and adding the metadevice to a zone; although he did use ASM on top of AIX and RHEL, the configuration there is significantly different.

The disks used for ASM and AFD have only been labelled; they are raw devices. These normally show up in format as:

root@fvskirsun01:~# echo | format
Searching for disks...done

c0t60050768018086B6E80000000000045Ad0: configured with capacity of 1022.00MB
c0t60050768018086B6E8000000000004B8d0: configured with capacity of 49.98GB
c0t60050768018086B6E80000000000045Bd0: configured with capacity of 299.98GB
c0t60050768018086B6E80000000000045Cd0: configured with capacity of 299.98GB
c0t60050768018086B6E80000000000045Dd0: configured with capacity of 5.00GB
c0t60050768018086B6E80000000000045Ed0: configured with capacity of 5.00GB
c0t60050768018086B6E80000000000045Fd0: configured with capacity of 5.00GB
c0t60050768018086B6E800000000000462d0: configured with capacity of 49.98GB
c0t60050768018086B6E800000000000463d0: configured with capacity of 10.00GB
c0t60050768018086B6E800000000000461d0: configured with capacity of 49.98GB
c0t60050768018086B6E800000000000460d0: configured with capacity of 20.00GB


AVAILABLE DISK SELECTIONS:

These are labelled and pushed through to the LDOM in question as:

VDS
    NAME         VOLUME         OPTIONS          MPGROUP        DEVICE
    fdbkirnhht01-vds0 volume1                                        /dev/dsk/c0t60050768018086B6E800000000000457d0s0
                 nhht_db_vol2                                   /dev/rdsk/c0t60050768018086B6E800000000000459d0s0
                 nhht_db_vol3                                   /dev/rdsk/c0t60050768018086B6E800000000000464d0s0
                 nhht_ba_vol2                                   /dev/rdsk/c0t60050768018086B6E800000000000465d0s0
                 nhht_asm_01                                    /dev/rdsk/c0t60050768018086B6E80000000000045Ad0s0
                 nhht_asm_02                                    /dev/rdsk/c0t60050768018086B6E80000000000045Bd0s0
                 nhht_asm_03                                    /dev/rdsk/c0t60050768018086B6E80000000000045Cd0s0
                 nhht_asm_04                                    /dev/rdsk/c0t60050768018086B6E80000000000045Dd0s0
                 nhht_asm_05                                    /dev/rdsk/c0t60050768018086B6E80000000000045Ed0s0
                 nhht_asm_06                                    /dev/rdsk/c0t60050768018086B6E80000000000045Fd0s0
                 nhht_asm_07                                    /dev/rdsk/c0t60050768018086B6E800000000000460d0s0
                 nhht_asm_08                                    /dev/rdsk/c0t60050768018086B6E800000000000462d0s0
                 nhht_asm_09                                    /dev/rdsk/c0t60050768018086B6E800000000000461d0s0
                 nhht_asm_10                                    /dev/rdsk/c0t60050768018086B6E800000000000463d0s0

I will update you all if there are any changes; if I can get the disks copied, I'll let you know. Failing that, I'll clone the production system and they will have an up-to-date test system.

Thanks again for the pointers and information.

Regards

Gull04