Solaris LDOM IP conflict

I have a Sun T4-1 running Solaris 11.4 with a static IP 192.168.0.183. On this machine is a Solaris 10 LDOM with a static IP of 192.168.0.78. The other day I had to stop the LDOM to do a memory reconfigure. When I rebooted it I got an error that the IP 192.168.0.78 was already in use and so networking was disabled.

I discovered that if I ssh into 192.168.0.78, I land on the T4's primary domain. So I thought I'd just change the Solaris 10 IP to 192.168.0.79. But when I rebooted the LDOM, it complained that that IP was in conflict too! So somehow the primary domain has hijacked the Solaris 10 LDOM's IP, and my networking has gotten all bollixed up. The question is: how do I get the primary domain to give up 192.168.0.78?

Here's some info from the primary domain but darned if I can figure out what needs to be changed:

root@hemlock:~# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-cv-  UART    32    16G      0.3%  0.3%  4d 22h 43m
sol10            bound      ------  5000    32    32G
root@hemlock:~# ldm list-bindings sol10
...
NETWORK
    NAME         SERVICE                MACADDRESS          PVID|PVLAN|VIDs
    ----         -------                ----------          ---------------
    net1         primary-vsw0@primary   00:14:4f:f8:07:f9   1|--|--

        PEER                   MACADDRESS          PVID|PVLAN|VIDs
        ----                   ----------          ---------------
        primary-vsw0@primary   00:14:4f:f8:22:af   1|--|--

root@hemlock:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --         127.0.0.1/8
   lo0/v6         static     ok           --         ::1/128
net0              ip         ok           --         --
   net0/v4        static     ok           --         192.168.0.183/24
   net0/v6        addrconf   ok           --         fe80::210:e0ff:fe8a:1138/10
net1              ip         ok           --         --
   net1/v4        static     ok           --         192.168.0.79/24
sp-phys0          ip         ok           --         --
   sp-phys0/v4    static     ok           --         169.254.182.77/24
vnic1             ip         ok           --         --
   vnic1/v4address static    ok           --         192.168.0.78/24
root@hemlock:~# dladm show-phys
LINK            MEDIA         STATE      SPEED  DUPLEX    DEVICE
net0            Ethernet      up         100    full      igb0
net1            Ethernet      up         100    full      igb1
net2            Ethernet      unknown    0      unknown   igb2
net3            Ethernet      unknown    0      unknown   igb3
net4            Ethernet      up         100    full      vsw0
sp-phys0        Ethernet      up         10     full      usbecm2
root@hemlock:~# dladm show-link
LINK                CLASS     MTU    STATE    OVER
ldoms-vsw0.vport0   vnic      1500   up       net1
net0                phys      1500   up       --
net1                phys      1500   up       --
net2                phys      1500   unknown  --
net3                phys      1500   unknown  --
net4                phys      1500   up       --
sp-phys0            phys      1500   up       --
vnic1               vnic      1500   up       net1
root@hemlock:~# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
vnic1: flags=100001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,PHYSRUNNING> mtu 1500 index 2
        inet 192.168.0.78 netmask ffffff00 broadcast 192.168.0.255
        ether 2:8:20:a4:2:9b
net0: flags=100001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,PHYSRUNNING> mtu 1500 index 4
        inet 192.168.0.183 netmask ffffff00 broadcast 192.168.0.255
        ether 0:10:e0:8a:11:38
net1: flags=100001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,PHYSRUNNING> mtu 1500 index 6
        inet 192.168.0.79 netmask ffffff00 broadcast 192.168.0.255
        ether 0:10:e0:8a:11:39
sp-phys0: flags=100001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,PHYSRUNNING> mtu 1500 index 5
        inet 169.254.182.77 netmask ffffff00 broadcast 169.254.182.255
        ether 2:21:28:57:47:17
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
vnic1: flags=120002000840<RUNNING,MULTICAST,IPv6,PHYSRUNNING> mtu 1500 index 2
        inet6 ::/0
        ether 2:8:20:a4:2:9b
net0: flags=120002004841<UP,RUNNING,MULTICAST,DHCP,IPv6,PHYSRUNNING> mtu 1500 index 4
        inet6 fe80::210:e0ff:fe8a:1138/10
        ether 0:10:e0:8a:11:38
sp-phys0: flags=120002000840<RUNNING,MULTICAST,IPv6,PHYSRUNNING> mtu 1500 index 5
        inet6 ::/0
        ether 2:21:28:57:47:17
root@hemlock:~#

You might consider flushing your ARP cache(s), or simply waiting for the ARP entries to time out.
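
For reference, checking and clearing a stale entry might look roughly like the sketch below. `arp -an` and `arp -d` are standard commands, but the helper name `mac_for_ip` is made up for illustration, and the column layout it relies on (IP address in the second column, MAC address in the last) is an assumption to verify against your own output. It also assumes a POSIX awk; if the default awk on your Solaris release rejects `-v`, nawk or /usr/xpg4/bin/awk should do.

```shell
# mac_for_ip is a hypothetical helper: it reads `arp -an` style output on
# stdin and prints the MAC address recorded for the given IP.
# Assumption: the IP is in column 2 and the MAC in the last column.
mac_for_ip() {
    awk -v ip="$1" '$2 == ip { print $NF }'
}

# Typical use on the machine that sees the conflict (not executed here):
#   arp -an | mac_for_ip 192.168.0.78    # which MAC currently answers?
#   arp -d 192.168.0.78                  # drop the cached entry (as root)
```

If the MAC printed is not the LDOM's (00:14:4f:f8:07:f9 in the ldm list-bindings output above), some other interface is still answering for that address, and flushing the cache alone will not help.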


Current situation is:

Your T4 box has three IP addresses on three interfaces (net0, net1 and vnic1), one of which is a VNIC.
At some point in time, these addresses were configured...

If you wish to remove the address in question, shown in this output:

.....
vnic1             ip         ok           --         --
   vnic1/v4address static    ok           --         192.168.0.78/24
.....

Execute the following on the primary domain, either from the console via the SP or over ssh connected to any address other than .78:

ipadm delete-addr vnic1/v4address

The system complains about a duplicate address because the same address is defined both on a VNIC in the primary domain and inside the LDOM.
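
To see which address object on the primary domain holds a given IP before deleting anything, a small filter over the ipadm output can help. This is only a sketch: `find_addr_owner` is a made-up helper name, and it assumes the object name is the first column and the address (in address/prefix form) is the last, as in the ipadm listing earlier in the thread. It also assumes a POSIX awk (nawk or /usr/xpg4/bin/awk if the default awk rejects `-v`).

```shell
# find_addr_owner is a hypothetical helper: it reads `ipadm` style output on
# stdin and prints the name of every address object holding the given IP.
# Assumption: column 1 is the object name, the last column is address/prefix.
find_addr_owner() {
    awk -v ip="$1" '{
        split($NF, parts, "/")        # strip the /prefix length
        if (parts[1] == ip) print $1  # first column is the object name
    }'
}

# Typical use on the primary domain (not executed here):
#   ipadm | find_addr_owner 192.168.0.78
```

Fed the listing from the first post, this would print vnic1/v4address, which is the name passed to ipadm delete-addr above.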

Hope that helps
Regards
Peasant.


Thanks! The ipadm command worked! I now have networking again at 192.168.0.78 on the LDOM. But now I have a new problem. When I start the LDOM console, it won't let me in. (But I can successfully ssh to the LDOM and also open a vncviewer to it).

root@hemlock:~# ldm start sol10
LDom sol10 started
root@hemlock:~# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-cv-  UART    32    16G      0.8%  0.8%  5d 23h 56m
sol10            active     -t----  5000    32    32G      0.6%  0.6%  2s
root@hemlock:~# telnet localhost 5000
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "sol10" in group "sol10" ....
Press ~? for control options ..
Hostname: avon
Feb  3 12:39:14 svc.startd[11]: svc:/application/management/ocm:default: Method "/lib/svc/method/svc-ocm start" failed with exit status 95.
Feb  3 12:39:14 svc.startd[11]: application/management/ocm:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

avon console login: You do not have write access
You do not have write access

(message repeats every time I press a key)

And in the LDOM (captured from an ssh window):

/export/home/michele$ svcs -xv
svc:/application/print/server:default (LP print server)
 State: disabled since Mon Feb 03 12:38:57 2020
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: man -M /usr/share/man -s 1M lpsched
Impact: 2 dependent services are not running:
        svc:/application/print/rfc1179:default
        svc:/application/print/ipp-listener:default

svc:/application/management/ocm:default (Oracle Configuration Manager)
 State: maintenance since Mon Feb 03 12:39:14 2020
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://sun.com/msg/SMF-8000-KS
   See: /var/svc/log/application-management-ocm:default.log
Impact: This service is not running.
/export/home/michele $ more /var/svc/log/application-management-ocm:default.log
...
[ Feb  3 12:38:57 Enabled. ]
[ Feb  3 12:39:13 Executing start method ("/lib/svc/method/svc-ocm start") ]
OCM is marked as disconnected.
Use emocmrsp -output /opt/ocm/ocm.rsp to create a response file and reconfigure
with configCCR -R /opt/ocm/ocm.rsp.
[ Feb  3 12:39:14 Method "start" exited with status 95 ]

Is this non-running service anything important?

--- Post updated at 02:56 PM ---

Thanks. I didn't even know about that. That will be good to know for next time. In the end I went with the ipadm method.

UPDATE
I fixed the "no write access" problem by grepping for processes named "telnet" and then killing them. Then I tried "telnet localhost 5000" again and it worked fine. I next ran "svcs -xv" again, and this time the OCM service in maintenance mode wasn't listed there anymore. So that seems to have fixed itself. Go figure.
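
For anyone hitting the same thing: my guess is that an old telnet session was still holding the console, so the new connection only got read access. Something like the sketch below would find the leftover sessions. `console_telnet_pids` is just a name I made up, and it assumes the sessions were started exactly as "telnet localhost PORT" and that the input looks like `ps -eo pid,args` output. It also assumes a POSIX awk (nawk or /usr/xpg4/bin/awk if the default awk rejects `-v`).

```shell
# console_telnet_pids is a hypothetical helper: it reads `ps -eo pid,args`
# style output on stdin and prints the PIDs of telnet sessions connected to
# the given console port on localhost.
console_telnet_pids() {
    awk -v port="$1" '$2 ~ /telnet$/ && $3 == "localhost" && $4 == port { print $1 }'
}

# Typical use on the primary domain (not executed here):
#   ps -eo pid,args | console_telnet_pids 5000
#   kill <pid>        # for each stale session, then: telnet localhost 5000
```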
