Help with Solaris 8

Good Morning All,

We have a SPARC V880 server running Solaris 8, and we recently ran into the strange issue shown below:

Jan 20 15:24:16 xxxx scsi: WARNING: /pci@9,700000/scsi@3 (glm1): 
Jan 20 15:24:16 xxxx        Connected command timeout for Target 0.0 
Jan 20 15:24:16 xxxx scsi: WARNING: /pci@9,700000/scsi@3 (glm1): 
Jan 20 15:24:16 xxxx        got SCSI bus reset 
Jan 20 15:24:17 xxxx scsi: WARNING: /pci@9,700000/scsi@3,1 (glm2): 
Jan 20 15:24:17 xxxx        Connected command timeout for Target 0.0 
Jan 20 15:24:17 xxxx scsi: WARNING: /pci@9,700000/scsi@3,1 (glm2): 
Jan 20 15:24:17 xxxx        got SCSI bus reset
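To see at a glance which controller instances are logging these warnings, and how often, the messages can be tallied per device path. This is only a rough sketch: the function name scsi_warning_counts is made up for illustration, and it assumes the warnings keep the same "scsi: WARNING: <path> (<driver><instance>):" layout as the lines above.

```shell
# Tally "scsi: WARNING:" lines per device path and driver instance.
# Reads syslog-style lines on stdin (hypothetical helper, not a Solaris tool).
scsi_warning_counts() {
    grep 'scsi: WARNING:' |
        # Keep only the device path and the driver instance, e.g. "glm1".
        sed 's|.*WARNING: *\(/[^ ]*\) *(\([a-z]*[0-9]*\)).*|\1 \2|' |
        LC_ALL=C sort | uniq -c |
        # Normalise the padding that uniq -c puts in front of the count.
        awk '{ print $1, $2, $3 }'
}

# Assumed usage on the affected host (the log path may differ):
#   scsi_warning_counts < /var/adm/messages
```

Fed the eight log lines above, this should report two warnings each for glm1 and glm2.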

Because of the above error, commands such as format, devfsadm, cfgadm and iostat print nothing when issued and never return to the prompt.

I am not sure what glm1 and glm2 are here. This is a standalone server with no tape drive or jukebox attached to it.

Any ideas for fixing this would be great. Please let me know if any more information is needed.

Thanks,
P

I would say the disk /pci@9,700000/scsi@3,1 is about to die...

None of the disks are on it. At the end it says glm1 and glm2.

All my disks are on /pci@8,600000/
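One way to double-check which physical path each controller number sits on is to look at where the /dev/dsk links point. A minimal sketch, assuming the usual "cNtNdNs2 -> ../../devices/..." symlink layout; controller_paths is a hypothetical helper name, not an existing command.

```shell
# Print one "controller physical-path" line per controller.
# Expects the output of `ls -l /dev/dsk` on stdin (hypothetical helper).
controller_paths() {
    # Use only the s2 links, and drop the final path component (the target
    # node) so that what remains is the controller's parent device path.
    sed -n 's|.* \(c[0-9]*\)t[0-9]*d[0-9]*s2 -> \.\./\.\./devices\(/.*\)/[^/]*$|\1 \2|p' |
        LC_ALL=C sort -u
}

# Assumed usage:
#   ls -l /dev/dsk | controller_paths
```

If c1 comes back under /pci@8,600000 and nothing maps to /pci@9,700000, that would back up the point that the glm warnings are not coming from the disk controller.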

Is GLM something to do with a fibre card? Gigabit Link Module

Do you have any open-ended fibre links or decommissioned fibre links?

Robin

Should be a card in PCI slot 5. Would also say a fibre card with connection to external storage (or not anymore in this case).

Seems like it, Duke. After banging my head and running probe-scsi-all from the OK prompt, that error somehow isn't showing up now, but I ran into a different issue on the same server.

Look at the cfgadm output below. I understand the pcisch0:hpc1_slot0 stuff is related to tape drives.

bash-2.03# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
SBa                            cpu/mem      connected    configured   ok
SBa::cpu0                      cpu          connected    configured   ok
SBa::cpu1                      cpu          connected    configured   ok
SBa::memory                    memory       connected    configured   ok
SBb                            cpu/mem      connected    unconfigured ok
SBc                            cpu/mem      connected    unconfigured ok
SBd                            cpu/mem      connected    unconfigured ok
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t6d0                 CD-ROM       connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
c3                             scsi-bus     connected    unconfigured unknown
pcisch0:hpc1_slot0             unknown      empty        unconfigured unknown
pcisch0:hpc1_slot1             unknown      empty        unconfigured unknown
pcisch0:hpc1_slot2             unknown      empty        unconfigured unknown
pcisch0:hpc1_slot3             unknown      empty        unconfigured unknown
pcisch2:hpc2_slot4             unknown      empty        unconfigured unknown
pcisch2:hpc2_slot5             mult/hp      connected    configured   ok
pcisch2:hpc2_slot6             unknown      empty        unconfigured unknown
pcisch3:hpc0_slot7             unknown      empty        unconfigured unknown
pcisch3:hpc0_slot8             unknown      empty        unconfigured unknown
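To pick out the attachment points that are connected but not configured without reading the whole table, the listing can be filtered on the Receptacle and Occupant columns. This is just a sketch; unconfigured_ap_ids is a made-up name, and it assumes the five-column layout shown above.

```shell
# List attachment points whose receptacle is connected but whose occupant
# is unconfigured. Expects `cfgadm -al` output on stdin (hypothetical helper).
unconfigured_ap_ids() {
    # Columns: Ap_Id  Type  Receptacle  Occupant  Condition
    awk '$3 == "connected" && $4 == "unconfigured" { print $1, $2 }'
}

# Assumed usage:
#   cfgadm -al | unconfigured_ap_ids
```

For the listing above this leaves SBb, SBc, SBd and the two scsi-bus entries c2 and c3.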

I can't see the c1 controller in the output above, whereas all the disks are using the c1 controller.

bash-2.03# cd /dev/dsk
bash-2.03# ls
c0t6d0s0   c1t0d0s2   c1t10d0s4  c1t11d0s6  c1t13d0s0  c1t1d0s2   c1t2d0s4   c1t3d0s6   c1t5d0s0   c1t8d0s2   c1t9d0s4
c0t6d0s1   c1t0d0s3   c1t10d0s5  c1t11d0s7  c1t13d0s1  c1t1d0s3   c1t2d0s5   c1t3d0s7   c1t5d0s1   c1t8d0s3   c1t9d0s5
c0t6d0s2   c1t0d0s4   c1t10d0s6  c1t12d0s0  c1t13d0s2  c1t1d0s4   c1t2d0s6   c1t4d0s0   c1t5d0s2   c1t8d0s4   c1t9d0s6
c0t6d0s3   c1t0d0s5   c1t10d0s7  c1t12d0s1  c1t13d0s3  c1t1d0s5   c1t2d0s7   c1t4d0s1   c1t5d0s3   c1t8d0s5   c1t9d0s7
c0t6d0s4   c1t0d0s6   c1t11d0s0  c1t12d0s2  c1t13d0s4  c1t1d0s6   c1t3d0s0   c1t4d0s2   c1t5d0s4   c1t8d0s6
c0t6d0s5   c1t0d0s7   c1t11d0s1  c1t12d0s3  c1t13d0s5  c1t1d0s7   c1t3d0s1   c1t4d0s3   c1t5d0s5   c1t8d0s7
c0t6d0s6   c1t10d0s0  c1t11d0s2  c1t12d0s4  c1t13d0s6  c1t2d0s0   c1t3d0s2   c1t4d0s4   c1t5d0s6   c1t9d0s0
c0t6d0s7   c1t10d0s1  c1t11d0s3  c1t12d0s5  c1t13d0s7  c1t2d0s1   c1t3d0s3   c1t4d0s5   c1t5d0s7   c1t9d0s1
c1t0d0s0   c1t10d0s2  c1t11d0s4  c1t12d0s6  c1t1d0s0   c1t2d0s2   c1t3d0s4   c1t4d0s6   c1t8d0s0   c1t9d0s2
c1t0d0s1   c1t10d0s3  c1t11d0s5  c1t12d0s7  c1t1d0s1   c1t2d0s3   c1t3d0s5   c1t4d0s7   c1t8d0s1   c1t9d0s3

How can this be? Am I missing anything?

Thanks,
P

Try to run devfsadm -Cv to get rid of device entries that are not used anymore.

It could be a poor contact on an add-in card. I would power down, pull out each add-in card, and plug it back in again just to make sure the edge contacts are making. Perhaps this machine has been around a while and card(s) need reseating.

I tried doing that, but I see the same cfgadm -al output as I sent before, and there is still no c1 controller in the list.

Whenever I reboot the server, the glm issue pops up and none of the commands work, but after a couple of hours the glm issue disappears and the commands start to work.

Thanks,
P

Under normal circumstances no system changes its behaviour by itself, i.e. commands which don't work do not just start working after a couple of hours of running. There are only so many things that can cause that.

  1. The system is running something at boot time which is time-based and toggling off something for a couple of hours.
  2. A peripheral system is toggling something for a couple of hours e.g. a SAN is making volumes unavailable at certain times.
  3. Or most likely it is a hardware problem e.g. as the system heats up things start working, see my post#8.
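
To pin down exactly when after a reboot the commands start responding, which would help tell these three cases apart, the commands can be probed under a watchdog so that a hang does not tie up the terminal. A rough sketch only: run_with_timeout is a hypothetical helper written here because a stock timeout utility is not assumed to be available, and a process stuck in an uninterruptible device wait may not die when killed, in which case the helper itself would block.

```shell
# Run a command, killing it if it has not finished within LIMIT seconds.
# Usage: run_with_timeout LIMIT command [args...]   (hypothetical helper)
run_with_timeout() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    cmd_pid=$!
    # Watchdog: once the limit expires, kill the command if it still exists.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    status=0
    wait "$cmd_pid" || status=$?
    # The command is finished (or was killed), so the watchdog can go.
    kill "$watchdog_pid" 2>/dev/null
    return $status
}

# Assumed usage, repeated every few minutes after a reboot:
#   if run_with_timeout 30 cfgadm -al; then echo "`date` cfgadm ok"
#   else echo "`date` cfgadm hung or failed"; fi
```

Comparing the first "ok" timestamp across several reboots should show whether the recovery follows a fixed schedule (cases 1 and 2) or drifts with how long the box has been powered on (case 3).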