V240 no OBP or console available

Hey guys,
I recently replaced the heatsinks on two different V240s. I repasted and replaced them myself. When I tried to bring the servers back up, there was no display coming from the KVM. I figured it was a matter of the OBP environment and attempted to get to the OK prompt from the ALOM. I let the login expire, and the OK prompt never came. I went in through the ALOM, sent a break command, and tried console -f. Still nothing. I have scoured the internet and exhausted all resources at my disposal, to no avail. It seems to me that something I did is causing the system to hang, or perhaps enter a boot cycle that I can't break. I've been working on this for over a week and am drained. Any advice or suggestions are welcome. ANY. Thank you in advance.

P.S. It may be simpler than I am thinking, because I am experiencing exactly the same issues on two separate servers... just a thought.

First, you need CPUs to get an OK prompt. I think you screwed up your CPUs somehow. I have no 240s left, so this is from imperfect memory. I used to be able to defeat the case switch so I could run the 240 with the cover off. Those heat sinks have a couple of fans on each one. Are the blades spinning? (BTW, over the last few years that I had 240s I never saw a replacement heat sink that used paste. I assume that yours required the paste you used.) Try reseating the CPUs. This is probably the key. My guess is that you reseated all 4 CPUs wrong. The next most likely idea is that you got 4 bum replacement heat sinks. A final thought is that maybe you forgot to plug the heat sinks in. Checking for spinning heat sink fan blades will quickly disprove the last two ideas.

In the worst case, buy some replacement 240s on eBay. And when the heat sink fans break, hire a tech to replace the heat sinks rather than doing it yourself.

We replace the fans on the V125 without removing the heatsink (I think it's the same as in the V240); it's not very hard.
Perhaps you should check whether the RAM modules are seated properly, too.

What do you mean by this??? You let what login expire?

Are you saying that from a cold power up, you get no output whatsoever?

You mention a KVM. This is connected to which port on the V240?

Is everything definitely connected the same way it was before you worked on the box?

If you plug into the serial port on a V240 and power it on, you get a chance to log in to ALOM. If you log in to ALOM, you can use the "console" command to turn the serial port into a system console. This also happens if you just let ALOM time out without logging in to it. If you don't know the ALOM admin password, this is what you have to do.

@Perderabo.........

Yes, agreed, but I'm asking the OP whether there is any output at all. I'm wondering whether he does have control of the system controller (SC>) and the main processor is not configured to auto-boot. If so, perhaps he needs to tell the SC to 'poweron' the main box before giving the 'console' command, otherwise he's connecting a console to a dead box.

That's where I'm coming from. Do you think that's a possible scenario?

@hicksd8

What you suggest is possible. ALOM is alive while the system is powered off. You can sign on to ALOM and then use the "poweron" command to apply power to the box. But there is also a power button. If this guy has been working on two dead V240s for a full week, I would assume that he is applying power. If not, then I would certainly recommend powering the system on as the very next thing to try.

@Perderabo........Thanks. Anyway, you know my suspicions, hence my questions to the OP. The OP is 8 hrs behind me and you are 5 hrs behind me. I'm in the UK and will be signing out soon, so you are better placed to help. Unless the hardware architecture is understood (i.e., the manual has been read), pressing the power button after seeing an SC> prompt on the terminal might not be the obvious thing to do. Perhaps you'll have this sorted before I wake up tomorrow.

I will try to hit everything, sorry for my delay.

  1. I believe it is something with the CPUs, and unfortunately I am the hardware tech rep for this gear. Usually I'm just software, so I'm learning on the fly.

  2. I did work around the case switch, and all four replacements were running properly (and plugged in, obviously).

  3. Re-seating the RAM was the first thing I did when no OBP showed.

  4. I let the SC login expire hoping to see OBP. I also removed all connections to the KVM to try to force output to redirect to TTYA.

  5. I do have access to the ALOM, and I've tried rebooting with bootscripts to change output/input and set autoboot to false. When I run console -f I get the same blank screen.

  6. With the fans running properly, I think it would be safe to assume I applied power correctly. Interesting to note, however: when I run showplatform through the ALOM, it does not correctly reflect the current state of the server. Example: when the system has been powered on, it will still show the system as stopped.

I think you may be on to something with the CPUs being the issue; that is really the only thing that makes any sense. With both behaving in exactly the same way and no OBP, I would have to say it is the CPUs. I have pulled them out and checked the pins, and they seem fine to me... I did clear out some fuzz on one of them. Would the fuzz short the motherboard?

P.S. From what I understand, whether you use paste or the strip that comes with the heat sink depends on the type of processor. (I read that during my search for answers before posting.)

Right, let me explain some of the basic hardware architecture here. Sorry, if you already know this but you really must understand this correctly.

First, there is a small embedded appliance called the System Controller (the SC> prompt) integrated into the Sun hardware. The SC's job is to assist you, the sysadmin, in managing the system. The SC is a separate machine from the main system (although integrated in the box).

When you plug in the mains power the SC will boot and you should see that output.

IF THE SC IS CONFIGURED TO, it will power up the main processor(s), but otherwise it won't. So powering on the SC doesn't mean that the whole machine is powered up.

Can you login to the SC? Do you know the password?

If you can login to the SC then you can ask the SC to power up the main box by giving the command:

SC> poweron

After that, if you switch to the console output:

SC> console

you should be able to see the main machine boot.

Try the first 'poweron' command and watch the front of the box for lights and the noise level (if it's not in a noisy machine room). Or get someone else to stand in front of the box when you hit the return key. There should be an immediate response.
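
If the console still stays blank after that, a few other ALOM commands can show what the SC itself thinks is going on. This is a sketch from memory of the ALOM command set, so check the exact options against the ALOM guide for your firmware version:

SC> showplatform

SC> showenvironment

SC> showlogs

SC> consolehistory boot -v

showplatform reports the host state, showenvironment the fans, temperatures and power supplies, showlogs the SC event log, and consolehistory whatever the host wrote to the console during its last boot attempt, even if you weren't connected at the time.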

Post back the result.

I can log on to the SC, and after console this is what shows:

Enter #. to return to Alom.

and OBP never comes up. I've been waiting for the OK prompt and nothing. I know for a fact the server is on: after the poweron command is issued, the fans come up. This is where I've been stuck for a week, trying to figure out why the OBP isn't there.

Yes, BUT BEFORE you issue the 'console' command tell the SC to power up the main box:

SC> poweron

Try that please.

Yes, I have done that. It shows the response SC requesting poweron and the fans kick on.

So no POST if that's what you are asking.

Right, I understand.

It's quite possible that the OBP is set to send console output elsewhere which would be why you're not seeing it. I've been involved in many a discussion on this forum regarding this; I'm off to look for these and I'll edit this post with the link(s).

Alternatively, if you have a Sun keyboard on this system, then you could reset the OBP (NVRAM) parameters to their defaults by hitting STOP-N as the system is powering up (but that might be a bit too drastic).
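
If there's no Sun keyboard attached, ALOM has (from memory, so check the ALOM guide for your firmware version) a bootmode option that does a similar reset of the OpenBoot NVRAM variables on the next reset of the host. As far as I remember, the bootmode setting only applies if the host is reset within about 10 minutes:

SC> bootmode reset_nvram

SC> reset

SC> console -f

The same caution applies: this puts the OBP variables back to their defaults, so only do it if you're happy to lose any custom settings.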

Watch this space.....

As I may or may not have stated in the initial post, I did suspect that at first. I ran the command bootmode bootscript="setenv output-device ttya" to try to, at the very least, reroute output. But even with the output going to the monitor, I should be able to see the POST while consoled in, and I can't.

This may help....

and this one:

BE CAREFUL reading this. There are a number of us discussing what's right and what's wrong.

I guess the console baud rate could be set very differently from the SC baud rate, so that your terminal doesn't read the console output properly, or even shows garbage.

Being 2300 hrs in the UK right now I'm going to have to leave you discussing with the many experts on this forum. Be assured that you're in the right place to get help.

hicksd8

I read through those threads and found a few things I haven't tried yet (correcting the time in ALOM, and booting without hard drives). I will post an update when I've tried them.

I recommend that you take a look at the settings (eg, console redirection, serial comms, etc) but don't be tempted to start changing things en masse.

As you said, both machines were working, you worked on the hardware, and now both have the same problem. Logic says that it's something you've done. This hardware has interlocks to protect the hardware (eg, it knows if a fan isn't running) so leaving one tiny cable unplugged could do something like this.

Are the processors properly seated and locked down? Stuff like that.

All too often an IT fault is responded to by reinstalling the O/S, reinstalling the app that errored, completely restoring a filesystem, or some other overly drastic response, turning the one original problem into a multitude of problems that then take ages to fix. My advice is: don't be tempted, but retrace your steps.

On the V125, if you replace the CPU fan(s) with 2-wire fans (instead of 3-wire), you get a bunch of warnings in the Solaris log file, but it works.

Console settings are set to 9600 8-N-1

I figure that if I can see the SC, the console settings are correct, right?

Also interesting to note: I received a new chassis, and I am able to get the OBP but am still not getting any visuals on the monitor. I was seeing similar issues to the old chassis until I changed the keyswitch to diagnostics, and then it came up perfectly. I may try that on the old chassis to see if it works out.

Okay.... I have resolved the issue. I feel really dumb about it, but I'm posting in case someone else down the line has a similar issue....

Apparently V240 CPU slots have a little silver lever next to them. You have to pull the lever up, place the CPU in, and lower the lever flush with the motherboard.

I feel very dumb, but this has been an excellent learning experience, and I thank all who participated.
