Prize of being an Admin

I was wondering if anyone has come across a situation where you do your best to help users and, in return, get a nice escalation from top-level management!

Here's my story:

One fine morning, while I was sitting idle doing next to nothing, I got an alert from the helpdesk people about a problem with an application running on a high-priority AIX LPAR (sorry for not being specific; I am not allowed to disclose the details). This application is used heavily by the biotech research fellows, and it was crashing frequently. On checking in depth, I found nothing wrong from the OS standpoint. Instead, the problem was caused by one rogue wrapper script. But as a server admin, I officially had nothing to do with that. In the meantime, I had to speak to the manager of the research team and do my best to make her understand where the actual problem was. I assured her that I would contact the responsible application admin team.

Following that, I tried contacting the application admin team (which is from a different IT services vendor than my company). I did not get any solid reply from them, nor any definite timeline for when they would be able to look into the problem.

I ended my shift with still no reply from the application team. I came home and gave a lot of thought to whether I should modify the script myself. I accessed the server from home and spent a sleepless night understanding how everything in that application was wired.

The next morning, I went to the office and checked my E-mail inbox, hoping that the app team might have responded. But no! So I went ahead, made the changes that I thought should have been made, and restarted the application service. I monitored the application for a couple of hours and then called the manager of the research team. She was furious. I politely told her what I had done so far and asked her to check whether things were fine. And yes, things were fine. But she was still quite upset about our service.

Six hours later, I saw an E-mail from our top-level Service Delivery Manager demanding to know why it had taken me so long to fix the issue. On checking, I found that the research team manager had sent a "beautiful" E-mail to the top-level manager saying how worthless the people in IT were.

Two days later, my supervisor came back from his vacation. On learning what I had done, all he had to ask was "Why the hell did you even think of taking such a bold step? What if something had gone wrong?" Being one of the junior members of the team, I kept quiet. All I tried to do was help the research folks out and let them continue their work as soon as possible. But this is the prize I got for doing that. :rolleyes:

So you learned your lesson: if you fix things that are broken but not owned by you, don't tell anyone :slight_smile:


I agree with zxmaus and the old saying:

"No good deed goes unpunished."

When in a political (territorial) situation like the one described, just fix the problem and don't tell anyone you fixed it, since you are the main system admin with root privs.

"No good deed goes unpunished."

... as they say.


Well said zxmaus and Neo! Lesson learned!!

I used to wonder why all the senior admins were so lifeless. And here I am, always ready to dive into any problem with extra zeal!! But now I know that I need to hold myself back, and there's a reason why seniors are indifferent to things. :slight_smile:

I think what you should have done is "shift (or share) responsibility", i.e., ask the research team manager whether you could try fixing the issue yourself, since the responsible application team had not responded for a few hours. Then you would have avoided getting trashed by her, as she couldn't say that you weren't working on the issue. It would also have solved the problem of responsibility (at least to some extent) in case things went ugly, which is what your manager was warning you about.

Excellent point, bartus11.
Being naive and over-enthusiastic, I was only thinking about the problem at hand, not about the consequences. Looks like I have so many things to work on!! :smiley:

Difficult situation, admin_xor. You dealt with it well in an emergency... but did you have the authority to make the change?
In a formal environment you would need authority from the duty Change Control Manager to go ahead with an Emergency Change.
This could have backfired big time.
Take care.

(I've had to get a Director out of bed to authorise a minor change when the duty Change Control Manager did not respond.)


No, methyl. There are different teams from different IT service vendors working for our client, and my company is one of them. My team is responsible for administering the server from the OS standpoint. If there's an issue with storage, we need to consult the SAN people. If the problem is with an application, we need to work with the application people. If there's a network issue, we get in touch with the network team. Should it be a hardware problem, we need to contact the datacenter admins, who have physical access to the server.

In this situation, I was not authorized to make any changes to the application.

I agree with all said....

If the application was crashing (and not affecting the host or any other apps running on it), and you did not have explicit authority to change the app, then you were "crossing the line" by fixing the app without authority.

On the other hand, the situation gets more complex if you are responsible for the host and the host or other apps are being affected by the crashing app.

What was the case here?

Was this a single application server? Or are there multiple users with apps under their authority?

This was an app server. The running service was serving around 83 clients (the research group). I am primarily responsible for the server's performance and health. Also, as per our process, the server admin (my team) is the on-duty contact for any issue with the server. We RCA the issue, and if there's nothing to be done from our side, we pass it to the correct team.

I admit I took a bold step here. My only intention was to restore normal service.

Yes, but how many applications are running under different administrative authorities?

It sounds like the server only supports one group (the research group) and there are various applications running on it (all under the administrative authority of the research group).

Is that right?

The problem is that when you fix other folks' mistakes code-wise, you become the programmer/maintainer of that code overnight.

So being proactive and showing initiative will actually stab you in the back later on :wall:
This is especially true for large-scale deployments and implementations.

My story is that my company implemented (bought) a completely new core solution.
There were, of course, a lot of issues and bugs software-wise.

But the Unix part was working fine most of the time, so system folks like myself were helping business IT (the app folks) figure out what was wrong by doing tcpdumps and AWRs.

In the end, Unix folks were explaining the core business to folks who should already know it.

And nowadays (we are in production), I still get phone calls from folks who need their part explained.

That's why, for the past year or so, I avoid doing anything other than what I'm paid for.
Knowing what's actually going on business-wise should be an advantage, but practice has shown otherwise.

@Neo

I'm not really sure about the exact number of applications running in the environment, as the client is a giant company spread across the US, UK and NL. There are 143 UNIX/Linux (AIX, Solaris and RHEL) servers dedicated to different things. This particular server hosts only one application. The research group are the end users of the application; they access it through Citrix. The app admin team is a different group. The problem is that the end users do not differentiate between the teams. To them, we are all IT people with weird techie terms. That's why I got blamed for taking 24 hours to fix the issue, whereas the people who should actually be blamed are the app admin team, for not responding to my calls/E-mails.

@Peasant

You are right! I will just do what I'm paid for. :rolleyes:

This is giant? :slight_smile: ... actually, that explains why you have time to fix other areas' problems ... :smiley:

Nice post. Thanks for posting interesting events.

In my company, the culture has been to just do it, without wrapping it in headers, footers, bodies and all sorts of other work-delay catalysts. Having said that, code changes have to be reviewed by an engineer from the team that owns the package.

Sarcastically speaking, even if you add a ';' to a package, you are the owner of that package from then on :wink:

I've been in similar situations, and it is often very difficult to push back to the customers (end-users) that while you can RCA the problem, you are not authorized to fix it.

A lot of my time now is spent proving negatives, where an issue is raised to us and we need to check whether it is our problem. If it is not, we often have to provide documentary evidence that it is not us and point the customer to whom they need to contact to escalate the issue.

I think the single most important help for me is the fact that my line manager will back us engineers up completely if we have taken an issue as far as we can, and he fields the heat from the customer in that case.

It is difficult not to do something you know will fix an issue, but I've learned to be careful not to do anything without cast-iron authorization from the correct person(s), simply as a self-preservation measure.

@matrixmadhan

Thank you for taking the time to read my post! I just thought that many of us have experienced similar situations at the dawn of our careers. :slight_smile:

@spynappels

Yes, to end users we are all the same. When we try to tell them that it's not exactly our issue and some other team has to work on it, they think we are just making "excuses" to get away. That hurts!

My intention in this case was not to show off how capable I was. I just wanted to make up for the other team ignoring the issue, as I know how it feels for end users when something important does not work. I am also an end user of my company's IT team (yes, they have a separate IT team that takes care of internal infrastructure), and it would be really difficult for me to work if a proxy server or terminal server that gives us access to the client's infrastructure went down!

However, I have learned it the hard way and have sworn that I will never repeat such a crazy deed. :cool:

Wise words.
Just make that call to get authority and you are covered. If you cannot get authority, make a diary entry and move on to the next problem.

Yup, papertrail is essential in situations like these.

My ex-colleague at HP was angry at my manager, and he screwed up the crontab not once but twice. It affected the Australian customers so badly that people couldn't get their pension payouts.. hehe. He didn't tell anyone he did it, nor did he admit it, but one look at his face told me everything I needed to know. I had a good laugh after he left.