Oracle return codes?

pludi · November 30, 2009, 5:02am

Having searched high and low through Oracle's documentation, I've come to think that they're very scripting-averse, as there's (apparently) no list of possible return/exit codes for their various command line utilities.

Is anyone here in possession of such a list, or does anyone know where to find one? It would help me a great deal, as using that is simpler than parsing through the load of output that lsnrctl and the like produce.

radoulov · November 30, 2009, 6:24am

This one?

$ORACLE_HOME/rdbms/mesg/oraus.msg

It's used by the oerr utility.

EDIT: Actually, you're asking for the return codes, not for the error messages, so ignore my post.

Now that I think about it, could you explain what exactly you're trying to achieve? Isn't status != 0 sufficient?

pludi · November 30, 2009, 6:57am

Generally: yes, status != 0 is sufficient, if I can be sure that status 0 isn't the only code returned, and without any documentation I assume the worst (I've seen too many tools always returning 0, no matter what).

But, knowing the operators who will have to use that script, sooner or later I'll have to give more information than just Success/Failure, and I'd rather get that information via a return code than by parsing the command output, which could change without warning.

Besides, I'm sure that sooner or later someone will have a similar task, where the script's execution path will depend on the reason for the failure.

radoulov · November 30, 2009, 7:00am

I understand and I believe that the only reliable way of doing this is to parse the utilities output.

pludi · December 1, 2009, 8:01am

Well, it's pretty much confirmed that parts of the output and the return codes are not useful for scripting. Following are the relevant parts of the script's output from when it failed:

lsnrctl stop LISTENER_xxx_V10
LSNRCTL for HPUX: Version 10.2.0.4.0 - Production on 27-NOV-2009 05:25:11

Copyright (c) 1991, 2007, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=TCP)(HOST=xxx)(PORT=1545))
The command completed successfully

lsnrctl start LISTENER_xxx_V10
LSNRCTL for HPUX: Version 10.2.0.4.0 - Production on 27-NOV-2009 05:25:58

Copyright (c) 1991, 2007, Oracle.  All rights reserved.

Starting /orabase/product/10.2.0/bin/tnslsnr: please wait...

TNSLSNR for HPUX: Version 10.2.0.4.0 - Production
System parameter file is /etc/listener.ora
Log messages written to /tmp/listener_xxx_v10.log
Error listening on: (ADDRESS=(PROTOCOL=TCP)(HOST=xxx)(PORT=1545))
TNS-12542: TNS:address already in use
 TNS-12560: TNS:protocol adapter error
  TNS-00512: Address already in use
   HPUX Error: 226: Address already in use

Listener failed to start. See the error message(s) above...

Return code of both calls to lsnrctl: zero. Nothing indicates that the stop didn't completely shut down the listener, and no return code indicates that starting the listener failed.

radoulov · December 1, 2009, 8:21am

As already stated, I agree that you cannot rely on the meaning of the return codes, but I believe the output gives you a good idea of what happened.
As you surely already know, with the Oracle Net utilities you can always search the output for the string TNS- (often in addition to the famous ORA-).
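A minimal POSIX sh sketch of that approach, assuming the codes always appear in the form TNS-nnnnn or ORA-nnnnn as in the output above. `oracle_errors` and `start_listener` are hypothetical helper names, not Oracle tools. Branching on the first code found gives a `case` dispatch on parsed text rather than on `$?`.

```shell
# Print the first TNS-nnnnn / ORA-nnnnn code found on each line of stdin,
# in the order the lines appear. Lines without a code print nothing.
oracle_errors() {
    awk '{
        if (match($0, /(TNS|ORA)-[0-9]+/))
            print substr($0, RSTART, RLENGTH)
    }'
}

# Hypothetical wrapper (not called here): start listener $1 and branch on
# the first error code in the output, since the exit status stays 0 even
# when the listener fails to start.
start_listener() {
    out=$(lsnrctl start "$1" 2>&1)
    code=$(printf '%s\n' "$out" | oracle_errors | head -n 1)
    case $code in
        "")
            return 0 ;;
        TNS-12542)
            echo "$1: address already in use (old listener still running?)" >&2
            return 2 ;;
        *)
            echo "$1: failed with $code" >&2
            return 1 ;;
    esac
}
```

Used as `start_listener LISTENER_xxx_V10 || exit $?`. It only sees what lsnrctl itself prints, so errors the daemon hits after detaching are still missed.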

pludi · December 1, 2009, 8:54am

Yes, the output gives a pretty good idea of what's going on. But I was hoping that I could avoid parsing the output, and any possible problems that can stem from it (wrong match because of a typo, incompatibilities across platforms, ...) by applying a simple 'case $? in...'.

radoulov · December 1, 2009, 8:58am

I know.
Perhaps the statement below should be reformulated.

pludi · December 1, 2009, 9:05am

Agreed and done, although parts of the output are still not useful (if lsnrctl didn't manage to stop the listener, I'd like to know that without running ps myself).
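One way to get that answer without checking by hand is to let the script inspect the process table after `lsnrctl stop`. A minimal sketch, assuming the tnslsnr command line shown by `ps -ef` contains the listener name as the word right after the tnslsnr binary path (verify this against your own `ps` output, since some platforms truncate long command lines). `listener_running` is a hypothetical helper that reads the process listing from stdin, so it can be tested without a live listener.

```shell
# Succeed (exit 0) if the process listing on stdin contains a tnslsnr
# process whose next argument is listener $1 (compared case-insensitively).
# ASSUMPTION: the listener name follows the tnslsnr path as a separate word.
listener_running() {
    awk -v name="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i ~ /tnslsnr$/ && toupper($(i + 1)) == toupper(name))
                    found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

After the stop, something like `if ps -ef | listener_running LISTENER_xxx_V10; then echo "listener still running" >&2; fi` flags the leftover process.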

radoulov · December 1, 2009, 9:17am

Well, I've never seen this exact situation:

lsnrctl stop <listenername> with success
lsnrctl start <thesamelistenername> with TNS-12542

Unless in the meantime the configuration has been altered, it's not obvious to me what could have happened: I suppose that the first command was really successful and it's only the second one that's failing for whatever reason.

pludi · December 1, 2009, 9:36am

No configuration change whatsoever. At that point the script was a pretty simple "Stop listener -> Stop DB -> Start DB -> Start listener" thing. After I received the error, I checked the running processes, and sure enough, an old listener process was still there. When I manually issued a restart, everything stopped/started just fine.
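For a sequential script like that, a hedged sketch of a per-step wrapper that treats a step as failed when either the exit status is non-zero or the output contains a TNS-/ORA- code, so the script can stop at the first bad step. `run_checked` is a hypothetical name; the database steps (checked via sqlplus output in the script described here) are site-specific and left out.

```shell
# Run a command, pass its combined output through, and fail if the exit
# status is non-zero OR the output contains a TNS-nnnnn / ORA-nnnnn code.
run_checked() {
    out=$("$@" 2>&1)
    rc=$?
    printf '%s\n' "$out"
    if [ "$rc" -ne 0 ]; then
        echo "FAILED (exit status $rc): $*" >&2
        return "$rc"
    fi
    if printf '%s\n' "$out" | grep -Eq '(TNS|ORA)-[0-9]+'; then
        echo "FAILED (error code in output): $*" >&2
        return 1
    fi
    return 0
}
```

Each step would then read e.g. `run_checked lsnrctl start LISTENER_xxx_V10 || exit 1`. Note that this alone would not have caught the stop in the output above, which printed a clean success message; that case still needs a check of the process table after the stop.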

radoulov · December 1, 2009, 9:49am

Did you save and consult the output/log of the last script execution?
I doubt that lsnrctl stop was successful.

Is the output you posted above the real one?

pludi · December 1, 2009, 10:07am

What I've posted is the output that was sent to me via email, minus the output of our internally developed programs (I've checked their logs: they exited successfully and left no connections open) and of the restart of the instance itself (which is checked via sqlplus output).

radoulov · December 1, 2009, 10:22am

As far as your original question is concerned, I suppose the reported behavior should be considered an exception (it might be a bug),
assuming, of course, that they sent you the complete log with all the relevant information and didn't miss or skip anything between the stop and the start commands.

jim_mcnamara · December 1, 2009, 10:43am

FWIW - lsnrctl starts a daemon.

There are errors that can occur after the parent process is detached/defunct and can no longer report an error. This is a common issue with a daemon starter script/program.

I would not expect a return code of failure except in the instance where you request, for example, a duplicate daemon instance. Most daemon drivers are coded to barf at the very beginning in that case.

The daemon can fail independently and not report anything to the calling process when there are system errors, or there is a subsequent network error. That's why you have to scan logs rather than check return codes. IMO.
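A hedged sketch of the log-scanning approach: record how many lines the listener log has before the command, give the daemon a moment, then look only at the lines appended since. The log path is taken from the "Log messages written to" line in the output above and assumes the daemon writes its errors there; `new_log_errors` and `check_listener_start` are hypothetical helpers, and the 10-second delay is a guess you would need to tune.

```shell
# Print lines containing a TNS-nnnnn / ORA-nnnnn code that were appended to
# log file $1 after its first $2 lines. Exit 0 if any were found, 1 if none.
new_log_errors() {
    tail -n "+$(( $2 + 1 ))" "$1" | grep -E '(TNS|ORA)-[0-9]+'
}

# Hypothetical usage (not called here): snapshot the log length, start the
# listener, wait for the daemon to settle, then scan only the new lines.
check_listener_start() {
    log=/tmp/listener_xxx_v10.log
    before=$(wc -l < "$log")
    lsnrctl start LISTENER_xxx_V10 >/dev/null 2>&1
    sleep 10
    if new_log_errors "$log" "$before"; then
        echo "listener reported errors after start" >&2
        return 1
    fi
    return 0
}
```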

pludi · December 1, 2009, 12:18pm

@radoulov: I'm pretty sure that I received the complete log, as it was sent automatically, and I wrote the script in question.

@jim_mcnamara: IMHO any program/script that is a controlling interface between user and daemon (e.g. init scripts, or lsnrctl in this case) shouldn't limit itself to the message "I've successfully sent the command to shut down completely", as is happening here, but should also tell me if the daemon in question can't act on that command. A simple "Sorry, I couldn't shut down, try again later" would be enough, and better than a default "Everything's shiny, move along".

jim_mcnamara · December 2, 2009, 5:33am

That is a valid design complaint. Get on the phone with support. They used to listen.
Does Metalink offer any help? I have not played DBA for many years, so I am not current.

When I was at Los Alamos, we worked with Oracle on an 'alpha' version of Oracle/sqlnet for the very first nationwide distributed dbms system. We had similar complaints back then - what we bitched about was exactly what you describe now - an 'all is well' from one object, only to find an error later on that the first object should have done something about, or at least complained about.

Our problem was the then new sqlnet vs. VMS / MS-DOS