Troubleshooting log: symptoms, error messages, and fix actions for NIS+ client authentication

The summary is at the bottom. To skip straight to the action summary, Ctrl+F for <summary>.

This started as trouble changing passwords because the client could not authenticate. That, in turn, was caused by missing NIS+ client files.

None of this was obvious to me, so this log details the road I took, the road signs I saw along the way, and how I finally got the right directions. METAPHOR TERMINATED//...

This was NIS+ in a Solaris 9 environment. I'm not sure anybody still uses NIS+, but I hope somebody can benefit from this in the future. It took me about 40 hours of research and troubleshooting to fix. I know, LOUSY!! I'm new to Unix, though, so many of the available diagnostic tools were unknown to me.

Here's the log:
This log records the troubleshooting of the root password change on DB1, which was performed in response to a compromise of the root password.

Changing the root password on all other servers and workstations went according to procedure until DB1 was reached. DB1 allowed the root password to be changed but would not allow the necessary keylogin, which is part of updating root's secret key (part of the public/private key authentication that NIS+ performs via DES). This failure prompted the troubleshooting measures below and produced the error messages shown. For each error message, I have recorded the probable causes listed in Sun's Solaris documentation (sun.com).

# keylogin
Could not generate netname (output before the domain name was set)
Could not generate netname
The Secure RPC software could not generate the Secure RPC netname for your UID when performing a keylogin. This could be due to the following causes:
• You do not have LOCAL credentials in the NIS+ cred table of the machine's home domain.
• You have a local entry in /etc/passwd with a UID that is different from the UID you have in the NIS+ passwd table. *note* we could not compare against the NIS+ passwd table at this point because NIS+ was not installed.
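To make the second cause concrete, here is a minimal Python sketch (not part of the original troubleshooting) that compares one account's UID in a local /etc/passwd with its UID in a NIS+ passwd table dump. It assumes both are available as colon-separated passwd-format text, for example the NIS+ side captured with niscat passwd.org_dir from a working client. parse_uids and uid_mismatch are hypothetical names, and the jdoe entries are made-up example data.

```python
def parse_uids(passwd_text):
    """Map login name -> UID from passwd-format lines (name:pw:uid:gid:...)."""
    uids = {}
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip malformed lines and compat entries such as "+::::::".
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        uids[fields[0]] = int(fields[2])
    return uids


def uid_mismatch(name, local_text, nisplus_text):
    """Return (local_uid, nisplus_uid) if they differ, else None.

    If the account is present on only one side, the other side is None.
    """
    local_uid = parse_uids(local_text).get(name)
    nisplus_uid = parse_uids(nisplus_text).get(name)
    if local_uid == nisplus_uid:
        return None
    return (local_uid, nisplus_uid)


# Hypothetical example data: same login, different UIDs.
local = "jdoe:x:1001:10::/home/jdoe:/bin/sh\n"
nisplus = "jdoe:*:1002:10::/home/jdoe:/bin/sh:\n"
print(uid_mismatch("jdoe", local, nisplus))  # (1001, 1002)
```

A non-None result is the kind of mismatch the Sun text describes as a cause of "Could not generate netname".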

This led me to investigate the NIS+ passwd.org_dir table, which produced the error below.

Running niscat -o passwd.org_dir to inspect the passwd table returned the following error:
Error in accessing NIS+ cold start file is NIS+ installed?
This message is returned if NIS+ is not installed on a machine or if for some reason the file /var/nis/NIS_COLD_START could not be found or accessed. Check to see if there is a /var/nis/NIS_COLD_START file. If the file exists, make sure your path is set correctly and that NIS_COLD_START has the proper permissions. Then rename or remove the old cold-start file and rerun the nisclient script to install NIS+ on the machine.
This message is generated by the cache manager that sends the NIS+ error code constant: NIS_COLDSTART_ERR. See the write and open man pages for additional information on why a file might not be accessible.
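The documented check (does the cold-start file exist, and is it readable?) can be scripted. This is a minimal Python sketch, assuming you can run Python on the client. check_cold_start is a hypothetical helper, and the "empty" case is my own addition rather than something the Sun text mentions.

```python
import os

# Path named in the Sun documentation quoted above.
COLD_START = "/var/nis/NIS_COLD_START"


def check_cold_start(path=COLD_START):
    """Return a short diagnosis of the NIS+ cold-start file at `path`."""
    if not os.path.exists(path):
        return "missing"             # the situation later found on DB1
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.R_OK):
        return "not readable"        # check ownership and permissions
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"


print(check_cold_start())
```

Anything other than "ok" points back to the documented fix: rename or remove the old cold-start file and rerun nisclient.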

So nisclient was run (nisclient -i -h fs1 -d fci2.sn.gt.gov) to reinstall NIS+ and re-establish DB1 as a client. The script required the machine's domain name, which was missing from DB1's configuration.

This was resolved by setting the domain with the domainname command (domainname fci2.sn.gt.gov). Because the missing domain could have been the original root of the keylogin problem, we then retried keylogin to see whether the password change procedure could now be completed. Results below.
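Why would a missing domain make keylogin fail? A Secure RPC netname includes the domain: ordinary users are named unix.<uid>@<domain>, and root is named after the host, unix.<hostname>@<domain>. The Python sketch below is a simplified model of that naming, not Sun's code (in C the corresponding library routines are user2netname and host2netname). secure_rpc_netname is a hypothetical name. With an empty domain there is nothing to put after the @, which is consistent with the "Could not generate netname" error seen before domainname was run.

```python
def secure_rpc_netname(uid, hostname, domain):
    """Simplified model of Secure RPC netname construction.

    Ordinary users are named unix.<uid>@<domain>; root (UID 0) is named
    after the machine instead, unix.<hostname>@<domain>.
    """
    domain = domain.strip().rstrip(".")  # NIS+ names may carry a trailing dot
    if not domain:
        # No domain means no netname, consistent with the keylogin error.
        raise ValueError("Could not generate netname: domain name is not set")
    if uid == 0:
        return "unix.%s@%s" % (hostname, domain)
    return "unix.%d@%s" % (uid, domain)


print(secure_rpc_netname(0, "db1", "fci2.sn.gt.gov"))  # unix.db1@fci2.sn.gt.gov
```

For DB1, root's netname under this model is unix.db1@fci2.sn.gt.gov, which only exists once the domain is set.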

Running keylogin again (after the domain name was set):
Could not find string's secret key
Make sure the secret key is stored in domain ...
Possible causes:
• You might have incorrectly typed the password.
• There might not be an entry for name in the cred table.
• NIS+ could not decrypt the key (possibly because the entry might be corrupt)
• The nsswitch.conf file might have the wrong publickey policy. It might be directing the query to a local public key in an /etc/publickey file that is different from the NIS+ password recorded in the cred table.
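The last cause can be checked by reading the publickey line of /etc/nsswitch.conf. This is a minimal Python sketch, assuming a standard NIS+ client where publickey should be looked up in nisplus first. lookup_sources and publickey_warning are hypothetical helpers, and treating "files before nisplus" as a warning is my own reading of the Sun text, not a documented rule.

```python
import re


def lookup_sources(nsswitch_text, database):
    """Return the ordered sources for `database` from nsswitch.conf text.

    Comments (#...) and [STATUS=action] criteria are ignored.
    """
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        if name.strip() != database:
            continue
        rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop e.g. [NOTFOUND=return]
        return rest.split()
    return []


def publickey_warning(nsswitch_text):
    """Return a warning if publickey is not looked up in nisplus first, else None."""
    sources = lookup_sources(nsswitch_text, "publickey")
    if "nisplus" not in sources:
        return "publickey does not use nisplus (sources: %s)" % (", ".join(sources) or "none")
    if sources[0] != "nisplus":
        return "publickey consults %s before nisplus" % sources[0]
    return None


print(publickey_warning("publickey:  files nisplus\n"))  # publickey consults files before nisplus
```

On a standard NIS+ client, a line of publickey: nisplus produces no warning.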

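I've included the following excerpt because Sun's documentation recommends this procedure before performing certain fix actions when troubleshooting NIS+ errors. Note that the excerpt comes from the Solaris 10 documentation: the Service Management Facility (SMF) and the svcadm command were introduced in Solaris 10 and are not available on a Solaris 9 machine such as DB1.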

The nis_cachemgr Daemon
The nis_cachemgr should run on all NIS+ clients. The cache manager maintains a cache of location information about the NIS+ servers that support the most frequently used directories in the namespace, including transport addresses, authentication information, and a time-to-live value.
At start-up, the cache manager obtains its initial information from the client's cold-start file, and downloads it into the /var/nis/NIS_SHARED_DIRCACHE file. The cache manager makes requests as a client machine. Make sure the client machine has the proper credentials, or instead of improving performance, the cache manager will degrade it.

Starting and Stopping the Cache Manager
When using the Service Management Facility (SMF), the cache manager has a dependency on the NIS+ service, so cache manager starts and stops along with the NIS+ service. Use the svcadm command to start, stop, or restart the NIS+ service.
client% svcadm enable /network/rpc/nisplus:default
client% svcadm disable /network/rpc/nisplus:default
client% svcadm restart /network/rpc/nisplus:default
When you stop and start the NIS+ service, the cache manager is restarted but it retains the information in the /var/nis/NIS_SHARED_DIRCACHE file. The information in the cold-start file is simply appended to the existing information in the cache file. Use the -i option to clear the cache file and re-initialize it from the contents of the client's cold-start file.

When I searched for the cold-start file (/var/nis/NIS_COLD_START), it was missing. The machine had lost several critical NIS+ files; why, and when the files went missing, is unknown. Faulty power and increased heat are suspected, since either can adversely affect equipment operation, data integrity, and hardware, any of which could explain the lost data.
Rerunning the nisclient script then resulted in the following:

Please enter the Secure-RPC password for root:
Please enter the Secure-RPC password for root:
Chkey: key-pair unchanged for root.
**ERROR: chkey failed.

The network password that you have entered is invalid.
If this machine was initialized before as a NIS+ client,
Please enter the root login password as the network
password.

*the above message repeats before the script continues
This message indicates that you typed the wrong network password.
• If this is the first time you are initializing this machine, contact your network administrator to verify the network password.
• If this machine has been initialized before as an NIS+ client of the same domain, try typing the root login password at the Secure RPC password prompt.
• If this machine is currently an NIS+ client and you are trying to change it to a client of a different domain, remove the /etc/.rootkey file, and rerun the nisclient script, using the network password given to you by your network administrator (or the network password generated by the nispopulate script).

After much research, I checked the /etc/nsswitch.conf file to see which sources the local machine used for each configured service, to make sure the right sources were being referenced. I concluded that the primary cause of all the trouble was that NIS+ services were not functioning.

fs1:/etc/nsswitch.conf was copied to db1, but this did not resolve the trouble. After careful research, nisinit was concluded to be the solution, and it successfully resolved all issues, allowing the SA to finish the prescribed RW root password-change procedure.

db1# nisinit -c -H fs1

NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
After reinstalling the NIS+ client files, the default Secure RPC password becomes "nisplus"

<summary>
All of the following troubleshooting measures, taken together, should be considered the solution:
1. Set the missing domain: domainname fci2.sn.gt.gov
2. Attempt to reinstall the NIS+ client: nisclient -i -h fs1 -d fci2.sn.gt.gov
3. Copy the master /etc/nsswitch.conf from fs1 to db1 (via ftp)
4. Initialize db1 as an NIS+ client: nisinit -c -H fs1
It is unknown whether nisinit alone would have resolved all the issues, or whether the preceding commands could have been omitted with the same result, so all four steps together are considered the solution.