How to clear network socket port 17005?

gjackson123 · December 2, 2012, 7:08pm

Hi Solaris Experts,

I am wondering whether it is possible to clear network socket port 17005, which Apache Tomcat/6.0.20 leaves behind after I have to terminate it forcefully. Tomcat had run away due to remote JDBC resource contention on another server, and a normal Tomcat stop / shutdown was ineffective from then on. The listening port below hangs around for quite some time before disappearing:

localhost.17005            *.*                0      0 49152      0 LISTEN

This port prevents Tomcat from starting up successfully again until it is cleared, either by waiting at least half an hour to an hour or by restarting the server altogether, which affects other applications / users on a production server.

I am running jdk1.6.0_11, on SunOS braveheart 5.10 Generic_139556-08 i86pc i386 i86pc.
Thanks a lot,
George

jim_mcnamara · December 2, 2012, 8:25pm

The suggestions I have may make it better; however, you cannot get rid of some wait states and still have TCP configured correctly. I am assuming it goes to TIME_WAIT.

This is a programming error, which you cannot fix. Most programs call setsockopt() with SO_REUSEADDR so that if the program is forced to exit, the port can be reused right away.

What you can do is to check a TCP parameter, especially if you really are waiting one half hour. You should only have to wait 4 minutes or so.
As root:

ndd -get /dev/tcp tcp_time_wait_interval

You should get a value like 60000 (the parameter is in milliseconds, so 60000 = 60 x 1000 = 60 seconds).
If it is 60000 the problem lies elsewhere. Or I misunderstood what you asked.
Do NOT go below 60000.

If it is a larger number set it to 60 seconds:

ndd -set /dev/tcp tcp_time_wait_interval 60000
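
The check-then-set rule above can be sketched as a small shell function. This is only an illustration: time_wait_advice is a made-up helper name, and it assumes the value passed in is the number of milliseconds printed by ndd -get for tcp_time_wait_interval (note the tcp_ prefix, as in the -set command).

```shell
# Sketch only: time_wait_advice is a hypothetical helper, not a Solaris tool.
# Argument: the current tcp_time_wait_interval in milliseconds.
time_wait_advice() {
    current=$1
    floor=60000                      # do not go below 60 seconds
    if [ "$current" -gt "$floor" ]; then
        echo "lower to $floor"       # larger than 60 s: worth resetting
    else
        echo "leave alone"           # already at (or below) the floor
    fi
}

# On the Solaris host, as root, the live value would be fed in like this:
#   time_wait_advice "$(ndd -get /dev/tcp tcp_time_wait_interval)"
time_wait_advice 240000              # prints: lower to 60000
```

If it prints "leave alone", the TIME_WAIT setting is not the cause and the socket is probably still held open by a process.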

I am assuming this socket goes into TIME_WAIT, which lasts 2 * MSL (maximum segment lifetime). We tune that with tcp_time_wait_interval. So, I am assuming the TIME_WAIT interval TCP setting is bonked.

If the socket persists in LISTEN (not TIME_WAIT), a thread somewhere still has the socket open. You can find the pid with the script below. Issue a plain kill command (NOT kill -9):

kill <pid>

The socket should then show TIME_WAIT and be gone in 2 minutes or so.
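
To see which state the port is in before and after the kill, something like the following could be used. It is a rough sketch: port_state is a made-up helper name, and it assumes netstat -an style lines where the first column is the local address ending in .PORT and the last column is the state, as in the line you posted.

```shell
# Sketch only: port_state is a hypothetical helper name.
# Usage: <netstat lines on stdin> | port_state PORT
# Prints the state (last column) of every line whose local address
# (first column) ends in ".PORT".
port_state() {
    awk '$1 ~ /\.'"$1"'$/ { print $NF }'
}

# On the Solaris host the input would come from netstat:
#   netstat -an -P tcp | port_state 17005
# Demonstration with the line from the original post:
echo "localhost.17005            *.*                0      0 49152      0 LISTEN" |
    port_state 17005
```

LISTEN means a process still holds the socket; TIME_WAIT means it is only waiting out the timer.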

Run this as root:

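#!/bin/ksh
# Report which process has a given TCP port open, by scanning every
# process's open files with pfiles.

# All PIDs on the system (sed drops the ps header line)
pids=$(/usr/bin/ps -ef | sed 1d | awk '{print $2}')

# Take the port from the first argument, or prompt for it
if [ $# -eq 0 ]; then
   read ans?"Enter port you would like to know pid for: "
else
   ans=$1
fi

for f in $pids
do
   # pfiles prints "port: N" for each socket the process holds
   /usr/proc/bin/pfiles $f 2>/dev/null | /usr/xpg4/bin/grep -q "port: $ans"
   if [ $? -eq 0 ]; then
      echo "Port: $ans is being used by PID: $f"
   fi
done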
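
Once the script reports a PID, the plain kill (no -9) can be wrapped so that it waits for the process to actually exit. This is a minimal sketch, not a tested production tool: term_and_wait is a made-up name and the 30 second default is an arbitrary assumption.

```shell
# Sketch only: term_and_wait is a hypothetical helper name.
# Usage: term_and_wait PID [SECONDS]
# Sends SIGTERM (plain kill, never -9), then polls until the process is gone.
# Returns 0 if it exited, 1 if the signal could not be sent,
# 2 if it is still alive after SECONDS (default 30, an arbitrary choice).
term_and_wait() {
    pid=$1
    limit=${2:-30}
    kill "$pid" 2>/dev/null || return 1
    waited=0
    # kill -0 sends no signal; it only tests whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$limit" ] && return 2
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (the PID is whatever the script above printed):
#   term_and_wait <pid> 60 && echo "process exited"
```

A return code of 2 means the process ignored the TERM signal within the time limit, which is worth knowing before reaching for anything harsher.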

If this does not work you can consider restarting network services, but this has almost the same effect on users and programs as rebooting.

Get back to us if this doesn't help.

gjackson123 · December 3, 2012, 6:22pm

Hi Jim mcnamara,

Thank you very much for your detailed response and suggestions.

I tried the following command but had no luck with it:

root@braveheart # /usr/sbin/ndd -get /dev/tcp time_wait_interval
name is non-existent for this module
for a list of valid names, use name '?'

Is it supported on Solaris 10 (5.10 Generic_139556-08 i86pc i386 i86pc)?

Thanks so much again,

George