How to repair a TCP/IP socket in state: CLOSE_WAIT?

Hi

Clients connect to my server on port 9130, but at the moment no client can connect. I have already checked, and this is the result:

netstat -Aan|grep -v 127.0.0.1|grep 9130|pg
f10006000abcb398 tcp4   10313      0  10.0.89.81.9130       10.158.70.24.1705     CLOSE_WAIT
f100060016a4eb98 tcp4    4968      0  10.0.89.81.9130       10.199.1.77.2786      CLOSE_WAIT
f100060012152398 tcp4    8147      0  10.0.89.81.9130       10.158.70.92.1724     CLOSE_WAIT
f100060008f3b398 tcp4    6198      0  10.0.89.81.9130       10.158.70.86.1890     CLOSE_WAIT
f100060024e55398 tcp4   16097      0  10.0.89.81.9130       10.11.0.67.1145       CLOSE_WAIT
f1000600253e8b98 tcp4   12180      0  10.0.89.81.9130       10.150.12.113.2155    CLOSE_WAIT
f10006000d141398 tcp4   14256      0  10.0.89.81.9130       10.11.0.89.1157       CLOSE_WAIT
f10006002bf12b98 tcp4   20688      0  10.0.89.81.9130       10.150.12.109.2245    CLOSE_WAIT
f1000600250c3398 tcp4    1653      0  10.0.89.81.9130       10.150.15.115.1546    CLOSE_WAIT
f1000600335f9398 tcp4    4538      0  10.0.89.81.9130       10.5.6.13.1139        CLOSE_WAIT
f100060018cc9b98 tcp4     838      0  10.0.89.81.9130       10.204.70.43.1080     CLOSE_WAIT
f1000600066c1b98 tcp4    3291      0  10.0.89.81.9130       10.11.0.219.1325      CLOSE_WAIT
f10006001d084b98 tcp4    2004      0  10.0.89.81.9130       10.5.7.12.1065        ESTABLISHED
f10006000e9f8b98 tcp4   24454      0  10.0.89.81.9130       10.165.5.26.1436      CLOSE_WAIT
f10006000def1b98 tcp4    8116      0  10.0.89.81.9130       10.54.0.144.1140      CLOSE_WAIT
f10006002486f398 tcp4    2489      0  10.0.89.81.9130       10.47.70.4.1142       CLOSE_WAIT
f1000600091a2398 tcp4   24633      0  10.0.89.81.9130       10.11.6.120.49305     CLOSE_WAIT
f10006001e8f9b98 tcp4   10038      0  10.0.89.81.9130       10.174.0.43.1169      CLOSE_WAIT
f1000600169c7b98 tcp4    1663      0  10.0.89.81.9130       10.47.70.77.1132      CLOSE_WAIT
f100060025fdb398 tcp4    6064      0  10.0.89.81.9130       10.11.6.66.49433      ESTABLISHED
f1000600253f7398 tcp4    7884      0  10.0.89.81.9130       10.11.6.125.49445     ESTABLISHED
f10006000f8b1b98 tcp4    8177      0  10.0.89.81.9130       10.29.71.155.4635     CLOSE_WAIT
f10006001b06e398 tcp4    4951      0  10.0.89.81.9130       10.14.0.125.1464      CLOSE_WAIT
f10006002bdaf398 tcp4   16149      0  10.0.89.81.9130       10.254.0.91.1305      CLOSE_WAIT

I'd like to turn all the sockets from CLOSE_WAIT --> ESTABLISHED. I think there's something wrong in /etc/security, but I don't know how to fix it.

Somebody help please :(

Hi,

As far as I know, this is not possible.

tcpipguide

I guess there is a problem in your application.

Regards

You can't change a socket from CLOSE_WAIT to ESTABLISHED. Usually CLOSE_WAIT means that the application forgot to call close() on the socket. The only way to clean up is to shut down the application, fix it, and start it again.
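
To illustrate what "forgotten close()" means, here is a minimal sketch in Python. The real application and its language are unknown, so handle_client and demo are made-up names, and the loopback setup is only there to make the example self-contained. The point is the pattern: when recv() returns an empty result, the peer has hung up, and the handler must close its own socket or the connection stays in CLOSE_WAIT.

```python
import socket


def handle_client(conn):
    """Read until the peer closes, then close our side. Returns bytes received."""
    total = 0
    # "with" guarantees conn.close() runs, even if recv() raises.
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                # Empty read: the peer sent FIN, so this socket is now in
                # CLOSE_WAIT and stays there until we close it.
                break
            total += len(data)
    return total


def demo():
    """Hypothetical loopback demo: one client sends 5 bytes and hangs up."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())
        client.sendall(b"hello")
        client.close()  # the client hangs up first
        conn, _addr = listener.accept()
        received = handle_client(conn)
        # fileno() is -1 once the socket object has been closed.
        return received, conn.fileno()
```

A handler that breaks out of the loop but never closes conn would leave exactly the kind of entries shown in the netstat output above.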

Can you provide output of the following commands:

# netstat -an | grep 10.0.89.81.9130 | grep -c CLOSE_WAIT
# lsuser  <APP_USER>
# no -L somaxconn
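
If you would rather see all states for the port at once, a small script can tally them. This is only a sketch: it assumes the netstat column layout shown in the original post (state in the last column, local address third from the end), and count_states and SAMPLE are names made up for the example. The sample lines are copied from the output above.

```python
from collections import Counter

# Three lines taken from the netstat -Aan output in the question.
SAMPLE = """\
f10006000abcb398 tcp4   10313      0  10.0.89.81.9130       10.158.70.24.1705     CLOSE_WAIT
f100060016a4eb98 tcp4    4968      0  10.0.89.81.9130       10.199.1.77.2786      CLOSE_WAIT
f10006001d084b98 tcp4    2004      0  10.0.89.81.9130       10.5.7.12.1065        ESTABLISHED
"""


def count_states(netstat_text, port):
    """Count TCP states for connections whose local address ends in .<port>."""
    suffix = f".{port}"
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines; works with or without the -A column.
        if len(fields) < 3 or not any(f.startswith("tcp") for f in fields[:2]):
            continue
        local, state = fields[-3], fields[-1]
        if local.endswith(suffix):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, n in count_states(SAMPLE, 9130).items():
        print(state, n)
```

To use it on the live system you would feed it the saved output of netstat -an instead of SAMPLE.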

Both XrAy and agent.kgb are correct: a TCP connection works like a telephone call. First, a so-called "virtual channel" (the call) is established by both sides (one calls, one picks up the handset). Then the connection remains in use (the connected people talk to each other) until, finally, one or both sides drop it (they hang up).

CLOSE_WAIT means that the other side has already hung up and this side is now expected to drop the connection as well: the kernel has acknowledged the remote FIN and is waiting for the local application to close its socket. In TCP this is a bit more complicated than with a phone, with acknowledgements being sent back and forth, but in principle the difference is minimal.

So, what you want amounts to "I want to keep talking to someone who just hung up". With a phone you would know what to do: redial and establish a new connection. Here, you do the same: you(r application) need(s) to establish a new TCP connection.
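
For the curious, the teardown part of the TCP state diagram can be written down as a small table, and then you can check mechanically which states are still reachable from CLOSE_WAIT. This is a simplified sketch of the standard TCP state machine (it leaves out connection setup, resets and simultaneous FIN+ACK); TEARDOWN and reachable are names invented for the example.

```python
# Simplified teardown transitions of the TCP state machine: state -> {event: next state}.
TEARDOWN = {
    "ESTABLISHED": {"recv FIN": "CLOSE_WAIT", "close()": "FIN_WAIT_1"},
    "CLOSE_WAIT": {"close()": "LAST_ACK"},
    "LAST_ACK": {"recv ACK": "CLOSED"},
    "FIN_WAIT_1": {"recv ACK": "FIN_WAIT_2", "recv FIN": "CLOSING"},
    "FIN_WAIT_2": {"recv FIN": "TIME_WAIT"},
    "CLOSING": {"recv ACK": "TIME_WAIT"},
    "TIME_WAIT": {"2*MSL timeout": "CLOSED"},
    "CLOSED": {},
}


def reachable(start):
    """Return every state that can be reached from start, including start."""
    seen = set()
    stack = [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        stack.extend(TEARDOWN[state].values())
    return seen


if __name__ == "__main__":
    print(sorted(reachable("CLOSE_WAIT")))
```

From CLOSE_WAIT the only way forward is close(), then LAST_ACK, then CLOSED. ESTABLISHED is not in the reachable set, so getting back to it always means a new connection with a new handshake.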

Maybe your application was a bit too eager to drop the connection. In this case you must change the application somehow. But this will not change the fact that dropped connections remain dropped, no matter what you want. *)

I hope this helps.

bakunin

_______________
*) Corollary: For better or worse, unlike in zombie movies, dead connections remain dead and won't come back to haunt you.

I used the rmsock command on some sockets, then all the remaining sockets changed state to ESTABLISHED. I don't know why, but it seems OK now.

Thanks for your help

In some universe it may be possible that 2+2=5, but in no universe is it possible for a socket to change its state from CLOSE_WAIT to ESTABLISHED.

Sorry, my mistake. Maybe something else happened, and now everything is OK. The sockets' state did not change from CLOSE_WAIT to ESTABLISHED.

Which application are you running? I think what you took for recovered sockets were more likely new sockets opened after a cleanup. If you still see an issue here, it may be a question of a timeout on the other side: your remote hosts believed the connection was lost and closed it. The problem in that case is that the client opens a new connection each time, and after a while you find yourself short of sockets because you are waiting for the cleanup process to complete.
So you should check on the application side whether something needs tuning.

Most likely what happened was that the connections which had been half-removed before were completely dissolved, and then new connections (which you saw as "ESTABLISHED") were made.

AIX has some timeout value between a connection going into state "CLOSE_WAIT" and it being dissolved completely (I don't know it off the top of my head, but it is some "no" tunable). IIRC (sorry, it's been some years since I last needed it) the unit it uses is half-seconds. You can set it to a shorter value to dissolve connections that are no longer in use more quickly if the need arises (usually it doesn't, but there is always the exception to the rule).

It might also be that the application is sloppily programmed and the used sockets are not freed properly, stalling the cleanup process. This might be tricky to track down, but if this is your problem you will have no other option but to do just that (and then beat the programmer with a copy of "UNIX Network Programming").

I hope this helps.

@agent.kgb: actually, in no universe can 2+2 be equal to 5, at least not when the symbols "2", "5", "+" and "=" are used in a way equal to, or at least similar to, how we use them. For details about why this is so, refer to Bertrand Russell's "Introduction to Mathematical Philosophy", which is an outstanding classic in this field.

In short: a "(natural) number" is defined as the successor of its preceding number (yes, it's a recursive definition), the recursion stopping at zero. So "1" is the successor of zero, "2" is the successor of 1 (or the successor of the successor of zero), and so on. We write "S0" for "1", "SS0" for "2", etc. With addition defined as it is, we can concatenate these chains of successors and arrive at "SS0" + "SS0" being the "successor of the successor of SS0" or, in other words, "SSSS0", which is 4. If we defined "+" as an operation where the operands "SS0" and "SS0" result in "SSSSS0", addition would no longer be associative ((1+1)+2 would be 5, but 1+(1+2) would be 4), so the natural numbers would no longer form a commutative monoid under addition, which would have far graver consequences than you can possibly envision.

bakunin
