Hi,
This weekend, the application on the server crashed suddenly.
I did not know where to start investigating, so I first looked at /var/adm/syslog/syslog.log, and this is what I found:
Dec 17 00:38:02 L28bi01 sshd[126]: error: accept: No buffer space available
Dec 17 00:38:02 L28bi01 sshd[24333]: error: setsockopt SO_KEEPALIVE: Invalid argument
Dec 17 00:38:07 L28bi01 sshd[24379]: error: setsockopt SO_KEEPALIVE: Invalid argument
Dec 17 00:38:21 L28bi01 sshd[24445]: error: PAM: No account present for user for illegal user UlGLXBTX from 10.61.1.55
Dec 17 00:38:21 L28bi01 sshd[24447]: error: PAM: No account present for user for illegal user anonymous from 10.61.1.55
Dec 17 00:38:26 L28bi01 sshd[24511]: error: PAM: No account present for user for illegal user guest from 10.61.1.55
Dec 17 00:38:27 L28bi01 sshd[24515]: error: PAM: No account present for user for illegal user IyoYLEnT from 10.61.1.55
Dec 17 00:38:28 L28bi01 sshd[24517]: error: PAM: No account present for user for illegal user shelladmin from 10.61.1.55
Dec 17 00:38:31 L28bi01 sshd[24524]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:38:31 L28bi01 sshd[24525]: error: PAM: No account present for user for illegal user netscreen from 10.61.1.55
Dec 17 00:38:33 L28bi01 sshd[24528]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:38:38 L28bi01 sshd[24534]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:38:58 L28bi01 sshd[24542]: error: PAM: No account present for user for illegal user admin1 from 10.61.1.55
Dec 17 00:39:06 L28bi01 sshd[24552]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:39:18 L28bi01 sshd[24561]: error: PAM: No account present for user for illegal user emailswitch from 10.61.1.55
Dec 17 00:39:22 L28bi01 sshd[24584]: error: PAM: No account present for user for illegal user product from 10.61.1.55
Dec 17 00:39:23 L28bi01 sshd[24599]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:39:27 L28bi01 sshd[24621]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:39:29 L28bi01 sshd[24626]: error: PAM: No account present for user for illegal user n3ssus from 10.61.1.55
Dec 17 00:39:31 L28bi01 sshd[24632]: error: PAM: Authentication failed for root from 10.61.1.55
Dec 17 00:41:01 L28bi01 sshd[126]: error: accept: No buffer space available
Dec 17 00:41:01 L28bi01 sshd[25366]: error: setsockopt SO_KEEPALIVE: Invalid argument
Dec 17 00:41:55 L28bi01 sshd[26128]: error: PAM: No account present for user for illegal user cisco from 10.61.1.55
Dec 17 00:42:00 L28bi01 sshd[26134]: error: PAM: No account present for user for illegal user Cisco from 10.61.1.55
Dec 17 00:42:02 L28bi01 sshd[26142]: error: PAM: No account present for user for illegal user admin from 10.61.1.55
Dec 17 00:42:04 L28bi01 sshd[26175]: error: PAM: No account present for user for illegal user from 10.61.1.55
Dec 17 00:42:10 L28bi01 sshd[26254]: error: PAM: No account present for user for illegal user manage from 10.61.1.55
Dec 17 00:42:15 L28bi01 sshd[26273]: error: PAM: No account present for user for illegal user monitor from 10.61.1.55
Dec 17 00:42:19 L28bi01 sshd[26280]: error: PAM: No account present for user for illegal user ftp from 10.61.1.55
Dec 17 00:42:54 L28bi01 sshd[26792]: error: PAM: No account present for user for illegal user Fortimanager_Access from 10.61.1.55
Dec 17 00:42:54 L28bi01 sshd[26791]: error: PAM: No account present for user for illegal user nessus_oJgOWh46 from 10.61.1.55
Dec 17 00:42:56 L28bi01 sshd[26791]: error: PAM: No account present for user for illegal user nessus_oJgOWh46 from 10.61.1.55
Dec 17 00:43:27 L28bi01 sshd[26926]: error: setsockopt SO_KEEPALIVE: Invalid argument
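One way to make a long syslog excerpt like this easier to reason about is to count the sshd errors by type and by source address. Below is a minimal Python sketch, assuming the lines look exactly like the excerpt above (`Mon DD HH:MM:SS host sshd[pid]: error: ...`). The category names and the `classify`/`summarize` helpers are hypothetical and only for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches sshd error lines shaped like the excerpt, e.g.
# "Dec 17 00:38:21 L28bi01 sshd[24445]: error: PAM: ... from 10.61.1.55"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"sshd\[(?P<pid>\d+)\]: error: (?P<msg>.*)$"
)
# Source address at the end of a PAM message
FROM_RE = re.compile(r" from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})$")


def classify(msg: str) -> str:
    """Bucket an sshd error message into a coarse, hypothetical category."""
    if "No buffer space available" in msg:
        return "accept ENOBUFS"
    if "SO_KEEPALIVE" in msg:
        return "setsockopt SO_KEEPALIVE"
    if msg.startswith("PAM: No account present"):
        return "invalid user"
    if msg.startswith("PAM: Authentication failed"):
        return "failed login (existing user)"
    return "other"


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count sshd error lines per category and per source IP; non-matching lines are skipped."""
    categories: Counter = Counter()
    sources: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        categories[classify(msg)] += 1
        src = FROM_RE.search(msg)
        if src:
            sources[src.group("ip")] += 1
    return categories, sources
```

Calling `summarize` on an open copy of the log returns the two counters. In the excerpt above, every PAM failure comes from 10.61.1.55, and usernames such as n3ssus and nessus_oJgOWh46 suggest that a Nessus vulnerability scan was running against the server around the time of the crash.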
The error that seems most related to the crash is "No buffer space available".
When I googled this error, I found no definitive solution: some say it points to memory pressure, and others suggest checking the kernel parameter "tcp_conn_request_max", but I cannot find that parameter anywhere on the server.
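If this is HP-UX (the /var/adm/syslog/syslog.log path suggests so), tcp_conn_request_max is, as far as I know, a TCP parameter read through ndd rather than a kernel tunable, which may be why it does not show up among the kernel settings. A minimal Python sketch for reading it, assuming `ndd -get /dev/tcp tcp_conn_request_max` is available on the host and prints the value on a single line; `query_ndd` and `parse_ndd_value` are hypothetical helper names:

```python
import subprocess
from typing import Optional


def parse_ndd_value(output: str) -> Optional[int]:
    """Parse the first line of ndd output as an integer; return None if it is not one."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    try:
        return int(lines[0].strip())
    except ValueError:
        return None


def query_ndd(param: str, device: str = "/dev/tcp") -> Optional[int]:
    """Run `ndd -get <device> <param>`; return None if ndd is missing or the call fails."""
    try:
        result = subprocess.run(
            ["ndd", "-get", device, param],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers "ndd not installed" (e.g. on a non-HP-UX machine)
        return None
    return parse_ndd_value(result.stdout)
```

For example, `query_ndd("tcp_conn_request_max")` would return the listen-queue limit as an integer on a host where ndd supports that parameter, and None elsewhere.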
The application logs, however, show these errors:
File: data.c, Line: 2963, Time: 2017.12.17 00:36:56, RC: -23
Text: CL_receive_message failed
Error during 'read'
System error: Connection timed out
File: data.c, Line: 2963, Time: 2017.12.17 00:37:46, RC: -23
Text: CL_receive_message failed
Error during 'read'
System error: Connection timed out
File: data.c, Line: 2963, Time: 2017.12.17 00:37:46, RC: -23
Text: CL_receive_message failed
Error during 'read'
System error: Connection timed out
File: data.c, Line: 825, Time: 2017.12.17 00:38:52, RC: -28
Text:
Connection between client and server was terminated
File: data.c, Line: 918, Time: 2017.12.17 00:38:52, RC: -28
Text:
Connection between client and server was terminated
File: data.c, Line: 3564, Time: 2017.12.17 00:43:27, RC: -20
Text:
Socket option error
System error: Invalid argument
File: dta_ids.c, Line: 4027, Time: 2017.12.17 00:43:27, RC: 0
Text: DaTA shutting down: ids clients finished
File: dta_ids.c, Line: 4052, Time: 2017.12.17 00:43:28, RC: 0
Text: DaTA shutting down: std clients finished
File: dta_ids.c, Line: 4078, Time: 2017.12.17 00:43:31, RC: 0
Text: DaTA shutting down: file queues synchronized
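Because the two logs use different timestamp formats ("Dec 17 00:38:02" in syslog, "2017.12.17 00:36:56" in the application log), it can help to merge them into one timeline. Below is a minimal Python sketch, assuming the formats shown above; syslog omits the year, so it has to be supplied (2017 here, taken from the application log), and the helper names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

SYSLOG_RE = re.compile(r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ (?P<rest>.*)$")
APPLOG_RE = re.compile(r"Time: (?P<ts>\d{4}\.\d\d\.\d\d \d\d:\d\d:\d\d), RC: (?P<rc>-?\d+)")

Event = Tuple[datetime, str, str]  # (timestamp, source, description)


def syslog_events(lines: Iterable[str], year: int) -> List[Event]:
    """Parse syslog lines; the year is passed in because syslog does not record it."""
    events = []
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            events.append((ts, "syslog", m.group("rest")))
    return events


def applog_events(lines: Iterable[str]) -> List[Event]:
    """Parse application log entries; lines after a 'File: ...' header join that entry."""
    events: List[Event] = []
    for line in lines:
        line = line.strip()
        m = APPLOG_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y.%m.%d %H:%M:%S")
            events.append((ts, "app", f"RC {m.group('rc')}"))
        elif events and line:
            # Continuation lines (Text:, Error during ..., System error: ...)
            ts, src, text = events[-1]
            events[-1] = (ts, src, f"{text} | {line}")
    return events


def merged_timeline(syslog_lines, app_lines, year: int) -> List[Event]:
    """Combine both sources and sort by timestamp (stable, so ties keep syslog first)."""
    return sorted(
        syslog_events(syslog_lines, year) + applog_events(app_lines),
        key=lambda e: e[0],
    )
```

In the excerpts above, for example, the application's "Connection timed out" read errors start at 00:36:56, about a minute before the first sshd "No buffer space available" at 00:38:02, which a merged timeline makes easy to spot.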
Could this be a network issue?
How should I investigate this problem? I need to determine the root cause (RCA). Please help.