AIX 6.1 reaches the Streams threshold (no -a | grep strthresh)

Last night I wanted to take an Oracle full backup with expdp. When I switched to the oracle user, the session hung. It looked like:
su - oracle
There was no feedback; it just hung, but su - root worked fine.
I then ran truss against su - oracle and found it stuck on "ENOSR". After I changed the kernel parameter strthresh from 85 to 90, the su - oracle command worked fine again.

The no -a command output:

  no -a

                 arpqsize = 1024
               arpt_killc = 20
              arptab_bsiz = 7
                arptab_nb = 149
                bcastping = 0
      clean_partial_conns = 0
                 delayack = 0
            delayackports = {}
   dgd_flush_cached_route = 0
         dgd_packets_lost = 3
            dgd_ping_time = 5
           dgd_retry_time = 5
       directed_broadcast = 0
                 fasttimo = 200
                    hstcp = 0
        icmp6_errmsg_rate = 10
          icmpaddressmask = 0
ie5_old_multicast_mapping = 0
                   ifsize = 256
           igmpv2_deliver = 0
               ip6_defttl = 64
                ip6_prune = 1
            ip6forwarding = 0
       ip6srcrouteforward = 1
       ip_ifdelete_notify = 0
                 ip_nfrag = 200
             ipforwarding = 0
                ipfragttl = 2
        ipignoreredirects = 0
                ipqmaxlen = 512
          ipsendredirects = 1
        ipsrcrouteforward = 1
           ipsrcrouterecv = 0
           ipsrcroutesend = 1
               limited_ss = 0
          llsleep_timeout = 3
                  lo_perf = 1
                lowthresh = 90
                 main_if6 = 0
               main_site6 = 0
                 maxnip6q = 20
                   maxttl = 255
                medthresh = 95
               mpr_policy = 1
              multi_homed = 1
                nbc_limit = 11829248
            nbc_max_cache = 131072
            nbc_min_cache = 1
         nbc_ofile_hashsz = 12841
                 nbc_pseg = 0
           nbc_pseg_limit = 23658496
           ndd_event_name = {all}
        ndd_event_tracing = 0
              ndogthreads = 0
            ndp_mmaxtries = 3
            ndp_umaxtries = 3
                 ndpqsize = 50
                ndpt_down = 3
                ndpt_keep = 120
               ndpt_probe = 5
           ndpt_reachable = 30
             ndpt_retrans = 1
             net_buf_size = {all}
             net_buf_type = {all}
     net_malloc_frag_mask = {0}
        netm_page_promote = 1
           nonlocsrcroute = 0
                 nstrpush = 8
              passive_dgd = 0
         pmtu_default_age = 10
              pmtu_expire = 10
 pmtu_rediscover_interval = 30
              poolbuckets = 4
              psebufcalls = 20
                 psecache = 1
                psetimers = 20
           rfc1122addrchk = 0
                  rfc1323 = 1
                  rfc2414 = 1
             route_expire = 1
          routerevalidate = 0
     rtentry_lock_complex = 0
                 rto_high = 64
               rto_length = 13
                rto_limit = 7
                  rto_low = 1
                     sack = 0
                   sb_max = 4194304
       send_file_duration = 300
              site6_index = 0
               sockthresh = 85
                  sodebug = 0
              sodebug_env = 0
                somaxconn = 1024
                 strctlsz = 1024
                 strmsgsz = 0
                strthresh = 85
               strturncnt = 15
          subnetsarelocal = 1
       tcp_bad_port_limit = 0
        tcp_cwnd_modified = 0
                  tcp_ecn = 0
       tcp_ephemeral_high = 65500
        tcp_ephemeral_low = 9000
               tcp_fastlo = 0
     tcp_fastlo_crosswpar = 0
             tcp_finwait2 = 1200
           tcp_icmpsecure = 0
          tcp_init_window = 0
    tcp_inpcb_hashtab_siz = 24499
              tcp_keepcnt = 8
             tcp_keepidle = 14400
             tcp_keepinit = 150
            tcp_keepintvl = 150
     tcp_limited_transmit = 1
              tcp_low_rto = 0
             tcp_maxburst = 0
              tcp_mssdflt = 1460
          tcp_nagle_limit = 65535
        tcp_nagleoverride = 0
               tcp_ndebug = 100
              tcp_newreno = 1
           tcp_nodelayack = 0
        tcp_pmtu_discover = 1
            tcp_rand_port = 0
       tcp_rand_timestamp = 0
            tcp_recvspace = 65536
            tcp_sendspace = 65536
            tcp_tcpsecure = 0
             tcp_timewait = 1
                  tcp_ttl = 60
           tcprexmtthresh = 3
             tcptr_enable = 0
                  thewall = 47448064
         timer_wheel_tick = 0
                tn_filter = 1
       udp_bad_port_limit = 0
       udp_ephemeral_high = 65500
        udp_ephemeral_low = 9000
    udp_inpcb_hashtab_siz = 24499
        udp_pmtu_discover = 1
            udp_recvspace = 655360
            udp_sendspace = 65536
                  udp_ttl = 30
                 udpcksum = 1
           use_sndbufpool = 1

I also found that the "delayed" column in the netstat -m output is not 0. It looks like:

******* CPU 24 *******
By size           inuse     calls failed   delayed    free   hiwat   freed
64                  119    239402      0        12     713   14824       0
128                  89    238427      0         0     679    7412       0
256                   5      1279      0         0      11   14824       0
512                  40     20422      0         1     104   18530       0
1024                107    238984      0       190     665    7412       0
2048                 67    116067      0       209     357   11118       0
4096                  1         5      0         1       4    3706       0
8192                  1         6      0         1       0     926       0
16384                 1         5      0         1       0     463       0
32768                 0         2      0         2       1     231       0
65536                 0         5      0         2       2     231       0
131072                0         0      0         0      16      32       0

Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures

My questions are:
1. How can I monitor Streams usage on AIX 6.1?

I now suspect a network problem, but I don't know how to confirm that.

thanks!

tony
2013/2/17

---------- Post updated at 10:08 PM ---------- Previous update was at 10:03 PM ----------

The strthresh parameter means:
AIX has another no option called "strthresh", which is defined as: "Specifies the maximum number of bytes Streams are normally allowed to allocate. When the threshold is passed, does not allow users without the appropriate privilege to open Streams, push modules, or write to Streams devices, and returns ENOSR. The threshold applies only to the output side and does not affect data coming into the system (e.g. the console continues to work properly). A value of zero means that there is no threshold. The strthresh attribute represents a percentage of the thewall attribute and you can set its value from 0 to 100. The thewall attribute indicates the maximum number of bytes that can be allocated by Streams and Sockets using the net_malloc() call. When you change the thewall attribute, the threshold gets updated accordingly." Thank you for using AIX Support Family Services.

sb_max, at 4 MB, looks large enough, but I would increase tcp_sendspace and tcp_recvspace to 256 or 512 KB rather than 64 KB. Note that an application can override the defaults, so your real sizes may already be larger.

How much real memory does the system have?


Thanks for your reply. The physical memory size is 96 GB.
The application running on this machine is Oracle 11gR2 RAC. I set tcp_sendspace per the Oracle manual, and I have done this on other machines many times without ever facing this problem. How can I monitor Streams usage in AIX?

thanks.

A rather simple way to monitor socket activity (a.k.a. Streams), especially for blockage, is to look at netstat -tn output.

michael@x054:[/home/michael]netstat -tn | head -2; netstat -tn | grep ESTABLISHED | head
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0     48  192.168.129.54.22      192.168.129.20.1348    ESTABLISHED

What you are looking for is numbers in the Send-Q and/or Recv-Q columns. If they are consistently at the sendspace/recvspace size, you may be suffering from network congestion outside the box: TCP is doing what it can, then stopping to wait for acknowledgements (Send-Q at max), while the "outside" is waiting for the server to wake up and respond when the Recv-Q is stuck at max.
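The check above can be scripted as a small awk filter that prints only connections with a non-empty queue. This is a sketch: the sample variable holds lines in the netstat -tn column layout shown above (one of them is the example line from this post); on AIX you would pipe the live `netstat -tn` output in instead.

```shell
# Flag ESTABLISHED connections whose Recv-Q ($2) or Send-Q ($3) is non-zero.
# Sample lines stand in for live `netstat -tn` output.
sample='tcp4 0 48 192.168.129.54.22 192.168.129.20.1348 ESTABLISHED
tcp4 0 0 172.18.100.9.1521 172.18.100.5.52799 ESTABLISHED'
printf '%s\n' "$sample" |
    awk '$6 == "ESTABLISHED" && ($2 > 0 || $3 > 0) {
        print $4 " -> " $5 "  Recv-Q=" $2 " Send-Q=" $3
    }'
# -> 192.168.129.54.22 -> 192.168.129.20.1348  Recv-Q=0 Send-Q=48
```

Run it repeatedly (e.g. in a `while sleep 5` loop) during the backup window; a connection that stays pinned at the sendspace/recvspace value across samples is the "stuck" case described above.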

I have looked at netstat -nm again. It is normal for there to be some "delayed" counts; I am not sure why, but it probably has something to do with setting up the stack. What you want to watch for is "failed", as that mainly indicates not enough memory for communications.

Question, since this only occurs sometimes: are you using large sends (e.g., an MTU of 9000) while the network and/or endpoints cannot support that?

What does netstat -p tcp report?

1. The "failed" column in the netstat -m output is consistently zero.
2. MTU:

for i in 1 2 
do
lsattr -El en$i|grep mtu
done

Both en1 and en2 have an MTU of 1500.
The application running on this machine is Oracle 11gR2 RAC, and the client is Tuxedo middleware.
3. First output:

# netstat -tn | head -2;netstat -tn | grep ESTABLISHED | head
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52799     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52905     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.53378     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.55738     ESTABLISHED
tcp4       0      0  127.0.0.1.6100         127.0.0.1.65429        ESTABLISHED
tcp4       0      0  127.0.0.1.65429        127.0.0.1.6100         ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.56564     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32805     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32806     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32807     ESTABLISHED

4. Second output:

 # netstat -tn | head -2; netstat -tn | grep ESTABLISHED | head
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52799     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52905     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.53378     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.55738     ESTABLISHED
tcp4       0      0  127.0.0.1.6100         127.0.0.1.65429        ESTABLISHED
tcp4       0      0  127.0.0.1.65429        127.0.0.1.6100         ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.56564     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32805     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32806     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32807     ESTABLISHED 

5. netstat -p tcp output:

tcp:
       447631444 packets sent
               425423593 data packets (2452701253 bytes)
               2796 data packets (2109430 bytes) retransmitted
               2332528 ack-only packets (1814129 delayed)
               0 URG only packets
               3 window probe packets
               19339756 window update packets
               1065558 control packets
               20593693 large sends
               2004898651 bytes sent using largesend
               4170240 bytes is the biggest largesend
       422219647 packets received
               341923413 acks (for 2452662545 bytes)
               216669 duplicate acks
               0 acks for unsent data
               315335086 packets (1007197678 bytes) received in-sequence
               7624 completely duplicate packets (60958 bytes)
               488 old duplicate packets
               0 packets with some dup. data (0 bytes duped)
               138640 out-of-order packets (42518 bytes)
               0 packets (0 bytes) of data after window
               0 window probes
                524552 window update packets
               2407 packets received after close
               0 packets with bad hardware assisted checksum
               0 discarded for bad checksums
               0 discarded for bad header offset fields
               0 discarded because packet too short
               1489 discarded by listeners
               0 discarded due to listener's queue full
               90949323 ack packet headers correctly predicted
               79016426 data packet headers correctly predicted
       267916 connection requests
       110757 connection accepts
       258066 connections established (including accepts)
       391662 connections closed (including 613 drops)
       0 connections with ECN capability
       0 times responded to ECN
       118143 embryonic connections dropped
       320167932 segments updated rtt (of 265052429 attempts)
       0 segments with congestion window reduced bit set
       0 segments with congestion experienced bit set
       0 resends due to path MTU discovery
       2563 path MTU discovery terminations due to retransmits
       9875 retransmit timeouts
               0 connections dropped by rexmit timeout
       55 fast retransmits
               4 when congestion window less than 4 segments
       48 newreno retransmits
       0 times avoided false fast retransmits
       2 persist timeouts
               0 connections dropped due to persist timeout
       7550 keepalive timeouts
               0 keepalive probes sent
               1 connection dropped by keepalive
       0 times SACK blocks array is extended
       0 times SACK holes array is extended
       0 packets dropped due to memory allocation failure
       0 connections in timewait reused
       0 delayed ACKs for SYN
       0 delayed ACKs for FIN
       0 send_and_disconnects
       0 spliced connections
       0 spliced connections closed
       0 spliced connections reset
       0 spliced connections timeout
       0 spliced connections persist timeout
       0 spliced connections keepalive timeout
       7 TCP checksum offload disabled during retransmit
       27 Connections dropped due to bad ACKs
       0 Connections dropped due to duplicate SYN packets
       0 fastpath loopback connections
       0 fastpath loopback sent packets (0 bytes)
       0 fastpath loopback received packets (0 bytes)

6. This problem has appeared twice. The first time I changed strthresh from 85 to 92, and the second time from 90 to 92. A few days ago an IBM engineer told me I could set strthresh to 0 so that Streams would have no limit. I have not done that, because I worry that with the limit removed, Streams usage could reach 100% and hang the whole system until I reboot it from the HMC.
7. The system was rebooted a week ago, and the su hang had occurred before that, so netstat's statistics were lost.

Thanks a lot for your help.

I have been approaching this as an AIX configuration issue, because changing a setting has helped it "go away". What is needed is a better definition of what you mean by "network issue".

Some data during/after the problem would help (during: repeating commands and looking for deltas helps pinpoint what the system is trying to say).

Sort of: no Pain, no Gain.

In any case, real data values taken during a backup are needed to know whether we are looking at this properly.

FYI: I am not sure what the limits are these days. Back when CHRP (Common Hardware Reference Platform) first came out, IP buffers were limited to 4x 256 MB of memory, or 1 GB, up to 50% of memory (so with more than 2 GB of memory, the maximum was 1 GB).
no -o thewall tells us the limit (a value in KB), so roughly, drop 6 digits and you get the GB value. On my system with 9 GB, that still makes it near 50%.

Your number, thewall = 47448064, comes down to roughly 47 GB.
I do not see this as being your limiting factor, unless it is conflicting with something else. However, there is a second variable that sets a limit under thewall.
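The "drop six digits" trick gives a decimal approximation; if you want the binary (GiB) figure instead, divide by 1024 twice. A quick sketch with this system's thewall value:

```shell
# thewall is reported in KB. "Drop six digits" approximates decimal GB;
# dividing by 1024*1024 gives the binary GiB value.
thewall_kb=47448064
echo "approx $(( thewall_kb / 1000000 )) GB (decimal), $(( thewall_kb / 1024 / 1024 )) GiB (binary)"
# -> approx 47 GB (decimal), 45 GiB (binary)
```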

See "Tuning mbuf pool performance" in the AIX 6.1 Information Center: http://pic.dhe.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.prftungd/doc/prftungd/tuning_mbuf_pool_perf.htm

Hope this helps moving forward!

Sorry about my English. By "network issue" I mean a network problem: maybe a network-related kernel parameter is set to a wrong value, or a network card or cable has something wrong with it.

lsattr -El sys0 | grep maxmbuf

shows the default value of 0.

The physical memory is 96 GB, and thewall is set to about 47 GB, which is nearly 50% of RAM.
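The "nearly 50%" figure checks out arithmetically. A quick sketch using the thread's values (thewall in KB, 96 GB of RAM):

```shell
# Verify what fraction of physical memory thewall represents.
thewall_kb=47448064
ram_kb=$(( 96 * 1024 * 1024 ))   # 96 GB in KB
echo "thewall is $(( thewall_kb * 100 / ram_kb ))% of physical memory"
# -> thewall is 47% of physical memory
```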

thanks!

Note: you can also use the following to get a single parameter:
# lsattr -El sys0 -a <attribute> [-a <attribute>]

At/after the painful moment, we need stats from each layer (adapter, interface, protocol).

I am thinking of:

  • entstat -d <device>
  • netstat -nm # (of all CPU, not just CPU24)
  • netstat -t # (especially when errors are occurring)
  • svmon -G # for general memory statistics
  • svmon -P -t 8 # top 8 processes by memory use
  • vmstat -w -I -p ALL 10 120 # (to see if paging has any influence)

Before you start your backup:
# netpmon -o File -d -P -T TraceBufferSize -O so
After it starts:
# trcon
This collects a lot of data! So perhaps let it run for only 5 minutes and see what it produces. Get used to using it, i.e. practice, ahead of time!
Example:
# sleep 300; trcoff
It is the trcoff command that actually stops the trace and starts the creation of the output. Again, get familiar with how to use the command.
Note: sleep is not necessary, just a handy way to stop after a fixed amount of time; trcoff is needed in any case to stop the trace and generate the report. If you do not specify the -d argument, the trace starts automatically (i.e., no trcon needed).

From here on it is going to be too complex to handle forum-style, at least considering the amount of time I have. Continue asking for advice from IBM, but also from your VAR.
What I am concerned about with Oracle, which usually uses lots of (pinned) memory (~75% at minimum; 80%+ is quite common), is that the network stack and Oracle are fighting for memory, and that "some" of the problem may be coming from memory overcommitment.
In short, get the application provider (which might not be Oracle themselves, but an additional party) to help you on the client side with correcting/understanding the issues.