ndd -set /dev/tcp tcp_host_param

RTM · March 1, 2010, 5:03pm

Following command was set up in startup script on Solaris 8 servers - improved network transfers of files from one server to the another [server B has duplicate entry in it's startup script except the IP is changed to server A] (doubled transfer speed).

ndd -set /dev/tcp tcp_host_param '10.140.20.10 sendspace 279600 recvspace 279600 timestamp 1'

Now they are getting a new server with Solaris 10 on it - setting the same command in a startup script does NOTHING for the speed of the transfer.

both networks are 100 MB - new server on same subnet as old one. New server will get old server's IP (so nothing on the server B side has to change)

Anyone know what might be the 'new' way of doing this for Solaris 10?

incredible · March 2, 2010, 4:45am

What are the model of the Solaris 8 and 10 machines?
Are they sparc or x86 ?

jlliagre · March 2, 2010, 5:40am

What performance are you measuring with the Solaris 8 and 10 servers ?

This ndd setting is doomed anyway:
Bug ID: 6737341 tcp_host_param should be removed

RTM · March 2, 2010, 8:48am

Both are sparc machines and the application is using scp (F-secure I think) to measure 'speed'. The server A has a qfe card - the tcp_host_param is set on only one of the interfaces and it has 1.3MBs versus 512MBs to the interface with the default tcp_host_param setting.

Again, the question is what parameter (if any) would need to be set to get the same type of boost (obviously the tcp_host_param doesn't do it anymore - actually, the transfer does use the tcp_host_param setting when the scp happens [can be seen with netstat -an send-q/recv-q] but it doesn't cause it to go any faster, as it did between 2 Solaris 8 servers).

jlliagre · March 2, 2010, 9:01am

I'm no asking what performance you measure with Solaris 8 with and without the tuning but the performance measured with Solaris 10 vs Solaris 8.

RTM · March 2, 2010, 11:25am

Application is using the output from the scp -

examples are all from old Solaris 8 server:

to 1st interface on old solaris 8 server with tcp_host_param settings set higher

snoop5.out                            |  866MB | 1.1MB/s |TOC: 00:13:14 | 100%

to 2nd interface on old solaris 8 server with no special tcp_host_param set up

snoop5.out                            |  866MB | 756kB/s |TOC: 00:19:32 | 100%

to new server (solaris 10) with tcp_host_param settings set higher

snoop5.out                            |  866MB | 413kB/s |TOC: 00:35:48 | 100%

So, they are looking to get the solaris 10 to have the same "speed" as they see on the old solaris 8 server.

honglus · March 5, 2010, 12:10am

maybe because of the global cap on window wize, try this:

ndd -set /dev/tcp tcp_xmit_hiwat 279600
ndd -set /dev/tcp tcp_recv_hiwat 279600

TCP Tunable Parameters (Solaris Tunable Parameters Reference Manual) - Sun Microsystems

RTM · March 5, 2010, 9:54am

Thanks but no, the tcp_xmit_hiwat and tcp_recv_hiwat were both set higher than the default (and higher then the limit we were setting on tcp_host_param, with no change in the time it took to transfer the file.

For whatever reason this worked in Solaris 8, it seems to fail to change the transfer rate in Solaris 10. I was hoping to hear that these settings now had to be done with a different parameter. The application owner believes it's SUN's way of making them upgrade all their servers to Solaris 10.

honglus · March 6, 2010, 12:32am

I suggest to use iperf to test bandwidth, iperf will show the theoretical bandwith, The scp result is affected by disk I/O, CPU (encryption).
Please run vmstat to check CPU while copying files, maybe the bottleneck is cpu not network.
As a bonus, Iperf also show the window size detected, so It can verify if the change is effective.

you can get iperf from www.sunfreeware.com

an example.

# iperf   -n 900M -mc 172.16.1.12
------------------------------------------------------------
Client connecting to 172.16.1.12, TCP port 5001
TCP window size:   192 KByte (default)
------------------------------------------------------------
[  3] local 172.16.1.11 port 36932 connected with 172.16.1.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 7.9 sec    900 MBytes    955 Mbits/sec
[  3] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)

RTM · March 6, 2010, 1:56pm

Unfortunately, I can't put freeware on the server.

Output from iostat, vmstat, top during a scp transfer (all of these from the new server). You have to remember, if I try a scp transfer from another server in the same location (but not on the same subnet), the transfer rate shown in the scp is 4 to 5 times faster, so the server isn't having an issue when accepting a transfer. Also, showing netstat at the time of transfer shows the increased values are getting used.

netstat -an (cut down to just the scp process and my ssh to the server)
TCP: IPv4
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -----------
10.140.16.20.22 172.16.10.10.44764 280912 135 280912 0 ESTABLISHED
10.140.16.20.49395 172.16.10.10.22 280912 0 525624 0 ESTABLISHED

As you can see, the scp (first process shown) is at the rate set up with tcp_host_params. My ssh to the old server has the higher rate on Rwind of 525624.

iostat 5 200
tty ramdisk1 sd0 sd1 sd2 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
0 2 0 0 0 1 0 4 0 0 0 262 48 47 0 0 0 100
0 143 0 0 0 0 0 0 0 0 0 7 1 5 0 0 0 100
0 111 0 0 0 0 0 0 0 0 0 8 2 6 0 0 0 100
0 111 0 0 0 0 0 0 0 0 0 2 0 8 0 0 0 100
0 143 0 0 0 0 0 0 0 0 0 20 11 36 0 0 0 100
0 111 0 0 0 0 0 0 0 0 0 19 4 5 0 0 0 99
0 111 0 0 0 0 0 0 0 0 0 24 6 6 0 0 0 100
0 111 0 0 0 0 0 0 0 0 0 42 10 4 0 0 0 100
0 111 0 0 0 0 0 0 0 0 0 3 1 6 0 0 0 100
3 111 0 0 0 0 0 0 0 0 0 263 57 5 0 0 0 100

vmstat 5 200
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr rm s0 s1 s2 in sy cs us sy id
0 0 0 39429784 31169736 19 94 0 33 33 0 0 0 0 0 52 1665 3742 1331 0 0 100
0 0 0 39425768 31165392 30 128 0 13 13 0 0 0 0 0 8 1926 10266 1915 0 0 100
0 0 0 39425712 31165424 146 226 0 142 142 0 0 0 0 0 17 1633 5231 1350 0 0 100
0 0 0 39423680 31164952 40 262 0 241 241 0 0 0 0 0 37 1838 5427 1584 0 0 100
0 0 0 39421672 31163688 10 59 0 2 2 0 0 0 0 0 0 1541 2021 1164 0 0 100
0 0 0 39419568 31161488 59 289 0 17 17 0 0 0 0 0 12 1971 9501 2057 0 0 100
0 0 0 39417416 31159296 32 143 0 11 11 0 0 0 0 0 2 1498 3184 1147 0 0 100
0 0 0 39415360 31157216 9 59 0 0 0 0 0 0 0 0 0 1514 2051 1126 0 0 100
0 0 0 39413336 31155128 31 120 0 61 61 0 0 0 0 0 8 1658 10844 1217 0 0 99
snoop5.out | 110MB | 413kB/s | ETA: 00:31:16 | 12%

last pid: 6764; load avg: 0.26, 0.22, 0.19; up 0+23:33:41 18:28:48
80 processes: 79 sleeping, 1 on cpu
CPU states: 99.8% idle, 0.1% user, 0.1% kernel, 0.0% iowait, 0.0% swap
Kernel: 1583 ctxsw, 189 trap, 1727 intr, 4811 syscall, 59 flt, 8 pgout
Memory: 32G phys mem, 30G free mem, 12G total swap, 12G free swap

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
5009 AutoMon 1 29 10 32M 30M sleep 104:53 0.05% PatrolAgent
6541 v821383 1 54 0 10M 7976K sleep 0:03 0.04% sshd
2583 root 1 158 -20 63M 61M sleep 29:29 0.03% seosd
6365 root 1 54 0 5128K 4136K sleep 0:02 0.01% vmstat
6550 v821383 1 59 0 3752K 2144K sleep 0:00 0.01% sftp-server
406 root 9 59 0 10M 9504K sleep 0:58 0.00% picld
6188 root 1 59 0 4296K 3480K cpu/104 0:01 0.00% top

RTM · March 12, 2010, 4:23pm

Opened a case with SUN - this is their final answer...

Unfortunately, with the T5240 system that you are using, scp will see slowness.
This write up applies to T5240 also.

Below is write up from our sustaining alias :
The T2000/T5240 requires multithreaded security apps to get super speeds. If
the application is single threaded, it will be up to 15 times slower over the
same application on a non-SUN4V. The ssh program is single threaded, so slowness
is normal.

The cause for the decreased performance on the T2000 is a combination of
factors. The scp and sftp routines are single threaded. When a single
threaded application is run in the multi-core cpu that is in the T1000
and T2000 it can perform less optimally than it would on the traditional
single threaded cpu in systems such as the V210, v240, v480. When a
single threaded routine such and the one for scp is run in the Niagara
cpu in the T2000 platform, it will not uses the 4 threads in the core and
as a result, perform less optimally. It does not utilize the full
resources of the core/cpu. The optimal performance on a T1000 or T2000
is when a multithreaded process is executed.

The transfer rate seen with scp (and sftp) in this situation is what
would and can be expected per my discussions with the ssh software
engineering group. To obtain a faster transfer rate when transferring
files to the T2000 the customer would want to use a method such as ftp
that transferred at a faster rate:

There would be no changes available to scp at this time.

jlliagre · March 12, 2010, 4:42pm

Too bad you focused so much on the Solaris release and didn't told in the first place you were using such different server models.

RTM · March 12, 2010, 6:07pm

Yea - of course, I also just found out that SUN's version of ssh sucks.

I threw F-secure on the server, replacing SUN ssh, and it flies! Giving the kind of throughput the application folks expected. So, is F-secure so much better (using multi-thread )? I find that hard to believe.

I think SUN just doesn't bother to check their own software/hardware or bother to see what they are giving the customer is worth it...and then just stall, give lame answers and don't bother truly helping the customer. "Yea, it's sun spots that causing those issues...wait until it's dark out and try it again!"

So - final results - setting the tcp_host_param will up the send/receive packets and F-secure will work with that to give better throughput versus using SUN's ssh.

Thanks all who took notice and helped.

jlliagre · March 14, 2010, 1:05am

Sun version is the same openssh everyone bundles.

Instead of ranting and name calling without a clue, you'd better investigate why it goes faster with F-secure.
Possible reasons would be:

parallelized cryptography, why do you think it to be hard to believe ? There is actually a well known patch to ssh that implement multi-threaded cyphering: look for MT-AES-CTR in High Performance Enabled SSH/SCP [PSC]
faster encryption algorithm selection (Solaris use triple-des by default which is the most secure but slower).
Using UltraSPARC T2 integrated hardware crypto support.

RTM · March 15, 2010, 9:02am

Jlliagre - it seems to me that SUN should have been able to come back and tell me the reasons the ssh that comes with the OS wasn't working as fast. That's what they are getting paid to do (which is why it is usually easier and faster to get an answer to these types of issue here versus opening a case with SUN).

Instead of SUN coming back with something, they took the easy route of stating that there was nothing that could be done.

And I apologize if after 2 months of different people (company network, application folks, SUN, and the folks of these forums) working on this issue has caused me to get frustrated and sound like I'm ranting. But it does make one perturbed when SUN gives answers like they did instead of giving me something about "well known patch to ssh that implement multi-threaded cyphering".

Thanks again.

jlliagre · March 15, 2010, 6:00pm

This patch is experimental and not included with the official ssh source code:
As before an unresolved problem with the multi-threaded AES-CTR routines forces us to release this as an experimental patch.
Not being as fast as it is with a different implementation isn't a bug so is unlikely covered by your support contract. It might be the case for a RFE though.