Benchmarking a new Solaris, with four different clients

Good morning,

For the impatient: I have a new backup server and need to measure what the machine can do. What's the best way of finding that out?

I will tell the story right from the beginning, so you have a clue about what's going on:

The setup consists of a new backup server and three client machines:

A new backup server running Solaris 10 on Intel, with two bonded 1Gbit connections to a Netgear switch and fourteen 1.5TB hard drives forming a raidz pool with two groups of 7 drives each.

Then I started rsync/scp processes from three different machines, one after the other, so they are now running simultaneously:

The first one is a freshly installed FreeBSD 7.2 in a prebuilt NAS case, with two bonded 1Gbit connections to the very same switch and an internal 8-port RAID. The controller splits the 3.5TB into two chunks, which I joined via ccd.

I started (on the backup server) an rsync -varu to it in a screen session, and according to du -sh it has transferred 330G of data in 17 hours so far.

The second is our primary fileserver: Debian Linux, a 3ware RAID controller with 16 disks of 500GB each in RAID 5, and six 1Gbit connections to the same switch as the backup server. It was idle during the night and has transferred approx. ... in 17 hours.

The third one is a really old fileserver with Debian Linux, a 4TB RAID 5 and a 1Gbit connection. It has transferred 40G in 100 minutes!

So, how do I monitor these machines? The most important one is the Solaris server: how fast can it actually write and read data? I would think the filesystem should outrun the network connection by light years, true? And how can I monitor the network interfaces and see how much spare bandwidth they have?
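For concreteness, the kind of thing I mean by "monitor" is roughly this, assuming standard Solaris 10 tools and a pool simply named "storage" (the 5-second interval is arbitrary):

# pool-level throughput, broken down per vdev, every 5 seconds
zpool iostat -v storage 5

# per-disk service times and utilisation (busy disks show high %b / asvc_t)
iostat -xnz 5

# crude per-interface packet counts every 5 seconds; use -I <ifname>
# to watch a specific interface of the bonded pair
netstat -i 5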

The third one is, well, a lemon and is only running along "for fun". But the primary fileserver is supposed to be replaced with a new Solaris machine, and 330G in 17 hours (roughly 5.5MB/s) is crap for an idle network and two otherwise idle machines.

I should add, of course, that the transferred files range from rather big 4G chunks down to tiny 50k files. Nevertheless, shouldn't the machines handle much more in such a long time? I need to find the bottleneck; is there anything else I can do besides trying to flood the machine from twenty others?
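A rough way to take the network out of the picture would be a plain local write/read test on the pool, something like the following (the file name is made up; note that with compression on, /dev/zero data compresses away to almost nothing, so the write figure is only an upper bound):

# local write test, network not involved at all; use a count larger
# than RAM if you want to defeat caching on the read-back
time dd if=/dev/zero of=/storage/ddtest bs=1024k count=4096
time dd if=/storage/ddtest of=/dev/null bs=1024k
rm /storage/ddtest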

PS: Is it normal for ZFS to cache data before writing it to the disks (compression is on)? I noticed when I started the second scp that the fileserver's disk LEDs were flashing like crazy, but the backup server's stayed dark for about 20 seconds, then came a three-second firework of disk activity, then 20 seconds dark again, and so on...

Is that an M8000 with 4 x 2.52 GHz CPUs? We ordered one.

This is kind of a cursory take, but something seems really wrong to me.

We have raidz with 7 disk devices, on a different switch. My low-end Dell 280 writing to that marginal RAID setup moves 2100 files, ~1TB, in about 5 hours. That is via SMB.

That is about an order of magnitude faster than your Solaris box's numbers. That just should not be. No way.

Is the RAID mounted locally on the Sun box? From your description I assume it is ZFS you are talking about. In reading about ZFS I took away one major point: DO NOT PIDDLE WITH ZFS FILESYSTEM PARAMETERS. Out of the box is very likely to be optimal.

So: Is the ZFS setup vanilla?

It's an Intel box, not SPARC!

And yes, it's a plain out-of-the-box ZFS.
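A quick way to double-check what "vanilla" means here is to list only the properties that were set by hand (the pool is called storage; compression should show up, since that is switched on):

# pool layout and health
zpool status storage

# show only locally-set (non-default) properties; everything not listed
# is still at its ZFS default
zfs get -s local all storage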

---------- Post updated at 07:16 AM ---------- Previous update was at 06:49 AM ----------

obelix:/storage# ./Bonnie -d /storage/ -s 2000
File '/storage//Bonnie.3218', size: 2097152000
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
2000 197573 92.5 616858 79.5 463151 71.1 118146 100.0 1757611 100.0 124906.3 296.5

Is that considered fast? :)

Then the problem is rsync or scp.

When you need to sync up data that is already pretty close, either of those is fine.
But for the initial copy, when you have terabytes out of sync, one ssh connection running rsync over ssh is going to take a looong time. It's like putting a straw between two lakes and then trying to drain one lake into the other.
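Just as an aside, one common workaround for the initial bulk copy is to use several straws at once: split the copy into multiple rsync streams, e.g. one per top-level directory, so a single ssh connection is no longer the limit. A rough sketch, run on the pushing side; "client:/data", "backupserver" and the paths are made up, and GNU xargs is assumed:

# one rsync per top-level directory of /data, four at a time,
# instead of one big rsync over a single ssh connection
cd /data
ls -d */ | xargs -I{} -P4 rsync -au {} backupserver:/storage/{}
# caveat: breaks on directory names with spaces; fine for a quick test

Another common trick is to do the very first full copy with tar over a plain TCP stream (netcat or mbuffer) to take ssh encryption out of the path, and only use rsync for the incremental runs afterwards.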

Hm, what do you have in mind?

We had Veritas for volume management - TimeFinder could move ~2.0TB/hr anywhere, whether high-speed tape devices<->filesystem or filesystem->filesystem. We now have CommGuard, which is slower but still okay.

What volume management software do you have? You can mount a filesystem temporarily and then move gobs of data around quickly - hopefully with the tools you already have. Please note - if it is proprietary Solaris I'm not going to be much help at all.
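For example, something as plain as NFS would do for a temporary mount; this is only a sketch, with the share and host names made up, and I'm not claiming it is the fastest option:

# on the Linux fileserver: export the data read-only
#   /etc/exports:  /data  backupserver(ro,no_root_squash)
exportfs -ra

# on the Solaris 10 box: mount it and copy locally
mount -F nfs fileserver:/data /mnt
cp -rp /mnt/. /storage/
umount /mnt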

It's Solaris 10 (did you mean that by proprietary? :) ), and the other one is a Linux machine, so Veritas won't work.

So the filesystem seems OK to you? And netio showed a decent 1Gbit connection.

Well, it's the protocol then.