Performance (IOPS) is bad; what is the reason?

I have written a virtual HBA driver named "xmp_vhba". A SCSI disk is attached to it, as shown below:

xmp_vhba, instance #0
        disk, instance #11

But the performance is very bad when we read/write the SCSI disk using vdbench (a read/write I/O benchmark tool).
What is the reason? Thanks!!

The performance is shown below:

-bash-3.00# ./vdbench -vt -f xxx.txt
Vdbench distribution: vdbench502
For documentation, see 'vdbench.pdf'.
15:23:40.130 input argument scanned: '-vt'
15:23:40.163 input argument scanned: '-fxxx.txt'
15:23:41.635 Starting slave: /export/home/vdbench502/vdbench SlaveJvm -m localhost -n localhost-10-110616-15.23.39.879 -l localhost-0 -p 5570   
15:23:43.481 All slaves are now connected
15:23:48.002 Starting RD=run1; I/O rate: Uncontrolled MAX; elapsed=900000; For loops: threads=30.0
Jun 16, 2011  interval        i/o   MB/sec   bytes   read     resp     resp     resp    cpu%  cpu%
                             rate  1024**2     i/o    pct     time      max   stddev sys+usr   sys
15:23:49.281         1   14827.00     7.24     512 100.00    1.700   22.333    0.607    20.3  15.4
15:23:50.094         2   15736.00     7.68     512 100.00    1.771   18.857    0.308    22.0  17.0
15:23:51.085         3   16376.00     8.00     512 100.00    1.790   17.089    0.238    18.9  15.5
15:23:52.083         4   16797.00     8.20     512 100.00    1.744   17.868    0.198    19.1  15.7
15:23:53.076         5   16635.00     8.12     512 100.00    1.764   18.878    0.260    19.2  15.6
15:23:54.076         6   16769.00     8.19     512 100.00    1.748   17.625    0.224    19.1  15.7
15:23:55.076         7   16752.00     8.18     512 100.00    1.750   18.424    0.266    19.0  15.6

---------- Post updated at 06:38 AM ---------- Previous update was at 03:05 AM ----------

Maybe the properties of the VHBA and the SCSI disk affect the performance, or maybe vdbench itself causes this problem.
I don't know what the possible reasons are. Can anyone give me some suggestions to resolve this problem? Thanks!!

What's the output from

iostat -sndzx 1

while vdbench is running?

FWIW, it looks like it's doing a bunch of very small I/O operations (512 bytes).

the output from "iostat -sndzx 1" is shown as below:

 
-bash-3.00# iostat -sndzx 1
                    extended device statistics
     r/s     w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
     0.2     0.8     3.6     3.6   0.0   0.0     0.0     6.3   0   0  c1t0d0
     0.0     0.0     0.0     0.0   0.0   0.0     0.0     0.5   0   0  c1t2d0
 11300.2     0.1  5650.1     0.0   0.2  19.1     0.0     1.7  16  69  c0t661B205100A12200000FB02D0000000Dd0
                    extended device statistics
     r/s     w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
     0.0   167.3     0.0   735.4   0.0   1.0     0.0     6.0   0  98  c1t0d0
 16684.4     0.0  8342.2     0.0   0.2  27.8     0.0     1.7  22 100  c0t661B205100A12200000FB02D0000000Dd0
                    extended device statistics
     r/s     w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
     0.0   166.1     0.0   740.8   0.0   1.0     0.0     6.0   0  98  c1t0d0
 16858.8     0.0  8429.4     0.0   0.2  27.8     0.0     1.6  22 100  c0t661B205100A12200000FB02D0000000Dd0
                    extended device statistics
     r/s     w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
     0.0   192.0     0.0   847.6   0.0   1.0     0.0     5.1   0  97  c1t0d0
 16234.7     0.0  8117.4     0.0   0.2  27.8     0.0     1.7  22 100  c0t661B205100A12200000FB02D0000000Dd0

And you are right, it is doing a bunch of small 512-byte I/Os. The vdbench parameter file is shown below:

 
sd=sd0,lun=/dev/rdsk/c0t661B205100A12200000FB02D0000000Dd0s2
wd=wd1,sd=(sd*),xfersize=512,rdpct=100,seekpct=0
rd=run1,wd=(wd1),iorate=max,elapsed=900000,interval=1,forthreads=30

Can you run vdbench with larger I/O sizes, like 1 MB or larger (but keep it in powers of 2)? What happens then?
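
For example, the workload definition from the parameter file above could be changed along these lines (the 1m size suffix is an assumption about the syntax your vdbench version accepts; adjust as needed):

wd=wd1,sd=(sd*),xfersize=1m,rdpct=100,seekpct=0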

Thank you for your replies!

 
The result of running vdbench with 1 MB I/O:

Jun 22, 2011  interval        i/o   MB/sec   bytes   read     resp     resp     resp    cpu%  cpu%
                             rate  1024**2     i/o    pct     time      max   stddev sys+usr   sys
16:06:21.052        31     782.00   782.00 1048576 100.00  149.715  161.195    0.493     4.5   4.3
16:06:22.051        32     781.00   781.00 1048576 100.00  149.697  161.233    0.475     4.5   4.2
16:06:23.051        33     781.00   781.00 1048576 100.00  148.282  154.836    2.734     4.7   4.3

The I/O rate is always very bad, especially with a 512-byte I/O size.
Now I suspect the DMA attributes and the buf structure passed to the scsi_init_pkt function, but I don't fully understand these.
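
For context, a Solaris HBA driver describes its DMA limits to the framework with a ddi_dma_attr_t structure, which is consulted when the buf for each packet is mapped for DMA. The declaration below is only an illustrative sketch with assumed values (not the real xmp_vhba settings); overly restrictive fields such as a small dma_attr_maxxfer, a tiny dma_attr_sgllen, or a coarse dma_attr_granular can add per-command overhead that hurts small (512-byte) I/O the most.

#include <sys/ddi.h>
#include <sys/sunddi.h>

/*
 * Illustrative DMA attributes for a virtual HBA.  The values below are
 * assumptions for the sake of example, not the actual xmp_vhba settings.
 */
static ddi_dma_attr_t xmp_vhba_dma_attr = {
        DMA_ATTR_V0,            /* dma_attr_version */
        0x0000000000000000ull,  /* dma_attr_addr_lo: lowest usable address */
        0xFFFFFFFFFFFFFFFFull,  /* dma_attr_addr_hi: highest usable address */
        0x00FFFFFFull,          /* dma_attr_count_max: max bytes per cookie */
        1,                      /* dma_attr_align: no alignment restriction */
        1,                      /* dma_attr_burstsizes */
        1,                      /* dma_attr_minxfer: minimum transfer size */
        0x00FFFFFFull,          /* dma_attr_maxxfer: max total transfer */
        0xFFFFFFFFull,          /* dma_attr_seg: segment boundary */
        64,                     /* dma_attr_sgllen: max scatter/gather entries */
        512,                    /* dma_attr_granular: device granularity */
        0                       /* dma_attr_flags */
};

With 512-byte requests the per-command path (packet allocation, DMA binding, locking) dominates over the data transfer itself, so fixed overhead per scsi_pkt is a likely place to look.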

If I'm reading that right, you're getting 781 MB/sec with 1 MB IO operations.

How fast do you think it should be going?

You are right. The performance is normal with 1 MB I/O operations.

However, the problem is that the performance with 512-byte I/O is bad compared with the same disk without our virtual HBA driver.

The performance of the same disk without the VHBA is shown below:

 
Jun 23, 2011  interval        i/o   MB/sec   bytes   read     resp     resp     resp    cpu%  cpu%
                             rate  1024**2     i/o    pct     time      max   stddev sys+usr   sys
11:12:20.729         1   73623.00    35.95     512 100.00    0.391   30.889    0.353    31.1  21.0
11:12:21.102         2   34416.00    16.80     512 100.00    0.775   14.747    0.140    39.5  25.3
11:12:22.093         3   66576.00    32.51     512 100.00    1.099   15.570    0.222    36.3  24.9
11:12:23.089         4   66305.00    32.38     512 100.00    1.687   16.420    0.283    36.5  25.1
11:12:24.084         5   66237.00    32.34     512 100.00    1.899   16.832    0.208    35.9  24.5
11:12:25.082         6   66379.00    32.41     512 100.00    1.895   17.151    0.274    36.2  25.0

With 512-byte I/O, the maximum I/O rate can reach about 70k IOPS without the VHBA, while the I/O rate with our VHBA is only about 17k.
That is the problem, but I don't know why.
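
As a rough sanity check (a back-of-the-envelope estimate based on the iostat output above, not a new measurement): throughput is roughly the number of outstanding I/Os divided by the per-command service time. With the VHBA the device queue stays at about actv = 27.8 with asvc_t = 1.7 ms, and 27.8 / 0.0017 s is roughly 16,000 IOPS, which matches what vdbench reports. So with 30 threads the queue is already full, and the per-command service time through the VHBA path appears to be the limiting factor.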

How are you doing your I/O? Via a file system or direct to the device? If direct, are you using the raw device (rdsk vs. dsk)?

Also, if you use just one thread, how does the VHBA vs. non-VHBA performance look? Maybe you have threads contending for resources in your VHBA code?
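
For example, based on the parameter file posted above, a single-thread run definition would only change forthreads (the other parameters are kept as posted):

rd=run1,wd=(wd1),iorate=max,elapsed=900000,interval=1,forthreads=1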