Performance problem with bidirectional nc

I'm working on a simple, half-duplex network diagnostic that will run anywhere, using only nc and dd. Performance is symmetrical when separate sink and source nc processes are listening on the server, each on its own port:

nc -vkl 5000 > /dev/null &
cat /dev/zero | nc -vkl 5001 &

On the client, the read test and the write test then give:

nc host0 5001 | dd of=/dev/null count=10000
5119384 bytes (5.1 MB) copied, 0.037134 seconds, 138 MB/s

dd if=/dev/zero count=10000 | nc host0 5000
5120000 bytes (5.1 MB) copied, 0.036661 seconds, 140 MB/s

When I try to make the connection bidirectional on a single port, the client's read performance is unchanged, but its write performance drops from 140 MB/s to 440 kB/s:

cat /dev/zero | nc -vkl 5000 >/dev/null &

nc host0 5000 | dd of=/dev/null count=10000
5098656 bytes (5.1 MB) copied, 0.036657 seconds, 139 MB/s

dd if=/dev/zero count=10000 | nc host0 5000
5120000 bytes (5.1 MB) copied, 11.6311 seconds, 440 kB/s
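
To put the two write results side by side, here is a minimal sketch that recomputes throughput from the byte counts and elapsed times dd printed above. The helper name rate_kbs is hypothetical (it is not part of nc or dd), and it assumes dd's convention that 1 kB is 1000 bytes:

```shell
# Hypothetical helper: throughput in kB/s from a byte count and elapsed seconds.
# Assumes 1 kB = 1000 bytes, which is how dd reports its rates.
rate_kbs() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes / secs / 1000 }'
}

# Write test with separate ports (figures copied from the dd output above)
rate_kbs 5120000 0.036661

# Write test with a single shared port
rate_kbs 5120000 11.6311
```

The second figure comes out at about 440 kB/s, matching what dd reported, so the single-port write is roughly 300 times slower than the separate-port write for the same 5120000 bytes.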

I'd like the diagnostic to use a single port, if possible. Any insights or suggestions would be greatly appreciated.

Thanks,

Tom