
Anyone used Iperf or Netperf w/GigE?

Anonymous
October 13, 2004 9:51:27 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Hi,

I was wondering if anyone has used Iperf or Netperf for testing network
performance over GigE? The reason for my question is that I've been
doing some testing, initially with Iperf, and recently with Netperf, of
GigE LAN links, and I've been finding results in the 300Mbit/sec range.
The server vendor is implying that these results are not valid, and is
suggesting that I do a file copy of a 36GB file instead and time it,
subtracting the time for a local file copy. I don't mind doing the test
they're suggesting, but I'm just wondering if there's a possibility that
the numbers that I'm getting from both Iperf and Netperf are really
'off'?

Thanks,
Jim


Anonymous
October 13, 2004 10:51:14 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Yeah. Two Dell PowerEdge 2650 servers connected to a Nortel BayStack 5510
will run 996 Mb/s all day long with jumbo frames enabled. The servers were running
Red Hat Enterprise Linux. Needless to say, we use Iperf for performance tuning and
testing all the time. The multicast and UDP support is great for QoS
testing.

-mike


"ohaya" <ohaya@cox.net> wrote in message news:416DA35F.A0F52E20@cox.net...
> Hi,
>
> I was wondering if anyone has used Iperf or Netperf for testing network
> performance over GigE? The reason for my question is that I've been
> doing some testing, initially with Iperf, and recently with Netperf, of
> GigE LAN links, and I've been finding results in the 300Mbit/sec range.
> The server vendor is implying that these results are not valid, and is
> suggesting that I do a file copy of a 36GB file instead and time it,
> subtracting the time for a local file copy. I don't mind doing the test
> they're suggesting, but I'm just wondering if there's a possibility that
> the numbers that I'm getting from both Iperf and Netperf are really
> 'off'?
>
> Thanks,
> Jim
Anonymous
October 14, 2004 1:56:15 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> I was wondering if anyone has used Iperf or Netperf for testing network
> performance over GigE?

Yes :) 

> The reason for my question is that I've been doing some testing,
> initially with Iperf, and recently with Netperf, of GigE LAN links,
> and I've been finding results in the 300Mbit/sec range. The server
> vendor is implying that these results are not valid, and is
> suggesting that I do a file copy of a 36GB file instead and time it,
> subtracting the time for a local file copy. I don't mind doing the
> test they're suggesting, but I'm just wondering if there's a
> possibility that the numbers that I'm getting from both Iperf and
> Netperf are really 'off'?

I suspect they are not off, but they may be using TCP settings that
are not optimal for GigE. For example, what are you using for -s and
-S as test-specific parameters in the netperf TCP_STREAM test?

Also, what sort of system are you using, the GigE card, the bus speeds
and feeds all that stuff.

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
Anonymous
October 14, 2004 10:46:40 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> Hi,

> I was wondering if anyone has used Iperf or Netperf for testing network
> performance over GigE? The reason for my question is that I've been
> doing some testing, initially with Iperf, and recently with Netperf, of
> GigE LAN links, and I've been finding results in the 300Mbit/sec range.
> The server vendor is implying that these results are not valid, and is
> suggesting that I do a file copy of a 36GB file instead and time it,
> subtracting the time for a local file copy. I don't mind doing the test
> they're suggesting, but I'm just wondering if there's a possibility that
> the numbers that I'm getting from both Iperf and Netperf are really
> 'off'?

As always, benchmarks measure the speed of the particular benchmark.

As regards network performance, netperf _is_ a very good tool, allowing
you to compare different systems. If your system only reaches 300 Mbit/sec, then
your system is limited to that speed. A vendor that is unable to tune it up
might react with bullshit and try to shift your focus.

If you need more speed, you might need radical changes; it could be
another OS, or other networking gear. But remember, changing measurement
tools won't give you more performance from your application!



> Thanks,
> Jim

--
Peter Håkanson
IPSec Sverige ( At Gothenburg Riverside )
Sorry about my e-mail address, but i'm trying to keep spam out,
remove "icke-reklam" if you feel for mailing me. Thanx.
Anonymous
October 14, 2004 12:59:13 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Hi Rick,

Comments below...

> > The reason for my question is that I've been doing some testing,
> > initially with Iperf, and recently with Netperf, of GigE LAN links,
> > and I've been finding results in the 300Mbit/sec range. The server
> > vendor is implying that these results are not valid, and is
> > suggesting that I do a file copy of a 36GB file instead and time it,
> > subtracting the time for a local file copy. I don't mind doing the
> > test they're suggesting, but I'm just wondering if there's a
> > possibility that the numbers that I'm getting from both Iperf and
> > Netperf are really 'off'?
>
> I suspect they are not off, but they may be using TCP settings that
> are not optimal for GigE. For example, what are you using for -s and
> -S as test-specific parameters in the netperf TCP_STREAM test?
>
> Also, what sort of system are you using, the GigE card, the bus speeds
> and feeds all that stuff.


With the netperf testing so far, I just used the default settings. I
was assuming that this should give us at least "an idea" of what the
actual throughput was?

I've been using iperf more extensively, because I couldn't find netperf
until a couple of days ago.

Needless to say, I was surprised with the results I got from iperf, and
then when I finally got a working netperf, those numbers came in about
the same.

System under test consisted of two IBM blade servers (HS40) with 4 x
Xeon 3.0 GHz CPUs, 16GB of memory, and 4 x Intel/1000 NICs onboard.
Connection between the blades (for these tests with netperf) was simply
a fiber cross-over cable.

Jim
Anonymous
October 14, 2004 1:51:12 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya wrote:
>
> Hi Rick,
>
> Comments below...
>
> > > The reason for my question is that I've been doing some testing,
> > > initially with Iperf, and recently with Netperf, of GigE LAN links,
> > > and I've been finding results in the 300Mbit/sec range. The server
> > > vendor is implying that these results are not valid, and is
> > > suggesting that I do a file copy of a 36GB file instead and time it,
> > > subtracting the time for a local file copy. I don't mind doing the
> > > test they're suggesting, but I'm just wondering if there's a
> > > possibility that the numbers that I'm getting from both Iperf and
> > > Netperf are really 'off'?
> >
> > I suspect they are not off, but they may be using TCP settings that
> > are not optimal for GigE. For example, what are you using for -s and
> > -S as test-specific parameters in the netperf TCP_STREAM test?
> >
> > Also, what sort of system are you using, the GigE card, the bus speeds
> > and feeds all that stuff.
>
> With the netperf testing so far, I just used the default settings. I
> was assuming that this should give us at least "an idea" of what the
> actual throughput was?
>
> I've been using iperf more extensively, because I couldn't find netperf
> until a couple of days ago.
>
> Needless to say, I was surprised with the results I got from iperf, and
> then when I finally got a working netperf, those numbers came in about
> the same.
>
> System under test consisted of two IBM blade servers (HS40) with 4 x
> Xeon 3.0 GHz CPUs, 16GB of memory, and 4 x Intel/1000 NICs onboard.
> Connection between the blades (for these tests with netperf) was simply
> a fiber cross-over cable.


Hi,

Sorry, forgot to mention that both systems are running Windows 2003
Server.

Jim
Anonymous
October 14, 2004 1:53:08 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Michael Roberts wrote:
>
> Yeah. Two Dell PowerEdge 2650 servers connected to a Nortel BayStack 5510
> will run 996 Mb/s all day long with jumbo frames enabled. The servers were running
> Red Hat Enterprise Linux. Needless to say, we use Iperf for performance tuning and
> testing all the time. The multicast and UDP support is great for QoS
> testing.
>


Mike,

Thanks for the info. Actually, that gives me an idea. We have some
Dell PowerEdges with GigE NICs sitting around somewhere. I'll see if I
can try out Iperf and/or Netperf on them and see what I get.

Jim
Anonymous
October 14, 2004 2:01:49 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

> As always, benchmarks measure the speed of the particular benchmark.
>
> As regards network performance, netperf _is_ a very good tool, allowing
> you to compare different systems. If your system only reaches 300 Mbit/sec, then
> your system is limited to that speed. A vendor that is unable to tune it up
> might react with bullshit and try to shift your focus.
>
> If you need more speed, you might need radical changes; it could be
> another OS, or other networking gear. But remember, changing measurement
> tools won't give you more performance from your application!


Peter,

Thanks for the advice.

I think/hope that you're aware of what I've been attempting to do, based
on my earlier thread, and personally, I agree that at this point, the
vendor is reacting with "b....".

Nevertheless, it looks like I'm going to have to do their "manual copy"
test to satisfy them that there's a problem in the first place, even
though I think that tools like Iperf and Netperf do a better job because
they're specifically designed for what they do. Otherwise, so far, it
doesn't look like they're going to even look into this problem.

I guess that we've all "been there, and done that" with our vendors
:( ...

Jim
Anonymous
October 14, 2004 2:17:49 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

> I suspect they are not off, but they may be using TCP settings that
> are not optimal for GigE. For example, what are you using for -s and
> -S as test-specific parameters in the netperf TCP_STREAM test?


Hi Rick,

I can't see any -s or -S parameters? What I'm going by is a man page
at:

http://carol.science.uva.nl/~jblom/datatag/wp3_1/tools/...

Also tried a "-h" and didn't see any -s or -S there?

FYI, the binaries that I have are 2.1pl1. The www.netperf.org site
doesn't seem to be working anymore, so these were the only binaries I
could find for Win32, on a site in Japan, I think.

Jim

P.S. Are you "the" Rick Jones, the originator of Netperf?
Anonymous
October 14, 2004 2:36:15 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya wrote:

> Hi Rick,
>
> Comments below...
>
>> > The reason for my question is that I've been doing some testing,
>> > initially with Iperf, and recently with Netperf, of GigE LAN links,
>> > and I've been finding results in the 300Mbit/sec range. The server
>> > vendor is implying that these results are not valid, and is
>> > suggesting that I do a file copy of a 36GB file instead and time it,
>> > subtracting the time for a local file copy. I don't mind doing the
>> > test they're suggesting, but I'm just wondering if there's a
>> > possibility that the numbers that I'm getting from both Iperf and
>> > Netperf are really 'off'?
>>
>> I suspect they are not off, but they may be using TCP settings that
>> are not optimal for GigE. For example, what are you using for -s and
>> -S as test-specific parameters in the netperf TCP_STREAM test?
>>
>> Also, what sort of system are you using, the GigE card, the bus speeds
>> and feeds all that stuff.
>
>
> With the netperf testing so far, I just used the default settings. I
> was assuming that this should give us at least "an idea" of what the
> actual throughput was?
>
> I've been using iperf more extensively, because I couldn't find netperf
> until a couple of days ago.
>
> Needless to say, I was surprised with the results I got from iperf, and
> then when I finally got a working netperf, those numbers came in about
> the same.
>
> System under test consisted of two IBM blade servers (HS40) with 4 x
> Xeon 3.0 GHz CPUs, 16GB of memory, and 4 x Intel/1000 NICs onboard.
> Connection between the blades (for these tests with netperf) was simply
> a fiber cross-over cable.

I can't find a match on the Serverworks site for the chipset that is
supposed to be on that board, but one possible would be the "HE" chipset,
which has onboard 32/33 PCI and an IMB link that allows connection of a
64/66 or PCI-X southbridge. If the Ethernet is on the 32/33 PCI that would
explain the poor performance you're seeing. Just for hohos, try each
Ethernet port in turn, using the same port on both blades--it may be that
one or two are on 32/33 and the others are on the fast bus. I realize it's
a long shot, but it's simple and obvious.

Also, are you _sure_ you've got a good cable.

And is there any possibility that there's a duplex mismatch? Did you
connect the cable with both blades powered down? If not, it may be that
the NICs did not handshake properly--they're _supposed_ to I know but
what's supposed to happen and what does happen aren't always the same.

> Jim

--
--John
Reply to jclarke at ae tee tee global dot net
(was jclarke at eye bee em dot net)
Anonymous
October 14, 2004 3:59:45 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

John,

Comments below...

Jim


> I can't find a match on the Serverworks site for the chipset that is
> supposed to be on that board, but one possible would be the "HE" chipset,
> which has onboard 32/33 PCI and an IMB link that allows connection of a
> 64/66 or PCI-X southbridge. If the Ethernet is on the 32/33 PCI that would
> explain the poor performance you're seeing. Just for hohos, try each
> Ethernet port in turn, using the same port on both blades--it may be that
> one or two are on 32/33 and the others are on the fast bus. I realize it's
> a long shot, but it's simple and obvious.

I asked IBM specifically about the interface, and they said they were
PCI-X. Of course, they could be wrong. Also, I've tried between combos
among 4 servers already.

Re. cables, I've tried several fiber cables.

Here's the page listing the driver:

http://www-307.ibm.com/pc/support/site.wss/document.do?...

I used the one:

"Intel-based Gigabit and 10/100 Ethernet adapter drivers for Microsoft
Windows 2000 and Microsoft Windows Server 2003"


> Also, are you _sure_ you've got a good cable.

See above.


>
> And is there any possibility that there's a duplex mismatch? Did you
> connect the cable with both blades powered down? If not, it may be that
> the NICs did not handshake properly--they're _supposed_ to I know but
> what's supposed to happen and what does happen aren't always the same.


That's a good hint. For the tests via a GigE switch, the servers were
connected to the switch prior to power-on (no choice). For the tests
via fiber cross-over cable, I plugged the fiber together after power on.

I'll try some tests powering the servers off, connecting the cables, then
powering the servers on, if I can.
Anonymous
October 14, 2004 9:40:46 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> I asked IBM specifically about the interface, and they
> said they were PCI-X. Of course, they could be wrong.
> Also, I've tried between combos among 4 servers already.

OK, I'm a little late to this thread, but have you
tried UDP? MS-Windows used to have horrible problems
setting adequate TCP-Rcv-Windows sizes.
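
(With iperf, a UDP run would be something along the lines of `iperf -s -u`
on one blade and `iperf -c 10.1.1.24 -u -b 900m -t 30` on the other; the
address and the 900 Mbit/sec offered load are just illustrative. Note that
iperf's UDP mode needs an explicit -b, or it defaults to roughly 1 Mbit/sec.)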

Personally, I use `ttcp` for bandwidth testing.

-- Robert
Anonymous
October 14, 2004 10:40:08 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Robert Redelmeier wrote:
>
> ohaya <ohaya@cox.net> wrote:
> > I asked IBM specifically about the interface, and they
> > said they were PCI-X. Of course, they could be wrong.
> > Also, I've tried between combos among 4 servers already.
>
> OK, I'm a little late to this thread, but have you
> tried UDP? MS-Windows used to have horrible problems
> setting adequate TCP-Rcv-Windows sizes.
>
> Personally, I use `ttcp` for bandwidth testing.


Robert,

No, I haven't tried UDP yet. Will do that when I have time.

I started this testing with TTCP (actually a version called PCATTCP),
but I was getting very inconsistent test-to-test results, so I looked
for another tool. Couldn't find netperf (for Win32), so I found Iperf,
and did most of the testing with that.

Then, I found an older binary for netperf, and tried that to validate
the results I got from Iperf.

Jim
Anonymous
October 14, 2004 11:15:51 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> With the netperf testing so far, I just used the default settings. I
> was assuming that this should give us at least "an idea" of what the
> actual throughput was?

"Typically" (as if there really is such a thing) one wants 64KB or
larger TCP windows for local gigabit. Default netperf settings simply
take the system's defaults which may not be large enough for
maximizing GbE throughput.

> System under test consisted of two IBM blade servers (HS40) with 4 x
> Xeon 3.0 GHz CPUs, 16GB of memory, and 4 x Intel/1000 NICs onboard.
> Connection between the blades (for these tests with netperf) was simply
> a fiber cross-over cable.

Is that 4X Intel/1000 on each blade, or are they on the chassis?
Windows or Linux? I'd check CPU util if possible - although don't put
_tooo_ much faith in top.

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
Anonymous
October 14, 2004 11:15:52 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

>
> Is that 4X Intel/1000 on each blade, or are they on the chassis?
> Windows or Linux? I'd check CPU util if possible - although don't put
> _tooo_ much faith in top.


Hi,

There are 4 x Intel/1000 NICs on each blade, not on the chassis. The OS is
Windows 2003.

Jim
Anonymous
October 14, 2004 11:20:16 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> I can't see any -s or -S parameters? What I'm going by is a man page
> at:

> http://carol.science.uva.nl/~jblom/datatag/wp3_1/tools/...

> Also tried a "-h" and didn't see any -s or -S there?

-s and -S are "test specific" options. Help for the test-specific options is displayed when you specify a test type followed by -- -h:

$ ./netperf -t TCP_STREAM -- -h

Usage: netperf [global options] -- [test options]

TCP/UDP BSD Sockets Test Options:
-C Set TCP_CORK when available
-D [L][,R] Set TCP_NODELAY locally and/or remotely (TCP_*)
-h Display this text
-I local[,remote] Set the local/remote IP addresses for the data socket
-m bytes Set the send size (TCP_STREAM, UDP_STREAM)
-M bytes Set the recv size (TCP_STREAM, UDP_STREAM)
-p min[,max] Set the min/max port numbers for TCP_CRR, TCP_TRR
-P local[,remote] Set the local/remote port for the data socket
-r req,[rsp] Set request/response sizes (TCP_RR, UDP_RR)
-s send[,recv] Set local socket send/recv buffer sizes
-S send[,recv] Set remote socket send/recv buffer sizes

For those options taking two parms, at least one must be specified;
specifying one value without a comma will set both parms to that
value, specifying a value with a leading comma will set just the second
parm, a value with a trailing comma will set just the first. To set
each parm to unique values, specify both and separate them with a
comma.
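
So, as an illustration only (substitute your own remote host; the sizes
here are just placeholder values), a 64-KB-socket-buffer run would look
something like:

$ ./netperf -H 10.1.1.24 -t TCP_STREAM -l 30 -- -s 65536 -S 65536 -m 32768

where -s/-S set the local and remote socket buffer sizes and -m sets the
send size per call.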

> FYI, the binaries that I have are 2.1pl1. The www.netperf.org site
> doesn't seem to be working anymore, so these were the only binaries I
> could find for Win32, on a site in Japan, I think.

www.netperf.org should be up - I'll double-check it. While the netperf
sources are up to 2.3pl1 now, which includes some non-trivial Windows
re-integration, there aren't binaries for it from netperf.org/ftp.cup.

> P.S. Are you "the" Rick Jones, the originator of Netperf?

Yes. These days I call myself the "Contributing Editor" :) 

rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
Anonymous
October 14, 2004 11:24:49 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> Nevertheless, it looks like I'm going to have to do their "manual copy"
> test to satisfy them that there's a problem in the first place, even
> though I think that tools like Iperf and Netperf do a better job because
> they're specifically designed for what they do. Otherwise, so far, it
> doesn't look like they're going to even look into this problem.

Being affiliated with a vendor :)  at least for the moment, I will say
that the path through the stack may indeed be different for FTP than
for netperf TCP_STREAM. For example, many FTP implementations can use the
platform's "sendfile" call, which will send data directly from the
buffer cache down the stack without copies. There will be a data copy
in a netperf TCP_STREAM test. If the system is easily CPU/memory bus
limited that could make a significant difference. Of course, that is
why there is a TCP_SENDFILE test in contemporary versions of netperf
:)  (I cannot remember if it is coded to use transmitfile on Windows or
not - I think that change may not be there yet)

Of course, it still could just be smoke or simply someone going step
by step through a checklist.

rick jones
--
Process shall set you free from the need for rational thought.
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
Anonymous
October 14, 2004 11:24:50 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Rick Jones wrote:
>
> ohaya <ohaya@cox.net> wrote:
> > Nevertheless, it looks like I'm going to have to do their "manual copy"
> > test to satisfy them that there's a problem in the first place, even
> > though I think that tools like Iperf and Netperf do a better job because
> > they're specifically designed for what they do. Otherwise, so far, it
> > doesn't look like they're going to even look into this problem.
>
> Being affiliated with a vendor :)  at least for the moment, I will say
> that the path through the stack may indeed be different for FTP than
> for netperf TCP_STREAM. For example, many FTP implementations can use the
> platform's "sendfile" call, which will send data directly from the
> buffer cache down the stack without copies. There will be a data copy
> in a netperf TCP_STREAM test. If the system is easily CPU/memory bus
> limited that could make a significant difference. Of course, that is
> why there is a TCP_SENDFILE test in contemporary versions of netperf
> :)  (I cannot remember if it is coded to use transmitfile on Windows or
> not - I think that change may not be there yet)


Rick,

I may have been unclear about what I meant by a "manual copy" test. What
they are suggesting I do is create a 36GB file on one server, then:

- manually time a file copy from that server to the other server, and
- manually time a file copy from that server to itself, and
- subtract the times and divide the result by 36GB.
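
(Presumably the idea is that the difference approximates the pure network
transfer time, so the implied throughput would be roughly
36 GB x 8 / (T_remote_copy - T_local_copy) bits/sec.)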

Jim
Anonymous
October 14, 2004 11:38:02 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya wrote:

> John,
>
> Comments below...
>
> Jim
>
>
>> I can't find a match on the Serverworks site for the chipset that is
>> supposed to be on that board, but one possible would be the "HE" chipset,
>> which has onboard 32/33 PCI and an IMB link that allows connection of a
>> 64/66 or PCI-X southbridge. If the Ethernet is on the 32/33 PCI that
>> would
>> explain the poor performance you're seeing. Just for hohos, try each
>> Ethernet port in turn, using the same port on both blades--it may be that
>> one or two are on 32/33 and the others are on the fast bus. I realize
>> it's a long shot, but it's simple and obvious.
>
> I asked IBM specifically about the interface, and they said they were
> PCI-X. Of course, they could be wrong. Also, I've tried between combos
> among 4 servers already.
>
> Re. cables, I've tried several fiber cables.
>
> Here's the page listing the driver:
>
> http://www-307.ibm.com/pc/support/site.wss/document.do?...
>
> I used the one:
>
> "Intel-based Gigabit and 10/100 Ethernet adapter drivers for Microsoft
> Windows 2000 and Microsoft Windows Server 2003"
>
>
>> Also, are you _sure_ you've got a good cable.
>
> See above.
>
>
>>
>> And is there any possibility that there's a duplex mismatch? Did you
>> connect the cable with both blades powered down? If not, it may be that
>> the NICs did not handshake properly--they're _supposed_ to I know but
>> what's supposed to happen and what does happen aren't always the same.
>
>
> That's a good hint. For the tests via a GigE switch, the servers were
> connected to the switch prior to power-on (no choice). For the tests
> via fiber cross-over cable, I plugged the fiber together after power on.
>
> I'll try some tests powering the servers off, connecting the cables, then
> powering the servers on, if I can.

I didn't realize they were running fiber. There have been cases with short
cables where the receiver was being overdriven--don't know if that would
produce the symptoms you're seeing though.

--
--John
Reply to jclarke at ae tee tee global dot net
(was jclarke at eye bee em dot net)
Anonymous
October 15, 2004 4:14:21 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> I started this testing with TTCP (actually a version called PCATTCP),

I think there's a version called `ttcpw`

> but I was getting very inconsistent test-to-test results, so I looked

Of course! The standard number of packets goes too quickly on Gig.

> Then, I found an older binary for netperf, and tried that
> to validate the results I got from Iperf.

You should validate a tool against the localhost loopback interface.
On my slow 500 MHz Linux box:

$ ttcp -sr & ttcp -stu -n99999 localhost
[2] 5030
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-t: buflen=8192, nbuf=99999, align=16384/0, port=5001 udp -> localhost
ttcp-t: socket
ttcp-t: 819191808 bytes in 4.05 real seconds = 197396.61 KB/sec +++
ttcp-t: 100005 I/O calls, msec/call = 0.04, calls/sec = 24676.06
ttcp-t: 0.1user 3.9sys 0:04real 100% 0i+0d 0maxrss 0+2pf 0+0csw

This is barely faster than Gig. My Athlon XP 2000+ will
report 1+ GByte/sec
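
(The same loopback sanity check should work with iperf, e.g. `iperf -s` in
one window and `iperf -c 127.0.0.1 -t 10` in another; if loopback can't
comfortably beat gigabit rates, the box itself is the bottleneck.)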

-- Robert
Anonymous
October 15, 2004 6:10:56 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

In article <416EFF42.101E4E28@cox.net>, ohaya <ohaya@cox.net> wrote:
:I may have been unclear by what I meant by a "manual copy" test. What
:they are suggesting that I do is create a 36GB file on one server, then:

:- manually time a file copy from that server to the other server, and
:- manually time a file copy from that server to itself, and
:- subtract the times and divide the result by 36GB.

That test is dubious.

- The time to copy a file is dependent on the OS and drive maximum
write rate, and the write rates are not necessarily going to be the
same between the two servers [unless they are the same hardware through
and through.]

- The time to copy a file from a server to itself can potentially be
substantially decreased by DMA. It depends how smart the copy program is.
There is the advantage of knowing that one is going to be starting the
read and write on nice boundaries, so one could potentially have the
copy program keep the data in system space or maybe even in hardware
space.

- When the file is being copied locally, if it is being copied to the
same drive, then the reads and writes are going to be in contention
whereas when copying a file to a remote server, the reads and writes
happen in parallel. The larger the memory buffer that the system can
[with hardware cooperation] allocate to a single disk I/O, the fewer
the times the drive has to move its head... if, that is, the file is
allocated into contiguous blocks and is being written into contiguous
blocks, though this need would be mitigated if the drive controller
supports scatter-gather or CTQ.

- When the file is being copied locally, if it is being copied to the
same controller, then there can be bus contention that would prevent
the reads from operating in parallel with the writes. But again system
buffering and drive controller cache and CTQ can mitigate this: some
SCSI drives do permit incoming writes to be buffered while they are
seeking and reading for a previous read request.

- The first copy is going to require that the OS find the directory
entry and locate the file on disk and start reading. But at the time of
the second copy, the directory and block information might be cached by
the OS, reducing the copy time. Also, if the file fits entirely within
available memory, then the OS may still have the file in its I/O
buffers and might skip the read. (Okay, that last is unlikely to happen
with a 36 GB file on the average system, but it is not out of the
question for High Performance Computing systems.)

- In either copy scenario, one has to know what it means for the last
write() to have returned: does it mean that the data is flushed to
disk, or does it mean that the last buffer of data has been sent to the
filesystem cache for later dispatch when convenient? Especially when
you are doing the copy to the remote system, are you measuring the time
until the last TCP packet hits the remote NIC and the ACK for it gets
back, or are you measuring the time until the OS gets around to
scheduling a flush? The difference could be substantial if you have
large I/O buffers on the receiving side! Is the copy daemon using
synchronous I/O or asynch I/O ?

- A test that would more closely simulate the source server's copy
out to the network would be to time a copy to the null device instead of
to a file on the server. But to measure the network timing you still
need to know how the destination server handles flushing the last
buffer when a close() is issued. Ah, but you also have to know how the
TCP stack and copy daemons work together.

When the copy-out daemon detects the end of the source file, it will
close the connection and the I/O library will translate that into
needing to send a FIN packet. But will that FIN packet get sent in the
header of the last buffer, or will it be a separate packet? And when
the remote system receives the FIN, does the TCP layer FIN ACK
immediately, or does it wait until the copy-in daemon closes the input
connection? If it waits, then does the copy-in daemon close the input
connection as soon as it detects EOF, or does it wait until the write()
on the final buffer returns? When the copy-out daemon close()'s the
connection, does the OS note that and return immediately, possibly
dealing with the TCP details on a different CPU or in hardware, or does
the OS wait for the TCP ACK gets received before it returns to the
program? Are POSIX.1 calls being used by the copy daemons, and if so
what does POSIX.1 say is the proper behaviour considering that until
the ACK of the last output packet arrives, the write associated with the
implicit flush() might fail: if the last packet gets dropped [and all
TCP retries are exhausted] then the return from close() is perhaps
different than if the last packet makes it. Or maybe not, and one has to
explicitly flush() if one wants to distinguish the cases. Unfortunately
I don't have my copy of POSIX.1 with me to check.


I bet the company didn't think of these problems when they asked you to
do the test. Or if they did, then they are probably assuming that
the boundary conditions will not make a significant contribution
to the final bandwidth calculation when the boundary conditions
are amortized over 36 GB. But there are just too many possibilities
that could throw the calculation off significantly, especially
the drive head contention and the accounting of the time to flush the
final write buffer when one has large I/O buffers.
--
Everyone has a "Good Cause" for which they are prepared to spam.
-- Roberson's Law of the Internet
Anonymous
October 15, 2004 6:20:42 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

J. Clarke <jclarke@nospam.invalid> wrote:
> I didn't realize they were running fiber. There have
> been cases with short cables where the receiver was being
> overdriven--don't know if that would produce the symptoms
> you're seeing though.

A good point. Worth trying a set of attenuators
made just for this purpose.

-- Robert
Anonymous
October 17, 2004 11:55:22 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Rick Jones wrote:
>
> ohaya <ohaya@cox.net> wrote:
> > With the netperf testing so far, I just used the default settings. I
> > was assuming that this should give us at least "an idea" of what the
> > actual throughput was?
>
> "Typically" (as if there really is such a thing) one wants 64KB or
> larger TCP windows for local gigabit. Default netperf settings simply
> take the system's defaults which may not be large enough for
> maximizing GbE throughput.


Rick,

I spent a few more hours testing this weekend, including various
different sizes for the "RWIN". Increasing it up to 64KB or so made no
noticeable difference.

I also tried enabling "TCP window scaling" (Tcp1323Opts), which should allow
the RWIN to be set to greater than 64KB, and then tried various sizes
for RWIN. Again, no difference.
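
(For reference, the values I was adjusting are the standard TCP registry
parameters under
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters,
for example:

TcpWindowSize  REG_DWORD  0x10000   (64 KB receive window)
Tcp1323Opts    REG_DWORD  1         (enable window scaling)

followed by a reboot. The exact values above are only examples.)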

I began running Windows Performance Monitor, monitoring "Total
Bytes/sec" on the sending machine, and what I was seeing was that:

- There was very low CPU utilization throughout the test (<10%).
- When the test started, I could see the Total Bytes/sec spike, but then
it'd level out to about 30+ Mbytes/sec for the rest of the test. The
height of the spike varied, I'd guess upwards of 50 Mbytes/sec; I think
I saw it spike once to about 80 Mbytes/sec.


I tried these tests again, both through NICs connected through the GigE
switch and NICs connected by a simple fiber cross-over cable. Not much
difference.


I'm kind of running out of ideas here. It almost seems like there is
something that is preventing the sending end from sending more than
about 30+ Mbytes/sec :( ...

Jim
Anonymous
October 18, 2004 10:28:41 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Hi,

After much testing, I was finally able to get some reasonable results
from all 3 network test tools that I had been working with, Iperf,
Netperf, and PCATTCP.

What I had to do was to include command line parameters for the
following:

MSS: 100000
TcpWindowSize: 64K
Buffer Size: 24K

For example, for Iperf sending end, I used:

iperf -c 10.1.1.24 -M 100000 -w 64K -l 24K -t 30

and for Netperf, I used:

netperf-2.1pl1 -H 10.1.1.24 -l 30 -- -s 24000,24000 -m 100000 -M 100000


With these command line parameters, I am now getting results in the 900+
Mbits/sec range, both via the GigE switch and via a cross-over cable.


I'm posting this in case anyone needs this info, and to close off this
thread. I'll be posting another msg to start a thread re. "What now?",
i.e., what are the implications of these test results.

Thanks for all those who replied to this thread!!

Yours,
Jim Lum
Anonymous
October 20, 2004 10:27:58 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:
> After much testing, I was finally able to get some reasonable
> results from all 3 network test tools that I had been working with,
> Iperf, Netperf, and PCATTCP.

> What I had to do was to include command line parameters for the
> following:

> MSS: 100000
> TcpWindowSize: 64K
> Buffer Size: 24K

> For example, for Iperf sending end, I used:

> iperf -c 10.1.1.24 -M 100000 -w 64K -l 24K -t 30

> and for Netperf, I used:

> netperf-2.1pl1 -H 10.1.1.24 -l 30 -- -s 24000,24000 -m 100000 -M 100000

Those are truly odd. FWIW, and I suspect the same is true for iperf,
what you are calling the MSS is really the size of the buffer being
presented to the transport at one time. TCP then breaks that up into
MSS-sized segments.

Windows TCP has a bit of a disconnect between SO_SNDBUF/SO_RCVBUF
(what netperf sets with -s and -S on either side) and the TCP window
doesn't it.

rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
Anonymous
October 21, 2004 2:03:59 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Rick Jones wrote:
>
> ohaya <ohaya@cox.net> wrote:
> > After much testing, I was finally able to get some reasonable
> > results from all 3 network test tools that I had been working with,
> > Iperf, Netperf, and PCATTCP.
>
> > What I had to do was to include command line parameters for the
> > following:
>
> > MSS: 100000
> > TcpWindowSize: 64K
> > Buffer Size: 24K
>
> > For example, for Iperf sending end, I used:
>
> > iperf -c 10.1.1.24 -M 100000 -w 64K -l 24K -t 30
>
> > and for Netperf, I used:
>
> > netperf-2.1pl1 -H 10.1.1.24 -l 30 -- -s 24000,24000 -m 100000 -M 100000
>
> Those are truly odd. FWIW, and I suspect the same is true for iperf,
> what you are calling the MSS is really the size of the buffer being
> presented to the transport at one time. TCP then breaks that up into
> MSS-sized segments.
>
> Windows TCP has a bit of a disconnect between SO_SNDBUF/SO_RCVBUF
> (what netperf sets with -s and -S on either side) and the TCP window
> doesn't it.


Rick,

What did you mean in your last sentence when you said "and the TCP
window doesn't it"?


Re. your comments, I'm a bit confused (not by your comments, but just in
general).

I was able to get the higher speed results with Iperf first. I found
these parameters at:

http://www.digit-life.com/articles2/gigeth32bit/gig-eth...


Then, I proceeded to try to duplicate these results with netperf and
PCATTCP, i.e., I did the best that I could to try to use the equivalent
parameters to the ones that I used with Iperf. Granted, now that I go
back and review the parameters, some of the "translations" were somewhat
unclear.

According to the Iperf docs, the "-M" parameter is:

"Attempt to set the TCP maximum segment size (MSS) via the
TCP_MAXSEG option. The MSS is usually the MTU - 40 bytes for the
TCP/IP header. For ethernet, the MSS is 1460 bytes (1500 byte
MTU).
This option is not implemented on many OSes."

The "-l" parameter is:

"The length of buffers to read or write. Iperf works by writing an
array of len bytes a number of times. Default is 8 KB for TCP, 1470
bytes for UDP. Note for UDP, this is the datagram size and needs to
be lowered when using IPv6 addressing to 1450 or less to avoid
fragmentation. See also the -n and -t options."

The "-w" parameter is:
"Sets the socket buffer sizes to the specified value. For TCP, this
sets the TCP window size. For UDP it is just the buffer which
datagrams are received in, and so limits the largest receivable
datagram size."


It sounds like what you describe in your post as "really the size of the
buffer presented to the transport at one time" corresponds to the Iperf
"-l" parameter, rather than the "-M" parameter, which the Iperf docs say is for
"attempting" to set the MSS using TCP_MAXSEG?

Jim
Anonymous
October 21, 2004 6:12:25 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

In article <4177190F.51E501F5@cox.net>, ohaya <ohaya@cox.net> wrote:
:Rick Jones wrote:

:> Windows TCP has a bit of a disconnect between SO_SNDBUF/SO_RCVBUF
:> (what netperf sets with -s and -S on either side) and the TCP window
:> doesn't it.

:What did you mean in your last sentence when you said "and the TCP
:window doesn't it"?

That confused me a moment too, but I then re-parsed it and understood.

"A has a bit of a disconnect between B and C, does it not?"

In other words,

"I think you will agree that in system A, element B and element C are
not related as strongly as you would normally think they would be."
--
Those were borogoves and the momerathsoutgrabe completely mimsy.
Anonymous
October 21, 2004 1:28:33 PM

Archived from groups: comp.dcom.lans.ethernet (More info?)

>
> Those are truly odd. FWIW, and I suspect the same is true for iperf,
> what you are calling the MSS is really the size of the buffer being
> presented to the transport at one time. TCP then breaks that up into
> MSS-sized segments.
>
> Windows TCP has a bit of a disconnect between SO_SNDBUF/SO_RCVBUF
> (what netperf sets with -s and -S on either side) and the TCP window
> doesn't it.


Rick,

I haven't had a chance to try adjusting these, but at:

http://www.microsoft.com/technet/prodtechnol/windowsser...

if you look at Appendix C, under "DefaultReceiveWindow", it says:

"Description: The number of receive bytes that AFD buffers on a
connection before imposing flow control. For some applications,
a larger value here gives slightly better performance at the
expense of increased resource utilization. Applications can
modify this value on a per-socket basis with the SO_RCVBUF socket
option."

There's also a "DefaultSendWindow" just below that.

It looks like, from this description, that the SO_RCVBUF is equivalent
to the DefaultReceiveWindow and the SO_SNDBUF is equivalent to the
DefaultSendWindow?
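
(If I'm reading that document correctly, these are REG_DWORD values under
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters, e.g.
DefaultReceiveWindow and DefaultSendWindow set to 65536. I haven't tried
changing them yet, so take that with a grain of salt.)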

Jim
Anonymous
October 27, 2004 12:21:15 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
> In article <4177190F.51E501F5@cox.net>, ohaya <ohaya@cox.net> wrote:
> :Rick Jones wrote:

> :> Windows TCP has a bit of a disconnect between SO_SNDBUF/SO_RCVBUF
> :> (what netperf sets with -s and -S on either side) and the TCP window
> :> doesn't it.

> :What did you mean in your last sentence when you said "and the TCP
> :window doesn't it"?

> That confused me a moment too, but I then re-parsed it and understood.

> "A has a bit of a disconnect between B and C, does it not?"

> In other words,

> "I think you will agree that in system A, element B and element C are
> not related as strongly as you would normally think they would be."

Yeah, what he said :)  Basically, I am accustomed to having
setsockopt() calls for SO_RCVBUF, when made before the call to
connect() or listen(), controlling the size of the offered TCP window.
My understanding is that is not _particularly_ the case under Windows.

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
Anonymous
October 27, 2004 12:21:16 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

On a different note, but related to Iperf....

I found a Linux distribution called Knoppix-STD (security tool
distribution). The CD is bootable and it contains the Linux version of
Iperf. The host does not even need a hard drive. The CD can breathe
life into old workstations and servers by transforming them into traffic
generators.

http://www.knoppix-std.org/

-mike


Rick Jones wrote:
> Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
>
>>In article <4177190F.51E501F5@cox.net>, ohaya <ohaya@cox.net> wrote:
>>:Rick Jones wrote:
>
>
>>:> Windows TCP has a bit of a disconnect between SO_SNDBUF/SO_RCVBUF
>>:> (what netperf sets with -s and -S on either side) and the TCP window
>>:> doesn't it.
>
>
>>:What did you mean in your last sentence when you said "and the TCP
>>:window doesn't it"?
>
>
>>That confused me a moment too, but I then re-parsed it and understood.
>
>
>>"A has a bit of a disconnect between B and C, does it not?"
>
>
>>In other words,
>
>
>>"I think you will agree that in system A, element B and element C are
>>not related as strongly as you would normally think they would be."
>
>
> Yeah, what he said :)  Basically, I am accustomed to having
> setsockopt() calls for SO_RCVBUF, when made before the call to
> connect() or listen(), controlling the size of the offered TCP window.
> My understanding is that is not _particularly_ the case under Windows.
>
> rick jones
Anonymous
October 27, 2004 12:23:04 AM

Archived from groups: comp.dcom.lans.ethernet (More info?)

ohaya <ohaya@cox.net> wrote:

> I haven't had a chance to try adjusting these, but at:

> http://www.microsoft.com/technet/prodtechnol/windowsser...

> if you look at Appendix C, under "DefaultReceiveWindow", it says:

> "Description: The number of receive bytes that AFD buffers on a
> connection before imposing flow control. For some applications,
> a larger value here gives slightly better performance at the
> expense of increased resource utilization. Applications can
> modify this value on a per-socket basis with the SO_RCVBUF socket
> option."

> There's also a "DefaultSendWindow" just below that.

> It looks like, from this description, that the SO_RCVBUF is equivalent
> to the DefaultReceiveWindow and the SO_SNDBUF is equivalent to the
> DefaultSendWindow?

It might - _if_ the "flow control" being mentioned is between TCP
endpoints (at least in the receive case); however, if it is an
intra-stack flow control, it would be different. One possible
experiment is to take the code, hack it a bit to open a socket,
connect it to chargen on some other system, and see in tcpdump just
how many bytes flow before the zero window advertisements start to
flow.
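
A rough BSD-sockets sketch of what I mean (error handling omitted; under
Windows the same idea applies with the usual Winsock boilerplate, and the
64 KB SO_RCVBUF value is only an example):

/* usage: ./rcvbuf_probe <chargen-server-ip>
   set SO_RCVBUF before connect(), connect to chargen (port 19),
   then never read, and watch the advertised window in tcpdump */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char **argv)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int rcvbuf = 65536;                 /* example value only */
    struct sockaddr_in sin;

    /* must be done before connect() to affect the offered window */
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(19);           /* chargen */
    inet_pton(AF_INET, argv[1], &sin.sin_addr);
    connect(fd, (struct sockaddr *)&sin, sizeof(sin));

    /* don't read anything; chargen keeps sending until the receive
       buffer fills and the advertised window drops to zero */
    sleep(60);
    close(fd);
    return 0;
}

Then something like `tcpdump -n port 19` on either end shows how many
bytes flow before the zero-window advertisements start.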

rick jones
--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...