Archived from groups: comp.dcom.lans.ethernet
In article <416EFF42.101E4E28@cox.net>, ohaya <ohaya@cox.net> wrote:
:I may have been unclear by what I meant by a "manual copy" test. What
:they are suggesting that I do is create a 36GB file on one server, then:
:- manually time a file copy from that server to the other server, and
:- manually time a file copy from that server to itself, and
:- subtract the times and divide the result by 36GB.
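In other words, the arithmetic they are proposing is this (a sketch; the function and variable names are mine, and the example times are made up):

```python
# Sketch of the proposed "manual copy" calculation.
FILE_SIZE_BYTES = 36 * 10**9  # the 36GB test file

def implied_network_rate(t_remote_s, t_local_s):
    """Subtract the local copy time from the remote copy time and
    divide the file size by the difference, per the suggested test.
    Returns bytes per second."""
    return FILE_SIZE_BYTES / (t_remote_s - t_local_s)

# e.g. 600 s for the remote copy, 300 s for the local copy
# -> an implied network rate of 120 MB/s
```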
That test is dubious.
- The time to copy a file is dependent on the OS and drive maximum
write rate, and the write rates are not necessarily going to be the
same between the two servers [unless they are the same hardware through
and through.]
- The time for a copy of a file from a server to itself can potentially
be substantially decreased by DMA. Depends how smart the copy program
is. There is the advantage of knowing that one is going to be starting
the read and write on nice boundaries, so one could potentially have
the copy program keep the data in system space or maybe even in
hardware space.
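For what it's worth, a copy program that keeps the data "in system space" looks something like this on Linux (a sketch using sendfile(); whether the copy program on your servers does anything of the sort is exactly the unknown):

```python
import os

def kernel_space_copy(src_path, dst_path):
    """Copy a file without bouncing the data through user-space
    buffers: sendfile() moves it kernel-to-kernel, so the measured
    time reflects the I/O path rather than memcpy overhead.
    (Linux-specific: sendfile() to a regular file needs 2.6.33+.)"""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(dst.fileno(), src.fileno(), offset,
                               size - offset)
            if sent == 0:
                break
            offset += sent
    return offset
```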
- When the file is being copied locally, if it is being copied to the
same drive, then the reads and writes are going to be in contention
whereas when copying a file to a remote server, the reads and writes
happen in parallel. The larger the memory buffer that the system can
[with hardware cooperation] allocate to a single disk I/O, the fewer
the times the drive has to move its head... if, that is, the file is
allocated into contiguous blocks and is being written into contiguous
blocks, though this need would be mitigated if the drive controller
supports scatter-gather or CTQ.
- When the file is being copied locally, if it is being copied to the
same controller, then there can be bus contention that would prevent
the reads from operating in parallel with the writes. But again system
buffering and drive controller cache and CTQ can mitigate this: some
SCSI drives do permit incoming writes to be buffered while they are
seeking and reading for a previous read request.
- The first copy is going to require that the OS find the directory
entry and locate the file on disk and start reading. But at the time of
the second copy, the directory and block information might be cached by
the OS, reducing the copy time. Also, if the file fits entirely within
available memory, then the OS may still have the file in its I/O
buffers and might skip the read. (Okay, that last is unlikely to happen
with a 36 GB file on the average system, but it is not out of the
question for High Performance Computing systems.)
- In either copy scenario, one has to know what it means for the last
write() to have returned: does it mean that the data is flushed to
disk, or does it mean that the last buffer of data has been sent to the
filesystem cache for later dispatch when convenient? Especially when
you are doing the copy to the remote system, are you measuring the time
until the last TCP packet hits the remote NIC and the ACK for it gets
back, or are you measuring the time until the OS gets around to
scheduling a flush? The difference could be substantial if you have
large I/O buffers on the receiving side! Is the copy daemon using
synchronous I/O or asynch I/O ?
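If one wants the local timing to mean "flushed to disk," one has to ask for it explicitly. A sketch of what that looks like (POSIX semantics, via Python; this is the synchronous case the paragraph above worries about):

```python
import os, time

def timed_copy_with_flush(src_path, dst_path, bufsize=1 << 20):
    """Copy a file and only stop the clock after fsync() returns,
    so the measurement includes the flush that the return of the
    last write() alone does not guarantee."""
    start = time.monotonic()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(bufsize)
            if not buf:
                break
            dst.write(buf)
        dst.flush()                # drain the user-level buffer
        os.fsync(dst.fileno())     # ...and the filesystem cache
    return time.monotonic() - start
```

Drop the fsync() call and you are instead measuring "last buffer handed to the filesystem cache," which is a different number.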
- A test that would more closely simulate the source server's copy
out to network, would be to time a copy to the null device instead of
to a file on the server. But to measure the network timing you still
need to know how the destination server handles flushing the last
buffer when a close() is issued. Ah, but you also have to know how the
TCP stack and copy daemons work together.
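The null-device variant is simple enough (again a sketch; os.devnull is /dev/null on POSIX systems):

```python
import os, time

def timed_read_to_null(src_path, bufsize=1 << 20):
    """Time reading the file while discarding the data: this isolates
    the source server's read path from any destination write path.
    Returns (bytes read, elapsed seconds)."""
    start = time.monotonic()
    nbytes = 0
    with open(src_path, "rb") as src, open(os.devnull, "wb") as null:
        while True:
            buf = src.read(bufsize)
            if not buf:
                break
            null.write(buf)
            nbytes += len(buf)
    return nbytes, time.monotonic() - start
```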
When the copy-out daemon detects the end of the source file, it will
close the connection and the I/O library will translate that into
needing to send a FIN. But will the FIN flag ride along on the last
data segment, or will the FIN be a separate packet? And when
the remote system receives the FIN, does the TCP layer FIN ACK
immediately, or does it wait until the copy-in daemon closes the input
connection? If it waits, then does the copy-in daemon close the input
connection as soon as it detects EOF, or does it wait until the write()
on the final buffer returns? When the copy-out daemon close()'s the
connection, does the OS note that and return immediately, possibly
dealing with the TCP details on a different CPU or in hardware, or does
the OS wait until the TCP ACK is received before it returns to the
program? Are POSIX.1 calls being used by the copy daemons, and if so
what does POSIX.1 say is the proper behaviour considering that until
the ACK of the last output packet arrives, the write associated with the
implicit flush() might fail: if the last packet gets dropped [and all
TCP retries are exhausted] then the return from close() is perhaps
different than if the last packet makes it. Or maybe not, and one has to
explicitly flush() if one wants to distinguish the cases. Unfortunately
I don't have my copy of POSIX.1 with me to check.
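One knob that bears directly on the close() question is SO_LINGER: with it enabled, close() blocks until the queued data has been sent (or the linger timeout expires) rather than returning while the stack is still draining. A sketch, assuming BSD-style sockets (the timeout value is arbitrary):

```python
import socket, struct

def lingering_close(sock, timeout_s=30):
    """Make close() block until unsent data has been transmitted,
    or until timeout_s elapses -- so a copy timed against close()
    includes the tail of the transfer rather than just the handoff
    to the TCP stack."""
    # struct linger: l_onoff=1 enables lingering, l_linger is the
    # timeout in seconds.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, timeout_s))
    sock.close()
```

Without SO_LINGER, the default close() behavior is exactly the ambiguity described above: the call typically returns immediately and the OS drains the send buffer in the background.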
I bet the company didn't think of these problems when they asked you to
do the test. Or if they did, then they are probably assuming that
the boundary conditions will not make a significant contribution
to the final bandwidth calculation when the boundary conditions
are amortized over 36 GB. But there are just too many possibilities
that could throw the calculation off significantly, especially
the drive head contention and the accounting of the time to flush the
final write buffer when one has large I/O buffers.
--
Everyone has a "Good Cause" for which they are prepared to spam.
-- Roberson's Law of the Internet