To be brutally frank, I normally program directly to the kernel (i.e. the Berkeley sockets API) and have no problems with superfluous copying.
However, when one looks at it from the POV of both efficiency and robustness, one has to conclude that there are issues which go all the way down to the CPU architecture.
MarkMLl