so just loop that in assembler and change the byte
How large is the stream you are talking about?
There is a property "Memory" which returns a pointer to the memory block. So if you know assembler you can do this...
If I could see code that reads data from the stream/buffer into a register byte by byte and writes it to the other stream/buffer, I think that would be a good start.
{$ASMMODE INTEL}...
That's an additional danger with assembler: it is easy to make a mistake. I forgot to dereference Memory, and that the property should be accessed directly through the field, otherwise offset 0 is used.
My operation is more complicated than inc(), but unfortunately I got a SIGSEGV (at StreamOut.Position := 0;) with this untouched simple test code:
...
...
pascal implementation: 35,32 MB/s
assembler implementation: 333,88 MB/s
It's about 10 times faster...
Can we see your pascal implementation?
Did it also work with tstream.memory as array or did you use tstream.read and write?
Did you try the repeat/until ASerge showed as commented code?
... but something about this:
There is the problem.
(And even this probably can be more optimized)
(I take it you do your testing outside the IDE, without the debugger)
Size := MemStreamIn.Size;
MemStreamOut.Size := Size + HeaderSize;
pIn := 0;
pOut := HeaderSize;
pTo := MemStreamOut.Size - 1; // orig: pTo := MemStreamOut.Size;
repeat
  B := PByte(MemStreamIn.Memory)[pIn];
  asm
    // put here your assembler code of just the algorithm.
  end;
  PByte(MemStreamOut.Memory)[pOut] := B;
  Inc(pIn);
  Inc(pOut);
until pOut > pTo;
Small bug corrected ;) See: //
Not a 'bug'... That's why I added the extra note :P
(just typed out of my head so the > and begin and end values might be slightly off)
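For reference, the same idea can be sketched as a complete pure-Pascal program that walks both buffers through typed pointers instead of indexing the Memory property on every byte. This is only a sketch under assumptions: HeaderSize = 4 and the xor with Key are hypothetical stand-ins, since the real header length and per-byte operation are not shown in the thread.

```pascal
program streamcopy;
{$MODE OBJFPC}
uses
  Classes;

const
  HeaderSize = 4;   // hypothetical header length, not from the thread
  Key = $5A;        // hypothetical stand-in for the real per-byte operation

// Copies every byte of MemStreamIn into MemStreamOut behind a zeroed header,
// transforming each byte on the way.
procedure Convert(MemStreamIn, MemStreamOut: TMemoryStream);
var
  Src, Dst: PByte;
  i, Size: PtrInt;
begin
  Size := MemStreamIn.Size;
  MemStreamOut.Size := Size + HeaderSize;
  // Resizing does not clear the new memory, so zero the header explicitly.
  FillChar(MemStreamOut.Memory^, HeaderSize, 0);
  // Fetch both base pointers once, outside the loop.
  Src := PByte(MemStreamIn.Memory);
  Dst := PByte(MemStreamOut.Memory) + HeaderSize;
  for i := 0 to Size - 1 do
    Dst[i] := Src[i] xor Key;
end;

var
  InS, OutS: TMemoryStream;
  Data: array[0..2] of Byte = (1, 2, 3);
begin
  InS := TMemoryStream.Create;
  OutS := TMemoryStream.Create;
  try
    InS.WriteBuffer(Data, SizeOf(Data));
    Convert(InS, OutS);
    WriteLn('output size: ', OutS.Size);
  finally
    InS.Free;
    OutS.Free;
  end;
end.
```

The for loop does nothing when the input stream is empty, so no special case is needed for Size = 0.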
If you are after speed, try using SIMD instructions, or unroll the loop.
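To make the unrolling suggestion concrete, here is a minimal sketch that handles four bytes per iteration and finishes the remainder with a plain loop. The procedure name XorUnrolled and the xor operation are hypothetical placeholders, not the poster's algorithm, and whether unrolling actually helps depends on the compiler and optimization settings.

```pascal
program unrolled;
{$MODE OBJFPC}

// XORs Count bytes starting at P with Key, four bytes per loop iteration.
procedure XorUnrolled(P: PByte; Count: PtrInt; Key: Byte);
var
  i: PtrInt;
begin
  i := 0;
  // Main loop: four independent byte operations per pass.
  while i + 4 <= Count do
  begin
    P[i]     := P[i]     xor Key;
    P[i + 1] := P[i + 1] xor Key;
    P[i + 2] := P[i + 2] xor Key;
    P[i + 3] := P[i + 3] xor Key;
    Inc(i, 4);
  end;
  // Tail loop: the 0..3 bytes that did not fill a group of four.
  while i < Count do
  begin
    P[i] := P[i] xor Key;
    Inc(i);
  end;
end;

var
  Buf: array[0..6] of Byte = (0, 1, 2, 3, 4, 5, 6);
  k: Integer;
begin
  XorUnrolled(@Buf[0], Length(Buf), $FF);
  for k := 0 to High(Buf) do
    Write(Buf[k], ' ');
  WriteLn;
end.
```

Seven bytes are used on purpose so that both the unrolled loop and the tail loop run.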
The sources are files, and total file size about 20MB (at the moment).
... { simple bitwise operations with a const array... array element choice: see: Counter }
If you show me a workable sample (like ASerge's asm code), I can try it.
Here is an example (https://forum.lazarus.freepascal.org/index.php/topic,33761.msg219682.html#msg219682).
project1.lpr(21,1) Warning: Object file "unit1.o" contains 32-bit absolute relocation to symbol ".data.n_tc_$unit1$_$tform1_$_convert$tmemorystream$tmemorystream_$$_onemask".
How about this (https://forum.lazarus.freepascal.org/index.php/topic,39098.msg267861.html#msg267861).
Thanks for this sample, but I got warnings at the beginning (compiled to 64 bit).
In 64-bit, you need to address the constant relative to RIP:
MOVDQA xmm0, [rip+ONEMASK]
movdqu is better though. Older processors might not like it if onemask is unaligned.
Note that unaligned access in a loop doubles the needed bandwidth. (unaligned typically reads two aligned 16-byte areas, and gives you the needed part from it)
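Putting the RIP-relative load and the movdqu advice together, a small self-contained sketch could look like the following. It assumes an x86_64 target and FPC's Intel assembler mode. The routine name AddMask16 and the contents of ONEMASK are assumptions for illustration (only the name ONEMASK appears in the thread), and the first argument is read from RCX on Win64 and from RDI elsewhere, following the usual calling conventions.

```pascal
program riprel;
{$MODE OBJFPC}
{$ASMMODE INTEL}

const
  // Name taken from the thread; the contents (sixteen 1-bytes) are assumed.
  ONEMASK: array[0..15] of Byte = (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);

// Adds ONEMASK byte-wise (with wrap-around) to the 16 bytes at P.
procedure AddMask16(P: Pointer); assembler; nostackframe;
asm
  // RIP-relative load of the constant; movdqu tolerates an unaligned address.
  movdqu xmm0, [rip+ONEMASK]
{$IFDEF WIN64}
  movdqu xmm1, [rcx]        // Win64: first argument arrives in RCX
  paddb  xmm1, xmm0
  movdqu [rcx], xmm1
{$ELSE}
  movdqu xmm1, [rdi]        // System V x86_64: first argument arrives in RDI
  paddb  xmm1, xmm0
  movdqu [rdi], xmm1
{$ENDIF}
end;

var
  Buf: array[0..15] of Byte;
  k: Integer;
begin
  for k := 0 to 15 do
    Buf[k] := k;
  AddMask16(@Buf[0]);
  for k := 0 to 15 do
    Write(Buf[k], ' ');
  WriteLn;
end.
```

In a real conversion loop the mask load would sit outside the loop, so only the data loads and stores repeat per 16-byte block.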