Author Topic: Stream read/write via inline assembler [SOLVED by ASerge]  (Read 2165 times)

totya

  • Hero Member
  • *****
  • Posts: 577
Stream read/write via inline assembler [SOLVED by ASerge]
« on: June 09, 2019, 06:53:37 pm »
Hi!

I know asm isn't very popular, but I'd like to access stream data (TMemoryStream) via inline assembler (x86). I want to execute a simple (byte!) operation on the whole stream (stream.Size), and I think this would be much faster with asm.

So I'd like an asm equivalent of the following (only the for loop):

Code: Pascal
procedure TForm1.Operation(const StreamIn, StreamOut: TMemoryStream);
var
  i: integer;
  B: byte;
begin
  StreamIn.Position:=0;
  StreamOut.Clear;

  for i := 0 to StreamIn.Size - 1 do
  begin
    B := StreamIn.ReadByte;
    B := B + 1; // operation example...
    StreamOut.WriteByte(B);
  end;

  StreamOut.Position := 0;
end;

I guess StreamIn.Memory points to the stream's memory...

Thanks...
« Last Edit: June 10, 2019, 10:53:16 pm by totya »

LemonParty

  • New Member
  • *
  • Posts: 28
Re: Stream read/write via inline assembler
« Reply #1 on: June 09, 2019, 07:18:49 pm »
Use the Read method.
Then write a separate function that processes the array of bytes you read (it may be written in pure assembler).
Then put the processed bytes back using TStream.Write.

This approach should really improve performance.

(Inlining does not work with functions that contain loops.)
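
The read, process, write steps above can be sketched in plain Pascal as follows. This is only a minimal sketch: ProcessBytes, BufSize and the 64 KB chunk size are names and values made up for illustration, and the "add 1" operation stands in for the real one.

```pascal
program ChunkedStreamSketch;
{$mode objfpc}{$H+}
uses
  Classes;

// Hypothetical helper: applies the example operation (add 1, wrapping at 255)
// to Count bytes starting at P. This is the part that could later be
// rewritten in assembler.
procedure ProcessBytes(P: PByte; Count: SizeInt);
var
  I: SizeInt;
begin
  for I := 0 to Count - 1 do
    P[I] := (P[I] + 1) and $FF;
end;

procedure Operation(StreamIn, StreamOut: TMemoryStream);
const
  BufSize = 64 * 1024; // assumed chunk size, not from the thread
var
  Buf: array[0..BufSize - 1] of Byte;
  N: LongInt;
begin
  StreamIn.Position := 0;
  StreamOut.Clear;
  repeat
    // One Read call per chunk instead of one ReadByte call per byte
    N := StreamIn.Read(Buf, SizeOf(Buf));
    if N > 0 then
    begin
      ProcessBytes(@Buf[0], N);
      StreamOut.WriteBuffer(Buf, N);
    end;
  until N <= 0;
  StreamOut.Position := 0;
end;

var
  SIn, SOut: TMemoryStream;
  B1, B2: Byte;
begin
  SIn := TMemoryStream.Create;
  SOut := TMemoryStream.Create;
  try
    SIn.WriteByte(100);
    SIn.WriteByte(255);
    Operation(SIn, SOut);
    B1 := SOut.ReadByte;
    B2 := SOut.ReadByte;
    WriteLn(B1, ' ', B2); // prints "101 0"
  finally
    SIn.Free;
    SOut.Free;
  end;
end.
```

The point of the chunking is that the per-byte method calls disappear; whether ProcessBytes is then worth writing in assembler is a separate question.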

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: Stream read/write via inline assembler
« Reply #2 on: June 09, 2019, 07:26:10 pm »
As you mentioned, StreamIn.Memory points to the array of bytes, so just loop over that in assembler and change each byte. After that, StreamIn will contain your changed data. No need for read and write.

But I wonder how much performance gain you will get.

How large is the stream you are talking about?

totya

  • Hero Member
  • *****
  • Posts: 577
Re: Stream read/write via inline assembler
« Reply #3 on: June 09, 2019, 07:30:40 pm »
Hi rvk master! :)

Well, the sizes of StreamIn and StreamOut are different, so I need to read from StreamIn and write to StreamOut (StreamOut has a header).

Quote
so just loop that in assembler and change the byte

I understand, but please show me an example :) I last used assembler a long time ago... and Google is not my friend in this case...

Edit, to answer the new question:
Quote
How large is the stream you are talking about?

The sources are files, and the total file size is about 20 MB (at the moment). The speed is not too bad with Pascal, but I want to see it with asm... :)
« Last Edit: June 09, 2019, 08:01:41 pm by totya »

jamie

  • Hero Member
  • *****
  • Posts: 2162
Re: Stream read/write via inline assembler
« Reply #4 on: June 09, 2019, 08:01:47 pm »
There is a property "Memory" which returns a pointer to the memory block.

So if you know assembler you can do this.

I suppose I can code up an example, but why? Unless you are doing a lot of short setups
I can't see a reason for it, but who am I to say. Maybe I'll feel generous and code up an example.


Number 1 at blue screen app creations!

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: Stream read/write via inline assembler
« Reply #5 on: June 09, 2019, 08:03:10 pm »
I don't think it will be much faster in assembler but you can try.
For me it has been even longer since I worked with assembler (around 1988).

But Move() in FPC is already written in assembler. So you can create a stream (set its Size) and just do a Move. Or don't work with streams at all and just work with arrays.

Code: Pascal
procedure Move(const source;var dest;count:SizeInt);[public, alias: 'FPC_MOVE'];assembler;nostackframe;
asm
  cmp     ecx,SMALLMOVESIZE
  ja      @Large
  cmp     eax,edx
  lea     eax,[eax+ecx]
  jle     @SmallCheck
@SmallForward:
  add     edx,ecx
  jmp     SmallForwardMove_3
@SmallCheck:
  je      @Done {For Compatibility with Delphi's move for Source = Dest}
  sub     eax,ecx
  jmp     SmallBackwardMove_3
@Large:
  jng     @Done {For Compatibility with Delphi's move for Count < 0}
  cmp     eax,edx
  jg      @moveforward
  je      @Done {For Compatibility with Delphi's move for Source = Dest}
  push    eax
  add     eax,ecx
  cmp     eax,edx
  pop     eax
  jg      @movebackward
@moveforward:
  jmp     dword ptr fastmoveproc_forward
@movebackward:
  jmp     dword ptr fastmoveproc_backward {Source/Dest Overlap}
@Done:
end;

{Move ECX Bytes from EAX to EDX, where EAX > EDX and ECX > 36 (SMALLMOVESIZE)}
procedure Forwards_SSE_3;assembler;nostackframe;
const
  LARGESIZE = 2048;
asm
  cmp     ecx,LARGESIZE
  jge     @FwdLargeMove
  cmp     ecx,SMALLMOVESIZE+32
  movups  xmm0,[eax]
  jg      @FwdMoveSSE
  movups  xmm1,[eax+16]
  movups  [edx],xmm0
  movups  [edx+16],xmm1
  add     eax,ecx
  add     edx,ecx
  sub     ecx,32
  jmp     SmallForwardMove_3
@FwdMoveSSE:
  push    ebx
  mov     ebx,edx
  {Align Writes}
  add     eax,ecx
  add     ecx,edx
  add     edx,15
  and     edx,-16
  sub     ecx,edx
  add     edx,ecx
  {Now Aligned}
  sub     ecx,32
  neg     ecx
@FwdLoopSSE:
  movups  xmm1,[eax+ecx-32]
  movups  xmm2,[eax+ecx-16]
  movaps  [edx+ecx-32],xmm1
  movaps  [edx+ecx-16],xmm2
  add     ecx,32
  jle     @FwdLoopSSE
  movups  [ebx],xmm0 {First 16 Bytes}
  neg     ecx
  add     ecx,32
  pop     ebx
  jmp     SmallForwardMove_3
@FwdLargeMove:
  push    ebx
  mov     ebx,ecx
  test    edx,15
  jz      @FwdLargeAligned
  {16 byte Align Destination}
  mov     ecx,edx
  add     ecx,15
  and     ecx,-16
  sub     ecx,edx
  add     eax,ecx
  add     edx,ecx
  sub     ebx,ecx
  {Destination now 16 Byte Aligned}
  call    SmallForwardMove_3
  mov     ecx,ebx
@FwdLargeAligned:
  and     ecx,-16
  sub     ebx,ecx {EBX = Remainder}
  push    edx
  push    eax
  push    ecx
  call    AlignedFwdMoveSSE_3
  pop     ecx
  pop     eax
  pop     edx
  add     ecx,ebx
  add     eax,ecx
  add     edx,ecx
  mov     ecx,ebx
  pop     ebx
  jmp     SmallForwardMove_3
end; {Forwards_SSE}

This can be shortened but you will only gain a few cycles because this procedure first determines the best way to do the move and then jumps to the appropriate function.

But I wouldn't focus on the move procedure because it's already in assembler. So create an array, first add the header, then do the move from instream.memory to your array and loop through it to perform your action.

If you look at the assembler "debug view" when you run the following snippet:
Code: Pascal
for i := 0 to 20 * 1024 * 1024 do
begin
  a[i] := a[i] + 1;
end;

You see something like this:
Code: Pascal
asm
  movl   $0xffffffff,-0xdc(%ebp)   // i := -1
  mov    %esi,%esi
@back:
  mov    -0xdc(%ebp),%eax
  add    $0x1,%eax                 // Inc(i)
  mov    %eax,-0xdc(%ebp)
  movzbl -0x70(%ebp,%eax,1),%eax   // load a[i]
  add    $0x1,%eax
  mov    -0xdc(%ebp),%edx
  mov    %al,-0x70(%ebp,%edx,1)    // store a[i]
  cmpl   $0x1400000,-0xdc(%ebp)
  jl     @back
end;

What would you like to change performance-wise?

It would all depend on your "B := B + 1;" calculation because I don't think you can make the loop any more efficient. (But don't use the stream.read and stream.write because they do add some overhead)

Note: Back when I did assembler we only had ax, ah, al and such (8086 processor  8))

totya

  • Hero Member
  • *****
  • Posts: 577
Re: Stream read/write via inline assembler
« Reply #6 on: June 09, 2019, 08:11:23 pm »
Quote
there is a property "Memory" which returns the pointer of the memory block So if you know assembler you can do this..

If I could see some code that reads data from the stream/buffer into a register byte by byte, and writes it to the other stream/buffer, I think that would be a good starting point.

rvk

  • Hero Member
  • *****
  • Posts: 3842
Re: Stream read/write via inline assembler
« Reply #7 on: June 09, 2019, 08:17:07 pm »
The last snippet in my previous post shows the for loop to manipulate an array.

So first create the header in an array, then move the instream.memory after it and do the for loop in assembler.

But even if you don't do the for loop in assembler... you can do only your b := b + 1 in assembler (assuming it does something different than just adding 1).
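
A rough sketch of that layout (header first, then Move, then a plain loop over the copied bytes), written against the output stream's own buffer instead of a separate array. The 4-byte Header constant is purely hypothetical, since the real header format is not given in this thread, and "add 1" again stands in for the real operation.

```pascal
program HeaderPlusMoveSketch;
{$mode objfpc}{$H+}
uses
  Classes;

const
  // Hypothetical header bytes, for illustration only
  Header: array[0..3] of Byte = ($48, $44, $52, $00);

procedure Operation(StreamIn, StreamOut: TMemoryStream);
var
  P: PByte;
  I, Count: SizeInt;
begin
  Count := StreamIn.Size;
  // Allocate the whole output once; read Memory only after setting Size,
  // because changing Size may reallocate the buffer.
  StreamOut.Size := SizeOf(Header) + Count;
  P := StreamOut.Memory;
  Move(Header, P^, SizeOf(Header));
  if Count > 0 then
  begin
    Inc(P, SizeOf(Header));             // P now points just past the header
    Move(StreamIn.Memory^, P^, Count);  // bulk copy, already optimized in the RTL
    for I := 0 to Count - 1 do
      P[I] := (P[I] + 1) and $FF;       // example operation, in place
  end;
  StreamOut.Position := 0;
end;

var
  SIn, SOut: TMemoryStream;
  Data: PByte;
begin
  SIn := TMemoryStream.Create;
  SOut := TMemoryStream.Create;
  try
    SIn.WriteByte(100);
    SIn.WriteByte(255);
    Operation(SIn, SOut);
    Data := SOut.Memory;
    // 4 header bytes + 2 data bytes; the data bytes become 101 and 0
    WriteLn(SOut.Size, ' ', Data[4], ' ', Data[5]); // prints "6 101 0"
  finally
    SIn.Free;
    SOut.Free;
  end;
end.
```

If the operation can be applied while copying, the Move can be dropped and the loop can read from StreamIn.Memory and write to P directly, which saves one pass over the data.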

ASerge

  • Hero Member
  • *****
  • Posts: 1422
Re: Stream read/write via inline assembler
« Reply #8 on: June 09, 2019, 08:22:29 pm »
Quote
If I see a code which read data from the stream/buffer to a register one by one and byte-steps, and write this to the other stream/buffer, I think it's a good start for the beginng.

Code: Pascal
{$ASMMODE INTEL}
procedure Operation(const StreamIn, StreamOut: TMemoryStream);
var
  LSize: SizeInt;
begin
  LSize := StreamIn.Size;
  StreamOut.Size := LSize;
  //repeat
  //  Dec(LSize);
  //  if LSize < 0 then
  //    Break;
  //  PByte(StreamOut.Memory)[LSize] := PByte(StreamIn.Memory)[LSize] + 1;
  //until False;
  asm
    mov  rsi, StreamIn
    mov  rdi, StreamOut
    mov  rcx, LSize
    @@StartLoop:
    dec  rcx
    jl   @@EndLoop
    mov  al, [rsi+rcx].TMemoryStream.Memory
    inc  al
    mov  [rdi+rcx].TMemoryStream.Memory, al
    jmp  @@StartLoop
    @@EndLoop:
  end ['rsi', 'rdi', 'rcx', 'rax'];
  StreamOut.Position := 0;
end;

When using "asm" inserts, FPC stops optimizing the surrounding code, so without asm it will be faster.
« Last Edit: June 09, 2019, 08:24:41 pm by ASerge »

LemonParty

  • New Member
  • *
  • Posts: 28
Re: Stream read/write via inline assembler
« Reply #9 on: June 09, 2019, 08:56:29 pm »
Steroids would make the loop run at the speed of light (at least 4 times faster than classic instructions).
But do you need the speed of light?

totya

  • Hero Member
  • *****
  • Posts: 577
Re: Stream read/write via inline assembler
« Reply #10 on: June 09, 2019, 09:10:53 pm »
Code: Pascal
{$ASMMODE INTEL}...

Thank you for this very readable code! It's enough for me to get started... These look like x64 registers to me, but I don't think that's a big problem (rsi->esi).

Thanks also to LemonParty, rvk master and jamie for the answers and information.

totya

  • Hero Member
  • *****
  • Posts: 577
Re: Stream read/write via inline assembler
« Reply #11 on: June 09, 2019, 10:10:26 pm »

Hi!

My operation is more complicated than inc(), but unfortunately I got a SIGSEGV (at StreamOut.Position := 0;) with this unmodified simple test code:

Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    procedure Operation(const StreamIn, StreamOut: TMemoryStream);

  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

{$ASMMODE INTEL}
procedure TForm1.Operation(const StreamIn, StreamOut: TMemoryStream);
var
  LSize: SizeInt;
begin
  LSize := StreamIn.Size;
  StreamOut.Size := LSize;

  //repeat
  //  Dec(LSize);
  //  if LSize < 0 then
  //    Break;
  //  PByte(StreamOut.Memory)[LSize] := PByte(StreamIn.Memory)[LSize] + 1;
  //until False;
  asm
    mov  rsi, StreamIn
    mov  rdi, StreamOut
    mov  rcx, LSize
    @@StartLoop:
    dec  rcx
    jl   @@EndLoop
    mov  al, [rsi+rcx].TMemoryStream.Memory
    inc  al
    mov  [rdi+rcx].TMemoryStream.Memory, al
    jmp  @@StartLoop
    @@EndLoop:
  end ['rsi', 'rdi', 'rcx', 'rax'];

  StreamOut.Position := 0;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  StreamIn, StreamOut: TMemoryStream;
begin
  StreamIn := TMemoryStream.Create;
  StreamOut := TMemoryStream.Create;
  try
    StreamIn.WriteByte(100);
    StreamIn.WriteByte(100);

    Operation(StreamIn, StreamOut);

    Memo1.Lines.Add(IntToStr(StreamOut.ReadByte));
    Memo1.Lines.Add(IntToStr(StreamOut.ReadByte));
  finally
    StreamIn.Free;
    StreamOut.Free;
  end;
end;

end.

If I comment out the line
//inc  al
then the code runs without error, but the result is garbage... (120, 204).

jamie

  • Hero Member
  • *****
  • Posts: 2162
Re: Stream read/write via inline assembler
« Reply #12 on: June 09, 2019, 10:36:43 pm »
Or you can use Move:

Move(SourceStream.Memory^, DestinationStream.Memory^, SourceStream.Size);

Set DestinationStream.Size first so the destination block is large enough, then reset the position back to zero or wherever you need it.

Move is a system-level routine and should be better optimized than the stream methods.

ASerge

  • Hero Member
  • *****
  • Posts: 1422
Re: Stream read/write via inline assembler
« Reply #13 on: June 09, 2019, 11:57:03 pm »
Quote
My operation is more complicated than inc(), but unfortunatelly I got sigsev (StreamOut.Position := 0;) with this untouched simple test code:

That's an additional danger with assembler: it's easy to make a mistake. I forgot to dereference Memory. Also, the property has to be accessed directly through its field (FMemory), otherwise offset 0 is used.
Code: Pascal
procedure Operation(const StreamIn, StreamOut: TMemoryStream);
var
  LSize: SizeInt;
begin
  LSize := StreamIn.Size;
  StreamOut.Size := LSize;
  //repeat
  //  Dec(LSize);
  //  if LSize < 0 then
  //    Break;
  //  PByte(StreamOut.Memory)[LSize] := PByte(StreamIn.Memory)[LSize] + 1;
  //until False;
  asm
    mov  rsi, StreamIn
    mov  rdi, StreamOut
    mov  rcx, LSize
    @@StartLoop:
    dec  rcx
    jl   @@EndLoop
    mov  rdx, [rsi].TMemoryStream.FMemory
    mov  al, [rdx+rcx]
    inc  al
    mov  rdx, [rdi].TMemoryStream.FMemory
    mov  BYTE PTR [rdx+rcx], al
    jmp  @@StartLoop
    @@EndLoop:
  end ['rsi', 'rdi', 'rcx', 'rax', 'rdx'];
  StreamOut.Position := 0;
end;

totya

  • Hero Member
  • *****
  • Posts: 577
Re: Stream read/write via inline assembler
« Reply #14 on: June 10, 2019, 09:50:41 am »
...

Big thanks to you, the result of this sample code is correct now... But now I have one less register to use for operations ;)
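
On the register count, one possible variant is to fetch the two data pointers in Pascal before the asm block, so the loop itself no longer needs rdx. This is only a sketch, assuming an x86_64 target and FPC's Intel asm mode; PSrc and PDst are local names introduced here for illustration, and the loop body is otherwise the same as in the corrected code above.

```pascal
program AsmLoopSketch;
{$mode objfpc}{$H+}
{$ASMMODE INTEL}
uses
  Classes;

// Assumes x86_64. PSrc and PDst are illustrative locals, not from the thread.
procedure Operation(StreamIn, StreamOut: TMemoryStream);
var
  PSrc, PDst: PByte;
  LSize: SizeInt;
begin
  LSize := StreamIn.Size;
  StreamOut.Size := LSize;
  // Read the buffer pointers through the public Memory property,
  // after StreamOut has been resized.
  PSrc := StreamIn.Memory;
  PDst := StreamOut.Memory;
  asm
    mov  rsi, PSrc
    mov  rdi, PDst
    mov  rcx, LSize
  @@StartLoop:
    dec  rcx
    jl   @@EndLoop
    mov  al, [rsi+rcx]   // load one source byte
    inc  al              // example operation
    mov  [rdi+rcx], al   // store it at the same offset in the output
    jmp  @@StartLoop
  @@EndLoop:
  end ['rsi', 'rdi', 'rcx', 'rax'];
  StreamOut.Position := 0;
end;

var
  SIn, SOut: TMemoryStream;
  B1, B2: Byte;
begin
  SIn := TMemoryStream.Create;
  SOut := TMemoryStream.Create;
  try
    SIn.WriteByte(100);
    SIn.WriteByte(255);
    Operation(SIn, SOut);
    B1 := SOut.ReadByte;
    B2 := SOut.ReadByte;
    WriteLn(B1, ' ', B2); // prints "101 0"
  finally
    SIn.Free;
    SOut.Free;
  end;
end.
```

This also avoids touching the private FMemory field from assembler, and it keeps the two pointer loads out of the loop.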