Author Topic: Stream read/write via inline assembler [SOLVED by ASerge]  (Read 1553 times)

totya

  • Hero Member
  • *****
  • Posts: 555
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #30 on: June 15, 2019, 04:40:05 pm »
Quote
Here is an example.

Hi!

Thanks for this sample, but I get a warning at the beginning (compiled for 64-bit).

Quote
project1.lpr(21,1) Warning: Object file "unit1.o" contains 32-bit absolute relocation to symbol ".data.n_tc_$unit1$_$tform1_$_convert$tmemorystream$tmemorystream_$$_onemask".

Code: Pascal
{$asmmode intel}
procedure TForm1.Convert(const StreamIn, StreamOut: TMemoryStream);
const
  ONEMASK: array[0..15] of byte=($01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01);
begin
  asm
    MOVDQA xmm0, ONEMASK
  end;
end;

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #31 on: June 15, 2019, 05:24:14 pm »
How about this.

totya

  • Hero Member
  • *****
  • Posts: 555
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #32 on: June 15, 2019, 11:18:23 pm »
Quote
How about this.

That doesn't help in my case. It's not a big problem, though, because with ASerge's very usable code I can use the TMemoryStream in place of a const array. That's even better, because a stream is more flexible than a const array (e.g. as a variable parameter).

The asm code is still under development (only partially working so far): if the "key array" is shorter than 16 bytes and 16 is not divisible by its length, then as far as I can see I need to build a lookup table... but that will have to wait until tomorrow...

For example, if 16 mod KeyArray.Size > 0, it's a problem... :)

Sample key array:
$EA $FA $AA $22 $11

Then I need a lookup table for the 16-byte operations (the key repeated across 16-byte rows), similar to this:
$EA $FA $AA $22 $11 $EA $FA $AA $22 $11 $EA $FA $AA $22 $11 $EA
$FA $AA $22 $11 $EA $FA $AA $22 $11 $EA $FA $AA $22 $11 $EA $FA
$AA $22 $11 $EA $FA $AA $22 $11 $EA $FA $AA $22 $11 $EA $FA $AA
$22 ... and so on. :)
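
A minimal sketch of how such a table could be built in plain Pascal, assuming the table only needs to cover one full period of the pattern, i.e. lcm(key length, 16) bytes. The names `BuildKeyTable`, `TKeyTable` and `SampleKey` are made up for illustration and do not come from the thread's code.

```pascal
program KeyTableSketch;
{$mode objfpc}{$H+}

type
  TKeyTable = array of Byte;  // hypothetical helper type

// Greatest common divisor (Euclid), used to derive the table length.
function Gcd(A, B: Integer): Integer;
var
  T: Integer;
begin
  while B <> 0 do
  begin
    T := A mod B;
    A := B;
    B := T;
  end;
  Result := A;
end;

// Repeats Key until the length is a multiple of both Length(Key) and 16,
// i.e. lcm(Length(Key), 16) bytes. After that the pattern starts over.
function BuildKeyTable(const Key: array of Byte): TKeyTable;
var
  I, TableLen: Integer;
begin
  Result := nil;
  if Length(Key) = 0 then
    Exit;
  TableLen := (Length(Key) div Gcd(Length(Key), 16)) * 16;
  SetLength(Result, TableLen);
  for I := 0 to TableLen - 1 do
    Result[I] := Key[I mod Length(Key)];
end;

const
  SampleKey: array[0..4] of Byte = ($EA, $FA, $AA, $22, $11);
var
  Table: TKeyTable;
  I: Integer;
begin
  Table := BuildKeyTable(SampleKey);
  // Print the table as 16-byte rows, one row per 128-bit operation.
  for I := 0 to High(Table) do
  begin
    Write('$', HexStr(LongInt(Table[I]), 2), ' ');
    if (I mod 16) = 15 then
      WriteLn;
  end;
end.
```

For the 5-byte sample key this gives an 80-byte table (five rows of 16), after which the rows repeat, so the SIMD loop can cycle through the rows instead of recomputing the key offset for every byte.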

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #33 on: June 15, 2019, 11:24:21 pm »
Didn't you mention 20 MB? But yes, it is asm after all.

totya

  • Hero Member
  • *****
  • Posts: 555
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #34 on: June 15, 2019, 11:50:10 pm »
With rvk's "Pascal" code the speed is okay, similar to asm: a few hundred MB/s. That's more than enough, but I'd like to see the speed with SIMD (XMM) registers :)

By the way, I don't see any push/pop (I know they need an alternative approach) for the XMM registers in your code... I can do it this way:

Code: Pascal
           @@EndLoop:
           MOV     RDI, RemainStartIndex
           MOV     [RDI], RAX

  end ['rsi', 'rdi', 'rax', 'rbx', 'r10', 'r11', 'xmm1'];

Unfortunately this last line breaks the JEDI Code Format function (but at least I finally know why JEDI fails when asm code is present). I need to file a bug report...

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7253
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #35 on: June 16, 2019, 03:25:21 am »
Quote
Thanks for this sample, but I get a warning at the beginning (compiled for 64-bit).

In 64-bit code, you need to address the constant relative to RIP:

    MOVDQA xmm0, [rip+ONEMASK]

MOVDQU is the safer choice, though: MOVDQA requires a 16-byte-aligned memory operand and faults if ONEMASK is unaligned.
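
A small self-contained sketch of the RIP-relative load, assuming Free Pascal targeting x86-64 with Intel asm syntax. The procedure name `LoadMask` and the buffer `Buf` are hypothetical; only `ONEMASK` and the RIP-relative addressing form come from the thread. I have not checked this against every FPC version.

```pascal
program RipRelativeSketch;
{$mode objfpc}
{$asmmode intel}
{$ifndef CPUX86_64}
  {$fatal This sketch targets x86-64 only}
{$endif}

const
  ONEMASK: array[0..15] of Byte =
    ($01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01);

// Copies ONEMASK into the 16 bytes at Dest via an XMM register.
procedure LoadMask(Dest: Pointer);
begin
  asm
    MOV    RAX, Dest              { destination address }
    MOVDQU XMM0, [RIP + ONEMASK]  { RIP-relative load, no alignment needed }
    MOVDQU [RAX], XMM0            { unaligned store }
  end ['rax', 'xmm0'];            { registers changed by the asm block }
end;

var
  Buf: array[0..15] of Byte;
  I: Integer;
begin
  LoadMask(@Buf[0]);
  for I := 0 to 15 do
    Write(Buf[I], ' ');
  WriteLn;
end.
```

The register list after `end` tells the compiler which registers the block modifies, so no manual push/pop is needed for them.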

totya

  • Hero Member
  • *****
  • Posts: 555
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #36 on: June 16, 2019, 08:45:12 am »
Quote
Thanks for this sample, but I get a warning at the beginning (compiled for 64-bit).

In 64-bit code, you need to address the constant relative to RIP:

    MOVDQA xmm0, [rip+ONEMASK]

MOVDQU is the safer choice, though: MOVDQA requires a 16-byte-aligned memory operand and faults if ONEMASK is unaligned.

OMG, I rewrote my code yesterday because, as far as I could see, this instruction can't handle an offset that is not divisible by 16... and as you wrote, an unaligned version is available... thanks for this valuable information! :)

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7253
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #37 on: June 16, 2019, 09:37:20 am »
Note that unaligned access in a loop can double the needed bandwidth (an unaligned load typically reads two aligned 16-byte areas and gives you the needed part of them).

So it is better to

1. check that your count is large enough (don't bother below 256 bytes or so)
2. process a few bytes on the normal CPU to reach alignment
3. process with SSE until the remainder is < 16 bytes
4. process the remaining bytes on the normal CPU again
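
The four steps above can be sketched as a helper that only computes the three segment sizes; the actual per-byte and SSE processing is left out. `SplitForSimd`, `SimdWidth` and `MinSimdCount` are hypothetical names, and the 256-byte threshold is just the rough figure mentioned in step 1.

```pascal
program AlignSplitSketch;
{$mode objfpc}

const
  SimdWidth = 16;      // bytes per XMM register
  MinSimdCount = 256;  // below this, do not bother with SSE

// Splits Count bytes starting at P into a scalar head (up to the next
// 16-byte boundary), an SSE body (a multiple of 16 bytes) and a scalar tail.
procedure SplitForSimd(P: Pointer; Count: PtrUInt;
  out Head, Body, Tail: PtrUInt);
begin
  if Count < MinSimdCount then
  begin
    // Step 1: too small, handle everything with scalar code.
    Head := Count;
    Body := 0;
    Tail := 0;
    Exit;
  end;
  // Step 2: bytes needed to reach the next 16-byte boundary (0..15).
  Head := (SimdWidth - (PtrUInt(P) and (SimdWidth - 1))) and (SimdWidth - 1);
  // Step 3: largest multiple of 16 that fits after the head.
  Body := (Count - Head) and not PtrUInt(SimdWidth - 1);
  // Step 4: whatever is left (0..15 bytes).
  Tail := Count - Head - Body;
end;

var
  Buf: array[0..1023] of Byte;
  Head, Body, Tail: PtrUInt;
begin
  // Start one byte into the buffer so the pointer is usually not aligned.
  SplitForSimd(@Buf[1], 1000, Head, Body, Tail);
  WriteLn('head=', Head, ' body=', Body, ' tail=', Tail);
end.
```

With the body starting on a 16-byte boundary, the SSE loop can use aligned loads and stores (MOVDQA) on the data, while the head and tail go through ordinary byte-wise code.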

totya

  • Hero Member
  • *****
  • Posts: 555
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #38 on: June 16, 2019, 05:19:45 pm »
Quote
Note that unaligned access in a loop can double the needed bandwidth (an unaligned load typically reads two aligned 16-byte areas and gives you the needed part of them).

Your original idea is fantastic, so I'm using that. The asm/Pascal code is simpler that way, thanks again! The typical file size is only a few MB, so the doubled memory traffic doesn't matter. The asm code is almost finished and the results are correct in my own test app; I'll try it with the real app soon.

totya

  • Hero Member
  • *****
  • Posts: 555
Re: Stream read/write via inline assembler [SOLVED by ASerge]
« Reply #39 on: June 16, 2019, 09:49:26 pm »
Now that the storm has mostly passed, I have my computer back and can try the new code...

With the initial "unoptimized" Pascal code, in the past

I got 35 MB/s

Then, with master rvk's code

I got 180 MB/s (compiled to x86) (300MB test size)
I got 370 MB/s (compiled to x64) (300MB test size)

Nice speed increase from x64...

So, now these are very nice speeds...

Then I took the idea from engkin.

In short, I can use 128-bit registers. After a lot of struggling, I produced working asm code. With this

I got 860 MB/s (compiled to x64) (300MB test size) (but the compile target doesn't really matter)

I expected a higher speed than this, but the truth is much as master rvk said: the compiler does a very good job... (with ugly, but fast code).

Thanks to everyone!