
Author Topic: real case test MM parallel 4x scalable (i7 6700)  (Read 11329 times)

guest59697

  • Guest
real case test MM parallel 4x scalable (i7 6700)
« on: July 26, 2017, 07:38:54 am »
I ran a small test with a real-code scenario: parallel zlib with my patch.
It calls ZCompress 1000 times in a loop on a 1100 KB text file:

uses System.Zlib;

threadvar
  // One copy per thread, so the parallel iterations do not share buffers.
  INS: TMemoryStream;
  OUTS: pointer;
  SizeIn: integer;
  SizeOUT: integer;

procedure TForm.CompressClick(Sender: TObject);
var
  Count: Cardinal;  // GetTickCount returns an unsigned tick count
begin
  Count := GetTickCount;
  TParallel.For(1, 1000, procedure(I: integer)
    begin
      INS := TMemoryStream.Create;
      INS.LoadFromFile('c:\teststream.txt');
      SizeIn := INS.Size;
      GetMem(OUTS, SizeIn);
      SizeOUT := SizeIn;
      ZCompress(INS.Memory, SizeIn, OUTS, SizeOUT, zcFastest);
      INS.Free;
      FreeMem(OUTS);
    end);
  // Elapsed wall-clock time in milliseconds.
  ShowMessage(IntToStr(GetTickCount - Count));
end;

- FastMM4: 900-1000 ms
- BrainMM: 563 ms
- msheap: 532 ms
- my patch (Intel IPP + TBB): 281 ms

www.dellapasqua.com
www.dellapasqua.com/intelTBB.rar (put a teststream.txt file on c:\ and run files)

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #1 on: July 26, 2017, 08:40:11 am »
How do you use FastMM, BrainMM and the others under FPC?
Speak postscript or die!
Translate to pdf and live!

guest59697

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #2 on: July 26, 2017, 08:45:16 am »
I'm waiting for Thaddy to adapt the code for FPC.

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #3 on: July 26, 2017, 08:51:10 am »
aaa ok ... I'm also interested in an MM under FPC, but keep in mind that the MM cannot be asm-dependent, as it must be cross-platform. I hope Thaddy will make a good point.
Speak postscript or die!
Translate to pdf and live!

sam707

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #4 on: July 26, 2017, 12:13:36 pm »
easy remark:
the very first full-featured garbage collector in a multithreaded native environment that I ever saw (before C#, Java and others, so Wikipedia is even wrong about it) was in Oberon-2 / A2, a child of the Modula language, which is itself the principal descendant of Pascal. Pascal, Modula and Oberon-2 all come from the research of the same author: Professor Niklaus Wirth

https://www.youtube.com/watch?v=BJIqHIYSDrk

Quote
Oberon-2 provides built-in run-time support for garbage collection

https://en.wikipedia.org/wiki/Oberon-2
reinventing the wheel maybe makes it seem square

Long Life to "M"ister Wirth! Father of Pascal

sam707

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #5 on: July 26, 2017, 12:52:28 pm »
Modula-2 = a kind of Pascal compiler that separates units into 2 files, 1) interfaces and 2) implementations. It pioneered modular preprocessing: interfaces ("headers", like .h in C) can be compiled separately from implementations, so the compiler is truly able to compile multiple implementation files in parallel once it knows the preprocessed interface files

Oberon = another upgrade of Pascal/Modula-2, which makes Oberon both an OS and a compiler. Each program written in Oberon-2 has as many entry points as needed, because it is fully multithreaded from A to Z

Next stage? As Pr Wirth says in the video above: a chip (kind of an EPROM) able to download any available processor architecture and mimic the processor you need/know.
this guy, Father of pascal, is Fabulous!!!

there is NO equivalent guy in C world, i can tell !!!
I've been working with Qt for 9 years LOL shhhhht (it's far behind)
« Last Edit: July 26, 2017, 01:09:00 pm by sam707 »

sam707

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #6 on: July 26, 2017, 01:20:06 pm »
Pr Wirth makes me dream ... "the end of cross compilers" with a "mutant" processor able to mimic the others LOL

asm
"flavour m68k"
some apple stuffs
"flavour z80"
some 8 bits stuffs
"flavour icore2"
some bithreaded stuffs
end asm

let the mutant select its form and registers! ROCK N ROLL Pr Wirth  :D

hmmm lemme remember, an idiot told me that when you pass 40 years you are too old in computers world HAAHHAHHAHAHAHH
may i introduce you Niklaus Wirth ???
« Last Edit: July 26, 2017, 01:29:20 pm by sam707 »

Thaddy

  • Hero Member
  • *****
  • Posts: 14373
  • Sensorship about opinions does not belong here.
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #7 on: July 26, 2017, 01:34:14 pm »
Quote
aaa ok ... I'm also interested in an MM under FPC, but keep in mind that the MM cannot be asm-dependent, as it must be cross-platform. I hope Thaddy will make a good point.

Roberto's code is strictly Windows and uses *a lot* of Intel assembler. Redoing that cross-platform is not an option (but it is an option for all Intel platforms).
What *is* an option is to use some of the algorithmic improvements, which is what I am focusing on.
Note that the MMs of Delphi and FPC differ in architecture: not only in the interface, but also in what is expected to be implemented. One example is storing the block size, which Delphi does not do: it simply relies on the slot/bucket size.
The FPC compiler and RTL internals rely on that part of the implementation, so it cannot be changed very easily. I will have to implement storage of size.
OTOH, the core algorithm of FPC's MM is similar but *a lot* better designed in other parts, so the speed gains will not be as big as with Delphi.
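
To illustrate the "storage of size" point from the FPC side, here is a minimal sketch (not taken from Roberto's patch or from any plugin MM). It only uses the standard System routines GetMem, MemSize and FreeMem; the program name is made up for the example. Whatever memory manager is installed has to be able to answer MemSize for a block it handed out, which is why a replacement MM for FPC must keep track of block sizes somehow.

```pascal
program MemSizeDemo;
{$mode objfpc}{$H+}

var
  P: Pointer;
begin
  // Ask the installed memory manager for 100 bytes.
  GetMem(P, 100);

  // MemSize queries the memory manager for the size of the block behind P.
  // The reported value may be larger than the 100 bytes requested,
  // depending on how the manager rounds block sizes.
  WriteLn('requested: 100, MemSize reports: ', MemSize(P));

  FreeMem(P);
end.
```

The exact number printed depends on the memory manager in use, so no particular value should be relied on.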

Don't expect too much from it. The biggest gains will be on Intel only. It will also never be the default, but a plugin MM.

Also: once you use the available optimizations in FPC, the pure Pascal implementation of the default MM can be improved a lot using just the settings that are already there.
FPC is much, much better than Delphi at using vectors, MMX, SSE and the like on Intel, but also e.g. VFPX on ARM, provided you specify the correct options.
So that is really a matter of documenting how to speed up the MM. (This only applies to memory block operations, but that is essentially what an MM is.)

To put it another way: if you used the FPC MM code in Delphi 7 (stripping the storage-of-size part), Delphi 7 would be faster than with its standard MM. You can try that yourself. I did that during the MM competition years ago. (Where I unofficially came in slowest with my nifty commm.pas  8-) )
« Last Edit: July 26, 2017, 01:49:40 pm by Thaddy »
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #8 on: July 26, 2017, 03:20:43 pm »
offtopic: @thaddy: thanks a lot for the internals of the FPC MM... this is something that I have wanted to read for years. And what about memory fragmentation over the long run... 6 months to 1 year? Is it possible to dedicate a memory block to some aspect of the program, alloc and dealloc in it, and finally free the block at some point? For example, to reserve a block for a server object, do the allocs for its "child" objects in it, and at the end release the block on server termination, like the old Pascal heap Mark/Release functions?
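
To make the question concrete, this is roughly the kind of region ("arena") I have in mind. It is only a sketch under my own assumptions: TArena, ArenaInit, ArenaAlloc and ArenaRelease are hypothetical names, not part of the FPC RTL, and the code sits on top of the normal GetMem/FreeMem rather than replacing the MM. Individual allocations are never freed; everything goes away in one FreeMem at the end.

```pascal
program ArenaSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical bump allocator: one big block, released all at once.
  TArena = record
    Base: PByte;      // start of the reserved block
    Size: PtrUInt;    // total bytes reserved
    Used: PtrUInt;    // bytes handed out so far
  end;

procedure ArenaInit(out A: TArena; ASize: PtrUInt);
begin
  GetMem(A.Base, ASize);
  A.Size := ASize;
  A.Used := 0;
end;

function ArenaAlloc(var A: TArena; Bytes: PtrUInt): Pointer;
begin
  // Round each request up to a multiple of 16 bytes.
  Bytes := (Bytes + 15) and not PtrUInt(15);
  if A.Used + Bytes > A.Size then
    Exit(nil);        // arena exhausted, caller must handle this
  Result := A.Base + A.Used;
  A.Used := A.Used + Bytes;
end;

procedure ArenaRelease(var A: TArena);
begin
  // One call frees every allocation made from the arena.
  FreeMem(A.Base);
  A.Base := nil;
  A.Size := 0;
  A.Used := 0;
end;

var
  Arena: TArena;
  P1, P2: Pointer;
begin
  ArenaInit(Arena, 64 * 1024);
  P1 := ArenaAlloc(Arena, 100);
  P2 := ArenaAlloc(Arena, 200);
  WriteLn('both allocations succeeded: ', (P1 <> nil) and (P2 <> nil));
  WriteLn('bytes used: ', Arena.Used);
  ArenaRelease(Arena);
end.
```

Since nothing inside the block is freed individually, the block cannot fragment the main heap while the server object lives; the price is that memory is only returned when the whole arena is released.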
Speak postscript or die!
Translate to pdf and live!

sam707

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #9 on: July 26, 2017, 04:26:47 pm »
I had a look around THeapStatus, TFPCHeapStatus, and the Delphi XE2 THeapStatus with a simple heap stress program, last year

@Thaddy is 110% right on what he just said

FPC beat Delphi on memory alloc/realloc by more than a minute and a half in a 30-minute intensive stress test (i5 Sandy Bridge CPU, fpk 2.4.2)
« Last Edit: July 26, 2017, 04:29:22 pm by sam707 »

sam707

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #10 on: July 26, 2017, 04:50:46 pm »
Chinese biscuit philosophy sentence =>

Laz + Fpk is to delphi what "the Gimp" is to Photoshop  :D

meaning, when you can't know what you do, you pay AT MAX for a very good product, ...... and when you know what you do, you install a better one.... FOR FREE

 :D :D :D

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #11 on: July 26, 2017, 05:43:25 pm »
@sam707: your example of Gimp vs Photoshop is not correct, it's 110% wrong ... Gimp is a good product but it is far behind the best product, Photoshop ... I'm giving you just 4 examples: 1. CMYK support 2. features (non-destructive editing, content-aware fill etc.) 3. format support (raw, psd ... etc.) 4. speed: GPU usage for image calculation and OpenGL for display...
Please be more accurate when making general assumptions/conclusions.
Speak postscript or die!
Translate to pdf and live!

sam707

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #12 on: July 26, 2017, 06:33:28 pm »
in your opinion and flavor Sir @Blestan  :P
because from where I am and where I have worked, you can't imagine the number of pros (for instance in the production of the Shrek movies) who did amazing thingeys in Gimp that they were absolutely unable to reproduce in photoglups

I took the example from my mad mind because when I buy a car my wifey always looks for the paints with monkeys on fire while i am looking for the power of the engine, deep under

So I guess, in the state of the art, my comparison gimp > photoSHOP - laz > delphiSHOP is far from bad LMAO!!!

okay, i'll say hello for you to my wifey while she cry on colors of bankets  :P

guest59697

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #13 on: July 26, 2017, 09:48:14 pm »
Thaddy, maybe you could consider doing a TParallel class for FPC: loops, tasks, waitable timers? Then the code could take full advantage of threads and SIMD instructions and scale horizontally with the number of CPU cores.

guest59697

  • Guest
Re: real case test MM parallel 4x scalable (i7 6700)
« Reply #14 on: July 26, 2017, 09:50:26 pm »
And, although it is very hard to do, parallelize the generic collections: dictionaries, linked lists, TList and its descendants.
This could be very useful. Consider adding TLS (thread-local storage) variables to fully parallelize records and classes.
Then FPC will be very powerful.
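
Until something like TParallel exists, the same idea can be approximated by hand with plain TThread. The following is only a sketch, not a proposed design: TRangeWorker, WorkerCount and the sum-of-squares workload are made up for illustration. Each worker gets a fixed slice of the index range, and a threadvar gives every thread its own scratch variable so that no locking is needed inside the loop.

```pascal
program ParallelSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

threadvar
  Scratch: Int64;  // each thread sees its own private copy

type
  // Hypothetical worker: sums the squares of the integers in [FLo, FHi].
  TRangeWorker = class(TThread)
  private
    FLo, FHi: Integer;
    FTotal: Int64;
  protected
    procedure Execute; override;
  public
    constructor Create(ALo, AHi: Integer);
    property Total: Int64 read FTotal;
  end;

constructor TRangeWorker.Create(ALo, AHi: Integer);
begin
  inherited Create(True);  // created suspended; the caller calls Start
  FLo := ALo;
  FHi := AHi;
end;

procedure TRangeWorker.Execute;
var
  I: Integer;
begin
  Scratch := 0;
  for I := FLo to FHi do
    Scratch := Scratch + Int64(I) * I;
  FTotal := Scratch;  // publish the thread-local result to the object
end;

const
  WorkerCount = 4;
  N = 1000;  // must be divisible by WorkerCount in this simple sketch

var
  Workers: array[0..WorkerCount - 1] of TRangeWorker;
  K: Integer;
  Sum: Int64;
begin
  // Split 1..N into WorkerCount equal slices and start one thread per slice.
  for K := 0 to WorkerCount - 1 do
  begin
    Workers[K] := TRangeWorker.Create(K * (N div WorkerCount) + 1,
                                      (K + 1) * (N div WorkerCount));
    Workers[K].Start;
  end;

  // Wait for every worker, then combine the partial results.
  Sum := 0;
  for K := 0 to WorkerCount - 1 do
  begin
    Workers[K].WaitFor;
    Sum := Sum + Workers[K].Total;
    Workers[K].Free;
  end;

  WriteLn('sum of squares 1..', N, ' = ', Sum);
end.
```

A real TParallel would add a thread pool and dynamic work splitting; the static slicing here only scales well when every index costs about the same.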

 
