Recent

Author Topic: My Parallel Sort Library and benchmarks ...  (Read 37039 times)

aminer

  • Hero Member
  • *****
  • Posts: 956
My Parallel Sort Library and benchmarks ...
« on: May 07, 2010, 07:35:52 am »

Hello all,



I have used my Parallel Sort Library and benchmarked an Object Pascal
program that sorts a dynamic array of 10 millions of strings.

Please look at the benchmarks in following page:

http://pages.videotron.com/aminer/parallelsort/parallelsort.htm


On four cores four threads, here is the results:
 
Parallel Quicksort gives 5.31x speed and 2.877 seconds
Parallel Heapsort gave me 4.72x  and 7.452 seconds

Note: Parallel Quicksort is much faster in pratice than parallel heapsort
         or parallel mergesort.





Sincerely
Amine Moulay Ramdane.


skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2532
    • havefunsoft.com
Re: My Parallel Sort Library and benchmarks ...
« Reply #1 on: May 07, 2010, 07:53:31 am »
really nice job!
Patron Cocoa Widgetset development https://www.patreon.com/skalogryz

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3887
  • I like bugs.
Re: My Parallel Sort Library and benchmarks ...
« Reply #2 on: May 07, 2010, 09:16:36 am »
Interesting.
How can quicksort became 3.35 times faster with 2 processor cores?
I would guess the theoretical maximum speed gain is *2.
The same thing with 4 cores. How can it become more than 4 times faster?

I looked at the home page: http://pages.videotron.com/aminer/
but didn't see any download link.
How can I get the source code? Does it even exist?

Regards,
Juha
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux.

theo

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1891
Re: My Parallel Sort Library and benchmarks ...
« Reply #3 on: May 07, 2010, 09:48:39 am »
I looked at the home page: http://pages.videotron.com/aminer/
but didn't see any download link.
How can I get the source code? Does it even exist?

There is a ZIP link at the right side of the table (column:Files)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3887
  • I like bugs.
Re: My Parallel Sort Library and benchmarks ...
« Reply #4 on: May 07, 2010, 10:28:55 am »
There is a ZIP link at the right side of the table (column:Files)

Oops, yes, it does exist.
Has someone tested it with FPC? There seem to be some Delphi units copied which could easily be replaced with what FPC offers.

Can anyone explain the speed difference. How can 2 cores make it more than 2 times faster? Is it some cache hit issue?
Now I don't have a multi-core CPU around. Some day I will test it myself.

Juha
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux.

jarto

  • Full Member
  • ***
  • Posts: 106
Re: My Parallel Sort Library and benchmarks ...
« Reply #5 on: May 09, 2010, 07:16:49 am »
I spent some time trying to get a decent test case done. When I try to run it on my Ubuntu 9.10:

An unhandled exception occurred at $0809F010 :
EAccessViolation : Access violation
  $0809F010  GETTHREADID,  of LockFreePrimitives.pas
  $0809E9C2  TGJRINGBUFFER__MEASUREEXECUTIONTIMES,  line 240 of RingBuffer.pas

I noticed, that there's quite a bit of assembler code in these files:

classespatch.pas
lockfree_mpmc.pas
LockFreePrimitives.pas
ParallelQueue.pas
parallelqueuerb.pas
ParallelSort.pas
threadpool.pas

I've never done any asm-programming on Delphi/FPC, but I presume that the library is Windows-only?

I also noticed that the test runs with different cores were not done with identical data. That's why I wanted to make a test case, that sorts identical data with different number of cores.

If someone could try this on Windows, here's my testcase:

http://www.starsoft.fi/jarto/ParallelSortTest.zip

In my code, you can decide the number of names and number of cores to test here:

TestParallelSort(10000,1,4); //Tests 10000 names with 1 -> 4 cores.

scionn

  • Newbie
  • Posts: 1
Re: My Parallel Sort Library and benchmarks ...
« Reply #6 on: May 09, 2010, 08:31:16 am »
Can anyone explain the speed difference. How can 2 cores make it more than 2 times faster? Is it some cache hit issue?

I suppose that this differenced is based on the base performance.

In the single core test the core is not 100% working on the sort, in fact while it testing there are a lot of other programs that windows must execute, think at an antivirus, or any other software.

So the performance is not right one core, but 0.75 core.

In the other tests, we can think that the first core work at 75% but the other can run at 100% so you have e final result that is bigger than the number of core.

This is what I suppose with my knowledge of System Administrator, I don't know if in the algoritm of quicksort there is something that take advance with parallel processing, making it result better.

p.s. sorry for my english

Roberto Gibertini

jarto

  • Full Member
  • ***
  • Posts: 106
Re: My Parallel Sort Library and benchmarks ...
« Reply #7 on: May 10, 2010, 05:27:58 am »
Here's an updated version of my test program. In this one you can specify the number of names as a command line argument. It also saves the original names and results as text files as long as number of names does not exceed 100000.

http://www.starsoft.fi/jarto/ParallelSortTest2.zip

With my Dell Latitute E5500 with 2,40GHz P8600 I got this kind of results:

Cores: 2
4000000 names sorted in 00:13:360
Cores: 1
4000000 names sorted in 00:24:000

But the biggest concern was, that the result files did not always match. One or a few names were not sorted properly in some of the test runs.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3887
  • I like bugs.
Re: My Parallel Sort Library and benchmarks ...
« Reply #8 on: May 10, 2010, 07:18:35 am »
With my Dell Latitute E5500 with 2,40GHz P8600 I got this kind of results:

Cores: 2
4000000 names sorted in 00:13:360
Cores: 1
4000000 names sorted in 00:24:000

But the biggest concern was, that the result files did not always match. One or a few names were not sorted properly in some of the test runs.

Thanks for testing!
So 2 cores didn't make it more than 2 times faster. This looks more realistic.

It is a big concern if the result is not sorted correctly, yes.
I hope the original author, Aminer, could comment these results.

BTW: The assembly code doesn't necessarily make this "Windows only" but it makes this "Intel only". It is used for locking threads and for atomic moves. I guess it could be replaced with portable code.

Juha
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: My Parallel Sort Library and benchmarks ...
« Reply #9 on: May 11, 2010, 02:57:15 am »


>It is a big concern if the result is not sorted correctly.

I can not reproduce this...

Also in the TParallelSort.create(n,ctQuicksort);

the number of cores must be 2^N: 1,2,4 etc. 

>BTW: The assembly code doesn't necessarily
>make this "Windows >only" but it makes this "Intel only"

rigth.


Amine.



aminer

  • Hero Member
  • *****
  • Posts: 956
Re: My Parallel Sort Library and benchmarks ...
« Reply #10 on: May 11, 2010, 03:07:15 am »

> hope the original author, Aminer, could comment these results.


As i have noticed , you are using this:
 
Intel® Pentium® Processor E5500
(2M Cache, 2.80 GHz, 800 MHz FSB)

http://ark.intel.com/Product.aspx?id=42800

It has only *1* L2 cache of 2M.


I have used CPU-Z on my hardware:

http://www.cpuid.com/cpuz.php

and it gives me this:

Level 1 Data: 4 x 34 KBytes  8ways
Level 1 Inst.: 4 x 34 KBytes  8 ways
Level 2: 2 x 4096 KBytes  16 ways

So, mine has *2* L2 caches of 4M

So , i think this explains the difference...


Amine.








aminer

  • Hero Member
  • *****
  • Posts: 956
Re: My Parallel Sort Library and benchmarks ...
« Reply #11 on: May 11, 2010, 03:19:00 am »

I have on my hardware:

Level 1 Data: 4 x 34 KBytes  8ways
Level 1 Inst.: 4 x 34 KBytes  8 ways
Level 2: 2 x 4096 KBytes  16 ways


So, when i am quicksorting a big array on 1 core ,
there is much higher cache misses than
on 2 threads/2 cores - cause i am breaking the array
in 4 parts and quicksorting in parallel, also ,
i have 2 x L2 caches of 4M  - and you have only one
one on your E5500 - and this explains my results...



Amine.




aminer

  • Hero Member
  • *****
  • Posts: 956
Re: My Parallel Sort Library and benchmarks ...
« Reply #12 on: May 11, 2010, 03:28:39 am »


>But the biggest concern was, that the result files did not always >match. One or a few names were not sorted properly in some of >the test runs.


As i have noticed, you are using this:

TestParallelSort(10000,1,4); //Tests 10000 names with 1 -> 4 cores.

As i said before, the number of cores must be 2^N: 1,2,4 etc. 

So, you have to avoid *3*, just: 1,2,4,6,8 etc.


I can not reproduce your problem here.



Amine.












aminer

  • Hero Member
  • *****
  • Posts: 956
Re: My Parallel Sort Library and benchmarks ...
« Reply #13 on: May 11, 2010, 03:33:50 am »

>As i have noticed, you are using this:
>TestParallelSort(10000,1,4); //Tests 10000 names with 1 -> 4 cores.
>As i said before, the number of cores must be 2^N: 1,2,4 etc. 
>So, you have to avoid *3*, just: 1,2,4,6,8 etc.


I have reproduced your problem with a number of cores
equal to 3.


So, try to use a number that is not 3 and that is 2^N,
example: 1,2,4,6,8 ,10,12,14,16 etc.

and you will be safe.

 
You are welcome  :)

Sincerely,

Amine.











 

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: My Parallel Sort Library and benchmarks ...
« Reply #14 on: May 11, 2010, 05:16:48 am »

>I spent some time trying to get a decent test case done. When I >try to run it on my Ubuntu 9.10:

>An unhandled exception occurred at $0809F010 :
>EAccessViolation : Access violation
>  $0809F010  GETTHREADID,  of LockFreePrimitives.pas
>  $0809E9C2  TGJRINGBUFFER__MEASUREEXECUTIONTIMES,  line >240 of RingBuffer.pas


I think my ParallelQueue will compile correctly...

RingBuffer is from
http://www.odsrv.com/RingBuffer/RingBuffer.htm

So, if you want to run it on  Ubuntu 9.10 just
delete the 'uses parallelqueuerb,ringbuffer' inside
threadpool.pas, after that compile it and tell me
if it's working on  Ubuntu 9.10 correctly.



Amine.








 

TinyPortal © 2005-2018