Hello,
Parallel Sort Library ws updated to version 3.02
Description:
Parallel Sort Library that supports Parallel Quicksort, Parallel HeapSort and Parallel MergeSort on Multicores systems.
Parallel Sort Library uses my Thread Pool Engine and sort many array parts - of your array - in parallel using Quicksort or HeapSort or MergeSort and after that it finally merge them - with the merge() procedure -
In the previous parallelsort version i have parallelized only the sort part, but in this new parallelsort version i have parallelized also the merge procedure part and it gives better performance.
I have implemented a Parallel hybrid divide-and-conquer merge algorithm that performs 0.9-5.8 times better than sequential merge, on a quad-core processor, with larger arrays outperforming by over 5 times. Parallel processing combined with a hybrid algorithm approach provides a powerful high performance result.
The best case complexity of ParallelSort using mergesort is:
((n/p)* log(n/p)) + O(n/p)
p: is the number of cores
the ((n/p)* log(n/p)) is the complexity of the sorting part.
O(n/p) is the best case complexity of the merging part.
so the best case complexity is: ((n/p)* log(n/p))
The worst case complexity of parallel sort library using mergesort is:
((n/p)* log(n/p)) + O(n)
the ((n/p)* log(n/p)) is the complexity of the sorting part.
O(n) is the worst case complexity of the merging part.
so the worst case complexity of parallelsort using mergesort is approximatly: O(n)
I have done some tests with my ParallelSort library and i have noticed that it can give up to 3.8x scalability with strings, and it give 3x scalability with integers.
So, why it scale to 3.8x with strings and only 3x with integers ?
I explain:
In the SequentialMerge() method and QSort() method inside Parallel Sort library, i am calling the Scompare() method and also in both of them i am copying to the memory system.
So when i am using strings the SCompare() method is more expensive, so the parallel part p in the Amdahl equation 1/ S + P/N (S: the serial part, P: parallel part and N: the number of cores) is bigger than with integers so the Amdahl equation will scale better, but when we are using integers the SCompare() method is less expensive than the SCompare() with strings, so the parallel part p in the Amdahl equation is less bigger than with strings. so this is why parallel sorting with strings scales better than with integers.
I have implemented mergsort and quicksort, but as you know the complexity of mergesort in the worst case is better than quicksort , and the mergesort that i have implemented is faster than quicksort, but mergesort takes more space..
The reccurence equation of the complexity of mergesort is:
T(n) = 2 * T(n/2) + n
cause it take O(n) for the merging part.
It gives:
T(n) = 2 * T(n/2) + n
= 2 (2T(n/4) +n/2) + n
=4T(n/4)+n+n
=4T(n/4)+2*n
=4 (2T(n/8) +n/4) + 2*n
=8T(n/8)+n+2n
=8T(n/8)+3*n
=2k T(n/2^k) + k*n
We want:
n/2k = 1
n = 2k
log n = k
so the reccurence equation gives:
= nT(1) +n*log(n)
= n+ (n * log(n))
So the mergesort complexity in the best case and in the worst case is:
n * log(n)
But the complexity of the quicksort in the worst case is:
T(n)= n + T(n-1)
it gives:
T(n) = n + (n-1) + T(n-2)
T(n) = n + (n-1) + (n-2)+ T(n-3)
T(n) = 1 + 2+ 3+.+N
.T(n) = O(n^2)
And Quicksort in best case performance is:
T(n) = T(n/2) + T(n/2) + O(n)
= 2T(n/2) + (n)
this gives: T(n) = (n lg n)
Parallelizing the Sorts
One way to parallelize the sorts is:
- Divide the data among the processors
- Sort the data on the individual processors.
- Merge the various data
Note that the merge operation is a reduction operation !
I have done some scalability tests on my parallelsort library and i have come to the conclusion that parallel heapsort is better on scalability than parallel quicksort cause the P part (of the Amdahl equation) is bigger in parallel heapsort than in parallel quicksort, the parallel heapsort is doing more on the parallel part, it's why it scales better than parallel quicksort, but parallel quicksort is still faster than parallel heapsort on my tests on a quad core processor.
And please look at test.pas a parallel quicksort on many cores - compile and execute it...
Language: FPC Pascal v2.2.0+ / Delphi 7+:
http://www.freepascal.org/Operating Systems: Win , Linux and Mac (x86).
Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal
-Sd for delphi mode....
Required Delphi switches: -DMSWINDOWS -$H+
For Delphi 5,6,7 use -DDelphi
For Delphi 2005,2006,2007,2009,2010+ use the switch -DDELPHI2005+
You can download ParallelSort library from:
http://pages.videotron.com/aminer/Thank you,
Amine Moulay Ramdane.