
Author Topic: Parallel sorting algorithms  (Read 6484 times)

aminer

Parallel sorting algorithms
« on: September 07, 2012, 07:37:48 pm »

Hello,

Have a look at the following link; it's about parallel partition:

http://www.redgenes.com/Lecture-Sorting.pdf


I have tried to simulate this parallel partition method, but I don't
think it will scale, because we have to do a merging step, which is
essentially an array-copy operation. This array copy is expensive
compared to the integer compare operation inside the partition
function, and it is still expensive compared to a string compare
operation inside the partition function. Since it doesn't scale, I have
abandoned the idea of implementing this parallel partition method in my
parallel quicksort.

I have also just read the following paper about Parallel Merging:

http://www.economyinformatics.ase.ro/content/EN4/alecu.pdf

I have implemented this algorithm just to see its performance, and I
have noticed that it is 8 times slower than the merge function of the
serial mergesort algorithm. 8 times slower is too slow.

So the only way to get a somewhat better parallel sorting algorithm is
to use the following one:

http://www.drdobbs.com/parallel/parallel-merge/229204454?queryText=parallel%2Bsort


The idea is simple:

Let's assume we want to merge sorted arrays X and Y. Select X[m], the
median element of X. Elements in X[..m-1] are less than or equal to
X[m]. Using binary search, find the index k of the first element in Y
greater than X[m]. Thus Y[..k-1] are less than or equal to X[m] as
well. Elements in X[m+1..] are greater than or equal to X[m], and
Y[k..] are greater. So merge(X, Y) can be defined as

concat(merge(X[..m-1], Y[..k-1]), X[m], merge(X[m+1..], Y[k..]))

Now we can recursively, in parallel, do merge(X[..m-1], Y[..k-1]) and
merge(X[m+1..], Y[k..]), and then concat the results.
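The steps above can be sketched in a few lines of Python. This is my own minimal sequential sketch (the function name merge_dc is illustrative); a real parallel implementation would run the two recursive calls on separate cores, which is safe because they touch disjoint slices:

```python
import bisect

def merge_dc(X, Y):
    """Divide-and-conquer merge of two sorted lists (sequential sketch).

    The two recursive calls are independent, so in a parallel
    implementation they could run concurrently."""
    if len(X) < len(Y):                   # take the median of the longer array
        X, Y = Y, X
    if not X:
        return []
    m = len(X) // 2                       # median position in X
    k = bisect.bisect_right(Y, X[m])      # first element of Y greater than X[m]
    # merge(X, Y) = merge(X[:m], Y[:k]) ++ [X[m]] ++ merge(X[m+1:], Y[k:])
    return merge_dc(X[:m], Y[:k]) + [X[m]] + merge_dc(X[m + 1:], Y[k:])

print(merge_dc([1, 3, 5, 7], [2, 4, 6, 8]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```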

But read this:

"Parallel hybrid merge algorithm was developed that outperformed
sequential simple merge as well as STL merge by 0.9-5.8 times overall
and by over 5 times for larger arrays"

http://www.drdobbs.com/parallel/parallel-merge/229204454?pgno=3

As you may have noticed, this paper forgot to mention that this
method's performance depends on the distribution of the data.

Read, for example, this:

"Select X[m] median element in X. Elements in X[ .. m-1] are less than
or equal to X[m]. Using binary search find index k of the first element in
Y greater than X[m]."

So if the median element of X is not near or equal to the median
element of Y, this method can have bad parallel performance, and it may
not scale as well as you think.
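A small Python illustration of this sensitivity, using a hypothetical helper split_sizes (my own, for the naive variant that always takes the median of X) that reports the sizes of the two sub-merges produced by one median split:

```python
import bisect

def split_sizes(X, Y):
    """Sizes of the two sub-problems produced by one median split of X."""
    m = len(X) // 2
    k = bisect.bisect_right(Y, X[m])          # first element of Y > X[m]
    return (m + k, (len(X) - m - 1) + (len(Y) - k))

# Balanced case: X and Y drawn from the same range.
print(split_sizes(list(range(0, 100, 2)), list(range(1, 100, 2))))   # (50, 49)

# Skewed case: every element of Y exceeds every element of X,
# so one recursive call receives almost all of the work.
print(split_sizes(list(range(50)), list(range(1000, 2000))))         # (25, 1024)
```

Taking the median of the longer of the two arrays, as the Dr. Dobbs article does, bounds how lopsided a single split can get, but the split points still depend on the data.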

There is another parallel method for partition; here it is:

http://www.cs.sunysb.edu/~rezaul/Spring-2012/CSE613/CSE613-lecture-9.pdf

but as you will notice it's still too expensive, because you have to
create three arrays and copy into them:

3. array B[0..n-1], lt[0..n-1], gt[0..n-1]

You can use SIMD instructions in the parallel-prefix-sum function:

8. lt[0..n-1] <- Par-Prefix-Sum(lt[0..n-1], +)

But I think the algorithm is still expensive on a quad-core,
eight-core, or even 16-core machine; you need many more than 16 cores
to benefit from this method.
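For reference, here is my own sequential Python sketch of that prefix-sum partition scheme (the names prefix_sums and partition are mine, not from the lecture notes). Every step is an elementwise map, a scan, or a scatter, which is what makes the scheme parallelizable, and the temporary arrays it allocates are what makes it expensive:

```python
def prefix_sums(flags):
    """Inclusive prefix sum, done sequentially here; in parallel it is
    a scan that finishes in O(log n) rounds."""
    out, total = [], 0
    for f in flags:
        total += f
        out.append(total)
    return out

def partition(A, pivot):
    """Partition A around pivot using flag arrays and prefix sums.
    Returns the partitioned copy B and the count of elements < pivot."""
    lt = [1 if x < pivot else 0 for x in A]   # flags: less than pivot
    gt = [1 if x > pivot else 0 for x in A]   # flags: greater than pivot
    lt_pos = prefix_sums(lt)                  # destination rank among "less"
    gt_pos = prefix_sums(gt)                  # destination rank among "greater"
    n_lt = lt_pos[-1]
    n_eq = len(A) - n_lt - gt_pos[-1]
    B = [pivot] * len(A)                      # middle slots hold the pivot value
    for i, x in enumerate(A):                 # scatter, also elementwise-parallel
        if lt[i]:
            B[lt_pos[i] - 1] = x
        elif gt[i]:
            B[n_lt + n_eq + gt_pos[i] - 1] = x
    return B, n_lt

print(partition([5, 9, 1, 7, 3, 5, 8], 5))
```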

Bucket sort is not a sorting algorithm itself; rather, it is a
procedure for partitioning the data to be sorted using some given
sorting algorithm, a "meta-algorithm" so to speak.

Bucket sort works better when the elements are uniformly distributed
over an interval [a, b], so that the buckets do not have significantly
different numbers of elements.

Bucket sort sequential computational complexity using quicksort:
p × (n/p) log(n/p) = n log(n/p)

Bucket sort parallel computational complexity using quicksort:
(n/p) log(n/p)
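The idea behind those formulas, in a minimal sequential Python sketch (the helper bucket_sort and its parameters are illustrative only): distribute the n elements over p buckets spanning [a, b), then sort each bucket. Since the buckets are independent, each of the p sorts of roughly n/p elements can run on its own core, which gives the (n/p) log(n/p) parallel term.

```python
def bucket_sort(A, a, b, p):
    """Distribute A over p buckets spanning [a, b), then sort each bucket.
    The per-bucket sorts are independent, so each can run on its own core."""
    buckets = [[] for _ in range(p)]
    width = (b - a) / p
    for x in A:
        i = min(int((x - a) / width), p - 1)  # clamp x == b into the last bucket
        buckets[i].append(x)
    out = []
    for bucket in buckets:        # in parallel: one bucket per core
        out.extend(sorted(bucket))
    return out

print(bucket_sort([27, 3, 91, 54, 18, 77, 40, 66], 0, 100, 4))
# → [3, 18, 27, 40, 54, 66, 77, 91]
```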

Parallel bucketsort also gave me 3x scaling when sorting strings on a
quad core; it scales better than my parallel quicksort, and it can be
much faster than my parallel quicksort.

I have thought about my parallel quicksort, and I have found that
parallel quicksort is fine, but its problem is that the partition
function is not parallelizable. So I have thought about this and
decided to write a parallel bucketsort. This parallel bucketsort can
give you much better performance than parallel quicksort when sorting
100000 strings or more; just test it yourself and see. Parallel
bucketsort can only sort strings at the moment, and it can use
quicksort or mergesort; mergesort is faster.

I have updated parallel bucketsort to version 1.01; I have changed the
interface a little bit. Now you have to pass three parameters to the
bucketsort method: the array, a TSortCompare function, and a
TSortCompare1 function.

Here is a small example in Object Pascal:

program test;

uses parallelbucketsort, sysutils, timer;

type
  TStudent = class
  public
    mystring: string;
  end;

var
  tab: Ttabpointer;
  myobj: TParallelSort;
  student: TStudent;
  i: integer;

function comp1(Item1: Pointer): string;
begin
  result := TStudent(Item1).mystring;
end;

function comp(Item1, Item2: Pointer): integer;
begin
  if TStudent(Item1).mystring > TStudent(Item2).mystring then
    result := 1
  else if TStudent(Item1).mystring < TStudent(Item2).mystring then
    result := -1
  else
    result := 0;
end;

begin
  myobj := TParallelSort.create(1, ctQuicksort); // set to the number of cores...

  setlength(tab, 1000000);

  for i := low(tab) to high(tab) do
  begin
    student := TStudent.create;
    student.mystring := inttostr(i);
    tab[high(tab) - i] := student;  // fill in reverse order
  end;

  HPT.Timestart;

  myobj.bucketsort(tab, comp, comp1);
  //myobj.qsort(tab, comp);
  writeln('Time in microseconds: ', hpt.TimePeriod);

  writeln;
  writeln('Please press a key to continue...');
  readln;

  // verify that the array is sorted
  for i := low(tab) to high(tab) - 1 do
  begin
    if TStudent(tab[i]).mystring > TStudent(tab[i + 1]).mystring then
    begin
      writeln('sort has failed...');
      halt;
    end;
  end;

  for i := low(tab) to high(tab) do
    writeln(TStudent(tab[i]).mystring, ' ');

  for i := 0 to high(tab) do
    TStudent(tab[i]).free;

  setlength(tab, 0);
  myobj.free;

end.


You can download parallel bucketsort from:

http://pages.videotron.com/aminer/




Amine Moulay Ramdane.



Knipfty

Re: Parallel sorting algorithms
« Reply #1 on: September 07, 2012, 07:57:30 pm »
Hi aminer,

I've been reading your posts.  I'm glad to see that there are people in this world who do real testing and see where the bottlenecks to performance lie.  It also seems like you have a real interest in this topic.

However, I live in a world dominated by databases, and it's rare that I really need to sort anything beyond 10 or 20 items, which makes sorting a non-issue.  That is, I almost always let the DB do the sorting via indexes.

Now, what would really make my life easier is not having to worry about which sorting algorithm makes sense, but letting a high-level routine decide for me.  It could work something like this:
Inputs:
- number of items to compare (rows?)
- size of items to compare
- size of data row

Output:
- Sort method to choose

You could take it a step further and execute that sorting algorithm.  The nice thing about this would be that as a program migrates from computer to computer, it would take the new computer into account.  And yes, that would mean a larger code size, but who really cares at this point, when a new computer comes with a terabyte hard drive and 4 to 8 GB of RAM.

This high-level sort would be able to take into account the number of CPUs, the RAM available, the cost of comparison vs the cost of array copy, and even disk speed (hard disk vs SSD vs RAM disk).  I think you will find that below some threshold it really doesn't matter, but on larger sets of data there will be huge differences.

Anyway, this is just my opinion.  Keep up the good work,

Knipfty
64-bit Lazarus 2.2.0 FPC 3.2.2, 64-bit Win 11

taazz

Re: Parallel sorting algorithms
« Reply #2 on: September 07, 2012, 08:55:44 pm »
First, I would like to thank aminer for his work so far; his posts have always been informative. Well done and thank you. I wish more people shared knowledge instead of solutions.

Knipfty, I disagree. Although sorting is used as a means of testing and designing parallel algorithms and techniques, the work done here by aminer is a lot more than that. You can use the knowledge shared for any parallel task, not only for sorting.

As for your request of choosing the best algorithm for you, it can't be done. The number of elements, the size of a row, or the number of rows has nothing to do with the algorithm you choose; instead, the order of the items has everything to do with it, e.g. if a list of items is already in descending order, a bubble sort will be the fastest method to use (if I remember my sorting methods correctly, that is). So no, it can't be decided automatically; you as the programmer must decide what is the best method to use.

Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

 
