### Author Topic: Can this algorithm be made more efficient?  (Read 7558 times)

#### VTwin

• Hero Member
• Posts: 573
• Former Turbo Pascal 3 user
##### Re: Can this algorithm be made more efficient?
« Reply #30 on: July 05, 2018, 09:59:41 pm »
Hi.
Since this is the first time I have posted to this forum (or any other forum, for that matter), I can only hope that everything goes well.
I have attached a file with a function which seems to work and is two or three times faster than the original. Its main features are:
- Sorts both arrays to reduce the number of comparisons.
- Rather than doing the floating-point calculations inside the loop, it counts how many comparisons result in set1 being greater than set2 and how many result in equal values. The calculation of num is done at the end.

Welcome to the forum!

“Talk is cheap. Show me the code.” -Linus Torvalds

macOS 10.11.6: Lazarus 2.0.1 svn 60414 (64 bit Cocoa)
Ubuntu 18.04.2: Lazarus 2.0.0 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.0.0 (64 bit on VBox)

#### VTwin

• Hero Member
• Posts: 573
• Former Turbo Pascal 3 user
##### Re: Can this algorithm be made more efficient?
« Reply #31 on: July 05, 2018, 10:29:31 pm »
I just tried optimizing with integer addition: add 2 if greater, 1 if equal, then divide by 2 at the end. It worked fine, but oddly it seemed to slow things down, if only negligibly. Example:

1e7 element arrays: 16926 vs 14421 ms

Roughly 16 seconds for integers, 14 for double. So, at least for me, it does not help.

EDIT: I replaced "j * 2" with "j + j" and got down to ~15 seconds, still slower though. Using optimization level 3 helps (12 and 11 sec), but chasing decimal places now!
« Last Edit: July 05, 2018, 11:31:54 pm by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

macOS 10.11.6: Lazarus 2.0.1 svn 60414 (64 bit Cocoa)
Ubuntu 18.04.2: Lazarus 2.0.0 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.0.0 (64 bit on VBox)

#### Martin_fr

• Hero Member
• Posts: 5109
##### Re: Can this algorithm be made more efficient?
« Reply #32 on: July 05, 2018, 11:26:36 pm »
I have to admit - with some embarrassment on my part - that I only quickly glanced at your suggestion and missed the fact that you save the result of the previous search.   You are indeed correct that your solution will likely be faster than a binary search _as long as_ the number of elements doesn't go "too high".

Seems neither of us was thorough. I missed that you only sort one array. Since your binary search has the same cost as sorting the 2nd array, you indeed save a bit (and that goes for low and high element counts).

If you want to look at implementation speed too (not just the algorithm; so this has nothing to do with any post so far): it should be easy to run the two sorts in two threads. That could be an easy way to save time.
Of course you can also partition the data, and then run any algorithm in threads.
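As a rough illustration of the two-threads idea (my own sketch using the RTL's TThread and a plain quicksort; thread startup has a cost, so this only pays off for large arrays):

Code: Pascal  [Select]
```pascal
program ParallelSortSketch;
{$mode objfpc}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils;

type
  TDoubleArray = array of Double;

procedure QuickSort(var a: TDoubleArray; lo, hi: Integer);
var
  i, j: Integer;
  p, t: Double;
begin
  i := lo; j := hi;
  p := a[(lo + hi) div 2];
  repeat
    while a[i] < p do Inc(i);
    while a[j] > p do Dec(j);
    if i <= j then
    begin
      t := a[i]; a[i] := a[j]; a[j] := t;
      Inc(i); Dec(j);
    end;
  until i > j;
  if lo < j then QuickSort(a, lo, j);
  if i < hi then QuickSort(a, i, hi);
end;

type
  { wraps one sort call so it can run on its own thread }
  TSortThread = class(TThread)
  private
    FData: TDoubleArray;
  protected
    procedure Execute; override;
  public
    constructor Create(const AData: TDoubleArray);
  end;

constructor TSortThread.Create(const AData: TDoubleArray);
begin
  FData := AData;            { dynamic arrays share the same buffer }
  inherited Create(False);   { start running immediately }
end;

procedure TSortThread.Execute;
begin
  if Length(FData) > 1 then
    QuickSort(FData, 0, High(FData));
end;

var
  x1, x2: TDoubleArray;
  t1, t2: TSortThread;
  i: Integer;
begin
  SetLength(x1, 1000);
  SetLength(x2, 1000);
  for i := 0 to 999 do
  begin
    x1[i] := Random;
    x2[i] := Random;
  end;
  t1 := TSortThread.Create(x1);  { both sorts now run concurrently }
  t2 := TSortThread.Create(x2);
  t1.WaitFor;
  t2.WaitFor;
  t1.Free;
  t2.Free;
  WriteLn('both arrays sorted');
end.
```

With only two threads the best case halves the sorting time; partitioning the data further, as mentioned above, would scale to more cores.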

--EDIT
On 2nd thought: "I missed that you only sort one array. "

One binary search in the inner list for each element of the outer list is O(n log n) => the same as the sort.

But there may be many duplicates in that list, so you need a 2nd search to find the other bound of the current matching elements.
I have not given it much thought, but it may be possible to construct data where (even in large lists, with millions of entries) your idea would perform worse.
- It might; I have no proof (the worst case may also be equal to my suggestion).
- It would be very unlikely to occur.
The inner list has to be larger, so it can have several matches for each value of the outer list.
« Last Edit: July 05, 2018, 11:40:14 pm by Martin_fr »

#### 440bx

• Sr. Member
• Posts: 487
##### Re: Can this algorithm be made more efficient?
« Reply #33 on: July 06, 2018, 01:57:42 am »

--EDIT
On 2nd thought: "I missed that you only sort one array. "

One binary search in the inner list, for each element of the outer list is O(n log n) => same as sort.

Nice try, but I'm afraid not.  A binary search will do n log n compares, as you point out, but that's all it will do.  Your sort will do n log n compares to determine the order and 2n log n swaps (unless you find a way to use cmpxchg - assembler?) to put the elements in sorted order.  You don't get to sort without doing swaps, and the swaps are a lot more expensive than the compares (and that is true even if you managed to use cmpxchg).

For a fairly large number of elements, say 1,000,000, we're even for the first sort.  Your algorithm will need 6,000,000 compares and most likely 12,000,000 swaps, for a total of 18,000,000 operations, before you've done the very first search.  I'll spend a grand total of 20,000,000 operations searching in the worst case; that leaves you 2,000,000 ops to perform all the searches, which at first blush seems impossible since you've got 1,000,000 elements of one array to match against 1,000,000 elements of the other array.  You'd have to average a maximum of 2 compares per element to match the binary search approach - that seems rather unlikely unless the data is carefully chosen to match in 2 or fewer compares.  Simply not realistic.

The binary search is a significantly more solid approach, particularly as the number of elements gets larger, and the algorithm benefits from being simpler (which likely means faster too).

As far as the presence of duplicates goes, that is a very good point, one that any algorithm will have to deal with (presuming it has to be dealt with).  The OP is the one who would have to shed light on if, and how, duplicate values are to be handled.

I have not given it much thought, but it may be possible to construct data, where (even in large lists, with millions of entries), your idea would perform worse.

I believe you are correct when stating you haven't given it much thought.  I have serious doubts about the rest.

Just to be clear, it was never my intention to dissect the algorithmic possibilities available.  I simply wanted to give the OP some food for thought.  So far, the one sort, binary search approach is, IMO, the best one for being stable, easy to understand and easy to maintain, not to mention fastest when the number of elements reaches a certain level.  Obviously, you may have a different opinion, which is perfectly fine.

Peace.

« Last Edit: July 06, 2018, 02:01:17 am by 440bx »

#### Martin_fr

• Hero Member
• Posts: 5109
##### Re: Can this algorithm be made more efficient?
« Reply #34 on: July 06, 2018, 02:41:13 am »
I was mostly speaking about the algorithm's performance according to Big O (that is, how many comparisons, never mind how long a comparison takes). This gives a picture of how the time grows when the input size grows.

Well, we already did a bit more than Big O. If my understanding is correct, in terms of Big O they are both just O(n log n), no matter how many of those comparisons there are (constant factors are not taken into account). But I would have to double-check the exact definitions of Big O.

I agree that, implementation-wise, the search (read only) will on most (maybe all) platforms be faster than the sort (read/write).
How much would have to be measured...

As for Big O-ish:
After the binary search, you have to do extra compares to check for neighbours with equal values. The question is: in the very worst case, how many?
Do you do this with a 2nd binary search, or with a linear search?

---------
In a normal case you would:

- for each value of the outer array: O(m)
- binary search the inner: O(log n)
- check if the next value in the inner is equal too:
  if it is not, then O(1)
  if it is, then up to O(n) (if all are equal)

Those extra compares add up. I am assuming you do them linearly; if you do them with a 2nd binary search, then you have another O(log n) for each of the m elements => that makes another O(m log n).

---------
Assume both arrays have all values exactly the same.

- for each outer value
- binary search the inner: O(log n) -> this will find the first matching value in the list
- to find the last equal value with a linear search, you have to go over all values: O(n) for each m -> that makes O(m * n)

How do you handle that case?

Yes, you can find the other bound of the equal values with another binary search, but that affects the "normal case".

You can compare the current outer value with the last outer value to optimize this... But then I give you an outer array that alternates between exactly 2 values (it is not sorted), and an inner array that has exactly half of each of those 2 values too.

The exact number of comparisons depends on how you handle the issue, which I do not know yet.
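To make the two-binary-search idea concrete, here is a hedged sketch (the names `LowerBound`/`UpperBound` are mine, not from any post above; a sorted, 0-based array is assumed):

Code: Pascal  [Select]
```pascal
{ First index whose value is >= key, in a sorted 0-based array.
  Returns Length(a) if all values are smaller. }
function LowerBound(const a: array of Double; key: Double): Integer;
var
  lo, hi, mid: Integer;
begin
  lo := 0;
  hi := Length(a);
  while lo < hi do
  begin
    mid := (lo + hi) div 2;
    if a[mid] < key then
      lo := mid + 1
    else
      hi := mid;
  end;
  Result := lo;
end;

{ First index whose value is > key; same conventions as above. }
function UpperBound(const a: array of Double; key: Double): Integer;
var
  lo, hi, mid: Integer;
begin
  lo := 0;
  hi := Length(a);
  while lo < hi do
  begin
    mid := (lo + hi) div 2;
    if a[mid] <= key then
      lo := mid + 1
    else
      hi := mid;
  end;
  Result := lo;
end;
```

The count of inner elements equal to an outer value v is then `UpperBound(a, v) - LowerBound(a, v)`, found in O(log n) even when the whole inner list is one repeated value, which avoids the O(m * n) degenerate case described above.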

#### 440bx

• Sr. Member
• Posts: 487
##### Re: Can this algorithm be made more efficient?
« Reply #35 on: July 06, 2018, 04:06:35 am »
I was mostly speaking about the algorithm's performance according to Big O (that is, how many comparisons, never mind how long a comparison takes). This gives a picture of how the time grows when the input size grows.

When comparing algorithms it's very important to keep in mind that Big O neither intends nor tries to take into account the cost of individual operations.  It simply gives a measure of how much work the algorithm requires to accomplish its purpose.

It is very useful.  It makes it quite clear that, in the perennial example of sorting, a heapsort is usually vastly superior to a bubble sort or a selection sort.   But the considerations cannot stop at the Big O.  If the elements to be sorted are "large" (say several hundred bytes) then, depending on the number of elements, a selection sort that uses pointers to the elements and swaps the pointers, instead of the elements, will perform better than a quicksort that swaps the elements themselves.   The other thing Big O doesn't take into account is how simple or complicated the algorithm is and, particularly in the case of Quicksort, that's very important.

The point is obvious, Big O is the starting place/consideration but, there is a lot more to consider than just the Big O of an algorithm to determine how it will perform in a specific situation.

This is somewhat beside the point of the thread, but still related: personally, I am not fond of Quicksort because the "naïve" algorithm, which blindly selects the element in the middle of the partition as the pivot, can often lead to poor performance, in some cases worse than a selection sort. In addition, its recursive nature makes it susceptible to causing stack overflows.  There have been a number of Quicksort implementations that had a superb algorithm for selecting pivots but whose cost was high enough that simpler sorts, such as a selection sort, would give Quicksort a run for its clock cycles (for element counts of about 100 or less) because too much work was expended selecting "perfect" pivots.   This is so common that many books on algorithms warn that once the partitions reach a small size - usually somewhere in the ballpark of 64 - it is better to switch to a selection sort (or better yet, shellsort) to finish sorting that partition.
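The cutoff trick described above can be sketched roughly like this (my own illustration: a naive middle-pivot quicksort that hands small partitions to an insertion sort; the cutoff of 64 and the choice of insertion sort over selection sort or shellsort are arbitrary, not a tuned implementation):

Code: Pascal  [Select]
```pascal
const
  Cutoff = 64;  { below this partition size, switch to a simple sort }

procedure HybridSort(var a: array of Double; lo, hi: Integer);
var
  i, j: Integer;
  pivot, tmp: Double;
begin
  if hi - lo < Cutoff then
  begin
    { finish small partitions with an insertion sort }
    for i := lo + 1 to hi do
    begin
      tmp := a[i];
      j := i - 1;
      while (j >= lo) and (a[j] > tmp) do
      begin
        a[j + 1] := a[j];
        Dec(j);
      end;
      a[j + 1] := tmp;
    end;
    Exit;
  end;
  { naive middle-element pivot, as criticised above }
  pivot := a[(lo + hi) div 2];
  i := lo; j := hi;
  repeat
    while a[i] < pivot do Inc(i);
    while a[j] > pivot do Dec(j);
    if i <= j then
    begin
      tmp := a[i]; a[i] := a[j]; a[j] := tmp;
      Inc(i); Dec(j);
    end;
  until i > j;
  if lo < j then HybridSort(a, lo, j);
  if i < hi then HybridSort(a, i, hi);
end;
```

The recursion depth, and with it the stack-overflow risk mentioned above, can be further limited by recursing only into the smaller partition and looping on the larger one.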

Anyway, the points you bring up regarding duplicate values, identical arrays and such are beyond valid.  Personally, when I find these strange/extreme cases, I go to the user (if the user isn't me) and ask what the correct way of handling those cases is and, the user's answer will determine what algorithms I can use to accomplish the task correctly.   IOW, performance is no longer a consideration, correctness is the overriding one.

When I program my priorities are as follows:

1. Correctness
2. Maintainable, i.e., easy to understand.  That is more important than performance.  CPUs are cheap; programmers are expensive.
3. Fast enough.  This is obviously highly variable.  1 second may be considered very fast for some things but very slow for others (such as syntax highlighting and text scrolling - thoughts of Visual Studio 2017 are in my mind as I type this).

I enjoyed the discussion, thank you.

#### Martin_fr

• Hero Member
• Posts: 5109
##### Re: Can this algorithm be made more efficient?
« Reply #36 on: July 06, 2018, 11:44:56 am »
@ 440bx
Agreed on the points in your last message (except on whether extreme cases are valid or not).

I also enjoyed the discussion.

#### segfault

• Jr. Member
• Posts: 62
##### Re: Can this algorithm be made more efficient?
« Reply #37 on: July 06, 2018, 11:52:20 am »
Using ideas from this thread I came up with the following algorithm. I haven't actually tested it against the "naive" code, but it should be pretty fast by comparison, and it's quite simple. The index is moved along even when the values are equal (when a "half credit" is due), but these cases are counted and then subtracted after the loop. I hope it's correct because the logic is a bit tricky.

Code: Pascal  [Select]
```pascal
function cles(x1, x2 : Tvector) : real;
var
  index, i,
  halfCredits  : integer;
  denominator  : longint;
  numerator    : real;
begin
  index := 0;
  halfCredits := 0;
  numerator := 0.0;
  sort(x1);
  sort(x2);
  for i := 1 to length(x2) do begin
    while (index < length(x1)) and (x1[index + 1] <= x2[i]) do begin
      if x1[index + 1] = x2[i] then inc(halfCredits);
      inc(index);
    end;
    numerator += index;
  end;
  numerator -= 0.5 * halfCredits;
  denominator := length(x1) * length(x2);
  cles := numerator / denominator
end;
```

Example: passing the arrays v1 = (2,1,3) and v2 = (3,2,4) returns 7/9, or 0.778, which can be confirmed by hand using the naive approach.
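For anyone wanting to check the hand calculation, here is a minimal naive O(m*n) reference version (my own sketch, following the same convention as the code above: full credit when an x1 value is below an x2 value, half credit on ties):

Code: Pascal  [Select]
```pascal
{ Naive O(m*n) reference; useful only as a correctness check
  against the faster implementations in this thread. }
function NaiveCles(const x1, x2: array of Double): Double;
var
  i, j: Integer;
  score: Double;
begin
  score := 0.0;
  for i := 0 to High(x1) do
    for j := 0 to High(x2) do
      if x1[i] < x2[j] then
        score := score + 1.0        { full credit }
      else if x1[i] = x2[j] then
        score := score + 0.5;       { half credit for a tie }
  Result := score / (Length(x1) * Length(x2));
end;
```

For v1 = (2,1,3) and v2 = (3,2,4) there are 6 pairs with x1 < x2 and 2 ties, so this yields (6 + 2 * 0.5) / 9 = 7/9 ≈ 0.778, matching the example above.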

#### VTwin

• Hero Member
• Posts: 573
• Former Turbo Pascal 3 user
##### Re: Can this algorithm be made more efficient?
« Reply #38 on: July 06, 2018, 05:30:21 pm »
I haven't actually tested it against the "naive" code but it should be pretty fast by comparison, and it's quite simple.

I implemented six versions of cles, including five posted here and one that I 'optimized' using integer addition, which ran slower, so I did not post it. Results:

OP ("naive", m * n): 0.545139
My first code, one sorted array: 0.545139
My second, two sorted arrays (m + n): 0.545139
My third using integers (not posted as slower): 0.545139
juanba code: 0.062500

Your code throws an exception because:

Code: Pascal  [Select]
```pascal
for i := 1 to length(x2) do
```

goes out of array bounds. It is great that you are attempting to write improved code, but verifying that a function gives the correct answer is important.

Cheers,
VTwin

EDIT: I confused two posters, sorry.
« Last Edit: July 06, 2018, 06:14:03 pm by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

macOS 10.11.6: Lazarus 2.0.1 svn 60414 (64 bit Cocoa)
Ubuntu 18.04.2: Lazarus 2.0.0 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.0.0 (64 bit on VBox)

#### Thaddy

• Hero Member
• Posts: 7420
##### Re: Can this algorithm be made more efficient?
« Reply #39 on: July 06, 2018, 05:40:59 pm »
For Low() to High() prevents that... <sigh>
It has no speed penalty.
Again, go to the wiki and all is clear.
This is simpler to explain than Pi. So shut up, you all.
Sometimes I really over-react, but this is plain stupid.
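To spell out the point being made (my own example, not from the post): `Low()` and `High()` return the actual bounds of an array, so a loop written with them cannot run out of range, whether the array is a 0-based dynamic array or a 1-based static one.

Code: Pascal  [Select]
```pascal
program LowHighDemo;
{$mode objfpc}

var
  xdyn: array of Double;         { dynamic: indices 0..4 after SetLength }
  xsta: array[1..5] of Double;   { static: indices 1..5 }
  i: Integer;
  sum: Double;
begin
  SetLength(xdyn, 5);
  sum := 0.0;
  { the same loop header is safe for both layouts }
  for i := Low(xdyn) to High(xdyn) do
    sum := sum + xdyn[i];
  for i := Low(xsta) to High(xsta) do
    sum := sum + xsta[i];
  WriteLn(sum:0:1);
end.
```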
« Last Edit: July 06, 2018, 05:49:00 pm by Thaddy »

#### VTwin

• Hero Member
• Posts: 573
• Former Turbo Pascal 3 user
##### Re: Can this algorithm be made more efficient?
« Reply #40 on: July 06, 2018, 05:57:19 pm »
The discussion on binary searches is interesting. The algorithm I posted can do the calculation on floating point arrays of a million elements in less than a second.

I'd love to see some code that shows whether, how, or in what cases, a binary search would speed it up.

Cheers,
VTwin
« Last Edit: July 06, 2018, 06:00:55 pm by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

macOS 10.11.6: Lazarus 2.0.1 svn 60414 (64 bit Cocoa)
Ubuntu 18.04.2: Lazarus 2.0.0 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.0.0 (64 bit on VBox)

#### 440bx

• Sr. Member
• Posts: 487
##### Re: Can this algorithm be made more efficient?
« Reply #41 on: July 06, 2018, 08:03:52 pm »
I'd love to see some code that shows whether, how, or in what cases, a binary search would speed it up.

Cheers,
VTwin

That's tempting.    I wish I could take you up on that at this time.  Though I do know Pascal, I'm a complete newbie in FPC, which is a bit of a problem.  Also, I have to finish a C program that is a real pain in the neck (thanks to Visual C++ 2017, which is a wonderful instrument for creating complexity where there wasn't any).

As usual, the more one thinks about a problem, the more potential solutions come to mind.  Some may be possible; others have flaws which are discovered during their implementation.  That's what "proof of concept" code is good for: finding out whether an idea holds water (or better, holds sensible logic) or not.

Here is something that, if it holds water, might be the fastest (and maybe even the simplest) way of solving the OP's problem.  It hinges on the fact that the data values seem to reside in a "reasonable" range, where reasonable means, in this case, not too far apart.

In the OP's example, the values' floor and ceiling are 16 and 22.  The values have one decimal place, which can be eliminated by multiplying by 10.

This would allow a third "helper" array (call it "match") to be allocated with a range from 160 to 220 (that's only 61 elements).  The array would have elements structured along the lines of:

Code: Pascal  [Select]
```pascal
element = record
  { keeping Value is not necessary, but it makes the objective
    obvious and is a nice aid during debugging }
  Value : dword;
  CountLeft : integer;
  CountRight : integer;
end;
```
In this representation, the original arrays are thought of as the left and right arrays.

Have one loop walk the "left" array and increase CountLeft whenever its value "falls" into that slot, and do the same thing for the "right" array.  It amounts to two loops where

Code: Pascal  [Select]
```pascal
Index := Left[i];
Match[Index].Value := Index;   { obvious and unnecessary, but may be useful during debugging }
Inc(Match[Index].CountLeft);
```
and the same thing for the Right array, but updating only CountRight.

The Match array now has all the matches and mismatches.  The Big O is simply 2n.

A walk of the Match array can now be used to calculate the values, based on whether there was a match (CountLeft and CountRight both non-zero for an entry) or a mismatch (one of the counts is zero and the other isn't); entries where both counts are zero are simply ignored.

If the assumptions are valid (a big if, obviously) then this method would solve the problem in O(3n).    It would probably be hard to find a faster algorithm.  It has a few (obvious) downsides and possibly other non-obvious ones:

1.  Additional memory is needed for the Match array.  This could become a real problem if there is a large number of elements in the original arrays and little to no overlap in their values.
2.  When walking the Match array to perform the necessary calculations, it _may_ be necessary to locate the previous and next entries (maybe not, but something to think about during implementation).  If the previous and next entries need to be determined, then sentinels at the bottom and top should be added to keep the code from having to treat those two cases as exceptional.

If this is possible, it would solve some inconvenient problems, such as duplicate entries (they would simply have a count greater than one), identical arrays, and no matching values in either array (all entries would have one of the counts at zero).

It would make processing these occurrences simple and part of the "normal" process (that is, no code to process exceptional cases).

Again, this is just food for thought.  It seems like a viable solution that would be difficult to top as far as speed and simplicity.
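A hedged sketch of the two counting passes described above (my own naming; it assumes the values have already been scaled to integers in a known range [MinV..MaxV]):

Code: Pascal  [Select]
```pascal
type
  TSlot = record
    CountLeft, CountRight: Integer;
  end;
  TSlotArray = array of TSlot;

{ Two O(n) passes fill the buckets; a third O(range) walk over the
  result can then classify matches and mismatches as described above.
  SetLength zero-initialises the dynamic array, so no clearing loop
  is needed. }
function BuildMatch(const Left, Right: array of Integer;
                    MinV, MaxV: Integer): TSlotArray;
var
  i: Integer;
begin
  SetLength(Result, MaxV - MinV + 1);
  for i := 0 to High(Left) do
    Inc(Result[Left[i] - MinV].CountLeft);
  for i := 0 to High(Right) do
    Inc(Result[Right[i] - MinV].CountRight);
end;
```

Duplicates simply raise a slot's count above one, and values present in only one array end up with the other count at zero, which is exactly the property the description above relies on.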

Cheers!
« Last Edit: July 06, 2018, 08:16:19 pm by 440bx »

#### VTwin

• Hero Member
• Posts: 573
• Former Turbo Pascal 3 user
##### Re: Can this algorithm be made more efficient?
« Reply #42 on: July 07, 2018, 01:53:40 am »
That's tempting.    I wish I could take you up on that at this time.

Fair enough, everyone is busy.

It hinges on the fact that the data values seem to reside in a "reasonable" range, where reasonable means, not too far apart, in this case.

In the OP's example, the values' floor and ceiling are 16 and 22.  The values have one decimal value which can be eliminated by a multiplication by 10.

This would allow a third "helper" (call it "match") array to be allocated with a range from 160 to 220 (that's only 60 elements).

No statistics library would include a function that restricts the data to one decimal place. Any real values, over any range, must be allowed.

It seems like a viable solution that would be difficult to top as far as speed and simplicity.

That is good news. Anyone else interested?

Cheers,
VTwin
« Last Edit: July 07, 2018, 02:15:06 am by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

macOS 10.11.6: Lazarus 2.0.1 svn 60414 (64 bit Cocoa)
Ubuntu 18.04.2: Lazarus 2.0.0 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.0.0 (64 bit on VBox)

#### 440bx

• Sr. Member
• Posts: 487
##### Re: Can this algorithm be made more efficient?
« Reply #43 on: July 07, 2018, 02:15:46 am »
No statistics library would include a function that restricts the data to one decimal place. Any real values, over any range should be allowed.

Obviously I agree.  It _seems_ that the values in the OP's problem are naturally limited to a range of valid values.  If that real range can be represented with 32- or 64-bit integers (which FPC does handle), then the method may be applicable.  Additionally, it seems that the data collected for the calculation would not benefit from accuracy beyond 2 decimal places (I could be wrong, but his data gives that impression).

More of a concern is that the values in the left and right arrays may be widely spread.  If the minimum is 5 and the maximum is 100,000.02 (for instance), and there are only a few hundred elements in each array, a lot of memory will be wasted.  OTOH, computers have gigabytes of memory now, and the memory would only be used for a fraction of a second; because of that, it still seems within reason, as long as the spread isn't too extreme.

Like everything, having more and more accurate knowledge of the problem would definitely help figuring out what is and isn't appropriate to solve it.

#### VTwin

• Hero Member
• Posts: 573
• Former Turbo Pascal 3 user
##### Re: Can this algorithm be made more efficient?
« Reply #44 on: July 07, 2018, 02:25:55 am »
Like everything, having more and more accurate knowledge of the problem would definitely help figuring out what is and isn't appropriate to solve it.

The problem is clearly defined. It is a straightforward statistical function.
“Talk is cheap. Show me the code.” -Linus Torvalds

macOS 10.11.6: Lazarus 2.0.1 svn 60414 (64 bit Cocoa)
Ubuntu 18.04.2: Lazarus 2.0.0 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.0.0 (64 bit on VBox)