
Author Topic: Very slow Android executables  (Read 12957 times)

Kostas

  • New Member
  • *
  • Posts: 30
Very slow Android executables
« on: June 10, 2012, 05:15:46 pm »
After I managed to run my backgammon engine on my Galaxy Note smartphone, I started testing the speed of the engine.

Unfortunately, I found that the Lazarus/FPC executable is very, very slow on Android...

As a reference, I used a tool that 'compiled' the same(!) Object Pascal code of the engine to JavaScript, and ran it in a browser on my smartphone.

I don't want to start a war about speed comparisons...
Maybe I have done something wrong... That's what I want to find out...

So here are my results:
1. a) The native ARM executable needed around 8(!!) seconds to calculate 10000 evaluations.
    b) The Object Pascal to JavaScript compiled version of the program, running on Opera Mobile, needed around 1(!!) second!
8 times(!) slower is a very big difference for a natively compiled program, even if the JavaScript is JIT-compiled well in the browser.

2. Drawing a stretched image in a Form needs around 10(!!) seconds...

Generally, everything seems to run slowly in the native executable.

I tried enabling different optimization settings, but the speed does not change.
It looks like Free Pascal does not optimize the executable.

I am doing a lot of mathematical calculations and, as far as I can tell, version 2.5.1 of Free Pascal, which is used to compile the executables, does not use the FPU of the ARM processor. Is this right?

Do we have to wait for the next version for this?

Is there a way to do some optimizations with Free Pascal?

Thanks in advance
Kostas


« Last Edit: June 10, 2012, 05:22:29 pm by Kostas »

Laksen

  • Hero Member
  • *****
  • Posts: 724
    • J-Software
Re: Very slow Android executables
« Reply #1 on: June 10, 2012, 07:18:36 pm »
Did you compile with FPU support? Otherwise the calculations will be done using softfpu (software floating-point emulation).

Kostas

Re: Very slow Android executables
« Reply #2 on: June 10, 2012, 09:20:34 pm »
Yes, it seems that FPC definitely compiles with software emulation of floating-point arithmetic.
This is of course one of the reasons that the executable is so slow.

But every time I compile the executable with a -Cf{parameter} other than SOFT, I get the following error:
Fatal: Can not find system used by androidlcltest, ppu=C:\lazarus\fpc\2.5.1\units\arm-linux\rtl\system.ppu

What can I do?

What about the other optimizations?
O2, O3, keeping variables in registers or not, uncertain optimizations etc. do not seem to have any impact on the speed of the program.

Kostas
« Last Edit: June 10, 2012, 09:26:05 pm by Kostas »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: Very slow Android executables
« Reply #3 on: June 10, 2012, 09:51:57 pm »
Quote
But every time I compile the executable with a -Cf{parameter} other than SOFT, I get the following error:
Fatal: Can not find system used by androidlcltest, ppu=C:\lazarus\fpc\2.5.1\units\arm-linux\rtl\system.ppu

What can I do?

You need to recompile the whole of FPC with such parameters.
 

Kostas

Re: Very slow Android executables
« Reply #4 on: June 10, 2012, 10:20:13 pm »
Quote
You need to recompile the whole of FPC with such parameters.

Nice :)
If I understood correctly, the units provided for Android are special precompiled units for Android development.

It took me some days to get everything working, so I am really scared of recompiling the whole of FPC...

Anyway, I am not an FPC guru...
So how can I recompile the whole of FPC with the right files needed for Android?

Kostas

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: Very slow Android executables
« Reply #5 on: June 11, 2012, 09:46:07 am »
You should know that not all devices have a hardware FPU, so if you use the hardware FPU your software will not run on many phones. It won't run on mine, for example, which is an HTC Wildfire.

As for how to build FPC for Android, there are complete instructions here: http://wiki.lazarus.freepascal.org/Custom_Drawn_Interface/Android#Building_the_compiler_yourself_in_Windows

felipemdc

Re: Very slow Android executables
« Reply #6 on: June 11, 2012, 09:56:29 am »
Quote
So here are my results:
1. a) The native ARM executable needed around 8(!!) seconds to calculate 10000 evaluations.
    b) The Object Pascal to JavaScript compiled version of the program, running on Opera Mobile, needed around 1(!!) second!
8 times(!) slower is a very big difference for a natively compiled program, even if the JavaScript is JIT-compiled well in the browser.

I work at Opera Software in the Mobile department, so I know that Opera is extremely well optimized by dozens of great developers. The JIT can detect whether the phone has an FPU and will act accordingly.

Opera Mobile itself is a native application, not a Java app, so you can be 100% sure that native apps can be much faster than Java or JavaScript, as the speed of Opera Mobile is amazing. They do know a trick that allows putting 2 binaries into the APK, one for lower-end phones and another for higher-end phones, but I have no idea how they do this.

Quote
2. Drawing a stretched image in a Form needs around 10(!!) seconds...

Buffer any stretching into a separate TBitmap and draw that in the OnPaint event.
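
The buffering idea can be sketched independently of the LCL. What follows is a minimal Python sketch, not LCL code: `stretch_nearest` and `CachedStretch` are hypothetical names, the image is a plain list of pixel rows, and nearest-neighbour scaling stands in for whatever stretching the widgetset really does. The only point is that the expensive stretch runs when the target size changes, and every later paint reuses the buffered result.

```python
def stretch_nearest(pixels, new_w, new_h):
    """Nearest-neighbour stretch of a 2D list of pixels to new_w x new_h."""
    old_h = len(pixels)
    old_w = len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


class CachedStretch:
    """Keeps the last stretched copy so repeated paints do not redo the work."""

    def __init__(self, pixels):
        self.pixels = pixels
        self._size = None       # (width, height) of the buffered copy
        self._buffer = None     # the buffered stretched image
        self.stretch_count = 0  # how often the expensive path actually ran

    def on_paint(self, width, height):
        # Only re-stretch when the requested size differs from the buffered one.
        if self._size != (width, height):
            self._buffer = stretch_nearest(self.pixels, width, height)
            self._size = (width, height)
            self.stretch_count += 1
        return self._buffer


if __name__ == "__main__":
    view = CachedStretch([[1, 2], [3, 4]])
    for _ in range(100):       # 100 repaints at the same size
        view.on_paint(4, 4)
    print(view.stretch_count)  # counts how many real stretches happened
```

In an LCL program the buffer would presumably be the separate TBitmap mentioned above, refreshed when the form is resized rather than on every paint.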

Opera Software is a large company with hundreds of developers. Lazarus for Android is mostly me coding for bounties. If other people send patches, things can get much more optimized. In particular, the stretching code needs specialized routines for the most common pixel formats. Right now there is only the generic solution that I wrote, which works in any pixel format, but if you know the pixel format and optimize for it, the speed can grow dramatically. I already did this optimization for bitmap copying/drawing, so that part is tens of times faster than it used to be. But my time is limited and my to-do list is huge, so optimizing stretching is not a priority item on my list.

Kostas

Re: Very slow Android executables
« Reply #7 on: June 11, 2012, 12:12:22 pm »
Quote
I work at Opera Software in the Mobile department, so I know that Opera is extremely well optimized by dozens of great developers. The JIT can detect whether the phone has an FPU and will act accordingly.

So according to your answer, it is not possible for FPC to compile in such a way that the executable can check at startup whether the FPU is there or not.
What about the other optimizations?

Quote
Opera Mobile itself is a native application, not a Java app, so you can be 100% sure that native apps can be much faster than Java or JavaScript, as the speed of Opera Mobile is amazing. They do know a trick that allows putting 2 binaries into the APK, one for lower-end phones and another for higher-end phones, but I have no idea how they do this.

What I did was "compile"/convert the Object Pascal code of the engine to JavaScript code, which I then ran in Opera Mobile. The very good speed, in my opinion, comes from the very good JIT that Opera Mobile has!
The same JavaScript code running on Firefox Mobile is around 4(!) times slower than in Opera Mobile!
But even so, Firefox Mobile is 2 times faster than the native executable from FPC...
A well-optimized native executable should be, as you already said, faster than (or at least as fast as) any Java or JavaScript code.

Quote
Buffer any stretching into a separate TBitmap and draw that in the OnPaint event.
I will give it a try.

Quote
Opera Software is a large company with hundreds of developers. Lazarus for Android is mostly me coding for bounties. If other people send patches, things can get much more optimized. In particular, the stretching code needs specialized routines for the most common pixel formats. Right now there is only the generic solution that I wrote, which works in any pixel format, but if you know the pixel format and optimize for it, the speed can grow dramatically. I already did this optimization for bitmap copying/drawing, so that part is tens of times faster than it used to be. But my time is limited and my to-do list is huge, so optimizing stretching is not a priority item on my list.

I understand what you mean.
You are doing good work, but it is a lot of work, and I can understand you.

If you can tell me where and how to start optimizing some parts of the LCL (especially the stretching), maybe I could help a bit!
But without the use of the FPU, the speed improvements can never be perfect...
Is it possible to use the GPU for the drawing part of the LCL?

Thanks for your answers
Kostas

felipemdc

Re: Very slow Android executables
« Reply #8 on: June 11, 2012, 01:13:58 pm »
Quote
So according to your answer, it is not possible for FPC to compile in such a way that the executable can check at startup whether the FPU is there or not.

As far as I know it is not possible, although I would recommend asking on the fpc-pascal mailing list. On x86 FPC does an FPU check, but on ARM it doesn't. I don't know if this is possible at all on ARM, maybe not.
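
One way such a start-up check might be approached on Android: on 32-bit ARM Linux the kernel lists CPU capabilities in the `Features` line of /proc/cpuinfo. The Python sketch below only illustrates that idea. It is not something FPC does, `has_vfp` and `read_cpuinfo` are hypothetical helpers, and the sample text is made up for illustration rather than copied from a real device.

```python
def has_vfp(cpuinfo_text):
    """Return True if the Features line mentions a VFP floating-point unit."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features":
            flags = value.split()
            # "vfp", "vfpv3", "vfpv3d16", ... all indicate hardware floating point.
            return any(flag.startswith("vfp") for flag in flags)
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description; returns '' if the file is unavailable."""
    try:
        with open(path, "r") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # Illustrative sample only; real devices report different flag sets.
    sample = (
        "Processor : ARMv7 Processor rev 2 (v7l)\n"
        "Features  : swp half thumb fastmult vfp edsp neon vfpv3\n"
    )
    print(has_vfp(sample))
    print(has_vfp(read_cpuinfo()))
```

A native program could read the same file at start-up, but it would still need both a hardware-float and a soft-float code path to choose between, which is what the two-binaries approach provides.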

But you can do it like Opera Mobile does: 2 binaries in the same APK, one with FPU usage, the other without.

Or just ignore the phones without FPU.

Quote
What I did was "compile"/convert the Object Pascal code of the engine to JavaScript code, which I then ran in Opera Mobile. The very good speed, in my opinion, comes from the very good JIT that Opera Mobile has!
The same JavaScript code running on Firefox Mobile is around 4(!) times slower than in Opera Mobile!
But even so, Firefox Mobile is 2 times faster than the native executable from FPC...

Well, if you run it on my phone, an HTC Wildfire, I bet the results will be very different; the native code will probably be faster. The slower phones are exactly the ones without an FPU, so for me it makes no sense to compare speed on a high-end phone. Either you are supporting low-end phones, and then you should measure speed on them, or you are targeting high-end phones only, and then you should build your binary with FPU support.

Quote
If you can tell me where and how to start optimizing some parts of the LCL (especially the stretching), maybe I could help a bit!
But without the use of the FPU, the speed improvements can never be perfect...

Study this commit: http://svn.freepascal.org/cgi-bin/viewvc.cgi?view=rev&root=lazarus&revision=36576

The key thing is to optimize the routines in the unit lcl/lazcanvas.pas for the pixel format used on Android. Use an if to take the optimized path for that format, and fall back to the default generic code for other pixel formats.
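
The "fast path plus generic fallback" structure can be sketched as follows. This is a minimal Python sketch and not the code in lcl/lazcanvas.pas: the format names, `copy_generic` and `copy_pixels` are all hypothetical, and pixels are modelled as four bytes each.

```python
# Hypothetical pixel formats: the order of the four 8-bit channels in memory.
FORMATS = {
    "RGBA8888": "RGBA",
    "BGRA8888": "BGRA",
    "ARGB8888": "ARGB",
}


def copy_generic(src, src_format, dst_format):
    """Generic path: works for any pair of formats, one channel at a time."""
    src_order = FORMATS[src_format]
    dst_order = FORMATS[dst_format]
    out = bytearray(len(src))
    for i in range(0, len(src), 4):
        # Decode the pixel into named channels, then re-encode it.
        channels = {name: src[i + k] for k, name in enumerate(src_order)}
        for k, name in enumerate(dst_order):
            out[i + k] = channels[name]
    return bytes(out)


def copy_pixels(src, src_format, dst_format):
    """Use a fast path for the common case and fall back to the generic code."""
    if src_format == dst_format:
        # Fast path: identical layouts, so one bulk copy is enough.
        return bytes(src)
    return copy_generic(src, src_format, dst_format)


if __name__ == "__main__":
    pixels = bytes([10, 20, 30, 255])
    print(list(copy_pixels(pixels, "RGBA8888", "RGBA8888")))  # fast path
    print(list(copy_pixels(pixels, "RGBA8888", "BGRA8888")))  # generic path
```

The generic routine stays as the reference behaviour, so the fast path can be checked against it for the formats it claims to handle.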

Quote
Is it possible to use the GPU for the drawing part of the LCL?

No idea, but somehow I doubt that it would bring any speed improvement for 2D graphics like the LCL's.

Laksen

Re: Very slow Android executables
« Reply #9 on: June 11, 2012, 01:42:47 pm »
Couldn't you do your floating point calculations with fixed point instead?

Kostas

Re: Very slow Android executables
« Reply #10 on: June 11, 2012, 08:24:23 pm »
I downloaded the sources from SVN and did the following:

1. In the Build.sh file I replaced CROSSOPT="-CfSOFT" with CROSSOPT="-CfVFPV3".
2. I ran the build.bat file and rebuilt the cross compiler. I used FPC 32-bit 2.6.1 for the rebuild.
3. I copied the units directory from the output directory to the fpc\2.5.1\units directory.
4. I copied ppcrossarm.exe to the fpc\2.6.1\bin directory.
5. In Lazarus I edited the options of androidlcltest and added -CfVFPV3.
6. I recompiled the project.

But I still get the same error...
Any idea what I could have done wrong?

Quote
Couldn't you do your floating point calculations with fixed point instead?

This won't be easy.
There are floating-point operations for the calculation of a neural network...
Calculating an exp or tanh function with fixed-point arithmetic to a good accuracy is not easy, and I guess that I will end up at more or less the same speed as now.
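
For reference, one common way to approximate tanh in fixed point is a lookup table with linear interpolation. The Python sketch below assumes a Q16.16 format, a table step of 1/64 and a clamp at |x| = 4; all of these choices and all names are illustrative, and it makes no claim about the speed on a particular phone. Floats are used once to build the table, while the evaluation itself uses only integer shifts, masks, one multiply and additions.

```python
import math

FRAC_BITS = 16                 # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS
STEP_BITS = 10                 # table spacing is 2**10 / 2**16 = 1/64
X_MAX = 4 * ONE                # beyond |x| = 4 the result is clamped

# tanh at x = 0, 1/64, 2/64, ..., 4, stored as fixed-point integers.
TABLE = [round(math.tanh(i / 64.0) * ONE) for i in range((X_MAX >> STEP_BITS) + 1)]


def to_fixed(value):
    """Convert a float to Q16.16."""
    return round(value * ONE)


def to_float(fixed):
    """Convert Q16.16 back to a float (for checking only)."""
    return fixed / ONE


def tanh_fixed(x):
    """tanh for a Q16.16 argument, by table lookup with linear interpolation."""
    sign = -1 if x < 0 else 1
    x = abs(x)                               # tanh is odd: tanh(-x) = -tanh(x)
    if x >= X_MAX:
        return sign * TABLE[-1]              # saturate close to 1.0
    index = x >> STEP_BITS                   # which table interval
    frac = x & ((1 << STEP_BITS) - 1)        # position inside the interval
    lo, hi = TABLE[index], TABLE[index + 1]
    return sign * (lo + (((hi - lo) * frac) >> STEP_BITS))


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        approx = to_float(tanh_fixed(to_fixed(x)))
        print(x, approx, math.tanh(x))
```

With these parameters the result stays within 1e-3 of `math.tanh`, and the largest error comes from the clamp rather than from the interpolation. A finer table or a wider range trades memory for accuracy; whether that is accurate enough for a given neural network would have to be tested.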

Best regards
Kostas


felipemdc

Re: Very slow Android executables
« Reply #11 on: June 11, 2012, 09:35:39 pm »
Quote
But I still get the same error...
Any idea what I could have done wrong?

Look at the file fpc.cfg and see in which directories it is looking for the units. If that is correct, try compiling with -va and read the output; it will show all info (really immense amounts of info), including where it searches for units and why it rejected a particular object file.

marcov

Re: Very slow Android executables
« Reply #12 on: June 11, 2012, 10:33:36 pm »

Quote
3. I copied from the output directory the units directory to fpc\2.5.1\units directory

2.5.1 here

Quote
4. I copied the ppcrossarm.exe to the fpc\2.6.1\bin directory

and 2.6.1 here? Is that logical?

 

felipemdc

Re: Very slow Android executables
« Reply #13 on: June 11, 2012, 11:30:48 pm »
Quote
and 2.6.1 here? Is that logical?

Yes, it is correct. If it is not in the same directory, how will fpc.exe find ppcrossarm.exe? But one has to make sure that fpc.cfg searches for the .ppu and .o files using the $fpcversion macro.
« Last Edit: June 11, 2012, 11:45:54 pm by felipemdc »

felipemdc

Re: Very slow Android executables
« Reply #14 on: June 20, 2012, 08:24:46 am »
I found some interesting info here while browsing the web for JNI tips:

http://www.moodstocks.com/2012/03/20/ice-cream-sandwich-why-native-code-support-sucks/

From reading this I think that just placing the 2 libs in these directories should make it work (although sometimes it might choose the wrong one anyway):

/lib/armeabi-v7a/libfoo.so
/lib/armeabi/libfoo.so

Extremely simple! Just make 2 build modes in Lazarus and, for each, use the appropriate flags and output directory =) Then you get 2 binaries: one for ARMv6 without FPU and another for ARMv7 with FPU.
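
The choice that is expected to happen between those two directories can be modelled in a few lines. This is a simplified Python sketch, not Android's actual installer logic: `pick_library` is a hypothetical function, and the ABI preference lists are assumptions about what a device reports. As noted above, real devices sometimes pick the wrong library anyway.

```python
def pick_library(apk_entries, device_abis, lib_name="libfoo.so"):
    """Return the path of the first library matching the device's ABI preference.

    device_abis is ordered from most to least preferred, for example
    ["armeabi-v7a", "armeabi"] for an ARMv7 phone and ["armeabi"] for an older one.
    """
    entries = set(apk_entries)
    for abi in device_abis:
        candidate = "lib/{}/{}".format(abi, lib_name)
        if candidate in entries:
            return candidate
    return None  # no library in the package matches this device


if __name__ == "__main__":
    apk = ["lib/armeabi-v7a/libfoo.so", "lib/armeabi/libfoo.so"]
    print(pick_library(apk, ["armeabi-v7a", "armeabi"]))  # phone with FPU
    print(pick_library(apk, ["armeabi"]))                 # phone without FPU
```

Under this model the hardware-float build goes to lib/armeabi-v7a and the soft-float build to lib/armeabi, one per Lazarus build mode.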

 
