Recent

Author Topic: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!  (Read 13203 times)

kveroneau

  • Full Member
  • ***
  • Posts: 119
Okay, I am rather baffled, so I am assuming that I am creating this program completely wrong in FPC, and there's a bottleneck somewhere, as I would imagine compiled machine code would be 2x or even more faster than bytecode in Python...  So, here are both programs, can someone explain why the FPC code would be 1.5x slower and where I could optimize the code:

Code: Pascal  [Select][+][-]
  1. Program ChkBlock;
  2. {$M OBJPAS}
  3.  
  4. Uses classes, sysutils;
  5.  
  6. Var
  7.   blocking: TStringList;
  8.   ips: TStringList;
  9.   matches: TStringList;
  10.   i: longint;
  11.   st: TDateTime;
  12.  
  13. Begin
  14.   st := Time;
  15.   blocking := TStringList.Create;
  16.   ips := TStringList.Create;
  17.   matches := TStringList.Create;
  18.   blocking.LoadFromFile('blocking.txt');
  19.   ips.LoadFromFile('ips.txt');
  20.   WriteLn(ips.Count);
  21.   for i := 0 to ips.Count-1 do
  22.   begin
  23.     if (blocking.IndexOf(ips[i]) > -1) then
  24.     begin
  25.       Write('.');
  26.       matches.Append(ips[i]);
  27.     end;
  28.   end;
  29.   WriteLn(#10'Total matches: ', matches.Count);
  30.   matches.SaveToFile('matches.txt');
  31.   matches.Destroy;
  32.   ips.Destroy;
  33.   blocking.Destroy;
  34.   WriteLn('Started at ', TimeToStr(st));
  35.   WriteLn('Took ',Time-st);
  36. End.
  37.  

I also don't really know how to properly time code in Pascal, in Python, there's a timedelta object, as you'll see in the Python code example:

Code: Python  [Select][+][-]
  1. import datetime, sys
  2.  
  3. st=datetime.datetime.now()
  4. blocking = [line.split(',')[1] for line in open('BIG_blocktable_mapi-Updated.csv','r').readlines()]
  5. ips = open('ips.txt','r').readlines()
  6.  
  7. matches=[]
  8. for ip in ips:
  9.   if ip.replace('\n','') in blocking:
  10.     sys.stdout.write('.')
  11.     sys.stdout.flush()
  12.     matches.append(ip)
  13.  
  14. print "\nTotal matches: %s" % len(matches)
  15. open('matches.txt', 'w').writelines(matches)
  16. d=datetime.datetime.now()-st
  17. print "Took %s seconds." % d.seconds
  18.  

Would the standard output being slowing things down?  I plan on doing another benchmark in a bit without any output, besides that which displays the length of the operation.

Are there additional options I could provide to the FPC compiler which would optimize code like this?  I would really love to know what everybody else does in regards to Pascal/Delphi code optimizations.

Should I be creating an array instead of using a TStringList class for this?  Would that make the code run 5x quicker like compiled code should?

Also, oddly, running the same code using Cython(Python module compiled into machine code) runs the same loop at basically the same speed...

This speed concern won't deter me from using Pascal/Delphi/Lazarus for future projects, as I see other benefits of it other than shear speed.  Such as being compiled(less deps on targets), and being able to target more platforms than Python currently can.  I personally use a Linux PC for everything, and when it comes to distributing Python apps to other platforms... Well, I'm sure anybody who's use a scripting language knows the ordeals here.  Python is much larger, and shipping more than one app potentially means multiple Python installs on the target OS.  Where Pascal just depends on the core OS libraries, and is thus easier to ship.  Anyways, any insight on this issue would be great!

Phil

  • Hero Member
  • *****
  • Posts: 2737
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #1 on: February 14, 2016, 05:56:38 am »
I don't think you're doing anything wrong. In all 3 cases you cite (Pascal, CPython, compiled Python), I would guess the bulk of execution occurs in the runtime, not in your code, and the Python runtime is compiled C code, not bytecode, just as the Pascal runtime is compiled Pascal.

Is searching a Python list ("in blocking") faster than searching a Pascal TStringList ("IndexOf")? Well, maybe it is as written. You could try setting the blocking TStringList's Sorted property to True - that should speed up IndexOf with a very large list.

Minor suggestions:

- Call Free, which calls Destroy for you; don't call Destroy directly.

- SecondsBetween in the standard DateUtils unit is what you're looking for.

I generally reach for TStringList without thinking. Particularly useful with lists of key=value pairs or if you want to associate an object with each string in the list. TStringList is used throughout VCL/LCL, so it's probably worth reviewing its properties and methods.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #2 on: February 14, 2016, 06:34:57 am »
Use profiler to determine the bottleneck.

The two programs are certainly not similar, TStringList is rather complex instead of a simple list. Even the Add method (which is the same as Append) has a lot of things to consider. According to Python wiki, the in operator has linear complexity in case of list, so IndexOf performance should be ALMOST similar (again, IndexOf requires additional code to check for Sorted property which could give noticable slowdown in a tight loop).

Something else like Classes.TFPList or gvector.TVector might be better for this since they don't have complex implementation, while the LoadFromFile in TStringList might be better replaced by LoadFromFile in TMemoryStream. Tested with 5.6 MB file, on my system TStringList.LoadFromFile takes 0.032 seconds, while TMemoryStream.LoadFromFile only takes 0.002 seconds, that's a 16 times difference. Retested with a 111 MB file, it's 0.696 vs 0.045, which is roughly 15 times difference, so the difference is actually proportional and consistent.

Code optimization in a high level language but still lower than scripting language requires thorough understanding of what's available and how they work ;)

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #3 on: February 14, 2016, 08:49:47 am »
Make sure TStringList.Sorted is True.

OR

Try THashedStringList instead of TStringList.
« Last Edit: February 14, 2016, 08:59:13 am by engkin »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11452
  • FPC developer.
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #4 on: February 14, 2016, 01:04:30 pm »
There are multiple issues going on:

In general, TStringList is a convenience class, not a performance oriented class.

It also supports duplicates and is somewhat conservative with memory.

- as said for quick test sorted must be on, and it is default off.
- If it is on, however insertion will get slower when number of items >100000
- Python has in languages hashes etc that are not written in Python, giving it an edge. However probably it has some slowdowns too due to its typesystem.
- Using the "text" property for large strings uses multiples of the memory. Some of TStringList routines do this internally.

Some other options are at the wiki page, but a bit outdated, and the most important one, the delphi compatible generics.collections is not there.
« Last Edit: February 14, 2016, 01:07:24 pm by marcov »

balazsszekely

  • Guest
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #5 on: February 14, 2016, 04:48:34 pm »
Please try this, it should be faster:
Code: Pascal  [Select][+][-]
  1. Program ChkBlock;
  2. {$M OBJPAS}
  3.  
  4. uses classes, sysutils;
  5.  
  6. function FileToString(const AFile: String; var Str: String): Boolean;
  7. var
  8.   MS: TMemoryStream;
  9. begin
  10.   Result := False;
  11.   if not FileExists(AFile) then
  12.     Exit;
  13.   MS := TMemoryStream.Create;
  14.   try
  15.     MS.LoadFromFile(AFile);
  16.     if MS.Size > 0 then
  17.     begin
  18.       SetLength(Str, MS.Size div SizeOf(Char));
  19.       MS.ReadBuffer(Pointer(Str)^, MS.Size div SizeOf(Char));
  20.       Result := True;
  21.     end;
  22.   finally
  23.     MS.Free;
  24.   end;
  25. end;
  26.  
  27. function StringToFile(const AFile, Str: String): Boolean;
  28. var
  29.   MS: TMemoryStream;
  30. begin
  31.   Result := False;
  32.   MS := TMemoryStream.Create;
  33.   try
  34.     MS.Write(Pointer(Str)^, Length(Str) div SizeOf(Char));
  35.     MS.Position := 0;
  36.     MS.SaveToFile(AFile);
  37.     Result := FileExists(AFile);
  38.   finally
  39.     MS.Free;
  40.   end;
  41. end;
  42.  
  43. var
  44.   IPs: String;
  45.   Blocking: String;
  46.   Matches: String;
  47.   SL: TStringList;
  48.   Ellapsed: DWord;
  49.   P, I: Integer;
  50. begin
  51.   Ellapsed := GetTickCount;
  52.   IPs := '';
  53.   Blocking := '';
  54.   Matches := '';
  55.   FileToString('IPs.txt', IPs);
  56.   FileToString('Blocking.txt', Blocking);  
  57.   SL := TStringList.Create;
  58.   try
  59.     SL.Text := Blocking;
  60.     for I := 0 to SL.Count - 1 do
  61.     begin
  62.       P := Pos(SL[I], IPs);
  63.       if P > 0 then
  64.         Matches := Matches + Sl[I] + sLineBreak;
  65.     end;
  66.   finally
  67.     SL.Free;
  68.   end;
  69.   StringToFile('Matches.txt', Matches);
  70.   Ellapsed := GetTickCount - Ellapsed;
  71.   Writeln('Took ', Ellapsed, ' ms');
  72.   Readln;
  73. end.
  74.  

Cyrax

  • Hero Member
  • *****
  • Posts: 836
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #6 on: February 14, 2016, 04:58:57 pm »
Please try this, it should be faster:
Code: Pascal  [Select][+][-]
  1. Program ChkBlock;
  2. {$M OBJPAS}
  3.  
  4. uses classes, sysutils;
  5.  
  6. function FileToString(const AFile: String; var Str: String): Boolean;
  7. var
  8.   MS: TMemoryStream;
  9. begin
  10.   Result := False;
  11.   if not FileExists(AFile) then
  12.     Exit;
  13.   MS := TMemoryStream.Create;
  14.   try
  15.     MS.LoadFromFile(AFile);
  16.     if MS.Size > 0 then
  17.     begin
  18.       SetLength(Str, MS.Size div SizeOf(Char));
  19.       MS.ReadBuffer(Pointer(Str)^, MS.Size div SizeOf(Char));
  20.       Result := True;
  21.     end;
  22.   finally
  23.     MS.Free;
  24.   end;
  25. end;
  26.  
  27. function StringToFile(const AFile, Str: String): Boolean;
  28. var
  29.   MS: TMemoryStream;
  30. begin
  31.   Result := False;
  32.   MS := TMemoryStream.Create;
  33.   try
  34.     MS.Write(Pointer(Str)^, Length(Str) div SizeOf(Char));
  35.     MS.Position := 0;
  36.     MS.SaveToFile(AFile);
  37.     Result := FileExists(AFile);
  38.   finally
  39.     MS.Free;
  40.   end;
  41. end;
  42.  
  43. var
  44.   IPs: String;
  45.   Blocking: String;
  46.   Matches: String;
  47.   SL: TStringList;
  48.   Ellapsed: DWord;
  49.   P, I: Integer;
  50. begin
  51.   Ellapsed := GetTickCount;
  52.   IPs := '';
  53.   Blocking := '';
  54.   Matches := '';
  55.   FileToString('IPs.txt', IPs);
  56.   FileToString('Blocking.txt', Blocking);  
  57.   SL := TStringList.Create;
  58.   try
  59.     SL.Text := Blocking;
  60.     for I := 0 to SL.Count - 1 do
  61.     begin
  62.       P := Pos(SL[I], IPs);
  63.       if P > 0 then
  64.         Matches := Matches + Sl[I] + sLineBreak;
  65.     end;
  66.   finally
  67.     SL.Free;
  68.   end;
  69.   StringToFile('Matches.txt', Matches);
  70.   Ellapsed := GetTickCount - Ellapsed;
  71.   Writeln('Took ', Ellapsed, ' ms');
  72.   Readln;
  73. end.
  74.  

Should {$M OBJPAS} to be {$M OBJFPC}{$H+} (enable ansistring for bigger string sizes instead of 256 chars) ?

MS.ReadBuffer(Pointer(Str)^, MS.Size div SizeOf(Char)); should be MS.ReadBuffer(Str[1], MS.Size div SizeOf(Char));.

balazsszekely

  • Guest
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #7 on: February 14, 2016, 05:40:25 pm »
Quote
@Cyrax
Should {$M OBJPAS} to be {$M OBJFPC}{$H+} (enable ansistring for bigger string sizes instead of 256 chars) ?
String must be larger then 256 chars in this mode, because I loaded files bigger then 20Mb without problems.

Quote
@Cyrax
MS.ReadBuffer(Pointer(Str)^, MS.Size div SizeOf(Char)); should be MS.ReadBuffer(Str[1], MS.Size div SizeOf(Char));.
Those are identical. Yours look prettier though.  :D

FTurtle

  • Sr. Member
  • ****
  • Posts: 292
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #8 on: February 14, 2016, 06:07:41 pm »
Should {$M OBJPAS} to be {$M OBJFPC}{$H+} (enable ansistring for bigger string sizes instead of 256 chars) ?

The maximum string length is only one side.
The other side is copying of strings content.

Quote
@Cyrax
Should {$M OBJPAS} to be {$M OBJFPC}{$H+} (enable ansistring for bigger string sizes instead of 256 chars) ?
String must be larger then 256 chars in this mode, because I loaded files bigger then 20Mb without problems.

Why not?
classesh.inc begins from {$H+}
Do you work with separate strings longer then 255 in your module?


engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #9 on: February 14, 2016, 06:26:46 pm »
FileToString is using two buffers: one for TMemorySteam and another one for Str. I think that is redundant, copying to TMemoryStream and then copying the same data again to Str. Why not cancel the middle man (TMemorySteam):
Code: Pascal  [Select][+][-]
  1. function FileToString(const AFile: String; var Str: String): Boolean;
  2. var
  3.   f: file;
  4.   Size: Int64;
  5. begin
  6.   Result := False;
  7.   if not FileExists(AFile) then
  8.     Exit;
  9.   AssignFile(f, AFile);
  10.   Reset(f, 1);
  11.   Size := FileSize(f);
  12.   try
  13.     if Size > 0 then
  14.     begin
  15.       SetLength(Str, Size div SizeOf(Char));
  16.       BlockRead(f, Str[1], Size div SizeOf(Char));
  17.       Result := True;
  18.     end;
  19.   finally
  20.     CloseFile(f);
  21.   end;
  22. end;
  23.  

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #10 on: February 14, 2016, 06:33:31 pm »
FileToString is using two buffers: one for TMemorySteam and another one for Str. I think that is redundant, copying to TMemoryStream and then copying the same data again to Str. Why not cancel the middle man (TMemorySteam):
Code: Pascal  [Select][+][-]
  1. function FileToString(const AFile: String; var Str: String): Boolean;
  2. var
  3.   f: file;
  4.   Size: Int64;
  5. begin
  6.   Result := False;
  7.   if not FileExists(AFile) then
  8.     Exit;
  9.   AssignFile(f, AFile);
  10.   Reset(f, 1);
  11.   Size := FileSize(f);
  12.   try
  13.     if Size > 0 then
  14.     begin
  15.       SetLength(Str, Size div SizeOf(Char));
  16.       BlockRead(f, Str[1], Size div SizeOf(Char));
  17.       Result := True;
  18.     end;
  19.   finally
  20.     CloseFile(f);
  21.   end;
  22. end;
  23.  
IMHO, that's fine. There's no read function that reads a whole file into memory in one go, TMemoryStream.LoadFromFile has such an implementation and is lightweight because there's no further processing done. Copying to string allows the data to be manipulated as string, as opposed to plain bytes. TMemoryStream.Memory can also be used directly to reduce memory usage but requires careful handling (it has no protection at all that plain string has).

balazsszekely

  • Guest
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #11 on: February 14, 2016, 06:46:00 pm »
@engkin, @Leledumbo

In my tests,  the OP's original code runs in 50411 ms. The optimized code 1238 ms. If I replace FileToString with @engkin's version, the time is 1242, basically the same.
The bottleneck is still the TStringList, especially this line:
Code: Pascal  [Select][+][-]
  1. SL.Text := Blocking;

We need a fast code, which loads a text file into an array of string.

PS: I did try to split the string "Blocking"  with a repeat, using Pos, this way avoiding a TStringList. Believe it or not it's incredibly slow, slower then SL.Text := Blocking; . I suppose because of the large size of the string.
« Last Edit: February 14, 2016, 06:49:59 pm by GetMem »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #12 on: February 14, 2016, 06:52:14 pm »
There's no read function that reads a whole file into memory in one go, TMemoryStream.LoadFromFile has such an implementation and is lightweight because there's no further processing done.

I am not sure what you mean. On Windows at least, both TMemoryStream.LoadFromFile and BlockRead use the same OS function readfile.

Copying to string allows the data to be manipulated as string, as opposed to plain bytes.

Is it needed in our specific example above?

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #13 on: February 14, 2016, 06:57:06 pm »
I am not sure what you mean. On Windows at least, both TMemoryStream.LoadFromFile and BlockRead use the same OS function readfile.
I mean, at Pascal level.
Copying to string allows the data to be manipulated as string, as opposed to plain bytes.
Is it needed in our specific example above?
GetMem solution relies on TStringList.Text for splitting by new line, so yes it's required.

Plain good old Pascal might help:
Code: Pascal  [Select][+][-]
  1. const
  2.   fn = 'name of the file';
  3. var
  4.   f: File;
  5.   fs: Int64;
  6. begin
  7.   fs := FileSizeUtf8(fn);
  8.   AssignFile(f,fn);
  9.   Reset(f,fs);
  10.   SetLength(s,fn);
  11.   BlockRead(f,s[1],1);
  12.   CloseFile(f);
  13.  
This is just as fast TMemoryStream.LoadFromFile (faster actually, but you need a very big file to see the difference), but uses single buffer (s). I have no idea yet how to convert it to array of string efficiently. I think it's better to further process s into a THashSet<String,SomeFastHashFunction> instance, at least Insert & Contains method complexity is guaranteed to be O(1), so only SomeFastHashFunction left to optimize, which should be fairly easy.
« Last Edit: February 14, 2016, 07:16:44 pm by Leledumbo »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: What am I doing wrong? Similar FPC code runs 1.5x slower than CPython?!?!
« Reply #14 on: February 14, 2016, 07:01:52 pm »
The bottleneck is still the TStringList, especially this line:
Code: Pascal  [Select][+][-]
  1. SL.Text := Blocking;

We need a fast code, which loads a text file into an array of string.

Or a fast code to build array of location and length of each line.

 

TinyPortal © 2005-2018