Topic: [SOLVED] Speed-up massive file writing.  (Read 7021 times)

miki

  • New member
  • *
  • Posts: 7
« on: August 23, 2016, 12:27:10 am »
Hi everybody  :D

I'm writing a program with some tests and PoCs, in both FreePascal [3.0] and Java [1.8].

I have a function to write integer arrays into text files (each line is a number). The Java code:
Code: Java
    // Needs java.io.BufferedWriter, java.io.File, java.io.FileWriter and java.io.IOException.
    public void arrayToFile(int[] array, String fileName) throws IOException
    {
        // try-with-resources flushes and closes the writer even if a write fails
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(new File(fileName))))
        {
            for (int number : array)
            {
                writer.write(Integer.toString(number));
                writer.newLine();
            }
        }
    }

This code takes 700~800ms to save 10,000,000 integers, creating a 110MB file.

When writing this in FreePascal, I first came to this solution:
Code: Pascal
procedure arrayToFile(numbers: Array of Integer; fileName: String);
var
    list: TStrings;
    i, len: Integer;
begin
    list := TStringList.Create;
    try
        len := Length(numbers) - 1;
        // one list entry per number; SaveToFile adds the line endings
        for i := 0 to len do
        begin
            list.Add(IntToStr(numbers[i]));
        end;
        list.SaveToFile(fileName);
    finally
        list.Free;
    end;
end;

It took 2.2~2.8 seconds. Setting list.Capacity or list.Begin/EndUpdate had no measurable effect.
Searching for something better, I tried this approach:
Code: Pascal
procedure arrayToFile(numbers: Array of Integer; fileName: String);
var
    data: TMemoryStream;
    i, len: Integer;
    str: String;
begin
    data := TMemoryStream.Create;
    try
        len := Length(numbers) - 1;
        for i := 0 to len do
        begin
            str := IntToStr(numbers[i]) + LineEnding;
            // append the raw characters of the line to the in-memory stream
            data.Write(str[1], Length(str));
        end;
        data.Position := 0;
        data.SaveToFile(fileName);
    finally
        data.Free;
    end;
end;

I removed the sizeOf(char) part, because I expect this to be always 1, right?
This was better, 1.8~2.0 seconds. Still slower than Java.

Searching further, I also found this other thing:
Code: Pascal
procedure arrayToFile(numbers: Array of Integer; fileName: String);
var
    data: TMemoryStream;
    i, len: Integer;
    str, t: String;
begin
    data := TMemoryStream.Create;
    try
        len := Length(numbers) - 1;
        for i := 0 to len do
        begin
            // System.Str is qualified because the local variable "str" hides it
            System.Str(numbers[i], t);
            str := t + LineEnding;
            data.Write(str[1], Length(str));
        end;
        data.Position := 0;
        data.SaveToFile(fileName);
    finally
        data.Free;
    end;
end;

Faster!!! 1.0~1.2 seconds. Still slower than Java, but very close.

The question: any ideas on how to speed up this process? I'd like to make it at least as fast as Java, probably reaching my HDD bottleneck.

Regards  :)

PS - extra details:
Compiler options: -O4 -XX -CX -Xs
JVM options: -server -XX:CompileThreshold=2 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods
« Last Edit: August 23, 2016, 09:42:11 pm by miki »

Phil

  • Hero Member
  • *****
  • Posts: 2750
« Reply #1 on: August 23, 2016, 12:31:46 am »
Try doing a WriteLn to the file.

You'll need AssignFile on a TextFile, then Rewrite, then the WriteLn calls, finally CloseFile.

See FPC docs.
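
A minimal sketch of the sequence described above (AssignFile, Rewrite, the WriteLn calls, CloseFile), assuming FPC in objfpc mode. The program name, the procedure signature and the file name numbers.txt are illustrative and not taken from the thread:

```pascal
program WriteLnSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Writes one number per line using the classic text-file routines.
procedure ArrayToFile(const Numbers: array of Integer; const FileName: string);
var
  F: TextFile;
  I: Integer;
begin
  AssignFile(F, FileName);   // bind the file variable to a name
  Rewrite(F);                // create or truncate the file for writing
  try
    for I := 0 to High(Numbers) do
      WriteLn(F, Numbers[I]); // WriteLn converts the integer and adds the line ending
  finally
    CloseFile(F);            // flushes the text buffer and closes the file
  end;
end;

begin
  ArrayToFile([1, -2, 300], 'numbers.txt');
end.
```

Passing the array as const avoids copying it on each call, which may matter with millions of elements.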


miki

  • New member
  • *
  • Posts: 7
« Reply #2 on: August 23, 2016, 12:58:12 am »
Hi Phil, thanks for your fast answer. Today it's about speed  8)

Quote
You'll need AssignFile on a TextFile, then Rewrite, then the WriteLn calls, finally CloseFile.
See FPC docs.

Don't need to see any docs, I learnt to use those functions when I was 14 (maaaany years ago).
For some reason, I expected them to be slower. But I was wrong: 620ms (not the 900ms I first measured).

Anyways, I got the answer to my own question. Just by removing the string concatenation, things go much faster.
Code: Pascal
procedure arrayToFile(numbers: Array of Integer; fileName: String);
var
    data: TMemoryStream;
    i, len, lendl: Integer;
    str, lend: String;
begin
    data := TMemoryStream.Create;
    try
        len := Length(numbers) - 1;
        lend := LineEnding;
        lendl := Length(lend); // length of the line ending, not of the still-empty str
        for i := 0 to len do
        begin
            System.Str(numbers[i], str);
            // two writes instead of building a concatenated string
            data.Write(str[1], Length(str));
            data.Write(lend[1], lendl);
        end;
        data.Position := 0;
        data.SaveToFile(fileName);
    finally
        data.Free;
    end;
end;

Time: 660~680ms. I'm happy for today :)

Of course, the WriteLn approach has a great pro: smaller memory footprint. But 100MB aren't that much in modern computers.

Regards.

EDIT: running this on an SSD, the saving time is even a bit faster, about 600ms. So hey, the code is faster than the HDD.
« Last Edit: August 23, 2016, 08:44:26 pm by miki »

johnsson

  • New Member
  • *
  • Posts: 22
  • Lazarus Rocks
« Reply #3 on: August 23, 2016, 02:59:56 am »
Can I suggest a modification?

I'm really sorry, I didn't see the string conversion, so here is the corrected version.

Code: Pascal
procedure arrayToFile(var numbers: Array of Integer; fileName: String);
var
  Data: TMemoryStream;
  I: Integer;
  P, S: Pointer;
  V: string;
begin
  Data := TMemoryStream.Create;
  Data.SetSize(15 * Length(numbers));
  P := Data.Memory;
  S := P;
  for I := 0 to High(numbers) do
  begin
    System.Str(numbers[i], V);
    Move(V[1],P^,Length(V));
    P += Length(V);
    Move(LineEnding,P^,2);
    P += 2;
  end;
  Data.SetSize(P - S);
  Data.SaveToFile(fileName);
  Data.Free;
end;

Here I got a 1400ms execution time, but I'm running this on a laptop with a really poor CPU. The major cost is the conversion procedure.

 :D
« Last Edit: August 23, 2016, 06:38:13 am by johnsson »
Just a regular guy

Laksen

  • Hero Member
  • *****
  • Posts: 621
    • J-Software
« Reply #4 on: August 23, 2016, 03:29:45 am »
Try this. Much simpler and much faster
Code: Pascal
procedure arrayToFile3(const numbers: Array of longint; const fileName: String);
var
  i: longint;
  buf: array[0..65535] of char;
  f: Text;
begin
  AssignFile(f, fileName);
  Rewrite(f);

  // replace the small default text buffer with a 64 KB one before any output
  SetTextBuf(f, buf[0], sizeof(buf));

  for i := 0 to high(numbers) do
    writeln(f, numbers[i]);

  CloseFile(f);
end;

Phil

  • Hero Member
  • *****
  • Posts: 2750
« Reply #5 on: August 23, 2016, 03:36:41 am »
Quote
Try this. Much simpler and much faster

Yes, that's the way I would write it.

Question: Does SetTextBuf make much of a difference in performance? That was a common trick years ago, but I'm wondering if modern OS buffering reduces its effect.


miki

  • New member
  • *
  • Posts: 7
« Reply #6 on: August 23, 2016, 08:42:59 pm »
OK, people, summary (all tests on an SSD drive):

My last approach: 580~610ms
Phil's approach: 620~630ms
johnsson's approach: 460~480ms (the result is wrong)
Laksen's approach: 435ms (Phil's + SetTextBuf)

Notes:
Yes, it seems SetTextBuf makes a difference. I don't like the remarks in docs, but it's the fastest!
Phil's approach: faster now? Yes, I made a profiling error yesterday, sorry for that.
johnsson's: what is wrong? The line "Move(LineEnding...)" didn't compile for some reason. I guess we are using different OSes, and this constant may be a string for you and a char for me. I made a little change in the code to make it work, but introduced some mistake, because the final file is a bit bigger and has broken line breaks [get the joke? broken breaks]. Anyways, very fast, too.

Lots of thanks to all of you.

Regards.
« Last Edit: August 23, 2016, 08:45:21 pm by miki »

Phil

  • Hero Member
  • *****
  • Posts: 2750
« Reply #7 on: August 23, 2016, 08:56:50 pm »
Quote
Yes, it seems SetTextBuf makes a difference. I don't like the remarks in docs, but it's the fastest!

I would say the remarks are just common sense; Laksen's code is safe.

Good to know that SetTextBuf is still useful.


marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7508
« Reply #8 on: August 23, 2016, 09:19:14 pm »
By default the buffer is 128 bytes, for historical reasons.

So yes, it saves something. Setting it above 32 KB is never noticeable, and typically 8 KB is already enough.
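
One way to check this on a given machine is to time the same WriteLn loop with different buffer sizes. The following is only a rough sketch under stated assumptions: the line count, the file name textbuf_test.txt and the helper TimedWrite are made up for illustration, and a single run per size is far too noisy for serious conclusions.

```pascal
program TextBufSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  LineCount = 1000000; // hypothetical workload size

// Writes LineCount integers using a text buffer of BufSize bytes and
// returns the elapsed time in milliseconds.
function TimedWrite(const FileName: string; BufSize: SizeInt): QWord;
var
  F: TextFile;
  Buf: Pointer;
  I: Integer;
  Start: QWord;
begin
  GetMem(Buf, BufSize);
  try
    Start := GetTickCount64;
    AssignFile(F, FileName);
    Rewrite(F);
    // assign the buffer right after Rewrite, before any output happens
    SetTextBuf(F, Buf^, BufSize);
    try
      for I := 1 to LineCount do
        WriteLn(F, I);
    finally
      CloseFile(F); // flushes the buffer, so Buf must stay allocated until here
    end;
    Result := GetTickCount64 - Start;
  finally
    FreeMem(Buf);
  end;
end;

var
  Size: SizeInt;
begin
  // try 128, 512, 2048, 8192 and 32768 bytes
  Size := 128;
  while Size <= 65536 do
  begin
    WriteLn(Size:6, ' bytes: ', TimedWrite('textbuf_test.txt', Size), ' ms');
    Size := Size * 4;
  end;
  DeleteFile('textbuf_test.txt');
end.
```

The buffer is allocated on the heap here only so that its size can vary at run time; a local array, as in Laksen's code, works just as well for a fixed size.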

User137

  • Hero Member
  • *****
  • Posts: 1791
    • Nxpascal home
« Reply #9 on: August 24, 2016, 04:41:33 am »
Is there a typo in the docs? http://www.freepascal.org/docs-html/rtl/system/settextbuf.html

Quote
The maximum size of the newly assigned buffer is 65355 bytes.
Should be 65535? (2^16 - 1)
« Last Edit: August 24, 2016, 04:43:27 am by User137 »

johnsson

  • New Member
  • *
  • Posts: 22
  • Lazarus Rocks
« Reply #10 on: August 24, 2016, 05:58:55 am »
Quote
johnsson's: what is wrong? The line "Move(LineEnding...)" didn't compile for some reason. I guess we are using different OSes, and this constant may be a string for you and a char for me. I made a little change in the code to make it work, but introduced some mistake, because the final file is a bit bigger and has broken line breaks [get the joke? broken breaks]. Anyways, very fast, too.

I'm working with Win 10 64-bit and Lazarus 1.6; there, LineEnding is just the constant #13#10 in the System unit. By the way, here the SetTextBuf approach takes more time to execute than my approach. Also, I don't get any compiler error and the final file is correct. Maybe a few differences between our OSes and PC specs cause this.

PC Spec

Laptop Asus S400CA
Core I3 1.5ghz (2 Cores + 2 HT)
6GB DDR3 1600mhz
SSD 120gb PNY

Anyway, the most important thing is the better execution time compared to the Java version.

Free Pascal Rocks  8-)

A last comment: there are a few optimized routines that convert integers to strings using assembly and SIMD instructions; these would probably give a better execution time.

 :D
Just a regular guy

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7508
« Reply #11 on: August 24, 2016, 09:52:30 am »
Quote
Is there a typo in the docs? http://www.freepascal.org/docs-html/rtl/system/settextbuf.html

Quote
The maximum size of the newly assigned buffer is 65355 bytes.
Should be 65535? (2^16 - 1)

Maybe not, if a 16-bit value is used to hold the buffer size. My guess, however, is that this is an old TP leftover and FPC accepts larger ones. Delphi allows buffers of MBs and larger (not that it matters much).

miki

  • New member
  • *
  • Posts: 7
« Reply #12 on: August 24, 2016, 09:28:41 pm »
Quote
I'm working with Win 10 64-bit and Lazarus 1.6; there, LineEnding is just the constant #13#10 in the System unit. By the way, here the SetTextBuf approach takes more time to execute than my approach. Also, I don't get any compiler error and the final file is correct. Maybe a few differences between our OSes and PC specs cause this.

PC Spec

Laptop Asus S400CA
Core I3 1.5ghz (2 Cores + 2 HT)
6GB DDR3 1600mhz
SSD 120gb PNY


Similar computer here, but desktop, not laptop. Using Manjaro Linux 64bits. According to documentation, LineEnding is system dependent.
In my case (Linux), it's just a #10, so it may be seen as char, while your Windows' #13#10 is a string.

Anyways, the wrong file I was getting was my fault: a wrong adaptation of your code, which I realized later. Here is the correct one:
Code: Pascal
procedure arrayToFile(numbers: Array of Integer; fileName: String);
var
    Data: TMemoryStream;
    I, Ll: Integer;
    P, S: Pointer;
    V, L: string;
begin
    Data := TMemoryStream.Create;
    L := LineEnding; // <-- if it's a char, now it's a string
    Ll := Length(L); // <-- 1 on *NIX, 2 on Windows
    Data.SetSize((11 + Ll) * Length(numbers)); // <-- numbers are signed 32-bit integers, so never longer than 11 characters + LineEnding
    P := Data.Memory;
    S := P;
    for I := 0 to High(numbers) do
    begin
        System.Str(numbers[I], V);
        Move(V[1], P^, Length(V));
        P += Length(V);
        Move(L[1], P^, Ll); // copy the characters (L[1]), not the string variable itself
        P += Ll; // <--- my error was here, leaving your "2"; that's the wrong thing with magic numbers :)
    end;
    Data.SetSize(P - S);
    Data.SaveToFile(fileName);
    Data.Free;
end;

FreePascal faster than Java? Well, I expected so, but it seems I have to do some tricks to achieve that!

I did the reverse-way function (fileToArray), and to get a fast result, I had to write my own StrToInt. Special one, because it reads.

First attempt, with TStringList + StrToInt loop, 2.8s
Code: Pascal
// TIntArray is assumed to be declared elsewhere as: array of Integer
function fileToArray(fileName: String): TIntArray;
var
    list: TStrings;
    i, len: Integer;
begin
    list := TStringList.Create;
    try
        list.LoadFromFile(fileName);
        len := list.Count - 1;
        SetLength(result, len + 1);
        for i := 0 to len do
        begin
            result[i] := StrToInt(list[i]);
        end;
    finally
        list.Free;
    end;
end;

Second attempt, using AssignFile, ReadLn, ...: 1.2~1.6s. The magic number (the growth step of the result array, 10000000 here) matters a lot: a big one is faster. SetTextBuf is relevant, but not that much.
Code: Pascal
function fileToArray(fileName: String): TIntArray;
var
    F: TextFile;
    i, bufs, s: Integer;
    buf: array[0..65535] of char;
begin
    AssignFile(F, fileName);
    SetTextBuf(F, buf[0], sizeof(buf));
    Reset(F);
    // grow the result in steps of 10000000 entries
    i := 0; bufs := 1; s := bufs*10000000;
    SetLength(result, s);
    while not eof(F) do
    begin
        ReadLn(F, result[i]);
        Inc(i);
        if i = s then
        begin
            Inc(bufs);
            s := bufs*10000000;
            SetLength(result, s);
        end;
    end;
    CloseFile(F);
    SetLength(result, i); // trim to the number of values actually read
end;

And the winner, using a stream: fastest, but bigger and also with a larger memory footprint. 350ms!
Code: Pascal
function fileToArray(fileName: String): TIntArray;
var
    fs: TFileStream;
    i, bufs, s: Integer;
    n, num, sign: Integer;
    ss: string; c: char;
    skip: boolean;
begin
    // read the whole file into one string
    fs := TFileStream.Create(fileName, fmOpenRead);
    try
        SetLength(ss, fs.Size);
        fs.Read(ss[1], fs.Size);
    finally
        fs.Free;
    end;

    i := 0; bufs := 1; s := bufs*10000000;
    SetLength(result, s);

    // hand-written StrToInt: accumulate digits, store the number at the first non-digit
    num := 0; sign := 1; skip := true;
    for n := 1 to Length(ss) do
    begin
        c := ss[n];
        if c = '-' then
        begin
            skip := false;
            sign := -1;
        end
        else if c in ['0'..'9'] then
        begin
            skip := false;
            num := num*10 + (ord(c) - ord('0'));
        end
        else if not skip then
        begin
            skip := true;
            result[i] := num * sign;
            sign := 1; num := 0;
            Inc(i);
            if i = s then
            begin
                Inc(bufs);
                s := bufs*10000000;
                SetLength(result, s);
            end;
        end;
    end;
    // store a last number that has no trailing line ending
    if not skip then
    begin
        result[i] := num * sign;
        Inc(i);
    end;
    SetLength(result, i);
end;

The Java version is not that fast, but it still competes and is much cleaner. It takes about 900ms.
Code: Java
    // Needs java.io.BufferedReader, java.io.FileReader and java.io.IOException.
    public int[] fileToArray(String fileName) throws IOException
    {
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            // one number per line: parse each line and collect into an int[]
            int[] res = br.lines()
                     .mapToInt(Integer::parseInt)
                     .toArray();
            return res;
        }
    }

Comparisons are not fair, anyways. Java doesn't have to stay compatible with old code, Delphi and other stuff, and FreePascal doesn't have Java's budget.

What do you say?

Regards.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7508
« Reply #13 on: August 25, 2016, 11:18:58 am »

Quote
Comparisons are not fair, anyways. Java doesn't have to stay compatible with old code, Delphi and other stuff, and FreePascal doesn't have Java's budget.

What do you say?

And it is just one terribly small piece, aka microbenchmarking. Make an application that actually does something.

A task that maps (pun intended) nearly wholly onto a library or language feature will look shorter using that.


Leledumbo

  • Hero Member
  • *****
  • Posts: 8112
  • Programming + Glam Metal + Tae Kwon Do = Me
« Reply #14 on: August 25, 2016, 03:25:27 pm »
Quote
The Java version is not that fast, but it still competes and is much cleaner. It takes about 900ms.
It's not difficult to make a Pascal version that looks like the Java one. TReadBufStream is practically the counterpart of BufferedReader, as TFileStream is of FileReader. Their interfaces are different, though. TReadBufStream has no lines() method, but extending the class to add one is not difficult (just use array of String or TStrings for the return value): with the help of the StreamIO unit, just ReadLn until EOF. Likewise, mapToInt() can be implemented using a type helper (for array of String) or by extending TString(List) with such a method. There is no need for toArray(), as it's better for the method to directly return an array (of Integer) instead of another stream (of int).

Wanna try creating and benchmarking that version?
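
A rough sketch of what such a version could look like, assuming the bufstream and StreamIO units from the FCL. For brevity it folds lines() and mapToInt() into a single loop instead of extending the classes or adding a type helper; the names FileToArray, WriteSample and lines_sample.txt are made up for the example, and it has not been benchmarked:

```pascal
program LinesSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, bufstream, StreamIO;

type
  TIntArray = array of Integer;

// Reads one integer per line through a buffered stream that is attached
// to a text file variable, so plain ReadLn/EOF can be used on it.
function FileToArray(const FileName: string): TIntArray;
var
  Source: TFileStream;
  Buffered: TReadBufStream;
  F: TextFile;
  Line: string;
  Count: Integer;
begin
  Result := nil;
  Count := 0;
  Source := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Buffered := TReadBufStream.Create(Source, 64 * 1024);
    try
      AssignStream(F, Buffered); // StreamIO: let Text routines read from the stream
      Reset(F);
      try
        while not EOF(F) do
        begin
          ReadLn(F, Line);
          // grow geometrically to keep the number of reallocations small
          if Count = Length(Result) then
            SetLength(Result, Count * 2 + 1024);
          Result[Count] := StrToInt(Line); // raises EConvertError on bad input
          Inc(Count);
        end;
      finally
        CloseFile(F);
      end;
    finally
      Buffered.Free;
    end;
  finally
    Source.Free;
  end;
  SetLength(Result, Count); // trim to the number of values actually read
end;

// Hypothetical helper that creates a small input file for the demo.
procedure WriteSample(const FileName: string);
var
  F: TextFile;
begin
  AssignFile(F, FileName);
  Rewrite(F);
  WriteLn(F, 10);
  WriteLn(F, -20);
  WriteLn(F, 30);
  CloseFile(F);
end;

var
  Numbers: TIntArray;
begin
  WriteSample('lines_sample.txt');
  Numbers := FileToArray('lines_sample.txt');
  WriteLn('Read ', Length(Numbers), ' numbers');
  DeleteFile('lines_sample.txt');
end.
```

Whether this gets anywhere near the hand-written parser above would have to be measured; StrToInt and the per-line string allocation are likely to dominate.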