Recent

Author Topic: [SOLVED] Why can't I beat Python on generating 1 Billion Rows?  (Read 5996 times)

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Hey Y'All,

While doing preparations for the 1 Billion Row Challenge in Object Pascal, mainly the generator program, I'm finding myself running quite a bit longer than the Python script.

I've done some optimisation with regards of the output of the progress bar to STDOUT, but now I'm stumped at what more I need to do...

Thought the random function was it, but it seams that's not the case.

Can anyone shed some light here, or even do a profile of the current code?

Anything would be quite appreciated!! Thanks in advance!!

Cheers,
Gus
« Last Edit: March 03, 2024, 06:14:07 pm by Gustavo 'Gus' Carreno »
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9915
  • Debugger - SynEdit - and more
    • wiki
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #1 on: March 03, 2024, 10:29:00 am »
Try valgrind --tool=cachegrind  and kcachegrind on linux....

I haven't profiled it, but experience says: String operations

E.g.
Code: Pascal  [Select][+][-]
  1.     entry:= entry.Split(';')[0];
Generates an entire array (allocating memory for each string within), and then throws them away (Except for entry 0).
Find the first ";" (Pos()) and then SetLength(entry, ....)

Code: Pascal  [Select][+][-]
  1.   Result:= '[';
  2.   Result:= Result + StringOfChar('#', filled);
  3.   Result:= Result + StringOfChar('-', ALength - filled);
  4.   Result:= Result + Format('] %5.2f %% done.', [ percentDone ]);
The string "Result" needs to be resized over and over as content gets added => lots of mem re-alloc and moving string content.
Get the full size string first with SetLength. After SetLength it has a refcount of 1, so then use PChar access to fill in the content.

"Format" ... probably not the fastest...

Where do you set FStationNames.Capacity? Would a hashmap of some form be more efficient?

Thaddy

  • Hero Member
  • *****
  • Posts: 14400
  • Sensorship about opinions does not belong here.
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #2 on: March 03, 2024, 10:38:56 am »
Did you already compile in {$I- state?
Did you indeed change the random to simpler, faster one?
https://wiki.freepascal.org/Delphi_compatible_LCG_Random
?
« Last Edit: March 03, 2024, 10:54:08 am by Thaddy »
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #3 on: March 03, 2024, 10:56:46 am »
Hey Martin_fr,

Try valgrind --tool=cachegrind  and kcachegrind on linux....

I haven't profiled it, but experience says: String operations

Thanks for this!! I'm a complete ignorant on how to profile under Linux, and this is a good starting point !!

E.g.
Code: Pascal  [Select][+][-]
  1.     entry:= entry.Split(';')[0];
Generates an entire array (allocating memory for each string within), and then throws them away (Except for entry 0).
Find the first ";" (Pos()) and then SetLength(entry, ....)

Code: Pascal  [Select][+][-]
  1.   Result:= '[';
  2.   Result:= Result + StringOfChar('#', filled);
  3.   Result:= Result + StringOfChar('-', ALength - filled);
  4.   Result:= Result + Format('] %5.2f %% done.', [ percentDone ]);
The string "Result" needs to be resized over and over as content gets added => lots of mem re-alloc and moving string content.
Get the full size string first with SetLength. After SetLength it has a refcount of 1, so then use PChar access to fill in the content.

"Format" ... probably not the fastest...

Where do you set FStationNames.Capacity? Would a hashmap of some form be more efficient?

These are very good points to consider!! Thanks!!

Alas, this is not the main problem at all. The building of the weather station list is already quite fast, and doesn't impact the main thing: Spitting out 1 Billion rows.

The only issue I have with the weather station list is having it be the same under Linux and Windows, due to locales.
On this answer, JuhaManninen mentions that I can set .UseLocale:= False to solve this, I think...

The code for the progress bar is only called on every 10% done, so that frequency diminishes the bigger the row count.

The problem is really the loop that outputs the rows onto the file.

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Seenkao

  • Hero Member
  • *****
  • Posts: 550
    • New ZenGL.
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #4 on: March 03, 2024, 10:59:24 am »
Потому что Python использует код C/C++? И генерирует строки посредством представления строки в байтах? Где не надо ни каких манипуляций делать со строками, а можно сгенерировать строку и записать её в файл. Даже проверок ни каких не будет.

Google translate:
Because Python uses C/C++ code? And generates strings by representing the string in bytes? Where you don’t need to do any manipulations with strings, but you can generate a string and write it to a file. There won't even be any checks.
« Last Edit: March 03, 2024, 11:01:05 am by Seenkao »
Rus: Стремлюсь к созданию минимальных и достаточно быстрых приложений.

Eng: I strive to create applications that are minimal and reasonably fast.
Working on ZenGL

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #5 on: March 03, 2024, 11:05:03 am »
Hey Thaddy,

Did you already compile in {$I- state?

Ermmm, I dunno...
Could you explain it better to this less knowledgeable person on this matter?

Did you indeed change the random to simpler, faster one?
https://wiki.freepascal.org/Delphi_compatible_LCG_Random

I have not!!
I'll put that on the list to test next!!
Thanks !!!

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #6 on: March 03, 2024, 11:05:43 am »
If you look into the python file:
Code: Pascal  [Select][+][-]
  1. batch_size = 10000 # instead of writing line by line to file, process a batch of stations and put it to disk

Thats probably the sole reason why python is so much faster than your code. Individual calls to Write/Read are really slow, which is why yhou should buffer for large outputs. The python code does that, you don't

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9915
  • Debugger - SynEdit - and more
    • wiki
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #7 on: March 03, 2024, 11:05:49 am »
Hey Martin_fr,

Try valgrind --tool=cachegrind  and kcachegrind on linux....

I haven't profiled it, but experience says: String operations

Thanks for this!! I'm a complete ignorant on how to profile under Linux, and this is a good starting point !!

You need to compile your app with   -gv   (and have heaptrc off)
This uses a diff mem manager, so times slightly shift, but not to worry.
And if you can, use fpc with debug info, so you see if certain fpc functions are called often, and take much time.


Actually, IIRC, for basic profilnig use

valgrind --tool=callgrind  yourapp

It will run dead slow, so maybe just run it over 10000 entries.
Then a file starting with callgrind.... will have been created.

kcachegrind callgrind1234.out

And it will show you how much time was spent in each function...
« Last Edit: March 03, 2024, 11:08:52 am by Martin_fr »

domasz

  • Sr. Member
  • ****
  • Posts: 437
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #8 on: March 03, 2024, 11:06:42 am »
I looked at your code and the Python one and perhaps you loose some speed of classes? Procedural code could be quicker, I think.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #9 on: March 03, 2024, 11:07:31 am »
Hey Seenkao,

Потому что Python использует код C/C++? И генерирует строки посредством представления строки в байтах? Где не надо ни каких манипуляций делать со строками, а можно сгенерировать строку и записать её в файл. Даже проверок ни каких не будет.

Google translate:
Because Python uses C/C++ code? And generates strings by representing the string in bytes? Where you don’t need to do any manipulations with strings, but you can generate a string and write it to a file. There won't even be any checks.

Hummm, you do have a point!!

Maybe using byte arrays instead of strings could be an option... I'll have a go at that!!

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #10 on: March 03, 2024, 11:11:10 am »
Hey Warfley,

If you look into the python file:
Code: Pascal  [Select][+][-]
  1. batch_size = 10000 # instead of writing line by line to file, process a batch of stations and put it to disk

Thats probably the sole reason why python is so much faster than your code. Individual calls to Write/Read are really slow, which is why yhou should buffer for large outputs. The python code does that, you don't

That's not a bad point!!
Thanks !!

But allow me to ask a dumb question: By using a buffered write stream with a 20MiB capacity, am I not already dealing with that?

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #11 on: March 03, 2024, 11:16:18 am »
Hey Martin_fr,

Hey Martin_fr,

Try valgrind --tool=cachegrind  and kcachegrind on linux....

I haven't profiled it, but experience says: String operations

Thanks for this!! I'm a complete ignorant on how to profile under Linux, and this is a good starting point !!

You need to compile your app with   -gv   (and have heaptrc off)
This uses a diff mem manager, so times slightly shift, but not to worry.
And if you can, use fpc with debug info, so you see if certain fpc functions are called often, and take much time.


Actually, IIRC, for basic profilnig use

valgrind --tool=callgrind  yourapp

It will run dead slow, so maybe just run it over 10000 entries.
Then a file starting with callgrind.... will have been created.

kcachegrind callgrind1234.out

And it will show you how much time was spent in each function...

Cannot thank you enough for this!!! You DA MAN!!!

Will have a go at it once I get my head around the switches you're mentioning.
I'll have to be careful with what I set on Lazarus in terms of the compiling options. A thing I'm not that good at cuz I simply rely on Default, Debug and Release, all created by Lazarus with their own defaults.

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #12 on: March 03, 2024, 11:22:33 am »
Hey domasz,

I looked at your code and the Python one and perhaps you loose some speed of classes? Procedural code could be quicker, I think.

Quite a good point also!!
Many thanks!!!

I'll add a procedural only version to my tests.

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1125
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #13 on: March 03, 2024, 11:24:30 am »
Hey Y'all,

This is my shame:

From the Python script
Code: Bash  [Select][+][-]
  1. $ time ./bin/create_measurements.py 1_000_000_000
  2. Estimated max file size is:  27.9 GiB.
  3. True size is probably much smaller (around half).
  4. Building test data...
  5. [                                                  ] 0%
  6. Test data successfully written to /tmp/measurements.txt
  7. Actual file size:  14.8 GiB
  8. Elapsed time: 5 minutes 39 seconds
  9. Test data build complete.
  10.  
  11. real    5m39.953s
  12. user    5m26.479s
  13. sys    0m13.345s

From my code:
Code: Bash  [Select][+][-]
  1. $ time ./bin/generator -i data/weather_stations.csv -o /tmp/measurements-1_000_000_000.txt -n 1_000_000_000
  2. Input File : /home/gcarreno/Programming/1brc-ObjectPascal/data/weather_stations.csv
  3. Output File: /tmp/measurements-1_000_000_000.txt
  4. Line Count : 1,000,000,000
  5.  
  6. [##################################################] 100.00 % done.
  7.  
  8. real    61m8.893s
  9. user    24m1.417s
  10. sys    37m5.462s
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #14 on: March 03, 2024, 11:29:40 am »
But allow me to ask a dumb question: By using a buffered write stream with a 20MiB capacity, am I not already dealing with that?

Cheers,
Gus
Ah didn't the the WriteBufStream there. This target the problem, but 20 megs is way to much. Memory page size is 4k everything above that causes page faults when accessing the memory. On the other hand SSD Sector size is mostly around 64k, everything above that causes multiple writes internally anyway. So you should probably use either 4k or 64k as buffer size. But that shouldn't make that much of a difference.

A few other things I've noticed in the python code:
Code: Bash  [Select][+][-]
  1. random.choices(station_names_10k_max, k=batch_size)
This creates all the random numbers in one call to a C++ library. So if they use a faster rng algorithm, or have more optimized code, this can make a big difference.

Aside from that there is not much that could be optimized

 

TinyPortal © 2005-2018