Recent

Author Topic: [SOLVED] Why can't I beat Python on generating 1 Billion Rows?  (Read 7987 times)

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1179
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #15 on: March 03, 2024, 11:37:33 am »
Hey Martin_fr,

Are the settings on the image OK, for the valgrind build mode?

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1179
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #16 on: March 03, 2024, 11:43:48 am »
Hey Warfley,

But allow me to ask a dumb question: By using a buffered write stream with a 20MiB capacity, am I not already dealing with that?

Cheers,
Gus
Ah didn't the the WriteBufStream there. This target the problem, but 20 megs is way to much. Memory page size is 4k everything above that causes page faults when accessing the memory. On the other hand SSD Sector size is mostly around 64k, everything above that causes multiple writes internally anyway. So you should probably use either 4k or 64k as buffer size. But that shouldn't make that much of a difference.

This is gold !!
Thanks !!
Will have a go with both 4k and 64k to see if I get anything good.

A few other things I've noticed in the python code:
Code: Bash  [Select][+][-]
  1. random.choices(station_names_10k_max, k=batch_size)
This creates all the random numbers in one call to a C++ library. So if they use a faster rng algorithm, or have more optimized code, this can make a big difference.

Aside from that there is not much that could be optimized

Humm, yeah, that makes a ton of sense...

One thing I'm not limiting is the amount of weather stations on the list.
And the list is about 40K entries.
Maybe I should make that limitation in order to have Apples to Apples comparison. Or is this irrelevant since a TStringList being accessed by an index is quite fast?

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10914
  • Debugger - SynEdit - and more
    • wiki
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #17 on: March 03, 2024, 11:45:10 am »
Settings should be ok. Make sure packages have debug info too. (either in each packages options, or through "additions and overrides", or for most packages through the IDE build config)

Warfley

  • Hero Member
  • *****
  • Posts: 1870
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #18 on: March 03, 2024, 11:50:21 am »
Well StringLists aren't the fastest thing anyway, because of the double indirection (you first look up the string pointer and then have to follow that to the string data), which possibly causes 2 page faults.

That said, the python code is also just using python lists, which has the same problem plus additionally dynamic typing checks. So while you could optimize that with using a static array of shortstrings, it should not be something where python has an advantage.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10914
  • Debugger - SynEdit - and more
    • wiki
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #19 on: March 03, 2024, 11:51:55 am »
Quote
Maybe I should make that limitation in order to have Apples to Apples comparison. Or is this irrelevant since a TStringList being accessed by an index is quite fast?

The index access is fast. Really fast. But make sure, you start with "Stringlist.Capacity := 40000;" or whatever you need.

Of course if you strings end up scattered over many memory pages, then they can cause extra page faults. But that is more about large amount of data, than a Stringlist.

If you want every last microsecond... Use pchars, saves the time of refcounting. But adds plenty of work.


Mind that an "IndexOf(name)" is slow.

Ok you can use a sorted list the it is O(log n) but then you pay the cost for sorting.

If you need to find a list by name quite often: use a hash map.
« Last Edit: March 03, 2024, 11:54:19 am by Martin_fr »

alpine

  • Hero Member
  • *****
  • Posts: 1373
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #20 on: March 03, 2024, 11:58:08 am »
@Gustavo 'Gus' Carreno
Set the capacity of TStringList at the beginning.
IMO, your way of eliminating duplicates isn't an efficient one.
Keeping the list sorted will continuously move big chunks of memory to make room for add(). Maybe using of a hash map for checking duplicates and simply append to the end of the list? Sorting just once at the end (if needed)? 

In addition you can use the whole numbers Random() for temperature instead of extended one, and use it as a fixed(1) type. Not much of a gain here, though.
« Last Edit: March 03, 2024, 12:12:22 pm by alpine »
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

Thaddy

  • Hero Member
  • *****
  • Posts: 16653
  • Kallstadt seems a good place to evict Trump to.
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #21 on: March 03, 2024, 12:17:33 pm »
Could you explain it better to this less knowledgeable person on this matter?
{$I-} prevents IO checks on every write/writeln and can speed up console / terminal output considerably.
You can still check for errors, but there is no need for expensive try/except.

You could also try FPC's LLVM backend.
« Last Edit: March 03, 2024, 12:24:28 pm by Thaddy »
But I am sure they don't want the Trumps back...

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12110
  • FPC developer.
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #22 on: March 03, 2024, 12:51:53 pm »
or using settextbuf() on the output file descriptor ?

domasz

  • Hero Member
  • *****
  • Posts: 566
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #23 on: March 03, 2024, 12:56:07 pm »
A few ideas- perhaps would be better to not operate on floats?
When reading 12.3  just skip the dot and read 123 as an integer.
FloatToStr/IntToStr instead of Format.
Maybe you update progressbar more often than the Python script?

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1179
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #24 on: March 03, 2024, 01:29:42 pm »
Hey Martin_fr,

Settings should be ok. Make sure packages have debug info too. (either in each packages options, or through "additions and overrides", or for most packages through the IDE build config)

I was able to get valgrind to work. But then I could not use Brendan Greg's FlameGraph to work with valgrind.

I was able to get perf to work with FlameGraph and I'm posting the image on this post, because I'm too stooopid to get any insight from it, so maybe someone can give me some help...

Nonetheless, between you and ChatGPT, I was able to produce a flame graph, so thank you!!!

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1179
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #25 on: March 03, 2024, 01:44:13 pm »
Hey Martin_fr,

Quote
Maybe I should make that limitation in order to have Apples to Apples comparison. Or is this irrelevant since a TStringList being accessed by an index is quite fast?

The index access is fast. Really fast. But make sure, you start with "Stringlist.Capacity := 40000;" or whatever you need.

This is good, very good, cuz I don't access the string list by the name of the weather station, I do it via a randomly generated index.

Of course if you strings end up scattered over many memory pages, then they can cause extra page faults. But that is more about large amount of data, than a Stringlist.

Also good to know !!

If you want every last microsecond... Use pchars, saves the time of refcounting. But adds plenty of work.


Mind that an "IndexOf(name)" is slow.

Ok you can use a sorted list the it is O(log n) but then you pay the cost for sorting.

If you need to find a list by name quite often: use a hash map.

I think I'm OK with setting a capacity of 50,000 and have .UseLocale:=False.
The building of the weather station list is about 2 to 3 seconds long, and when generating 1 Billion rows, that's pretty insignificant.

The bottleneck is somewhere else. Hopefully someone can give me more insight from the flame graph...

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Warfley

  • Hero Member
  • *****
  • Posts: 1870
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #26 on: March 03, 2024, 01:45:20 pm »
I was able to get perf to work with FlameGraph and I'm posting the image on this post, because I'm too stooopid to get any insight from it, so maybe someone can give me some help...

Simple your code is a call graph bottom to top. So you can see your code spends most of it's time doing a syscall, which is the write syscall. So most of the time is spent writing to the filesystem. Thats pretty much it.

BeniBela

  • Hero Member
  • *****
  • Posts: 922
    • homepage
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #27 on: March 03, 2024, 01:48:40 pm »
>outputFileStream.WriteBuffer(line[1], Length(line));


You create an outputBufWriter, but do not use it

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1179
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #28 on: March 03, 2024, 01:48:56 pm »
Hey Alpine,

@Gustavo 'Gus' Carreno
Set the capacity of TStringList at the beginning.
IMO, your way of eliminating duplicates isn't an efficient one.
Keeping the list sorted will continuously move big chunks of memory to make room for add(). Maybe using of a hash map for checking duplicates and simply append to the end of the list? Sorting just once at the end (if needed)? 

Like I mentioned to @Martin_fr, the building of the list is rather negligible when compared with the real bottleneck: Spitting out 1 Billion rows.

My only worry is that I have a consistent random sequence for index and temperature, and that the order is preserved in the names between Linux and Windows.

In addition you can use the whole numbers Random() for temperature instead of extended one, and use it as a fixed(1) type. Not much of a gain here, though.

Yeah, I've been thinking about using Integers for a while... Still mulling that option!!

Thanks !!!

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1179
  • Professional amateur ;-P
Re: Why can't I beat Python on generating 1 Billion Rows?
« Reply #29 on: March 03, 2024, 01:52:33 pm »
Hey Benibela,

>outputFileStream.WriteBuffer(line[1], Length(line));


You create an outputBufWriter, but do not use it

ARGHH!!!! I'm so effing duuuuuuuumb !!!!!!!

This is such a stooooooopid mistake!!!

Thanks for mentioning this !!!!

I should go and flagelate myself for half an hour now!!!!  :-[

Ok, now new testing !!!

Thank you soooo much for this !!!!

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

 

TinyPortal © 2005-2018