I have to sort the 2 billion datapoints
That is called a "streaming sort" or "merge sort":
https://en.wikipedia.org/wiki/Merge_sort
Conceptually it is a loop like this:
1. Copy the 1 big file into 2 half-files, with some simple transformations.
2. Copy the 2 half-files back into 1 big file, with some simple transformations.
3. Check if the file became sorted.
4. If it did not, repeat the loop.
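The loop above can be sketched roughly like this (a "natural" merge sort on disk). This is an illustration, not a tuned implementation: it uses one integer per line in a text file, the file names are made up, and runs are materialized as Python lists for clarity where a real version would stream them.

```python
import heapq
import itertools

def split_runs(big, half1, half2):
    # Step 1: distribute ascending runs alternately into two half-files.
    with open(big) as src, open(half1, "w") as h1, open(half2, "w") as h2:
        targets = itertools.cycle([h1, h2])
        dst = next(targets)
        prev = None
        for line in src:
            x = int(line)
            if prev is not None and x < prev:  # run boundary: switch file
                dst = next(targets)
            dst.write(f"{x}\n")
            prev = x

def runs(path):
    # Yield maximal ascending runs from a file.
    run = []
    with open(path) as f:
        for line in f:
            x = int(line)
            if run and x < run[-1]:
                yield run
                run = []
            run.append(x)
    if run:
        yield run

def merge_halves(half1, half2, big):
    # Step 2: merge corresponding runs from the two half-files back.
    with open(big, "w") as out:
        pairs = itertools.zip_longest(runs(half1), runs(half2), fillvalue=[])
        for r1, r2 in pairs:
            for x in heapq.merge(r1, r2):
                out.write(f"{x}\n")

def is_sorted(path):
    # Step 3: one sequential read to check sortedness.
    prev = None
    with open(path) as f:
        for line in f:
            x = int(line)
            if prev is not None and x < prev:
                return False
            prev = x
    return True

def disk_merge_sort(big):
    # Step 4: repeat the split/merge loop until sorted.
    while not is_sorted(big):
        split_runs(big, big + ".h1", big + ".h2")
        merge_halves(big + ".h1", big + ".h2", big)
```

Each pass at least halves the number of runs, so even billions of datapoints need only a few dozen passes.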
It is not hard to combine those stages into one single procedure, but it is tedious and requires attention to detail.
However, the concept itself, split into small, easy-to-understand, non-intertwined functions, is as I outlined above.
Usually disk I/O speed is the bottleneck. In the scheme above you do 3 full readings and 2 full writings on every loop iteration: a read and a write for the split, a read and a write for the merge, and one more read for the sortedness check. By fusing those steps into one intertwined procedure you would have one reading and one writing per iteration, making it roughly twice as fast as the simplest implementation. You can start with the simplest one to see if it fits your needs in general. Then, if you need to, you can make it about twice as fast by writing one more complex, fused function instead of 3 simplistic ones. But maybe you will not need to.
Frankly, merge sort is taught in every school/course/textbook on computing. It is much simpler than QuickSort and is an equally fundamental idea in all of programming.
-------
Now, as far as possible, you had better do it using binary files with fixed-size elements.
Even DBF could be better than CSV (though I am not sure the DBF format would handle 2B rows).
There used to be special databases exactly for simple-structured but huge-volume data.
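To illustrate why fixed-size binary beats CSV here, a sketch with Python's struct module; the record layout (an int32 key plus a float64 value, 12 bytes) is invented for the example:

```python
import struct

# Hypothetical layout: int32 key + float64 value = 12 bytes per record,
# little-endian. Every record has exactly the same size, unlike CSV rows.
REC = struct.Struct("<id")

def write_records(path, records):
    # Sequential write of fixed-size records.
    with open(path, "wb") as f:
        for rec in records:
            f.write(REC.pack(*rec))

def read_record(path, index):
    # Fixed size means record i lives exactly at byte offset i * REC.size,
    # so you can seek straight to it without parsing anything before it.
    with open(path, "rb") as f:
        f.seek(index * REC.size)
        return REC.unpack(f.read(REC.size))
```

With CSV you cannot compute a record's byte offset at all; you have to scan for newlines.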
--------
There is a possible approach: store the data and the pointers to it separately, using something like Tokyo Cabinet (or its competitors, or successors):
https://dbmx.net/tkrzw/#overview
In such an approach you would only sort the "pointers", without copying the data itself.
It might be a good or a bad idea. Given that you cannot hold a lot of data in RAM:
- you would eliminate most of the streaming writes: you would only write "pointers", not dataframes. And in the best case you would be able to sort the pointers entirely in memory and then write them to disk only once;
- at the same time, you would do an equal amount of reads, but those would change from sequential reads to random reads (google: butterfly HDD test).
SSD-like disks have no penalty for random reads but are worn out by writes (google: Chia cryptocurrency); for those it looks like a good optimization.
HDD-like disks are not worn out by writes, but random access is much slower and wears out their head-positioning units; there it would be a bad idea.
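A sketch of that pointer approach, assuming a fixed-size binary record (an invented 12-byte int32 + float64 layout) and assuming the keys, but not the payloads, fit in RAM:

```python
import struct

REC = struct.Struct("<id")  # hypothetical record: int32 key + float64 payload

def build_pointer_index(path):
    # One sequential read: keep only (key, byte offset) pairs in RAM.
    index = []
    with open(path, "rb") as f:
        off = 0
        while chunk := f.read(REC.size):
            key, _ = REC.unpack(chunk)
            index.append((key, off))
            off += REC.size
    index.sort()                      # in-memory sort of the pointers only
    return [off for _, off in index]  # sorted "pointer" list

def rewrite_sorted(src, dst, offsets):
    # One pass of random reads + sequential writes (cheap on SSD,
    # expensive on HDD, as discussed above).
    with open(src, "rb") as s, open(dst, "wb") as d:
        for off in offsets:
            s.seek(off)
            d.write(s.read(REC.size))
```

If you only need the data in sorted order for later processing, you can even skip the rewrite and just iterate over the offsets.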
---
Whatever approach you choose, you may want to look into these:
https://github.com/d-mozulyov/CachedBuffers
Read the sources to learn ideas on strategies for reading/writing big amounts of data.
The same goes for the links mentioned in this discussion:
https://stackoverflow.com/questions/5639531/buffered-files-for-faster-disk-access
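The common idea behind those links, translated into a Python sketch: issue few large sequential reads and parse records from memory, instead of asking the OS for each tiny record. The 12-byte record layout here is invented for the example:

```python
import struct

REC = struct.Struct("<id")  # hypothetical layout: int32 + float64, 12 bytes

def records(path, chunk_records=1_000_000):
    # Read in large sequential chunks, then unpack records from memory;
    # the I/O cost is paid once per ~12 MB chunk instead of once per record.
    chunk_size = REC.size * chunk_records
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield from REC.iter_unpack(chunk)
```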