
Topic: Does anyone know of a library to provide huge arrays?

ad1mt

  • Sr. Member
  • Posts: 480
Does anyone know of a library to provide huge arrays?
« on: June 27, 2025, 07:54:58 pm »
Does anyone know of a library to provide huge arrays?
The array would be stored on disk, and the size of the array would only be limited by the available disk space.
Hopefully it would also be relatively easy to use, without my having to rewrite all my existing code.
Thanks.
« Last Edit: June 27, 2025, 08:26:38 pm by ad1mt »

Nicole

  • Hero Member
  • Posts: 1281
Re: Does anyone know of a library to provide huge arrays?
« Reply #1 on: June 27, 2025, 08:53:36 pm »
I'm not quite sure what you mean.
I work with dynamic arrays that are organized in blocks: every time a certain number of elements is exceeded, a new block is added. I wrote my own unit to manage these arrays. Besides adding new blocks, it also has to cut off the unused elements of the last, partly filled block once the array is complete.
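A minimal sketch of the block-organised idea described above, in plain Free Pascal. It is not Nicole's unit: the record TChunkedArray, the routines AddItem, GetItem and TrimLastBlock, and the tiny BlockSize are all hypothetical, chosen only to make the growth and the final trim visible.

```pascal
program ChunkedArrayDemo;
{$mode objfpc}{$H+}

const
  BlockSize = 4;  // hypothetical; real code would use thousands of elements per block

type
  TBlock = array of Double;

  { Hypothetical block-organised array: capacity grows one block at a time,
    so data already stored in existing blocks is never copied on growth. }
  TChunkedArray = record
    Blocks: array of TBlock;
    Count: Int64;
  end;

procedure AddItem(var A: TChunkedArray; const Value: Double);
var
  B: Int64;
begin
  B := A.Count div BlockSize;
  if B = Length(A.Blocks) then
  begin
    SetLength(A.Blocks, B + 1);          // append a new block ...
    SetLength(A.Blocks[B], BlockSize);   // ... and allocate its elements
  end;
  A.Blocks[B][A.Count mod BlockSize] := Value;
  Inc(A.Count);
end;

function GetItem(const A: TChunkedArray; Index: Int64): Double;
begin
  Result := A.Blocks[Index div BlockSize][Index mod BlockSize];
end;

{ Call once the array is complete: cuts the unused tail of the last block. }
procedure TrimLastBlock(var A: TChunkedArray);
begin
  if Length(A.Blocks) > 0 then
    SetLength(A.Blocks[High(A.Blocks)], A.Count - Int64(High(A.Blocks)) * BlockSize);
end;

var
  Arr: TChunkedArray;  // global, so Count starts at 0 and Blocks at nil
  I: Integer;
begin
  for I := 0 to 9 do
    AddItem(Arr, I * 1.5);
  TrimLastBlock(Arr);
  WriteLn('Count = ', Arr.Count, ', blocks = ', Length(Arr.Blocks),
          ', last block length = ', Length(Arr.Blocks[High(Arr.Blocks)]));
  WriteLn('Item 7 = ', GetItem(Arr, 7):0:1);
end.
```

This prints "Count = 10, blocks = 3, last block length = 2" and "Item 7 = 10.5". Since the elements live in separate blocks, growing never moves the stored data; the price is one extra division per access.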

Are you sure you wouldn't prefer to work with a database instead of arrays?

Perhaps you can describe your task and show some snippets of your declarations.

440bx

  • Hero Member
  • Posts: 5912
Re: Does anyone know of a library to provide huge arrays?
« Reply #2 on: June 27, 2025, 09:26:51 pm »
Quote from ad1mt: "Does anyone know of a library to provide huge arrays?"
Even for a very large array, you probably don't need a library.

Presuming you're doing this under Windows, the maximum size will be 16 TB minus 64 KB (it doesn't quite make it to 16 TB). That's quite a bit less than high(qword) which, at least in theory, is the maximum amount of memory VirtualAllocEx will allow you to allocate (I seriously doubt it will allow anything close to that).

Using VirtualAllocEx, you could allocate (MEM_COMMIT) a very large array (it might even allow more than 1 TB, but I've never tried it). The best part is that, since Windows is demand paged, the array is _automatically_ sparse. In other words, if all the elements you use lie within a 4 KB range, at most 2 pages of memory will be used, even if the amount committed is 1 TB (or more) and the elements in use are in the last 2 pages.

Succinctly, in 64-bit, the O/S already gives you huge arrays. How huge? That part I don't know, but really big! I suggest you do a little trial and error to find out.

Also, as Nicole implied, if you gave a more detailed description of what you're trying to do then a more "precise" suggestion might be possible.

Lastly, this doesn't work with heaps, because heap blocks are managed differently.

HTH.
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 11926
  • Debugger - SynEdit - and more
Re: Does anyone know of a library to provide huge arrays?
« Reply #3 on: June 27, 2025, 09:27:36 pm »
If you don't need a database, but really just a contiguous (i.e. no gaps) and flat list/array of entities (integers or records, but nothing managed like ansistrings or embedded dynamic arrays), then wouldn't a simple seekable file do?

"Flat" means no sub-arrays, only one level of index. However, if a sub-array has a fixed (small-ish) length, then it counts as a single entity.

Cross-platform, you could use some sort of stream (or FileOpen/FileRead directly) that lets you load "sections" of the file into memory.
Well, yes, you need to manage that yourself...
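A minimal sketch of the seekable-file idea, assuming a flat record type and a TFileStream from the Classes unit. The names TEntity, WriteEntity, ReadSection and the file name are hypothetical. Element i lives at byte offset i * SizeOf(TEntity), and a whole section is loaded into memory with a single read.

```pascal
program SeekableFileDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

type
  { Flat record: no strings or dynamic arrays inside, so it can be stored
    on disk as raw bytes. }
  TEntity = packed record
    Id: Int64;
    Value: Double;
  end;

const
  FileName = 'huge_array.dat';  // hypothetical file name

{ Element Index is stored at byte offset Index * SizeOf(TEntity). }
procedure WriteEntity(S: TStream; Index: Int64; const E: TEntity);
begin
  S.Position := Index * SizeOf(TEntity);
  S.WriteBuffer(E, SizeOf(E));
end;

{ Load Length(Buf) consecutive elements, starting at First, in one read. }
procedure ReadSection(S: TStream; First: Int64; var Buf: array of TEntity);
begin
  if Length(Buf) = 0 then
    Exit;
  S.Position := First * SizeOf(TEntity);
  S.ReadBuffer(Buf[0], Length(Buf) * SizeOf(TEntity));
end;

var
  S: TFileStream;
  E: TEntity;
  Section: array[0..2] of TEntity;
  I: Integer;
begin
  S := TFileStream.Create(FileName, fmCreate);
  try
    for I := 0 to 999 do
    begin
      E.Id := I;
      E.Value := I / 10;
      WriteEntity(S, I, E);
    end;
    ReadSection(S, 500, Section);  // elements 500..502 in one go
    for I := 0 to High(Section) do
      WriteLn(Section[I].Id, ' ', Section[I].Value:0:1);
  finally
    S.Free;
  end;
  DeleteFile(FileName);
end.
```

This prints "500 50.0", "501 50.1" and "502 50.2". The "managing it yourself" part is deciding which section to hold in memory at any time.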

If you are on Windows: file mappings https://learn.microsoft.com/en-us/windows/win32/memory/file-mapping (but you must still manage which section is loaded)

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 11926
  • Debugger - SynEdit - and more
Re: Does anyone know of a library to provide huge arrays?
« Reply #4 on: June 27, 2025, 09:35:44 pm »
As for rewriting your code...

You can always hide everything behind a class with a default indexed property:
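Code: Pascal
  type
    TMyArraySimulator = class  // not sure, maybe you can also do  = object
    private
      function GetItem(AnIndex: Int64): TMyEntityType;
      procedure PutItem(AnIndex: Int64; const AValue: TMyEntityType);
    public
      property Items[AnIndex: Int64]: TMyEntityType read GetItem write PutItem; default;  // Int64 or QWord
    end;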

Then, everywhere you have a variable of whatever type you currently use as an array, you replace it with that class type.

Of course, you need to change the initialization code. And the SetLength calls....
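A fuller sketch of such a class, assuming a flat element type and a file accessed through TFileStream. Apart from the Items property shape, everything here is hypothetical: the field names, Resize standing in for SetLength, and the file name. Each access seeks and reads or writes one element, so it is simple rather than fast.

```pascal
program ArraySimulatorDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

type
  TMyEntityType = Int64;  // hypothetical element type; any flat record works too

  { File-backed stand-in for "array of TMyEntityType". Thanks to the default
    property, existing code can keep writing A[i] := x and y := A[i]. }
  TMyArraySimulator = class
  private
    FStream: TFileStream;
    FCount: Int64;
    procedure CheckIndex(AnIndex: Int64);
    function GetItem(AnIndex: Int64): TMyEntityType;
    procedure PutItem(AnIndex: Int64; const AValue: TMyEntityType);
  public
    constructor Create(const AFileName: string);
    destructor Destroy; override;
    procedure Resize(ANewCount: Int64);  // takes over the role of SetLength
    property Count: Int64 read FCount;
    property Items[AnIndex: Int64]: TMyEntityType read GetItem write PutItem; default;
  end;

constructor TMyArraySimulator.Create(const AFileName: string);
begin
  inherited Create;
  FStream := TFileStream.Create(AFileName, fmCreate);
end;

destructor TMyArraySimulator.Destroy;
begin
  FStream.Free;
  inherited Destroy;
end;

procedure TMyArraySimulator.Resize(ANewCount: Int64);
begin
  FStream.Size := ANewCount * SizeOf(TMyEntityType);  // grows or truncates the file
  FCount := ANewCount;
end;

procedure TMyArraySimulator.CheckIndex(AnIndex: Int64);
begin
  if (AnIndex < 0) or (AnIndex >= FCount) then
    raise ERangeError.CreateFmt('Index %d out of range', [AnIndex]);
end;

function TMyArraySimulator.GetItem(AnIndex: Int64): TMyEntityType;
begin
  CheckIndex(AnIndex);
  FStream.Position := AnIndex * SizeOf(TMyEntityType);
  FStream.ReadBuffer(Result, SizeOf(Result));
end;

procedure TMyArraySimulator.PutItem(AnIndex: Int64; const AValue: TMyEntityType);
begin
  CheckIndex(AnIndex);
  FStream.Position := AnIndex * SizeOf(TMyEntityType);
  FStream.WriteBuffer(AValue, SizeOf(AValue));
end;

const
  DemoFile = 'simulated_array.dat';  // hypothetical file name

var
  A: TMyArraySimulator;
  I: Integer;
  Sum: Int64 = 0;
begin
  A := TMyArraySimulator.Create(DemoFile);
  try
    A.Resize(100);
    for I := 0 to 99 do
      A[I] := I * I;  // same syntax as with a dynamic array
    for I := 0 to 99 do
      Sum := Sum + A[I];
    WriteLn('Count = ', A.Count, ', sum of squares = ', Sum);
  finally
    A.Free;
  end;
  DeleteFile(DemoFile);
end.
```

This prints "Count = 100, sum of squares = 328350". The loops are unchanged array code; only the declaration, creation and Resize differ.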



One thing, though, and this will always apply once disk swapping comes into play (even with the OS swap file): random access will be slow.
You want your code to do as much work as possible on the "loaded" data in one go, and only then load another section.

If you access items that are far apart completely at random, expect a drastic slowdown.
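A minimal sketch of the "work on the loaded section, then load the next" pattern, assuming Int64 elements in a file read through TFileStream. ChunkElems, SumInChunks and the file name are hypothetical. Each window is read with one ReadBuffer call and fully processed before the next read.

```pascal
program ChunkedScanDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

const
  ChunkElems = 4096;           // hypothetical window: elements loaded per read
  DemoFile = 'scan_demo.dat';  // hypothetical file name

{ Sum all Int64 values in S one contiguous window at a time. Each window is
  fully processed before the next one is read, so the disk sees a few large
  sequential reads instead of one small read per element. }
function SumInChunks(S: TStream): Int64;
var
  Buf: array of Int64;
  Remaining: Int64;
  N, I: Integer;
begin
  Result := 0;
  SetLength(Buf, ChunkElems);
  S.Position := 0;
  Remaining := S.Size div SizeOf(Int64);
  while Remaining > 0 do
  begin
    if Remaining < ChunkElems then
      N := Integer(Remaining)
    else
      N := ChunkElems;
    S.ReadBuffer(Buf[0], N * SizeOf(Int64));
    for I := 0 to N - 1 do
      Result := Result + Buf[I];
    Dec(Remaining, N);
  end;
end;

var
  S: TFileStream;
  V: Int64;
  I: Integer;
  Total: Int64;
begin
  S := TFileStream.Create(DemoFile, fmCreate);
  try
    for I := 1 to 10000 do
    begin
      V := I;
      S.WriteBuffer(V, SizeOf(V));
    end;
    Total := SumInChunks(S);
    WriteLn('Total = ', Total);  // prints Total = 50005000
  finally
    S.Free;
  end;
  DeleteFile(DemoFile);
end.
```

With 10000 elements this needs three reads (4096 + 4096 + 1808) instead of 10000 separate seeks; the same shape applies to any per-section processing.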
« Last Edit: June 27, 2025, 09:38:19 pm by Martin_fr »

Thaddy

  • Hero Member
  • Posts: 18529
  • Here stood a man who saw the Elbe and jumped it.
Re: Does anyone know of a library to provide huge arrays?
« Reply #5 on: June 28, 2025, 11:39:25 am »
Is this the common misconception that an array on disk would be more efficient than a proper database like SQLite?
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

ad1mt

  • Sr. Member
  • Posts: 480
Re: Does anyone know of a library to provide huge arrays?
« Reply #6 on: June 29, 2025, 11:05:25 am »
Quote from Martin_fr: "If you are on Windows: file mappings https://learn.microsoft.com/en-us/windows/win32/memory/file-mapping (but you must still manage which section is loaded)"
This looks like what I need.
The problems are: (1) it's Windows-only, and (2) I would have to write lots of code to manage it.

ad1mt

  • Sr. Member
  • Posts: 480
Re: Does anyone know of a library to provide huge arrays?
« Reply #7 on: June 29, 2025, 11:08:41 am »
Quote from Nicole: "Are you sure you wouldn't prefer to work with a database instead of arrays?"
Suggesting a database is a bit like suggesting the Titanic when I ask for a rowing boat to cross a small stream  :D

ad1mt

  • Sr. Member
  • Posts: 480
Re: Does anyone know of a library to provide huge arrays?
« Reply #8 on: June 29, 2025, 11:10:27 am »
I can't be the first person to ever need a large array?

440bx

  • Hero Member
  • Posts: 5912
Re: Does anyone know of a library to provide huge arrays?
« Reply #9 on: June 29, 2025, 11:23:29 am »
Quote from ad1mt: "I can't be the first person to ever need a large array?"
Define "large array".

With today's 64-bit operating systems you can have multi-gigabyte arrays, and possibly even terabyte-sized arrays. Isn't that "huge" enough?

Honestly, if you somehow end up needing an array larger than a few gigabytes, I'd say the app's design quite likely needs to be revisited.


Thaddy

  • Hero Member
  • Posts: 18529
  • Here stood a man who saw the Elbe and jumped it.
Re: Does anyone know of a library to provide huge arrays?
« Reply #10 on: June 29, 2025, 12:41:43 pm »
File mappings are very easy to make cross-platform: fpmmap and the Windows API have a seamless published interface.
The point is, a multi-GB array, even if completely flat, is so big it will be very, very slow. Hence we use database engines for that.
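A minimal Unix-side sketch of the fpmmap route, assuming Free Pascal's BaseUnix unit (FpOpen, FpFtruncate, Fpmmap, Fpmunmap, FpClose). It is not portable as written; on Windows the counterpart would be CreateFileMapping/MapViewOfFile. The file name and element count are hypothetical. Once mapped, the file is indexed like an ordinary array and the OS pages data in and out.

```pascal
program MmapArrayDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, BaseUnix;

const
  DemoFile = 'mapped_array.dat';  // hypothetical file name
  Count = 1000000;                // hypothetical number of Int64 elements

var
  Fd: LongInt;
  Len: PtrUInt;
  P: Pointer;
  A: PInt64;
  I: Integer;
  Last: Int64;
begin
  Len := Count * SizeOf(Int64);
  Fd := FpOpen(PChar(DemoFile), O_RDWR or O_CREAT, &644);
  if Fd < 0 then
    raise Exception.Create('open failed');
  try
    // Give the file its final size; pages never written stay sparse on most file systems.
    if FpFtruncate(Fd, Len) <> 0 then
      raise Exception.Create('ftruncate failed');
    P := Fpmmap(nil, Len, PROT_READ or PROT_WRITE, MAP_SHARED, Fd, 0);
    if PtrInt(P) = -1 then  // MAP_FAILED
      raise Exception.Create('mmap failed');
    try
      A := PInt64(P);
      for I := 0 to Count - 1 do
        A[I] := I;  // ordinary indexing; the OS pages data to and from the file
      Last := A[Count - 1];
    finally
      Fpmunmap(P, Len);
    end;
  finally
    FpClose(Fd);
  end;
  WriteLn('Last element = ', Last);
  DeleteFile(DemoFile);
end.
```

Mapping the whole file at once only works while it fits in the process address space; beyond that, you map one window at a time.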

Example from this week: for my password shenanigans I needed a 317 GB scientific password collection.
Even on my very fast, modern laptop, a file of that size is completely unusable without a database engine.
It would already be unusable at a thousandth of that size, i.e. around 317 MB.
« Last Edit: June 29, 2025, 12:47:56 pm by Thaddy »

BeniBela

  • Hero Member
  • Posts: 948
Re: Does anyone know of a library to provide huge arrays?
« Reply #11 on: June 29, 2025, 09:48:42 pm »
Quote from 440bx: "With today's 64-bit operating systems you can have multi-gigabyte arrays, and possibly even terabyte-sized arrays. Isn't that "huge" enough?"

Only if you don't also want to support 32-bit computers: there you can load a maximum of about two gigabytes.

That is often too little.

440bx

  • Hero Member
  • Posts: 5912
Re: Does anyone know of a library to provide huge arrays?
« Reply #12 on: June 29, 2025, 10:36:15 pm »
Quote from BeniBela: "Only if you don't also want to support 32-bit computers: there you can load a maximum of about two gigabytes. That is often too little."
In 32-bit, the maximum size of a block is about 2 GB (when the exe is marked as large-address-aware). I'd say that is often _sufficient_.

MapViewOfFile lets you use that 2 GB as a window into a file whose maximum size is close to 16 TB (under Windows). Granted, managing more than 2 GB would require a bit of additional code to map file sections into the view.
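Much of that additional code is offset arithmetic. MapViewOfFile requires the file offset of a view to be a multiple of the system's allocation granularity (reported by GetSystemInfo). The sketch below assumes the common value of 64 KB and a hypothetical 256 MB view size, and only computes which view to map and where the element sits inside it; it does not call the Windows API itself. TViewLocation and LocateElement are hypothetical names.

```pascal
program ViewWindowDemo;
{$mode objfpc}{$H+}

const
  Granularity = 65536;             // assumption: typical Windows allocation granularity
  ViewBytes = 4096 * Granularity;  // hypothetical 256 MB view, fits a 32-bit process

type
  TViewLocation = record
    ViewStart: Int64;  // file offset to pass to MapViewOfFile
    InView: Int64;     // byte offset of the element inside that view
  end;

{ Find the view holding element AIndex (each AElemSize bytes). View starts
  are multiples of ViewBytes, hence of Granularity, so they are valid mapping
  offsets. Assumes AElemSize divides ViewBytes, so no element straddles two views. }
function LocateElement(AIndex, AElemSize: Int64): TViewLocation;
var
  ByteOfs: Int64;
begin
  ByteOfs := AIndex * AElemSize;
  Result.ViewStart := (ByteOfs div ViewBytes) * ViewBytes;
  Result.InView := ByteOfs - Result.ViewStart;
end;

var
  Loc: TViewLocation;
begin
  // Element 100,000,000 of an array of 8-byte items lies at byte 800,000,000.
  Loc := LocateElement(100000000, 8);
  WriteLn('Map view at offset ', Loc.ViewStart, ', element at +', Loc.InView);
end.
```

This prints "Map view at offset 536870912, element at +263129088". When the next element lands in a different view, the current view is unmapped and the new one mapped.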

It is unusual to need an array larger than 2 GB, and the cases where such a thing is needed should be closely inspected to ensure it is not the result of poor design. It is even questionable whether something that genuinely requires multi-terabyte arrays (or anything significantly larger than 2 GB) should be implemented on PC-class hardware.

Lastly, 32-bit apps are on their way to extinction. The future is undoubtedly 64-bit.
« Last Edit: June 29, 2025, 10:38:54 pm by 440bx »

Nicole

  • Hero Member
  • Posts: 1281
Re: Does anyone know of a library to provide huge arrays?
« Reply #13 on: June 30, 2025, 06:46:57 pm »
:D
The Titanic somehow has a bad reputation. But a rowing boat? You said you want to fill your hard disk with it?
That rowing boat would sink.

Back to a solution.
What speaks against dynamic arrays?
If you want to try them, I can dig into my code and find a lot of the management code for it.
I cannot imagine anything that cannot be done with them.
My whole database would fit into my dynamic-array system, and my DB contains millions of records going back decades.