
Author Topic: How can one optimise inserting an item into an array, anywhere but the end?  (Read 3585 times)

jcmontherock

  • Full Member
  • ***
  • Posts: 234
Returning to the first question:
Insert for dynamic arrays now exists in FPC 3.2.2:

Insert(Item, Array, Index);
at end:
Insert(Item, Array, Length(Array));
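
For completeness, a minimal compilable sketch of how that intrinsic can be used on a dynamic array (the identifiers here are illustrative, not taken from the post above):

program InsertDemo;
{$mode objfpc}{$H+}
var
  A: array of Integer;
  I: Integer;
begin
  SetLength(A, 4);
  A[0] := 1; A[1] := 2; A[2] := 4; A[3] := 5;
  Insert(3, A, 2);          { insert before index 2 -> 1 2 3 4 5 }
  Insert(6, A, Length(A));  { append at the end     -> 1 2 3 4 5 6 }
  for I in A do
    Write(I, ' ');
  WriteLn;
end.
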
Windows 11 UTF8-64 - Lazarus 3.2-64 - FPC 3.2.2

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
And the timings look like... ?

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Ten_Mile_Hike

  • New Member
  • *
  • Posts: 44
Is an "array of arrays" the same thing as a multidimensional array ... Asking for a friend %)

alpine

  • Hero Member
  • *****
  • Posts: 1038
Is an "array of arrays" the same thing as a multidimensional array ... Asking for a friend %)
Now I remember my old professor saying: "In the end it all boils down to a vector, be it the vector of main memory."
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Is an "array of arrays" the same thing as a multidimensional array ... Asking for a friend %)

Not necessarily.

With a small number of exceptions on historical systems, the content of an array will be contiguous. That applies both to a single-dimensioned array and to a multidimensioned array.

You cannot necessarily rely on an array of arrays having contiguous content, particularly where for some reason an array is represented by a pointer or some form of descriptor with the actual content on the heap (e.g. FPC dynamic arrays).

The historical exceptions are some very old systems which divided even single-dimensioned arrays into page-size chunks, and systems such as OS/2 on the '286 which had convoluted rules for making large segmented arrays look as though they were contiguous even if they really weren't.

In addition, in a language such as (any dialect of) Pascal a multidimensioned array has all elements of the same type, while in an array-of-arrays they obviously do not (the elements of the first dimension are themselves arrays, while the elements of the final dimension are e.g. numbers).
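
A small sketch of that difference, assuming current FPC semantics (the printed addresses only illustrate the point; the actual values are platform-dependent):

program LayoutDemo;
{$mode objfpc}{$H+}
type
  TStatic2D = array[0..2, 0..3] of Integer;   { one contiguous block of 12 Integers }
  TDynRow   = array of Integer;
  TJagged   = array of TDynRow;               { array of descriptors; rows live on the heap }
var
  S: TStatic2D;
  D: TJagged;
begin
  WriteLn('Static 2D occupies ', SizeOf(S), ' contiguous bytes');
  SetLength(D, 3);
  SetLength(D[0], 4);
  SetLength(D[1], 4);
  SetLength(D[2], 4);
  { Each row is a separate heap allocation, so adjacency is not guaranteed. }
  WriteLn('Row 0 starts at ', PtrUInt(@D[0][0]));
  WriteLn('Row 1 starts at ', PtrUInt(@D[1][0]));
end.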

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

jollytall

  • Sr. Member
  • ****
  • Posts: 306
Is an "array of arrays" the same thing as a multidimensional array ... Asking for a friend %)

MarkMLl may have found it too obvious, but one more difference: a normal multidimensional array has a rectangular shape (in 2D, and correspondingly in higher dimensions), while (for dynamic arrays) an array of arrays can have rows of different sizes.
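
As a quick illustration of the rows-of-different-sizes point (a sketch, nothing more):

program JaggedRows;
{$mode objfpc}{$H+}
var
  Rows: array of array of Integer;
  R: Integer;
begin
  { A "triangular" layout: row R holds R + 1 elements. }
  SetLength(Rows, 4);
  for R := 0 to High(Rows) do
    SetLength(Rows[R], R + 1);
  for R := 0 to High(Rows) do
    WriteLn('Row ', R, ' has ', Length(Rows[R]), ' elements');
end.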

alpine

  • Hero Member
  • *****
  • Posts: 1038
Is an "array of arrays" the same thing as a multidimensional array ... Asking for a friend %)

MarkMLl may have found it too obvious, but one more difference: a normal multidimensional array has a rectangular shape (in 2D, and correspondingly in higher dimensions), while (for dynamic arrays) an array of arrays can have rows of different sizes.
Considering that, dynamic arrays can give a memory-size benefit, not a speed one. Splitting the array into smaller chunks is only marginally better, since the problem of shifting the tail remains; the tail merely becomes smaller.

But then the problem of linearisation emerges: into which chunk should the element be inserted, given the 1D insertion index n? The chunk sizes must be subtracted from n sequentially until n is smaller than the current chunk size, and that gives the second-level index. Does that sound optimal?
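
A minimal sketch of that lookup (the names are illustrative, not taken from any existing unit): walk the chunk sizes, subtracting from n until it falls inside the current chunk, so the scan is linear in the number of chunks.

{ Assumes at least one chunk exists. }
procedure LocateChunk(const ChunkSizes: array of Integer; N: Integer;
  out ChunkIndex, Offset: Integer);
var
  I: Integer;
begin
  for I := 0 to High(ChunkSizes) do
  begin
    if N < ChunkSizes[I] then
    begin
      ChunkIndex := I;   { second-level index }
      Offset := N;       { position inside that chunk }
      Exit;
    end;
    Dec(N, ChunkSizes[I]);
  end;
  { Past the last element: append to the last chunk (or start a new one). }
  ChunkIndex := High(ChunkSizes);
  Offset := ChunkSizes[ChunkIndex];
end;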

Back to the original question:
Quote
How can one optimise inserting an item into an array, anywhere but the end?
Here IMHO the most straightforward answer:
Quote
Don't use an array.

MarkMLl

Or in other words: "No one can."

The OP must give more details about the application in order to receive a more useful answer; anything else is just speculation.
Speculating myself, I would in turn propose linear hashing.
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

Thaddy

  • Hero Member
  • *****
  • Posts: 14205
  • Probably until I exterminate Putin.
Is an "array of arrays" the same thing as a multidimensional array ... Asking for a friend %)
Now I remember my old professor saying: "In the end it all boils down to a vector, be it the vector of main memory."
Strange professor, because a multi-dimensional array has a linear layout in memory, the same as a one-dimensional array, so unless you sort it is O(N), and the sort is costly. Basics.

In the case at hand, one should assume O(N). It is impossible to improve on that.
« Last Edit: August 14, 2022, 06:10:58 pm by Thaddy »
Specialize a type, not a var.

alpine

  • Hero Member
  • *****
  • Posts: 1038
Is an "array of arrays" the same thing as a multidimensional array ... Asking for a friend %)
Now I remember my old professor saying: "In the end it all boils down to a vector, be it the vector of main memory."
Strange professor, because a multi-dimensional array has a linear layout in memory, the same as a one-dimensional array, so unless you sort it is O(N), and the sort is costly. Basics.

In the case at hand, one should assume O(N). It is impossible to improve on that.
Why strange? Isn't it the same statement? Or maybe I didn't phrase it well in English %)
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
In the case at hand, one should assume O(N). It is impossible to improve on that.

That only applies if the insertion point is randomly distributed: if it's predominantly near the end, the workload will be proportionately less.

And for that matter, if it's predominantly near the beginning, the obvious hack will also make it proportionately less.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
If you want both fast inserts and fast retrieval, you can do it like a database: use chunks that are an integer multiple of the maximum record size, a linked list to add new ones, and an index to find the one you want.

To insert data: see if there is enough room left in the last chunk, otherwise just allocate a new one and put it in there. Do this until everything is stored. Now add the entries to the index.

The index can be a simple array, or you could use chunks as well. That gives you an index of the index, where only the blocks are sorted. And that is what you want for fast access.

And, of course, you can make a binary tree of the index.
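
A rough sketch of the chunk part of that scheme, assuming fixed-capacity chunks held in a singly linked list (all names here are illustrative; the sorted index over the chunks is left out):

const
  ChunkCapacity = 256;

type
  PChunk = ^TChunk;
  TChunk = record
    Count: Integer;
    Items: array[0..ChunkCapacity - 1] of Pointer;
    Next:  PChunk;
  end;

{ Append an item, allocating a new chunk when the last one is full. }
procedure AppendItem(var Head, Tail: PChunk; Item: Pointer);
var
  C: PChunk;
begin
  if (Tail = nil) or (Tail^.Count = ChunkCapacity) then
  begin
    New(C);
    C^.Count := 0;
    C^.Next := nil;
    if Tail = nil then
      Head := C
    else
      Tail^.Next := C;
    Tail := C;
  end;
  Tail^.Items[Tail^.Count] := Item;
  Inc(Tail^.Count);
end;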



BrunoK

  • Sr. Member
  • ****
  • Posts: 452
  • Retired programmer
I have created a TCountedTree = class(TAVLTree).
The attached graphic shows that using a tree for insertion, even though it suffers from the overhead of having a data instance per node, is much faster than inserting into a TFPList.
I didn't plot the line for the total timing of TFPList because it shoots out of the graph.
For the same 400'000 items, it takes a total of 10.170 seconds to insert into a TFPList, compared to 325.023 ms for inserting into a balanced tree.
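
For readers who want to try something similar, here is a minimal sketch assuming the AVL_Tree unit that ships with FPC and the Classes unit's TFPList; TCountedTree itself is BrunoK's own class and is not shown, and the item count below is purely illustrative:

program TreeVsList;
{$mode objfpc}{$H+}
uses
  Classes, AVL_Tree;

function ComparePtrInt(Item1, Item2: Pointer): Integer;
begin
  if PtrInt(Item1) < PtrInt(Item2) then
    Result := -1
  else if PtrInt(Item1) > PtrInt(Item2) then
    Result := 1
  else
    Result := 0;
end;

var
  Tree: TAVLTree;
  List: TFPList;
  I: Integer;
begin
  Randomize;
  Tree := TAVLTree.Create(@ComparePtrInt);
  List := TFPList.Create;
  try
    for I := 1 to 100000 do
    begin
      Tree.Add(Pointer(PtrInt(Random(MaxInt))));  { roughly O(log n) per insertion }
      List.Insert(0, Pointer(PtrInt(I)));          { shifts the whole tail: O(n) per insertion }
    end;
    WriteLn('Tree: ', Tree.Count, ' items, List: ', List.Count, ' items');
  finally
    List.Free;
    Tree.Free;
  end;
end.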




 
