Hey All,
Context:
- The number of Blocks grows without bound.
- Each Block is identified by a sequential integer (Int64).
- Each Block has zero, one, or many entries for operations.
- Each operation is identified by a hash.
Yes, it's a cryptocurrency model, but please bear with me.
I want to create an index file that maps each operation to its Block. This way, when responding to a
JSON-RPC call for an Operation, I don't need to parse all the Blocks to find the one that contains it.
With a small number of Blocks this is easily achievable with either an
AVL tree or a hash table: the index fits in RAM on anything from a modest 4 GB VPS up to however deep your pockets are.
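To make the in-memory variant concrete, here's a minimal sketch (names and record layout are my own assumptions, not from any real node implementation) of a hash table mapping operation hashes to block numbers:

```python
# Hypothetical in-memory index: operation hash -> block number.
index = {}

def add_block(block_number, operation_hashes):
    """Record every operation contained in a newly received block."""
    for op_hash in operation_hashes:
        index[op_hash] = block_number

def lookup(op_hash):
    """O(1) average-case lookup; returns the block number or None."""
    return index.get(op_hash)

add_block(1, ["aa" * 16, "bb" * 16])
add_block(2, ["cc" * 16])
print(lookup("cc" * 16))  # 2
```

This is exactly the version that stops scaling: the whole dict lives in RAM, so memory grows linearly with the total number of operations ever seen.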
The problem is that, as mentioned above, the number of Blocks keeps growing forever, and the rate isn't small: a single day can yield ~100 of them.
With no upper bound, this alone makes future scalability a nightmare.
In my mind the only thing that makes sense is some kind of stream: rather than loading the whole index into memory, process it in chunks with a positionable (seekable) access pattern.
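One way to picture that positionable scheme (a sketch under my own assumptions: 32-byte hashes, Int64 block numbers, records kept sorted by hash) is a flat file of fixed-width records searched by seeking, which gives O(log n) reads with O(1) memory:

```python
import os
import struct

# Hypothetical record layout: 32-byte hash + big-endian Int64 = 40 bytes.
REC = struct.Struct(">32sq")

def write_index(path, pairs):
    """Write (hash, block_number) pairs sorted by hash, so the
    file supports binary search by record offset."""
    with open(path, "wb") as f:
        for op_hash, block in sorted(pairs):
            f.write(REC.pack(op_hash, block))

def lookup(path, target):
    """Binary search via seek(): O(log n) disk reads, constant memory."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // REC.size
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * REC.size)
            op_hash, block = REC.unpack(f.read(REC.size))
            if op_hash == target:
                return block
            if op_hash < target:
                lo = mid + 1
            else:
                hi = mid
    return None

import hashlib
path = "ops.idx"
write_index(path, [(hashlib.sha256(b"tx1").digest(), 7),
                   (hashlib.sha256(b"tx2").digest(), 42)])
print(lookup(path, hashlib.sha256(b"tx2").digest()))  # 42
```

The catch is the same one raised below for the AVL: the file must stay sorted, so appending new blocks means merging or rewriting, which is where the IO cost creeps back in.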
I would like to keep the
O(log n) lookups of an
AVL tree, but is that possible over a stream?
I'm guessing this approach will hit a point where the IO needed to rebalance the
AVL tree becomes too costly to be usable.

If I can't find a solution for this, I'll have to break the index into chunks of 10K Blocks each (or some other number that makes sense) and maybe just keep those in memory.
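For what it's worth, here's how I picture a chunked fallback. One assumption I'm making that differs from the 10K-blocks-per-chunk idea above: sharding by the first byte of the operation hash instead of by block range, so a lookup only ever has to open one small chunk file. All names here are hypothetical:

```python
import json
import os

INDEX_DIR = "index_chunks"  # hypothetical location for the 256 shard files

def chunk_path(op_hash: str) -> str:
    """Route a hash to its shard file by the first hex byte ('00'..'ff')."""
    return os.path.join(INDEX_DIR, f"{op_hash[:2]}.json")

def add_operation(op_hash: str, block_number: int) -> None:
    """Read-modify-write the single shard that owns this hash."""
    os.makedirs(INDEX_DIR, exist_ok=True)
    path = chunk_path(op_hash)
    chunk = {}
    if os.path.exists(path):
        with open(path) as f:
            chunk = json.load(f)
    chunk[op_hash] = block_number
    with open(path, "w") as f:
        json.dump(chunk, f)

def lookup(op_hash: str):
    """Load only the one shard that could contain the hash."""
    path = chunk_path(op_hash)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f).get(op_hash)

add_operation("abcd1234", 5)
print(lookup("abcd1234"))  # 5
```

Each shard stays roughly 1/256th of the total index, so memory per lookup is bounded even though the total keeps growing; the shards themselves still grow forever, though, which is the part I don't have an answer for.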
Can anyone smack me in the face with a better approach to this CS problem?
Cheers,
Gus