First of all, I do not know which Linux you use or how you set it up. This does not matter to most of my answer, but it may matter to what you do.
On a normal OS, nothing is realtime: any thread (including your data collector) can be interrupted for any amount of time. This usually happens if other apps are running and compete for disk, CPU or RAM (worst case: your thread's memory gets swapped to disk).
That also means you have to think about how to manage memory, as again you could run into swapping.
Do you use an Intel-compatible CPU?
They have the "lock" asm instruction prefix. In FPC you get it via InterlockedExchange / InterlockedIncrement / InterlockedDecrement / InterlockedExchangeAdd.
Each of those is but *one* asm instruction. You can't get faster than this, but...
Now, before diving into it, you should read up on "memory barriers" (google).
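If thread-A does "Foo := 1", then it is possible that thread-B does not see this right away, or sees it in a different order than other writes (each core has its own caches and write buffers). So you can't just change variables...
This is where interlocked comes in. It acts as a full memory barrier: everything the thread wrote before it becomes visible to the other threads, and the CPU does not reorder memory accesses across it (or apply whatever other optimization it may do).
So each interlocked does have a small cost, because the CPU has to synchronize with the other cores instead of just working from its own cache (and may have lost other optimizations).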
However, any critical section always includes this too, so the interlocked can (depending on usage) be faster.
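To make the interlocked calls a bit more concrete, here is a minimal sketch (not from your code; the names TCounterThread, Counter, ThreadCount and IncrementsPerThread are made up for the example). Several threads increment one shared counter with InterlockedIncrement, and the final value is read with InterlockedExchangeAdd(..., 0). It assumes objfpc mode and, on Linux, the cthreads unit.

```pascal
program InterlockedDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread support must come first on Unix
  Classes;

const
  ThreadCount = 4;                // hypothetical values for the demo
  IncrementsPerThread = 100000;

var
  Counter: LongInt = 0;           // shared between all threads

type
  TCounterThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TCounterThread.Execute;
var
  i: Integer;
begin
  for i := 1 to IncrementsPerThread do
    // atomic read-modify-write (a "lock"-prefixed instruction on x86)
    InterlockedIncrement(Counter);
end;

var
  Threads: array[0..ThreadCount - 1] of TCounterThread;
  i: Integer;
begin
  for i := 0 to High(Threads) do
    Threads[i] := TCounterThread.Create(False);  // False = start immediately
  for i := 0 to High(Threads) do
  begin
    Threads[i].WaitFor;
    Threads[i].Free;
  end;
  // "add 0" returns the current value without changing it
  WriteLn('Counter = ', InterlockedExchangeAdd(Counter, 0));
end.
```

With a plain Inc(Counter) instead of InterlockedIncrement, the threads could overwrite each other's updates and the total would usually come out too low.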
Next it depends on how you organize your memory.
If your worker (reader) thread has to allocate new memory whenever needed, then the "allocmem" call to the OS will cost time, and that may again lose a read sample. FPC pre-allocates big chunks from the OS (because the OS call is expensive), and gives you slices of them as you need them (which is quick, and why you normally do not see time loss on getting memory). But if you use more and more memory, you will get the OS call... I have no idea how much time that costs (so it could be fast enough, but I do not know).
I also do not know how the FPC memory manager implements thread safety, in case you make any allocations/deallocations.
Also, interlocked does not wake the main thread. So you have to have your main thread run in a loop, always looking for data. Your main thread will therefore use more CPU time.
Normally your main thread sleeps if it has nothing to do, and Synchronize/Queue/... will signal it to wake up.
OK, that was a lot of info upfront.
If you have only one thread writing (or a separate queue for each thread), then you can do something like this (not tested):
var
entries: array [0..15] of TEntry; // 16 entries, must be power of 2
writePos: integer; // the next write of the worker thread goes to entries[writepos and 15]
ReadPos: integer; // the next read of the main thread will be entries[readpos and 15]
Only the ONE worker thread is allowed to write.
Only the main thread will read (and must do so without waiting for a wake-up call).
If (writepos + 1) and 15 = (readpos and 15) the queue is full (so at most 15 of the 16 entries are in use at a time), and the worker must alloc more mem and keep collecting data. It will then later write a bigger chunk of data, all at once.
If (readpos and 15) = (writepos and 15) then the queue is empty.
ReadPos is ONLY changed by the main thread.
WritePos is ONLY changed by the worker thread.
Both threads will only ever increment their ...pos. So the value can be 1299 or anything. An "and 15" will give the correct index (hence the array size must be a power of 2).
The counters will overflow and wrap around once they pass $FFFFFFFF, but that is still OK, since the "and" only looks at the lowest bits.
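A tiny sketch of the index arithmetic, in case it helps (Counter, QueueSize and Mask are names I made up for this). It assumes a 32 bit LongInt counter and that overflow/range checking is off, so the counter is allowed to wrap:

```pascal
program MaskDemo;
{$mode objfpc}
{$Q-}{$R-}  // overflow and range checks off, so the counter may wrap

const
  QueueSize = 16;            // must be a power of 2
  Mask = QueueSize - 1;      // 15, i.e. the lowest 4 bits

var
  Counter: LongInt;

begin
  Counter := 1299;
  WriteLn(Counter and Mask);      // 3, because 1299 = 81 * 16 + 3

  Counter := High(LongInt);       // $7FFFFFFF
  WriteLn(Counter and Mask);      // 15

  InterlockedIncrement(Counter);  // wraps around to Low(LongInt)
  WriteLn(Counter and Mask);      // 0, the index simply continues after 15
end.
```

The index keeps counting 15, 0, 1, ... across the wrap, which is why the overflow does no harm.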
worker thread
r := InterlockedExchangeAdd(ReadPos, 0); // thread-safe read, as "add 0" is a no-op
// readpos can change while we are between lines... But that does not matter in this case (you can simulate all possibilities)
// we can read writepos directly, since the main thread never changes it
if (writepos + 1) and 15 = (r and 15) then
... keep going, queue full (keep the data, and join it with whatever is read next)
// otherwise: write to queue
entries[writepos and 15] := data;
InterlockedIncrement(writePos); // thread-safe inc, so the main thread will be able to see the new entry
main thread
w := InterlockedExchangeAdd(WritePos, 0); // thread-safe read, as "add 0" is a no-op
// writepos can change while we are between lines... But that does not matter in this case (you can simulate all possibilities)
// we can read readpos directly, since the worker thread never changes it
if (w and 15) = (readpos and 15) then
... keep going, queue empty
// otherwise: read from queue
data := entries[readpos and 15];
InterlockedIncrement(readPos); // thread-safe inc, so the worker will see that the slot is free again
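Put together as a complete program, the idea could look roughly like the sketch below. Treat it as an illustration under assumptions, not as tested production code: TEntry is just a LongInt stand-in for your sample record, and TryPush, TryPop, TWorker and ItemCount are names I invented. It also assumes the interlocked calls act as full barriers (true on x86). In the demo the worker simply retries when the queue is full; your real collector would keep the data in its own buffer instead, as described above.

```pascal
program SpscQueueDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes;

const
  QueueSize = 16;              // must be a power of 2
  Mask      = QueueSize - 1;
  ItemCount = 1000;            // hypothetical number of samples

type
  TEntry = LongInt;            // stand-in for the real sample record

var
  Entries: array[0..QueueSize - 1] of TEntry;
  WritePos: LongInt = 0;       // ONLY changed by the worker thread
  ReadPos: LongInt = 0;        // ONLY changed by the main thread

// Worker side. Returns False if the queue is full.
function TryPush(const Data: TEntry): Boolean;
var
  r: LongInt;
begin
  r := InterlockedExchangeAdd(ReadPos, 0);   // thread-safe read
  if ((WritePos + 1) and Mask) = (r and Mask) then
    Exit(False);
  Entries[WritePos and Mask] := Data;
  InterlockedIncrement(WritePos);            // publish the entry
  Result := True;
end;

// Main thread side. Returns False if the queue is empty.
function TryPop(out Data: TEntry): Boolean;
var
  w: LongInt;
begin
  w := InterlockedExchangeAdd(WritePos, 0);  // thread-safe read
  if (w and Mask) = (ReadPos and Mask) then
    Exit(False);
  Data := Entries[ReadPos and Mask];
  InterlockedIncrement(ReadPos);             // hand the slot back
  Result := True;
end;

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  i: LongInt;
begin
  for i := 1 to ItemCount do
    while not TryPush(i) do
      ThreadSwitch;  // demo only: a real collector would buffer locally
end;

var
  Worker: TWorker;
  Value: TEntry;
  Received, Sum: LongInt;
begin
  Received := 0;
  Sum := 0;
  Worker := TWorker.Create(False);
  while Received < ItemCount do
    if TryPop(Value) then
    begin
      Inc(Received);
      Inc(Sum, Value);
    end
    else
      ThreadSwitch;  // polling: nothing wakes the main thread
  Worker.WaitFor;
  Worker.Free;
  WriteLn('Received ', Received, ' entries, sum = ', Sum);
end.
```

If everything arrives exactly once, the sum printed for 1..1000 is 500500. Note the polling loop in the main program: that is the extra CPU time mentioned earlier.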
You can google "interlocked queue" ...