It supports threads and hardware features such as MMX/SSE, but not (automatic) parallelisation of arbitrary code.
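To make the threading part concrete, here is a minimal sketch of FPC's standard threading support via TThread from the Classes unit (on Unix the cthreads thread manager must be the first unit in the uses clause):

```pascal
program ThreadDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif}  // thread manager must come first on Unix
  Classes, SysUtils;

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
begin
  // Whatever happens here runs concurrently with the main program.
  WriteLn('Worker running in thread ', ThreadID);
end;

var
  W: TWorker;
begin
  W := TWorker.Create(False);  // False = start running immediately
  W.WaitFor;                   // block until Execute returns
  W.Free;
end.
```

Note that you decompose the work into threads yourself; the compiler does not parallelise the loop bodies for you.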
MarkMLl
Honestly speaking, I can say this is beyond my current knowledge and experience; I won't ask about it again unless I one day study threads.
The worst kind of soul is the great Slav soul. People who suffer from it are usually very deep thinkers. They may say things like this: "Sometimes I am so merry and sometimes I am so sad. Can you explain why?" (You cannot, do not try.) Or they may say: "I am so mysterious ... I sometimes wish I were somewhere else than where I am." (Do not say: "I wish you were.")
Oh, I'm thoroughly evil: it's the Slav in me :-)
It seems such a beautiful piece of English literature.
Llwyd:"(ˈhluːɪd, English lɔid) (in Welsh legend) a magician who avenged his friend Gwawl upon Pryderi, the son of Pwyll, by casting various spells upon Pryderi and his estate."
Quote: "Oh, I'm thoroughly evil: it's the Slav in me :-)"

Is there something allegorical in the "great Slav soul", or is it somehow connected to your personal ancestry?
http://f2.org/humour/howalien.html
You started it :-)
MarkMLl
Oh, and the Ll stands for Llwyd. Your problem :-)
Although /Morgan/ Llwyd was a renowned Christian preacher.
* [b]Message Passing[/b] - using an approach like MPI or MPICH (yes, works with FPC I believe)
> virtualisation and containerisation ....
Is that, sort of, the opposite of parallel?
> https://en.wikipedia.org/wiki/Vector_Pascal
Intel does now have some vector support, but nothing like that offered by, e.g., Cray. I wonder whether the speed-up delivered would be less than the penalty for not using an optimised compiler like FPC? Cray could do a complete step through a matrix with one op. But such machines are incredibly expensive, all purchased by US homeland defense and military. Apparently very efficient at listening in on transatlantic communications ....
I see absolutely no reason why FPC would not be just as useful in an MPI environment as C++. It would be far more reliable code (important on long running, compute intensive jobs) and, hitting the same MPI libraries, would be just as efficient through the interconnect.
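To illustrate the point, here is a hedged sketch of what an MPI "hello world" would look like in FPC. The unit name `mpi` and the exact Pascal declarations are assumptions (real FPC/MPI header translations vary); the calls mirror the standard C MPI API, which is where the actual interconnect efficiency lives:

```pascal
program MpiHello;
{$mode objfpc}
uses
  mpi;  // hypothetical binding unit; actual FPC MPI headers differ in name/detail

var
  Rank, Size: LongInt;
begin
  MPI_Init(nil, nil);                    // same entry points as the C API
  MPI_Comm_rank(MPI_COMM_WORLD, @Rank);
  MPI_Comm_size(MPI_COMM_WORLD, @Size);
  WriteLn('Process ', Rank, ' of ', Size);
  MPI_Finalize;
end.
```

Since the heavy lifting happens inside the MPI library and the interconnect, the host language mostly just needs a correct binding, which supports the point that FPC should be no less efficient than C++ here.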
I have seen, for example, both Python and Java running MPI. But we would groan when we saw those users logging on ....
Quote: "In that case, they'll have to queue to access it. One by one. You just serialized your application. And in the second case (tasks), you make a copy of all the data the thread is allowed to access up front."

Having multiple copies of some data is usually a very bad idea because it requires keeping all those copies in sync. Database theory figured that out long ago. A program is better off having a single instance of the data and synchronization mechanisms to ensure every thread sees current and accurate information.
Quote: "You just serialized your application."

No. Protecting a piece of data with a synchronization object does not serialize an application. It only serializes access to that one piece of data, and that should be for an _extremely short_ time.
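The "serialize only the access, not the application" point can be sketched with FPC's TCriticalSection from the SyncObjs unit (a minimal sketch; the worker loop is illustrative):

```pascal
program LockDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SyncObjs;

var
  Lock: TCriticalSection;   // protects Counter, and only Counter
  Counter: Integer = 0;

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  I: Integer;
begin
  for I := 1 to 100000 do
  begin
    Lock.Acquire;           // held only for the increment itself
    try
      Inc(Counter);
    finally
      Lock.Release;
    end;
    // ... the bulk of each iteration's work runs outside the lock ...
  end;
end;

var
  A, B: TWorker;
begin
  Lock := TCriticalSection.Create;
  A := TWorker.Create(False);
  B := TWorker.Create(False);
  A.WaitFor; B.WaitFor;
  A.Free; B.Free;
  WriteLn(Counter);         // 200000: no lost updates
  Lock.Free;
end.
```

Both threads run their real work in parallel; only the few instructions inside Acquire/Release are serialized.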
[a lot of sensible stuff]
But that, of course, is where the "Itanic" failed: the difficulty of an efficient toolchain.
Quote: "Having multiple copies of some data is usually a very bad idea because it requires keeping all those copies in synch. Database theory figured that out long ago. A program is better off having a single instance of the data and synchronization mechanisms to ensure every thread sees current and accurate information."
Generally speaking, that's something that operating systems worked out fairly early: when a block of code or data is copied, nothing really happens except that the original area is marked read-only; then, when one of the processes/threads wants to change it, an actual copy is made in physical memory and both revert to their original permissions. The same happens at each of the cache levels.
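That copy-on-write behaviour can be observed directly with fork on Unix, using FPC's BaseUnix unit (a Unix-only sketch; the kernel shares the pages until the child writes to one):

```pascal
program CowDemo;
{$mode objfpc}
uses
  BaseUnix;

var
  Data: array[0..3] of Integer = (1, 2, 3, 4);
  Pid: TPid;
  Status: cint;
begin
  Pid := FpFork;             // child gets a logical copy of the address space
  if Pid = 0 then
  begin
    Data[0] := 99;           // this write triggers the actual physical copy
    WriteLn('child sees ', Data[0]);    // 99
    Halt(0);
  end
  else
  begin
    FpWaitPid(Pid, Status, 0);
    WriteLn('parent sees ', Data[0]);   // 1: the pages diverged on write
  end;
end.
```

Until the child's write, both processes were reading the very same physical page.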
Quote: "Generally speaking that's something that operating systems worked out fairly early: ..."

I don't think his comment was directed at how an O/S manages code and data (copy-on-write and that kind of stuff), just run-of-the-mill user threads; he suggested making copies of the data for different threads to access. The O/S isn't going to poke its fingers in there, and having multiple copies of a piece of data is likely always a bad idea (if there is an exception, I cannot think of it at this time).
Unified memory is the culprit. As MarkMLl and PascalDragon said, you're best off with local RAM (resources in general) for maximum parallelism. The best example is probably how the Cell CPU did it: move all needed data to local storage with DMA (including the software), run the task and move the result back to main memory. And that's also how most recent micro-controllers do it. All devices have their own micro-micro-controller and local storage, RAM is segmented, and stuff moves around through DMA transfers. It might sound horrible for bandwidth and storage requirements, but it is the best way to maximize throughput.
(That's why it is strange that Gabe Newell was very vehement that programming like that was a waste of everyone's time ;) )
I don't recognise that paradigm in the context of what are generally understood to be microcontrollers. Do you have an example?
MarkMLl
The big picture (https://electronics.stackexchange.com/questions/229125/is-the-stm32-dma-destination-address-restricted).
In detail (https://deepbluembedded.com/stm32-dma-tutorial-using-direct-memory-access-dma-in-stm32/).
... move all needed data to local storage with DMA (including the software), run the task and move the result back to main memory. And that's also how most recent micro-controllers do it. All devices have their own micro-micro-controller and local storage, RAM is segmented and stuff moves around through DMA transfers.
So, how would you incorporate a UART in your micro-controller application? Using interrupts and reading/writing the single byte in the UART register?
Quote: "A CPU that consists of ~1000 simple ARM cores (with wide busses and lots of DMA) would be interesting. Bandwidth is going to be the main problem if you're going many-core. And if size is not a problem, a lot of micro-controllers. Optical connections and switches would rule."

To a certain degree these problems can be targeted by NUMA architectures, where different memory regions are assigned, or near, to different CPUs and connected through a shared memory bridge. Access to the "near" memory is extremely fast, while access to the more "distant" memory regions has to go through the shared bridge and is slower. So the only question is how to map threads to cores such that their local memory access is maximized while shared memory access is minimized.
Quote: "I'm not convinced of the benefit of a VM in this context, and the industry as a whole seems to be tending towards the LLVM model."

You do not necessarily need a VM; you can build similar software threading models natively as well (there are some C userspace scheduling libraries out there that are basically ready to use), but a VM makes this much easier and cleaner, which is why Erlang went this route.
The main difference with ARM is that it has a lot of conditional opcodes. The good: much less branching. The bad: you have to execute both paths if you want maximum speed. But because it's just a single opcode, and the registers and execution units on CPUs are highly virtualized anyway, that's a net gain. As long as you don't want to run a long pipeline, that is.
For the big picture: a current x86 CPU is a VM that emulates an x86 CPU, and it is filled with VMs that emulate execution units. That's because the x86 architecture and opcodes do really poorly if you want to multiplex and multitask. But that's what everyone uses, so that's what you have to run.
Quote: "Pascal does not really have great support for parallel programming."

It has, in trunk/main, since a month or two ago.
Things like this https://docwiki.embarcadero.com/RADStudio/Sydney/en/Anonymous_Methods_in_Delphi#Variable_Binding now work.
(Alas, you will either have to use trunk or wait for the next major release, 4.0.)
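The variable-binding behaviour linked above can be sketched like this (assuming an FPC trunk/main build with anonymous-function support; on a release compiler this will not compile):

```pascal
program ClosureDemo;
{$mode delphi}  // trunk: anonymous functions and function references

type
  TCounter = reference to function: Integer;

function MakeCounter: TCounter;
var
  I: Integer;
begin
  I := 0;
  Result := function: Integer
            begin
              Inc(I);        // I is captured by reference and outlives MakeCounter
              Result := I;
            end;
end;

var
  Next: TCounter;
begin
  Next := MakeCounter;
  WriteLn(Next());  // 1
  WriteLn(Next());  // 2
end.
```

The captured local I keeps its state between calls, which is exactly the variable binding described on the Embarcadero page.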
Quote: "Warfley is talking about a deeper language integration (simple example: language-based parallel-for)."

Not necessarily; some of the things discussed above could be built in-language (as a library), while others would either require language extensions or be made much easier by them.