
Author Topic: Stream processing  (Read 5183 times)

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Stream processing
« on: August 16, 2017, 01:55:09 pm »
The old programming model is serial and uses functions. That scales badly and is not interactive. The next step was event driven programming. It still uses functions, but they're triggered by external events. The step after that was OO, where you throw it all together in an autonomous unit.

Interestingly enough, it stopped there. The new models that are distributed (websites, mostly) take great pains to keep the same model intact. You still call functions, or they are called through external events, even if the function you call takes many seconds to complete because it is first sent through a socket to a remote browser, where it is executed by a JavaScript function, and the result of that is returned.

And, if you are going to use threads, you don't want each one to have lots of event procedures. If you cannot handle any request immediately, spawn another thread to do it.

That also requires that all those threads are free-running (you don't keep the pointer) and that they either do their job and terminate, or that you send messages through sockets to them.

To make it easier, I give them a pointer to the message queue of the parent, so they can return the result directly.
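Something like this, as a minimal sketch (TWorker and TWorkResult are names I made up for the example, and the Sleep is a crude stand-in for a real wait):

Code: Pascal
program ParentQueueDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils;

type
  TWorkResult = class              // whatever the worker wants to hand back
    Text: string;
  end;

  TWorker = class(TThread)
  private
    FParentQueue: TThreadList;     // the parent's queue, passed in on creation
    FInput: string;
  protected
    procedure Execute; override;
  public
    constructor Create(AParentQueue: TThreadList; const AInput: string);
  end;

constructor TWorker.Create(AParentQueue: TThreadList; const AInput: string);
begin
  FParentQueue := AParentQueue;
  FInput := AInput;
  FreeOnTerminate := True;         // free-running: nobody keeps a reference
  inherited Create(False);         // start immediately
end;

procedure TWorker.Execute;
var
  Res: TWorkResult;
begin
  Res := TWorkResult.Create;
  Res.Text := UpperCase(FInput);   // stand-in for the real work
  FParentQueue.Add(Res);           // TThreadList locks internally, so this is safe
end;

var
  Results: TThreadList;
  L: TList;
  i: Integer;
begin
  Results := TThreadList.Create;
  TWorker.Create(Results, 'hello');
  TWorker.Create(Results, 'world');
  Sleep(500);                      // crude: give the workers time to finish
  L := Results.LockList;
  try
    for i := 0 to L.Count - 1 do
    begin
      WriteLn(TWorkResult(L[i]).Text);
      TWorkResult(L[i]).Free;
    end;
  finally
    Results.UnlockList;
  end;
  Results.Free;
end.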


But actually, it would be better if I just checked a list to see whether there is already a thread active that can do what I want, and put the task in its queue. Stream processing, or microservices.

So, when such a service is created, it sends a message to a service manager that it is available. And when you want to use its functionality, you either ask the service manager whether that service is already available, or you send the service manager a message specifying which service to use.

That mostly decides who is responsible for spawning those services.
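As a rough sketch of what I mean, in Free Pascal: the manager is just a name-to-queue registry behind a lock (TServiceManager, RegisterService and Dispatch are names I made up here, and the service threads that would drain the queues are left out):

Code: Pascal
program ServiceRegistrySketch;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, SyncObjs;

type
  // The manager is just a registry: service name -> that service's input queue.
  TServiceManager = class
  private
    FLock: TCriticalSection;
    FServices: TStringList;                    // Objects[] hold the queues
  public
    constructor Create;
    destructor Destroy; override;
    procedure RegisterService(const AName: string; AQueue: TThreadList);
    function Dispatch(const AName: string; ATask: TObject): Boolean;
  end;

constructor TServiceManager.Create;
begin
  FLock := TCriticalSection.Create;
  FServices := TStringList.Create;
end;

destructor TServiceManager.Destroy;
begin
  FServices.Free;
  FLock.Free;
  inherited Destroy;
end;

// "I am available": called by a service when it starts up.
procedure TServiceManager.RegisterService(const AName: string; AQueue: TThreadList);
begin
  FLock.Acquire;
  try
    FServices.AddObject(AName, AQueue);
  finally
    FLock.Release;
  end;
end;

// Look the service up and drop the task in its queue; False if nobody offers it.
function TServiceManager.Dispatch(const AName: string; ATask: TObject): Boolean;
var
  i: Integer;
begin
  FLock.Acquire;
  try
    i := FServices.IndexOf(AName);
    Result := i >= 0;
    if Result then
      TThreadList(FServices.Objects[i]).Add(ATask);
  finally
    FLock.Release;
  end;
end;

var
  Manager: TServiceManager;
  UpperQueue: TThreadList;
begin
  Manager := TServiceManager.Create;
  UpperQueue := TThreadList.Create;
  Manager.RegisterService('uppercase', UpperQueue);
  if Manager.Dispatch('uppercase', TObject.Create) then
    WriteLn('task queued for the uppercase service');
  // A real service would have worker threads draining UpperQueue.
  UpperQueue.Free;
  Manager.Free;
end.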

And, as an added bonus, they don't all have to run on the same computer, although that requires encrypting the traffic (RSA, for example).


Anyway, I'm certainly not the first person to think this up, so I was wondering if there is already a framework that does this.

I do know that very few programmers think of this model when you talk about multi-threading or distributed computing.

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Stream processing
« Reply #1 on: August 16, 2017, 03:48:27 pm »
Quote
So, when such a service is created, it sends a message to a service manager that it is available. And when you want to use its functionality, you either ask the service manager if that service is already available, or you send it the message in which you specify the service to use.
No, you are simply looking at it from the wrong perspective. As you said, you post an event to a manager, but instead of the manager tracking which threads are active, busy, or ready to accept new events, it simply adds the event to a queue and lets the threads retrieve work when they have finished their current job. This simplifies the framework a lot and at the same time avoids the bottleneck of a single pusher and all that locking and releasing.
This is a well-known model; it is used extensively on the web, where the web server is the manager and the browsers are the workers that request and process the events (HTML).
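Back at the application level, a bare-bones sketch of that pull model in Free Pascal, with a TThreadList standing in for the queue and a short Sleep instead of a proper event (TTask, TPoolWorker and PopTask are made-up names):

Code: Pascal
program PullQueueDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils;

type
  TTask = class                    // whatever a queued job looks like
    Data: string;
  end;

  TPoolWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Tasks: TThreadList;              // the manager only ever appends to this

function PopTask: TTask;
var
  L: TList;
begin
  Result := nil;
  L := Tasks.LockList;
  try
    if L.Count > 0 then
    begin
      Result := TTask(L[0]);
      L.Delete(0);
    end;
  finally
    Tasks.UnlockList;
  end;
end;

// Each worker fetches its next job itself when the previous one is finished,
// so the manager never needs to know which workers are busy or idle.
procedure TPoolWorker.Execute;
var
  Job: TTask;
begin
  while not Terminated do
  begin
    Job := PopTask;
    if Job = nil then
      Sleep(10)                    // nothing queued; a real pool would wait on an event
    else
    begin
      WriteLn('worker ', PtrUInt(ThreadID), ' handles ', Job.Data);  // output may interleave
      Job.Free;
    end;
  end;
end;

var
  W1, W2: TPoolWorker;
  T: TTask;
  i: Integer;
begin
  Tasks := TThreadList.Create;
  W1 := TPoolWorker.Create(False);
  W2 := TPoolWorker.Create(False);
  for i := 1 to 5 do
  begin
    T := TTask.Create;
    T.Data := 'job ' + IntToStr(i);
    Tasks.Add(T);                  // the manager just pushes; the workers pull
  end;
  Sleep(500);                      // crude: give the workers time to drain the queue
  W1.Terminate; W2.Terminate;
  W1.WaitFor; W2.WaitFor;
  W1.Free; W2.Free;
  Tasks.Free;
end.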
For a more application-centric solution you could read up on message queues like ZeroMQ, Apache ActiveMQ, or even Microsoft's MSMQ. And if you are dead set on having the manager or producer inform every interested party of the event, then you should study the Windows messaging system, where multiple threads/processes/drivers post events to the applications in series until one of them responds that it handled the event; depending on the event, posting might stop there or continue until every application has received it. Or read up on the Subject/Observer pattern, where each producer keeps a list of interested observers to notify of changes.
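The Observer part boils down to something like this (a minimal, single-threaded sketch; TSubject, TObserver and TLogObserver are made-up names):

Code: Pascal
program ObserverDemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

type
  TObserver = class
  public
    procedure Notify(const AEvent: string); virtual; abstract;
  end;

  // The producer keeps a list of interested observers and notifies them of changes.
  TSubject = class
  private
    FObservers: TList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Subscribe(AObserver: TObserver);
    procedure Publish(const AEvent: string);
  end;

  TLogObserver = class(TObserver)
  public
    procedure Notify(const AEvent: string); override;
  end;

constructor TSubject.Create;
begin
  FObservers := TList.Create;
end;

destructor TSubject.Destroy;
begin
  FObservers.Free;                 // the observers themselves are owned elsewhere
  inherited Destroy;
end;

procedure TSubject.Subscribe(AObserver: TObserver);
begin
  FObservers.Add(AObserver);
end;

procedure TSubject.Publish(const AEvent: string);
var
  i: Integer;
begin
  for i := 0 to FObservers.Count - 1 do
    TObserver(FObservers[i]).Notify(AEvent);
end;

procedure TLogObserver.Notify(const AEvent: string);
begin
  WriteLn('observer got: ', AEvent);
end;

var
  Subject: TSubject;
  Obs: TLogObserver;
begin
  Subject := TSubject.Create;
  Obs := TLogObserver.Create;
  Subject.Subscribe(Obs);
  Subject.Publish('data changed');
  Obs.Free;
  Subject.Free;
end.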
As for the RSA encryption, it's an implementation detail and should be optional, i.e. the framework should work both with and without encryption.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Stream processing
« Reply #2 on: August 16, 2017, 04:48:42 pm »
There are basically two problems with threads:

1. They are only active while the main application is running.
2. It is not a good idea to access methods and properties of another thread.

The first one can be fixed by starting a new process for each thread, but that gives a lot of overhead (in most cases the OS loads a new executable from disk). That's even a requirement if you want to run them on multiple computers.

Also, if you want to make it scalable, some process has to monitor workload and spawn more threads when they get swamped. And alternatively, disable the ones that aren't used. Or they can do that themselves.


You can communicate with threads through message queues, but who is allocating those queues? That should be the main process, otherwise it still has to access properties of the other threads. And if it goes down (for example, because the thread doesn't exist anymore, or an exception occurred in one of the threads), all of the threads go down with it.

And that prevents two-way communication. No asking: "Are you still alive?", and if you require it to post a message periodically, it has to start another thread before doing something that takes a long time.

And of course, you cannot access the properties of threads that don't run in the same process.

So, you need a manager anyway.


An exception is one-shot worker threads that terminate when they're done. That's what I prefer to do. A much less nice alternative is a thread pool, where you run a self-contained function on a thread when one is available. Most web servers and such use one of these.


If you use sockets for accessing the services, you also don't need to write all of them in the same language, or run them all on the same kind of CPU and OS.

Accessing a service on another computer on a LAN is normally much faster than starting a new process on the same computer.


Windows messages have two problems as well: they're really short (two dwords, and sending a pointer to some random data is not something I would want) and they require that the manager (Windows) knows exactly what the state of each thread is. Your own application can't do that.


Anyway, all that's why things like servlets (shudder) exist, IMO. And why people write "Micro"-services in C# ASP.NET MVC, that require a dedicated database and run on an IIS server...

Blestan

  • Sr. Member
  • ****
  • Posts: 461
Re: Stream processing
« Reply #3 on: August 16, 2017, 05:32:29 pm »
Basically I'm creating this type of framework... The idea is to have only a "reasonable" count of sleeping threads and one main thread that listens on a socket. The main thread just picks an idle thread and dispatches the request to it. If no sleeping threads are available, a new one is created to process the request. Afterwards the thread sleeps for, let's say, another 1000 ms, and if no new request comes in, the non-persistent thread is destroyed. If the maximum thread count is reached, the request is queued by the main thread, which then goes back to listening on the socket for 300 ms. On wake-up the main thread checks whether a new connection came in or it just timed out, dispatches the new request, and tries to dispatch a queued one, if any.
That's all, and it runs pretty fast.
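The "sleep for 1000 ms and die if nothing comes in" part could look roughly like this, using a TEvent as the wake-up signal (just a sketch; TTimedWorker and Push are made-up names, and the real dispatcher and socket handling are left out):

Code: Pascal
program IdleTimeoutWorker;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils, SyncObjs;

type
  TTimedWorker = class(TThread)
  private
    FWakeUp: TEvent;
  protected
    procedure Execute; override;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Push;                  // called by the dispatcher: "there is work for you"
  end;

constructor TTimedWorker.Create;
begin
  FWakeUp := TEvent.Create(nil, False, False, '');   // auto-reset event
  inherited Create(False);
end;

destructor TTimedWorker.Destroy;
begin
  FWakeUp.Free;
  inherited Destroy;
end;

procedure TTimedWorker.Push;
begin
  FWakeUp.SetEvent;
end;

procedure TTimedWorker.Execute;
begin
  while not Terminated do
    case FWakeUp.WaitFor(1000) of    // sleep, but wake up as soon as work arrives
      wrSignaled:
        WriteLn('handling a request');
      wrTimeout:
        begin
          WriteLn('idle for 1000 ms, shutting down');
          Break;                     // a non-persistent worker would destroy itself here
        end;
    end;
end;

var
  W: TTimedWorker;
begin
  W := TTimedWorker.Create;
  W.Push;                            // simulate one incoming request
  W.WaitFor;                         // the worker exits after the idle timeout
  W.Free;
end.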

PS: the framework is called utramachine, you can find it on GitHub... it's in an early stage but usable.
Speak postscript or die!
Translate to pdf and live!

Thaddy

  • Hero Member
  • *****
  • Posts: 14197
  • Probably until I exterminate Putin.
Re: Stream processing
« Reply #4 on: August 16, 2017, 05:46:09 pm »
Quote
The old programming model is serial and uses functions. That scales badly and is not interactive.
That is soooo wrong! And that is the first sentence. Don't mix up serial and sequential. Having read the whole thread I have a headache. How can I politely ask you to take computer science classes first?
I will leave out any grumpy or angry smileys. I am dead serious. Almost all of your reasoning is flawed, especially the part about threads and processes. You lack knowledge.

I will not even begin to show you what is wrong and why... It takes too much time, and it is already in the public realm.
« Last Edit: August 16, 2017, 05:47:56 pm by Thaddy »
Specialize a type, not a var.

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Stream processing
« Reply #5 on: August 16, 2017, 06:13:16 pm »
Quote
There are basically two problems with threads:

1. They are only active while the main application is running.
True, but then again, if the application is not running there is no interest in events nor any event generation.

Quote
2. It is not a good idea to access methods and properties of another thread.
False. Code is usually safe to access from multiple threads, no problem; data, on the other hand, not so much. And when I say data I do not mean a method's local variables, those are safe (excluding pointers, classes and other reference-type variables). So as long as the method accesses the thread's queue in a thread-safe way and spends as little time as possible in there, you have no problems.

Quote
The first one can be fixed by starting a new process for each thread, but that gives a lot of overhead (in most cases the OS loads a new executable from disk). That's even a requirement if you want to run them on multiple computers.
Actually no, a single application with multiple threads per machine is my preferred way of processing. That makes things easier to handle.

Quote
Also, if you want to make it scalable, some process has to monitor workload and spawn more threads when they get swamped. And alternatively, disable the ones that aren't used. Or they can do that themselves.
My application monitors only three things: 1) the maximum number of running threads, 2) the minimum number of pooled threads, and 3) when a thread finishes processing, whether to instruct it to sleep or to free itself.

Quote
You can communicate with threads through message queues, but who is allocating those queues? That should be the main process, otherwise it still has to access properties of the other threads.
That is impossible to avoid; in some way or another there has to be some sort of information exchange, and it matters very little which thread initiates it. So each thread allocates its own queue; the main process only allocates and manages its own queue, even if that queue is simply a thread queue.

Quote
And if it goes down (for example, because the thread doesn't exist anymore, or an exception occurred in one of the threads), all of the threads go down with it.
That is a no-no. There should be no uncaught exceptions in a thread; all exceptions must be caught and handled inside the thread itself. And if for some reason your application does go down, which is not recommended, then even if an unrecoverable exception happened that requires an application restart, the application should restart itself.
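In Free Pascal that boils down to a wrapper like this sketch, where nothing escapes Execute and the error is just stored for the owner to look at afterwards (TSafeWorker, DoWork and LastError are made-up names):

Code: Pascal
program SafeExecuteDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils;

type
  TSafeWorker = class(TThread)
  public
    LastError: string;               // inspected by the owner once the thread is done
  protected
    procedure DoWork; virtual;
    procedure Execute; override;
  end;

procedure TSafeWorker.DoWork;
begin
  raise Exception.Create('something went wrong');   // stand-in for the real work
end;

procedure TSafeWorker.Execute;
begin
  try
    DoWork;
  except
    // Don't let anything escape Execute: handle it here and report it in a
    // controlled way, instead of relying on what the RTL or OS does with an
    // unhandled exception in a thread.
    on E: Exception do
      LastError := E.ClassName + ': ' + E.Message;
  end;
end;

var
  W: TSafeWorker;
begin
  W := TSafeWorker.Create(False);
  W.WaitFor;
  if W.LastError <> '' then
    WriteLn('worker failed: ', W.LastError)
  else
    WriteLn('worker finished ok');
  W.Free;
end.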

Quote
And that prevents two-way communication.
That is false. Nothing prevents two-way communication, but we tend to minimize it to the absolutely necessary, to keep things simple and manageable. You can always have the thread inform the main application that it finished processing or that it frees itself; that should be enough.

Quote
No asking: "Are you still alive?",
Why not?

Quote
and if you require it to post a message periodically, it has to start another thread before doing something that takes a long time.
That is why the main application does nothing itself; it only manages the threads that "do something".

Quote
And of course, you cannot access the properties of threads that don't run in the same process.
That is why I said message queue.

Quote
So, you need a manager anyway.

Erm, yes you do, for three reasons: 1) thread pooling, to speed things up; 2) thread constraints, to avoid overwhelming the CPU and spending more time switching than processing; and 3) informing the user of the current progress, usually through logging rather than GUI updates, to keep the timing constant.

Quote
An exception are one-shot worker threads, that terminate when they're done. That's what I prefer to do. A much less nice alternative is a thread pool, where you run a self-contained function on a thread when one is available. Most web servers and such use one of these.
Well, not a very friendly solution, but overall acceptable in my book. I use it too, but I always inform the main application when a thread is destroyed (the OnDestroy event usually runs in the main thread context).

Quote
If you use sockets for accessing the services, you also don't need to write all of them in the same language. Or run them all on the same kind of CPU and OS.
Again, read up on message queues; they are external servers that manage a many-producers-to-many-consumers relationship. ZeroMQ has its headers translated to Delphi, which makes it an easy test platform for learning.

Quote
Accessing a service on another computer on a LAN is normally much faster than starting a new process on the same computer.
Unless you are talking about Linux, which did not have threads until relatively recently and where forking was the only way to do multi-threading, I agree: no multi-process on Windows. On Linux, on the other hand, it was faster the last time I checked (a couple of years back, that is). I really do not see the problem with a single application with multiple threads.

Quote
Windows messages have two problems as well: they're really short (two dwords, and sending a pointer to some random data is not something I would want) and they require that the manager (Windows) knows exactly what the state of each thread is. Your own application can't do that.
Sorry, you lost me. As far as I know, Windows does not track the state of a thread; that is why it is not recommended to use calls like Suspend, Resume, etc. Then again, I might have missed something.

Quote
Anyway, all that's why things like servlets (shudder) exist, IMO. And why people write "Micro"-services in C# ASP.NET MVC, that require a dedicated database and run on an IIS server...
Erm, sorry, you lost me here too. IIS/Apache, CGI, microservices etc. are all used for reasons of comfort. Instead of learning something like CORBA, DCOM etc., which already support something along those lines, it is far easier to bash them as inadequate.

It seems that I do not understand your reasoning behind such a framework, especially with all the frameworks that already do exactly this. The more I read, the more I'm convinced that a message-queue server is your solution, and yes, please use threads, not processes.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Thaddy

  • Hero Member
  • *****
  • Posts: 14197
  • Probably until I exterminate Putin.
Re: Stream processing
« Reply #6 on: August 16, 2017, 08:25:23 pm »
@Taazz
I broadly agree. Good effort (apart from some minor details too technical to mention in this context).
There's even more wrong but you have the basics covered pretty well.
Once he understands your comments it is likely he can iron out the rest of his deficiencies himself.
Specialize a type, not a var.

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Stream processing
« Reply #7 on: August 16, 2017, 10:26:48 pm »
Quote
PS: the framework is called utramachine, you can find it on GitHub... it's in an early stage but usable.
Ok, I'll take a look tomorrow.

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Stream processing
« Reply #8 on: August 16, 2017, 10:29:46 pm »
Quote
That is soooo wrong! And that is the first sentence. Don't mix up serial and sequential.
If I wanted to say sequential, I would have done so.

Serial, as in serializing. Which is what most programmers do when they write "parallel" programs, as they religiously lock and unlock every action.

Thaddy

  • Hero Member
  • *****
  • Posts: 14197
  • Probably until I exterminate Putin.
Re: Stream processing
« Reply #9 on: August 16, 2017, 10:44:08 pm »
You have two options:
1. Get a life
2. Get an education

You are one of those who fell into the following trap:
Parallel programming. (How? E.g. two computers, and the left hand types on one while the right hand types on the second? Unlikely. Real computer scientists don't use that terminology anymore.)
Programming parallel. (You're in a team, OK.)
Programming for parallel execution. (That is what you actually mean... in your case probably, ahum, maybe.. >:D >:()

A single programmer can program sequentially (not serially, unless you don't make mistakes, which you probably won't <silly>) to achieve parallel execution.

Free lesson.

Give it to your professors... They will agree...
« Last Edit: August 16, 2017, 11:01:06 pm by Thaddy »
Specialize a type, not a var.

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Stream processing
« Reply #10 on: August 16, 2017, 11:13:59 pm »
Quote
There are basically two problems with threads:

1. They are only active while the main application is running.
True, but then again, if the application is not running there is no interest in events nor any event generation.
For monolithic applications you are right, but I'm talking about distributed ones. It doesn't consist of just the one process.

Quote
2. It is not a good idea to access methods and properties of another thread.
False, code is usually safe to access from multiple threads, no problem, data on the other hand not so much and when I say data I do not mean the methods local variables those are safe excuding pointers classes and other reference type variables, so as long as the method accesses the thread's queue in a thread safe way and spend as little time as possible in there then you have no problems.
If you call a method, does it run in the context of the calling thread, or in the one of the instance? What does "Self" refer to?

Quote
The first one can be fixed by starting a new process for each thread, but that gives a lot of overhead (in most cases the OS loads a new executable from disk). That's even a requirement if you want to run them on multiple computers.
actually no a single application with multiple threads per machine is my preferred way of processing. This makes things easier to handle?
I think that depends on your definition of "simpler to handle". Less code and complexity, certainly.

Quote
Also, if you want to make it scalable, some process has to monitor workload and spawn more threads when they get swamped. And alternatively, disable the ones that aren't used. Or they can do that themselves.
the only thing that my application monitors are three things 1) the maximum number of running threads 2) the minimum number of pooled threads and 3)when a thread finished processing to instruct it to sleep or free. 
Does it differentiate between blocked, idle and processing threads? Because those make a large difference. And you imply the use of a thread manager.

Quote
You can communicate with threads through message queues, but who is allocating those queues? That should be the main process, otherwise it still has to access properties of the other threads.
That is impossible to avoid, in some way on an other there has to be some sort of information exchange. It matters very little which thread initiates it. so Each thread allocates its own queue the main process will only allocate and manage its own queue even if that queue is simple a thread queue.
Well, you can avoid it by not using queues but sockets to communicate.

Quote
And if it goes down (for example, because the thread doesn't exist anymore, or an exception occurred in one of the threads), all of the threads go down with it.
that is a no no. There should be no uncaptured exception in the thread all exception must be captured and handled inside the thread it self. If for some reason your application goes down which is not recommended even if there is unrecoverable exception happened that requires an application restart the application should restart it self.
Yes, I agree. That's my main problem with servlets, or with threads crashing the main process and thereby the whole distributed application, which can be huge and doing thousands of things at the same time.

If it is only the one thread/process, there is no recovery possible.

Quote
And that prevents two-way communication.
That is false. Nothing prevents two way communication but we tend to minimize the two way communication to absolutely necessary to keep things simple and manageable, you can always have the thread inform the main application that it finished processing or frees it self that should be enough.
Every means of communication that is not enclosed by mutexes or through sockets increases the risk of crashing the application when shit happens.

Quote
No asking: "Are you still alive?",
Why not?
Because in this example, the thread doesn't have a socket or its own message queue.

Quote
So, you need a manager anyway.

erm yes you do for 3 reasons. 1) thread pooling to speed things up 2)thread constraint to avoid overwhelming the CPU and spend more time in switching instead of processing. 3) inform the user of the current process usually through logging not GUI updates to keep everything on constant timing.
On a regular desktop computer, there are thousands of threads active at any one time. Of which most (more than 90%) are blocked and waiting on an event (I/O, mostly), and a few hundred that are executing code ("running").

But, most of them are waiting the majority of their time as well. On slow memory (hundreds of cycles), on a mutex lock or cache invalidation, or even an extremely slow (millions of cycles) disk access.

It is really hard to make a thread run at full speed for an extended time.

So, you actually want to count the total CPU time those running threads spend. And only the OS knows that. Your application doesn't.

Quote
An exception are one-shot worker threads, that terminate when they're done. That's what I prefer to do. A much less nice alternative is a thread pool, where you run a self-contained function on a thread when one is available. Most web servers and such use one of these.
well not very friendly solution but overall acceptable in my book. I use it too but I always inform the main application when a thread is destroyed (ondestroy event usually runs in the main thread context).

Quote
If you use sockets for accessing the services, you also don't need to write all of them in the same language. Or run them all on the same kind of CPU and OS.
again read up on message queues they are external servers that mange a many producers to many consumers relationship zeroMQ has is headers translated to delphi which makes it an easy test platform for learning.
I don't agree: message queues are not servers. There is no memory barrier or exception prevention when accessing them. They use owned pointers.

Quote
Accessing a service on another computer on a LAN is normally much faster than starting a new process on the same computer.
unless you are talking about linux which did not had threads up until recently and forking was the only way for multi threading, I agree no multi process on windows. On Linux on the other hand it was faster the last time I checked (coupl of years back that is). I really do not see the problem of a single application with multiple threads.
Yes, on Linux it is probably faster as long as the binary containing the code is already loaded into main memory.

Quote
Windows messages have two problems as well: they're really short (two dwords, and sending a pointer to some random data is not something I would want) and they require that the manager (Windows) knows exactly what the state of veach thread is. Your own application can't do that.
sorry you lost me, as far as I know windows does not track the state of a thread that is why it is not recommended to use calls like suspend resume etc. Then again I might have missed something.
You can send a message to either a window or a thread, both of which have to be active, have a valid handle and message queue. And they should be on the "active" list of the scheduler (i.e. scheduled to get CPU time). Otherwise, you won't get the message.
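A Windows-only sketch of what that looks like in practice: as far as I know, the receiving thread only gets a message queue after its first PeekMessage/GetMessage call, so the poster has to wait for that before PostThreadMessage can succeed (TMsgThread, WM_WORK and the QueueReady flag are made-up names, and the polling handshake is deliberately crude):

Code: Pascal
program ThreadMessageDemo;
{$mode objfpc}{$H+}
uses
  Windows, Classes, SysUtils;            // Windows-only sketch

const
  WM_WORK = WM_APP + 1;

type
  TMsgThread = class(TThread)
  public
    QueueReady: Boolean;
  protected
    procedure Execute; override;
  end;

procedure TMsgThread.Execute;
var
  M: TMsg;
begin
  // A thread only gets a message queue after its first PeekMessage/GetMessage
  // call, so force its creation before anyone is allowed to post to us.
  PeekMessage(M, 0, WM_USER, WM_USER, PM_NOREMOVE);
  QueueReady := True;
  while GetMessage(M, 0, 0, 0) do        // returns False when WM_QUIT arrives
    if M.message = WM_WORK then
      WriteLn('got work item ', M.wParam);
end;

var
  T: TMsgThread;
begin
  T := TMsgThread.Create(False);
  while not T.QueueReady do
    Sleep(1);                            // crude handshake, good enough for the sketch
  PostThreadMessage(DWORD(T.ThreadID), WM_WORK, 42, 0);
  PostThreadMessage(DWORD(T.ThreadID), WM_QUIT, 0, 0);
  T.WaitFor;
  T.Free;
end.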

Quote
Anyway, all that's why things like servlets (shudder) exist, IMO. And why people write "Micro"-services in C# ASP.NET MVC, that require a dedicated database and run on an IIS server...
erm sorry you lost me here too. IIS/apache, cgi, micro services etc are all used for reason of comfort, instead of learning something like CORBA, DCOM etc. that already support something along those lines it is far easier to bash them as inadequate.

It seems that I do not understand your reasoning behind such a framework especially with all the frameworks that do exactly that. The more I read up the more I'm convinsed that a message queue server is your solution and yes please use threads not processes.
Well, they're not the same  :)

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Stream processing
« Reply #11 on: August 16, 2017, 11:28:20 pm »
Quote
You have two options:
1. Get a life
2. Get an education

So, I get it: I'm a moron that doesn't understand anything.

What I don't get is: exactly what is it I'm not understanding? Where do I go wrong? You don't tell me. And I want to learn. Or at least: discuss it.

I have been programming at least as long as you. I really know how it all works. Yes, the electronics as well.

Then again, I have no formal education whatsoever (well, MAVO, but that doesn't count). I'm a complete autodidact: if I encounter something I don't know or understand, I research it.

My main problem is that my co-workers rarely have the slightest idea what I'm talking about. So I have to take everything really slowly. At a snail's pace. And even slower than that. REALLY slow.

Then again, they mostly don't care and don't want to know. They do it to earn money, no more, no less. "Shut up, Frank! Don't bother me with it!"


So, yes, it is very possible that I make large blunders, because just about the only useful feedback I ever get is from this forum.

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Stream processing
« Reply #12 on: August 17, 2017, 08:25:00 am »
Quote
For monolithic applications you are right, but I'm talking about distributed ones. It doesn't consist of just the one process.
It doesn't? So no one starts the distributed client on the external computers? How many clients are running?

Quote
If you call a method, does it run in the context of the calling thread, or in the one of the instance? What does "Self" refer to?
As far as I understand, a method is a procedure with a hidden parameter, and that parameter is "Self". The value of the "Self" parameter is the instance (the thread object, if I understand you correctly), and the code always executes in the context of the calling thread.
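A small test program shows it (a sketch; TMyThread and WhoAmI are made-up names, nothing beyond the standard TThread is assumed): the same method reports a different current-thread ID depending on who calls it, while Self, and therefore its ThreadID property, stays the same:

Code: Pascal
program SelfContextDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils;

type
  TMyThread = class(TThread)
  protected
    procedure Execute; override;
  public
    procedure WhoAmI(const Caller: string);
  end;

procedure TMyThread.WhoAmI(const Caller: string);
begin
  // Self is just the object instance; the code runs on whichever thread called it.
  WriteLn(Caller, ': Self.ThreadID = ', PtrUInt(ThreadID),
          ', executing on thread ', PtrUInt(GetCurrentThreadId));
end;

procedure TMyThread.Execute;
begin
  WhoAmI('from Execute');            // runs on the worker thread: the two IDs match
end;

var
  T: TMyThread;
begin
  T := TMyThread.Create(True);
  T.Start;
  T.WaitFor;                         // let Execute finish first
  T.WhoAmI('from main thread');      // same method, same Self, but runs on the main thread
  T.Free;
end.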
Quote
I think that depends on your definition of "simpler to handle". Less code and complexity, certainly.
I'm talking about the code here, yes.

Quote
Does it differentiate between blocked, idle and processing threads? Because those make a large difference. And you imply the use of a thread manager.
It tracks idle threads only; it assumes that a thread that is not idle is busy. I never had any blocked threads, as far as I know, that is.

Quote
Well, you can avoid it by not using queues but sockets to communicate.
No, you did not avoid it, you simply changed the guarding mechanism: if the thread is too busy to check for incoming socket communication, it will not receive any.
Quote
Yes, I agree. That's my main problem with servlets, or threads crashing the main process, and so the whole, distributed application. Which can be huge, and doing thousands of things at the same time.

If it is only the one thread/process, there is no recovery possible.
Well, guarding against all exceptions is as simple as adding a try..except block that encompasses the complete code of your Execute method and does not re-raise anything, for whatever reason. Is that too much?

Quote
Every means of communication that is not enclosed by mutexes or through sockets increases the risk of crashing the application when shit happens.
Yes and no. There are atomic operations that do not need guards, and there are operations specifically designed to be atomic, like the InterlockedXXXXXX procedures. Those work both with and without a guard, without problems.
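For example (a tiny sketch, using the InterLockedIncrement that ships with the FPC RTL; TIncThread is a made-up name): two threads hammer one counter without any critical section, and no increment is lost:

Code: Pascal
program AtomicCounterDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils;

var
  Counter: LongInt = 0;

type
  TIncThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TIncThread.Execute;
var
  i: Integer;
begin
  for i := 1 to 100000 do
    InterLockedIncrement(Counter);   // atomic read-modify-write, no critical section needed
end;

var
  A, B: TIncThread;
begin
  A := TIncThread.Create(False);
  B := TIncThread.Create(False);
  A.WaitFor;
  B.WaitFor;
  WriteLn('Counter = ', Counter);    // always 200000: no increment is lost
  A.Free;
  B.Free;
end.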

Quote
On a regular desktop computer, there are thousands of threads active at any one time. Of which most (more than 90%) are blocked and waiting on an event (I/O, mostly), and a few hundred that are executing code ("running").

But, most of them are waiting the majority of their time as well. On slow memory (hundreds of cycles), on a mutex lock or cache invalidation, or even an extremely slow (millions of cycles) disk access.

It is really hard to make a thread run at full speed for an extended time.

So, you actually want to count the total CPU time those running threads spend. And only the OS knows that. Your application doesn't.
You are overthinking it. Even the CPU has to wait for the bus to become available; it's out of your hands, and out of the OS's hands too. Take a step back and see the forest.
Quote
If you use sockets for accessing the services, you also don't need to write all of them in the same language. Or run them all on the same kind of CPU and OS.
True; then again, you add a level of indirection and an order of complexity that has nothing to do with the processing itself.

Quote
I don't agree: message queues are not servers. There is no memory barrier or exception prevention when accessing them. They use owned pointers.
It seems that the term "message queues" hits some kind of wall; try "message brokers" instead.

Quote
Yes, on Linux it is probably faster as long as the binary containing the code is already loaded into main memory.
Same as threads on Windows.

Quote
You can send a message to either a window or a thread, both of which have to be active, have a valid handle and message queue. And they should be on the "active" list of the scheduler (i.e. scheduled to get CPU time). Otherwise, you won't get the message.
Isn't that a universal requirement? Your client has to be active, non-blocking, and have the socket/door combination open to accept communication as well. Why is this a problem? Those are the minimum requirements for everything; why are they suddenly a problem for queues?

Quote
Well, they're not the same  :)
I guess I do not see the difference, sorry.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Stream processing
« Reply #13 on: August 19, 2017, 12:05:24 am »
Quote
For monolithic applications you are right, but I'm talking about distributed ones. It doesn't consist of just the one process.
it doesn't? so no one start the distributed client on the external computers? how many clients are running?
Yes, you do need a server on each computer.

Quote
If you call a method, does it run in the context of the calling thread, or in the one of the instance? What does "Self" refer to?
As far as I understand, a method is a procedure with a hidden parameter that parameter is "self". The value of the "self" parameter is the thread of the instance (if I understand you correctly) and the code is always executed in the context of the calling thread.
Yes. And that's why you shouldn't access them without locking first: everyone can do that, all at the same time.

Quote
I think that depends on your definition of "simpler to handle". Less code and complexity, certainly.
I'm talking about the code here yes.

Quote
Does it differentiate between blocked, idle and processing threads? Because those make a large difference. And you imply the use of a thread manager.
it tracks idle threads only, it assumes that if a thread is not idle is busy. d I never had any blocked threads as far as I know that is.
Blocked threads are the best ones: they're waiting on events or other I/O. They take little CPU time, code or maintenance.

Then again, it is really hard to make a functional thread that isn't just waiting more than 50% of the time on slow resources (mostly memory).

Quote
Well, you can avoid it by not using queues but sockets to communicate.
no you did not avoided it, you simple changed the guarding mechanism, if the thread is to busy to check for incoming socket communication it will not receive any.
True. But "efficient" in threading often equals "fully decoupled". Any interaction slows things down quite a bit. Sockets are completely decoupled; lock-free queues are nice, but 95% of the code is there to keep the shared access from fucking things up.

Quote
Yes, I agree. That's my main problem with servlets, or threads crashing the main process, and so the whole, distributed application. Which can be huge, and doing thousands of things at the same time.

If it is only the one thread/process, there is no recovery possible.
Well to guard against all exception is as simple as adding a try except block that encompasses the complete code of your execute method and does not raise any exception for what ever reason. Is that too much?
Well, for starters: that does work in Free Pascal most of the time, but not in C++.

But exceptions are the same as "on error goto handler". They make it hard to figure out the program flow. Especially if they can happen in child threads.

Mostly, because it is hard to enforce a wrapper that prevents crashing in all cases. Because, as you said, you have to put the try..except religiously around everything.

Quote
Every means of communication that is not enclosed by mutexes or through sockets increases the risk of crashing the application when shit happens.
Yes and no. There are atomic operations that do not need guards and there are operation specifically designed to be atomic, like the interlockedXXXXXX procedures. Those do work both under guard and with out a guard with out probles,
Yes. I would love it if there was an atomic variant for each operation. But there are only a few really atomic operations.

Most "atomic" operations are like this:

Code: Pascal
Lock;       // a global action across all processors and cores, often forcing pipeline and cache flushes
DoSomething(AValue);
Unlock;     // continue with what you were doing once the damage is repaired

There are a few really atomic ones, but they are hard to use.
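They usually end up in a compare-and-swap loop, which is the "hard to use" part. A sketch using FPC's InterlockedCompareExchange (AtomicDouble is a made-up name):

Code: Pascal
program CasDemo;
{$mode objfpc}{$H+}

var
  Value: LongInt = 10;

// Atomically double Target without taking a lock: the classic compare-and-swap loop.
procedure AtomicDouble(var Target: LongInt);
var
  OldVal, NewVal: LongInt;
begin
  repeat
    OldVal := Target;                   // take a snapshot
    NewVal := OldVal * 2;               // compute the new value from it
    // The store only happens if Target still equals OldVal; the call returns the previous value.
  until InterlockedCompareExchange(Target, NewVal, OldVal) = OldVal;
end;

begin
  AtomicDouble(Value);
  WriteLn(Value);                       // 20
end.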

Quote
On a regular desktop computer, there are thousands of threads active at any one time. Of which most (more than 90%) are blocked and waiting on an event (I/O, mostly), and a few hundred that are executing code ("running").

But, most of them are waiting the majority of their time as well. On slow memory (hundreds of cycles), on a mutex lock or cache invalidation, or even an extremely slow (millions of cycles) disk access.

It is really hard to make a thread run at full speed for an extended time.

So, you actually want to count the total CPU time those running threads spend. And only the OS knows that. Your application doesn't.

You are overthinking it. even the cpu has to wait for the bus to become available its out of your hands its out of the OS hands too. take a step back and see the forest.
If you have many threads that return results by locking, incrementing and unlocking a single variable, you just serialized your application.

Most multi-threading applications I see lock EVERYTHING. That makes them excessively slow.
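The usual way out is to keep the hot loop thread-local and touch the shared state only once per thread. A sketch (TSumThread is a made-up name):

Code: Pascal
program LocalSumDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif} Classes, SysUtils;

var
  Total: LongInt = 0;                          // the only shared variable

type
  TSumThread = class(TThread)
  private
    FFrom, FTo: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(AFrom, ATo: Integer);
  end;

constructor TSumThread.Create(AFrom, ATo: Integer);
begin
  FFrom := AFrom;
  FTo := ATo;
  inherited Create(False);
end;

procedure TSumThread.Execute;
var
  i, LocalSum: Integer;
begin
  LocalSum := 0;
  for i := FFrom to FTo do
    Inc(LocalSum, i);                          // the hot loop touches no shared state
  InterLockedExchangeAdd(Total, LocalSum);     // one atomic merge per thread, not per item
end;

var
  A, B: TSumThread;
begin
  A := TSumThread.Create(1, 20000);
  B := TSumThread.Create(20001, 40000);
  A.WaitFor;
  B.WaitFor;
  WriteLn('Total = ', Total);                  // 800020000
  A.Free;
  B.Free;
end.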

Quote
If you use sockets for accessing the services, you also don't need to write all of them in the same language. Or run them all on the same kind of CPU and OS.
true, then again you add a level of indirection and an order of complexity and it has nothing to do with the processing it self.

Quote
I don't agree: message queues are not servers. There is no memory barrier or exception prevention when accessing them. They use owned pointers.
It seems that "message queues" hits some kind of wall, try message brokers instead.
I know how they work.

Making things more complex won't solve the underlying problems, it will only hide them from people who don't understand.

Quote
Yes, on Linux it is probably faster as long as the binary containing the code is already loaded into main memory.
same as threads on windows.
Absolutely not, they are vastly different.

Quote
You can send a message to either a window or a thread, both of which have to be active, have a valid handle and message queue. And they should be on the "active" list of the scheduler (ie. scheduled to get CPU time). Otherwise, you won't get the message.
isn't that a global requirement? your client has to be active, non blocking and has open the socket/door combination to accept communication as well why is this a problem? they are the minimum requirements in everything why it is a problem in queues suddenly?
Blocking is completely acceptable, terminated and removed isn't.

It's all about the context where the code executes.

Quote
Well, they're not the same  :)
I guess I do not see the difference, sorry.
Yes. I'm not sure I can explain it.

I'll probably just have to build and use it, no matter what anyone else thinks.


Then again, that is the problem: people don't like change, so why would they use it?

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Stream processing
« Reply #14 on: August 19, 2017, 01:36:40 am »
Quote
Yes. And that's why you shouldn't access them without locking first: everyone can do that, all at the same time.

Wrong. You have to remember the golden rule of multi-threading: only accessing data can create problems, and only when it is changed unpredictably. What that means for you is simple: you only need to protect your data, not the code, and only protect it while it is being written, in which case it must not be accessible until the write operation finishes. A hidden parameter that only gets read does not need to be protected.


Quote
Blocked threads are the best ones: they're waiting on events or other I/O. They take little CPU time, code or maintenance.

And they never process their data, so not so much fun.

Quote
Then again, it is really hard to make a functional thread that isn't just waiting more than 50% of the time on slow resources (mostly memory).
I'd like to see some tests backing that conclusion; until then I'll stay skeptical.

Quote
True. But "efficient" in threading is often equal to "fully decoupled". Any interaction slows things down quite a bit. Sockets are completely decoupled, lock-free queues are nice, but 95% of the code is in preventing the shared access to fuck up things.

There are a number of "efficient" indicators; you need to be more specific. Efficient in what? And decoupled from what?

Quote
Well, for starters: that does work in Free Pascal most of the time, but not in C++.
1) I'm smelling a design flaw here.
2) You should never leave proper exception handling to external code; it should run protected at all times, regardless of the competence level of its developer.
3) I have no experience with C++ exception handling, but from a quick brush with C++ developers in general, I would say that it should be possible to achieve the same level of protection (if not better) as in FPC; it just requires more experience than Pascal.


Quote
But exceptions are the same as "on error goto handler". They make it hard to figure out the program flow. Especially if they can happen in child threads.

Mostly, because it is hard to enforce a wrapper that prevents crashing in all cases. Because, as you said, you have to put the try..except religiously around everything.

You did not listen, did you? You never handle a child thread's exceptions elsewhere; the child thread handles its own exceptions. If for any reason the child thread raises an exception, you mark that thread as unsafe and refuse to run it again.


Quote
Yes. I would love it if there was an atomic variant for each operation. But there are only a few really atomic operations.

Most "atomic" operations are like this:

Code: Pascal
Lock;       // a global action across all processors and cores, often forcing pipeline and cache flushes
DoSomething(AValue);
Unlock;     // continue with what you were doing once the damage is repaired

There are a few really atomic ones, but they are hard to use.

No, that would be a normal guarded operation. Atomic operations are well defined for every CPU; for example, the assignment of a 32-bit integer at a 32-bit-aligned memory address is atomic on Intel CPUs. Those do not need any guards. Try reading the Intel developer's manual to get a better feeling for atomic operations.

Quote
If you have many threads that return results by locking, incrementing and unlocking a single variable, you just serialized your application.

Let's say that a badly designed multi-threading model will always be slower than a properly designed one.

Quote
Most multi-threading applications I see lock EVERYTHING. That makes them excessively slow.

I can't speak for what I haven't seen.

Quote
I know how they work.

Making things more complex won't solve the underlying problems, it will only hide them from people who don't understand.
Well, I tend to agree with the general statement; then again, you already advocated using sockets as a decoupling mechanism, and this is the same thing with the groundwork already done for you.


Quote
Absolutely not, they are vastly different.

Well, there are general differences, yes: threads do not require IPC mechanisms, and they allow better-tuned locking and faster data exchange than forking. But vastly different? Maybe.

Quote
It's all about the context where the code executes.

That smells like another design flaw, but I'll wait and see the implementation, to make sure I'm not missing anything.


Quote
Yes. I'm not sure I can explain it.

I'll probably just have to build and use it, no matter what anyone else thinks.
Just make sure you do not ignore the lessons learned in the last three or four decades of multi-threading, and that you do not simply repeat past mistakes (been there, done that, I have the marks to prove it ;)).

Quote
Then again, that is the problem: people don't like change, so why would they use it?
It's not the change that I dislike, it's the uncertainty that comes with it that I can't stand. For me to accept your proposal I would have to implement it, test it, test it under load, and profile it, to see if there is anything useful in it. Depending on the underlying job, that means anywhere from two or three weeks up to six months of work, just to have something new that does the same job my current library already does without complaints from my users.

So it's not change that I avoid, it's change that brings no benefits.

If you do implement it, I wouldn't mind taking a look at it.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

 
