What you showed in your previous post is safe, but it will very likely take noticeably longer (relatively speaking) than a single thread performing the entire task.
The reason for this is simple: you're creating a thread for every element in the array. Thread creation isn't a cheap operation. It is certainly _much_ more expensive than adding two integers together, which is all a single-threaded program would do for each element.
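A minimal sketch of the alternative, assuming the task is summing an array: create a small, fixed number of threads and give each one a contiguous chunk, so the cost of creating a thread is spread over many additions instead of one. Python is used only for illustration, and `parallel_sum` is a hypothetical name. Note that in CPython the global interpreter lock keeps pure-Python arithmetic from running on several cores at once, so this shows the structure, not the speed-up.

```python
import threading

def parallel_sum(values, num_threads):
    """Sum `values` with `num_threads` threads, one contiguous chunk per thread."""
    n = len(values)
    # Never start more threads than there are elements (and always at least one).
    num_threads = max(1, min(num_threads, n)) if n else 1
    partials = [0] * num_threads
    chunk = (n + num_threads - 1) // num_threads  # ceiling division

    def worker(index):
        start = index * chunk
        end = min(start + chunk, n)
        # Each thread writes only its own slot, so no locking is needed.
        partials[index] = sum(values[start:end])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every chunk before combining the partial sums
    return sum(partials)
```

With a million elements and four threads, this creates four threads instead of a million.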
For what you're trying to do to yield a noticeable improvement, the first thing you need to do is calculate the optimal number of threads. That number depends on two things: 1. the global task to be performed, and 2. the number of processors in the system.
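One way to express "depends on the task and on the processors" is to take the smaller of two limits: one imposed by the amount of work and one imposed by the hardware. This is a sketch under an assumption: `min_chunk` is a hypothetical tuning value (the smallest amount of work worth giving a thread), which you would have to determine by measurement for your own task.

```python
import os

def threads_for_task(num_elements, num_processors, min_chunk=50_000):
    """Pick a thread count bounded by both the task size and the processor count.

    min_chunk is an assumed tuning value: below it, the cost of starting a
    thread is unlikely to be repaid by the work that thread performs.
    """
    by_work = max(1, num_elements // min_chunk)  # limit imposed by the task
    by_cpus = max(1, num_processors)             # limit imposed by the hardware
    return min(by_work, by_cpus)

if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to 1.
    print(threads_for_task(1_000_000, os.cpu_count() or 1))
```

For a 10-element array this always returns 1, which matches the point that tiny inputs gain nothing from threading.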
In the example you presented, no improvement can be obtained because there are too few elements in the array and the operation is very simple. The point is, you need to create a test case that is relatively simple but still represents the original problem. This is necessary in order to _measure_ the improvement, if any, that a particular approach yields.
In the case of adding integers, on today's processors it probably takes hundreds of thousands of elements to get a repeatable, measurable difference. Also, your example uses random numbers, which makes it difficult to verify that the result is correct. Given the simplicity of the task it probably is, but it would be better to use a pattern that yields a value that can be predetermined; that way _you_ can verify the results.
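A sketch of a test harness along those lines, assuming the input is the pattern 0, 1, ..., n-1, whose sum is known in advance to be n(n-1)/2. The function names are hypothetical. Any candidate implementation (single-threaded or threaded) can be passed in as `sum_func`; the harness checks the result against the predetermined value and reports the best of several timings to reduce noise. In CPython, threaded pure-Python arithmetic is serialized by the global interpreter lock, so meaningful speed comparisons need a compiled language.

```python
import time

def fill_pattern(n):
    """Deterministic input: 0, 1, ..., n-1."""
    return list(range(n))

def expected_sum(n):
    """Closed-form sum of 0..n-1, known before the program runs."""
    return n * (n - 1) // 2

def timed_check(sum_func, n, repeats=5):
    """Run sum_func on the pattern `repeats` times; verify it and return the best time in seconds."""
    values = fill_pattern(n)
    want = expected_sum(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        got = sum_func(values)
        elapsed = time.perf_counter() - start
        if got != want:
            raise ValueError(f"wrong result: {got} != {want}")
        best = min(best, elapsed)
    return best

if __name__ == "__main__":
    print(f"single thread, 500000 elements: {timed_check(sum, 500_000):.6f} s")
```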
As a rule of thumb, a well-behaved application should create, at most, 2/3 as many threads as there are processors in the system. This is because, in general, you want to leave some processing power available for other tasks so the system keeps running smoothly. Generally speaking, give the user the option of selecting how many processors the task should use, and "recommend" no more than 2/3 for the reason mentioned. But if the user wants maximum speed at the expense of the system running smoothly, I'd allow a maximum of (number of processors - 1) (always leave one processor available to run the O/S and other apps).
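The rule of thumb above translates into a small helper. This is a sketch; the function names are hypothetical, and clamping to at least one worker is an added assumption so that a single-processor machine (where "processors - 1" would be zero) still gets one thread.

```python
import os

def thread_limits(num_processors=None):
    """Return (recommended, maximum) worker counts for a CPU-bound task.

    recommended: about 2/3 of the processors, leaving headroom for other work.
    maximum: processors - 1, always leaving one processor for the OS and other apps.
    Both are clamped to at least 1 (an assumption for single-processor machines).
    """
    if num_processors is None:
        num_processors = os.cpu_count() or 1
    maximum = max(1, num_processors - 1)
    recommended = max(1, min(maximum, (2 * num_processors) // 3))
    return recommended, maximum

def clamp_user_choice(requested, num_processors=None):
    """Accept the user's requested thread count, but never exceed the maximum."""
    _, maximum = thread_limits(num_processors)
    return max(1, min(requested, maximum))
```

On an 8-processor machine this recommends 5 threads and allows at most 7.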
Lastly, the threads should _not_ terminate when they finish their task; they should suspend instead, and be reactivated by inserting a task into their queue (which they check) and then _resuming_ them. That way you have a pool of threads that perform work as it is needed. How this is controlled also depends on the global task to be performed.
This means that one of the program's preliminary setup steps is to create a thread pool up front, which is later used to accomplish the main task.
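A minimal sketch of such a pool, created once up front and reused. One substitution to note: instead of explicitly suspending and resuming threads, each worker here blocks on a thread-safe queue (`queue.Queue.get`) until a task is inserted. The effect is the same (an idle worker uses no processor time), and blocking on a queue avoids the race between "insert task" and "resume thread". The `ThreadPool` class and its method names are hypothetical.

```python
import queue
import threading

class ThreadPool:
    """Workers are created once and wait on a task queue instead of terminating."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self.errors = []  # exceptions raised by tasks, kept so workers survive them
        # daemon=True so a forgotten shutdown() does not keep the process alive.
        self._workers = [
            threading.Thread(target=self._run, daemon=True) for _ in range(num_workers)
        ]
        for w in self._workers:
            w.start()

    def _run(self):
        while True:
            task = self._tasks.get()  # idle here until a task is queued
            if task is None:          # sentinel: this worker should exit
                self._tasks.task_done()
                return
            func, args = task
            try:
                func(*args)
            except Exception as exc:
                self.errors.append(exc)
            finally:
                self._tasks.task_done()

    def submit(self, func, *args):
        """Insert a task; an idle worker picks it up."""
        self._tasks.put((func, args))

    def wait(self):
        """Block until every submitted task has finished."""
        self._tasks.join()

    def shutdown(self):
        """Send one sentinel per worker and wait for all of them to exit."""
        for _ in self._workers:
            self._tasks.put(None)
        for w in self._workers:
            w.join()
```

Typical use: create the pool during setup (sized with something like the 2/3 rule), `submit` one task per chunk of work whenever there is work to do, `wait` for the batch, and call `shutdown` only when the program is finishing.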
HTH.