Hello, 
Starting from version 1877, the StringManipulation example did not work correctly, at least on the 32-bit version.
In any case, it was an interesting quest.
https://sourceforge.net/p/cai/svncode/1877
Is it possible to add a pre-made training file, tinystories.txt, to the NLP examples?
Does it really take more than a day to prepare it?
And yes, I join in the congratulations. It's truly a great job!
P.S.
It turned out to be quite simple to implement batch multiplication using SSE/SSE2/SSE3
(thanks for the trick with the double haddps), but unfortunately it didn't help much.
Apparently, this approach would work better on a more powerful processor such as a Xeon Diamond Rapids.
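
For anyone who has not seen the double haddps trick, here is a minimal sketch in C with intrinsics. It is not the code discussed above: the function name dot_sse3 is hypothetical, and it assumes "batch multiplication" means multiplying two float arrays element by element and summing the products. Without SSE3 enabled at compile time it falls back to a plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE3__)
#include <pmmintrin.h> /* SSE3 intrinsics, including _mm_hadd_ps (haddps) */
#define HAVE_SSE3 1
#else
#define HAVE_SSE3 0
#endif

/* Hypothetical helper: dot product of two float arrays of length n.
 * Assumption: this is an illustration only, not the CAI implementation. */
float dot_sse3(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float result = 0.0f;

    assert(a != NULL && b != NULL);

#if HAVE_SSE3
    {
        /* Four running partial sums, one per SSE lane. */
        __m128 acc = _mm_setzero_ps();
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i); /* unaligned loads */
            __m128 vb = _mm_loadu_ps(b + i);
            acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
        }
        /* Double haddps: the first pass gives (s0+s1, s2+s3, s0+s1, s2+s3),
         * the second pass leaves s0+s1+s2+s3 in every lane. */
        acc = _mm_hadd_ps(acc, acc);
        acc = _mm_hadd_ps(acc, acc);
        result = _mm_cvtss_f32(acc); /* read the lowest lane */
    }
#endif

    /* Scalar tail (or the whole array when SSE3 is not available). */
    for (; i < n; ++i) {
        result += a[i] * b[i];
    }
    return result;
}
```

Called as dot_sse3(x, w, len), it returns the sum of x[i] * w[i]. The two haddps instructions run only once per call, so most of the time goes into the multiply-add loop, which is limited by how fast the data can be loaded.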