AI, NLP and CAI: Text Generation with Convolutional Neural Networks in Pascal
schuler:
:) Hello Pascal Lovers! :)
Given the relevance of this post, I decided to start a new thread.
Short Description
I trained a small ("hello world") neural network on the Tiny Stories dataset. This code:
--- Code: Pascal ---
WriteLn(GenerateStringFromChars(NFit.NN, 'once', FSampler),'.');
WriteLn(GenerateStringFromChars(NFit.NN, 'one ', FSampler),'.');
--- End code ---
produces this output:
--- Quote ---once upon a time, there was a little girl named lily. she loved to play outside i.
one day, a little girl named lily was playing in her garden. she saw a big car wi.
--- End quote ---
You can find my raw training file and run it yourself, if you like, at:
https://colab.research.google.com/github/joaopauloschuler/neural-api/blob/master/examples/SimpleNLP/NLP_CAI_TinyStories_Simple_Example.ipynb
Longer Description
The source code above uses a neural network to guess the next character in a string.
It downloads the Tiny Stories dataset (https://huggingface.co/datasets/roneneldan/TinyStories) and trains a small neural network model written in Pascal. The neural network model is built with:
--- Code: Pascal ---
const
  csContextLen = 81;
  csTrainingFileName = 'tinystories.txt';
  csVocabSize = 128; // Character based vocabulary/dictionary.
  csMinSampleSize = 3; // Minimum of 3 characters.
--- End code ---
--- Code: Pascal ---
FNN.AddLayer([
  TNNetInput.Create(csContextLen, 1, csVocabSize),
  TNNetPointwiseConv.Create(32,1),
  TNNetPadXY.Create(1,0),
  TNNetConvolutionReLU.Create(64,3,0,1,1),
  TNNetMaxPool.Create(3),
  TNNetPadXY.Create(1,0),
  TNNetConvolutionReLU.Create(128*3,3,0,1,1),
  TNNetPointwiseConvReLU.Create(1024,0),
  TNNetMaxPoolWithPosition.Create(27,27,0,1,0),
  TNNetPointwiseConvReLU.Create(1024),
  TNNetPointwiseConvReLU.Create(128),
  TNNetFullConnectLinear.Create(csVocabSize),
  TNNetSoftMax.Create()
]);
--- End code ---
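To see why a context of 81 characters pairs with a final pooling size of 27, it may help to follow the sequence length through the layer list above. This is only a rough sketch using the textbook sliding-window formula. It assumes that TNNetPadXY.Create(1,0) adds one column on each side along X, that TNNetMaxPool.Create(3) uses a stride equal to its pool size, and that the first two arguments of TNNetMaxPoolWithPosition are pool size and stride. None of this is checked against the library source, and `OutLen` is a hypothetical helper, not part of the API.

```pascal
program ShapeWalk;
{$mode objfpc}{$H+}

// Textbook output length of a sliding window (convolution or pooling)
// over a 1D sequence. Hypothetical helper, not taken from the library.
function OutLen(InLen, Kernel, Padding, Stride: integer): integer;
begin
  Result := (InLen + 2 * Padding - Kernel) div Stride + 1;
end;

var
  Len: integer;
begin
  Len := 81;                       // csContextLen
  Len := OutLen(Len, 1, 0, 1);     // pointwise convolution (kernel 1)
  WriteLn('after pointwise conv:   ', Len);   // 81
  Len := OutLen(Len, 3, 1, 1);     // pad 1 per side (assumed) + kernel 3
  WriteLn('after first conv:       ', Len);   // 81
  Len := OutLen(Len, 3, 0, 3);     // max pool 3, stride 3 (assumed)
  WriteLn('after max pool:         ', Len);   // 27
  Len := OutLen(Len, 3, 1, 1);     // pad 1 per side (assumed) + kernel 3
  WriteLn('after second conv:      ', Len);   // 27
  Len := OutLen(Len, 27, 0, 27);   // pooling over all 27 positions (assumed)
  WriteLn('after position pooling: ', Len);   // 1
end.
```

Under these assumptions the sequence collapses from 81 to 27 to 1 position, so the remaining pointwise and fully connected layers see a single feature vector per context.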
This neural network has some characteristics:
* It’s character based. Therefore, there is no dictionary. The convolutional layers are responsible for learning the words. In the first epochs of the training, we can see that the neural network is learning the words. This architecture benefits from the small vocabulary found in the “Tiny Stories” dataset.
* It predicts the next character in an input sequence (or context). In this example, the context is 81 characters.
* There is no recurrent computation. It’s a convolutional model. Therefore, it’s memory efficient and can be computed in a highly parallel environment.
* One of the max pooling layers inserts the positional information of the max values.
* In this particular example, it learns the “Tiny Stories” very well. I also tried to train this model on Wikipedia, but Wikipedia's vocabulary and sentence structures are too complex for this small 2.8 million parameter model. You can just replace tinystories.txt and train it on your own text file (dataset). This source code is the “hello world” of NLP. Don’t expect too much from it.
In case you are curious, there are plenty of scientific studies supporting NLP with CNNs:
https://aclanthology.org/W18-6127/ - Convolutions Are All You Need (For Classifying Character Sequences)
https://arxiv.org/abs/1712.09662 - CNN Is All You Need
https://arxiv.org/abs/1804.09541 - QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
https://aclanthology.org/N19-1407.pdf - Convolutional Self-Attention Networks
https://arxiv.org/pdf/1805.08318.pdf - Self-Attention Generative Adversarial Networks
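Coming back to the next-character setup described in the list above (a fixed-length context predicts one character), the idea can be sketched with a sliding window over a string. This is a minimal illustration, not the library's training code: `ContextBefore` is a hypothetical helper, the context is shortened to 8 characters for readability (the post uses 81), and left-padding with spaces is my assumption about how short contexts could be handled.

```pascal
program NextCharPairs;
{$mode objfpc}{$H+}

const
  ContextLen = 8;     // hypothetical short context; the post uses 81
  MinSampleSize = 3;  // mirrors csMinSampleSize from the post

// Returns the ContextLen characters that precede position Idx (1-based)
// in S. Shorter prefixes are left-padded with spaces (an assumption).
function ContextBefore(const S: string; Idx: integer): string;
var
  Start: integer;
begin
  Start := Idx - ContextLen;
  if Start < 1 then
    Result := StringOfChar(' ', 1 - Start) + Copy(S, 1, Idx - 1)
  else
    Result := Copy(S, Start, ContextLen);
end;

var
  Sample: string;
  I: integer;
begin
  Sample := 'once upon a time';
  // Each line is one (context -> next character) training pair.
  for I := MinSampleSize + 1 to Length(Sample) do
    WriteLn('"', ContextBefore(Sample, I), '" -> "', Sample[I], '"');
end.
```

Each printed pair is one training example: the network sees the quoted context and is asked to assign high probability to the character on the right.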
A Bit of the API Behind the Scenes
Samplers are used to probabilistically select the next token (character) from the probabilities guessed by the neural network. The Greedy, Top-K, and Top-P samplers provide different ways to predict the next character in a sequence.
Greedy Sampling:
* Always selects the token with the highest probability at each step.
* Tends to produce repetitive and deterministic output.
Top-K Sampling:
* Samples from the K most likely next tokens at each step.
* K is a parameter that controls diversity - a bigger K leads to more diverse results.
Top-P Sampling:
* Samples from the smallest possible set of tokens whose cumulative probability exceeds P at each step.
* P is a parameter between 0 and 1 controlling diversity - lower P produces less diversity.
In summary:
Greedy sampling takes the most likely token, leading to less diversity. Top-K and Top-P allow controlling diversity by adjusting their parameters.
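The three strategies above can be sketched independently of the library on a plain array of probabilities. This is a minimal illustration and not the code behind the TNNetSampler classes: all names here (`GreedyToken`, `TopKToken`, `TopPToken` and the helpers) are hypothetical, and the input is assumed to be a non-empty array that already sums to 1.

```pascal
program SamplerSketch;
{$mode objfpc}{$H+}

type
  TProbs = array of Single;
  TOrder = array of integer;

// Greedy: index of the largest probability.
function GreedyToken(const P: TProbs): integer;
var
  I: integer;
begin
  Result := 0;
  for I := 1 to High(P) do
    if P[I] > P[Result] then Result := I;
end;

// Token indices sorted by descending probability (simple selection sort).
function DescendingOrder(const P: TProbs): TOrder;
var
  I, J, Tmp: integer;
begin
  SetLength(Result, Length(P));
  for I := 0 to High(P) do Result[I] := I;
  for I := 0 to High(P) - 1 do
    for J := I + 1 to High(P) do
      if P[Result[J]] > P[Result[I]] then
      begin
        Tmp := Result[I]; Result[I] := Result[J]; Result[J] := Tmp;
      end;
end;

// Draws one of the first Count tokens in Order, weighted by probability.
function SampleAmong(const P: TProbs; const Order: TOrder;
  Count: integer): integer;
var
  I: integer;
  Total, Target, Acc: Single;
begin
  Total := 0;
  for I := 0 to Count - 1 do Total := Total + P[Order[I]];
  Target := Random * Total;
  Acc := 0;
  for I := 0 to Count - 1 do
  begin
    Acc := Acc + P[Order[I]];
    if Target < Acc then Exit(Order[I]);
  end;
  Result := Order[Count - 1]; // fallback for rounding at the upper edge
end;

// Top-K: sample among the K most likely tokens.
function TopKToken(const P: TProbs; K: integer): integer;
begin
  if K < 1 then K := 1;
  if K > Length(P) then K := Length(P);
  Result := SampleAmong(P, DescendingOrder(P), K);
end;

// Top-P: sample among the smallest set whose cumulative probability
// reaches TopP.
function TopPToken(const P: TProbs; TopP: Single): integer;
var
  Order: TOrder;
  Count: integer;
  Acc: Single;
begin
  Order := DescendingOrder(P);
  Acc := 0;
  Count := 0;
  while (Count < Length(P)) and (Acc < TopP) do
  begin
    Acc := Acc + P[Order[Count]];
    Inc(Count);
  end;
  if Count = 0 then Count := 1;
  Result := SampleAmong(P, Order, Count);
end;

var
  Probs: TProbs;
begin
  Randomize;
  SetLength(Probs, 4);
  Probs[0] := 0.05; Probs[1] := 0.6; Probs[2] := 0.25; Probs[3] := 0.1;
  WriteLn('Greedy:        ', GreedyToken(Probs));
  WriteLn('Top-K (K=2):   ', TopKToken(Probs, 2));
  WriteLn('Top-P (P=0.4): ', TopPToken(Probs, 0.4));
end.
```

With these made-up probabilities, greedy always picks token 1. Top-P with P = 0.4 does too, because the most likely token alone already covers 0.4. Top-K with K = 2 returns token 1 or token 2.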
These samplers are available in plain pascal code:
--- Code: Pascal ---
  { TNNetSamplerGreedy }
  TNNetSamplerGreedy = class (TNNetSamplerBase)
    public
      function GetToken(Origin: TNNetVolume): integer; override;
  end;

  { TNNetSamplerTopK }
  TNNetSamplerTopK = class (TNNetSamplerBase)
    protected
      FTopK: integer;
    public
      constructor Create(TopK: integer);
      function GetToken(Origin: TNNetVolume): integer; override;
  end;

  { TNNetSamplerTopP }
  TNNetSamplerTopP = class (TNNetSamplerBase)
    protected
      FTopP: TNeuralFloat;
    public
      constructor Create(TopP: TNeuralFloat);
      function GetToken(Origin: TNNetVolume): integer; override;
  end;
--- End code ---
In this source code example, the sampler is created with “FSampler := TNNetSamplerTopP.Create(0.4);”
Then, you can just call the following to see the magic:
--- Code: Pascal ---
WriteLn(GenerateStringFromChars(NFit.NN, 'once', FSampler),'.');
WriteLn(GenerateStringFromChars(NFit.NN, 'one ', FSampler),'.');
--- End code ---
The loading and saving of neural networks (NN) can be done with:
--- Code: Pascal ---
NN := TNNet.Create;
NN.LoadFromFile('MyTrainedNeuralNetwork.nn');
NN.SaveToFile('MyTrainedNeuralNetwork.nn');
--- End code ---
A small chat bot can be coded with:
--- Code: Pascal ---
procedure TestFromFile;
var
  S: string;
  oSampler: TNNetSamplerBase;
  NN: TNNet;
begin
  oSampler := TNNetSamplerTopP.Create(0.6);
  NN := TNNet.Create();
  WriteLn('Loading neural network.');
  NN.LoadFromFile(csAutosavedFileName);
  NN.DebugStructure();
  WriteLn();
  WriteLn('Write something and I will reply.');
  repeat
    Write('User: ');
    ReadLn(S);
    WriteLn('Neural network: ',GenerateStringFromChars(NN, LowerCase(S), oSampler),'.');
  until S = 'exit';
  NN.Free;
  oSampler.Free;
end;
--- End code ---
There is plenty more to come. But, for today, we’ll stay with the “hello world” example.
:) Have fun! :)
cpicanco:
Nice schuler, as soon as possible, I will play with this.
schuler:
:) Hello :)
I would like to share an experiment that I did:
https://poe.com/s/ELl4xkluKjNpEE1h6vuZ
In the example above, the bot translated Python code to Pascal without my asking. I intended to ask for a translation, but the AI anticipated my intention. I found the above example very impressive.
The above experiment was made with:
https://poe.com/CAI-NEURAL-API
In case you prefer ChatGPT, I also have one ready:
https://chat.openai.com/g/g-bqMxEDpIg-neural-api-free-pascal-developer
I have no intention of making money with Poe or ChatGPT. I configured these bots for my own usage. They contain the CAI interfaces, so they know how to suggest code.
:) I wish everyone happy pascal coding :)
schuler:
:) Hello Pascal Lovers! :)
I coded the main building block of a large language model (LLM) in plain pascal: the transformer block. The transformer block is the main building block in many language models such as ChatGPT. In turn, the main building block of the transformer is the self-attention mechanism. I also ported the self-attention mechanism to pascal.
I ported the source code from Python to Pascal. The original Python code can be found here:
https://github.com/tgautam03/Transformers/blob/master/classification.ipynb
The original source code that I ported to Pascal is explained in this YouTube video:
https://www.youtube.com/watch?v=96KqiPQlP4s
This is the transformer block:
--- Code: Pascal ---
function TNNet.AddTransformerBlockCAI(Heads: integer;
  IntermediateDim: integer;
  HasNorm: boolean = False
  ): TNNetLayer;
var
  PrevLayer, AttendedPlusPrev, Attended: TNNetLayer;
  EmbeddingDim: integer;
begin
  PrevLayer := GetLastLayer();
  EmbeddingDim := PrevLayer.Output.Depth;
  Attended := AddSelfAttentionCAI(Heads);
  AttendedPlusPrev := AddLayer( TNNetSum.Create([Attended, PrevLayer]) );
  AddLayer( TNNetPointwiseConvReLU.Create(IntermediateDim) );
  if HasNorm then AddLayer( TNNetMovingStdNormalization.create() );
  AddLayer( TNNetPointwiseConvLinear.Create(EmbeddingDim) );
  AddLayer( TNNetSum.Create([ GetLastLayer(), AttendedPlusPrev]) );
  Result := GetLastLayer();
end;
--- End code ---
This is the multi-head self-attention plain pascal implementation:
--- Code: Pascal ---
function TNNet.AddSelfAttentionCAI(Heads: integer): TNNetLayer;
var
  W: TNNetLayer;
  PreviousLayer: TNNetLayer;
  InputChannelsPerGroup: integer;
  EachGroupOutput: array of TNNetLayer;
  HeadCnt: integer;
  QueryGroup, KeyGroup, ValueGroup, ValueTGroup: TNNetLayer;
begin
  if Heads <= 1 then
  begin
    AddSingleHeadSelfAttention(Result, W);
  end
  else
  begin
    PreviousLayer := GetLastLayer();
    SetLength(EachGroupOutput, Heads);
    InputChannelsPerGroup := PreviousLayer.FOutput.Depth div Heads;
    for HeadCnt := 0 to Heads - 1 do
    begin
      QueryGroup := AddLayerAfter( TNNetPointwiseConvLinear.Create(InputChannelsPerGroup), PreviousLayer);
      KeyGroup := AddLayerAfter( TNNetPointwiseConvLinear.Create(InputChannelsPerGroup), PreviousLayer);
      ValueGroup := AddLayerAfter( TNNetPointwiseConvLinear.Create(InputChannelsPerGroup), PreviousLayer);
      ValueTGroup := AddLayerAfter( TNNetTransposeXD.Create(), ValueGroup);
      (*W := *)AddLayer( TNNetDotProducts.Create(QueryGroup, KeyGroup) );
      (*W := *)AddLayer( TNNetLayerMaxNormalization.Create() );
      W := AddLayer( TNNetPointwiseSoftMax.Create() );
      (*YT := *)AddLayer( TNNetDotProducts.Create(ValueTGroup, W) );
      EachGroupOutput[HeadCnt] := GetLastLayer();
    end;
    AddLayer( TNNetDeepConcat.Create(EachGroupOutput) );
    SetLength(EachGroupOutput, 0);
    // Groups with few channels tend to be numerically unstable
    if InputChannelsPerGroup < 64 then
    begin
      AddLayer( TNNetMulByConstant.Create(InputChannelsPerGroup/64) );
    end;
    Result := AddLayer( TNNetPointwiseConvLinear.Create(PreviousLayer.FOutput.Depth) );
  end;
end;
--- End code ---
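For readers who have not met self-attention before, the core computation of one head can be sketched on small matrices without the library. This is the textbook scaled dot-product form, softmax(Q K^T / sqrt(d)) V, simplified by assuming Q = K = V = X (no learned projections). It is not the code above: the layers above use learned pointwise convolutions for query, key and value, and a max normalization layer before the softmax instead of the sqrt(d) scaling. All names in the sketch are hypothetical.

```pascal
program SelfAttentionSketch;
{$mode objfpc}{$H+}

type
  TMatrix = array of array of Double;

// Row-wise softmax, subtracting the row maximum for numerical stability.
procedure SoftMaxRows(var M: TMatrix);
var
  R, C: integer;
  MaxV, Sum: Double;
begin
  for R := 0 to High(M) do
  begin
    MaxV := M[R][0];
    for C := 1 to High(M[R]) do
      if M[R][C] > MaxV then MaxV := M[R][C];
    Sum := 0;
    for C := 0 to High(M[R]) do
    begin
      M[R][C] := Exp(M[R][C] - MaxV);
      Sum := Sum + M[R][C];
    end;
    for C := 0 to High(M[R]) do
      M[R][C] := M[R][C] / Sum;
  end;
end;

// Attention weights softmax(Q * K^T / sqrt(d)), assuming Q = K = X.
function AttentionWeights(const X: TMatrix): TMatrix;
var
  I, J, D, Dim: integer;
  Dot: Double;
begin
  Dim := Length(X[0]);
  SetLength(Result, Length(X), Length(X));
  for I := 0 to High(X) do
    for J := 0 to High(X) do
    begin
      Dot := 0;
      for D := 0 to Dim - 1 do
        Dot := Dot + X[I][D] * X[J][D];
      Result[I][J] := Dot / Sqrt(Dim);
    end;
  SoftMaxRows(Result);
end;

// Output = W * V, assuming V = X: each token becomes a weighted
// average of all tokens.
function SelfAttention(const X: TMatrix): TMatrix;
var
  W: TMatrix;
  I, J, D: integer;
begin
  W := AttentionWeights(X);
  SetLength(Result, Length(X), Length(X[0]));
  for I := 0 to High(X) do
    for D := 0 to High(X[0]) do
    begin
      Result[I][D] := 0;
      for J := 0 to High(X) do
        Result[I][D] := Result[I][D] + W[I][J] * X[J][D];
    end;
end;

var
  X, Y: TMatrix;
  I: integer;
begin
  // Three tokens with two made-up features each.
  SetLength(X, 3, 2);
  X[0][0] := 1.0; X[0][1] := 0.0;
  X[1][0] := 0.0; X[1][1] := 1.0;
  X[2][0] := 1.0; X[2][1] := 1.0;
  Y := SelfAttention(X);
  for I := 0 to High(Y) do
    WriteLn(Y[I][0]:0:3, ' ', Y[I][1]:0:3);
end.
```

Each output row is a probability-weighted mix of the input rows. The transformer block shown earlier then adds this result back to its input (the TNNetSum layers) before and after the pointwise feed-forward part.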
Is this the very first time that a transformer block is coded in Pascal? It's probable. Is it ready to use? Probably not. I'll eventually have it fully tested, with examples to show for both NLP and computer vision. Anyway, it has been born!
:) I wish long life to Pascal. :)
domasz:
Very nice! Such things in Pascal!