
Author Topic: AI, NLP and CAI: Text Generation with Convolutional Neural Networks in Pascal  (Read 18653 times)

schuler

  • Full Member
  • ***
  • Posts: 233
:) Hello Pascal Lovers! :)
Given the relevance of this post, I decided to start a new thread.

Short Description
I trained a small ("hello world") neural network on the Tiny Stories dataset. This code

Code: Pascal
    WriteLn(GenerateStringFromChars(NFit.NN, 'once', FSampler),'.');
    WriteLn(GenerateStringFromChars(NFit.NN, 'one ', FSampler),'.');

produces this output:
Quote
once upon a time, there was a little girl named lily. she loved to play outside i.
one day, a little girl named lily was playing in her garden. she saw a big car wi.

You can find my raw training file and run it yourself if you like at:
https://colab.research.google.com/github/joaopauloschuler/neural-api/blob/master/examples/SimpleNLP/NLP_CAI_TinyStories_Simple_Example.ipynb

Longer Description
The source code above uses a neural network to guess the next character in a string.
It downloads the Tiny Stories dataset (https://huggingface.co/datasets/roneneldan/TinyStories) and trains a small neural network model written in Pascal. The neural network model is built with:

Code: Pascal
const
  csContextLen = 81;
  csTrainingFileName = 'tinystories.txt';
  csVocabSize  = 128; // Character based vocabulary/dictionary.
  csMinSampleSize = 3; // Minimum of 3 characters.

Code: Pascal
    FNN.AddLayer([
      TNNetInput.Create(csContextLen, 1, csVocabSize),
      TNNetPointwiseConv.Create(32,1),
      TNNetPadXY.Create(1,0),
      TNNetConvolutionReLU.Create(64,3,0,1,1),
      TNNetMaxPool.Create(3),
      TNNetPadXY.Create(1,0),
      TNNetConvolutionReLU.Create(128*3,3,0,1,1),
      TNNetPointwiseConvReLU.Create(1024,0),
      TNNetMaxPoolWithPosition.Create(27,27,0,1,0),
      TNNetPointwiseConvReLU.Create(1024),
      TNNetPointwiseConvReLU.Create(128),
      TNNetFullConnectLinear.Create(csVocabSize),
      TNNetSoftMax.Create()
    ]);

This neural network has some characteristics:
  • It’s character based; therefore, there is no dictionary. The convolutional layers are responsible for learning the words. In the first epochs of training, we can see that the neural network is learning the words. This architecture benefits from the small vocabulary found in the “Tiny Stories” dataset.
  • It predicts the next character in an input sequence (or context). In this example, the context is 81 characters.
  • There is no recurrent computation. It’s a convolutional model; therefore, it’s memory efficient and can be computed in a highly parallel environment.
  • One of the max pooling layers inserts the positional information of the max values.
  • In this particular example, it learns the “Tiny Stories” very well. I also tried to train this model on Wikipedia, but Wikipedia’s vocabulary and sentence structures are too complex for this small 2.8-million-parameter model. You can simply replace tinystories.txt to train it on your own text file (dataset). This source code is the “hello world” of NLP. Don’t expect too much from it.
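To make the “predict the next character” idea concrete, here is a minimal, language-agnostic sketch in Python of a greedy generation loop. The model function and its 128-entry probability output are illustrative stand-ins for what GenerateStringFromChars does internally, not CAI’s actual API:

```python
def generate(model, prompt, context_len=81, max_new=80, eos='.'):
    # model(context) -> list of probabilities over 128 character codes (assumption)
    text = prompt
    for _ in range(max_new):
        context = text[-context_len:].ljust(context_len)  # fixed-size context window
        probs = model(context)
        next_char = chr(max(range(len(probs)), key=probs.__getitem__))  # greedy pick
        text += next_char
        if next_char == eos:  # stop at end of sentence
            break
    return text
```

Each predicted character is appended to the context and the window slides forward, which is why a fixed 81-character context is enough to generate arbitrarily long text.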
In case you are curious, there are plenty of scientific studies supporting NLP with CNNs:
https://aclanthology.org/W18-6127/ - Convolutions Are All You Need (For Classifying Character Sequences)
https://arxiv.org/abs/1712.09662 - CNN Is All You Need
https://arxiv.org/abs/1804.09541 - QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
https://aclanthology.org/N19-1407.pdf - Convolutional Self-Attention Networks
https://arxiv.org/pdf/1805.08318.pdf - Self-Attention Generative Adversarial Networks

A Bit of the API Behind the Scenes
Samplers are used to probabilistically select the next token (character) from the probabilities guessed by the neural network. The Greedy, Top-K, and Top-P samplers provide different ways to predict the next character in a sequence.

Greedy Sampling:
* Always selects the token with the highest probability at each step.
* Tends to produce repetitive and deterministic output.

Top-K Sampling:
* Samples from the K most likely next tokens at each step.
* K is a parameter that controls diversity - a bigger K leads to more diverse results.

Top-P Sampling:
* Samples from the smallest possible set of tokens whose cumulative probability exceeds P at each step.
* P is a parameter between 0 and 1 controlling diversity - lower P produces less diversity.

In summary:
Greedy sampling takes the most likely token, leading to less diversity. Top-K and Top-P allow controlling diversity by adjusting their parameters.
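As an illustration only (not CAI’s implementation), the three strategies can be sketched in a few lines of Python over a plain probability list:

```python
import random

def greedy_sample(probs):
    # always pick the most likely token
    return max(range(len(probs)), key=probs.__getitem__)

def top_k_sample(probs, k):
    # sample among the k most likely tokens, renormalized
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    r = random.random() * sum(probs[i] for i in top)
    for i in top:
        r -= probs[i]
        if r <= 0:
            return i
    return top[-1]

def top_p_sample(probs, p):
    # sample from the smallest set whose cumulative probability reaches p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    r = random.random() * cum
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Note that top_k_sample(probs, 1) and top_p_sample(probs, p) with a very small p both collapse into greedy sampling.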

These samplers are available in plain Pascal code:

Code: Pascal
  { TNNetSamplerGreedy }
  TNNetSamplerGreedy = class (TNNetSamplerBase)
    public
      function GetToken(Origin: TNNetVolume): integer; override;
  end;

  { TNNetSamplerTopK }
  TNNetSamplerTopK = class (TNNetSamplerBase)
    protected
      FTopK: integer;
    public
      constructor Create(TopK: integer);
      function GetToken(Origin: TNNetVolume): integer; override;
  end;

  { TNNetSamplerTopP }
  TNNetSamplerTopP = class (TNNetSamplerBase)
    protected
      FTopP: TNeuralFloat;
    public
      constructor Create(TopP: TNeuralFloat);
      function GetToken(Origin: TNNetVolume): integer; override;
  end;


In this source code example, the sampler is created with “FSampler := TNNetSamplerTopP.Create(0.4);”.

Then, you can just call the following to see the magic:

Code: Pascal
    WriteLn(GenerateStringFromChars(NFit.NN, 'once', FSampler),'.');
    WriteLn(GenerateStringFromChars(NFit.NN, 'one ', FSampler),'.');

The loading and saving of neural networks (NN) can be done with:
Code: Pascal
  NN := TNNet.Create;
  NN.LoadFromFile('MyTrainedNeuralNetwork.nn');
  NN.SaveToFile('MyTrainedNeuralNetwork.nn');

A small chatbot can be coded with:

Code: Pascal
procedure TestFromFile;
var
  S: string;
  oSampler: TNNetSamplerBase;
  NN: TNNet;
begin
  oSampler := TNNetSamplerTopP.Create(0.6);
  NN := TNNet.Create();
  WriteLn('Loading neural network.');
  NN.LoadFromFile(csAutosavedFileName);
  NN.DebugStructure();
  WriteLn();
  WriteLn('Write something and I will reply.');
  repeat
    Write('User: ');
    ReadLn(S);
    WriteLn('Neural network: ',GenerateStringFromChars(NN, LowerCase(S), oSampler),'.');
  until S = 'exit';
  NN.Free;
  oSampler.Free;
end;
There is plenty more to come, but for today we’ll stay with the “hello world” example.

:) Have fun! :)

cpicanco

  • Hero Member
  • *****
  • Posts: 618
  • Behavioral Scientist and Programmer
    • Portfolio
Nice, schuler. As soon as possible, I will play with this.
Be mindful and excellent with each other.
https://github.com/cpicanco/

schuler

  • Full Member
  • ***
  • Posts: 233
:) Hello :)

I would like to share an experiment that I did:
https://poe.com/s/ELl4xkluKjNpEE1h6vuZ

In the example above, the bot translated Python code to Pascal without me asking. I intended to ask for a translation, but the AI predicted my intention. I found the example very impressive.

The above experiment was made with:
https://poe.com/CAI-NEURAL-API

If you prefer ChatGPT, I have that ready too:
https://chat.openai.com/g/g-bqMxEDpIg-neural-api-free-pascal-developer

I have no intention of making money with Poe or ChatGPT. I configured these bots for my own usage. They contain the CAI interfaces, so they know how to suggest code.

:) I wish everyone happy Pascal coding :)
« Last Edit: December 30, 2023, 09:14:17 am by schuler »

schuler

  • Full Member
  • ***
  • Posts: 233
:) Hello Pascal Lovers! :)

I coded the main building block of a large language model (LLM) in plain Pascal: the transformer block. The transformer block is the main building block of many language models such as ChatGPT. In turn, the main building block of the transformer is the self-attention mechanism, which I also ported to Pascal.

I ported the source code from Python to Pascal. The original Python code can be found here:
https://github.com/tgautam03/Transformers/blob/master/classification.ipynb

The original source code that I ported from is explained in this YouTube video:
https://www.youtube.com/watch?v=96KqiPQlP4s

This is the transformer block:
Code: Pascal
function TNNet.AddTransformerBlockCAI(Heads: integer;
  IntermediateDim: integer;
  HasNorm: boolean = False
  ): TNNetLayer;
var
  PrevLayer, AttendedPlusPrev, Attended: TNNetLayer;
  EmbeddingDim: integer;
begin
  PrevLayer := GetLastLayer();
  EmbeddingDim := PrevLayer.Output.Depth;
  Attended := AddSelfAttentionCAI(Heads);
  AttendedPlusPrev := AddLayer( TNNetSum.Create([Attended, PrevLayer]) );
  AddLayer( TNNetPointwiseConvReLU.Create(IntermediateDim) );
  if HasNorm then AddLayer( TNNetMovingStdNormalization.create() );
  AddLayer( TNNetPointwiseConvLinear.Create(EmbeddingDim) );
  AddLayer( TNNetSum.Create([ GetLastLayer(), AttendedPlusPrev]) );
  Result := GetLastLayer();
end;

This is the multi-head self-attention implementation in plain Pascal:
Code: Pascal
function TNNet.AddSelfAttentionCAI(Heads: integer): TNNetLayer;
var
  W: TNNetLayer;
  PreviousLayer: TNNetLayer;
  InputChannelsPerGroup: integer;
  EachGroupOutput: array of TNNetLayer;
  HeadCnt: integer;
  QueryGroup, KeyGroup, ValueGroup, ValueTGroup: TNNetLayer;
begin
  if Heads <= 1 then
  begin
    AddSingleHeadSelfAttention(Result, W);
  end
  else
  begin
    PreviousLayer := GetLastLayer();
    SetLength(EachGroupOutput, Heads);
    InputChannelsPerGroup := PreviousLayer.FOutput.Depth div Heads;
    for HeadCnt := 0 to Heads - 1 do
    begin
      QueryGroup := AddLayerAfter( TNNetPointwiseConvLinear.Create(InputChannelsPerGroup), PreviousLayer);
      KeyGroup := AddLayerAfter(   TNNetPointwiseConvLinear.Create(InputChannelsPerGroup), PreviousLayer);
      ValueGroup := AddLayerAfter( TNNetPointwiseConvLinear.Create(InputChannelsPerGroup), PreviousLayer);
      ValueTGroup := AddLayerAfter( TNNetTransposeXD.Create(), ValueGroup);
      (*W := *)AddLayer( TNNetDotProducts.Create(QueryGroup, KeyGroup) );
      (*W := *)AddLayer( TNNetLayerMaxNormalization.Create() );
      W := AddLayer( TNNetPointwiseSoftMax.Create() );
      (*YT := *)AddLayer( TNNetDotProducts.Create(ValueTGroup, W) );
      EachGroupOutput[HeadCnt] := GetLastLayer();
    end;
    AddLayer( TNNetDeepConcat.Create(EachGroupOutput) );
    SetLength(EachGroupOutput, 0);
    // Groups with few channels tend to be numerically unstable
    if InputChannelsPerGroup < 64 then
    begin
      AddLayer( TNNetMulByConstant.Create(InputChannelsPerGroup/64) );
    end;
    Result := AddLayer( TNNetPointwiseConvLinear.Create(PreviousLayer.FOutput.Depth) );
  end;
end;
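For readers who want to check the math these layers implement, here is a NumPy sketch of a single attention head (an illustration of the mechanism, not CAI code; note that the Pascal version above applies TNNetLayerMaxNormalization to the scores where the textbook formula divides by the square root of the key dimension):

```python
import numpy as np

def self_attention_head(x, Wq, Wk, Wv):
    # x: (seq_len, dim); Wq/Wk/Wv: (dim, head_dim) pointwise projections
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T                                       # pairwise dot products
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                                     # attention-weighted values
```

The pointwise (1x1) convolutions in the Pascal code play the role of the Wq/Wk/Wv matrix multiplications, applied position by position.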

Is this the very first time that a transformer block has been coded in Pascal? Probably. Is it ready to use? Probably not. I'll eventually have it fully tested, with examples for both NLP and computer vision. Anyway, it has been born!

:) I wish long life to Pascal. :)
« Last Edit: April 22, 2024, 08:37:20 am by schuler »

domasz

  • Sr. Member
  • ****
  • Posts: 443
Very nice! Such things in Pascal!

gidesa

  • Jr. Member
  • **
  • Posts: 73
Schuler, your work is impressive!  :o
A question: your code examples are very short. Have you used existing CAI functions/methods as building blocks for the transformer?

schuler

  • Full Member
  • ***
  • Posts: 233
@gidesa,
YES! I have used existing layers from CAI to build the transformer. It may sound strange, but sometimes, when you see "dense" in PyTorch/Keras, it's isomorphic to CAI's pointwise convolution. The matrix multiplication can be done with CAI's "DotProducts" and some "transposes".

The neural network layers are implemented at: https://github.com/joaopauloschuler/neural-api/blob/master/neural/neuralnetwork.pas .

BTW, I've finished coding TNNetEmbedding and TNNetTokenAndPositionalEmbedding. I haven't tested them yet. This is a taste of what is to come:
Code: Pascal
    FNN.AddLayer([
      TNNetInput.Create(csContextLen, 1, 1),
      TNNetEmbedding.Create(FVocabSize, csEmbedDim)
    ]);

    for I := 1 to 2 do FNN.AddTransformerBlockCAI(4);

    FNN.AddLayer([
      TNNetFullConnectReLU.Create(csEmbedDim),
      TNNetFullConnectReLU.Create(FVocabSize),
      TNNetSoftMax.Create(1)
    ]);
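As a rough illustration of what a token-plus-positional embedding computes (a sketch of the standard technique, not the CAI implementation, which I haven't seen tested yet):

```python
import numpy as np

def token_and_positional_embedding(token_ids, tok_table, pos_table):
    # tok_table: (vocab_size, embed_dim) learned token vectors
    # pos_table: (context_len, embed_dim) learned position vectors
    # each input token gets its token vector plus the vector of its position
    return tok_table[token_ids] + pos_table[:len(token_ids)]
```

The sum gives the transformer blocks both "what the token is" and "where it sits" in the context window.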

What can such a small Pascal-based model do? You'll find out in this paper: https://arxiv.org/pdf/2305.07759.pdf .
« Last Edit: April 02, 2024, 06:42:09 am by schuler »

schuler

  • Full Member
  • ***
  • Posts: 233
I suspect that the first transformer block ever coded in Pascal now belongs to the first Pascal-based model ever placed on Hugging Face:
https://huggingface.co/datasets/schuler/TinyStories4Pascal

gidesa

  • Jr. Member
  • **
  • Posts: 73
Quote from: schuler
I suspect that the first transformer block ever coded in Pascal now belongs to the first Pascal-based model ever placed on Hugging Face:
https://huggingface.co/datasets/schuler/TinyStories4Pascal

Congratulations!

schuler

  • Full Member
  • ***
  • Posts: 233
Just a quick note: I coded the Adam optimizer described at https://arxiv.org/abs/1412.6980 .

This is how to use it:
Code: Pascal
var
  Opt: TNeuralOptimizerAdam;
begin
  Opt := TNeuralOptimizerAdam.Create(0.9, 0.999);
  NFit := TNeuralDataLoadingFit.Create();
  NFit.Optimizer := Opt;

Adam is not a silver bullet, as you can see at: https://arxiv.org/abs/2010.05627 . SGD with inertia (momentum) is still my preferred method.
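For reference, one Adam step on a single scalar parameter looks like this. This is a sketch of the paper's update rule, assuming the two Create arguments above are the β1 and β2 decay rates:

```python
def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # single-parameter Adam update (Kingma & Ba, arXiv:1412.6980)
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t (1-based)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

The per-parameter division by the root of the second moment is what makes Adam adapt its effective learning rate, and also what sometimes hurts generalization compared to plain SGD with momentum.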

Dzandaa

  • Sr. Member
  • ****
  • Posts: 276
  • From C# to Lazarus
Hello Joao Paulo,

I am very happy that you continue to develop CAI!!

B->
Dzandaa

Dzandaa

  • Sr. Member
  • ****
  • Posts: 276
  • From C# to Lazarus
Hi Joao Paulo,

Do you have an example of TNNetConcat and TNNetSplitChannels?

Thank you.

B->
Dzandaa

schuler

  • Full Member
  • ***
  • Posts: 233
@Dzandaa,
Quote
I am very happy that you continue to develop CAI!!
Thank you for your support!

Quote
TNNetConcat and TNNetSplitChannels?

Just for the sake of notation, I'll call "channel" the last dimension in the activation map. CAI is "channel last": in a 256x256 RGB image, you'll have [X, Y, Depth], where "Depth" represents channels such as RGB. In this case, you'll have [256, 256, 3], i.e., 3 channels. I have already worked with images that have more than 500 input channels; this is called hyperspectral imaging. I hope that I'm not too wordy today.

In most cases, if you are concatenating channels, you'll call TNNetDeepConcat instead of TNNetConcat. TNNetDeepConcat is actually used in the multi-head self-attention implementation just above. For TNNetDeepConcat to work, all input activation maps need to have the same X and Y dimensions, although they can vary in depth (channel count). You'll use TNNetConcat only if the positional information is not relevant.

TNNetDeepConcat is used in the DenseNet L40 implementation: https://github.com/joaopauloschuler/neural-api/tree/master/examples/DenseNetBCL40 .

Regarding TNNetSplitChannels, look at this example:
Code: Pascal
AddLayerAfter( TNNetSplitChannels.Create(15, 8), PreviousLayer);

In the example above, you'll copy all channels from channel 15 up to channel 22 (8 channels in total). If you intend to copy only specific channels, you can pass an array of integers as a parameter.
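In channel-last NumPy terms, that call corresponds to a simple slice (an illustration of the indexing, not CAI code):

```python
import numpy as np

x = np.zeros((4, 4, 32))       # activation map laid out as [X, Y, Depth]
part = x[:, :, 15:15 + 8]      # like TNNetSplitChannels.Create(15, 8)
print(part.shape)              # (4, 4, 8): channels 15..22
```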

A possible implementation for "grouped convolutions" is:
Code: Pascal
function TNNet.AddGroupedConvolution(Conv2d: TNNetConvolutionClass;
  Groups, pNumFeatures, pFeatureSize, pInputPadding, pStride: integer;
  pSuppressBias: integer; ChannelInterleaving: boolean): TNNetLayer;
var
  PreviousLayer: TNNetLayer;
  FeaturesPerGroup: integer;
  InputChannelsPerGroup: integer;
  EachGroupOutput: array of TNNetLayer;
  GroupCnt: integer;
begin
  if pInputPadding > 0 then
  begin
    PreviousLayer := AddLayer( TNNetPad.Create(pInputPadding) );
  end
  else
  begin
    PreviousLayer := GetLastLayer();
  end;
  Result := PreviousLayer;
  SetLength(EachGroupOutput, Groups);
  FeaturesPerGroup := pNumFeatures div Groups;
  InputChannelsPerGroup := PreviousLayer.FOutput.Depth div Groups;
  if Groups = 1 then
  begin
    Result := AddLayer( Conv2d.Create(FeaturesPerGroup, pFeatureSize, {pInputPadding=}0, pStride, pSuppressBias) );
  end;
  if Groups > 1 then
  begin
    for GroupCnt := 0 to Groups - 1 do
    begin
      if ChannelInterleaving
        then AddLayerAfter( TNNetSplitChannelEvery.Create(Groups, GroupCnt), PreviousLayer)
        else AddLayerAfter( TNNetSplitChannels.Create(GroupCnt*InputChannelsPerGroup, InputChannelsPerGroup), PreviousLayer);
      EachGroupOutput[GroupCnt] := AddLayer( Conv2d.Create(FeaturesPerGroup, pFeatureSize, {pInputPadding=}0, pStride, pSuppressBias) );
    end;
    Result := AddLayer( TNNetDeepConcat.Create(EachGroupOutput) );
  end;
  SetLength(EachGroupOutput, 0);
end;
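The difference between the two splitting modes can be illustrated in Python with a hypothetical 8-channel layer, assuming (from the layer names and the code above) that TNNetSplitChannels takes a contiguous block while TNNetSplitChannelEvery takes every Groups-th channel:

```python
channels = list(range(8))   # channel indices of the previous layer
groups = 2
per_group = len(channels) // groups

# contiguous split, as TNNetSplitChannels: [0,1,2,3] and [4,5,6,7]
contiguous = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]

# interleaved split, as TNNetSplitChannelEvery: [0,2,4,6] and [1,3,5,7]
interleaved = [channels[g::groups] for g in range(groups)]
```

Interleaving mixes neighboring channels across groups, which can help when adjacent channels carry correlated features.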
« Last Edit: April 25, 2024, 06:06:10 pm by schuler »

Dzandaa

  • Sr. Member
  • ****
  • Posts: 276
  • From C# to Lazarus
Hi Joao Paulo,
Thank you for the response.

What I'm looking for is how to split a layer into two branches and then merge one of the branches into another layer further on.

B->
Dzandaa

indydev

  • Jr. Member
  • **
  • Posts: 73
Quote from: schuler
:) Hello :)

I would like to share an experiment that I did:
https://poe.com/s/ELl4xkluKjNpEE1h6vuZ

In the example above, the bot translated Python code to Pascal without me asking. I intended to ask for a translation, but the AI predicted my intention. I found the example very impressive.

The above experiment was made with:
https://poe.com/CAI-NEURAL-API

If you prefer ChatGPT, I have that ready too:
https://chat.openai.com/g/g-bqMxEDpIg-neural-api-free-pascal-developer

I have no intention of making money with Poe or ChatGPT. I configured these bots for my own usage. They contain the CAI interfaces, so they know how to suggest code.

:) I wish everyone happy Pascal coding :)

Regarding the ChatGPT interface version: is that available as an assistant in the Playground? I don't have a Pro account, but I use the API through my (still limited) client.


 
