Hi,
I am working on (yet) a(nother) framework for building neural networks in Object Pascal. It supports automatic gradient computation (first-order derivatives only, for now). It lets you design your model at various levels of abstraction, in addition to using the predefined layers, much as you can in Keras or PyTorch. You can use the high-level API like this:
{ Load and prepare the data. }
Dataset := ReadCSV('data.csv');
X := GetColumnRange(Dataset, 0, 4);
X := StandardScaler(X);
{ Convert raw labels to one-hot vectors }
Enc := TOneHotEncoder.Create;
y := GetColumn(Dataset, 4);
y := Enc.Encode(Squeeze(y));
{ Initialize the model. }
NNModel := TModel.Create([
  TDenseLayer.Create(NInputNeuron, 32),
  TReLULayer.Create(),
  TDropoutLayer.Create(0.2),
  TDenseLayer.Create(32, 16),
  TReLULayer.Create(),
  TDropoutLayer.Create(0.2),
  TDenseLayer.Create(16, NOutputNeuron),
  TSoftMaxLayer.Create(1)
]);
{ Initialize the optimizer. There are several other optimizers too. }
optimizer := TAdamOptimizer.Create;
optimizer.LearningRate := 0.003;
for i := 0 to MAX_EPOCH - 1 do
begin
  { Make a prediction and compute the loss. }
  yPred := NNModel.Eval(X);
  Loss := CrossEntropyLoss(yPred, y) + L2Regularization(NNModel);
  { Update the model parameters w.r.t. the loss. }
  optimizer.UpdateParams(Loss, NNModel.Params);
end;
Or you can go down to a lower level and define your own model and loss function by writing the math directly, like this:
{ weights and biases }
W1 := RandomTensorNormal([NInputNeuron, NHiddenNeuron]);
W2 := RandomTensorNormal([NHiddenNeuron, NOutputNeuron]);
b1 := CreateTensor([1, NHiddenNeuron], (1 / NHiddenNeuron ** 0.5));
b2 := CreateTensor([1, NOutputNeuron], (1 / NOutputNeuron ** 0.5));
{ Since we need the gradients of the weights and biases, it is mandatory to
  set their RequiresGrad property to True. We can also set the property
  individually for each parameter, e.g., `W1.RequiresGrad := True;`. }
SetRequiresGrad([W1, W2, b1, b2], True);
Optimizer := TAdamOptimizer.Create;
Optimizer.LearningRate := 0.003;
Lambda := 0.001;
for i := 0 to MAX_EPOCH - 1 do
begin
  { Make the prediction. }
  yPred := SoftMax(ReLU(X.Dot(W1) + b1).Dot(W2) + b2, 1);
  { Compute the cross-entropy loss. }
  CrossEntropyLoss := -Mean(y * Log(yPred));
  { Your usual L2 regularization term. }
  L2Reg := Sum(W1 * W1) + Sum(W2 * W2);
  TotalLoss := CrossEntropyLoss + Lambda * L2Reg;
  { Update the network weights. }
  Optimizer.UpdateParams(TotalLoss, [W1, W2, b1, b2]);
end;
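Written out as math, the loop above minimizes the objective below. This is only a sketch read off the code, assuming that `Mean` and `Sum` without an axis argument reduce over all elements, that `*` is element-wise, and that `SoftMax(..., 1)` normalizes each row. N (number of samples), C (number of classes), H, and the hatted Y are just notation introduced here, not names from the framework.

```latex
\begin{aligned}
H &= \operatorname{ReLU}(X W_1 + b_1), \\
\hat{Y} &= \operatorname{softmax}(H W_2 + b_2), \\
\mathcal{L}_{\mathrm{CE}} &= -\frac{1}{N C} \sum_{n=1}^{N} \sum_{c=1}^{C} Y_{nc} \log \hat{Y}_{nc}, \\
\mathcal{L}_{\mathrm{total}} &= \mathcal{L}_{\mathrm{CE}} + \lambda \left( \lVert W_1 \rVert_F^2 + \lVert W_2 \rVert_F^2 \right).
\end{aligned}
```

Under that reading, the cross-entropy term is averaged over all N * C entries rather than over the N samples, which only rescales it by the constant factor 1/C.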
You can even skip the predefined optimizer and define your own weight update rule:
for i := 0 to MAX_EPOCH - 1 do
begin
  { Zero the gradients of all parameters from the previous iteration. }
  ZeroGradGraph(TotalLoss);
  { Make the prediction. }
  yPred := SoftMax(ReLU(X.Dot(W1) + b1).Dot(W2) + b2, 1);
  { Compute the cross-entropy loss. }
  CrossEntropyLoss := -Mean(y * Log(yPred));
  { Your usual L2 regularization term. }
  L2Reg := Sum(W1 * W1) + Sum(W2 * W2);
  TotalLoss := CrossEntropyLoss + Lambda * L2Reg;
  { Compute all gradients by simply calling the `Backpropagate` method of
    `TotalLoss`. }
  TotalLoss.Backpropagate;
  { Vanilla gradient descent update rule. }
  W1.Data := W1.Data - LearningRate * W1.Grad;
  W2.Data := W2.Data - LearningRate * W2.Grad;
  b1.Data := b1.Data - LearningRate * b1.Grad;
  b2.Data := b2.Data - LearningRate * b2.Grad;
end;
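In equation form, the manual update above is plain gradient descent applied to each parameter. The symbols are only notation for the code, not part of the framework: η stands for `LearningRate`, the script L for the value of `TotalLoss`, and the partial derivative for the corresponding `.Grad` field filled in by `TotalLoss.Backpropagate`.

```latex
\theta \leftarrow \theta - \eta \, \frac{\partial \mathcal{L}_{\mathrm{total}}}{\partial \theta},
\qquad \theta \in \{ W_1, W_2, b_1, b_2 \}
```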
If you are interested, please check out the project here:
https://github.com/ariaghora/noe. Note that the framework is still a proof of concept at this stage, so please bear with the memory leaks here and there and the lack of rigorous optimization.

Cheers