@microca,
Thank you for the bug report. I will look at it.
In case you are interested, I asked Claude to produce a list of interesting NLP-related examples in this API. The examples in the AI-generated list below have been coded but not yet properly tested.
Each entry in the list has the form:
Model - its building block: example showing the building block.
- DeepSeek - MLA + decoupled-RoPE slice (V2): examples/LatentAttention/LatentAttention.lpr (builds MHA vs MLA vs MLA+decoupled-RoPE arms side by side); multi-token prediction (V3): examples/MultiTokenPrediction/MultiTokenPrediction.lpr (also used by examples/SelfSpeculativeDecoding/SelfSpeculativeDecoding.lpr).
- LLaMA - pre-norm RMSNorm residual blocks: examples/PreNormVsPostNorm/PreNormVsPostNorm.lpr; SwiGLU: examples/SwiGLUFeedForward/SwiGLUFeedForward.lpr; RoPE: examples/RoPEBaseFrequencySweep/RoPEBaseFrequencySweep.lpr (also compared in examples/PositionEncodingBakeoff/PositionEncodingBakeoff.lpr).
- Mistral / Longformer - sliding-window causal mask: examples/SlidingWindowBakeoff/SlidingWindowBakeoff.lpr.
- Gemma - logit soft-capping: examples/SoftCappingStability/SoftCappingStability.lpr and examples/SoftCappingSweep/SoftCappingSweep.lpr; GeGLU: examples/GEGLUFeedForward/GEGLUFeedForward.lpr (also in examples/GatedFFNBakeoff/GatedFFNBakeoff.lpr).
- BERT / GPT - GELU: examples/ActivationBakeoff/ActivationBakeoff.lpr; GPT-style decoder blocks: examples/TransformerDecoderBlock/TransformerDecoderBlock.lpr (full pipeline in examples/SimpleNLP/TransformerWithTokenizer.lpr).
- Mamba - TNNetSelectiveSSM: examples/SelectiveSSM/SelectiveSSM.lpr.
- Modern recurrent family - RWKV-4: examples/RWKV/RWKV.lpr (cross variant: examples/CrossWKV/CrossWKV.lpr); xLSTM sLSTM: examples/SLSTMvsCfC/SLSTMvsCfC.lpr; xLSTM mLSTM: no example yet (TNNetMLSTMCell has tests but no examples/ program); RetNet: examples/RetentionDualForm/RetentionDualForm.lpr; Titans: examples/TitansMemory/TitansMemory.lpr; DeltaNet: examples/DeltaNet/DeltaNet.lpr; GLA: examples/GatedLinearAttention/GatedLinearAttention.lpr (block form: examples/GatedLinearAttentionBlock/GatedLinearAttentionBlock.lpr).
- Switch Transformer (load-balance loss): examples/TopKMoE/TopKMoE.lpr (contrast without aux loss: examples/ExpertChoiceMoE/ExpertChoiceMoE.lpr).
- DIFF Transformer: examples/DifferentialAttentionNoise/DifferentialAttentionNoise.lpr.
- StreamingLLM (attention sinks): examples/SinkAttentionStability/SinkAttentionStability.lpr.
- BLOOM-lineage ALiBi: examples/ALiBiSlopeSweep/ALiBiSlopeSweep.lpr (also in examples/PositionEncodingBakeoff/PositionEncodingBakeoff.lpr).
- GQA/MQA via KVHeads: no example yet. The KVHeads parameter exists on the MHA builders (neural/neuralnetwork.pas:11123) and GQA is discussed in examples/LatentAttention/README.md as a comparison point, but no example program actually passes KVHeads <> Heads.
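
Since the GQA/MQA entry has no example program yet, here is a minimal Python sketch of the usual head-sharing idea behind a KVHeads setting smaller than Heads. It is not taken from neural-api: the function names are hypothetical, and the contiguous grouping of query heads is the common convention, assumed here rather than checked against neuralnetwork.pas.

```python
from typing import List


def kv_head_for_query_head(query_head: int, heads: int, kv_heads: int) -> int:
    """Return the index of the K/V head that a given query head reads from.

    Assumption: query heads are split into contiguous groups of equal size,
    one group per K/V head (the usual GQA convention).
    """
    if kv_heads <= 0 or heads % kv_heads != 0:
        raise ValueError("heads must be a positive multiple of kv_heads")
    if not 0 <= query_head < heads:
        raise ValueError("query_head out of range")
    group_size = heads // kv_heads  # number of query heads sharing one K/V head
    return query_head // group_size


def head_mapping(heads: int, kv_heads: int) -> List[int]:
    """K/V head index for every query head, in query-head order."""
    return [kv_head_for_query_head(h, heads, kv_heads) for h in range(heads)]


if __name__ == "__main__":
    print("MHA:", head_mapping(8, 8))  # kv_heads == heads: no sharing
    print("GQA:", head_mapping(8, 2))  # two groups of four query heads
    print("MQA:", head_mapping(8, 1))  # every query head shares one K/V head
```

With 8 query heads, kv_heads = 8 is plain MHA, kv_heads = 2 is GQA with two groups, and kv_heads = 1 is MQA. The K/V projections (and the KV cache) shrink by the factor heads / kv_heads.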
This is a good starting point:
https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleNLP/DecodeFeaturesBakeoff.lpr (includes some building blocks for DeepSeek V2 and V3)
@all, the list of examples is located at:
https://github.com/joaopauloschuler/neural-api/blob/master/examples/README.md