What is this new functionality or approach in OpenGL?
The most basic approach would go roughly like this. First, you'd need to define some kind of interleaved OpenGL vertex structure, such as the one shown below.
type
  TTexCoord = record
    U, V: Single;
  end;

  TFloatColor = record
    R, G, B: Single;
  end;

  TVertex = record
    X, Y, Z: Single;
  end;

  { One interleaved vertex: texcoord, then color, then position,
    mirroring the classic GL_T2F_C3F_V3F layout. }
  TSpriteVertex = record
    T2F: TTexCoord;
    C3F: TFloatColor;
    V3F: TVertex;
  end;

  TSpriteTriangle = record
    V1, V2, V3: TSpriteVertex;
  end;

  { Two triangles make one sprite quad (six vertices total). }
  TSpriteQuad = record
    Tri1, Tri2: TSpriteTriangle;
  end;

  TSpriteQuadArray = array of TSpriteQuad;
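The stride and offsets for the attribute setup fall straight out of that record layout. Here's a sketch of what that might look like; the attribute locations 0/1/2 are my assumption about a hypothetical shader, not anything ZenGL defines:

```pascal
procedure SetupSpriteVertexAttribs;
const
  Stride = SizeOf(TSpriteVertex); // 32 bytes: eight Singles per vertex
begin
  // Locations 0..2 are assumed shader attribute locations (my assumption).
  // Offsets come from the record layout: T2F at 0, C3F at 8, V3F at 20.
  glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, Stride, Pointer(0));
  glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, Stride,
    Pointer(SizeOf(TTexCoord)));
  glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, Stride,
    Pointer(SizeOf(TTexCoord) + SizeOf(TFloatColor)));
  glEnableVertexAttribArray(0);
  glEnableVertexAttribArray(1);
  glEnableVertexAttribArray(2);
end;
```

Incidentally, the T2F/C3F/V3F field names mirror OpenGL's fixed-function GL_T2F_C3F_V3F interleaved format, so on the legacy pipeline you could instead bind the buffer and call glInterleavedArrays(GL_T2F_C3F_V3F, 0, nil) rather than setting up attribute pointers.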
Then, you would need to define some kind of sprite manager class (or even an advanced record) that has its own TSpriteQuadArray as well as a separate array of the actual individual sprites (each of which has its own TSpriteQuad as well as a texture ID). Each frame, before drawing:

1. Sort the array of actual sprites by texture ID, from low to high.
2. Call glBindTexture on the first sprite's texture, then loop over the array copying each sprite's TSpriteQuad sequentially into the manager's TSpriteQuadArray, until the texture ID of the current sprite differs from the previous one.
3. At that point, bind your vertex buffer and use something along the lines of glBufferSubData to upload the data from the TSpriteQuadArray into it, then set up your "vertex attribute pointers" so that the stride matches your previously defined interleaved data structure.
4. Call glDrawArrays (in GL_TRIANGLES mode, of course), clear the manager's TSpriteQuadArray, bind the new texture ID, and return to your initial loop at the index where you left off, repeating until you hit a different texture ID again.
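A rough sketch of that per-frame flow is below. TSprite, FSprites, FBatch, FVBO and SortSpritesByTextureID are hypothetical names of mine, not ZenGL API; error handling, VBO creation, and growing FBatch are omitted:

```pascal
type
  TSprite = record           // hypothetical per-sprite data
    Quad: TSpriteQuad;
    TextureID: GLuint;
  end;

procedure TSpriteManager.Render;
var
  I, Count: Integer;
  CurrentTex: GLuint;
begin
  // Sort low to high so sprites sharing a texture end up adjacent.
  SortSpritesByTextureID(FSprites); // hypothetical helper

  glBindBuffer(GL_ARRAY_BUFFER, FVBO);
  I := 0;
  while I < Length(FSprites) do
  begin
    CurrentTex := FSprites[I].TextureID;
    glBindTexture(GL_TEXTURE_2D, CurrentTex);

    // Copy the run of quads that share this texture into the batch array
    // (FBatch is assumed to be pre-sized to Length(FSprites)).
    Count := 0;
    while (I < Length(FSprites)) and (FSprites[I].TextureID = CurrentTex) do
    begin
      FBatch[Count] := FSprites[I].Quad;
      Inc(Count);
      Inc(I);
    end;

    // One upload and one draw call for the entire run of quads.
    glBufferSubData(GL_ARRAY_BUFFER, 0, Count * SizeOf(TSpriteQuad),
      @FBatch[0]);
    glDrawArrays(GL_TRIANGLES, 0, Count * 6); // six vertices per quad
  end;
end;
```

The worst case (every sprite using a different texture) degenerates to one draw call per sprite, so the technique pays off in proportion to how many sprites share textures, which is why sprite atlases pair so well with it.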
Doing things that way, you'd be looking at enormous framerate increases over how ZenGL currently does things, both because you'd be making drastically fewer OpenGL function calls (we're talking hundreds or thousands vs. tens of millions here) and because all of the data OpenGL needs would be uploaded to the GPU in a single array ahead of time, instead of vertex by vertex in an endless series of glVertex3f and glTexCoord2f calls.
As far as your benchmarks go, you are certainly comparing apples to oranges. The ZenGL demo is an utterly simplistic "true 2D" loop that just renders the same six textures in a row (thereby "animating" them) over and over again while adjusting their positions every so often based on input from a timer. There are no matrix transformations, world-space computations, or "game logic" of any kind going on. The Castle Dragon demo, on the other hand, is loading fully articulated Spine models from JSON files and computing the transformations of each joint in real time, while also dealing with 3D matrix computations (since nothing in Castle is ever actually 2D, as demonstrated by the 3D camera mode you can switch to in that demo) as well as rendering the on-screen menu buttons. Most of the same things apply to the isometric demo, apart from the use of "rigged" Spine models.