Hi Doug,
Welcome to the forum, btw. I agree with 440bx regarding the use of assembly. But first of all, have a look at the algorithm. From your source it looks like you want to do a "saturated add" on a byte level. You are doing that via compare and branch, which generates data-dependent branch mispredictions that will heavily slow down modern processors. A 1:1 translation of this to Pascal will have the same issue. An added downside is that the execution speed of your routine depends on the image you handle!
With some bit-wizardry you can make that code branch free if you consider the following:
$xx + $yy = Carry*$100 + $zz => a bitwise or of $zz with Carry*255 gives you what you want ($zz if Carry = 0, $FF if Carry = 1)
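A minimal Free Pascal sketch of that idea, one byte at a time. The name SatAddByte is my own and not taken from your source, and whether the compiler really emits branch-free code for it is something to check in the generated assembly:

```pascal
program SatAddByteDemo;
{$mode objfpc}

// Hypothetical helper: saturating add of two unsigned bytes without a branch.
function SatAddByte(a, b: Byte): Byte;
var
  sum, carry: Cardinal;
begin
  sum := Cardinal(a) + Cardinal(b);  // 9-bit result in the range 0..510
  carry := sum shr 8;                // 1 if the byte overflowed, else 0
  // carry * 255 is $FF on overflow and $00 otherwise, so the "or"
  // either forces the low byte to $FF or leaves the plain sum untouched.
  Result := Byte((sum or (carry * 255)) and $FF);
end;

begin
  WriteLn(SatAddByte(100, 100));  // prints 200
  WriteLn(SatAddByte(200, 100));  // prints 255
end.
```

The widening to Cardinal is what makes the carry visible as bit 8 of the sum.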
The above is branch free and probably the Pascal version is already faster than your current asm, even if you handle one byte after the other. However, the branch free variant can be modified to handle all four bytes of a DWORD in parallel (or on a 64 bit machine even 8 bytes of a QWORD) giving a substantial boost.
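A sketch of how the four-bytes-in-a-DWORD variant could look in Free Pascal. The name SatAdd4 and the constants are my own choices, not something from your code. The only extra care needed is to keep a carry from spilling out of one byte into its neighbour, so the low 7 bits of each byte are added first and the top bit is handled separately:

```pascal
program SatAdd4Demo;
{$mode objfpc}

uses
  SysUtils;

// Hypothetical helper: saturating add of four unsigned bytes packed
// into one DWORD, without branches and without cross-byte carries.
function SatAdd4(a, b: Cardinal): Cardinal;
const
  LOW7 = $7F7F7F7F;  // low seven bits of every byte
  TOP1 = $80808080;  // top bit of every byte
var
  low, sum, carry, mask: Cardinal;
begin
  // Add the low 7 bits per byte; each lane is at most $FE, so nothing spills over.
  low := (a and LOW7) + (b and LOW7);
  // Top bit of each byte's wrapped sum: a7 xor b7 xor (carry into bit 7).
  sum := low xor ((a xor b) and TOP1);
  // Carry out of each byte: majority of a7, b7 and the carry into bit 7.
  carry := ((a and b) or ((a or b) and low)) and TOP1;
  // Turn each carry bit into a full $FF byte (this is the "Carry*255" step).
  mask := (carry shr 7) * 255;
  Result := sum or mask;
end;

begin
  WriteLn(IntToHex(Int64(SatAdd4($01020304, $01010101)), 8));  // prints 02030405
  WriteLn(IntToHex(Int64(SatAdd4($FF10F080, $01207090)), 8));  // prints FF30FFFF
end.
```

The same pattern should carry over to a QWord with 64-bit constants, handling eight bytes per step.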
If you are really craving speed and are willing to restrict yourself to certain processor types, then there are SIMD instructions on modern x86-64 processors that handle "saturated adds" directly (PADDUSB in SSE2, with wider variants in AVX2) and can provide another substantial boost. However, I would first start with a Pascal version of the branch-free approach I explained above.
Regards,
MathMan