Also note this code is much faster (factor 100 not 2) than using fasthtmlparser. (But it needs work for e.g.  )
Factor 100? Thaddy, this is cheating...
I assume that you are referring to my old demo from http://forum.lazarus.freepascal.org/index.php/topic,35980.msg239199.html#msg239199. I did not write it with a speed test in mind. You should know that adding strings to Memo.Lines.Text is extremely expensive. If I modify the demo to write the found text nodes to a memory stream instead, the speed increases tremendously, by a factor of 50. The remaining speed disadvantage compared to regexpr is probably due to the fact that fasthtmlparser unnecessarily converts all tags to upper case.
See the modified demo in the attachment. It also contains a comparison with the htmlutil functions as proposed by z505. Since these functions do more than just compare two strings, this version is almost a factor of 2 slower than the dumb-old "pos" method used by "some people".
Well, the functions in htmlutil could be optimized. I never worked on optimizing them because they were fast enough for my needs (parsing thousands of eBay pages and Yahoo stock pages)... The htmlutil unit could even use Pos() internally instead of other things.
But indeed, with a Memo I believe you can speed it up with a trick... Lazarus uses this trick when compiling. It might be BeginUpdate/EndUpdate or something like it, or, as you say, a memory stream. If the code is not using Memo.Lines.Add but actually modifying the memo text itself as a string, then the solution is a CapString algorithm, also written by me, similar to what GoLang does with cap() for its arrays/slices... It grows the string in chunks instead of making thousands of small memory allocations, similar to a buffered writeln.
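To illustrate the idea of growing a string in chunks, here is a minimal sketch. It is not the actual CapString code; the record TChunkedString and the routines AppendStr and ToStr are hypothetical names made up for this example, and the doubling strategy with a 256-character minimum is just one plausible choice.

```pascal
program CapStringDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical helper type, NOT the real CapString implementation.
  TChunkedString = record
    Data: string;   // buffer, usually longer than the text it holds
    Len: Integer;   // number of characters actually in use
  end;

// Appends S to the buffer, enlarging the buffer only when it is full.
procedure AppendStr(var B: TChunkedString; const S: string);
var
  NewCap: Integer;
begin
  if S = '' then
    Exit;
  if B.Len + Length(S) > Length(B.Data) then
  begin
    // Double the capacity so that reallocations stay rare
    // (assumed growth policy, chosen for illustration).
    NewCap := Length(B.Data) * 2;
    if NewCap < B.Len + Length(S) then
      NewCap := B.Len + Length(S);
    if NewCap < 256 then
      NewCap := 256;
    SetLength(B.Data, NewCap);
  end;
  // Copy the new characters behind the part that is already in use.
  Move(S[1], B.Data[B.Len + 1], Length(S) * SizeOf(Char));
  Inc(B.Len, Length(S));
end;

// Returns only the used part of the buffer.
function ToStr(const B: TChunkedString): string;
begin
  Result := Copy(B.Data, 1, B.Len);
end;

var
  Buf: TChunkedString;
  i: Integer;
begin
  Buf.Data := '';
  Buf.Len := 0;
  for i := 1 to 10000 do
    AppendStr(Buf, 'text node ' + IntToStr(i) + LineEnding);
  // Print the total length of the collected text.
  WriteLn(Length(ToStr(Buf)));
end.
```

In a Lazarus application the collected text would presumably be assigned to Memo.Lines.Text once at the end, so the memo is updated a single time instead of once per text node.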
The uppercasing is for ease of use. Case sensitivity in HTML tags is annoying to the end user of the library: if he is parsing for <strong> and the source actually contains <STRONG>, then his whole parser is broken by case sensitivity, so he has to pollute his own source code with upcase() calls instead of having that in the library itself.
An uppercase boolean could be added so that the function does not do the conversion automatically, but then the user may end up calling it himself anyway, thousands of times in a loop in his application code.
If you don't upcase each and every tag, how do you know for sure that a page doesn't have a <stRanGe> case tag? The code you write may be relying on case sensitivity in HTML, which is a bad thing, IMO, unless you know the website will never be modified and you are dealing with permanent, fixed HTML that will not change in the future (some developer could modify the HTML and write a <miStake> or make one tag <UPPERCASE> but not the others). Reliability vs. performance in the parser ;-)
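For what it's worth, here is a small sketch of what case-insensitive matching looks like on the application side, using SameText from SysUtils rather than upper-casing the whole tag. TagName and IsTag are hypothetical helpers invented for this example; they are not part of fasthtmlparser or htmlutil, and the tag-name extraction is deliberately simplistic.

```pascal
program TagCaseDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: extracts the element name from a raw tag
// such as '<stRanGe class="x">' or '</STRONG>'.
function TagName(const RawTag: string): string;
var
  i, Start: Integer;
begin
  i := 1;
  // Skip the leading '<' and an optional '/' of a closing tag.
  if (i <= Length(RawTag)) and (RawTag[i] = '<') then
    Inc(i);
  if (i <= Length(RawTag)) and (RawTag[i] = '/') then
    Inc(i);
  Start := i;
  // The name ends at whitespace, '/' or '>'.
  while (i <= Length(RawTag)) and
        not (RawTag[i] in [' ', #9, #10, #13, '/', '>']) do
    Inc(i);
  Result := Copy(RawTag, Start, i - Start);
end;

// Hypothetical helper: compares the element name without regard to case.
function IsTag(const RawTag, Name: string): Boolean;
begin
  Result := SameText(TagName(RawTag), Name);
end;

begin
  WriteLn(IsTag('<STRONG>', 'strong'));              // same element, different case
  WriteLn(IsTag('<stRanGe class="x">', 'strange'));  // mixed case with attribute
  WriteLn(IsTag('<em>', 'strong'));                  // different element
end.
```

Whether this is cheaper than upper-casing inside the parser depends on how often the comparison runs; it only shows that the case handling has to live somewhere, either in the library or in the calling code.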