Just my two cents' worth. I write and sell software that has to do massive parsing of documents containing, yes, HTML and related formats. For all intents and purposes, UTF-8 is extremely fast to parse here (it does not matter whether the website's language is Arabic, Hebrew or anything else), since for parsing I am only interested in bytes below 128. Every byte of a multi-byte UTF-8 sequence is 128 or above, so an ASCII byte such as '<' can never show up inside one. Copying strings between HTML tags is not a problem either, since I can deduce the start/end byte positions.
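To make the byte-level idea concrete, here is a minimal Python sketch, assuming well-formed UTF-8 input. The function name `text_between_tags` is hypothetical, and this is deliberately naive (no comments, `<script>`, or attributes containing '>'), so it is not a real HTML parser. It only inspects the ASCII delimiters and decodes just the slices it keeps, using their byte offsets:

```python
def text_between_tags(doc: bytes) -> list[str]:
    """Return the non-empty text runs found between tags in UTF-8 HTML.

    Hypothetical helper: scans raw bytes for b'<' (0x3C) and b'>' (0x3E).
    This is safe in UTF-8 because every byte of a multi-byte sequence is
    >= 0x80, so it can never be mistaken for an ASCII delimiter.
    """
    runs = []
    pos = 0
    n = len(doc)
    while pos < n:
        start = doc.find(b"<", pos)
        if start == -1:
            start = n  # no more tags: treat the rest as trailing text
        if start > pos:
            # Slice by byte offsets and decode only the text we keep.
            text = doc[pos:start].decode("utf-8").strip()
            if text:
                runs.append(text)
        if start == n:
            break
        end = doc.find(b">", start)
        if end == -1:
            break  # unterminated tag: stop scanning
        pos = end + 1
    return runs
```

For example, `text_between_tags("<p>مرحبا</p><b>שלום</b> tail".encode("utf-8"))` gives `['مرحبا', 'שלום', 'tail']`; the Arabic and Hebrew text is never decoded until it is copied out.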
One area where there is a huge difference is memory footprint: UTF-8 uses far less in most cases, especially for markup-heavy text, where every ASCII character takes one byte instead of two. I absolutely love UTF-8, and I do not really get why people think UTF-16 solves anything. If you need to iterate over Unicode characters, the job is the same: you can't assume 2 bytes per character with UTF-16 either (characters outside the Basic Multilingual Plane take a surrogate pair), so there is zero upside to UTF-16, from my point of view at least. (And actually UTF-8 is pretty smart in my opinion; it is not impossible to index at all if you abstract it a little. Yes, the very simplest for loop won't work, but then again, it won't work for UTF-16 either, unless you mean UCS-2.)
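A rough Python sketch of why iteration is "the same job" in both encodings, assuming well-formed input (neither decoder validates anything). The helper names `utf16_units`, `iter_utf8` and `iter_utf16` are hypothetical and for illustration only. Both loops have to look at the current unit to decide how far to step:

```python
def utf16_units(s: str) -> list[int]:
    """Split a string into its 16-bit UTF-16 code units (little-endian, no BOM)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


def iter_utf8(data: bytes):
    """Yield (byte_offset, code_point) for well-formed UTF-8 (no validation)."""
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte alone tells us how long the sequence is.
        if lead < 0x80:
            cp, length = lead, 1
        elif lead >= 0xF0:
            cp, length = lead & 0x07, 4
        elif lead >= 0xE0:
            cp, length = lead & 0x0F, 3
        else:
            cp, length = lead & 0x1F, 2
        # Each continuation byte contributes its low 6 bits.
        for cont in data[i + 1:i + length]:
            cp = (cp << 6) | (cont & 0x3F)
        yield i, cp
        i += length


def iter_utf16(units: list[int]):
    """Yield (unit_offset, code_point) for well-formed UTF-16 code units."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: this code point spans two units.
            low = units[i + 1]
            yield i, 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        else:
            yield i, u
            i += 1
```

For "a€😀" both loops yield the same 3 code points, but UTF-16 needs 4 code units (the emoji is a surrogate pair), so indexing by unit does not give you characters there either. On the memory side, "<p>שלום</p>" is 15 bytes in UTF-8 versus 22 bytes in UTF-16.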