Errrr..... someone will have to explain to me, how "§" (=ASCII 167) is Unicode/UTF8?!?!?!
I'll try.
In short: It is not ASCII 167.
This character is encoded as (in hex):
- a7 (167 decimal) in cp1250 (not ASCII!),
- 00 a7 in UTF-16,
- c2 a7 in UTF-8,
- and it has no representation in ASCII.
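If you want to check these byte values yourself, here is a minimal Python sketch. It assumes Python 3 and uses the standard codec names "cp1250", "utf-16-be" (big-endian, without a byte order mark, to match the "00 a7" above), "utf-8" and "ascii"; the variable names are just for illustration.

```python
# The section sign, written as an escape so the source file encoding does not matter.
ch = "\u00a7"

cp1250_bytes = ch.encode("cp1250")    # one byte: a7
utf16_bytes = ch.encode("utf-16-be")  # two bytes: 00 a7 (big-endian, no BOM)
utf8_bytes = ch.encode("utf-8")       # two bytes: c2 a7

print("cp1250:", cp1250_bytes.hex())
print("UTF-16:", utf16_bytes.hex())
print("UTF-8: ", utf8_bytes.hex())

# ASCII has no code for this character, so encoding must fail.
try:
    ch.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print("representable in ASCII:", ascii_ok)
```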
Longer explanation: This is not ASCII 167. There is no ASCII 167.
ASCII is a 7-bit encoding. There are no values above 127 in ASCII!
There are several so-called ANSI 8-bit encodings (code pages), also known as Windows code pages. You probably mixed ASCII up with one of these. This character is indeed encoded as 167 in cp1250 (sometimes called win-1250), which is the ANSI page used for the Croatian alphabet, as well as for several other Central and Eastern European languages written in the Latin script: Czech, Slovak, Hungarian, Polish... For other languages there are other ANSI code pages, such as cp1252 for Western European Latin-script languages, cp1251 for Cyrillic-script languages, cp1253 for Greek, etc.
It is not ASCII! All these 8-bit ANSI encodings are compatible with ASCII in the first 128 characters, which have values 0 through 127. Every ANSI page, such as cp1250 mentioned above, has the same first 128 characters (0-127); these are taken from ASCII. However, the pages differ in the values 128 and above: in those positions cp1250 has letters used in Central and Eastern European Latin-script languages, such as č, ć, đ ..., cp1252 has letters used in Western European languages, such as ü, ö, ç, ..., cp1251 has Cyrillic letters, etc.
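To see the difference between the code pages in practice, here is a small Python sketch. It decodes the same bytes under cp1250, cp1251 and cp1252, using Python's built-in codecs. The byte e8 is just an example I picked; any assigned value of 128 or above would show the same effect.

```python
pages = ["cp1250", "cp1251", "cp1252"]

# Bytes 0-127 decode to the same text in every page, and that text is plain ASCII.
low_half = bytes(range(128))
same_low_half = all(low_half.decode(p) == low_half.decode("ascii") for p in pages)
print("low half identical to ASCII:", same_low_half)

# One and the same byte from the upper half means a different letter in each page.
high_byte = b"\xe8"
decoded = {p: high_byte.decode(p) for p in pages}
for page, letter in decoded.items():
    print(page, "->", letter, f"(U+{ord(letter):04X})")
```

So a lone byte such as e8 tells you nothing until you know which code page it was written in.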
Furthermore, these ASCII values up to 127 are also encoded the same way in UTF-8, and they are the only characters which have one-byte encodings in UTF-8.
Any character which appears in some ANSI (not ASCII!) encoding with a value of 128 or above (that is, with bit 7 set), such as § which is encoded as 167 in cp1250, is represented in UTF-8 with at least two bytes.
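This last point can be checked mechanically. The sketch below (Python 3, built-in codecs only; the names are mine) goes through every value from 128 to 255 in cp1250, skips the few positions that cp1250 leaves unassigned, and records how many bytes the same character takes in UTF-8.

```python
utf8_lengths = {}
for value in range(128, 256):
    try:
        ch = bytes([value]).decode("cp1250")
    except UnicodeDecodeError:
        continue  # this position is not assigned to any character in cp1250
    utf8_lengths[value] = len(ch.encode("utf-8"))

# ASCII characters stay at one byte in UTF-8.
ascii_length = len("A".encode("utf-8"))

print("A in UTF-8:", ascii_length, "byte")
print("shortest upper-half character in UTF-8:", min(utf8_lengths.values()), "bytes")
print("section sign (cp1250 a7) in UTF-8:", utf8_lengths[0xA7], "bytes")
print("euro sign (cp1250 80) in UTF-8:", utf8_lengths[0x80], "bytes")
```

Note that "at least two" really does mean at least: the euro sign is a single byte in cp1250 but takes three bytes in UTF-8.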