[SOLVED] Replacing non-ASCII chars in a UTF8 string
alpine:
Hi,
Has anyone figured out an easy way to replace non-ASCII chars in a UTF8 string with a translation table?
The problem is that when a user enters a license plate number into an edit box, it can look the same as the real number but be written in Cyrillic characters. The Cyrillic alphabet has many letters that look like Latin ones. When such a wrong number gets into the database, it can't be found easily.
The obvious way should be to replace all similar-looking chars with Latin ones before processing, but I can't find a simple way of doing that.
winni:
Hi!
The UTF8 chars are grouped.
You should get the code point of the UTF8 char and check whether it is a member of the Latin groups or the Cyrillic groups.
The basic groups are:
The Latin groups U+0000 .. U+024F
The Cyrillic groups U+0400 .. U+052F
There are some more.
Have a look at https://www.utf8-chartable.de/unicode-utf8-table.pl
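A rough sketch of that range check, in Python only for illustration (Python strings are already sequences of code points, so no UTF8 decoding step is shown). The names is_cyrillic and cyrillic_positions are made up for this example, and only the Cyrillic range given above is tested:

```python
# Cyrillic range as given in this post (assumed to cover the letters of interest).
CYRILLIC_FIRST = 0x0400
CYRILLIC_LAST = 0x052F

def is_cyrillic(ch):
    """Return True if the single character ch lies in the Cyrillic range."""
    return CYRILLIC_FIRST <= ord(ch) <= CYRILLIC_LAST

def cyrillic_positions(text):
    """Return the indexes of all Cyrillic characters in text."""
    return [i for i, ch in enumerate(text) if is_cyrillic(ch)]

# The first letter is Cyrillic "С" (U+0421), the rest is plain ASCII.
print(cyrillic_positions("\u0421A1234AB"))  # prints [0]
```

This only tells you where the Cyrillic letters are; it does not replace them.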
Winni
Thaddy:
It would be best if the database field were defined as UTF8. Then it will always store correctly. This is also how most database string fields are defined nowadays.
SymbolicFrank:
--- Quote from: Thaddy on January 15, 2022, 12:35:44 pm ---It would be best if the database field were defined as UTF8. Then it will always store correctly. This is also how most database string fields are defined nowadays.
--- End quote ---
Well, yes, but only if you type the same char when searching for the value. The storage isn't the problem.
alpine:
--- Quote from: Thaddy on January 15, 2022, 12:35:44 pm ---It would be best if the database field were defined as UTF8. Then it will always store correctly. This is also how most database string fields are defined nowadays.
--- End quote ---
It is defined as such.
The trouble is somewhere else: when users enter data, the keyboard is usually switched to Cyrillic, and because the letters look alike it is not obvious which kind of letters were typed. The next time, the keyboard may be switched to Latin, and the same plate number will look exactly the same but be made of Latin letters.
FYI: АВЕКМНОРСТУХ look the same in Cyrillic and Latin, but of course they have different codes.
A UTF8 field can't resolve that issue; the chars must be replaced with the help of a translation table:
АВЕКМНОРСТУХ -> ABEKMHOPCTYX
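For illustration, a minimal sketch of such a table in Python, using the built-in str.maketrans and str.translate. The function name normalize_plate is hypothetical, and the table holds only the twelve uppercase letters listed above:

```python
# Cyrillic look-alikes and their Latin replacements, in the same order.
CYRILLIC_LOOKALIKES = "АВЕКМНОРСТУХ"
LATIN_REPLACEMENTS = "ABEKMHOPCTYX"

# One-to-one character mapping; both strings must have the same length.
TABLE = str.maketrans(CYRILLIC_LOOKALIKES, LATIN_REPLACEMENTS)

def normalize_plate(text):
    """Replace Cyrillic look-alike letters with Latin ones; leave the rest as is."""
    return text.translate(TABLE)

# "\u0421\u0410" and "\u0410\u0412" are Cyrillic "СА" and "АВ".
print(normalize_plate("\u0421\u04101234\u0410\u0412"))  # prints CA1234AB
```

Lowercase letters and Cyrillic letters with no Latin twin are left untouched by this table, so it may make sense to uppercase the input and reject any remaining non-ASCII chars before saving.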