[SOLVED] Replacing non-ASCII chars in a UTF8 string

alpine:
Hi,

Has anyone figured out an easy way to replace non-ASCII chars in a UTF8 string using a translation table?

The problem is that when a user enters a license plate number into an edit box, it can look the same as the real number but be written in Cyrillic characters. The Cyrillic alphabet has many letters that look like Latin ones. When such a wrong number gets into the database, it can't be found easily.

The obvious way would be to replace all similar-looking chars with Latin ones before processing, but I can't find a simple way of doing that.

winni:
Hi!

The UTF8 chars are grouped.

You should get the code point of each UTF8 char and check whether it is a member of the Latin groups or the Cyrillic groups.

The values of the basic groups are:

The Latin groups: U+0000 .. U+024F
The Cyrillic groups: U+0400 .. U+052F

There are some more.

Have a look at https://www.utf8-chartable.de/unicode-utf8-table.pl
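A minimal sketch of that range check, written in Python only because the thread does not name a language. The function names and the sample plate are made up for illustration, and only the basic Cyrillic range listed above is covered, not the additional groups.

```python
# Basic Cyrillic range from the list above (Cyrillic + Cyrillic Supplement).
CYRILLIC_FIRST = 0x0400
CYRILLIC_LAST = 0x052F


def is_cyrillic(ch):
    """Return True if the character's code point lies in U+0400..U+052F."""
    return CYRILLIC_FIRST <= ord(ch) <= CYRILLIC_LAST


def cyrillic_positions(text):
    """Return the indexes of all characters in the basic Cyrillic range."""
    return [i for i, ch in enumerate(text) if is_cyrillic(ch)]


if __name__ == "__main__":
    # Hypothetical input: the second letter is Cyrillic U+0410, not Latin "A".
    sample = "C\u04101234BT"
    print(cyrillic_positions(sample))
```

A check like this only tells you that a Cyrillic letter is present and where; it does not decide what to replace it with.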

Winni

Thaddy:
It would be best if the database field were defined as UTF8. Then it will always store correctly. This is also how most database string fields are defined nowadays.

SymbolicFrank:

--- Quote from: Thaddy on January 15, 2022, 12:35:44 pm ---It would be best if the database field were defined as UTF8. Then it will always store correctly. This is also how most database string fields are defined nowadays.

--- End quote ---
Well, yes, but only if you type the same char when searching for the value. The storage isn't the problem.

alpine:

--- Quote from: Thaddy on January 15, 2022, 12:35:44 pm ---It would be best if the database field were defined as UTF8. Then it will always store correctly. This is also how most database string fields are defined nowadays.

--- End quote ---
It is defined as such.

The trouble is elsewhere: when users enter data, the keyboard is usually switched to Cyrillic, and because the letters look alike it is not obvious which alphabet was used. The next time, the keyboard may be switched to Latin, and the same plate number will look exactly the same but be written with Latin letters.

FYI: the letters АВЕКМНОРСТУХ look the same in Cyrillic and Latin, but of course they have different codes.

A UTF8 field can't resolve that issue; the chars must be replaced with the help of a translation table:
 АВЕКМНОРСТУХ -> ABEKMHOPCTYX
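
As a rough sketch, the table above could be applied like this in Python (chosen only because the thread does not name a language). The names `TO_LATIN` and `normalize_plate` are hypothetical, and uppercasing the input first is my assumption so that lowercase Cyrillic letters get mapped as well.

```python
# The twelve look-alike letters from the table above, in the same order.
CYRILLIC_LOOKALIKES = "АВЕКМНОРСТУХ"
LATIN_EQUIVALENTS = "ABEKMHOPCTYX"

# str.maketrans pairs the characters position by position.
TO_LATIN = str.maketrans(CYRILLIC_LOOKALIKES, LATIN_EQUIVALENTS)


def normalize_plate(plate):
    """Uppercase the plate (an assumption) and replace Cyrillic look-alikes."""
    return plate.upper().translate(TO_LATIN)


if __name__ == "__main__":
    # Hypothetical plate typed with Cyrillic letters (written as escapes).
    typed = "\u0421\u0410 1234 \u0412\u0422"
    print(normalize_plate(typed))
```

Running the same normalization on the value before storing it and on the search term before querying would make both spellings of a plate compare equal. Cyrillic letters that have no Latin look-alike are left unchanged, so they still have to be rejected or handled separately.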
