### How to determine Unicode character type (letter, punctuation, symbol, etc.)

#### Manlio

« on: May 27, 2022, 10:10:40 am »
I need to parse Unicode strings containing text in different languages, and for every character (code point) I parse, I need to know whether it is a letter, a digit, a symbol, or punctuation.

I looked through the FPC Unicode-related units but didn't find anything.

Can anyone kindly point me in the right direction?

Thank you!
manlio mazzon gmail

#### AlexTP

« Reply #1 on: May 27, 2022, 10:49:18 am »
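This gets the Unicode general category of a WideChar (a single UTF-16 code unit, so BMP code points only), returned as one of the UGC_xxx constants from the unicodedata unit.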

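```pascal
uses
  Classes, SysUtils,
  fpwidestring,
  StrUtils,
  unicodedata;  // GetProps and the UGC_xxx general-category constants

// True for '_' and for any character whose general category is
// UGC_OtherNumber or lower (letters, combining marks, numbers)
function IsUnicodeWordChar(AChar: WideChar): Boolean;
var
  NType: Byte;
begin
  if AChar = '_' then
    Exit(True);

  // A lone surrogate half ($D800..$DFFF) is not a code point by itself.
  // Testing only ">= LOW_SURROGATE_BEGIN" would also reject valid BMP
  // letters above $DFFF, e.g. fullwidth Latin (U+FF21..U+FF3A).
  if (Ord(AChar) >= $D800) and (Ord(AChar) <= $DFFF) then
    Exit(False);

  NType := GetProps(Ord(AChar))^.Category;
  Result := (NType <= UGC_OtherNumber);
end;

// Returns the general category (one of UGC_xxx) of a BMP code point
function GetCateg(c: Word): Byte;
begin
  Result := GetProps(c)^.Category;
end;
```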

#### paweld

« Reply #2 on: May 27, 2022, 10:57:35 am »
```pascal
uses
  LazUTF8, Character;

procedure TForm1.Button1Click(Sender: TObject);
var
  s, r: String;
  i: Integer;
begin
  s := 'Test 12,3 gęŚlą jaŹń 〇⌀→Ⓣ■•';
  // The Character routines take a UnicodeString and a UTF-16 index, so s is
  // converted on every call, while i counts code points (UTF8Length).
  // The two indexes agree only while every character is in the BMP,
  // as in this test string.
  for i := 1 to UTF8Length(s) do
  begin
    r := '';
    if IsLetter(s, i) then
    begin
      if IsLower(s, i) then
        r := ' > Lower letter'
      else if IsUpper(s, i) then
        r := ' > Upper letter'
      else
        r := ' > Other letter'  // e.g. CJK ideographs, titlecase letters
    end
    else if IsNumber(s, i) then
      r := ' > Number'
    else if IsPunctuation(s, i) then
      r := ' > Punctuation'
    else if IsSeparator(s, i) then
      r := ' > Separator'
    else if IsSymbol(s, i) then
      r := ' > Symbol'
    else
      r := ' > ???';
    Memo1.Lines.Add(UTF8Copy(s, i, 1) + r);
  end;
end;
```
Best regards
paweld
---
Lazarus trunk / FPC stable

#### Manlio

« Reply #3 on: May 27, 2022, 12:13:43 pm »
Thank you both for the great working code!

For anyone else interested, the "Character" unit (IsLetter, IsNumber, IsPunctuation, IsSymbol, etc.) is where the magic happens.
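A minimal sketch of the same idea that maps each code point to the coarse groups asked about in the first post, using `TCharacter.GetUnicodeCategory` from the Character unit. It assumes FPC 3.2 or later, where that unit provides `TCharacter.GetUnicodeCategory(const AString: UnicodeString; AIndex: Integer)` and the `TUnicodeCategory` values named below; `TCharKind`, `KindNames` and `ClassifyAt` are hypothetical names introduced only for this example. The loop also assumes that, when `AIndex` points at a high surrogate, the string overload reads the whole surrogate pair, which is why it advances by two code units in that case.

```pascal
program CharKinds;

{$mode objfpc}{$H+}
{$codepage utf8}

uses
  Character;

type
  // Hypothetical coarse grouping of the Unicode general categories
  TCharKind = (ckLetter, ckMark, ckNumber, ckPunctuation, ckSymbol,
               ckSeparator, ckOther);

const
  KindNames: array[TCharKind] of string =
    ('letter', 'mark', 'number', 'punctuation', 'symbol', 'separator', 'other');

// Classifies the code point that starts at UTF-16 index Index of U
function ClassifyAt(const U: UnicodeString; Index: Integer): TCharKind;
begin
  case TCharacter.GetUnicodeCategory(U, Index) of
    TUnicodeCategory.ucUppercaseLetter, TUnicodeCategory.ucLowercaseLetter,
    TUnicodeCategory.ucTitlecaseLetter, TUnicodeCategory.ucModifierLetter,
    TUnicodeCategory.ucOtherLetter:
      Result := ckLetter;
    TUnicodeCategory.ucNonSpacingMark, TUnicodeCategory.ucCombiningMark,
    TUnicodeCategory.ucEnclosingMark:
      Result := ckMark;
    // ucDecimalNumber alone is the strict "digit" test (0-9 in any script)
    TUnicodeCategory.ucDecimalNumber, TUnicodeCategory.ucLetterNumber,
    TUnicodeCategory.ucOtherNumber:
      Result := ckNumber;
    TUnicodeCategory.ucConnectPunctuation, TUnicodeCategory.ucDashPunctuation,
    TUnicodeCategory.ucOpenPunctuation, TUnicodeCategory.ucClosePunctuation,
    TUnicodeCategory.ucInitialPunctuation, TUnicodeCategory.ucFinalPunctuation,
    TUnicodeCategory.ucOtherPunctuation:
      Result := ckPunctuation;
    TUnicodeCategory.ucMathSymbol, TUnicodeCategory.ucCurrencySymbol,
    TUnicodeCategory.ucModifierSymbol, TUnicodeCategory.ucOtherSymbol:
      Result := ckSymbol;
    TUnicodeCategory.ucSpaceSeparator, TUnicodeCategory.ucLineSeparator,
    TUnicodeCategory.ucParagraphSeparator:
      Result := ckSeparator;
  else
    // control, format, surrogate, private use, unassigned
    Result := ckOther;
  end;
end;

var
  U: UnicodeString;
  i, Len: Integer;
begin
  U := 'Aé 1,€→ gęŚlą';
  i := 1;
  while i <= Length(U) do
  begin
    // A high surrogate followed by another unit starts a code point outside the BMP
    if (Ord(U[i]) >= $D800) and (Ord(U[i]) <= $DBFF) and (i < Length(U)) then
      Len := 2
    else
      Len := 1;
    WriteLn(UTF8Encode(Copy(U, i, Len)), ' -> ', KindNames[ClassifyAt(U, i)]);
    Inc(i, Len);
  end;
end.
```

Combining marks get their own group here because text in decomposed form (NFD) stores "é" as "e" followed by U+0301, and a parser usually wants to attach such marks to the preceding letter rather than count them as separate characters.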