
Topic: TUnicodeCategory

columbo

  • Newbie
  • Posts: 5
TUnicodeCategory
« on: February 21, 2017, 04:15:08 pm »
This is my first post, so I want to thank everybody who has ever been part of the FPC & Lazarus community.
It is really one of the best RAD and development tools I am aware of.  :)

But here is my question:

I was playing around a little with UTF-8 and wanted to find out more about the different TUnicodeCategory values.
However, I got some strange results.

E.g. for '+' and '=' I get 'Math Symbol', which is what I expected, but for '*' and '/' I get 'Other Punctuation'.
I would have expected them to be in the category 'Math Symbol' as well.

Is there good documentation somewhere that explains why these code points are categorized this way?

Here is my code:

// Look up the category of the first UTF-16 code unit of the converted key.
Cat := GetUnicodeCategory(UTF8ToUTF16(UTF8Key), 1);
// Map the categories of interest to a readable label.
case Cat of
  TUnicodeCategory.ucMathSymbol:       Ans := 'Math Symbol';
  TUnicodeCategory.ucUppercaseLetter:  Ans := 'Uppercase';
  TUnicodeCategory.ucLowercaseLetter:  Ans := 'Lowercase';
  TUnicodeCategory.ucOtherLetter:      Ans := 'Letter';
  TUnicodeCategory.ucDecimalNumber:    Ans := 'Decimal Number';
  TUnicodeCategory.ucOtherNumber:      Ans := 'Other Number';
  TUnicodeCategory.ucCurrencySymbol:   Ans := 'Currency';
  TUnicodeCategory.ucOpenPunctuation:  Ans := 'Open Punctuation';
  TUnicodeCategory.ucClosePunctuation: Ans := 'Close Punctuation';
  TUnicodeCategory.ucOtherPunctuation: Ans := 'Other Punctuation';
  TUnicodeCategory.ucOtherSymbol:      Ans := 'Other Symbol';
else
  // Every category not listed above (see TUnicodeCategory in character.pas).
  Ans := 'Other Category (character.pas > TUnicodeCategory)';
end;


marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: TUnicodeCategory
« Reply #1 on: February 21, 2017, 07:25:57 pm »
Well, slash and star are not just math symbols. Division is written either as : or as a horizontal bar between the operands, and multiplication as an x-like symbol or a raised dot.

* and / are just abused for math in computers because they were available on early keyboards. And they were there because they also have other functions/applications (get it? That is an application of the slash :-)

columbo

Re: TUnicodeCategory
« Reply #2 on: February 21, 2017, 07:44:33 pm »
Thank you for the answer.
I have no problem with that.  :)
However, it would be good to know how the codes are categorised.

marcov

Re: TUnicodeCategory
« Reply #3 on: February 21, 2017, 07:58:45 pm »
Quote from: columbo on February 21, 2017, 07:44:33 pm
Thank you for the answer.
I have no problem with that.  :)
However, it would be good to know how the codes are categorised.

AFAIK these categories are specified in files from the Unicode Consortium that an FPC team member imported. In other words, the category system comes from the Unicode Consortium. Maybe you'll find more on their site.

columbo

Re: TUnicodeCategory
« Reply #4 on: February 21, 2017, 09:06:31 pm »
Meanwhile, I have found good documentation here:

http://www.fileformat.info/info/unicode/category/Sm/list.htm

columbo

Re: TUnicodeCategory
« Reply #5 on: February 22, 2017, 09:11:17 am »
...and here is a full list of all defined Unicode code points in a semicolon-separated, CSV-like format with a lot of additional information, which might be useful for others as well.

http://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt

Other documents, e.g. about the Unicode properties, are available there as well.
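
For anyone who wants to read the category straight from that file: each line of UnicodeData.txt is a list of semicolon-separated fields, with the code point in the first field, the character name in the second and the general category in the third. Below is a minimal sketch of pulling those fields out of a line. The helper GetField is a made-up name for illustration, and the two sample lines are typed in by hand in the file's layout rather than read from the downloaded file.

```pascal
program ParseUcdLine;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the AIndex-th (0-based) semicolon-separated
// field of ALine, or an empty string when the line has fewer fields.
function GetField(const ALine: string; AIndex: Integer): string;
var
  Rest: string;
  P: Integer;
begin
  Rest := ALine;
  // Drop everything up to and including the separator, AIndex times.
  while AIndex > 0 do
  begin
    P := Pos(';', Rest);
    if P = 0 then
      Exit('');
    Delete(Rest, 1, P);
    Dec(AIndex);
  end;
  // The wanted field now runs up to the next separator (or the line end).
  P := Pos(';', Rest);
  if P = 0 then
    Result := Rest
  else
    Result := Copy(Rest, 1, P - 1);
end;

const
  // Sample lines in the UnicodeData.txt layout (entered by hand).
  SampleLines: array[0..1] of string = (
    '002A;ASTERISK;Po;0;ON;;;;;N;;;;;',
    '002B;PLUS SIGN;Sm;0;ES;;;;;N;;;;;'
  );

var
  Line: string;
begin
  for Line in SampleLines do
    // Field 0 = code point (hex), field 1 = name, field 2 = general category.
    WriteLn('U+', GetField(Line, 0), ' ', GetField(Line, 1),
            ' -> ', GetField(Line, 2));
end.
```

The two-letter codes in the third field (Po = Other Punctuation, Sm = Math Symbol) correspond to the TUnicodeCategory values discussed above.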

Thaddy

  • Hero Member
  • *****
  • Posts: 14197
  • Probably until I exterminate Putin.
Re: TUnicodeCategory
« Reply #6 on: February 22, 2017, 09:23:12 am »
Quote from: marcov on February 21, 2017, 07:25:57 pm
* and / are just abused for math in computers because they were available on early keyboards. And they were there because they also have other functions/applications (get it? That is an application of the slash :-)

And in sets, * means?  O:-) Oh well, intersection. To complicate matters further, which I do on purpose of course... 8-)

Note that AFAIK the FPC parts are computer-generated from the official files, as Marco says.

columbo

Re: TUnicodeCategory
« Reply #7 on: February 22, 2017, 10:49:04 am »
Yes, at first sight it might look a bit confusing, but in the end it makes sense.
There are many identical-looking characters with different code points, and also different categories, to cover all possible meanings.

Just good to know.  :D
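
To make that concrete, here is a small sketch that compares the keyboard characters with their dedicated mathematical look-alikes. In the Unicode Character Database, U+002A ASTERISK and U+002F SOLIDUS are listed as Po (Other Punctuation), while U+2217 ASTERISK OPERATOR, U+2215 DIVISION SLASH, U+00D7 MULTIPLICATION SIGN and U+00F7 DIVISION SIGN are listed as Sm (Math Symbol). The sketch assumes that the character unit of your FPC version exposes TCharacter.GetUnicodeCategory with a single UnicodeChar argument; Report is just an illustrative helper name.

```pascal
program CategoryDemo;
{$mode objfpc}{$H+}

uses
  character;

// Illustrative helper: prints the category of one BMP code point.
// Assumes TCharacter.GetUnicodeCategory(UnicodeChar) is available.
procedure Report(const AName: string; ACodePoint: Word);
var
  Cat: TUnicodeCategory;
  Ans: string;
begin
  Cat := TCharacter.GetUnicodeCategory(UnicodeChar(ACodePoint));
  case Cat of
    TUnicodeCategory.ucMathSymbol:       Ans := 'Math Symbol';
    TUnicodeCategory.ucOtherPunctuation: Ans := 'Other Punctuation';
  else
    Ans := 'Other Category';
  end;
  WriteLn(AName, ': ', Ans);
end;

begin
  // Keyboard characters that are commonly used as operators.
  Report('U+002A ASTERISK', $002A);
  Report('U+002F SOLIDUS', $002F);
  // Dedicated mathematical characters that look similar.
  Report('U+2217 ASTERISK OPERATOR', $2217);
  Report('U+2215 DIVISION SLASH', $2215);
  Report('U+00D7 MULTIPLICATION SIGN', $00D7);
  Report('U+00F7 DIVISION SIGN', $00F7);
end.
```

If the tables shipped with FPC follow the official data, the first two should come out as Other Punctuation and the remaining four as Math Symbol.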

 
