Author Topic: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??  (Read 24730 times)

Thaddy

  • Hero Member
  • *****
  • Posts: 14164
  • Probably until I exterminate Putin.
Re: convert char to integer and inverse??
« Reply #30 on: August 13, 2017, 10:01:01 am »
@Handoko
@majid.ebru

Much too complex.
Reduce it to three edits and two buttons, and reduce the code to:
Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  T: Word = $0643; // just a default. I also connected this to TForm1.OnShow...
  C: UnicodeChar absolute T; // C and T share the same two bytes
begin
  if Edit1.Text = '' then
    Exit; // nothing to convert; indexing [1] on an empty string would fail
  C := UnicodeString(Edit1.Text)[1]; // first UTF-16 code unit of the input
  Edit2.Text := IntToHex(T, 4);
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  // Hex text back to a character; StrToInt raises an exception on invalid input
  Edit3.Text := UnicodeChar(StrToInt('$' + Edit2.Text));
end;
{ tested all of them
name    Persian    codepoint    Arabic    codepoint
0    ۰    U+06F0    ٠    U+0660
1    ۱    U+06F1    ١    U+0661
2    ۲    U+06F2    ٢    U+0662
3    ۳    U+06F3    ٣    U+0663
4    ۴    U+06F4    ٤    U+0664
5    ۵    U+06F5    ٥    U+0665
6    ۶    U+06F6    ٦    U+0666
7    ۷    U+06F7    ٧    U+0667
8    ۸    U+06F8    ٨    U+0668
9    ۹    U+06F9    ٩    U+0669
ye    ی    U+06CC    ي    U+064A
kāf    ک    U+06A9    ك    U+0643
}

No silly extra units that bloat the code are needed.
« Last Edit: August 13, 2017, 10:15:03 am by Thaddy »
Specialize a type, not a var.

majid.ebru

  • Sr. Member
  • ****
  • Posts: 494
Re: convert char to integer and inverse??
« Reply #31 on: August 13, 2017, 10:21:05 am »
I can only say: oh my GOD  :o :o :o :o :o

Thank you very much, @Thaddy

Thaddy

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #32 on: August 13, 2017, 05:13:20 pm »
Glad to be of help.
As I showed, by using the correct UTF-16 string/char type (UnicodeString/UnicodeChar), everything becomes much simpler.
Just typecasts.
The UTF-16 to UTF-8 conversion (for the Lazarus controls) is handled transparently in that case. This is likely to improve even more in the future.
And UTF-8 is simply a dog to handle on its own, as the eloquent (and working) code from Handoko demonstrates.

UTF8 is like shooting yourself in the foot on purpose. :D
I hope you now understand what my first question to you meant? Because there is a huge difference between all the unicode types and that is often very confusing.

Rule of thumb: in case of doubt, start with UTF-16 (UnicodeString), even in Lazarus (which defaults to a UTF-8/Ansi hybrid), because the conversion from UnicodeString to UTF-8 is much simpler than calling all kinds of utility functions and mappings. There are rare cases where those utility functions are still necessary, though.

Note for Lazarus developers: I was really impressed by the fact that right-to-left languages (as per my $0643) are handled so well! Compliments!
Note for FPC developers: thanks for such a great typecasting system!
« Last Edit: August 13, 2017, 05:42:52 pm by Thaddy »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #33 on: August 14, 2017, 05:12:52 am »
Quote
UTF8 is like shooting yourself in the foot on purpose. :D

UTF16 is like shooting yourself in the foot by mistake.  :P

Try it with most emoji, or with some of the Arabic Mathematical Alphabetic Symbols: they lie outside the BMP, so each of them takes two UTF-16 code units.
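To make that concrete, here is a minimal console sketch (plain FPC, no LCL needed). It assumes the emoji U+1F606, which is outside the BMP, and builds its two UTF-16 code units by hand so the result does not depend on how the source file is saved. The program and variable names are illustrative only.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

var
  S: UnicodeString;
  CodePoint: LongInt;
begin
  // U+1F606 written as its two UTF-16 code units (a surrogate pair).
  SetLength(S, 2);
  S[1] := WideChar($D83D);  // high (lead) surrogate
  S[2] := WideChar($DE06);  // low (trail) surrogate

  // Length counts 16-bit code units, not codepoints.
  WriteLn('Length = ', Length(S));

  // S[1] alone is only half of the character; both halves are needed
  // to recover the codepoint.
  CodePoint := $10000 + ((Ord(S[1]) - $D800) shl 10) + (Ord(S[2]) - $DC00);
  WriteLn('U+', HexStr(CodePoint, 5));
end.
```

This should report a length of 2 and the codepoint U+1F606. Code that takes only element [1] of a UnicodeString gets just the high surrogate for such input.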

Thaddy

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #34 on: August 14, 2017, 07:31:15 am »
Quote
UTF16 is like shooting yourself in the foot by mistake.  :P
Indeed. There are rare cases; I mentioned that.

majid.ebru

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #35 on: October 22, 2017, 07:46:08 am »
Hi,

Sorry, I have a problem again.

I want to split a string into characters, but I cannot find out whether the length of a character is 1 or 2.

When I use:
Code: Pascal
ShowMessage(IntToStr(Length("E")));   // Length = 1

but when I use:
Code: Pascal
ShowMessage(IntToStr(Length("س")));   // Length = 2

How can I find out which type of character (ASCII or UTF or ...) was entered in the edit box?

Or:

How can I find out whether the length of a character is 1 or 2?

Thank you

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4458
  • I like bugs.
Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #36 on: October 22, 2017, 11:04:49 am »
Code: Pascal
ShowMessage(IntToStr(Length("E")));   // Length = 1
ShowMessage(IntToStr(Length("س")));   // Length = 2
I don't think that code compiles.

Quote
How can I find out which type of character (ASCII or UTF or ...) was entered in the edit box?
In LCL's EditBoxes it is always UTF-8.

Quote
How can I find out whether the length of a character is 1 or 2?
or 3 or 4? ... Short answer: UTF8Length().
For more info see:
 http://wiki.freepascal.org/UTF8_strings_and_characters
The same examples could be written in an encoding agnostic way using the unit LazUnicode in LazUtils. It also defines iterators for both codepoints and Unicode "characters".
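A rough sketch of the difference between byte length and codepoint count, written as a plain FPC console program. `CountCodepoints` is a hypothetical helper made up for illustration; in a Lazarus project the routine to call is UTF8Length() from the LazUTF8 unit. The Arabic letter is given as raw UTF-8 bytes so the example does not depend on the encoding of the source file, and the helper assumes its input is valid UTF-8.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: counts codepoints in a UTF-8 string by skipping
// continuation bytes (bytes of the form 10xxxxxx).
// Assumes S holds valid UTF-8.
function CountCodepoints(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Latin, Arabic: AnsiString;
begin
  Latin := 'E';
  Arabic := #$D8#$B3;  // UTF-8 bytes of U+0633 (Arabic letter seen)

  // Length() counts bytes; CountCodepoints() counts codepoints.
  WriteLn('E:      bytes = ', Length(Latin), ', codepoints = ', CountCodepoints(Latin));
  WriteLn('U+0633: bytes = ', Length(Arabic), ', codepoints = ', CountCodepoints(Arabic));
end.
```

For 'E' both numbers are 1; for the Arabic letter the byte length is 2 while the codepoint count is 1, which is the difference seen with Length() in the question.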

Some people promote treating UTF-16 as a fixed-width encoding, which is just plain wrong. Already about half of all codepoints are outside the BMP, and the number grows whenever Unicode is extended.
Programmers usually want to squash even small bugs in their code. Ignoring half of the Unicode codepoints, however, is a major bug.
Fortunately variable width codepoints are easy to get right, regardless of encoding.
« Last Edit: October 22, 2017, 11:16:47 am by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

engkin

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #37 on: October 22, 2017, 06:46:53 pm »
Quote
Some people promote treating UTF-16 as a fixed-width encoding, which is just plain wrong.

😆

JuhaManninen

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #38 on: October 22, 2017, 07:15:03 pm »
Emoji outside BMP
How did you manage to enter the emoji on this forum?
If I do it, or just quote your post without changing anything, I get on a pink background:

The following error or errors occurred while posting this message:
The message body was left empty.

engkin

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #39 on: October 22, 2017, 07:37:04 pm »
Quote
😆
How did you manage to enter the emoji on this forum?
If I do it, or just quote your post without changing anything, I get on a pink background:

The following error or errors occurred while posting this message:
The message body was left empty.


😄
Use the "Quick Reply" at the bottom of the page.
« Last Edit: October 22, 2017, 07:41:14 pm by engkin »

JuhaManninen

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #40 on: October 22, 2017, 11:13:58 pm »
Quote
Use the "Quick Reply" at the bottom of the page.
I don't have "Quick Reply" anywhere.
Did you just copy the emoji and it worked? Could my OS (Linux) affect it? I don't think so; it is the server that gives the error.

engkin

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #41 on: October 23, 2017, 05:16:50 am »
Quote
Use the "Quick Reply" at the bottom of the page.
I don't have "Quick Reply" anywhere.
Did you just copy the emoji and it worked? Could my OS (Linux) affect it? I don't think so; it is the server that gives the error.

Check the attached image for "Quick Reply" location.

Yes I just copied the emoji and it worked. I noticed that code points outside BMP work in the Quick Reply only. Not sure why.

JuhaManninen

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #42 on: October 23, 2017, 11:10:09 am »
Quote
Check the attached image for "Quick Reply" location.
Interesting! See my screenshot.
Earlier I could not see attachments from others. Martin fixed it by giving me more rights. Maybe he can fix this one, too.
@Martin, ping...

Quote
Yes I just copied the emoji and it worked. I noticed that code points outside BMP work in the Quick Reply only. Not sure why.
Uhhh! Anyway, this is a cautionary example of what can happen when programmers treat UTF-16 as a fixed-width encoding. This SMF system is widely used and is supposed to support Unicode. Unfortunately it does not, except maybe with the "Quick Reply" button.

majid.ebru

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #43 on: October 23, 2017, 02:21:03 pm »
Hi,

Please STOP!

Quote
I don't think that code compiles.

Sorry, you are right. This is the correct version:
Code: Pascal
ShowMessage(IntToStr(Length('س')));

Quote
or 3 or 4? ... Short answer: UTF8Length().

I used your code and it works correctly, but I can't find out the type of the text in the edit box.

How can I use "codepoints"? Can you help or guide me?

Thank you

JuhaManninen

Re: [SOLVED] convert all char(Unicode) to integer(or HEX) and inverse??
« Reply #44 on: October 23, 2017, 04:02:15 pm »
Quote
Please STOP!
Stop what?

Quote
I used your code and it works correctly, but I can't find out the type of the text in the edit box.
Type of text? If you mean its encoding, it is UTF-8.

Quote
How can I use "codepoints"? Can you help or guide me?
Codepoints are an essential part of Unicode. You must understand Unicode to some degree before you can use it. There is plenty of information on the internet. For starters:
 https://en.wikipedia.org/wiki/Unicode

In short: in Unicode the U+xxxx number is an "ID" for a codepoint.
Codepoints are then encoded using e.g. UTF-8, UTF-16 or UTF-32.
The true complexity of Unicode goes beyond codepoints. One "character" can be composed of multiple codepoints using complex (sometimes locale specific) rules.
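As an illustration of "one codepoint, several encodings", here is a small sketch in plain FPC that encodes a codepoint by hand into UTF-8 bytes and UTF-16 code units. `Utf8Hex` and `Utf16Hex` are hypothetical helpers written for this example, not library routines, and they assume the input is a valid codepoint in the range U+0000..U+10FFFF that is not itself a surrogate.

```pascal
program EncodeCodepoint;
{$mode objfpc}{$H+}

// Hypothetical helper: UTF-8 bytes of one codepoint, shown as hex text.
function Utf8Hex(CP: LongInt): String;
begin
  if CP < $80 then                       // 1 byte:  0xxxxxxx
    Result := HexStr(CP, 2)
  else if CP < $800 then                 // 2 bytes: 110xxxxx 10xxxxxx
    Result := HexStr($C0 or (CP shr 6), 2) + ' ' +
              HexStr($80 or (CP and $3F), 2)
  else if CP < $10000 then               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    Result := HexStr($E0 or (CP shr 12), 2) + ' ' +
              HexStr($80 or ((CP shr 6) and $3F), 2) + ' ' +
              HexStr($80 or (CP and $3F), 2)
  else                                   // 4 bytes: 11110xxx followed by 3 x 10xxxxxx
    Result := HexStr($F0 or (CP shr 18), 2) + ' ' +
              HexStr($80 or ((CP shr 12) and $3F), 2) + ' ' +
              HexStr($80 or ((CP shr 6) and $3F), 2) + ' ' +
              HexStr($80 or (CP and $3F), 2);
end;

// Hypothetical helper: UTF-16 code units of one codepoint, shown as hex text.
function Utf16Hex(CP: LongInt): String;
begin
  if CP < $10000 then
    Result := HexStr(CP, 4)              // inside the BMP: one code unit
  else
  begin
    CP := CP - $10000;                   // outside the BMP: surrogate pair
    Result := HexStr($D800 or (CP shr 10), 4) + ' ' +
              HexStr($DC00 or (CP and $3FF), 4);
  end;
end;

begin
  WriteLn('U+0633:  UTF-8 = ', Utf8Hex($0633), '   UTF-16 = ', Utf16Hex($0633));
  WriteLn('U+1F606: UTF-8 = ', Utf8Hex($1F606), '   UTF-16 = ', Utf16Hex($1F606));
end.
```

U+0633 comes out as two UTF-8 bytes (D8 B3) and one UTF-16 unit (0633), while U+1F606 needs four UTF-8 bytes (F0 9F 98 86) and two UTF-16 units (D83D DE06). Neither encoding is fixed width.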

I think the minimal requirement for any code that supports Unicode is to always get at least codepoints right. Not all code out there meets that requirement.
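For the earlier question about splitting a string into characters, a minimal sketch of walking a UTF-8 string one codepoint at a time could look like this. It is plain FPC with no LazUtils dependency; `SeqLen` is a hypothetical helper, the sample string is given as raw bytes, and the loop assumes valid UTF-8. It splits by codepoint only, so a "character" composed of several codepoints would still be split apart.

```pascal
program SplitCodepoints;
{$mode objfpc}{$H+}

// Hypothetical helper: number of bytes in the UTF-8 sequence that
// starts with lead byte B.
function SeqLen(B: Byte): Integer;
begin
  if B < $80 then
    Result := 1                          // ASCII
  else if (B and $E0) = $C0 then
    Result := 2
  else if (B and $F0) = $E0 then
    Result := 3
  else if (B and $F8) = $F0 then
    Result := 4
  else
    Result := 1;                         // invalid lead byte: step over it
end;

var
  S: AnsiString;
  I, N: Integer;
begin
  // 'E', then U+0633, then U+1F606, all as UTF-8 bytes.
  S := 'E'#$D8#$B3#$F0#$9F#$98#$86;
  I := 1;
  while I <= Length(S) do
  begin
    N := SeqLen(Ord(S[I]));
    // Copy(S, I, N) would be the UTF-8 text of this one codepoint.
    WriteLn('codepoint at byte ', I, ' uses ', N, ' byte(s)');
    Inc(I, N);
  end;
end.
```

The three codepoints start at bytes 1, 2 and 4 and use 1, 2 and 4 bytes respectively. In a Lazarus project the LazUTF8 and LazUnicode units mentioned above provide ready-made routines for the same job.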
« Last Edit: October 23, 2017, 04:21:30 pm by JuhaManninen »

 
