Author Topic: How ANSI character map should behave on multibyte code pages  (Read 36505 times)

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
Re: How ANSI character map should behave on multibyte code pages
« Reply #30 on: August 04, 2009, 10:54:46 am »
cp936 (GB18030 Chinese Simplified)
The stars are brilliant tonight!

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
Re: How ANSI character map should behave on multibyte code pages
« Reply #31 on: August 04, 2009, 10:56:36 am »
We must calculate these characters from UCS-2 and convert them to UTF-8.
Converting ANSI to UTF-8 is not a good approach, because the result differs depending on the code page.
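To illustrate the code page dependence, here is a minimal sketch in Python (not Pascal, and not the character map's actual code). It decodes one ANSI byte under several code pages and re-encodes the result as UTF-8. The function name `utf8_for_byte` is made up for this example.

```python
def utf8_for_byte(value, codepage):
    """Decode a single ANSI byte with the given code page and return its UTF-8 bytes."""
    text = bytes([value]).decode(codepage)
    return text.encode("utf-8")


if __name__ == "__main__":
    # The same byte, 0xE9, interpreted under three single-byte code pages.
    for codepage in ("cp1252", "cp1251", "cp437"):
        utf8 = utf8_for_byte(0xE9, codepage)
        print(codepage, utf8.hex(), utf8.decode("utf-8"))
```

The byte is identical in all three calls; only the assumed code page changes, and with it the resulting UTF-8 sequence.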
The stars are brilliant tonight!

theo

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1890
Re: How ANSI character map should behave on multibyte code pages
« Reply #32 on: August 04, 2009, 10:57:17 am »
Quote
cp936 (GB18030 Chinese Simplified)

Looks correct: http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

There's nothing usable from 0x81 to 0xff
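A rough way to check this yourself, sketched in Python: try to decode every byte value on its own and see which ones succeed. Two assumptions to keep in mind: `decodable_single_bytes` is a hypothetical helper, and Python's `cp936` codec is an alias for its GBK codec, so it may not match the Microsoft table byte for byte (0x80 in particular).

```python
def decodable_single_bytes(codepage):
    """Return the byte values that decode to a character on their own."""
    usable = []
    for value in range(256):
        try:
            bytes([value]).decode(codepage)
        except UnicodeDecodeError:
            # Lead bytes of two-byte sequences and undefined bytes end up here.
            continue
        usable.append(value)
    return usable


if __name__ == "__main__":
    usable = decodable_single_bytes("cp936")
    print("highest single-byte character: 0x%02X" % max(usable))
    # Bytes from 0x81 upward only make sense as part of a two-byte sequence:
    print(b"\xd6\xd0".decode("cp936"))
```

In a single-byte grid like the ANSI tab, every cell from 0x81 upward therefore has nothing to show for this code page.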

Vincent Snijders

  • Administrator
  • Hero Member
  • *
  • Posts: 2661
    • My Lazarus wiki user page
Re: How ANSI character map should behave on multibyte code pages
« Reply #33 on: August 04, 2009, 11:00:58 am »
Quote
We must calculate these characters from UCS-2 and convert them to UTF-8.
That is what the unicode page does.
Quote
Converting ANSI to UTF-8 is not a good approach, because the result differs depending on the code page.
That is what the ANSI tab does, which may not be what you need. In that case, don't use it.
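For reference, the "start from UCS-2, then convert to UTF-8" route is independent of any code page. A minimal sketch in Python follows; it is not the Lazarus implementation, and `ucs2_to_utf8` is a hypothetical name.

```python
def ucs2_to_utf8(code_unit):
    """Encode one UCS-2 code unit (a BMP code point) as UTF-8 bytes."""
    if not 0 <= code_unit <= 0xFFFF:
        raise ValueError("UCS-2 only covers U+0000..U+FFFF")
    if 0xD800 <= code_unit <= 0xDFFF:
        raise ValueError("surrogates are not characters")
    if code_unit < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_unit])
    if code_unit < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_unit >> 6), 0x80 | (code_unit & 0x3F)])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code_unit >> 12),
        0x80 | ((code_unit >> 6) & 0x3F),
        0x80 | (code_unit & 0x3F),
    ])


if __name__ == "__main__":
    for code_unit in (0x41, 0xE9, 0x4E2D):
        print("U+%04X -> %s" % (code_unit, ucs2_to_utf8(code_unit).hex()))
```

The result depends only on the code point, so it is the same on every system, which is the behaviour asked for above.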

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
Re: How ANSI character map should behave on multibyte code pages
« Reply #34 on: August 07, 2009, 11:02:20 am »
And why didn't the authors of Lazarus add #0..#31?
Because we don't need them?
The stars are brilliant tonight!

Vincent Snijders

  • Administrator
  • Hero Member
  • *
  • Posts: 2661
    • My Lazarus wiki user page
Re: How ANSI character map should behave on multibyte code pages
« Reply #35 on: August 07, 2009, 11:11:29 am »
I don't know, but I can think about some reasons:
* they are usually hard to show (how do you show #7, the bell character?)
* they have special meaning, for example #10 (line feed)

theo

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1890
Re: How ANSI character map should behave on multibyte code pages
« Reply #36 on: August 07, 2009, 11:13:39 am »
Quote
And why didn't the authors of Lazarus add #0..#31?
Because we don't need them?

These are control characters. Not printable:
http://en.wikipedia.org/wiki/ASCII#ASCII_control_characters
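A small Python sketch of the same point: the Unicode database classifies #0..#31 as control characters (category `Cc`). As one possible answer to "how do you show #7", the Unicode "Control Pictures" block (U+2400 onward) provides visible stand-ins. This is only an illustration, not something the Lazarus character map is known to do, and the helper names are hypothetical.

```python
import unicodedata


def is_control(code):
    """True if the code point is in Unicode general category Cc (control)."""
    return unicodedata.category(chr(code)) == "Cc"


def control_picture(code):
    """Return the visible 'Control Pictures' symbol for a C0 control code (0..31)."""
    if not 0 <= code <= 31:
        raise ValueError("only #0..#31 have a symbol at U+2400 + code")
    return chr(0x2400 + code)


if __name__ == "__main__":
    for code in (7, 10, 32, 65):
        print("#%d control: %s" % (code, is_control(code)))
    # Name of the stand-in symbol for #7
    print(unicodedata.name(control_picture(7)))
```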

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
Re: How ANSI character map should behave on multibyte code pages
« Reply #37 on: August 13, 2009, 11:12:55 am »
I downloaded the newest version, Lazarus 0.9.27 r21212 with FPC 2.3.1 (i386-win32-win32/win64).
The character map is still broken.
The stars are brilliant tonight!

Vincent Snijders

  • Administrator
  • Hero Member
  • *
  • Posts: 2661
    • My Lazarus wiki user page
Re: How ANSI character map should behave on multibyte code pages
« Reply #38 on: August 13, 2009, 11:21:29 am »
In what way is it broken? Please give a screenshot and tell us what you expect. Then we can confirm it or explain why your expectations are wrong.

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
Re: How ANSI character map should behave on multibyte code pages
« Reply #39 on: August 14, 2009, 11:01:31 am »
Screen shot here.
The stars are brilliant tonight!

Vincent Snijders

  • Administrator
  • Hero Member
  • *
  • Posts: 2661
    • My Lazarus wiki user page
Re: How ANSI character map should behave on multibyte code pages
« Reply #40 on: August 14, 2009, 11:07:12 am »
And what do you expect? #147 is not a valid AnsiChar on its own in code page 936, so the display is correct.

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
Re: How ANSI character map should behave on multibyte code pages
« Reply #41 on: August 20, 2009, 10:20:17 am »
But we should show "Latin-1 Supplement" characters in #127..#255.
The stars are brilliant tonight!

Vincent Snijders

  • Administrator
  • Hero Member
  • *
  • Posts: 2661
    • My Lazarus wiki user page
Re: How ANSI character map should behave on multibyte code pages
« Reply #42 on: August 20, 2009, 10:32:34 am »
Why should it show Latin-1 Supplement characters for your code page on the ANSI tab?

Latin-1 Supplement characters are on the Unicode page.

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
Re: How ANSI character map should behave on multibyte code pages
« Reply #43 on: August 21, 2009, 03:55:37 am »
Oh, yes.
And I suggest adding a search bar to the character map dialog, because there are so many characters on the Unicode tab.
We should be able to find them quickly, right?
It could search by the character itself, its Unicode code point or UTF-8 bytes, and its description (ideally with i18n support).
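A rough sketch of what such a search could look like, in Python using the standard `unicodedata` module. Everything here is an assumption made for illustration: the function names, the `U+XXXX` query syntax, and the restriction to the Basic Multilingual Plane. The names it returns are the English ones from the Unicode database, so i18n is not covered.

```python
import unicodedata


def describe(char):
    """Return (code point label, character, Unicode name) for one character."""
    return ("U+%04X" % ord(char), char, unicodedata.name(char, "<unnamed>"))


def search(query, limit=10):
    """Look up characters by literal character, 'U+XXXX', or part of the name."""
    q = query.strip()
    if len(q) == 1:
        # The user typed or pasted the character itself.
        return [describe(q)]
    if q.upper().startswith("U+"):
        # Hexadecimal code point, e.g. U+20AC.
        return [describe(chr(int(q[2:], 16)))]
    # Otherwise treat the query as a case-insensitive substring of the name.
    needle = q.upper()
    hits = []
    for code in range(0x20, 0x10000):
        name = unicodedata.name(chr(code), "")
        if needle in name:
            hits.append(describe(chr(code)))
            if len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    for query in ("U+00E9", "euro sign"):
        print(query, "->", search(query))
```

Searching by UTF-8 bytes could be added the same way, by parsing the hex string into bytes and decoding them before calling `describe`.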
The stars are brilliant tonight!