Author Topic: How ANSI character map should behave on multibyte code pages  (Read 41499 times)

Sam Bliss

  • New Member
  • *
  • Posts: 23
  • Brilliant
I am using Lazarus 0.9.27 r20911 FPC 2.2.5 i386-win32-win32/win64
My system language is Chinese.
When I open the ANSI character map and move the mouse over #128..#255,
the following dialog appears:
Quote
                  Lazarus
============================================
Access violation.

Press OK to ignore and risk data corruption.
Press Cancel to kill the program.
============================================
                 OK Cancel
For screenshots and more information, see here.
« Last Edit: July 30, 2009, 11:00:36 am by Sam Bliss »
The stars are brilliant tonight!

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: How ANSI character map should behave on multibyte code pages
« Reply #1 on: July 30, 2009, 10:31:47 am »
As theo already noted, it may be enough to fix Lazarus so that no access violation is raised.
The upper half of the ANSI table (128..255) should then be left blank,

because there are no characters 128..255 in the Windows Chinese code page.

theo

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1927
Re: How ANSI character map should behave on multibyte code pages
« Reply #2 on: July 30, 2009, 11:04:40 am »
Yes, simply apply patch #1 against the IDE crash.
The ANSI tab is sort of obsolete for UTF-8 systems anyway.
Look up your code points in the Unicode tab, in your case "Latin-1 Supplement".

Sam Bliss

Re: How ANSI character map should behave on multibyte code pages
« Reply #3 on: July 30, 2009, 11:07:56 am »
Re: skalogryz
Yes, I made a patch,
but it is not perfect.
See http://bugs.freepascal.org/view.php?id=14189.
« Last Edit: August 04, 2009, 09:57:22 am by Sam Bliss »

Sam Bliss

Re: How ANSI character map should behave on multibyte code pages
« Reply #4 on: July 30, 2009, 11:13:17 am »
Re: theo
Yes, but sometimes we need to look up a character in the ANSI tab;
it's fast and easy.

Sam Bliss

Re: How ANSI character map should behave on multibyte code pages
« Reply #5 on: July 30, 2009, 11:19:03 am »
Re: theo
You replied to that issue;
have you seen note 0029399 there?
It means patch #5 is broken.

theo

Re: How ANSI character map should behave on multibyte code pages
« Reply #6 on: July 30, 2009, 11:26:58 am »
Quote from: Sam Bliss
It means the #5 patch is broken.

I meant: apply charactermapdlg.pas.diff.fixcrash.zip and leave the rest.

Sam Bliss

Re: How ANSI character map should behave on multibyte code pages
« Reply #7 on: July 31, 2009, 11:15:57 am »
Re: theo
Quote from: charactermapdlg.pas.diff.fixcrash.zip by theo
Code: Pascal
  CharOrd := Ord(UTF8ToAnsi(StringGrid1.Cells[Col, Row])[1]);
Characters #128..#255 need 2 bytes in UTF-8.
We need to convert UTF-8 to Unicode and then to ANSI,
or compute the code directly as Col*16+Row-17.
Please test it yourself; is it OK?
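To make the two points concrete, here is a minimal, self-contained Free Pascal sketch; it is not the patch from the bug report. `UTF8ByteCount` and `FirstByteOrd` are hypothetical helper names. The first shows that any code point in U+0080..U+00FF takes two bytes in UTF-8. The second guards the `[1]` index, on the assumption that the crash comes from the conversion returning an empty string on a multibyte code page.

```pascal
program Utf8Sketch;
{$mode objfpc}{$H+}

// Hypothetical helper: number of bytes UTF-8 needs for one code point.
function UTF8ByteCount(CodePoint: Cardinal): Integer;
begin
  if CodePoint < $80 then
    Result := 1
  else if CodePoint < $800 then
    Result := 2
  else if CodePoint < $10000 then
    Result := 3
  else
    Result := 4;
end;

// Hypothetical helper: ordinal of the first byte of S, or -1 if S is empty.
// Assumption: the conversion may yield '' for bytes that are not complete
// characters in the active code page, so S[1] must not be read blindly.
function FirstByteOrd(const S: AnsiString): Integer;
begin
  if S = '' then
    Result := -1
  else
    Result := Ord(S[1]);
end;

begin
  WriteLn(UTF8ByteCount($41));  // ASCII 'A': one byte
  WriteLn(UTF8ByteCount($E9));  // U+00E9: two bytes
  WriteLn(FirstByteOrd('A'));   // first byte of a non-empty string
  WriteLn(FirstByteOrd(''));    // empty string: -1 instead of a crash
end.
```

In the dialog, a result of -1 could simply mean "show nothing in the status line for this cell".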
« Last Edit: July 31, 2009, 11:19:05 am by Sam Bliss »

theo

Re: How ANSI character map should behave on multibyte code pages
« Reply #8 on: July 31, 2009, 11:59:07 am »
Quote from: Sam Bliss
Characters #128..#255 need 2 bytes in UTF-8.

All I can say is that the ANSI tab works as expected on my Windows box (ANSI, German). It shows "?" on Linux for chars > 127, however. That's why we have the Unicode tab.
I don't have the right to apply patches to the LCL/IDE anyway, so let some core developer (Paul Ishenin?) decide what to do.
« Last Edit: July 31, 2009, 12:02:45 pm by theo »

skalogryz

Re: How ANSI character map should behave on multibyte code pages
« Reply #9 on: July 31, 2009, 12:20:23 pm »
Quote from: theo
All I can say is that the ANSI tab works as expected on my Windows box (ANSI, German). It shows "?" on Linux for chars > 127, however. That's why we have the Unicode tab.
I don't have the right to apply patches to the LCL/IDE anyway, so let some core developer (Paul Ishenin?) decide what to do.

What's the German Windows code page? Shouldn't it be ANSI, so that special German characters are shown?

theo

Re: How ANSI character map should behave on multibyte code pages
« Reply #10 on: July 31, 2009, 12:25:34 pm »
Quote from: skalogryz
Shouldn't it be ANSI, so that special German characters are shown?

It does on Windows, but not on Linux (UTF-8), of course.

Sam Bliss

Re: How ANSI character map should behave on multibyte code pages
« Reply #11 on: August 01, 2009, 02:36:31 pm »
German systems use these charsets:
Quote
CP437 (United States)
CP1141 (IBM EBCDIC - Germany(20273 + Euro))
CP1250 (Eastern European)
CP20273 (IBM EBCDIC Germany)
CP20106 (IA5 Germany)
CP20127 (US-ASCII)
CP28592 (ISO 8859-2 - Eastern European)
CP65000 (UTF-7)*
CP65001 (UTF-8)*
...
« Last Edit: August 01, 2009, 03:17:47 pm by Sam Bliss »

Vincent Snijders

  • Administrator
  • Hero Member
  • *
  • Posts: 2661
    • My Lazarus wiki user page
Re: How ANSI character map should behave on multibyte code pages
« Reply #12 on: August 02, 2009, 01:11:11 pm »
IMHO, the ANSI character map is only valid for single-byte code pages, or rather for characters that can be represented with one byte. I don't know multibyte code pages well enough to say what exactly should happen, but if, say, #130 is a valid (complete) character in your multibyte code page, it should be shown; if it is not a valid character but just a partial character, a ? (or maybe an empty cell) should be shown.
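A minimal Free Pascal sketch of that rule, assuming code page 936 (Simplified Chinese, GBK), where bytes $81..$FE are taken to be lead bytes of two-byte characters. The range and the helper names `IsCP936LeadByte` and `CellKind` are assumptions for illustration and are not taken from the Lazarus source. On Windows, the Win32 function IsDBCSLeadByteEx could presumably be asked instead of hard-coding a range.

```pascal
program LeadByteSketch;
{$mode objfpc}{$H+}

// Assumption: in code page 936, bytes $81..$FE are lead bytes,
// i.e. only the first half of a two-byte character.
function IsCP936LeadByte(B: Byte): Boolean;
begin
  Result := (B >= $81) and (B <= $FE);
end;

// Hypothetical helper: decides what an ANSI grid cell should do for byte B.
function CellKind(B: Byte): string;
begin
  if B < $80 then
    Result := 'show'         // plain ASCII, always a complete character
  else if IsCP936LeadByte(B) then
    Result := 'placeholder'  // partial character: show ? or leave empty
  else
    Result := 'convert';     // let the code page conversion decide
end;

var
  B: Byte;
begin
  B := 65;
  WriteLn(B, ' -> ', CellKind(B));
  B := 130;
  WriteLn(B, ' -> ', CellKind(B));
end.
```

Under this assumption #130 falls into the placeholder case, so the cell stays blank or shows a ? rather than being converted.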

skalogryz

Re: How ANSI character map should behave on multibyte code pages
« Reply #13 on: August 03, 2009, 06:21:09 pm »
Sam Bliss, the patch is applied.
You should use the Unicode tab to paste #128..#255 characters.
Half of the ANSI table will be empty.

Sam Bliss

Re: How ANSI character map should behave on multibyte code pages
« Reply #14 on: August 04, 2009, 09:23:10 am »
I hope the ANSI character map will not be removed in the future.
Maybe the Unicode character map is more important,
but we also need the ANSI character map.
It would be better to put the Unicode character map first and the ANSI one second.

 
