
Topic: Unicode Support

Astral

Re: Unicode Support
« Reply #15 on: June 23, 2009, 11:51:06 pm »
Whoops:

I'm comparing the first WideChar of a WideString with another WideChar in a table, and it just hangs. The values of the array indices appear to be correct. I'm not indexing outside the bounds of the WideString or of the record table.

Sometimes when it appears to be hung on one line, I can set a breakpoint beyond the line and click Run and it will proceed to the breakpoint.

I did that several times in a row, and then that stopped working and it just froze, with no way to proceed: the Run, Step Into, and Step Over buttons were grayed out. Variables could still be examined by hovering the cursor over them, though.

I also discovered that if I keep hitting Step Over, it eventually steps down to the next line after 23 presses. Apparently this has to do with some sort of optimization, since the inner loop is supposed to execute for J in the range 0..22, i.e. 23 times. Normally, when you single-step a source line, it executes the line and then goes to the next source line.

Curiouser and curiouser.

I will keep single stepping to see if it eventually gets out of the two nested loops.

Some programmers don't believe that "la la land" really exists.

This is very odd behavior and I can't see any obvious reason for it.  I should check the compiler options.


Astral

Re: Unicode Support
« Reply #16 on: June 23, 2009, 11:52:47 pm »

Thanks, Theo!

Astral

Re: Unicode Support
« Reply #17 on: June 24, 2009, 12:01:41 am »

Of course it's "my problem", but the code in question has been running for years on various versions of Delphi, including Delphi 3, Delphi 7, Delphi 2007, and Delphi 2009, so my inclination is to question what is different between the Delphi and Lazarus environments. If the code works fine in one place and not in the other, then perhaps something different is causing the problem? It's very simple, straightforward code.


theo

Re: Unicode Support
« Reply #18 on: June 24, 2009, 12:02:47 am »
Quote from: Astral
It's very simple, straightforward code.

Then boil the code down to a minimum to show the problem and post it here.
« Last Edit: June 24, 2009, 12:04:59 am by theo »

Astral

Re: Unicode Support
« Reply #19 on: June 24, 2009, 01:01:13 am »

type
  CodeRec =
  record
    Normal  : WideChar;   // the normal unaccented form of a character
    Lower   : WideChar;   // the Unicode for the lower case version with diacritic
    Upper   : WideChar;   // the Unicode for the upper case version with diacritic
  end;

const
  NumRomanceCodes  = 23;

  RomanceCodes : array[0..NumRomanceCodes-1] of CodeRec = (
    (Normal : 'a'; Lower : #$00E0; Upper : #$00C0),
    (Normal : 'a'; Lower : #$00E1; Upper : #$00C1),
    (Normal : 'a'; Lower : #$00E2; Upper : #$00C2),
    (Normal : 'a'; Lower : #$00E4; Upper : #$00C4),
    (Normal : 'a'; Lower : #$00E5; Upper : #$00C5),
    (Normal : 'c'; Lower : #$00E7; Upper : #$00C7),
    (Normal : 'e'; Lower : #$00E8; Upper : #$00C8),
    (Normal : 'e'; Lower : #$00E9; Upper : #$00C9),
    (Normal : 'e'; Lower : #$00EA; Upper : #$00CA),
    (Normal : 'e'; Lower : #$00EB; Upper : #$00CB),
    (Normal : 'i'; Lower : #$00EC; Upper : #$00CC),
    (Normal : 'i'; Lower : #$00ED; Upper : #$00CD),
    (Normal : 'i'; Lower : #$00EE; Upper : #$00CE),
    (Normal : 'i'; Lower : #$00EF; Upper : #$00CF),
    (Normal : 'n'; Lower : #$00F1; Upper : #$00D1),
    (Normal : 'o'; Lower : #$00F2; Upper : #$00D2),
    (Normal : 'o'; Lower : #$00F3; Upper : #$00D3),
    (Normal : 'o'; Lower : #$00F4; Upper : #$00D4),
    (Normal : 'o'; Lower : #$00F6; Upper : #$00D6),
    (Normal : 'u'; Lower : #$00F9; Upper : #$00D9),
    (Normal : 'u'; Lower : #$00FA; Upper : #$00DA),
    (Normal : 'u'; Lower : #$00FB; Upper : #$00DB),
    (Normal : 'u'; Lower : #$00FC; Upper : #$00DC)
  );

var
  I, J, Jlo, Jhi : Integer;
  Recognized : Boolean;
  // Not shown: InpWord, TestWord (a WideString), iLook, LookUp,
  // ForwardDictFile and iUnicode are declared elsewhere in the unit.


  for I := 1 to Length(InpWord) do begin
    TestWord := iUnicode.ToLower(InpWord);
    Recognized := False;
    Jlo := 0;                 // Low(RomanceCodes);
    Jhi := NumRomanceCodes-1; // High(RomanceCodes);
    for J := Jlo to Jhi do begin
      if TestWord[I] = RomanceCodes[J].Normal then begin
        TestWord[I] := RomanceCodes[J].Lower;    // try the accented form
        iLook := LookUp( ForwardDictFile, TestWord );
        if iLook <> nil then begin
          Recognized := True;
          break;
        end
        else begin
          TestWord[I] := RomanceCodes[J].Normal; // not found: restore the plain letter
        end;
      end;
    end;
    if Recognized then break;
  end;


This is the table of characters and the associated code.

It looks very straightforward to me, but maybe there's something I am missing.
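A minimal, self-contained Free Pascal sketch of the same accent-restoration search may help anyone trying to build a small reproducer. It is not the original unit: the four-row table, the two-word dictionary inside the hypothetical IsKnownWord function (standing in for LookUp and ForwardDictFile), and the hypothetical RestoreAccent function are assumptions, and the input is assumed to be lower case already, so there is no iUnicode.ToLower call. Results are printed as code points with Ord to avoid console code-page issues.

```pascal
program AccentSearch;
{$mode objfpc}{$H+}

// Sketch only. Assumptions (not from the original unit): a reduced
// four-row table, a two-word dictionary in IsKnownWord instead of
// LookUp/ForwardDictFile, and input that is already lower case.

type
  CodeRec = record
    Normal : WideChar;  // unaccented form
    Lower  : WideChar;  // lower-case form with diacritic
    Upper  : WideChar;  // upper-case form with diacritic
  end;

const
  NumCodes = 4;
  Codes : array[0..NumCodes-1] of CodeRec = (
    (Normal : 'e'; Lower : WideChar($00E8); Upper : WideChar($00C8)),
    (Normal : 'e'; Lower : WideChar($00E9); Upper : WideChar($00C9)),
    (Normal : 'n'; Lower : WideChar($00F1); Upper : WideChar($00D1)),
    (Normal : 'o'; Lower : WideChar($00F3); Upper : WideChar($00D3))
  );

// Hypothetical stand-in dictionary: knows only "café" and "niño".
function IsKnownWord(const W: WideString): Boolean;
var
  Cafe, Nino : WideString;
begin
  Cafe := 'cafe';
  Cafe[4] := WideChar($00E9);  // "café"
  Nino := 'nino';
  Nino[3] := WideChar($00F1);  // "niño"
  Result := (W = Cafe) or (W = Nino);
end;

// Tries one accented substitution at a time, left to right, and returns
// the first variant found in the dictionary, or '' if none matches.
function RestoreAccent(const InpWord: WideString): WideString;
var
  I, J : Integer;
  TestWord : WideString;
begin
  Result := '';
  TestWord := InpWord;
  for I := 1 to Length(TestWord) do
    for J := Low(Codes) to High(Codes) do
      if TestWord[I] = Codes[J].Normal then
      begin
        TestWord[I] := Codes[J].Lower;    // try the accented form
        if IsKnownWord(TestWord) then
          Exit(TestWord);                 // first dictionary hit wins
        TestWord[I] := Codes[J].Normal;   // not found: restore the plain letter
      end;
end;

var
  S : WideString;
begin
  S := RestoreAccent('cafe');
  WriteLn(Ord(S[4]));                  // 233 (U+00E9)
  S := RestoreAccent('nino');
  WriteLn(Ord(S[3]));                  // 241 (U+00F1)
  WriteLn(RestoreAccent('xyz') = '');  // TRUE
end.
```

Because every failed substitution restores the plain letter before the next table row is tried, TestWord is unchanged whenever the inner loop finishes without a hit. That is why the sketch copies the input into TestWord only once, before the outer loop; in the original code, re-running iUnicode.ToLower(InpWord) on every pass of I appears harmless but redundant.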

Astral

Re: Unicode Support
« Reply #20 on: June 24, 2009, 01:13:58 am »

Voilà!

I changed the optimization level to O3 and did a Build All.


The program now runs to completion without any hangups.

I didn't change a single line of code, just changed the optimization level.

I was surprised.

But it indicates that there is some kind of problem.

Changing from O1 to O3 without changing any code should not fix the problem, but it did.

I will try it again with O2 and see what happens.

 O:-)


Astral

Re: Unicode Support
« Reply #21 on: June 24, 2009, 01:31:46 am »

This is a red letter day in history!

The universal translator is online using Lazarus/FPC.   :D


Astral

Re: Unicode Support
« Reply #22 on: June 24, 2009, 02:35:25 am »

Quote from: Astral
I will try it again with O2 and see what happens.

It works fine with O2 as well.

It just won't run correctly at the O1 level.

Vincent Snijders

Re: Unicode Support
« Reply #23 on: June 24, 2009, 09:14:50 am »
Can you extract that code into a small program that can be compiled and run? Then we can test whether this bug is still present in the development version of FPC. If so, a bug report should be filed.

Astral

Re: Unicode Support
« Reply #24 on: June 24, 2009, 10:34:53 am »

I can try, but I cannot guarantee that it will fail in the same way in isolation. I may have to give you the entire unit, but hopefully not.

I will try to set up a simple test case that reproduces the problem, or report back if that is not possible.

Thanks!

Astral

Re: Unicode Support
« Reply #25 on: June 27, 2009, 08:01:30 am »
No luck so far trying to reproduce the problem. It seems to have vanished.
I'm sure if it's a real problem it will return to haunt me!

Thanks for all the help!  My program is running pretty well now on Lazarus/FPC.

I'm using the Character unit in UTF8Tools.  One thing I noticed about it is that it has ToLower and ToUpper, but no IsUpper or IsLower functions.  Anyone else notice that?

theo

Re: Unicode Support
« Reply #26 on: June 27, 2009, 11:00:10 am »
Quote from: Astral
I'm using the Character unit in UTF8Tools.  One thing I noticed about it is that it has ToLower and ToUpper, but no IsUpper or IsLower functions.  Anyone else notice that?

I don't know if anybody else is using it.
I wrote it and forgot to add functions for these.
But the functionality is already there, like:

if TCharacter.GetUnicodeCategory('A') in ucUpper then ...
if TCharacter.GetUnicodeCategory('A') in ucLower then ...
if TCharacter.GetUnicodeCategory('A') in ucTitle then ...

I'll add the IsUpper/IsLower functions later.

EDIT: Fixed: http://www.theo.ch/lazarus/utf8tools.zip
« Last Edit: June 27, 2009, 11:56:42 am by theo »

Astral

Re: Unicode Support
« Reply #27 on: June 27, 2009, 01:00:56 pm »
Thanks for the quick fix.

I will give it a quick try.