
Author Topic: Character handling help  (Read 2970 times)

ertank

  • Sr. Member
  • ****
  • Posts: 274
Character handling help
« on: February 17, 2019, 11:37:09 pm »
Hello,

I am using Lazarus 2.0.1 directly on a Raspberry Pi. The application is for a vending machine, and its LCD display cannot handle national letters. I want to convert them to ASCII letters as below:
Code:
function TfrmVendingMain.GetLatinLetters(const Value: AnsiString): string;
var
  I: Integer;
begin
  SetLength(Result, Value.Length);

  for I := 1 to Result.Length do
  begin
    case Value[I] of
      'Ü', 'ü': Result[I] := 'U';  // below error is for that line
      'Ğ', 'ğ': Result[I] := 'G';
      'İ', 'ı': Result[I] := 'I';
      'Ş', 'ş': Result[I] := 'S';
      'Ç', 'ç': Result[I] := 'C';
      'Ö', 'ö': Result[I] := 'O';
      else Result[I] := UpperCase(Value[I]);
    end;
  end;
end;

Compiler says:
Code:
uvendingmain.pas(453,11) Error: Constant and CASE types do not match
uvendingmain.pas(453,11) Error: Ordinal expression expected

On the other hand, something like the code below works:
Code:
  Result := StringReplace(Value, 'Ü', 'U', [rfReplaceAll]);
  Result := StringReplace(Result, 'Ğ', 'G', [rfReplaceAll]);
  Result := StringReplace(Result, 'İ', 'I', [rfReplaceAll]);
  Result := StringReplace(Result, 'Ş', 'S', [rfReplaceAll]);
  Result := StringReplace(Result, 'Ö', 'O', [rfReplaceAll]);
  Result := StringReplace(Result, 'Ç', 'C', [rfReplaceAll]);
  Result := StringReplace(Result, 'ü', 'u', [rfReplaceAll]);
  Result := StringReplace(Result, 'ğ', 'g', [rfReplaceAll]);
  Result := StringReplace(Result, 'ı', 'i', [rfReplaceAll]);
  Result := StringReplace(Result, 'ş', 's', [rfReplaceAll]);
  Result := StringReplace(Result, 'ö', 'o', [rfReplaceAll]);
  Result := StringReplace(Result, 'ç', 'c', [rfReplaceAll]);

I would rather not use that latter code example. Is there a way to make my initial implementation work?

Thanks & regards,
Ertan

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: Character handling help
« Reply #1 on: February 17, 2019, 11:52:43 pm »
Value[I] is a Char (1 byte long), which is what the compiler expects. On the other hand, 'Ü' is UTF-8 encoded by default in Lazarus, so it is a string that is 2 or 3 bytes long.
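
To see the mismatch concretely, here is a minimal sketch (not taken from the original code; the program name is made up). It spells out the UTF-8 bytes of 'Ü' with character escapes, so the result does not depend on how the source file is saved, and it assumes no codepage directive is in effect:

```pascal
program ByteLengthDemo;

{$mode objfpc}{$H+}

var
  s: string;
begin
  { The UTF-8 encoding of U+00DC (capital U with diaeresis) is $C3 $9C }
  s := #$C3#$9C;

  { Length counts bytes for this string type, not letters }
  WriteLn('Bytes: ', Length(s));

  { Each s[I] is a single Char holding one byte of the sequence }
  WriteLn('First byte:  ', Ord(s[1]));
  WriteLn('Second byte: ', Ord(s[2]));
end.
```

A case statement over Value[I] sees these bytes one at a time, so a label such as 'Ü', which is a two-byte string constant, cannot be matched against a single Char.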

Bart

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Character handling help
« Reply #2 on: February 18, 2019, 12:08:11 am »
Value[I] is a char, but your accented letters are UTF-8 strings. Rather use something like this (untested!):
Code: Pascal
function TfrmVendingMain.GetLatinLetters(const Value: AnsiString): string;
var
  I: Integer;
  sv: UTF8String;
begin
  { Needs LazUTF8 in the uses clause. Value is const, so work on a copy. }
  Result := Value;

  I := 1;
  while I <= UTF8Length(Result) do begin
    sv := UTF8Copy(Result, I, 1);
    case sv of
      'Ü', 'ü': begin
          { You should probably move this out to its own function }
          UTF8Delete(Result, I, 1);
          UTF8Insert('U', Result, I);
        end;
      'Ğ', 'ğ': begin
          UTF8Delete(Result, I, 1);
          UTF8Insert('G', Result, I);
        end;
      {etc., the same for the rest}
    end;
    Inc(I);
  end;
  { UpperCase only affects the ASCII letters a..z }
  Result := UpperCase(Result);
end;

Alternatively, add a {$codepage cp1252} (or whatever your system's code page is) and convert your source so that your constants can be interpreted as chars, but note that this won't work if your input string is in UTF-8!
« Last Edit: February 18, 2019, 12:12:04 am by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Character handling help
« Reply #3 on: February 18, 2019, 12:34:37 am »
A graphical LCD should be used in this case instead of an ASCII type. That way the fonts can be drawn to the display, and special logos etc. could be used on top of that.
The only true wisdom is knowing you know nothing

ertank

  • Sr. Member
  • ****
  • Posts: 274
Re: Character handling help
« Reply #4 on: February 18, 2019, 12:49:40 am »
Quote from: jamie
A graphical LCD should be used in this case instead of an ASCII type. That way the fonts can be drawn to the display, and special logos etc. could be used on top of that.

Thanks. Unfortunately that is not up to me, and the device has already been selected.

ertank

  • Sr. Member
  • ****
  • Posts: 274
Re: Character handling help
« Reply #5 on: February 18, 2019, 01:22:38 am »
Quote from: lucamar
Value[I] is a char, but your accented letters are UTF-8 strings. Rather use something like this (untested!):

Alternatively, add a {$codepage cp1252} (or whatever your system's code page is) and convert your source so that your constants can be interpreted as chars, but note that this won't work if your input string is in UTF-8!

I tried the code below. It did not work:
Code:
function TfrmVendingMain.GetLatinLetters(const Value: string): string;
var
  I: Integer;
  sv: UTF8String;
begin
  SetLength(Result, Value.Length);

  for I := 1 to Result.Length do
  begin
    sv := UTF8Copy(Value, I, 1);
    case sv of
      'Ü', 'ü': Result[I] := 'U';
      'Ğ', 'ğ': Result[I] := 'G';
      'İ', 'ı': Result[I] := 'I';
      'Ş', 'ş': Result[I] := 'S';
      'Ç', 'ç': Result[I] := 'C';
      'Ö', 'ö': Result[I] := 'O';
      else Result[I] := Value[I];
    end;
  end;
  Result := Result.ToUpper();
end;

Incoming data is received from a web service, inserted directly into an sqlite3 database, and is apparently UTF-8.

I have tried some other code as well. None of it seems to work. I will try some more, and if I fail I will likely fall back to the unwanted StringReplace() approach from my initial post.

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Character handling help
« Reply #6 on: February 18, 2019, 02:10:32 am »
When you do, e.g., Result[I] := 'U', you're just replacing a single (probably unrelated) byte of your string. Use the full code I gave you.

You can't treat UTF-8 strings as you would single-byte character strings.
« Last Edit: February 18, 2019, 02:13:33 am by lucamar »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Character handling help
« Reply #7 on: February 18, 2019, 02:42:07 am »
Try this:
Code: Pascal
program Project1;

{$mode objfpc}{$H+}

uses
  character, unicodedata;

function RemoveMarks(AString: string): string;
var
  uIn, uOut: UnicodeString;
  wc: WideChar;
begin
  { Convert to UTF16 and decompose, e.g. 'Ü' becomes 'U' plus a combining diaeresis }
  uIn := UnicodeString(AString);
  uIn := NormalizeNFD(uIn);

  { Remove marks, keeping only the base characters }
  uOut := '';
  for wc in uIn do
  begin
    if not (TCharacter.GetUnicodeCategory(wc) in
      [TUnicodeCategory.ucCombiningMark, TUnicodeCategory.ucNonSpacingMark]) then
      uOut := uOut + wc;
  end;

  { Convert back }
  Result := String(uOut);
end;

var
  s: string;
begin
  { Treat AnsiString data as UTF-8 (code page 65001) when converting }
  SetMultiByteConversionCodePage(65001);
  s := 'ÜĞğ';
  s := RemoveMarks(s);
  WriteLn(s);
  ReadLn;
end.

While not perfect, it should work for your text.

Edit:
Corrected the uses section typo, and removed unused units.
« Last Edit: February 18, 2019, 02:55:24 am by engkin »

ertank

  • Sr. Member
  • ****
  • Posts: 274
Re: Character handling help
« Reply #8 on: February 18, 2019, 10:19:05 am »
Quote from: engkin
Try this:

While not perfect, it should work for your text.

Works OK. There is one minor issue: the small dotless i (ı) is not converted. The following input
Code:
ÜĞİŞÇÖüğişçöı
produces this output:
Code:
UGISCOugiscoı

The last letter should be converted to "i", but it stays as it is with the current code. I did not quite understand the code, so I am not able to try any fixes myself.

Other than this it is working nicely.

Thanks.

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: Character handling help
« Reply #9 on: February 18, 2019, 01:28:02 pm »
The maskedit unit has two functions, GetCodePoint() and SetCodePoint(), which let you retrieve and set individual codepoints.
Alternatively, you can use the methods from the LazUnicode unit, which allows a "for .. in" loop over codepoints.
And even using StringReplace is OK.
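
If the repeated StringReplace calls are the main objection, one possible way to tidy them is a lookup table that a loop walks through. The following is only a sketch under stated assumptions: the names Replacements and GetLatinLetters (here a standalone function, not the form method) are made up for illustration, and the source file is assumed to be saved as UTF-8 without a BOM and without a codepage directive, as Lazarus does by default, so that each national letter is stored as its raw UTF-8 byte sequence.

```pascal
program LatinLettersDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  { Hypothetical lookup table: each row is (national letter, ASCII replacement).
    Assumes the letters are stored as raw UTF-8 bytes (see the note above). }
  Replacements: array[0..11, 0..1] of string = (
    ('Ü', 'U'), ('Ğ', 'G'), ('İ', 'I'), ('Ş', 'S'), ('Ç', 'C'), ('Ö', 'O'),
    ('ü', 'u'), ('ğ', 'g'), ('ı', 'i'), ('ş', 's'), ('ç', 'c'), ('ö', 'o')
  );

{ Replaces every national letter from the table, then upper-cases the result. }
function GetLatinLetters(const Value: string): string;
var
  Row: Integer;
begin
  Result := Value;
  for Row := Low(Replacements) to High(Replacements) do
    Result := StringReplace(Result, Replacements[Row, 0],
      Replacements[Row, 1], [rfReplaceAll]);
  { UpperCase only affects the ASCII letters a..z }
  Result := UpperCase(Result);
end;

begin
  WriteLn(GetLatinLetters('ÜĞİŞÇÖüğişçöı'));
end.
```

Under those assumptions the program should print UGISCOUGISCOI. Replacing whole byte sequences is safe here because a valid UTF-8 sequence for one letter never appears inside the sequence for another, and adding a new letter only means adding one row to the table.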

Bart

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Character handling help
« Reply #10 on: February 18, 2019, 04:52:29 pm »
Code: Pascal
  s := 'ÜĞİŞÇÖüğişçöı';
  s := RemoveMarks(s);

  { Correct dotless i; StringReplace needs SysUtils in the uses clause }
  s := StringReplace(s, 'ı', 'i', [rfReplaceAll]);

 
