
Author Topic: TEdit maxlength for Multibyte character  (Read 496 times)

meeeeeeeeee

  • New Member
  • *
  • Posts: 10
TEdit maxlength for Multibyte character
« on: August 06, 2024, 08:54:40 am »
Hello. I'm using Lazarus v.2.2.6.
I drop a TEdit on a form and set its MaxLength to 6, for example.
In Delphi, when I type multibyte characters into the TEdit, I can enter the following:
ok:123456
ok:１２３

In Lazarus the same steps behave differently from Delphi: a half-width character and a multibyte character each count as one.
[123456] and [１２３４５６] are both counted as 6 in a Lazarus TEdit.
I want to limit multibyte characters in a Lazarus TEdit the way Delphi does.
I tried this:

procedure TForm1.Edit6Change(Sender: TObject);
var
  xMaxLen: Integer;
  wdr1: WideString;
  st1: string;
  i, xNowPosition: Integer;
begin
  xMaxLen := Edit6.MaxLength;
  if xMaxLen <= 0 then
    Exit;
  if Length(UTF8ToWinCP(Edit6.Text)) <= xMaxLen then
    Exit;

  wdr1 := Edit6.Text;
  xNowPosition := 0;
  for i := 1 to Length(wdr1) do begin
    st1 := wdr1[i];
    xNowPosition := xNowPosition + Length(UTF8ToWinCP(st1));
    if xNowPosition > xMaxLen then begin
      xNowPosition := xNowPosition - Length(UTF8ToWinCP(st1));
      Break;
    end;
  end;
  Edit6.Text := WinCPToUTF8(Copy(UTF8ToWinCP(Edit6.Text), 1, xNowPosition));
  Edit6.SelLength := 0;
  Edit6.SelStart := xMaxLen;
end;

But it doesn't work well. When I type multibyte letters from the beginning, letters are overwritten.
How can I make the MaxLength of a Lazarus TEdit behave like Delphi's?
Best regards.

wp

  • Hero Member
  • *****
  • Posts: 12293
Re: TEdit maxlength for Multibyte character
« Reply #1 on: August 06, 2024, 09:17:34 am »
I don't have Laz 2.2.6 available at the moment. But in Laz 3.4 the Edit is working correctly: When I drop a TEdit on a form, set its MaxLength to 6, run and type "1 2 3 4 5 6 " (with spaces) the string is truncated before the "4". And when I type a string with UTF8 characters "äöüß1234" it is truncated after the "2" - as expected.

Normally you do not have to care about "widestrings" in Lazarus. Simply use "string". Only in a few cases (Length, Copy, Pos, and so on) must you use the UTF8 versions of these functions from unit LazUTF8.
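
As a rough illustration of the difference between the plain and the UTF8 versions, here is a minimal sketch. It assumes the source file is saved as UTF-8 without a {$codepage} directive (the Lazarus default) and that the project depends on the LazUtils package; the program name is made up for the example.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

uses
  LazUTF8; // UTF8Length, UTF8Copy (LazUtils package)

const
  S = 'äöüß1234'; // four 2-byte characters followed by four 1-byte characters

begin
  WriteLn('Length     = ', Length(S));      // counts bytes
  WriteLn('UTF8Length = ', UTF8Length(S));  // counts codepoints
  // UTF8Copy takes its start index and count in codepoints, not bytes
  WriteLn('First six match: ', UTF8Copy(S, 1, 6) = 'äöüß12');
end.
```

Under those assumptions Length should report 12 and UTF8Length 8, and the first six codepoints end at the "2", which matches the truncation described above.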

Bart

  • Hero Member
  • *****
  • Posts: 5349
    • Bart en Mariska's Webstek
Re: TEdit maxlength for Multibyte character
« Reply #2 on: August 06, 2024, 02:52:39 pm »
Characters outside the BMP (Basic Multilingual Plane) might be a problem though: they count as more than 1 when using SelStart/SelLength, which is a Windows "feature". You can actually select half of such a character.
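
A small sketch of why such a character counts double, assuming that SelStart/SelLength on Windows follow UTF-16 code units as described above. The codepoint U+1F600 is only an arbitrary example of a character outside the BMP, and the program name is made up.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

uses
  LazUTF8; // UnicodeToUTF8, UTF8ToUTF16, UTF8Length (LazUtils package)

var
  S: string;
  W: UnicodeString;

begin
  S := UnicodeToUTF8($1F600);  // one codepoint outside the BMP, built from its number
  W := UTF8ToUTF16(S);         // same text as UTF-16
  WriteLn('UTF-8 bytes       = ', Length(S));
  WriteLn('Codepoints        = ', UTF8Length(S));
  WriteLn('UTF-16 code units = ', Length(W)); // a surrogate pair takes two units
end.
```

One codepoint here takes four bytes in UTF-8 and two code units in UTF-16, so a selection measured in UTF-16 units can end between the two halves of the pair.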

Bart

meeeeeeeeee

  • New Member
  • *
  • Posts: 10
Re: TEdit maxlength for Multibyte character
« Reply #3 on: August 07, 2024, 09:26:26 am »
I also have Laz 3.2 and tried the same thing, but the result was the same.
I will take this as a Lazarus feature.
Thank you for replying, Bart and wp.

Bart

  • Hero Member
  • *****
  • Posts: 5349
    • Bart en Mariska's Webstek
Re: TEdit maxlength for Multibyte character
« Reply #4 on: August 07, 2024, 01:25:53 pm »
@meeeeeeeeee: please use [code][/code] tags to properly format your code.

I'm not sure what you are trying to accomplish in your code.
Basically, if MaxLength <> 0, then the code should do nothing at all.

Also, why all the conversions (Utf8 <-> WinCP)?
If you want to know the "length in visual characters" (i.e. ä counts as 1 visual character), simply use Utf8Length().

Quote
[123456] and [１２３４５６] is counted as 6 on TEdit of Lazarus.

I'm not sure what you mean by that.
I see [１２３４５６] as a 1 followed by a space, followed by a 2, followed by a space, etc., yet you talk about these as if they were multi-byte characters.

When I try with MaxLength = 6, I can type e.g. '日本語日本語' into the control, and after that nothing.
Utf8Length=6
Length=18  // 3 bytes per character
Length(Utf8ToWinCP(Text))=6
And Utf8ToWinCP(Edit6.Text)=??????  //the characters of the string in question cannot be translated to my Windows codepage, so they will become the '?' character.

Also, you access all individual bytes of Edit6.Text and perform a Utf8ToWinCP() on them.
This is in itself totally nonsensical: the individual bytes of a UTF-8 encoded string have no meaning in themselves as a "character".
And then you take the Length() of this???
Notice that Utf8ToWinCP with a single-byte character as input will always return a 1-byte character:
  • either Utf8ToWinCP will return the same character (if that character is in the lower ASCII range), and therefore Length will be 1, or
  • Utf8ToWinCP will return '?' as the input character is not valid UTF8, and again Length will be 1
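
If the goal really is a Delphi-style limit counted in codepage bytes, the conversion has to be applied to whole characters rather than to single bytes. The following is only a sketch under that assumption: TruncateToCodepageBytes is a hypothetical helper, not part of the LCL, and the "cost" of each character depends on the Windows codepage of the machine it runs on (a character that cannot be converted becomes '?' and costs 1).

```pascal
program TruncateDemo;
{$mode objfpc}{$H+}

uses
  LazUTF8; // UTF8CodepointSize, UTF8ToWinCP (LazUtils package)

// Hypothetical helper: keeps whole UTF-8 characters from the start of S
// while their combined size after conversion to the Windows codepage
// stays within MaxBytes.
function TruncateToCodepageBytes(const S: string; MaxBytes: Integer): string;
var
  Idx, CharLen, Cost, Used: Integer;
  Ch: string;
begin
  Result := '';
  Used := 0;
  Idx := 1;
  while Idx <= Length(S) do
  begin
    // Size in bytes of the UTF-8 character that starts at Idx
    CharLen := UTF8CodepointSize(PChar(S) + Idx - 1);
    if CharLen <= 0 then
      Break;
    Ch := Copy(S, Idx, CharLen);
    // Size of this one character in the system codepage
    Cost := Length(UTF8ToWinCP(Ch));
    if Used + Cost > MaxBytes then
      Break;
    Inc(Used, Cost);
    Result := Result + Ch;
    Inc(Idx, CharLen);
  end;
end;

begin
  // Plain ASCII costs one byte per character in any codepage
  WriteLn(TruncateToCodepageBytes('abcdefgh', 6));
end.
```

In an OnChange handler one could compare Edit6.Text with the truncated result and assign it back only when they differ, leaving MaxLength at 0 so the control does not apply its own limit first. Caret placement is left out here, and the caveat about characters outside the BMP still applies.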


Bart

 
