Author Topic: Function Format compatible with UTF8  (Read 4027 times)

esvignolo

  • Full Member
  • ***
  • Posts: 159
  • Using FPC in Windows, Linux, Macos
Function Format compatible with UTF8
« on: March 06, 2017, 06:44:32 pm »
Hello, I was trying to use the Format function, but I have problems with UTF-8 strings. As far as I can see, it has to do with the length of UTF-8 strings.
Does anyone know if there is a replacement for this function that works with UTF-8?


Code: Pascal
  sText := 'Impresión Color';
  sText := Format('%-40s', [sText]); // the result has 39 characters
  sText := 'Impresion Color';
  sText := Format('%-40s', [sText]); // the result has 40 characters

Thanks!

Ñuño_Martínez

  • Hero Member
  • *****
  • Posts: 1186
    • Burdjia
Re: Function Format compatible with UTF8
« Reply #1 on: March 06, 2017, 08:06:42 pm »
It should be the other way around: the string with the acute accent has one more Char than the other one, because that character takes two bytes (16 bits) in UTF-8.

I've often used Format with UTF-8 strings and never had that problem.
« Last Edit: March 06, 2017, 08:09:05 pm by Ñuño_Martínez »
Are you interested in game programming? Join the Pascal Game Development community!
Also visit the Game Development Portal

esvignolo

Re: Function Format compatible with UTF8
« Reply #2 on: March 06, 2017, 09:39:01 pm »
Hi Ñuño, maybe the problem is the length:

http://forum.lazarus.freepascal.org/index.php/topic,36097.0.html

Format works, but when you use a word with an accent, the resulting string loses one character.

Ñuño_Martínez

Re: Function Format compatible with UTF8
« Reply #3 on: March 07, 2017, 11:41:01 am »
If you read the thread you linked, it demonstrates my point.

Length counts bytes, not characters! You should use the appropriate character-counting function for that (such as UTF8Length for UTF-8 strings).

And as I've said, Format works properly with UTF-8 strings so what's the problem?
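
To make the byte/character difference concrete, here is a minimal sketch. It assumes a Lazarus installation (the LazUTF8 unit ships with Lazarus in the LazUtils package); the program name is made up for illustration, and the accented letter is written as explicit UTF-8 bytes so the result does not depend on how the source file is saved.

```pascal
program LengthDemo; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8; // LazUTF8 is part of Lazarus (LazUtils package)

var
  s, padded: string;
begin
  // 'Impresión Color': the "ó" is spelled out as its two UTF-8 bytes (C3 B3)
  s := 'Impresi'#$C3#$B3'n Color';

  WriteLn('Length     = ', Length(s));      // counts bytes: 16
  WriteLn('UTF8Length = ', UTF8Length(s));  // counts codepoints: 15

  // Format pads until the string is 40 BYTES long, so the visible
  // width ends up one character short for each extra UTF-8 byte.
  padded := Format('%-40s', [s]);
  WriteLn('Padded Length     = ', Length(padded));      // 40
  WriteLn('Padded UTF8Length = ', UTF8Length(padded));  // 39
end.
```

So Format itself does not corrupt the UTF-8 text; only the padding width is measured in bytes instead of characters.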

esvignolo

Re: Function Format compatible with UTF8
« Reply #4 on: March 07, 2017, 02:25:35 pm »
Hi! Thanks for the answer. The length problem makes the generation of the padding spaces fail, so the string does not reach the requested width.

I made a sample app and a screenshot.

Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Edit1: TEdit;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private

  public

  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  sText: string;
begin
  // '%-40s' left-aligns and pads to 40 bytes, not 40 characters
  sText := Format('%-40s<-length=40', [Edit1.Text]);
  Memo1.Lines.Add(sText);
end;

end.

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: Function Format compatible with UTF8
« Reply #5 on: March 07, 2017, 03:01:34 pm »
Use UTF8Length() to count the characters, not the bytes, of a UTF-8 string. It is in the LazUTF8 unit, where you will also find other useful procedures and functions (e.g. UTF8PadRight(txt, count), which adds spaces to the right of the string txt so that the total character length is count).

If you absolutely want to use the Format function, you cannot use a fixed width in the format string. Instead use a '*', calculate the width needed to compensate for the difference between byte count and character count, and pass it as an additional argument in the argument list. Not very nice...
Code: Pascal
sText := Format('%-*s<length=40', [40 - UTF8Length(Edit1.Text) + Length(Edit1.Text), Edit1.Text]);
« Last Edit: March 07, 2017, 03:03:52 pm by wp »
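
The two approaches from this post can be compared side by side. This is only a sketch under some assumptions: it is a console program rather than the form from the sample app, the program, constant and variable names are invented for illustration, and the width is stored in an Integer variable so that Format receives a plain integer argument for the '*'.

```pascal
program PadDemo; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8; // LazUTF8 is part of Lazarus (LazUtils package)

const
  TargetWidth = 40; // desired width in characters

var
  s, viaPad, viaFormat: string;
  fmtWidth: Integer;
begin
  // 'Impresión Color' with "ó" written as its UTF-8 bytes (C3 B3)
  s := 'Impresi'#$C3#$B3'n Color';

  // Option 1: UTF8PadRight pads by characters (codepoints)
  viaPad := UTF8PadRight(s, TargetWidth);

  // Option 2: keep Format, but widen the field by the number of
  // surplus bytes (byte length minus character length)
  fmtWidth := TargetWidth - UTF8Length(s) + Length(s);
  viaFormat := Format('%-*s', [fmtWidth, s]);

  WriteLn(UTF8Length(viaPad));     // character count of option 1
  WriteLn(UTF8Length(viaFormat));  // character count of option 2
  WriteLn(viaPad = viaFormat);     // both should give the same string
end.
```

Both count codepoints, so text containing combining marks or double-width characters may still not line up in a monospaced display.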

esvignolo

Re: Function Format compatible with UTF8
« Reply #6 on: March 07, 2017, 04:08:38 pm »
Thanks wp! I used a similar solution with UTF8PadRight, but I like how you solved it with the width parameter in the Format function. It is a shame that this function does not pad UTF-8 strings correctly, because it is quite flexible: it formats several types of data and supports alignment.

 
