* * *

Author Topic: Function Format compatible with UTF8  (Read 1114 times)

esvignolo

  • Full Member
  • ***
  • Posts: 126
  • Using FPC in Windows, Linux, Macos
Function Format compatible with UTF8
« on: March 06, 2017, 06:44:32 pm »
Hello, I was trying to use the Format function, but I have problems with UTF-8 strings. As far as I can see, it has to do with the length of UTF-8 strings.
Does anyone know whether there is a replacement for this function that handles UTF-8?


Code: Pascal
  sText:='Impresión Color';
  sText:=Format('%-40s',[sText]); // the result has 39 characters
  sText:='Impresion Color';
  sText:=Format('%-40s',[sText]); // the result has 40 characters

Thanks!

Ñuño_Martínez

  • Hero Member
  • *****
  • Posts: 628
    • Burdjia
Re: Function Format compatible with UTF8
« Reply #1 on: March 06, 2017, 08:06:42 pm »
It should be the other way round: the string with the acute accent has one more Char than the other one, because that character takes two bytes (16 bits) in UTF-8.

I've often used Format with UTF-8 strings and never had that problem.
« Last Edit: March 06, 2017, 08:09:05 pm by Ñuño_Martínez »

esvignolo

Re: Function Format compatible with UTF8
« Reply #2 on: March 06, 2017, 09:39:01 pm »
Hi Ñuño, maybe the problem is the length:

http://forum.lazarus.freepascal.org/index.php/topic,36097.0.html

Format works, but when you use a word with an accent, the resulting string loses one character.

Ñuño_Martínez

Re: Function Format compatible with UTF8
« Reply #3 on: March 07, 2017, 11:41:01 am »
If you read the thread you linked, it demonstrates my point.

Length counts bytes, not characters! You should use the appropriate character-counting function for that (such as UTF8Length for UTF-8 strings).

And as I've said, Format works properly with UTF-8 strings, so what's the problem?
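
The byte/character distinction above can be shown with a small console program. This is only a sketch: `CountUtf8CodePoints` is a hypothetical helper written for this example (it is not part of the RTL or of LazUTF8), and it counts code points, which matches visible characters only when no combining marks are involved. For the string from the first post there are 16 bytes but 15 characters, which is why padding to 40 bytes leaves 39 visible columns.

```pascal
program Utf8LengthDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not an RTL function): counts UTF-8 code points by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
function CountUtf8CodePoints(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  sText, sPadded: string;
begin
  // 'ó' is written as its two UTF-8 bytes so that the result does not
  // depend on the encoding of the source file.
  sText := 'Impresi'#$C3#$B3'n Color';
  WriteLn('Bytes (Length): ', Length(sText));
  WriteLn('Code points:    ', CountUtf8CodePoints(sText));

  // Format pads by byte count, so the padded string is one column short.
  sPadded := Format('%-40s', [sText]);
  WriteLn('Padded bytes:       ', Length(sPadded));
  WriteLn('Padded code points: ', CountUtf8CodePoints(sPadded));
end.
```

In a Lazarus project, UTF8Length from unit LazUTF8 would normally take the place of the hand-written counter.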

esvignolo

Re: Function Format compatible with UTF8
« Reply #4 on: March 07, 2017, 02:25:35 pm »
Hi! Thanks for answering. Because of the length problem, the number of spaces generated to pad the string to the requested width comes out wrong.

I made a sample app and a screenshot.

Code: Pascal
  unit Unit1;

  {$mode objfpc}{$H+}

  interface

  uses
    Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;

  type

    { TForm1 }

    TForm1 = class(TForm)
      Button1: TButton;
      Edit1: TEdit;
      Memo1: TMemo;
      procedure Button1Click(Sender: TObject);
    private

    public

    end;

  var
    Form1: TForm1;

  implementation

  {$R *.lfm}

  { TForm1 }

  procedure TForm1.Button1Click(Sender: TObject);
  var sText:string;
  begin
    sText:=Format('%-40s<-length=40',[Edit1.Text]);
    Memo1.Lines.Add(sText);

  end;

  end.

wp

  • Hero Member
  • *****
  • Posts: 3667
Re: Function Format compatible with UTF8
« Reply #5 on: March 07, 2017, 03:01:34 pm »
Use UTF8Length() to count the characters, not the bytes, of a UTF-8 string. It is in unit LazUTF8, where you will also find other useful procedures and functions (e.g. UTF8PadRight(txt, count), which adds spaces to the right of the string txt so that the total character length is count).

If you absolutely want to use the Format function, you cannot use a fixed width in the format string. Instead, use a '*', calculate the required width so that it compensates for the difference between the byte count and the character count, and pass it as an additional parameter in the parameter list. Not very nice...
Code: Pascal
  sText := Format('%-*s<length=40', [40 - UTF8Length(Edit1.Text) + Length(Edit1.Text), Edit1.Text]);
« Last Edit: March 07, 2017, 03:03:52 pm by wp »
Lazarus trunk / fpc 3.0.0 / Win32
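
Both suggestions from this reply can be tried side by side in a console program. This is a minimal sketch, assuming the Lazarus LazUtils package (which provides unit LazUTF8) is on the unit search path. `FormatLeftUtf8` is a hypothetical wrapper name invented for this example. The width is stored in an Integer variable first so that Format receives a plain integer argument for the '*' specifier.

```pascal
program Utf8PadDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8; // LazUTF8 is part of the Lazarus LazUtils package

// Hypothetical wrapper: left-aligns S in a field of Width characters by
// enlarging the byte width passed to Format by the byte/character difference.
function FormatLeftUtf8(const S: string; Width: Integer): string;
var
  ByteWidth: Integer;
begin
  ByteWidth := Width - UTF8Length(S) + Length(S);
  Result := Format('%-*s', [ByteWidth, S]);
end;

var
  sText: string;
begin
  // 'ó' is written as its two UTF-8 bytes so that the result does not
  // depend on the encoding of the source file.
  sText := 'Impresi'#$C3#$B3'n Color';

  // Variant 1: Format with a computed '*' width.
  WriteLn(FormatLeftUtf8(sText, 40), '<-length=40');

  // Variant 2: UTF8PadRight, which pads by character count directly.
  WriteLn(UTF8PadRight(sText, 40), '<-length=40');
end.
```

On a terminal that displays UTF-8, both lines should end at the same column. The wrapper keeps the rest of Format available (other specifiers, several arguments), while UTF8PadRight is the simpler choice when only padding is needed.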

esvignolo

Re: Function Format compatible with UTF8
« Reply #6 on: March 07, 2017, 04:08:38 pm »
Thanks wp! I used a similar solution with UTF8PadRight, but I like how you solved it with the width parameter in the Format function. It is a shame that this function does not work with UTF-8, because it is quite flexible: it formats several types of data and supports alignment.

 
