Author Topic: Byte array to UTF8 string  (Read 13262 times)

gabriemarcelo

  • New Member
  • *
  • Posts: 28
Byte array to UTF8 string
« on: July 12, 2012, 08:47:07 pm »
Hi, I'm new to Free Pascal/Lazarus and have this problem. I have a byte array that contains the bytes of a UTF8 string (the bytes were generated by a .NET app). I'm trying to convert these bytes to a UTF8 string in a Lazarus console app by doing the following:

var
    myutf8string : UTF8String;
   
begin

    myutf8string := UTF8String(arrUtf8);
    writeln(myutf8string);
       
end;

NOTE: arrUtf8 is a dynamic byte array that, at this point in the code, holds these bytes (decimal notation): 65, 66, 67, 68, 69, i.e. ABCDE.
However, myutf8string ends up containing just ABCD, i.e. it is missing the last char (always; I tested it with other string/byte samples). What am I doing wrong? Thanks in advance.

KpjComp

  • Hero Member
  • *****
  • Posts: 680
Re: Byte array to UTF8 string
« Reply #1 on: July 12, 2012, 09:57:52 pm »
I would use the unit LazUTF8.

And then use the function UTF8CStringToUTF8String.

gabriemarcelo

  • New Member
  • *
  • Posts: 28
Re: Byte array to UTF8 string
« Reply #2 on: July 12, 2012, 10:13:41 pm »
thanks for the reply. I'm using Lazarus 0.9.30.4 and cannot find LazUTF8. Any hints?

KpjComp

  • Hero Member
  • *****
  • Posts: 680
Re: Byte array to UTF8 string
« Reply #3 on: July 12, 2012, 10:46:36 pm »
I believe more recent versions of Lazarus have better support for UTF8, so is upgrading an option?
If not, then maybe something like this (not tested):

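Code:
var
  myutf8string : UTF8String;
begin
  // SourceSize is the number of bytes held in arrUTF8 (presumably Length(arrUTF8))
  setlength(myutf8string, SourceSize);
  // copy the raw bytes into the string's character buffer
  move(arrUTF8[0], myutf8string[1], SourceSize);
  writeln(myutf8string);
end;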
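
The snippet above leaves SourceSize undefined and would index arrUTF8[0] even when the array is empty. A self-contained sketch of the same SetLength/Move idea follows. The helper name BytesToUTF8String and the demo program are hypothetical and not from the thread; the length is taken from the array with Length, and an empty array is guarded against.

```pascal
program BytesToUtf8Demo;

{$mode objfpc}{$H+}

// Hypothetical helper (name not from the thread): copies the raw bytes
// into a freshly allocated UTF8String without interpreting them.
function BytesToUTF8String(const Bytes: array of Byte): UTF8String;
begin
  Result := '';
  SetLength(Result, Length(Bytes));
  // Guard: Bytes[0] and Result[1] do not exist for an empty array.
  if Length(Bytes) > 0 then
    Move(Bytes[0], Result[1], Length(Bytes));
end;

var
  arrUtf8: array of Byte;
  s: UTF8String;
begin
  // The sample bytes from the question: 65..69
  SetLength(arrUtf8, 5);
  arrUtf8[0] := 65;
  arrUtf8[1] := 66;
  arrUtf8[2] := 67;
  arrUtf8[3] := 68;
  arrUtf8[4] := 69;

  s := BytesToUTF8String(arrUtf8);
  WriteLn(s);          // ABCDE
  WriteLn(Length(s));  // 5
end.
```

Because the bytes are copied rather than cast, the string length always equals the array length, so no trailing byte is lost.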

gabriemarcelo

  • New Member
  • *
  • Posts: 28
Re: Byte array to UTF8 string
« Reply #4 on: July 12, 2012, 11:30:31 pm »
thanks! that works but... will I get a UTF8 string by using that code? Do you know why UTF8String() "eats" last byte?

KpjComp

  • Hero Member
  • *****
  • Posts: 680
Re: Byte array to UTF8 string
« Reply #5 on: July 12, 2012, 11:41:04 pm »
Quote
will I get a UTF8 string by using that code

I can't see why not, UTF8 strings are just a series of bytes.  In fact in newer versions of Lazarus and the LCL you don't even need to use a UTF8String type, a normal string will work fine.

Quote
Do you know why UTF8String() "eats" last byte?

Not sure, UTF8String() is a cast, and maybe FPC doesn't know how to handle such casts.

I think a lot of the confusion with Lazarus and UTF8 is because Lazarus has its own UTF8 handling and FPC has its own too.  A bit of a mixed bag at the moment. :)
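
To illustrate the point that a UTF8 string is just a series of bytes, here is a minimal sketch (the program name and sample bytes are made up for illustration). It copies three bytes into a UTF8String: 'A' followed by the two-byte UTF8 encoding of U+00E9. Length reports bytes, not characters.

```pascal
program Utf8LengthDemo;

{$mode objfpc}{$H+}

const
  // Sample data (an assumption for this sketch): $41 = 'A',
  // $C3 $A9 = U+00E9 encoded as UTF8.
  Bytes: array[0..2] of Byte = ($41, $C3, $A9);

var
  s: UTF8String;
begin
  s := '';
  SetLength(s, Length(Bytes));
  Move(Bytes[0], s[1], Length(Bytes));

  // Two characters, but three bytes: Length counts bytes.
  WriteLn(Length(s));  // 3
end.
```

As for the missing last byte, one plausible explanation, stated here as an unverified assumption about FPC internals rather than documented behaviour: the cast only reinterprets the pointer, and a dynamic array's header stores its highest index (length minus one) in the place where a string's header stores its length. Copying the bytes sidesteps the question either way.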

gabriemarcelo

  • New Member
  • *
  • Posts: 28
Re: Byte array to UTF8 string
« Reply #6 on: July 12, 2012, 11:50:54 pm »
Thanks! You mentioned "newer version" of Lazarus... per lazarus download page, it seems I have the latest one, right? I mean, not the latest "snapshot" but the latest "stable" version. Am I missing something? thanks,

KpjComp

  • Hero Member
  • *****
  • Posts: 680
Re: Byte array to UTF8 string
« Reply #7 on: July 13, 2012, 12:00:44 am »
Quote
Am I missing something? thanks,

Yes, I think a few people have mentioned the release cycle.

It looks like that was released back in March, so it's 4 months old now.  I think the preferred way to get Lazarus/FPC is SVN, but if, like me, you find all that stuff too much hassle, you can use CodeTyphon.  FPCUP also works pretty well.
