Author Topic: JSONStringToString do not decode emoji  (Read 7368 times)

Renat.Su

  • Full Member
  • ***
  • Posts: 232
    • Renat.Su
JSONStringToString do not decode emoji
« on: November 10, 2017, 03:42:18 pm »
I'm trying to decode JSON data. For example, "\ud83c\udf1f\u0410\u043b\u043b" should be decoded as "🌟алле"...
I use JSONStringToString from the fpjson unit.
Unfortunately, emoji are not decoded correctly. The result is: "��лле.."
What do you advise? If it's a bug, should I send a bug report, or is there an easy way to solve the problem?
P.S.
Here is the code of JSONStringToString:
Quote
function JSONStringToString(const S: TJSONStringType): TJSONStringType;

Var
  I,J,L : Integer;
  P : PJSONCharType;
  w : String;

begin
  I:=1;
  J:=1;
  L:=Length(S);
  Result:='';
  P:=PJSONCharType(S);
  While (I<=L) do
    begin
    if (P^='\') then
      begin
      Result:=Result+Copy(S,J,I-J);
      Inc(P);
      If (P^<>#0) then
        begin
        Inc(I);
        Case AnsiChar(P^) of
          '\','"','/'
              : Result:=Result+P^;
          'b' : Result:=Result+#8;
          't' : Result:=Result+#9;
          'n' : Result:=Result+#10;
          'f' : Result:=Result+#12;
          'r' : Result:=Result+#13;
          'u' : begin
                W:=Copy(S,I+1,4);
                Inc(I,4);
                Inc(P,4);
                Result:=Result+WideChar(StrToInt('$'+W)); // The problem, apparently there???
                end;
        end;
        end;
      J:=I+1;
      end;
    Inc(I);
    Inc(P);
    end;
  Result:=Result+Copy(S,J,I-J+1);
end;
« Last Edit: November 10, 2017, 03:45:18 pm by Renat.Su »

esvignolo

  • Full Member
  • ***
  • Posts: 159
  • Using FPC in Windows, Linux, Macos

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #2 on: November 11, 2017, 07:34:24 am »
Thank you! The topic made me think, but it did not help. It is similar but actually quite different. The linked topic is also about a problem with JSON strings, but there:
  • the problem is that JSON encoding is not performed
  • it appeared only in the trunk version.
In "my" case:
  • I use the stable version of FPC.
  • JSON encoding works fine, even with emoji (function StringToJSONString from fpJSON).
  • The incorrect handling of emoji in JSONStringToString persists from version 3.0.2 to trunk. I checked: the function has not changed.

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #3 on: November 11, 2017, 07:39:11 am »
My attempts to investigate JSONStringToString led to unexpected results. Here is some example code:
Code: Pascal
procedure TForm1.Button3Click(Sender: TObject);
var
  S, S1: TJSONStringType;
begin
  S:='';
  S+=WideChar(StrToInt('$d83c'));
  S+=WideChar(StrToInt('$df1f'));
  S+=WideChar(StrToInt('$0410'));
  S1:='';
  S1+=WideChar(StrToInt('$d83c'))+WideChar(StrToInt('$df1f'))+WideChar(StrToInt('$0410'));
  EdtOutput.Text:='S: '+S+', S1: '+S1;
end;
And output:
S: ��А, S1: 🌟А
« Last Edit: November 11, 2017, 07:41:18 am by Renat.Su »

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #4 on: November 11, 2017, 07:44:32 am »
As you can see, concatenating the WideChars in a single expression and appending them one statement at a time give completely different results under otherwise equal conditions!
 %)  :o

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #5 on: November 11, 2017, 09:57:26 am »
Perhaps this is a problem not only for this function, but for FPC as a whole.
I found a [temporary] solution. Maybe this will help you solve the problem: if you declare the result type of the function as UnicodeString instead of TJSONStringType, then everything works fine. I.e.:
Code: Pascal
-- function JSONStringToString(const S: TJSONStringType): TJSONStringType;
++ function JSONStringToString(const S: TJSONStringType): UnicodeString;
You can copy this upgraded function into your own module and call it as Your-Module.JSONStringToString(YourTextWithEmoji);
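A minimal sketch of why a UnicodeString accumulator helps, assuming FPC 3.x: the UTF-16 code units stay side by side in the UnicodeString, and the conversion to UTF-8 happens only once, on the complete string. The function name UnitsToUTF8 is made up for this illustration and is not part of fpjson.

```pascal
program UnicodeAccumulatorDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not from fcl-json): collects UTF-16 code units
// in a UnicodeString and converts to UTF-8 once, at the very end.
function UnitsToUTF8(const Units: array of Word): UTF8String;
var
  U: UnicodeString;
  I: Integer;
begin
  U := '';
  for I := Low(Units) to High(Units) do
    U := U + WideChar(Units[I]);   // no code page conversion at this point
  Result := UTF8Encode(U);         // surrogate pairs are converted as a whole
end;

begin
  // $D83C $DF1F is the surrogate pair from the JSON sample, $0410 is "А".
  // A correct conversion gives 4 bytes for the emoji plus 2 for the letter.
  WriteLn(Length(UnitsToUTF8([$D83C, $DF1F, $0410])));
end.
```

The same idea applies inside a copy of JSONStringToString: keep the decoded \uXXXX units in a UnicodeString and convert to TJSONStringType only after the whole string has been processed.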
« Last Edit: November 11, 2017, 09:59:26 am by Renat.Su »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: JSONStringToString do not decode emoji
« Reply #6 on: November 11, 2017, 01:48:36 pm »
FYI, 🌟 code point is
UTF32: $1f31f
UTF16: $d83c $df1f
UTF8: ....

Notice that in UTF16 it takes two words. Based on that
Code: Pascal
  S+=WideChar(StrToInt('$d83c'));
is not correct. It provides half of a code point and would give � because TJSONStringType = UTF8String and the compiler is going to convert an invalid WideChar to UTF8.
The same is true about:
Code: Pascal
  S+=WideChar(StrToInt('$df1f'));

That is why this code:
Code: Pascal
S1+=WideChar(StrToInt('$d83c'))+WideChar(StrToInt('$df1f'))
works. The compiler is emitting a conversion for the whole code point WideChar(StrToInt('$d83c'))+WideChar(StrToInt('$df1f'))
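To make the relation between the two UTF16 words and the UTF32 value concrete, here is a small sketch of the standard surrogate pair arithmetic. The function name SurrogatesToCodePoint is only an illustrative name, not an RTL routine.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Combine a UTF-16 high/low surrogate pair into one Unicode code point.
// Hi must be in $D800..$DBFF and Lo in $DC00..$DFFF (not checked here).
function SurrogatesToCodePoint(Hi, Lo: Word): Cardinal;
begin
  Result := $10000 + ((Cardinal(Hi) - $D800) shl 10) + (Cardinal(Lo) - $DC00);
end;

begin
  // $D83C $DF1F is the pair discussed in this thread;
  // the result should match the UTF32 value $1F31F.
  WriteLn(HexStr(LongInt(SurrogatesToCodePoint($D83C, $DF1F)), 5));
end.
```

Each half on its own carries only part of the code point's bits, which is why converting a single half to UTF8 cannot produce the emoji.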

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #7 on: November 11, 2017, 02:02:00 pm »
Quote
FYI, 🌟 code point is
UTF32: $1f31f
UTF16: $d83c $df1f
UTF8: ....

Notice that in UTF16 it takes two words. Based on that
Code: Pascal
  S+=WideChar(StrToInt('$d83c'));
Thank you! But the FPC library fcl-json uses this code:
Code: Pascal
                Result:=Result+WideChar(StrToInt('$'+W));
where Result is TJSONStringType = UTF8String. As a result, emoji are not decoded. I had to use my own version of JSONStringToString.

Thaddy

  • Hero Member
  • *****
  • Posts: 14393
  • Sensorship about opinions does not belong here.
Re: JSONStringToString do not decode emoji
« Reply #8 on: November 11, 2017, 04:33:11 pm »
Quote
where Result is TJSONStringType=UTF8String... As a result, in the case of emoji decoding is not happening. Had to use my own version of the function JSONStringToString
Wrong: FPC has no UTF8 string as its default string type; Lazarus has. FPC knows only Ansi and UTF16 (UnicodeString) as default string types, i.e. as aliases for string.
Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #9 on: November 11, 2017, 04:38:17 pm »
That's not the point. I am just citing part of the code from /fpc/fpc-json/fpJSON in the Lazarus folder.
« Last Edit: November 11, 2017, 04:42:42 pm by Renat.Su »

engkin

Re: JSONStringToString do not decode emoji
« Reply #10 on: November 11, 2017, 04:40:38 pm »
Quote
My attempts to investigate JSONStringToString led to unexpected results. Here is some example code:
Code: Pascal
procedure TForm1.Button3Click(Sender: TObject);
var
  S, S1: TJSONStringType;
begin
  S:='';
  S+=WideChar(StrToInt('$d83c'));
  S+=WideChar(StrToInt('$df1f'));
  S+=WideChar(StrToInt('$0410'));
  S1:='';
  S1+=WideChar(StrToInt('$d83c'))+WideChar(StrToInt('$df1f'))+WideChar(StrToInt('$0410'));
  EdtOutput.Text:='S: '+S+', S1: '+S1;
end;
And output:
S: ��А, S1: 🌟А

I just tested your code using Win/Laz1.8.0RC3/FPC3.0.2. I do not get the same results as you. Which OS/Laz/FPC did you test with?

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #11 on: November 11, 2017, 04:45:09 pm »
Hmm...   :-\
Lazarus 1.6.4 r54278 FPC 3.0.2 i386-win32-win32/win64
But the executable compiled on another computer (Ubuntu/Debian 64) has the same problem.
« Last Edit: November 11, 2017, 05:06:58 pm by Renat.Su »

engkin

Re: JSONStringToString do not decode emoji
« Reply #12 on: November 11, 2017, 04:52:11 pm »
 :(
Here is the emitted code:
Code: ASM
# [138] S:='';
        movl    $0,%edx
        leal    -12(%ebp),%eax
        call    fpc_ansistr_assign
# [139] S+=WideChar(StrToInt('$d83c'));
        leal    -64(%ebp),%eax
        call    fpc_widestr_decr_ref
        movl    $_$UNIT1$_Ld10,%eax
        call    SYSUTILS_$$_STRTOINT$ANSISTRING$$LONGINT
        leal    -68(%ebp),%edx
        call    fpc_uchar_to_widestr
        movl    -68(%ebp),%ebx
        leal    -72(%ebp),%edx
        movl    -12(%ebp),%eax
        call    fpc_ansistr_to_widestr
        movl    -72(%ebp),%edx
        leal    -64(%ebp),%eax
        movl    %ebx,%ecx
        call    fpc_widestr_concat
        movl    -64(%ebp),%eax
        leal    -60(%ebp),%ecx
        movw    $65001,%dx
        call    fpc_widestr_to_ansistr
        movl    -60(%ebp),%edx
        leal    -12(%ebp),%eax
        call    fpc_ansistr_assign
# [140] S+=WideChar(StrToInt('$df1f'));
        leal    -72(%ebp),%eax
        call    fpc_widestr_decr_ref
        movl    $_$UNIT1$_Ld11,%eax
        call    SYSUTILS_$$_STRTOINT$ANSISTRING$$LONGINT
        leal    -68(%ebp),%edx
        call    fpc_uchar_to_widestr
        movl    -68(%ebp),%ebx
        leal    -64(%ebp),%edx
        movl    -12(%ebp),%eax
        call    fpc_ansistr_to_widestr
        movl    -64(%ebp),%edx
        leal    -72(%ebp),%eax
        movl    %ebx,%ecx
        call    fpc_widestr_concat
        movl    -72(%ebp),%eax
        leal    -60(%ebp),%ecx
        movw    $65001,%dx
        call    fpc_widestr_to_ansistr
        movl    -60(%ebp),%edx
        leal    -12(%ebp),%eax
        call    fpc_ansistr_assign
# [141] S+=WideChar(StrToInt('$0410'));
        leal    -72(%ebp),%eax
        call    fpc_widestr_decr_ref
        movl    $_$UNIT1$_Ld12,%eax
        call    SYSUTILS_$$_STRTOINT$ANSISTRING$$LONGINT
        leal    -68(%ebp),%edx
        call    fpc_uchar_to_widestr
        movl    -68(%ebp),%ebx
        leal    -64(%ebp),%edx
        movl    -12(%ebp),%eax
        call    fpc_ansistr_to_widestr
        movl    -64(%ebp),%edx
        leal    -72(%ebp),%eax
        movl    %ebx,%ecx
        call    fpc_widestr_concat
        movl    -72(%ebp),%eax
        leal    -60(%ebp),%ecx
        movw    $65001,%dx
        call    fpc_widestr_to_ansistr
        movl    -60(%ebp),%edx
        leal    -12(%ebp),%eax
        call    fpc_ansistr_assign
# [142] S1:='';
        movl    $0,%edx
        leal    -16(%ebp),%eax
        call    fpc_ansistr_assign
# [143] S1+=WideChar(StrToInt('$d83c'))+WideChar(StrToInt('$df1f'))+WideChar(StrToInt('$0410'));
        leal    -76(%ebp),%eax
        call    fpc_unicodestr_decr_ref
        leal    -80(%ebp),%eax
        call    fpc_unicodestr_decr_ref
        movl    $_$UNIT1$_Ld10,%eax
        call    SYSUTILS_$$_STRTOINT$ANSISTRING$$LONGINT
        leal    -96(%ebp),%edx
        call    fpc_uchar_to_unicodestr
        movl    -96(%ebp),%eax
        movl    %eax,-92(%ebp)
        movl    $_$UNIT1$_Ld11,%eax
        call    SYSUTILS_$$_STRTOINT$ANSISTRING$$LONGINT
        leal    -100(%ebp),%edx
        call    fpc_uchar_to_unicodestr
        movl    -100(%ebp),%eax
        movl    %eax,-88(%ebp)
        movl    $_$UNIT1$_Ld12,%eax
        call    SYSUTILS_$$_STRTOINT$ANSISTRING$$LONGINT
        leal    -104(%ebp),%edx
        call    fpc_uchar_to_unicodestr
        movl    -104(%ebp),%eax
        movl    %eax,-84(%ebp)
        leal    -92(%ebp),%edx
        leal    -80(%ebp),%eax
        movl    $2,%ecx
        call    fpc_unicodestr_concat_multi
        movl    -80(%ebp),%ebx
        leal    -104(%ebp),%edx
        movl    -16(%ebp),%eax
        call    fpc_ansistr_to_unicodestr
        movl    -104(%ebp),%edx
        leal    -76(%ebp),%eax
        movl    %ebx,%ecx
        call    fpc_unicodestr_concat
        movl    -76(%ebp),%eax
        leal    -60(%ebp),%ecx
        movw    $65001,%dx
        call    fpc_unicodestr_to_ansistr
        movl    -60(%ebp),%edx
        leal    -16(%ebp),%eax
        call    fpc_ansistr_assign
# [144] EdtOutput.Text:='S: '+S+', S1: '+S1;
        leal    -112(%ebp),%eax
        call    fpc_ansistr_decr_ref
        pushl   $65535
        movl    $_$UNIT1$_Ld13,%eax
        movl    %eax,-128(%ebp)
        movl    -12(%ebp),%eax
        movl    %eax,-124(%ebp)
        movl    $_$UNIT1$_Ld14,%eax
        movl    %eax,-120(%ebp)
        movl    -16(%ebp),%eax
        movl    %eax,-116(%ebp)
        leal    -128(%ebp),%edx
        leal    -112(%ebp),%eax
        movl    $3,%ecx
        call    fpc_ansistr_concat_multi
        movl    -112(%ebp),%eax
        leal    -108(%ebp),%ecx
        movw    $0,%dx
        call    fpc_ansistr_to_ansistr
        movl    -108(%ebp),%edx
        movl    -8(%ebp),%eax
        movl    1136(%eax),%eax
        call    CONTROLS$_$TCONTROL_$__$$_SETTEXT$TTRANSLATESTRING

engkin

Re: JSONStringToString do not decode emoji
« Reply #13 on: November 11, 2017, 04:59:02 pm »
We obviously don't have identical tests. Can you post a small project that reproduces the problem on your system?

Renat.Su

Re: JSONStringToString do not decode emoji
« Reply #14 on: November 11, 2017, 05:26:03 pm »
Yes.
Here, in turn, is my little demo code that demonstrates the problem.

 
