Author Topic: Linux-Lazarus-Indy  (Read 6051 times)

Remy Lebeau

  • Sr. Member
  • ****
  • Posts: 483
    • Lebeau Software
Re: Linux-Lazarus-Indy
« Reply #15 on: July 13, 2018, 10:29:43 pm »
Looks like enUTF8 is deprecated now; Indy prefers the IndyTextEncoding_*() functions instead, at least on Linux.

On all platforms.

I understand that IndyTextEncoding_UTF8() should work on Linux and that it is the preferred choice; however, for some reason it does not seem to work.

Not sure why.  Again, can you please test it?  You don't even need to make a TCP connection, simply use IndyTextEncoding_UTF8() by itself and encode/decode some strings with it.  Make sure everything works correctly on Linux.

In any case, I have just now checked in some tweaks to how Indy's TIdUTF8Encoding class is initialized.  See if it makes any difference.

It looks like the problem arises on the client side, as it is the client app that freezes while the server appears not to receive anything.

That implies to me that you are performing your socket I/O in the main UI thread (client side), and that you are not correctly reading whatever you send.  Sends and reads need to be matched up correctly, so the receiver doesn't block waiting for data the sender never sends.

Have managed to pass a stream. Had a lot of difficulties until I read the example at https://github.com/tinydew4/indy-project-demos/tree/master/TCPIP%20Delphi%20%26%20Indy10%20Client%20Server%20Demo/4_sample%20Stream%20Exchange and realized that it uses buffered writes.

You really should not use buffered writes when sending streams, especially large streams.  That is a good way to waste a lot of memory unnecessarily.

It is my understanding that to pass a stream I have to start by sending the server the stream size first, then the actual stream, so that the server will know how many bytes to read.

You don't HAVE to do it that way, but it is the RECOMMENDED way, yes.

In the example that saved my life, the author sends the size, then starts a buffered write by calling WriteBufferOpen before he sends the stream itself. Is this the correct way?

I would not use buffered writes, no.  But regardless of buffering, yes, the general approach of sending the stream size first is a good choice.  However, you don't need to send the stream size manually, TIdIOHandler.Write(TStream) can send it for you:

Code: Pascal
IdTCPClient1.IOHandler.LargeStream := ...; // False to send the size as a 32-bit integer, True for a 64-bit integer
IdTCPClient1.IOHandler.Write(MemoryStream, 0, True); // send size + stream

That works unless the format of the stream size does not match your communication protocol, in which case you would have to send it manually as needed.
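For the receiving side, here is a minimal sketch of the matching read. It assumes the client sent the stream with Write(MemoryStream, 0, True) and LargeStream set to True; TForm1, the handler name and the local Received variable are placeholders, not taken from the thread.

```pascal
// Sketch only: assumes units Classes and IdContext are in the uses clause,
// and that the client sent the stream with Write(MemoryStream, 0, True).
procedure TForm1.IdTCPServer1Execute(AContext: TIdContext);
var
  Received: TMemoryStream; // hypothetical local buffer
begin
  Received := TMemoryStream.Create;
  try
    // Must match the client's LargeStream setting, or the size prefix is misread
    AContext.Connection.IOHandler.LargeStream := True;
    // AByteCount = -1 and AReadUntilDisconnect = False:
    // ReadStream reads the size prefix first, then exactly that many bytes
    AContext.Connection.IOHandler.ReadStream(Received, -1, False);
    Received.Position := 0;
    // process Received here
  finally
    Received.Free;
  end;
end;
```

The point of the pairing is that both ends agree on the width of the size prefix, so the reader never blocks waiting for bytes that were not sent.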
Remy Lebeau
Lebeau Software - Owner, Developer
Internet Direct (Indy) - Admin, Developer (Support forum)

Remy Lebeau

Re: Linux-Lazarus-Indy
« Reply #16 on: July 13, 2018, 10:33:49 pm »
You have to try to handle the IdTCPServer1Disconnect(AContext: TIdContext) event to prevent the disconnect exception.

That is not true at all.  An exception is going to happen whether you handle the OnDisconnect event or not.  But, if an exception DOES occur, and the OnExecute event doesn't catch it, the OnDisconnect event will still be called as expected (followed by the OnException event).
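As a rough illustration of that event order, the sketch below logs inside OnExecute and then re-raises, so the server still sees the exception and OnDisconnect still runs. TForm1 and the handler names are placeholders, and the echo logic is only there to give the handler something to do.

```pascal
// Sketch only: TForm1 and the handler names are placeholders.
// Requires SysUtils and IdContext in the uses clause.
procedure TForm1.IdTCPServer1Execute(AContext: TIdContext);
var
  Line: string;
begin
  try
    Line := AContext.Connection.IOHandler.ReadLn;
    AContext.Connection.IOHandler.WriteLn('echo: ' + Line);
  except
    on E: Exception do
    begin
      // log E.Message here if desired, then let Indy see the exception
      raise;
    end;
  end;
end;

procedure TForm1.IdTCPServer1Disconnect(AContext: TIdContext);
begin
  // Per-connection cleanup goes here; this runs whether the connection
  // ended normally or because OnExecute raised.
end;
```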

scasparz

  • New member
  • *
  • Posts: 30
Re: Linux-Lazarus-Indy
« Reply #17 on: July 15, 2018, 01:07:53 am »
Quote
Not sure why. Again, can you please test it? You don't even need to make a TCP connection, simply use IndyTextEncoding_UTF8() by itself and encode/decode some strings with it.  Make sure everything works correctly on Linux.
Sorry Remy, have taken a look at http://packages.lazarus-ide.org, which claims to provide the latest Indy version, 10.6.2.5457, exactly the one I currently use. By the way, both TIdTCPClient and TIdTCPServer report their versions as 10.6.2.0 instead. Anyway, whatever this version may be, there is not much on this topic that I have not tried so far.

I presume you expect me to download the latest version from https://svn.atozed.com:444/svn/Indy10. Have no previous experience with SVN. Have taken a look in Synaptic (the Debian source of heaven) and found a number of SVN packages, but do not know which one is the correct one to use. Not to mention that when I tried to open this page over http, I was asked for a user name and password that I could not provide. I'd be more than happy to try the latest patches if I knew how. At the moment, I am afraid I simply cannot get the latest updates.

Other than that have tried non-buffered writes successfully. Thanks for the LargeStream property, which I was not aware of, am using the 64-bit version, had no issues without LargeStream by using:
Code: Pascal
IdTCPClient1.IOHandler.Write(MemoryStream.Size);
IdTCPClient1.IOHandler.Write(MemoryStream, MemoryStream.Size);
at the client and
Code: Pascal
VLOC_MemoryStreamSize := AContext.Connection.IOHandler.ReadInt64();
AContext.Connection.IOHandler.ReadStream(MemoryStream, VLOC_MemoryStreamSize);
on the server side.

As for the OnDisconnect server side exception, seems reasonable to me and I can live with it, as is.


s
« Last Edit: July 15, 2018, 01:10:15 am by scasparz »

Remy Lebeau

Re: Linux-Lazarus-Indy
« Reply #18 on: July 16, 2018, 09:02:05 pm »
Sorry Remy, have taken a look at http://packages.lazarus-ide.org, which claims to provide the latest Indy version, 10.6.2.5457, exactly the one I currently use. By the way, both TIdTCPClient and TIdTCPServer report their versions as 10.6.2.0 instead.

FYI, the latest version in Indy's SVN is actually 10.6.2.5464, so OPM is a few revisions behind.  But either way, the version number is actually stored as 10.6.2.0 in the source code, and has to be overwritten with a command-line script before compiling.  I don't think OPM does that, though.

Anyway whatever this version may be there is not much I have not tried so far, on this topic.

But, did you try what I ASKED you to try?  Have you tested IndyTextEncoding_UTF8() BY ITSELF to make sure it encodes/decodes UTF-8 strings correctly on Linux in general?

I presume perhaps you expect me to download the latest version from https://svn.atozed.com:444/svn/Indy10. Have no previous experience with SVN.

You can download a nightly snapshot from https://indy.fulgan.com/ZIP/ instead.

Have taken a look on synaptic -the debian source of heaven-, have found a number of SVN packages, do not know which one is the correct one to use. Not to mention that having tried to get on this page through http, I was asked for a user name and password that I could not provide.

http://www.indyproject.org/Sockets/Download/DevSnapshot.aspx

The username for Indy's SVN is "Indy-Public-RO".  There is no password.

scasparz

Re: Linux-Lazarus-Indy
« Reply #19 on: July 17, 2018, 11:30:53 am »
Many thanks Remy, I managed to download the latest version (10.6.2.5465) from https://indy.fulgan.com/ZIP/ as suggested. Uninstalled the package I had so far (10.6.2.5457) and attempted to install the new one.

This new package has a different directory structure from the previous one. On adding package indylaz.lpk I got errors coming from unit IdSSLOpenSSL:
Code: Text
IdSSLOpenSSL.pas(3304,7) Error: Identifier not found "M"
IdSSLOpenSSL.pas(3305,13) Error: ENDIF without IF(N)DEF
IdSSLOpenSSL.pas(3306,7) Fatal: Syntax error, ";" expected but "identifier SSL_DEFAULT_CIPHER_LIST" found
as a result of which installation failed.

Reverted to the old version, but replaced unit IdGlobal.pas, which contains the IndyTextEncoding_*() functions, with that of the new version. Reinstalled the patched old Indy version sans issues. Then tested:
Code: Pascal
// at the client
IdTCPClient1.IOHandler.WriteLn(Edit1.Text, IndyTextEncoding_UTF8());

// at the server
LLine := AContext.Connection.IOHandler.ReadLn(IndyTextEncoding_UTF8());
and failed. Much like with the previous version the client app freezes while apparently the server one does not get anything. Tested again, this time using NIL at the server only, still the same. Tested the opposite permutation with NIL on the client and IndyTextEncoding_UTF8() on the server, again came with nothing.

Can it be that the IndyTextEncoding_UTF8() function actually depends on one of the numerous include files to be found at the IdGlobal unit? If yes then please advise me which one(s) and I could give it another run.

How am I supposed to test IndyTextEncoding_UTF8() by itself, given that this function apparently takes no arguments and returns an interface? Have never worked with interfaces before. Could you please provide me with a syntax example of what you expect me to try?



kind regards
s

Remy Lebeau

Re: Linux-Lazarus-Indy
« Reply #20 on: July 17, 2018, 08:55:07 pm »
This new package has a different directory structure from the previous one.

Actually, what is in the nightly ZIP is the official folder structure.  It is Lazarus that doesn't use the standard structure.

You can just replace the Lazarus source files with the official source files in whatever folder structure Lazarus uses.

On adding package indylaz.lpk I got errors coming from unit IdSSLOpenSSL:
Code: Text
IdSSLOpenSSL.pas(3304,7) Error: Identifier not found "M"
IdSSLOpenSSL.pas(3305,13) Error: ENDIF without IF(N)DEF
IdSSLOpenSSL.pas(3306,7) Fatal: Syntax error, ";" expected but "identifier SSL_DEFAULT_CIPHER_LIST" found
as a result of which installation failed.

That was my bad.  It has been fixed now.

Reverted to the old version, but replaced unit IdGlobal.pas, which contains the IndyTextEncoding_*() functions, with that of the new version. Reinstalled the patched old Indy version sans issues. Then tested:
Code: Pascal
// at the client
IdTCPClient1.IOHandler.WriteLn(Edit1.Text, IndyTextEncoding_UTF8());

// at the server
LLine := AContext.Connection.IOHandler.ReadLn(IndyTextEncoding_UTF8());
and failed.

(sigh) Can you please test IndyTextEncoding_UTF8() BY ITSELF, NO TCP COMPONENTS AT ALL?  For example, something like this:

Code: Pascal
var
  b: TIdBytes;
begin
  b := IndyTextEncoding_UTF8.GetBytes(Edit1.Text);
  // verify bytes are properly UTF-8 encoded...
  Edit2.Text := IndyTextEncoding_UTF8.GetString(b);
end;

Or this:

Code: Pascal
var
  b: TIdBytes;
begin
  b := ToBytes(Edit1.Text, IndyTextEncoding_UTF8);
  // verify bytes are properly UTF-8 encoded...
  Edit2.Text := BytesToString(b, IndyTextEncoding_UTF8);
end;

Which is roughly what TIdIOHandler actually does internally.

Much like with the previous version the client app freezes while apparently the server one does not get anything. Tested again, this time using NIL at the server only, still the same. Tested the opposite permutation with NIL on the client and IndyTextEncoding_UTF8() on the server, again came with nothing.

Sniff the TCP traffic with a packet sniffer, such as Wireshark.  Is the client actually transmitting the string data?  Is the server actually receiving the data?

There are a lot of steps involved in string handling and TCP transmissions.  Can't you debug into Indy's source code at runtime at all?  That would eliminate so much of this guesswork if you could actually isolate with a debugger the exact code that is freezing.

Can it be that the IndyTextEncoding_UTF8() function actually depends on one of the numerous include files to be found at the IdGlobal unit?

On Linux, Indy's TextEncoding classes rely on the iconv library.  The usage of that library is all contained in IdGlobal.pas, using the library definitions from FreePascal's own 'iconvenc' unit.

scasparz

Re: Linux-Lazarus-Indy
« Reply #21 on: July 18, 2018, 12:53:45 am »
Hi again,

Downloaded version Indy10_5465.zip from https://indy.fulgan.com/ZIP/ again and had exactly the same installation issues I had on the previous attempt. You may find more information on this at my previous post. Anyway intend to give it another try as soon as I will be able to have a newer version installed.

Reverted to the old working Indy version, unpatched (that is, with the original IdGlobal.pas unit that came with it), and tried both examples you provided. The only Indy unit I used was IdGlobal.pas. To do this I had to add the Indy path under project options, paths.

The first example worked fine, was able to test the entire Greek alphabet without issues. The second one did not work at all. All EditBox Edit2 could get was an empty string.

Used the debugger on both procedures. In the first example, variable b was populated after the execution of the statement:
Code: Pascal
b := IndyTextEncoding_UTF8.GetBytes(Edit1.Text);
In the second example, variable b remained empty after the call to the statement:
Code: Pascal
b := ToBytes(Edit1.Text, IndyTextEncoding_UTF8);
The full execution path (in F7 steps) of the second example was:
  • Unit IdGlobal, function IndyTextEncoding_UTF8; obviously this is what resulted in the empty b
  • file control.inc, function TControl.GetText: TCaption, which is apparently irrelevant to the issue
  • then the statement Edit2.Text := BytesToString(b, IndyTextEncoding_UTF8); which put the empty string into Edit2.
In both examples, when executing IndyTextEncoding_UTF8(), Indy finds GIdUTF8Encoding to be NIL before proceeding with the statement:
Code: Pascal
Result := GIdUTF8Encoding;

Hope it helps.

Understand this problem keeps everyone busy, but can you tell the ignorant me if I can use the TIdContext.Data property, which is a TObject, to store my local variables in the connection thread (e.g. the one to store the incoming stream and its size, and perhaps any manipulative methods), or should I keep it off limits?

At your service with any suggestions you make.
« Last Edit: July 18, 2018, 01:18:34 am by scasparz »

Remy Lebeau

Re: Linux-Lazarus-Indy
« Reply #22 on: July 18, 2018, 10:40:08 pm »
Downloaded version Indy10_5465.zip from https://indy.fulgan.com/ZIP/ again and had exactly the same installation issues I had on the previous attempt. You may find more information on this at my previous post. Anyway intend to give it another try as soon as I will be able to have a newer version installed.

See http://wiki.freepascal.org/Indy_with_Lazarus

The first example worked fine, was able to test the entire Greek alphabet without issues. The second one did not work at all. All EditBox Edit2 could get was an empty string.

Hmm...

Used the debugger on either procedures. On the first example variable b was populated after the execution of the statement:
Code: Pascal
b := IndyTextEncoding_UTF8.GetBytes(Edit1.Text);

And, was it populated *correctly*?  Did you verify that the bytes were actually encoded using proper UTF-8 for Greek?

On the second example variable b remained empty after a call to the statement:
Code: Pascal
b := ToBytes(Edit1.Text, IndyTextEncoding_UTF8);

Hmm, ideally it should have output the same bytes as the first example, since ToBytes() will use IndyTextEncoding_UTF8 internally.

However, there is a small but important difference between the two examples:

If you are using FreePascal in a mode where the 'string' type maps to 'AnsiString', ToBytes() will explicitly convert the input string from ANSI to UTF-16 using IndyTextEncoding_OSDefault (unless you specify otherwise via the optional ASrcEncoding parameter), and then convert the UTF-16 to UTF-8 using IndyTextEncoding_UTF8.  So, my ToBytes() example would be more on-par with this example for an 'AnsiString':

Code: Pascal
var
  b: TIdBytes;
  s: AnsiString;
  ch: TIdWideChars;
begin
  //b := ToBytes(Edit1.Text, IndyTextEncoding_UTF8);
  s := Edit1.Text;
  b := RawToBytes(PAnsiChar(s)^, Length(s));
  ch := IndyTextEncoding_OSDefault.GetChars(b);
  b := IndyTextEncoding_UTF8.GetBytes(ch);
  ...
end;

So, data could be getting lost in either the call to IndyTextEncoding_OSDefault.GetChars() or the call to IndyTextEncoding_UTF8.GetBytes().

Whereas my first example, passing Edit1.Text directly to IndyTextEncoding_UTF8.GetBytes(), relies on the compiler implicitly converting an 'AnsiString' to a '(Wide|Unicode)String' using its own native ANSI->Unicode conversion.

When 'string' maps to 'UnicodeString' instead, then only IndyTextEncoding_UTF8 is involved when calling ToBytes(), eg:

Code: Pascal
var
  s: UnicodeString;
  b: TIdBytes;
  len: Integer;
begin
  //b := ToBytes(Edit1.Text, IndyTextEncoding_UTF8);
  s := Edit1.Text;
  len := IndyTextEncoding_UTF8.GetByteCount(s, 1, Length(s));
  SetLength(b, len);
  if len > 0 then begin
    IndyTextEncoding_UTF8.GetBytes(s, 1, Length(s), b, 0);
  end;
  ...
end;

The full execution path (in F7 steps) of the second example was:
  • Unit IdGlobal procedure IndyTextEncoding_UTF8, obviously it was this resulting the empty string b

But, that doesn't tell me WHERE inside of Indy the data is getting lost exactly.  I need you to step INSIDE of Indy's source code with the debugger, find out EXACTLY what happens at each step in the conversion process.

In both examples, executing procedure IndyTextEncoding_UTF8(), Indy finds GIdUTF8Encoding to be NIL before proceeding with the statement:
Code: Pascal
Result := GIdUTF8Encoding;

As it should be, since GIdUTF8Encoding is initially nil.  GIdUTF8Encoding is assigned to a TIdUTF8Encoding object the first time IndyTextEncoding_UTF8() is called.
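To make that create-on-first-use behaviour concrete, here is a small stand-alone sketch of the pattern. It is a simplified illustration, not Indy's actual code: TFakeEncoding, GFakeEncoding and FakeTextEncoding are made-up stand-ins, and the sketch makes no attempt at thread safety.

```pascal
program LazyInitSketch;
{$mode objfpc}{$H+}

type
  TFakeEncoding = class // stands in for TIdUTF8Encoding
  end;

var
  GFakeEncoding: TFakeEncoding = nil; // stands in for GIdUTF8Encoding
  GCreateCount: Integer = 0;          // counts constructor calls

function FakeTextEncoding: TFakeEncoding;
begin
  // The global is nil on entry the first time, so the object is created here
  if GFakeEncoding = nil then
  begin
    GFakeEncoding := TFakeEncoding.Create;
    Inc(GCreateCount);
  end;
  Result := GFakeEncoding;
end;

begin
  FakeTextEncoding; // first call: creates the object
  FakeTextEncoding; // second call: reuses it, no constructor call
  WriteLn('objects created: ', GCreateCount);
  GFakeEncoding.Free;
end.
```

Seeing the global as nil just before the first Result assignment is therefore expected, and the constructor is skipped on every later call in the same process.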

Hope it helps.

Not really.

can you tell the ignorant me if I can use the tIdContext.Data property which is a tObject to store my local variables into the connection thread (eg the one to store the incoming stream and its size and perhaps any manipulative methods) or should I keep it off limits?

Yes, you can use the TIdContext.Data property like that.  The Data property is specifically intended for users to store whatever per-connection context data they want (similar to what the TComponent.Tag property is for integers).

Though, it is typically better to derive a new class from TIdServerContext, put whatever you want in that class, and then assign that class type to the server's ContextClass property before activating the server.
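A minimal sketch of that approach follows, assuming a server component named IdTCPServer1. The unit name, TMyContext and its two fields are hypothetical and only illustrate where per-connection state could live.

```pascal
unit MyContextUnit; // hypothetical unit name

{$mode objfpc}{$H+}

interface

uses
  Classes, IdCustomTCPServer;

type
  // Hypothetical per-connection state; field names are illustrative only
  TMyContext = class(TIdServerContext)
  public
    IncomingStream: TMemoryStream; // nil until created on demand in OnExecute
    ExpectedSize: Int64;
    destructor Destroy; override;
  end;

implementation

destructor TMyContext.Destroy;
begin
  IncomingStream.Free; // Free is safe on nil
  inherited Destroy;
end;

// Before activating the server:
//   IdTCPServer1.ContextClass := TMyContext;
//   IdTCPServer1.Active := True;
//
// Inside OnExecute, cast the context to reach the extra fields:
//   TMyContext(AContext).ExpectedSize := AContext.Connection.IOHandler.ReadInt64;

end.
```

Compared with the Data property, this keeps the per-connection fields typed and lets the context's own destructor release them.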
« Last Edit: July 18, 2018, 10:42:42 pm by Remy Lebeau »

scasparz

Re: Linux-Lazarus-Indy
« Reply #23 on: July 19, 2018, 06:47:45 am »
Quote
Quote from: scasparz on July 18, 2018, 12:53:45 am
Hope it helps.

Not really.
Sorry Remy, am trying to do my best while am not a system developer.

Quote
http://wiki.freepascal.org/Indy_with_Lazarus
Again thanks for the information. Will use it next time. Took a look for a newer nightly at https://indy.fulgan.com/ZIP/, seems it is now Forbidden, while I have been there multiple times. No other options for the moment other than keep using version 10.6.2.5457 unpatched.

Quote
And, was it populated *correctly*?  Did you verify that the bytes were actually encoded using proper UTF-8 for Greek?
Testing was done with the 'αβγδεζηθικλμνξοπρστυφχψω ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ άέήίόύώ ϊϋ ΐΰ' which is the entire Greek character set on the Edit1 editbox. All am trying to say is that what I got on Edit2 was a facsimile replica of what I had typed on Edit1. Plus Edit1.Text and Edit2.Text compare identical. No corruptions, no losses, everything came as it should. Looks example 1 works fine. If you want me to try something different with it please ask.

Quote
But, that doesn't tell me WHERE inside of Indy the data is getting lost exactly.  I need you to step INSIDE of Indy's source code with the debugger, find out EXACTLY what happens at each step in the conversion process.
I can absolutely understand you here, however F7 is the most detailed granularity with the debugger that I know. If there is a way to increase this further please advise me on how to do this. Meanwhile all I could do was to set a breakpoint at statement:
Code: Pascal
b := ToBytes(Edit1.Text, IndyTextEncoding_UTF8); //(1)
then stepped forward with F7 and was sent to the first statement of IndyTextEncoding_UTF8() in unit IdGlobal.pas. I am afraid no other Indy code came before my eyes. And yes, I was expecting to see something related to the ToBytes function afterwards, but IMHO either this was not visible to the debugger or, for some reason unknown to me, this call to ToBytes did not materialize.

The value of the Result variable is the same (<IIDTEXTENCODING> = <IDUNKNOWN>) on both examples at least with Greek characters input.

Ran example 2 with Latin chars, actually it contained the default editbox 'Edit1' text which I did not alter.
Looks this time the execution path was different though:

Code: Pascal
LEncoding := TIdUTF8Encoding.Create;
constructor TIdUTF8Encoding.Create;
TIdMBCSEncoding.Create;
constructor TIdMBCSEncoding.Create(const CharSet: String);
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean; resulting False
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean;
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean;
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean;
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean;
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean;
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean;
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer;
function TextIsSame(const A1, A2: string): Boolean;
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer; exited with -1
back to constructor TIdMBCSEncoding.Create(const CharSet: String);
function TIdUTF7Encoding.GetByteCount(const AChars: PIdWideChar; ACharCount: Integer): Integer;
function TIdMBCSEncoding.GetByteCount(const AChars: PIdWideChar; ACharCount: Integer): Integer;
function DoIconvCharsToBytes(const ACharset: string; AChars: PIdWideChar; ACharCount: Integer;
function CreateIconvHandle(const ACharSet: String; AToUTF16: Boolean): iconv_t;
from where I got a dialog box with message
<FILE NOT FOUND>
The file "iconv_open.c" was not found.
Do you want to locate it yourself?

Answered no, which brought me the assembler, then most probably I lost him while trying to maintain this log. Am with the impression Indy was inspecting the 'Edit1' string character by character, chances are I have missed an iteration. Anyway Remy this is French to me and I mean I do not speak French. You have been extremely helpful from the very beginning, would at least be unfair to not cooperate with you fully. Correcting Indy is important to either of us. Am still at your service mate with whatever suggestions you have.

Ran it a second time without changing the Edit1 contents. This time the execution path was different though. Instead of all that fuss, there was no call to the
Code: Pascal
LEncoding := TIdUTF8Encoding.Create;
at the IndyTextEncoding_UTF8() function, however Edit2 contents remained correct.

Many thanks for your reply on the TIdContext.Data property. Actually, deriving a new TIdContext descendant and employing it through TIdServerContextClass was what I had in mind.

PS. https://indy.fulgan.com/ZIP/ came back just before I was to send this post. Will try latest version 5468 as soon as I can find some time.

Remy Lebeau

Re: Linux-Lazarus-Indy
« Reply #24 on: July 19, 2018, 07:14:12 pm »
Took a look for a newer nightly at https://indy.fulgan.com/ZIP/, seems it is now Forbidden, while I have been there multiple times.

Works fine for me.  You might have been accessing it during maintenance time, or while it was syncing with Indy's SVN, or something like that.

No other options for the moment other than keep using version 10.6.2.5457 unpatched.

You can always download files from Indy's SVN directly, or one of its mirrors:

http://www.indyproject.org/Sockets/Download/DevSnapshot.aspx

Testing was done with the 'αβγδεζηθικλμνξοπρστυφχψω ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ άέήίόύώ ϊϋ ΐΰ' which is the entire Greek character set on the Edit1 editbox.

The *correct* UTF-8 byte sequence for that string of Greek characters is:

Code: Text
CE B1 CE B2 CE B3 CE B4 CE B5 CE B6 CE B7 CE B8
CE B9 CE BA CE BB CE BC CE BD CE BE CE BF CF 80
CF 81 CF 83 CF 84 CF 85 CF 86 CF 87 CF 88 CF 89
20 CE 91 CE 92 CE 93 CE 94 CE 95 CE 96 CE 97 CE
98 CE 99 CE 9A CE 9B CE 9C CE 9D CE 9E CE 9F CE
A0 CE A1 CE A3 CE A4 CE A5 CE A6 CE A7 CE A8 CE
A9 20 CE AC CE AD CE AE CE AF CF 8C CF 8D CF 8E
20 CF 8A CF 8B 20 CE 90 CE B0

Is that what IndyTextEncoding_UTF8.GetBytes() produces?  And if those exact bytes are passed to IndyTextEncoding_UTF8.GetString(), does it return the original string ('αβγδεζηθικλμνξοπρστυφχψω ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ άέήίόύώ ϊϋ ΐΰ') as expected?
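One way to answer that question is to dump the encoded bytes as hex and compare them by eye with the expected sequence. The helper below is only a sketch: BytesToHex is a made-up name, not an Indy function, and Memo1 in the usage comment is a placeholder control.

```pascal
// Sketch only: BytesToHex is a hypothetical helper, not part of Indy.
// Requires SysUtils and IdGlobal in the uses clause.
function BytesToHex(const ABytes: TIdBytes): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(ABytes) do
  begin
    // separate the byte values with single spaces
    if i > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Integer(ABytes[i]), 2);
  end;
end;

// Usage, e.g. in a button click handler (Memo1 is a placeholder control):
//   Memo1.Lines.Add(BytesToHex(IndyTextEncoding_UTF8.GetBytes(Edit1.Text)));
```

A round trip that returns the original text does not by itself prove the intermediate bytes are UTF-8, which is why looking at the bytes directly is worth the extra step.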

F7 is the most detailed granularity with the debugger that I know. If there is a way to increase this further please advise me on how to do this.

Sounds like you don't have Indy compiled with debug info enabled.  Try recompiling it.  Then hopefully the debugger will let you step into the source code.

Ran example 2 with Latin chars, actually it contained the default editbox 'Edit1' text which I did not alter.
Looks this time the execution path was different though:

Code: Pascal
LEncoding := TIdUTF8Encoding.Create;
constructor TIdUTF8Encoding.Create;
TIdMBCSEncoding.Create;
constructor TIdMBCSEncoding.Create(const CharSet: String);
... snip 8 TextIsSame() comparisons ...
function PosInStrArray(const SearchStr: string; const Contents: array of string; const CaseSensitive: Boolean = True): Integer; exited with -1

Please make sure you are using the absolute latest version of IdGlobal.pas, whatever it takes to get it installed.  In the older version you are testing with, the PosInStrArray() call inside of the TIdMBCSEncoding constructor compares 8 charset names.  In the latest version, it compares 16 names (of which 'UTF-8' is now one).  I have made changes to how TIdMBCSEncoding and its descendants calculate the FMaxCharSize member, so I need you to test with the latest version, or we are just going to end up going around in circles.

Code: Pascal
function TIdUTF7Encoding.GetByteCount(const AChars: PIdWideChar; ACharCount: Integer): Integer;


FYI, in the latest version, TIdUTF8Encoding no longer derives from TIdUTF7Encoding.

from where I got a dialog box with message
<FILE NOT FOUND>
The file "iconv_open.c" was not found.
Do you want to locate it yourself?

The debugger is trying to step into the ICONV library source code.  That is not necessary in this situation, so just ignore that.

Ran it a second time without changing the Edit1 contents. This time the execution path was different though. Instead of all that fuss, there was no call to the
Code: Pascal
LEncoding := TIdUTF8Encoding.Create;
at the IndyTextEncoding_UTF8() function, however Edit2 contents remained correct.

If you ran the conversion a second time in the same process, then yes, because the GIdUTF8Encoding object has already been created and doesn't need to be created again.

If you really wanted to re-create a new UTF8 object on each test, you could use the (deprecated) IndyUTF8Encoding() function, setting its AOwnedByIndy parameter to False.

scasparz

Re: Linux-Lazarus-Indy
« Reply #25 on: July 20, 2018, 12:31:19 am »
Downloaded and installed the latest nightly, Indy10-10.6.2.5468, sans issues. Ran the transliteration app again and got exactly the same results as before. The quick answer is that example 1 passes with flying colours while example 2 fails.

Tried to log the execution path again for the second example, focusing on the first run only, which seems to be the run that matters. Noticed GDB behaves erratically: it jumps into the middle of subprograms, omitting whatever lies before rather than starting from their beginning, or jumps over grayed code (disabled, I presume), etc., none of which makes any sense to me. This was not like yesterday, when at least GDB seemed to behave itself. I believe this affects Indy code only, as GDB's behaviour on Lazarus code appears to be the usual. Do you think I had to compile Indy twice, as with older versions?

Suspect, but am far from certain, that my Lazarus installation could somehow be corrupted, perhaps by installing or uninstalling a number of Indy versions and rebuilding Lazarus each and every time. Intend to purge Lazarus and attempt a fresh installation in the hope of fixing these issues.

Will come back as soon as I have something.

Remy Lebeau

Re: Linux-Lazarus-Indy
« Reply #26 on: July 20, 2018, 02:33:00 am »
Downloaded and installed the latest nightly, Indy10-10.6.2.5468, sans issues. Ran the transliteration app again and got exactly the same results as before. The quick answer is that example 1 passes with flying colours while example 2 fails.

I've already explained the difference between the 1st and 2nd examples.  You haven't addressed any of the issues I've described.  Which mode are you using FreePascal in?  Does the 'string' type map to AnsiString or UnicodeString?  If AnsiString, did you do any tests using the extended examples I have shown where IndyTextEncoding_OSDefault is used as an intermediate conversion?

Do you think I had to compile Indy twice as with older versions?

I have no idea.  I don't have FreePascal/Lazarus installed, let alone ever tried to install Indy in it.

scasparz

Re: Linux-Lazarus-Indy
« Reply #27 on: July 20, 2018, 08:55:16 pm »
To start with, Lazarus maps String to AnsiString through the omnipresent default directive {$H+}, at least on Linux-64.

Gave it another try, on a virtual machine this time. GDB seems to be working now.

For some reason GDB would not take me to the ToBytes function using F7 during the runs of the previous days. This time I set an explicit breakpoint and was the first time I saw the execution of this code. Suspect maybe this is where the problem lies, for IndyTextEncoding_UTF8() seems to be working fine with the syntax of the example1. Tested with Indy 10.6.2.5468 using string 'αβ' on editbox Edit1.

Below, only the lines marked with a numbered // (x) comment were executed. I have added a few comments; hope they help this time.

Code: Pascal
function ToBytes(const AValue: string; const ALength: Integer; const AIndex: Integer = 1;
  ADestEncoding: IIdTextEncoding = nil
  {$IFDEF STRING_IS_ANSI}; ASrcEncoding: IIdTextEncoding = nil{$ENDIF}
  ): TIdBytes; overload;
var
  LLength: Integer;
  {$IFDEF STRING_IS_ANSI}
  LBytes: TIdBytes;
  {$ENDIF}
begin
  {$IFDEF STRING_IS_ANSI}
  LBytes := nil; // keep the compiler happy  // (0)
  {$ENDIF}
  LLength := IndyLength(AValue, ALength, AIndex); // (1), LLength becomes 4
  if LLength > 0 then // (2)
  begin
    EnsureEncoding(ADestEncoding); // (3) ADestEncoding = IUNKNOWN, GDB won't let me F7 this
    {$IFDEF STRING_IS_UNICODE}
    SetLength(Result, ADestEncoding.GetByteCount(AValue, AIndex, LLength));
    if Length(Result) > 0 then begin
      ADestEncoding.GetBytes(AValue, AIndex, LLength, Result, 0);
    end;
    {$ELSE}
    EnsureEncoding(ASrcEncoding, encOSDefault); // (4) ASrcEncoding = IUNKNOWN
    LBytes := RawToBytes(AValue[AIndex], LLength); // (5) LBytes becomes (206,177,206,178)
    CheckByteEncoding(LBytes, ASrcEncoding, ADestEncoding); // (6) LBytes becomes ()
    Result := LBytes; // (7)
    {$ENDIF}
  end else begin
    SetLength(Result, 0);
  end;
end;

The statement of interest seems to be #6. Before the call, LBytes is (206,177,206,178), while ASrcEncoding and ADestEncoding both show as IUNKNOWN. After this statement, LBytes has the value ().

Have F7ed CheckByteEncoding as well.
Code: Pascal
procedure CheckByteEncoding(var VBytes: TIdBytes; ASrcEncoding, ADestEncoding: IIdTextEncoding);
begin
  if ASrcEncoding <> ADestEncoding then begin // according to the GDB both ASrcEncoding and ADestEncoding are valued IUNKNOWN
    VBytes := ADestEncoding.GetBytes(ASrcEncoding.GetChars(VBytes)); // (8) this statement is executed, should it?
  end;
end;

Both ASrcEncoding as well as ADestEncoding are of the IIdTextEncoding type, which I understand is an interface. I have no idea how interfaces are compared and do not know the theory behind it. Would expect statement #8 to not be executed. Anyway, #8 changes the value of VBytes into (), which finally results in a blank string.

Cannot F7 statement #8 of course.
« Last Edit: July 20, 2018, 09:03:23 pm by scasparz »

Remy Lebeau

Re: Linux-Lazarus-Indy
« Reply #28 on: July 21, 2018, 02:37:16 am »
To start with, Lazarus maps String to AnsiString through the omnipresent default directive {$H+}, at least on Linux-64.

Perhaps.  Just note that Indy also supports when 'string' maps to 'UnicodeString' via {$ModeSwitch UnicodeStrings} or {$Mode DelphiUnicode}.  Various properties and function signatures in Indy change depending on whether AnsiString or UnicodeString is being used.

IIRC, doesn't Lazarus use UTF-8 encoded AnsiStrings, though?  Would that also apply to strings from the UI?  You need to take a string's encoding into account when calling Indy's ToBytes() and BytesToString() functions when dealing with AnsiStrings.  You need to tell them that input/output strings are encoded in UTF-8 and not in the OS's default locale, via either:

- the ASrcEncoding parameter of ToBytes(), and the ADestEncoding parameter of BytesToString().

- the ASrcEncoding parameter of TIdIOHandler.WriteLn(), and the ADestEncoding parameter of TIdIOHandler.ReadLn().

- the DefAnsiEncoding property of TIdIOHandler, which WriteLn() and ReadLn() default to if no encoding is passed in explicitly.

- the global GIdDefaultTextEncoding variable in the IdGlobal.pas unit.

For some reason GDB would not take me to the ToBytes function using F7 during the runs of the previous days.

Again, did you recompile Indy with debug info enabled?

This time I set an explicit breakpoint and was the first time I saw the execution of this code. Suspect maybe this is where the problem lies, for IndyTextEncoding_UTF8() seems to be working fine with the syntax of the example1.

As I've already explained earlier, I don't expect any problems with example 1, because the IIdTextEncoding.GetBytes() method takes a UnicodeString as input, not a generic String.  So, if you pass in an AnsiString as input to GetBytes(), the compiler/RTL - not Indy - will convert the string to Unicode before GetBytes() is even entered.  And in the context of Lazarus, with its UTF-8 encoded AnsiStrings, I would expect that conversion to take UTF-8 into account, thus no data loss.  Same with the IIdTextEncoding.GetString() method, which returns a UnicodeString, which the compiler/RTL - not Indy - will convert when assigned to an AnsiString.

But example 1 is not what TIdIOHandler does internally, example 2 is, and that is the one that is having issues, likely because of the use of IndyTextEncoding_OSDefault as an intermediate conversion.

This time I set an explicit breakpoint and was the first time I saw the execution of this code.

OK, now we are finally getting somewhere useful...

Code: Pascal
  {$IFDEF STRING_IS_ANSI}
  LBytes := nil; // keep the compiler happy  // (0)
  {$ENDIF}

We've established that 'string' is 'AnsiString'...

Code: Pascal
  LLength := IndyLength(AValue, ALength, AIndex); // (1), LLength becomes 4

And that 'string' is UTF-8 encoded ('αβ' is 4 bytes when encoded in UTF-8)...

In which case, you need to either:

- set the ASrcEncoding parameter of ToBytes(), and the ADestEncoding parameter of BytesToString(), to IndyTextEncoding_UTF8 instead of their default values of nil.

- set the IdGlobal.GIdDefaultTextEncoding variable to encUTF8 instead of its default value of encASCII.

In the context of TIdIOHandler, you can set its DefAnsiEncoding property to IndyTextEncoding_UTF8, and leave off any encodings when calling TIdIOHandler.WriteLn() and TIdIOHandler.ReadLn().
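As a minimal sketch of the first option, assuming 'string' is AnsiString (so Indy's STRING_IS_ANSI variants of ToBytes() and BytesToString() are the ones compiled in) and that Indy is on the unit search path.  The program name and variable names are just for illustration, and IdTCPClient1 in the comment is a hypothetical component name.  I have not run this on Linux, so treat it as something to test rather than a guaranteed fix:

```pascal
program Utf8AnsiSketch;
{$mode objfpc}{$H+}

uses
  IdGlobal;

var
  S, Back: string;
  Bytes: TIdBytes;
begin
  // UTF-8 bytes of 'αβ', the same four bytes seen in the trace (206,177,206,178)
  S := #206#177#206#178;

  // Tell Indy that both the source AnsiString and the destination bytes are UTF-8,
  // instead of leaving ASrcEncoding at its default of nil
  Bytes := ToBytes(S, IndyTextEncoding_UTF8(), IndyTextEncoding_UTF8());
  WriteLn('bytes after ToBytes: ', Length(Bytes));

  // Same idea in the other direction: the bytes are UTF-8, and so is the AnsiString result
  Back := BytesToString(Bytes, IndyTextEncoding_UTF8(), IndyTextEncoding_UTF8());
  WriteLn('length after BytesToString: ', Length(Back));

  // For a connected client the equivalent would be (IdTCPClient1 is hypothetical):
  //   IdTCPClient1.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8();
end.
```

If both lengths come back as 4, the AnsiString path is leaving the UTF-8 bytes alone, which is what you want here.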

Code: Pascal
    EnsureEncoding(ADestEncoding); // (3) ADestEncoding = IUNKNOWN, GDB won't let me F7 this

EnsureEncoding() is also in IdGlobal.pas, don't know why the debugger won't let you step into it.

If the VEncoding parameter is nil (which it is not in this example), it gets set to the encoding specified by the ADefEncoding parameter.  That parameter is encIndyDefault by default, so VEncoding will be set to IndyTextEncoding_Default, which returns an encoding determined by the IdGlobal.GIdDefaultTextEncoding variable.  That variable is encASCII by default, so IndyTextEncoding_ASCII is used by default.

But, in this example, ADestEncoding is being set to IndyTextEncoding_UTF8 by the caller, so EnsureEncoding() is a no-op.

Code: Pascal
    EnsureEncoding(ASrcEncoding, encOSDefault); // (4) ASrcEncoding = IUNKNOWN

Following the above logic, ASrcEncoding is initially nil, so it gets set to IndyTextEncoding_OSDefault.  Its implementation is TIdMBCSEncoding, which uses 'char' or 'ASCII' as the charset when calling into ICONV, depending on the IdGlobal.GIdIconvUseLocaleDependantAnsiEncoding variable.  That variable is false by default, so 'ASCII' is the default charset.

Code: Pascal
    LBytes := RawToBytes(AValue[AIndex], LLength); // (5) LBytes becomes (206,177,206,178)

Those bytes are the correct UTF-8 encoded form of 'αβ'...

Code: Pascal
    CheckByteEncoding(LBytes, ASrcEncoding, ADestEncoding); // (6) LBytes becomes ()

And this is where data loss occurs, because ASrcEncoding is set to IndyTextEncoding_OSDefault instead of IndyTextEncoding_UTF8, so the bytes will not be interpreted as UTF-8 correctly.  There is a TODO comment inside of IndyTextEncoding_OSDefault() to have it use UTF-8 on POSIX systems (which includes Linux), but that has not been enabled yet.

Internally, CheckByteEncoding() looks like this:

Code: Pascal
procedure CheckByteEncoding(var VBytes: TIdBytes; ASrcEncoding, ADestEncoding: IIdTextEncoding);
begin
  if ASrcEncoding <> ADestEncoding then begin
    VBytes := ADestEncoding.GetBytes(ASrcEncoding.GetChars(VBytes));
  end;
end;

We know what the output of ADestEncoding.GetBytes() is (no bytes), but what is the output of ASrcEncoding.GetChars(VBytes)?  Is it also empty?  IOW, is the loss of data happening because TIdMBCSEncoding.GetChars() returns no chars at all when given UTF-8 encoded bytes, or is the loss because TIdUTF8Encoding.GetBytes() can't process the chars that TIdMBCSEncoding.GetChars() returned?  You should be able to put breakpoints in those method implementations.

Even though the input UTF-8 bytes are not being interpreted as UTF-8, I would expect TIdMBCSEncoding.GetChars() to still be able to return SOME chars.  Incorrect chars perhaps, maybe even $FFFD chars, but not zero chars.  That would imply a logic bug inside of the IdGlobal.DoIconvBytesToChars() function.

You might need to set the IdGlobal.GIdIconvUseLocaleDependantAnsiEncoding variable to true, or the IdGlobal.GIdIconvIgnoreIllegalChars variable to true, to work around that.
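If breakpoints keep misbehaving, a small console test can answer the same question without the debugger.  This is only a sketch that repeats the two calls CheckByteEncoding() makes, one at a time, so you can see which one drops the data.  It assumes Indy is on the unit search path; the two commented-out assignments assume an Indy build that uses ICONV, since I would only expect those variables to exist in that case.  The program and variable names are made up for the example:

```pascal
program OsDefaultProbe;
{$mode objfpc}{$H+}

uses
  IdGlobal;

var
  Bytes, OutBytes: TIdBytes;
  Chars: TIdWideChars;
begin
  // Possible workarounds to try one at a time (assumes an ICONV-based Indy build):
  // GIdIconvUseLocaleDependantAnsiEncoding := True;
  // GIdIconvIgnoreIllegalChars := True;

  // The UTF-8 bytes of 'αβ' that ToBytes() hands to CheckByteEncoding()
  SetLength(Bytes, 4);
  Bytes[0] := 206;
  Bytes[1] := 177;
  Bytes[2] := 206;
  Bytes[3] := 178;

  // Step 1: what ASrcEncoding.GetChars() produces when ASrcEncoding is the OS default
  Chars := IndyTextEncoding_OSDefault().GetChars(Bytes);
  WriteLn('chars from OSDefault.GetChars: ', Length(Chars));

  // Step 2: what ADestEncoding.GetBytes() produces from those chars
  OutBytes := IndyTextEncoding_UTF8().GetBytes(Chars);
  WriteLn('bytes from UTF8.GetBytes: ', Length(OutBytes));
end.
```

A zero on the first line points at the GetChars() side (DoIconvBytesToChars), while a non-zero first line followed by a zero points at the UTF-8 GetBytes() side.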

Have F7ed CheckByteEncoding as well.
Code: Pascal
procedure CheckByteEncoding(var VBytes: TIdBytes; ASrcEncoding, ADestEncoding: IIdTextEncoding);
begin
  if ASrcEncoding <> ADestEncoding then begin // according to the GDB both ASrcEncoding and ADestEncoding are valued IUNKNOWN
    VBytes := ADestEncoding.GetBytes(ASrcEncoding.GetChars(VBytes)); // (8) this statement is executed, should it?
  end;
end;

Yes, the (8) line should be executed, since ASrcEncoding and ADestEncoding are pointing at different objects, so the '<>' comparison evaluates as true.

Both ASrcEncoding as well as ADestEncoding are of the IIdTextEncoding type, which I understand is an interface.

Yes.

I have no idea how interfaces are compared

For purposes of the '=' and '<>' comparison operators, they are simply raw pointer comparisons.

However, for Indy 11, I'm expanding on IIdTextEncoding comparisons to take codepages and charsets into account, so even if two IIdTextEncoding variables point at different objects in memory, byte conversions can be skipped if both objects logically represent the same character encoding.
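To illustrate the raw pointer comparison that '=' and '<>' perform today, here is a small stand-alone sketch with no Indy involved.  IEncodingLike and TUtf8Like are hypothetical stand-ins I made up for the example; they are not Indy's IIdTextEncoding or TIdUTF8Encoding declarations:

```pascal
program IntfCompareSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for an encoding interface; not Indy's actual declaration
  IEncodingLike = interface
    ['{6E0C5D1A-3B7F-4C55-9A57-2F1E8D4B9C10}']
    function Name: string;
  end;

  TUtf8Like = class(TInterfacedObject, IEncodingLike)
    function Name: string;
  end;

function TUtf8Like.Name: string;
begin
  Result := 'UTF-8';
end;

var
  A, B, C: IEncodingLike;
begin
  A := TUtf8Like.Create;
  B := TUtf8Like.Create; // same logical encoding as A, but a different object
  C := A;                // the very same object as A

  // Only the pointers are compared, never what the objects represent
  WriteLn('A <> B: ', A <> B); // TRUE: two distinct objects
  WriteLn('A <> C: ', A <> C); // FALSE: both variables hold the same interface pointer
end.
```

This is why two separately created encodings always trigger the conversion in CheckByteEncoding(), even when they describe the same charset.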

Would expect statement #8 to not be executed.

In the situation where you pass in a UTF-8 encoded AnsiString, and ask for it to be output as a UTF-8 byte array, then you would be correct ONLY WHEN the ASrcEncoding parameter of ToBytes() is set to IndyTextEncoding_UTF8, which it is not in this example.  Had it been, the '<>' comparison in CheckByteEncoding() would have evaluated as false instead of true, and the conversion skipped.

Cannot F7 statement #8 of course.

Which is odd, since those methods are also in IdGlobal.pas.  You should be able to put breakpoints in the implementations, though.
« Last Edit: July 21, 2018, 02:45:58 am by Remy Lebeau »

scasparz

Re: Linux-Lazarus-Indy
« Reply #29 on: July 23, 2018, 08:17:32 pm »
Well, this has been a wealth of information. I will need time to play with it, not to mention some homework on topics where I lack experience. For the moment I have just reinstalled Lazarus and Indy, and apparently they work. I still cannot F7 into, say, EnsureEncoding() and a few others, but I can get around this with explicit breakpoints.
Quote
- set the IdGlobal.GIdDefaultTextEncoding variable to encUTF8 instead of its default value of encASCII.
Added this to the app's main program; it did not work.
Quote
Again, did you recompile Indy with debug info enabled?
Do not have a clue here. GDB lets me F7 into some parts but keeps a few others out of reach. I haven't changed anything in Indy, and I would expect there to be some centralized management of this, probably in some include file or unit. I have taken a look at Lazpack.lpk and found nothing there. I have noticed a number of occurrences of DEBUG_STRING_MISMATCH in IdGlobal.pas, but would expect them to be unrelated. Perhaps you could enlighten me on where to look.

Anyway, I will come back as soon as, and if, I have something useful to say.

 
