
Author Topic: [Solved] About TIdIRC and Unicode  (Read 1998 times)

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1186
  • Professional amateur ;-P
[Solved] About TIdIRC and Unicode
« on: December 18, 2024, 07:34:28 pm »
Hey Remy,

I've been playing around with TIdIRC for a while in order to do some IRC bots.
The first one I did was a channel log bot with replay and search. But I'm now banging my head against the wall because of Unicode support.
I've been made aware of this because one of the users of the #pascal channel has one of their computers set up with single quotes as a Unicode character.
I've also tested and any emoji will not come through.

Now, I understand removing the offending Unicode characters in order to be safe in ANSI mode. What I don't understand is why it deletes the first Unicode char it encounters and everything that follows it, Unicode or not.
Modern IRC is able to support Unicode. But TIdIRC isn't.

I've tried my best to look at the source and see where I could maybe suggest a PR, but I wasn't able to suss it out, not even with RAW.
I'm really sorry for that!!  :-[

Can you please help me, or suggest some kind of tweak I can perform in order to get Unicode working?

Many thanks in advance!!

Cheers,
Gus
« Last Edit: March 20, 2025, 11:25:06 pm by Gustavo 'Gus' Carreno »
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Remy Lebeau

  • Hero Member
  • *****
  • Posts: 1501
    • Lebeau Software
Re: About TIdIRC and Unicode
« Reply #1 on: December 19, 2024, 07:26:08 am »
The IRC protocol predates Unicode. It has never been updated to dictate a specific charset, nor has an official command been added to specify one, as many other Internet protocols have. As such, TIdIRC does not natively support Unicode.

Many IRC systems have developed their own ways to handle Unicode over the years. Most charset handling is typically done using client/server configurations.

If you have a specific charset you want to use, you can set the TIdIRC.IOHandler.DefStringEncoding (over the wire) and DefAnsiEncoding (AnsiString in memory) properties accordingly, such as to IndyTextEncoding_UTF8 or IndyTextEncoding(charset).
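
As a rough illustration of the two properties above, here is a minimal sketch for FPC. The unit name and the UseUtf8 helper are hypothetical, and assigning a TIdIOHandlerStack up front is an assumption made so that the encodings are in place before Connect sends anything; check it against your Indy version.

```pascal
unit IrcUtf8Sketch;

{$mode objfpc}{$H+}

interface

uses
  IdGlobal, IdIOHandler, IdIOHandlerStack, IdIRC;

// Hypothetical helper: call it before AIRC.Connect.
procedure UseUtf8(AIRC: TIdIRC);

implementation

procedure UseUtf8(AIRC: TIdIRC);
begin
  // Assumption: no IOHandler exists until Connect creates a default one,
  // so assign one here to have somewhere to set the encodings.
  if AIRC.IOHandler = nil then
    AIRC.IOHandler := TIdIOHandlerStack.Create(AIRC);

  // Encoding of the bytes sent and received over the wire.
  AIRC.IOHandler.DefStringEncoding := IndyTextEncoding_UTF8;

  // Encoding of AnsiString data in memory. This property is only present
  // when Indy is built with ANSI strings (the usual case on FPC).
  AIRC.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8;
end;

end.
```

For another charset, IndyTextEncoding(charset) can be passed in place of IndyTextEncoding_UTF8 on both lines.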

With the unofficial UTF8ONLY IRC extension, if the TIdIRC.OnISupport event reports the 'UTF8ONLY' token, you can set the DefStringEncoding to IndyTextEncoding_UTF8.
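
A sketch of what such an OnISupport handler might look like on FPC. The TIrcBot class and its names are hypothetical, and the event signature (TIdContext plus a TStrings of reply parameters) is written from memory of IdIRC.pas, so treat it as an assumption and verify it against your copy of Indy.

```pascal
unit IrcUtf8OnlySketch;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, IdGlobal, IdContext, IdIRC;

type
  // Hypothetical wrapper class; only the handler matters here.
  TIrcBot = class
  private
    FIRC: TIdIRC;
    procedure HandleISupport(ASender: TIdContext; AParameters: TStrings);
  public
    constructor Create(AIRC: TIdIRC);
  end;

implementation

constructor TIrcBot.Create(AIRC: TIdIRC);
begin
  inherited Create;
  FIRC := AIRC;
  FIRC.OnISupport := @HandleISupport; // drop the @ in Delphi mode
end;

procedure TIrcBot.HandleISupport(ASender: TIdContext; AParameters: TStrings);
var
  I: Integer;
begin
  if not Assigned(FIRC.IOHandler) then
    Exit;
  // Look for the bare UTF8ONLY token among the reported parameters.
  for I := 0 to AParameters.Count - 1 do
    if SameText(AParameters[I], 'UTF8ONLY') then
    begin
      FIRC.IOHandler.DefStringEncoding := IndyTextEncoding_UTF8;
      // Also set the in-memory side on ANSI-string builds such as FPC.
      FIRC.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8;
      Break;
    end;
end;

end.
```

Servers that never send the token are not covered by this check, so a manual override is still worth having.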

But, setting the DefStringEncoding will apply the charset to entire commands/replies as a whole, not selectively to specific portions of them, so I don't know if this will have any unexpected side-effects at the IRC protocol level.
« Last Edit: December 19, 2024, 07:28:10 am by Remy Lebeau »
Remy Lebeau
Lebeau Software - Owner, Developer
Internet Direct (Indy) - Admin, Developer (Support forum)

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1186
  • Professional amateur ;-P
Re: About TIdIRC and Unicode
« Reply #2 on: December 19, 2024, 11:03:27 am »
Hey Remy,

Thank you so very much for this awesome explanation!!

I'll give it a go and report back here if it works and it does not create unintended side effects.

Cheers,
Gus

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1186
  • Professional amateur ;-P
Re: About TIdIRC and Unicode
« Reply #3 on: March 20, 2025, 11:24:42 pm »
Hey Remy,

Ok, I finally realised that I needed to actually change two properties (DefStringEncoding and DefAnsiEncoding), not only one!!!

I'm now using a config setting to turn support for Unicode on and off, because the Libera servers do not announce support for any Unicode in the ISUPPORT reply, but they also do not filter it out.
This means no more cut-off messages due to some clients using some alternative single quote/apostrophe instead of the ANSI one. This was driving me crazy!!! Batshit insane!!

Again, soooo many thanks for your support, you always deliver !!!

Cheers,
Gus

 
