
Topic: Is it supposed to be possible to build FPC in "Unicode" mode, or not?

Michl

  • Full Member
  • ***
  • Posts: 194
Re: Is it supposed to be possible to build FPC in "Unicode" mode, or not?
« Reply #15 on: April 18, 2016, 08:07:58 am »
Quote
...Their excuse - most spoken languages appear in the BMP range...
Language keeps getting younger and keeps changing. In written text I see more and more symbols like 👍 (thumbs up sign, U+1F44D), which was defined in Unicode 6.0 (October 2010). So the statement above is outdated.
Code: [Select]
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

mse

  • Sr. Member
  • ****
  • Posts: 286
« Reply #16 on: April 18, 2016, 08:14:42 am »
Quote
Absolute rubbish!
It seems you don't really read my post (not the first time IIRC). Please read it again.

Michl

« Reply #17 on: April 18, 2016, 09:00:02 am »
@mse:
If I'm a newbie developer, use UTF-16, and want to check whether a character is in a string:

Quote
const
  MyString = 'I like a Ä and a 🌵 in it.';
var
  i: Integer;
  c: Char;
begin
  c := 'a';  // works all the time
  c := 'Ä';  // works with ACP and UTF-16, fails with UTF-8
  c := '🌵';  // fails all the time
Code: Pascal
  for i := 1 to Length(MyString) do
    if MyString[i] = c then
      WriteLn(c, ' found at position ', i);
end.
How would you explain to me why I can't use '🌵' as a Char when using UTF-16?

The answer would be the same as the one you give for 'Ä' with UTF-8.


(I'm not sure that every browser shows the cactus (U+1F335) correctly. For me it works: Windows 7, Firefox.)
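The cactus question comes down to arithmetic: U+1F335 lies above U+FFFF, so UTF-16 has to split it into a surrogate pair, and a pair cannot fit into one Char. A minimal sketch, assuming FPC 3.x in objfpc mode; `CodePointToUtf16` is a hypothetical helper written for this example, not an RTL routine. Searching with a string instead of a Char then works for any code point:

```pascal
program FindCactus;
{$mode objfpc}{$H+}

// Hypothetical helper: encodes one code point as UTF-16.
// Inside the BMP this is one code unit, above it a surrogate pair.
function CodePointToUtf16(cp: Cardinal): UnicodeString;
begin
  if cp < $10000 then
    Result := WideChar(Word(cp))
  else
  begin
    cp := cp - $10000;
    // High surrogate carries the upper 10 bits, low surrogate the lower 10.
    Result := WideChar(Word($D800 + (cp shr 10)));
    Result := Result + WideChar(Word($DC00 + (cp and $3FF)));
  end;
end;

var
  Cactus, Haystack: UnicodeString;
begin
  Cactus := CodePointToUtf16($1F335);   // becomes $D83C $DF35
  Haystack := 'I like a cactus ' + Cactus + ' in it.';

  // Two elements, so a single WideChar can never hold it.
  WriteLn('Code units in cactus: ', Length(Cactus));

  // Searching for a substring sidesteps the Char limitation.
  // The result is a code unit index, not a "character" index.
  WriteLn('Found at code unit index: ', Pos(Cactus, Haystack));
end.
```

The code point is written as a number rather than as a literal in the source, because the compiler may reject a literal above 65535.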
« Last Edit: April 18, 2016, 09:21:48 am by Michl »

mse

« Reply #18 on: April 18, 2016, 09:27:10 am »
Quote
@mse:
If I'm a newbie developer, use UTF-16, and want to check whether a character is in a string:

Quote
const
  MyString = 'I like a Ä and a 🌵 in it.';
var
  i: Integer;
  c: Char;
begin
  c := 'a';  // works all the time
  c := 'Ä';  // works with ACP and UTF-16, fails with UTF-8
  c := '🌵';  // fails all the time
  for i := 1 to Length(MyString) do
    if MyString[i] = c then
      WriteLn(c, ' found at position ', i);
end.
If you want to hire me: I am no newbie, and I know that '🌵' is not in the BMP when I type it. ;-)
It even shows in my browsers, see attachment. I assume tk is also no newbie, and the example shows that it is really safe because it will not compile.
Code: Text
Free Pascal Compiler version 3.0.1 [2015/12/22] for i386
Copyright (c) 1993-2015 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling test.pas
Compiling main.pas
main.pas(25,6) Error: UTF-8 code greater than 65535 found
« Last Edit: April 18, 2016, 10:37:45 am by mse »

Michl

« Reply #19 on: April 18, 2016, 10:07:14 am »
Quote
It even shows in my browsers, see attachment.
Better do it with Lazarus ;), see attachment.

Graeme

  • Hero Member
  • *****
  • Posts: 1430
    • Graeme on the web
« Reply #20 on: April 18, 2016, 12:01:39 pm »
Quote
It seems you don't really read my post (not the first time IIRC). Please read it again.
Yes, I did read your post, and I read it again to double-check. I understand you to mean that if you type text into a program that you, as the programmer, wrote, you know which limitations you can apply, because you know your needs. What I'm saying is, if you develop a Text Editor (or any application that has text input) as a product you want to sell, you as a developer cannot make the assumption that the end-user will never use code points above the BMP. Yet that is what you are doing and suggesting here on the forum. That is what I consider wrong and sloppy.

On the other hand, if you use UTF-8 correctly [1], you automatically support the whole Unicode range.

[1] When I say “correctly”, I mean don't do rubbish like the following. Everybody should know by now that you can't do that for any Unicode encoding.
Code: Pascal
var
  s: string;
  i: integer;
  c: char;
begin
  // Anti-pattern: s[i] yields one code unit, not one "character".
  for i := 1 to Length(s) do
    c := s[i];
end.
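One way to read "using UTF-8 correctly" is to step through the string by code point instead of by index. A minimal sketch, assuming well-formed UTF-8 input and FPC 3.x in objfpc mode; `Utf8SeqLen` is a hypothetical helper written for this example, not an RTL routine:

```pascal
program Utf8Walk;
{$mode objfpc}{$H+}

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with the given lead byte. Assumes well-formed input; an unexpected byte
// is treated as length 1 so that the loop below always advances.
function Utf8SeqLen(Lead: Byte): Integer;
begin
  if Lead < $80 then
    Result := 1
  else if (Lead and $E0) = $C0 then
    Result := 2
  else if (Lead and $F0) = $E0 then
    Result := 3
  else if (Lead and $F8) = $F0 then
    Result := 4
  else
    Result := 1;
end;

var
  S: RawByteString;  // raw bytes, no code page conversion
  I, N, Count: Integer;
begin
  // 'a', 'Ä' (U+00C4) and the cactus (U+1F335) as UTF-8 bytes
  S := 'a'#$C3#$84#$F0#$9F#$8C#$B5;
  I := 1;
  Count := 0;
  while I <= Length(S) do
  begin
    N := Utf8SeqLen(Ord(S[I]));
    WriteLn('code point starting at byte ', I, ' uses ', N, ' byte(s)');
    Inc(I, N);
    Inc(Count);
  end;
  WriteLn(Count, ' code points in ', Length(S), ' bytes');
end.
```

For this input the loop visits three code points in seven bytes, which is exactly the difference that a plain `for i := 1 to Length(s)` loop hides.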
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

Graeme

« Reply #21 on: April 18, 2016, 12:13:39 pm »
Quote
How would you explain me, why I can't use a…
I believe we agree on the issues, but for clarity let me answer that.

  • The Char type is not large enough to hold every possible Unicode-encoded “character” (what you see on screen). It is better to simply forget about the Char type.
  • Everybody should know by now what Length() returns. Length is used on Sets and Arrays too. So think of a text string as an array. Length() doesn't return the number of bytes or characters - it returns the number of elements in the array. The size and type of that element can vary.
  • You simply cannot use indexed access (as FPC currently supports it) on string types and expect it to return a “character”. That is NOT what you are getting back. You are getting back an element of the array (remember the second point above).
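The points above can be made concrete by storing the same two "characters" in both encodings and asking for Length(). A minimal sketch, assuming FPC 3.x in objfpc mode; the values are written as numeric escapes so that the result does not depend on the source file encoding:

```pascal
program LengthDemo;
{$mode objfpc}{$H+}

var
  U8: RawByteString;   // elements are bytes, stored without code page conversion
  U16: UnicodeString;  // elements are 16-bit UTF-16 code units
begin
  // 'Ä' (U+00C4): two bytes in UTF-8, one code unit in UTF-16
  U8 := #$C3#$84;
  U16 := WideChar($00C4);
  WriteLn('A-umlaut: ', Length(U8), ' UTF-8 elements, ',
    Length(U16), ' UTF-16 elements');

  // Cactus (U+1F335): four bytes in UTF-8, a surrogate pair in UTF-16
  U8 := #$F0#$9F#$8C#$B5;
  U16 := WideChar($D83C);
  U16 := U16 + WideChar($DF35);
  WriteLn('Cactus:   ', Length(U8), ' UTF-8 elements, ',
    Length(U16), ' UTF-16 elements');
end.
```

This reports 2 and 1 for 'Ä', and 4 and 2 for the cactus. In neither encoding does Length() count what the user sees on screen.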

Quote
(I'm not sure that every browser shows the cactus (U+1F335) correctly. For me it works: Windows 7, Firefox.)
In my Firefox 44.0.2 under FreeBSD it shows something, but that something doesn't look like a cactus. Yet on my iPad (first edition), using the Safari browser, it does display a nice green cactus. :)

mse

« Reply #22 on: April 18, 2016, 12:16:43 pm »
Quote
What I'm saying is, if you develop a Text Editor (or any application that has text input) as a product you want to sell, you as a developer cannot make the assumption that the end-user will never use code points above the BMP. Yet that is what you are doing and suggesting here on the forum.
Now that is "rubbish". ;-)
I never wrote that.
Quote
That is what I consider wrong and sloppy.
Agreed.
Quote
On the other hand, if you use UTF-8 correctly [1], you automatically support the whole Unicode range.
And you lose performance if you handle a code point everywhere as a string instead of as a code unit, even when it is guaranteed to fit in a code unit.
Admittedly, with UTF-8 even Russian and German pupils need to do so. ;-)
« Last Edit: April 18, 2016, 12:29:06 pm by mse »