
Author Topic: Stop wasting time on FPC 2.6 and Laz 1.4!  (Read 49881 times)

Graeme

  • Hero Member
  • *****
  • Posts: 1428
    • Graeme on the web
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #30 on: November 02, 2015, 11:29:39 am »
Note, I wrote "codeunit" which is a 16 bit word in UTF-16.
I think you are referring to the other decomposed representation with 2 codepoints.
Ah, I misread the "codeunit" part. So the answer to your question is:  Yes and No.  ;-)
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4554
  • I like bugs.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #31 on: November 02, 2015, 11:34:39 am »
Why are we talking about minor things like "umlauts" when I personally need to work with Cyrillic symbols? :) This thread seems really strange to me, because I have no trouble at all with multi-encoding text sources. Maybe I missed something. On Linux UTF-8 is natural, and for Windows we have the nice LConvEncoding unit.

This is a good point. Many applications don't need to care about Unicode details at all. The details are encapsulated in library functions and GUI controls. For example iterating over Unicode characters is not needed very often really.
I also think the problems are exaggerated! Lots of energy wasted for the endless debates.

There are projects that must deal with Unicode details, for example Unicode aware editors like SynEdit, but they are a minority among all projects.
« Last Edit: November 02, 2015, 12:05:19 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Graeme

  • Hero Member
  • *****
  • Posts: 1428
    • Graeme on the web
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #32 on: November 02, 2015, 11:37:04 am »
I just wanted to point out the fact that experience shows that utf-16 for GUI is much more convenient. ;-)
And I would like to point out that if they use code like you showed, then they only support the BMP of Unicode, and thus not the entire Unicode standard. So they should rather specify that they only support UCS2, and not Unicode (just like MSEgui does). Use the Unicode standard correctly, or don't use it at all. When you use UTF-8, you automatically support the full Unicode code-point range without any extra effort (provided you stop using string-index hacks). That's much more convenient for me, because I actually use code-points above the BMP in some of my projects. ;-)

mse

  • Sr. Member
  • ****
  • Posts: 286
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #33 on: November 02, 2015, 11:39:50 am »
Such examples are "artificial" indeed. Plus that example code is broken even for UTF-16 (code-points above BMP). If you say you are supporting Unicode, then support it ALL, even code-points above BMP.
You are wrong, 'Ä' is in BMP so the code will work with any text in "thestring" including surrogate pairs.
I recently implemented surrogate pair handling and utf-8<>utf-16 conversion in MSEgui, so I can judge how much more complex handling the 1..4 bytes of utf-8 is than handling utf-16 surrogate pairs. BTW, in the more than 10 years that MSEgui used UCS2 there was not a single bug report about that, and MSEide+MSEgui has Chinese users...
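
For readers who have not written such a decoder, here is a minimal sketch of what handling "the 1..4 bytes of utf-8" involves. It is not MSEgui's code; the function name NextCodePoint is made up for illustration, and the sketch deliberately skips the checks for overlong forms and encoded surrogates that a strict decoder would need.

```pascal
program Utf8DecodeSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: decodes the code point that starts at byte index i
// and advances i past it. Returns $FFFD for a malformed or truncated
// sequence. Overlong forms and encoded surrogates are NOT rejected here.
function NextCodePoint(const s: AnsiString; var i: Integer): Cardinal;
var
  lead: Byte;
  extra, k: Integer;
begin
  lead := Ord(s[i]);
  Inc(i);
  if lead < $80 then
    Exit(lead);                    // 1 byte:  0xxxxxxx
  if (lead and $E0) = $C0 then
  begin
    extra := 1;                    // 2 bytes: 110xxxxx
    Result := lead and $1F;
  end
  else if (lead and $F0) = $E0 then
  begin
    extra := 2;                    // 3 bytes: 1110xxxx
    Result := lead and $0F;
  end
  else if (lead and $F8) = $F0 then
  begin
    extra := 3;                    // 4 bytes: 11110xxx
    Result := lead and $07;
  end
  else
    Exit($FFFD);                   // stray continuation byte or invalid lead
  for k := 1 to extra do
  begin
    // Every following byte must look like 10xxxxxx.
    if (i > Length(s)) or ((Ord(s[i]) and $C0) <> $80) then
      Exit($FFFD);
    Result := (Result shl 6) or Cardinal(Ord(s[i]) and $3F);
    Inc(i);
  end;
end;

var
  s: AnsiString;
  i: Integer;
  cp: Cardinal;
begin
  // U+1F600 as UTF-8 bytes F0 9F 98 80, built byte by byte so the result
  // does not depend on the source file encoding.
  SetLength(s, 4);
  s[1] := Chr($F0);
  s[2] := Chr($9F);
  s[3] := Chr($98);
  s[4] := Chr($80);
  i := 1;
  cp := NextCodePoint(s, i);
  WriteLn(cp = $1F600);            // the four bytes decode to U+1F600
  WriteLn(i);                      // index now points past the sequence: 5
end.
```

The UTF-16 counterpart needs only one range check on the lead unit and one on the trail unit, which is the difference in complexity being described.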

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #34 on: November 02, 2015, 12:05:29 pm »
Why are we talking about minor things like "umlauts" when I personally need to work with Cyrillic symbols? :) This thread seems really strange to me, because I have no trouble at all with multi-encoding text sources. Maybe I missed something. On Linux UTF-8 is natural, and for Windows we have the nice LConvEncoding unit.

This is a good point. Many applications don't need to care about Unicode details at all. The details are encapsulated in library functions and GUI controls. For example iterating over Unicode characters is not needed very often really.
I also think the problems are exaggerated! Lots of energy wasted for the endless debates.

There are projects that must deal with Unicode details, for example Unicode aware editors like SynEdit, but they are a minority among all applications.
Yeah, true, no one uses CSV parsers, XML, JSON, XMPP, Pascal, JavaScript, C or other language parsers, HTML engines, RTF engines, text drawing etc.; only a small minority of projects do. Yeah, right. How about spam filters, search engines and SQL filters?

Don't get me wrong, I really don't care if the LCL/Lazarus/FPC use only UTF-8. Just stay away from the basic types that are supposed to have specific encodings/sizes (i.e. unicodestring, ansistring, shortstring), so I don't have to rewrite, retest and reapprove my libraries; I couldn't care less what you do with the string type, and you can use whatever you want. Is it going to affect my future with Lazarus? Of course it is: speed, ease of use, debugging capabilities, typing, everything is evaluated on a daily basis, and if you fall under a threshold I'm going to drop the tool.

Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12050
  • FPC developer.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #35 on: November 02, 2015, 01:03:39 pm »
Such examples are "artificial" indeed. Plus that example code is broken even for UTF-16 (code-points above BMP). If you say you are supporting Unicode, then support it ALL, even code-points above BMP.
You are wrong, 'Ä' is in BMP so the code will work with any text in "thestring" including surrogate pairs.

As said, it is not on Mac, where it is denormalized.
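
A small sketch of what "denormalized" means for the indexed comparison, assuming the text arrives as a UnicodeString in decomposed form. The variable names are only illustrative, and both strings are built from explicit code units so the result does not depend on the source file encoding.

```pascal
program NormalizationSketch;
{$mode objfpc}{$H+}

var
  precomposed, decomposed: UnicodeString;
begin
  // Precomposed form: the single code unit U+00C4
  // (LATIN CAPITAL LETTER A WITH DIAERESIS).
  SetLength(precomposed, 1);
  precomposed[1] := WideChar($00C4);

  // Decomposed form: 'A' (U+0041) followed by U+0308 (COMBINING DIAERESIS).
  // Both forms render as the same character on screen.
  SetLength(decomposed, 2);
  decomposed[1] := WideChar($0041);
  decomposed[2] := WideChar($0308);

  // The indexed test from the thread matches only the precomposed form.
  WriteLn(precomposed[1] = WideChar($00C4));  // matches
  WriteLn(decomposed[1] = WideChar($00C4));   // does not match: s[1] is plain 'A'

  // The two forms also differ in length, so index arithmetic diverges too.
  WriteLn(Length(precomposed), ' ', Length(decomposed));
end.
```

So the check only behaves as intended if the text is normalised to precomposed form first, or if the comparison is done by a routine that is aware of canonical equivalence.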

mse

  • Sr. Member
  • ****
  • Posts: 286
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #36 on: November 02, 2015, 01:30:23 pm »
As said, it is not on Mac, where it is denormalized.

If MSEgui were ported to OSX, I would probably normalise filenames and GUI text to precomposed form, just as I transparently encode invalid utf-8 sequences in Linux filenames as special utf-16 values.

Michl

  • Full Member
  • ***
  • Posts: 226
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #37 on: November 02, 2015, 01:55:38 pm »
Don't get me wrong I really don't care if the lcl/lazarus/fpc use only utf8, stay away from the basic types (i.e. unicodestring, ansistring, shortstring)...
That is not right: you can keep using all the string types, and it is not so hard to design a real ACP project (I don't know about a UTF-16 project). You can use the switch -dDisableUTF8RTL and save your files in the ACP encoding. That's it. See also http://wiki.freepascal.org/Lazarus_with_FPC3.0_without_UTF-8_mode.

Lazarus 1.5 (with UTF8) compiled with FPC3.0.0RC2 has, IMHO, fewer bugs than with FPC2.6.4. (I ran some tests and found fewer problems with ACP projects than I expected, so it should be solvable.)

@mse: How can you create a clean UTF-16 project with FPC? If I save a project file as UCS-2LE or UCS-2BE, I'm not able to compile it. It doesn't work because FPC doesn't know the codepage. There is no option like -Fccp1200 or -FcUTF16 etc., or am I wrong?
Code:
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

mse

  • Sr. Member
  • ****
  • Posts: 286
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #38 on: November 02, 2015, 02:25:25 pm »
In MSEgui utf-16 is used for internal GUI tasks only. Files and inter-process data are in utf-8 or locale encoding depending on parameters or properties.
MSEgui has a complete set of utf-16 utility units, so one does not need the text-encoding-dependent Free Pascal RTL and FCL functions and classes.
I don't think it is desirable to use utf-16 for files.

Graeme

  • Hero Member
  • *****
  • Posts: 1428
    • Graeme on the web
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #39 on: November 02, 2015, 04:49:17 pm »
You are wrong, 'Ä' is in BMP so the code will work with any text in "thestring" including surrogate pairs.
Sorry, maybe I was unclear. I meant that code similar to your example, but with a comparison character (code-point) above the BMP, will not work, because code-points above the BMP use surrogate pairs in UTF-16, and Object Pascal's string indexes can't handle that. That's what I meant by that code example being flawed or "artificial".
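
To make the point concrete, here is a minimal sketch assuming a UnicodeString that holds U+1F600, a code-point above the BMP. CodePointAt is a hypothetical helper written for this example, not an RTL function.

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the code point that starts at index i,
// combining a lead/trail surrogate pair when one is present.
function CodePointAt(const s: UnicodeString; i: Integer): Cardinal;
var
  lead, trail: Cardinal;
begin
  lead := Ord(s[i]);
  Result := lead;
  if (lead >= $D800) and (lead <= $DBFF) and (i < Length(s)) then
  begin
    trail := Ord(s[i + 1]);
    if (trail >= $DC00) and (trail <= $DFFF) then
      Result := $10000 + ((lead - $D800) shl 10) + (trail - $DC00);
  end;
end;

var
  s: UnicodeString;
  firstUnit: Cardinal;
begin
  // U+1F600 in UTF-16 is the surrogate pair D83D DE00.
  SetLength(s, 2);
  s[1] := WideChar($D83D);
  s[2] := WideChar($DE00);

  WriteLn(Length(s));                    // 2 code units for one code-point
  firstUnit := Ord(s[1]);
  WriteLn(firstUnit = $1F600);           // s[1] is only the lead surrogate
  WriteLn(CodePointAt(s, 1) = $1F600);   // decoding the pair gives U+1F600
end.
```

A plain s[n] comparison therefore cannot match such a character; the code has to decode the pair, or compare substrings instead of single elements.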

mse

  • Sr. Member
  • ****
  • Posts: 286
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #40 on: November 02, 2015, 05:29:37 pm »
Then the compiler will throw an error. But that will never happen for German or Russian pupils, even Klingon is in BMP AFAIK. ;-)

BeniBela

  • Hero Member
  • *****
  • Posts: 921
    • homepage
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #41 on: November 02, 2015, 06:33:26 pm »


Such [broken] examples stem from the fact that Delphi supported AnsiString for so long, and developers got used to the "hack" of referencing characters as string indexes. With Unicode you simply can't do that any more. You need to use functions that correctly find the character you want by using byte offset or code-point lookup in a string. Taking Unicode normalization into account is also crucial for text comparison.

Perhaps the best thing would be to remove string indexes altogether

Most people are not using them correctly anyways.

for-loops can go, too

Replace it all with proper map and filter functionality

Pupils often use code like
Code: Pascal
  if thestring[n] = 'Ä' then begin
   ...
  end;

Even UTF-16 might be denormalized

I think there should be an initiative to build a set of easy to use dedicated dataedit and datadisplay widgets with a unified interface as MSEgui provides. In MSEgui above code looks like
Code: Pascal
  thedisplay.value:= theedit1.value + theedit2.value;
All data editwidgets have "onsetvalue" with a dedicated "value" parameter and "ondataentered" events:
Code: Pascal
  setintegereventty = procedure(const sender: tobject; var avalue: integer;
                           var accept: boolean) of object;
  setbooleaneventty = procedure(const sender: tobject; var avalue: boolean;
                           var accept: boolean) of object;
  ...
  notifyeventty = procedure (const sender: tobject) of object; //for ondataentered


Seems they need to use a spin edit


Please read the German Lazarus forum, there are many questions about Lazarus and umlauts.

Germany is just full of idiots. Should have nuked them instead Japan




skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #42 on: November 02, 2015, 07:06:55 pm »
Germany is just full of idiots. Should have nuked them instead Japan
Please add a smiley to the end of the statement. One might not have enough sense of humor.

It gets a lot worse if others start to use the same kind of statement as a technical argument, especially those who might not realize that you're actually German.
« Last Edit: November 02, 2015, 07:15:09 pm by skalogryz »

Thaddy

  • Hero Member
  • *****
  • Posts: 16580
  • Kallstadt seems a good place to evict Trump to.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #43 on: November 02, 2015, 07:32:33 pm »
@Juha:
Your write-up about the algorithm is not correct, since a masked lookup table in a four-byte format is O(log N). I wonder what you are referring to: implementation details in FPC or the specification? It is not truly 100%, but you would write the algorithm up as a case of log N, not N. Just apply the four-byte mask. This is because the computational cost of applying the mask over the range is negligible compared to the lookup afterwards. It has linear properties, but not in effect. (The specs are) It is not memory efficient; that is true. But computationally it can be solved in O(log N), because the necessary masks are known and can't be repeated over the whole set of all UTF encodings. I mean a sort is sort of there if you look at it the proper way. Sort it out ;)

In other words, the computational effort for Unicode (any encoding) is indeed more than for standard ASCII, but it should not be overestimated. When properly implemented, it is an effort of at most O(log N) above the ANSI handling.
« Last Edit: November 02, 2015, 08:00:03 pm by Thaddy »
But I am sure they don't want the Trumps back...

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12050
  • FPC developer.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #44 on: November 02, 2015, 10:05:23 pm »
As said, it is not on Mac, where it is denormalized.

If MSEgui were ported to OSX, I would probably normalise filenames and GUI text to precomposed form, just as I transparently encode invalid utf-8 sequences in Linux filenames as special utf-16 values.

That assumes normalization-denormalization is cycle stable (as in s = denorm(norm(s))). I don't know if it is, since such conversions are usually only done just before rendering a string on screen, not during processing.

 
