Topic: Stop wasting time on FPC 2.6 and Laz 1.4! (Read 50331 times)

mischi

  • Full Member
  • Posts: 178
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #45 on: November 02, 2015, 10:23:03 pm »
Quote
That assumes normalization-denormalization is cycle-stable (as in s = denorm(norm(s))). I don't know if it is, since such conversions are usually only done just before rendering a string on screen, not during processing.
It is not cycle-stable. I have forgotten the details of the example, except that one part was Å.

BeniBela

  • Hero Member
  • Posts: 922
« Reply #46 on: November 02, 2015, 11:59:56 pm »

@Juha:
Your write-up about the algorithm is not correct, since a masked lookup table in a four-byte format is O(log N). I wonder what you are referring to?

What are you talking about?

I do not think you and Juha are talking about the same thing.

Juha means the time to find the n-th character of the string.

Do you mean the time to decode a UTF-8 character to a Unicode code point?
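To illustrate the difference: decoding a single UTF-8 sequence into a code point touches at most four bytes, but locating the n-th character has to walk the string from the start, so it is O(N). Below is a minimal sketch; DecodeUtf8 and CodePointAt are hypothetical names (not LazUtils or fpGUI functions), it assumes well-formed UTF-8 and FPC 3.x in objfpc mode, and the non-ASCII test characters are written as raw byte escapes so the result does not depend on the source file's code page.

```pascal
program Utf8NthChar;

{$mode objfpc}{$H+}

// Decodes the UTF-8 sequence that starts at byte S[I] and returns its code
// point; Len receives the sequence length. Constant work: at most 4 bytes.
// Assumes well-formed UTF-8; an invalid lead byte yields U+FFFD, length 1.
function DecodeUtf8(const S: RawByteString; I: Integer; out Len: Integer): Cardinal;
var
  B: Byte;
  K: Integer;
begin
  B := Ord(S[I]);
  if B < $80 then begin Len := 1; Result := B; end
  else if (B and $E0) = $C0 then begin Len := 2; Result := B and $1F; end
  else if (B and $F0) = $E0 then begin Len := 3; Result := B and $0F; end
  else if (B and $F8) = $F0 then begin Len := 4; Result := B and $07; end
  else begin Len := 1; Result := $FFFD; end;
  // Append the 6 payload bits of each continuation byte.
  for K := 1 to Len - 1 do
    if I + K <= Length(S) then
      Result := (Result shl 6) or (Ord(S[I + K]) and $3F);
end;

// Returns the N-th (1-based) code point of S, or $FFFFFFFF if S is shorter.
// There is no way to jump to character N directly: every preceding
// sequence has to be measured, so this is O(N) in the length of S.
function CodePointAt(const S: RawByteString; N: Integer): Cardinal;
var
  I, Count, Len: Integer;
  CP: Cardinal;
begin
  Result := $FFFFFFFF;
  I := 1;
  Count := 0;
  while I <= Length(S) do
  begin
    CP := DecodeUtf8(S, I, Len);
    Inc(Count);
    if Count = N then
      Exit(CP);
    Inc(I, Len);
  end;
end;

var
  S: RawByteString;
begin
  // 'Hello World ', then 'Ä' (U+00C4, bytes C3 84), then the domino tile
  // U+1F030 (bytes F0 9F 80 B0)
  S := 'Hello World ' + #$C3#$84 + #$F0#$9F#$80#$B0;
  WriteLn(CodePointAt(S, 13));  // 196 = $C4
  WriteLn(CodePointAt(S, 14));  // 127024 = $1F030
end.
```

Decoding is cheap; it is the "where does character n start" question that forces the linear scan.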

Please add a smiley to the end of the statement. Not everyone has enough sense of humor.

It would be a lot worse if others started to use the same kind of statements as their technical arguments: people who might not realize that you're actually German.

But I was serious.

As a German, I am always serious.

Graeme

  • Hero Member
  • Posts: 1428
« Reply #47 on: November 03, 2015, 12:22:36 am »
Quote
Then the compiler will throw an error.
Nope, FPC 2.6.4 did not throw an error here when compiling a simple test program with sample code as you showed, and a comparison text character in the Unicode SMP range. It also returned the wrong result at runtime - as expected.

I used one of the Unicode Domino Tile code points, U+1F030, as my test character.
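For reference, U+1F030 does not fit into a single 16-bit UnicodeChar; in UTF-16 it becomes a surrogate pair. A minimal sketch of the standard conversion (HighSurrogate and LowSurrogate are hypothetical helper names, FPC 3.x assumed):

```pascal
program SurrogatePair;

{$mode objfpc}{$H+}

// UTF-16 encodes a code point above U+FFFF as two 16-bit units:
// subtract $10000, put the top 10 bits into a high surrogate (D800..DBFF)
// and the bottom 10 bits into a low surrogate (DC00..DFFF).
function HighSurrogate(CP: Cardinal): Word;
begin
  Result := Word($D800 + ((CP - $10000) shr 10));
end;

function LowSurrogate(CP: Cardinal): Word;
begin
  Result := Word($DC00 + ((CP - $10000) and $3FF));
end;

var
  U: UnicodeString;
begin
  WriteLn(hexStr(LongInt(HighSurrogate($1F030)), 4), ' ',
          hexStr(LongInt(LowSurrogate($1F030)), 4));  // D83C DC30
  // The domino tile needs two UnicodeChar elements, so no single
  // UnicodeChar value can be equal to it.
  SetLength(U, 2);
  U[1] := WideChar(HighSurrogate($1F030));
  U[2] := WideChar(LowSurrogate($1F030));
  WriteLn(Length(U));  // 2
end.
```

So a comparison of one UnicodeChar element against this character can only ever see half of it.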
 http://www.alanwood.net/unicode/domino-tiles.html
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

malcome

  • Jr. Member
  • Posts: 81
« Reply #48 on: November 03, 2015, 01:39:46 am »
« Last Edit: November 03, 2015, 01:43:29 am by malcome »

mse

  • Sr. Member
  • Posts: 286
« Reply #49 on: November 03, 2015, 07:37:27 am »
Quote
Nope, FPC 2.6.4 did not throw an error here when compiling a simple test program with sample code as you showed, and a comparison text character in the Unicode SMP range. It also returned the wrong result at runtime - as expected.
Hmm, it probably changes "character = character" to "string = string" then (not tested).
That probably can be avoided by:
Code: Pascal
  if thestring[n] = UnicodeChar('<theUnicodeCharConstant>') then begin
    // ...
  end;
(not tested).
Anyway, German and Russian pupils (and some other people too) will be happier with UTF-16 than UTF-8; that's all I wanted to point out. ;-)
« Last Edit: November 03, 2015, 07:51:56 am by mse »

mse

  • Sr. Member
  • ****
  • Posts: 286
« Reply #50 on: November 03, 2015, 07:47:54 am »
Quote
That assumes normalization-denormalization is cycle-stable (as in s = denorm(norm(s))). I don't know if it is, since such conversions are usually only done just before rendering a string on screen, not during processing.
AFAIK there are two canonical forms defined, but I don't know the details.
Another question in this regard:
You write that umlauts on Mac are decomposed. How does Free Pascal handle string constants on Mac? Does it decompose them in order to make the sources portable?

Michl

  • Full Member
  • Posts: 226
« Reply #51 on: November 03, 2015, 07:52:40 am »
Code: Pascal
  if thestring[n] = UnicodeChar('Ä') then begin
    // ...
  end;
But you only find the precomposed BMP characters, not the characters built with Combining Diacritical Marks. With UTF-8 you have to understand this very early; with UTF-16 it works in 99% of cases, and then... :o

How many "ö" are in these strings? Without normalisation I find only one, both with UTF-8 and with UTF-16 (I've tested):
Code: Pascal
  const
    s1 = 'Flöhe';
    s2 = 'Flöhe';
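Since the forum display makes both constants look identical, here is a sketch with the bytes written out explicitly, assuming one constant was typed precomposed (U+00F6) and the other decomposed ('o' followed by U+0308 COMBINING DIAERESIS). Without normalisation, a plain comparison treats them as different strings:

```pascal
program PrecomposedVsDecomposed;

{$mode objfpc}{$H+}

const
  // "Flöhe" with precomposed U+00F6 (UTF-8 bytes C3 B6)
  Precomposed = 'Fl' + #$C3#$B6 + 'he';
  // "Flöhe" with 'o' + U+0308 COMBINING DIAERESIS (UTF-8 bytes CC 88)
  Decomposed = 'Flo' + #$CC#$88 + 'he';

begin
  WriteLn(Length(Precomposed));  // 6 bytes
  WriteLn(Length(Decomposed));   // 7 bytes
  // Both render as "Flöhe", but the byte-wise comparison without
  // normalisation reports them as different.
  WriteLn(Precomposed = Decomposed);
end.
```

The same happens in UTF-16: the precomposed form is 5 code units and the decomposed form 6, so the mismatch does not depend on the encoding.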
Code:
type
  TLiveSelection = (lsMoney, lsChilds, lsTime);
  TLive = Array[0..1] of TLiveSelection;

mse

  • Sr. Member
  • Posts: 286
« Reply #52 on: November 03, 2015, 08:05:30 am »
Quote
But you only find the precomposed BMP characters, not the characters built with Combining Diacritical Marks. With UTF-8 you have to understand this very early; with UTF-16 it works in 99% of cases, and then... :o
The decomposed/precomposed problem is independent of the encoding; most of the Combining Diacritical Marks are in the BMP too.
Anyway, I just thought that Lazarus people might be interested in the experience of MSEide+MSEgui, which has used 16-bit characters from the start.
I do not want to persuade anybody. :-)

Michl

  • Full Member
  • Posts: 226
« Reply #53 on: November 03, 2015, 08:19:44 am »
Quote
The decomposed/precomposed problem is independent of the encoding; most of the Combining Diacritical Marks are in the BMP too.
And I think that once you have understood how to check for this, there is no big difference between using UTF-8 and UTF-16. Just my two cents.

Graeme

  • Hero Member
  • Posts: 1428
« Reply #54 on: November 03, 2015, 10:43:27 am »
Quote
Hmm, it probably changes "character = character" to "string = string" then (not tested).
No idea, but it is a serious silent error that the developer would not know about, and it would be extremely hard to debug weeks after it was written. It is these pitfalls that I don't like about UTF-16 (in terms of Object Pascal). Many seem to treat UTF-16 text as if it were UCS-2, and completely forget (or ignore) the Unicode SMP range. That is not "convenience" but a huge programmer error.
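To make the UCS-2 pitfall concrete: Length on a UnicodeString counts UTF-16 code units, not characters, so every SMP character counts twice. A minimal sketch of counting code points instead; CodePointCount and SampleWithDomino are hypothetical names, and unpaired surrogates are simply counted as one unit each:

```pascal
program CountCodePoints;

{$mode objfpc}{$H+}

// Counts code points in a UTF-16 string: a high surrogate immediately
// followed by a low surrogate is one (SMP) character; any other unit,
// including an unpaired surrogate, is counted on its own.
function CodePointCount(const U: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(U) do
  begin
    // Short-circuit evaluation ({$B-}, the default) keeps U[I + 1] in range.
    if (Ord(U[I]) >= $D800) and (Ord(U[I]) <= $DBFF) and (I < Length(U)) and
       (Ord(U[I + 1]) >= $DC00) and (Ord(U[I + 1]) <= $DFFF) then
      Inc(I, 2)  // surrogate pair
    else
      Inc(I);    // single code unit
    Inc(Result);
  end;
end;

// 'A' followed by the domino tile U+1F030 (surrogate pair D83C DC30).
function SampleWithDomino: UnicodeString;
begin
  SetLength(Result, 3);
  Result[1] := 'A';
  Result[2] := WideChar($D83C);
  Result[3] := WideChar($DC30);
end;

begin
  WriteLn(Length(SampleWithDomino));          // 3 UTF-16 code units
  WriteLn(CodePointCount(SampleWithDomino));  // 2 characters
end.
```

Code written with the UCS-2 assumption uses Length and U[i] directly and silently gets the first number.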

Quote
That probably can be avoided by...
And there you lose the "convenience" of treating UTF-16 like AnsiString or UCS-2. You have to know about the issue (FPC not raising an error), which apparently nobody knows about, and then modify the code to suit (an untested solution).

Instead, the following UTF-8 code (as it would be written in fpGUI) works 100% with the full Unicode range, both BMP and SMP, with no hidden issues. When read by a human, the code's functionality is obvious, and it can easily be used inside an iteration loop.
Code: Pascal
  procedure MainProc;
  var
    s: TfpgString;
    c: TfpgChar;
  begin
    s := 'Hello World Ä🀰 o😋eo';
    // returns the 13th character of s, here 'Ä'
    c := fpgCharAt(s, 13);
    if c = 'Ä' then
      writeln('True')
    else
      writeln('False');
  end;
(tested and produces the correct results even if I use the Domino Tile or Emoji characters).

And if I had to write code that iterates over characters in a UTF-8 encoded string, I would use a dedicated UTF-8 string iterator instead. This will be more optimised than the above code (keeping track of byte offsets, etc.), and will correctly handle the full Unicode range without issues. An example of such iterator usage is:

Code: Pascal
  var
    itr: ICharIterator; // interface reference
    s: TfpgString;
    c: TfpgChar;
  begin
    s := 'Hello World Ä🀰 o😋eo';
    itr := gIteratorFactory.CharIterator(s);
    while itr.HasNext do
    begin
      c := itr.Next;  // next full character, whatever its byte length
      // do something with c
    end;
  end;
The iterator interface supports HasNext, Next, HasPrevious, Previous, etc.
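Since fpGUI's ICharIterator and gIteratorFactory are not reproduced in this thread, here is a self-contained sketch of the same idea: a forward-only UTF-8 character iterator that keeps track of the byte offset internally. TUtf8Iterator and CountChars are hypothetical names, not fpGUI's API; the sketch assumes well-formed UTF-8 and FPC 3.x in objfpc mode.

```pascal
program Utf8IteratorDemo;

{$mode objfpc}{$H+}

type
  // Hypothetical forward-only iterator; not fpGUI's ICharIterator.
  TUtf8Iterator = class
  private
    FText: RawByteString;
    FPos: Integer;  // byte offset of the next character (1-based)
  public
    constructor Create(const AText: RawByteString);
    function HasNext: Boolean;
    function Next: RawByteString;  // next character as its UTF-8 bytes
  end;

constructor TUtf8Iterator.Create(const AText: RawByteString);
begin
  inherited Create;
  FText := AText;
  FPos := 1;
end;

function TUtf8Iterator.HasNext: Boolean;
begin
  Result := FPos <= Length(FText);
end;

function TUtf8Iterator.Next: RawByteString;
var
  B: Byte;
  Len: Integer;
begin
  // The lead byte tells how many bytes this character occupies.
  B := Ord(FText[FPos]);
  if B < $80 then Len := 1
  else if (B and $E0) = $C0 then Len := 2
  else if (B and $F0) = $E0 then Len := 3
  else if (B and $F8) = $F0 then Len := 4
  else Len := 1;  // invalid lead byte: step over it
  Result := Copy(FText, FPos, Len);
  Inc(FPos, Len);
end;

// Counts characters (code points) by draining the iterator.
function CountChars(const S: RawByteString): Integer;
var
  Itr: TUtf8Iterator;
begin
  Result := 0;
  Itr := TUtf8Iterator.Create(S);
  try
    while Itr.HasNext do
    begin
      Itr.Next;
      Inc(Result);
    end;
  finally
    Itr.Free;
  end;
end;

begin
  // 'a', 'Ä' (C3 84) and the domino tile U+1F030 (F0 9F 80 B0): 7 bytes
  WriteLn(CountChars('a' + #$C3#$84 + #$F0#$9F#$80#$B0));  // 3
end.
```

Each call to Next continues from the stored byte offset, so a full pass is O(N) overall instead of O(N) per character as with repeated CharAt-style lookups.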


NOTE: This forum software changed the string constants in my code examples to escape sequences. They were meant to display normal text strings containing Unicode characters.
« Last Edit: November 03, 2015, 10:45:43 am by Graeme »

JuhaManninen

  • Global Moderator
  • Hero Member
  • Posts: 4565
  • I like bugs.
« Reply #55 on: November 03, 2015, 01:27:00 pm »
The iterator is nice. It would help with any encoding.

Now I would like to point out that the LCL will have Delphi-compatible UTF-16 support when the time is right. It may even be easy to implement once the other libraries are ready!
The LCL itself does not analyze text beyond the ASCII range. It only calls functions provided by the RTL and the LazUtils package. All Unicode-related stuff is already pretty well encapsulated.

Things look good:

- The new UTF-8 system works amazingly well. For many applications it is already source-compatible with Delphi. It is much less of a hack than the currently used UTF8...() functions.

- Strings with the default system codepage on Windows with FPC 3.0 can be used in a clean way after all, thanks to Michl's work. In that case, interaction with the LCL must go through the dedicated UTF8...() functions, but the code can use all features of FPC 3.0, which is a big benefit.

- Delphi-compatible UTF-16 support will come later.

The discussion here is not about "should the LCL implement UTF-16?"; it can be seen as an academic debate instead.

Discussions often turn to problems, even small ones.
The biggest problem with Unicode is still that one glyph can be represented in two alternative ways, using one or two code points. No encoding or programming library can solve it, because it is a built-in oddity of Unicode.
Why did they do that? A typical committee decision...

I agree with the subject "Stop wasting time on FPC 2.6 and Laz 1.4!".
I look forward to the next releases.
« Last Edit: November 03, 2015, 03:12:01 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

Graeme

  • Hero Member
  • Posts: 1428
« Reply #56 on: November 03, 2015, 01:58:25 pm »
Quote
The iterator is nice. It would help with any encoding.
Indeed. The most important part for me is that it also hides the implementation details of what you are iterating over, so the usage stays the same whether you iterate over characters in a string or objects in a TObjectList. I have written many such iterators and registered them with the iterator factory class for easy use. I use them more and more in my own projects, for example instead of a FOR loop.

For a full discussion of the Iterator design pattern, its benefits, and an implementation example (even combining regex to filter results), see an article I wrote a couple of years back, "2008-11: Iterator Pattern". The PDF and source code are available for download.

  http://geldenhuys.co.uk/articles/


tr_escape

  • Sr. Member
  • Posts: 434
  • sector name toys | respect to spectre
« Reply #57 on: November 03, 2015, 02:01:48 pm »
Hello,

I upgraded from Delphi 6 to Delphi XE3 about three years ago, and all my old projects had to change some AnsiString operations.
This was because the normal string variable type and some old components, such as COM port components, were not compatible.

Actually, I am just wondering: of course one day we will all leave FPC 2.6 for a newer FPC, but what will happen to our old code?

Thanks a lot.


minesadorada

  • Sr. Member
  • Posts: 452
  • Retired
« Reply #59 on: November 03, 2015, 07:13:14 pm »
This is all a bit scary for folk that simply want to code apps that are multinational.

Ideally, all string manipulation code would be compatible with English, Arabic, Chinese, Hebrew, etc. character sets, yet reading these threads suggests that multinational applications will have lots of localisation problems unless special code and compiler options are used.

Is this true with the latest versions of FPC and the Lazarus LCL?
GPL Apps: Health Monitor, Retro Ski Run
OnlinePackageManager Components: LazAutoUpdate, LongTimer, PoweredBy, ScrollText, PlaySound, CryptINI

 
