
Author Topic: Stop wasting time on FPC 2.6 and Laz 1.4!  (Read 49882 times)

stocki

  • Full Member
  • ***
  • Posts: 144
Stop wasting time on FPC 2.6 and Laz 1.4!
« on: October 31, 2015, 02:32:25 pm »
Stop wasting time on FPC 2.6 and Laz 1.4! These versions are full of bugs. Finally move to 3.1 and 1.5. FPC & Lazarus development is so slow; we have had this buggy 2.6 version for years. 3.1 has more than 1000 fixes. Why lose time with 2.6?

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4554
  • I like bugs.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #1 on: October 31, 2015, 03:13:10 pm »
Stop wasting time on FPC 2.6 and Laz 1.4! These versions are full of bugs. Finally move to 3.1 and 1.5. FPC & Lazarus development is so slow; we have had this buggy 2.6 version for years. 3.1 has more than 1000 fixes. Why lose time with 2.6?

I would now recommend FPC 3.0RC2 + Lazarus trunk. The next Lazarus release will also use FPC 3.0.
Lazarus has gained some cool new features, and the format of its configuration files has changed. There was a period of instability, but now things look good. Trunk is again fine for any experienced user, although not recommended for production work.
And, did I mention that trunk has some new cool features? :)
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

stocki

  • Full Member
  • ***
  • Posts: 144
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #2 on: October 31, 2015, 03:54:03 pm »
FPC/Lazarus still suffers from the string nightmare. We should finally move to Unicode strings (UTF-16). UTF-8 is for special cases only.

mischi

  • Full Member
  • ***
  • Posts: 178
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #3 on: October 31, 2015, 04:07:12 pm »
Here comes the fish:    ><}}}°>

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4554
  • I like bugs.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #4 on: October 31, 2015, 05:10:40 pm »
FPC/Lazarus still suffers from the string nightmare. We should finally move to Unicode strings (UTF-16). UTF-8 is for special cases only.

Would you please stop that nonsense. Both UTF-16 and UTF-8 are Unicode. If you don't believe me, you can search the Internet; it has plenty of information.
If somebody truly wants to speed up the Delphi-compatible UTF-16 solution, he should help implement it.
Learn what is missing from FPC and Lazarus, then provide patches. Until you do that, you can use UTF-8 for your "special cases".

rtusrghsdfhsfdhsdfhsfdhs

  • Full Member
  • ***
  • Posts: 162
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #5 on: October 31, 2015, 11:27:55 pm »
FPC/Lazarus still suffers from the string nightmare. We should finally move to Unicode strings (UTF-16). UTF-8 is for special cases only.

Would you please stop that nonsense. Both UTF-16 and UTF-8 are Unicode. If you don't believe me, you can search the Internet; it has plenty of information.
If somebody truly wants to speed up the Delphi-compatible UTF-16 solution, he should help implement it.
Learn what is missing from FPC and Lazarus, then provide patches. Until you do that, you can use UTF-8 for your "special cases".

UTF-8 is only for the web/IO. It is slow to parse, but we all know that, right? :)
Let me remind you that character indexing of UTF-8 strings is O(n): the longer the string, the slower the parse.

By the way, does FPC support UTF-16 strings yet?
« Last Edit: October 31, 2015, 11:31:06 pm by Fiji »

BeniBela

  • Hero Member
  • *****
  • Posts: 921
    • homepage
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #6 on: October 31, 2015, 11:45:52 pm »
Let me remind you that character indexing of UTF-8 strings is O(n): the longer the string, the slower the parse.


Indexing in UTF-16 strings is also O(n).

rtusrghsdfhsfdhsdfhsfdhs

  • Full Member
  • ***
  • Posts: 162
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #7 on: November 01, 2015, 12:15:25 am »
Indexing in UTF-16 is constant time... why does Qt use UTF-16 again?  8-)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4554
  • I like bugs.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #8 on: November 01, 2015, 01:07:44 am »
UTF-8 is only for the web/IO.

Wrong. UTF-8 is used, for example, by the LCL and by Android.

Quote
It is slow to parse, but we all know that, right? :)
Let me remind you that character indexing of UTF-8 strings is O(n): the longer the string, the slower the parse.

Wrong. Often in real code you can use a byte index instead of a codepoint index; the special properties of UTF-8 make that possible.
We have even collected some examples about it in the wiki.
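A minimal sketch of that byte-index point (illustrative only, not code from the wiki): UTF-8 lead and continuation bytes occupy distinct ranges, so when the plain byte-oriented Pos() finds a needle that is itself valid UTF-8, the match always starts on a character boundary. The strings below are built from explicit bytes to avoid source-encoding surprises.

```pascal
program Utf8BytePosDemo;
{$mode objfpc}{$H+}

var
  Haystack, Needle: AnsiString;
begin
  // "päivä" built as explicit UTF-8 bytes; 'ä' is the sequence C3 A4.
  Haystack := 'p' + #$C3#$A4 + 'iv' + #$C3#$A4;
  Needle   := 'iv' + #$C3#$A4;   // a valid UTF-8 substring
  // Pos works on bytes, yet a match of a valid UTF-8 needle can only
  // start on a character boundary - no false match inside an 'ä'.
  WriteLn(Pos(Needle, Haystack)); // byte position 4
end.
```

The result is a byte position, which is exactly what most string routines need downstream.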

Quote
Indexing in UTF-16 is constant time...

Are you saying UTF-16 is a fixed-width encoding? No it is not. You should get your facts right.
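The point can be sketched in a few lines. `Utf16CodePointStart` below is a hypothetical helper written for illustration (it is not an FPC or LazUtils routine): because surrogate pairs occupy two code units, finding the Nth code point in a UTF-16 string still requires a linear scan, i.e. O(n).

```pascal
program Utf16IndexDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the 1-based code-unit index where the
// Nth (0-based) code point starts. It must walk the string from the
// beginning, skipping surrogate pairs - hence O(n), not O(1).
function Utf16CodePointStart(const S: UnicodeString; N: Integer): Integer;
var
  i: Integer;
begin
  i := 1;
  while (N > 0) and (i <= Length(S)) do
  begin
    // Lead surrogates U+D800..U+DBFF start a two-unit pair.
    if (Word(S[i]) >= $D800) and (Word(S[i]) <= $DBFF) then
      Inc(i, 2)
    else
      Inc(i);
    Dec(N);
  end;
  Result := i;
end;

var
  S: UnicodeString;
begin
  // 'a', then U+1D11E (a musical symbol, stored as a surrogate pair), then 'b'
  S := 'a' + WideChar($D834) + WideChar($DD1E) + 'b';
  WriteLn(Utf16CodePointStart(S, 2)); // 'b' is code point 2, at code unit 4
end.
```

Code that simply used `S[N]` here would land in the middle of the surrogate pair.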

Quote
why does Qt use UTF-16 again?

Ask them. Maybe they followed Windows, which was the dominant system then.

Quote
By the way, does FPC support UTF-16 strings yet?

Yes; the misleading name "UnicodeString" means a UTF-16 string. I think you knew that.
Are you angry at the FPC and Lazarus developers for not providing a fully Delphi-compatible UTF-16 system yet?
As I wrote, the process can be sped up by helping to implement it.
« Last Edit: November 01, 2015, 01:10:53 am by JuhaManninen »

rtusrghsdfhsfdhsdfhsfdhs

  • Full Member
  • ***
  • Posts: 162
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #9 on: November 01, 2015, 03:49:58 am »
Look, it's slower. Provide a benchmark that shows it's faster. Until then my facts stand.

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #10 on: November 01, 2015, 06:07:58 am »
Look, it's slower. Provide a benchmark that shows it's faster. Until then my facts stand.
What facts? Any benchmark we can test?
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Graeme

  • Hero Member
  • *****
  • Posts: 1428
    • Graeme on the web
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #11 on: November 01, 2015, 12:51:42 pm »
Quote
Indexing in UTF-16 is constant time...

Are you saying UTF-16 is a fixed-width encoding? No it is not. You should get your facts right.

Yup, Fiji seems to like spreading FUD.

I've seen no definitive benchmarks proving that UTF-8 is slower to parse than UTF-16. And when I refer to UTF-16, make sure it supports both UTF-16LE and UTF-16BE and codepoints above the BMP (surrogate pairs) - only then is UTF-16 equal to UTF-8 for a true comparison.
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4554
  • I like bugs.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #12 on: November 01, 2015, 12:58:39 pm »
Look, it's slower. Provide a benchmark that shows it's faster. Until then my facts stand.

Benchmarks would be interesting, but they should present valid bug-free code, not the sloppy and buggy kind you are proposing.

Let's repeat the facts. Unicode currently defines over 110,000 codepoints.
According to a BabelStone page
  http://babelstone.blogspot.fi/2005/11/how-many-unicode-characters-are-there.html
there are 120,737 characters, but I guess that includes multi-codepoint accented characters; the number of codepoints is a little lower. (Where can I find the exact number of codepoints?)
The graphs there show (up to 2014) how the count keeps growing.

One 16-bit word in UTF-16 can directly hold 2^16 = 65536 unique values. That means some ~50,000 codepoints don't fit in one 16-bit word and must be encoded using surrogate pairs (32 bits).
Code that ignores those codepoints is broken. Sure, those are rare codepoints and the bugs will pop up only sometimes. It is still not acceptable. If you have a mathematical algorithm that gives wrong results "sometimes", it is considered broken and must be fixed. Why should code for character encodings be different?
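The surrogate-pair arithmetic the paragraph above refers to can be shown in a few lines. This is a sketch of the standard UTF-16 encoding algorithm, not FPC library code:

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Split a code point above U+FFFF into its UTF-16 surrogate pair.
procedure EncodeSurrogates(CodePoint: Cardinal; out Hi, Lo: Word);
var
  V: Cardinal;
begin
  V  := CodePoint - $10000;       // 20 significant bits remain
  Hi := $D800 or (V shr 10);      // top 10 bits -> lead surrogate
  Lo := $DC00 or (V and $3FF);    // low 10 bits -> trail surrogate
end;

var
  Hi, Lo: Word;
begin
  EncodeSurrogates($1F600, Hi, Lo);  // U+1F600, an emoji outside the BMP
  WriteLn(IntToHex(Hi, 4), ' ', IntToHex(Lo, 4)); // D83D DE00
end.
```

Any code that treats those two 16-bit words as two separate "characters" is exactly the kind of broken code described above.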

Fiji, you keep repeating this false information, basically claiming that UTF-16 is fixed width.
  http://forum.lazarus.freepascal.org/index.php/topic,28660.msg179684.html#msg179684
Unfortunately you are not the only one doing so. Let's see what has caused this misconception ...

Delphi switched to UTF-16 strings in 2009. It was a big change. Obviously customers asked troublesome questions like "How compatible is it?" and "How much conversion work must we do?"
The marketing team, being creative, decided to say "Yeah, yeah, it is compatible. No worries!" instead of explaining technical details about surrogate pairs or multi-codepoint characters.
They were mostly worried about their sales.

Part of the same marketing tactics was to name the new string type UnicodeString. For the sake of symmetry it should have been UTF16String, because there is also UTF8String. It is apparently so confusing that "stocki" still, in late 2015, believes that Unicode = UTF-16, despite all the information available on the net.

UTF-16 surrogate pairs don't have the same inherent properties that UTF-8 multi-byte codepoints have. If you use the fast Pos() etc. functions, UTF-16 can sometimes go wrong; with UTF-8 it always goes right. Thus UTF-8 is faster in real-world applications when used cleverly.

UTF-16 was invented a long time ago. Back then there were fewer than 65536 codepoints and it had a fixed width. That one big benefit was lost when the number of defined codepoints grew. UTF-16 also has other issues, like the CPU endianness dependency.
Yet, I am not against it. A Delphi-compatible Unicode system must be implemented. Backwards compatibility must always deal with technical decisions made in the past. No problem.

Unicode is complex. We just have to accept it. Now the discussion is only about codepoints, but even if you can find codepoints with 100% accuracy, you still don't know whether one is part of a multi-codepoint accented character.
No encoding can solve that, not even UTF-32, because such characters are defined at the Unicode character level instead of the encoding level.
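That last point can be demonstrated without any library help. The counter below is a small illustrative routine (not a LazUtils function): two spellings of the same visible accented character give different code point counts, so counting codepoints alone never tells you where a "character" ends.

```pascal
program CombiningDemo;
{$mode objfpc}{$H+}

// Illustrative code point counter: UTF-8 continuation bytes match the
// binary pattern 10xxxxxx; every other byte starts a new code point.
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Precomposed, Decomposed: AnsiString;
begin
  Precomposed := #$C3#$A9;        // U+00E9, 'é' as one code point
  Decomposed  := 'e' + #$CC#$81;  // U+0065 + U+0301 (combining acute accent)
  // The same accented character on screen, different code point counts:
  WriteLn(Utf8CodePointCount(Precomposed)); // 1
  WriteLn(Utf8CodePointCount(Decomposed));  // 2
end.
```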
« Last Edit: November 01, 2015, 01:33:01 pm by JuhaManninen »

Thaddy

  • Hero Member
  • *****
  • Posts: 16580
  • Kallstadt seems a good place to evict Trump to.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #13 on: November 01, 2015, 03:12:26 pm »
Let me remind you that character indexing of UTF-8 strings is O(n): the longer the string, the slower the parse.

By the way, does FPC support UTF-16 strings yet?

@Fiji:
Finish your studies first.

I assume you have the look-up algorithm in mind? (Anything else is bull.)
1. It's not O(n), it is O(log n). If you are writing compilers or any code with O(n) UTF-8 support, let me notify you: you will get fired. And above all, it is in the specification. (But you never read documentation, let alone specifications, I know.)
2. FPC supports ALL Unicode modes (the compiler, that is, not the libraries). Lazarus doesn't.
3. Please, please refrain from writing anything on any of the forums pretending to know, when almost everybody who means something has to put you right time and time again. I am fed up with this.

And to stay on the subject:
I more or less agree with Stocki, but hardly use Lazarus.

[edit]
One of the solutions to achieve O(log n) is to apply a 4-byte mask. But frankly, I am wasting my time.
« Last Edit: November 01, 2015, 03:49:33 pm by Thaddy »
But I am sure they don't want the Trumps back...

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4554
  • I like bugs.
Re: Stop wasting time on FPC 2.6 and Laz 1.4!
« Reply #14 on: November 01, 2015, 04:47:02 pm »
Let me remind you that character indexing of UTF-8 strings is O(n): the longer the string, the slower the parse.

I assume you have the look-up algorithm in mind? (Anything else is bull.)
1. It's not O(n), it is O(log n). If you are writing compilers or any code with O(n) UTF-8 support, let me notify you: you will get fired. And above all, it is in the specification. (But you never read documentation, let alone specifications, I know.)

No, it is O(n) if you must use UTF-8 codepoint positions instead of byte positions, at least when using the functions provided by LazUtils.
For example, to access the Nth UTF-8 character you call:
Code: Pascal
  NthChar := UTF8Copy(MyString, N, 1);
which internally calls UTF8CharStart; that function loops over the string with no nested loops, thus linear O(n).
Code: Pascal
function UTF8CharStart(UTF8Str: PChar; Len, CharIndex: PtrInt): PChar;
var
  CharLen: LongInt;
begin
  Result := UTF8Str;
  if Result <> nil then begin
    while (CharIndex > 0) and (Len > 0) do begin
      CharLen := UTF8CharacterLength(Result);
      dec(Len, CharLen);
      dec(CharIndex);
      inc(Result, CharLen);
    end;
    if (CharIndex <> 0) or (Len < 0) then
      Result := nil;
  end;
end;
Indexing into good old AnsiStrings is constant, O(1).
Code: Pascal
  ch := MyString[N];
The misconception was that indexing a UTF-16 string is also constant, O(1).
It must be corrected because it appears to be so common.

Typically O(log n) can be reached only with sorted data of some kind, like a binary tree or a sorted list.
You mentioned "writing compilers". Typical compilers and parsers don't need to care about Unicode because they work in the ASCII range, so they magically continue to work when switched to a Unicode encoding. This applies to both UTF-16 and UTF-8.
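A small sketch of why that works (illustrative code, not from any parser): bytes below $80 never appear inside a UTF-8 multi-byte sequence, so a byte-by-byte scan for ASCII delimiters can never split a character.

```pascal
program AsciiScanDemo;
{$mode objfpc}{$H+}

// Count ASCII commas byte by byte. This is safe on UTF-8 input because
// lead and continuation bytes are always >= $80 and can never equal ','.
function CountCommas(const S: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if S[i] = ',' then
      Inc(Result);
end;

var
  Line: AnsiString;
begin
  // UTF-8 data containing 'ä' (bytes C3 A4) mixed with comma delimiters.
  Line := 'p' + #$C3#$A4 + ',iv,' + #$C3#$A4;
  WriteLn(CountCommas(Line)); // 2
end.
```

The same scan over UTF-16 data would need widened character types, which is part of why ASCII-era parsers port to UTF-8 so painlessly.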

Quote
2. FPC supports ALL Unicode modes (the compiler, that is, not the libraries). Lazarus doesn't.

Yes, but the Delphi-compatible UTF-16 solution will come. This discussion about encodings does not affect it.
Also, I see no reason to oppose our UTF-8 solution: it does not take anything away from anybody, yet it solves many problems.
« Last Edit: November 01, 2015, 05:04:21 pm by JuhaManninen »

 
