Lazarus and FPC 3.0

JuhaManninen

Global Moderator
Hero Member
Posts: 4474
I like bugs.

« on: January 11, 2015, 03:06:47 pm »

I am moving the discussion from "Lazarus 1.2.6 Release" to this new thread.
FYI: The next release, Lazarus 1.4, will use FPC 2.6.4. I understand many people have questions about when the new FPC 3.x will be used.
The following major release, Lazarus 2.0, will surely use FPC 3.x. A Lazarus 1.4.x minor release could in theory switch to it as well, although I don't see much use for that. Unicode-related changes happen in Lazarus trunk and will not be backported, and Unicode is the biggest thing in FPC 3.x after all.
BTW, I have no idea when Lazarus 2.0 will happen. Someday in the future ...

Quote from: taazz on January 11, 2015, 11:32:15 am

2 questions off the top of my head
1) CodeTyphon seems to be able to use FPC 3.1.1 along with the LCL, or so it seems from a quick glance. Aren't they willing to share their test cases and results with you, or something along those lines, to shorten the FPC 3.0 integration effort?

Usually CodeTyphon is not willing to share anything. On the other hand, they have nothing to share in this case because Lazarus already compiles with FPC 3.0 or 3.1.1 and works pretty well. CodeTyphon has not done anything to improve the Unicode support which will be our next challenge.

Quote

2) Are we going to see major changes in the LCL for the new Unicode support? E.g. will the LCL support ucs16 (or whatever the default encoding for Windows is) and UTF8 for Linux, to keep the translations between the LCL and the system encoding to a minimum (I would go for 0 translations)? Or will it be kept as UTF8 and auto-translated to whatever the Unicode Windows API expects?

Zero translation in encoding would be very difficult to implement.
If nothing unexpected comes up, Lazarus will continue to use UTF8 while taking advantage of the new features of FPC. See:
http://lists.lazarus.freepascal.org/pipermail/lazarus/2014-November/089394.html
If you want to know more, please ask on the Lazarus mailing list, because Mattias follows it and he knows the topic best.

« Last Edit: January 11, 2015, 04:56:28 pm by JuhaManninen »

Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

JuhaManninen

« Reply #1 on: January 11, 2015, 03:12:54 pm »

Quote from: JZS on January 11, 2015, 02:48:36 pm

What is the plan for supporting Right-To-Left languages?

There is no plan because none of the core developers know the issue well enough.
We would need a person who understands both Right-To-Left issues and LCL well enough. In practice it must be a person who natively uses a Right-To-Left language.
If such a person (maybe you) steps forward, other developers will be happy to share their knowledge about LCL.

That is how open source development works.

JZS

Full Member
Posts: 194

« Reply #2 on: January 11, 2015, 03:41:15 pm »

Quote from: JuhaManninen on January 11, 2015, 03:12:54 pm

There is no plan because none of the core developers know the issue well enough.
We would need a person who understands both Right-To-Left issues and LCL well enough. In practice it must be a person who natively uses a Right-To-Left language.
If such a person (maybe you) steps forward, other developers will be happy to share their knowledge about LCL.

That is how open source development works.

I am willing to assist. I am more than willing actually, and really very happy to contribute to this community. But I am not sure whether my knowledge of LCL is enough to start.

I use recent stable release

JuhaManninen

« Reply #3 on: January 11, 2015, 04:24:21 pm »

Quote from: JZS on January 11, 2015, 03:41:15 pm

I am willing to assist. I am more than willing actually, and really very happy to contribute to this community. But I am not sure whether my knowledge of LCL is enough to start.

If you are serious about it, you must study the LCL architecture and code. The wiki may have some info, but diving into the code will be needed very soon.
When you don't understand something although you have tried, please ask on the Lazarus mailing list. Developers follow it more than this forum. I am not an expert with LCL, so it is better to ask other people for details.
One challenge with LCL is the division between the LCL code itself and the widgetset binding code. I believe that much of the Right-To-Left code must go into the bindings, because all major widgetsets support it.

serbod

Full Member
Posts: 142

« Reply #4 on: January 11, 2015, 07:17:28 pm »

Will string-based RTL functions (such as StrReplace(), Format(), Trim(), UpperCase(), etc.) now correctly handle characters with variable byte length?

And will functions like Pos(), Length() and Copy() be byte-oriented or character-oriented? Is the lazutf8 unit not needed anymore?

JuhaManninen

« Reply #5 on: January 11, 2015, 08:31:13 pm »

Quote from: serbod on January 11, 2015, 07:17:28 pm

Will string-based RTL functions (such as StrReplace(), Format(), Trim(), UpperCase(), etc.) now correctly handle characters with variable byte length?
And will functions like Pos(), Length() and Copy() be byte-oriented or character-oriented? Is the lazutf8 unit not needed anymore?

I don't have answers to your questions; maybe nobody has yet. In the new FPC the codepage can be set to UTF8 by adding
SetMultiByteConversionCodePage(CP_UTF8);
SetMultiByteRTLFileSystemCodePage(CP_UTF8);
to the initialization section of an early unit. It means the character encoding in FPC is not fixed to UTF16 after all.
Mattias has experimented with it; see the mailing list thread mentioned earlier.
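
As a rough illustration of the two calls above, here is a minimal sketch written as a stand-alone program so it can be compiled on its own. The program name is made up for the example. In a real application the two calls would sit in the initialization section of a unit that is listed early in the uses clause, as described above. It assumes FPC 3.0 or later, where both procedures and the DefaultSystemCodePage variable are declared in the System unit.

```pascal
program CodePageDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}

begin
  // Code page the RTL assumes for plain AnsiString before the switch.
  WriteLn('Before: ', DefaultSystemCodePage);

  // The two calls from the post. In an application they belong in the
  // initialization section of a unit that is initialized early.
  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

  // SetMultiByteConversionCodePage updates DefaultSystemCodePage,
  // so it should now report the UTF-8 code page.
  WriteLn('After:  ', DefaultSystemCodePage);
  WriteLn('Is UTF-8: ', DefaultSystemCodePage = CP_UTF8);
end.
```

The calls only change which code page the RTL assumes for strings without an explicit code page. They do not convert string data that already exists.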

valdir.marcos

Hero Member
Posts: 1106

« Reply #6 on: January 11, 2015, 10:36:33 pm »

Quote from: JuhaManninen on January 11, 2015, 08:31:13 pm

I don't have answers for your questions, maybe nobody has yet.
...

After reading the whole thread:
http://lists.lazarus.freepascal.org/pipermail/lazarus/2014-November/089394.html

Using Lazarus 1.3 and FPC 3.0.1, both development versions, should I start making tests on Unicode, UTF16 and UTF8?
Should I report problems here or on the Bug Tracker (Mantis)?
Is Mattias' wiki page about FPC Unicode ready? If so, where?

http://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
http://superuser.com/questions/221593/windows-7-utf-8-and-unicode
http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072(v=vs.85).aspx
http://msdn.microsoft.com/pt-br/library/windows/desktop/dd317756(v=vs.85).aspx

serbod

« Reply #7 on: January 11, 2015, 11:32:06 pm »

Quote

see the mailing list thread mentioned earlier.

Yes, I read it. This one says the same thing I want to say:

http://lists.lazarus.freepascal.org/pipermail/lazarus/2014-November/089555.html

And some suggestions of mine:

Char is a character, and String is an array of characters. Long ago, characters were bytes, so strings were arrays of bytes.

But now, Char is not a byte. It is a short array of bytes (ShortString, maybe?). String is an array of bytes, like before, but must be treated as an array of characters when used as a container of characters.

Code:

var
  SomeByte: Byte;
  SomeChar: Char;
  SomeString: String;
begin
  SomeString:='абракадабра'; // Russian 2-byte letters in utf8
  SomeChar:=SomeString[2]; // 'б', second character
  SomeByte:=SomeString[2]; // $B0, second byte
end;

SomeChar must be 'б', the second letter, not the second byte. And the same goes for every function that works with characters. That is what I mean by Unicode support.

For single-byte characters there are AnsiChar and AnsiString, same as in Delphi.

« Last Edit: January 12, 2015, 12:07:22 am by serbod »

JuhaManninen

« Reply #8 on: January 12, 2015, 12:17:11 am »

Quote from: valdir.marcos on January 11, 2015, 10:36:33 pm

Using Lazarus 1.3 and FPC 3.0.1, both development versions,

Lazarus trunk is now 1.5 and FPC trunk is 3.1.1. FPC 3.0.1 is the release branch.

Quote

should I start making tests on Unicode, UTF16 and UTF8?

Yes, if you want.

Quote

Should I report problems here or on the Bug Tracker (Mantis)?

Not in the bug tracker, for sure. Only some experiments have been done; everybody knows it does not work yet.
If you do serious testing which can lead to code contributions, then the Lazarus mailing list would be the best place, as the right people follow it.

Quote

Is Mattias' wiki page about FPC Unicode ready? If so, where?

I don't think so. He has not worked on this issue recently either.
Now that 1.4 has been forked, trunk is open for experimental changes.

JuhaManninen

« Reply #9 on: January 12, 2015, 12:50:07 am »

Quote from: serbod on January 11, 2015, 11:32:06 pm

Yes, I read it. This one says the same thing I want to say:
http://lists.lazarus.freepascal.org/pipermail/lazarus/2014-November/089555.html

That is DoDi partly hijacking yet another thread where the magic word "Unicode" was mentioned.
Remember, the thread was originally about Mattias asking for help with testing the UTF8 RTL on Windows.

Quote

And some suggestions of mine:
Char is a character, and String is an array of characters. Long ago, characters were bytes, so strings were arrays of bytes.
... and so on ...

Please let's not start a discussion about the differences between UTF8 and UTF16 here. Also, let's not discuss how Embarcadero should have implemented Unicode, or whether FPC should be 100% compatible with Delphi.
Those issues have been discussed again and again on the FPC mailing lists during the past 5 years. If a mail mentioned the word "Unicode", certain people joined and started repeating the same arguments again and again and again. A single mail thread typically lasted many months and had hundreds of mails.
Also, the FPC devels wasted lots of energy explaining the same things repeatedly in those mail threads!

If Unicode in Lazarus must be discussed, let's keep it as concrete as possible. The requirements for participants should be:
1. Know how UTF8 and UTF16 work.
2. Know how Unicode and string types are implemented in new FPC.
3. Know how UTF8 support is implemented in current Lazarus.
4. Have read enough of FPC mail Unicode thread history to understand how bad it was.

The information is out there for those who have gaps in their knowledge.
Still, for serious discussion based on test results I recommend the Lazarus mailing list, although the same rules should apply there too.

« Last Edit: January 12, 2015, 12:57:11 am by JuhaManninen »

serbod

« Reply #10 on: January 12, 2015, 01:58:38 am »

Quote

Please let's not start a discussion about the differences between UTF8 and UTF16 here.

No, I don't.

All that I want from this subject is that the standard functions work as described in the help and manuals. For example:

Copy() - Copy returns a string which is a copy of the Count characters in S, starting at position Index.

Code:

s:=Copy('южный', 2, 2);

It is obvious that it should take two characters, starting from the second character. And if it returns '躰', then that should be considered a bug.

JZS

« Reply #11 on: January 12, 2015, 11:44:45 am »

Quote from: JuhaManninen on January 11, 2015, 04:24:21 pm

If you are serious about it, you must study the LCL architecture and code. The wiki may have some info, but diving into the code will be needed very soon.
When you don't understand something although you have tried, please ask on the Lazarus mailing list. Developers follow it more than this forum. I am not an expert with LCL, so it is better to ask other people for details.
One challenge with LCL is the division between the LCL code itself and the widgetset binding code. I believe that much of the Right-To-Left code must go into the bindings, because all major widgetsets support it.

I will do my best. Let's see how it goes. Thank you Juha.

Ocye

Hero Member
Posts: 518

« Reply #12 on: January 12, 2015, 01:20:04 pm »

Quote from: JZS on January 11, 2015, 03:41:15 pm

Quote from: JuhaManninen on January 11, 2015, 03:12:54 pm

Right-To-Left language...
If such a person (maybe you) steps forward, other developers will be happy to share their knowledge about LCL.

I am willing to assist.

You could also look for posts from Avishai, or just contact him. He might still be very interested in the RTL (Right-To-Left) topic.

Lazarus 1.7 (SVN) FPC 3.0.0

JuhaManninen

« Reply #13 on: January 12, 2015, 05:33:32 pm »

Quote from: Ocye on January 12, 2015, 01:20:04 pm

You could also look for posts from Avishai, or just contact him. He might still be very interested in the RTL (Right-To-Left) topic.

He is interested, but his knowledge is not enough to implement it. He may become even more frustrated if you ask him questions about the implementation.
He has not written anything for a while. Earlier he mentioned some health problems. I hope he is still OK.

JuhaManninen

« Reply #14 on: January 12, 2015, 08:08:37 pm »

First, I admit it was stupid to put up rules for not discussing Unicode. I am apparently allergic to the topic after reading the FPC lists for 5 years.
Yes, I still recommend that anybody study the topic, because it is unbelievably complex. However, talking about Lazarus and the new FPC brings up some obvious new questions. Unicode is the main feature of FPC 3.0 after all.

Quote from: serbod on January 12, 2015, 01:58:38 am

All that I want from this subject is that the standard functions work as described in the help and manuals. For example:
Copy() - Copy returns a string which is a copy of the Count characters in S, starting at position Index.
Code:
s:=Copy('южный', 2, 2);
It is obvious that it should take two characters, starting from the second character. And if it returns '躰', then that should be considered a bug.

There will be such a function for sure. Let's see if the name is Copy() also for UTF8.
However, working with variable-width Unicode strings is always more complex than working with fixed-width AnsiStrings.
When you have to iterate over characters, you must use a string to hold a single character.
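
To make the byte versus character difference concrete, here is a small sketch that uses the existing UTF8Length and UTF8Copy functions from the LazUTF8 unit (part of the LazUtils package, so the compiler needs that unit path). It only shows how today's LCL helpers behave. It is not a statement about what Copy() will do in a future release. The test string is written as explicit bytes, so that the result does not depend on the encoding of the source file.

```pascal
program Utf8CopyDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
uses
  LazUTF8; // from the LazUtils package that ships with Lazarus
var
  S, Part, Ch: string;
  I: PtrInt;
begin
  // UTF-8 bytes of the five-letter word from the example above.
  S := #$D1#$8E#$D0#$B6#$D0#$BD#$D1#$8B#$D0#$B9;

  WriteLn(Length(S));       // 10: Length counts bytes
  WriteLn(UTF8Length(S));   // 5: UTF8Length counts code points

  // Two code points starting at the second one (the second and third letters).
  Part := UTF8Copy(S, 2, 2);
  WriteLn(Length(Part));    // 4: each of these letters takes two bytes

  // A single "character" is held in a string, because it may occupy
  // more than one byte.
  for I := 1 to UTF8Length(S) do
  begin
    Ch := UTF8Copy(S, I, 1);
    WriteLn(I, ': ', Length(Ch), ' byte(s)');
  end;
end.
```

Calling UTF8Copy inside a loop rescans the string from the start every time, so this is fine for a demonstration but not for long strings.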

UnicodeString in Delphi and FPC has the same problem. UTF16 is NOT fixed width, although it is easy to get this impression when looking at code samples. Old code from the AnsiString versions of Delphi will work most of the time, but not always. There are almost 100k Unicode characters, but 16 bits can address only 64k of them. It means ~30000 characters require 2 words (of type UnicodeChar). Those may be rare characters, but they will eventually cause a bug in code that does not take them into account.
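
A short sketch of the "2 words" case: the code point U+1F600 is picked here only as an example of a character outside the 16-bit range. The program splits it into its UTF-16 surrogate pair with the standard arithmetic and shows that Length() on a UnicodeString counts 16-bit units, not characters.

```pascal
program SurrogateDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
const
  CodePoint = $1F600; // example code point above $FFFF
var
  Hi, Lo: Word;
  U: UnicodeString;
begin
  // Standard UTF-16 encoding of a code point above $FFFF:
  // subtract $10000, then split the remaining 20 bits into two halves.
  Hi := $D800 + ((CodePoint - $10000) shr 10);
  Lo := $DC00 + ((CodePoint - $10000) and $3FF);
  WriteLn(HexStr(Hi, 4), ' ', HexStr(Lo, 4)); // D83D DE00

  U := WideChar(Hi);
  U := U + WideChar(Lo);

  // One character for the user, but two 16-bit elements in the string.
  WriteLn(Length(U)); // 2

  // U[1] alone is only half of the character (a high surrogate).
  WriteLn((Ord(U[1]) >= $D800) and (Ord(U[1]) <= $DBFF)); // TRUE
end.
```

Code that indexes U[1] and treats the result as a complete character works for most text and breaks on input like this.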

Which encoding should Lazarus use? There seemed to be 2 alternatives: the Delphi-compatible UnicodeString, or AnsiString plus the UTF8-specific functions in LCL.
Then it turned out that FPC and its libs can use UTF8 by simply setting some variables. FPC is well designed; 5 years of arguing on the mailing list was not wasted after all.
Lazarus + LCL is already designed for UTF8, and this allows a conversion with the fewest changes. (This assumes nothing unexpected comes up in tests.)

It will still be possible to create a version of Lazarus + LCL with UTF16 UnicodeString if somebody wants to implement it.

UTF8 is a very clever encoding. It is backwards compatible with ASCII, it produces compact data for western languages (ok, I don't know what Chinese people think of it), and its integrity can be analyzed from the data itself.
The original benefit of UTF16 was its fixed-width characters, but that is not true any more. So its main benefit went away.
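
The "integrity can be analyzed from the data itself" point can be sketched as follows. Every byte in UTF8 announces its role in its top bits, so a scanner can tell lead bytes from continuation bytes without any outside information. The function names are made up for this example, and the check is deliberately simplified: it verifies only the lead/continuation structure and does not reject overlong forms or surrogate code points, so it is not a complete validator.

```pascal
program Utf8CheckDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}

// Sequence length implied by a lead byte, or 0 if B cannot start
// a UTF-8 sequence (it is a continuation byte or an invalid value).
function SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1                  // 0xxxxxxx: ASCII
  else if (B and $E0) = $C0 then Result := 2   // 110xxxxx
  else if (B and $F0) = $E0 then Result := 3   // 1110xxxx
  else if (B and $F8) = $F0 then Result := 4   // 11110xxx
  else Result := 0;
end;

// Structural check only: every lead byte must be followed by the
// announced number of continuation bytes (10xxxxxx).
function LooksLikeUtf8(const S: RawByteString): Boolean;
var
  I, N, K: Integer;
begin
  Result := False;
  I := 1;
  while I <= Length(S) do
  begin
    N := SeqLen(Ord(S[I]));
    if (N = 0) or (I + N - 1 > Length(S)) then Exit;
    for K := 1 to N - 1 do
      if (Ord(S[I + K]) and $C0) <> $80 then Exit;
    Inc(I, N);
  end;
  Result := True;
end;

begin
  WriteLn(LooksLikeUtf8('abc'));             // TRUE: plain ASCII is valid UTF-8
  WriteLn(LooksLikeUtf8(#$D1#$8E#$D0#$B6));  // TRUE: two 2-byte sequences
  WriteLn(LooksLikeUtf8(#$8E#$D0));          // FALSE: starts in mid-sequence
end.
```

The last test uses bytes 2 and 3 of the 'южный' example, which is what a byte-oriented Copy(S, 2, 2) would return. The data alone is enough to detect that the cut fell inside a character.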

The good news is that user code only seldom needs to iterate over single variable-width Unicode characters, because such things are encapsulated in libraries, and because often the characters of interest are in the ASCII range. For example, many current parsers work well with UTF8 data because all <tag> chars are ASCII. Data between tags can be UTF8, but typically the parser just copies it without analysis.
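
A tiny sketch of the parser argument, using only the byte-oriented Pos(), Length() and Copy() from the RTL. The tag name and the data are invented for the example. In UTF8, bytes below $80 never occur inside a multi-byte sequence, so searching for ASCII delimiters cannot land in the middle of a character.

```pascal
program TagScanDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
var
  S, Inner: string;
  P1, P2: SizeInt;
begin
  // An ASCII tag pair around two Cyrillic letters, given as UTF-8 bytes.
  S := '<b>' + #$D1#$8E#$D0#$B6 + '</b>';

  // Byte-oriented search is safe here: '<' and '>' are ASCII, and ASCII
  // byte values never appear inside a multi-byte UTF-8 sequence.
  P1 := Pos('<b>', S) + Length('<b>');
  P2 := Pos('</b>', S);

  // The payload is copied as opaque bytes, without decoding it.
  Inner := Copy(S, P1, P2 - P1);
  WriteLn(Length(Inner)); // 4: two letters of two bytes each
end.
```

The parser never needs to know how many characters the payload holds. It only needs the byte offsets of the ASCII delimiters.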

Now I have discussed the differences between UTF8 and UTF16 myself, here we go...
We can write about concrete Lazarus implementation details on the mailing list when somebody has done tests.

« Last Edit: January 12, 2015, 08:23:08 pm by JuhaManninen »
