
Author Topic: What is UTF-8 Application  (Read 25523 times)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12599
  • FPC developer.
Re: What is UTF-8 Application
« Reply #15 on: February 03, 2015, 03:34:39 pm »
I lean toward utf16 on *nix too, but the preference is less pronounced there

What should that mean?

Precisely what I mean. I think Delphi compatibility on *nix should be possible, but it doesn't need to be the first choice.

For Windows I think it should be the first choice. There is no reason to put a third alternative (besides ansi and (UCS2/UTF16) unicode) there, except for emulating *nix environments.

Quote
Just write it: You won't support UTF-8; everything should become UTF-16 for Delphi compatibility.

I think supporting utf-8 on systems where it is the system encoding isn't that bad, since it doesn't require additional code. It is the emulation of that on Windows that is my main gripe.

Still I would like a Delphi compatible solution on *nix too, I never have hidden that.
But I don't care if I have to recompile FPC+Lazarus for it and that it is not the primary choice.
« Last Edit: February 03, 2015, 03:49:59 pm by marcov »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12599
  • FPC developer.
Re: What is UTF-8 Application
« Reply #16 on: February 06, 2015, 11:35:10 pm »
Yes. But that is all temporary till classes is unicodestring. And we won't adapt FPC code to that scheme.

Ok, this plan may be the real reason to oppose UTF-8 so much.

It always was the default plan, simply because Delphi does it that way. (And I never saw a formal statement giving up Delphi compatibility; people always took potshots at it (half of mode ObjFPC is that kind of language hobbying), but throwing it formally out of the window is something else.)

There was long talk about doing both encodings, but I simply gave up because people talked about upscaling the (fairly simple) overloading solution for the RTL to bigger codebases, and things like virtual methods using string parameters can't be solved that way (and many other things).

There wasn't any support for solutions that try to compile one codebase for two different default types (IOW a Delphi-compatible FPC and a separate D2009 one), and if I have to choose one, I'll pick the modern Delphi one.

E.g. my unfinished patches for TProcess are all unicodestring based for that reason.

I don't mind Lazarus doing the dirty UTF8 trick too if they need to (though I hope that is temporary till there is a real solution). But I draw the line at adapting non-LCL code for this hack. The whole idea of the FPC 3.0 RTL is to reduce the number of Lazarus hacks, not increase it.

Quote
I did not realize there is such a strong confrontation between the "camps", especially as FPC provided the functions helping with the default encoding.

There are no camps. But there is no consensus either.

I'm voicing my opinion, combined with stating the fact that the Lazarus UTF8 solution is a Lazarus-only choice (and IMHO a by-default one, because it is a small step from the old solution and anything new is not ready, rather than a carefully considered one).

Anyway, assuming that FPC shares that vision is wrong. At least in my case.

Quote
I have a personal interest in SW that will need UTF-8 all over for various reasons. I have no interest in opposing other solutions.

As said above, multiple solutions will be hard and at the very least require different releases. I'm getting the feeling that talking about multiple solutions is more about not making a choice (till the status quo is inescapable).

Anyway, requiring FPC code changes that are specific to the Lazarus UTF8 choices is equal to opposing other solutions.

Quote
First I thought UTF-8 must be done the old way, with AnsiString + UTF...() functions. Then I learned the RTL could be mapped to UTF-8, and it worked better than I had hoped. As I wrote earlier here, "It is almost too good to be true!" and yes, it was too good to be true...

I already explained this in Croatia that this hack was limited. The details were fleshed out one month before at the FPC summit in Bingen when Jonas more or less explained what he had cooked up.

Quote
I promise to work towards the UTF-16 solution later, but first I need the UTF-8. In the worst case it is the old AnsiString + UTF...() functions, but then I am a little disappointed. We were so close to getting this working:
  http://wiki.freepascal.org/Better_LCL_Unicode_Support

IMHO we are not any closer. It is just that Lazarus doesn't need to implement their hacks for procedural interfaces, but as said already, that solution doesn't scale. FPC 3 only moves the highly platform-dependent code in the utf8 aliases for system and sysutils back to FPC, for any encoding (not just utf8).

The choice to make utf8 the default stringtype is (and always has been) Lazarus, and Lazarus alone.

Quote
I also try to keep this as pragmatic as possible; there have been enough "camp" fights during the past 5 years.
So, I am here to find a working UTF-8 solution, not to fight against other solutions.

I think there is only one encoding agnostic solution, and that is changing the base string type with an ifdef and having multiple RTLs (depending on base string type).

I've nothing against utf8, but it is IMHO an irrelevant choice for Windows.

Quote
Still, the functions for changing encoding should be removed if their usage is forbidden.

No, they are not forbidden. But they are there to get you out of a compatibility issue, not to base a whole framework like Lazarus/LCL on. Since whichever you take, you change the default state of FPC in a way that was not intended (because if it were intended, it would already be the default).

Quote
1. The solution we made is amazingly Delphi compatible. The Ansi...() string functions work, and so do the ASCII functions. Even Pos() and Copy() are compatible in most cases. In Delphi they are used because people treat UTF-16 as a fixed-width encoding; with UTF-8 they work most often because of the special properties of this encoding.
When looking at some real Delphi code, there are very few things to change.

(That goes for any encoding, if you only pump strings of that same encoding around.)

And all packages need Lazarus-specific adaptation. (Old Delphi, new Delphi OR Lazarus; Lazarus is not one of the two Delphi choices.)
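The "special properties" referred to above boil down to UTF-8 being self-synchronizing: every byte of a multi-byte sequence is >= 0x80, so a byte-wise search for an ASCII substring (which is what Pos() does on an AnsiString) can never match inside a multi-byte character. A minimal illustration, in Python for neutrality, with byte strings standing in for AnsiString:

```python
# UTF-8 is self-synchronizing: no byte of a multi-byte sequence falls
# in the ASCII range, so byte-wise Pos()/Copy() on ASCII delimiters
# cannot split a character.
s = "aäb".encode("utf-8")          # b'a\xc3\xa4b': 'ä' occupies two bytes
print(s.find(b"b") + 1)            # 4: the 1-based byte position Pos() would report
print(all(b >= 0x80 for b in "ä".encode("utf-8")))  # True: both bytes are non-ASCII
```

This is why splitting on ',' or '/' keeps working unchanged, while anything that counts or slices "characters" by index does not.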

Quote
2. 100% Delphi compatibility is not always a blessing; it can be a curse. Typical Delphi code still assumes a character is a fixed-width 16 bits. Tutorials and examples feed that same wrong idea, for example an article from Nick Hodges:

Tutorials are meant to be simple. And the "one char is 16 bits" assumption holds a whole lot longer than the "one char is 8 bits" assumption. I don't see anything wrong there. First-order things work.

Quote
I know codepoints needing 2 UnicodeChars are rare in the West, but maybe the application is marketed to China some day, and then the code breaks: Copy() will get half a codepoint.

It depends. IIRC most of Chinese is in the BMP; I was told the only somewhat common characters outside it were the ones used in news headlines (titles, more or less).

Quote
UTF-8 code must be done right always when dealing with individual codepoints.

So must utf16.
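The half-codepoint hazard being traded back and forth here can be shown concretely. A sketch in Python (its UTF-16 codec standing in for a UnicodeString's code units):

```python
# U+20000, a CJK Extension B ideograph, lies outside the BMP and is
# stored in UTF-16 as a surrogate pair: two 16-bit code units.
ch = "\U00020000"
units = ch.encode("utf-16-le")
print(len(units) // 2)                 # 2 code units for this 1 codepoint
# A length-based Copy(s, 1, 1) on such a string would return only the
# first unit - a lone high surrogate, i.e. half a codepoint:
first = int.from_bytes(units[:2], "little")
print(0xD800 <= first <= 0xDBFF)       # True: value is in the high-surrogate range
```

So both encodings are variable-width in the general case; UTF-16 merely hits the multi-unit case less often in Western text.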

Quote
3. I have done cross-platform code that reads an XML file, parses it and does something with the data. This all using UTF-8 encoding of LCL.

But it is utf16 in the FPC fcl-xml case. We know Lazarus likes to do utf8, but that is because most major committers are non-Windows-centric.

Moreover, both XML and wire protocols can (often) have different encodings. You need a dynamic-encoding solution anyway.

Quote
This code is not specific to Unix or any other operating system. So, I honestly don't understand your sentence:
  "no utf8 usage in code on Windows except for ported Unix software".

Because this is about data, and not about code. APIs are not UTF-8. Existing Delphi codebases don't use Utf8.

Quote
Anyway, I will take what is given from FPC team. I understand there are camps inside the team which complicates the issue.

Basically, most of the FPC team seems to think the issues will go away if they wait it out. Which IMHO only means we will be stuck with whatever is the path of least resistance short term, not necessarily what is best for the long term. It seems people are even willing to sacrifice compatibility to maintain the status quo.

But Delphi compatibility is very important for me (and that means code moving back and forth; not the "convert once" principle, but dual maintenance with minimal ifdefs, and preferably none at all). Most of my code is in mode fpc or delphi; I never use objfpc.

Without it, a lot of my motivation for FPC work goes away.

« Last Edit: February 08, 2015, 01:52:28 pm by marcov »

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4660
  • I like bugs.
Re: What is UTF-8 Application
« Reply #17 on: February 08, 2015, 10:29:02 am »
I added a "Future" section to the wiki page:
  http://wiki.freepascal.org/Better_LCL_Unicode_Support#Future
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: What is UTF-8 Application
« Reply #18 on: February 08, 2015, 08:49:58 pm »
I think not. First, I think utf8 on Windows is not sane. When Mattias first came with that idea, I agreed, but only as transitional functionality till the LCL can truly dual-compile between unicodestring and utf8string/string.

That's ridiculous. There is no problem in having utf8 on Windows. It's natural that if you want to write cross-platform code you will need to use one encoding everywhere, and naturally it will fit the system encoding on some systems and not on others. Your argument is absurd. Considering only the encodings and the set of existing operating systems (and the fact that we want cross-platform), there is no favorite: both solutions are equivalent, and any choice is arbitrary.

Now you, personally, want UTF-16, and I'm not exactly sure what for. To compile the VCL with Free Pascal?

We, Lazarus users, want UTF-8, because we have already used it for ages, our codebases are already prepared for it, and changing brings zero advantages, only bugs (as every large rewrite brings bugs) and headaches.

My feeling is that given a Lazarus UTF-8 and a Lazarus UTF-16, the Lazarus UTF-16 version will see very low usage. Let's see who could use it:

1> If you already use Lazarus, it's worse for you: you need to convert your project with no gain at all.
2> If you are new to Lazarus and Pascal and start creating a new project, it is indifferent: both solutions are equivalent, only less tested (if most people stick to UTF-8) and with fewer libraries available.
3> If you want to convert a Delphi UTF-16 project, now there is an advantage. But let's be honest: Delphi is in ever worse shape. It's not like there are thousands of new projects being started in Delphi UTF-16. Do we actually know anyone dying to convert his project to Lazarus, who would only do it if we supported UTF-16? I don't know anyone, and I don't see how this scenario would make sense at all.

Now Marco doesn't want 2 Lazarus versions: UTF-8 and UTF-16. He proposes only 1 Lazarus, only UTF-16. And everyone is forced to migrate. We piss off every existing user with the migration.

So basically UTF-16 means: we piss off our existing users, and we gain near-zero new users. That's a textbook error that companies make just before dying, so I don't see why we should follow this known wrong path.

Free Pascal could support both UTF-8 and UTF-16. It was already proven that it can work like that, and we would give a present to lots of users in that they won't need to adapt their code. So why not? Why are you so determined to make us rewrite our codebase?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12599
  • FPC developer.
Re: What is UTF-8 Application
« Reply #19 on: February 08, 2015, 09:59:11 pm »
I think not. First, I think utf8 on Windows is not sane. When Mattias first came with that idea, I agreed, but only as transitional functionality till the LCL can truly dual-compile between unicodestring and utf8string/string.

That's ridiculous.

No, it is not.

Quote
There is no problem in having utf8 on Windows,

As clearly shown, you must do a conversion to call nearly /any/ external system (except some mingw-ported crapola).

And there is of course the source-code incompatibility with the only closely related language.
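The conversion being argued about is the UTF-8 <-> UTF-16 widening at every W-API boundary, roughly what FPC's UTF8Decode/UTF8Encode do. A neutral sketch in Python:

```python
# A UTF-8 payload must be widened to UTF-16 before a W-suffixed
# Windows API call, and narrowed again on the way back.
utf8 = "Grüße".encode("utf-8")                   # 7 bytes: ü and ß take two each
wide = utf8.decode("utf-8").encode("utf-16-le")  # roughly what UTF8Decode produces
print(len(utf8))                                 # 7
print(len(wide) // 2)                            # 5 UTF-16 code units
# ...pass the wide form to the API, then convert any result back:
back = wide.decode("utf-16-le").encode("utf-8")
print(back == utf8)                              # True: the round trip is lossless
```

The dispute is not whether this works (it is lossless for valid text) but who writes it: the application, the LCL, or the RTL.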

Quote
It's natural that if you want to write cross-platform code you will need to use one encoding everywhere

That is not necessarily natural. A more high-level system would abstract the native encoding. Pascal suffers from a legacy of character access there, but to a degree that can't be helped without damaging its strengths.

Quote
Your argument is absurd. Considering only the encodings and the set of existing operating systems (and the fact that we want cross-platform), there is no favorite: both solutions are equivalent, and any choice is arbitrary.

So then why the deep need to deviate from Delphi compatibility?

Quote
Now you, personally wants UTF-16 I'm not exactly what for. To compile the VCL with Free Pascal?

No. To improve large scale code sharing from and back between Lazarus and Delphi and to lessen the dual maintenance penance on coders.

Quote
We, Lazarus users want UTF-8, because we already use it since ages, our codebases are already prepared to it and changing brings zero advantages, only bugs (as in every large rewrite brings bugs) and headaches.

Most Windows code will have to be redone anyway to move from a manual to an automated system. On *nix nothing changes, since unicodestring<->ansistring was 1:1 back and forth anyway, utf8 being the default encoding. So only the few places where pointers are passed to the system have to be patched (and since most of those are procedural, a utf8 and a utf16 application codebase will look nearly the same, if not 100% the same).

I think it is more the fact that many of the proponents are *nix-centric and want to force their issue no matter what. Windows doesn't matter, Delphi doesn't matter, because we don't use them. Bad attitude.

Quote
My feeling is that given a Lazarus UTF-8 and a Lazarus UTF-16, the Lazarus UTF-16 version will see very low usage. Let's see who could use it:

I think the Delphi compatibility will be a main feature, and except for a few *nix refuseniks, most Windows users will use it. Just check the main forum index and count the number of threads that are about porting Delphi code.

Quote
1> If you already use Lazarus, it's worse for you: you need to convert your project with no gain at all.

If you haven't invested majorly in specific unicode compatibility on Windows, it is mostly painless, and it will actually make low-level Windows work easier. If you have invested heavily in the current utf8 situation, you will have to fix manual hacks anyway, since currently most Lazarus Windows programs use both native and utf8 in the same string type.

Moreover, with the Lazarus interpretation of the FPC 3.0 model, there isn't even a string type representing the Windows codepage, since that one is eaten by the hack.

Quote
2> If you are new to Lazarus and Pascal and start creating a new project, it is indifferent: both solutions are equivalent, only less tested (if most people stick to UTF-8) and with fewer libraries available.

Yes. Having to patch up every piece of code on the net to Lazarus conventions is equivalent. Don't make me laugh :-)

Quote
3> If you want to convert a Delphi UTF-16 project, now there is an advantage. But let's be honest: Delphi is in ever worse shape. It's not like there are thousands of new projects being started in Delphi UTF-16. Do we actually know anyone dying to convert his project to Lazarus, who would only do it if we supported UTF-16? I don't know anyone, and I don't see how this scenario would make sense at all.

It also goes for old Delphi projects with any form of encoding awareness, since Lazarus is also old-Delphi incompatible. But most importantly, it would spare component builders (both free (think e.g. Zeos, Indy) and non-free) from having to support three differing encoding systems:

1. old delphi
2. new delphi
3. lazarus

FOREVER

(And then I'm already being kind and forgetting incarnations of Lazarus in pre- and post-FPC 3 flavor.)

Quote
Now Marco doesn't want 2 Lazarus versions: UTF-8 and UTF-16. He proposes only 1 Lazarus, only UTF-16.

You are deliberately presenting the current hack-and-slash Lazarus, where any real-world Windows-supporting codebase is stuffed top to bottom with conversion routines, as a system that needs no changes. That is totally false.

Quote
And everyone is forced to migrate. We piss off every existing user with the migration.

Total nonsense. Both the current utf8 codebases suck, and the migration is not really that big (and actually advantageous in the long run for Windows users).


malcome

  • Jr. Member
  • **
  • Posts: 81
Re: What is UTF-8 Application
« Reply #20 on: February 09, 2015, 02:09:45 am »
The utf8 vs utf16 battle is meaningless.
This issue will only be decided by due process.
IMHO perfect new-Delphi compatibility is not smart,
because Embarcadero will not stay silent forever.
And it's also our loss if Delphi dies again.
IMHO Lazarus (and FPC) must keep the original way (=UTF8).
Coexistence is more necessary than compatibility.

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: What is UTF-8 Application
« Reply #21 on: February 09, 2015, 10:27:09 am »
As clearly shown, you must do a conversion to call nearly /any/ external system (except some mingw-ported crapola).

No problem for me. We have already done it like that for *ages*, and it works great. How many real-world projects exist where the UTF-8 <-> UTF-16 conversion in WinAPI calls was a big performance problem? Zero. Zip. Nada. It's unheard of. No one had this problem; it's completely imaginary.

Also: you do know that Win32 is dying anyway, right? We don't even have compiler support for WinRT yet, and if we ever support it, it would likely be with CustomDrawn, so not that many API calls would be needed, since AFAIK they don't provide a GUI framework in WinRT which is suitable for Lazarus. This makes the conversion speed problem even more imaginary.

Quote
That is not necessarily natural. A more highlevel system would abstract the native encoding. Pascal suffers from a legacy of character access there. But to a degree that can't be helped without damaging its strengths.

IMHO that solution sucks. It's even worse than migrating to UTF-16. I strongly prefer one encoding everywhere. I know frameworks that work like you propose, and they suck. For every string you need to first convert to a known encoding before doing any operation... then back again to the unknown system encoding... arrrgggg. So much code has to be written in user code, while it could be in the framework if the framework always presented one single encoding.

A framework that forces you to write more code is a bad framework.

Also, people that want to use the system encoding should use direct OS API calls instead of the Pascal RTL.

Quote
So then why the deep need to deviate from Delphi compatibility ?

I'm not deviating. They deviated. I just don't want to be forced to rewrite my code just because they don't care about compatibility with us.

Quote
No. To improve large scale code sharing from and back between Lazarus and Delphi and to lessen the dual maintenance penance on coders.

This argument is the only one that I can accept, but still, it's no barrier to having 2 solutions. Why not support both UTF-8 and UTF-16 in the RTL?

Then people can slowly think over the years if they want to migrate. If they don't want, they are not forced to. And everyone is happy. Why not?

Quote
I think it is more the fact that many of the proponents are *nix centrics and want to force their issue no matter what. Windows doesn't matter, Delphi doesn't matter, because we don't use it. Bad attitude.

It's false that we are unix-centric. I use Windows, Linux and Mac OS X equally.

Yes, I don't care that much about post-change Delphi, because I don't know anyone who uses the new Delphi, and job offers for it are very few. So it's not that I don't care for no good reason; I have not been presented with real-world evidence that supporting it is really the big advantage that you claim it to be.

Quote
If you haven't invested majorly in specific unicode compatibility on Windows

Most Lazarus users *have* invested heavily into our UTF-8 codebases.

Quote
If you have invested heavily in the current utf8 situation you will have to fix manual hacks anyway, since currently most lazarus Windows programs use both native and utf8 in the same string type.

I see no use for a string which contains an obsolete codepage. The A APIs are dead. On UNIX everyone uses UTF-8. There is no system where this matters.

If I do native API calls I won't use the managed type anyway, but array of WideChar, array of Char, PChar, PWideChar.

Quote
Moreover, with the Lazarus interpretation of the FPC 3.0 model, there isn't even a string type representing the Windows codepage, since that one is eaten by the hack.

The Windows codepage for A calls? Those calls have been obsolete for decades. I don't see the need for a managed string type for that.

People should use only W calls.

Quote
Yes. Having to patch up every piece of code on the net to Lazarus conventions is equivalent. Don't make me laugh :-)

Every piece of code on the net??? I don't remember having to patch every piece of code on the net to use Lazarus as it is.

People using Lazarus use the libraries in lazarus/components and in the Lazarus CCR. All of them have been UTF-8 ready for ages.

Which other random code around the web will you find?

Stack Overflow answers are usually in C++/Java, so they need conversion regardless.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12599
  • FPC developer.
Re: What is UTF-8 Application
« Reply #22 on: February 09, 2015, 12:01:11 pm »
As clearly shown, you must do a conversion to call nearly /any/ external system (except some mingw-ported crapola).

No problem for me. We have already done it like that for *ages*, and it works great. How many real-world projects exist where the UTF-8 <-> UTF-16 conversion in WinAPI calls was a big performance problem? Zero. Zip.

That wasn't what I meant. Inserting manual conversion code is what I meant.

Quote
Also: You do know that Win32 is dying anyway, right?

No, I didn't get the memo. Win32 has been declared dead almost as many times as the next year was going to be the year of "Linux on the Desktop" or the year of massive ipv6 adoption (read: every year since 1998 or so). The first still hasn't happened, and the second is only rolling out now, 15 years later, and very slowly.

And regardless of your opinion on that, there are two crucial issues with such "is dying" thinking:

- It turns out platforms that have received significant investments die out much, much slower than most people think. And win32 is *the* platform with the biggest investments, and special because of its extremely long-term stability.

- More importantly: such platforms usually outlive the alternatives. Many alternatives are proposed; few make it and have staying power. And by that I mean implementations, not names or ideas.

Switching early is nearly certain to pick a loser. (I started with .NET in the .NET 1 -> .NET 2 transition period, luckily without much .NET 1 legacy myself, but those that had it were not happy. Even more so for classic ASP and ASP.NET v1 investors.)

Don't switch before the writing is really on the wall, i.e. when the general programming populace is moving, not just because of press releases and luminaries' "visions" (which are usually funded by firms that hope for the extra sales).

Just like the 2003-2007 disaster stories that Microsoft would kill off native because of .NET, that even the kernel would become .NET, etc. etc. I'll believe it when I see it.

Quote
We don't even have compiler support for WinRT yet,

That's because the platform on the desktop is not really viable at the moment.

So far, I haven't seen a single programmer that I know who has changed to WinRT-on-the-desktop. People make apps in it for tablet and phone use like they do for Android: the bare minimum to get the job done, assuming most investments are void in a few years anyway.

Quote
and if we ever support it, it would likely be with CustomDrawn, so not that many API calls needed,

We agree on something at least! I see CustomDrawn as the widgetset of choice for non-viable platforms :-)

Quote
Quote
That is not necessarily natural. A more high-level system would abstract the native encoding. Pascal suffers from a legacy of character access there, but to a degree that can't be helped without damaging its strengths.

IMHO that solution sucks.

Well, I do think character-level access will diminish anyway. I don't think you can get ALL code encoding- or string-type-agnostic, but I think you can get very far. It might not be that bad, actually.

As said, my main worry is Delphi compatibility, and secondarily that I don't like the idea of non-native solutions of our own on Windows.

Quote
It's even worse than migrating to UTF-16. I strongly prefer one encoding everywhere. I know frameworks that work like you propose, and they suck. For every string you need to first convert to a known encoding before doing any operation... then back again to the unknown system encoding... arrrgggg.

Well, uh, no, since the idea was to pick the system encoding. DUH! You need conversions if you pick one encoding everywhere.

Quote
So much code has to be written in user code, while it could be in the framework if the framework always presented one single encoding.

I don't see it that way. Most interfaces to the outside world have dynamic (runtime) encodings in principle, like the web, XML (which declares its encoding, so conforming frameworks must have a dynamic string type) and most databases, so a fixed encoding is not as much help as many people think.

Quote
A framework that forces you to write more code is a bad framework.

Well, I don't follow your argumentation. I think it will be less.

Quote
Also, people that want to use the system encoding should use direct OS API calls instead of the Pascal RTL.

I've no idea what you mean here.

Quote
Quote
So then why the deep need to deviate from Delphi compatibility?

I'm not deviating. They deviated. I just don't want to be forced to rewrite my code just because they don't care about compatibility with us.

As said, you will have to fix it anyway. And as said, I'm not really convinced of the worth of the existing codebases (with their many encoding hacks) anyway.

Quote
Quote
No. To improve large scale code sharing from and back between Lazarus and Delphi and to lessen the dual maintenance penance on coders.

This argument is the only one that I can accept, but still, it's no barrier to having 2 solutions. Why not support both UTF-8 and UTF-16 in the RTL?

You can't support both in the RTL at the same time. It would mean different releases, and few want that. And it would require some effort to write encoding-agnostic code, and many people react like you.

Quote
Then people can slowly think over the years if they want to migrate. If they don't want, they are not forced to. And everyone is happy. Why not?

IMHO the best solution, but a lot of additional work. That's why it was nearly immediately vetoed, and then I chose the solution I needed most, which is Delphi compat.

Quote
Yes, I don't care that much about post-change Delphi, because I don't know anyone who uses the new Delphi, and job offers for it are very few. So it's not that I don't care for no good reason; I have not been presented with real-world evidence that supporting it is really the big advantage that you claim it to be.

The component-builder angle alone is indisputable. And I do care about Delphi; more importantly, I don't buy that the reuse of existing Lazarus code is so easy.

I'm really wondering why I have to argue that a solution that is not used in the Pascal world by anybody else is stupid.

Quote
Quote
If you haven't invested majorly in specific unicode compatibility on Windows

Most Lazarus users *have* invested heavily into our UTF-8 codebases.

As said, I don't buy that. Mostly initial code full of hacks.

Quote
Quote
If you have invested heavily in the current utf8 situation you will have to fix manual hacks anyway, since currently most lazarus Windows programs use both native and utf8 in the same string type.

I see no use for a string which contains an obsolete codepage.

Then start working on utf16 console support ;-)

They have been dead since 2003 with Windows XP. Still, the amount of new code out there is huge, and both FPC/Lazarus and Delphi codebases are still in transition.

Partially because people first try out hacks (like utf8) before conforming.

Quote
If I do native API calls I won't use the managed type anyway, but array of WideChar, array of Char, PChar, PWideChar.

You must be one of the few legacy-free people. Congratulations!

Quote
Quote
Yes. Having to patch up every piece of code on the net to Lazarus conventions is equivalent. Don't make me laugh :-)

Every code in the net??? I don't remember having to patch every code in the net to use Lazarus as it is.

As already said, Lazarus uses a convention no Delphi version uses. So by definition you have to fix encoding issues.

Quote
People using Lazarus use the libraries in lazarus/components and in the Lazarus CCR. All of them have been UTF-8 ready for ages.

If you are going to be ridiculous, I see no further benefit in continuing this discussion. Half of them haven't synced with a recent version in ages (like VST still being v4.x, and gettext being from the mid-2000s).

As for ported code, that is where code goes to rust and die. The only viable components are the ones that are unique to Lazarus.

Quote
Which other random code around the web you will find?

Just simple Delphi code. Like e.g. detecting available COM ports, or using a recent dxgettext, a recent TComPort, a recent VST.


felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: What is UTF-8 Application
« Reply #23 on: February 09, 2015, 01:54:48 pm »
That wasn't what I meant. Inserting manual conversion code is what I meant.

The manual conversion is all trapped inside the LCL, so users don't need to do it themselves. We offer a uniform UTF-8 API. The user needs no conversions at all in his own code.

Quote
Don't switch before the writing is really on the wall, i.e. when the general programming populace is moving, not just because of press releases and luminaries' "visions" (which are usually funded by firms that hope for the extra sales).

Just like the 2003-2007 disaster stories that Microsoft would kill off native because of .NET, that even the kernel would become .NET, etc. etc. I'll believe it when I see it.

I was never convinced by .NET, but there are real-world examples where WinRT is a must-have: I would like to put my software (True Democracy) in the Windows Store, but I can't, because it doesn't accept Win32 apps, only WinRT. Microsoft is clearly serious about win32 needing to die.

Of course it will take ages for it to die, but still, we need to prepare for the future in advance or become obsolete. If you can't make money hosting apps in the Windows Store with FPC, that's a big disadvantage. Stores are the future, and the Android store has been great for me. I'd like to explore other stores. I now have access to Android, iPhone and Mac OS X. Why not the Windows Store? Windows has the largest user base; it's the best store to be in. And being early is a big advantage.

I think you shouldn't cling too hard to old stuff. People who clung to Carbon were kicked hard by Apple and were forced to adapt. MS will do the same.

Quote
Well, uh, no, since the idea was to pick the system encoding. DUH! You need conversions if you pick one encoding everywhere.

You don't seem to understand me. What I mean is that, using an API that offers UTF-8 everywhere, you can:

line 1> Get string from framework (for example from TMemo)
line 2> Do string operations on the string (anything: Pos, iterating through chars, searching for a substring, lowercasing, whatever! And not only ready-made operations, but also whatever you imagine, including, yes, char-by-char access)
line 3> Put string back into the framework (for example into TMemo)

With opaque type:

line 1> Get string from framework
line 2> Get UTF-8 string from opaque string
line 3> Do my operations
line 4> Convert again from UTF-8 to opaque string
line 5> Put back to framework

A lot worse IMHO
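The two workflows above can be made concrete with a small sketch in Lazarus-style Pascal (a sketch only: `UTF8LowerCase`, `SysToUTF8` and `UTF8ToSys` are helpers from the LazUTF8 unit, and the TMemo handling is illustrative, not taken from this thread):

```pascal
uses StdCtrls, LazUTF8; // LazUTF8 provides the UTF-8 helper routines

// Uniform UTF-8 API: the string from the framework is usable directly.
procedure LowercaseMemo(Memo: TMemo);
var
  S: string;
begin
  S := Memo.Text;             // line 1: get string from framework
  S := UTF8LowerCase(S);      // line 2: do the string operation directly
  Memo.Text := S;             // line 3: put it back
end;

// Opaque system-encoded type: two extra conversion steps wrap the work.
procedure LowercaseMemoOpaque(Memo: TMemo);
var
  S: string;
begin
  S := SysToUTF8(Memo.Text);  // lines 1-2: get string, convert to UTF-8
  S := UTF8LowerCase(S);      // line 3: do my operations
  Memo.Text := UTF8ToSys(S);  // lines 4-5: convert back, return to framework
end;
```

The second variant pays two conversions per round trip, which is the extra burden being debated.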

Quote
As said, I don't buy that. Mostly initial code full of hacks.

You are kidding, right? There are many companies deploying Lazarus-based software in production and with significant sales revenue.

Quote
If you are going to be ridiculous, I see no further benefit in continuing this discussion.  Half of them haven't synced with a recent version in ages (like VST still being v4.x, gettext from the mid-2000s).

Ported code is where code goes to rust and die. The only viable components are the ones that are unique to Lazarus.

I don't know what VST is. I don't use gettext either.

I agree that components are an issue, but I disagree that it is worth pissing off our existing user base. I think of them first. Delphi users are a very distant concern.

Quote
Just simple delphi code. Like e.g. detecting available com ports. Using a recent dxgettext, a recent tcomport, a recent vst.

When I search stackoverflow, the instances when a Delphi answer appears are ... well, that's so rare I can't remember the last time.

Quote
IMHO the best solution but a lot of additional work. That's why it was nearly immediately vetoed, and then I chose the solution I needed most, which is delphi compat.

I'm amazed that you say that you agree this would be the best solution.

So why not try it? If you don't try how can you be so sure it will be so much work?

And it would stop the endless discussions.

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: What is UTF-8 Application
« Reply #24 on: February 09, 2015, 02:14:49 pm »
That wasn't what I meant. Inserting manual conversion code is what I meant.

The manual conversion is all trapped inside the LCL, so users don't need to do it themselves. We offer a uniform UTF-8 API. The user in its own code needs no conversions at all.

Quote
Don't switch before the writing is really on the wall (i.e. when the general programming populace is moving, not just because of press releases and luminaries' "visions", which are usually funded by firms that hope for the extra sales).

Just like the 2003-2007 disaster stories that Microsoft would kill off native because of .NET, that even the kernel would become .NET, etc. I'll believe it when I see it.

I was never convinced of .NET, but there are real-world examples where WinRT is a must-have: I would like to put my software (True Democracy) in the Windows Store. But I can't, because it doesn't accept Win32 apps, only WinRT. Microsoft is clearly serious about win32 needing to die here.

Of course it will take ages for it to die, but still we need to prepare for the future in advance or become obsolete. If you can't make money hosting apps in the Windows Store with FPC, that's a big disadvantage. Stores are the future, and the Android store has been great for me. I'd like to explore other stores. I now have access to Android, iPhone and Mac OS X. Why not the Windows store? Windows has the largest user base; it's the best store to be in. And being early is a big advantage.

I think you shouldn't cling too hard to old stuff. People that clung to Carbon were kicked hard by Apple and were forced to adapt. MS will do the same.

Quote
Well, uh, no, since the idea was to pick the system encoding. DUH! You need conversions if you pick one encoding everywhere.

You don't seem to understand me. What I mean is that, using an API that offers UTF-8 everywhere, you can:

line 1> Get string from framework (for example from TMemo)
line 2> Do string operations on the string (anything: Pos, iterating through chars, searching for a substring, lowercasing, whatever! And not only ready-made operations, but also whatever you imagine, including, yes, char-by-char access)
line 3> Put string back into the framework (for example into TMemo)

With opaque type:

line 1> Get string from framework
line 2> Get UTF-8 string from opaque string
line 3> Do my operations
line 4> Convert again from UTF-8 to opaque string
line 5> Put back to framework

A lot worse IMHO

Why, why would I keep the UTF-8 operations and not also convert them to the string type the framework requires?

Quote
As said, I don't buy that. Mostly initial code full of hacks.

You are kidding, right? There are many companies deploying Lazarus-based software in production and with significant sales revenue.

True, but Lazarus and the LCL are by nature slower than Delphi code, and one of the reasons is the way the LCL is designed. Adding one more layer of conversion will slow things down even more, and that is only one of the problems the UTF-8 solution has.

Quote
If you are going to be ridiculous, I see no further benefit in continuing this discussion.  Half of them haven't synced with a recent version in ages (like VST still being v4.x, gettext from the mid-2000s).

Ported code is where code goes to rust and die. The only viable components are the ones that are unique to Lazarus.

I don't know what VST is. I don't use gettext either.

I agree that components are an issue, but I disagree that it is worth pissing off our existing user base. I think of them first. Delphi users are a very distant concern.

Quote
Just simple delphi code. Like e.g. detecting available com ports. Using a recent dxgettext, a recent tcomport, a recent vst.

When I search stackoverflow, the instances when a Delphi answer appears are ... well, that's so rare I can't remember the last time.

Quote
IMHO the best solution but a lot of additional work. That's why it was nearly immediately vetoed, and then I chose the solution I needed most, which is delphi compat.

I'm amazed that you say that you agree this would be the best solution.

So why not try it? If you don't try how can you be so sure it will be so much work?

And it would stop the endless discussions.

No. UTF-8 is a good solution when the underlying system supports it natively; when it does not, it is nothing more than hacks to support something that shouldn't exist in the first place. Overall I'm with marcov on this matter: the string type uses the underlying system's default encoding (ANSI, UTF-8, etc.) by default, and if the LCL wants to keep using UTF-8 on all systems regardless, then it should use the UTF8String type to make clear what is going on.
« Last Edit: February 09, 2015, 02:28:34 pm by taazz »
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: What is UTF-8 Application
« Reply #25 on: February 09, 2015, 02:31:07 pm »
Why, why would I keep the UTF-8 operations and not also convert them to the string type the framework requires?

If the string is neither UTF-8 nor UTF-16 but instead some opaque system-dependent type, then I cannot possibly do operations on something that changes depending on the operating system.

I know a few frameworks that use such opaque string types, and it is always a huge pain.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12599
  • FPC developer.
Re: What is UTF-8 Application
« Reply #26 on: February 09, 2015, 02:48:32 pm »
That wasn't what I meant. Inserting manual conversion code is what I meant.

The manual conversion is all trapped inside the LCL, so users don't need to do it themselves.

The LCL is not the solution to everything. It is a visual library only, with some utf8 utility routines. It is only a very small piece of the pie.

Quote
I was never convinced of .NET, but there are real world examples where WinRT is a must have: I would like to put my software (True Democracy) in the Windows Store.

I was a whole lot more convinced about certain aspects of .NET than about WinRT. WinRT is like JavaScript: it only sells because it is bundled.

Yes, the store is WinRT-only, but only a small part of the Windows software market goes via the store. Last year Windows mobile versions (Lumia) actually decreased in market share year-on-year.

Microsoft has been trying to create new application forms since the Windows Vista sidebar, with very limited success.  IOW, times have changed. What Microsoft wants is not necessarily what Microsoft gets.

I'm fully migrated to Windows 8, and what apps did I install via the Store? The old games (Mahjong, Solitaire for my mom) and the Windows 8.1 update, because I was forced to.

Quote
But I can't. Because it doesn't accept Win32 apps, only WinRT. Microsoft is clearly serious in win32 needing to die to do this.

Nonsense. The marketplace doesn't hurt existing customers immediately, except the few that need to compete with established names that already have apps in the marketplace.

Windows 10 contains win32, is targeted towards businesses as the Windows 7 follow-up, and will have extended support until halfway through the next decade.

Quote
Of course it will take ages for it to die, but still we need to prepare for the future in advance or be obsolete.

I already said that declaring something is dead is something totally different from the successor being (already) alive.

.NET WinForms, .NET WPF, Silverlight and, to a lesser degree, the Vista sidebar apps have all been hailed as the successor, because MS has been trying to downplay win32 since 2003, with only limited success.

The true successor might not be WinRT, but be the "new" thing of Windows 13.

Quote
you can't make money hosting apps in Windows Store with FPC, that's a big disadvantage.

True. So, what is holding you back? Personally the Store is totally uninteresting for me, I don't distribute via stores.

Quote
Stores are the future, and the Android store has been great for me.

Stores are the successor to what was shareware in the nineties. But shareware never dominated, and neither will stores in this form (where users have a choice).

It might start to differ if Microsoft manages to conquer low-end laptops with RT-only solutions, but that is actually much less likely now than 3-4 years ago. Surface RT was a miserable failure.

Quote
I'd like to explore other stores. I have now access to Android, iPhone and Mac OS X. Why not Windows store?

Go ahead. But the fact that some have a business model that needs stores doesn't mean that we all do.

Quote
Windows has the largest user base, it's the best store to be in. And being early is a big advantage.

Potentially, depending on the adoption rate, but so far that has been crazy low and, frankly, underwhelming.

The Apple store is not great because of the number of Apple users, but because of their usage percentage.  Income = user base × store adoption rate

Quote
I think you shouldn't cling too hard to old stuff. People that clung to Carbon were kicked hard by Apple and were forced to adapt.

Yes, I dropped Apple, and I'm glad I did, since Lazarus/Cocoa is still in its infancy.  Though admittedly, if it had been a majority platform for me, I would probably have gone the Objective Pascal way.

But unfortunately that also came rather late, so I guess if I had really important commercial Apple business, I'd be doing Objective-C nowadays.

I guess it will be the same for WinRT. If the store is that important to you, easy acceptance and quick time to market after changes/new releases are too important to let the language hold you back.

Quote
Quote
Well, uh, no, since the idea was to pick the system encoding. DUH! You need conversions if you pick one encoding everywhere.

You don't seem to understand me. What I mean is that, using an API that offers UTF-8 everywhere, you can:

line 1> Get string from framework (for example from TMemo)
line 2> Do string operations on the string (anything: Pos, iterating through chars, searching for a substring, lowercasing, whatever! And not only ready-made operations, but also whatever you imagine, including, yes, char-by-char access)
line 3> Put string back into the framework (for example into TMemo)

With opaque type:

line 1> Get string from framework
line 2> Get UTF-8 string from opaque string
line 3> Do my operations
line 4> Convert again from UTF-8 to opaque string
line 5> Put back to framework

A lot worse IMHO

No, since Pos() etc. will accept the native encoding too. So your example is convoluted: you introduce a bias for UTF-8 and then conclude that the UTF-8 way is easier.
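marcov's point about Pos() accepting the native encoding can be sketched with FPC 3's codepage-aware strings (a sketch under assumptions: `GetTextFromFramework` is a hypothetical framework call returning a native-encoded string; the mechanism shown — implicit conversion on assignment between string types with different code pages — is the FPC 3 behavior discussed in this thread):

```pascal
{$mode objfpc}{$H+}
var
  S: string;      // carries the framework's native codepage, whatever it is
  U: UTF8String;  // explicitly UTF-8
begin
  S := GetTextFromFramework;   // hypothetical framework call
  if Pos('abc', S) > 0 then    // works directly on the native encoding,
    WriteLn('found');          // no explicit conversion step needed
  U := S;                      // the compiler inserts a conversion here,
                               // and only if the codepages actually differ
end;
```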

Quote
Quote
As said, I don't buy that. Mostly initial code full of hacks.

You are kidding, right? There are many companies deploying Lazarus-based software in production and with significant sales revenue.

... and they have conversion calls and their own maintained copies of Delphi components everywhere.

The fact that it ships doesn't mean it is the situation they want to be in.

Quote
I agree that components are an issue, but I disagree that it is worth pissing off our existing user base.

Conversion pain happens once. Dual (and triple, if you support old Delphi) maintenance hurts forever.

So pissing off is relative.

Quote
I think of them first. Delphi users are a very distant concern.

Most new influx is from Delphi, and many component codebases are shared. We are currently lucky that the ZEOS maintainer is sympathetic to Lazarus, but the next one might not like the strain and drop support.

Quote
When I search stackoverflow, the instances when a Delphi answer appears are ... well, that's so rare I can't remember the last time.

Depends. I'm now doing OpenGL stuff, and there it is much less. But for topics like com ports and Windows-centric stuff, I find quite a lot. More often than not.

Quote
Quote
IMHO the best solution but a lot of additional work. That's why it was nearly immediately vetoed, and then I chose the solution I needed most, which is delphi compat.

I'm amazed that you say that you agree this would be the best solution.

Why? I proposed it back in 2010. (Originally there was also a single-byte native-encoding version for Windows: perfect D7 compat, and FPC with current versions.)

It was considered too much work, and a final decision was postponed. The fix in FPC 3 is actually nice, but it works for procedural interfaces only. And it took 4-5 YEARS.

Quote
So why not try it? If you don't try how can you be so sure it will be so much work?

Without support from the others it is not doable. So I focus on the important part which is the unicodestring introduction.

Quote
And it would stop the endless discussions.

Yup. That was exactly why I proposed it: everybody could work on their own solution and would only be asked to minimize unnecessary hard-coded encoding usage.

One of the reasons why Delphi compatibility worked so well is that something is either compatible or not, which minimizes discussion; it is also mild wrt backwards compat with the old code (big changes only with major versions, minor details also in between).

If you don't, then suddenly everybody has their own opinion on the new system and wants to steer their own personal "FPC" course, and backwards-compat issues pop up (because the "old" FPC choice still has its proponents too, etc.).

There is a reason why compatibility-focused open source projects are relatively more successful: it cuts the crap.

Take a long hard look at e.g. objfpc mode. I stand by most of the changes if I had to start from zero, but is it really that different to warrant doing everything twice? It doesn't really make new things possible; it is mostly notational, with fringe improvements (like case of string).

If we had started changing in D2009, we would be through most of the pain by now.
« Last Edit: February 09, 2015, 02:56:44 pm by marcov »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: What is UTF-8 Application
« Reply #27 on: February 09, 2015, 02:56:01 pm »
Why, why would I keep the UTF-8 operations and not also convert them to the string type the framework requires?

If the string is neither UTF-8 nor UTF-16 but instead some opaque system-dependent type, then I cannot possibly do operations on something that changes depending on the operating system.

Why? What kind of operations can't you do?

I know a few frameworks that use such opaque string types, and it is always a huge pain.

Yeah, I don't know any, so I can't agree with you, at least not without concrete examples and links to educate myself.

Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: What is UTF-8 Application
« Reply #28 on: February 09, 2015, 03:39:46 pm »
Cocoa uses the stupid opaque type:

http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/lcl/interfaces/cocoa/cocoawscommon.pas?view=markup&root=lazarus

function TLCLCommonCallback.KeyEvent(Event: NSEvent): Boolean;

  UTF8Character := NSStringToString(Event.characters);

  if Length(UTF8Character) > 0 then
  begin
    SendChar := True;

    if Utf8Character[1] <= #127 then
      KeyChar := Utf8Character[1];

    // the VKKeyCode is independent of the modifier
    // => use the VKKeyChar instead of the KeyChar
    case VKKeyChar of
      'a'..'z': VKKeyCode := VK_A + Ord(VKKeyChar) - Ord('a');
      'A'..'Z': VKKeyCode := Ord(VKKeyChar);
      #27     : VKKeyCode := VK_ESCAPE;
      #8      : VKKeyCode := VK_BACK;
      ' '     : VKKeyCode := VK_SPACE;
      #13     : VKKeyCode := VK_RETURN;
      '0'..'9':
        case KeyCode of
          MK_NUMPAD0: VKKeyCode := VK_NUMPAD0;
          MK_NUMPAD1: VKKeyCode := VK_NUMPAD1;
          MK_NUMPAD2: VKKeyCode := VK_NUMPAD2;
          MK_NUMPAD3: VKKeyCode := VK_NUMPAD3;
          MK_NUMPAD4: VKKeyCode := VK_NUMPAD4;
          MK_NUMPAD5: VKKeyCode := VK_NUMPAD5;
          MK_NUMPAD6: VKKeyCode := VK_NUMPAD6;
          MK_NUMPAD7: VKKeyCode := VK_NUMPAD7;
          MK_NUMPAD8: VKKeyCode := VK_NUMPAD8;
          MK_NUMPAD9: VKKeyCode := VK_NUMPAD9
          else VKKeyCode := Ord(VKKeyChar);
        end;
      else

If I don't know the encoding, how can I do case VKKeyChar of #27: ?

If you don't know the encoding, you lose all control over string operations; you are basically lost. All my software does string operations, and I *need* to know the encoding to do them correctly.

In the particular case above we need NSStringToString everywhere, to convert from the opaque type to UTF-8.

In the example above it's OK, because it's in the LCL, so users are spared from the evil world of unknown encodings and receive a pure UTF-8 interface. All imperfections are handled by the LCL, just like any well-designed framework would do. So the user code can be smaller; the LCL handles the system differences for the user.

If the LCL didn't handle it, the user would need to convert strings in his own code.
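The `Utf8Character[1] <= #127` test in the snippet above shows why knowing the encoding matters: string indexing yields bytes, and a byte is only a complete character if it falls in the ASCII range. A minimal sketch (assuming the string holds UTF-8; the variable names mirror the LCL code but the example itself is illustrative):

```pascal
var
  UTF8Character: string;
  KeyChar: Char;
begin
  UTF8Character := 'é';          // two bytes in UTF-8: #$C3 #$A9
  if (Length(UTF8Character) > 0) and (UTF8Character[1] <= #127) then
    KeyChar := UTF8Character[1]  // safe: a single ASCII byte is one character
  else
    KeyChar := #0;               // lead byte of a multi-byte sequence:
                                 // no single Char can represent it
end;
```

With an opaque encoding, neither branch of that test could be written, because the meaning of `UTF8Character[1]` would change per platform.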
« Last Edit: February 09, 2015, 03:41:57 pm by felipemdc »

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: What is UTF-8 Application
« Reply #29 on: February 09, 2015, 03:57:45 pm »
Cocoa uses the stupid opaque type:
...
Well, the example with VKKeyCode doesn't correspond to the example given above, because you'd put the string back into the framework.
Quote from: felipemdc
With opaque type:

line 1> Get string from framework
line 2> Get UTF-8 string from opaque string
line 3> Do my operations
line 4> Convert again from UTF-8 to opaque string
line 5> Put back to framework
A framework typically provides the necessary functions for line #3, so by design you don't have to convert the framework string to a UTF-8 string at all.
Grab the framework string -> do operations via the framework API -> put it back.

Btw, in this particular case (of getting the VK), it might be far more efficient to use charAtIndex rather than allocating a new Pascal string... I can try to patch the code this week, if you want me to ;)
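A rough sketch of that idea in Objective Pascal (hedged: this is not the actual patch; NSString's `characterAtIndex` returns a 16-bit UTF-16 code unit, so the shortcut only covers the ASCII range, which is all the VK mapping above needs):

```pascal
var
  Ch: unichar;  // 16-bit UTF-16 code unit from Cocoa
begin
  if Event.characters.length > 0 then
  begin
    // read the first code unit without allocating a Pascal string
    Ch := Event.characters.characterAtIndex(0);
    if Ch <= 127 then
      KeyChar := Char(Ch);  // the ASCII subset maps directly
  end;
end;
```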
« Last Edit: February 09, 2015, 04:02:25 pm by skalogryz »

 
