Author Topic: UTF16  (Read 10438 times)

rtusrghsdfhsfdhsdfhsfdhs

  • Full Member
  • ***
  • Posts: 162
UTF16
« on: September 04, 2015, 04:53:19 pm »
Is LCL UTF16 ready yet? Conversions destroy performance.  ::)
« Last Edit: September 04, 2015, 04:57:12 pm by Fiji »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: UTF16
« Reply #1 on: September 04, 2015, 05:06:37 pm »
No. It requires the new Unicode support in FPC 3.0, which currently only supports UTF-8. I read somewhere that people are working on UTF-16 support, but it will not be ready in the near future, certainly not by the time FPC 3.0 is released.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

otoien

  • Jr. Member
  • **
  • Posts: 89
Re: UTF16
« Reply #2 on: September 05, 2015, 05:08:48 am »
Well, please do not take away from us Windows users the possibility of having UTF-8 as the default string encoding everywhere. From a user perspective, having UTF-8 everywhere is so nice compared to the previous "nightmare" of needing to keep track of what is RTL and what is LCL with respect to string types and conversions. And I would definitely not want to write large, text-based, mostly numerical data files in UTF-16; UTF-8 is perfect for that. Besides, cross-platform compatibility of the data files and internal code also needs to be maintained. I am currently using the fixes branch that comes with FPC 3.0.1, and I have UTF-8 turned on as the default string type everywhere; it works very well.
« Last Edit: September 05, 2015, 05:37:28 am by otoien »
Unless otherwise noted I always use the latest stable version of Lazarus/FPC x86_64-win64-win32/win64

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: UTF16
« Reply #3 on: September 05, 2015, 07:03:21 am »
I'm glad you are comfortable with UTF-8; I, on the other hand, am not. Since the next version will require too many changes in my libraries, I'm going to find another way to move forward. If I have to rewrite my libraries to make them compatible with the new version, then I'm better off rewriting them in a more mainstream language and leaving Pascal behind altogether.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

mse

  • Sr. Member
  • ****
  • Posts: 286
Re: UTF16
« Reply #4 on: September 05, 2015, 09:42:53 am »
I'm glad you are comfortable with UTF-8; I, on the other hand, am not. Since the next version will require too many changes in my libraries, I'm going to find another way to move forward. If I have to rewrite my libraries to make them compatible with the new version, then I'm better off rewriting them in a more mainstream language and leaving Pascal behind altogether.
There is an alternative to "leaving Pascal behind altogether": use MSEide+MSEgui. It has used 16-bit Unicode strings for the GUI for more than 10 years. For on-disk storage and communication it uses UTF-8 or the current 8-bit system encoding.


JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: UTF16
« Reply #5 on: September 05, 2015, 01:15:51 pm »
... And I would definitely not want to write large text based mostly numerical data files in UTF16; UTF-8 is perfect for that.

A program's internal encoding and a file's encoding can be different. For example, Delphi itself stores most of its files in UTF-8.
You make it sound like there is a big difference and everything changes with the internal string encoding. No, it is really not so dramatic.

Quote
... I am currently using the fixes branch that come with FPC 3.01 and I have UTF-8 turned on as default string type everywhere; it works very well.

Yes, and it is more Delphi compatible than you would expect.
At least in my experience, most encoding-specific code is already encapsulated in functions. The few places in user code that are encoding-specific can also be encapsulated.
Reading/writing files or streams that are not UTF-8 needs changes, but that can be encapsulated, too.
I believe there is code that is more difficult to convert, but my test cases proved to be easy.
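The encapsulation idea can be sketched like this; a minimal example, assuming the LazUTF8 unit from the LazUtils package (its UTF8ToUTF16/UTF16ToUTF8 helpers), with the API call itself left as a placeholder:

```pascal
program BoundarySketch;

uses
  LazUTF8; // from the LazUtils package shipped with Lazarus

// Keep the UTF-16 boundary in one place: the rest of the program
// works with UTF-8 strings and never sees the conversion.
procedure ShowNativeMessage(const Utf8Text: string);
var
  Wide: UnicodeString;
begin
  Wide := UTF8ToUTF16(Utf8Text); // convert only at the API boundary
  // ... pass PWideChar(Wide) to a W-series Windows API here ...
end;

begin
  ShowNativeMessage('Grüße'); // callers stay UTF-8 throughout
end.
```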
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

otoien

  • Jr. Member
  • **
  • Posts: 89
Re: UTF16
« Reply #6 on: September 05, 2015, 02:54:55 pm »
A program's internal encoding and a file's encoding can be different. For example, Delphi itself stores most of its files in UTF-8.
You make it sound like there is a big difference and everything changes with the internal string encoding. No, it is really not so dramatic.
Ah, I was talking about data files I write myself, not program code. These are files that should in the future be accessible to different users on different systems, and they need to be standardized, not system-specific, as data sharing is becoming very common in the scientific world. It would be an extra hassle if one day we suddenly had to rewrite all these routines to translate from UTF-16 to UTF-8. It would be difficult if streaming or INI-file routines suddenly switched to accepting only UTF-16 for Windows users. I hope programmers who choose to can keep one consistent encoding regardless of platform, and can keep UTF-8 as the default string type.

It would seem natural to follow the current philosophy, so that directives parallel to
  {$DEFINE EnableUTF8RTL}
  {$DEFINE FcUTF8}
would be used to select UTF-16 encoding for Delphi compatibility once that has been implemented, if the user so chooses, while users who want UTF-8 encoding still have the option to keep it on all platforms.
« Last Edit: September 05, 2015, 02:58:44 pm by otoien »
Unless otherwise noted I always use the latest stable version of Lazarus/FPC x86_64-win64-win32/win64

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: UTF16
« Reply #7 on: September 05, 2015, 03:46:48 pm »
A program's internal encoding and a file's encoding can be different. For example, Delphi itself stores most of its files in UTF-8.
You make it sound like there is a big difference and everything changes with the internal string encoding. No, it is really not so dramatic.
Ah, I was talking about data files I write myself, not program code. These are files that should in the future be accessible to different users on different systems, and they need to be standardized, not system-specific, as data sharing is becoming very common in the scientific world. It would be an extra hassle if one day we suddenly had to rewrite all these routines to translate from UTF-16 to UTF-8. It would be difficult if streaming or INI-file routines suddenly switched to accepting only UTF-16 for Windows users. I hope programmers who choose to can keep one consistent encoding regardless of platform, and can keep UTF-8 as the default string type.

It would seem natural to follow the current philosophy, so that directives parallel to
  {$DEFINE EnableUTF8RTL}
  {$DEFINE FcUTF8}
would be used to select UTF-16 encoding for Delphi compatibility once that has been implemented, if the user so chooses, while users who want UTF-8 encoding still have the option to keep it on all platforms.

There is a string type specifically designed for UTF-8, named UTF8String. If you want your libraries to be UTF-8, do what everyone else does and use the specific types. String should be widget-set specific: when I write programs for Windows I expect the string variable to be UTF-16, and when I write programs for Linux using GTK I expect it to be UTF-8.
On top of that, the LCL should never touch a specific data type and change its encoding. For example, when I define something as AnsiString I expect it to hold system-specific ANSI characters, not UTF-8. When I define something as UnicodeString I expect it to hold UTF-16 characters, from prior art. I would be more than happy to use the LCL if the String type, and ONLY the String type, were specified as UTF-8 for all targets regardless of the speed hit; anything less is unacceptable to me.
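For reference, FPC does let you state the intent in the declarations themselves with encoding-specific types; a small sketch (UTF8Decode is the standard System-unit conversion, and the AnsiString cast relies on FPC's codepage-aware strings):

```pascal
program TypedStrings;

var
  U8:  UTF8String;    // always UTF-8, regardless of platform
  U16: UnicodeString; // always UTF-16, Delphi-compatible
  A:   AnsiString;    // system ANSI code page by default
begin
  U8  := 'hello';
  U16 := UTF8Decode(U8);  // explicit UTF-8 -> UTF-16 conversion
  A   := AnsiString(U16); // down-converts to the system code page
  WriteLn(Length(U16));   // length in UTF-16 code units
end.
```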
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: UTF16
« Reply #8 on: September 05, 2015, 04:40:18 pm »
I'm glad you are comfortable with UTF-8; I, on the other hand, am not. Since the next version will require too many changes in my libraries, I'm going to find another way to move forward. If I have to rewrite my libraries to make them compatible with the new version, then I'm better off rewriting them in a more mainstream language and leaving Pascal behind altogether.
There is an alternative to "leaving Pascal behind altogether": use MSEide+MSEgui. It has used 16-bit Unicode strings for the GUI for more than 10 years. For on-disk storage and communication it uses UTF-8 or the current 8-bit system encoding.
It is on my short list of things to check, as long as it can satisfy the most basic needs, that is.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: UTF16
« Reply #9 on: September 05, 2015, 05:25:17 pm »
... String should be widget-set specific: when I write programs for Windows I expect the string variable to be UTF-16, and when I write programs for Linux using GTK I expect it to be UTF-8.

That is a very bad idea!

Quote
On top of that, the LCL should never touch a specific data type and change its encoding. For example, when I define something as AnsiString I expect it to hold system-specific ANSI characters, not UTF-8.

I personally am happy that we are getting rid of the horrible ANSI codepages. They have created enough trouble already.
ANSI codepages were an ugly hack to support non-English languages quickly. They were backwards compatible with ASCII but otherwise a complete mess.
Finally Unicode came and cleared up the mess. One of its encodings, UTF-8, is even compatible with ASCII, which is clever IMO, but that is not relevant now.
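That ASCII compatibility is easy to see at the byte level; a quick sketch using explicit byte values (so the source file's own encoding does not matter):

```pascal
program Utf8Ascii;

var
  S: RawByteString; // plain bytes, no codepage conversion
begin
  S := 'Hello';       // pure ASCII: these 5 bytes are identical in ASCII and UTF-8
  WriteLn(Length(S)); // 5
  S := #$C3#$A9;      // 'é' (U+00E9) encoded as the two UTF-8 bytes C3 A9
  WriteLn(Length(S)); // 2: non-ASCII characters take 2-4 bytes in UTF-8
end.
```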

A question for taazz: knowing all the facts, why do you still want to use ANSI codepages? You should dump them and use Unicode instead. The exact encoding is not even very relevant, in my experience.

Quote
When I define something as UnicodeString I expect it to hold UTF-16 characters, from prior art. I would be more than happy to use the LCL if the String type, and ONLY the String type, were specified as UTF-8 for all targets regardless of the speed hit; anything less is unacceptable to me.

FPC now supports changing the default encoding of both String and AnsiString, both in the same bargain.
But again, why would you need ANSI codepages? Dump them!
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11446
  • FPC developer.
Re: UTF16
« Reply #10 on: September 05, 2015, 05:38:01 pm »
A question for taazz: knowing all the facts, why do you still want to use ANSI codepages?

Not all parts may be under your control, so often it is not even your choice to make. That's why Lazarus deliberately making it Windows-incompatible is such a pain. There is not even an AnsiString type with the default encoding, which throws you back to PChar level.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: UTF16
« Reply #11 on: September 05, 2015, 06:08:13 pm »
Not all parts might be under your control, so often it is not even your choice to make.

Ok, it may cause pain in some cases.
I am already waiting for the time when ansi codepages are a distant history and nobody uses them any more. Technically it is not a good system at all.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: UTF16
« Reply #12 on: September 05, 2015, 06:13:01 pm »
Ok, it may cause pain in some cases.
I am already waiting for the time when ansi codepages are a distant history and nobody uses them any more. Technically it is not a good system at all.
I don't think that's possible any time soon, due to the great amount of ANSI-encoded data still in use.

Even more, a lot of the software that uses ANSI-encoded data doesn't even need to be translated and/or support any sort of Unicode.
« Last Edit: September 05, 2015, 06:14:53 pm by skalogryz »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: UTF16
« Reply #13 on: September 05, 2015, 06:27:32 pm »
... String should be widget-set specific: when I write programs for Windows I expect the string variable to be UTF-16, and when I write programs for Linux using GTK I expect it to be UTF-8.

That is a very bad idea!

That is what people keep saying, and I do not want to start challenging that assumption (mainly for lack of time), but the LCL is a thin layer on top of existing widget sets, so one idea could be to use the underlying widget set's default encoding.

Quote
On top of that, the LCL should never touch a specific data type and change its encoding. For example, when I define something as AnsiString I expect it to hold system-specific ANSI characters, not UTF-8.

I personally am happy that we are getting rid of the horrible ANSI codepages. They have created enough trouble already.
ANSI codepages were an ugly hack to support non-English languages quickly. They were backwards compatible with ASCII but otherwise a complete mess.
Finally Unicode came and cleared up the mess. One of its encodings, UTF-8, is even compatible with ASCII, which is clever IMO, but that is not relevant now.

That depends: are you talking about 7-bit ASCII or 8-bit ASCII? With 8-bit ASCII there are code pages for multiple languages as well, and those are not compatible with UTF-8. 7-bit ASCII only covers US English, which makes it somewhat relevant, but not much. (I'm pretty sure you are talking about 7-bit ASCII, though.)


A question for taazz: knowing all the facts, why do you still want to use ANSI codepages? You should dump them and use Unicode instead. The exact encoding is not even very relevant, in my experience.

1) I'm not the owner of the system with ANSI code pages; I only provide an external tool.
2) I should not have to explain to anyone why I need ANSI code pages; I must simply be able to use them.
3) I should not have to jump through hoops to use encoding-specific types; accommodating a GUI toolkit is a good reason not to do it at all.
4) If I'm going to dump ANSI code pages, which is the next logical step but not up to me to decide, UTF-16 is a better fit for my needs.

I would have the same reaction if you changed the Word data type to a 4-byte data type on 64-bit targets, and that would be easier to fix.

Quote
When I define something as UnicodeString I expect it to hold UTF-16 characters, from prior art. I would be more than happy to use the LCL if the String type, and ONLY the String type, were specified as UTF-8 for all targets regardless of the speed hit; anything less is unacceptable to me.

FPC now supports changing the default encoding of both String and AnsiString, both in the same bargain.
But again, why would you need ANSI codepages? Dump them!

I didn't know that, nor do I have any idea what it means for me. Is that something that will be part of the 3.0 release?

But again, why would you need ansi codepages? Dump them!

What I do or don't do with my products is not up for discussion; neither is breaking a fully tested library with years of fine-tuning behind it, just because of UTF-8.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

rtusrghsdfhsfdhsdfhsfdhs

  • Full Member
  • ***
  • Posts: 162
Re: UTF16
« Reply #14 on: September 05, 2015, 06:49:57 pm »
Only a few GUI frameworks still use UTF-8. Now that C++ Builder has a Clang-based 32-bit compiler, things are slowly starting to change.
« Last Edit: September 05, 2015, 06:55:30 pm by Fiji »
