Author Topic: Suggestion on the lines of AnsiString  (Read 2425 times)

Skvoznjak

  • Newbie
  • Posts: 6
Suggestion on the lines of AnsiString
« on: June 08, 2023, 04:50:27 pm »
A dummy encoding should be added that can be applied to AnsiString, so that you can write:
Code: Pascal
type
  string8 = type AnsiString(fictitious_encoding);

That way any data could be stored in string8 strings, and string operations would never recode it into another encoding. Data in strings tagged with fictitious_encoding could be processed without fear of corruption, as was the case in FPC 2.6.4. This capability is sorely missing in FPC 3.x. The fictitious_encoding tag also must not spontaneously disappear from such strings.
« Last Edit: June 08, 2023, 05:01:07 pm by Skvoznjak »

Thaddy

  • Hero Member
  • *****
  • Posts: 14209
  • Probably until I exterminate Putin.
Re: Suggestion on the lines of AnsiString
« Reply #1 on: June 08, 2023, 05:07:11 pm »
That is the same as RawByteString, which we already have....
Specialize a type, not a var.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Suggestion on the lines of AnsiString
« Reply #2 on: June 08, 2023, 06:13:13 pm »
In addition to what Thaddy wrote: if you want to work with binary data you're supposed to use TBytes (which is a dynamic array of Byte) and not a String, as that is essentially an abuse of the type.
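
A minimal sketch of that approach, assuming FPC with the SysUtils unit (where TBytes is declared). The program name and the byte values are arbitrary and only for illustration:

Code: Pascal
program BytesDemo;
{$mode objfpc}{$H+}

uses
  SysUtils; // declares TBytes = array of Byte

var
  Data: TBytes;
  i: Integer;

begin
  // Arbitrary binary data; no code page is attached to a byte array,
  // so nothing can trigger an encoding conversion.
  SetLength(Data, 4);
  Data[0] := 0;
  Data[1] := 150;
  Data[2] := 200;
  Data[3] := 255;

  // Dynamic arrays are zero-based.
  for i := 0 to High(Data) do
    Write(Data[i], ' ');
  WriteLn;
end.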

Skvoznjak

  • Newbie
  • Posts: 6
Re: Suggestion on the lines of AnsiString
« Reply #3 on: June 08, 2023, 06:36:57 pm »
Quote
That is the same as RawByteString, which we already have....
No, RawByteString doesn't work that way.
Code: Pascal
{$CODEPAGE CP866}

var
  s5: RawByteString;

begin
  s5 := #111;
  WriteLn(StringCodePage(s5));
end.

Code:
$ ./1
866

The simplest operation, and the RawByteString has turned into an AnsiString.
« Last Edit: June 08, 2023, 06:44:44 pm by Skvoznjak »

Skvoznjak

  • Newbie
  • Posts: 6
Re: Suggestion on the lines of AnsiString
« Reply #4 on: June 08, 2023, 06:43:43 pm »
Quote
In addition to what Thaddy wrote: if you want to work with binary data you're supposed to use TBytes (which is a dynamic array of Byte) and not a String, as that is essentially an abuse of the type.
I want to work with data that does not get corrupted, including text in various encodings loaded from a file. In FPC 2.6.4 this worked fine, until the FPC 3.x update broke the code. What you are suggesting causes a lot of mistakes and torment. In effect, Delphi's logic was copied into FPC 3.x, but Pascal needs more advanced data-handling capabilities.

Remy Lebeau

  • Hero Member
  • *****
  • Posts: 1312
    • Lebeau Software
Re: Suggestion on the lines of AnsiString
« Reply #5 on: June 09, 2023, 09:19:08 pm »
Quote
What you are suggesting is causing a lot of mistakes and torment. In fact, the logic from delphi was copied to version fpc 3.x, but Pascal needs advanced data usage capabilities.

FreePascal's UnicodeString and RawByteString handling aren't as mature as Delphi's, but they do mirror Delphi's behavior.

What you have described is essentially what RawByteString already is. You can put whatever arbitrary data you want into a RawByteString and then use SetCodePage() to specify its encoding, so that the data will be handled correctly if you assign the RawByteString to other string types, thus invoking data conversions.  But if you simply pass a RawByteString around to other RawByteStrings, the data is reference-counted and passed around as-is.  Much like Delphi's AnsiString in its pre-Unicode days.
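
A rough sketch of that behavior, assuming FPC 3.x. The program name, byte values, and code page 1251 are arbitrary choices for illustration, and no output is shown because it has not been verified here:

Code: Pascal
program RawDemo;
{$mode objfpc}{$H+}

var
  Src, Dst: RawByteString;
  i: Integer;

begin
  // Fill the string byte by byte instead of using a literal,
  // so no literal code page is involved.
  SetLength(Src, 3);
  Src[1] := #200;
  Src[2] := #201;
  Src[3] := #202;

  // Tag the bytes as code page 1251 without converting them (Convert = False).
  SetCodePage(Src, 1251, False);

  // RawByteString to RawByteString: the reference is shared, the bytes are not touched.
  Dst := Src;

  WriteLn('codepage=', StringCodePage(Dst), ' length=', Length(Dst));
  for i := 1 to Length(Dst) do
    Write(Ord(Dst[i]), ' ');
  WriteLn;
end.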
« Last Edit: June 09, 2023, 09:24:08 pm by Remy Lebeau »
Remy Lebeau
Lebeau Software - Owner, Developer
Internet Direct (Indy) - Admin, Developer (Support forum)

Skvoznjak

  • Newbie
  • Posts: 6
Re: Suggestion on the lines of AnsiString
« Reply #6 on: June 15, 2023, 04:14:09 pm »
That's not how I described it. Currently, any operation mixing RawByteString with AnsiString requires dancing on a field of rakes and using pads. This is inconvenient and requires a lot of checks that the code executes correctly.

« Last Edit: June 15, 2023, 05:34:25 pm by Skvoznjak »

Remy Lebeau

  • Hero Member
  • *****
  • Posts: 1312
    • Lebeau Software
Re: Suggestion on the lines of AnsiString
« Reply #7 on: June 15, 2023, 10:05:00 pm »
Quote
That's not how I described it. Currently, any operation  RawByteString with  AnsiString requires dancing on the field with a rake and using pads. This is inconvenient and requires a lot of checks on the correctness of code execution.

I literally have no idea what you just said.

Can you provide a specific example of code that is not working the way you want it to? Can you provide specific details about what exactly you think is wrong, or needs changing, in the compiler/RTL?

In your earlier example, you did not specify a codepage for the RawByteString to use.  RawByteString does not have a codepage affinity at compile-time; it inherits the codepage of whatever string is assigned to it at runtime.  But, since you are assigning a literal to it, it can't pick up the codepage that the compiler is currently using for the literal, so you have to specify the codepage explicitly at runtime, e.g.:

Code: Pascal
{$CODEPAGE CP866}

var
  S5: rawbytestring;

begin
  s5 := #111;
  SetCodePage(s5, 866, False); // <-- ADD THIS!
  ...
end.

« Last Edit: June 15, 2023, 10:12:04 pm by Remy Lebeau »

Skvoznjak

  • Newbie
  • Posts: 6
Re: Suggestion on the lines of AnsiString
« Reply #8 on: June 17, 2023, 12:23:55 am »
Yes, that's what I suggested:
Code: Pascal
{$CODEPAGE .........} // whatever encoding is set here will not affect variables of type string8 and string9
type
  string8 = type AnsiString(fictitious_encoding);
  string9 = type RawByteString(fictitious_encoding);

And variables of these types should not need SetCodePage to work. You choose a type carrying fictitious_encoding once, and the bytes in those strings are never rewritten. Until SetCodePage is called on those strings, their encoding does not change in any way.

Remy Lebeau

  • Hero Member
  • *****
  • Posts: 1312
    • Lebeau Software
Re: Suggestion on the lines of AnsiString
« Reply #9 on: June 17, 2023, 12:38:31 am »
Quote
Yes that's what I suggested:
Code: Pascal
{$CODEPAGE .........} //any encoding written here will not affect the type variables string8 and string9
type
  string8 = type AnsiString(fictitious_encoding);
  string9 = type RawByteString(fictitious_encoding);


This is already possible today, using the first declaration (RawByteString is defined using type AnsiString).

Any string data that is assigned to an AnsiString(N) type gets converted to N at runtime, except when N=$FFFF in the case of RawByteString.

However, the PROBLEM is that if you try to assign a literal to such an AnsiString(N) type, the literal does not carry any encoding info needed to allow its data to be held/converted to encoding N of the target AnsiString(N) type.  So, you must use SetCodePage() instead to force the encoding at runtime.

Quote
And so that variables of these types do not require SetCodePage to work.

They already don't require it when assigned other strings.  Only assignments of LITERALS require it (or of other data sources whose encoding info is not known until runtime).

Quote
Chose the type containing the encoding fictitious_encoding once and the bytes in these lines are not overwritten. And until these lines become SetCodePage, the encoding in them will not change in any way.

I have no idea what you are describing.  You are going to have to provide an example.

Skvoznjak

  • Newbie
  • Posts: 6
Re: Suggestion on the lines of AnsiString
« Reply #10 on: June 23, 2023, 09:01:50 pm »
Quote
They already don't require it, when assigned other strings.  It is when assigning LITERALS that require it (or other data sources where encoding info is not known until runtime).

Bad things happen even when the string encodings are known. I have already reworked a Windows program for the new compiler, and a lot went wrong that we had to find workarounds for. If an eight-bit system encoding is set, many procedures that work with files and directories mishandle paths containing characters that the eight-bit encoding lacks. In other words, automatically changing the data in strings is not always good. As a result, to access files and directories the program has to switch the system encoding many times while it runs.

Quote
I have no idea what you are describing.  You are going to have to provide an example.

The idea is to add a universal eight-bit encoding, CP_FICTITIOUS, for AnsiStrings, with a number of exceptions made for it in the RTL, so that even binary data can be stored in CP_FICTITIOUS strings without unnecessary hassle.

The idea is roughly outlined in the test code that follows, but so far it is not enough to get past "Error: unknown code page" during compilation.
Code: Pascal
{$CODEPAGE UTF8}

type
  string8 = type {RawByteString(1251);}RawByteString(cp_fictitious);
  string866 = type RawByteString(866);

var
  q5: ansistring;
  a5: string8;
  x5: string866;
  q2: longint;

begin
  setmultibyteconversioncodepage(cp_utf8);
  q5 := 'Й';
  writeln('stringcodepage(q5)=', stringcodepage(q5), ' length(q5)=', length(q5));
  a5 := #0#150#190#200#250#255;
  writeln('stringcodepage(a5)=', stringcodepage(a5), ' length(a5)=', length(a5));
  a5 := a5 + q5;
  writeln('stringcodepage(a5)=', stringcodepage(a5), ' length(a5)=', length(a5));
  for q2 := 1 to length(a5) do write(ord(a5[q2]), ' ');
  writeln;
  x5 := q5;
  writeln('stringcodepage(x5)=', stringcodepage(x5), ' length(x5)=', length(x5));
  for q2 := 1 to length(x5) do write(ord(x5[q2]), ' ');
  writeln;
  a5 := a5 + x5;
  writeln('stringcodepage(a5)=', stringcodepage(a5), ' length(a5)=', length(a5));
  for q2 := 1 to length(a5) do write(ord(a5[q2]), ' ');
  writeln;
end.

The attached archive contains patches with an approximate description of how it should work (so far the patches are not enough to make it work). The patches are applied in this order:
systemh_cp_f.patch
astrings_cp_f_9.patch
astrings_cp_f_14.patch

The large patch is split into two parts because the "patch" utility previously handled it incorrectly as a single file.

 
