
Author Topic: If you've ever wanted to be able to use multi-line strings in FPC...  (Read 29696 times)

Akira1364

Re: If you've ever wanted to be able to use multi-line strings in FPC...
« Reply #30 on: July 12, 2019, 04:56:47 am »
I admit to having lost track of how you've implemented the feature.  At the time I wrote that, I was thinking about the discussion in FPC-devel which used "(##" (if I'm not mistaken again.)

Ah, that was my very initial suggestion, yes. Backtick was decided as being better quite quickly though. And it is, in hindsight, by quite a lot, especially from an implementation standpoint.

I understand.  The one thing that concerns me, if I've understood the description correctly, is that with that construct there is no way to have fixed-length multi-line strings.  I think the ability to have fixed-length multi-line strings should not be lost (it can be done using current syntax; obviously with more typing and more clutter, but it's there and available if you need it.)

Multi-line strings are just strings in every way. So the following is valid code with this feature, that works like you'd expect:

Code: Pascal
type StringX = String[7];

const S: StringX = `
 2
 34
`;

I even made sure in my tests that things like multi-line deprecation messages:

https://github.com/Akira13641/freepascal/blob/83def7b8ab3aabbe748bd127cf92ede7b5f1447c/tests/test/tmultilinestring18.pp

and multi-line dispatch strings:

https://github.com/Akira13641/freepascal/blob/83def7b8ab3aabbe748bd127cf92ede7b5f1447c/tests/test/tmultilinestring17.pp

work properly.

That was quick, thank you!!

No problem.

engkin

« Reply #31 on: July 12, 2019, 06:18:46 am »
I think I hit another bug. It doubles the line ending.

The following code:
Code: Pascal
program doubleLineEndingBug;

{$mode objfpc}
{$modeswitch MultiLineStrings}
{$MultiLineStringTrimLeft 15}
{$MultiLineStringLineEnding Platform}

var
{$MultiLineStringLineEnding CR}
  a: array[0..3] of string = (
``
,
`
`
,
`

`
,
`


`);

{$MultiLineStringLineEnding CRLF}
  b: array[0..3] of string = (
`1`
,
`1
2`
,
`1
2
3`
,
`1
2
3
4`);

procedure Test(StrArray:array of string);
var
  s,sHex: string;
  c: char;
begin
  for s in StrArray do
  begin
    WriteLn('Length: ',Length(s));
    sHex := ``;
    for c in s do
      sHex := sHex+`$`+hexStr(ord(c),2)+` `;
    WriteLn(sHex);
  end;
  WriteLn('---------------');
end;

begin
  Test(a);
  Test(b);
end.

Gives for array a:
Code: Text
Length: 0

Length: 2
$0D$0D
Length: 4
$0D$0D$0D$0D
Length: 6
$0D$0D$0D$0D$0D$0D

I expected it to be:
Code: Text
Length: 0

Length: 1
$0D
Length: 2
$0D$0D
Length: 3
$0D$0D$0D

and for the second array b it also doubles the line endings:
Code: Text
Length: 1
$31
Length: 6
$31$0D$0A$0D$0A$32
Length: 11
$31$0D$0A$0D$0A$32$0D$0A$0D$0A$33
Length: 16
$31$0D$0A$0D$0A$32$0D$0A$0D$0A$33$0D$0A$0D$0A$34

See the second string in this array:
`1
2`

Turned out to be:
$31$0D$0A$0D$0A$32

Assuming I have the trunk patched right, of course.
« Last Edit: July 12, 2019, 06:20:37 am by engkin »

Akira1364

« Reply #32 on: July 12, 2019, 06:59:46 am »
I think I hit another bug. It doubles the line ending.

I get your expected results with an LF encoded file, but not with a CRLF one. Let me take a closer look at what's going on there. Thanks for really digging into the testing!

avra

« Reply #33 on: July 12, 2019, 08:23:40 am »
Can you see how much of a problem that can be? Do I have to start making notes about what the line endings are in my sources? Do I have to put {$MultiLineStringLineEnding PLATFORM} in all files just because anyone can accidentally change line endings in my sources and by doing that make the output of my program completely different?
I think you're vastly overstating the number of scenarios where this would be a problem, and also expecting the compiler to essentially babysit your revision control configuration. The compiler directives are directives. Use them as you wish, like you would any existing directive.
Are you saying that people using Git or SVN is an overstated scenario? I do not expect the compiler to babysit my strings, I expect it to be in line with current behavior. Current behavior is that if you have String1 in SomeStringList and add String2 to SomeStringList, then SomeStringList will have different line endings on Windows and Linux, and SomeStringList will be printable on both without any changes. I expect the same behavior from this new feature, and anything else conflicts with what we are used to. Otherwise we will have code that might produce unusable strings on some platforms unless we explicitly tell the compiler that we don't want that. And it doesn't matter if we have to put in a project-wide switch or a unit-wide switch. Both are bad.
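
The existing behavior referred to here can be illustrated with a minimal sketch, assuming a stock FPC and the Classes unit with default TStringList settings: Text terminates each item with the platform line break, so the same source gives #13#10 on Windows and #10 on Linux. BuildText and ToHex are hypothetical helper names used only for this illustration.

```pascal
program StringListEndings;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: builds a two-item list and returns its Text.
function BuildText: string;
var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add('String1');
    L.Add('String2');
    // With default settings, Text ends every item with the platform line break.
    Result := L.Text;
  finally
    L.Free;
  end;
end;

// Hypothetical helper: hex dump so the line-ending bytes are visible.
function ToHex(const S: string): string;
var
  c: Char;
begin
  Result := '';
  for c in S do
    Result := Result + '$' + HexStr(Ord(c), 2);
end;

begin
  // The bytes between and after the two items depend on the target platform.
  WriteLn(ToHex(BuildText));
end.
```

Compiling the same file for Windows and for Linux should show different bytes after each item, which is the behavior a PLATFORM default for multi-line strings would mirror.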
« Last Edit: July 12, 2019, 08:27:41 am by avra »
ct2laz - Conversion between Lazarus and CodeTyphon
bithelpers - Bit manipulation for standard types
pasettimino - Siemens S7 PLC lib

PascalDragon

  • Compiler Developer
« Reply #34 on: July 12, 2019, 09:45:31 am »
Here is a thought, it's a small variation on the concept Akira has presented, using current Pascal syntax:
Code: Pascal
const
  s = 'hello ' +
      'world'  ;    // would print "hello world" (just as it does now.)

  {$LINEBREAKS ON}
  b = 'hello'  +
      'world'  ;

  // would print (imagine "// " is the beginning of the line)
  // hello
  // world

{$LINEBREAKS ON} would be active ONLY for one string, the very next one, to prevent unintended line breaks in other strings.  I think this is a feature that is used rarely enough that having to type {$LINEBREAKS ON} for each one of them is within reason.

Basically, teach a new "trick" to the "+" pony.
This would mean that the parser would need to handle newlines, which it currently is totally ignorant of. The multi-line string implemented by Akira1364 is implemented solely inside the scanner, which is already dealing with lines and such. Your idea is therefore much more intrusive for the compiler, harder to maintain in the long run, and thus much less likely to get accepted than Akira1364's idea.

No, you didn't misunderstand me, I screwed that one up. I guess the editor in your case makes no difference, but control characters and UTF8 need to be tested.

For me, all this is going to do is remove the use of the '+' at the end of the line.

That is all I'll ever use it for.
You are aware that the feature replaces more than just the +?
Code: Pascal
const
  Str1 = 'Currently a String ' + sLineBreak +
         'with a linebreak';

  Str2 = `A new multi
          line string`;
(yes, I'm aware that there'd be spaces between the start of the second line and "line" if the trim option is not used)

The problem here is that allowing multiple lines for string building is great, most editors allow it and most compilers allow it, but I don't like the idea of it changing the content of the string I specify.

Changing the content means removing left spaces, inserting CR, etc. That isn't ideal.

Look at this code:
Code: Pascal
procedure Something;
begin
  if whatever then
    if somethingelse then
      somestring := `Some multiline
                     string`;
end;
Without the ability to trim spaces on the left I'd either have a bunch of spaces before string or I'd need to write it like this:
Code: Pascal
procedure Something;
begin
  if whatever then
    if somethingelse then
      somestring := `Some multiline
string`;
end;
Which is ugly as hell.

And the ability to control the newline characters is important as well, because in normal strings you can control it by inserting the corresponding characters manually (see the sLineBreak above).

Having PLATFORM as the default would be much less expected IMO, and likely frustrating for anyone who did not want the compiler to impose a certain line ending that may have been not what was actually written.
Considering that the idea is to replace StrConst = 'Foo' + sLineBreak + 'Bar' making PLATFORM the default might indeed be the most sensible approach.

Another thing regarding the left trimming: I think you wrote that the trim option will trim spaces of the first line, but in the example mentioned by avra there was this:
Code: Pascal
{$MultiLineStringTrimLeft 40}

const Y = `A
           B
           C
           D`;
Do I understand correctly that this then wouldn't be trimmed at all, because the first line doesn't have any spaces?

BrunoK

  • Retired programmer
« Reply #35 on: July 12, 2019, 10:19:17 am »
I would gladly use something like :
Code: Pascal
const
  Str2 = `SELECT o.*, C.Company
          from Orders O
          join Customer C
            on o.CustNo=C.ID
          where
            O.saledate=DATE '2001.03.20'`;
or

const
  Str2 =
    `SELECT o.*, C.Company
     from Orders O
     join Customer C
       on o.CustNo=C.ID
     where
       O.saledate=DATE '2001.03.20'`;
would be interpreted by the scanner as  :
Code: Pascal
Str2 = 'SELECT o.*, C.Company' + LineEnding +
       'from Orders O' + LineEnding +
       'join Customer C' + LineEnding +
       '  on o.CustNo=C.ID' + LineEnding +
       'where' + LineEnding +
       '  O.saledate=DATE ''2001.03.20''';

JernejL

« Reply #36 on: July 12, 2019, 10:24:29 am »
An unpopular suggestion: What about simply introducing an alternate syntax system for strings?

PHP has single-quoted and double-quoted strings:
https://www.php.net/manual/en/language.types.string.php

The difference is that single-quoted strings are as-is (kind of like Pascal strings), while double-quoted strings support escape sequences and other magical things, including multi-line behavior.

If we could simply introduce a secondary string syntax, it could conform to and imitate this already-seen behavior. We could then also add things like escape sequences to the whole thing, something present in practically every other language out there.

This would also give us double-quoted strings that behave practically identically to those in C-like languages, making porting C code even easier.

And the best part is that this simply cannot break anything else in Pascal syntax; double quotes are not used for anything currently.
 

Akira1364

« Reply #37 on: July 12, 2019, 02:50:48 pm »
Considering that the idea is to replace StrConst = 'Foo' + sLineBreak + 'Bar' making PLATFORM the default might indeed be the most sensible approach.

If that's what people really want, I'm not too picky either way.

Do I understand correctly that this then wouldn't be trimmed at all, because the first line doesn't have any spaces?

You do not, I'm afraid. It is trimmed like you'd expect (so, in that case, completely trimmed such that there is no leading space left anywhere.)

I.E. it would look like this if displayed with WriteLn:

Code: Pascal
A
B
C
D

Upon encountering characters #32, #9, or #11, my code in the scanner checks two boolean values to determine whether or not to trim, which it toggles between true and false where appropriate: "first_multiline" (meaning we've just seen an opening backtick) and also "had_newline" (meaning we've just seen a newline character and are at the very beginning of the next line.)

The main block of code relevant to that is simply this:

Code: Pascal
#32,#9,#11 :
  if (had_newline or first_multiline) and (current_settings.whitespacetrimcount > 0) then
    begin
      trimcount:=current_settings.whitespacetrimcount;
      while (c in [#32,#9,#11]) and (trimcount > 0) do
        begin
          readchar;
          dec(trimcount);
        end;
      had_newline:=false;
      first_multiline:=false;
      goto quote_label;
    end;

"quote_label" there being exactly the top of the case statement containing that branch (which must be re-entered after trimming in order to properly avoid EOF.)
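
The trimming rule described above can be tried outside the compiler with a minimal stand-alone sketch. This is not the scanner code from the patch: TrimLeading is a hypothetical helper, AtLineStart stands in for both "first_multiline" and "had_newline", and the sketch assumes the whole string is already in memory rather than read character by character.

```pascal
program TrimLeftSketch;

{$mode objfpc}{$H+}

// Hypothetical stand-alone helper, not the compiler's scanner code.
// Removes at most TrimCount leading whitespace characters at the start
// of the text and after every line break.
function TrimLeading(const Src: string; TrimCount: Integer): string;
var
  i, Remaining: Integer;
  AtLineStart: Boolean;
begin
  Result := '';
  AtLineStart := True;      // plays the role of first_multiline
  Remaining := TrimCount;
  i := 1;
  while i <= Length(Src) do
  begin
    if AtLineStart and (Src[i] in [#32, #9, #11]) and (Remaining > 0) then
      Dec(Remaining)        // drop this whitespace character
    else
    begin
      Result := Result + Src[i];
      // A line break re-arms trimming (had_newline); anything else ends it.
      AtLineStart := Src[i] in [#10, #13];
      if AtLineStart then
        Remaining := TrimCount;
    end;
    Inc(i);
  end;
end;

begin
  // Same shape as the "A B C" example with a trim count of 40:
  // prints A, B and C on three lines with no leading spaces.
  WriteLn(TrimLeading('A'#10'           B'#10'           C', 40));
end.
```

Because the count is an upper bound per line, a line with fewer leading spaces than the trim count is simply trimmed completely, which matches the answer given for the TrimLeft 40 example.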

I think I hit another bug. It doubles the line ending.

I get your expected results with an LF encoded file, but not with a CRLF one. Let me take a closer look at what's going on there. Thanks for really digging into the testing!

Ok, I've identified the issue with your second bug-find here. The problem is the specific combination of "MultiLineStringLineEnding CRLF" with a source file that is itself already CRLF (or "MultiLineStringLineEnding PLATFORM" when "target_info.newline" is specifically #13#10, with a file that is already itself CRLF).

After reviewing it, I realized that my code does not currently quite properly account for a contiguous sequence of #13 and #10, one after another, in those cases, and so both the #13 and #10 are replaced with a full #13#10.

("MultiLineStringLineEnding RAW" is not affected by this regardless of the operating system or the nature of the file, however, because it does not ever intentionally write more than one character at a time.)
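
The doubling can be reproduced with a minimal sketch in plain FPC, assuming the description above is the whole story. ReplaceEachChar, ReplacePerLineBreak and ToHex are hypothetical helpers written for this illustration; they are not the scanner code and not necessarily how the fix will be implemented.

```pascal
program LineEndingSketch;

{$mode objfpc}{$H+}

// Buggy variant: every #13 and every #10 is replaced on its own,
// so a CRLF pair in the source produces the ending twice.
function ReplaceEachChar(const Src, Ending: string): string;
var
  c: Char;
begin
  Result := '';
  for c in Src do
    if c in [#13, #10] then
      Result := Result + Ending
    else
      Result := Result + c;
end;

// Possible fix: a #13#10 pair counts as one line break.
function ReplacePerLineBreak(const Src, Ending: string): string;
var
  i: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(Src) do
  begin
    if Src[i] in [#13, #10] then
    begin
      // Skip the #10 that belongs to a #13#10 pair.
      if (Src[i] = #13) and (i < Length(Src)) and (Src[i + 1] = #10) then
        Inc(i);
      Result := Result + Ending;
    end
    else
      Result := Result + Src[i];
    Inc(i);
  end;
end;

// Hex dump in the same style as the test program earlier in the thread.
function ToHex(const S: string): string;
var
  c: Char;
begin
  Result := '';
  for c in S do
    Result := Result + '$' + HexStr(Ord(c), 2);
end;

begin
  WriteLn(ToHex(ReplaceEachChar('1'#13#10'2', #13#10)));     // $31$0D$0A$0D$0A$32
  WriteLn(ToHex(ReplacePerLineBreak('1'#13#10'2', #13#10))); // $31$0D$0A$32
end.
```

The first line matches the doubled result reported for array b; the second is the expected one. With an LF-only source there is a single line-break character per line, so both variants agree, which fits the observation that LF encoded files were not affected.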

I'm fairly confident that I know what the best way to remedy this is, but I might not have a chance to do so until later today (am at work right now, haha.) As soon as I can though, I'll fix it, push the changes to GitHub, and upload another new pair of patches.

Thanks again for really trying to find the edge cases! It's precisely the kind of thing I was hoping for when I asked for testers. Hopefully that's the last big-ish bug, but of course let me know if you find anything else.
« Last Edit: July 12, 2019, 04:04:33 pm by Akira1364 »

engkin

« Reply #38 on: July 12, 2019, 03:14:00 pm »
I'm fairly confident that I know what the best way to remedy this is, but I might not have a chance to do so until later today (am at work right now, haha.) As soon as I can though, I'll fix it, push the changes to GitHub, and upload another new pair of patches.
Take your time. I am sure you can find the best way to correct it. Thank you for putting all that effort into this nice feature.

440bx

« Reply #39 on: July 12, 2019, 06:10:06 pm »
This would mean that the parser would need to handle newlines, which it currently is totally ignorant of. The multi-line string implemented by Akira1364 is implemented solely inside the scanner, which is already dealing with lines and such. Your idea is therefore much more intrusive for the compiler, harder to maintain in the long run, and thus much less likely to get accepted than Akira1364's idea.
First, most important, you may very well be right since I have not spent any real amount of time analyzing the ways the feature can be implemented.

That said, the idea behind {$LINEBREAKS ON} is to direct the scanner to a new routine that handles multiline strings.  What Akira triggers with a ` (backtick), I would trigger with a directive.   Just as simple, just a different path to getting to the same place.

At least at first blush, there doesn't seem to be anything preventing such an implementation and it would not require adding new elements (such as the backtick) to the language.

I threw that out there, not even as a suggestion but, as an idea in case keeping syntax just the way it currently is would be appealing to those who would really like to see the feature implemented.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

lainz

    • https://lainz.github.io/
« Reply #40 on: July 12, 2019, 09:11:08 pm »
I like the idea, didn't read all the opinions btw.

Also, I like the backtick usage; it's like what is done in JavaScript, for example.

Keep going and get it bug free in time.  :)

In JavaScript:
https://flaviocopes.com/how-to-create-multiline-string-javascript/

and template literals can be good to have too
https://flaviocopes.com/javascript-template-literals/
« Last Edit: July 12, 2019, 09:26:13 pm by Lainz »

Thaddy

  • Probably until I exterminate Putin.
« Reply #41 on: July 12, 2019, 09:14:10 pm »
I am never going to use it because I really do not see the point. (Even after reading all discussions on the mailing list)
You can't solve bad programming with a compiler switch.
« Last Edit: July 12, 2019, 09:15:49 pm by Thaddy »
Specialize a type, not a var.

440bx

« Reply #42 on: July 12, 2019, 09:26:47 pm »
You can't solve bad programming with a compiler switch.
That's true, but this feature isn't about solving a "bad programming" problem.

@BrunoK above gave a very nice example of how multiline strings make a SQL statement much cleaner and easier to read.

Depending on context, it could be a feature used very little or a whole lot (for instance when interacting with a SQL database.)

jamie

« Reply #43 on: July 12, 2019, 10:44:59 pm »
I am beginning to think they want FPC to be a database compiler. This ain't COBOL.


For once Thaddy actually said something that I can agree with. I don't want it modifying my strings.

Multiple lines are a great idea for the compiler to read, but I want it to assemble exactly what it sees.

If formatted text within the source code is desired, then maybe a utility tool could paste the results in after the fact.

I was just looking at the prices for the latest Embarcadero Delphi products (professional). I guess I could swing it, but looking for forums like the one we have with Lazarus is not bearing much fruit.

The only true wisdom is knowing you know nothing

440bx

« Reply #44 on: July 12, 2019, 10:49:11 pm »
For once Thaddy actually said something that I can agree with.
When that happens, the probability of being wrong is infinitesimally close to being 1.

 
