Author Topic: [Solved:] Bug in concat function for AnsiStrings?  (Read 806 times)

jwdietrich

  • Hero Member
  • *****
  • Posts: 971
    • formatio reticularis
[Solved:] Bug in concat function for AnsiStrings?
« on: April 24, 2018, 08:41:48 am »
FPC's behaviour when concatenating long strings is somewhat unexpected and seems to depend on how the concatenation is implemented. I assume that this is a bug, but before I submit a bug report I would like to ask whether there could be some intention behind this behaviour.

If we define 20 string constants with a length of 20 characters each with

Code: Pascal  [Select]
const
  part1: string[20] = 'Part1_7890123456789_';
  part2: string[20] = 'Part2_7890123456789_';
  part3: string[20] = 'Part3_7890123456789_';
  part4: string[20] = 'Part4_7890123456789_';
  part5: string[20] = 'Part5_7890123456789_';
  part6: string[20] = 'Part6_7890123456789_';
  part7: string[20] = 'Part7_7890123456789_';
  part8: string[20] = 'Part8_7890123456789_';
  part9: string[20] = 'Part9_7890123456789_';
  part10: string[20] = 'Part10_890123456789_';
  part11: string[20] = 'Part11_890123456789_';
  part12: string[20] = 'Part12_890123456789_';
  part13: string[20] = 'Part13_890123456789_';
  part14: string[20] = 'Part14_890123456789_';
  part15: string[20] = 'Part15_890123456789_';
  part16: string[20] = 'Part16_890123456789_';
  part17: string[20] = 'Part17_890123456789_';
  part18: string[20] = 'Part18_890123456789_';
  part19: string[20] = 'Part19_890123456789_';
  part20: string[20] = 'Part20_890123456789_';

and the following three functions to combine the 20 strings to one

Code: Pascal  [Select]
function CombinedByMethodA: AnsiString;
var
  tempString: AnsiString;
begin
  tempString := part1 + part2 + part3 + part4 + part5 + part6 + part7 + part8 +
                part9 + part10 + part11 + part12 + part13 + part14 + part15 +
                part16 + part17 + part18 + part19 + part20;
  result := tempString;
end;

function CombinedByMethodB: AnsiString;
var
  tempString1, tempString2, tempString3: AnsiString;
begin
  tempString1 := part1 + part2 + part3 + part4 + part5 + part6 + part7 + part8 +
                 part9 + part10;
  tempString2 := part11 + part12 + part13 + part14 + part15 +
                 part16 + part17 + part18 + part19 + part20;
  tempString3 := tempString1 + tempString2;
  result := tempString3;
end;

function CombinedByMethodC: AnsiString;
var
  tempString: AnsiString;
begin
  tempString := concat(part1, part2, part3, part4, part5, part6, part7, part8,
                       part9, part10, part11, part12, part13, part14, part15,
                       part16, part17, part18, part19, part20);
  result := tempString;
end;

then only method B seems to deliver the expected result, i.e. a string of length 400 with the content
Quote
Part1_7890123456789_Part2_7890123456789_Part3_7890123456789_Part4_7890123456789_Part5_7890123456789_Part6_7890123456789_Part7_7890123456789_Part8_7890123456789_Part9_7890123456789_Part10_890123456789_Part11_890123456789_Part12_890123456789_Part13_890123456789_Part14_890123456789_Part15_890123456789_Part16_890123456789_Part17_890123456789_Part18_890123456789_Part19_890123456789_Part20_890123456789_
.

Methods A and C deliver shorter strings (255 characters), which end in the middle of part 13:
Quote
Part1_7890123456789_Part2_7890123456789_Part3_7890123456789_Part4_7890123456789_Part5_7890123456789_Part6_7890123456789_Part7_7890123456789_Part8_7890123456789_Part9_7890123456789_Part10_890123456789_Part11_890123456789_Part12_890123456789_Part13_89012345

I assume that this is a bug, which occurs when a large number of substrings are combined. Or is there a reason for this behaviour?

A very simple Lazarus program demonstrating this effect is attached.
« Last Edit: April 24, 2018, 09:29:04 am by jwdietrich »
function GetRandomNumber: integer; // xkcd.com
begin
  GetRandomNumber := 4; // chosen by fair dice roll. Guaranteed to be random.
end;

http://www.formatio-reticularis.de

Lazarus 1.8.4 | FPC 3.0.4 | PPC, Intel, ARM | macOS, Windows, Linux

Thaddy

  • Hero Member
  • *****
  • Posts: 6542
Re: Bug in concat function for AnsiStrings?
« Reply #1 on: April 24, 2018, 08:58:11 am »
string[20] is a shortstring, not a long string. Concatenating shortstrings can never yield more than 255 characters, so anything beyond that is cut off.
If you declare the string constants simply as string (under {$H+}) or AnsiString, the code works as expected.
So I guess this is not a bug, but expected behaviour.
Meaning: the conversion to AnsiString is only done at the end of the concatenation, and internally the parts are still shortstrings because of the length specifier.
Also be careful in Lazarus: there, string holds UTF-8 encoded text by default, not ANSI. It is always better to specify AnsiString if you expect AnsiString.
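
A minimal sketch of the truncation described above, assuming FPC with {$H+}. The program and variable names (`ShortConcatDemo`, `a`, `b`, `joined`) are made up for illustration and are not taken from the attached project:

```pascal
program ShortConcatDemo;
{$mode objfpc}{$H+}
var
  a, b: string[200];   // shortstrings: 200 is the declared maximum length
  joined: AnsiString;
begin
  a := StringOfChar('a', 200);
  b := StringOfChar('b', 200);
  // Both operands are shortstrings, so the sum is built as a shortstring
  // (at most 255 characters) and only then converted to AnsiString.
  joined := a + b;
  writeln('Length(a) + Length(b) = ', Length(a) + Length(b));
  writeln('Length(joined)        = ', Length(joined));
end.
```

Going by the results reported in this thread, the second length should come out as 255 rather than 400.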
« Last Edit: April 24, 2018, 09:04:23 am by Thaddy »
Ada's daddy wrote this:"Fools are my theme, let satire be my song."

jwdietrich

Re: Bug in concat function for AnsiStrings?
« Reply #2 on: April 24, 2018, 09:04:31 am »
Does this mean that the short strings are first concatenated to a combined short string, and that the resulting short string is converted to an AnsiString only in a second step?

Thaddy

Re: Bug in concat function for AnsiStrings?
« Reply #3 on: April 24, 2018, 09:10:58 am »
Yes, because concatenating shortstrings is also legal. You were assuming something that the compiler can't foresee and ran into the 255-character limit for shortstrings.
AnsiStrings cannot be declared with a fixed length. See:
Code: Pascal  [Select]
program testAnsiStrings;
{$ifdef fpc}{$mode delphi}{$H+}{$endif}
const
  part1: AnsiString = 'Part1_7890123456789_';
  part2: AnsiString = 'Part2_7890123456789_';
  part3: AnsiString = 'Part3_7890123456789_';
  part4: AnsiString = 'Part4_7890123456789_';
  part5: AnsiString = 'Part5_7890123456789_';
  part6: AnsiString = 'Part6_7890123456789_';
  part7: AnsiString = 'Part7_7890123456789_';
  part8: AnsiString = 'Part8_7890123456789_';
  part9: AnsiString = 'Part9_7890123456789_';
  part10: AnsiString = 'Part10_890123456789_';
  part11: AnsiString = 'Part11_890123456789_';
  part12: AnsiString = 'Part12_890123456789_';
  part13: AnsiString = 'Part13_890123456789_';
  part14: AnsiString = 'Part14_890123456789_';
  part15: AnsiString = 'Part15_890123456789_';
  part16: AnsiString = 'Part16_890123456789_';
  part17: AnsiString = 'Part17_890123456789_';
  part18: AnsiString = 'Part18_890123456789_';
  part19: AnsiString = 'Part19_890123456789_';
  part20: AnsiString = 'Part20_890123456789_';

function CombinedByMethodA: AnsiString;
var
  tempString: AnsiString;
begin
  tempString := part1 + part2 + part3 + part4 + part5 + part6 + part7 + part8 +
                part9 + part10 + part11 + part12 + part13 + part14 + part15 +
                part16 + part17 + part18 + part19 + part20;
  result := tempString;
end;

function CombinedByMethodB: AnsiString;
var
  tempString1, tempString2, tempString3: AnsiString;
begin
  tempString1 := part1 + part2 + part3 + part4 + part5 + part6 + part7 + part8 +
                 part9 + part10;
  tempString2 := part11 + part12 + part13 + part14 + part15 +
                 part16 + part17 + part18 + part19 + part20;
  tempString3 := tempString1 + tempString2;
  result := tempString3;
end;

function CombinedByMethodC: AnsiString;
var
  tempString: AnsiString;
begin
  tempString := concat(part1, part2, part3, part4, part5, part6, part7, part8,
                       part9, part10, part11, part12, part13, part14, part15,
                       part16, part17, part18, part19, part20);
  result := tempString;
end;

begin
  writeln(CombinedByMethodA);
  writeln(CombinedByMethodB);
  writeln(CombinedByMethodC);
end.
This gives the correct output for all three functions.
If you keep the string[20] constants instead, you need a type cast to AnsiString *before* you attempt the concat operation.
This is mostly, if not entirely, documented.
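
A short sketch of what such a cast could look like, assuming FPC with {$H+}. The names `head`, `middle` and `tail` are hypothetical stand-ins for the string[20] constants:

```pascal
program CastBeforeConcat;
{$mode objfpc}{$H+}
var
  head, middle, tail: string[100];   // shortstrings, 300 characters in total
  joined: AnsiString;
begin
  head   := StringOfChar('h', 100);
  middle := StringOfChar('m', 100);
  tail   := StringOfChar('t', 100);
  // The cast turns the leftmost operand into an AnsiString. "+" is evaluated
  // left to right, so every intermediate result is an AnsiString as well and
  // the 255-character shortstring limit no longer applies.
  joined := AnsiString(head) + middle + tail;
  writeln(Length(joined));
end.
```

Casting only the first operand is enough here because each later `+` already has an AnsiString on its left-hand side.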
« Last Edit: April 24, 2018, 09:13:26 am by Thaddy »

Thaddy

Re: Bug in concat function for AnsiStrings?
« Reply #4 on: April 24, 2018, 09:21:18 am »
Btw, given String[20] you can also do:
Code: Pascal  [Select]
function CombinedByMethodD: AnsiString;
var
  tempString: AnsiString;
begin
  tempString := (part1 + part2 + part3 + part4 + part5 + part6 + part7 + part8 +
                 part9 + part10 + part11 + part12) + (part13 + part14 + part15 +
                 part16 + part17 + part18 + part19 + part20);
  result := tempString;
end;
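The idea: each bracketed sum stays below the 255-character shortstring limit (240 and 160 characters), and the two results are then combined without explicit typecasts.
But that is implied logic and not documented directly, so check the result with your compiler version; an explicit cast to AnsiString before concatenating does not rely on it.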
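
An alternative that does not depend on how bracketed shortstring sums are typed is to append each part to an AnsiString accumulator. This is only a sketch under the assumption of FPC with {$H+}; `TPart`, `JoinParts` and `demoParts` are hypothetical names, not part of the thread's program:

```pascal
program JoinDemo;
{$mode objfpc}{$H+}

type
  TPart = string[20];

// Hypothetical helper: appends every part to an AnsiString accumulator,
// so no intermediate value is a shortstring.
function JoinParts(const parts: array of TPart): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := Low(parts) to High(parts) do
    Result := Result + parts[i];   // AnsiString + shortstring gives AnsiString
end;

var
  demoParts: array[1..20] of TPart;
  i: Integer;
begin
  // Fill 20 parts of 20 characters each: 'aaaa...', 'bbbb...', and so on.
  for i := 1 to 20 do
    demoParts[i] := StringOfChar(Chr(Ord('a') + i - 1), 20);
  writeln(Length(JoinParts(demoParts)));
end.
```

With 20 parts of 20 characters each, the printed length should be 400.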

jwdietrich

[Solved] Bug in concat function for AnsiStrings?
« Reply #5 on: April 24, 2018, 09:28:37 am »
Thanks, this is interesting.

 
