Recent

Author Topic: performance of concatenating 1 item to an array - repeatedly  (Read 4812 times)

JdeHaan

  • Full Member
  • ***
  • Posts: 171
performance of concatenating 1 item to an array - repeatedly
« on: April 25, 2020, 08:43:46 am »
Hi, I have the following code
Code: Pascal  [Select][+][-]
  1. {$mode delphi}{$H+}
  2.  
  3. type
  4.  
  5.   TToken = record
  6.     Typ: TTokenTyp;
  7.     ...
  8.   end;
  9.  
  10.   TTokens = array of TToken;
  11.  
  12. ...
  13.  
  14. // variable Tokens is initalized to nil.
  15.  
  16. procedure TLexer.ScanTokens;
  17. var
  18.   Token: TToken;
  19. begin
  20.   repeat
  21.     Token := ScanToken;
  22.     Tokens += [Token];
  23.   until Token.Typ = ttEOF;
  24. end;

Though I visually like this, I'm not sure if this is performance wise the best solution.
What happens behind the scenes of the array?


MacOs Catalina / FPC 3.3.1 / Laz 2.1.0

circular

  • Hero Member
  • *****
  • Posts: 4462
    • Personal webpage
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #1 on: April 25, 2020, 09:18:31 am »
That will be slow because it reallocated each time you concatenate.

Instead, I suggest you use a global variable with the current count of tokens and increment it. And resize the array exponentially.

Code: Delphi  [Select][+][-]
  1.     {$mode delphi}{$H+}
  2.      
  3.     type
  4.      
  5.       TToken = record
  6.         Typ: TTokenTyp;
  7.         ...
  8.       end;
  9.      
  10.       TTokens = array of TToken;
  11.      
  12.     ...
  13.  
  14.     var
  15.       Tokens: TTokens = nil;
  16.       TokenCount: integer = 0;
  17.  
  18.     procedure AddToken(AToken: TToken);
  19.     begin
  20.       if TokenCount >= length(Tokens) then
  21.         SetLength(Tokens, TokenCount*2 + 4);
  22.       Tokens[TokenCount] := AToken;
  23.       Inc(TokenCount);
  24.     end;
  25.      
  26.     procedure TLexer.ScanTokens;
  27.     var
  28.       Token: TToken;
  29.     begin
  30.       repeat
  31.         Token := ScanToken;
  32.         AddToken(Token);
  33.       until Token.Typ = ttEOF;
  34.     end;
Conscience is the debugger of the mind

MarkMLl

  • Hero Member
  • *****
  • Posts: 8525
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #2 on: April 25, 2020, 09:27:50 am »
That will be slow because it reallocated each time you concatenate.

Instead, I suggest you use a global variable with the current count of tokens and increment it. And resize the array exponentially.

Seconded, although "exponentially" might not be quite the best approach" :-)

Also freeing an element doesn't leave memory allocated: a delete-then-concatenate is still slow.

I sometimes overload + as a smart concatenation operator, but IMO FPC badly needs a separate concatenation operator for cases where e.g. numeric vectors can be added together.

MarkMLl


MarkMLl.
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

JdeHaan

  • Full Member
  • ***
  • Posts: 171
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #3 on: April 25, 2020, 09:39:07 am »
Thanks
@circular, I will use your suggestion

Thaddy

  • Hero Member
  • *****
  • Posts: 18711
  • To Europe: simply sell USA bonds: dollar collapses
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #4 on: April 25, 2020, 09:41:00 am »
I would use a list<> from generics.collections or a fpglist from fgl for example:
Code: Pascal  [Select][+][-]
  1. {$mode objfpc}
  2. uses
  3.   generics.collections;
  4. type
  5.   TToken = record
  6.   end;
  7.    
  8.   TTokenList = specialize TList<TToken>;  
  9. var
  10.   s:TTokenList;
  11. begin
  12.   s:= TTokenlist.Create;
  13.   try
  14.   // do some work
  15.   finally
  16.     s.Free;
  17.   end;
  18. end.
This does the caching for you and you can set the capacity to what you expect to need.
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #5 on: April 25, 2020, 09:44:59 am »
I sometimes overload + as a smart concatenation operator, but IMO FPC badly needs a separate concatenation operator for cases where e.g. numeric vectors can be added together.

MarkMLl

Could you provide a simple example how to concatenate numeric vectors with a (proposed) operator?
It's only logical.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8525
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #6 on: April 25, 2020, 09:55:32 am »
Could you provide a simple example how to concatenate numeric vectors with a (proposed) operator?

What I've done in scripting stuff that I use is allow _ for concatenation leaving + for addition. In the case of something like FPC they'd both start off as array concatenation for the sake of backward compatibility, but then + could be overloaded as vector/array addition, hence something like

Code: [Select]
var
  vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;

// Overloads of _ and + for type numericVector here.

begin
  vectorA := someInitialiser();
  vectorB := someInitialiser();
  vectorC := vectorA _ vectorB; // Concatenation
  vectorD := vectorB _ vectorA; // Concatenation
  vectorE := vectorC + vectorD // Addition
end;

MarkMLl

 
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #7 on: April 25, 2020, 10:32:38 am »
Could you provide a simple example how to concatenate numeric vectors with a (proposed) operator?

What I've done in scripting stuff that I use is allow _ for concatenation leaving + for addition. In the case of something like FPC they'd both start off as array concatenation for the sake of backward compatibility, but then + could be overloaded as vector/array addition, hence something like

Code: [Select]
var
  vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;

// Overloads of _ and + for type numericVector here.

begin
  vectorA := someInitialiser();
  vectorB := someInitialiser();
  vectorC := vectorA _ vectorB; // Concatenation
  vectorD := vectorB _ vectorA; // Concatenation
  vectorE := vectorC + vectorD // Addition
end;

MarkMLl

 

Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.
It's only logical.

julkas

  • Guest
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #8 on: April 25, 2020, 10:36:08 am »
You can use gvector from fcl-stl also.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8525
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #9 on: April 25, 2020, 10:52:13 am »
Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.

I think that & is a bit close to the "and" operation. Without whitespace or other separator _ is already commonly used to indicate separate parts of an identifier, so using it for concatenation continues the tradition.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

PascalDragon

  • Hero Member
  • *****
  • Posts: 6284
  • Compiler Developer
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #10 on: April 25, 2020, 03:24:19 pm »
Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.

The ampersand can not be used, because &Something is already valid code (and this would need to work, because e.g. +Something or -Something is valid code as well (note the missing space between operator and identifier)).

MarkMLl

  • Hero Member
  • *****
  • Posts: 8525
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #11 on: April 25, 2020, 04:08:52 pm »
Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.

The ampersand can not be used, because &Something is already valid code (and this would need to work, because e.g. +Something or -Something is valid code as well (note the missing space between operator and identifier)).

Thanks for the confirmation. I can't say I much like having made a suggestion that relies on whitespace etc. but it still seems like the best possibility.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

circular

  • Hero Member
  • *****
  • Posts: 4462
    • Personal webpage
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #12 on: April 25, 2020, 04:36:35 pm »
Hmm what about using pipe "|" ?

Either as concatenation operator:
Code: Pascal  [Select][+][-]
  1. var
  2.   vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;
  3.  
  4. // Overloads of _ and + for type numericVector here.
  5.  
  6. begin
  7.   vectorA := someInitialiser();
  8.   vectorB := someInitialiser();
  9.   vectorC := vectorA | vectorB; // Concatenation
  10.   vectorD := vectorB | vectorA; // Concatenation
  11.   vectorE := vectorC + vectorD // Addition
  12. end;

Or maybe to indicate that it is to be applied to each component of the array?

Code: Pascal  [Select][+][-]
  1. var
  2.   vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;
  3.  
  4. // Overloads of _ and + for type numericVector here.
  5.  
  6. begin
  7.   vectorA := someInitialiser();
  8.   vectorB := someInitialiser();
  9.   vectorC := vectorA + vectorB; // Concatenation
  10.   vectorD := vectorB + vectorA; // Concatenation
  11.   vectorE := vectorC |+ vectorD // Addition
  12. end;
« Last Edit: April 25, 2020, 04:43:07 pm by circular »
Conscience is the debugger of the mind

MarkMLl

  • Hero Member
  • *****
  • Posts: 8525
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #13 on: April 25, 2020, 05:27:52 pm »
Hmm what about using pipe "|" ?

There's enough people on both sides of the C vs Pascal divide that criticise the other's design choices. | (not to mention | ) has well-accepted semantics in many languages' expressions relating to "or" relationships, and I really think that we could do with avoiding that.

And please don't suggest borrowing Perl's  .  or APL's  ,  concatenation operator.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

PascalDragon

  • Hero Member
  • *****
  • Posts: 6284
  • Compiler Developer
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #14 on: April 25, 2020, 05:37:31 pm »
In MacPas mode the pipe character is already used for short circuit or (and the ampersand is used for short circuit and; please note that this does not contradict my previous post as in MacPas mode the identifier escape using the ampersand is disabled).

 

TinyPortal © 2005-2018