Recent

Author Topic: performance of concatenating 1 item to an array - repeatedly  (Read 4888 times)

JdeHaan

  • Full Member
  • ***
  • Posts: 171
performance of concatenating 1 item to an array - repeatedly
« on: April 25, 2020, 08:43:46 am »
Hi, I have the following code
Code: Pascal  [Select][+][-]
  1. {$mode delphi}{$H+}
  2.  
  3. type
  4.  
  5.   TToken = record
  6.     Typ: TTokenTyp;
  7.     ...
  8.   end;
  9.  
  10.   TTokens = array of TToken;
  11.  
  12. ...
  13.  
  14. // variable Tokens is initalized to nil.
  15.  
  16. procedure TLexer.ScanTokens;
  17. var
  18.   Token: TToken;
  19. begin
  20.   repeat
  21.     Token := ScanToken;
  22.     Tokens += [Token];
  23.   until Token.Typ = ttEOF;
  24. end;

Though I visually like this, I'm not sure if this is performance wise the best solution.
What happens behind the scenes of the array?


MacOs Catalina / FPC 3.3.1 / Laz 2.1.0

circular

  • Hero Member
  • *****
  • Posts: 4469
    • Personal webpage
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #1 on: April 25, 2020, 09:18:31 am »
That will be slow because it reallocated each time you concatenate.

Instead, I suggest you use a global variable with the current count of tokens and increment it. And resize the array exponentially.

Code: Delphi  [Select][+][-]
  1.     {$mode delphi}{$H+}
  2.      
  3.     type
  4.      
  5.       TToken = record
  6.         Typ: TTokenTyp;
  7.         ...
  8.       end;
  9.      
  10.       TTokens = array of TToken;
  11.      
  12.     ...
  13.  
  14.     var
  15.       Tokens: TTokens = nil;
  16.       TokenCount: integer = 0;
  17.  
  18.     procedure AddToken(AToken: TToken);
  19.     begin
  20.       if TokenCount >= length(Tokens) then
  21.         SetLength(Tokens, TokenCount*2 + 4);
  22.       Tokens[TokenCount] := AToken;
  23.       Inc(TokenCount);
  24.     end;
  25.      
  26.     procedure TLexer.ScanTokens;
  27.     var
  28.       Token: TToken;
  29.     begin
  30.       repeat
  31.         Token := ScanToken;
  32.         AddToken(Token);
  33.       until Token.Typ = ttEOF;
  34.     end;
Conscience is the debugger of the mind

MarkMLl

  • Hero Member
  • *****
  • Posts: 8551
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #2 on: April 25, 2020, 09:27:50 am »
That will be slow because it reallocated each time you concatenate.

Instead, I suggest you use a global variable with the current count of tokens and increment it. And resize the array exponentially.

Seconded, although "exponentially" might not be quite the best approach" :-)

Also freeing an element doesn't leave memory allocated: a delete-then-concatenate is still slow.

I sometimes overload + as a smart concatenation operator, but IMO FPC badly needs a separate concatenation operator for cases where e.g. numeric vectors can be added together.

MarkMLl


MarkMLl.
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

JdeHaan

  • Full Member
  • ***
  • Posts: 171
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #3 on: April 25, 2020, 09:39:07 am »
Thanks
@circular, I will use your suggestion

Thaddy

  • Hero Member
  • *****
  • Posts: 18792
  • Glad to be alive.
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #4 on: April 25, 2020, 09:41:00 am »
I would use a list<> from generics.collections or a fpglist from fgl for example:
Code: Pascal  [Select][+][-]
  1. {$mode objfpc}
  2. uses
  3.   generics.collections;
  4. type
  5.   TToken = record
  6.   end;
  7.    
  8.   TTokenList = specialize TList<TToken>;  
  9. var
  10.   s:TTokenList;
  11. begin
  12.   s:= TTokenlist.Create;
  13.   try
  14.   // do some work
  15.   finally
  16.     s.Free;
  17.   end;
  18. end.
This does the caching for you and you can set the capacity to what you expect to need.
Recovered from removal of tumor in tongue following tongue reconstruction with a part from my leg.

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #5 on: April 25, 2020, 09:44:59 am »
I sometimes overload + as a smart concatenation operator, but IMO FPC badly needs a separate concatenation operator for cases where e.g. numeric vectors can be added together.

MarkMLl

Could you provide a simple example how to concatenate numeric vectors with a (proposed) operator?
It's only logical.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8551
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #6 on: April 25, 2020, 09:55:32 am »
Could you provide a simple example how to concatenate numeric vectors with a (proposed) operator?

What I've done in scripting stuff that I use is allow _ for concatenation leaving + for addition. In the case of something like FPC they'd both start off as array concatenation for the sake of backward compatibility, but then + could be overloaded as vector/array addition, hence something like

Code: [Select]
var
  vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;

// Overloads of _ and + for type numericVector here.

begin
  vectorA := someInitialiser();
  vectorB := someInitialiser();
  vectorC := vectorA _ vectorB; // Concatenation
  vectorD := vectorB _ vectorA; // Concatenation
  vectorE := vectorC + vectorD // Addition
end;

MarkMLl

 
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

munair

  • Hero Member
  • *****
  • Posts: 887
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #7 on: April 25, 2020, 10:32:38 am »
Could you provide a simple example how to concatenate numeric vectors with a (proposed) operator?

What I've done in scripting stuff that I use is allow _ for concatenation leaving + for addition. In the case of something like FPC they'd both start off as array concatenation for the sake of backward compatibility, but then + could be overloaded as vector/array addition, hence something like

Code: [Select]
var
  vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;

// Overloads of _ and + for type numericVector here.

begin
  vectorA := someInitialiser();
  vectorB := someInitialiser();
  vectorC := vectorA _ vectorB; // Concatenation
  vectorD := vectorB _ vectorA; // Concatenation
  vectorE := vectorC + vectorD // Addition
end;

MarkMLl

 

Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.
It's only logical.

julkas

  • Guest
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #8 on: April 25, 2020, 10:36:08 am »
You can use gvector from fcl-stl also.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8551
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #9 on: April 25, 2020, 10:52:13 am »
Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.

I think that & is a bit close to the "and" operation. Without whitespace or other separator _ is already commonly used to indicate separate parts of an identifier, so using it for concatenation continues the tradition.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

PascalDragon

  • Hero Member
  • *****
  • Posts: 6356
  • Compiler Developer
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #10 on: April 25, 2020, 03:24:19 pm »
Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.

The ampersand can not be used, because &Something is already valid code (and this would need to work, because e.g. +Something or -Something is valid code as well (note the missing space between operator and identifier)).

MarkMLl

  • Hero Member
  • *****
  • Posts: 8551
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #11 on: April 25, 2020, 04:08:52 pm »
Arguably, ampersand (&) might be more intuitive, but perhaps that would add complications/confusion in Pascal.

The ampersand can not be used, because &Something is already valid code (and this would need to work, because e.g. +Something or -Something is valid code as well (note the missing space between operator and identifier)).

Thanks for the confirmation. I can't say I much like having made a suggestion that relies on whitespace etc. but it still seems like the best possibility.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

circular

  • Hero Member
  • *****
  • Posts: 4469
    • Personal webpage
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #12 on: April 25, 2020, 04:36:35 pm »
Hmm what about using pipe "|" ?

Either as concatenation operator:
Code: Pascal  [Select][+][-]
  1. var
  2.   vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;
  3.  
  4. // Overloads of _ and + for type numericVector here.
  5.  
  6. begin
  7.   vectorA := someInitialiser();
  8.   vectorB := someInitialiser();
  9.   vectorC := vectorA | vectorB; // Concatenation
  10.   vectorD := vectorB | vectorA; // Concatenation
  11.   vectorE := vectorC + vectorD // Addition
  12. end;

Or maybe to indicate that it is to be applied to each component of the array?

Code: Pascal  [Select][+][-]
  1. var
  2.   vectorA, vectorB, vectorC, vectorD, vectorE: numericVector;
  3.  
  4. // Overloads of _ and + for type numericVector here.
  5.  
  6. begin
  7.   vectorA := someInitialiser();
  8.   vectorB := someInitialiser();
  9.   vectorC := vectorA + vectorB; // Concatenation
  10.   vectorD := vectorB + vectorA; // Concatenation
  11.   vectorE := vectorC |+ vectorD // Addition
  12. end;
« Last Edit: April 25, 2020, 04:43:07 pm by circular »
Conscience is the debugger of the mind

MarkMLl

  • Hero Member
  • *****
  • Posts: 8551
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #13 on: April 25, 2020, 05:27:52 pm »
Hmm what about using pipe "|" ?

There's enough people on both sides of the C vs Pascal divide that criticise the other's design choices. | (not to mention | ) has well-accepted semantics in many languages' expressions relating to "or" relationships, and I really think that we could do with avoiding that.

And please don't suggest borrowing Perl's  .  or APL's  ,  concatenation operator.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

PascalDragon

  • Hero Member
  • *****
  • Posts: 6356
  • Compiler Developer
Re: performance of concatenating 1 item to an array - repeatedly
« Reply #14 on: April 25, 2020, 05:37:31 pm »
In MacPas mode the pipe character is already used for short circuit or (and the ampersand is used for short circuit and; please note that this does not contradict my previous post as in MacPas mode the identifier escape using the ampersand is disabled).

 

TinyPortal © 2005-2018