
Topic: Eliminating empty strings in TStringList when reading from text file.  (Read 21301 times)

Thaddy

  • Hero Member
  • Posts: 16580
  • Kallstadt seems a good place to evict Trump to.
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #15 on: March 12, 2020, 11:52:10 am »
Consider this:
Code: Pascal
  {$mode objfpc}{$H+}
  uses classes, sysutils;
  var
    L:Tstringlist;
    A:array of string;
    S:string;
  begin
    L:=TStringlist.create;
    try
      // some data
      L.Add('');L.Add('dummy');L.Add('something');L.Add('');L.Add('help');
      // here's a solution taking just three lines.
      A:=L.Text.split(LineEnding, TStringSplitOptions.ExcludeEmpty);
      L.Clear;
      L.AddStrings(A);
      // that's all, now check if it works:
      for S in L do writeln(s);
    finally
      L.free;
    end;
  end.
Not actually on load... but easy.
« Last Edit: March 12, 2020, 12:24:35 pm by Thaddy »
But I am sure they don't want the Trumps back...

Bart

  • Hero Member
  • *****
  • Posts: 5516
    • Bart en Mariska's Webstek
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #16 on: March 12, 2020, 12:13:37 pm »
Fewer lines of code is now the target?
Then I suggest writing a procedure that does that and keeping it in a unit; next time you need it, it's a one-liner.
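Following that suggestion, a minimal sketch of such a reusable procedure. It filters while reading, so empty lines never enter the list and no auxiliary list or second pass is needed. `LoadNonEmptyLines` is a hypothetical name, and the sketch assumes plain `TextFile`/`ReadLn` I/O is acceptable in place of `TStringList.LoadFromFile`; in real code it would sit in the interface section of a shared unit.

```pascal
program LoadNonEmptyDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: appends every non-empty line of AFileName to AList.
// Zero-length lines are skipped while reading, so no second list or pass is needed.
procedure LoadNonEmptyLines(AList: TStrings; const AFileName: string);
var
  Src: TextFile;
  Line: string;
begin
  AssignFile(Src, AFileName);
  Reset(Src);
  try
    AList.BeginUpdate;
    try
      while not Eof(Src) do
      begin
        ReadLn(Src, Line);
        if Line <> '' then
          AList.Add(Line);
      end;
    finally
      AList.EndUpdate;
    end;
  finally
    CloseFile(Src);
  end;
end;

var
  F: TextFile;
  L: TStringList;
  FileName: string;
begin
  // Write a small sample file that contains empty lines.
  FileName := IncludeTrailingPathDelimiter(GetTempDir) + 'nonempty_demo.txt';
  AssignFile(F, FileName);
  Rewrite(F);
  WriteLn(F, '');
  WriteLn(F, 'dummy');
  WriteLn(F, 'something');
  WriteLn(F, '');
  WriteLn(F, 'help');
  CloseFile(F);

  L := TStringList.Create;
  try
    LoadNonEmptyLines(L, FileName);
    WriteLn(L.Count);  // 3
  finally
    L.Free;
    DeleteFile(FileName);
  end;
end.
```

Once the procedure lives in a unit, each later use is the single call `LoadNonEmptyLines(L, FileName);`.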

Bart

Thaddy

  • Hero Member
  • Posts: 16580
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #17 on: March 12, 2020, 12:25:15 pm »
True. I'm merely using some newer features.

MaxCuriosus

  • Full Member
  • Posts: 136
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #18 on: March 12, 2020, 06:19:54 pm »
Winni(1),

I am not a magician, that would be way above my pay grade!

I am a dreamer, as in dreaming of a method alongside TStringList.Sort and TStringList.Find. Of course those use loops too, but the loops are implicit rather than explicit, and therefore optimized for their purpose.
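Short of subclassing, a class helper gets close to that `L.Sort` style of call in Free Pascal. A minimal sketch, assuming `{$mode objfpc}` (where class helpers are available); `TStringsEmptyHelper` and `RemoveEmpty` are hypothetical names, and the loop is only hidden behind the method, not gone.

```pascal
program StringsHelperDemo;

{$mode objfpc}{$H+}

uses
  Classes;

type
  // Hypothetical helper: adds a RemoveEmpty "method" to every TStrings descendant,
  // so call sites read like L.Sort or L.Find. The loop is still there, just hidden.
  TStringsEmptyHelper = class helper for TStrings
    procedure RemoveEmpty;
  end;

procedure TStringsEmptyHelper.RemoveEmpty;
var
  i: Integer;
begin
  Self.BeginUpdate;
  try
    // Walk from the end so a Delete never shifts an item not yet visited.
    for i := Self.Count - 1 downto 0 do
      if Self[i] = '' then
        Self.Delete(i);
  finally
    Self.EndUpdate;
  end;
end;

var
  L: TStringList;
  S: string;
begin
  L := TStringList.Create;
  try
    L.Add(''); L.Add('dummy'); L.Add('something'); L.Add(''); L.Add('help');
    L.RemoveEmpty;
    for S in L do
      WriteLn(S);
  finally
    L.Free;
  end;
end.
```

The helper only becomes visible in units that have the helper's unit in scope, so in practice it would be declared in a shared unit.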

MaxCuriosus

  • Full Member
  • Posts: 136
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #19 on: March 12, 2020, 06:23:17 pm »
eljo(1),
a) strictly speaking, among all the suggestions, yours is the only one that doesn't use an explicit loop. It does, however, require knowledge of the LineEnding type. Is there a function that can get it directly from the text file?

b) By buffering do you mean at the compiler implementation level for the purpose of improving the performance? If so, why would it disrupt the logical flow of the loop?

MaxCuriosus

  • Full Member
  • Posts: 136
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #20 on: March 12, 2020, 06:26:48 pm »
Now about performance.

My initial question was only about simplifying the life of a "lazy" Pascal programmer. With large text files, of course, performance is important. So here are my comments and questions.


Bart(2),
Winni(2),
the type of loop doesn't matter much if the code executed in one iteration of the loop takes a lot more time than the evaluation of the count.


jamie(2),
a) Using an auxiliary StringList is easy, but if the text file is large, so is the additional temporary memory.

b) Two loop pointers to memory characters (bytes) don't seem to me any easier or faster than two indexes into the strings of the list.
That approach also requires knowing the "LineEnding" type. The nice thing with "LoadFromFile" is that you don't have to bother.
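Two indexes into the list can indeed compact it in place without an auxiliary list: a read index visits every item, a write index marks where the next non-empty string goes, and the leftover tail is deleted from the end, which shifts nothing. A sketch under these assumptions: the list is not `Sorted` (assigning to an item of a sorted `TStringList` raises an exception), associated `Objects[]` are not moved along with the strings, and `CompactNonEmpty` is a hypothetical name.

```pascal
program CompactDemo;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: in-place compaction with a read index and a write index.
// Non-empty strings are moved down over the empty ones; the unused tail is then
// deleted from the end, so no remaining item is shifted.
// Assumes an unsorted list; Objects[] are not carried along.
procedure CompactNonEmpty(AList: TStrings);
var
  ReadIdx, WriteIdx: Integer;
begin
  AList.BeginUpdate;
  try
    WriteIdx := 0;
    for ReadIdx := 0 to AList.Count - 1 do
      if AList[ReadIdx] <> '' then
      begin
        if WriteIdx <> ReadIdx then
          AList[WriteIdx] := AList[ReadIdx];
        Inc(WriteIdx);
      end;
    // Drop the leftover slots, last one first.
    while AList.Count > WriteIdx do
      AList.Delete(AList.Count - 1);
  finally
    AList.EndUpdate;
  end;
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add(''); L.Add('dummy'); L.Add(''); L.Add(''); L.Add('help');
    CompactNonEmpty(L);
    WriteLn(L.CommaText);  // dummy,help
  finally
    L.Free;
  end;
end.
```

Because the loop only assigns and never deletes, `Count` stays constant while the `for` loop runs, so evaluating its bound once is safe here.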


jamie(3),
With "memo" an auxiliary StringList is also used, and thus additional temporary memory.
And how can I learn about the behavior of the memory manager? Is there documentation somewhere? And, more importantly, is it possible to control it?


Winni(3),
your code uses an external loop, but is it faster than eljo's?


PascalDragon(1),
I concur.


MoCityMM(1),
by empty line I meant zero length string.
The Trim() function is not quite what I was looking for but interesting nonetheless.
 
 
Thaddy(1),
yes, easy, but it also uses additional temporary memory, and it also requires knowing the LineEnding.


Bart(2),
Thaddy(2),
Please don't get me wrong. I'm not trying to minimize my lines of code. I only want to emphasize the use of existing tools. I'm certain that our outstanding developers have thought thoroughly about optimizing the libraries. For example, the StringList.Sort method has been designed with optimization in mind. It doesn't matter whether I need to write one or three lines of code to use it. The important thing is not to reinvent the wheel.

eljo

  • Sr. Member
  • Posts: 468
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #21 on: March 12, 2020, 06:50:31 pm »
Quote from: MaxCuriosus on March 12, 2020, 06:23:17 pm
eljo(1),
a) strictly speaking, among all the suggestions, yours is the only one that doesn't use an explicit loop. It does, however, require knowledge of the LineEnding type. Is there a function that can get it directly from the text file?

Not to my knowledge, no. It is easy to write if you allow for some margin of error. It usually goes like this: load a big chunk of the file into a buffer and walk down the buffer byte by byte. If you find a #10, check whether the previous byte is a #13: if it is, the line ending is #13#10, otherwise it is #10. If the next line ending in the file is the same, assume you are correct and return.
That does not take every possibility into account: if the file is ASCII or UTF-8 you'll probably guess correctly, but if it is UTF-16 you might also need to take the endianness into account, etc.
Keep in mind that there are only two line endings:
1) #10
2) #13#10
There are no other options.
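The heuristic just described can be sketched as a function over a buffer that has already been read from the file. Assumptions: the buffer holds single-byte or UTF-8 text, `GuessLineEnding` and `EndingAt` are hypothetical names, an empty result means "unknown or mixed", and, as in the description, a lone #13 is not detected.

```pascal
program GuessLineEndingDemo;

{$mode objfpc}{$H+}

uses
  StrUtils;

// Line ending implied by the #10 at position P: #13#10 if a #13 precedes it, else #10.
function EndingAt(const Buf: string; P: SizeInt): string;
begin
  if (P > 1) and (Buf[P - 1] = #13) then
    Result := #13#10
  else
    Result := #10;
end;

// Hypothetical helper: guesses the line ending from the first #10 and
// confirms it with the second one. Returns '' when there is no #10 at all
// or when the first two line endings disagree (mixed file).
// Byte-oriented: meant for single-byte or UTF-8 text, not UTF-16.
function GuessLineEnding(const Buf: string): string;
var
  First, Second: SizeInt;
begin
  Result := '';
  First := Pos(#10, Buf);
  if First = 0 then
    Exit;
  Result := EndingAt(Buf, First);
  Second := PosEx(#10, Buf, First + 1);
  if (Second > 0) and (EndingAt(Buf, Second) <> Result) then
    Result := '';
end;

begin
  WriteLn(GuessLineEnding('a'#13#10'b'#13#10) = #13#10);  // TRUE
  WriteLn(GuessLineEnding('a'#10'b'#10) = #10);           // TRUE
  WriteLn(GuessLineEnding('a'#13#10'b'#10) = '');         // TRUE (mixed)
end.
```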

Quote from: MaxCuriosus on March 12, 2020, 06:23:17 pm
b) By buffering do you mean at the compiler implementation level for the purpose of improving the performance? If so, why would it disrupt the logical flow of the loop?

Buffering is the wrong term, although it might have the same effect. Bart put it properly: a single evaluation at the start of the loop is what I really meant.
As for disrupting the flow of the loop: obviously it does not, in either case. But as you delete lines from the list, the actual count gets smaller, so if you count upward you will at some point end up trying to access indices that are no longer valid. By counting down to 0 instead, you avoid the problem altogether.
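For comparison: the bounds of a Pascal `for` loop are evaluated once, which is why an upward `for i := 0 to L.Count - 1` loop runs past the end (a list-index error) after the first `Delete`. If an upward walk is preferred anyway, a `while` loop that re-reads `Count` on every test and only advances when nothing was deleted also works. A sketch; `RemoveEmptyForward` is a hypothetical name.

```pascal
program ForwardDeleteDemo;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: walks upward but re-reads Count on every test and
// only advances the index when nothing was deleted, so a Delete never
// causes an item to be skipped or an out-of-range access.
procedure RemoveEmptyForward(AList: TStrings);
var
  i: Integer;
begin
  i := 0;
  while i < AList.Count do
    if AList[i] = '' then
      AList.Delete(i)   // the next item slides into slot i; do not advance
    else
      Inc(i);
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add(''); L.Add(''); L.Add('dummy'); L.Add(''); L.Add('help'); L.Add('');
    RemoveEmptyForward(L);
    WriteLn(L.CommaText);  // dummy,help
  finally
    L.Free;
  end;
end.
```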
« Last Edit: March 12, 2020, 06:52:21 pm by eljo »

wp

  • Hero Member
  • Posts: 12622
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #22 on: March 12, 2020, 07:09:44 pm »
Quote
you can use the stringreplace function to replace a pair of lineend chars with a single lineend character, e.g.
Code: Pascal
  sl.Text := StringReplace(sl.Text,LineEnding+LineEnding,LineEnding,[rfReplaceAll]);
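This does not work when the file contains two or more consecutive empty lines, i.e. three or more adjacent LineEndings (two or more consecutive Add('') calls). StringReplace does not rescan the text it has just produced, so three LineEndings only shrink to two: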
Code: Pascal
  program Project1;

  uses
    Classes, SysUtils;

  var
    SL: TStrings;
    i: Integer;
  begin
    SL := TStringList.Create;
    try
      SL.Add('1');
      SL.Add('2');
      SL.Add('');
      SL.Add('3');
      SL.Add('');
      SL.Add('');
      SL.Add('4');

      SL.Text := StringReplace(SL.Text, LineEnding + LineEnding, LineEnding, [rfReplaceAll]);

      for i:=0 to SL.Count-1 do
        WriteLn(SL[i]);

      WriteLn('Done.');
      ReadLn;
    finally
      SL.Free;
    end;

  end.
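One way to keep the StringReplace idea is to repeat the replacement until no doubled LineEnding is left, since a single call does not rescan its own output. A sketch with the hypothetical name `CollapseEmptyLines`. By default `TStrings.Text` joins the lines with the platform's `LineEnding`, so the line endings the file originally used do not matter here. Like the one-liner, it leaves a leading empty line alone, because the text then starts with a single LineEnding.

```pascal
program RepeatReplaceDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: StringReplace does not rescan text it has just
// produced, so a run of three LineEndings only shrinks to two. Repeating
// the replacement until no doubled LineEnding is left removes every run.
// A leading empty line is not removed (the text starts with one LineEnding).
procedure CollapseEmptyLines(AList: TStrings);
var
  S: string;
begin
  S := AList.Text;
  while Pos(LineEnding + LineEnding, S) > 0 do
    S := StringReplace(S, LineEnding + LineEnding, LineEnding, [rfReplaceAll]);
  AList.Text := S;
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add('1'); L.Add('2'); L.Add(''); L.Add('3'); L.Add(''); L.Add(''); L.Add('4');
    CollapseEmptyLines(L);
    WriteLn(L.CommaText);  // 1,2,3,4
  finally
    L.Free;
  end;
end.
```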

lucamar

  • Hero Member
  • Posts: 4219
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #23 on: March 12, 2020, 07:10:32 pm »
Quote from: eljo on March 12, 2020, 06:50:31 pm
Keep in mind that there are only two line endings:
1) #10
2) #13#10
There are no other options.

Not really true, is it? There is at least one more: a single #13, as used in classic Mac OS. And there are more if one has to include Unicode files; Unicode does have special "line end" code points distinct from "ASCII Carriage Return" (#13) and "ASCII Line Feed" (AKA "New Line", #10).

Note also that you might find some files using #10#13 instead of #13#10 (they are basically equivalent), so to be (mildly) thorough one should search at least for (in this order):

1) #13#10;
2) #10#13;
3) #10;
4) #13.

and, for Unicode files, the Unicode "new line" code points.
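Converting every line ending to one form first makes the later steps simple. A sketch that follows the search order above and maps everything to #10; `NormalizeLineEndings` is a hypothetical name, and the Unicode line-end code points are deliberately not handled. The two-character forms are replaced first so that their halves are not later counted as separate lone #10 or #13 endings.

```pascal
program NormalizeDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: rewrites every line ending to #10, trying the
// two-character forms first (#13#10, then #10#13) so their halves are not
// later mistaken for lone #10 or #13. Lone #10 is already the target form;
// lone #13 is converted last. Unicode line separators are not handled.
function NormalizeLineEndings(const S: string): string;
begin
  Result := StringReplace(S, #13#10, #10, [rfReplaceAll]);
  Result := StringReplace(Result, #10#13, #10, [rfReplaceAll]);
  Result := StringReplace(Result, #13, #10, [rfReplaceAll]);
end;

begin
  WriteLn(NormalizeLineEndings('a'#13#10'b'#10#13'c'#13'd'#10) = 'a'#10'b'#10'c'#10'd'#10);  // TRUE
end.
```

After normalizing, splitting on #10 alone is enough, whichever convention the file used.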
« Last Edit: March 12, 2020, 07:12:04 pm by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

eljo

  • Sr. Member
  • Posts: 468
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #24 on: March 12, 2020, 07:23:46 pm »
Quote from: lucamar on March 12, 2020, 07:10:32 pm
Not really true, is it? There is at least one more: a single #13, as used in classic Mac OS. [...]
I haven't seen them in any of my files, not even the classic Mac one, and I've handled data from a variety of sources too, e.g. text conversion between PC and Mac, CSV imports from mainframes, banking systems, COBOL exports and other last-century sources. Sorry, but the possibility might be there; it would still be a waste of resources to code for something that will probably never occur in my lifetime.

avk

  • Hero Member
  • Posts: 771
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #25 on: March 12, 2020, 07:31:22 pm »
I have one more solution, but it requires a third-party library.
Code: Pascal
  procedure RemoveEmptyLines(aList: TStringList);
    function NonEmpty(constref s : string): Boolean; begin Result := s <> '' end;
    function Append(constref a, r: string): string; begin Result := r + a + LineEnding end;
  begin
    aList.Text := aList.GetEnumerable.Select(@NonEmpty).Fold(@Append, '');
  end;
As for performance, that still needs checking.

winni

  • Hero Member
  • Posts: 3197
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #26 on: March 12, 2020, 08:08:02 pm »
Hi!

To speed up my solution from #9 I use posEx from StrUtils.


Code: Pascal
  uses ......., StrUtils;

  procedure DeleteEmptyLines(var sl : TStringList);
  var s : String;
      p : integer=1;
  begin
    s := sl.text;
    repeat
      p := posEX (LineEnding+LineEnding,s,p);
      if p > 0 then delete (s,p,length(lineEnding));
    until p= 0;
    sl.text := s;
  end;

Now only the part of the string starting from the last deletion is scanned.

Winni

jamie

  • Hero Member
  • Posts: 6802
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #27 on: March 12, 2020, 09:40:59 pm »
Quote from: winni on March 12, 2020, 08:08:02 pm
To speed up my solution from #9 I use posEx from StrUtils. [...]
I must be losing it, or old age is catching up with me: the request was to remove empty lines, but all I see there is code removing all line endings with no regard for valid line info.

Can you please explain to me how that removes empty lines in the middle of the file without touching valid lines that still need separators?
The only true wisdom is knowing you know nothing

winni

  • Hero Member
  • Posts: 3197
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #28 on: March 12, 2020, 09:54:40 pm »
Hi!

Jamie - did not get you! ???

A line ends with a LineEnding.
Two LineEndings with nothing in between mean that there is an empty line.
It's been that way since CP/M days for me, but it must be older.

If I delete ONE LineEnding if there are two, then I delete an empty line.
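The rule can be checked against one edge case: an empty first line, where the text starts with a single LineEnding and there is no pair to find. A minimal variant of the PosEx loop above, assuming it is acceptable to prepend one LineEnding temporarily so that a leading empty line also forms a pair; `DeleteEmptyLinesAll` is a hypothetical name.

```pascal
program LeadingEmptyDemo;

{$mode objfpc}{$H+}

uses
  Classes, StrUtils;

// Hypothetical variant of the PosEx loop: one extra LineEnding is prepended
// so that a leading empty line also shows up as a LineEnding pair; the extra
// LineEnding is stripped again before the text is put back into the list.
procedure DeleteEmptyLinesAll(sl: TStrings);
var
  s: string;
  p: SizeInt;
begin
  s := LineEnding + sl.Text;
  p := 1;
  repeat
    p := PosEx(LineEnding + LineEnding, s, p);
    if p > 0 then
      Delete(s, p, Length(LineEnding));  // drop one LineEnding of the pair
  until p = 0;
  Delete(s, 1, Length(LineEnding));      // remove the prepended LineEnding
  sl.Text := s;
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add(''); L.Add('dummy'); L.Add(''); L.Add(''); L.Add('help');
    DeleteEmptyLinesAll(L);
    WriteLn(L.CommaText);  // dummy,help
  finally
    L.Free;
  end;
end.
```

Scanning again from the same `p` after each deletion is what lets a run of several empty lines collapse in one pass.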

Or have I misunderstood your question?

Winni

lucamar

  • Hero Member
  • Posts: 4219
Re: Eliminating empty strings in TStringList when reading from text file.
« Reply #29 on: March 12, 2020, 10:36:49 pm »
Quote from: eljo on March 12, 2020, 07:23:46 pm
I haven't seen them in any of my files. [...]

You're, of course, free to do whatever you want and ignore whatever you want. :)

All I'm saying is that things are not (or might not be) as easy as "just look for #10 or #13#10". And yes, I've seen (and processed) several thousand text files from quite diverse origins, and some (a lot, in fact) of them did indeed have those "rare" (and even stranger) line endings, and I have had to deal with converting them.