Topic: [Solved] How to save and load large text files? (Read 21276 times)

Ramijami

  • Jr. Member
  • Posts: 87
[Solved] How to save and load large text files?
« on: March 22, 2012, 09:59:09 am »
Hi everybody,

I have a programme that generates about 4.3 million lines that contain 9 numbers per line (e.g. 1,2,3,4,5,6,7,8,9) that I want to save. When I try saving it with the code below I get an out of memory error:

Code:
procedure TForm1.Button8Click(Sender: TObject);
begin
  if SaveDialog.Execute then
    LinesBox.Lines.SaveToFile(SaveDialog.FileName);
end;

What do I need to add so that it saves the result to a normal text file? I am able to copy and paste it into Notepad++ and save it that way, and I end up with a file that is 139 MB, so it is a pretty large file.

The second problem is that when I try to load this large file into a programme, I get runtime error 203. A search on the forum indicates that this is a heap overflow error, so it obviously has something to do with memory allocation and the size of the file I am trying to load. Any suggestions on how to resolve these problems would be appreciated. I work under Windows on a 2.66 GHz Core 2 Duo PC with 2 GB of RAM.

Thanks
« Last Edit: April 10, 2012, 10:01:51 pm by Ramijami »

ludob

  • Hero Member
  • Posts: 1173
Re: How to save and load large text files?
« Reply #1 on: March 22, 2012, 10:19:14 am »
TStrings.SaveToStream makes several copies of the data before writing it to file. I suggest you use the 'old fashioned' writeln to store the individual lines. Something along the lines of:

Code:
procedure TForm1.Button8Click(Sender: TObject);
var f:text;
 i:integer;
begin
  if savedialog.execute then
    begin
    assignfile(f,SaveDialog.Filename);
    rewrite(f);
    for i:=0 to LinesBox.Lines.Count-1 do
      writeln(f,LinesBox.Lines[i]);
    closefile(f);
    end;
end;

Similar for loading from file.
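Since the loading side is left as "similar", here is a minimal console sketch of both directions. It assumes the lines live in a TStringList; in the form they would be LinesBox.Lines, which is also a TStrings. SaveLines and LoadLines are hypothetical helper names, and lines.txt is a throwaway file. Loading this way still keeps every line in memory; it mainly avoids building the whole file as one large string first.

```pascal
program linesio;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

{ Write every line with its own WriteLn, so no single huge string
  holding the whole text is built in memory. }
procedure SaveLines(Lines: TStrings; const FileName: string);
var
  f: TextFile;
  i: Integer;
begin
  AssignFile(f, FileName);
  Rewrite(f);
  try
    for i := 0 to Lines.Count - 1 do
      WriteLn(f, Lines[i]);
  finally
    CloseFile(f);
  end;
end;

{ Read the file back line by line and append each line to Lines.
  BeginUpdate/EndUpdate suppress change notifications while adding,
  which can help when Lines belongs to a visual control. }
procedure LoadLines(Lines: TStrings; const FileName: string);
var
  f: TextFile;
  s: string;
begin
  AssignFile(f, FileName);
  Reset(f);
  Lines.BeginUpdate;
  try
    while not Eof(f) do
    begin
      ReadLn(f, s);
      Lines.Add(s);
    end;
  finally
    Lines.EndUpdate;
    CloseFile(f);
  end;
end;

var
  Src, Dst: TStringList;
  i: Integer;
begin
  Src := TStringList.Create;
  Dst := TStringList.Create;
  try
    for i := 1 to 1000 do
      Src.Add(IntToStr(i) + ',1,2,3,4,5,6,7,8');
    SaveLines(Src, 'lines.txt');
    LoadLines(Dst, 'lines.txt');
    WriteLn(Dst.Count, ' lines loaded, first: ', Dst[0]);
  finally
    Src.Free;
    Dst.Free;
  end;
end.
```

Run as-is, it prints `1000 lines loaded, first: 1,1,2,3,4,5,6,7,8`.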

eny

  • Hero Member
  • Posts: 1589
Re: How to save and load large text files?
« Reply #2 on: March 22, 2012, 06:30:53 pm »
Quote from Ramijami:
Any suggestions on how to resolve these problems would be appreciated. I work under Windows on a 2.66 GHz Core 2 Duo PC with 2 GB of RAM.
Your program must have one huge memory footprint.
When I try to do the same on a PC with almost the same specs, I have no problem creating and saving a string list with 9 million rows that produces a 300 MB file.
That said, the extra memory consumption during the save is about 5 times higher,
and during the load it is more than 10 times higher.
All posts based on: Win10 (Win64); Lazarus 1.8.0 'stable' (#56594 win64) unless specified otherwise...

User137

  • Hero Member
  • Posts: 1791
    • Nxpascal home
Re: How to save and load large text files?
« Reply #3 on: March 22, 2012, 06:57:03 pm »
Is there a reason you need to handle numbers as strings? I would imagine a binary file with assignfile() or TFileStream would be handy for saving and loading massive chunks of data.

Remember that if you need to process the data afterwards with StrToInt(), that conversion costs a lot of additional processing time.
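To illustrate the AssignFile route: a typed file (a `file of` some fixed-size type) stores each set as raw bytes, so reading it back needs no StrToInt, and Seek can jump straight to set number N. This sketch assumes every number fits in a Byte (0..255), as the commented-out Ehistory declaration in Ramijami's later post suggests; with Byte elements, 4.3 million sets of 9 numbers take 4,300,000 × 9 = 38,700,000 bytes, roughly 37 MiB. The type TSet and the file name sets.bin are illustrative.

```pascal
program typedfile;
{$mode objfpc}{$H+}

const
  NumbersPerSet = 9;

type
  { One set; Byte assumes every number is between 0 and 255 }
  TSet = packed array[0..NumbersPerSet - 1] of Byte;

var
  f: file of TSet;
  s: TSet;
  i, j: Integer;
begin
  { Write 1000 sets as fixed-size binary records }
  AssignFile(f, 'sets.bin');
  Rewrite(f);
  for i := 0 to 999 do
  begin
    for j := 0 to NumbersPerSet - 1 do
      s[j] := (i + j) mod 256;
    Write(f, s);
  end;
  CloseFile(f);

  { Reopen: FileSize counts records, Seek jumps to set 500 (0-based) }
  Reset(f);
  WriteLn('Sets in file: ', FileSize(f));
  Seek(f, 500);
  Read(f, s);
  CloseFile(f);
  for j := 0 to NumbersPerSet - 1 do
    Write(s[j], ' ');
  WriteLn;
end.
```

It prints `Sets in file: 1000`, then the nine values of set 500, `244 245 246 247 248 249 250 251 252`.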

Ramijami

  • Jr. Member
  • Posts: 87
Re: How to save and load large text files?
« Reply #4 on: March 23, 2012, 08:46:43 am »
Thanks for the quick responses... it is always gratifying when experienced programmers take the time to help with what must be mundane/routine for them :)

Thanks ludob for the suggestion. When I load the file, I load it into an array (Ehistory) so that I can make comparisons (such as counting how many numbers are odd vs even). The code I am currently using is:

Code:
procedure TForm1.Button5Click(Sender: TObject);

var
  MyStrings: TStringList;
  i, j, k, x, y, z: Integer;
  MaxNumbersInSet: Integer;
//Ehistory: array of array[0..ECONST_LINE_ELEMENT_COUNT-1] of Byte;
  MyOpenDialog: TOpenDialog;

begin
MaxNumbersInSet:=StrToInt(NumbersPerSet.Text);

  MyStrings := TStringList.Create;
  MyOpenDialog := TOpenDialog.Create(nil);
  try
    // Read the text
    if not MyOpenDialog.Execute then Exit;
    MyStrings.LoadFromFile(MyOpenDialog.FileName);

    // Now parse the text
    MyStrings.Delimiter := ',';
    MyStrings.DelimitedText := MyStrings.Text;

    // Now copy it to the array
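    // One row per set: MyStrings now holds MaxNumbersInSet numbers per original line
    SetLength(Ehistory, (MyStrings.Count + MaxNumbersInSet - 1) div MaxNumbersInSet, MaxNumbersInSet);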
    EhistoryCount:=0;
    for i :=0 to MyStrings.Count-1 do
    begin
      j := i div MaxNumbersInSet;
      k := i mod MaxNumbersInSet;
      Ehistory[j][k] := StrToInt(MyStrings.Strings[i]);
      EhistoryCount:=EhistoryCount+1;
    end;

    EhistoryCounted:=(EhistoryCount div MaxNumbersInSet);

    LinesBox.lines.add(' ');
    LinesBox.lines.add(IntToStr(EhistoryCounted));
    LinesBox.lines.add('SETS Loaded');

  finally
    MyStrings.Free;
    MyOpenDialog.Free;
    //SetLength(history, 0);
  end;
end;

This is sufficient for my programme as I usually only load files with lines containing 6 numbers and about 1000 lines, but thought I'd see what would happen if I extended it to 9 numbers and a lot more lines.
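For the odd/even comparisons described above, here is a small sketch of a per-set helper over a two-dimensional dynamic array shaped like Ehistory (one row per set). TSetArray, CountOdd and the sample values are illustrative. Integer elements are used here; Byte elements, as in the commented-out Ehistory declaration, would need a quarter of the memory for the values.

```pascal
program oddeven;
{$mode objfpc}{$H+}

type
  { Same shape as Ehistory: one row per set, one column per number }
  TSetArray = array of array of Integer;

{ Count how many numbers in one set (row) are odd; the rest are even. }
function CountOdd(const Sets: TSetArray; Row: Integer): Integer;
var
  k: Integer;
begin
  Result := 0;
  for k := 0 to High(Sets[Row]) do
    if Odd(Sets[Row][k]) then
      Inc(Result);
end;

var
  Sets: TSetArray;
  Row, k, OddCount: Integer;
begin
  { Two sample sets of 6 numbers: 1..6 and 2,4,...,12 }
  SetLength(Sets, 2, 6);
  for Row := 0 to High(Sets) do
    for k := 0 to High(Sets[Row]) do
      Sets[Row][k] := (k + 1) * (Row + 1);

  for Row := 0 to High(Sets) do
  begin
    OddCount := CountOdd(Sets, Row);
    WriteLn('Set ', Row, ': ', OddCount, ' odd, ',
      Length(Sets[Row]) - OddCount, ' even');
  end;
end.
```

It prints `Set 0: 3 odd, 3 even` and `Set 1: 0 odd, 6 even`.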

eny, by "memory footprint" do you mean using large amounts of memory? I am still in the early stages of learning Lazarus, so I have no clue how to programme to maximise memory usage. Any suggestions/links to articles would be appreciated.

User137, I'm a relative newbie, so I don't know many different options or different ways of doing the same thing. I go on what the more experienced programmers suggest/recommend, so any new tips and ideas I can add to my toolbox are always welcome.

eny

  • Hero Member
  • Posts: 1589
Re: How to save and load large text files?
« Reply #5 on: March 23, 2012, 11:19:02 pm »
Quote from Ramijami:
eny, by "memory footprint" do you mean using large amounts of memory? I am still in the early stages of learning Lazarus, so I have no clue how to programme to maximise memory usage.
You are doing a great job in maximizing the memory usage.
I assume you mean minimizing :)
Since we have comparable hardware and I can create files twice your size, your program must be doing something else that eats memory.

TStringList itself doesn't perform that badly, except that it apparently has a problem with storing, and especially loading, huge amounts of data.

Ramijami

  • Jr. Member
  • Posts: 87
Re: How to save and load large text files?
« Reply #6 on: March 25, 2012, 09:38:25 pm »
Yes eny, I meant optimising memory usage. I will have to search for things that generally hog memory and see if I have any of those :)

Ramijami

  • Jr. Member
  • Posts: 87
Re: How to save and load large text files?
« Reply #7 on: March 28, 2012, 01:13:38 pm »
I decided to follow ludob's suggestion about loading the "old fashioned" way, and found this code in my toolbox, which I adapted to test a 10-row file:

Code:
procedure TForm1.Button1Click(Sender: TObject);
  var
 TS: TStringList;
 myFile: TextFile;
 test: string;
 RecordsinFile, i, j: integer;
 history: array [0..9, 0..8] of Integer;

begin
if opendialog1.execute then
 begin
  AssignFile(myFile , OpenDialog1.filename);  // Try to open .txt file
   Reset(myFile);  //Reopen file for reading
   RecordsinFile:=0;
   While not EOF (myFile) do
   TS:= nil;
   begin
    Readln(myFile, test);
   TS.CommaText := test;
    RecordsinFile:=RecordsinFile+1; //count records in file
   end;
    CloseFile(myFile);
 end;
//Load Array
Reset(myFile);
While not EOF (myFile) do
 begin
  for i:=0 to RecordsinFile do
   for j:=0 to 8 do
    begin
    Readln(history[i,j]);
    end;
CloseFile(myFile);
 end;
//TS.free;
end;   

There are some errors, but when I try to debug and set a watch, I get a debugger error saying the "debugger has entered the error state" and will close, so I can't fix the problem. When I don't set a watch, the programme just freezes.

Also, are there any tutorials on TFileStream, as eny and User137 suggested I use? A search of the forums has only shown how to load a file using TFileStream, but I haven't found anything that shows how to fill an array using "," as the delimiter, and I have been getting SIGSEGV errors trying that.

Thanks

User137

  • Hero Member
  • Posts: 1791
    • Nxpascal home
Re: How to save and load large text files?
« Reply #8 on: March 28, 2012, 01:23:58 pm »
readln() jumps to the next line after it has read its arguments. Did you say there are several numbers per line to read? Then you can use read() for the first numbers and readln() for the last one, or it might just work if you use only read().
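One caveat for this particular file: as far as I know, Free Pascal's read() into an Integer expects numbers separated by whitespace, so with commas between them it will most likely stop with an invalid numeric format error. A more robust route is to ReadLn each whole line and split it yourself. The sketch below does that, and also shows the two-pass idea the Button1Click attempt was aiming for: count the lines, size the array once, then rewind and fill it. (In that attempt the while loop's body is only `TS:= nil;`, because the begin...end block comes after that statement, so the loop never reads anything and never ends; and `Readln(history[i,j])` without a file variable reads from standard input, not from myFile.) ParseLine, LoadSets, TSetArray and sample.txt are illustrative names.

```pascal
program loadsets;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  NumbersPerSet = 9;

type
  TSetArray = array of array of Integer;

{ Split a line such as '1,2,3,4,5,6,7,8,9' into Row. Extra fields are
  ignored; a field that is not an integer raises EConvertError. }
procedure ParseLine(const Line: string; var Row: array of Integer);
var
  p, k, Start: Integer;
begin
  k := 0;
  Start := 1;
  for p := 1 to Length(Line) + 1 do
    if (p > Length(Line)) or (Line[p] = ',') then
    begin
      if k <= High(Row) then
        Row[k] := StrToInt(Trim(Copy(Line, Start, p - Start)));
      Inc(k);
      Start := p + 1;
    end;
end;

{ Pass 1 counts the non-empty lines so the array is sized only once;
  pass 2 reopens the file and parses each line into its own row. }
function LoadSets(const FileName: string): TSetArray;
var
  f: TextFile;
  Line: string;
  Count, i: Integer;
begin
  Result := nil;
  AssignFile(f, FileName);
  Reset(f);
  try
    Count := 0;
    while not Eof(f) do
    begin
      ReadLn(f, Line);
      if Trim(Line) <> '' then
        Inc(Count);
    end;
    SetLength(Result, Count, NumbersPerSet);
    Reset(f);  { reopen the text file at its first line }
    i := 0;
    while (not Eof(f)) and (i < Count) do
    begin
      ReadLn(f, Line);
      if Trim(Line) <> '' then
      begin
        ParseLine(Line, Result[i]);
        Inc(i);
      end;
    end;
  finally
    CloseFile(f);
  end;
end;

var
  f: TextFile;
  Sets: TSetArray;
begin
  { Write a small comma-separated sample file to load }
  AssignFile(f, 'sample.txt');
  Rewrite(f);
  WriteLn(f, '1,2,3,4,5,6,7,8,9');
  WriteLn(f, '10,11,12,13,14,15,16,17,18');
  CloseFile(f);

  Sets := LoadSets('sample.txt');
  WriteLn(Length(Sets), ' sets loaded; Sets[1][8] = ', Sets[1][8]);
end.
```

It prints `2 sets loaded; Sets[1][8] = 18`.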

A quick example of TFileStream use, reading every character of the file into a char array one at a time:
Code:
var FS: TFileStream; c: array of char;
    i: integer;
begin
  FS:=TFileStream.Create('mydata.dat', fmOpenRead);
  setlength(c, FS.Size);
  for i:=0 to FS.Size-1 do
    FS.ReadBuffer(c[i], 1);
  FS.Free;
  // Now array c is filled with data... Analyzing it as string may be a bit complicated.
end;
« Last Edit: March 28, 2012, 01:30:02 pm by User137 »
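The loop above calls ReadBuffer once per character, which is likely to be slow for a 139 MB file; a single ReadBuffer call can fill the whole array at once. For the "analyzing it" part, one option is to scan the buffer once and turn every run of digits into an integer, treating commas and line breaks as separators. This is a sketch: ParseNumbers is a hypothetical helper, it handles only non-negative numbers, and it does no overflow checking.

```pascal
program parsebuffer;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

type
  TIntArray = array of Integer;

{ Scan the buffer once; every run of digits becomes one non-negative
  integer, and any other character (comma, CR, LF, space) is a separator. }
function ParseNumbers(const Buf: array of Char): TIntArray;
var
  i, Count, Value: Integer;
  InNumber: Boolean;
begin
  Result := nil;
  Count := 0;
  Value := 0;
  InNumber := False;
  { i = High(Buf) + 1 acts as a final separator, so the last number is stored }
  for i := 0 to High(Buf) + 1 do
    if (i <= High(Buf)) and (Buf[i] in ['0'..'9']) then
    begin
      Value := Value * 10 + (Ord(Buf[i]) - Ord('0'));
      InNumber := True;
    end
    else if InNumber then
    begin
      if Count = Length(Result) then
        SetLength(Result, 2 * Count + 16);  { grow in large steps }
      Result[Count] := Value;
      Inc(Count);
      Value := 0;
      InNumber := False;
    end;
  SetLength(Result, Count);  { trim to the numbers actually found }
end;

var
  FS: TFileStream;
  Buf: array of Char;
  Numbers: TIntArray;
  Sample: string;
begin
  { Create a small comma-separated sample file }
  Sample := '1,2,3' + LineEnding + '40,50,60' + LineEnding;
  FS := TFileStream.Create('sample.txt', fmCreate);
  try
    FS.WriteBuffer(Sample[1], Length(Sample));
  finally
    FS.Free;
  end;

  { Read the whole file with one ReadBuffer call instead of one per character }
  FS := TFileStream.Create('sample.txt', fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buf, FS.Size);
    if Length(Buf) > 0 then
      FS.ReadBuffer(Buf[0], Length(Buf));
  finally
    FS.Free;
  end;

  Numbers := ParseNumbers(Buf);
  WriteLn(Length(Numbers), ' numbers, last = ', Numbers[High(Numbers)]);
end.
```

It prints `6 numbers, last = 60`. Every 9 consecutive entries of Numbers then form one set.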

BigChimp

  • Hero Member
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: How to save and load large text files?
« Reply #9 on: March 28, 2012, 01:35:01 pm »
In addition to what user137 said, the wiki also has articles on streams, even FileStream, if I remember correctly.
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

Peter_Vadasz

  • New Member
  • Posts: 35
Re: How to save and load large text files?
« Reply #10 on: March 28, 2012, 09:34:52 pm »
Why don't you use a binary file?
For example:
Code:
program txthand;
{$mode objfpc}{$H+}
uses SysUtils, Classes, DateUtils;

const
  MaxNumbersInSet=9;

type
  tMySet=packed array [0..MaxNumbersInSet] of integer;
  tSS=(Start,Stop);

procedure timer(ss:tSS; t:TDateTime);
const
   StartTime:TDateTime=0;
var
  dt: integer;
begin
  case ss of
    start: begin
             StartTime:=t;
             writeln('Timer started...');
           end;
    stop: begin
            dt:=MillisecondsBetween(t,StartTime);
            writeln('Timer stopped ',dt/1000:7:4,' s');
          end;
  end;               
end;
 
procedure makedatafile();
var
  i,j : longint;
  TmpData: tMySet;
  fs: TFileStream;
begin
  randomize();
  timer(Start,time);
  fs:=TFileStream.Create('datas.bin',fmCreate); // fmOpenWrite would fail if the file does not exist yet
  for i:=1 to 90000000 do
  begin
    for j:=0 to MaxNumbersInSet do
    begin
      TmpData[j]:=random(maxint);
    end;
    fs.write(TmpData,sizeof(tMySet)); 
  end;
  writeln('File is ready');   
  writeln('The size of the generated file is: ',((fs.Size/1024)/1024)/1024:5:3,' GiB');
  fs.Free;
  timer(Stop,time);
end;

procedure readdatafile();
var
  fs: TFileStream;
  history: tMySet;
  i,j,x: longint;
begin
  i:=0;
  j:=0;
  x:=0;
  fs:=TfileStream.Create('datas.bin',fmOpenRead);
  i:=fs.size div sizeof(tmyset);
  writeln('There are ',i,' sets in the file. Press enter to write data to screen...');
  readln();
  fs.Position:=0;
  for x:=0 to i-1 do // sets are numbered 0..i-1; reading one more would raise EReadError
  begin
    fs.ReadBuffer(history[0],sizeof(tmyset));
    for j:=0 to MaxNumbersInSet do
      if j<>MaxNumbersInSet then write(history[j],',')
      else writeln(history[j]);
  end;
  fs.free; 
end;

begin
  writeln('Generating datafile...');
  makedatafile();
  writeln('Open file to read...');
  readdatafile();
end.
OS: Ubuntu 12.04.2 32 bit
Lazarus: 1.0.8
FPC: 2.6.2

Ramijami

  • Jr. Member
  • Posts: 87
Re: How to save and load large text files?
« Reply #11 on: March 29, 2012, 12:14:59 pm »
Tx User137, I am using a file containing 9 numbers per line to test.

BigChimp, thanks for the referral to the wiki. I searched the wiki for TFileStream and found a page on file handling that mentions TFileStream, http://wiki.lazarus.freepascal.org/File_Handling_In_Pascal, but the other results didn't seem applicable.

Thanks for the reference code, Peter_Vadasz. I'm trying to open a large text file that I already have. Should I redo it and save it as a binary file to use it? I use history[i,j] to make comparisons on individual elements in the array (for example, to check whether history[1,1] is odd or even), but in your readdatafile() procedure it appears that you are loading a one-dimensional array, history[j]. Is this the way to load binary files, and how do I then make comparisons on the individual elements that are separated by commas in the file?

User137

  • Hero Member
  • Posts: 1791
    • Nxpascal home
Re: How to save and load large text files?
« Reply #12 on: March 29, 2012, 03:02:08 pm »
If you can remake a big file on a whim like that, it makes me curious: what is it used for? Are there other options for the problem as a whole? Remember that there is never only one solution to a problem, but a hundred.

Peter_Vadasz

  • New Member
  • Posts: 35
Re: How to save and load large text files?
« Reply #13 on: March 29, 2012, 03:43:42 pm »
Quote from Ramijami:
Thanks for the reference code, Peter_Vadasz. I'm trying to open a large text file that I already have. Should I redo it and save it as a binary file to use it? I use history[i,j] to make comparisons on individual elements in the array (for example, to check whether history[1,1] is odd or even), but in your readdatafile() procedure it appears that you are loading a one-dimensional array, history[j]. Is this the way to load binary files, and how do I then make comparisons on the individual elements that are separated by commas in the file?
If you never need to open your data in an editor (for example Notepad on Windows), I think you don't need a large text file, just a simple binary file that contains your arrays. I modified the example to show you how to compare individual elements in the array.
Code:
program txthand;
{$mode objfpc}{$H+}
uses SysUtils, Classes, DateUtils;

const
  MaxNumbersInSet=9;

type
  tMySet=packed array [0..8] of integer;
  tSS=(Start,Stop);

procedure timer(ss:tSS; t:TDateTime);
const
   StartTime:TDateTime=0;
var
  dt: integer;
begin
  case ss of
    start: begin
             StartTime:=t;
             writeln('Timer started...');
           end;
    stop: begin
            dt:=MillisecondsBetween(t,StartTime);
            writeln('Timer stopped ',dt/1000:7:4,' s');
          end;
  end;
end;

procedure makedatafile();
{
  This procedure makes a big file: it generates 9000000 "sets", each containing 9 integer numbers.
}
var
  i,j : longint;
  TmpData: tMySet;
  fs: TFileStream;
begin
  randomize();
  timer(Start,time);
  fs:=TFileStream.Create('datas.bin',fmCreate); // fmCreate creates the file, or empties an existing one
  for i:=1 to 9000000 do // 9 million sets of 36 bytes, roughly 0.3 GiB
  begin
    for j:=0 to MaxNumbersInSet-1 do
    begin
      TmpData[j]:=random(maxint);
    end;
    fs.write(TmpData,sizeof(tMySet));
  end;
  writeln('File is ready');
  writeln('The size of the generated file is: ',((fs.Size/1024)/1024)/1024:7:3,' GiB');
  fs.Free;
  timer(Stop,time);
end;

procedure readdatafile();
var
  fs: TFileStream;
  history: tMySet;
  i,j,x,y: longint;
begin
  i:=0;
  j:=0;
  x:=0;
  y:=0;
  randomize();
  fs:=TfileStream.Create('datas.bin',fmOpenRead);
  x:=fs.size div sizeof(tmyset); // get the count of sets
  writeln('There are ',x,' sets in the file. Press enter to write 20 random sets to the screen...');
  readln();
  fs.Position:=0;
  fs.ReadBuffer(history[0],sizeof(tmyset));
  for i:=1 to 20 do
  {
    We can use the file like a big array which contains our sets (arrays).
    We are reading 20 random sets from the file. Each set contains 9 numbers.
  }
  begin
    y:=random(x);
    fs.Position:=y*sizeof(TMySet);
    fs.ReadBuffer(history[0],sizeof(tmyset)); //read the selected set from a file
    write(i, ', : ');
    for j:=0 to MaxNumbersInSet-1 do
    begin
      if j=0 then write(y,'. set in the file: ');
      if odd(history[j]) then write('O|') //check the number in the set is odd or even and write O| or E| before the number
      else write('E|');
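      if j<>MaxNumbersInSet-1 then write(history[j],',') // the last index is MaxNumbersInSet-1
      else write(history[j]); // no comma after the last number; the writeln() calls below end the line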
    end;
    writeln();
    writeln();
  end;
  fs.free;
end;

begin
  writeln('Generating datafile...');
  makedatafile();
  writeln('Open file to read...');
  readdatafile();
end.

Ramijami

  • Jr. Member
  • Posts: 87
Re: How to save and load large text files?
« Reply #14 on: March 29, 2012, 07:22:49 pm »
User137, I have a file that took me three days to generate (making various combinations and comparisons) in preparation for writing a lottery-type programme, so I can't regenerate it on a whim. But it is a static file, so I only need to generate it once and then load it as needed. I like your idea about looking for multiple possible solutions :D

Thanks Peter_Vadasz for showing the example for a multi-dimensional array. Since the file I am hoping to use is static, I need to be able to save it once and then load it as needed; I don't need to make changes to it, so there is no need to open it in an editor.
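Since the file is static, one option is to convert the existing text file to binary once and only load the binary file afterwards. A minimal sketch, assuming each line holds up to 9 comma-separated integers and reusing the 9-Integer layout of tMySet from Peter_Vadasz's second example (36 bytes per set). The file names and the ParseLine/ConvertTextToBinary helpers are hypothetical. The end of the program shows the same random access Peter_Vadasz uses: set n (0-based) starts at byte n * SizeOf(TMySet).

```pascal
program txt2bin;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

const
  NumbersPerSet = 9;

type
  { Same layout as tMySet in Peter_Vadasz's second example: 9 Integers }
  TMySet = packed array[0..NumbersPerSet - 1] of Integer;

{ Parse one comma-separated line into S; missing fields stay 0 and
  fields beyond NumbersPerSet are ignored. }
procedure ParseLine(const Line: string; out S: TMySet);
var
  p, k, Start: Integer;
begin
  FillChar(S, SizeOf(S), 0);
  k := 0;
  Start := 1;
  for p := 1 to Length(Line) + 1 do
    if (p > Length(Line)) or (Line[p] = ',') then
    begin
      if k < NumbersPerSet then
        S[k] := StrToInt(Trim(Copy(Line, Start, p - Start)));
      Inc(k);
      Start := p + 1;
    end;
end;

{ One-off conversion: read TextName line by line and append each set to
  BinName as raw bytes. Returns the number of sets written. }
function ConvertTextToBinary(const TextName, BinName: string): Integer;
var
  f: TextFile;
  fs: TFileStream;
  Line: string;
  S: TMySet;
begin
  Result := 0;
  AssignFile(f, TextName);
  Reset(f);
  try
    fs := TFileStream.Create(BinName, fmCreate);  { creates or truncates }
    try
      while not Eof(f) do
      begin
        ReadLn(f, Line);
        if Trim(Line) = '' then
          Continue;  { skip blank lines }
        ParseLine(Line, S);
        fs.WriteBuffer(S, SizeOf(S));
        Inc(Result);
      end;
    finally
      fs.Free;
    end;
  finally
    CloseFile(f);
  end;
end;

var
  f: TextFile;
  fs: TFileStream;
  S: TMySet;
  n: Integer;
begin
  { A tiny stand-in for the real text file }
  AssignFile(f, 'sets.txt');
  Rewrite(f);
  WriteLn(f, '1,2,3,4,5,6,7,8,9');
  WriteLn(f, '9,8,7,6,5,4,3,2,1');
  CloseFile(f);

  n := ConvertTextToBinary('sets.txt', 'sets.bin');

  { Random access: set 1 (0-based) starts at byte 1 * SizeOf(TMySet) }
  fs := TFileStream.Create('sets.bin', fmOpenRead);
  try
    fs.Position := 1 * SizeOf(TMySet);
    fs.ReadBuffer(S, SizeOf(S));
  finally
    fs.Free;
  end;
  WriteLn(n, ' sets converted; set 1 is ', S[0], ',...,', S[8]);
end.
```

It prints `2 sets converted; set 1 is 9,...,1`. The slow text parsing then happens only once, during the conversion.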

Also, in my search on Google I found this code for loading a two-dimensional array using a file stream:
Code:

type
  TRowData<T> = array of T;

 // Delphi only accepts type parameters on methods, so this would normally
 // be declared inside a class or record.
 procedure ReadMatrix<T>(FileStream: TStream);
 var
   Matrix: array of TRowData<T>;
   NumberOfRows: Cardinal;
   NumberOfCols: Cardinal;
   CurRow: Integer;
 begin
   NumberOfRows := 20; // not known until run time
   NumberOfCols := 100; // not known until run time
   SetLength(Matrix, NumberOfRows, NumberOfCols);
   for CurRow := 0 to NumberOfRows - 1 do // rows are 0-based
     FileStream.ReadBuffer(Matrix[CurRow][0], NumberOfCols * SizeOf(T)); // one whole row per call
 end;

It was on a Delphi site, so I will be searching the Lazarus forums and experimenting over the next few days to see if I can use it with Lazarus :)
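In Free Pascal the same idea works without generics. When every row has the same fixed length, declaring the row as a fixed-size array type (like tMySet) and the history as a dynamic array of that type keeps all sets in one contiguous block of memory, so the whole file can be read with a single ReadBuffer call while History[i][j] indexing still works. For 4.3 million sets of 9 Integers that block is 4,300,000 × 36 = 154,800,000 bytes, roughly 148 MiB. LoadSets, THistory and sample.bin are illustrative names; the sketch assumes Peter_Vadasz's headerless 9-Integer file layout and a file smaller than 2 GiB.

```pascal
program loadbin;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

const
  NumbersPerSet = 9;

type
  { Same 36-byte layout as tMySet in Peter_Vadasz's second example }
  TMySet = packed array[0..NumbersPerSet - 1] of Integer;
  { A dynamic array of fixed-size sets occupies one contiguous block }
  THistory = array of TMySet;

{ Load every complete set stored in FileName with one ReadBuffer call.
  Assumes the file is smaller than 2 GiB and has no header. }
function LoadSets(const FileName: string): THistory;
var
  fs: TFileStream;
  Count: Integer;
begin
  Result := nil;
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Count := fs.Size div SizeOf(TMySet);  { a trailing partial set is ignored }
    SetLength(Result, Count);
    if Count > 0 then
      fs.ReadBuffer(Result[0], Count * SizeOf(TMySet));
  finally
    fs.Free;
  end;
end;

var
  fs: TFileStream;
  S: TMySet;
  History: THistory;
  i, j, OddCount: Integer;
begin
  { Write three sample sets; set i holds i*10, i*10+1, ..., i*10+8 }
  fs := TFileStream.Create('sample.bin', fmCreate);
  try
    for i := 0 to 2 do
    begin
      for j := 0 to NumbersPerSet - 1 do
        S[j] := i * 10 + j;
      fs.WriteBuffer(S, SizeOf(S));
    end;
  finally
    fs.Free;
  end;

  History := LoadSets('sample.bin');
  { History[i][j] indexing works just as with a two-dimensional array }
  OddCount := 0;
  for j := 0 to NumbersPerSet - 1 do
    if Odd(History[2][j]) then
      Inc(OddCount);
  WriteLn(Length(History), ' sets; History[2][8] = ', History[2][8],
    '; odd numbers in set 2: ', OddCount);
end.
```

It prints `3 sets; History[2][8] = 28; odd numbers in set 2: 4`. Compared with an array of dynamic rows, this layout avoids one allocation per set, at the cost of requiring every set to have exactly NumbersPerSet numbers.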