Recent

Author Topic: memory consumption  (Read 7875 times)

Hansvb

  • Hero Member
  • *****
  • Posts: 715
memory consumption
« on: July 08, 2016, 10:11:25 pm »
With FPspread sheet i write datasets to different worksheets. I notice that when there a lot of rows and columns to export, the memory consumption is verry high.

I can not use
Code: Pascal  [Select][+][-]
  1. workbook.OnWriteCellData := @WriteCellDataHandler;
because i write to different worksheets.

Is there a way to lower the memory consumption? Or am i doing something wrong?
When i use Ole Excel automation, there is almost no memory use at all. But then the export is verry verry slow.

Code: Pascal  [Select][+][-]
  1. procedure TExport_To_File.SaveAs_FPExcelFile_Complete_lvwoz_scan(
  2.   Adbgrid: TDBGrid; ASheetName: widestring);
  3. var
  4.   sheet: TsWorksheet;
  5.   szLeesuit : String;
  6.   f, R: Integer;
  7. begin
  8.   screen.Cursor:=crHourglass;
  9.  
  10.   if (Active_DBGrid.DataSource = nil) or
  11.      (Active_DBGrid.DataSource.DataSet.IsEmpty = True) then
  12.     begin
  13.       Frm_main.Logging.WriteToLog('INFORMATIE | De dataset is leeg, er is geen export naar Excel aangemaakt.');
  14.       //hier eventueel een leeg werkblad laten aanmaken.
  15.     end
  16.   else
  17.     begin //De export naar Excel uitvoeren
  18.       Frm_Main.Logging.WriteToLog('INFORMATIE | Start met het opslaan van ' + ASheetName);
  19.  
  20.       //Werkblad aanmaken
  21.       sheet := MyWorkbook.AddWorksheet(Asheetname);
  22.       szLeesuit :='';
  23.          
  24.           //kopteksten
  25.       for f := 0 to  Adbgrid.DataSource.DataSet.FieldCount -1 do
  26.         begin
  27.           sheet.WriteText(0, f, widestring(Adbgrid.DataSource.DataSet.Fields[f].FieldName));
  28.         end;
  29.  
  30.       //de data naar Excel schrijven
  31.       with Adbgrid.DataSource.DataSet do
  32.         begin
  33.           Adbgrid.DataSource.DataSet.DisableControls;
  34.           First;
  35.           for R := 0 to Adbgrid.DataSource.DataSet.RecordCount-1 do
  36.             begin
  37.               for F := 0 to Adbgrid.DataSource.DataSet.FieldCount-1 do
  38.                 begin
  39.                   szLeesuit :='';
  40.                   if Adbgrid.DataSource.DataSet.Fields[F].AsString <> '' then
  41.                     szLeesuit :=  Adbgrid.DataSource.DataSet.Fields[F].AsString;
  42.                   sheet.WriteText(R+1, F+0, widestring(szLeesuit) );
  43.                 end;
  44.               next;
  45.             end;
  46.           First;   //weer terug naar het eerste record.
  47.           Adbgrid.DataSource.DataSet.EnableControls;
  48.  
  49.           //tussentijds opslaan (dus na elke query)
  50.           try
  51.             if SaveDialog_FileName <> '' then
  52.               MyWorkbook.WriteToFile(widestring(SaveDialog_FileName));
  53.           except
  54.             on E: Exception do
  55.               begin
  56.                 Frm_main.Logging.WriteToLog('FOUT      | Het tussentijds opslaan van het Fpspreadsheet Workbook is mislukt.');
  57.                 Frm_main.Logging.WriteToLog('FOUT      | Een onverwachte uitzondering heeft plaats gevonden:');
  58.                 Frm_main.Logging.WriteToLog('FOUT      | ' + E.Message);
  59.                 Frm_main.Logging.WriteToLogAndFlush('FOUT      | ');
  60.               end;
  61.           end;
  62.         end;
  63.     end;
  64.     screen.Cursor:=crdefault;
  65. end;


wp

  • Hero Member
  • *****
  • Posts: 12458
Re: memory consumption
« Reply #1 on: July 08, 2016, 11:21:43 pm »
Memory consumption of fpspreadsheet is high because the entire workbook is held in memory. A lot of optimization has been applied in the past to reduce the memory load per cell, I still see a few places where a few bytes can gained, but in total this will be a marginal effect. Excel, on the other hand, applies some tricky disk swapping mechanism which is not available here. Of course somebody can write a patch for it...

To overcome the memory limitation virtual mode was introduced. This allows to write cells "on the fly" without keeping them in memory, and this is what you mean with the OnWriteCellData event. Unfortunately I made a design error and assigned the virtual mode to the workbook, not to the worksheet. Therefore, you cannot handle worksheets differently in virtual mode. At the moment I am busy with updating the Turbo Power Visual PlanIt components. When I am done the virtual mode design flaw will be fixed. Most probably, however, it will break existing code.

I am not the expert with OLE automation. When I applied it in some old Delphi project I soon got disappointed by this low-performance programming model and the interference with already running Excel instances. I did notice that writing an Excel file cell by cell (as you do) is extremely slow, but I found somewhere a note that speed increases considerably if cell ranges are used and the cells are addressed as elements in such arrays - sorry, I don't know any more where I had picked this up.

Since you deal with databases (I guess), did you consider using tools provided by your database engine? I did a lot of work with Delphi and Microsoft's ADO these days; there was an extension of the SELECT query which could be used to write to Excel (and other formats). I am posting here the code that I had used. Note that this is Delphi code, it will not work with Lazarus, but maybe you can get similar things with your database engine, or with ODBC, or with ZEOS, I don't know:

Code: Pascal  [Select][+][-]
  1. type
  2.   TExportFormat = (
  3.     efMDB,
  4.     efExcel3, efExcel4, efExcel5, efExcel8,
  5.     efLotusWK1, efLotusWK3, efLotusWK4,  // Lotus WK4 is not working
  6.     efDBase3, efDBase4, efDBase5,
  7.     efParadox3, efParadox4, efParadox5,
  8.     efText,
  9.     efHTML,
  10.     efXML
  11.   );
  12.  
  13. const
  14.   SQL_Mask = 'SELECT %s INTO [%s] IN "%s" %s FROM %s';
  15.   //                 |        |       |   |        |
  16.   //               fields     |       |   |        |
  17.   //                     dest table   |   |        |
  18.   //                        dest database |        |
  19.   //                              database format  |
  20.   //                                         source dataset
  21.   DBCodes : array[TExportFormat] of string = (
  22.     '',
  23.     '"Excel 3.0;"', '"Excel 4.0;"', '"Excel 5.0;"', '"Excel 8.0;"',
  24.     '"Lotus WK1;"', '"Lotus WK3;"', '"Lotus WK4;"',
  25.     '"dBASE III;"', '"dBASE IV;"', '"dBASE 5.0;"',
  26.     '"Paradox 3.x;"', '"Paradox 4.x;"', '"Paradox 5.x;"',
  27.     '"Text;"',
  28.     '"HTML Export;"',
  29.     ''
  30.   );
  31.  
  32. function ADOExportQuery(AConnection:TADOConnection;
  33.   ASourceSQL,ADestFile:string;
  34.   AFormat:TExportFormat; ADestTableName:string) : integer;
  35. const
  36.   WhiteSpace = [' ', #13, #10, #9];
  37.   Ls = 6;  // = Length('SELECT');
  38.   Lf = 4;  // = Length('FROM');
  39. var
  40.   SQL : string;
  41.   p, p1, p2, p3 : integer;
  42.   done : boolean;
  43.   sql1, sql2, _sql : string;
  44. begin
  45.   result := -1;
  46.   if (ASourceSQL='') or (ADestFile='') or (AConnection=nil) then
  47.     raise Exception.Create('Incomplete parameters.');
  48.  
  49.   if AFormat=efXML then begin
  50.     result := ADOExportToXML(AConnection, ASourceSQL, ADestFile);
  51.   end else begin
  52.     while (ASourceSQL<>'') and (ASourceSQL[1] in WhiteSpace) do
  53.        System.Delete(ASourceSQL, 1, 1);
  54.     _sql := Uppercase(ASourceSQL) + ' ';
  55.     p1 := Ls+1;
  56.     while (_sql<>'') and (_sql[p1] in WhiteSpace) do inc(p1);
  57.     p2 := 0;
  58.     done := false;
  59.     while not done do begin
  60.       p := pos('FROM', _sql);
  61.       if p<>0 then begin
  62.         done := (_sql[p-1] in WhiteSpace) and (_sql[p+Lf+1] in WhiteSpace);
  63.         if not done then begin
  64.           p2 := p2 + p;
  65.           _sql := copy(_sql, p+Lf, Length(_sql));
  66.         end;
  67.       end else
  68.         done := true;
  69.     end;
  70.     p3 := p2 + Lf;
  71.     while (p2>0) and (ASourceSQL[p2] in WhiteSpace)
  72.       do dec(p2);
  73.     while (p3<Length(ASourceSQL)) and (ASourceSQL[p3] in WhiteSpace)
  74.       do inc(p3);
  75.     sql1 := copy(ASourceSQL, p1, p2-p1-1);  // between "SELECT" and "FROM"
  76.     sql2 := copy(ASourceSQL, p3, Length(ASourceSQL)); // part after "FROM"
  77.     result := ADOExportTable(AConnection, sql2, sql1, ADestFile, AFormat,
  78.       ADestTableName);
  79.   end;
  80. end;
  81.  
  82. function ADOExportTable(AConnection:TADOConnection;
  83.   ASourceTable,AFieldList,ADestFile:string;
  84.   AFormat:TExportFormat; ADestTableName:string) : integer;
  85. var
  86.   SQL : string;
  87.   wasConn : boolean;
  88.   fdir : string;
  89.   fnam : string;
  90.   L : TStringList;
  91. begin
  92.   result := -1;
  93.  
  94.   if (ASourceTable='') or (ADestFile='') or (AConnection=nil) then
  95.     raise Exception.Create('Incomplete parameters.');
  96.  
  97.   if AFieldList='' then AFieldList := '*';
  98.   if AFormat=efXML then begin
  99.     SQL := Format('SELECT %s FROM %s', [AFieldList, ASourceTable]);
  100.     result := ADOExportToXML(AConnection, SQL, ADestFile);
  101.   end else begin
  102.     fdir := ExtractFileDir(ADestFile);
  103.     fnam := ExtractfileName(ADestFile);
  104.     case AFormat of
  105.       efDBase3,
  106.       efDBase4,
  107.       efDBase5,
  108.       efParadox3,
  109.       efParadox4,
  110.       efParadox5,
  111.       efLotusWK1,
  112.       efText,
  113.       efHTML :
  114.         begin
  115.           if FileExists(ADestFile) then DeleteFile(ADestFile);
  116.           SQL := Format(SQL_Mask,
  117.             [AFieldList, fnam, fdir, DBCodes[AFormat], ASourceTable]);
  118.         end;
  119.       efLotusWK3,
  120.       //efLotusWK4,  --> not working !
  121.       efExcel3,
  122.       efExcel4 :
  123.          begin
  124.            if FileExists(ADestFile) then DeleteFile(ADestFile);
  125.            SQL := Format(SQL_Mask,
  126.              [AFieldList, fnam, ADestFile, DBCodes[AFormat], ASourceTable]);
  127.          end;
  128.       efExcel5,
  129.       efExcel8 :
  130.         begin
  131.          if FileExists(ADestFile) then DeleteFile(ADestFile);
  132.          SQL := Format(SQL_Mask,
  133.            [AFieldList, ADestTableName, ADestFile, DBCodes[AFormat],
  134.             ASourceTable]);
  135.         end;
  136.       efMDB :
  137.         SQL := Format(SQL_Mask,
  138.           [AFieldList, ADestTableName, ADestFile, DBCodes[AFormat],
  139.            ASourceTable]);
  140.       else
  141.         raise Exception.Create('ExportTable: Fileformat not supported.');
  142.     end;
  143.     with AConnection do begin
  144.       wasConn := Connected;
  145.       Connected := true;
  146.       Execute(SQL, result);
  147.       if not wasConn then Connected := false;
  148.     end;
  149.   end;
  150. end;

Thaddy

  • Hero Member
  • *****
  • Posts: 16152
  • Censorship about opinions does not belong here.
Re: memory consumption
« Reply #2 on: July 09, 2016, 11:06:03 am »
. Excel, on the other hand, applies some tricky disk swapping mechanism which is not available here.
Well, actually the Excel behavior  is available by default when you use the standard windows functions for memory allocation / de-allocation in your memory manager.
That would be VirtualAlloc and the likes with proper flags to allow swapping to disk instead of forcing into actual memory hardware.
(I told many times the FPC memory manager sucks, here's another example)
In Delphi the memory manager on top of the OS code allows this and that should work.

Holding just the sheet in memory, though, would be an improvement either way. I concur.

But there are no 'tricks'  involved other than OS memory management behavior.

There are alternative memory managers available to solve the issue, including one I wrote myself for fun. Even very simple ones that just use the heap calls under windows.

The offset is that they - mostly - consume more physical memory than FPC's default memory manager. But they will swap automatically.

Note this write up is highly Windows specific! Warranted by the appearance of Excel in this story ;)
« Last Edit: July 09, 2016, 11:20:14 am by Thaddy »
If I smell bad code it usually is bad code and that includes my own code.

wp

  • Hero Member
  • *****
  • Posts: 12458
Re: memory consumption
« Reply #3 on: July 14, 2016, 09:49:13 pm »
@Hansvb:
The new fpspreadsheet version on trunk has a modified virtual mode. The properties VirtualColCount and VirtualRowCount, as well as the event OnWriteCellData, have been moved from the workbook to the worksheet. Therefore, is is possible now to treat worksheets individually when they are written in virtual mode.

As already noted above, this modification breaks existing virtual mode code. But it's easy to fix, the compiler will tell you what's wrong:

old code                   ---> new code
--------------------------------------------------------------
MyWorkbook.VirtualRowCount ---> MyWorksheet.VirtualRowCount
MyWorkbook.VirtualColCount ---> MyWorksheet.VirtualColCount
MyWorkbook.OnWriteCellData ---> MyWorksheet.OnWriteCellData


And, the first parameter of the OnWriteCellData event handler is a TsWorksheet now, no longer a TObject.

[EDIT]
When you apply this to your project be sure to set workbook option boBufStream, otherwise all cells will be written to a temporary memory stream which will overflow memory in the same way as non-virtual mode, just a bit delayed.
« Last Edit: July 15, 2016, 12:24:02 pm by wp »

Hansvb

  • Hero Member
  • *****
  • Posts: 715
Re: memory consumption
« Reply #4 on: July 31, 2016, 09:25:05 am »
Thanks, i will try this

Abrax

  • New member
  • *
  • Posts: 7
Re: memory consumption
« Reply #5 on: December 12, 2016, 02:04:52 pm »
Dear Experts,

I'm working with a really huge .xlsx table (93.880 rows x 37 columns). The System is win10 64bit, 8GB RAM. During the loading of the table my program consumes 5-6GB memory, and the loading takes 20sec. After it is loaded, the memory consumption falls to 700MB. Running the .exe on 4GB RAM (= user PC) the loading of the .xlsx takes 15 minutes, because of the swapping.

I tried every option (boBufStream, boFileStream) and also tried the demo application (db_export_import) with a 100.000 x 37 table with boVirtualMode but the effect was the same: high peek memory consuption and a falling back.

I used the debugger to find the cause, and I found the procedure call (parser.Parse(src, ADoc);) in the fpsxmlecommon.pas, which eats up the memory, but I got stuck here, because the step-by-step debugger did not jump in to the laz2_XMLRead.

Any idea to decrease the hunger of the parser?
« Last Edit: December 12, 2016, 02:06:52 pm by Abrax »

wp

  • Hero Member
  • *****
  • Posts: 12458
Re: memory consumption
« Reply #6 on: December 12, 2016, 07:25:00 pm »
I can confirm this issue, but I don't have a solution, yet.

To debug into laz2_XMLRead you can set the -option "-gw2" or "-gw" in "Additions and overrides" of the project options (see screenshot). This recompiles your project and all required packages with Dwarf(2) debug info.

Abrax

  • New member
  • *
  • Posts: 7
Re: memory consumption
« Reply #7 on: December 13, 2016, 01:44:19 pm »
Thanks to the advice. The guilty is the procedure TXMLReader.ParseContent; (laz2_XMLRead) , it eats the memory. The stack is OK, the heap should be checked, but the Heaptrc freeze the IDE and give no result. I suppose the procedure allocate new piece of memory again and again at every call and the System has not enough time (or thread) to the garbage collection.


Unfortunately this examination is far beyond the scope of the project. The basic concept is wrong to work with such big spreadsheet. This is a fact. The MS Excel is dying in an other way when I try to manipulate this .xlsx.

As last solution I try to quarry the data from the .xlsx by hand and write to a temporary .csv like file or TSringList at runtime. Actually I only have to find and read records from the spreadsheet, and TStringList quite fast.

KennethH

  • New member
  • *
  • Posts: 8
Re: memory consumption
« Reply #8 on: July 30, 2024, 04:42:09 pm »
Old topic I know, but I just went down this rabbit hole, so maybe I can help shed light on it through my pain.

Yes, TXMLReader.ParseContent; (laz2_XMLRead) is the culprit. But I decided to dig deeper and understand why. It turns out that the nodes of the XML are class-based.

A high pain-point is: TXMLReader.ParseAttribute  and CreateElementBuf() (laz2_XMLRead). This will turn every node in the XML into a class. This class has FList for its children and various other properties. For example, TDOMNode_WithChildren.InstanceSize is 80 (FPC 64-bit Windows). This adds up quickly.

My sample Excel file rendered into an XML with 9 million nodes.  Each of these caused the creation of a DomElement and a DomAttribute, so ~160 bytes per node. For a file of about for a file of about 20MB, we see ~1.5GB memory used  just on these two classes alone. This didn't even count the InstanceSize of their FLists, and whatever other helper objects they make.

imo this cannot be solved with the native Lazarus XML-reading procedures.

I did comment out units laz2_xmlread, laz2_DOM and the resulting list of errors looks relatively contained. Perhaps substituting a "dumb" XML reader is a go-forward option.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5752
  • Compiler Developer
Re: memory consumption
« Reply #9 on: July 31, 2024, 10:39:41 pm »
. Excel, on the other hand, applies some tricky disk swapping mechanism which is not available here.
Well, actually the Excel behavior  is available by default when you use the standard windows functions for memory allocation / de-allocation in your memory manager.
That would be VirtualAlloc and the likes with proper flags to allow swapping to disk instead of forcing into actual memory hardware.

There are no flags for VirtualAlloc that can force a page to stay in the physical memory. If one wants memory to stay in physical memory then one has to use VirtualLock on that memory. FPC's default memory manager doesn't use that by itself, but only HeapAlloc, thus any page can be paged.

(I told many times the FPC memory manager sucks, here's another example)

The memory manager works good enough for the majority of the applications out there. Other with specific needs can replace the memory manager if desired.

In Delphi the memory manager on top of the OS code allows this and that should work.

Code, please, or it didn't happen.

wp

  • Hero Member
  • *****
  • Posts: 12458
Re: memory consumption
« Reply #10 on: August 01, 2024, 12:01:23 pm »
Perhaps substituting a "dumb" XML reader is a go-forward option.
Then go ahead and rewrite the xlsx reader in your way (I won't do it, at least not in the near future) - fpspreadsheet is open to allow this (https://wiki.lazarus.freepascal.org/FPSpreadsheet#Adding_new_file_formats, you can register your xlsx reader as user format).

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11935
  • FPC developer.
Re: memory consumption
« Reply #11 on: August 01, 2024, 01:30:35 pm »
I did comment out units laz2_xmlread, laz2_DOM and the resulting list of errors looks relatively contained. Perhaps substituting a "dumb" XML reader is a go-forward option.

(There is sax_xml but it is quite skeletal. That is an fcl-xml unit that didn't get the laz_/laz2_ treatment )

BeniBela

  • Hero Member
  • *****
  • Posts: 920
    • homepage
Re: memory consumption
« Reply #12 on: August 01, 2024, 11:50:18 pm »

A high pain-point is: TXMLReader.ParseAttribute  and CreateElementBuf() (laz2_XMLRead). This will turn every node in the XML into a class. This class has FList for its children and various other properties. For example, TDOMNode_WithChildren.InstanceSize is 80 (FPC 64-bit Windows). This adds up quickly.

My sample Excel file rendered into an XML with 9 million nodes.  Each of these caused the creation of a DomElement and a DomAttribute, so ~160 bytes per node. For a file of about for a file of about 20MB, we see ~1.5GB memory used  just on these two classes alone. This didn't even count the InstanceSize of their FLists, and whatever other helper objects they make.

Classes were a bad idea

Someone made an XML library for Delphi which just uses 8 bytes per node: https://blog.grijjy.com/2020/10/07/an-xml-dom-with-just-8-bytes-per-node/


You would think it is obviously impossible to store seven 8-byte pointers in just 8-bytes, but with bit tricks, he managed to do it
« Last Edit: August 02, 2024, 03:12:05 pm by BeniBela »

 

TinyPortal © 2005-2018