Recent

Author Topic: XLSX file read  (Read 2911 times)

xinyiman

  • Hero Member
  • *****
  • Posts: 2256
    • Lazarus and Free Pascal italian community
XLSX file read
« on: January 19, 2020, 08:00:10 pm »
Hi guys, I don't understand how to read the contents of an XLSX file with lazarus and fpspreedsheet. Is there a small example somewhere?
Win10, Ubuntu and Mac
Lazarus: 2.1.0
FPC: 3.3.1

Handoko

  • Hero Member
  • *****
  • Posts: 5122
  • My goal: build my own game engine using Lazarus
Re: XLSX file read
« Reply #1 on: January 19, 2020, 08:05:34 pm »

xinyiman

  • Hero Member
  • *****
  • Posts: 2256
    • Lazarus and Free Pascal italian community
Re: XLSX file read
« Reply #2 on: January 19, 2020, 08:31:10 pm »
I have created a small example. That doesn't work. Can anyone tell me why?

Handoko I had already read that page.
Win10, Ubuntu and Mac
Lazarus: 2.1.0
FPC: 3.3.1

wp

  • Hero Member
  • *****
  • Posts: 11830
Re: XLSX file read
« Reply #3 on: January 19, 2020, 10:58:36 pm »
There are two errors in your code:
  • The error why you are asking: FPSpreadsheet supports several file formats which not necessarily are needed in an application. Therefore it follows the strategy that the programmer must specify the unit with the reader/writer code in the uses clause. For xlsx, the unit is "xlsxooxml". So, when you add this to your uses clause the reader will be found and the file can be read. Alternatively you can also add "fpsaddformats" which links all file formats supported.
  • After doing this I noticed that the program raises an exception at the end. This is because you do not correctly work with the visual controls. WorkbookSource is the workhorse which calls the functions of the non-visual fpspreadsheet library to get access to the files - you call WorkbookSource.LoadFromSpreadsheetFile to read the file. The WorksheetGrid must be connected to the workbooksource via the equally named property to get the file contents being displayed in the grid. Things may get a bit more complicated because sometimes people do not need the entire WorkbookSource machinery; therefore, the grid has its own file loading methods. One of them, the LoadFromWorkbook is thought to display a workbook already loaded. The problem here is that the internal WorkbookSource of the grid will destroy the workbook, but your code loaded the workbook by an external WorkbookSource which will destroy the workbook, too.
Here is one possibility to fix the issue -- it requires that the TsWorkbookSource that you had put onto the form must be connected to the WorkbookSource property of the grid:
Code: Pascal  [Select][+][-]
  1. uses
  2.   Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls,
  3.   fpspreadsheetgrid, fpspreadsheetctrls, fpstypes,
  4.   xlsxooxml;   // THIS MUST BE ADDED
  5.  
  6. procedure TForm1.Button1Click(Sender: TObject);
  7. var
  8.    filename : string;
  9. begin
  10.      filename := Application.Location + 'file.xlsx';
  11.      if FileExists(filename) then
  12.      begin
  13.        sWorkbookSource1.LoadFromSpreadsheetFile(filename, sfOOXML);
  14.      end else begin
  15.        ShowMessage('File not exists: ' + filename);
  16.      end;
  17. end;
And this is another solution, here you only need the WorksheetGrid, no WorkbookSource:
Code: Pascal  [Select][+][-]
  1. uses
  2.   Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls,
  3.   fpspreadsheetgrid, fpspreadsheetctrls, fpstypes,
  4.   xlsxooxml;   // THIS MUST BE ADDED
  5.  
  6. procedure TForm1.Button1Click(Sender: TObject);
  7. var
  8.    filename : string;
  9. begin
  10.      filename := Application.Location + 'file.xlsx';
  11.      if FileExists(filename) then
  12.      begin
  13.        sWorksheetGrid1.LoadFromSpreadsheetFile(filename, sfOOXML);
  14.      end else begin
  15.        ShowMessage('File not exists: ' + filename);
  16.      end;
  17. end;

xinyiman

  • Hero Member
  • *****
  • Posts: 2256
    • Lazarus and Free Pascal italian community
Re: XLSX file read
« Reply #4 on: January 20, 2020, 11:54:23 am »
Hi wp, thanks for the info. They work properly on small files. But having to process a file with 300000 lines, the memory (8 GB of ram) and the program crashes.

Is there a way to read the file line by line without first having imported the entire block of lines from the file?
Win10, Ubuntu and Mac
Lazarus: 2.1.0
FPC: 3.3.1

wp

  • Hero Member
  • *****
  • Posts: 11830
Re: XLSX file read
« Reply #5 on: January 20, 2020, 12:36:03 pm »
having to process a file with 300000 lines
What is a "line" in this context? Rows of the spreadsheet? How many columns, i.e. how many cells?

Assume there are 1 million cells and that each cell occupies 100 bytes on the average you would need 100 MB to store the worksheet. During reading, the worksheet xml is loaded into memory as well. But i doubt that memory usage exceeds several hundreds of MB -- far away from your 8GB. How much of your memory is occupied? Or did you compile a 32-bit program where you can run out of memory relatively easily? Compile a 64-bit program so that you can access all the free space of the 8GB.

It also depends on what you want to do with the data? Display in a grid? Then there's no other option (BTW - who should look at 1 million cells?). On the other hand, if you want to do an analysis on the fly without storing all cells in a worksheet you can go to "virtual mode" which disposes each cell after reading, of course after giving you the opportunity to analyze the cell. Virtual mode is explained in the wiki article, and there are also some demos among the examples provided.
« Last Edit: January 20, 2020, 05:17:05 pm by wp »

xinyiman

  • Hero Member
  • *****
  • Posts: 2256
    • Lazarus and Free Pascal italian community
Re: XLSX file read
« Reply #6 on: January 20, 2020, 01:51:56 pm »
Line is row of the spreedsheet. 120 columns * 300000 rows.
Win10, Ubuntu and Mac
Lazarus: 2.1.0
FPC: 3.3.1

wp

  • Hero Member
  • *****
  • Posts: 11830
Re: XLSX file read
« Reply #7 on: January 20, 2020, 05:21:58 pm »
This comes up to 36 million cells. With maybe 100 bytes per cell, we have something like 3.6 GB. And since the uncompressed input xml file is in memory, too, (maybe during parsing even twice) I'd guess: no chance...

 

TinyPortal © 2005-2018