Recent

Author Topic: Cannot read excel 2 file  (Read 3004 times)

bpranoto

  • Full Member
  • ***
  • Posts: 183
Cannot read excel 2 file
« on: March 21, 2023, 04:49:34 pm »
I need to read excel files created by attendance machine.

When I open the file in MsOffice, the file's property says it is an Excel2 format.

When I try to open the file with autodetection
Code: Pascal  [Select][+][-]
  1. wb.ReadFromFile(Self.OpenDialog1.FileName{,sfExcel2});

The autodetection mechanism shows a XMLReadError which  is eaten than, and then it fails to read the file.

If I open the file with specific version of Excel2
Code: Pascal  [Select][+][-]
  1. wb.ReadFromFile(Self.OpenDialog1.FileName,sfExcel2);
it throws a EFpSpreadsheetReader exception with message BOF record not found.

Is fpSpreadsheet not able to read old excel 2 files?

Here is the full code
Code: Pascal  [Select][+][-]
  1. procedure TForm1.Button1Click(Sender: TObject);
  2. var
  3.   wb:TsWorkbook;
  4.   ws:TsWorksheet;
  5.   ptrCell:PCell;
  6.   str:String;
  7. begin
  8.   wb:=TsWorkbook.Create;
  9.   try
  10.     Self.OpenDialog1.Filter:='Excel file|*.xls;*.xlsx';
  11.     if Self.OpenDialog1.Execute then begin
  12.       wb.ReadFromFile(Self.OpenDialog1.FileName,sfExcel2);
  13.       ws:=wb.GetFirstWorksheet;
  14.       ptrCell:=ws.GetCell(0,0);
  15.       str:=ptrCell^.UTF8StringValue;
  16.       ShowMessage( 'Cell A1 contains "' + str + '"');
  17.     end;
  18.   finally
  19.     wb.Free;
  20.   end;
  21.  
  22. end;        
  23.  

Environment:
O/S: Linux 64bit
FPC:3.2.2
Lazarus:2.2.5
fpSpreadsheet:1.14.0.0 (installed with OPM)

Thank you for the help.

Best regards,
Bambang

Attached is the simple demo project.

The link to download the excel file from my google drive is in the unit1.pas

wp

  • Hero Member
  • *****
  • Posts: 12461
Re: Cannot read excel 2 file
« Reply #1 on: March 21, 2023, 06:40:34 pm »
How was this file created? I mean: By which application?

The file has a BIFF8 header, but a typical BIFF2 record layout. I am not sure whether reading of such a malformed file should be supported. Because what are headers good for then, if the reading application still has to guess whether maybe a wrong header had been written?

But anyway, it is easy to patch the file in memory. A BIFF2 file (Excel2.1) begins with the bytes $09 $00, but your file begins with $09 $08. So, my solution is to read the file into a memory stream, rewind the stream to the 2nd byte and replace the $08 by a $00. This cheats the fpspreadsheet format detection to detect the file as a valid BIFF2 file (well, not exactly, because the header is too long by 2 bytes, but the basic structure (record header - record size - record data) is intact), and fpspreadsheet can read the file correctly.

Code: Pascal  [Select][+][-]
  1. procedure TForm1.Button1Click(Sender: TObject);
  2. var
  3.   stream: TMemoryStream;
  4.   wb: TsWorkbook;
  5.   ws: TsWorksheet;
  6.   str: String;
  7.   cell: PCell;
  8. begin
  9.   wb := TsWorkbook.Create;
  10.   try
  11.     stream := TMemoryStream.Create;
  12.     try
  13.       // Read the file into a memory stream
  14.       stream.LoadFromFile('ABSEN WAJAH FEBRUARI 2023.xls');
  15.       // Patch the faulty byte at stream positon 1
  16.       stream.Position := 1;
  17.       stream.WriteByte(0);
  18.       // Rewind the stream to the beginning...
  19.       stream.Position := 0;
  20.       // ... and read the workbook.
  21.       wb.ReadFromStream(stream, sfExcel2);
  22.     finally
  23.       stream.Free;
  24.     end;
  25.     ws := wb.GetFirstWorksheet;
  26.     str := ws.ReadAsText(0, 0);
  27.     {
  28.     // This has the problem that when A1 is empty a cell is allocated for it.
  29.     // Just the query cells, it is better to use the FindCell method which
  30.     // returns nil when the cell does not exist. Or: use the Read* methods of
  31.     // the worksheet as done in the previouse line.
  32.     cell := ws.GetCell(0, 0);
  33.     str := cell^.UTF8StringValue;
  34.     }
  35.     ShowMessage('Cell A1 contains "' + str + '"');
  36.   finally
  37.     wb.Free;
  38.   end;
  39. end;  

Please read also the comment that I put into the code about your usage of the GetCell method.
« Last Edit: March 21, 2023, 11:54:24 pm by wp »

bpranoto

  • Full Member
  • ***
  • Posts: 183
Re: Cannot read excel 2 file
« Reply #2 on: March 22, 2023, 05:03:30 am »
Wp, Thank you very much for your help.

How was this file created? I mean: By which application?
This is attendance machine exported data in excel format.

Quote
The file has a BIFF8 header, but a typical BIFF2 record layout. I am not sure whether reading of such a malformed file should be supported. Because what are headers good for then, if the reading application still has to guess whether maybe a wrong header had been written?
Agree.. What made me puzzled is Microsoft excel and Libreoffice can read the file fine..

Quote
But anyway, it is easy to patch the file in memory. A BIFF2 file (Excel2.1) begins with the bytes $09 $00, but your file begins with $09 $08. So, my solution is to read the file into a memory stream, rewind the stream to the 2nd byte and replace the $08 by a $00. This cheats the fpspreadsheet format detection to detect the file as a valid BIFF2 file (well, not exactly, because the header is too long by 2 bytes, but the basic structure (record header - record size - record data) is intact), and fpspreadsheet can read the file correctly.
Thanks for the trick.

Quote
Code: Pascal  [Select][+][-]
  1. procedure TForm1.Button1Click(Sender: TObject);
  2. var
  3.   stream: TMemoryStream;
  4.   wb: TsWorkbook;
  5.   ws: TsWorksheet;
  6.   str: String;
  7.   cell: PCell;
  8. begin
  9.   wb := TsWorkbook.Create;
  10.   try
  11.     stream := TMemoryStream.Create;
  12.     try
  13.       // Read the file into a memory stream
  14.       stream.LoadFromFile('ABSEN WAJAH FEBRUARI 2023.xls');
  15.       // Patch the faulty byte at stream positon 1
  16.       stream.Position := 1;
  17.       stream.WriteByte(0);
  18.       // Rewind the stream to the beginning...
  19.       stream.Position := 0;
  20.       // ... and read the workbook.
  21.       wb.ReadFromStream(stream, sfExcel2);
  22.     finally
  23.       stream.Free;
  24.     end;
  25.     ws := wb.GetFirstWorksheet;
  26.     str := ws.ReadAsText(0, 0);
  27.     {
  28.     // This has the problem that when A1 is empty a cell is allocated for it.
  29.     // Just the query cells, it is better to use the FindCell method which
  30.     // returns nil when the cell does not exist. Or: use the Read* methods of
  31.     // the worksheet as done in the previouse line.
  32.     cell := ws.GetCell(0, 0);
  33.     str := cell^.UTF8StringValue;
  34.     }
  35.     ShowMessage('Cell A1 contains "' + str + '"');
  36.   finally
  37.     wb.Free;
  38.   end;
  39. end;  

Please read also the comment that I put into the code about your usage of the GetCell method.

Thanks for the pointer of using GetCell method...

wp

  • Hero Member
  • *****
  • Posts: 12461
Re: Cannot read excel 2 file
« Reply #3 on: March 22, 2023, 09:34:10 am »
Because what are headers good for then, if the reading application still has to guess whether maybe a wrong header had been written?
Since Excel can read these files and since it is a relatively simple modification, I committed a version of the BIFF2 reader to svn repository which accepts also a BOF record of Excel 8. With the new version in CCR you now can read your files directly, the special patch is not required any more.

bpranoto

  • Full Member
  • ***
  • Posts: 183
Re: Cannot read excel 2 file
« Reply #4 on: March 22, 2023, 09:50:32 am »
Much appreciated wp.

BTW, will the new version automatically reflected in OPM?

wp

  • Hero Member
  • *****
  • Posts: 12461
Re: Cannot read excel 2 file
« Reply #5 on: March 22, 2023, 10:04:53 am »
No, I only prepare a new OPM release after signficant bug fixes or after some relevant amount of new features have accumulated. But if you don't use svn you can download the snapshot from the ccr site: https://sourceforge.net/p/lazarus-ccr/svn/HEAD/tree/components/fpspreadsheet/

bpranoto

  • Full Member
  • ***
  • Posts: 183
Re: Cannot read excel 2 file
« Reply #6 on: March 22, 2023, 10:23:52 am »
Thanks.

I have downloaded the xlsbiff2.pas.

Thank you very much again.

bpranoto

  • Full Member
  • ***
  • Posts: 183
Re: Cannot read excel 2 file
« Reply #7 on: March 22, 2023, 12:50:13 pm »
But anyway, it is easy to patch the file in memory. A BIFF2 file (Excel2.1) begins with the bytes $09 $00, but your file begins with $09 $08.

I have another .xls file created by CLIPWKS (a dos clipper library). fpSpreadsheet also can't read this file properly while openoffice and ms excel can open it well. It begins with the bytes $09 $04.

I rename the file to .doc extension in order to be able to attach to this post.

wp

  • Hero Member
  • *****
  • Posts: 12461
Re: Cannot read excel 2 file
« Reply #8 on: March 23, 2023, 12:30:00 am »
$09 $04 is the BOF record marker of the BIFF4 file format (Excel 4.0) which is not supported by fpspreadsheet (among the xls format it supports only those of Excel 2.1, 5.0 and 8.0)

But since the main building blocks are already there I combined them to a reader unit for this format and committed it to the svn repository. The unit is named similar to the others: xlsbiff4. I did a quick test and found no issues, except for the cell background which is not correctly rendered in the worksheetgrid at the moment.

Note, that xlsbiff4 is not used internally by the fpspreadsheet package, there is no predefined format identifier for it (i.e. "sfExcel4"). But by adding the unit to the uses clause the reader is registered as a user-defined format and gets a id "sfidExcel4" which can be used in the ReadFromFile command:

Code: Pascal  [Select][+][-]
  1. uses
  2.   Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls,
  3.   fpSpreadsheet, fpsTypes, xlsBIFF4;
  4.  
  5. procedure TForm1.Button1Click(Sender: TObject);
  6. const
  7.   fn = 'hapusaja.xls';
  8. var
  9.   wb: TsWorkbook;
  10.   ws: TsWorksheet;
  11.   str: String;
  12. begin
  13.   wb := TsWorkbook.Create;
  14.   try
  15.     wb.ReadFromFile(fn, sfidExcel4);
  16.     ws := wb.GetFirstWorksheet;
  17.     str := ws.ReadAsText(0, 0);
  18.     ShowMessage('Cell A1 contains "' + str + '"');
  19.   finally
  20.     wb.Free;
  21.   end;
  22. end;

And just for clarification: This is only a reader unit - it is not possible to write a workbook in the BIFF4 format ATM (and I don't have plans to support this in the near future).
« Last Edit: March 23, 2023, 12:32:40 am by wp »

bpranoto

  • Full Member
  • ***
  • Posts: 183
Re: Cannot read excel 2 file
« Reply #9 on: March 23, 2023, 03:32:48 am »
Thank you very much WP.

wp

  • Hero Member
  • *****
  • Posts: 12461
Re: Cannot read excel 2 file
« Reply #10 on: March 24, 2023, 01:02:21 am »
Since BIFF4 is very similar to BIFF3, the format of Excel 3.0, I extended the unit xlsbiff4 to support both formats and renamed the unit to xlsbiff34. Like before, simply adding the unit to the uses clause activates the readers for both formats. The formats can be selected directly by the sfidExcel3 and sfidExcel4 format IDs, but format auto-detections works, too.

 

TinyPortal © 2005-2018