Forum > FPSpreadsheet
Cannot read excel 2 file
bpranoto:
I need to read excel files created by attendance machine.
When I open the file in MsOffice, the file's property says it is an Excel2 format.
When I try to open the file with autodetection
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---wb.ReadFromFile(Self.OpenDialog1.FileName{,sfExcel2});
The autodetection mechanism shows a XMLReadError which is eaten than, and then it fails to read the file.
If I open the file with specific version of Excel2
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---wb.ReadFromFile(Self.OpenDialog1.FileName,sfExcel2);it throws a EFpSpreadsheetReader exception with message BOF record not found.
Is fpSpreadsheet not able to read old excel 2 files?
Here is the full code
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---procedure TForm1.Button1Click(Sender: TObject);var wb:TsWorkbook; ws:TsWorksheet; ptrCell:PCell; str:String;begin wb:=TsWorkbook.Create; try Self.OpenDialog1.Filter:='Excel file|*.xls;*.xlsx'; if Self.OpenDialog1.Execute then begin wb.ReadFromFile(Self.OpenDialog1.FileName,sfExcel2); ws:=wb.GetFirstWorksheet; ptrCell:=ws.GetCell(0,0); str:=ptrCell^.UTF8StringValue; ShowMessage( 'Cell A1 contains "' + str + '"'); end; finally wb.Free; end; end;
Environment:
O/S: Linux 64bit
FPC:3.2.2
Lazarus:2.2.5
fpSpreadsheet:1.14.0.0 (installed with OPM)
Thank you for the help.
Best regards,
Bambang
Attached is the simple demo project.
The link to download the excel file from my google drive is in the unit1.pas
wp:
How was this file created? I mean: By which application?
The file has a BIFF8 header, but a typical BIFF2 record layout. I am not sure whether reading of such a malformed file should be supported. Because what are headers good for then, if the reading application still has to guess whether maybe a wrong header had been written?
But anyway, it is easy to patch the file in memory. A BIFF2 file (Excel2.1) begins with the bytes $09 $00, but your file begins with $09 $08. So, my solution is to read the file into a memory stream, rewind the stream to the 2nd byte and replace the $08 by a $00. This cheats the fpspreadsheet format detection to detect the file as a valid BIFF2 file (well, not exactly, because the header is too long by 2 bytes, but the basic structure (record header - record size - record data) is intact), and fpspreadsheet can read the file correctly.
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---procedure TForm1.Button1Click(Sender: TObject);var stream: TMemoryStream; wb: TsWorkbook; ws: TsWorksheet; str: String; cell: PCell;begin wb := TsWorkbook.Create; try stream := TMemoryStream.Create; try // Read the file into a memory stream stream.LoadFromFile('ABSEN WAJAH FEBRUARI 2023.xls'); // Patch the faulty byte at stream positon 1 stream.Position := 1; stream.WriteByte(0); // Rewind the stream to the beginning... stream.Position := 0; // ... and read the workbook. wb.ReadFromStream(stream, sfExcel2); finally stream.Free; end; ws := wb.GetFirstWorksheet; str := ws.ReadAsText(0, 0); { // This has the problem that when A1 is empty a cell is allocated for it. // Just the query cells, it is better to use the FindCell method which // returns nil when the cell does not exist. Or: use the Read* methods of // the worksheet as done in the previouse line. cell := ws.GetCell(0, 0); str := cell^.UTF8StringValue; } ShowMessage('Cell A1 contains "' + str + '"'); finally wb.Free; end;end;
Please read also the comment that I put into the code about your usage of the GetCell method.
bpranoto:
Wp, Thank you very much for your help.
--- Quote from: wp on March 21, 2023, 06:40:34 pm ---How was this file created? I mean: By which application?
--- End quote ---
This is attendance machine exported data in excel format.
--- Quote ---The file has a BIFF8 header, but a typical BIFF2 record layout. I am not sure whether reading of such a malformed file should be supported. Because what are headers good for then, if the reading application still has to guess whether maybe a wrong header had been written?
--- End quote ---
Agree.. What made me puzzled is Microsoft excel and Libreoffice can read the file fine..
--- Quote ---But anyway, it is easy to patch the file in memory. A BIFF2 file (Excel2.1) begins with the bytes $09 $00, but your file begins with $09 $08. So, my solution is to read the file into a memory stream, rewind the stream to the 2nd byte and replace the $08 by a $00. This cheats the fpspreadsheet format detection to detect the file as a valid BIFF2 file (well, not exactly, because the header is too long by 2 bytes, but the basic structure (record header - record size - record data) is intact), and fpspreadsheet can read the file correctly.
--- End quote ---
Thanks for the trick.
--- Quote ---
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---procedure TForm1.Button1Click(Sender: TObject);var stream: TMemoryStream; wb: TsWorkbook; ws: TsWorksheet; str: String; cell: PCell;begin wb := TsWorkbook.Create; try stream := TMemoryStream.Create; try // Read the file into a memory stream stream.LoadFromFile('ABSEN WAJAH FEBRUARI 2023.xls'); // Patch the faulty byte at stream positon 1 stream.Position := 1; stream.WriteByte(0); // Rewind the stream to the beginning... stream.Position := 0; // ... and read the workbook. wb.ReadFromStream(stream, sfExcel2); finally stream.Free; end; ws := wb.GetFirstWorksheet; str := ws.ReadAsText(0, 0); { // This has the problem that when A1 is empty a cell is allocated for it. // Just the query cells, it is better to use the FindCell method which // returns nil when the cell does not exist. Or: use the Read* methods of // the worksheet as done in the previouse line. cell := ws.GetCell(0, 0); str := cell^.UTF8StringValue; } ShowMessage('Cell A1 contains "' + str + '"'); finally wb.Free; end;end;
Please read also the comment that I put into the code about your usage of the GetCell method.
--- End quote ---
Thanks for the pointer of using GetCell method...
wp:
--- Quote from: wp on March 21, 2023, 06:40:34 pm ---Because what are headers good for then, if the reading application still has to guess whether maybe a wrong header had been written?
--- End quote ---
Since Excel can read these files and since it is a relatively simple modification, I committed a version of the BIFF2 reader to svn repository which accepts also a BOF record of Excel 8. With the new version in CCR you now can read your files directly, the special patch is not required any more.
bpranoto:
Much appreciated wp.
BTW, will the new version automatically reflected in OPM?
Navigation
[0] Message Index
[#] Next page