Author Topic: Defective xlsx file exception  (Read 2207 times)

VTwin

  • Hero Member
  • *****
  • Posts: 655
  • Former Turbo Pascal 3 user
Defective xlsx file exception
« on: December 15, 2018, 11:50:43 pm »
First, many thanks for FPSpreadsheet, I use it to open and save different spreadsheet formats, which my users greatly appreciate.

When trying to open a simple LibreOffice ods spreadsheet document, I have run into an exception:

"Project raised exception class 'EFpSpreadsheetReader' with message:
Defective internal structure of xlsx file."

which occurs at:

Code: Pascal
  if not UnzipToStream(AStream, OOXML_PATH_XL_WORKBOOK, XMLStream) then
    raise EFPSpreadsheetReader.CreateFmt(rsDefectiveInternalFileStructure, ['xlsx']);

If I ignore the exception, the file opens and is fine. Why is it raising an exception for 'xlsx'? Is this a bug?

Cheers,
VTwin
« Last Edit: December 16, 2018, 12:46:07 am by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

macOS 10.11.6: Lazarus 2.1.0 svn 61174M (64 bit Cocoa trunk)
Ubuntu 18.04.2: Lazarus 2.0.0 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.0.0 (64 bit on VBox)

wp

  • Hero Member
  • *****
  • Posts: 5652
Re: Defective xlsx file exception
« Reply #1 on: December 16, 2018, 01:11:31 am »
You want to open an ods file in fpspreadsheet, but it picks the xlsx reader? Did you specify the correct file format?
Code: Pascal
  var
    book: TsWorkbook;
  begin
    book := TsWorkbook.Create;
    book.ReadFromFile('test.ods', sfOpenDocument);
    // or, for automatic format detection:
    // book.ReadFromFile('test.ods');
    ...
If you did this correctly, the next question is: are you sure that the file really is ods? How was it written? Internally, ods and xlsx files are zip files; you can unzip your file after renaming the extension to zip. An unzipped ods file has a file "content.xml" in the top level of the file hierarchy. An unzipped xlsx file has folders "_rels" and "xl" in the top level of the file hierarchy.
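The same layout check can be done in code instead of renaming and unzipping by hand. This is only a minimal sketch, assuming FPC's standard `zipper` unit; `TZipKind`, `ClassifyEntries` and `ClassifyZipFile` are names made up for this illustration and are not part of fpspreadsheet.

```pascal
program ZipKindDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, zipper;

type
  // Hypothetical result type, for illustration only
  TZipKind = (zkUnknown, zkOds, zkXlsx);

// Looks for the marker entries in a list of archive entry names:
// "content.xml" at the top level for ods, anything under "xl/" for xlsx.
function ClassifyEntries(AEntries: TStrings): TZipKind;
var
  i: Integer;
  entryName: String;
begin
  Result := zkUnknown;
  for i := 0 to AEntries.Count - 1 do
  begin
    entryName := AEntries[i];
    if entryName = 'content.xml' then
      Exit(zkOds);
    if Pos('xl/', entryName) = 1 then
      Exit(zkXlsx);
  end;
end;

// Reads only the archive directory of the file; nothing is extracted.
function ClassifyZipFile(const AFileName: String): TZipKind;
var
  unzip: TUnZipper;
  names: TStringList;
  i: Integer;
begin
  unzip := TUnZipper.Create;
  names := TStringList.Create;
  try
    unzip.FileName := AFileName;
    unzip.Examine;
    for i := 0 to unzip.Entries.Count - 1 do
      names.Add(unzip.Entries[i].ArchiveFileName);
    Result := ClassifyEntries(names);
  finally
    names.Free;
    unzip.Free;
  end;
end;

begin
  if ParamCount = 1 then
  begin
    case ClassifyZipFile(ParamStr(1)) of
      zkOds:  WriteLn('looks like ods');
      zkXlsx: WriteLn('looks like xlsx');
    else
      WriteLn('zip file, but neither ods nor xlsx markers found');
    end;
  end;
end.
```

Run it with the suspicious file as the only argument. A file that is not a zip archive at all will make `Examine` raise an exception, which this sketch does not handle.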
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

VTwin

Re: Defective xlsx file exception
« Reply #2 on: December 16, 2018, 01:44:36 am »
Thanks wp. Yes, I have this:

Code: Pascal
  function TVList.OpenFPSFile(const apath: string): boolean;
  var
    wb: TsWorkbook;
  begin
    result := false;
    // Create before "try" so that wb.Free never runs on an uninitialized variable
    wb := TsWorkbook.Create;
    try
      wb.ReadFromFile(apath);  // no format given: automatic detection
      result := OpenFPSWorksheet(wb, 0);
    finally
      wb.Free;
    end;
  end;

"apath" has '.ods' extension, and is a spreadsheet file saved from LibreOffice. So I don't see an error that I am making. If I press "Continue" to ignore the exception I can use wb to access the file.

VTwin

Re: Defective xlsx file exception
« Reply #3 on: December 16, 2018, 01:47:45 am »
I confirmed that I can unzip the file, and that it has "content.xml" at the top.

wp

Re: Defective xlsx file exception
« Reply #4 on: December 16, 2018, 10:47:19 am »
The exception happens during the format detection because you do not specify a file format for the reader. fpspreadsheet looks at the header of the file and finds the signature of a zip file; therefore, it tries xlsx and ods as possible candidates, beginning with xlsx. If an exception occurs it continues with ods.
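To illustrate the header test: a non-empty zip archive starts with the bytes 'PK' 03 04, and this holds for xlsx and ods alike, so the header alone cannot tell them apart. A rough sketch using only `Classes` and `SysUtils`; the function name `HasZipSignature` is made up here, and this is not the fpspreadsheet implementation.

```pascal
program ZipSigDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// True if the stream begins with the zip local-file-header signature.
// The stream position is restored afterwards.
function HasZipSignature(AStream: TStream): Boolean;
const
  ZIP_SIG: array[0..3] of Byte = ($50, $4B, $03, $04);  // 'PK' 03 04
var
  buf: array[0..3] of Byte;
  savedPos: Int64;
begin
  savedPos := AStream.Position;
  try
    AStream.Position := 0;
    Result := (AStream.Read(buf, SizeOf(buf)) = SizeOf(buf))
      and CompareMem(@buf, @ZIP_SIG, SizeOf(buf));
  finally
    AStream.Position := savedPos;
  end;
end;

var
  fs: TFileStream;
begin
  if ParamCount = 1 then
  begin
    fs := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
    try
      if HasZipSignature(fs) then
        WriteLn('zip container: could be xlsx or ods')
      else
        WriteLn('not a zip container');
    finally
      fs.Free;
    end;
  end;
end.
```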

The exception is caught and ignored by the library. However, when the program runs in the debugger, the IDE interrupts it and displays an exception window; this will not happen when the program runs outside the IDE or without the debugger. You can avoid the exception popping up in the IDE by adding "EFpSpreadsheetReader" to the ignored-exception-list in "Tools" > "Options" > "Debugger" > "Language exceptions".

Or, if you know for sure that the file to be opened is ods, you should add the format specifier sfOpenDocument to the wb.ReadFromFile call.

Maybe the automatic file detection part should be rewritten to consider the extension first.

VTwin

Re: Defective xlsx file exception
« Reply #5 on: December 16, 2018, 02:52:13 pm »
Thanks wp, I appreciate your quick and clear reply. I follow the logic, and the workaround.

It seems to me that for automatic detection the file extension 'ods' should be considered in addition to the fact that it is a 'zip' file, since it fails otherwise.

I will either ignore the exception or code it to use the format specifier sfOpenDocument for 'ods' files, probably the latter.

Again, I appreciate the continued work on FPSpreadsheet; it is impressive, and I have not even delved into the visual components yet.

SeregaKR

  • New member
  • *
  • Posts: 35
Re: Defective xlsx file exception
« Reply #6 on: December 17, 2018, 06:18:24 am »
The extension may sometimes be wrong. I had to deal with a situation where our partners sent us an xlsx file with an ods extension. They had manually renamed the file! So we have to consider this. I thought I was going mad trying to find out what was wrong.

wp

Re: Defective xlsx file exception
« Reply #7 on: December 17, 2018, 09:22:02 am »
In fact, this was the reason why the original extension-based detection was replaced by the current header-based one. What I was thinking of was to check the extension first, try to load the file and, if that fails, repeat with the current header evaluation.

wp

Re: Defective xlsx file exception
« Reply #8 on: December 18, 2018, 01:11:33 am »
The new revision checks the extension first, and only when the file cannot be loaded successfully this way does it proceed with the current way of checking the file header. This avoids the exception for regular ods files, which occurred because xlsx had always been tested first (xlsx and ods cannot be distinguished in the header test because both have a zip signature).
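For readers who want to picture the order of attempts (extension first, then the remaining candidates), here is a small sketch. It is not the fpspreadsheet code: `TFormat`, `TLoader`, `FormatFromExt`, `LoadWithFallback` and `XlsxOnlyLoader` are hypothetical stand-ins, and the loader is a fake that only pretends to read a file.

```pascal
program DetectOrderDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical stand-ins for the library's format type and reader call
  TFormat = (fmtXlsx, fmtOds);
  TLoader = function(const AFileName: String; AFormat: TFormat): Boolean;

// Maps a file extension to a format; returns False for unknown extensions.
function FormatFromExt(const AFileName: String; out AFormat: TFormat): Boolean;
var
  ext: String;
begin
  AFormat := fmtXlsx;
  ext := LowerCase(ExtractFileExt(AFileName));
  Result := True;
  if ext = '.xlsx' then
    AFormat := fmtXlsx
  else if ext = '.ods' then
    AFormat := fmtOds
  else
    Result := False;
end;

// Tries the format suggested by the extension first; only if that fails
// does it try the other candidates in their fixed order.
function LoadWithFallback(const AFileName: String; ALoader: TLoader;
  out AUsed: TFormat): Boolean;
var
  byExt, candidate: TFormat;
  hasExt: Boolean;
begin
  AUsed := fmtXlsx;
  hasExt := FormatFromExt(AFileName, byExt);
  if hasExt and ALoader(AFileName, byExt) then
  begin
    AUsed := byExt;
    Exit(True);
  end;
  for candidate := Low(TFormat) to High(TFormat) do
  begin
    if hasExt and (candidate = byExt) then
      Continue;  // already tried above
    if ALoader(AFileName, candidate) then
    begin
      AUsed := candidate;
      Exit(True);
    end;
  end;
  Result := False;
end;

// Fake loader: behaves as if the file content were xlsx, whatever its name.
function XlsxOnlyLoader(const AFileName: String; AFormat: TFormat): Boolean;
begin
  Result := AFormat = fmtXlsx;
end;

var
  used: TFormat;
begin
  // Simulates the renamed file from reply #6: xlsx content, ods extension
  if LoadWithFallback('renamed.ods', @XlsxOnlyLoader, used) then
    WriteLn('loaded via fallback as xlsx: ', used = fmtXlsx);
end.
```

With a correctly named ods file the first attempt succeeds, so the xlsx reader is never run and cannot raise its exception. The fallback still covers files with a wrong extension.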

VTwin

Re: Defective xlsx file exception
« Reply #9 on: December 18, 2018, 03:00:29 am »
Fair enough, I appreciate the difficulties in foreseeing what files users might try to open, including ones with modified extensions. It sounds like this has been addressed in the new revision. Very cool, thanks!

VTwin

Re: Defective xlsx file exception
« Reply #10 on: December 18, 2018, 03:11:26 am »
This is not particularly relevant, but some years ago I spent some time figuring out magic numbers for bitmap images:

Code: Pascal
  { Magic numbers }
  const
    mnBmp   : array[0..1] of byte = ($42, $4D); // 'BM'
    mnGif   : array[0..3] of byte = ($47, $49, $46, $38); // 'GIF8'
    mnJpeg  : array[0..2] of byte = ($FF, $D8, $FF);
    mnPcx   : array[0..3] of byte = ($0A, $05, $01, $08);
    mnPng   : array[0..7] of byte = ($89, $50, $4E, $47, $0D, $0A, $1A, $0A);
                                    // $89 'PNG' CR LF SUB LF
    mnPnm   : array[0..0] of byte = ($50); // 'P'
    mnPsd   : array[0..11] of byte = ($38, $42, $50, $53, $00, $01, $00, $00, $00,
                                      $00, $00, $00); // '8BPS' 00 01, then six zero bytes
    mnTiff1 : array[0..3] of byte = ($49, $49, $2A, $00); // Intel, 'II' 42
    mnTiff2 : array[0..3] of byte = ($4D, $4D, $00, $2A); // Motorola, 'MM' 42

I forget why I needed it, but it was interesting that the files had a signature that could be checked whatever the extension. I suppose some Lazarus code uses these for raster files.

wp

Re: Defective xlsx file exception
« Reply #11 on: December 18, 2018, 10:48:25 am »
Quote from VTwin: "I suppose some Lazarus code uses these for raster files."
Yes. Every bitmap reader has a virtual function InternalCheck which reads the first few bytes of a file and checks against the signature expected for the file type supported.
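A signature check of this kind is easy to write by hand. The following is only a sketch under stated assumptions, not the LCL/fcl-image InternalCheck code: `StartsWith` and `GuessImageKind` are hypothetical helpers, and the constants are three of the magic numbers posted above.

```pascal
program ImageSigDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  mnBmp: array[0..1] of Byte = ($42, $4D);              // 'BM'
  mnGif: array[0..3] of Byte = ($47, $49, $46, $38);    // 'GIF8'
  mnPng: array[0..7] of Byte = ($89, $50, $4E, $47, $0D, $0A, $1A, $0A);

// Compares the first bytes of the stream with the given signature.
// The stream position is restored afterwards.
function StartsWith(AStream: TStream; const ASig: array of Byte): Boolean;
var
  buf: array of Byte;
  savedPos: Int64;
  n: Integer;
begin
  Result := False;
  n := Length(ASig);
  if n = 0 then
    Exit;
  SetLength(buf, n);
  savedPos := AStream.Position;
  try
    AStream.Position := 0;
    // A stream shorter than the signature cannot match
    if AStream.Read(buf[0], n) = n then
      Result := CompareMem(@buf[0], @ASig[0], n);
  finally
    AStream.Position := savedPos;
  end;
end;

// Returns 'png', 'gif' or 'bmp', or an empty string if nothing matches.
function GuessImageKind(AStream: TStream): String;
begin
  if StartsWith(AStream, mnPng) then
    Result := 'png'
  else if StartsWith(AStream, mnGif) then
    Result := 'gif'
  else if StartsWith(AStream, mnBmp) then
    Result := 'bmp'
  else
    Result := '';
end;

var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    // Fake "file" that contains nothing but a GIF signature
    ms.WriteBuffer(mnGif, SizeOf(mnGif));
    WriteLn(GuessImageKind(ms));
  finally
    ms.Free;
  end;
end.
```

Short signatures such as the two-byte 'BM' can match unrelated data by accident, so the longer signatures are tested first here.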