Forum > FPSpreadsheet

Autodetect file format

(1/3) > >>

SeregaKR:
Hello.
I would like to know how autodetect file format works in fpspreadsheet. For now it seems it just analyse file extension like xls - Excel 8, xlsx - Office XML and so on. But I have a problem when our partners send us xls files,which are actually xlsx files with manually changed extension. The same problem is with Excel 5 files when fpspreadsheet see them as normal Excel 8 files.
Is there a way to detect the format regardless of extension?

Thaddy:
Well xlsx is a text format afaik, and xls is a binary format afaik.
So maybe this helps:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function IsBinaryFile(const Filename:AnsiString):Boolean;var  f:File of byte; b:byte = 0;begin  Result := false;{$push}{$I-}  Assign(f,Filename);  reset(f);  if (IOResult = 0) and (Filename <>'') then    while not eof(f) do    begin      Read(f,b);      // test for non-printable characters      if ((b > 0) and (b < 8)) or ((b > 13) and (b < 26)) then exit(true);    end;  Close(f);{$pop}end;
See http://www.differencebetween.net/technology/difference-between-xls-and-xlsx/.
It seems to work pretty good. OTOH it is slow for real text files and fast for real BIFF files (xls) , so maybe just test the header for xml presence.
And xmlsx can be zipped, so also test for zip info.

I suspect there is something much simpler than the above!

wp:

--- Quote from: SeregaKR on July 10, 2018, 06:21:58 am ---Hello.
I would like to know how autodetect file format works in fpspreadsheet. For now it seems it just analyse file extension like xls - Excel 8, xlsx - Office XML and so on. But I have a problem when our partners send us xls files,which are actually xlsx files with manually changed extension. The same problem is with Excel 5 files when fpspreadsheet see them as normal Excel 8 files.
Is there a way to detect the format regardless of extension?

--- End quote ---
First of all: Tell your customers never to do this. Even Excel cannot read an xls file renamed to xlsx! (LibreOffice Calc on the other hand can...)

Yes, fpspreadsheet relies on the extension to distinguish between the main file formats. Only in case of xls (which may contain either BIFF2, BIFF5 or BIFF8 binary formats) it reads the header and makes a decision based on characteristic differences.  But since it is easy to check the signature of zipped files ('PK' in the first two bytes) I also added xlsx and ods to this test now. However, to distinguish between these two fpspreadsheet must try to read the files and catch the exception of the reader (while your program is running in the debugger, the IDE will generate an exception which will not occur when the program is run outside the IDE. But you can avoid this by adding the exception type EFPSpreadsheetReader to the "Language Exceptions" under "Tools" > "Options" > "Debugger").

So, update your fpspreadsheet to the current svn version and fpspreadsheet will accept xls files "disguised" as ".xlsx" or ".ods", or ".abc" or whatever.

RayoGlauco:
If you only need to distinguish XLS or XLSX formats, you can read the first bytes of the file:

D0 CF ... = XLS
50 4B ... = XLSX

There is info about filetype signatures here: https://en.wikipedia.org/wiki/List_of_file_signatures

edit: if you know your file can only be XLS or XLSX, you can read the first byte: if it is D0 then you have an XLS file, else you have a XLSX file. If the file can be another type, you need more tests, see the wikipedia link.

wp:
Not necessarily:

* FpSpreadsheet supports also the old Excel2 format which has the extension .xls, too, but begins with $09, $00, $04, $00.
* And the first bytes 'PK' (your $50 $4B) are common to all zipped files. Since both xlsx and ods are zipped xml files your xlsx file might be an ods as well.

Navigation

[0] Message Index

[#] Next page

Go to full version