Recent

Author Topic: XLSX document created by FPSpreadsheet not readable by LibreOffice  (Read 49601 times)

rvk

  • Hero Member
  • *****
  • Posts: 6953
When i create a .xlsx document with FPSpreadsheet i can open it in Excel but not with LibreOffice (latest version).

I can extract all files (from the .xlsx) en re-zip with 7-zip to a .xlsx file. After that i can read it with LibreOffice.
(so the internal files are ok)

Normal .ods created with FPSpreadsheet can also be read by LibreOffice.
(so i figured LibreOffice should be able to read the zipped .xlsx too)

Why would LibreOffice not be able to read a .xlsx created with the same zip-functions as an .ods (which it can read)?
(I get the standard Text Import in LibreOffice when trying to read a .xlsx)

(I have tried both the release version and the latest svn version)
« Last Edit: July 07, 2014, 07:31:34 pm by rvk »

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #1 on: July 07, 2014, 08:18:34 pm »
Please always post your Lazarus and FPC version. Those matter. (i.e. different zip routines are used depending on FPC versions).
In this case, please also specify LibreOffice version...

Strange; obviously LibreOffice expects something that fpspreadsheet does not provide at least with xlsx. It would also be helpful to upload an example program and even nicer if you could upload the faulty xlsx and the corrected xlsx...
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #2 on: July 07, 2014, 08:47:23 pm »
This is with
Windows 7 Professional
Lazarus 1.2.4 r45510 FPC 2.6.4 i386-win32-win32/win64
   with fpspreadsheet revision 3294
LibreOffice Version: 4.2.5.2

Code to reproduce test.xlsx and test.ods in C:\TEMP (xlsx and ods files are attached)
(Clean project with one button)
Code: [Select]
uses
  fpspreadsheet;

procedure TForm1.Button1Click(Sender: TObject);
var
  MyWorkbook: TsWorkbook;
  MyWorksheet: TsWorksheet;
  i, j: integer;
begin
  MyWorkbook := TsWorkbook.Create;
  MyWorksheet := MyWorkbook.AddWorksheet('My Worksheet');
  for i := 0 to 40 do
    for j := 0 to 20 do
      MyWorksheet.WriteUTF8Text(i, j, 'aj' + IntToStr(i) + ' ' + IntToStr(j));
  MyWorkbook.WriteToFile('c:\temp\test' + STR_OOXML_EXCEL_EXTENSION, sfOOXML);
  MyWorkbook.WriteToFile('c:\temp\test' + STR_OPENDOCUMENT_CALC_EXTENSION, sfOpenDocument);
  MyWorkbook.Free;
end;

When i try to open test.xlsx in LibreOffice i get the Text Import dialog.
When i take test.xlsx and extract it to a directory i can rezip it with 7-zip.
(Resulting in attached test_rezipped.xlsx)

The test_rezipped.xlsx is readable by LibreOffice without problems.

So the structure of the files (in the xlsx) seem to be correct but there is something wrong with the way it is zipped. (and that is strange because as far as i can tell the .ods is zipped in the same way and that one is readable by LibreOffice.)

« Last Edit: July 07, 2014, 08:53:30 pm by rvk »

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #3 on: July 08, 2014, 12:30:34 am »
Thanks.
I see the same problem with Laz trunk r45807, FPC trunk x86, Win, fpspreadsheet r3296 with the .xlsx file in LibreOffice 4.2.5.2... which should be using the regular FPC zipper code instead of the fpspreadsheet-supplied fpszipper

Will investigate further...
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #4 on: July 08, 2014, 11:07:51 am »
Wauw... I found the "bug"  :)  (my third bugfix-assist for fpspreadsheet in a week)

I tried to re-zip the files with this code to see if the order of the files within the zip would make a difference. But i accidentally used forward slashes instead of backslashes (as is normal in Windows).

AND IT WORKED !! (I could read the file in LibreOffice)

Code: [Select]
procedure TForm1.Button3Click(Sender: TObject);
var
  Zip: TZipper;
  Ts: TStringList;
begin
  SetCurrentDir('c:\temp\test\');
  Ts := TStringList.Create;
  Ts.Add('[Content_Types].xml');
  Ts.Add('_rels/.rels');
  Ts.Add('xl/_rels/workbook.xml.rels');  // <--- Oops... / = lucky typo ;)
  Ts.Add('xl/workbook.xml');
  Ts.Add('xl/styles.xml');
  Ts.Add('xl/sharedStrings.xml');
  Ts.Add('xl/worksheets/sheet1.xml');
  Zip := TZipper.Create;
  Zip.ZipFiles('c:\temp\test2.xlsx', Ts);
  Zip.Free;
  Ts.Free;
end;

Attached is the test.xlsx created by fpspreadsheet and a newly zipped test2.xlsx (zipped by zipper.pp).
If i scroll down in the files with a hexeditor and compare them you clearly see the only difference is the / and \.

The only file with slashes (and subdir) in an ODS-file is OPENDOC_PATH_METAINF_MANIFEST and that one is also defined with a forward slash (/) regardless of OS. The OOXML_PATH_xxxx's in xlsxooxml.pas are defined with the PathDelim (from sysutilh.inc) which is the system-delimiter. (Which is the backslash on Windows)

So LibreOffice has a problem with files zipped with backslahes (internally within the zip).
7-zip also zips files with forward slashes (that's why rezipping with 7-zip worked).

Is it a standard that all zip-files should have forward slashes as delimiter????
If so, then the Path variables (OOXML_PATH_xxxx) in xlsxooxml.pas are wrong.


Edit: Yeah... The OOXML_PATH_xxxx's should be adjusted with /.
(Or even better: the zipper-unit should replace the \ with /)

.ZIP File Format Specification

Quote
4.4.17 file name: (Variable)
4.4.17.1 The name of the file, with optional relative path.
    The path stored MUST not contain a drive or device letter, or a leading slash.  All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc.
(source: http://www.pkware.com/documents/casestudies/APPNOTE.TXT)
« Last Edit: July 08, 2014, 11:21:27 am by rvk »

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #5 on: July 08, 2014, 11:25:49 am »
Interesting - thanks! I was figuring out the file structure and why there are 11 file entries in the 7zip versions but only 7 in the original one (probably directories). No need to go further, looks like ;)

I'll have a detailed look at your post/links later.

However, it's weird that Excel does open this file - perhaps they have a... how to put it... more Windows-centric idea of what path delimiters should look like ;)
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #6 on: July 08, 2014, 12:28:29 pm »
It's probably that strictly speaking a backslash in a zip is forbidden but because some program's do use them, that Excel (and 7-zip as well as the standard zip of Windows) allow them to be read. (It would be easy to implement reading \ and interpreting them as /).

LibreOffice apparently is somewhat stricter in the reading of zipped .xlsx and does not allow backslashes in internal filenames. (Excel also creates them with the forward slash as it should be done)

(We could of course try to change LibreOffice to be somewhat easier on backslashes in zips but it would be easier to stay true to the standard and use forward slashes in xlsxooxml.pas or zipper.pp.)

I changed this in xlsxooxml.pas and it works. I can read the file in LibreOffice.
Code: [Select]
  PathDelim = '/'; // <-- overruling of the one used in sysutilh.inc
  { OOXML Directory structure constants }
  OOXML_PATH_TYPES     = '[Content_Types].xml';
  OOXML_PATH_RELS      = '_rels' + PathDelim;
  ...
(Of course instead of using the PathDelim in the constants we could just use / hardcoded same as it is done in fpsopendocument.pas but that's up to the experts :))

Maybe it would be better to change this at zipper.pp level (so it doesn't happen with other uses of this unit) but i'm not well enough known with the source to make that determination.
« Last Edit: July 08, 2014, 12:33:05 pm by rvk »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12720
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #7 on: July 08, 2014, 12:33:04 pm »
Probably zipper should have a force forward slashes option.


wp

  • Hero Member
  • *****
  • Posts: 13430
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #8 on: July 08, 2014, 12:41:14 pm »
Since currently there is only a writer for xlsx we definitely should hard-code the forward slash in order to get a file according to specification. BTW if even Excel writes forward slashes are there any programs at all which write a backslash?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12720
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #9 on: July 08, 2014, 12:57:24 pm »
BTW if even Excel writes forward slashes are there any programs at all which write a backslash?

Seems old winzips did.   http://www.info-zip.org/FAQ.html#backslashes

Still it would be wise to have a passthrough (write as user specificies), in case sb has to live with an app that really wants it.  But that can be disabled by default.

wp

  • Hero Member
  • *****
  • Posts: 13430
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #10 on: July 08, 2014, 09:06:40 pm »
Fixed in rev 3299 (replaced all "PathDelim" by "/").

@rvk: Hoping you find more of these. So far, I did not yet go into details of the xlsx files, but this will happen soon.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12720
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #11 on: July 08, 2014, 09:20:54 pm »
Well,....   On a Dutch forum we were discussing escaping of HTML entities.  I think fpspreadsheet should do that for XLSX ?

wp

  • Hero Member
  • *****
  • Posts: 13430
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #12 on: July 08, 2014, 09:30:50 pm »
Which forum posting is that? Google will certainly translate it for me.

Maybe a stupid question: which kind of HTML entities do you mean?

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #13 on: July 08, 2014, 09:42:42 pm »
That would have been my next topic  :)
(The dutch forum-topic is here)

When writing a &, < or > in a cell the .xlsx version doesn't work.
.ods does translate it correctly to &amp;

Example:
Code: [Select]
procedure TForm1.Button3Click(Sender: TObject);
var
  MyWorkbook: TsWorkbook;
  MyWorksheet: TsWorksheet;
  S: String;
begin
  MyWorkbook := TsWorkbook.Create;
  MyWorksheet := MyWorkbook.AddWorksheet('My Worksheet');

  S := 'Me & you < and >';
  MyWorksheet.WriteUTF8Text(0, 0, S);

  S := 'België';
  MyWorksheet.WriteUTF8Text(0, 1, S);

  MyWorkbook.WriteToFile('c:\temp\test' + STR_EXCEL_EXTENSION, sfExcel8, true);
  MyWorkbook.WriteToFile('c:\temp\test' + STR_OOXML_EXCEL_EXTENSION, sfOOXML);
  MyWorkbook.WriteToFile('c:\temp\test' + STR_OPENDOCUMENT_CALC_EXTENSION, sfOpenDocument);
  MyWorkbook.Free;
  Close;
end;

The "Me & you < and >" is correctly transferred to an .ods but i get problems with a .xlsx.
(LibreOffice doesn't show the .xlsx file at all and at this moment i don't have Excel handy)

I could translate the &, < and > myself with something like this:
Code: [Select]
S := StringReplace(s, '&', '&amp;', [rfReplaceAll]);
Then the .ods stays ok and the .xlsx is ok too but the .xls has &amp; as text.
« Last Edit: July 08, 2014, 10:06:10 pm by rvk »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12720
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #14 on: July 08, 2014, 09:44:30 pm »
Maybe a stupid question: which kind of HTML entities do you mean?

Characters like  & and > in cell texts

Forum: http://www.nldelphi.com/showthread.php?40666-fpspreadsheet-met-75000-regels/page2

which quotes

The only illegal characters are &, < and > (as well as " or ' in attributes).
They're escaped using XML entities, in this case you want &amp; for &.

from

http://stackoverflow.com/questions/730133/invalid-characters-in-xml

 

TinyPortal © 2005-2018