Recent

Author Topic: XLSX document created by FPSpreadsheet not readable by LibreOffice  (Read 49600 times)

wp

  • Hero Member
  • *****
  • Posts: 13430
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #30 on: July 09, 2014, 01:00:37 pm »
GetCellCountInRow = 0 seems to check if there is any cell in a given row. Therefore I replaced this by "GetFirstCellOfRow" which runs along the row and stops seeking when the first cell is found - this should be faster, but have the same effect. In addition, this method called "GetLastColIndex" which again iterates through all cells. Therefore I am caching its result in FLastColIndex which is updated whenever a new cell is added or when GetLastColIndex is called with the parameter AForceCalculation = true.

I did some tests, but did not see that increase which you mention.

Could you try it yourself? I am attaching the modified fpspreadsheet.pas and fpsopendocument.pas.

wp

  • Hero Member
  • *****
  • Posts: 13430
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #31 on: July 09, 2014, 01:54:56 pm »
I should publish sources only after full testing... Fixed bugs causing incorrect value of cached last column/row index. The xlsxooxml test runs a bit faster afterwards, but maybe that's just wishful thinking. At least, the unit test does not show any errors now.

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #32 on: July 09, 2014, 02:23:24 pm »
I did some tests, but did not see that increase which you mention.
...
Could you try it yourself? I am attaching the modified fpspreadsheet.pas and fpsopendocument.pas.

With the original revision 3301 i only did these two changes i mentioned:
Code: [Select]
2785. if false and (ASheet.GetCellCountInRow(r) = 0) then begin
2839. // Result := Result + FCellContent;

I get the following results (all columns are x100):
5.000 took 0 seconds (well rounded to 0)
10.000 took 1 seconds
20.000 took 2 seconds

With your changes i get (so both Result := Result + and GetFirstCellOfRow stay intact):
5.000 took 10 seconds
10.000 took 40 seconds
20.000 took 160 seconds

So i stripped out the line 2841 again (the Result := Result + ).
Then i got:
5.000 took 3 seconds
10.000 took 14 seconds
20.000 took 59 seconds

So again i stripped out the GetFirstCellOfRow in line 2785:
Code: [Select]
if false and (ASheet.GetFirstCellOfRow(r) = nil) then begin5.000 took 0 seconds
10.000 took 1 seconds
20.000 took 2 seconds

You don't have these kind of delays at all??? (with sample code below)
I understand (if you don't have these kind of delays) debugging them is an almost impossible task.

If this is something specific to my computers you wouldn't even know where to begin.
I'll try to figure out where in those function the trouble is for me but i'm guessing somewhere in the TAVLTreeNode.


Sample code:
Clean form whith one button and one label:
Code: [Select]
uses
  fpspreadsheet, LclIntf;

procedure TForm1.Button1Click(Sender: TObject);
var
  MyWorkbook: TsWorkbook;
  MyWorksheet: TsWorksheet;
  i, j: integer;
  Tm: DWORD;
begin
  MyWorkbook := TsWorkbook.Create;
  MyWorksheet := MyWorkbook.AddWorksheet('My Worksheet');
  for i := 0 to 20000 do
  begin
    Label1.Caption := IntToStr(i);
    Application.ProcessMessages;
    for j := 0 to 100 do
    begin
      MyWorksheet.WriteUTF8Text(i, j, 'aj' + IntToStr(i) + ' ' + IntToStr(j));
    end;
  end;
  ShowMessage('ready #1');
  Tm := GetTickCount;
  MyWorkbook.WriteToFile('c:\temp\test' + STR_OPENDOCUMENT_CALC_EXTENSION, sfOpenDocument);
  //MyWorkbook.WriteToFile('c:\temp\test' + STR_OOXML_EXCEL_EXTENSION, sfOOXML);
  //MyWorkbook.WriteToFile('c:\temp\test' + STR_EXCEL_EXTENSION, sfExcel8, true);
  ShowMessage('ready #2 ' + inttostr((GetTickCount - Tm) div 1000));
  MyWorkbook.Free;
  Close;
end;
« Last Edit: July 09, 2014, 02:36:28 pm by rvk »

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #33 on: July 09, 2014, 02:30:36 pm »
BTW if even Excel writes forward slashes are there any programs at all which write a backslash?

Seems old winzips did.   http://www.info-zip.org/FAQ.html#backslashes
Looking at http://web.archive.org, that page started to get that backslash info between beginning of 2004, and beginning of 2005. Even then, those winzips were already "old".

Still it would be wise to have a passthrough (write as user specificies), in case sb has to live with an app that really wants it.  But that can be disabled by default.
1. The fpc zip code only allowed \ when running on Windows, not on Unix, so it wasn't supported cross-platform anyway
2. The standard requires use of / and the info-zip page does not mention that those winzips could not read the standard / but absolutely required the \.
3. With current trunk, if an application adds > <large amount of files> (sorry, can't remember count), currently zip64 format is automatically used. Those old utilities will not support that and error out anyway.

Adding an option that allows people to deviate from the standard only for very specific cases seems a bit too conservative (and prone to error due to people tweaking that option without clue).

Edit: updated typos. Thanks rvk
I'd suggest still accepting \ on Windows for reading (as it is currently done) but forcing / on writing on Windows as well, then document the more strict adherence on FPC User changes trunk.

Currently working on a patch including test cases that does this...
Of course, if you do want to add some override to it... well, you're one of the guys who have commit rights there, so I'm certainly open to suggestions ;)
« Last Edit: July 09, 2014, 02:57:51 pm by BigChimp »
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #34 on: July 09, 2014, 02:33:05 pm »
If this is something specific to my computers you wouldn't even know where to begin.
I'll try to figure out where in those function the trouble is for me but i'm guessing somewhere in the TAVLTreeNode.
You're doing such a good job but perhaps you weren't aware of this
http://wiki.lazarus.freepascal.org/FPProfiler
http://wiki.lazarus.freepascal.org/Profiling
in case you didn't, just FYI...

Groeten,
BigChimp
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #35 on: July 09, 2014, 02:49:42 pm »
I'd suggest still accepting / on Windows for reading (as it is currently done) but forcing \ on writing on Windows as well, then document the more strict adherence on FPC User changes trunk.
I take it you mean the other way around? Accepting \ (backslash) for reading and forcing / (forward slash) for writing on Windows because the zip-standard nowadays is / for zips (for all platforms).
(otherwise we would still get in trouble with LibreOffice which only accepts / (forward slash) on all platforms)

in case you didn't, just FYI...
I wasn't aware of profiling options in Lazarus (I'm mainly a "Delphi"-guy  %)) but i'll definitely take a look.

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #36 on: July 09, 2014, 02:54:11 pm »
(otherwise we would still get in trouble with LibreOffice which only accepts / (forward slash) on all platforms)

That is something that needs to be reported on LibreOffice and no forcing linux path delimeters on windows users is not acceptable just because of libreoffice.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #37 on: July 09, 2014, 02:58:23 pm »
I'd suggest still accepting / on Windows for reading (as it is currently done) but forcing \ on writing on Windows as well, then document the more strict adherence on FPC User changes trunk.
I take it you mean the other way around? Accepting \ (backslash) for reading and forcing / (forward slash) for writing on Windows because the zip-standard nowadays is / for zips (for all platforms).
(otherwise we would still get in trouble with LibreOffice which only accepts / (forward slash) on all platforms)
Yes, thanks, sorry for the typo, updated the post.
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #38 on: July 09, 2014, 03:00:01 pm »
That is something that needs to be reported on LibreOffice and no forcing linux path delimeters on windows users is not acceptable just because of libreoffice.
I think you're missing the fact that we're talking about how paths are stored inside ZIP files (xlsx and ods are zip files containing defined data files). These require / as path delimiter; see rvk's post linking to the standard (appnote.txt).

fpspreadsheet has already been changed to use / instead of \ so LibreOffice correctly opens fpspreadsheet-generated xlsx files - see wp's post on that in this thread.

Edit:
However, if you want to report the bug that LibreOffice won't open an invalid xlsx file (because it has \ instead of /) while Excel will open it (see first posts in this thread)... well, please feel free.
« Last Edit: July 09, 2014, 03:02:40 pm by BigChimp »
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #39 on: July 09, 2014, 03:00:46 pm »
That is something that needs to be reported on LibreOffice and no forcing linux path delimeters on windows users is not acceptable just because of libreoffice.
The standard of path-delimiters within zip-files is / nowadays regardless of OS.
So forcing / in zipper.pp would not be "forcing linux path delimiters on Windows users" but it would be enforcing the standard of zip-format for Windows users in zips created by zipper.
« Last Edit: July 09, 2014, 03:02:46 pm by rvk »

wp

  • Hero Member
  • *****
  • Posts: 13430
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #40 on: July 09, 2014, 04:50:32 pm »
Coming back to the speed measurement:

rvk, I am using your test program now. I'm at a slow computer at the moment, therefore I do only 5000 rows. Here are my results:

* with "GetFirstCellOfRow" (line 2785), with "Result := ..." (line 2841) --> 25 sec
* "GetFirstCellOfRow" deactivated by "and" with "false", with "Result := ..." --> 25 sec
* with "GetFirstCellOfRow", "Result := .." commented out --> 1 sec
* "GetFirstCellOfRow" deactivated, "Rest := ..." commented out --> 1 sec

In an additional test, I just exit the WriteCellCallback immediately after entering it, such that the many string concatenations mentioned above are not executed --> 0sec

So: it is clear to me: the reason for the slow writing performance is in the many string operations of extremely long strings.

Give me some time until I have a preliminary version of xlsxooxml. I guess we will be very surprised.

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #41 on: July 09, 2014, 05:30:33 pm »
So: it is clear to me: the reason for the slow writing performance is in the many string operations of extremely long strings.
Ok... so you're not seeing the same delays as me. No problems.
(If they continue after the changes i'll investigate further. or before if i get some time)

Give me some time until I have a preliminary version of xlsxooxml. I guess we will be very surprised.
Can't wait... but take your time... good programming needs to be done with care :D

Then the only problem left (my problem) would be the memory hungry TsWorksheet.
I'm wondering if parts of something like that couldn't be swapped to disc if not used.
(I imagine Excel and LibreOffice would do the same)
I'll try to gather some info on that possibility (and work on that idea some more).

If the DOM version is more optimal with memory you could even completely replace the whole TCell-structure with nodes. In that case you would read the properties of a cell from the cell-node. Default values would not take any space. Reading files into it and changing data and writing back would result in exactly the same file (with changes) but unsupported features would stay intact. Don't worry.... I'm daydreaming a little bit  :) That would mean a massive rewrite of the core TsWorksheet and have problems with cross-version (ods/xls/xlsx, etc).

But while daydreaming that... The TCell structure could be transformed to a sort of TList of properties. When you don't need a certain property it won't take any space because it's not defined yet in that cell. If i don't use Border, BorderStyles, BackgroundColor etc it wouldn't take space in TCell. It would mean a TCell could shrink from a minimum of 112 bytes to just a few bytes (if only a few properties are used). (Just spitballing  ;D)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12720
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #42 on: July 09, 2014, 11:12:17 pm »
You don't have these kind of delays at all??? (with sample code below)

Note that the delays are depending on the chance that the memorymanager has to actually reallocate the block if the current runs out of space.

This means that switching memorymanager can have a big effect. (sharemem, cmem or default of FPC 2.6 or 2.7)

 

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12720
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #43 on: July 09, 2014, 11:21:00 pm »
1. The fpc zip code only allowed \ when running on Windows, not on Unix, so it wasn't supported cross-platform anyway

True, but if the user delivers backspaces on *nix he does it with a purpose. So following that logic you wouldn't even have to force it.

Quote
2. The standard requires use of / and the info-zip page does not mention that those winzips could not read the standard / but absolutely required the \.

Flogging a standard at a binary a previous employee of your company left you without source is kinda useless. I

Quote
3. With current trunk, if an application adds > <large amount of files> (sorry, can't remember count), currently zip64 format is automatically used. Those old utilities will not support that and error out anyway.

Good point. So that needs a switch to disable too.

Quote
Adding an option that allows people to deviate from the standard only for very specific cases seems a bit too conservative (and prone to error due to people tweaking that option without clue).

It's a matter of chance. I don't see it so far fetched. Moreover those sleeping booleans are not that bad. Inited to false by default, two extra bytes for the class, a handfull of places to test them, one or two instructions each.

Quote
Currently working on a patch including test cases that does this...
Of course, if you do want to add some override to it... well, you're one of the guys who have commit rights there, so I'm certainly open to suggestions ;)

IIRC I also committed zip64? :-)

wp

  • Hero Member
  • *****
  • Posts: 13430
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #44 on: July 09, 2014, 11:39:04 pm »
Totally confused...

In xlsxooxml.pas I replaced the cell writing string operations by DOM nodes. But it's getting worse! In rvk's speed test the old code takes about 15 sec for saving 10000 rows, the task manager shows a little increase of memory usage. The new code which I am adding in the attachment, however, needs about 35 sec with a clear increase of memory usage.



 

TinyPortal © 2005-2018