Recent

Author Topic: XLSX document created by FPSpreadsheet not readable by LibreOffice  (Read 49212 times)

wp

  • Hero Member
  • *****
  • Posts: 13398
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #60 on: July 10, 2014, 03:18:25 pm »
Quote
i would be more comfortable if someone took a critical look at my changes before implementing them
No problem, I'll have a look in the evening when I have more time and then upload it to svn.

I think I'll remove the LineEndings completely. Having it as an empty string in one unit is too confusing.

Quote
I'm not sure we can just take Length(S) to write to the buffer
I can fully understand your confusion. Strings have become the most complicated part of the once so simple language Pascal...

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12696
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #61 on: July 10, 2014, 03:35:19 pm »
Quote
I'm not sure we can just take Length(S) to write to the buffer
I can fully understand your confusion. Strings have become the most complicated part of the once so simple language Pascal...

But also its strength. Anyway if you are not confused yet, have a look at http://www.stack.nl/~marcov/delphistringtypes.txt

rvk

  • Hero Member
  • *****
  • Posts: 6948
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #62 on: July 10, 2014, 04:05:24 pm »
No problem, I'll have a look in the evening when I have more time and then upload it to svn.
I used rev.3304 for this.
Attached my last one with TStream in the header instead of TMemoryStream (in preparation of an TFileStream or another TStream variant)
Also in the zip an extra .diff and a screenshot of the differences between my version and the rev.3304 one.
(.diff made with ExamDiff. I'm not sure yet how to make a "proper" patch-file with svn ;D).


To be done (by you at some point):
  • removing all LineEnding because they are not necessary in XML
  • removing all extra trailing and leading spaces for the same reason and to get clean small xml's
  • some rudimentary testing
  • implementing something similar for fpsopendocument.pas

At a later stage we could look at the possibility to use TFileStream but we need to find one with some better buffering mechanism because the standard one is writing everything to file piece by piece and slow.

edit: btw. All the other strings (like FContentTypes, FRelsRels, FWorkbookRelsString, FWorkbookString and FStylesString) could of course also be converted to TStream. In the header these TStreams are already defined (like FSContentTypes... etc... while they are only used in WriteToStream). So why not put them to use at the beginning so there is no need for the strings anymore? They don't have much impact speed-wise but it would be more consistent in the code.
« Last Edit: July 10, 2014, 05:25:57 pm by rvk »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12696
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #63 on: July 10, 2014, 05:42:15 pm »
fyi: Attached is the unit I use at work to get Lazarus code running under Delphi. (XE/XE3)

rvk

  • Hero Member
  • *****
  • Posts: 6948
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #64 on: July 10, 2014, 05:53:10 pm »
edit: btw. All the other strings (like FContentTypes, FRelsRels, FWorkbookRelsString, FWorkbookString and FStylesString) could of course also be converted to TStream. In the header these TStreams are already defined (like FSContentTypes... etc... while they are only used in WriteToStream). So why not put them to use at the beginning so there is no need for the strings anymore? They don't have much impact speed-wise but it would be more consistent in the code.
I just changed all FContentType etc. strings to TMemoryStream's. (attached)

Like i expected it doesn't change anything time-wise but maybe you'll find the code more consistent.
(also used a AppendToStream like discussed but it's still in the xlsxooxml.pas so may need to be moved to the string-unit)

I'll leave the rest up to you... like cleaning up the LineEnding's.

I'm still no expert in Lazarus/FPC (i'm more used to Delphi and its IDE) but i'm getting the hang of it  :D)


Edit: Haha... didn't know FPC also had class helpers.
I changed the AppendToStream to TStream.AppendString(S) like marcov suggested.
In xlsxooxml_withstreamclasshelper.zip
« Last Edit: July 10, 2014, 06:04:29 pm by rvk »

wp

  • Hero Member
  • *****
  • Posts: 13398
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #65 on: July 10, 2014, 06:08:43 pm »
I did not see all these last postings while working with the ooxml, sorry, I just uploaded my version based on rvk's a few postings above and with the following modifications:
  • Use AppendToStream to compose the xml strings in a stream
  • Maybe a little unusual, but I indented the strings to be written in the second line of the AppendToStream in order see their hierachy level.
  • Provide a variable for the stream class. This prepares for later switching to a file stream or for going to Marco's idea with the blocks. All (well, most) streams are created and destroyed in a single method.
  • Line endings removed
  • Leading spaces removed.
  • Files can be opened by Excel2007 and LibreOffice (don't have OpenOffice here)
  • Speed seems to be at the same level as mentioned by rvk (although I cannot reproduce his numbersexactly due to hardware differences, of course).

rvk

  • Hero Member
  • *****
  • Posts: 6948
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #66 on: July 10, 2014, 06:31:46 pm »
It works (rev.3305)
Speed is good now.

10.000x100 cells 5 seconds
20.000x100 cells 10 seconds
30.000x100 cells 15 seconds
40.000x100 cells 20 seconds
So as far as i can see about 5 seconds per 10.000x100 cells.

LibreOffice is a bit slow opening large files (from .xlsx) but it always has been.
Even Excel has some slowdown opening/writing a 40.000x100 cells spreadsheet.

For now i think we can be happy with the current speed-boost  :) :)

(I wonder if you can get the same speed-boost with sfOpenDocument  %))

wp

  • Hero Member
  • *****
  • Posts: 13398
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #67 on: July 10, 2014, 08:36:51 pm »
@Marco, as I said above I did not read many of todays postings carefully enough. Two questions:
  • Do you know an example of a memory stream which works with blocks?
  • What is the advantage of using a class helper except for the elegance of extending an existing class? Is there a speed or memory advantage? The currently used, inline procedure "AppendToStream" is called millions of times, so I don't want to sacrifice anything for elegance.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12696
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #68 on: July 10, 2014, 09:01:08 pm »
wp: don't expect performance improvements. In theory when inlined there shouldn't be a difference at all I guess.

rvk

  • Hero Member
  • *****
  • Posts: 6948
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #69 on: July 10, 2014, 10:04:15 pm »
  • Do you know an example of a memory stream which works with blocks?
I'm still looking for a TMemoryStream which would swap its data to disc if for example it reaches 80% of the available memory (or something).

I did find a TFileStream example where there is a buffer before writing chunks to disc but didn't try to implement it yet.
(TaaWriteBufferFilter but that source is a bit broken and i couldn't find an official source. Maybe here)
I have yet to look at tJCLBufferedStream.

One downside is that you would need to determine in advance if you would like to use memory or disc. I would much rather have a sort of automatic mechanism of backup to disc only when needed in a TMemoryStream descendant which could be disabled/enabled at a given threshold. (It would depend on the performance of a replacement TFileStream i need to find. If that one is almost as fast as TMemoryStream there would be no need to do this)

Also this article also has some nice information about TMemoryStream and its possible OutOfMemory errors. I' not sure if the inner workings are the same for FPC but it's an interesting read.

(I wonder if Windows shouldn't be able to swap some of the TMemoryStream to disc when needed)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12696
  • FPC developer.
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #70 on: July 11, 2014, 09:14:28 am »
In 64-bit you could consider memory mapping. If you can set its affinity to keep it in memory but make it the first to be swapped, that would be a good fit for our problem (if memory enough, it stays in mem). In 32-bit too, but since memory mapping takes address space (even if not mapped by memory)

OTOH how that practically works out (and then specially the multi platform angle) is hard to say, you risk the system paging everything in just to unmap etc.

Basically that last .NET url is the same as my block proposal, only they use 4kb blocksize, and I took 1MB as initial blocksize.  (probably best is to use n*4k on 32-bit and n*8k on 64-bit with n adjustable). I'll see if I can whip up something in the weekend. But that is still all in memory.
« Last Edit: July 11, 2014, 09:35:29 am by marcov »

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #71 on: July 11, 2014, 01:50:58 pm »
1. The fpc zip code only allowed \ when running on Windows, not on Unix, so it wasn't supported cross-platform anyway

True, but if the user delivers backspaces on *nix he does it with a purpose. So following that logic you wouldn't even have to force it.
Sorry, I don't understand what you're saying here. Do you mean backslashes? Both the current code and my patch leave backslashes alone on unix.
Asking windows users to use / to specify paths for files to be zipped is very unexpected and I think most people would see it as a bug, if that was your suggestion. Converting \ to / seems quite sensible.

Quote
2. The standard requires use of / and the info-zip page does not mention that those winzips could not read the standard / but absolutely required the \.

Flogging a standard at a binary a previous employee of your company left you without source is kinda useless.
Building in elaborate backward compatibility mechanisms to match non-standard behaviour that has not even been seen in the wild is not good either. As said, those binaries should not even be able to accept / and there's no evidence of any of that.

IMO, adding an option that allows people to deviate from the standard only for very specific cases is a bit too conservative (and prone to error due to people tweaking that option without clue).
If backward compability is needed, people can take a private copy of the zipper code; see suggested remarks for user changes trunk page in bug report.

Quote
3. With current trunk, if an application adds > <large amount of files> (sorry, can't remember count), currently zip64 format is automatically used. Those old utilities will not support that and error out anyway.

Good point. So that needs a switch to disable too.
No, because falling back to the previous behaviour of silently corrupting the offset pointers in the zip file is not an improvement ;) IIRC, I indicated that when submitting the original patch.

Quote
Currently working on a patch including test cases that does this...
Of course, if you do want to add some override to it... well, you're one of the guys who have commit rights there, so I'm certainly open to suggestions ;)

IIRC I also committed zip64? :-)
Yes, that's why I'm optimistic ;)
Have a look at
http://bugs.freepascal.org/view.php?id=26468
please.

Thanks.
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #72 on: July 11, 2014, 02:53:14 pm »
Have a look at
http://bugs.freepascal.org/view.php?id=26468
please.
Michael Van Canneyt just committed it; thanks!
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

wp

  • Hero Member
  • *****
  • Posts: 13398
Re: XLSX document created by FPSpreadsheet not readable by LibreOffice
« Reply #73 on: July 12, 2014, 11:55:36 pm »
Direct writing of xml strings to a memory stream has now been implemented also for the ODS writer (rev 3311). Again extreme speed improvement: 20.000 x 100 cells write in 7 seconds now, compared to 120 with the old code.

 

TinyPortal © 2005-2018