Author Topic: FPSpreadsheet SVN: reading files .ods  (Read 5772 times)

bonmario

  • Sr. Member
  • ****
  • Posts: 346
FPSpreadsheet SVN: reading files .ods
« on: September 02, 2015, 08:48:51 am »
Hi,
while debugging a program, I noticed something strange in the way .ods files are handled.
For each .ods file, these three files are processed: styles.xml, content.xml, settings.xml.
In each case, these operations are performed:
- The XML file is unzipped and saved to disk
- The XML file is read and loaded into a stream
- The XML file is deleted

This approach can cause a problem if, for example, two programs that use fpspreadsheet access two .ods files in the same directory at the same time.
Also, if a program had to analyze several .ods files, this approach would cause many disk accesses, slowing the program down.

I think this can be fixed by unzipping the three files into a stream, without saving them to disk, as explained here:
http://wiki.lazarus.freepascal.org/paszlib#Unzip_file_to_a_stream
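
For reference, the in-memory approach suggested here can be sketched with FPC's TUnZipper and its OnCreateStream/OnDoneStream events. This is only a minimal sketch, not fpspreadsheet's own code: the class TMemoryExtractor and the program name are hypothetical, and the .ods file name is taken from the command line.

```pascal
program UnzipOdsEntryToMemory;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, Zipper;

type
  { Hypothetical helper: extracts one zip entry into a memory stream. }
  TMemoryExtractor = class
  private
    FContent: TMemoryStream;  // holds a copy of the extracted entry
    procedure DoCreateStream(Sender: TObject; var AStream: TStream;
      AItem: TFullZipFileEntry);
    procedure DoDoneStream(Sender: TObject; var AStream: TStream;
      AItem: TFullZipFileEntry);
  public
    constructor Create;
    destructor Destroy; override;
    procedure Extract(const AZipFile, AEntryName: String);
    property Content: TMemoryStream read FContent;
  end;

constructor TMemoryExtractor.Create;
begin
  inherited Create;
  FContent := TMemoryStream.Create;
end;

destructor TMemoryExtractor.Destroy;
begin
  FContent.Free;
  inherited Destroy;
end;

procedure TMemoryExtractor.DoCreateStream(Sender: TObject; var AStream: TStream;
  AItem: TFullZipFileEntry);
begin
  // Give the unzipper a memory stream so that nothing is written to disk.
  AStream := TMemoryStream.Create;
end;

procedure TMemoryExtractor.DoDoneStream(Sender: TObject; var AStream: TStream;
  AItem: TFullZipFileEntry);
begin
  // Keep a copy of the unzipped data (Count = 0 copies the whole stream),
  // then free the stream that was created in DoCreateStream.
  FContent.Clear;
  FContent.CopyFrom(AStream, 0);
  FContent.Position := 0;
  FreeAndNil(AStream);
end;

procedure TMemoryExtractor.Extract(const AZipFile, AEntryName: String);
var
  Unzipper: TUnZipper;
  Names: TStringList;
begin
  Names := TStringList.Create;
  Unzipper := TUnZipper.Create;
  try
    Names.Add(AEntryName);
    Unzipper.FileName := AZipFile;
    Unzipper.OnCreateStream := @DoCreateStream;
    Unzipper.OnDoneStream := @DoDoneStream;
    Unzipper.UnZipFiles(Names);
  finally
    Unzipper.Free;
    Names.Free;
  end;
end;

var
  Extractor: TMemoryExtractor;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: UnzipOdsEntryToMemory <file.ods>');
    Halt(1);
  end;
  Extractor := TMemoryExtractor.Create;
  try
    Extractor.Extract(ParamStr(1), 'content.xml');
    WriteLn('content.xml: ', Extractor.Content.Size, ' bytes held in memory');
  finally
    Extractor.Free;
  end;
end.
```

Note that the unzipped XML lives entirely in RAM here, so memory use grows with the size of the spreadsheet.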


Thanks in advance, Mario

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: FPSpreadsheet SVN: reading files .ods
« Reply #1 on: September 02, 2015, 11:49:59 am »
I think this can be fixed by unzipping the three files into a stream, without saving them to disk, as explained here:
http://wiki.lazarus.freepascal.org/paszlib#Unzip_file_to_a_stream
Putting them into memory streams would mean duplication of memory usage of the program while reading. There are guys here in this forum who want to read really HUGE spreadsheet files; for them, this strategy is not a good idea.

But what you request is already contained in fpspreadsheet: Instead of using "ReadFromFile" you should call "ReadFromStream" which unzips directly to memory streams:

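Code:
// Read-only access is sufficient for loading a workbook
stream := TFileStream.Create(ASpreadsheetFileName, fmOpenRead or fmShareDenyNone);
try
  // Reading from a stream unzips to memory streams, not to temp files
  myworkbook.ReadFromStream(stream, sfOpenDocument);
finally
  stream.Free;
end;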

In order to avoid the ambiguity issue that you mention I am using a "GetUniqueTempDir" function now (--> r4310) which creates a uniquely named subfolder of the system's temp dir. Unzipped files are stored in this folder. Now it is possible to run multiple instances in parallel because every instance unzips to its own directory.
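
The idea of a uniquely named temp subfolder can be sketched as follows. This is not the actual "GetUniqueTempDir" from r4310, whose implementation is not shown in this thread; the function name MakeUniqueTempDir, the 'fps_' prefix and the GUID-based naming are assumptions made for illustration only.

```pascal
program UniqueTempDirDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: creates a new, uniquely named subfolder of the
  system's temp directory and returns its full path. }
function MakeUniqueTempDir(const APrefix: String): String;
var
  Guid: TGUID;
begin
  // CreateGUID returns 0 on success.
  if CreateGUID(Guid) <> 0 then
    raise Exception.Create('Could not create a GUID');
  // A GUID in the folder name keeps parallel instances from colliding.
  Result := IncludeTrailingPathDelimiter(GetTempDir(False))
    + APrefix + GUIDToString(Guid);
  if not ForceDirectories(Result) then
    raise Exception.CreateFmt('Could not create directory "%s"', [Result]);
end;

var
  Dir: String;
begin
  Dir := MakeUniqueTempDir('fps_');
  WriteLn('Unzipping would go to: ', Dir);
  // Clean up the (still empty) folder when done.
  RemoveDir(Dir);
end.
```

Each call yields a different folder, so two instances reading .ods files at the same time never write to the same styles.xml, content.xml or settings.xml.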

bonmario

  • Sr. Member
  • ****
  • Posts: 346
Re: FPSpreadsheet SVN: reading files .ods
« Reply #2 on: September 02, 2015, 02:03:34 pm »
In order to avoid the ambiguity issue that you mention I am using a "GetUniqueTempDir" function now (--> r4310) which creates a uniquely named subfolder of the system's temp dir. Unzipped files are stored in this folder. Now it is possible to run multiple instances in parallel because every instance unzips to its own directory.

OK, thanks, Mario

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: FPSpreadsheet SVN: reading files .ods
« Reply #3 on: September 07, 2015, 05:44:15 pm »
In order to avoid the ambiguity issue that you mention I am using a "GetUniqueTempDir" function now (--> r4310) which creates a uniquely named subfolder of the system's temp dir. Unzipped files are stored in this folder. Now it is possible to run multiple instances in parallel because every instance unzips to its own directory.

Temporary files? Why are they needed? Are you absolutely sure this is necessary? I think FPC can handle zip files entirely in memory.

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: FPSpreadsheet SVN: reading files .ods
« Reply #4 on: September 07, 2015, 06:00:22 pm »
Temporary files? Why are they needed? Are you absolutely sure this is necessary? I think FPC can handle zip files entirely in memory.
Read my response above: "Putting them into memory streams would mean duplication of memory usage of the file while reading. There are guys here in this forum who want to read really HUGE spreadsheet files; for them, this strategy is not a good idea."

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: FPSpreadsheet SVN: reading files .ods
« Reply #5 on: September 08, 2015, 01:31:50 pm »
Read my response above: "Putting them into memory streams would mean duplication of memory usage of the file while reading. There are guys here in this forum who want to read really HUGE spreadsheet files; for them, this strategy is not a good idea."

Maybe in this case a switch could be provided to control the behavior. Creating files might be undesirable in many cases. On Android phones, for example, it's an additional permission required, and space might be scarce.

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: FPSpreadsheet SVN: reading files .ods
« Reply #6 on: September 08, 2015, 02:13:35 pm »
Files are created in the system's temp folder by calling osutils.GetTempDir. Is this an issue as well?

There is some kind of switch already, a bit hidden, though: If the file is read by means of "Workbook.ReadFromStream" via a TFileStream then all the unzipping is done to memory streams instead of temp files. I have to admit, though, that this is not very straightforward because everybody would call the direct "Workbook.ReadFromFile". I'll think about it...

felipemdc

  • Administrator
  • Hero Member
  • *
  • Posts: 3538
Re: FPSpreadsheet SVN: reading files .ods
« Reply #7 on: September 08, 2015, 04:50:29 pm »
I think it is a potential problem in some situations, and it should be easy to fix. Hidden solutions are often problematic because they might stop working in the future, as it's not clear that the code is supposed to behave like that. It should be easy to make this configurable:

Either add a default valued parameter to ReadFromFile like AAllowCreatingTempFiles: Boolean = True
or
Add a new function ReadFromFile_NoTempFiles

wp

  • Hero Member
  • *****
  • Posts: 11916
Re: FPSpreadsheet SVN: reading files .ods
« Reply #8 on: September 11, 2015, 07:27:06 pm »
OK, I switched the readers/writers to use memory streams by default now; no more temporary files. If memory runs out, the new workbook option "boFileStream" can be set, which instructs the readers/writers to access data files directly by means of file streams and to create temporary files if required. This is an extension of the other workbook option "boBufStream", which uses a "buffered stream": this is essentially a memory stream, but it swaps content to disk if the buffer becomes too small.
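
Setting the option could look roughly like this. It is a sketch under assumptions: the unit names (fpstypes, fpspreadsheet, fpsopendocument) are those of current fpspreadsheet versions and may differ in older revisions, and the file name is taken from the command line.

```pascal
program ReadOdsWithFileStreams;

{$mode objfpc}{$H+}

uses
  SysUtils, fpstypes, fpspreadsheet, fpsopendocument;

var
  Workbook: TsWorkbook;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: ReadOdsWithFileStreams <file.ods>');
    Halt(1);
  end;
  Workbook := TsWorkbook.Create;
  try
    // Opt in to file streams / temporary files for very large workbooks.
    // Without this option the reader keeps the unzipped XML in memory.
    Workbook.Options := Workbook.Options + [boFileStream];
    Workbook.ReadFromFile(ParamStr(1), sfOpenDocument);
    WriteLn('Worksheets read: ', Workbook.GetWorksheetCount);
  finally
    Workbook.Free;
  end;
end.
```

Replacing boFileStream with boBufStream in the same line would select the buffered stream instead.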
