Recent

Author Topic: Using fpspreadsheet to write huge xlsx files  (Read 39963 times)

vfclists

  • Hero Member
  • *****
  • Posts: 1165
    • HowTos Considered Harmful?
Re: Using fpspreadsheet to write huge xlsx files
« Reply #15 on: July 12, 2014, 12:31:52 am »

Does FPSpreadsheet have its own built-in expression evaluator, or does it just enter the values and  formulae for the calculations to be done by Excel, LibreOffice etc?

I asked a related question http://forum.lazarus.freepascal.org/index.php/topic,25154.msg152320 and wonder if it has an expression evaluator that can be used independently.

Lazarus 3.0/FPC 3.2.2

wp

  • Hero Member
  • *****
  • Posts: 13433
Re: Using fpspreadsheet to write huge xlsx files
« Reply #16 on: July 12, 2014, 12:56:18 am »
No, there is no expression evaluator in fpspreadsheet. Not sure yet if I'll dare to write one...

What do you want it for? Write your own spreadsheets with formulas? Then you could compose a formula as an RPN formula - don't worry, it is easier as it sounds... The basics are explained in the wiki (http://wiki.lazarus.freepascal.org/FPSpreadsheet#RPN_Formulas). This is working for writing and reading, the rpn formulas can even be calculated. Advantage: available only for the old xls files.

I answered your other posting with a link to a tutorial for using "fpexprpars". This can be used independently and is very flexible. There is also TFPMathExpressionBridge which provides a wrapper around "fpexprpars". Or you could use "symbolic" which comes also along with fpc. Just look for this keyword here in the forum, here have been some related discussions.

vfclists

  • Hero Member
  • *****
  • Posts: 1165
    • HowTos Considered Harmful?
Re: Using fpspreadsheet to write huge xlsx files
« Reply #17 on: July 12, 2014, 10:12:56 am »
I was hoping for something well used by Lazarus developers.

I have decided on ArtFormula for the time being which the author announced here http://forum.lazarus.freepascal.org/index.php?topic=18003.0
Lazarus 3.0/FPC 3.2.2

wp

  • Hero Member
  • *****
  • Posts: 13433
Re: Using fpspreadsheet to write huge xlsx files
« Reply #18 on: July 14, 2014, 05:19:20 pm »
After giving up the idea with the style index formatting of virtual cells becomes very easy: Just format some cells in the worksheet as a template as needed in virtual mode. In the example provided with fpspreadsheet ("other/test_virtualmode"), for example, all cells in the first row have a bold type face and a gray background; this is achieved by

Code: [Select]
var
  templateCell: PCell;
 ...
  worksheet.WriteFontStyle(0, 0, [fssBold]);
  worksheet.WriteBackgroundColor(0, 0, scGray);
  templateCell := worksheet.FindCell(0, 0);  // you can also call GetCell here

The event OnNeedData now has an additional parameter "AStyleCell" in which you can specify the format of the currently saved cell:

Code: [Select]
procedure TSomething.NeedCellData(Sender: TObject; ARow,ACol: Cardinal;
  var AValue: variant; var AStyleCell: PCell);
begin
  if ARow = 0 then begin
    AStyleCell := templateCell;
  ....

It is important that the style cell belongs to the workbook being saved because the writer collects all styles contained in the workbook before writing. So, please don't create style cells on the fly within the NeedCellData event handler!

The price to be paid for this approach is a small loss of speed because the style cell has to be searched within the list of all formatting styles in order to get the style index. But if you don't have too many of the style cells I'd guess that the difference won't be noticed at all.

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: Using fpspreadsheet to write huge xlsx files
« Reply #19 on: July 14, 2014, 05:52:03 pm »
What if i wanted the last row (summary or something) in a different color (say light-yellow) which is not used in the above cells?
Can i use FindCell for rows that aren't yet filled in? (because i need to do this before the NeedCellData)

And what if i want a light-red background for all negative numbers?
I don't know yet what cell-coordinate these are so i can't define a template-cell beforehand.
(And i can hardly fill in the first row with these templates)

Isn't there a possibility to define a "virtualcell" (not used in the sheet or in the cells above) with formatting which you can use during filling the cells?

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1269
Re: Using fpspreadsheet to write huge xlsx files
« Reply #20 on: July 14, 2014, 06:03:40 pm »
Would the following work?  Create a second worksheet.  Set up your template cells in that spreadsheet.  Then use NeedCellData to populate the first worksheet?
And maybe delete the second worksheet after all the NeedCellData stuff is finished?   Dunno, just an idea...
Lazarus Trunk/FPC latest fixes on Windows 11
  I'm getting old and stale.  Slowly getting used to git, I'll get there...

wp

  • Hero Member
  • *****
  • Posts: 13433
Re: Using fpspreadsheet to write huge xlsx files
« Reply #21 on: July 14, 2014, 06:11:28 pm »
To which cells the template cells are applied is determined in the OnNeedCellData handler. So, if you want the last row in yellow you say

Code: [Select]
var
  yellowCell: PCell;

  // you can write the yellow style anywhere in the worksheet, say in A2
  worksheet.WriteBackground(1, 0, scYellow);
  yellowCell := worksheet.FindCell(1, 0);
...

procedure TSomething.NeedCellDataHandler(Sender: TObject; ARow, ACol: Cardinal;
  var AValue: variant; AStyle: PCell);
begin
  if ARow = workbook.VirtualRowCount-1 then
    AStyle := yellowCell;
  ...

Quote
red background for all negative numbers?
You define a cell with red background and in OnNeedCellData you look at the number value retrieved from the database (as an example); if it is negative you use the cell with the red background as a style cell, otherwise you can use another one, or nil for default formatting.

Quote
possibility to define a "virtualcell" (not used in the sheet or in the cells above)
No. Because when the cell is written you have to specify the style index, this is the index of the style in the list of known formats. This list is built by the writer before beginning to write the cells and stores it to file. As we have discussed before it is not possible to modify this list later in case of BIFF format (except you pay me for excessive debugging hours...)

Quote
Set up your template cells in that spreadsheet.  Then use NeedCellData to populate the first worksheet?
I think that this should work because the writer collects the styles from all sheets of the workbook. Just like in xlsx file format: there is a styles.xml which is valid for all sheets.

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: Using fpspreadsheet to write huge xlsx files
« Reply #22 on: July 14, 2014, 06:56:34 pm »
Would the following work?  Create a second worksheet.  Set up your template cells in that spreadsheet.  Then use NeedCellData to populate the first worksheet?
And maybe delete the second worksheet after all the NeedCellData stuff is finished?   Dunno, just an idea...
Haha... I just figured it out...
It's a bit counter-intuitive but if you use WriteFontStyle for cell 0,0 etc.... you create the template cells.
After that you're using NeedCellData to actually populate these cell's and the data (and styles in it) will be deleted and replaced with the actually data and link to the style-index.
(It wasn't clear to me you used the sheet at first for creating the templates after which these will be destroyed and replaced.)

I'm not saying the building of the templates should be done during population (in NeedCellData) but i still think the building of the styles could be done virtually (without using the worksheet) but that would mean rewriting the gathering of the styles (not from the sheet but from memory) and i'm not sure if that would mess up the other formats.

But the way you did it now works good too. (It's just some getting used to using the first cells of the sheet when these will be thrown away)

There could even be a small object written which does all this work for you (without you needing to mess with the sheet yourself).
It could use the first row of the sheet and populating it with the styles you want.
It could use template-names for easy access.

Code: [Select]
MyTemplates := TTemplateCells.Create;
MyTemplates.WriteFontStyle('just_bold',  [fssBold]);
MyTemplates.WriteFontStyle('bold_and_gray',  [fssBold]);
MyTemplates.WriteBackground('bold_and_gray',  scGray);
MyTemplates.WriteFontStyle('negative',  [fssBold]);
MyTemplates.WriteBackground('negative',  scRed);
MyTemplates.PopulateTemplateCells; // <- this will populate an internal PCell structure
and in NeedCellData
Code: [Select]
...
AData := myValue
if ARow = 0 then AStyle := MyTemplates.GetTemplate('just_bold');
if myvalue < 0 then  AStyle := MyTemplates.GetTemplate('negative');
...
Internally in that object the WriteFontStyle would look up 'just_bold' in it's own index (TStringList) to see which row/column and set the style in the worksheet there.

Just a thought for someone (maybe me) to create later, because that would stand apart from fpspreadsheet itself.
(It would just be a small helper-object)
« Last Edit: July 14, 2014, 06:59:30 pm by rvk »

wp

  • Hero Member
  • *****
  • Posts: 13433
Re: Using fpspreadsheet to write huge xlsx files
« Reply #23 on: July 14, 2014, 08:52:26 pm »
Well - the template cells are not destroyed, they are regular cells of the worksheet. They just lend their style to the virtual cells written. Maybe it is clearer when I post some code. This is some simplified code of the WriteWorksheet method of TsSpreadOOXMLWriter:

Code: [Select]
procedure TsSpreadOOXMLWriter.WriteWorksheet(CurSheet: TsWorksheet);
var
  r, c: Cardinal;  // row and column index
  value: Variant;  // value to be used in the virtual cell
  styleCell: PCell;  // PHYSICAL cell to provide the format, it is part of the workbook, its position and value are ignored
  lCell: TCell;      // VIRTUAL cell being written - it does not belong to the workbook
begin
  WriteXMLHeaderForWorksheet;
 
  if (woVirtualMode in Workbook.WritingOptions) and Assigned(Workbook.OnNeedCellData)
  then begin
    for r := 0 to Workbook.VirtualRowCount-1 do begin
      for c := 0 to Workbook.VirtualColCount-1 do begin
        // Reset the virtual cell to default values
        FillChar(lCell, SizeOf(lCell), 0); 

        // Provide default values for return of event handler
        value := varNull;
        styleCell := nil;

        // call event handler; this overwrites the defaults of the previous step
        Workbook.OnNeedCellData(Workbook, r, c, value, styleCell);

        // if the event handler returned a style cell the style cell is copied over to the virtual cell
        if styleCell <> nil then
          lCell := styleCell^;

        // We set row and column index of the virtual cell such that it appears in the file where we want it to be.
        lCell.Row := r;
        lCell.Col := c;

        // Then we analyze the value provided by the event handler and, depending on its type,
        // we put the value into the correct content field of the virtual cell.
        if VarIsNull(value) then
          lCell.ContentType := cctEmpty
        else
        if VarIsNumeric(value) then begin
          lCell.ContentType := cctNumber;
          lCell.NumberValue := value;
        end else
          // etc
 
        // Now we have everything we need to write the virtual cell to the stream. Depending on the
        // ContentType of the cell WriteCallBack calls the writing code for Number, Text etc. cells.
        // Note that this does not create cells in the worksheet.
        WriteCellCallback(@lCell, FSSheets[FCurSheetNum]);
      end;
    end;
  end else
    // Write regular ("physical") cells
    ...

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: Using fpspreadsheet to write huge xlsx files
« Reply #24 on: July 14, 2014, 09:19:32 pm »
Well - the template cells are not destroyed, they are regular cells of the worksheet. They just lend their style to the virtual cells written.
Yeah... it was already clear to me you used the original structure of the worksheet (the same as you use without the woVirtualMode) for the styling cells. And the wording "destroyed" was wrong of me... (I already dove into the source) It's just... when using the woVirtualMode you shouldn't consider the (cells in the) worksheet as actually (cells in the) worksheet anymore but as a style container. So... that takes some getting used to  :D

wp

  • Hero Member
  • *****
  • Posts: 13433
Re: Using fpspreadsheet to write huge xlsx files
« Reply #25 on: July 14, 2014, 11:48:27 pm »
VirtualMode now has found its way into the fpspreadsheet wiki.

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: Using fpspreadsheet to write huge xlsx files
« Reply #26 on: July 15, 2014, 09:57:50 am »
VirtualMode now has found its way into the fpspreadsheet wiki.
I take it woVirtualMode hasn't found it's way in fpsopendocument.pas yet...
(But you've already been extremely busy the last few days ;))

I did notice a blowup in time using woSaveMemory with the sfExcel8 format.
I haven't gotten profiler to work yet so i don't know where the reason lies but will investigate.
Or it is because writing small-pieces every time is very slow.
In that case sfExcel8 doesn't lend itself so good for using woSaveMemory because without it, it is lightning fast.

(Maybe we still need a TFileStream with a buffering mechanisme)

Code: [Select]
Running: Building TsWorkbook and Writing 3 formats
  All strings with Windows 7 pro and SSD drive
----------                                       .xlsx   .ods   .xls
Rows x Cols WritingOptions                Build  Write  Write  Write
Rows x Cols WritingOptions                Build  Write  Write  Write
10.000x100  [                           ]   0,7    5,3    2,9    1,1
10.000x100  [woVirtualMode              ]   0,0    6,4    0,0    1,3
10.000x100  [               woSaveMemory]   0,7   12,3    8,1   30,1
10.000x100  [woVirtualMode, woSaveMemory]   0,0   13,7    0,0   31,0
---------
40.000x100  [                           ]   3,0   21,4   12,1    4,1
40.000x100  [woVirtualMode              ]   0,0   25,0    0,0    5,4
40.000x100  [               woSaveMemory]   3,0   48,7   30,6  125,5
40.000x100  [woVirtualMode, woSaveMemory]   0,0   53,4    0,0  128,3
----------
Ready

Keep up the great work.  O:-)


Edit: Yeah... in WriteLabel (sfExcel8) you do about 7 (+ whatever is done in WriteXFIndex) separate writes to the tfilestream.
When building an internal small buffer in that procedure and writing that to stream you could cut down the time from 30 seconds to 4 (for 10.000) and from 125 down to 20 (for 40.000 records). (and that's just in WriteLabel. I didn't check the other procedures.) Besides buffering in your procedure itself, a buffering tfilestream would solve all this too. I haven't found one yet, though.
« Last Edit: July 15, 2014, 10:58:42 am by rvk »

wp

  • Hero Member
  • *****
  • Posts: 13433
Re: Using fpspreadsheet to write huge xlsx files
« Reply #27 on: July 15, 2014, 04:07:35 pm »
Quote
I take it woVirtualMode hasn't found it's way in fpsopendocument.pas yet
Correct. Still scared of the empty cell handling of ods...

Quote
in WriteLabel (sfExcel8) you do about 7 separate writes to the tfilestream.
Exact. And all biff units are full of "Stream.WriteByte", "WriteWord" etc... It would help a lot if all biff records were written at once instead of writing their parts separately.

But I am not sure if this is really the main issue. Windows supports buffered writing, i.e. every call to "Stream.WriteByte" first goes into a buffer anyway without me doing anything.

My feeling is that due to the size limitations of biff (max 65536 rows x 256 columns) there is less urgent need for the woSaveMemory option here. Therefore I'll postpone this issue until I'm ready for a lot of typing again...

Quote
buffering tfilestream
Don't put too much effort into this. Unless included with fpc I am hesitant to add dependencies on other external libraries to the fpspreadsheet packages. It always a pain for the user being forced to install yet another package which may lead to instabilities and issues.

wp

  • Hero Member
  • *****
  • Posts: 13433
Re: Using fpspreadsheet to write huge xlsx files
« Reply #28 on: July 15, 2014, 05:03:20 pm »
Couldn't resist - and I've got to correct myself: there is a factor of 4 speed increase of the virtualmode demo for biff8 when complete records of number and label cells are written instead of single bytes and words... Is contained in rev 3321 (number cells for all biff, label cells for biff8 only).

rvk

  • Hero Member
  • *****
  • Posts: 6953
Re: Using fpspreadsheet to write huge xlsx files
« Reply #29 on: July 15, 2014, 05:04:27 pm »
But I am not sure if this is really the main issue. Windows supports buffered writing, i.e. every call to "Stream.WriteByte" first goes into a buffer anyway without me doing anything.
I don't think Windows (or Linux) is buffering these writes.
I'm not sure why not, but i did some testing and when using an internal buffer (if only in the WriteLabel-function) the time get reduced immensely.
There are a lot of discussions about it on the net (in combination with TFileStream).

I tried one myself, TaaWriteBufferFilter (which was hard to find), with a small test-loop (10.000 times WriteBuffer(V,1)) which reduced the writing of a file from 36,1 seconds to 0,1 second.

And if you disable the WriteBytes in xlsbuff8.pas in WriteLabel and leave the last WriteBuffer standing (just for testing), you'll see these are the reason for the slowdown.

Quote
My feeling is that due to the size limitations of biff (max 65536 rows x 256 columns) there is less urgent need for the woSaveMemory option here. Therefore I'll postpone this issue until I'm ready for a lot of typing again...
Completely understandable (and no problem)...

Quote
Don't put too much effort into this. Unless included with fpc I am hesitant to add dependencies on other external libraries to the fpspreadsheet packages. It always a pain for the user being forced to install yet another package which may lead to instabilities and issues.
The TFileStreamBuffered (or TaaWriteBufferFilter) could, of course, be a separate unit in fpsreadsheet itself (like fpszipper.pp. It would only be a small one of about 180 lines max. and fpszipper.pp is way bigger). No need for external dependencies. It could even be transferred to fpc at one point. I was surprised there was no standard buffering in TFileStream by default. (I'm still looking)

But you're correct... that's for later.
(I'll let you know if and when i have a working sample)

B.T.W. I noticed that fpsopendocument.pas implemented the writing itself directly in .WriteToFile but for xlsxooxml.pas it is in .WriteToStream which is called from .WriteToFile.
In fpsopendocument.pas the .WriteToStream is unsupported yet. I'm assuming this is still on your (probably very big) todo-list ;)

 

TinyPortal © 2005-2018