Recent

Author Topic: [SOLVED]Some additional functions for ranges - CurrentRegion, getDataArray, etc.  (Read 583 times)

JohnSUN

  • Newbie
  • Posts: 3
  • OpenSource as a way to save the world
Dear colleagues,
the project I'm working on is related to manipulating data ranges in a spreadsheet. Currently, there is no simple way to get the functions that are often used in Excel or in Calc - select the current range (expand the current selection to a rectangular area bounded by empty lines and columns, Ctrl+*), get data from the specified range as an array of array of variant (or VarArray) and return this data after some calculations to the range from the array (without formatting or formulas, only data) and some other amenities.
I tried to write the missing functionality on my own, but I ran into some incomprehensible problems. In the example I show two of them.
This code must implement the "expand selected range" action. The same code with the same data gives different results. It seems to me that the reason is that the Find() function in the ReadAsText() method doesn't always work correctly. (It is possible that the error is hidden in my code, but not in the method code)
I would be grateful for the code review (places requiring attention are marked with comments '!!!')
« Last Edit: June 05, 2019, 04:32:42 am by JohnSUN »
Copying is not theft! So, copy only high-quality things, don't take the garbage, don't multiply rubbish - and the world will be better

wp

  • Hero Member
  • *****
  • Posts: 6462
expand the current selection to a rectangular area bounded by empty lines and columns
What does the "empty" here mean? Is a cell without content but carrying some format (e.g. background color) considered to be "empty" in this sense? Or must the cell be non-existent so that Worksheet.FindCell would find the value nil?
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

JohnSUN

  • Newbie
  • Posts: 3
  • OpenSource as a way to save the world
For the current (far from the final version!), the criterion "does not contain any display text" was choosed. In other words, ReadAsText returns an empty string. For any reason. The absence of a cell, the result of the formula is an empty string, other...
Copying is not theft! So, copy only high-quality things, don't take the garbage, don't multiply rubbish - and the world will be better

wp

  • Hero Member
  • *****
  • Posts: 6462
OK, I am asking because otherwise the check for an empty cell should be reformulated. But that's not you problem, anyway.

Could you specify exactly under what conditions you see that FindCell is not working. Tell me which cell must be selected on which worksheet so that a wrong result is produced. Tell me the result and what you expected.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

wp

  • Hero Member
  • *****
  • Posts: 6462
In TForm1.getCurrentRange, there is a bug due to exchanged parameters: Procedure ExtendSelection is defined with parameters Row, Col, but in the body of GetCurrentRange you call it with opposite order: Col,Row. After this modification, all ranges in the sheets "Connecting ranges" and "Non-adjacetn ranges" are found; the "Ripped ranges" do not work, yet - this probably is an issue in the algorithm which I did not investigate.

In order to display the selected range in the grid you must know that ranges are stored at two places: one is at the worksheet level which is very fundamental and affects when worksheets are stored to Excel/Calc. The other one is at the grid level - the worksheet grid inherits from the standard LCL TCustomGrid which has its own selection mechanism. Therefore, all you have to do is to select the range also in the grid:

Code: Pascal  [Select]
  1. procedure TForm1.SelectCurrentRangeExecute(Sender: TObject);
  2. var
  3.   ws: TsWorksheet;
  4.   allSelRange: TsCellRangeArray;
  5.   SeleRange: TsCellRange;
  6.   activeCellRow,activeCellCol: Cardinal;
  7. begin
  8.   ws := sWorkbookSource1.Workbook.ActiveWorksheet;
  9.   activeCellRow := ws.ActiveCellRow;
  10.   activeCellCol := ws.ActiveCellCol;
  11.   SeleRange := GetCurrentRange(ws, activeCellRow, activeCellCol);
  12.   SetLength(allSelRange, 1);
  13.   allSelRange[0] := SeleRange;
  14.   // Select range in worksheet
  15.   ws.SetSelection(allSelRange);
  16.   // Select range in grid
  17.   // sWorksheetGrid1.ClearSelections;  // if removed ranges can be extended by CTRL key method.
  18.   sWorksheetGrid1.Selection := Rect(
  19.     sWorksheetGrid1.GetGridCol(allSelRange[0].Col1),
  20.     sWorksheetGrid1.GetGridRow(allSelRange[0].Row1),
  21.     sWorksheetGrid1.GetGridCol(allSelrange[0].Col2),
  22.     sWorksheetGrid1.GetGridRow(allSelrange[0].Row2)
  23.   );
  24. end;
  25.  

Please note the vary popular error made that row and column indexes are exchanged in both worlds and are 1-based for the grid, but 0-based for the worksheet. Call the worksheetGrid methods GetGridRow/GetGridCol to convert from 0-based worksheet to 1-based grid indexes. (or GetSheetRow/GetSheetCol for the opposite direction).
« Last Edit: June 04, 2019, 07:52:31 pm by wp »
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

JohnSUN

  • Newbie
  • Posts: 3
  • OpenSource as a way to save the world
OMG! I suspected that it was my mistake! As always, the ridiculous typo slowed down the work for several hours. Thank you! Now, after changing the order of the parameters, the function works correctly on all sheets.
I also thank you for solving the second problem - it works. It is a pity that this construction is not singled out in a separate method - something like a RePaint. But as my previous President joked - "He can do everything, but he cannot do everything at once"

I know that the questions asked in the first message received their comprehensive answers. However, I will not close this discussion yet (Solved). I deliberately put into the title of the topic several titles of various methods that are widely used in spreadsheet processors. I would like to discuss their creation here, in this thread. Everything that will be accumulated as a result of the discussion may be transferred to the package code — it is easier to assemble code fragments from one topic than to collect from several. If I'm wrong - correct me, I'll start a new topic.
When processing tables, rectangular data ranges, such things are actively used (I quote the method names from StarBasic AOO/LiO) - queryEmptyCells, queryIntersection, getData (result as array of arrayof Double), getDataArray (...of Variant), copyRange, etc.
If we can manage to implement some of these features, using FPSpreadsheet will become even easier. The next method I want to think about is getDataArray. In Excel it replaces with
Code: Pascal  [Select]
  1. aData = aRange.Value
I want to know your opinion on the implementation of this method. Should I read the worksheet, I double loop row by row and cell by cell? Or does it make sense to think about cutting a tree of cells from the upper left corner and bypassing the resulting subtree with the restriction to the extreme right index (column) and lower? The first method (high level) relies on already existing methods of reading cells, but it will be slow because of the piece-by-piece reading of each cell. The second method (low level) will require intervention in the parent classes, which contradicts some of the principles, can lead to errors and confusion. But must be very fast. Your opinion?
Copying is not theft! So, copy only high-quality things, don't take the garbage, don't multiply rubbish - and the world will be better

wp

  • Hero Member
  • *****
  • Posts: 6462
I don't see much benefit in putting such methods into FPSpreadsheet - there are already too many methods. Being a historical activity which grew in a way its original inventor probably never thought of it contains many many parts which should be refactored, but this will certainly break compatibility. And rewriting as a completely new project, no thanks...

If you need these methods write them yourself. Yes, just iterate over the range of cells and extract the information requested into a 2D array: if you need "everything" use an "array of array of TCell" (TCell is just a record, not a class!); if you only need the values, use an "array of array of variant"; or if you need the cell texts use an "array of array of string" or a TStringList, etc. A 2D array is much easier to handle than a sub-tree.

[P.S.]
A note on TCell: Since it stores various cell properties only as indexes to lists contained in the worksheet or workbook these properties can only be extracted when the worksheet and/or workbook are available.
« Last Edit: June 06, 2019, 01:11:18 am by wp »
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10