Recent

Author Topic: [SOLVED via svn!] MS Excel created xml file support  (Read 1546 times)

totya

  • Hero Member
  • *****
  • Posts: 577
[SOLVED via svn!] MS Excel created xml file support
« on: July 13, 2019, 12:36:14 pm »
Hi wp master! :)

One my app input files usually xml files. But these not normal xml files, because these files created (see header) with MS Office. If I try to open these files with lazarus, I got sigsev error lol. So I process them manually. In first time, this manual process is must, because often the xml structure is maybe damaged - its normal, but I restore them manually, after I cheking it. Now I got an idea, these files are office files, and your big components supported the office files. See the header:

Code: Pascal  [Select]
  1. <?xml version="1.0"?>
  2. <?mso-application progid="Excel.Sheet"?>
  3. <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  4.  xmlns:o="urn:schemas-microsoft-com:office:office"
  5.  xmlns:x="urn:schemas-microsoft-com:office:excel"
  6.  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
  7.  xmlns:html="http://www.w3.org/TR/REC-html40">
  8.  <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">

so, can you support this file type?

Now, fps can open this file partially, and use only one column. Spready don't see the xml files.

I created a  simple file, I attached it, created/saved via Excel 2003 SP3.

If you intrested in, I can send you the original file in private, because the structure is slightly complicated, for example it can define index under row structure (column):
Code: Pascal  [Select]
  1. <Cell ss:Index="2"><Data ss:Type="String">CAT_NAME</Data></Cell>

The original files contain much more style, but these styles doesn't matter for me... I need only, if I see (and I can process) datas similar to opened these file via excel.

Thank you! :)
« Last Edit: July 14, 2019, 12:40:44 am by totya »

wp

  • Hero Member
  • *****
  • Posts: 6229
Re: MS Excel created xml file support
« Reply #1 on: July 13, 2019, 02:36:16 pm »
The Excel xml format is a temporary format which Microsoft played with before they introduced the now popular xlsx format. I had tested the extendability of the fpspreadsheet formats with it. I had not thought that anybody is using this kind of xml format and thus wrote only the writer for it.

Having a plain xml file makes it easy to read the file by yourself - just iterate through the xml nodes and take what you need. The following code reads the cells with type "String" or "Number" and adds the content to a string grid. Of course, it could add the data to a TsWorksheet as well...

Code: Pascal  [Select]
  1. uses
  2.   laz_dom, laz_xmlread;
  3.  
  4. procedure TForm1.LoadExcelXML(AFileName: String);
  5. var
  6.   doc: TXMLDocument = nil;
  7.   sheet_node: TDOMNode;
  8.   table_node: TDOMNode;
  9.   row_node: TDOMNode;
  10.   cell_node: TDOMNode;
  11.   data_node: TDOMNode;
  12.   nodeName: String;
  13.   s: String;
  14.   r, c: Integer;
  15. begin
  16.   try
  17.     ReadXMLFile(doc, AFileName);
  18.     sheet_node := doc.DocumentElement.FindNode('Worksheet');
  19.     if sheet_node = nil then exit;
  20.     table_node := sheet_node.FindNode('Table');
  21.     s := GetAttrValue(table_node, 'ss:ExpandedColumnCount');
  22.     if s <> '' then StringGrid1.ColCount := StringGrid1.FixedCols + StrToInt(s);
  23.     s := GetAttrValue(table_node, 'ss:ExpandedRowCount');
  24.     if s <> '' then stringGrid1.RowCount := StringGrid1.FixedRows + StrToInt(s);
  25.  
  26.     r := StringGrid1.FixedRows - 1;
  27.     row_node := table_node.FirstChild;
  28.     while row_node <> nil do begin
  29.       nodeName := row_node.NodeName;
  30.       if nodeName = 'Row' then begin
  31.         inc(r);
  32.         c := StringGrid1.FixedCols;
  33.         cell_node := row_node.FirstChild;
  34.         while cell_node <> nil do begin
  35.           nodeName := cell_node.NodeName;
  36.           if nodeName = 'Cell' then begin
  37.             data_node := cell_node.FirstChild;
  38.             while data_node <> nil do begin
  39.               nodeName := data_node.NodeName;
  40.               if nodeName = 'Data' then begin
  41.                 s := GetAttrValue(data_node, 'ss:Type');
  42.                 if (s = 'String') or (s = 'Number') then
  43.                   StringGrid1.Cells[c, r] := GetNodeValue(data_node)
  44.                 else
  45.                   StringGrid1.Cells[c, r] := '';
  46.                 inc(c);
  47.               end;
  48.               data_node := data_node.NextSibling;
  49.             end;
  50.           end;
  51.           cell_node := cell_node.NextSibling;
  52.         end;
  53.       end;
  54.       row_node := row_Node.NextSibling;
  55.     end;
  56.   finally
  57.     doc.Free;
  58.   end;
  59. end;  
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

totya

  • Hero Member
  • *****
  • Posts: 577
Re: MS Excel created xml file support
« Reply #2 on: July 13, 2019, 03:12:36 pm »
...Having a plain xml file makes it easy to read the file by yourself - just iterate through the xml nodes and take what you need...

Hi wp master!

Thank you for the answer, and the example code! :)

First, in my recent (fixed branch) Lazarus version no laz_dom or laz_xmlread units, only  laz2_dom or laz2_xmlread (UTF8 version units).

Second, your code works nicely with the sample what I attached, but not as the real file, for example as I wrote in start post, this code doesn't interpret the start index value. But this is very good example for me, thank you! Otherwise I will send you a real (non public) example file in private, you can see if you want. Thanks again!

wp

  • Hero Member
  • *****
  • Posts: 6229
Re: MS Excel created xml file support
« Reply #3 on: July 13, 2019, 03:59:48 pm »
First, in my recent (fixed branch) Lazarus version no laz_dom or laz_xmlread units, only  laz2_dom or laz2_xmlread (UTF8 version units).
Oh sorry, you do need laz2_dom and laz2_xmlread, my typo...

complicated, for example it can define index under row structure (column):
Code: Pascal  [Select]
  1. <Cell ss:Index="2"><Data ss:Type="String">CAT_NAME</Data></Cell>
I used my Excel 2016 to store a dummy file in Excel2003 format, and it shows that "index=2" is the (1-based) column index; it seems to be used when the stored cell block is contains empty cells. Use my demo to read the index via GetAttrValue and then set the column index c accordingly.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

totya

  • Hero Member
  • *****
  • Posts: 577
Re: MS Excel created xml file support
« Reply #4 on: July 13, 2019, 05:04:37 pm »
I used my Excel 2016 to store a dummy file in Excel2003 format, and it shows that "index=2" is the (1-based) column index; it seems to be used when the stored cell block is contains empty cells. Use my demo to read the index via GetAttrValue and then set the column index c accordingly.

Thanks for the tip, master :)

Code: Pascal  [Select]
  1. s := GetAttrValue(cell_node, 'ss:Index');
  2. if s<>'' then c := StrToInt(s);

It's better, but not perfect yet... I'm working on it. :)

totya

  • Hero Member
  • *****
  • Posts: 577
Re: [Answered] MS Excel created xml file support
« Reply #5 on: July 13, 2019, 05:50:55 pm »
Okay, I can't read this string:

Code: Pascal  [Select]
  1. <Cell ss:StyleID="s120"><ss:Data ss:Type="String"
  2.       xmlns="http://www.w3.org/TR/REC-html40"><B>TODO: </B><Font>Sample string...</Font></ss:Data></Cell>

After I modify this line:
Code: Pascal  [Select]
  1. if (nodeName = 'Data') or (nodeName = 'ss:Data') then

I got right column position, but the value is empty. Can I read this value somehow?

Edit.: I can't do anything an empty value, but if I got raw value (with newline sign and spaces), this is enough for me (I think it is less job for you, if you want to modify the code of GetAttrValue). I can see in type: ss:Data, so this is an special value, so I need  process this value manually. (If the data_node contain it...)
« Last Edit: July 13, 2019, 10:30:02 pm by totya »

wp

  • Hero Member
  • *****
  • Posts: 6229
Re: [Answered] MS Excel created xml file support
« Reply #6 on: July 13, 2019, 10:58:02 pm »
I added an elemental reader for Excel2003 xml files to fpspreadsheet. So far, it only reads cell values (similar to the example above), no formats, no formulas, none of the other fancy things.

EDIT:
ss:Data nodes are supported now, the html codes, however, are lost. Why didn't Microsoft use a CDATA[] section here?

EDIT2:
Now with cell formatting (r7031).
« Last Edit: July 14, 2019, 05:06:12 pm by wp »
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

totya

  • Hero Member
  • *****
  • Posts: 577
Re: [Answered] MS Excel created xml file support
« Reply #7 on: July 14, 2019, 12:40:13 am »
Hi wp master!

I dream?  :o No... LOL. In latest svn I see every cell, like as "Todo sample string". LOL.

Impossible... but true. I can read the office xml format via fps...  :o



I have a really small questions... can I determine a (any) cell type ss:Data or not, and can I read the original (untouched raw) xml line/Data value from a (any) cell anyhow? For example I need for the untouched cell data for example "&#45;" and not the "-". But this version is good if available.  ;)
« Last Edit: July 14, 2019, 01:28:30 am by totya »

wp

  • Hero Member
  • *****
  • Posts: 6229
Re: [SOLVED via svn!] MS Excel created xml file support
« Reply #8 on: July 14, 2019, 11:27:24 am »
All the text extraction is done by the xml parser in laz2_xmlread. It probably requires some effort to modifiy this.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

totya

  • Hero Member
  • *****
  • Posts: 577
Re: [SOLVED via svn!] MS Excel created xml file support
« Reply #9 on: July 14, 2019, 11:35:50 am »
All the text extraction is done by the xml parser in laz2_xmlread. It probably requires some effort to modifiy this.

Thank you for your answer, and your excellent job! In this time I use my manual "parser" too, because I must need the raw data.

totya

  • Hero Member
  • *****
  • Posts: 577
Re: [SOLVED via svn!] MS Excel created xml file support
« Reply #10 on: July 14, 2019, 04:15:36 pm »
Hi wp master!

Your modified office-xml-capable component is very useful for me, many needed code much sorter/easier than before. And thank you for the error handle, I got exception for this:
Quote
&quot
because ";" missing :) First of all I need to  check and repair all errors in the xml files.

totya

  • Hero Member
  • *****
  • Posts: 577
Re: [SOLVED via svn!] MS Excel created xml file support
« Reply #11 on: July 14, 2019, 06:28:21 pm »
Hi master,

I got "NoValidNumberFormatString" when I run this file:
\fpspreadsheet_svn\examples\read_write\excelxmldemo\excelxmlwrite.lpr

fps svn is 7031-7033.

I have fixes 3.2 fpc and fixes 2.0 Lazarus.
« Last Edit: July 14, 2019, 08:50:24 pm by totya »

wp

  • Hero Member
  • *****
  • Posts: 6229
Re: [SOLVED via svn!] MS Excel created xml file support
« Reply #12 on: July 14, 2019, 10:42:19 pm »
I cannot reproduce this error. There was an error in writing "rich-text" formatted cells (fixed in r7034). And the RC notation of cell references in formulas is still buggy - requires more investigation...
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

totya

  • Hero Member
  • *****
  • Posts: 577
Re: [SOLVED via svn!] MS Excel created xml file support
« Reply #13 on: July 14, 2019, 10:52:59 pm »
I cannot reproduce this error. There was an error in writing "rich-text" formatted cells (fixed in r7034)...

r7034 raised exception too, see picture.

totya

  • Hero Member
  • *****
  • Posts: 577
Re: [SOLVED via svn!] MS Excel created xml file support
« Reply #14 on: July 14, 2019, 11:10:34 pm »
I'm sorry, but I have a new question, but I think the answer is "no". :)

If I open an office xml file with your great component, and after save it (without any changes), the result xml file is different from the original (like as MS Excel). My queston is, is possible anyhow, the saved file let be same as the original file?

For example, if I changed only one cell, then the best for me, if only one line changed in the xml... and only the data value, between > <

But I think It's not "normal" usage.  Sorry for this question.