Print Page - [SOLVED via svn!] MS Excel created xml file support

Programming => Packages and Libraries => FPSpreadsheet => Topic started by: totya on July 13, 2019, 12:36:14 pm

Title: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 13, 2019, 12:36:14 pm

Hi wp master! :)

One my app input files usually xml files. But these not normal xml files, because these files created (see header) with MS Office. If I try to open these files with lazarus (https://wiki.freepascal.org/XML_Tutorial), I got sigsev error lol. So I process them manually. In first time, this manual process is must, because often the xml structure is maybe damaged - its normal, but I restore them manually, after I cheking it. Now I got an idea, these files are office files, and your big components supported the office files. See the header:

Code: Pascal [Select][+]

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">

so, can you support this file type?

Now, fps can open this file partially, and use only one column. Spready don't see the xml files.

I created a simple file, I attached it, created/saved via Excel 2003 SP3.

If you intrested in, I can send you the original file in private, because the structure is slightly complicated, for example it can define index under row structure (column):

Code: Pascal [Select][+]

<Cell ss:Index="2"><Data ss:Type="String">CAT_NAME</Data></Cell>

The original files contain much more style, but these styles doesn't matter for me... I need only, if I see (and I can process) datas similar to opened these file via excel.

Thank you! :)

Title: Re: MS Excel created xml file support
Post by: wp on July 13, 2019, 02:36:16 pm

The Excel xml format is a temporary format which Microsoft played with before they introduced the now popular xlsx format. I had tested the extendability of the fpspreadsheet formats with it. I had not thought that anybody is using this kind of xml format and thus wrote only the writer for it.

Having a plain xml file makes it easy to read the file by yourself - just iterate through the xml nodes and take what you need. The following code reads the cells with type "String" or "Number" and adds the content to a string grid. Of course, it could add the data to a TsWorksheet as well...

Code: Pascal [Select][+]

uses
  laz_dom, laz_xmlread;
 
procedure TForm1.LoadExcelXML(AFileName: String);
var
  doc: TXMLDocument = nil;
  sheet_node: TDOMNode;
  table_node: TDOMNode;
  row_node: TDOMNode;
  cell_node: TDOMNode;
  data_node: TDOMNode;
  nodeName: String;
  s: String;
  r, c: Integer;
begin
  try
    ReadXMLFile(doc, AFileName);
    sheet_node := doc.DocumentElement.FindNode('Worksheet');
    if sheet_node = nil then exit;
    table_node := sheet_node.FindNode('Table');
    s := GetAttrValue(table_node, 'ss:ExpandedColumnCount');
    if s <> '' then StringGrid1.ColCount := StringGrid1.FixedCols + StrToInt(s);
    s := GetAttrValue(table_node, 'ss:ExpandedRowCount');
    if s <> '' then stringGrid1.RowCount := StringGrid1.FixedRows + StrToInt(s);
 
    r := StringGrid1.FixedRows - 1;
    row_node := table_node.FirstChild;
    while row_node <> nil do begin
      nodeName := row_node.NodeName;
      if nodeName = 'Row' then begin
        inc(r);
        c := StringGrid1.FixedCols;
        cell_node := row_node.FirstChild;
        while cell_node <> nil do begin
          nodeName := cell_node.NodeName;
          if nodeName = 'Cell' then begin
            data_node := cell_node.FirstChild;
            while data_node <> nil do begin
              nodeName := data_node.NodeName;
              if nodeName = 'Data' then begin
                s := GetAttrValue(data_node, 'ss:Type');
                if (s = 'String') or (s = 'Number') then
                  StringGrid1.Cells[c, r] := GetNodeValue(data_node)
                else
                  StringGrid1.Cells[c, r] := '';
                inc(c);
              end;
              data_node := data_node.NextSibling;
            end;
          end;
          cell_node := cell_node.NextSibling;
        end;
      end;
      row_node := row_Node.NextSibling;
    end;
  finally
    doc.Free;
  end;
end;  

Title: Re: MS Excel created xml file support
Post by: totya on July 13, 2019, 03:12:36 pm

Quote from: wp on July 13, 2019, 02:36:16 pm

...Having a plain xml file makes it easy to read the file by yourself - just iterate through the xml nodes and take what you need...

Hi wp master!

Thank you for the answer, and the example code! :)

First, in my recent (fixed branch) Lazarus version no laz_dom or laz_xmlread units, only laz2_dom or laz2_xmlread (UTF8 version units).

Second, your code works nicely with the sample what I attached, but not as the real file, for example as I wrote in start post, this code doesn't interpret the start index value. But this is very good example for me, thank you! Otherwise I will send you a real (non public) example file in private, you can see if you want. Thanks again!

Title: Re: MS Excel created xml file support
Post by: wp on July 13, 2019, 03:59:48 pm

Quote from: totya on July 13, 2019, 03:12:36 pm

First, in my recent (fixed branch) Lazarus version no laz_dom or laz_xmlread units, only laz2_dom or laz2_xmlread (UTF8 version units).

Oh sorry, you do need laz2_dom and laz2_xmlread, my typo...

Quote from: totya on July 13, 2019, 12:36:14 pm

complicated, for example it can define index under row structure (column):
Code: Pascal [Select][+][-]
<Cell ss:Index="2"><Data ss:Type="String">CAT_NAME</Data></Cell>

I used my Excel 2016 to store a dummy file in Excel2003 format, and it shows that "index=2" is the (1-based) column index; it seems to be used when the stored cell block is contains empty cells. Use my demo to read the index via GetAttrValue and then set the column index c accordingly.

Title: Re: MS Excel created xml file support
Post by: totya on July 13, 2019, 05:04:37 pm

Quote from: wp on July 13, 2019, 03:59:48 pm

I used my Excel 2016 to store a dummy file in Excel2003 format, and it shows that "index=2" is the (1-based) column index; it seems to be used when the stored cell block is contains empty cells. Use my demo to read the index via GetAttrValue and then set the column index c accordingly.

Thanks for the tip, master :)

Code: Pascal [Select][+]

s := GetAttrValue(cell_node, 'ss:Index');
if s<>'' then c := StrToInt(s);

It's better, but not perfect yet... I'm working on it. :)

Title: Re: [Answered] MS Excel created xml file support
Post by: totya on July 13, 2019, 05:50:55 pm

Okay, I can't read this string:

Code: Pascal [Select][+]

<Cell ss:StyleID="s120"><ss:Data ss:Type="String"
      xmlns="http://www.w3.org/TR/REC-html40"><B>TODO: </B><Font>Sample string...</Font></ss:Data></Cell>

After I modify this line:

Code: Pascal [Select][+]

if (nodeName = 'Data') or (nodeName = 'ss:Data') then

I got right column position, but the value is empty. Can I read this value somehow?

Edit.: I can't do anything an empty value, but if I got raw value (with newline sign and spaces), this is enough for me (I think it is less job for you, if you want to modify the code of GetAttrValue). I can see in type: ss:Data, so this is an special value, so I need process this value manually. (If the data_node contain it...)

Title: Re: [Answered] MS Excel created xml file support
Post by: wp on July 13, 2019, 10:58:02 pm

I added an elemental reader for Excel2003 xml files to fpspreadsheet. So far, it only reads cell values (similar to the example above), no formats, no formulas, none of the other fancy things.

EDIT:
ss:Data nodes are supported now, the html codes, however, are lost. Why didn't Microsoft use a CDATA[] section here?

EDIT2:
Now with cell formatting (r7031).

Title: Re: [Answered] MS Excel created xml file support
Post by: totya on July 14, 2019, 12:40:13 am

Hi wp master!

I dream? :o No... LOL. In latest svn I see every cell, like as "Todo sample string". LOL.

Impossible... but true. I can read the office xml format via fps... :o

I have a really small questions... can I determine a (any) cell type ss:Data or not, and can I read the original (untouched raw) xml line/Data value from a (any) cell anyhow? For example I need for the untouched cell data for example "-" and not the "-". But this version is good if available. ;)

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on July 14, 2019, 11:27:24 am

All the text extraction is done by the xml parser in laz2_xmlread. It probably requires some effort to modifiy this.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 11:35:50 am

Quote from: wp on July 14, 2019, 11:27:24 am

All the text extraction is done by the xml parser in laz2_xmlread. It probably requires some effort to modifiy this.

Thank you for your answer, and your excellent job! In this time I use my manual "parser" too, because I must need the raw data.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 04:15:36 pm

Hi wp master!

Your modified office-xml-capable component is very useful for me, many needed code much sorter/easier than before. And thank you for the error handle, I got exception for this:

Quote

&quot

because ";" missing :) First of all I need to check and repair all errors in the xml files.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 06:28:21 pm

Hi master,

I got "NoValidNumberFormatString" when I run this file:
\fpspreadsheet_svn\examples\read_write\excelxmldemo\excelxmlwrite.lpr

fps svn is 7031-7033.

I have fixes 3.2 fpc and fixes 2.0 Lazarus.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on July 14, 2019, 10:42:19 pm

I cannot reproduce this error. There was an error in writing "rich-text" formatted cells (fixed in r7034). And the RC notation of cell references in formulas is still buggy - requires more investigation...

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 10:52:59 pm

Quote from: wp on July 14, 2019, 10:42:19 pm

I cannot reproduce this error. There was an error in writing "rich-text" formatted cells (fixed in r7034)...

r7034 raised exception too, see picture.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 11:10:34 pm

I'm sorry, but I have a new question, but I think the answer is "no". :)

If I open an office xml file with your great component, and after save it (without any changes), the result xml file is different from the original (like as MS Excel). My queston is, is possible anyhow, the saved file let be same as the original file?

For example, if I changed only one cell, then the best for me, if only one line changed in the xml... and only the data value, between > <

But I think It's not "normal" usage. Sorry for this question.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on July 14, 2019, 11:17:31 pm

Quote from: totya on July 14, 2019, 10:52:59 pm

r7034 raised exception too, see picture.

Hmmm... You are using the excelxmlwrite demo which comes with this revision? Is your Lazarus 32 bit or 64 bit? You seem to be on Windows 7?

Please try again with r7036, it displays more information in the exception message.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 11:19:39 pm

Windows 7 x64, Lazarus x86.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on July 14, 2019, 11:23:50 pm

Quote from: totya on July 14, 2019, 11:10:34 pm

I'm sorry, but I have a new question, but I think the answer is "no". :)

If I open an office xml file with your great component, and after save it (without any changes), the result xml file is different from the original (like as MS Excel). My queston is, is possible anyhow, the saved file let be same as the original file?

For example, if I changed only one cell, then the best for me, if only one line changed in the xml... and only the data value, between > <

But I think It's not "normal" usage. Sorry for this question.

No problem.

ATM, the Excel2003/XML reader/writers are not complete: There are still several nodes which the reader ignores. Therefore, the writer (which is more complete) cannot write them back. And even when I'll be finished it is still possible that the file will change after write-back because there a many details in these files which I don't understand. The same is true for the standard formats such as xlsx or ods.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on July 14, 2019, 11:27:28 pm

Quote from: totya on July 14, 2019, 11:19:39 pm

Windows 7 x64, Lazarus x86.

I see. What is "de./du."? Something like "AM/PM"? In this case, this format modifier is not supported ATM, and you should switch to a 24-hour format.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 11:29:25 pm

Quote from: wp on July 14, 2019, 11:23:50 pm

And even when I'll be finished it is still possible that the file will change after write-back because there a many details in these files which I don't understand. The same is true for the standard formats such as xlsx or ods.

I tought it, but now I'm sure. Thanks for the answer! I continue to write my low level "parser", but your component is very good for example for a compare (position check)... etc...

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 14, 2019, 11:30:53 pm

Quote from: wp on July 14, 2019, 11:27:28 pm

Quote from: totya on July 14, 2019, 11:19:39 pm
Windows 7 x64, Lazarus x86.
I see. What is "de./du."? Something like "AM/PM"? In this case, this format modifier is not supported ATM, and you should switch to a 24-hour format.

de = AM
du = PM

This is the default language setting by the OS, in my country (Hungary).

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on July 14, 2019, 11:40:44 pm

What does this program write to the screen?

Code: Pascal [Select][+]

program Project1;
 
uses
  SysUtils;
 
begin
  WriteLn('ShortTimeFormat: ', DefaultFormatSettings.LongTimeFormat);
  WriteLn('LongTimeFormat:  ', DefaultFormatSettings.ShortTimeFormat);
  WriteLn('TimeAMString:    ', DefaultFormatSettings.TimeAMString);
  WriteLn('TimePMString:    ', DefaultFormatSettings.TimePMString);
  ReadLn;
end. 

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 15, 2019, 12:24:12 am

ops... your sample is buggy... O:-) The corrected code:

Code: Pascal [Select][+]

begin
  WriteLn('LongTimeFormat: ', DefaultFormatSettings.LongTimeFormat);
  WriteLn('ShortTimeFormat: ', DefaultFormatSettings.ShortTimeFormat);
  WriteLn('TimeAMString:    ', DefaultFormatSettings.TimeAMString);
  WriteLn('TimePMString:    ', DefaultFormatSettings.TimePMString);
  ReadLn;
end.

The result:

Quote

LongTimeFormat: h:nn:ss
ShortTimeFormat: h:nn
TimeAMString: de.
TimePMString: du.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 16, 2019, 08:48:18 am

Quote from: wp on July 14, 2019, 11:27:28 pm

Quote from: totya on July 14, 2019, 11:19:39 pm
Windows 7 x64, Lazarus x86.
I see. What is "de./du."? Something like "AM/PM"? In this case, this format modifier is not supported ATM, and you should switch to a 24-hour format.

Hi master :)

I cant switch to the 24 hours format, because my country time default format the 24 hours format... BUT if I change manually "de" sign to AM, and "du' sign to PM, your excel xml demo executed without any error.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on July 17, 2019, 12:14:47 am

I had never seen this issue before your report because my DefaultFormatSettings contain empty strings for TimeAMString and TimePMString.

Please try r7043. Should be fixed now.

-----------------

The Excel2003/XML reader is now rather complete. What is missing is rich-text cell formatting. The reason is the same isue for which you posted a question somewhere else: Using laz2_dom and laz2_xmlread, how can I retrieve the undecoded xml child nodes of that strange <ss:Data> node?

Code: XML [Select][+]

 <Cell ss:StyleID="s112"><ss:Data ss:Type="String><B>TODO: </B><Font>some text</Font></ss:Data></Cell>

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 17, 2019, 06:37:52 am

Hi master!

First time, your demo write executed without any error. See attached file.

Quote from: wp on July 17, 2019, 12:14:47 am

The reason is the same isue for which you posted a question somewhere else: Using laz2_dom and laz2_xmlread, how can I retrieve the undecoded xml child nodes of that strange <ss:Data> node

I created topic for it... but no answer (https://forum.lazarus.freepascal.org/index.php/topic,46100.0.html) O:-)

I process these files manaully as I said... with regex... the code under development... I hate regex, because all regex work differently (from same patternt), some buggy, some not, hopefully the trunk sorokin regexpr works with utt8 without any magic... (for ex. unicode mode).

Have a nice day!

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on July 22, 2019, 08:03:09 pm

Hi wp master!

I see in svn watcher, your working hardly for this filetype support.

Today I about finished my own "parser", I can read and write the all cell data with raw value. Besides I can write back the data to the file, and these recreated files same as the original (hash checked!). This is mean, I can simulate the original office xml line-breaking system too. I test it with about 150 file, and all hash is equal. This is exactly, what I want (raw value, and reproduce the same xml structure).

Anyway, thank you wp master for your many help and support!

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on August 13, 2019, 10:04:41 pm

Hi master!

Thanks again for your many job and advice!

Now I write to you, because when I read with my low-level app your test.xml file (created by excelxmlwrite with the latest fps svn), I got an error (from my app):

Quote

Invalid index value in Row node: 11 (before: 11)

This isn't a big error, but... I think my checking code is okay, and I don't want to modify it, because it's okay ;)

Your created xml code:

Quote

<Row ss:Index="11" ss:AutoFitHeight="1">
<Cell>
<Data ss:Type="String">Writing current date/time:</Data>
</Cell>
</Row>

The problem is, the ss:Index="11" is unnecessary, because this row index is 11 anyway. If I resave your xml file with the excel, the ss:Index="11" is certainly disappear.

Can you correct this small unnecessary row index "problem"?

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on August 13, 2019, 11:24:54 pm

Quote from: totya on August 13, 2019, 10:04:41 pm

Quote
<Row ss:Index="11" ss:AutoFitHeight="1">
<Cell>
<Data ss:Type="String">Writing current date/time:</Data>
</Cell>
</Row>

The problem is, the ss:Index="11" is unnecessary, because this row index is 11 anyway. If I resave your xml file with the excel, the ss:Index="11" is certainly disappear.

Assuming that we are talking of the same file the 'ss:Index="11"' cannot be removed because there is an empty row before #11. Row indexes can only be skipped when they follow each other without gaps. Of course, I could add empty row nodes to keep the indexes in sequence, but Excel is not doing it, so why should fpspreadsheet do it?

When I re-save the file in Excel the ss:Index="11" stays in the file. So maybe your file is somehow different from mine? Can you zip the file and post it?

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on August 14, 2019, 12:02:11 am

Quote from: wp on August 13, 2019, 11:24:54 pm

Assuming that we are talking of the same file the 'ss:Index="11"' cannot be removed because there is an empty row before #11.

This isn't an really empty row, because this line
<Row ss:AutoFitHeight="1"/>
be in the xml. Look, I read about 150 real excel xml file, and this checking passed with all files.

Really empty row example from the original excel:

Quote

<Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="5" x:FullColumns="1"
x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">a</Data></Cell>
</Row>
<Row ss:Index="5">
<Cell><Data ss:Type="String">b</Data></Cell>
</Row>
</Table>

Do you see the differents?

Quote from: wp on August 13, 2019, 11:24:54 pm

When I re-save the file in Excel the ss:Index="11" stays in the file. So maybe your file is somehow different from mine? Can you zip the file and post it?

Okay. Resaved with Excel 2003.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: wp on August 14, 2019, 01:04:30 am

You are right. In fact, my Excel (2016) did not write the preceeding row at all, therefore it required the index.

I fixed writing of the unnecessary index now.

The problem left is why Excel writes "AutoFitHeight=0", but fpspreadsheet writes "AutoFitHeight=1". And in fact the "very, very, very long line" is not wrapped when the file is opened by Excel, but it is when opened by fpspreadsheet.

Title: Re: [SOLVED via svn!] MS Excel created xml file support
Post by: totya on August 14, 2019, 10:53:09 pm

Thank you wp master, for the quick fix :)