
Author Topic: Raw Data from TDOMNode  (Read 627 times)

totya

  • Hero Member
  • *****
  • Posts: 572
Raw Data from TDOMNode
« on: July 15, 2019, 09:18:19 pm »
Hi, I use laz2_dom and laz2_xmlread. Can I read the "raw" (meaning untouched) value of a node?

For example, given this xml element:
Quote
<B>TODO: </B>

When I read it (node.TextContent), I get
Quote
TODO


Thanks!
« Last Edit: July 18, 2019, 05:46:35 pm by totya »

wp

  • Hero Member
  • *****
  • Posts: 5990
Re: Raw Data from TDOMNode
« Reply #1 on: July 17, 2019, 11:28:45 am »
You must recursively iterate through the child nodes of <ss:Data> along with their attributes and reconstruct the original string.

The following code can be used in the demo of https://forum.lazarus.freepascal.org/index.php/topic,46069.msg327080.html#msg327080.
Code: Pascal
procedure RebuildChildNodes(ANode: TDOMNode; var AText: String);
var
  nodeName: String;
  s: String;
  i: Integer;
begin
  if ANode = nil then
    exit;
  // Walk ANode and all of its following siblings.
  while ANode <> nil do begin
    nodeName := ANode.NodeName;
    if nodeName = '#text' then
      AText := AText + ANode.NodeValue
    else begin
      // Collect the attributes as ' name="value"' pairs.
      // Attributes is nil for nodes that cannot carry attributes (e.g. comments).
      s := '';
      if ANode.Attributes <> nil then
        for i := 0 to ANode.Attributes.Length-1 do
          s := Format('%s %s="%s"', [s, ANode.Attributes.Item[i].NodeName, ANode.Attributes.Item[i].NodeValue]);
      AText := Format('%s<%s%s>', [AText, nodeName, s]);
      // Rebuild the children into a fresh string, then close the tag.
      // Note: an element without content gets no closing tag here.
      s := '';
      RebuildChildNodes(ANode.FirstChild, s);
      if s <> '' then
        AText := Format('%s%s</%s>', [AText, s, nodeName]);
    end;
    ANode := ANode.NextSibling;
  end;
end;

[...]
            while data_node <> nil do begin
              nodeName := data_node.NodeName;
              if nodeName = 'ss:Data' then begin
                s := '';
                RebuildChildNodes(data_node.FirstChild, s);
                StringGrid1.Cells[c, r] := s;
                inc(c);
              end;
[...]
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

totya

Re: Raw Data from TDOMNode
« Reply #2 on: July 17, 2019, 05:54:44 pm »
Hi master!

 :o

Seems to me it's working! Thank you!

But my modified code is needed too, because with the real files (I sent you one of them) the column order is wrong, so this is needed:

Code: Pascal
          if nodeName = 'Cell' then
          begin

            // #1 Read index if available...
            s := GetAttrValue(cell_node, 'ss:Index');
            if s <> '' then
              c := StrToInt(s);

            data_node := cell_node.FirstChild;

            // #2 if no child (without data), then increase index...
            if data_node = nil then
              Inc(c);

            while data_node <> nil do
            begin
              nodeName := data_node.NodeName;

But as I see it, this is also needed for the test.xml file created by excelxmlwrite...
« Last Edit: July 17, 2019, 05:56:28 pm by totya »

wp

Re: Raw Data from TDOMNode
« Reply #3 on: July 17, 2019, 06:23:55 pm »
Sorry, I don't fully understand. You mean the code following the comment "// #2..."? But I thought that the xml format adds an "ss:Index" attribute to the "Cell" node when cells to the left of the current one are empty. Nevertheless, I think that your code is not harmful and fixes an issue when the writing software does not use the "ss:Index" attribute - I'll add it to the "official" reader.

totya

Re: Raw Data from TDOMNode
« Reply #4 on: July 17, 2019, 06:38:43 pm »
Quote
Sorry I don't fully understand. You mean the code following the comment "// #2..."? But I thought that the xml format adds an "ss:Index" attribute to the "Cell" node when cells left to the current one are empty. Nevertheless, I think that your code is not harmful and fixes an issue when the writing software does not use the "ss:Index" attribute - I'll add it to the "official" reader.

As I wrote, please look at the sample file that I sent you.

#1 is certainly needed.
#2 is needed too, for empty cells(!), for example:

Code: XML
<Cell ss:StyleID="s55"/>
<Cell ss:StyleID="s55"/>

But as I see, the "simple" sWorkbookSource/sWorksheetGrid handles this situation...

totya

Re: Raw Data from TDOMNode
« Reply #5 on: July 17, 2019, 06:53:52 pm »
Full example:

Code: XML
<Row ss:AutoFitHeight="0" ss:Height="200">
    <Cell ss:Index="2" ss:StyleID="s55"><Data ss:Type="String">String0</Data></Cell>
    <Cell ss:StyleID="s99"/>
    <Cell ss:StyleID="s99"/>
    <Cell ss:StyleID="s55"><Data ss:Type="String">String1</Data></Cell>
    <Cell ss:StyleID="s55"><Data ss:Type="String">String2</Data></Cell>
</Row>

As you see, the "empty" cells are important for determining the correct column position (index).
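
The column bookkeeping discussed in this thread can be sketched in isolation. This is only a minimal sketch, not the fpspreadsheet reader: `TCellInfo`, `MakeCell` and `ResolveColumns` are hypothetical names made up for the example, and an `Index` of 0 is assumed to mean "no ss:Index attribute". It applies rule #1 (jump to ss:Index when present) and advances the column after every cell, with or without data, which covers rule #2.

```pascal
program CellColumns;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a parsed <Cell> node.
  TCellInfo = record
    Index: Integer;  // value of ss:Index, or 0 when the attribute is absent
    Text: String;    // content of the <Data> child, '' for a cell without data
  end;
  TIntArray = array of Integer;

function MakeCell(AIndex: Integer; const AText: String): TCellInfo;
begin
  Result.Index := AIndex;
  Result.Text := AText;
end;

// Returns the 1-based column of every cell of one row.
function ResolveColumns(const ACells: array of TCellInfo): TIntArray;
var
  i, c: Integer;
begin
  Result := nil;
  SetLength(Result, Length(ACells));
  c := 1;                        // ss:Index counts from 1
  for i := 0 to High(ACells) do
  begin
    if ACells[i].Index > 0 then  // #1: explicit index, jump to it
      c := ACells[i].Index;
    Result[i] := c;
    Inc(c);                      // #2: advance even if the cell has no data
  end;
end;

var
  cols: TIntArray;
  i: Integer;
begin
  cols := ResolveColumns([MakeCell(2, 'String0'), MakeCell(0, ''),
    MakeCell(0, ''), MakeCell(0, 'String1'), MakeCell(0, 'String2')]);
  for i := 0 to High(cols) do
    WriteLn('cell ', i, ' -> column ', cols[i]);
end.
```

For the row shown above, this puts String0 in column 2, the two style-only cells in columns 3 and 4, and String1 and String2 in columns 5 and 6.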

wp

Re: Raw Data from TDOMNode
« Reply #6 on: July 17, 2019, 07:26:27 pm »
Yes, this works, but an application which writes such files is not very clever because it can blow up the size of the xml file enormously. This is how Excel writes an xml file with two blank cells between two text cells. It always uses an "ss:Index" attribute to "jump" over a gap:

Code: XML
<Row>
    <Cell ss:Index="3"><Data ss:Type="String">String0</Data></Cell>
    <Cell ss:Index="6"><Data ss:Type="String">String3</Data></Cell>
</Row>

I re-checked the fpspreadsheet Excel2003/XML reader/writer - they handle empty cells correctly (the writer is a bit clumsy because it always writes an "ss:Index" attribute).
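
The "jump over a gap" behaviour can be illustrated with a small writer sketch. It is an illustration under stated assumptions, not Excel's or fpspreadsheet's code: `BuildRow` is a hypothetical helper, an empty string stands for a blank cell, the texts are assumed to be XML-escaped already, and cell styles are ignored entirely.

```pascal
program WriteRowSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Builds a <Row> element from the cell texts of one row.
// ATexts[i] belongs to column i+1; '' means "blank cell, write nothing".
function BuildRow(const ATexts: array of String): String;
var
  i, nextCol: Integer;
  attr: String;
begin
  Result := '<Row>';
  nextCol := 1;  // column a reader would assume for the next <Cell>
  for i := 0 to High(ATexts) do
  begin
    if ATexts[i] = '' then
      Continue;
    // Write ss:Index only when a gap precedes this cell.
    if i + 1 <> nextCol then
      attr := Format(' ss:Index="%d"', [i + 1])
    else
      attr := '';
    Result := Result + Format('<Cell%s><Data ss:Type="String">%s</Data></Cell>',
      [attr, ATexts[i]]);
    nextCol := i + 2;
  end;
  Result := Result + '</Row>';
end;

begin
  WriteLn(BuildRow(['', '', 'String0', '', '', 'String3']));
end.
```

Because the sketch drops blank cells completely, it would also drop cells that carry only a style, so it is not suitable for files where empty cells hold formatting.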

totya

Re: Raw Data from TDOMNode
« Reply #7 on: July 17, 2019, 07:39:01 pm »
Quote
Yes, this works, but an application which writes such files is not very clever

I think it is clever. These cells are empty, but as you see, a style is defined for them, so if they get a value later, the style is preserved. This is the reason why your fps component handles these "empty" cells correctly.

Example: I set a red background for an empty Excel cell. There is no value, but the colour is information, isn't it?

So these cells are not really empty, because they contain the style.

I suspect you were thinking of this:

Code: XML
<Row>
    <Cell ss:StyleID="s55"><Data ss:Type="String">String1</Data></Cell>
    <Cell></Cell>
    <Cell></Cell>
    <Cell ss:StyleID="s55"><Data ss:Type="String">String1</Data></Cell>
</Row>


... but hopefully I won't see anything like this. :)

totya

Re: [SOLVED by wp master] Raw Data from TDOMNode
« Reply #8 on: July 17, 2019, 11:40:58 pm »
Hi master! :)

As I see, a simple <Data> </Data> needs rebuilding too if it contains a normal xml character reference, for example:

Quote
&#10;

... but no success yet with your new procedure.

wp

Re: [SOLVED by wp master] Raw Data from TDOMNode
« Reply #9 on: July 18, 2019, 12:46:32 am »
Look at what is happening in the xml reader of fpspreadsheet. It is pretty complete now and works rather well. This is in unit xlsxml.pas, method TsExcelXMLReader.ReadCell.

totya

Re: [SOLVED by wp master] Raw Data from TDOMNode
« Reply #10 on: July 18, 2019, 06:39:32 am »
Quote
Look at what is happening in the xml reader of fpspreadsheet. It is pretty complete now and works rather well. This is in unit xlsxml.pas, method TsExcelXMLReader.ReadCell.

Hi master! :)

It doesn't work. Try this:
Quote
<Cell><Data ss:Type="String">Sample&#10;Text</Data></Cell>

Code: Pascal
          if nodeName = 'ss:Data' then begin
            txt := '';
            RebuildChildNodes(node, txt);
            HTMLToRichText(FWorkbook, font, txt, s, cell^.RichTextParams, 'html:');
          end;

I will look at it again after work :)

wp

Re: [SOLVED by wp master] Raw Data from TDOMNode
« Reply #11 on: July 18, 2019, 09:14:23 am »
Sorry, your messages are a bit cryptic. What does not work? Is the '&#10;' kept in the TextContent? For me everything is ok. What is your Lazarus/fpc version? Are you working with fpspreadsheet or with your own reader? In the latter case, post some compilable code.

totya

Re: [SOLVED by wp master] Raw Data from TDOMNode
« Reply #12 on: July 18, 2019, 05:45:56 pm »
Quote
Sorry, your messages are a bit cryptic. What does not work? Is the '&#10;' kept in the TextContent? For me everything is ok. What is your Lazarus/fpc version? Are you working with fpspreadsheet or with your own reader? In the latter case, post some compilable code.

Hi master! :)

I'm sorry for the misunderstanding. The topic name is: Raw Data from TDOMNode

So, I want to read the original (untouched) data values from the xml.

So, when this Data is present in the xml:

Quote
<Data ss:Type="String">AA &#10; BB</Data>

When I read it, I want to get exactly this value:
Quote
AA &#10; BB

Sample code attached.

The result:

Quote
GetNodeValue(data_node): "AA
 BB"
data_node.TextContent: "AA
 BB"
After rebuild: String<Data ss:Type="String">AA
 BB</Data>
GetNodeValue(data_node): "AA
 BB"
data_node.TextContent: "AA
 BB"

But I wanted the original value (raw data):

Quote
AA &#10; BB

Thank you :)

Lazarus version : fixes 2.0 branch, fpc version: fixes 3.2 branch.

wp

Re: Raw Data from TDOMNode
« Reply #13 on: July 18, 2019, 07:07:58 pm »
Now I understand: fpspreadsheet must remove the special codes, and this works. But you want to keep them, and this does not work.

Of course you can pass the extracted string to the function UTF8TextToXMLText of unit fpsxmlcommon - it just replaces the line breaks and other special characters by the xml equivalents (set "ProcessLineEndings" to true in order to replace #10 by '&#10;').

Kind of cumbersome though: First the xml reader removes them, and UTF8TextToXMLText brings them back in... It would be better to force the xml reader to keep them in the first place. I don't know, however, how to do this.
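
To make this round trip concrete, here is a minimal sketch of such a re-escaping step. `EscapeXmlText` is a hypothetical stand-in written for this example; it is not the implementation of UTF8TextToXMLText, whose exact behaviour should be checked in fpsxmlcommon. The parameter `AProcessLineEndings` only mimics the option mentioned above.

```pascal
program EscapeSketch;
{$mode objfpc}{$H+}

// Replaces the characters that are special in XML by entity references.
// With AProcessLineEndings = True, #10 and #13 become '&#10;' and '&#13;'.
// Working byte by byte is safe for UTF-8 because all characters handled
// here are plain ASCII and never occur inside a multi-byte sequence.
function EscapeXmlText(const S: String; AProcessLineEndings: Boolean): String;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    case S[i] of
      '&': Result := Result + '&amp;';
      '<': Result := Result + '&lt;';
      '>': Result := Result + '&gt;';
      '"': Result := Result + '&quot;';
      '''': Result := Result + '&apos;';
      #10:
        begin
          if AProcessLineEndings then
            Result := Result + '&#10;'
          else
            Result := Result + S[i];
        end;
      #13:
        begin
          if AProcessLineEndings then
            Result := Result + '&#13;'
          else
            Result := Result + S[i];
        end;
    else
      Result := Result + S[i];
    end;
end;

begin
  // The string below contains a real line feed between 'AA ' and ' BB'.
  WriteLn(EscapeXmlText('AA '#10' BB', True));  // prints: AA &#10; BB
end.
```

Note that this can only restore an equivalent text, not necessarily the original bytes: the parser delivers the same line feed for `&#10;`, `&#xA;` and a literal line break, so the information about which form the file used is already gone at that point.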

But what exactly do you want to achieve? Maybe laz2_dom and laz2_xmlread are not the correct units for your purpose.

The strange output of the RebuildChildNodes procedure is due to the fact that you do not initialize the string parameter (s) passed to this function. RebuildChildNodes is a recursive function and always adds the node name, node attributes, and node content to this string which gets longer with every recursion level. You simply must set s := '' before you call RebuildChildNodes:
Code: Pascal
              if nodeName = 'Data' then
              begin
                s := GetAttrValue(data_node, 'ss:Type');
                if (s = 'String') or (s = 'Number') then
                begin
                  WriteLN(Format('GetNodeValue(data_node): "%s"', [GetNodeValue(data_node)]));
                  WriteLN(Format('data_node.TextContent: "%s"', [data_node.TextContent]));
                  s := '';             // <--------------------- ADDED -----------------<
                  RebuildChildNodes(data_node, s);
                  WriteLN('After rebuild: '+s);
                  WriteLN(Format('GetNodeValue(data_node): "%s"', [GetNodeValue(data_node)]));
                  WriteLN(Format('data_node.TextContent: "%s"', [data_node.TextContent]));

                  ReadLN;
                end
                else
                  WriteLN('');
              end;

totya

Re: Raw Data from TDOMNode
« Reply #14 on: July 18, 2019, 08:05:24 pm »
Quote
But what exactly do you want to achieve? Maybe laz2_dom and laz2_xmlread are not the correct units for your purpose.

I just want to read (Office-created) xml files with the original/untouched values, then modify/select/copy values, and then write them back or write them to a new file. These "special" codes must stay in the data. Besides, the critical characters must be escaped in xml, see:

https://www.w3schools.com/xml/xml_syntax.asp
See the Entity References section.
Unfortunately, these "chars" are converted too when I read the values...
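
If the goal is only to get at the untouched text, one option is to bypass the DOM for that step and cut the text out of the file content directly. The following is a deliberately naive sketch under strong assumptions: `FirstRawData` is a hypothetical helper, the element is assumed to be written as `<Data ...>` without a namespace prefix, no other element name starts with `Data`, no attribute value contains `>`, and the content has no nested `<Data>` element or CDATA section. A real interpreter would need a proper tokenizer.

```pascal
program RawDataSketch;
{$mode objfpc}{$H+}

uses
  StrUtils;  // PosEx

// Returns the text between the first <Data ...> start tag and the next
// </Data> end tag exactly as it appears in AXml, or '' if none is found.
function FirstRawData(const AXml: String): String;
const
  END_TAG = '</Data>';
var
  tagStart, tagEnd, contentEnd: SizeInt;
begin
  Result := '';
  tagStart := Pos('<Data', AXml);
  if tagStart = 0 then
    Exit;
  tagEnd := PosEx('>', AXml, tagStart);    // end of the start tag
  if tagEnd = 0 then
    Exit;
  if AXml[tagEnd - 1] = '/' then           // <Data .../> has no content
    Exit;
  contentEnd := PosEx(END_TAG, AXml, tagEnd + 1);
  if contentEnd = 0 then
    Exit;
  Result := Copy(AXml, tagEnd + 1, contentEnd - tagEnd - 1);
end;

begin
  // prints: AA &#10; BB
  WriteLn(FirstRawData('<Cell><Data ss:Type="String">AA &#10; BB</Data></Cell>'));
end.
```

Since nothing is decoded, entity references such as `&#10;` or `&amp;` come back unchanged, which is what "raw" means in this thread.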

My own interpreter is, as I said, under development... but thanks for all the help and the ideas, master! :) And the great fps component can now handle one more format :)