Lazarus

Programming => General => Topic started by: Germo on November 18, 2024, 03:20:17 pm

Title: Error during XMLREAD
Post by: Germo on November 18, 2024, 03:20:17 pm
For reading an XML text into a DOM document I'm using XMLRead.
Mostly that works all right, except when the source puts a '&' in a node value.
Something like <title>Mr&Mrs Smith</title> gives me an error:
'Project ... raised exception class 'EXMLReadError' with message: In 'stream:' (line .. pos 14): Expected ";"'

Code: Pascal
procedure demonstrate(Name : String);
Var
  SourceS : TStringStream;
  XMLDoc : TXMLDocument;
begin
  SourceS := TStringStream.Create();
  SourceS.LoadFromFile(Name);
  Try
    ReadXMLFile(XMLDoc, SourceS);
....

How can I prevent the error from being raised?
(Modifying the XML source is not an option.)
Title: Re: Error during XMLREAD
Post by: marcov on November 18, 2024, 03:47:12 pm
Afaik & needs to be escaped as &amp;, so that code is probably not proper XML

Code:
"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
Title: Re: Error during XMLREAD
Post by: wp on November 18, 2024, 03:47:27 pm
I am afraid you must modify the file: replace '&' by '&amp;' (https://stackoverflow.com/questions/1091945/what-characters-do-i-need-to-escape-in-xml-documents)
Title: Re: Error during XMLREAD
Post by: TRon on November 18, 2024, 04:05:18 pm
Quote
How can i prevent the error raising?
You can't (well, actually you can, but then the file will not be parsed any further).

Quote
(and modifying the xml-source is not an option)
The only option that I know of is to make a request to one of the working groups and ask whether they want to change the XML standard for your specific use case.

But perhaps there is a more practical way to go about it, e.g. not accepting any files that do not adhere to the XML standard, or writing your own custom XML parser.
Title: Re: Error during XMLREAD
Post by: dsiders on November 18, 2024, 04:40:11 pm
Quote
But perhaps there is a more practical solution to go about it, like f.e. not accepting any files not adhering to the XML standard or write your own custom XML parser.

Or use a SAX-based parser instead of a DOM-based one.
Title: Re: Error during XMLREAD
Post by: Thaddy on November 18, 2024, 06:47:04 pm
Quote
Or using a Sax-based parser instead of a Dom-based one.
Although SAX-based parsing is a bit more tolerant of malformed XML, in the sense that it can probably recover, that does not make the input well-formed XML.
As all the other answers already explained, it is malformed XML. & -> &amp; and that is the only correct answer.
So even SAX-based parsing would be ill-advised. It is simply not XML. It is a look-alike.
Title: Re: Error during XMLREAD
Post by: Germo on November 18, 2024, 06:57:17 pm
Quote
Afaik & needs to be escaped as &amp;, so that code is probably not proper XML

Code:
"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

Are these the only 5 XML constructs that use an ampersand?
Title: Re: Error during XMLREAD
Post by: TRon on November 18, 2024, 07:03:32 pm
Quote
Are these the only 5 xml-instances with an ampersand?
See W3Schools, XML syntax (https://www.w3schools.com/XML/xml_syntax.asp), paragraph "Entity References".

Don't forget to read this (https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references) as well.
Title: Re: Error during XMLREAD
Post by: Germo on November 18, 2024, 07:31:38 pm
Thank you, all
Title: Re: Error during XMLREAD
Post by: wp on November 18, 2024, 07:49:03 pm
Pass your input string (after reading it from the stream) through the following function which replaces isolated '&' by '&amp;':
Code: Pascal
function FixXML(const XMLStr: String): String;
const
  // Typed constant, so that AMP[1] can be passed to Move
  AMP: String = 'amp;';
  // References that are already escaped and must be left alone
  ESCAPED: array[0..4] of String = ('&quot;', '&apos;', '&lt;', '&gt;', '&amp;');
var
  i, j, k: Integer;
  isolated: Boolean;
begin
  SetLength(Result, Length(XMLStr));
  j := 1;
  for i := 1 to Length(XMLStr) do
  begin
    Result[j] := XMLStr[i];
    inc(j);
    if XMLStr[i] = '&' then
    begin
      // The '&' is isolated unless it starts one of the ESCAPED references
      isolated := True;
      for k := 0 to High(ESCAPED) do
        if (Copy(XMLStr, i, Length(ESCAPED[k])) = ESCAPED[k]) then
        begin
          isolated := False;
          break;
        end;
      if isolated then
      begin
        // Grow the result and append 'amp;' after the '&' just copied
        SetLength(Result, Length(Result) + Length(AMP));
        Move(AMP[1], Result[j], Length(AMP));
        inc(j, Length(AMP));
      end;
    end;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  FileS: TFileStream;
  SourceS: TStringStream;
  XMLDoc: TXMLDocument = nil;
  xml: String;
begin
  FileS := TFileStream.Create(FileName, fmOpenRead + fmShareDenyWrite);
  try
    // Read the whole file into a string (xml[1] is invalid for an empty file)
    SetLength(xml, FileS.Size);
    if FileS.Size > 0 then
      FileS.Read(xml[1], FileS.Size);
    SourceS := TStringStream.Create(FixXML(xml));
    try
      ReadXMLFile(XMLDoc, SourceS);
      // do something with the XMLDoc ...
    finally
      XMLDoc.Free;
      SourceS.Free;
    end;
  finally
    FileS.Free;
  end;
end;
Title: Re: Error during XMLREAD
Post by: Germo on November 19, 2024, 09:29:24 am
Quote
Pass your input string (after reading it from the stream) through the following function which replaces isolated '&' by '&amp;':
Code: Pascal
function FixXML(XMLStr: String): String;
...

Thanks for thinking along, but the rest of the string has some node values with CDATA in them. Those ampersands need to remain, so your solution does not work in my case.
But perhaps someone else can use it.

Title: Re: Error during XMLREAD
Post by: Thaddy on November 19, 2024, 11:03:16 am
Quote
These ampersands need to remain, so your solution does not work in my case.
Then do not call it XML, because it isn't.
The last answer by @wp is as close as you can get to repairing something in a wrong format that looks a bit like XML.
In your case you probably need the reverse function too, but that is easy to write based on his code.
You should do that yourself.
Is it too difficult to write:
Code: Pascal
procedure MutilateValidXML(var list: TStringList);
begin
  // Turns every escaped '&amp;' back into a bare '&'; the result is no longer valid XML
  list.Text := StringReplace(list.Text, '&amp;', '&', [rfReplaceAll, rfIgnoreCase]);
end;
That is called a one-liner. Do it for the other escapes too, that makes a five-liner.. :P
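
For illustration, the "five-liner" could look roughly like the sketch below. The function name UnescapePredefined is made up for this example. Two details are assumptions of the sketch and not part of the post above: '&amp;' is handled last, so that text such as '&amp;lt;' is unescaped only once, and rfIgnoreCase is left out because XML entity names are case-sensitive.

```pascal
program UnescapeDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: replaces the five predefined XML entity references
// by the characters they stand for. The output is generally NOT valid XML.
function UnescapePredefined(const S: String): String;
begin
  Result := StringReplace(S,      '&quot;', '"',  [rfReplaceAll]);
  Result := StringReplace(Result, '&apos;', '''', [rfReplaceAll]);
  Result := StringReplace(Result, '&lt;',   '<',  [rfReplaceAll]);
  Result := StringReplace(Result, '&gt;',   '>',  [rfReplaceAll]);
  // '&amp;' goes last: otherwise '&amp;lt;' would first become '&lt;'
  // and then, wrongly, '<'.
  Result := StringReplace(Result, '&amp;',  '&',  [rfReplaceAll]);
end;

begin
  WriteLn(UnescapePredefined('Mr&amp;Mrs &lt;Smith&gt;'));
  WriteLn(UnescapePredefined('&amp;lt;'));
end.
```

Numeric character references such as '&#38;' are not touched by this sketch.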
Title: Re: Error during XMLREAD
Post by: wp on November 19, 2024, 11:24:49 am
Quote
the rest of the string has some nodevalue's with CDATA in it. These ampersands need to remain, so your solution does not work in my case.
Then take it as an idea how to extend it yourself.
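
One possible way to extend the FixXML idea is sketched below; it is only a starting point under stated assumptions, not a tested repair tool. The names FixXMLKeepCData and IsReference are made up for this example. The sketch copies CDATA sections verbatim, leaves the five predefined references and numeric character references (such as '&#38;' or '&#x26;') alone, and escapes every other '&'. It does not treat comments or processing instructions specially.

```pascal
program FixAmpersands;
{$mode objfpc}{$H+}
uses
  SysUtils, StrUtils;

// True if the '&' at position P starts a predefined entity reference
// or a numeric character reference (&#38; or &#x26;).
function IsReference(const S: String; P: Integer): Boolean;
const
  PREDEFINED: array[0..4] of String = ('&quot;', '&apos;', '&lt;', '&gt;', '&amp;');
var
  k, i, Start: Integer;
  IsHex: Boolean;
begin
  for k := 0 to High(PREDEFINED) do
    if Copy(S, P, Length(PREDEFINED[k])) = PREDEFINED[k] then
      Exit(True);
  Result := False;
  if Copy(S, P, 2) <> '&#' then
    Exit;
  i := P + 2;
  IsHex := (i <= Length(S)) and (S[i] = 'x');
  if IsHex then
    Inc(i);
  Start := i;
  while (i <= Length(S)) and
        ((S[i] in ['0'..'9']) or (IsHex and (S[i] in ['a'..'f', 'A'..'F']))) do
    Inc(i);
  // At least one digit, followed by ';'
  Result := (i > Start) and (i <= Length(S)) and (S[i] = ';');
end;

// Escapes isolated '&' characters, but copies CDATA sections unchanged.
function FixXMLKeepCData(const S: String): String;
const
  CDATA_OPEN  = '<![CDATA[';
  CDATA_CLOSE = ']]>';
var
  i, CloseAt, EndPos: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    if Copy(S, i, Length(CDATA_OPEN)) = CDATA_OPEN then
    begin
      CloseAt := PosEx(CDATA_CLOSE, S, i);
      if CloseAt = 0 then
        EndPos := Length(S)            // unterminated section: copy the rest
      else
        EndPos := CloseAt + Length(CDATA_CLOSE) - 1;
      Result := Result + Copy(S, i, EndPos - i + 1);
      i := EndPos + 1;
      Continue;
    end;
    Result := Result + S[i];
    if (S[i] = '&') and not IsReference(S, i) then
      Result := Result + 'amp;';
    Inc(i);
  end;
end;

begin
  WriteLn(FixXMLKeepCData('<t>Mr&Mrs <![CDATA[a&b]]> &lt;</t>'));
end.
```

In the sample call, only the '&' between "Mr" and "Mrs" is escaped; the ampersand inside the CDATA section and the existing '&lt;' stay as they are.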
Title: Re: Error during XMLREAD
Post by: Thaddy on November 19, 2024, 11:30:14 am
For that case you can use the type helper for string: the parse overloads.
Title: Re: Error during XMLREAD
Post by: BeniBela on November 19, 2024, 06:20:46 pm
Quote
or write your own custom XML parser.

I did that: https://github.com/benibela/internettools

Quote
Code:
procedure MutilateValidXML(var list:Tstringlist);
begin
   list.text := stringreplace(list.text, '&amp;','&',[rfReplaceAll,rfIgnorecase]);
end;
That is called a one-liner. Do it for the other escapes too, that makes a five liner.. :P

It already looks like 4 lines.
Title: Re: Error during XMLREAD
Post by: Thaddy on November 19, 2024, 06:59:51 pm
It will also mutilate your XML, and whitespace and no-ops do not count... 8-) :P
The problem remains that he thinks he is handling XML, but he is not.

(Btw, your parser can recover in most cases from badly formatted XML-like stuff; I tried it.)

But when will OP finally understand that he is not handling XML but something that resembles it?
And that he then loses many of its advantages (personal remark: if any), because he now needs to jump through hoops to first produce proper XML from malformed input and then read it back to interpret it the way he wants? That seems like lost time for him.
Title: Re: Error during XMLREAD
Post by: TRon on November 19, 2024, 07:06:42 pm
@BeniBela:
Yes, I am aware (your tools are actually very helpful btw)

But I side with Thaddy on this one. You can only learn from mistakes if you solve them yourself. Note that TS seems to refuse to budge from that viewpoint in any way.

(And yes, I am aware that half, if not more, of the internet does not take these standards seriously.)
Title: Re: Error during XMLREAD
Post by: Germo on November 19, 2024, 11:33:42 pm
Quote
But when does OP finally understand he is not handling XML but something that resembles it?
Thanks for knowing what I do or do not understand.

Quote
Note that TS seem to refuse to budge the viewpoint in any way.
And what gave you that idea?

I just asked a question on this forum, and when I accept the outcome I just get a load of shit over me?

My solution:
  The XML is not valid. I catch the raised error and leave it at that.
  The next XML I read may be valid, so I use that one for my app.
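
A minimal sketch of that approach (catch the error, skip the document) could look like this. The function name TryReadXML and the sample strings are made up for illustration; ReadXMLFile and EXMLReadError come from the XMLRead unit, as in the code earlier in this thread.

```pascal
program CatchXmlError;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, DOM, XMLRead;

// Hypothetical helper: returns True and a document if Source parses,
// False (and Doc = nil) if XMLRead rejects the text.
function TryReadXML(const Source: String; out Doc: TXMLDocument): Boolean;
var
  S: TStringStream;
begin
  Result := False;
  Doc := nil;
  S := TStringStream.Create(Source);
  try
    try
      ReadXMLFile(Doc, S);
      Result := True;
    except
      on E: EXMLReadError do
      begin
        // Malformed input: report it and give up on this text only
        WriteLn('Skipping invalid XML: ', E.Message);
        FreeAndNil(Doc);
      end;
    end;
  finally
    S.Free;
  end;
end;

var
  Doc: TXMLDocument;
begin
  // Rejected: the bare '&' is not allowed
  if not TryReadXML('<title>Mr&Mrs Smith</title>', Doc) then
    WriteLn('first text skipped');
  // Accepted: the '&' is escaped
  if TryReadXML('<title>Mr&amp;Mrs Smith</title>', Doc) then
  begin
    WriteLn(Doc.DocumentElement.TextContent);
    Doc.Free;
  end;
end.
```

Only EXMLReadError is caught here, so other exceptions (for example I/O errors) still propagate to the caller.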


Title: Re: Error during XMLREAD
Post by: jamie on November 20, 2024, 01:08:00 am
If you are creating these original XML strings yourself, and insist on using escape characters because, as you said, some of the data is in CDATA:

I would suggest looking into Base64 to represent the string. You then only need a Base64 decoder to read it back.

If memory serves, the Base64 alphabet does not include '&', so a '&' in the original text survives the conversion without having to be escaped.

Jamie
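
As a rough sketch of the Base64 suggestion, assuming the producer of the XML can be changed (the sample text and program name are made up): the base64 unit that ships with Free Pascal provides EncodeStringBase64 and DecodeStringBase64, and the encoded text contains only letters, digits, '+', '/' and '=', so it needs no XML escaping.

```pascal
program Base64Demo;
{$mode objfpc}{$H+}
uses
  base64;

var
  Encoded: String;
begin
  // Producer side: encode the node value before writing it into the XML
  Encoded := EncodeStringBase64('Mr&Mrs Smith');
  WriteLn('<title>', Encoded, '</title>');
  // Consumer side: decode the node value after parsing
  WriteLn(DecodeStringBase64(Encoded));
end.
```

This only helps when both the writer and the reader of the XML agree on the encoding.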
Title: Re: Error during XMLREAD
Post by: Germo on November 20, 2024, 03:21:53 am
Quote
If you are creating these original XML strings, ..........

Nobody said I am creating these strings. I only read them (from the internet).