Error in ReadHTMLFile

dseligo:
The ReadHTMLFile procedure (from the SAX_HTML unit) throws an exception when reading this HTML:

--- Code: HTML5 ---
<!DOCTYPE html><html lang="en"><head><title>Test</title></head><body><div title="test<"></div></body></html>
--- End code ---
https://validator.w3.org reports no errors or warnings for the above HTML.

Should I report a bug?

Test program:

--- Code: Pascal ---
program htmlproblem;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, DOM, DOM_HTML, SAX_HTML;

var
  // note the '<' inside the quoted title attribute value
  sHTML: String =
    '<!DOCTYPE html><html lang="en"><head><title>Test</title></head><body>' +
    '<div title="test<"></div>' +
    '</body></html>';
  HTMLDocument: THTMLDocument;
  iPos: Integer;

procedure ReadHTML(AHTML: String);
var
  Stream: TStringStream;
begin
  WriteLn('Reading: ', AHTML);
  HTMLDocument := nil;
  Stream := TStringStream.Create(AHTML);
  try
    try
      ReadHTMLFile(HTMLDocument, Stream);
      WriteLn('Read OK.');
    except
      on E: Exception do
      begin
        WriteLn('Read not OK: ', E.Message);
        if E is EDOMError then
          WriteLn('EDOMError code: ', (E as EDOMError).Code);
      end;
    end;
    WriteLn;
  finally
    // free both the document and the input stream
    HTMLDocument.Free;
    Stream.Free;
  end;
end;

begin
  ReadHTML(sHTML);

  // remove the less-than sign from the attribute value and read again
  iPos := Pos('<">', sHTML);
  Delete(sHTML, iPos, 1);

  ReadHTML(sHTML);
end.
--- End code ---

marcov:
Usually < and > and & need to be escaped.
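For HTML you generate yourself, escaping these characters could look like the minimal sketch below. EscapeHTMLAttr is a hypothetical helper written for this illustration, not a routine from SAX_HTML or DOM; it also escapes the double quote, which is required inside double-quoted attribute values.

```pascal
program escapeattr;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of SAX_HTML or DOM): escapes the characters
// that are usually escaped in HTML text and double-quoted attribute values.
function EscapeHTMLAttr(const S: String): String;
begin
  // '&' must be replaced first, otherwise the '&' of an already inserted
  // '&lt;' would be escaped a second time
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
  Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll]);
end;

begin
  // prints: <div title="test&lt;"></div>
  WriteLn('<div title="', EscapeHTMLAttr('test<'), '"></div>');
end.
```

The order of the replacements is the only subtle point: doing '&' last would turn '&lt;' into '&amp;lt;'.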

rvk:
Mentioning the exact exception would be helpful too ;)

Technically, < is allowed in attribute values.
But it is strongly discouraged, and you are advised to encode such characters (as in normal text).

But yes, it is allowed, so if that's the reason for the exception, you could technically consider it a bug.

dseligo:

--- Quote from: rvk on November 15, 2023, 10:58:46 pm ---
Mentioning the exact exception would be helpful too ;)

--- End quote ---

The exception is 'EDOMError in DOMDocument.CreateElement', and the code is 5 (INVALID_CHARACTER_ERR).


--- Quote ---
But yes, it is allowed, so if that's the reason for the exception, you could technically consider it a bug.

--- End quote ---

I created a bug report: https://gitlab.com/freepascal.org/fpc/source/-/issues/40523

I forgot to mention the compiler version: Free Pascal Compiler version 3.3.1-14158-g6fda6f79d8 [2023/10/23] for x86_64.

dseligo:

--- Quote from: marcov on November 15, 2023, 10:50:09 pm ---
Usually < and > and & need to be escaped.

--- End quote ---

I have no control over the HTML in question, but (as rvk also said) the less-than sign doesn't need to be escaped if the attribute value is quoted.
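Since the HTML cannot be changed at the source, one possible workaround is to pre-process the string before passing it to ReadHTMLFile, escaping only the '<' and '>' characters that sit inside quoted attribute values. This is a minimal sketch that assumes, as the test program suggests, that the raw '<' in the attribute value is what triggers the exception. EscapeLtInAttributes is a hypothetical helper, and its scanner is deliberately naive: it does not understand comments, <script> or <style> content. Whether the parsed attribute value contains '<' again afterwards depends on how SAX_HTML decodes entity references, which is worth checking.

```pascal
program escapeattrvalues;

{$mode objfpc}{$H+}

// Hypothetical pre-processing helper, not part of SAX_HTML: escapes '<' and
// '>' found inside quoted attribute values. Deliberately naive: it assumes
// no comments, <script> or <style> content containing quotes or '<'.
function EscapeLtInAttributes(const AHTML: String): String;
type
  TScanState = (ssText, ssTag, ssQuoted);
var
  State: TScanState;
  QuoteChar: Char;
  AfterEquals: Boolean;
  Ch: Char;
begin
  Result := '';
  State := ssText;
  QuoteChar := '"';
  AfterEquals := False;
  for Ch in AHTML do
    case State of
      ssText:
        begin
          // a '<' in text starts a tag
          if Ch = '<' then
          begin
            State := ssTag;
            AfterEquals := False;
          end;
          Result := Result + Ch;
        end;
      ssTag:
        begin
          if Ch = '>' then
            State := ssText
          else if Ch = '=' then
            AfterEquals := True
          else if AfterEquals and ((Ch = '"') or (Ch = '''')) then
          begin
            // a quote right after '=' opens an attribute value
            State := ssQuoted;
            QuoteChar := Ch;
          end
          else if not (Ch in [' ', #9, #10, #13]) then
            AfterEquals := False;
          Result := Result + Ch;
        end;
      ssQuoted:
        begin
          if Ch = QuoteChar then
          begin
            // the matching quote closes the value
            State := ssTag;
            AfterEquals := False;
            Result := Result + Ch;
          end
          else if Ch = '<' then
            Result := Result + '&lt;'
          else if Ch = '>' then
            Result := Result + '&gt;'
          else
            Result := Result + Ch;
        end;
    end;
end;

begin
  // prints the test document with title="test&lt;"
  WriteLn(EscapeLtInAttributes(
    '<!DOCTYPE html><html lang="en"><head><title>Test</title></head><body>' +
    '<div title="test<"></div></body></html>'));
end.
```

In the test program, this would be used as ReadHTML(EscapeLtInAttributes(sHTML)). Only characters inside quoted values are touched, so the tag structure the parser sees is unchanged.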
