I am new to Indy and do not know how to solve this problem.
I'm not sure whether to post this question on the Lazarus forum or on the Indy forum.
I need to read the contents of an HTML page (the request takes 3 parameters) to extract some data, and I am using TIdHTTP for it.
The code I have is like this:
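```pascal
param := TStringList.Create; // form fields, posted as application/x-www-form-urlencoded
try
  param.Add('Param1=value1');
  param.Add('Param2=value2');
  param.Add('Param3=value3');
  Memo1.Text := idHTTP1.Post(MYURL, param);
finally
  param.Free; // released even if Post raises an exception
end;
```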
For security reasons I cannot show the real parameters or the URL.
The code works: I get a response and receive the HTML. But during some tests I noticed something strange. Some of the data I need contains Latin characters such as Ñ, but in the HTML a question mark (?) appears instead of the Ñ. I am Argentine, so it is important for me to recover the Ñ as well as the other "special" characters used in Spanish.
For example:
```html
<td align="center" bgColor="ghostwhite" colspan="5"><b><font face="Arial, Helvetica, sans-serif" size="1" > VILLAFA?E BLANCA </font> </b></td>
```
It should read VILLAFAÑE BLANCA.
This data is extracted from a database, and its encoding is unknown to me.
In other parts of the HTML, where the text is fixed rather than generated dynamically, the HTML entities come through perfectly. For example:
```html
<td align="center" bgColor="gray"><font face="Arial, Helvetica, sans-serif" size="1" color="#FFFFFF">Tipo Trámite </font></td>
```
In the page's META section I find this:
```html
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
```
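The charset can be declared in two places, the HTTP `Content-Type` response header and this `<meta>` tag, and the two do not always agree. A minimal Free Pascal sketch for pulling the `charset` parameter out of either value so they can be compared; `ExtractCharset` is a hypothetical helper, not part of Indy or LazUtils, and it ignores the finer quoting rules of HTTP.

```pascal
program ExtractCharsetDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: returns the charset parameter of a Content-Type value
  such as 'text/html; charset=iso-8859-1', lower-cased and without quotes.
  Returns '' when there is no charset parameter. Simplified parsing only. }
function ExtractCharset(const ContentType: string): string;
var
  P, E: Integer;
begin
  Result := '';
  // LowerCase only changes ASCII letters, so positions stay the same
  P := Pos('charset=', LowerCase(ContentType));
  if P = 0 then
    Exit;
  Result := Copy(ContentType, P + Length('charset='), MaxInt);
  E := Pos(';', Result); // stop at the next parameter, if any
  if E > 0 then
    SetLength(Result, E - 1);
  Result := LowerCase(Trim(Result));
  if (Length(Result) >= 2) and (Result[1] = '"') and (Result[Length(Result)] = '"') then
    Result := Copy(Result, 2, Length(Result) - 2); // strip surrounding quotes
end;

begin
  WriteLn(ExtractCharset('text/html; charset=iso-8859-1')); // iso-8859-1
  WriteLn(ExtractCharset('text/html'));                      // empty line
end.
```

The same function can be applied to the value of the `<meta>` tag above and to the raw `Content-Type` header of the response; if the header has no charset at all, the client has to guess.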
A colleague on another forum suggested that the problem may be on the web server side: the database uses UTF-8 while the page declares ISO-8859-1, so the data I read ends up double-encoded.
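To check whether that theory fits the symptom, it helps to look at what double encoding does to the bytes. A minimal Free Pascal sketch (`Latin1ToUtf8` and `HexOf` are hypothetical helpers) that encodes Ñ from ISO-8859-1 to UTF-8 once, then again as if the UTF-8 bytes were Latin-1:

```pascal
program DoubleEncodingDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Encodes ISO-8859-1 bytes as UTF-8. Each Latin-1 byte equals the Unicode
  code point with the same value, so bytes >= $80 become two bytes. }
function Latin1ToUtf8(const Src: array of Byte): TBytes;
var
  I, N: Integer;
begin
  SetLength(Result, Length(Src) * 2); // worst case: every byte needs two
  N := 0;
  for I := 0 to High(Src) do
    if Src[I] < $80 then
    begin
      Result[N] := Src[I];
      Inc(N);
    end
    else
    begin
      Result[N] := $C0 or (Src[I] shr 6);
      Result[N + 1] := $80 or (Src[I] and $3F);
      Inc(N, 2);
    end;
  SetLength(Result, N);
end;

{ Formats bytes as space-separated upper-case hex, e.g. 'C3 91'. }
function HexOf(const B: array of Byte): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(B) do
    Result := Result + IntToHex(B[I], 2) + ' ';
  Result := Trim(Result);
end;

const
  EnyeLatin1: array[0..0] of Byte = ($D1); // 'Ñ' in ISO-8859-1 and Windows-1252
var
  Once, Twice: TBytes;
begin
  Once := Latin1ToUtf8(EnyeLatin1); // correct UTF-8 for 'Ñ'
  Twice := Latin1ToUtf8(Once);      // UTF-8 mistaken for Latin-1 and encoded again
  WriteLn(HexOf(Once));  // C3 91
  WriteLn(HexOf(Twice)); // C3 83 C2 91
end.
```

Double encoding adds bytes, so it normally surfaces as several wrong characters (mojibake such as `Ã‘`) rather than as a single `?`; a lone `?` looks more like a byte that a decoder could not map.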
The strange thing is that I also tried converting the returned HTML to UTF-8, but my test reports that it is already UTF-8 and needs no conversion:
```pascal
// GuessEncoding, ConvertEncoding and EncodingUTF8 come from the LConvEncoding unit (LazUtils)
var
  param, html1, html2: TStringList;
  encode: string;
begin
  param := TStringList.Create;
  html1 := TStringList.Create;
  try
    param.Add(xxx1);
    param.Add(xxx2);
    param.Add(xxx3);
    html1.Text := idHTTP1.Post(MYURL, param);
    // GuessEncoding inspects the string that Post has already decoded
    encode := GuessEncoding(html1.Text);
    if encode <> EncodingUTF8 then
    begin
      html2 := TStringList.Create;
      try
        html2.Text := ConvertEncoding(html1.Text, encode, EncodingUTF8);
        html2.SaveToFile(MYTEST);
        ShowMessage('Converted to UTF-8');
      finally
        html2.Free;
      end;
    end
    else
    begin
      html1.SaveToFile(MYTEST);
      ShowMessage('Already UTF-8');
    end;
  finally
    html1.Free;
    param.Free;
  end;
end;
```
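The "Already UTF-8" result may be expected rather than contradictory: `GuessEncoding` only sees the string returned by `Post`, and if the Ñ has already become `?` (and the other accented letters arrive as HTML entities), the text is plain ASCII, which is also valid UTF-8. A simplified byte-level sketch contrasting raw ISO-8859-1 bytes with the already-decoded text; `IsValidUtf8` is a hypothetical helper, not the LazUtils implementation.

```pascal
program Utf8CheckDemo;
{$mode objfpc}{$H+}

{ Hypothetical, simplified UTF-8 check on raw bytes: validates lead bytes and
  continuation bytes only (no overlong or surrogate checks). }
function IsValidUtf8(const B: array of Byte): Boolean;
var
  I, J, Need: Integer;
begin
  Result := False;
  I := 0;
  while I <= High(B) do
  begin
    case B[I] of
      $00..$7F: Need := 0;
      $C2..$DF: Need := 1;
      $E0..$EF: Need := 2;
      $F0..$F4: Need := 3;
    else
      Exit; // not a valid lead byte
    end;
    if I + Need > High(B) then
      Exit; // sequence cut off at the end of the data
    for J := I + 1 to I + Need do
      if (B[J] < $80) or (B[J] > $BF) then
        Exit; // expected a continuation byte
    Inc(I, Need + 1);
  end;
  Result := True;
end;

const
  // 'VILLAFAÑE' as ISO-8859-1 bytes (Ñ = $D1)
  RawLatin1: array[0..8] of Byte = ($56, $49, $4C, $4C, $41, $46, $41, $D1, $45);
  // the same text after the Ñ has been replaced by '?' ($3F)
  AlreadyDecoded: array[0..8] of Byte = ($56, $49, $4C, $4C, $41, $46, $41, $3F, $45);
begin
  WriteLn(IsValidUtf8(RawLatin1));      // returns False: $D1 is followed by 'E', not a continuation byte
  WriteLn(IsValidUtf8(AlreadyDecoded)); // returns True: plain ASCII is valid UTF-8
end.
```

In other words, an encoding guess made after the `?` has appeared cannot recover the original Ñ; the check has to be done on the bytes before they become a string.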
I have also tried converting from Windows-1252 to UTF-8 and from UTF-8 to ISO-8859-1, without success.
If I open the page in Firefox or Chrome and view the HTML source, the letter Ñ appears correctly! I do not understand what is happening in TIdHTTP: is it a bug, or am I doing something wrong?
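A browser decodes the raw bytes itself, using the HTTP header and the `<meta>` tag. To see what actually arrives before anything is turned into a string, one option is the `TIdHTTP.Post` overload that writes the response into a `TStream`. A minimal console sketch, assuming `MYURL` is replaced by the real (private) URL and the parameter values are placeholders; `ListNonAscii` is a hypothetical helper.

```pascal
program RawPostDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, IdHTTP;

const
  // Hypothetical placeholder: the real URL and parameters are private.
  MYURL = 'http://example.com/consulta';

{ Hypothetical helper: lists every byte >= $80 as 'offset=$XX'. }
function ListNonAscii(const Data: array of Byte): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Data) do
    if Data[I] >= $80 then
    begin
      if Result <> '' then
        Result := Result + ', ';
      Result := Result + IntToStr(I) + '=$' + IntToHex(Data[I], 2);
    end;
end;

var
  HTTP: TIdHTTP;
  Params: TStringList;
  Body: TMemoryStream;
  Data: TBytes;
begin
  HTTP := TIdHTTP.Create(nil);
  Params := TStringList.Create;
  Body := TMemoryStream.Create;
  try
    Params.Add('Param1=value1');
    Params.Add('Param2=value2');
    Params.Add('Param3=value3');
    // The TStream overload stores the body without converting it to a
    // string, so no charset decoding is applied to it.
    HTTP.Post(MYURL, Params, Body);
    WriteLn('Content-Type header: ', HTTP.Response.RawHeaders.Values['Content-Type']);
    WriteLn('Charset parsed by Indy: ', HTTP.Response.CharSet);
    // Copy the raw body into a byte array for inspection
    SetLength(Data, Body.Size);
    if Body.Size > 0 then
    begin
      Body.Position := 0;
      Body.ReadBuffer(Data[0], Body.Size);
    end;
    WriteLn('Non-ASCII bytes: ', ListNonAscii(Data));
  finally
    Body.Free;
    Params.Free;
    HTTP.Free;
  end;
end.
```

If the Ñ shows up as a single `$D1` byte and the header charset is empty or not `iso-8859-1`, that would suggest the loss happens when the response is decoded into a string; if it shows up as `$C3 $91`, the data is UTF-8 despite the `<meta>` declaration.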
I am currently using CodeTyphon 5.1 with Indy dated 24/10/2014 (dd/mm/yyyy), SVN revision 5201, on Windows 8.1.
Could it also be an OS encoding problem? This code:
```pascal
ShowMessage(GetDefaultTextEncoding);
```
shows that my system uses cp1252.
I hope this information is enough to understand my problem.
If you need further explanation or details, please let me know; I would be grateful.
Regards,