The warning is confusing! Please help
EganSolo:
I have to process text that contains non-ASCII characters. I understand that the String type in Free Pascal is keyed to UTF-8 by default. I don’t require Delphi compatibility, so {$mode delphi} isn’t essential for me.
Consider this bit of code:
--- Code: Pascal ---
program UniCode;
{$H+}{$codepage utf8}
uses Classes, SysUtils, LazUTF8;

function IsNonAscii(const S : String; const aPos: integer): Boolean;
const NonAsciiChar = 'é';
//comment the line above and uncomment the line below to get rid of the warning:
//const NonAsciiChar : String = 'é';
var c : String;
begin
  c := UTF8Copy(s,apos,1);
  Result := c = NonAsciiChar;
end;

begin
end.
--- End code ---
When I compile this code, the compiler issues the following warning:
UniCode.lpr(12,13) Warning: Implicit string type conversion from "AnsiString" to "WideString"
However, if I switch to the typed constant, the warning disappears.
I don’t get it: even though I’m specifying {$codepage utf8}, why does the compiler default string constants to WideString instead of String? Is there a way to change that?
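One way to sidestep the question of what type the string constant gets is to compare code points instead of strings. The sketch below is a hypothetical alternative, not something proposed in the thread: it decodes the UTF-8 text with the RTL's UTF8Decode (system unit, so LazUTF8 is not needed) and compares the code point at aPos with U+00E9. IsEAcute and BytesToString are made-up names, and it assumes valid UTF-8 input containing only Basic Multilingual Plane characters, so that the aPos-th UTF-16 unit is the aPos-th character.

```pascal
program CodePointCheck;
{$mode objfpc}{$H+}

{ Hypothetical helper: builds a String from raw bytes so the test input
  does not depend on how the source file or a $codepage directive is read. }
function BytesToString(const Bytes: array of Byte): String;
var
  i: Integer;
begin
  SetLength(Result, Length(Bytes));
  for i := 0 to High(Bytes) do
    Result[i + 1] := Chr(Bytes[i]);
end;

{ Hypothetical alternative to IsNonAscii: decode the UTF-8 bytes to UTF-16
  and compare the code point at aPos with U+00E9 (e with acute accent).
  No string literal is compared, so no implicit AnsiString/WideString
  conversion is needed.
  Assumes valid UTF-8 and BMP-only text (no surrogate pairs). }
function IsEAcute(const S: String; const aPos: Integer): Boolean;
var
  W: UnicodeString;
begin
  W := UTF8Decode(S);  { String passes to the RawByteString parameter as-is }
  { Range check first; "and" short-circuits under the default $B- setting }
  Result := (aPos >= 1) and (aPos <= Length(W)) and (Ord(W[aPos]) = $E9);
end;

var
  S: String;
begin
  S := BytesToString([$63, $61, $66, $C3, $A9]);  { "cafe" + e-acute, as UTF-8 }
  WriteLn('Bytes: ', Length(S));            { 5: the accented e takes two bytes }
  WriteLn('Position 4: ', IsEAcute(S, 4));  { TRUE }
  WriteLn('Position 3: ', IsEAcute(S, 3));  { FALSE }
end.
```

Note that Length(S) counts bytes (5), while the decoded UnicodeString has 4 characters, which is why a plain Copy(S, aPos, 1) would not work on the UTF-8 string directly.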
Martin_fr:
AFAIK, the codepage directive does not control what type your constant gets.
It controls whether the bytes in your source file are converted, and from which code page.
See this example:
--- Code: Pascal ---
program P1;

{$Codepage cp1250}
{ $codepage utf8}  // the space after "{" turns this directive into a plain comment

type
  U8 = type AnsiString(CP_UTF8);  // string type whose code page is fixed to UTF-8
var
  s: U8;
begin
  s := #$B1#$B1;
  writeln(s);
end.
--- End code ---
The string is always UTF-8 (U8 is declared as AnsiString(CP_UTF8)).
But with that directive your source is not UTF-8, so the $B1 (the same would happen if you actually had one or two single characters with that ordinal value in the source) is converted from code page CP1250. It prints:
--- Code: Text ---
±±
--- End code ---
But if you declared the source to be UTF-8 already ({$codepage utf8}), then $B1 is invalid UTF-8, and it prints ??
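To see what "converted from CP1250" means at the byte level: in Windows code page 1250, byte $B1 is '±' (U+00B1), and UTF-8 encodes that code point as the two bytes $C2 $B1. The sketch below is not from the thread; it builds the text from code points and encodes it with the RTL's UTF8Encode, so no $codepage directive or source encoding is involved. Assuming the compile-time conversion works as described above, these four bytes are what s holds in P1 under {$Codepage cp1250}.

```pascal
program PlusMinusBytes;
{$mode objfpc}{$H+}

uses
  SysUtils;  { IntToHex }

var
  W: UnicodeString;
  U: RawByteString;
  i, b: Integer;
begin
  { Byte $B1 in code page 1250 is the character U+00B1 (plus-minus sign).
    Build "±±" from code points so the source file encoding does not matter. }
  SetLength(W, 2);
  W[1] := WideChar($00B1);
  W[2] := WideChar($00B1);

  { UTF8Encode (system unit) converts UTF-16 text to UTF-8 bytes }
  U := UTF8Encode(W);

  for i := 1 to Length(U) do
  begin
    b := Ord(U[i]);
    Write(IntToHex(b, 2), ' ');
  end;
  WriteLn;                              { C2 B1 C2 B1 }
  WriteLn('Byte length: ', Length(U));  { 4 }
end.
```

Two characters become four bytes, while the raw $B1 $B1 pair is only two bytes and is not a valid UTF-8 sequence on its own.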
Btw, you can also give the constant its type with a typecast:
--- Code: Pascal ---
const NonAsciiChar = string('é');
--- End code ---