This has turned into a mess ...
Ask: Given a string, I want to determine whether a right single curly quote (’) is an apostrophe ...
Problem: I’ve landed in a UTF swamp ...
The code for that function is below. Obviously, it’s a heuristic, since there’s no foolproof way in English to tell whether a quote is an apostrophe or the close of a quoted sentence:
‘Hello, how are you?’ ==> simple quote
I wouldn’t ==> apostrophe
Whatcha doin’ ==> apostrophe (missing ?)
In any event, my problem is with the conversion between different string types: specifically, I want to use Character.IsLetter, which requires a WideChar. I’ve used UTF8CodePointToUnicode to obtain the code point and then performed a straight WideChar conversion (WideChar(CodePoint)) to convert it into a WideChar, but that’s failing.
function IsApostrophe(const S: String; const aPos: Integer): Boolean;
var
  aPrevChar: WideChar;
  aNextChar: WideChar;
  CodePoint: LongInt;
  aLen: Integer;
begin
  If UTF8CompareStr(UTF8Copy(S,aPos,1),RightCurlySingleQuote) <> 0 then Exit(False);
  aLen := UTF8Length(S);
  //If at start or end, assume it’s a quote (even if it’s the wrong quote.)
  If (aPos = 1) or (aPos = aLen)
    then Exit(False);
  UTF8CodepointToUnicode(@S[aPos-1], Codepoint);
  aPrevChar := WideChar(CodePoint);
  UTF8CodepointToUnicode(@S[aPos+1], Codepoint);
  aNextChar := WideChar(CodePoint);
  If Not Character.IsLetter(aNextChar)
    then Result := (Lowercase(aPrevChar) = 's') or
                   ((aPos > 3) and (Lowercase(UTF8Copy(S,aPos-2,2)) = 'in')) or
                   IsDigit(aNextChar)
  else If IsLetter(aPrevChar)
    then Exit(True)
  else if (LowerCase(UTF8Copy(S,aPos+1,3)) = 'tis'  ) or
          (LowerCase(UTF8Copy(S,aPos+1,4)) = 'twas' ) or
          (LowerCase(UTF8Copy(S,aPos+1,5)) = 'cause') or
          (LowerCase(UTF8Copy(S,aPos+1,2)) = 'em'   ) or
          (LowerCase(UTF8Copy(S,aPos+1,3)) = 'til'  ) or
          (LowerCase(UTF8Copy(S,aPos+1,5)) = 'round') or
          (LowerCase(UTF8Copy(S,aPos+1,4)) = 'fore' )
    then Exit(True);
end;
Suggestions?
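Two things in the function above may be worth checking. Both rest on the assumption that LazUTF8 declares the conversion as `UTF8CodepointToUnicode(p: PChar; out CodepointLen: integer): Cardinal`. If so, the code point is the function's return value, and the `out` parameter receives only the byte length of the sequence, so `WideChar(CodePoint)` would be converting a length of 1 to 4 rather than a character. Separately, `S[aPos-1]` and `S[aPos+1]` index bytes, while `aPos` is treated elsewhere (via `UTF8Copy`) as a code point index. The sketch below shows one way to fetch a neighbouring code point consistently. `CodepointAt` is a hypothetical helper, not part of the original code or of LazUTF8.

```pascal
program CodepointDemo;
{$mode objfpc}{$H+}

uses
  LazUTF8;  // assumed: the Lazarus LazUtils package is on the unit path

// Hypothetical helper: returns the Unicode code point at the 1-based
// code point index aPos of the UTF-8 string S, or 0 if aPos is out of range.
function CodepointAt(const S: String; aPos: Integer): Cardinal;
var
  Ch: String;
  ByteLen: Integer;
begin
  // UTF8Copy counts code points, matching how aPos is used in IsApostrophe.
  Ch := UTF8Copy(S, aPos, 1);
  if Ch = '' then
    Exit(0);
  // The code point is the function result; ByteLen only receives the
  // number of bytes the sequence occupies.
  Result := UTF8CodepointToUnicode(PChar(Ch), ByteLen);
end;

var
  S: String;
begin
  // "I wouldn’t", with the curly quote written out as its UTF-8 bytes
  S := 'I wouldn'#$E2#$80#$99't';
  WriteLn(CodepointAt(S, 8));   // the "n" before the quote
  WriteLn(CodepointAt(S, 9));   // the quote itself, U+2019
  WriteLn(CodepointAt(S, 10));  // the "t" after the quote
end.
```

The returned value can then be cast with `WideChar(...)` before calling `Character.IsLetter`, though the cast is only meaningful for code points up to U+FFFF; anything above that needs a surrogate pair.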