* * *

Author Topic: TFileStream with Unicode file/path characters  (Read 4293 times)

ausdigi

TFileStream with Unicode file/path characters
« on: July 31, 2012, 06:24:27 am »
I ran into a situation where file names might contain "extended" Unicode characters (i.e. not something like the degree symbol ° but multi-byte characters/symbols, e.g. Alt+NumPlus+Num3). Once I got around the problems by using FindFirstUtf8 instead of FindFirst, I still couldn't open these files with a TFileStream. Using Utf8ToSys replaces the extended Unicode characters with question marks (it did convert the degree symbol, I think).
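
As a small illustration of why the wide route keeps these characters, the sketch below round-trips a name containing U+2603 through UTF-8 and back to UTF-16 using the RTL's UTF8Encode and UTF8Decode. The sample name is made up for the example, and the ANSI conversion is deliberately left out because its result depends on the system code page.

```pascal
program Utf8RoundTrip;

{$mode objfpc}{$H+}

var
  Wide, Back : UnicodeString;
  Utf8 : UTF8String;
begin
  // Build a hypothetical file name containing U+2603 (snowman),
  // a character that Windows-1252 cannot represent.
  Wide := 'snow';
  Wide := Wide + WideChar($2603);
  Wide := Wide + '.txt';

  Utf8 := UTF8Encode(Wide);   // UTF-16 -> UTF-8
  Back := UTF8Decode(Utf8);   // UTF-8 -> UTF-16, the form the ...W APIs expect

  WriteLn('UTF-16 code units: ', Length(Wide));   // 9
  WriteLn('UTF-8 bytes:       ', Length(Utf8));   // 11
  WriteLn('Round trip intact: ', Back = Wide);    // TRUE
end.
```

The decoded UnicodeString is what can be handed to a wide API via PWideChar; nothing is lost as long as the name never passes through the ANSI code page.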

The problem, of course, is that the API TFileStream ends up calling is CreateFileA. One solution I found whilst searching was to use the cAlternateFileName returned in the search record, but I feel this is just a workaround (and the alternate name would probably need to be found for each directory too, in case they also contain extended Unicode characters). So I ended up writing my own class, TFileStreamUTF8, based on TFileStream but derived from THandleStream. I've provided it below in case anybody else wants it. It could really be added to Lazarus/FPC too (classesh.inc & streams.inc); maybe one day, if I knew how, I would.

However, while writing this I ran into some issues with TFileStream as currently implemented (I have Lazarus 0.9.30.4 with FPC 2.6.0 SVN:35940):
  • TFileStream declares two overloaded constructors. Both contain the full code, but the one without the Rights parameter is exactly the same except that it passes a "magic" 438 to FileCreate. Why doesn't this constructor just call the other one with this number as the parameter? [That would probably create another instance, but rather than duplicating code the body could be extracted into a single private/protected method, as I've done for TFileStreamUTF8.]
  • These constructors don't call an inherited constructor. It's likely they don't need to, as none of the ancestors appear to create data dynamically, but I think inherited should be called. The only field contained in the ancestor is the stream Handle, and I assume the default object constructor allocates space for this member variable.
  • Most/all TStream descendants have private member fields (e.g. THandleStream has a private FHandle). This means descendants can't access these members properly (unless they are exposed through read and write properties). I got around it by calling the inherited constructor, but I could have done it differently if I had write access to FHandle.

Code:
unit uFileStreamUtf8;

{
   This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
}

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils;

type
  TFileStreamUTF8 = class(THandleStream)
  private
  protected
    FFileName : String;
    procedure DoCreate(const AFileName: string; Mode: Word; Rights: Cardinal);
  public
    constructor Create(const AFileName: string; Mode: Word);
    constructor Create(const AFileName: string; Mode: Word; Rights: Cardinal);
    destructor Destroy; override;
    property FileName : String Read FFilename;
    property Handle;
  end;

function FileCreateUtf8 (Const AFileName : String; ShareMode : Integer; Rights : Integer) : THandle;


implementation

uses
  Windows, rtlconsts;


const  { redeclared because they are private to SysUtils }
  AccessMode: array[0..2] of Cardinal  = (
    GENERIC_READ,
    GENERIC_WRITE,
    GENERIC_READ or GENERIC_WRITE);
  ShareModes: array[0..4] of Integer = (
             0,
             0,
             FILE_SHARE_READ,
             FILE_SHARE_WRITE,
             FILE_SHARE_READ or FILE_SHARE_WRITE);


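function FileCreateUtf8(Const AFileName : String; ShareMode : Integer; Rights : Integer) : THandle;
var
  fname : UnicodeString;
begin
  // Decode UTF-8 to UTF-16 first; casting the UTF-8 string straight to PWideChar is wrong.
  fname := UTF8Decode(AFileName);
  Result := CreateFileW(PWideChar(fname), GENERIC_READ or GENERIC_WRITE,
                       dword(ShareModes[(ShareMode and $F0) shr 4]), nil, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
end;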


function FileOpenUtf8(Const AFileName : string; Mode : Integer) : THandle;
var
  fname : UnicodeString;
begin
  fname := UTF8Decode(AFileName);
  result := Windows.CreateFileW(PWideChar(fname), dword(AccessMode[Mode and 3]),
                       dword(ShareModes[(Mode and $F0) shr 4]), nil, OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL, 0);
  // On failure the API returns INVALID_HANDLE_VALUE (= feInvalidHandle = -1)
end;


constructor TFileStreamUTF8.Create(const AFileName: string; Mode: Word);
begin
  DoCreate(AFileName, Mode, 438);  // 438 = octal 666 (rw-rw-rw-), Unix permission bits; unused by this unit on Windows
end;


constructor TFileStreamUTF8.Create(const AFileName: string; Mode: Word; Rights: Cardinal);
begin
  DoCreate(AFileName, Mode, Rights);
end;


procedure TFileStreamUTF8.DoCreate(const AFileName: string; Mode: Word; Rights: Cardinal);
var
  h : THandle;
begin
  If (Mode and fmCreate) > 0 then
    h := FileCreateUtf8(AFileName, Mode, Rights)
  else
    h := FileOpenUtf8(AFileName, Mode);

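  If (THandle(h) = feInvalidHandle) then
    // Test the fmCreate bits the same way as above, so a create combined
    // with a share flag still reports a create error.
    If (Mode and fmCreate) > 0 then
      raise EFCreateError.createfmt(SFCreateError,[AFileName])
    else
      raise EFOpenError.Createfmt(SFOpenError,[AFilename]);

  inherited Create(h);  // THandleStream stores the handle in its private FHandle
  FFileName := AFileName;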
end;


destructor TFileStreamUTF8.Destroy;
begin
  FileClose(Handle);
  inherited Destroy;
end;


end.
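
A minimal usage sketch follows, for anyone wanting to try the class. It assumes the unit above is saved as uFileStreamUtf8.pas and built for a Windows target, and that the path passed in is already UTF-8 encoded (for example, as returned by FindFirstUTF8). The helper name PrintFileSize is made up for the example.

```pascal
program DemoFileStreamUtf8;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, uFileStreamUtf8;

// Hypothetical helper: opens a UTF-8 encoded path read-only and prints its size.
procedure PrintFileSize(const Utf8Path : String);
var
  Stream : TFileStreamUTF8;
begin
  // Raises EFOpenError if the file cannot be opened.
  Stream := TFileStreamUTF8.Create(Utf8Path, fmOpenRead or fmShareDenyWrite);
  try
    WriteLn(Stream.FileName, ': ', Stream.Size, ' bytes');
  finally
    Stream.Free;
  end;
end;

begin
  // An ASCII-only path from the command line can be passed directly.
  // A path with characters outside the ANSI code page should come from
  // a UTF-8 source such as FindFirstUTF8 instead.
  if ParamCount >= 1 then
    PrintFileSize(ParamStr(1))
  else
    WriteLn('usage: DemoFileStreamUtf8 <path>');
end.
```

Only the open path (FileOpenUtf8) is exercised here; creating files goes through FileCreateUtf8 instead.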
Win10/64: CT 5.7 32&64

KpjComp

Re: TFileStream with Unicode file/path characters
« Reply #1 on: July 31, 2012, 11:22:04 am »
I think part of the problem is that FPC/Lazarus is cross-platform, so some of the logic might seem strange.

Ideally, I believe TFileStream needs converting to use UTF-8. This shouldn't cause too much trouble, as UTF-8 is backwards compatible with 7-bit ASCII.
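
The backwards-compatibility point can be checked with a small sketch: for a pure 7-bit ASCII name, the UTF-8 bytes and the decoded UTF-16 code units have the same values, so existing ASCII paths should behave the same after such a change. The sample path is made up for the example.

```pascal
program AsciiIsValidUtf8;

{$mode objfpc}{$H+}

var
  Plain : UTF8String;
  Wide : UnicodeString;
  i : Integer;
  Same : Boolean;
begin
  Plain := 'C:\data\report.txt';   // hypothetical 7-bit ASCII path
  Wide := UTF8Decode(Plain);

  // Every ASCII byte should map to one UTF-16 code unit with the same value.
  Same := Length(Wide) = Length(Plain);
  if Same then
    for i := 1 to Length(Plain) do
      if Ord(Wide[i]) <> Ord(Plain[i]) then
      begin
        Same := False;
        Break;
      end;

  WriteLn('ASCII unchanged by UTF-8 decoding: ', Same);   // TRUE
end.
```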

Could you try setting up a ticket here -> http://bugs.freepascal.org

 
