
Author Topic: TMemoryStreamUTF8 cannot LoadFromFile  (Read 3881 times)

martinrame

  • Full Member
  • ***
  • Posts: 119
TMemoryStreamUTF8 cannot LoadFromFile
« on: June 13, 2022, 11:02:05 pm »
Hi, I need to load a file whose name contains accented characters, such as:

"/mnt/Informes/202205/PEÑALOZA JORGE.pdf"

The fpc program is a CGI, which runs in a FreeBSD system. The files reside in a Windows server mounted on /mnt/Informes on the FreeBSD system.

The filename is returned from a database query, and I store it in a string, such as:

Code: Pascal
  lFile := lQuery.FieldByName('filename').AsString;

Then, I check if the file exists:

Code: Pascal
  if FileExists(lFile) then
  begin
    ...
  end;

If the file exists I do this:

Code: Pascal
  lPdfStream := TMemoryStreamUTF8.Create;
  lPdfStream.LoadFromFile(lFile);
  ...

The file exists, but when I call LoadFromFile I get "Unable to open file ...".

What can I do to open it?

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #1 on: June 13, 2022, 11:15:35 pm »
I also tried with a TFileStream (and TFileStreamUTF8), with the same results.

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #2 on: June 14, 2022, 12:39:44 am »
I created a simple program to traverse all files in the directory, and in the names it returned the "Ñ" was replaced with "??":

Code: Pascal
  procedure AddAllFilesInDir(const Dir: string);
  var
    SR: TSearchRec;
  begin
    if FindFirst(IncludeTrailingBackslash(Dir) + '*.*', faAnyFile or faDirectory, SR) = 0 then
      try
        repeat
          if (SR.Attr and faDirectory) = 0 then
            ListBox1.Items.Add(SR.Name)
          else if (SR.Name <> '.') and (SR.Name <> '..') then
            AddAllFilesInDir(IncludeTrailingBackslash(Dir) + SR.Name);  // recursive call!
        until FindNext(SR) <> 0;
      finally
        FindClose(SR);
      end;
  end;

This is the result:

Quote
..
PERALTA ANDRES.pdf
PE??ALOZA JORGE.pdf
..

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1090
  • Professional amateur ;-P
Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #3 on: June 14, 2022, 02:07:32 am »
Hey Martin,

Let me start by saying that this is not an answer to your problem, it's more of pointing you in a direction so that you can solve your problem.

With that said, if you look at this page: FindFirst, you'll find that there are 2 overloads for the same function.

One uses a UnicodeString and the other uses a RawByteString. And both return a different type of SearchRec.

Why can't I help you more? Well, that's because I've never dug deep into the magic that is the transformation the type String gets involved in when it's passed around with UnicodeString, WideString, UTF8String and all the others that I'm not aware of.

Because of that the issue could be anywhere since you're passing around a string with UTF content, but some of the LCL objects don't really know how to deal with that.

And good luck trying to identify what type of string it is, and how or where it gets mangled. And if you're planning on going cross-platform: even if you get it running on Windows, nothing guarantees it will work on Linux, mainly because of the OS, not the FCL or LCL.

So yeah, while it's a pain, and believe me, I'm Portuguese and we have to deal with the same issues, it's something that you'll have to dig around in to see what type of string is being returned and then do stuff accordingly.

Unfortunately, you'll have to wait for the people that have more experience with UTF containing strings, cuz I'm pretty crap at it myself, and for that I apologise profusely.

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

PascalDragon

  • Hero Member
  • *****
  • Posts: 5444
  • Compiler Developer
Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #4 on: June 14, 2022, 08:57:57 am »
Quote from: martinrame
The fpc program is a CGI, which runs in a FreeBSD system.

Make sure that your main program uses either cwstring or fpwidestring. If that's not enough please provide the output of the locale (at least I hope that also exists on FreeBSD).

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11351
  • FPC developer.
Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #5 on: June 14, 2022, 11:03:28 am »
AFAIK on all *nix systems the APIs don't transcode filenames; you get them as the raw bytes written in the filesystem's directory structure.

But if you then have filesystems with differing encodings (e.g. FreeBSD native UTF-8, and Windows with some ANSI encoding), you have a problem.

The best bet is to look into mount options for the said SMB share, and see if you can fix the encoding the FS driver gives to the kernel on that end.

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #6 on: June 14, 2022, 12:36:11 pm »
Hi, the locale is this:

Code: Text
  LANG=en_US.UTF-8
  LC_CTYPE="en_US.UTF-8"
  LC_COLLATE="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_MONETARY="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_ALL=

The SMB filesystem is mounted with these encodings (in /etc/nsmb.conf):

Code: Text
  charsets=utf-8:iso-8859-1

That means the local filesystem is UTF-8 and the remote (Windows) one is ISO-8859-1.

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #7 on: June 14, 2022, 12:38:00 pm »
Quote from: PascalDragon
  Quote from: martinrame
  The fpc program is a CGI, which runs in a FreeBSD system.

  Make sure that your main program uses either cwstring or fpwidestring. If that's not enough please provide the output of the locale (at least I hope that also exists on FreeBSD).

Hi PascalDragon, how can I tell which type of strings I'm using?

PascalDragon

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #8 on: June 14, 2022, 01:26:09 pm »
Quote from: martinrame
  Quote from: PascalDragon
    Quote from: martinrame
    The fpc program is a CGI, which runs in a FreeBSD system.

    Make sure that your main program uses either cwstring or fpwidestring. If that's not enough please provide the output of the locale (at least I hope that also exists on FreeBSD).

  Hi PascalDragon, how can I tell which type of strings I'm using?

This is not about types of strings. You need to put either cwstring or fpwidestring into the uses-clause of your main program (first place; second or third if you also use cthreads and/or cmem).

The locale output looks good, FPC should detect that as UTF-8 then (as long as you use either cwstring or fpwidestring).
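
To check what the RTL actually detected, a small diagnostic along these lines might help. It is only a sketch: it assumes FPC 3.0 or later, where the System unit exposes DefaultSystemCodePage, DefaultFileSystemCodePage and DefaultRTLFileSystemCodePage. The program name and the DescribeCodePage helper are made up for illustration. Since a CGI runs with the web server's environment rather than your login shell's, the values are most useful when printed from inside the CGI itself.

```pascal
program ShowCodePages;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // widestring manager: first in the list (after cthreads/cmem if used)
  {$endif}
  SysUtils;

// Hypothetical helper: formats a code page number together with a label.
function DescribeCodePage(const ALabel: string; ACodePage: TSystemCodePage): string;
begin
  Result := Format('%s = %d', [ALabel, ACodePage]);
  if ACodePage = CP_UTF8 then
    Result := Result + ' (UTF-8)';
end;

begin
  // Minimal CGI header so the output shows up as plain text in a browser.
  WriteLn('Content-Type: text/plain');
  WriteLn;
  WriteLn(DescribeCodePage('DefaultSystemCodePage', DefaultSystemCodePage));
  WriteLn(DescribeCodePage('DefaultFileSystemCodePage', DefaultFileSystemCodePage));
  WriteLn(DescribeCodePage('DefaultRTLFileSystemCodePage', DefaultRTLFileSystemCodePage));
  // The locale variables as seen by this process, which may differ from the shell.
  WriteLn('LANG   = ', GetEnvironmentVariable('LANG'));
  WriteLn('LC_ALL = ', GetEnvironmentVariable('LC_ALL'));
end.
```

If the code pages do not come out as 65001 when run as a CGI, the web server is probably not passing the UTF-8 locale on to the program.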

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #9 on: June 14, 2022, 01:35:55 pm »
Thanks PascalDragon, but unfortunately using cwstring or fpwidestring doesn't change the result.

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #10 on: June 14, 2022, 02:12:45 pm »
I ran ls /directory/file | hexdump -C to see the actual bytes of the filename and found that the Ñ is stored as hex a5, so I tried this:

Code: Pascal
  lFile := AnsiReplaceStr(lFile, 'Ñ', chr($A5));
  lPdfStream.LoadFromFile(lFile);

But I still can't open the file with LoadFromFile.

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #11 on: June 14, 2022, 03:51:09 pm »
Traversing the string returned by the database char by char, I get this:

Code: Text
  P=80
  E=69
  =195
  =145
  A=65
  L=76
  O=79
  Z=90
  A=65

For comparison, this is the output of hexdump:

Code: Text
  ls /mnt/Informes/202205/PE*19900.pdf|hexdump -C -s 36 -n 8
  00000024  2f 50 45 a5 41 4c 4f 5a                           |/PE.ALOZ|
  0000002c

The difference is that the file name contains a5 (byte 165, which is outside the ASCII range) in place of the Ñ, while the string returned by the database contains bytes 195 and 145 (c3 91, the UTF-8 encoding of Ñ).

It looks like the database is using UTF-8, while the file names use a single-byte encoding. Is there a way to convert between those encodings?
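
One possible way to try such a conversion in FPC 3.x is SetCodePage, which can re-encode a RawByteString. The sketch below is an illustration under assumptions, not a confirmed fix. It assumes the share hands out names in code page 850, where Ñ is byte $A5 (as it is in code page 437; in ISO-8859-1, Ñ would be $D1). It also assumes a widestring manager such as cwstring is installed so the RTL can transcode. Utf8ToCodePage and BytesAsHex are hypothetical helper names.

```pascal
program Utf8ToCp850;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // supplies the transcoding on Unix-like systems
  {$endif}
  SysUtils;

// Hypothetical helper: treats ABytes as UTF-8 and re-encodes it to ACodePage.
function Utf8ToCodePage(const ABytes: RawByteString;
  ACodePage: TSystemCodePage): RawByteString;
begin
  Result := ABytes;
  SetCodePage(Result, CP_UTF8, False);   // only label the bytes as UTF-8
  SetCodePage(Result, ACodePage, True);  // transcode to the target code page
end;

// Hypothetical helper: renders the raw bytes of S as space-separated hex.
function BytesAsHex(const S: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

var
  Utf8Name, Cp850Name: RawByteString;
begin
  // 'PEÑALOZA' with the Ñ written as the two UTF-8 bytes seen in the thread.
  Utf8Name := 'PE'#195#145'ALOZA';
  Cp850Name := Utf8ToCodePage(Utf8Name, 850);  // 850 is an assumption
  WriteLn('UTF-8 : ', BytesAsHex(Utf8Name));
  WriteLn('CP850 : ', BytesAsHex(Cp850Name));
end.
```

If the transcoding works, the second line should show A5 where the first shows C3 91, which can be compared against the hexdump output. As far as I understand, the RTL file routines may transcode a name again based on the string's code page label, so the converted name might need relabeling with SetCodePage(..., False) before it is passed to LoadFromFile. That part is not shown and is untested here.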

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #12 on: June 14, 2022, 04:09:21 pm »
I replaced that with:

Code: Pascal
  lFile := StringReplace(lFile, #195#145, #165, [rfReplaceAll]);

But I'm still getting the "Unable to open file...No such file or directory" error.

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #13 on: June 14, 2022, 04:28:56 pm »
The funny thing is that FileExists returns true for both the UTF-8 file name and the one with #195#145 replaced by #165, but LoadFromFile raises the "Unable to open file ... No such file or directory" error.
« Last Edit: June 15, 2022, 01:20:45 pm by martinrame »
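
To narrow down why the open fails while FileExists succeeds, one option is to call FileOpen directly and print the operating system's error. This is a minimal sketch. DescribeOpen is a hypothetical helper, and the open mode (fmOpenRead or fmShareDenyWrite) is what I believe LoadFromFile uses through TFileStream, so treat that as an assumption.

```pascal
program WhyOpenFails;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,
  {$endif}
  SysUtils;

// Hypothetical helper: tries to open AFileName for reading and, on failure,
// reports the OS error code and its message.
function DescribeOpen(const AFileName: string): string;
var
  H: THandle;
  Err: Integer;
begin
  // Assumed to match the mode used internally by LoadFromFile.
  H := FileOpen(AFileName, fmOpenRead or fmShareDenyWrite);
  if H = feInvalidHandle then
  begin
    Err := GetLastOSError;
    Result := Format('open failed, OS error %d: %s', [Err, SysErrorMessage(Err)]);
  end
  else
  begin
    FileClose(H);
    Result := 'open succeeded';
  end;
end;

begin
  if ParamCount >= 1 then
  begin
    WriteLn('FileExists: ', FileExists(ParamStr(1)));
    WriteLn(DescribeOpen(ParamStr(1)));
  end
  else
    WriteLn('usage: whyopenfails <file>');
end.
```

Running it against one of the problem files and against a plain ASCII-named file on the same share would show whether the two cases fail with the same OS error.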

martinrame

Re: TMemoryStreamUTF8 cannot LoadFromFile
« Reply #14 on: June 16, 2022, 02:54:34 am »
I still can't find a solution, but I'm still trying.

I made this simple recursive file search to find a file by part of its name. I know all files end with a number followed by .pdf, so I call:

Code: Pascal
  FindFile('/path/to/file/', '1234.pdf');

That finds the first file whose name contains '1234.pdf', and I try to open it using LoadFromFile, but I still get the famous "Unable to open file..." ONLY ON FILES WITH SPECIAL CHARACTERS, like Ñ. Why can't I open the file when FindFirst/FindNext finds it?

Code: Pascal
  function TTest.FindFile(const Dir: string; AMatch: string): string;
  var
    SR: TUnicodeSearchRec;
    lPath: string;
  begin
    Result := '';
    // Always work with a trailing delimiter, also in recursive calls.
    lPath := IncludeTrailingBackslash(Dir);
    if FindFirst(lPath + '*.pdf', faAnyFile or faDirectory, SR) = 0 then
      try
        repeat
          if (SR.Attr and faDirectory) = 0 then
          begin
            // begin/end keeps the "else" below attached to the directory test
            if Pos(AMatch, SR.Name) > 0 then
            begin
              Result := SR.Name;
              with TStringList.Create do
                try
                  LoadFromFile(lPath + SR.Name); // <-- Unable to open file....
                finally
                  Free; // release the list even if loading raises
                end;
              break;
            end;
          end
          else if (SR.Name <> '.') and (SR.Name <> '..') then
          begin
            // recursive call! Note: the '*.pdf' mask only matches directories named *.pdf
            Result := FindFile(lPath + SR.Name, AMatch);
            if Result <> '' then
              break;
          end;
        until FindNext(SR) <> 0;
      finally
        FindClose(SR);
      end;
  end;

 
