
Author Topic: [Solved] Rename files with unicode char to ansi.  (Read 7396 times)

elioenaishalom

  • New Member
  • *
  • Posts: 28
[Solved] Rename files with unicode char to ansi.
« on: October 16, 2018, 03:46:50 pm »
Lazarus 1.8.4 (Windows 10)
It is impossible for me to solve this; if anyone can help, I thank you in advance:
My stream ripper application writes dozens of "MP3" files in unicode
and I need to rename the files.
My codepage is UTF8

Quote
procedure TForm1.FileNameUnicodeToAnsiClick(Sender: TObject);
var
  K: integer;
  AnsiFileName: string;
begin
  if OpenDialog1.Execute then
  for K := 0 to OpenDialog1.Files.Count - 1 do
  begin
    {
    example:
    If OpenDialog1.Files[K] returns "Gabriel FaurÃ©"
    AnsiFileName should return "Gabriel Fauré",
    so I can compare the two names and rename if it is the case.
    }
    AnsiFileName := OpenDialog1.Files[K];
    if AnsiFileName <> OpenDialog1.Files[K] then
      RenameFile((OpenDialog1.Files[K]), AnsiFileName);

  end;
end;
« Last Edit: October 18, 2018, 06:12:23 pm by elioenaishalom »

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: Rename files with unicode char to ansi.
« Reply #1 on: October 17, 2018, 09:47:51 am »
Your ripper uses unicode "characters" in filenames?
Why do you need to convert that to ANSI (I assume default codepage of windows)?
For what purpose?

How do the filenames "look" in Windows Explorer: "FaurÃ©" or "Fauré"?
If it is the first, then the ripper seems to use a one-byte encoding of filenames that does not match your default Windows codepage (which, b.t.w., is seldom UTF8).

Bart

elioenaishalom

  • New Member
  • *
  • Posts: 28
Re: Rename files with unicode char to ansi.
« Reply #2 on: October 17, 2018, 10:43:13 am »
Windows Explorer shows "FaurÃ©" and this is not the correct name of the composer.
Code: Pascal
{With Lazarus 1.0.6 (unit with page code 1252) it worked fine:}
var
  K: integer;
  S: string;
for K := 0 to OpenDialog1.Files.Count - 1 do
  begin
    S := Utf8ToAnsi(UTF8ToSys(OpenDialog1.Files[K]));
    if S <> OpenDialog1.Files[K] then
      RenameFile(UTF8ToSys(OpenDialog1.Files[K]), S);
  end;

"Fortunately my ignorance has no limits, so I know that I have much to learn in any matter; where ignorance is small, knowledge is mediocre" (anonymous).


Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: Rename files with unicode char to ansi.
« Reply #3 on: October 17, 2018, 01:45:26 pm »
OK, so the filename should be "Fauré", but it actually is "FaurÃ©".
The é in UTF8 is encoded in 2 bytes, in ANSI in 1 byte, so I have no idea how the é becomes Ã©.

If it used to work in a previous edition, then maybe you should replace Utf8ToSys() with Utf8ToWinCP().

Also, in the new fpc version (3.0) RenameFile will convert your string parameters to UnicodeString and then call the wide-string (W) Windows API with that.
Older FPC versions simply called the A version of the Windows API.

Bart


JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4459
  • I like bugs.
Re: Rename files with unicode char to ansi.
« Reply #4 on: October 17, 2018, 03:35:12 pm »
Code: Pascal
S := Utf8ToAnsi(UTF8ToSys(OpenDialog1.Files[K]));
This did not make much sense in the old versions either. It converts twice from UTF-8 but after UTF8ToSys the encoding was something else.
Anyway now those conversion funcs are not needed any more. You could try GuessEncoding() if only some file names differ from UTF-8 encoding.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

elioenaishalom

  • New Member
  • *
  • Posts: 28
Re: Rename files with unicode char to ansi.
« Reply #5 on: October 17, 2018, 04:05:44 pm »
Mr. Bart, sorry for my mistake in transcription; the correct ones are:
Pascal Rogé
Canción Amatoria
Homenage a Tárrega
Fauré Barcarolle
'Les Quatre Saisons' - Soirées d'Eté
Frédéric Chopin

and so on.
Mr. Bart, Mr. JuhaManninen; thank you very much - I'll try Utf8ToWinCP() & GuessEncoding()

wp

  • Hero Member
  • *****
  • Posts: 11855
Re: Rename files with unicode char to ansi.
« Reply #6 on: October 17, 2018, 04:11:08 pm »
Quote
OK, so the filename should be "Fauré", but it actually is "FaurÃ©".
The é in UTF8 is encoded in 2 bytes, in ANSI in 1 byte, so I have no idea how the é becomes Ã©.

I played a bit with the character table which is available in Lazarus, menu "Edit" > "Insert from character map":

Since the post looks to be French in origin, I assumed cp1252 (West European Latin) on the ANSI page of the character map. Then the string 'Ã©' has the hex values #$C3 #$83 #$C2 #$A9.
Switching to the Unicode page and selecting "Latin-1 Supplement", I found that #$C383 is 'Ã', which has Unicode code point U+00C3, and #$C2A9 is '©', with code point U+00A9. Since the U+00xx numbers are mostly identical to the ANSI codes xx, I put the two code points together, which leads me to UTF8 #$C3A9, which is just 'é'...

Therefore I think the malformed filename is due to inappropriate usage of the UTF8 conversion routines applied to a UnicodeString, and usage of WinCPtoUTF8 could lead you on the wrong track...
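
To make the byte arithmetic above concrete, here is a minimal sketch (not code from any poster in this thread). It decodes each 2-byte UTF-8 sequence to its code point and stores that code point as a single byte. UndoDoubleUtf8 and BytesToHex are hypothetical helper names. The sketch assumes the code points U+0080..U+00FF map to bytes the way Latin-1 does; cp1252 differs from Latin-1 in the $80..$9F range, so this is an illustration of the reasoning, not a replacement for a real code page conversion.

```pascal
program UndoDoubleUtf8Demo;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: decodes 1- and 2-byte UTF-8 sequences and stores each
  code point as one byte (Latin-1 assumption). Returns False if the input
  contains a longer or malformed sequence, or a code point above U+00FF. }
function UndoDoubleUtf8(const Src: TBytes; out Dst: TBytes): Boolean;
var
  I, N: Integer;
  CodePoint: Cardinal;
begin
  Result := False;
  SetLength(Dst, Length(Src));
  I := 0;
  N := 0;
  while I < Length(Src) do
  begin
    if Src[I] < $80 then
    begin
      CodePoint := Src[I];                 // plain ASCII byte
      Inc(I);
    end
    else if ((Src[I] and $E0) = $C0) and (I + 1 < Length(Src))
      and ((Src[I + 1] and $C0) = $80) then
    begin
      // 2-byte sequence: 110xxxxx 10yyyyyy -> xxxxxyyyyyy
      CodePoint := (Cardinal(Src[I] and $1F) shl 6) or (Src[I + 1] and $3F);
      Inc(I, 2);
    end
    else
      Exit;                                // 3/4-byte or malformed sequence
    if CodePoint > $FF then
      Exit;                                // does not fit into one byte
    Dst[N] := Byte(CodePoint);
    Inc(N);
  end;
  SetLength(Dst, N);
  Result := True;
end;

{ Hypothetical helper: formats bytes as space-separated hex pairs. }
function BytesToHex(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(B) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(B[I], 2);
  end;
end;

var
  Mangled, Fixed: TBytes;
begin
  // UTF-8 bytes of the two characters U+00C3 U+00A9, as listed in the post
  SetLength(Mangled, 4);
  Mangled[0] := $C3;
  Mangled[1] := $83;
  Mangled[2] := $C2;
  Mangled[3] := $A9;
  if UndoDoubleUtf8(Mangled, Fixed) then
    WriteLn(BytesToHex(Mangled), ' -> ', BytesToHex(Fixed))
  else
    WriteLn('input does not look double-encoded');
end.
```

For the four bytes from the character map this prints C3 83 C2 A9 -> C3 A9, and C3 A9 is the UTF-8 encoding of 'é'.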

elioenaishalom, where do these strings come from?
« Last Edit: October 17, 2018, 04:16:58 pm by wp »

Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Re: Rename files with unicode char to ansi.
« Reply #7 on: October 17, 2018, 07:47:00 pm »
I think the issue is that you assume Windows to be ANSI, where in fact it is UTF16.
Like wp deduced: you are converting this all over the place and that goes wrong at some point.
Since Windows is UTF16 (UnicodeString), these are assignment compatible with UTF8 without calling any conversion routines.
Specialize a type, not a var.

elioenaishalom

  • New Member
  • *
  • Posts: 28
Re: Rename files with unicode char to ansi.
« Reply #8 on: October 18, 2018, 01:50:14 am »
Quote
where do these strings come from?

"Alexander Paley - Phantasiestück Des-dur.mp3"
"Pyotr Il'yich Tchaikovsky - Morceaux (6) composés sur un seul thème, Op. 21 -Fugue à quatre voix (Andante).mp3"
"Frédéric Mesnier - Waltz Again.mp3"


These file names came from applications such as

streamripper-1.64.6
https://sourceforge.net/projects/streamripper/
http://streamripper.sourceforge.net/

stationripper version 0.01.13 (the last free version with unlimited songs & stations that can be recorded)
http://www.stationripper.com/buy.html

Mr. Bart & Mr. JuhaManninen:

Thank you for your interest, the problem is solved:
Utf8ToWinCP worked fine

Code: Pascal
uses LazUTF8, LConvEncoding;  // Utf8ToWinCP is declared in LazUTF8, GuessEncoding in LConvEncoding
var
  K: integer;
  S, S2: string;
for K := 0 to OpenDialog1.Files.Count - 1 do
  begin
    { testing Utf8ToWinCP() & GuessEncoding()}
    S := Utf8ToWinCP(OpenDialog1.Files[K]);
    S2 := GuessEncoding(OpenDialog1.Files[K]);
    { example:
    "Alexander Paley - PhantasiestÃ¼ck Des-dur.mp3"
    Utf8ToWinCP    returned "Alexander Paley - Phantasiestück Des-dur.mp3"
    GuessEncoding  returned "utf8" }
    if S <> OpenDialog1.Files[K] then
    begin
      if RenameFile((OpenDialog1.Files[K]), S) then
      begin
        { some processing ... }
      end;
    end;
  end;
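
A possible caveat with the loop above, as far as I can tell and not tested in this thread: a file whose name is already correct would also differ after the conversion. A correctly stored 'é' is C3 A9 in UTF-8, and converting that to cp1252 gives the single byte E9, which is not valid UTF-8 on its own. A minimal sketch of a guard is to rename only when the converted name is still well-formed UTF-8. LooksLikeUtf8 is a hypothetical helper name; it is a structural check only and does not reject overlong forms or surrogates.

```pascal
program ValidUtf8Check;
{$mode objfpc}{$H+}

{ Hypothetical helper: True if every lead byte in S is followed by the
  expected number of continuation bytes (10xxxxxx). }
function LooksLikeUtf8(const S: RawByteString): Boolean;
var
  I, Extra: Integer;
  B: Byte;
begin
  Result := False;
  I := 1;
  while I <= Length(S) do
  begin
    B := Ord(S[I]);
    if B < $80 then
      Extra := 0                         // ASCII
    else if (B and $E0) = $C0 then
      Extra := 1                         // 2-byte sequence
    else if (B and $F0) = $E0 then
      Extra := 2                         // 3-byte sequence
    else if (B and $F8) = $F0 then
      Extra := 3                         // 4-byte sequence
    else
      Exit;                              // stray continuation or invalid lead
    Inc(I);
    while Extra > 0 do
    begin
      if (I > Length(S)) or ((Ord(S[I]) and $C0) <> $80) then
        Exit;                            // truncated or broken sequence
      Inc(I);
      Dec(Extra);
    end;
  end;
  Result := True;
end;

var
  Good, Bad: RawByteString;
begin
  // 'Faur' + C3 A9: what the conversion yields for a double-encoded name
  Good := 'Faur??';
  Good[5] := #$C3;
  Good[6] := #$A9;
  // 'Faur' + E9: what the conversion yields for an already correct name
  Bad := 'Faur?';
  Bad[5] := #$E9;
  WriteLn(LooksLikeUtf8(Good));          // safe to rename
  WriteLn(LooksLikeUtf8(Bad));           // skip this file
end.
```

In the loop this would mean adding a check such as LooksLikeUtf8(S) next to the existing S <> OpenDialog1.Files[K] comparison before calling RenameFile.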

 
