Author Topic: Reading Russian ID3-Tags  (Read 13534 times)

Zittergie

  • Full Member
  • ***
  • Posts: 114
    • XiX Music Player
Reading Russian ID3-Tags
« on: January 12, 2015, 07:23:07 pm »
Hi,

I need some help reading ID3 tags.  I am using http://www.xixmusicplayer.org/download/ID3v2.pas and it reads ID3 tags fine, but non-ASCII characters are a problem and I don't understand the differences between the charsets.
The song I am trying to read is http://www.xixmusicplayer.org/download/Lumen - Марш согласных - OK.mp3

Can someone please help me bring ID3v2 up to more international standards?

Thanks in advance,
Bart
Be the difference that makes a difference

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: Reading Russian ID3-Tags
« Reply #1 on: January 12, 2015, 08:00:46 pm »
Looks like corrupted UTF-8.
Did you try to save the mp3 with TID3v2?

VLC player recognizes the information.

OK, it's plain UTF-16, so I'd think ConvertID3() does a lot of damage to it :)

Here's an example of how to deal with this (assuming ConvertID3 doesn't corrupt the bytes):
Code: [Select]
var
  t: string;
  ws: WideString;
..
  // grab the song's title (raw bytes, assumed to be UTF-16)
  t := mptag3.Title;

  // move the bytes from the String into the WideString
  SetLength(ws, Length(t) div 2);
  if Length(ws) > 0 then
    Move(t[1], ws[1], Length(ws) * SizeOf(WideChar));

  // the LCL wants UTF-8
  Memo1.Lines.Add(UTF8Encode(ws));

I'd assume there is an indicator somewhere in the mp3 file that says whether it's using Unicode or not.
« Last Edit: January 12, 2015, 08:14:21 pm by skalogryz »

Zittergie

  • Full Member
  • ***
  • Posts: 114
    • XiX Music Player
Re: Reading Russian ID3-Tags
« Reply #2 on: January 12, 2015, 08:17:32 pm »
Hi,

I did not change the file.
kid id3-tag reader also reads the tags.

I think it is in UTF-16, but neither utf16toutf8 nor utf8encode works.

The information in ID3 tags can be in any encoding, so how do you know which encoding to use?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12593
  • FPC developer.
Re: Reading Russian ID3-Tags
« Reply #3 on: January 12, 2015, 08:18:02 pm »
If I look at the hex dump below, then the "03" after id3 means utf8. However the characters after
TIT2 should be the name, but there we see a FF FE UTF16 byte order mark.

IOW the tag is corrupt, but probably some programs recover from it.

Code: [Select]

00000000  49 44 33 03 00 00 00 00  01 36 54 49 54 32 00 00  |ID3......6TIT2..|
00000010  00 1f 00 00 01 ff fe 1c  04 30 04 40 04 48 04 20  |.........0.@.H. |
00000020  00 41 04 3e 04 33 04 3b  04 30 04 41 04 3d 04 4b  |.A.>.3.;.0.A.=.K|
00000030  04 45 04 54 50 45 31 00  00 00 06 00 00 00 4c 75  |.E.TPE1.......Lu|
00000040  6d 65 6e 54 41 4c 42 00  00 00 13 00 00 01 ff fe  |menTALB.........|
00000050  1d 04 30 04 20 00 47 04  30 04 41 04 42 04 38 04  |..0. .G.0.A.B.8.|
00000060  54 52 43 4b 00 00 00 03  00 00 00 31 30 43 4f 4d  |TRCK.......10COM|
00000070  4d 00 00 00 49 00 00 00  65 6e 67 00 46 72 65 65  |M...I...eng.Free|
00000080  20 64 6f 77 6e 6c 6f 61  64 20 66 72 6f 6d 20 68  | download from h|
00000090  74 74 70 3a 2f 2f 77 77  77 2e 6c 61 73 74 2e 66  |ttp://www.last.f|
000000a0  6d 2f 6d 75 73 69 63 2f  4c 75 6d 65 6e 20 61 6e  |m/music/Lumen an|
000000b0  64 20 68 74 74 70 3a 2f  2f 4d 50 33 2e 63 6f 6d  |d http://MP3.com|
000000c0  ff fb 90 64 00 09 f0 00  00 65 80 00 00 08 00 00  |...d.....e......|

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: Reading Russian ID3-Tags
« Reply #4 on: January 12, 2015, 08:29:21 pm »
I'm sure the Tag is fine. Players recognize the info. The reader needs to be tweaked.

Zittergie

  • Full Member
  • ***
  • Posts: 114
    • XiX Music Player
Re: Reading Russian ID3-Tags
« Reply #5 on: January 12, 2015, 08:31:28 pm »
Thanks,

I will try to replace ConvertID3 with a function that checks which encoding the tags are in.
I've seen tags that mixed UTF-8 and UTF-16 in one file.

ConvertID3 was a bad way to convert international chars anyway.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12593
  • FPC developer.
Re: Reading Russian ID3-Tags
« Reply #6 on: January 12, 2015, 08:52:52 pm »
Quote from: skalogryz
I'm sure the Tag is fine. Players recognize the info.

Players are typically written by battle-scarred veterans and usually ignore encoding info in favor of running various detection and heuristic routines.

Dibo

  • Hero Member
  • *****
  • Posts: 1057
Re: Reading Russian ID3-Tags
« Reply #7 on: January 12, 2015, 09:16:49 pm »
You can also try TagLib: https://taglib.github.io/
This lib is used by most popular multimedia players (I'm not sure, but I think VLC and Rhythmbox use it). I'm using it too; it has headers for Free Pascal and works great. The only disadvantage is that you must deploy TagLib (dll / so, ~2 MB) with your application.

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: Reading Russian ID3-Tags
« Reply #8 on: January 12, 2015, 09:50:05 pm »
Quote from: marcov
Players are typically written by battle-scarred veterans and usually ignore encoding info in favor of running various detection and heuristic routines.

I presume that was the original purpose of ConvertID3: to do the encoding detection and the proper conversion.
But the mp3 tag format is to blame anyway.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12593
  • FPC developer.
Re: Reading Russian ID3-Tags
« Reply #9 on: January 12, 2015, 09:52:57 pm »
Quote from: skalogryz
I presume that was the original purpose of ConvertID3: to do the encoding detection and the proper conversion.
But the mp3 tag format is to blame anyway.

I'm not sure that is the case here. The header says utf8, but the field is encoded in utf16. That is the fault of the writing application, not ID3. Unless I'm mistaken and the field has a separate encoding field.

The problem with "testing with applications" is that they are often hardened against mistakes made by popular tools. Even if those popular tools are wrong.

Fred vS

  • Hero Member
  • *****
  • Posts: 3734
    • StrumPract is the musicians best friend
Re: Reading Russian ID3-Tags
« Reply #10 on: January 12, 2015, 10:42:57 pm »
Hello.

Maybe completely out of the game... but who knows.  :-[

I had so many problems with some ID3v2 tags that I now use this solution (read the ID3v2 tag => delete it => re-create an equivalent ID3v1 tag):

=> http://stackoverflow.com/questions/14147402/remove-or-edit-id3tag-version-2-from-mp3-file-using-delphi-7

I use Lazarus 2.2.0 32/64 and FPC 3.2.2 32/64 on Debian 11 64 bit, Windows 10, Windows 7 32/64, Windows XP 32,  FreeBSD 64.
Widgetset: fpGUI, MSEgui, Win32, GTK2, Qt.

https://github.com/fredvs
https://gitlab.com/fredvs
https://codeberg.org/fredvs

Zittergie

  • Full Member
  • ***
  • Posts: 114
    • XiX Music Player
Re: Reading Russian ID3-Tags
« Reply #11 on: January 12, 2015, 10:44:00 pm »
Each frame carries its own encoding byte; the first '03' is the ID3 version: 2.3
The Cyrillic frames in the mp3 file (TIT2 and TALB) are marked '01', which means UTF-16

So the tag is written correctly.  I'm going to change the reader to follow the right encoding.
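
For reference, a minimal sketch of where that encoding byte sits, using the TIT2 bytes from the hex dump earlier in the thread. The helper names (EncodingName, FrameID, FrameSize) are made up for illustration and are not part of ID3v2.pas. The layout assumed here is the ID3v2.3 one: 4 bytes frame ID, 4 bytes big-endian size, 2 bytes flags, then the data, whose first byte is the text encoding.

```pascal
program FrameHeaderSketch;
{$mode objfpc}{$H+}

// First 13 bytes of the TIT2 frame from the hex dump (file offset $0A).
const
  Frame: array[0..12] of Byte = (
    $54, $49, $54, $32,   // 'TIT2'
    $00, $00, $00, $1F,   // frame size = 31 bytes of data
    $00, $00,             // flags
    $01,                  // text encoding byte (first data byte)
    $FF, $FE);            // UTF-16 byte order mark (little-endian)

// Hypothetical helper: names for the text encoding byte.
// Values 2 and 3 are only defined in ID3v2.4.
function EncodingName(Enc: Byte): string;
begin
  case Enc of
    0: Result := 'ISO-8859-1';
    1: Result := 'UTF-16 with BOM';
    2: Result := 'UTF-16BE without BOM';
    3: Result := 'UTF-8';
  else
    Result := 'unknown';
  end;
end;

// Hypothetical helper: the four ASCII characters of the frame ID.
function FrameID(const B: array of Byte): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to 3 do
    Result := Result + Chr(B[i]);
end;

// Hypothetical helper: ID3v2.3 stores the size as a plain 32-bit big-endian integer.
function FrameSize(const B: array of Byte): LongWord;
begin
  Result := (LongWord(B[4]) shl 24) or (LongWord(B[5]) shl 16) or
            (LongWord(B[6]) shl 8) or B[7];
end;

begin
  WriteLn(FrameID(Frame), ' size=', FrameSize(Frame),
          ' encoding=', Frame[10], ' (', EncodingName(Frame[10]), ')');
end.
```

For the dumped bytes this should report TIT2 with a size of 31 and encoding 1. The 31 bytes are the encoding byte, the 2-byte BOM and 14 UTF-16 characters.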

Will post the update

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: Reading Russian ID3-Tags
« Reply #12 on: January 12, 2015, 11:08:35 pm »
Quote from: marcov
The problem with "testing with applications" is that they are often hardened against mistakes made by popular tools. Even if those popular tools are wrong.

My usual approach to "testing with applications" is "if they could, then we can too" (an exception is made only for MS/Apple products :) for the obvious reason of their access to "private interfaces").
Typically it's a motivation to reach the same target as others did.

Even if other tools are wrong, the best application design would be to provide both:
"right solution" - following specs
"other-tool compatible solution" - optional by users choice.
 
Just recently I found that there are at least 3 different ways to set the background (aka highlight) color in RTF. Two of them completely conflict with each other and are used by MS WordPad and MS Office Word.

Quote from: marcov
Players are typically written by battle-scarred veterans and usually ignore encoding info in favor of running various detection and heuristic routines.

Besides, isn't Zittergie a battle-scarred veteran? No? Then it's time to get into the fight (and get some scars and some code fixed). ConvertID3 must be fixed for sure.
xixmusicplayer deserves its place under the sun.

My other suggestion would be to remove LCL dependency from the ID3v2 unit. RTL dependency should be enough.
« Last Edit: January 12, 2015, 11:15:36 pm by skalogryz »

varianus

  • New Member
  • *
  • Posts: 27
Re: Reading Russian ID3-Tags
« Reply #13 on: January 13, 2015, 08:59:00 am »
Quote from: marcov
If I look at the hex dump below, then the "03" after id3 means utf8. However the characters after
TIT2 should be the name, but there we see a FF FE UTF16 byte order mark.

This is not correct. That "03" means that the tags are in the ID3 v2.3 format.
The encoding of the text is in the 7th byte after the TIT2 frame ID; in this case it is "01", which means Unicode. As stated in the official ID3 v2.3 documentation, Unicode strings begin with the Unicode BOM ($FF FE or $FE FF), and an empty terminated Unicode string is the BOM followed by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).

The tag reading code that I've written for my own player returns this data:
Code: [Select]
TIT2 ---> Марш согласных
TPE1 ---> Lumen
TALB ---> На части
TRCK ---> 10
COMM ---> Free download from http://www.last.fm/music/Lumen and http://MP3.com

Zittergie

  • Full Member
  • ***
  • Posts: 114
    • XiX Music Player
Re: Reading Russian ID3-Tags
« Reply #14 on: January 15, 2015, 06:36:24 pm »
An ugly solution, but it works until I rewrite the id3 unit:

FrSize holds the size of the string.  Using length(id3tag) or sizeof(id3tag) gives the wrong string length; I don't know why.
Also, what is the difference between the FF FE and FE FF encodings, and what needs to be changed to make mode 2 work?

Code: [Select]
function TID3v2.ConvertID3(const id3tag: string; FrSize: integer): string;
var mode: byte;
    id3tag2: string;
    i: integer;
begin
  if length(id3tag)>1 then
  begin
  mode:=byte(id3tag[1]);
  case mode of
   0: begin
        id3tag2:=copy(id3tag,2,length(id3tag));
        ConvertId3:=id3tag2;
      end;
   1: begin
        id3tag2:='';
        if (ord(id3tag[2])=255) and (ord(id3tag[3])=254) then i:=4  // ShowMessage('BOM found')
                                                         else i:=2; //ShowMessage('No BOM found');
        repeat
          id3tag2:=id3tag2+utf16toutf8(WideChar(Ord(id3tag[i]) or (Ord(id3tag[i+1]) shl 8)));
          i:=i+2;
        until i>FrSize-1;
        ConvertId3:=id3tag2;
      end;
   2: begin
        id3tag2:='';
        if (ord(id3tag[2])=254) and (ord(id3tag[3])=255) then i:=4  // ShowMessage('BOM found')
                                                         else i:=2; //ShowMessage('No BOM found');
        repeat
          id3tag2:=id3tag2+utf16toutf8(WideChar(Ord(id3tag[i]) or (Ord(id3tag[i+1]) shl 8)));
          i:=i+2;
        until i>FrSize-1;
        ConvertId3:=id3tag2;
      end;
   3: begin
        id3tag2:=copy(id3tag,2,utf8length(id3tag));
        ConvertId3:=id3tag2;
      end;
      else ConvertId3:=id3tag;
    end;
  end
  else ConvertId3:=id3tag;
end; 
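
On the FF FE versus FE FF question: the BOM only says in which order the two bytes of each 16-bit unit are stored. FF FE means low byte first (little-endian) and FE FF means high byte first (big-endian). The function above always combines the bytes low byte first, so big-endian text (mode 1 with an FE FF BOM, or mode 2, which is UTF-16BE without a BOM in ID3v2.4) would come out scrambled. Below is a rough sketch of one way to handle both orders. DecodeUtf16 is a hypothetical helper, not part of ID3v2.pas, and the test bytes are the TALB data from the hex dump earlier in the thread.

```pascal
program Utf16FrameSketch;
{$mode objfpc}{$H+}

// Data bytes of the TALB frame from the hex dump,
// starting right after the encoding byte ($01).
const
  Talb: array[0..17] of Byte = (
    $FF, $FE, $1D, $04, $30, $04, $20, $00, $47, $04,
    $30, $04, $41, $04, $42, $04, $38, $04);

// Hypothetical helper: decodes UTF-16 text bytes to UTF-8.
// A leading BOM decides the byte order; without one, DefaultBigEndian
// is used (True would fit mode 2, UTF-16BE without BOM).
function DecodeUtf16(const Data: array of Byte;
  DefaultBigEndian: Boolean): UTF8String;
var
  ws: UnicodeString;
  i, start, n: Integer;
  bigEndian: Boolean;
begin
  bigEndian := DefaultBigEndian;
  start := 0;
  if Length(Data) >= 2 then
  begin
    if (Data[0] = $FF) and (Data[1] = $FE) then
    begin
      bigEndian := False;   // FF FE: low byte first
      start := 2;
    end
    else if (Data[0] = $FE) and (Data[1] = $FF) then
    begin
      bigEndian := True;    // FE FF: high byte first
      start := 2;
    end;
  end;
  n := (Length(Data) - start) div 2;   // whole 16-bit units only
  SetLength(ws, n);
  for i := 0 to n - 1 do
    if bigEndian then
      ws[i + 1] := WideChar((Data[start + 2 * i] shl 8) or Data[start + 2 * i + 1])
    else
      ws[i + 1] := WideChar(Data[start + 2 * i] or (Data[start + 2 * i + 1] shl 8));
  // convert the whole string at once so surrogate pairs stay intact
  Result := UTF8Encode(ws);
end;

var
  s: UTF8String;
begin
  s := DecodeUtf16(Talb, False);
  // 7 Cyrillic letters (2 bytes each in UTF-8) plus one space
  WriteLn(Length(s), ' UTF-8 bytes');
end.
```

In an LCL application the result could be passed straight to something like Memo1.Lines.Add. Whether WriteLn shows the Cyrillic text correctly depends on the console code page, which is why the sketch only prints the byte count.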
« Last Edit: January 15, 2015, 06:42:12 pm by Zittergie »