Author Topic: “pos” not working with extended ASCII characters  (Read 1416 times)

stephanos

« on: November 14, 2021, 05:12:01 pm »
Dear All

I continue with my mp3 play list file writer.  I have abandoned looking at file properties for the purpose of calculating the total play time of all the files listed in the play list file.  That will have to wait for now.

I am at the stage of validating the path/filenames.  Through trial and error I have worked out that the play list function will only read in paths and file names whose characters are within a limited range of ASCII.  Specifically all 1 byte characters between Alt + 32 and Alt + 126, inclusive.  There are some characters in that range which cannot be used in path/filenames, such as Alt + 42 (*), so I did not test those.  This is definitive information proved by trial and error and it leaves a lot of characters that cannot be used.  Some are already in file names of mp3 files on my mp3 player and they will not play if included in the play list file.  That is how I became suspicious of the play list function's limitations.  So with the Hercule Poirot in me inspired I set about writing a validation section to my programme.


The following code explains where I have got to so far.  All permissible characters are in a string.  A 3 byte character is selected to see what happens when it is NOT found in the string.  On click, zero is written to the label caption.  As it should be.

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  LegalChars : string;  letter : Char;  posi : LongInt;
begin
  LegalChars :='\ 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!$%&‘’()+,-.;=@[]^_`{}~';
  posi := 99;
  posi := pos('►', LegalChars);    Label1.Caption := IntToStr(posi);  // Output to label caption = 0
end;
end.

The intention is to loop through a path/filename testing to see if each character is in the string of legal chars.  When a character is not found the loop will break and the path/filename is written to an error file.  That means the specific char, '►', will need to be replaced with a string of a char, which might be 1 or 2 or 3 bytes in size, like this.

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  LegalChars, PathFile : string;  letter : Char;  posi : LongInt;
begin
  LegalChars :='\ 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!$%&‘’()+,-.;=@[]^_`{}~';
  posi := 99;
  PathFile :='Agua de ►Pena\ROMARIAS\BAILE DA FESTA.mp3';
  posi := UTF8Pos(PathFile[9], LegalChars);  Label1.Caption := IntToStr(posi); // Output to label caption = 69?
end;
end.

The 69th char in the string of legal chars is: ‘ (single open speech mark).
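
A minimal sketch of why 69 comes back, assuming the strings hold UTF-8 (the Lazarus default). PathFile[9] is a single byte, the first of the three bytes of '►' ($E2 $96 $BA). The quote ‘ is $E2 $80 $98, so it starts with the same byte, and a byte-wise search reports a match where ‘ sits. The constant names and the shortened Legal string are made up for illustration, plain Pos stands in for UTF8Pos so that no Lazarus unit is needed, and the bytes are spelled out so the result does not depend on the encoding of the source file.

```pascal
program ByteVersusChar;
{$mode objfpc}{$H+}

const
  // UTF-8 byte sequences written out explicitly (illustrative names).
  Arrow     = #$E2#$96#$BA;  // '►' (U+25BA)
  OpenQuote = #$E2#$80#$98;  // '‘' (U+2018)

var
  Legal, PathFile: string;
begin
  // Shortened stand-in for LegalChars: the quote is the 2nd character here.
  Legal := '&' + OpenQuote + '(';
  PathFile := 'Agua de ' + Arrow + 'Pena';

  WriteLn(Length(Arrow));            // 3: one character, three bytes
  WriteLn(Ord(PathFile[9]));         // 226 = $E2, only the first byte of the arrow
  WriteLn(Pos(PathFile[9], Legal));  // 2: that byte matches the first byte of the quote
  WriteLn(Pos(Arrow, Legal));        // 0: the complete character is not in the list
end.
```

In the full LegalChars string the quote is the 69th character, which would explain the 69.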

I have done much reading. 
https://wiki.lazarus.freepascal.org/UTF8_strings_and_characters
https://www.pascal-programming.info/lesson10.php

I think I grasp the concepts that 2 and 3 byte characters need to be saved to a string, but it mucks up pos.  I am at the edge of my knowledge and any help appreciated

Thanks

winni

« Reply #1 on: November 14, 2021, 05:27:53 pm »
Hi!


The chars ‘’ are UTF8chars.

For UTF8 chars you need the unit LazUTF8 and then you use UTF8Pos, UTF8Delete, UTF8Length ....

The easier way is to build a set with forbidden chars and to check if a character is in the forbidden set.

Winni

wp

« Reply #2 on: November 14, 2021, 06:16:05 pm »
There are several issues with this procedure:
  • winni already noted that the list of legal characters contains non-ASCII characters which should not be allowed as you explained.
  • You check whether the 9th character of the test string is contained in the list of legal characters. This is basically correct, but when the test string PathFile is non-ASCII then PathFile[9] only means the 9th byte in the string - in case of UTF8 this is not necessarily the 9th character! You must use UTF8Copy(PathFile, 9, 1) to extract the entire UTF8 byte sequence for the 9th character.
  • All these UTF8 operations can become rather slow, in particular if many tests are to be performed. I'd simplify the procedure as follows: Use the string iterator "in" defined in unit LazUnicode to hop from test string "character" to "character" (more precisely: UTF8 code points), and then check whether the length of each character string is 1. Only then can this be an ASCII character. Finally I would define a "black list" ForbiddenChars containing all the ASCII characters not allowed in a filename, e.g. '\', '/', ':', '*', '?' (this is for Windows, maybe there are some more), and check whether the current character is one of them:
Code: Pascal
uses
  LazUnicode;

function CheckLegalChars(s: String; out Letter: String): Boolean;
const
  ForbiddenChars: String = '/\*?:';
begin
  Result := false;
  for Letter in s do
  begin
    if Length(Letter) > 1 then
      exit;
    // We know that we have a 1-byte character now
    // Characters below #32 are forbidden, as well as those listed in ForbiddenChars
    if (Letter[1] >= #32) and (pos(letter, ForbiddenChars) > 0) then
      exit;
  end;
  Result := true;
  Letter := #0;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  PathFile : string;
  letter : String;
begin
  PathFile :='Agua de ►Pena\ROMARIAS\BAILE DA FESTA.mp3';
  if CheckLegalChars(PathFile, letter) then
    Label1.Caption := 'ok'
  else
    Label1.caption := '"' + letter + '" is not a legal character.';
end;
Mainly Lazarus trunk / fpc 3.2.0 / all 32-bit on Win-10, but many more...

ASerge

« Reply #3 on: November 14, 2021, 07:34:26 pm »
Code: Pascal
// Characters below #32 are forbidden, as well as those listed in ForbiddenChars
if (Letter[1] >= #32) and (pos(letter, ForbiddenChars) > 0) then
  exit;
May be?
Code: Pascal
if (Letter[1] < #32) or (pos(letter, ForbiddenChars) > 0) then
or even better
Code: Pascal
for Letter in S do
  if (Length(Letter) > 1) or (Letter[1] in [#0..#31, '/', '\', '*', '?', ':']) then
    Exit;
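
For anyone who wants to try the idea without the LazUnicode iterator, here is a rough console sketch that walks the string by hand. It is a substitute technique, not the code from the posts above: Utf8SeqLen and IsLegalPath are made-up names, well-formed UTF-8 is assumed, and '\' is deliberately left out of the forbidden set on the assumption that whole paths (with separators) are being tested rather than bare file names.

```pascal
program CheckPathDemo;
{$mode objfpc}{$H+}

// Byte length of the UTF-8 sequence that starts with lead byte B.
// Assumes well-formed UTF-8; anything unexpected is treated as length 1.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// True when S contains only printable ASCII outside the forbidden set.
// Otherwise Bad receives the whole offending character (1 to 4 bytes).
function IsLegalPath(const S: string; out Bad: string): Boolean;
var
  i, n: Integer;
begin
  Bad := '';
  i := 1;
  while i <= Length(S) do
  begin
    n := Utf8SeqLen(Ord(S[i]));
    // Multi-byte characters, control characters, DEL, stray high bytes
    // and a few Windows-reserved characters are all rejected.
    if (n > 1) or (S[i] in [#0..#31, #127..#255, '/', '*', '?', ':']) then
    begin
      Bad := Copy(S, i, n);
      Exit(False);
    end;
    Inc(i);  // only reached for single-byte characters
  end;
  Result := True;
end;

var
  Bad: string;
begin
  // #$E2#$96#$BA is the UTF-8 encoding of the arrow from the first post.
  if IsLegalPath('Agua de ' + #$E2#$96#$BA + 'Pena\ROMARIAS\BAILE DA FESTA.mp3', Bad) then
    WriteLn('ok')
  else
    WriteLn('illegal character, ', Length(Bad), ' byte(s) long');
end.
```

Returning the complete byte sequence in Bad keeps the error message readable, which is the same reason the LazUnicode version hands back a String rather than a Char.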

wp

« Reply #4 on: November 14, 2021, 07:48:04 pm »
May be?
Yes, typo (did not fully test this code)...

stephanos

« Reply #5 on: November 14, 2021, 10:48:56 pm »
Thank you all

These responses have given me a lot to go on. Especially #32, comparing the number.

I am going to experiment with first checking the character is not a '#' (#35) then check it is between #32 and #126.  It seems that as I am getting the path/filename from a hard drive there is not going to be any of the characters, within that range, that cannot be used in a path/file name in Windows.
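
A note on the notation, in case it helps: the # in #32 is Pascal's way of writing "the character with this code", so #35 is simply the character '#' itself. Below is a minimal sketch of the range check described above, assuming UTF-8 strings, where every byte of a multi-byte character falls outside #32..#126. IsPrintableAscii is a made-up name.

```pascal
program CharCodeDemo;
{$mode objfpc}{$H+}

// True when every byte of S is a printable ASCII character (#32..#126).
function IsPrintableAscii(const S: string): Boolean;
var
  C: Char;
begin
  Result := True;
  for C in S do
    if not (C in [#32..#126]) then
      Exit(False);
end;

begin
  WriteLn(#35 = '#');   // TRUE: #35 and '#' are the same character
  WriteLn(Ord(' '));    // 32: the space is the first printable character
  WriteLn(IsPrintableAscii('BAILE DA FESTA.mp3'));                // TRUE
  // #$E2#$96#$BA is the UTF-8 encoding of the arrow character.
  WriteLn(IsPrintableAscii('Agua de ' + #$E2#$96#$BA + 'Pena'));  // FALSE
end.
```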

 
