Recent

Author Topic: Unit Zipper from package paszlib has an inconsistency  (Read 5484 times)

wp

  • Hero Member
  • *****
  • Posts: 11906
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #15 on: April 24, 2021, 01:48:02 am »
I think this fixes it:
Code: Pascal  [Select][+][-]
  1. procedure TZipFileEntry.SetDiskFileName(const AValue: String);
  2. begin
  3.   if FDiskFileName=AValue then Exit;
  4.   // Zip file uses / as directory separator on all platforms
  5.   FDiskFileName := StringReplace(AValue, '\', '/', [rfReplaceAll]);
  6. end;

It works on Windows and Linux - try this test
Code: Pascal  [Select][+][-]
  1. program project1;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses
  6.   Classes, zipper;
  7.  
  8. const
  9.   zipfilename = 'test.zip';
  10.   filename1 = 'project1.lpr';
  11.   filename2 = 'project1.lpi';
  12.   filename3 = 'subdir\a.txt';
  13.   filename4 = 'subdir/b.txt';
  14.  
  15. var
  16.   zip: TZipper;
  17.   unzipper: TUnzipper;
  18.  
  19. begin
  20.   zip := TZipper.Create;
  21.   zip.Filename := zipfilename;
  22.   zip.Entries.AddFileEntry(filename1);
  23.   zip.Entries.AddFileEntry(filename2);
  24.   zip.Entries.AddFileEntry(filename3);
  25.   zip.Entries.AddFileEntry(filename4);
  26.   zip.SaveToFile(zipfilename);
  27.   zip.Free;
  28.  
  29.   unzipper := TUnzipper.Create;
  30.   unzipper.Filename := zipFileName;
  31.   unzipper.OutputPath := 'unzip';
  32.   unzipper.Examine;
  33.   unzipper.UnzipAllFiles;
  34.   unzipper.Free;
  35. end.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1119
  • Professional amateur ;-P
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #16 on: April 24, 2021, 01:56:23 am »
Hey WP,

Found this one:
Code: Pascal  [Select][+][-]
  1. procedure TZipFileEntry.SetDiskFileName(const AValue: String);
  2. begin
  3.   if FDiskFileName=AValue then Exit;
  4.   // Zip file uses / as directory separator on all platforms
  5.   // so convert to separator used on current OS
  6.   if DirectorySeparator='/' then
  7.     FDiskFileName:=AValue
  8.   else
  9.     FDiskFileName:=StringReplace(AValue,'/',DirectorySeparator,[rfReplaceAll]);
  10. end;

This one in particular makes absolute sense since it's the DiskFile and should be platform DEPENDANT.

I think the error is not making sure that TZipFileEntry.ArchiveName NEVER has a back slash.
TZipFileEntry.Filename and TZipFileEntry.Diskname can be, SHOULD be, platform dependant.
BUT TZipFileEntry.ArchiveName should ALWAYS be ZIP compliant.
I hope I'm making sense here.

And this looks as if your issue is already considered...
But the slash to be replaced by StringReplace is the wrong one!
Code: Pascal  [Select][+][-]
  1.     FDiskFileName := StringReplace(AValue, '\', '/', [rfReplaceAll]);
  2.  

There are several locations like this. Not sure, though, which one is the name to be used for zipping and which one for unzipping.

If you want to debug the zipper unit you should copy the zipper.pp file into your test project directory. This unit is at the "end of the food chain" and can be recompile without affecting others.

I'm sorry to disagree with you WP, but I think, respectfully, that you're barking at the wrong tree.

If you do that replace to TZipFileEntry.Diskname you will still have a zipped file like this:
Code: Bash  [Select][+][-]
  1.     & less file.zip
  2.     Archive:  file.zip
  3.      Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
  4.     --------  ------  ------- ---- ---------- ----- --------  ----
  5.           16  Defl:N       16   0% 2021-04-21 21:20 fffc0087  data\block1.txt
  6.           16  Defl:N       16   0% 2021-04-21 21:20 d4d15344  data\block2.txt
  7.           16  Defl:N       16   0% 2021-04-21 21:20 cdca6205  data\block3.txt
  8.           16  Defl:N       16   0% 2021-04-21 21:21 828bf4c2  data\block4.txt
  9.     --------          -------  ---                            -------
  10.           64               64   0%                            4 file

Because you're not affecting TZipFileEntry.ArchiveName, which is the field that is used in the listing above.

So again, the issue is with TZipFileEntry.ArchiveName but not the other fields!!

Cheers,
Gus
« Last Edit: April 24, 2021, 02:10:27 am by Gustavo 'Gus' Carreno »
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1119
  • Professional amateur ;-P
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #17 on: April 24, 2021, 01:59:24 am »
Hey WP,

It works on Windows and Linux - try this test

But have you looked at the generated zip file?

Are all the slashes ONE way or are they BOTH ways?

If they are BOTH ways, that's not ZIP compliant :P

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

wp

  • Hero Member
  • *****
  • Posts: 11906
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #18 on: April 24, 2021, 08:37:13 am »
I did not care about the variable names, I think somebody got confused here.

Fact is what the debugger does: in my line "zip.Entries.AddFileEntry(filename3)" it goes into TZipFileEntry.SetDiskFileName, and it is that DiskFileName which is used by ZipOneFile and written to the archive.

Later, the Unzipper uses a procedure GetOutputfileName (local to UnzipOneFile) the replace the forward slashes by DirectorySeparator of the OS.

Using ArchiveName in AddFileEntry, on the other hand, is a way to rename entries in the zip (and later as the extracted file).
Here is a modified test which also sets the ArchiveName of one entry. It works also.

Code: Pascal  [Select][+][-]
  1. program project1;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses
  6.   Classes, zipper;
  7.  
  8. const
  9.   zipfilename = 'test.zip';
  10.   filename1 = 'project1.lpr';
  11.   filename2 = 'project1.lpi';
  12.   filename3 = 'subdir\a.txt';
  13.   filename4 = 'subdir/b.txt';
  14.  
  15. var
  16.   zip: TZipper;
  17.   unzipper: TUnzipper;
  18.  
  19. begin
  20.   zip := TZipper.Create;
  21.   zip.Filename := zipfilename;
  22.   zip.Entries.AddFileEntry(filename1);
  23.   zip.Entries.AddFileEntry(filename2);
  24.   zip.Entries.AddFileEntry(filename3);
  25.   zip.Entries.AddFileEntry(filename4);
  26.   zip.Entries.AddFileEntry(filename3, 'subdir\renamed_a.txt');
  27.   zip.SaveToFile(zipfilename);
  28.   zip.Free;
  29.  
  30.   unzipper := TUnzipper.Create;
  31.   unzipper.Filename := zipFileName;
  32.   unzipper.OutputPath := 'unzip';
  33.   unzipper.Examine;
  34.   unzipper.UnzipAllFiles;
  35.   unzipper.Free;
  36. end.

When I run this in Linux the unzipper creates a subfolder "subdir", and that contains files "a.txt", "b.txt" and "renamed_a.txt" - just like it should be.

A patch should also take care of the confusing comments in the TZipEntry declaration regarding the meaning of the names DiskFileName and ArchiveFileName. "DiskfileName" should be "Name of the file to be zipped, directory separators will be replaced by '/' internally", and "ArchiveFileName" should be "Name of the file to be extracted, directory separators are '/' internally but will become those of the OS upon extraction".

There is one inconsistency in my patch, though: When AddFileEntry replaces '\' by '/' then ZipOnFile will search for a file with forward slashes in Windows. But this is no problem in Windows and FPC.

Alternatively ZipOneFile should introduce a local function GetZippedFileName which does the '\'-to-'/' replacement and which is called after the stream with the input file (original DiskFileName) has been opened
« Last Edit: April 24, 2021, 08:53:02 am by wp »

wp

  • Hero Member
  • *****
  • Posts: 11906
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #19 on: April 24, 2021, 11:32:38 am »
On the other hand, when a backslash is a valid character in a Linux filename, how should zipper know that the '\' in 'subdir\a.txt' refers to file 'subdir\a.txt' in the current directory or to 'a.txt' in subdirectory 'subdir' due to the user meaning a '/' rather than a '\'?

I get the feeling this "issue" should not be fixed on the zipper side. It is the user's responsibility to provide the correct path delimiters in cross-platform applications. There is the global variable PathDelim (in SysUtils) for exactly this purpose.

The worst software is the one which tries to guess what the user means.
« Last Edit: April 24, 2021, 11:51:06 am by wp »

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1119
  • Professional amateur ;-P
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #20 on: April 24, 2021, 12:24:22 pm »
Hey WP,

The worst software is the one which tries to guess what the user means.

This statement alone made me think of this matter in a total different light.
And in all frankness I completely agree with the statement.

But this means that in ALL the places where there is an example of TZipper, there has to be a clear explanation on how you should never use a backslash when dealing with this component UNLESS you really want one.

If this is not explained in ALL examples then someone else will use Windows backslashes to create a zip that will not Unzip correctly in *nix environments and we will be back here again.

So we have a dilemma:
  • We leave it as is and change all examples and leave clear warnings to never use backslashes UNLESS you need it to be.
  • We commit the changes that you made and risk the software guessing what the user means.

One is going to be dependant that all new users of the Zipper unit read the correct examples but has no guaranties.
The other one is dependant that one user will stumble on this issue when he's in *nix land and wants to create a file with an actual backslash.

My vote is with the latter not the former, EVEN when I agree with you that software should not try to guess what the user's intentions are.
There will be WAY less users stumbling on the latter that on the former. I think this is lesser of two evils territory and the lesser evil is the latter.

Just my 2c.

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

wp

  • Hero Member
  • *****
  • Posts: 11906
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #21 on: April 24, 2021, 01:28:25 pm »
Let's think about it again...

The names first... I think the basic idea is correct. There is the name of the disk file which will be added to the zip - this is the DiskFileName. And there is the ArchiveFilename which is used inside the zip and requires forward slashes.

AddFileEntry has several overloads: two of them with an ArchiveFileName specified
Code: Pascal  [Select][+][-]
  1. procedure TZipFileEntry.SetArchiveFileName(const AValue: String);
  2. begin
  3.   if FArchiveFileName=AValue then Exit;
  4.   // Zip standard: filenames inside the zip archive have / path separator
  5.   if DirectorySeparator='/' then
  6.     FArchiveFileName:=AValue
  7.   else
  8.     FArchiveFileName:=StringReplace(AValue, DirectorySeparator, '/', [rfReplaceAll]);
  9. end;
  10.  
Here, nothing is changed on Linux, but on Windows (DirectorySeparator <> '/') the backslashes are replaced by '/'.

The third overload has no ArchiveFileName as argument. It leaves the ArchiveFileName empty, but I think we should set the ArchiveFilename here equal to the DiskFileName since this will replace the backslashes on windows for the zip's internal name.
Code: Pascal  [Select][+][-]
  1. function TZipFileEntries.AddFileEntry(const ADiskFileName): TZipFileEntry;
  2. begin
  3.   Result:=Add as TZipFileEntry;
  4.   Result.DiskFileName:=ADiskFileName;
  5.   Result.ArchiveFileName:=ADiskFileName;     // ADDED. The setter SetArchiveFileName will replace '\' by '/' in Windows
  6. end;

Let's look at the following line of my sample:
Code: Pascal  [Select][+][-]
  1.   zip.Entries.AddFileEntry('subdir\a.txt', 'subdir\renamed_a.txt');

On Windows, the second parameter is changed to 'subdir/renamed_a.txt' which makes the zip valid. And the file itself is found for packing - fine.

On Linux, we have two problems: the first one is that the file itself is not found because of the backslash. And if we refrain from guessing what the user thinks we must keep it like that - or the user must code correctly with the correct path delimiter, i.e. the first parameter must be specified as 'subdir' + PathDelim + 'a.txt'. But this is no limitation of the zipper, this is general practice in file handling on Linux.

The other problem is the ArchiveFilename parameter, 'subdir\renamed_a.txt'. When the user knows and understands that this is the name of the file inside the zip he would specify it as 'subdir/renamed_a.txt' and everything would be fine. But he probably thinks that this is the name of the file after unzipping. Now we have the same problem as before, and the ArchiveFileName must be specified as a valid name of the file system: 'subdir' + PathDelim + 'renamed_a.txt' and let the setter of ArchiveFileName do the work to replace the PathDelim back to a forward slash.

This way the zip/unzip operation yields the desired results, on both Windows and Linux - I tested it.

To summarize we must
- fill the ArchiveFileName in the one-parameter overload of AddFileName to be equal to the DiskFileName, as shown above.
- assume that all names are to be understood as names of the filesystem, i.e. with the correct path delimiter. But this is rather self-evident.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11444
  • FPC developer.
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #22 on: April 24, 2021, 10:48:07 pm »
Contracase:  preparing a ZIP for dos  on Windows? Needs to have backslashes in the archive.

wp

  • Hero Member
  • *****
  • Posts: 11906
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #23 on: April 25, 2021, 12:33:41 pm »
In this case even the current (un)zipper does not work because there are several replacements of the directory separator by a forward slash. To implement a DOS-compatible version of the (un)zipper a property ArchivePathdelim = (pdDefault, pdDOS) should be introduced.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1119
  • Professional amateur ;-P
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #24 on: April 26, 2021, 07:51:09 am »
Hey All,

Did I just stumble on a very big can of worms??

It really feels like that :)

@WP: Have you reached a final conclusion on this all?

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

wp

  • Hero Member
  • *****
  • Posts: 11906
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #25 on: April 26, 2021, 01:00:07 pm »
My conclusions do not matter since I cannot write to the fpc repository... But I submitted a patch similar to reply #21 to your report hoping that an fpc dev picks it up and finds it useful.

It does not address the DOS issue mentioned by marcov since it has been there already before your report.

tetrastes

  • Sr. Member
  • ****
  • Posts: 481
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #26 on: April 26, 2021, 09:34:14 pm »
It seems that this is not the only problem of this package. Try filenames with non-ASCII characters at Windows...

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1119
  • Professional amateur ;-P
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #27 on: April 28, 2021, 04:34:43 am »
Hey WP,

My conclusions do not matter since I cannot write to the fpc repository... But I submitted a patch similar to reply #21 to your report hoping that an fpc dev picks it up and finds it useful.

Well, I thank you so very much for dropping in the patch. I have to admit that I was too lazy to do it, so all praises to you my friend!!

It does not address the DOS issue mentioned by marcov since it has been there already before your report.

Hummm, yeah, the worms on this can are increasing...

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 1119
  • Professional amateur ;-P
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #28 on: April 28, 2021, 04:36:13 am »
Hey Tetrastes,

It seems that this is not the only problem of this package. Try filenames with non-ASCII characters at Windows...

Ouch, this can of worms gets bigger and bigger !!!

Many thanks for the report Tetrastes!!

Cheers,
Gus
Lazarus 3.99(main) FPC 3.3.1(main) Ubuntu 23.10 64b Dark Theme
Lazarus 3.0.0(stable) FPC 3.2.2(stable) Ubuntu 23.10 64b Dark Theme
http://github.com/gcarreno

wp

  • Hero Member
  • *****
  • Posts: 11906
Re: Unit Zipper from package paszlib has an inconsistency
« Reply #29 on: April 28, 2021, 10:07:07 am »
This has nothing to do with zipper. It is a standard problem in FPC string handling. If you use zipper in an LCL GUI program everything works fine because the standard encoding is UTF8. The problem mentioned appears in standalone commandline programs in which FPC does not switch to UTF8 by default.

Simply call SetMultiByteConversionCodePage(CP_UTF8) as the first line of your program and the zipper handles utf8-encoded file names correctly. See also attached demo.


 

TinyPortal © 2005-2018