
Author Topic: [SOLVED] Open File fails on some .htm files.  (Read 4955 times)

Hopestation

  • Full Member
  • ***
  • Posts: 181
[SOLVED] Open File fails on some .htm files.
« on: January 19, 2016, 11:53:11 am »
Hi.

I am using Lazarus 1.4.4, FPC version 2.6.4, on Windows 10.

I have written a simple program to find jpegs in files. It works successfully on a wide variety of files but on some .htm files it gives the message that the file that I have just selected in an Open dialog box can't be found.

This is the procedure:

procedure TForm1.Read;
Var
    M, N, P: Integer;
   Str: String;
begin
   Memo.Lines.Add('File NAME - ' + Rdfil);
   AssignFile(InFile, RdFil);
   try
     Reset (InFile);
   except
      on E : Exception do
      ShowMessage(E.ClassName+' error raised, with message : ' + E.Message);
   end;

   Memo.Clear;
   Memo.Lines.Add('File NAME - ' + Rdfil);

   M := 0; P := 0;
   While not EOF(InFile) do
   begin
     Inc (M);
     ReadLn(InFile,Str);
     Str := UpperCase(Str);
     N := Pos('JPG', Str);
     if N > 0 then
     begin
        Inc (P);
        Memo.Lines.Add('JPG found in line ' + IntToStr(M));
     end;
   end;
   Memo.Lines.Add(IntToStr(M) + ' lines read, ' + IntToStr(P) + ' Pictures found');
   CloseFile(InFile);
end;

A typical example is: http://www.formula1.com/.

I right click the page and use "save page as" to save it to an htm file and folder of other files.

I can open the htm file in Aptana Studio 3 and Notepad++ but my program fails at:

   Reset (InFile);

If I delete sections of the html code in Notepad++ I eventually get to a file that my program can open, but if I have to do this on every file I try to open it makes my program useless.

I have tried opening this file using another program I wrote using Lazarus and got exactly the same problem.

Can anyone explain why Lazarus can't open all htm files?

Thanks.
« Last Edit: January 19, 2016, 04:59:18 pm by Hopestation »

hy

  • Full Member
  • ***
  • Posts: 224
Re: Open File fails on some .htm files.
« Reply #1 on: January 19, 2016, 12:15:18 pm »
You have never assigned anything to your variable "InFile".
Please show more of the code.
_____
***hy

Bart

  • Hero Member
  • *****
  • Posts: 5674
    • Bart en Mariska's Webstek
Re: Open File fails on some .htm files.
« Reply #2 on: January 19, 2016, 12:20:09 pm »
Maybe you need to set the filemode to (fmOpenRead or fmShareDenyNone).
If you use a TFileStream (instead of "classic" file routines like reset/rewrite) you can do this in the constructor.
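As a minimal sketch of the TFileStream suggestion (the file name 'example.htm' is only a placeholder, not a file from this thread), the open mode is passed as the second argument of the constructor:

```pascal
program OpenSharedDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  FS: TFileStream;
  FName: String;
begin
  // Hypothetical file name, for illustration only.
  FName := 'example.htm';
  try
    // Read-only access that does not lock out other programs.
    FS := TFileStream.Create(FName, fmOpenRead or fmShareDenyNone);
    try
      WriteLn('Opened ', FName, ' (', FS.Size, ' bytes)');
    finally
      FS.Free;
    end;
  except
    // TFileStream raises EFOpenError when the file cannot be opened.
    on E: EFOpenError do
      WriteLn('Could not open file: ', E.Message);
  end;
end.
```

This only helps if the failure is a sharing or access-mode problem; it does not change how the file name itself is interpreted.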

Bart

Hopestation

  • Full Member
  • ***
  • Posts: 181
Re: Open File fails on some .htm files.
« Reply #3 on: January 19, 2016, 12:28:51 pm »
Sorry, Hy.
I didn't think it was necessary to show all the code, as it opens some htm files and every other file format I could try, including reading more than 30,000 lines from a Nero Backup .NRI file.

The remaining code is shown below:

unit Read_File01;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls,
  ExtCtrls;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    DlgOpen: TOpenDialog;
    Memo: TMemo;
    procedure Button1Click(Sender: TObject);
    procedure FormActivate(Sender: TObject);
    procedure Read;
  private
    { private declarations }
  public
    { public declarations }
  end;

var
  Form1: TForm1;
  RdFil: TFileName;
  InFile: TextFile;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.FormActivate(Sender: TObject);
begin
  Width := Width + 50;
  Color := clYellow;
  if Not DlgOpen.execute then Exit;
  RdFil := DlgOpen.FileName;
  Read;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  if Not DlgOpen.execute then Exit;
  RdFil := DlgOpen.FileName;
  Read;
end;

Thanks for your comments, Bart.

I'll try your suggestions.

Hopestation

  • Full Member
  • ***
  • Posts: 181
Re: Open File fails on some .htm files.
« Reply #4 on: January 19, 2016, 12:40:13 pm »
Hi Bart.

I looked at the Open Dialog for the parameters you suggested and couldn't find them.

A search showed that they are part of the FileOpen command.

I haven't used this command in my code; I open the file with AssignFile and Reset. Is this the problem?

wp

  • Hero Member
  • *****
  • Posts: 13350
Re: Open File fails on some .htm files.
« Reply #5 on: January 19, 2016, 12:41:18 pm »
If I save the file that you linked above to disk I see that the file name "Formula 1®.htm" contains a non-standard character. Since you use fpc 2.6.4 you still have to convert this filename to your ANSI codepage (it would not be required with fpc 3.0):

Code: Pascal
  AssignFile(InFile, Utf8ToAnsi(RdFil));
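A fuller sketch of how that one-line change could sit in the reading routine, assuming FPC 2.6.4 and a file name that arrives UTF-8 encoded, as LCL dialogs return it. The function name CountJpg and the literal test file name are made up for illustration. Unlike the original procedure, it leaves the routine when Reset fails instead of carrying on into the read loop:

```pascal
program CountJpgLines;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Counts the lines containing 'JPG' (case-insensitive) in a text file.
// Utf8Name is assumed to be UTF-8 encoded, as LCL dialogs return it.
// Returns False if the file could not be opened.
function CountJpg(const Utf8Name: String; out LinesRead, Hits: Integer): Boolean;
var
  F: TextFile;
  Line: String;
begin
  Result := False;
  LinesRead := 0;
  Hits := 0;
  // With FPC 2.6.4 the file routines expect the system ANSI code page.
  AssignFile(F, Utf8ToAnsi(Utf8Name));
  try
    Reset(F);
  except
    on E: EInOutError do
    begin
      WriteLn('Cannot open file: ', E.Message);
      Exit; // do not fall through to the read loop
    end;
  end;
  try
    while not EOF(F) do
    begin
      ReadLn(F, Line);
      Inc(LinesRead);
      if Pos('JPG', UpperCase(Line)) > 0 then
        Inc(Hits);
    end;
  finally
    CloseFile(F);
  end;
  Result := True;
end;

var
  L, H: Integer;
begin
  // Hypothetical name; #$C2#$AE is the UTF-8 encoding of the (R) sign.
  if CountJpg('Formula 1'#$C2#$AE'.htm', L, H) then
    WriteLn(L, ' lines read, ', H, ' pictures found');
end.
```

In the form's code the same idea would apply to the existing InFile and RdFil variables; the standalone function is used here only so the sketch compiles on its own.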

Hopestation

  • Full Member
  • ***
  • Posts: 181
Re: Open File fails on some .htm files.
« Reply #6 on: January 19, 2016, 12:52:07 pm »
Thanks WP.

I looked at the Lazarus Downloads page and started to download Lazarus, but found that it is still offering Version 1.4.4 and fpc 2.6.4.

Looking at the Main site I found a link to Lazarus 1.6 - 2nd Release Candidate.

As a relatively inexperienced Lazarus user is this suitable for me?

Hopestation

  • Full Member
  • ***
  • Posts: 181
Re: Open File fails on some .htm files.
« Reply #7 on: January 19, 2016, 01:08:29 pm »
Hi WP.

I renamed the file to remove the odd character and it opened correctly, so I went back to the web site I was really interested in:

   http://derbyshire.libraryebooks.co.uk/site/EB/ebooks/firstlisted.asp

I saved this page and got the file: "Our Latest Arrivals - Derbyshire Libraries eBooks.htm".

I can't see any odd characters in this name, but my program won't open it.

I renamed it to "Derbyshire Libraries eBooks.htm", and it still failed, but changing to just

  "eBooks.htm" worked, so the spaces seem to have been the problem.

Thanks everyone.

Bart

  • Hero Member
  • *****
  • Posts: 5674
    • Bart en Mariska's Webstek
Re: Open File fails on some .htm files.
« Reply #8 on: January 19, 2016, 01:35:23 pm »
Quote from Hopestation: "eBooks.htm" worked, so the spaces seem to have been the problem.

No, that cannot be the problem at all.
Reading/writing/assigning filenames with spaces has never been a problem in fpc.

Bart

wp

  • Hero Member
  • *****
  • Posts: 13350
Re: Open File fails on some .htm files.
« Reply #9 on: January 19, 2016, 03:15:38 pm »
I confirm that Laz 1.4.4 does not open the file 'Our Latest Arrivals - Derbyshire Libraries eBooks.htm'. After using Utf8ToAnsi it does (Laz 1.6 always does). Looking at the filename in a hex editor I see that the space before "eBooks" is not an ordinary space (#$20) but a non-breaking space, U+00A0, which UTF-8 encodes as the two bytes $C2 $A0. Therefore it is clear that Utf8ToAnsi is required.

I'd recommend (unless you upgrade to Laz 1.6) that you always use Utf8ToAnsi when passing filenames to fpc procedures.

Bart

  • Hero Member
  • *****
  • Posts: 5674
    • Bart en Mariska's Webstek
Re: Open File fails on some .htm files.
« Reply #10 on: January 19, 2016, 04:27:02 pm »
Utf8ToAnsi (or UTF8ToSys) is not always sufficient. If the character is outside your current system codepage it will fail (e.g. some Chinese characters).

Since you seem to read strings, I suppose the file in question is a text file (not a binary)?
If so, I would suggest opening the file using the TStringListUTF8 class (it's in unit LazUTF8Classes from LazUtils).
The UTF8 in this classname simply means that it can handle all unicode characters when doing LoadFromFile or SaveToFile (because it uses the WideString Windows API).

Code: Pascal
  ...
  SL := TStringListUtf8.Create;
  try
    SL.LoadFromFile(RdFil);
    //iterate over the strings and do your stuff
    for i := 0 to SL.Count - 1 do
    begin
      Str := UpperCase(SL.Strings[i]); //SL[i] will do fine as well
      //rest of your code
    end;
  finally
    SL.Free;
  end;//try
  ..

LoadFromFile will load the entire file into memory.
Unless the file is really huge (> 100K lines), this performs pretty well.

Bart

Hopestation

  • Full Member
  • ***
  • Posts: 181
Re: Open File fails on some .htm files.
« Reply #11 on: January 19, 2016, 04:58:46 pm »
Thank you for all your help.

I found that if I didn't use the file name supplied by the library and instead typed a name using the keyboard, I could always open the file.

I had tried to open the file in a hex editor, but it was one I wrote in Lazarus, so I got the same problem.

I will use Bart's code.

Thanks again. It's good to know that there is a logical reason for the problem.

wp

  • Hero Member
  • *****
  • Posts: 13350
Re: Open File fails on some .htm files.
« Reply #12 on: January 19, 2016, 05:03:45 pm »
Quote from Hopestation: "I had tried to open the file in a hex editor, but it was one I wrote in Lazarus, so I got the same problem."

No, you won't see the filename this way - opening a file in a hexeditor shows you the hex codes of the file content, but not the filename. In Windows explorer I copied the file name to clipboard and pasted it into a temporary Lazarus file which I opened in the hex editor.
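The same check can probably be done in code rather than through the clipboard. A minimal sketch, where HexBytes is a made-up helper and the literal string stands in for the UTF-8 name that DlgOpen.FileName would return:

```pascal
program DumpNameBytes;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Returns the bytes of S as space-separated two-digit hex codes.
function HexBytes(const S: String): String;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

var
  FileNameUtf8: String;
begin
  // Stand-in for a name from the open dialog; #$C2#$A0 is the UTF-8
  // encoding of a non-breaking space (U+00A0).
  FileNameUtf8 := 'Libraries'#$C2#$A0'eBooks.htm';
  WriteLn(HexBytes(FileNameUtf8));
end.
```

In the output, a non-breaking space shows up as the pair C2 A0 where an ordinary space would appear as 20. In the form, the result could be added to the Memo instead of written to the console.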

 
