
Author Topic: [SOLVED] Convert the old file reading to TFileStream Method  (Read 11492 times)

Gizmo

  • Hero Member
  • *****
  • Posts: 831
[SOLVED] Convert the old file reading to TFileStream Method
« on: August 17, 2012, 03:15:46 pm »
The following is a simple way that I have coded to parse an untyped file in buffers of 4096 bytes. From the raw data, the ASCII text is pulled out and then a regular expression is run over it.

I realise people will get annoyed at me asking this, but what would be the equivalent way of coding this SIMPLY (without additional constructors, separate functions, etc.) using TFileStream? I have asked before but got overly complex examples that I couldn't understand. Now that I've coded it using the old method, perhaps someone could just say "Replace that with this" to convert it to a streams method?

Code: [Select]
var
  SrcFile              : File;
  Buffer               : array [1..4096] of char;
  PositionInSourceFile,
  error                : Integer;
  TotalBytesRead,
  BytesRead            : Int64;

begin
  FillChar(Buffer, SizeOf(Buffer), 0);  // Clear the buffer array out from any previous runs
  BytesRead            := 0;
  TotalBytesRead       := 0;
  PositionInSourceFile := 0;

  AssignFile(SrcFile, UTF8ToSys(SourceFileName));   // and open the file using the old way
  try
    {$I-}
    Reset(SrcFile, 1);  // Opens the file for reading. '1' is the recommended record size for untyped binary files
    {$I+}
    error := IOResult;
    if error = 0 then
      begin
        while not eof(SrcFile) do
          begin
            StringContent := ' ';
            // Read the source file in buffers of 4096 bytes
            BlockRead(SrcFile, Buffer, SizeOf(Buffer), BytesRead);
            inc(TotalBytesRead, BytesRead);

            // Strip out binary garbage
            StringContent := StripNonAsciiExceptCRLF(Buffer); // a separate function

            if MyRegExpression.Exec(StringContent) then
              begin
                // Do stuff with the found text expression
              end;
          end;  // No more data to read, so move on
      end;
  finally
    lblNoOfHits.Caption := ('Approx ' + strFBIDCounter + ' potential Facebook IDs found.');
    if error = 0 then
      CloseFile(SrcFile);  // only close the file if Reset actually opened it
    MyRegExpression.Free;
  end;
 
« Last Edit: August 17, 2012, 11:53:43 pm by tedsmith »

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Convert the old file reading to TFileStream Method
« Reply #1 on: August 17, 2012, 03:23:08 pm »
Though I'm definitely not annoyed at your question, I'm going to be a bit rude here ;)

If you got overly complex examples, perhaps it's time to start at the beginning.
Have you read and understood http://wiki.lazarus.freepascal.org/File_Handling_In_Pascal

What was your own attempt at conversion? Where did it fail? Please show us the code.

Finally: why do you want to rewrite it with streams? Are you looking for some functionality only streams can provide or do you just want to learn about streams?
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

Gizmo

  • Hero Member
  • *****
  • Posts: 831
Re: Convert the old file reading to TFileStream Method
« Reply #2 on: August 17, 2012, 04:16:01 pm »
I have a program that uses 10 checkboxes. Each checkbox calls a different function. 7 of them relate to, and expect, small files so I am just using TFileStream.LoadFromFile to load the whole sourcefile to memory and "do stuff" with it there.

The other 3 expect large files - tens or hundreds of GB. I tried to use streams, specifically FileStream.ReadBuffer, and KpjComp tried to help me out, but his fine and full examples threw me a bit.

Yes, I have read the link referred to several times, particularly the part about binary files of course. But the example, like many examples on the net, is a very quick and small one that either refers to loading files in full into a buffer (not possible with huge files) or refers more to writing new files than reading existing ones.

I realise I can use FS.ReadBuffer (if buffer size is known), FS.Read (if buffer size is not known), FS.Size (for getting the size of the file) FS.Position (for knowing where you are in the file) and one or two others. But putting them together is what keeps defeating me. I tried something like: 

while SF.Position <= SF.Size repeat
   SF.ReadBuffer(Buffer, SizeOf(Buffer);

etc etc, but then I kept hitting stumbling blocks relating to variable types not being compatible with what the stream returns, etc. So eventually I gave up and recoded it the older way, as coded above. However, the problem I have now is that I have 7 checkboxes expecting a FileStream and 3 that don't and instead use a file assignment. I then end up with a problem if the user selects all of the checkboxes, because there are issues with releasing the stream and it then not being available for the next function. So, I'm trying hard to get it all back to streams as they seem more flexible, thus my question, but they also seem harder to use (to me).
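
For readers comparing the two calls mentioned above, here is a minimal sketch of how TStream.Read and TStream.ReadBuffer differ when fewer bytes are available than were requested. The scratch file name read_demo.bin is a hypothetical choice for illustration, and the program assumes it can write to the current directory.

```pascal
program ReadVsReadBuffer;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  DemoFileName = 'read_demo.bin';  // hypothetical scratch file, created and deleted here

var
  FS: TFileStream;
  Buffer: array[0..4095] of Byte;
  BytesRead: LongInt;
begin
  // Create a 10-byte file so that a 4096-byte request cannot be satisfied.
  FS := TFileStream.Create(DemoFileName, fmCreate);
  try
    FillChar(Buffer, SizeOf(Buffer), 65);
    FS.WriteBuffer(Buffer, 10);
  finally
    FS.Free;
  end;

  FS := TFileStream.Create(DemoFileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read reports how many bytes it actually delivered.
    BytesRead := FS.Read(Buffer, SizeOf(Buffer));
    WriteLn('Read returned ', BytesRead, ' bytes');

    // ReadBuffer insists on the full count and raises EReadError otherwise.
    FS.Position := 0;
    try
      FS.ReadBuffer(Buffer, SizeOf(Buffer));
      WriteLn('ReadBuffer filled the whole buffer');
    except
      on EReadError do
        WriteLn('ReadBuffer raised EReadError');
    end;
  finally
    FS.Free;
  end;
  DeleteFile(DemoFileName);
end.
```

For chunked reading of a file whose length is not a multiple of the buffer size, Read is usually the easier fit, because the last chunk is expected to be short.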
« Last Edit: August 17, 2012, 04:19:04 pm by tedsmith »

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Convert the old file reading to TFileStream Method
« Reply #3 on: August 17, 2012, 04:27:28 pm »
Regarding the wiki article: please edit it, if you want to, and add more example code (e.g. reading files using buffers instead of writing).
Improving the wiki is a very good thing, as I'd say more people look/search there than search the forums (which seem more suited to asking specific questions).

Regarding your example: showing actual code would help fix it.
Perhaps somebody else is willing to rewrite the code in your first post, though.

As for having "issues releasing the stream": it seems you'll have to fix that regardless. Presumably the sequence you describe (using a stream, the stream not being released, then using the file assignment and getting into trouble) is merely a symptom... (Perhaps I misunderstand your problem description, though.)
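
One possible way around the release problem, sketched under assumptions: open a single stream, pass it to every routine, let each routine rewind it first, and free it once at the end. CountBytes and CountLineFeeds are hypothetical stand-ins for the per-checkbox functions, and a TMemoryStream stands in for the TFileStream so that the example is self-contained.

```pascal
program SharedStream;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical stand-in for one checkbox routine: counts all bytes.
function CountBytes(Stream: TStream): Int64;
var
  Buffer: array[0..4095] of Byte;
  BytesRead: LongInt;
begin
  Result := 0;
  Stream.Position := 0;  // rewind: an earlier routine may have left it at the end
  repeat
    BytesRead := Stream.Read(Buffer, SizeOf(Buffer));
    if BytesRead > 0 then
      Inc(Result, BytesRead);
  until BytesRead <= 0;
end;

// Hypothetical stand-in for another checkbox routine: counts #10 bytes.
function CountLineFeeds(Stream: TStream): Int64;
var
  Buffer: array[0..4095] of Byte;
  BytesRead, i: LongInt;
begin
  Result := 0;
  Stream.Position := 0;
  repeat
    BytesRead := Stream.Read(Buffer, SizeOf(Buffer));
    for i := 0 to BytesRead - 1 do
      if Buffer[i] = 10 then
        Inc(Result);
  until BytesRead <= 0;
end;

var
  Data: AnsiString;
  Source: TMemoryStream;
begin
  Data := 'one'#10'two'#10;
  Source := TMemoryStream.Create;  // a TFileStream would be used the same way
  try
    Source.WriteBuffer(Data[1], Length(Data));
    WriteLn('Bytes: ', CountBytes(Source));
    WriteLn('Line feeds: ', CountLineFeeds(Source));
  finally
    Source.Free;  // freed exactly once, after every routine has run
  end;
end.
```

Because the routines take a TStream parameter, the caller decides the stream's lifetime, so no routine needs to free or reopen anything.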

Thanks,
BigChimp

ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Convert the old file reading to TFileStream Method
« Reply #4 on: August 17, 2012, 04:57:47 pm »
A while ago there was a similar thread on scanning huge files. The problem with normal streams is that they put everything in memory, and that does not work for big files. Reading in chunks causes problems when the string to find is split over 2 chunks.
In the thread http://forum.lazarus.freepascal.org/index.php/topic,16496.msg89711.html#msg89711 several solutions were provided, but the one that this link points to most resembles the operation of TFileStream, without loading the full file into memory.

BTW I believe you know the person that started that thread  ;D
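
The split-across-chunks problem mentioned above can be handled by carrying the tail of each chunk over into the next search window. The following is only a sketch under stated assumptions, not the solution from the linked thread: it searches for a fixed literal rather than a regular expression, the names CountMatches, Needle and ChunkSize are hypothetical, and the chunk size is deliberately tiny so that a split really happens.

```pascal
program ChunkOverlap;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

const
  ChunkSize = 8;            // deliberately tiny so a match gets split
  Needle    = 'facebook';   // hypothetical search term

// Counts occurrences of Pattern in Stream, reading ChunkSize bytes at a time.
function CountMatches(Stream: TStream; const Pattern: AnsiString): Integer;
var
  Buffer: array[0..ChunkSize - 1] of AnsiChar;
  Chunk, Window, Tail: AnsiString;
  BytesRead: LongInt;
  P: SizeInt;
begin
  Result := 0;
  Tail := '';
  repeat
    BytesRead := Stream.Read(Buffer, SizeOf(Buffer));
    if BytesRead > 0 then
    begin
      SetString(Chunk, PAnsiChar(@Buffer[0]), BytesRead);
      Window := Tail + Chunk;  // previous tail + fresh data
      P := PosEx(Pattern, Window, 1);
      while P > 0 do
      begin
        Inc(Result);
        P := PosEx(Pattern, Window, P + 1);
      end;
      // Keep the last Length(Pattern) - 1 characters: too short to hold a
      // complete match (so nothing is counted twice), long enough to hold
      // any partial match that continues in the next chunk.
      if Length(Window) >= Length(Pattern) then
        Tail := Copy(Window, Length(Window) - Length(Pattern) + 2,
                     Length(Pattern) - 1)
      else
        Tail := Window;
    end;
  until BytesRead <= 0;
end;

var
  Data: AnsiString;
  Mem: TMemoryStream;
begin
  // The first 'facebook' straddles the boundary between chunks 1 and 2.
  Data := 'xxxxfacebookyyyyfacebook';
  Mem := TMemoryStream.Create;
  try
    Mem.WriteBuffer(Data[1], Length(Data));
    Mem.Position := 0;
    WriteLn('Matches found: ', CountMatches(Mem, Needle));  // prints 2
  finally
    Mem.Free;
  end;
end.
```

With a regular expression the same idea only works if an upper bound on the match length is known, since that bound determines how much of each chunk has to be carried over.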

Gizmo

  • Hero Member
  • *****
  • Posts: 831
Re: Convert the old file reading to TFileStream Method
« Reply #5 on: August 17, 2012, 08:46:54 pm »
OK, I think I worked it out, thanks for the reminders guys.

Code: [Select]
var
  BytesRead      : Int64;
  TotalBytesRead : Int64;
  Buffer         : array [0..4095] of char; // or of byte

begin
  TotalBytesRead := 0;
  try
    while TotalBytesRead < FileStream.Size do
      begin
        BytesRead := FileStream.Read(Buffer, SizeOf(Buffer));
        inc(TotalBytesRead, BytesRead);
        // Whatever you want to do with each buffer segment
      end;
  finally
    FileStream.Free
  end;
end;
That compiles, and works, generating the same output as my former code above. Does that look correct and optimised against problems?
« Last Edit: August 17, 2012, 10:29:48 pm by tedsmith »

mas steindorff

  • Hero Member
  • *****
  • Posts: 555
Re: Convert the old file reading to TFileStream Method
« Reply #6 on: August 17, 2012, 09:17:57 pm »
just for more info: (new is not always better)

http://forum.lazarus.freepascal.org/index.php/topic,9509.0.html

In the end, I looked at the source code of TFileStream and found it was making the same Filexxx calls I was, but it would add a double jump to my code. Since I had a limited target and time, I kept the Filexxx calls.

I guess TFileStream would be the better choice for someone who hasn't worked with files before, especially since it uses the newer file I/O calls.

The wiki link that BigChimp posted implies that xxx.loadfromfile uses the same stream calls as TFileStream. If that is true, then it would have the same problem as my original Tstring.loadfromfile code did, which is why I shied away from it at the time.
windows 10 &11, Ubuntu 21+ IDE 3.4 general releases

Gizmo

  • Hero Member
  • *****
  • Posts: 831
Re: Convert the old file reading to TFileStream Method
« Reply #7 on: August 17, 2012, 11:53:31 pm »
I have created an account and added my first update to the Wiki to give guidance on reading buffers of stream that will hopefully help others:

http://wiki.lazarus.freepascal.org/File_Handling_In_Pascal#Binary_files

Thanks for the help.

Ted

mas steindorff

  • Hero Member
  • *****
  • Posts: 555
Re: [SOLVED] Convert the old file reading to TFileStream Method
« Reply #8 on: August 18, 2012, 12:08:42 am »
tip:
loop:
  the rd := FileRead(handle, buff, count) command returns the number of bytes read. (I believe this is true for TFileStream.Read() too.)

...
  you can check

if (rd < count) then
  Eof := TRUE;  // a short read means we are at the end of the file; a full buffer means we need to continue
...
until Eof

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Convert the old file reading to TFileStream Method
« Reply #9 on: August 18, 2012, 12:47:37 am »
Quote from: Gizmo
I have created an account and added my first update to the Wiki to give guidance on reading buffers of stream that will hopefully help others:

http://wiki.lazarus.freepascal.org/File_Handling_In_Pascal#Binary_files
Thanks for the help in improving the wiki ;)

KpjComp

  • Hero Member
  • *****
  • Posts: 680
Re: [SOLVED] Convert the old file reading to TFileStream Method
« Reply #10 on: August 18, 2012, 09:05:17 am »
Code: [Select]
if (rd < count) then
  Eof := TRUE;  // a short read means we are at the end of the file; a full buffer means we need to continue
...
until Eof

In theory, with streams, that could miss some data: Read may return fewer bytes than requested before the end of the stream has actually been reached.

I often like this construct:

Code: [Select]
while true do
begin
   BytesRead := FileStream.Read(Buffer,sizeof(Buffer));
   if BytesRead < 1 then break;
  // Whatever you want to do with each buffer segment 
end;
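
For completeness, here is a minimal sketch of that loop as a whole program, including the creation and release of the stream. Taking the file name from the first command-line parameter and the program name StreamChunks are assumptions made for illustration.

```pascal
program StreamChunks;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  FileStream: TFileStream;
  Buffer: array[0..4095] of Byte;
  BytesRead: LongInt;
  TotalBytesRead: Int64;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: streamchunks <file>');
    Halt(1);
  end;

  TotalBytesRead := 0;
  // Open read-only; other processes may still read the file but not write to it.
  FileStream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    while True do
    begin
      BytesRead := FileStream.Read(Buffer, SizeOf(Buffer));
      if BytesRead < 1 then
        Break;  // nothing more to read
      Inc(TotalBytesRead, BytesRead);
      // Only Buffer[0..BytesRead - 1] holds fresh data on this pass;
      // the final chunk is normally shorter than the buffer.
    end;
    WriteLn('Read ', TotalBytesRead, ' of ', FileStream.Size, ' bytes');
  finally
    FileStream.Free;
  end;
end.
```

The loop ends only when Read delivers nothing, so it does not rely on a short read meaning end of file.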

 
