Topic: Fast and simple find and replace string in large binary file (Read 2079 times)

Cyclone

  • Newbie
  • Posts: 3
Fast and simple find and replace string in large binary file
« on: November 02, 2021, 11:49:47 am »
Hi!
This is my first post on this forum, although I have been reading it for many years.
I have a question that I cannot figure out on my own, so I decided to ask here.

I have several hundred binary files, each up to 20 MB in size (the task is for the Windows platform).
The first 100 bytes of some of these files contain an ANSI string up to 10 characters long. I need to go through all the files, determine whether each file contains this string, and if it does, replace it with another string of the same length without changing any other bytes in the file, then write the file back to disk.

After several hours of reading forums and programming, I have not found a simple and fast solution.
First, I loaded the whole file into memory (maybe not the fastest way, but I didn't figure out how to work with only the first, let's say, 100 bytes of the file).
I tried loading the binaries into a TStringList, but after saving back to disk, LF bytes were changed to CR LF (TStringList treats them as line-end delimiters). I tried TMemoryStream, but couldn't find an efficient way to find and replace a string in it. I haven't fully figured out TStringStream either.
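For reference, the TMemoryStream route can be made to work: after LoadFromFile, its Memory property points at the loaded bytes, so the search is a plain byte comparison with CompareByte and the replacement is a Move. A minimal sketch, in which the helper ReplaceInStream, the file name and both strings are hypothetical; note that it still loads and saves the whole file even though only the first 100 bytes are searched.

```pascal
program MemStreamReplace;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

// Hypothetical helper: replaces the first occurrence of OldStr with NewStr
// (same length), searching only the first Limit bytes of Stream.
// Returns True if a replacement was made.
function ReplaceInStream(Stream: TMemoryStream; const OldStr, NewStr: AnsiString;
  Limit: SizeInt): Boolean;
var
  P: PByte;
  i: SizeInt;
begin
  Result := False;
  if (OldStr = '') or (Length(OldStr) <> Length(NewStr)) then
    Exit;
  P := PByte(Stream.Memory);
  if Limit > Stream.Size then
    Limit := Stream.Size;
  for i := 0 to Limit - Length(OldStr) do
    if CompareByte(P[i], OldStr[1], Length(OldStr)) = 0 then
    begin
      Move(NewStr[1], P[i], Length(NewStr));  // overwrite in the memory copy
      Exit(True);
    end;
end;

var
  MS: TMemoryStream;
begin
  if not FileExists('filename.bin') then      // hypothetical file name
    Exit;
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile('filename.bin');
    // only the first 100 bytes are searched, but the whole file is loaded
    if ReplaceInStream(MS, 'OLDSTRING', 'NEWSTRING', 100) then
      MS.SaveToFile('filename.bin');          // rewrite only if something changed
  finally
    MS.Free;
  end;
end.
```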

Please advise on the easiest and fastest way to solve this problem.
Thank you in advance.
« Last Edit: November 02, 2021, 12:00:22 pm by Cyclone »

marcov

  • Administrator
  • Hero Member
  • Posts: 11448
  • FPC developer.
Re: Fast and simple find and replace string in large binary file
« Reply #1 on: November 02, 2021, 11:56:32 am »
Is the replacement string the same length as the existing one?

Cyclone

  • Newbie
  • Posts: 3
Re: Fast and simple find and replace string in large binary file
« Reply #2 on: November 02, 2021, 11:59:36 am »
Quote from: marcov
Is the string to replace the same length as the existing one?
Yes, it is.

marcov

  • Administrator
  • Hero Member
  • Posts: 11448
  • FPC developer.
Re: Fast and simple find and replace string in large binary file
« Reply #3 on: November 02, 2021, 12:03:40 pm »
Then you can just use something like this (written from memory; search the docs for the "file" type and BlockRead/BlockWrite for more details):


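Code: Pascal
  program PatchHeader;
  {$mode objfpc}{$H+}
  uses SysUtils;

  const
    OldStr: AnsiString = 'OLDSTRING';  // hypothetical: string to look for
    NewStr: AnsiString = 'NEWSTRING';  // hypothetical: replacement, same length
  var
    f: file;
    buf: array[0..4095] of Byte;
    bytesread, byteswritten: LongInt;
    i: SizeInt;
  begin
    AssignFile(f, 'filename.something.txt');
    FileMode := fmOpenReadWrite;      // Reset must open the file for writing too
    Reset(f, 1);                      // record size 1: byte-oriented access
    BlockRead(f, buf, SizeOf(buf), bytesread);
    // inspect and modify buf: look for OldStr in the bytes actually read
    for i := 0 to bytesread - Length(OldStr) do
      if CompareByte(buf[i], OldStr[1], Length(OldStr)) = 0 then
      begin
        Move(NewStr[1], buf[i], Length(NewStr));
        Seek(f, 0);                   // back to the start, overwrite in place
        BlockWrite(f, buf, bytesread, byteswritten);
        Break;
      end;
    CloseFile(f);
  end.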

Bart

  • Hero Member
  • Posts: 5289
    • Bart en Mariska's Webstek
Re: Fast and simple find and replace string in large binary file
« Reply #4 on: November 02, 2021, 03:30:43 pm »
In a 20 MB file, one blockread won't do, and if you are unlucky you read the first part of the string in block X and the rest in block X+1.

Bart
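If the search did have to cover the whole file, the block-boundary problem Bart mentions can be handled by carrying the last Length(Pattern)-1 bytes of each block over to the front of the next buffer. A minimal sketch using TFileStream; FindInFile, the chunk size and the file name are hypothetical, and it only locates the offset (patching would then be a seek plus a write, as in the header case).

```pascal
program ChunkedSearch;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

const
  ChunkSize = 4096;  // hypothetical block size

// Hypothetical helper: returns the file offset of the first occurrence of
// Pattern, or -1. The last Length(Pattern)-1 bytes of each block are kept at
// the front of the buffer, so a match split across two reads is still found.
function FindInFile(const FileName: string; const Pattern: AnsiString): Int64;
var
  FS: TFileStream;
  Buf: array of Byte;
  Keep, Got, Avail, i: SizeInt;
  BufStart: Int64;   // file offset of Buf[0]
begin
  Result := -1;
  if Pattern = '' then
    Exit;
  SetLength(Buf, ChunkSize + Length(Pattern) - 1);
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Keep := 0;
    BufStart := 0;
    repeat
      Got := FS.Read(Buf[Keep], ChunkSize);
      Avail := Keep + Got;
      for i := 0 to Avail - Length(Pattern) do
        if CompareByte(Buf[i], Pattern[1], Length(Pattern)) = 0 then
          Exit(BufStart + i);
      // carry over the tail that could still be the start of a match
      if Avail >= Length(Pattern) then
      begin
        Keep := Length(Pattern) - 1;
        Move(Buf[Avail - Keep], Buf[0], Keep);
        BufStart := BufStart + Avail - Keep;
      end
      else
        Keep := Avail;   // too few bytes so far: keep them all
    until Got = 0;
  finally
    FS.Free;
  end;
end;

var
  Offset: Int64;
begin
  if FileExists('filename.bin') then            // hypothetical file name
  begin
    Offset := FindInFile('filename.bin', 'OLDSTRING');
    WriteLn('first match at offset ', Offset);  // -1 means not found
  end;
end.
```

The carried-over tail is exactly one byte shorter than the pattern, so it can never hold a complete match by itself and no position is tested twice.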

marcov

  • Administrator
  • Hero Member
  • Posts: 11448
  • FPC developer.
Re: Fast and simple find and replace string in large binary file
« Reply #5 on: November 02, 2021, 03:38:45 pm »
Quote from: Bart
In a 20 MB file, one blockread won't do, and if you are unlucky you read the first part of the string in block X and the rest in block X+1.

The original message said that the string was in the first 100 bytes. And since the replacement is the same size, you don't have to read/write the whole file; you can just do an in-place update with Seek.
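A minimal sketch of that in-place update with TFileStream, assuming the string lies within the first 100 bytes; the PatchHeader helper, the file name and the strings are hypothetical. Only the matching bytes are rewritten, so the rest of the file is never read or written.

```pascal
program InPlacePatch;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

const
  HeaderSize = 100;  // the string is known to lie in the first 100 bytes

// Hypothetical helper: returns True if OldStr was found and overwritten.
function PatchHeader(const FileName: string; const OldStr, NewStr: AnsiString): Boolean;
var
  FS: TFileStream;
  Buf: array[0..HeaderSize - 1] of Byte;
  Got, i: SizeInt;
begin
  Result := False;
  if (OldStr = '') or (Length(OldStr) <> Length(NewStr)) then
    Exit;
  FS := TFileStream.Create(FileName, fmOpenReadWrite or fmShareExclusive);
  try
    Got := FS.Read(Buf, SizeOf(Buf));    // a file may be shorter than 100 bytes
    for i := 0 to Got - Length(OldStr) do
      if CompareByte(Buf[i], OldStr[1], Length(OldStr)) = 0 then
      begin
        FS.Position := i;                            // seek to the match
        FS.WriteBuffer(NewStr[1], Length(NewStr));   // overwrite just those bytes
        Exit(True);
      end;
  finally
    FS.Free;
  end;
end;

begin
  if FileExists('filename.bin') then     // hypothetical file name
    if PatchHeader('filename.bin', 'OLDSTRING', 'NEWSTRING') then
      WriteLn('patched');
end.
```

For several hundred files, the same call could be driven from a FindFirst/FindNext loop over the directory.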

Bart

  • Hero Member
  • Posts: 5289
    • Bart en Mariska's Webstek
Re: Fast and simple find and replace string in large binary file
« Reply #6 on: November 02, 2021, 04:22:27 pm »
Quote from: marcov
The original message said that the string was in the first 100 bytes.

Missed that.
Sorry.

Bart

Cyclone

  • Newbie
  • Posts: 3
Re: Fast and simple find and replace string in large binary file
« Reply #7 on: November 02, 2021, 06:39:13 pm »
Quote from: marcov
Then you can just use something like this (written from memory; search the docs for the "file" type and BlockRead/BlockWrite for more details):

Many thanks!
I tried your code and it works fine.
But I don't understand why the buffer size is 4096 bytes. Is there a specific reason for that (program speed, e.g. disk sector size)? I only need the first 100 bytes. In the BlockRead documentation example the buffer size is 2048. I also tried a 100-byte buffer, and it works fine too.

marcov

  • Administrator
  • Hero Member
  • Posts: 11448
  • FPC developer.
Re: Fast and simple find and replace string in large binary file
« Reply #8 on: November 02, 2021, 06:57:14 pm »
Yeah, sector size. 100 bytes will work fine.

MarkMLl

  • Hero Member
  • Posts: 6685
Re: Fast and simple find and replace string in large binary file
« Reply #9 on: November 02, 2021, 08:43:30 pm »
Or, if the first few bytes really do have a well-understood format, define an appropriate record, read/write precisely that number of bytes, and leave the compiler to do the hard work and optimisation.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories
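A sketch of the record approach, assuming a purely hypothetical header layout (a 4-byte signature, a 10-byte name field and padding up to 100 bytes); the real layout has to come from the file format itself. PatchName and the names passed to it are illustrative only.

```pascal
program RecordHeader;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

type
  // Hypothetical layout of the first 100 bytes; adjust to the real format.
  THeader = packed record
    Signature: array[0..3] of AnsiChar;
    Name: array[0..9] of AnsiChar;    // the ANSI string, up to 10 characters
    Reserved: array[0..85] of Byte;   // rest of the 100-byte header
  end;

// Hypothetical helper: compares the first Length(OldName) bytes of the Name
// field and, on a match, writes the patched record back in place.
function PatchName(const FileName: string; const OldName, NewName: AnsiString): Boolean;
var
  f: file;
  Hdr: THeader;
  Got: LongInt;
begin
  Result := False;
  if (OldName = '') or (Length(OldName) <> Length(NewName)) or
     (Length(OldName) > SizeOf(Hdr.Name)) then
    Exit;
  AssignFile(f, FileName);
  FileMode := fmOpenReadWrite;             // Reset opens for reading and writing
  Reset(f, 1);
  try
    BlockRead(f, Hdr, SizeOf(Hdr), Got);   // exactly one header's worth of bytes
    if (Got = SizeOf(Hdr)) and
       (CompareByte(Hdr.Name, OldName[1], Length(OldName)) = 0) then
    begin
      Move(NewName[1], Hdr.Name, Length(NewName));
      Seek(f, 0);
      BlockWrite(f, Hdr, SizeOf(Hdr));
      Result := True;
    end;
  finally
    CloseFile(f);
  end;
end;

begin
  if FileExists('filename.bin') then       // hypothetical file name
    PatchName('filename.bin', 'OLDNAME', 'NEWNAME');
end.
```

The packed keyword keeps the compiler from inserting alignment padding, so SizeOf(THeader) matches the 100 bytes on disk.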

Warfley

  • Hero Member
  • Posts: 1499
Re: Fast and simple find and replace string in large binary file
« Reply #10 on: November 02, 2021, 09:46:38 pm »
If you need very high performance, you should consider mapping the file into memory (On Windows: Link, on Unix: Link).

Then simply search through the memory region for the pattern (e.g. using a pointer) and overwrite the matching bytes in memory.
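A minimal Windows-only sketch of the mapping approach, using CreateFile, CreateFileMapping and MapViewOfFile from FPC's Windows unit (on Unix, fpmmap from the BaseUnix unit plays the same role). PatchMapped, the file name and the strings are hypothetical. Writes go directly into the mapped view, and the system writes the modified pages back to the file.

```pascal
program MappedPatch;
{$mode objfpc}{$H+}
uses
  Windows, Classes, SysUtils;

// Hypothetical helper: maps the whole file and replaces the first occurrence
// of OldStr (searched within the first Limit bytes) with NewStr in place.
function PatchMapped(const FileName: string; const OldStr, NewStr: AnsiString;
  Limit: SizeInt): Boolean;
var
  hFile, hMap: THandle;
  P: PByte;
  Size: DWORD;
  i: SizeInt;
begin
  Result := False;
  if (OldStr = '') or (Length(OldStr) <> Length(NewStr)) then
    Exit;
  hFile := CreateFile(PChar(FileName), GENERIC_READ or GENERIC_WRITE, 0, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    Exit;
  try
    Size := GetFileSize(hFile, nil);   // assumes files well below 4 GB
    if Size = 0 then
      Exit;                            // an empty file cannot be mapped
    hMap := CreateFileMapping(hFile, nil, PAGE_READWRITE, 0, 0, nil);
    if hMap = 0 then
      Exit;
    try
      P := MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, 0);  // map the whole file
      if P = nil then
        Exit;
      try
        if Limit > Size then
          Limit := Size;
        for i := 0 to Limit - Length(OldStr) do
          if CompareByte(P[i], OldStr[1], Length(OldStr)) = 0 then
          begin
            Move(NewStr[1], P[i], Length(NewStr));  // write through the view
            Exit(True);
          end;
      finally
        UnmapViewOfFile(P);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;

begin
  if FileExists('filename.bin') then   // hypothetical file name
    if PatchMapped('filename.bin', 'OLDSTRING', 'NEWSTRING', 100) then
      WriteLn('patched');
end.
```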

 
