Author Topic: Is this data record a binary data structure?  (Read 18085 times)

flywire

  • Jr. Member
  • **
  • Posts: 85
Re: Is this data record a binary data structure?
« Reply #15 on: February 17, 2015, 09:00:22 am »
I am using Win7SP1 32 on an Intel Core2 Duo PC. Lazarus 1.2.6 r46529 FPC 2.6.4 i386-win32-win32/win64.

Output from Example2 project1 with Bermudas.imi in the same directory:

Code: [Select]
Usinge TMemoryStream:
1 Files, Checksum: 3411
[   0]:       test.txt  00000040  0000000B
AGELLAN
B2B

Usinge TFileStream:
1 Files, Checksum: 3411
[   0]:       test.txt  00000040  0000000B
AGELLAN
B2B

Usinge TFileStream:
122 Files, Checksum: F559
[   0]:       00cn.dat  00000B98  00000200
[   1]:      00cnc.dat  00000D98  00000400
[   2]:      00gr0.aux  00001198  00002876
[   3]:      00gr0.clp  00003A0E  00000200
[   4]:      00gr0.ext  00003C0E  000034EA
[   5]:     00gr0c.aux  000070F8  00000400
[   6]:     00gr0c.clp  000074F8  00000400
[   7]:     00gr0c.ext  000078F8  00000400
[   8]:      00gr1.aux  00007CF8  00000200
[   9]:      00gr1.clp  00007EF8  00000200
[  10]:      00gr1.ext  000080F8  00000200
[  11]:     00gr1c.aux  000082F8  00000400
[  12]:     00gr1c.clp  000086F8  00000400
[  13]:     00gr1c.ext  00008AF8  00000400
[  14]:      00gr2.aux  00008EF8  00000200
[  15]:      00gr2.clp  000090F8  00000200
[  16]:      00gr2.ext  000092F8  00000200
[  17]:     00gr2c.aux  000094F8  00000400
[  18]:     00gr2c.clp  000098F8  00000400
[  19]:     00gr2c.ext  00009CF8  00000400
[  20]:      00gr3.aux  0000A0F8  00000200
[  21]:      00gr3.clp  0000A2F8  00000200
[  22]:      00gr3.ext  0000A4F8  00000200
[  23]:     00gr3c.aux  0000A6F8  00000400
[  24]:     00gr3c.clp  0000AAF8  00000400
[  25]:     00gr3c.ext  0000AEF8  00000400
[  26]:      00gr4.aux  0000B2F8  00000200
[  27]:      00gr4.clp  0000B4F8  00000200
[  28]:      00gr4.ext  0000B6F8  00000200
[  29]:     00gr4c.aux  0000B8F8  00000400
[  30]:     00gr4c.clp  0000BCF8  00000400
[  31]:     00gr4c.ext  0000C0F8  00000400
[  32]:      00gr5.aux  0000C4F8  0000024C
[  33]:      00gr5.clp  0000C744  00000200
[  34]:      00gr5.ext  0000C944  0000023A
[  35]:     00gr5c.aux  0000CB7E  00000400
[  36]:     00gr5c.clp  0000CF7E  00000400
[  37]:     00gr5c.ext  0000D37E  00000400
[  38]:      00gr6.aux  0000D77E  00000270
[  39]:      00gr6.clp  0000D9EE  00000200
[  40]:      00gr6.ext  0000DBEE  00000248
[  41]:     00gr6c.aux  0000DE36  00000400
[  42]:     00gr6c.clp  0000E236  00000400
[  43]:     00gr6c.ext  0000E636  00000400
[  44]:      00gr7.aux  0000EA36  00000AB2
[  45]:      00gr7.clp  0000F4E8  00000238
[  46]:      00gr7.ext  0000F720  00001C9A
[  47]:     00gr7c.aux  000113BA  00000400
[  48]:     00gr7c.clp  000117BA  00000400
[  49]:     00gr7c.ext  00011BBA  00000400
[  50]:     00lay0.clt  00011FBA  00000000
[  51]:     00lay0.lay  00011FBA  00000080
[  52]:     00lay1.clt  0001203A  0000000C
[  53]:     00lay1.lay  00012046  0000245C
[  54]:    00lay10.clt  000144A2  0000000C
[  55]:    00lay10.lay  000144AE  000003EE
[  56]:    00lay11.clt  0001489C  0000000C
[  57]:    00lay11.lay  000148A8  0000032E
[  58]:    00lay12.clt  00014BD6  0000000C
[  59]:    00lay12.lay  00014BE2  0000026C
[  60]:    00lay13.clt  00014E4E  0000000C
[  61]:    00lay13.lay  00014E5A  00000230
[  62]:    00lay14.clt  0001508A  0000000C
[  63]:    00lay14.lay  00015096  00000FE6
[  64]:    00lay15.clt  0001607C  0000000C
[  65]:    00lay15.lay  00016088  00000090
[  66]:    00lay16.clt  00016118  0000000C
[  67]:    00lay16.lay  00016124  0000009A
[  68]:    00lay17.clt  000161BE  0000000C
[  69]:    00lay17.lay  000161CA  00000090
[  70]:    00lay18.clt  0001625A  0000000C
[  71]:    00lay18.lay  00016266  0000009A
[  72]:    00lay19.clt  00016300  0000000C
[  73]:    00lay19.lay  0001630C  0000047A
[  74]:     00lay2.clt  00016786  0000000C
[  75]:     00lay2.lay  00016792  000015CE
[  76]:    00lay20.clt  00017D60  0000000C
[  77]:    00lay20.lay  00017D6C  00003C4C
[  78]:    00lay21.clt  0001B9B8  0000000C
[  79]:    00lay21.lay  0001B9C4  0000179E
[  80]:    00lay22.clt  0001D162  0000000C
[  81]:    00lay22.lay  0001D16E  00000A4C
[  82]:     00lay3.clt  0001DBBA  0000000C
[  83]:     00lay3.lay  0001DBC6  00008318
[  84]:     00lay4.clt  00025EDE  0000000C
[  85]:     00lay4.lay  00025EEA  000013D8
[  86]:     00lay5.clt  000272C2  00000000
[  87]:     00lay5.lay  000272C2  00000080
[  88]:     00lay6.clt  00027342  000000A8
[  89]:     00lay6.lay  000273EA  00017396
[  90]:     00lay7.clt  0003E780  0000000C
[  91]:     00lay7.lay  0003E78C  000000DA
[  92]:     00lay8.clt  0003E866  0000000C
[  93]:     00lay8.lay  0003E872  00001CC6
[  94]:     00lay9.clt  00040538  0000000C
[  95]:     00lay9.lay  00040544  000004F0
[  96]:      00map.ini  00040A34  00000EA6
[  97]:      00poi.cfg  000418DA  000000EE
[  98]:      00poi.clt  000419C8  0000000C
[  99]:      00poi.dax  000419D4  0000025A
[ 100]:      00poi.dct  00041C2E  0000045A
[ 101]:      00poi.dpo  00042088  00000A1E
[ 102]:      00poi.dsc  00042AA6  00000402
[ 103]:      00poi.dtx  00042EA8  00000B6C
[ 104]:      00poi.lay  00043A14  00000A76
[ 105]:     00poic.dax  0004448A  00000400
[ 106]:     00poic.dct  0004488A  00000400
[ 107]:     00poic.dpo  00044C8A  00000400
[ 108]:     00poic.dsc  0004508A  00000400
[ 109]:     00poic.dtx  0004548A  00000400
[ 110]:       00t0.blx  0004588A  00002CEE
[ 111]:       00t1.blx  00048578  00000992
[ 112]:        00z.dat  00048F0A  00000200
[ 113]:       00zc.dat  0004910A  00000400
[ 114]:   add_maps.cfg  0004950A  0000000E
[ 115]:    BMP2BIT.ICS  00049518  000001C0
[ 116]:    BMP4BIT.ICS  000496D8  00007BA4
[ 117]:   cprt_txt.txt  0005127C  00000118
[ 118]:    cvg_map.msf  00051394  00000320
[ 119]:       db00.dbd  000516B4  00001A26
[ 120]:   logo_img.png  000530DA  00002CC2
[ 121]:     topo3d.ini  00055D9C  000001B9
 MAGELLA
4E00

Code: [Select]
Bytes from 5 5F50 : 5 5F61

48 0D 0A 0D   0A 00 20 4D   41 47 45 4C   4C 41 4E 00   11 B9
The end of the last file above is 55D9C + 1B9 = 55F55, which is the position of the next item. This is an odd number, so add 1 to get 55F56.
The byte at that position should be the first character of the 'MagicWord' "Magellan", but it is character $20 (a space). Then comes the 'MagicWord', then a null, then the checksum.
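
A minimal sketch of the offset arithmetic above, assuming (as the dump suggests) that an entry ends at offset + length and that the next item starts on an even position. NextEvenPosition is a hypothetical helper name, not something from the example project:

```pascal
program NextEvenOffset;
{$mode objfpc}{$H+}
uses
  SysUtils;

{ Returns the first position after an entry, rounded up to an even offset.
  The rounding rule is an assumption taken from the post above. }
function NextEvenPosition(Offset, Len: Int64): Int64;
begin
  Result := Offset + Len;
  if Odd(Result) then
    Inc(Result);
end;

begin
  { topo3d.ini from the listing: offset $55D9C, length $1B9 }
  WriteLn(IntToHex(Int64($55D9C) + $1B9, 5));              // 55F55
  WriteLn(IntToHex(NextEvenPosition($55D9C, $1B9), 5));    // 55F56
end.
```

In the dump the byte at 55F56 is $20 and the magic word only starts at 55F57, so this rule alone does not locate it for this file.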

I had seen things suggesting that the BodyEnd varies a bit and may even be replaced by another version, but the most important thing is that it excludes the last two checksum bytes, which are then calculated and added.

PS Sorry for the dumb questions but that helps me confirm my assumptions. The  Body[ i ] was very interesting and I assume that the convention is to move towards Null terminated strings and assume that that is what is in these records.

engkin

  • Hero Member
  • *****
  • Posts: 3112
« Reply #16 on: February 17, 2015, 06:10:47 pm »
I'm sorry, I should have paid more attention to the results! Now I see the problem. A simple solution would be to look for the magic word "MAGELLAN" directly after the end of the last entry, trying each of the next 3 byte positions. To do so I elected to store the magic word in a constant:
Code: [Select]
  const
    MAGELLAN:array[0..7] of char='MAGELLAN';
then to use a variable of type QWord (8 bytes) that overlays the same address as the previous constant:
Code: [Select]
  var
    qwMAGELLAN:QWord absolute MAGELLAN;
this way I can simply read a QWord value from the stream and compare it with the magic word:
Code: [Select]
      qw := Stream.ReadQWord;
      if qw=qwMAGELLAN then
if the comparison succeeds then we have found the magic word. I put this change in ReadBodyEnd:
Code: [Select]
  procedure ReadBodyEnd;
  const
    MAGELLAN:array[0..7] of char='MAGELLAN';
  var
    qwMAGELLAN:QWord absolute MAGELLAN;
    qw:QWord;
    i: integer;
    EndPos: int64;
  begin
    //Find the magic word
    EndPos := Stream.Position - (IMIArchive.TOC.TOCEntries[IMIArchive.TOC.NumOfFiles1-1].Length and 1);
    for i := 0 to 2 do
    begin
      Stream.Position:=EndPos+i;
      qw := Stream.ReadQWord;
      if qw=qwMAGELLAN {$4E414C4C4547414D} then
      begin
        Stream.Position := Stream.Position - 8;
        break;
      end;
    end;

    Stream.Read(IMIArchive.BodyEnd.MAGELLAN, 8);
    if ((Stream.Position-StartPos) and 1) = 1 then
      Stream.Read(IMIArchive.BodyEnd.OptionalB, 1);
    Stream.Read(IMIArchive.BodyEnd.Checksum1, 2);
  end;
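
A variation on the search above, offered only as a sketch: it compares the 8 bytes directly, checks that all 8 were actually read, and reports failure when the magic word is not found in the three positions tried. FindMagic and DemoTailOffset are hypothetical names; the test bytes are the tail of the dump quoted earlier, starting at $55F55.

```pascal
program FindMagicWord;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

const
  MAGIC: array[0..7] of AnsiChar = 'MAGELLAN';
  { Tail bytes from the dump: pad byte, space, magic word, null, checksum }
  Tail: array[0..12] of Byte =
    ($00, $20, $4D, $41, $47, $45, $4C, $4C, $41, $4E, $00, $11, $B9);

{ Tries StartPos, StartPos+1 and StartPos+2. On success the stream is left at
  the first byte of the magic word and that position is returned, else -1. }
function FindMagic(Stream: TStream; StartPos: Int64): Int64;
var
  Buf: array[0..7] of AnsiChar;
  i: Integer;
begin
  Result := -1;
  for i := 0 to 2 do
  begin
    Stream.Position := StartPos + i;
    if (Stream.Read(Buf, SizeOf(Buf)) = SizeOf(Buf)) and
       CompareMem(@Buf, @MAGIC, SizeOf(MAGIC)) then
    begin
      Result := StartPos + i;
      Stream.Position := Result;
      Exit;
    end;
  end;
end;

{ Runs the search over the sample tail held in a memory stream. }
function DemoTailOffset: Int64;
var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    ms.WriteBuffer(Tail, SizeOf(Tail));
    Result := FindMagic(ms, 0);
  finally
    ms.Free;
  end;
end;

begin
  WriteLn(DemoTailOffset);  // 2
end.
```

A return value of -1 gives the caller a chance to stop instead of reading a BodyEnd from the wrong place.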

flywire

« Reply #17 on: February 19, 2015, 10:51:41 am »
Perfect.  :D

Code: [Select]
Usinge TMemoryStream:
1 Files, Checksum: 3411
[   0]:       test.txt  00000040  0000000B
MAGELLAN
B2B

...

[ 121]:     topo3d.ini  00055D9C  000001B9
MAGELLAN
11B9


The three-digit checksum from $0B, $2B had me confused until I realised that the leading '0' is not displayed.
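
For anyone hitting the same thing, a small sketch of how the digit count passed to IntToHex changes the display. How the example project formats the value is not shown in the thread, so the combined word $0B2B and the digit counts here are assumptions for illustration only:

```pascal
program HexWidth;
{$mode objfpc}{$H+}
uses
  SysUtils;

var
  Checksum: Word;
begin
  { Bytes $0B and $2B combined into one word (assumed for illustration) }
  Checksum := $0B2B;
  WriteLn(IntToHex(Checksum, 1));  // B2B  : minimum of 1 digit, no padding
  WriteLn(IntToHex(Checksum, 4));  // 0B2B : padded to 4 digits
end.
```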

flywire

« Reply #18 on: March 05, 2015, 01:06:44 pm »
Based on the file specification:
Code: [Select]
type
  TTOCEnd = packed record
    Checksum1: Byte;
    Checksum2: Byte;
    MAGELLAN: array[0..8-1] of char;
    Zeros22: array[0..22-1] of byte;
  end;
  PTOCEnd = ^TTOCEnd;

  TTOCEntry = packed record
    Name: array[0..8-1] of char;
    NullB: Byte;
    Extension: array[0..3-1] of char;
    NullDW: DWord;
    Offset: DWord;
    Length: DWord;
  end;
  PTOCEntry = ^TTOCEntry;

  TTOC = packed record
    NumOfFiles1: DWord;
    NumOfFiles2: DWord;
    TOCEntries: array of TTOCEntry;
    TOCEnd: TTOCEnd;
  end;
  PTOC = ^TTOC;

  TBodyEnd = packed record
    MAGELLAN: array[0..8-1] of char;
    OptionalB: byte;
    Checksum1: byte;
    Checksum2: byte;
  end;

  TIMIArchive = packed record
    TOC: TTOC;
    Body: array of string;
    BodyEnd: TBodyEnd;
  end;
  PIMIArchive = ^TIMIArchive;

What are the thoughts about how to set this complex record type up as a variant record with a word as the variant? I will rewrite the procedure as TFileStream because I need to process some potentially very large files very efficiently two bytes at a time from a specified start and end position in the file.

The other problem I run into is the file is stored in little endian so the bytes get reversed when I read the file as a word (but hey the checksum will be reversed when it is written too so maybe it doesn't matter).

Code: [Select]
procedure CheckSumCalc(var CsumValue: word; CsumData: array of word; errorflag: integer);
var j: LongInt;
begin
  CsumValue := 0;
  for j := 0 to High(CsumData) do
    CsumValue := CsumValue xor CsumData[j];
end;

PS Yes, I'm stumped at the Binary File Handling in the wiki.
« Last Edit: March 05, 2015, 01:10:47 pm by flywire »

engkin

« Reply #19 on: March 05, 2015, 04:49:01 pm »
What are the thoughts about how to set this complex record type up as a variant record with a word as the variant? I will rewrite the procedure as TFileStream because I need to process some potentially very large files very efficiently two bytes at a time from a specified start and end position in the file.

What we have here is just a draft, so yes, it does need some changes, for instance this:
Code: [Select]
    Checksum1: Byte;
    Checksum2: Byte;
should become:
Code: [Select]
    Checksum: Word;

And for speed and for the way the strings are evolving in the coming compiler version this:
Code: [Select]
    Body: array of string;
should be dropped. Originally it was meant as an example and to make testing easy. Only the needed file(s) should be read from the archive and disposed of when done.

TFileStream is TStream, you can use it now without changes, and you can also use other streams like TMemoryStream. Having a TOC and a stream should be enough to enable you to get any data you need quickly. Let me repeat again, Body: array of string; should not be used! With archives of Gigabytes, reading the whole body is not going to be efficient.

The other problem I run into is the file is stored in little endian so the bytes get reversed when I read the file as a word (but hey the checksum will be reversed when it is written too so maybe it doesn't matter).

Code: [Select]
procedure CheckSumCalc(var CsumValue: word; CsumData: array of word; errorflag: integer);
var j: LongInt;
begin
  CsumValue := 0;
  for j := 0 to High(CsumData) do
    CsumValue := CsumValue xor CsumData[j];
end;

PS Yes, I'm stumped at the Binary File Handling in the wiki.

As you guessed correctly, it does not matter as long as reading and writing the archive use the same endianness. The code we have now is just for LE CPUs; a problem could arise if you write an archive on a Power PC (Motorola CPU - BE) and read it on a PC (Intel CPU - LE) or vice versa. The specification made it clear that:
Quote
All files within the archive, as well as the imi-file are LITTLE_ENDIAN encoded.
So BE CPUs would need extra coding to change endianness. I assume you are using an Intel [compatible] CPU.
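
One way to sidestep the question, sketched here under the assumption that the checksum is a plain XOR over 16-bit little-endian words: assemble each word from its two bytes, so the result is the same on LE and BE CPUs. XorChecksumLE is a hypothetical name, not the thread's CheckSumCalc.

```pascal
program LittleEndianWords;
{$mode objfpc}{$H+}
uses
  SysUtils;

{ XOR of 16-bit words stored little-endian in a byte buffer.
  Each word is built as low byte + high byte shl 8, independent of the CPU.
  A trailing odd byte is ignored. }
function XorChecksumLE(const Data: array of Byte): Word;
var
  i: Integer;
begin
  Result := 0;
  i := 0;
  while i + 1 <= High(Data) do
  begin
    Result := Result xor (Word(Data[i]) or (Word(Data[i + 1]) shl 8));
    Inc(i, 2);
  end;
end;

begin
  { Bytes 34 12 01 00 are the words $1234 and $0001 }
  WriteLn(IntToHex(XorChecksumLE([$34, $12, $01, $00]), 4));  // 1235
end.
```

This is slower than XOR-ing native words, so it is a portability sketch rather than a replacement for the faster routines in this thread. FPC's LEtoN and NtoLE functions cover the same conversion for values that have already been read as a word.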

flywire

« Reply #20 on: March 05, 2015, 08:23:08 pm »
Hmmm  ... the version in this post is not exactly what I am using but my changes would confuse this post. Of course I am using word rather than two bytes in the record structure.

The checksum function is very slow but I think it demonstrates the functionality for this post until I sort it out. I expect that TFileStream, memory stream, large buffers, address pointers and asm should make it fly!

Is the variant record a useful way to go with the checksum? The whole file needs to be read to calculate the checksum but only the complex record would be used when anything is written.


engkin

« Reply #21 on: March 05, 2015, 09:12:17 pm »
Ah, I see! Keep in mind that reading the whole file is usually a slow process. Most likely that is where most of the time is lost. As for calculating the check sum, I see two issues:
1- Add const before CsumData: array of word in the procedure signature.
2- Order is not important when doing xor operations. Use the PtrUInt type instead of word to benefit from the full width of the CPU (32 or 64 bits), handling the last few left-over words separately. This way you do 2 or 4 word xor operations at once; at the end, fold the result back into a word.

What file sizes are you dealing with?

Edit:
Here is one way to calculate the checksum:
Code: [Select]
procedure CheckSumCalc(var CsumValue: word; const CsumData: array of word; const vSize: PtrInt; var errorflag: integer); overload;
var
  j, CPUTimes, Odd: PtrInt;
  cs: PtrUInt;
  p: PPtrUInt;
  pw: PWord;
begin
  { Start with zero checksums }
  CsumValue := 0;
  cs := 0;

  { Calculate loop length based on PtrUInt }
  CPUTimes := (vSize * 2) div SizeOf(PtrUInt);

  { Calculate left over loop length }
  Odd := vSize - (SizeOf(PtrUInt)*CPUTimes div 2);

  { Point at the beginning of data past left over }
  p := @CsumData[Odd];

  { Do main calculation }
  for j := 0 to CPUTimes - 1  do
    cs := cs xor p[j];

  { Do left over calculation }
  for j := 0 to Odd - 1  do
    CsumValue := CsumValue xor CsumData[j];

  { Merge calculations back to word }
  pw := @cs;
  for j := 0 to SizeOf(PtrUInt) div 2 - 1 do
    CsumValue := CsumValue xor pw[j];
end;

procedure CheckSumCalc(var CsumValue: word; const CsumData: array of word; var errorflag: integer); overload;
begin
  CheckSumCalc(CsumValue, CsumData, Length(CsumData), errorflag);
end;

procedure CheckSumCalc(var CsumValue: word; const vStream: TStream; var errorflag: integer); overload;
const
  BufSize = 1024*4;
var
  BufByte: array[0..BufSize-1] of byte;
  Buf: array[0..BufSize div 2 - 1] of word absolute BufByte;
  size: Int64;
  RemainingSize: Int64;
  cs: word;
begin
  { Start with zero checksums }
  cs := 0;
  CsumValue := 0;

  { Exclude 2 bytes - original checksum }
  RemainingSize := vStream.Size - 2;

  { Do calculations in chunks }
  while RemainingSize>0 do
  begin
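    { Read a chunk }
    if RemainingSize>Length(BufByte) then
      size := vStream.Read(Buf[0], Length(BufByte))
    else
      size := vStream.Read(Buf[0], RemainingSize);

    { Stop if the stream ended early; otherwise this loop would never finish }
    if size <= 0 then
      Break;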

    { Do the real checksum calculation for this chunk }
    CheckSumCalc(cs, Buf[0], Size div 2, errorflag);

    { Merge the result with previous ones }
    CsumValue := CsumValue xor cs;
    Dec(RemainingSize, size);
  end;
end;

And to use it:
Code: [Select]
    m_stream.Position:=0;
    CheckSumCalc(chksum, m_stream, errorflg);
    WriteLn('chksum: ',IntToHex(chksum, 4));

or

Code: [Select]
    f_stream := TFileStream.Create('C:\Path\SomeBigFile.imi', fmOpenRead);
    f_stream.Position:=0;
    CheckSumCalc(chksum, f_stream, errorflg);
    WriteLn('checksum: ',IntToHex(chksum, 4));
    f_stream.Free;
« Last Edit: March 06, 2015, 12:30:18 am by engkin »

flywire

« Reply #22 on: March 07, 2015, 07:50:59 pm »
Engkin, I hope this is fairly advanced stuff for the beginners section. You have certainly got me scratching my head a bit over the last month given that this was my first program for decades. The concepts are fine but I am not completely competent with all of the code and I am still using other exercises and writeln various bits to build my competency. I haven't even figured out the IDE debugger yet.

File sizes - 1GB is a large map but there are normally up to 20 smaller ones down to Bermudas.imi at 344k, which is the smallest one that I could find, but who knows in the future. Speed is OK - testing a 1GB map your routine took 10 sec in Win7 on a core Duo with 2GB RAM. What I like about Lazarus is the prospect of running the application non-windows and on the actual WinCE device (ie very crammed environment so I don't assume that everything will fit in memory).

Since you have stuck with me can you explain how the overloading is working? As I see it there are two procedures that can be run and I suspect that I am using the inefficient one.

I still need to get on top of working with part of a stream. For example, I need to calculate two checksums, 'the first checksum in the TOC is build over the first NumberOfFiles * 24 + 8 bytes of data. The second checksum is build over the complete file starting at byte 0'. I was working on copying the required part of the fstream to an array and passing that to the procedure but I understand that this is very inefficient. I can see that stream.Position sets the start and I will modify procedure to set the end since it is not always eof-2 bytes. It seems that I need to move the stream position around a bit. Can you confirm that I need separate streams to process the file and calculate checksums? I suppose that this is where pointers are efficient.

Similar for files. I need to be able to extract a subfile to disk file, edit a subfile and replace a subfile. I understand that streams will do all of this but I haven't been able to find good examples yet. eg The wiki binary file handling tutorial explains start from new or append (and I can confirm that the ReadBinaryDataInMemoryForAppend with a different stream position works fine as overwrite). I need replace, perhaps zero length for one part (ie insert or delete) and I think it would be well placed in the wiki and would be happy to put it there when I understand it. The File Copy in the wiki would be the perfect place to demonstrate larger files. I assume that if I set the stream position to part way through and write to it then I have to rewrite the rest of the stream to make that change or is there an easy way to achieve this with large files?

Another issue is not knowing how much I can fit into RAM (especially considering devices) so picking buffer sizes to allowing the processing in stages (ie large file). How do I design for that?
« Last Edit: March 08, 2015, 06:02:28 am by flywire »

engkin

« Reply #23 on: March 08, 2015, 05:27:45 am »
The concepts are fine but I am not completely competent with all of the code and I am still using other exercises and writeln various bits to build my competency.

Feel free to ask questions.

(ie very crammed environment so I don't assume that everything will fit in memory).

There is little or no benefit in having all of it in memory. The sample checksum calculation in the previous post reads the file in 4 KB chunks.

can you explain how the overloading is working? As I see it there are two procedures that can be run and I suspect that I am using the inefficient one.

Both are equal. One uses a stream while the other uses an array. Based on your post I made the array version, although a stream version seemed more convenient. If you have the data in a file then use the stream version.

I still need to get on top of working with part of a stream. For example, I need to calculate two checksums, 'the first checksum in the TOC is build over the first NumberOfFiles * 24 + 8 bytes of data. The second checksum is build over the complete file starting at byte 0'.

If you change the procedure that's using a stream to have an additional parameter "vSize" with a default value of -1:
Code: [Select]
procedure CheckSumCalc(var CsumValue: word; const vStream: TStream; var errorflag: integer; vSize: int64 = -1); overload;
and in its body instead of:
Code: [Select]
    { Exclude 2 bytes - original checksum }
    RemainingSize := vStream.Size - 2;

use the new parameter:
Code: [Select]
  if vSize>0 then
    RemainingSize := vSize
  else
    { Exclude 2 bytes - original checksum }
    RemainingSize := vStream.Size - 2;
that should allow you to use part of a stream.

I was working on copying the required part of the fstream to an array and passing that to the procedure but I understand that this is very inefficient.

For a small amount, which is the case with TOC, it should not be a problem.

Similar for files. I need to be able to extract a subfile to disk file,

This involves two file streams. One for the imi file (IMIStream), and the other for the subfile (SubStream) and assuming the index of the file you want to extract in the TOCEntries is vIndex:
Code: [Select]
  { Move the stream position to the beginning of the file you want to extract }
  IMIStream.Position := TOC.TOCEntries[vIndex].Offset;

  { Copy it to the sub stream (FileStream) }
  SubStream.CopyFrom(IMIStream, TOC.TOCEntries[vIndex].Length);
of course the constructors of IMIStream and SubStream would need file names and proper file modes. Don't forget to free the streams when done.
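
A fuller sketch of the same idea with the constructors, file modes and cleanup written out. ExtractRange, ReadWholeFile, DemoExtract and the file names are hypothetical, and the "archive" is just a throwaway stand-in file so the example runs on its own. One detail worth knowing: TStream.CopyFrom with a count of 0 copies the whole source stream, and the listing earlier in the thread contains zero-length entries, so the sketch skips the copy for those.

```pascal
program ExtractSubfile;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

{ Copies Len bytes starting at Offset from the archive into a new file. }
procedure ExtractRange(const ArchiveName, OutName: string; Offset, Len: Int64);
var
  IMIStream, SubStream: TFileStream;
begin
  IMIStream := TFileStream.Create(ArchiveName, fmOpenRead or fmShareDenyWrite);
  try
    SubStream := TFileStream.Create(OutName, fmCreate);
    try
      IMIStream.Position := Offset;
      { CopyFrom(Source, 0) would copy the entire source, so skip empty entries }
      if Len > 0 then
        SubStream.CopyFrom(IMIStream, Len);
    finally
      SubStream.Free;
    end;
  finally
    IMIStream.Free;
  end;
end;

{ Reads a small file completely into a string (for the demo only). }
function ReadWholeFile(const FileName: string): AnsiString;
var
  fs: TFileStream;
begin
  Result := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, fs.Size);
    if fs.Size > 0 then
      fs.ReadBuffer(Result[1], fs.Size);
  finally
    fs.Free;
  end;
end;

{ Builds a stand-in file (4 header bytes + 11 data bytes), extracts the data
  part, returns it and removes both temporary files. }
function DemoExtract: AnsiString;
var
  fs: TFileStream;
  Fake: AnsiString;
begin
  Fake := 'HEADhello world';
  fs := TFileStream.Create('demo_archive.bin', fmCreate);
  try
    fs.WriteBuffer(Fake[1], Length(Fake));
  finally
    fs.Free;
  end;
  ExtractRange('demo_archive.bin', 'demo_sub.txt', 4, 11);
  Result := ReadWholeFile('demo_sub.txt');
  DeleteFile('demo_archive.bin');
  DeleteFile('demo_sub.txt');
end;

begin
  WriteLn(DemoExtract);  // hello world
end.
```

With a real archive, Offset and Len would come from TOC.TOCEntries[vIndex] as in the snippet above.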

edit a subfile and replace a subfile. I understand that streams will do all of this but I haven't been able to find good examples yet. eg The wiki binary file handling tutorial explains start from new or append (and I can confirm that the ReadBinaryDataInMemoryForAppend with a different stream position works fine as overwrite). I need replace, perhaps zero length for one part (ie insert or delete) and I think it would be well placed in the wiki and would be happy to put it there when I understand it. The File Copy in the wiki would be the perfect place to demonstrate larger files. I assume that if I set the stream position to part way through and write to it then I have to rewrite the rest of the stream to make that change or is there an easy way to achieve this with large files?

To edit, replace, or delete one or more files in an imi archive, generate another imi archive based on the contents of the original one. That means if you delete one byte from the first file in a 1GB archive, you'll have to write another 1GB archive. IMHO, this format is meant mainly for reading.
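
A rough sketch of the copying part of such a rebuild, using string streams so it runs on its own. CopyWithReplacement and DemoReplace are hypothetical names. The sketch only splices bytes: in a real archive the TOC offsets and lengths of the affected entries, any padding, and both checksums would still have to be rewritten, which is not shown here.

```pascal
program ReplaceRangeDemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

{ Writes to Dst a copy of Src in which the bytes [Offset, Offset+OldLen)
  are replaced by the full contents of NewData. }
procedure CopyWithReplacement(Src, Dst, NewData: TStream; Offset, OldLen: Int64);
var
  TailLen: Int64;
begin
  Src.Position := 0;
  if Offset > 0 then
    Dst.CopyFrom(Src, Offset);         // part before the subfile
  if NewData.Size > 0 then
    Dst.CopyFrom(NewData, 0);          // count 0 copies all of NewData
  Src.Position := Offset + OldLen;     // skip the old subfile
  TailLen := Src.Size - Src.Position;
  if TailLen > 0 then
    Dst.CopyFrom(Src, TailLen);        // part after the subfile
end;

{ Replaces the 3 bytes 'old' at offset 4 with 'fresh'. }
function DemoReplace: string;
var
  Src, Dst, NewData: TStringStream;
begin
  Src := nil;
  Dst := nil;
  NewData := nil;
  try
    Src := TStringStream.Create('AAAAoldBBBB');
    NewData := TStringStream.Create('fresh');
    Dst := TStringStream.Create('');
    CopyWithReplacement(Src, Dst, NewData, 4, 3);
    Result := Dst.DataString;
  finally
    Src.Free;
    Dst.Free;
    NewData.Free;
  end;
end;

begin
  WriteLn(DemoReplace);  // AAAAfreshBBBB
end.
```

Passing an empty NewData deletes the range, and OldLen = 0 inserts without removing anything. With file streams in place of the string streams, the copy runs in chunks and never needs the whole archive in memory.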

Another issue is not knowing how much I can fit into RAM (especially considering devices) so picking buffer sizes to allowing the processing in stages (ie large file). How do I design for that?

These days 4KB should be OK on most (all?) devices. You can make the buffer size a setting in your program and use a dynamic array for the buffer, but I don't think you need to do that.
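
If the buffer size does become a setting, a sketch along these lines might do. XorWords and DemoXor are hypothetical names, and the routine is a simplified stand-in for the CheckSumCalc overloads earlier in the thread (plain 16-bit XOR, words assembled from bytes), not a replacement for them:

```pascal
program BufferedXor;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

{ XORs Count bytes from the current stream position as little-endian 16-bit
  words, reading through a dynamic array of BufSize bytes. A trailing odd
  byte is ignored. }
function XorWords(Stream: TStream; Count: Int64; BufSize: Integer): Word;
var
  Buf: array of Byte;
  Want, i: Integer;
begin
  Result := 0;
  BufSize := BufSize and not 1;        // keep the buffer a whole number of words
  if BufSize < 2 then
    BufSize := 2;
  SetLength(Buf, BufSize);
  while Count >= 2 do
  begin
    Want := BufSize;
    if Count < Want then
      Want := Integer(Count) and not 1;
    Stream.ReadBuffer(Buf[0], Want);   // raises EReadError if the stream is too short
    i := 0;
    while i < Want do
    begin
      Result := Result xor (Word(Buf[i]) or (Word(Buf[i + 1]) shl 8));
      Inc(i, 2);
    end;
    Dec(Count, Want);
  end;
end;

{ Checksums the bytes 1..10 held in a memory stream with the given buffer size. }
function DemoXor(BufSize: Integer): Word;
var
  ms: TMemoryStream;
  i: Integer;
begin
  ms := TMemoryStream.Create;
  try
    for i := 1 to 10 do
      ms.WriteByte(Byte(i));
    ms.Position := 0;
    Result := XorWords(ms, ms.Size, BufSize);
  finally
    ms.Free;
  end;
end;

begin
  WriteLn(IntToHex(DemoXor(4), 4));     // 0209
  WriteLn(IntToHex(DemoXor(4096), 4));  // 0209
end.
```

The result does not depend on the buffer size, so the setting only trades memory against the number of read calls.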

flywire

« Reply #24 on: March 08, 2015, 10:30:38 am »
can you explain how the overloading is working? As I see it there are two procedures that can be run and I suspect that I am using the inefficient one.

Both are equal. ...

I see that the TStream version calls the array version near the end of the TStream procedure.


Finally a one line answer. The file checksum can be calculated more efficiently by starting after the TOC checksum. Why won't the seek line reset the stream to the required start point?

Code: [Select]
var
  errorflg: integer;
  size: Int64;
  chksum: word;
  f_stream: TFileStream;

begin
  f_stream := TFileStream.Create('TestArchive.imi', fmOpenRead); // TestArchive.imi 32 Bermudas.imi 152
  // Calculate TOC checksum
  f_stream.Position:=0;
  size:=32;
  CheckSumCalc(chksum, f_stream, errorflg, size);
  WriteLn('checksum: ',IntToHex(chksum, 4));

  // Calculate file checksum
writeln(f_stream.Position);                                    //
f_stream.Position:=0;                                          //
//f_stream.seek(2, soFromCurrent);                               //
writeln(f_stream.Position);                                    //
  CheckSumCalc(chksum, f_stream, errorflg);
  WriteLn('checksum: ',IntToHex(chksum, 4));
  ReadLn();
  f_stream.Free;
end.

Full code attached.

engkin

« Reply #25 on: March 08, 2015, 04:36:49 pm »
The file checksum can be calculated more efficiently by starting after the TOC checksum.

That's true. It did not occur to me.  ;D

Why won't the seek line reset the stream to the required start point?

It does. But CheckSumCalc then needs the correct number of bytes to calculate the checksum over. That size is the number of bytes after the 1st checksum and before the 2nd checksum:

Code: [Select]
var
  errorflg: integer;
  size: Int64;
  TocChecksum, FileChecksum: word;
  f_stream: TFileStream;

begin
  f_stream := TFileStream.Create('TestArchive.imi', fmOpenRead); // TestArchive.imi 32 Bermudas.imi 152

  { Calculate TOC checksum }
  f_stream.Position:=0;

  size:=32; { should be NumberOfFiles * 24 + 8 }
  CheckSumCalc(TocChecksum, f_stream, errorflg, size);
  WriteLn('TOC Checksum: ',IntToHex(TocChecksum, 4));

  { Calculate file checksum starting after the TOC checksum }
  writeln('Position before seek: ', f_stream.Position);
  f_stream.seek(2, soFromCurrent);

  { Calculate the number of bytes left }
  size := f_stream.Size - f_stream.Position;
  size := size - 2; { Exclude 2 bytes for the file checksum itself }

  writeln('Position after seek: ', f_stream.Position, ', Size: ', size);

  CheckSumCalc(FileChecksum, f_stream, errorflg, size);
  WriteLn('IMI Checksum starting after TOC checksum: ',IntToHex(FileChecksum, 4));

  { Calculate file checksum using the whole file }
  f_stream.Position:=0;
  CheckSumCalc(FileChecksum, f_stream, errorflg);
  WriteLn('IMI Checksum using the whole file: ',IntToHex(FileChecksum, 4));

  ReadLn();
  f_stream.Free;
end.
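
Why the two file checksums above should agree, sketched with a tiny word array: if the stored TOC checksum really is the XOR of the TOC words, then the TOC words and that checksum XOR to zero, so they contribute nothing to the whole-file result. This assumes the stored TOC checksum is correct and that the TOC size is a whole number of words (NumberOfFiles * 24 + 8 is always even). XorOf and the sample values are made up for illustration.

```pascal
program XorSplit;
{$mode objfpc}{$H+}
uses
  SysUtils;

const
  { Two "TOC" words, their XOR as the stored TOC checksum, then two "body" words }
  Data: array[0..4] of Word = ($1111, $2222, $3333, $4444, $5555);

{ XOR of the words W[First..Last]. }
function XorOf(const W: array of Word; First, Last: Integer): Word;
var
  i: Integer;
begin
  Result := 0;
  for i := First to Last do
    Result := Result xor W[i];
end;

begin
  WriteLn(IntToHex(XorOf(Data, 0, 1), 4));  // 3333 : matches the stored TOC checksum
  WriteLn(IntToHex(XorOf(Data, 3, 4), 4));  // 1111 : only the words after the TOC checksum
  WriteLn(IntToHex(XorOf(Data, 0, 4), 4));  // 1111 : the whole array gives the same value
end.
```

If the two values differ on a real file, the stored TOC checksum does not match the TOC data.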

 
