
Topic: TMemoryStream file format (Read 2671 times)

jcmontherock

  • Full Member
  • ***
  • Posts: 234
TMemoryStream file format
« on: June 08, 2020, 05:18:02 pm »
Hello, I have the following code:
Code: Pascal
  type
    TParmData = record
      FormRect: TRect;
      sName :   AnsiString;
      age  :    Integer;
      male :    Boolean;
    end;
  // ...
  var
    ClearStream:  TMemoryStream;
    ParmData:     TParmData;
    slServers:    TStringList;
  // ...
    ParmData.FormRect := Bounds(Form1.Left, Form1.Top, Form1.Width, Form1.Height);
    ParmData.sName := 'ééààèèüüüüü abcdefghijklmnopqrstuvwxy 40........50' +
                      'Turner ==^<=abcdefghijklmnopqrstuvwxy 40........50' +
                      'Turner ==^<=abcdefghijklmnopqrstuvwxy 40........50';
    ParmData.age  := 45;
    ParmData.male := False;

    // Writing the file:
    ClearStream.Write(ParmData, SizeOf(ParmData));
    ClearStream.WriteAnsiString(slServers.DelimitedText);
    ClearStream.Position := 0;
    ClearStream.SaveToFile(sFilename);

    // Reading the file:
    if FileExists(sFilename) then begin
      ClearStream.Clear;
      ClearStream.LoadFromFile(sFilename);
      ClearStream.Position := 0;
      ClearStream.Read(ParmData, SizeOf(ParmData));
      slServers.DelimitedText := ClearStream.ReadAnsiString;
    // ...
  After reading the disk file, all data are OK, and I get:
  ------------------------------------------------------
  sName length = 161                      OK. The size increase comes from UTF-8 encoding.
  SizeOf(ParmData) = 32                   ???????
  Length(slServers.DelimitedText) = 147   OK. Same remark.
  Disk record = 179 (0xB3) bytes, without BOM    ???????
  Encoding: UTF-8, Windows 10.

The disk record should contain the name (150 characters or 161 bytes), a text from the TStrings (143 characters or 147 bytes), a TRect (16), an Integer (4) and a Boolean (1).
Total = 161 + 147 + 16 + 4 + 1 = 329 bytes ??????
Record size:                     179 bytes
I don't understand these numbers. I tried to find out in which format the disk record was written, but I didn't find it. Does anybody have an idea?
Windows 11 UTF8-64 - Lazarus 3.2-64 - FPC 3.2.2

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: TMemoryStream file format
« Reply #1 on: June 08, 2020, 05:26:06 pm »
AnsiString is a reference type. IOW, the record only stores a pointer.
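This can be checked with a small sketch, assuming Free Pascal in {$mode objfpc}{$H+} with TRect taken from the Types unit: it declares the record from the first post, stores a 1-character name in one variable and a 10000-character name in another, and prints the sizes. SizeOf stays the same because only the reference is part of the record.

```pascal
program RecordSizeDemo;
{$mode objfpc}{$H+}

uses
  Types;

type
  // Same layout as in the first post: sName is only a reference (pointer)
  // to characters that live on the heap, outside the record.
  TParmData = record
    FormRect: TRect;
    sName:    AnsiString;
    age:      Integer;
    male:     Boolean;
  end;

var
  A, B: TParmData;
begin
  A.sName := 'x';
  B.sName := StringOfChar('y', 10000);
  WriteLn('SizeOf(TParmData)     = ', SizeOf(TParmData));
  WriteLn('SizeOf(A.sName)       = ', SizeOf(A.sName), ' (the size of a pointer)');
  WriteLn('Length(A.sName)       = ', Length(A.sName));
  WriteLn('Length(B.sName)       = ', Length(B.sName));
  // The record size does not depend on how long the string is.
  WriteLn('SizeOf(A) = SizeOf(B) : ', SizeOf(A) = SizeOf(B));
end.
```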

jcmontherock

  • Full Member
  • ***
  • Posts: 234
Re: TMemoryStream file format
« Reply #2 on: June 08, 2020, 06:14:33 pm »
Ok, I read the file in a second run... It's not a pointer in the file.

wp

  • Hero Member
  • *****
  • Posts: 11857
Re: TMemoryStream file format
« Reply #3 on: June 08, 2020, 06:49:43 pm »
Quote
Ok, I read the file in a second run... It's not a pointer in the file.
No, it is. A string IS a pointer. Your record contains only the pointer to the string characters - the character array is stored somewhere else. Either you use string[255] (or similar) which holds the characters within the record or you read/write the string characters separately from the record:

Not tested:
Code: Pascal
  type
    TParmData = record
      FormRect: TRect;
      age  :    Integer;
      male :    Boolean;
      sName :   AnsiString;      // <---- Move the string to the end of the record
    end;
  var
    ParmData: TParmData;
    sNameLen: Integer;
  ...
    // Writing
    ClearStream.Write(ParmData, SizeOf(TParmData) - SizeOf(AnsiString));  // Write the record, but without the string (which is a pointer!), ...
    sNameLen := Length(ParmData.sName);
    ClearStream.Write(sNameLen, SizeOf(sNameLen));                        // ... write the string length ...
    if sNameLen > 0 then
      ClearStream.Write(ParmData.sName[1], sNameLen);                     // ... and the string data separately
  ...
    // Reading
    ClearStream.Read(ParmData, SizeOf(TParmData) - SizeOf(AnsiString));   // do not read the string here, but...
    ClearStream.Read(sNameLen, SizeOf(sNameLen));                         // ... read the string length
    SetLength(ParmData.sName, sNameLen);                                  // ... allocate memory for the string
    if sNameLen > 0 then
      ClearStream.Read(ParmData.sName[1], sNameLen);                      // ... and read the string characters separately

As an alternative to explicitly writing/reading the string length and string characters, you could also use the WriteAnsiString/ReadAnsiString methods of the stream, as you already do with slServers.
« Last Edit: June 08, 2020, 07:05:31 pm by wp »
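The WriteAnsiString/ReadAnsiString variant could look roughly like the sketch below. It assumes Free Pascal in {$mode objfpc}{$H+}, TRect from Types and TMemoryStream/Bounds from Classes; WriteParm and ReadParm are hypothetical helper names, and the field values are made up. Each fixed-size field goes through WriteBuffer, the string through WriteAnsiString, so no pointer and no padding bytes end up in the stream.

```pascal
program ParmStreamDemo;
{$mode objfpc}{$H+}

uses
  Types, Classes;

type
  TParmData = record
    FormRect: TRect;
    sName:    AnsiString;
    age:      Integer;
    male:     Boolean;
  end;

// Hypothetical helper: fixed-size fields via WriteBuffer, the string via
// WriteAnsiString (a Longint length followed by the characters).
procedure WriteParm(S: TStream; const P: TParmData);
begin
  S.WriteBuffer(P.FormRect, SizeOf(P.FormRect));
  S.WriteAnsiString(P.sName);
  S.WriteBuffer(P.age, SizeOf(P.age));
  S.WriteBuffer(P.male, SizeOf(P.male));
end;

// Hypothetical helper: reads the fields back in exactly the same order.
procedure ReadParm(S: TStream; var P: TParmData);
begin
  S.ReadBuffer(P.FormRect, SizeOf(P.FormRect));
  P.sName := S.ReadAnsiString;
  S.ReadBuffer(P.age, SizeOf(P.age));
  S.ReadBuffer(P.male, SizeOf(P.male));
end;

var
  Stream: TMemoryStream;
  Src, Dst: TParmData;
  WrittenSize: Int64;
begin
  Src.FormRect := Bounds(10, 20, 300, 200);
  Src.sName    := 'Turner abcdefghijklmnopqrstuvwxy';
  Src.age      := 45;
  Src.male     := False;

  Stream := TMemoryStream.Create;
  try
    WriteParm(Stream, Src);
    // 16 (TRect) + 4 (length prefix) + Length(sName) + 4 (Integer) + 1 (Boolean)
    WrittenSize := Stream.Size;
    WriteLn('Stream size: ', WrittenSize);
    Stream.Position := 0;   // rewind before reading
    ReadParm(Stream, Dst);
    WriteLn('Name: ', Dst.sName, ', age: ', Dst.age, ', male: ', Dst.male);
  finally
    Stream.Free;
  end;
end.
```

In the original code, WriteParm would take the place of ClearStream.Write(ParmData, SizeOf(ParmData)); the slServers line can stay as it is, as long as the reading side follows the same order.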

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: TMemoryStream file format
« Reply #4 on: June 09, 2020, 01:20:49 am »
Looking at the original code he posted, it does look like the poster is well aware of how the string pointer works.

In short, he is making a variable-length record, but this is going to be hard to index, of course.

He should decide on the maximum size the string will ever be and replace it with a ShortString of that maximum size:

sName: String[20];

That reserves room for 20 characters, with a length byte at the start.

With that, simply write the complete record out and don't worry about the string.

When reading the record, just read the complete record in.

The correction of putting the AnsiString at the end helps, but it really does not solve the issue at hand overall.
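A minimal sketch of the fixed-size approach, assuming Free Pascal in {$mode objfpc}{$H+} with TRect from Types and TMemoryStream/Bounds from Classes: the record holds a String[20] (1 length byte + 20 characters, 21 bytes in total) instead of an AnsiString, so it contains no pointers and can be written and read with a single call. The name and values are made up for the example.

```pascal
program ShortStringRecordDemo;
{$mode objfpc}{$H+}

uses
  Types, Classes;

type
  // Everything lives inside the record: String[20] is a ShortString
  // with a length byte followed by room for 20 characters.
  TParmData = record
    FormRect: TRect;
    sName:    String[20];
    age:      Integer;
    male:     Boolean;
  end;

var
  Stream: TMemoryStream;
  Src, Dst: TParmData;
begin
  Src.FormRect := Bounds(10, 20, 300, 200);
  Src.sName    := 'Turner';   // longer values would be truncated to 20 characters
  Src.age      := 45;
  Src.male     := False;

  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Src, SizeOf(Src));   // the whole record, characters included
    Stream.Position := 0;
    Stream.ReadBuffer(Dst, SizeOf(Dst));
    WriteLn('Record size: ', SizeOf(TParmData), ', stream size: ', Stream.Size);
    WriteLn('Name: ', Dst.sName, ', age: ', Dst.age);
  finally
    Stream.Free;
  end;
end.
```

Because every record has the same size, record number N in a file written this way starts at N * SizeOf(TParmData), which is what makes the layout easy to index. The trade-off is the hard length limit and, with UTF-8 text, a limit in bytes rather than in characters.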

The only true wisdom is knowing you know nothing

jcmontherock

  • Full Member
  • ***
  • Posts: 234
Re: TMemoryStream file format
« Reply #5 on: June 09, 2020, 10:25:06 am »
As I said, it works perfectly. It could be used for saving the parameters of an application.
My question was about the file format. It seems to be compressed, because the total length is less than the sum of the elements. Except for the AnsiString text and the BOM, all of the file record text is unreadable.

wp

  • Hero Member
  • *****
  • Posts: 11857
Re: TMemoryStream file format
« Reply #6 on: June 09, 2020, 10:40:06 am »
Quote
As I said, it works perfectly. It could be used for saving parms of an application.
My question was about the file format. It seems to be compressed because total length is less than the sum of each element. Except for the AnsiString text and the BOM, all the file record text is unreadable.
"It works perfectly" and "the file record text is unreadable" are contradicting statements. I do not understand what you are saying.

Code: Pascal
  type
    TParmData = record
      FormRect: TRect;
      sName :   AnsiString;
      age  :    Integer;
      male :    Boolean;
    end;

Calculation of the SizeOf(ParmData):
- FormRect TRect is 4 integers --> 16 bytes
- sName: is a pointer --> 4 bytes (or 8 if you have a 64-bit binary), no matter how long the string is, because the characters are not stored within the record!
- age: integer --> 4 bytes
- male: boolean: 1 byte. Since the record is not packed it will be expanded to 4 bytes
--> total: 16 + 4 + 4 + 4 = 28 for 32 bit, or 16 + 8 + 4 + 4 = 32 for 64 bit.

As already mentioned the record saved by your code does NOT contain the string characters but the value of the pointer to the characters and thus is useless.
« Last Edit: June 09, 2020, 10:42:08 am by wp »
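The calculation above can be checked by printing the field offsets. This is a sketch assuming Free Pascal in {$mode objfpc}{$H+} with TRect from Types; TPackedParmData is a hypothetical packed copy of the record added only for comparison. Under the default alignment, SizeOf(TParmData) should come out as 28 on a 32-bit target and 32 on a 64-bit target, while the packed variant has no padding (25 or 29 bytes).

```pascal
program RecordLayoutDemo;
{$mode objfpc}{$H+}

uses
  Types;

type
  TParmData = record
    FormRect: TRect;
    sName:    AnsiString;
    age:      Integer;
    male:     Boolean;
  end;

  // Hypothetical packed copy: same fields, no alignment padding.
  TPackedParmData = packed record
    FormRect: TRect;
    sName:    AnsiString;
    age:      Integer;
    male:     Boolean;
  end;

var
  P: TParmData;
begin
  WriteLn('SizeOf(Pointer)         = ', SizeOf(Pointer));
  // Offsets are computed as field address minus record address.
  WriteLn('offset of FormRect      = ', PtrUInt(@P.FormRect) - PtrUInt(@P));
  WriteLn('offset of sName         = ', PtrUInt(@P.sName) - PtrUInt(@P));
  WriteLn('offset of age           = ', PtrUInt(@P.age) - PtrUInt(@P));
  WriteLn('offset of male          = ', PtrUInt(@P.male) - PtrUInt(@P));
  WriteLn('SizeOf(TParmData)       = ', SizeOf(TParmData));
  WriteLn('SizeOf(TPackedParmData) = ', SizeOf(TPackedParmData));
end.
```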

rvk

  • Hero Member
  • *****
  • Posts: 6111
Re: TMemoryStream file format
« Reply #7 on: June 09, 2020, 10:54:40 am »
Quote
As I said, it works perfectly. It could be used for saving parms of an application.
It might work perfectly if you read the record back in the same session as you wrote the file. Because the string-pointer might still be valid. But if you end your program and read it back after restart of your program, you'll see that your string isn't there (what the others tried to explain).

jcmontherock

  • Full Member
  • ***
  • Posts: 234
Re: TMemoryStream file format
« Reply #8 on: June 09, 2020, 11:37:54 am »
It works. The file is written when the form closes and read in the form's show event (2 runs). You are right: I am surprised that it works fine.
« Last Edit: June 09, 2020, 11:40:19 am by jcmontherock »

rvk

  • Hero Member
  • *****
  • Posts: 6111
Re: TMemoryStream file format
« Reply #9 on: June 09, 2020, 11:42:01 am »
Quote
It works. file is written at form close and read at show form event.
Yes, in that case it might work.
But try to close your program, start it again and read the file. You'll see the string is garbage.

When you don't close your program, you also won't need to write the config to disk. It is still in memory. So why are you using a file???

If you need the config after you restart the program, you'll need to listen to what the others are telling you. You only save the pointer to the string. After a restart, that pointer isn't valid anymore and carries no information.
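One way to see what is actually saved is to write the record the way the first post does and then read back the bytes stored where sName sits. This sketch assumes Free Pascal in {$mode objfpc}{$H+} with TRect from Types and TMemoryStream from Classes; HexStr(Pointer) comes from the System unit. The stored value is simply the address of the characters in the running process, which means nothing to the next run of the program.

```pascal
program PointerInStreamDemo;
{$mode objfpc}{$H+}

uses
  Types, Classes;

type
  TParmData = record
    FormRect: TRect;
    sName:    AnsiString;
    age:      Integer;
    male:     Boolean;
  end;

var
  P: TParmData;
  Stream: TMemoryStream;
  Stored: Pointer;
  NameOffset: PtrUInt;
begin
  P.sName := 'Turner';
  P.age   := 45;
  NameOffset := PtrUInt(@P.sName) - PtrUInt(@P);

  Stream := TMemoryStream.Create;
  try
    // The "write the whole record" approach from the first post.
    Stream.WriteBuffer(P, SizeOf(P));
    // Read back only the bytes that were stored in place of sName.
    Stream.Position := NameOffset;
    Stream.ReadBuffer(Stored, SizeOf(Stored));
    WriteLn('Bytes written:         ', Stream.Size);
    WriteLn('Stored in the stream:  ', HexStr(Stored));
    WriteLn('Address of characters: ', HexStr(Pointer(P.sName)));
    WriteLn('Same value:            ', Stored = Pointer(P.sName));
  finally
    Stream.Free;
  end;
end.
```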

wp

  • Hero Member
  • *****
  • Posts: 11857
Re: TMemoryStream file format
« Reply #10 on: June 09, 2020, 11:44:37 am »
Quote
It works. file is written at form close and read at show form event.
Quote
Yes, in that case it might work.
But try to close your program, start it again and read the file. You'll see the string is garbage.
Maybe one step further: close the IDE and run the program directly from the OS.
« Last Edit: June 09, 2020, 01:02:57 pm by wp »

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: TMemoryStream file format
« Reply #11 on: June 09, 2020, 01:08:34 pm »
Guys, look closer. What he is doing is writing a variable-sized data field.

It starts with fixed known types and is trailed with a string that ends with a null char.

If he were to write a series of them to a file, you could still read them back by following the nulls, etc.

This has been done many times, although you really need to know your code to understand it. That is, the author should fully understand it.

rvk

  • Hero Member
  • *****
  • Posts: 6111
Re: TMemoryStream file format
« Reply #12 on: June 09, 2020, 01:21:29 pm »
Quote
It starts with fixed known types and is trailed with a string that ends with a null char.
I don't see a null character written anywhere in that code.
WriteAnsiString, used for the TStringList text, writes a length prefix. That way, multiple records could be written without any problem.

Anyway... I think you are missing the point about the record type which is saved.
It has an AnsiString (which is normally a pointer to a string).
You can't save the record like that because only the pointer gets saved. Not the actual characters.

Doing the calculations...
Quote
Disk record contains the name (150 char or 161 bytes), a text (143 char or 147 bytes) from TStrings, a TRect (16), an Integer (4) and a boolean (1).
Total = 161 + 147 + 16 + 4 + 1 = 329 bytes ??????
Record size:                     179 bytes
I don't understand these numbers. I search to find in which format, disk record was written.
Actually.
a TRect (16), a sName (pointer = 4), age (4) and male (4 because of non-packed) = 28, on a 32-byte boundary = 32
Adding the TStringList.Text (147, which is done correctly) is a total of 179 bytes.
That's what written to disk. But the actual characters of sName are NOT written.

But the sName problem needs to be fixed (which is already discussed).
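To check the file size on a given machine, the write sequence from the first post can be reproduced and each part printed. This is a sketch assuming Free Pascal in {$mode objfpc}{$H+} with TRect from Types and TMemoryStream/TStringList from Classes; the server names are hypothetical stand-ins for slServers. Note that it also counts the 4-byte Longint length prefix that WriteAnsiString writes (see its implementation quoted further down), which the hand calculation above leaves out.

```pascal
program FileSizeDemo;
{$mode objfpc}{$H+}

uses
  Types, Classes;

type
  TParmData = record
    FormRect: TRect;
    sName:    AnsiString;
    age:      Integer;
    male:     Boolean;
  end;

var
  ParmData: TParmData;
  Servers: TStringList;
  Stream: TMemoryStream;
  ServerText: AnsiString;
  WrittenSize: Int64;
begin
  ParmData.sName := StringOfChar('a', 161);   // its length does not change the stream size
  Servers := TStringList.Create;
  Stream := TMemoryStream.Create;
  try
    Servers.Add('server1');   // hypothetical server names
    Servers.Add('server2');
    ServerText := Servers.DelimitedText;
    // Same two writes as in the first post.
    Stream.WriteBuffer(ParmData, SizeOf(ParmData));
    Stream.WriteAnsiString(ServerText);
    WrittenSize := Stream.Size;
    WriteLn('SizeOf(TParmData)     = ', SizeOf(TParmData));
    WriteLn('length prefix         = ', SizeOf(Longint));
    WriteLn('Length(DelimitedText) = ', Length(ServerText));
    WriteLn('expected total        = ', SizeOf(TParmData) + SizeOf(Longint) + Length(ServerText));
    WriteLn('Stream.Size           = ', WrittenSize);
  finally
    Stream.Free;
    Servers.Free;
  end;
end.
```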

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: TMemoryStream file format
« Reply #13 on: June 09, 2020, 01:25:48 pm »
The null is already at the end of string.

Anyway, this looks like a school project and should be done via a text file instead.

rvk

  • Hero Member
  • *****
  • Posts: 6111
Re: TMemoryStream file format
« Reply #14 on: June 09, 2020, 01:29:25 pm »
Quote
The null is already at the end of string.
Yes, the null is at the end of the string but is not written to disk.

Look at the implementation of
Code: Pascal
  Procedure TStream.WriteAnsiString (const S : String);
  Var L : Longint;
  begin
    L:=Length(S);
    WriteBuffer (L,SizeOf(L));
    WriteBuffer (Pointer(S)^,L);
  end;

So if the string is 100 characters... (without the #0) exactly 100 characters are written.
But in front of those 100 characters the length is written so TStream.ReadAnsiString can read them back.

So, although there is no null written behind the string, the string can still be read because it is prefixed by the length !!!

(But that's not the original problem of the poster.)
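The length prefix is also what lets several strings sit back to back in one stream. A small sketch, assuming Free Pascal in {$mode objfpc}{$H+} with TMemoryStream from Classes and made-up strings: three strings are written with WriteAnsiString and read back with ReadAnsiString until the end of the stream, without any terminator.

```pascal
program LengthPrefixDemo;
{$mode objfpc}{$H+}

uses
  Classes;

const
  Names: array[0..2] of AnsiString = ('alpha', 'beta', 'a string with spaces');

var
  Stream: TMemoryStream;
  I: Integer;
  TotalSize: Int64;
begin
  Stream := TMemoryStream.Create;
  try
    // Each call writes a 4-byte Longint length and exactly Length(S) bytes; no #0.
    for I := Low(Names) to High(Names) do
      Stream.WriteAnsiString(Names[I]);
    TotalSize := Stream.Size;   // 3 * 4 prefix bytes + 5 + 4 + 20 characters
    WriteLn('Stream size: ', TotalSize);

    // No terminator is needed: each length prefix says how many bytes follow.
    Stream.Position := 0;
    while Stream.Position < Stream.Size do
      WriteLn('[', Stream.ReadAnsiString, ']');
  finally
    Stream.Free;
  end;
end.
```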

 
