
Topic: Can Absolute be used on string / packed record  (Read 6295 times)

BrunoK

  • Sr. Member
  • ****
  • Posts: 452
  • Retired programmer
Re: Can Absolute be used on string / packed record
« Reply #30 on: June 29, 2022, 05:21:21 pm »
@BrunoK
Thanks for the example, but that wasn't my point; my point was that it is not a typecast but a type replacement.
Actually, my example shows a situation where a typecast will not work, hence the usefulness and simplicity of absolute.

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Can Absolute be used on string / packed record
« Reply #31 on: June 29, 2022, 05:30:19 pm »
Why make it complicated ?
Is having to write less code really more complicated (the pointer variant is 2 characters shorter than using absolute)? Also, as I stated, it is safer, and doing something unusual, which can go wrong much more easily, should be signalled by more unusual-looking code.
If it looks like a variable and gets assigned like a variable, one would be excused for believing it is a variable with its own memory and not a hidden pointer to different memory. If it looks like a pointer and gets assigned like a pointer, one knows immediately that it points somewhere else.

Absolute is nothing other than a hidden pointer, and why would you want to hide information from the programmer that is required to understand the code? I personally think code should be as easily understandable as possible.

When I am reading code and see an assignment, I usually assume the target is a variable. But once you start using absolute, you always have to check the declaration to see whether it is actually a variable or a hidden pointer. This makes reading code much harder.
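A minimal sketch of the two styles being compared, assuming Free Pascal in objfpc mode; TBytes4, PBytes4, FillViaAbsolute and FillViaPointer are hypothetical names used only for illustration. Both functions write the same four bytes over a LongWord. In the first, the alias comes from `absolute`, and `Bytes[I] := B` looks like an ordinary variable write; in the second, the alias is an explicit pointer, and the `^` marks every indirect write.

```pascal
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  TBytes4 = array[0..3] of Byte;   { hypothetical 4-byte view, for illustration }
  PBytes4 = ^TBytes4;

{ Style 1: "absolute" gives a second name to Value's storage.
  Nothing at the point of use shows that Bytes aliases Value. }
function FillViaAbsolute(B: Byte): LongWord;
var
  Value: LongWord;
  Bytes: TBytes4 absolute Value;   { same 4 bytes as Value }
  I: Integer;
begin
  Value := 0;
  for I := Low(Bytes) to High(Bytes) do
    Bytes[I] := B;                 { this write lands in Value }
  Result := Value;
end;

{ Style 2: the alias is an explicit pointer, so the indirection is visible. }
function FillViaPointer(B: Byte): LongWord;
var
  Value: LongWord;
  P: PBytes4;
  I: Integer;
begin
  Value := 0;
  P := PBytes4(@Value);            { explicit: P points at Value }
  for I := Low(P^) to High(P^) do
    P^[I] := B;                    { the ^ marks an indirect write }
  Result := Value;
end;
```

Because every byte of Value is overwritten, FillViaAbsolute($AB) and FillViaPointer($AB) should both return $ABABABAB on either byte order; the point under debate is only what the reader can see at the point of use.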
« Last Edit: June 29, 2022, 05:32:32 pm by Warfley »

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Can Absolute be used on string / packed record
« Reply #32 on: June 29, 2022, 05:49:22 pm »
@440bx
Haven't you heard about unions (aka variant records)? Both Mark and Warfley noted them.
Yes, I have heard of those (I think I might have used them too.)  It would be quite interesting to see an attempt at parsing, using variant records, a structure made of variably sized fields whose size can only be determined at _runtime_, not compile time, such as null-terminated strings intermixed with fixed numeric fields.

Show me a variant record that can represent something like, for instance, a table where the first element is a variably sized, null terminated string, followed by a DWORD, followed by another null terminated variably sized string, followed by another DWORD and the end of the table is determined only by its size in bytes. 
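For the kind of table described above (ASCIIZ string, DWORD, ASCIIZ string, DWORD, ... ending only at a byte count), a cursor pointer advanced field by field is the natural tool, since no fixed record layout can describe the field positions. A minimal sketch, assuming the DWORDs are stored in the machine's native byte order and may be unaligned; TEntry, TEntryArray and ParseTable are hypothetical names.

```pascal
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  SysUtils;

type
  { one decoded (string, DWORD) pair; hypothetical type for illustration }
  TEntry = record
    Name: AnsiString;
    Value: LongWord;
  end;
  TEntryArray = array of TEntry;

{ Walks a buffer laid out as: ASCIIZ string, 4-byte value, ASCIIZ string,
  4-byte value, ... until ASize bytes have been consumed. }
function ParseTable(ABuf: Pointer; ASize: SizeInt): TEntryArray;
var
  Cur, Stop: PAnsiChar;
  Len: SizeInt;
  N: Integer;
  V: LongWord;
begin
  Result := nil;
  N := 0;
  Cur := PAnsiChar(ABuf);
  Stop := Cur + ASize;
  while Cur < Stop do
  begin
    { bounded scan for the #0 so a corrupt table cannot run past Stop }
    Len := 0;
    while (Cur + Len < Stop) and (Cur[Len] <> #0) do
      Inc(Len);
    if Cur + Len + 1 + SizeOf(LongWord) > Stop then
      raise Exception.Create('ParseTable: truncated table');
    SetLength(Result, N + 1);          { grown one entry at a time for brevity }
    SetString(Result[N].Name, Cur, Len);
    Inc(Cur, Len + 1);                 { skip the string and its #0 }
    Move(Cur^, V, SizeOf(V));          { the value may be unaligned: copy it }
    Result[N].Value := V;
    Inc(Cur, SizeOf(V));
    Inc(N);
  end;
end;
```

A caller would pass the table's start address and its size in bytes, e.g. `Entries := ParseTable(TableStart, TableSize)`, where both arguments stand in for wherever the table actually comes from.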

Absolute is nothing other than a hidden pointer, and why do you want to hide information from the programmer that is required to understand the code?
A variable name is a hidden pointer too.  A variable name simply identifies a fixed address.  A pointer identifies a variable address.  The compiler just hides the fact that a variable name is an address and, a class name is a pointer that pretends to be a variable. Many programmers seem to be rather happy with that pretense - which is why that information is hidden but, as you rightfully pointed out, often to the detriment of the programmer's understanding - all in the name of syntactic sugar - from Caramelized Apples to Caramelized pointers.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

alpine

  • Hero Member
  • *****
  • Posts: 1038
Re: Can Absolute be used on string / packed record
« Reply #33 on: June 29, 2022, 06:13:19 pm »
@440bx
Haven't you heard about unions (aka variant records)? Both Mark and Warfley noted them.
Yes, I have heard of those (I think I might have used them too.)  It would be quite interesting to see an attempt at parsing, using variant records, a structure made of variably sized fields whose size can only be determined at _runtime_, not compile time, such as null-terminated strings intermixed with fixed numeric fields.

Show me a variant record that can represent something like, for instance, a table where the first element is a variably sized, null terminated string, followed by a DWORD, followed by another null terminated variably sized string, followed by another DWORD and the end of the table is determined only by its size in bytes. 
Already shown. Reply #26 of Warfley, second example.

For your example it should look like:
Code: Pascal
var
  Data: record
    case Integer of
      0: (OptionalHeader          : PIMAGE_OPTIONAL_HEADER);
      1: (OptionalHeader32        : PIMAGE_OPTIONAL_HEADER32);
      2: (OptionalHeader64        : PIMAGE_OPTIONAL_HEADER64);
    end;
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

andresayang

  • Full Member
  • ***
  • Posts: 108
Re: Can Absolute be used on string / packed record
« Reply #34 on: June 29, 2022, 06:29:50 pm »
@Bart:
Yep, this is what I realized when I was reminded that short strings have one extra byte (byte 0) for the length.

So do you have an optimized technique for handling "column separated values" in a text file, or do we absolutely need to use Copy(S, start, length)?

Thanks

Why didn't you just go with PChar (ASCIIZ) strings and replace the commas with #0? (Hopefully it is a single-byte encoded string.)
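A minimal sketch of the PChar approach suggested above, assuming a single-byte encoding and that the line may be modified in place; TFieldArray and SplitInPlace are hypothetical names. Each separator is overwritten with #0, and the function returns a PAnsiChar for the start of every field, so no substrings are allocated.

```pascal
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  TFieldArray = array of PAnsiChar;

{ Splits Line in place on Sep. The returned pointers point into Line,
  so they are valid only while Line is alive and not modified. }
function SplitInPlace(var Line: AnsiString; Sep: AnsiChar): TFieldArray;
var
  P: PAnsiChar;
  N, I: Integer;
begin
  Result := nil;
  if Line = '' then
    Exit;
  UniqueString(Line);              { never write into a shared string buffer }
  P := PAnsiChar(Line);
  { count fields first so the array is allocated only once }
  N := 1;
  for I := 0 to Length(Line) - 1 do
    if P[I] = Sep then
      Inc(N);
  SetLength(Result, N);
  Result[0] := P;
  N := 1;
  for I := 0 to Length(Line) - 1 do
    if P[I] = Sep then
    begin
      P[I] := #0;                  { terminate the previous field }
      Result[N] := P + I + 1;      { next field starts after the separator }
      Inc(N);
    end;
end;
```

For example, splitting 'a,bc,,d' on ',' yields four fields: 'a', 'bc', '' and 'd'. A trailing separator yields a final empty field, which points at the string's own terminating #0.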

Yes, I was thinking about trying null-terminated strings. First I have to examine how TStringList handles strings internally...

Well, the aim of the question was to try to improve the speed of one of my apps, which works on the "SEG - SPS" file format with files containing over 400 000 lines. I have already optimized part of it by avoiding sorting through the use of "index arrays".

However, thanks all for your tips.

Cheers
Linux, Debian 12
Lazarus: always latest release

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Can Absolute be used on string / packed record
« Reply #35 on: June 29, 2022, 06:57:37 pm »
For your example it should look like:
Code: Pascal
var
  Data: record
    case Integer of
      0: (OptionalHeader          : PIMAGE_OPTIONAL_HEADER);
      1: (OptionalHeader32        : PIMAGE_OPTIONAL_HEADER32);
      2: (OptionalHeader64        : PIMAGE_OPTIONAL_HEADER64);
    end;

It looks like you're claiming that structure is simpler than
Code: Pascal
var
  OptionalHeader          : PIMAGE_OPTIONAL_HEADER   = nil;
  OptionalHeader32        : PIMAGE_OPTIONAL_HEADER32 absolute OptionalHeader;
  OptionalHeader64        : PIMAGE_OPTIONAL_HEADER64 absolute OptionalHeader;
Your variant record is cluttered and requires the extraneous "Data" (which doesn't contribute anything to clarity) in order to access the pointer of interest.  With absolute it's as simple as it gets.

In addition to that, that's a very simple case and, in spite of that, using a variant is not nearly as clear and easy to understand. 

On the other hand, I understand where you're coming from because I tried that and after looking at the results, I decided that a better way had to be found (which "absolute" provided.)  Using variants on something a little more complicated such as the LOADER TABLE ENTRY yields this:
Code: Pascal
type
  { NOTE: the following data structure is also found on the net with the name }
  {       _LDR_MODULE                                                         }

  { ALSO: this structure is also used by LdrFindEntryForAddress               }

  { the layout of the structure below is from Geoff Chappell                  }
  { see: https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/ntldr/ldr_data_table_entry.htm }

  { minor changes made due to Pascal not allowing unnamed unions              }

  PPLDR_DATA_TABLE_ENTRY = ^PLDR_DATA_TABLE_ENTRY;
  PLDR_DATA_TABLE_ENTRY  = ^TLDR_DATA_TABLE_ENTRY;
  TLDR_DATA_TABLE_ENTRY  = record                     { _LDR_DATA_TABLE_ENTRY }
    {  offsets }

    { 32b  64b }

    {  0     0 }  InLoadOrderLinks            : TLIST_ENTRY;           { 3.10 and higher            }
    {  8    10 }  InMemoryOrderLinks          : TLIST_ENTRY;           { 3.10 and higher            }

    { 10    20 }  LinksUnion                       : record
                    case integer of
                      1 : (
                           InInitializationOrderLinks  : TLIST_ENTRY;  { 3.10 to 6.1                }
                          );
                      2 : (
                           InProgressLinks             : TLIST_ENTRY;  { 6.2 and higher             }
                          );
                  end;

    { 18    30 }  DllBase                     : pointer          ;     { 3.10 and higher            }
    { 1C    38 }  EntryPoint                  : PLDR_INIT_ROUTINE;     { 3.10 and higher            }
    { 20    40 }  SizeOfImage                 : DWORD            ;     { 3.10 and higher            }
    { 24    48 }  FullDllName                 : TUNICODE_STRING  ;     { 3.10 and higher            }
    { 2C    58 }  BaseDllName                 : TUNICODE_STRING  ;     { 3.10 and higher            }

    { 34    68 }  FlagsUnion                  : {$ifdef FPC} bitpacked {$endif} record
                    case integer of
                      1 : (
                           Flags                   : DWORD;            { 3.10 to 6.1                }
                          );

                      2 : (
                           { 6.2 and higher                                   }

                           FlagGroup               : array[0..3] of byte;
                          );

                      {$ifdef FPC}
                      2 : (
                           { 6.2 and higher                                   }

                           PackagedBinary          : _1bit;
                           MarkedForRemoval        : _1bit;
                           ImageDll                : _1bit;
                           LoadNotificationsSent   : _1bit;
                           TelemetryEntryProcessed : _1bit;
                           ProcessStaticImport     : _1bit;
                           InLegacyLists           : _1bit;
                           InIndexes               : _1bit;
                           ShimDll                 : _1bit;
                           InExceptionTable        : _1bit;
                           ReservedFlags1          : _2bits;
                           LoadInProgress          : _1bit;
                           LoadConfigProcessed     : _1bit;
                           EntryProcessed          : _1bit;
                           ProtectDelayLoad        : _1bit;
                           ReservedFlags3          : _2bits;
                           DontCallForThreads      : _1bit;
                           ProcessAttachCalled     : _1bit;
                           ProcessAttachFailed     : _1bit;
                           CorDeferredValidate     : _1bit;
                           CorImage                : _1bit;
                           DontRelocate            : _1bit;
                           CorILOnly               : _1bit;
                           ChpeImage               : _1bit;
                           ReservedFlags5          : _2bits;
                           Redirected              : _1bit;
                           ReservedFlags6          : _2bits;
                           CompatDatabaseProcessed : _1bit;
                          );
                      {$endif}
                  end;

    { 38    6C }  LoadCount                   : word             ; { 3.10 to 6.1                    }
                  { renamed ObsoleteLoadCount                        6.2 and higher                 }

    { 3A    6E }  TlsIndex                    : word             ; { all                            }

    { 3C    70 }  HashUnion                   : record             { 3.10 to 6.1                    }
                    case integer of
                      1 : (
    { 3C    70 }           HashLinks              : TLIST_ENTRY  ; { 3.10 and higher  }
                          );

                      2 : (                                        { 3.10 to 6.1 only }
    { 3C    70 }           SectionPointer         : pointer;
    { 40    78 }           CheckSum               : DWORD;
                          );
                  end;


                  {---------------------------------------------------------- }
                  { appended for Windows NT 4                                 }

    { 44    80 }  Nt4Union                    : record
                    case integer of
                      1 : (
    { 44    80 }           TimeDateStamp          : DWORD;         { 4.0 and higher                 }
                           {$ifdef WIN64}
                           BadFood                : DWORD;
                           {$endif}
                          );

                          { LoadedImports field exists only up to 6.1         }

                      2 : (
    { 44    80 }           LoadedImports          : pointer;       { 4.0 to 6.1                     }
                          );
                  end;

                  {---------------------------------------------------------- }
                  { appended for Windows XP                                   }

                  { the following pointer is a PACTIVATION_CONTEXT            }

    { 48    88 }  EntryPointActivationContext : pointer;           { 5.1 and higher                 }

    { 4C    90 }  XpUnion                     : record
                    case integer of
                      1 : (
                           PatchInformation       : pointer;       { 5.1 from Windows XP SP2 to 6.2 }
                          );

                      2 : (
    { 4C    90 }           Spare                  : pointer;       { 6.3 only                       }
                          );

                      3 : (
                           { the following pointer is a PRTL_SRWLOCK          }

    { 4C    90 }           Lock                   : pointer;       { 10.0 and higher                }
                          );
                  end;


    { 50    98 }  VersionUnion                : record
                    case integer of
                      1 : (
                           { appended for Windows Vista                       }

                           Vista                  : record
    { 50    98 }             ForwarderLinks           : TLIST_ENTRY;       { 6.0 to 6.1             }
    { 58    A8 }             ServiceTagLinks          : TLIST_ENTRY;       { 6.0 to 6.1             }
    { 60    B8 }             StaticLinks              : TLIST_ENTRY;       { 6.0 to 6.1             }
                           end;
                          );

                      2 : (
                           { Windows 7 fields                                 }

                           Win7                   : record
                             { the Vista fields defined above (case 1)        }

    { 50    98 }             VistaForwarderLinks        : TLIST_ENTRY;     { 6.0 to 6.1             }
    { 58    A8 }             VistaServiceTagLinks       : TLIST_ENTRY;     { 6.0 to 6.1             }
    { 60    B8 }             VistaStaticLinks           : TLIST_ENTRY;     { 6.0 to 6.1             }


                             { fields appended for Windows 7                  }

    { 68    C8 }             ContextInformation       : pointer;           { 6.1 only               }
    { 6C    D0 }             OriginalBase             : pointer;
    { 70    D8 }             LoadTime                 : TLARGE_INTEGER;    { 6.1 and higher         }
                           end;
                          );

                      3 : (
                           { Win8 field reorganization                        }

                           Win8                   : record
    { 50    98 }             DdagNode                 : PLDR_DDAG_NODE;
    { 54    A0 }             NodeModuleLink           : TLIST_ENTRY;

    { 5C    B0 }             ContextUnion             : record
                               case integer of
                                 1 : (
    { 5C    B0 }                      SnapContext         : PLDRP_DLL_SNAP_CONTEXT; { 6.2 to 6.3       }
                                     );

                                 2 : (
    { 5C    B0 }                      LoadContext         : PLDRP_LOAD_CONTEXT;     { Win10 and higher }
                                     );
                             end;

    { 60    B8 }             ParentDllBase            : pointer;                    { 6.2 and higher }
    { 64    C0 }             SwitchBackContext        : pointer;                    { 6.2 and higher }
    { 68    C8 }             BaseAddressIndexNode     : TRTL_BALANCED_NODE;         { 6.2 and higher }
    { 74    E0 }             MappingInfoIndexNode     : TRTL_BALANCED_NODE;         { 6.2 and higher }
                           end;
                          );
                  end;

    { 80    F8 }  OriginalBase                : pointer;                   { 6.2 and higher          }

                  {$ifdef DELPHIv2}
                    ForceLoadTimeAlignment8   : DWORD;                     { Delphi v2 alignment     }
                  {$endif}
    { 88   100 }  LoadTime                    : TLARGE_INTEGER;            { 6.2 and higher          }


                  {---------------------------------------------------------- }
                  { appended for Windows 8                                    }

    { 90   108 }  BaseNameHashValue           : DWORD;                     { 6.2 and higher          }
    { 94   10C }  LoadReason                  : TLDR_DLL_LOAD_REASON;      { 6.2 and higher          }


                  {---------------------------------------------------------- }
                  { appended for Windows 8.1                                  }

    { 98   110 }  ImplicitPathOptions         : DWORD;                     { 6.3 and higher          }


                  {---------------------------------------------------------- }
                  { appended for Windows 10                                   }

    { 9C   114 }  ReferenceCount              : DWORD;                     { 10.0 and higher         }
    { A0   118 }  DependentLoadFlags          : DWORD;                     { 1607 and higher         }
    { A4   11C }  SigningLevel                : byte;                      { 1703 and higher         }
  end;
That was an interesting academic exercise that offered clear visual proof that using variants is OK for simple cases but obviously doesn't scale at all.  Instead, have separate definitions for each version of Windows and pointers to those definitions.  Once the pointer has been chosen based on the Windows version of interest, the structure for that specific version is significantly simpler, making it much easier to work with and much less likely to be a source of errors.

Using that structure is a nightmare.  I kept it as a reminder of what _not_ to do and _why_.

Just for the record, that definition is fully tested and established to be correct.  The individual definitions are a world cleaner than the variant you're looking at, and "absolute" makes their use completely transparent once the correct layout has been chosen.





MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Can Absolute be used on string / packed record
« Reply #36 on: June 29, 2022, 07:34:01 pm »
looks like you're claiming that structure is simpler than

Not really relevant. He's claiming that it still gives the compiler a chance to check type consistency etc. except where the programmer makes clear the rules are to be relaxed.

Look, I incur Sven's wrath on a regular basis by (inter alia) persisting in my belief that more of the common C idioms should be supported by FPC. But a suggestion that Pascal's type checking should be subverted is well beyond the pale.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

alpine

  • Hero Member
  • *****
  • Posts: 1038
Re: Can Absolute be used on string / packed record
« Reply #37 on: June 29, 2022, 08:28:17 pm »

Is that the file format? https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_sps_rev0.pdf

If you're aiming for speed, the best way to sort items incrementally is to use a heap data structure. Using TStringList requires first adding all items and then switching to Sorted:=True. That performs a QuickSort, which is fine, but every subsequent insert (with Sorted:=True) has to shift all of the pointers after the insertion point to make room for just one item. Using a heap is therefore quicker: after you push all of the items, they simply come out sorted.

There is a heap implemented in the FCL: TPriorityQueue<T, TCompare>. It needs to be specialized with an appropriate TCompare class, but that is not a big deal.
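As a rough illustration of the heap idea, here is a small hand-rolled binary min-heap of strings. It deliberately does not use the FCL's TPriorityQueue, whose exact unit and method names are not reproduced here; TStringHeap, HeapInit, HeapPush and HeapPop are hypothetical names. Each push and pop costs O(log n) comparisons and swaps, whereas a sorted TStringList insert shifts every pointer after the insertion point.

```pascal
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  { minimal binary min-heap of strings; hypothetical, for illustration }
  TStringHeap = record
    Items: array of AnsiString;    { Items[0] is always the smallest }
    Count: Integer;
  end;

procedure HeapInit(out H: TStringHeap);
begin
  H.Items := nil;
  H.Count := 0;
end;

procedure HeapPush(var H: TStringHeap; const S: AnsiString);
var
  I, Parent: Integer;
  Tmp: AnsiString;
begin
  if H.Count = Length(H.Items) then
    SetLength(H.Items, 2 * H.Count + 16);   { grow geometrically }
  I := H.Count;
  H.Items[I] := S;
  Inc(H.Count);
  { sift up: swap with the parent while smaller than it }
  while I > 0 do
  begin
    Parent := (I - 1) div 2;
    if H.Items[I] >= H.Items[Parent] then
      Break;
    Tmp := H.Items[I]; H.Items[I] := H.Items[Parent]; H.Items[Parent] := Tmp;
    I := Parent;
  end;
end;

function HeapPop(var H: TStringHeap): AnsiString;
var
  I, Child: Integer;
  Tmp: AnsiString;
begin
  Assert(H.Count > 0, 'pop from empty heap');
  Result := H.Items[0];
  Dec(H.Count);
  H.Items[0] := H.Items[H.Count];           { move the last item to the root }
  H.Items[H.Count] := '';                   { release the old reference }
  { sift down: swap with the smaller child while larger than it }
  I := 0;
  while True do
  begin
    Child := 2 * I + 1;
    if Child >= H.Count then
      Break;
    if (Child + 1 < H.Count) and (H.Items[Child + 1] < H.Items[Child]) then
      Inc(Child);
    if H.Items[I] <= H.Items[Child] then
      Break;
    Tmp := H.Items[I]; H.Items[I] := H.Items[Child]; H.Items[Child] := Tmp;
    I := Child;
  end;
end;
```

Pushing every line and then popping until the heap is empty yields the lines in ascending order. The comparison is the plain AnsiString `<` operator, a byte-wise comparison, which may order some strings differently from TStringList's default comparison.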

Just let me know the format and then we can figure out the most efficient way to parse the files. It would be good to work only with PChars, to avoid the string manager overhead.

Regards,
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Can Absolute be used on string / packed record
« Reply #38 on: June 29, 2022, 08:30:46 pm »
But a suggestion that Pascal's type checking should be subverted is well beyond the pale.
But, that is a complete misconception.  Pascal's type checking isn't being subverted, on the contrary, the use of absolute is a mechanism to tell the compiler to treat whatever is at some location as type B instead of type A.   There is no type subversion.

I strongly disagree that simplicity is not relevant.  Simplicity is of paramount relevance and importance in programming.  The more succinct, expressive, accurate and precise a construct is, the better and, "absolute" has those characteristics in abundance.

That LDR record using variants I posted.  I'll _never_ use that.  It's unclear, it's imprecise, it's a jumbled soup of variables and data types. It's horrible. The individual definitions are vastly superior, clearer and much easier to use. 

That doesn't mean that variants aren't the right approach for some things, but there are plenty of situations where using "absolute" will produce significantly better results (that is, easier to understand and maintain) than variants.   Variants are a reasonable choice when the differences are few; as the number of differences (the number of variants) grows, they become a hairy soup.

« Last Edit: June 29, 2022, 08:32:19 pm by 440bx »

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Can Absolute be used on string / packed record
« Reply #39 on: June 29, 2022, 09:21:58 pm »
Your variant record is cluttered and requires the extraneous "Data" (which doesn't contribute anything to clarity) in order to access the pointer of interest.  With absolute it's as simple as it gets.
I would argue that the additional "Data" qualifier you have to write actually makes the code clearer, as it provides a clear link between the different representations and indicates that they are the same "data" object.

That said, in this example the names are chosen quite poorly. Look at it like this:
Code: Pascal
  OptionalHeader: record
    case Integer of
      0: (Raw : PIMAGE_OPTIONAL_HEADER);
      1: (H32 : PIMAGE_OPTIONAL_HEADER32);
      2: (H64 : PIMAGE_OPTIONAL_HEADER64);
    end;

// usage:
OptionalHeader.H32^...
This now clearly indicates that any of those are the OptionalHeader, where H32 is the 32 bit variant and H64 is the 64 bit variant. This is much clearer than having just 3 seemingly independent variables whose connectedness you can only see when looking at the declaration.
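To compare the two styles on equal footing, here is a portable sketch that dispatches on a header's Magic word once through a variant record and once through `absolute`. THdr32, THdr64, THdrView, ImageBaseViaVariant and ImageBaseViaAbsolute are simplified, hypothetical stand-ins for the Windows IMAGE_OPTIONAL_HEADER types: only the convention that Magic comes first and is $10B for PE32 or $20B for PE32+ follows the PE format; the field after it is not the real layout.

```pascal
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  { simplified stand-ins, NOT the real IMAGE_OPTIONAL_HEADER layouts }
  THdr32 = packed record
    Magic: Word;          { $10B marks a PE32 optional header }
    ImageBase: LongWord;
  end;
  THdr64 = packed record
    Magic: Word;          { $20B marks a PE32+ optional header }
    ImageBase: QWord;
  end;
  PHdr32 = ^THdr32;
  PHdr64 = ^THdr64;

  { one pointer, three named views, grouped under a single name }
  THdrView = record
    case Integer of
      0: (Raw: Pointer);
      1: (H32: PHdr32);
      2: (H64: PHdr64);
  end;

{ variant-record style: every access goes through Hdr. }
function ImageBaseViaVariant(P: Pointer): QWord;
var
  Hdr: THdrView;
begin
  Hdr.Raw := P;
  case Hdr.H32^.Magic of        { Magic is at offset 0 in both layouts }
    $10B: Result := Hdr.H32^.ImageBase;
    $20B: Result := Hdr.H64^.ImageBase;
  else
    Result := 0;                { unknown layout }
  end;
end;

{ absolute style: three separate-looking names for one pointer }
function ImageBaseViaAbsolute(P: Pointer): QWord;
var
  Raw: Pointer;
  H32: PHdr32 absolute Raw;
  H64: PHdr64 absolute Raw;
begin
  Raw := P;
  case H32^.Magic of
    $10B: Result := H32^.ImageBase;
    $20B: Result := H64^.ImageBase;
  else
    Result := 0;
  end;
end;
```

Both functions perform the same reinterpretation of one pointer; what differs is whether the three views are visibly grouped under one name (Hdr.H32, Hdr.H64) or appear as separate variables whose connection is visible only in the declaration.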

Using variants on something a little more complicated such as the LOADER TABLE ENTRY yields this:
Code: Pascal
...
That was an interesting academic exercise that offered clear visual proof that using variants is OK for simple cases but obviously doesn't scale at all.  Instead, have separate definitions for each version of Windows and pointers to those definitions.  Once the pointer has been chosen based on the Windows version of interest, the structure for that specific version is significantly simpler, making it much easier to work with and much less likely to be a source of errors.

Using that structure is a nightmare.  I kept it as a reminder of what _not_ to do and _why_.

Just for the record, that definition is fully tested and established to be correct.  The individual definitions are a world cleaner than the variant you're looking at, and "absolute" makes their use completely transparent once the correct layout has been chosen.
The problem with this structure is not the use of variant records, but the fact that this is just one large pile of code. This is the data description equivalent of spaghetti code.
What's interesting is that, looking at your comments, you divide the record into individual blocks and give names and descriptions to each block of data. Why not simply use multiple record types if you want to name your data blocks:
Code: Pascal
  TLinkData = record
    case boolean of
      True: (InInitializationOrderLinks: TLIST_ENTRY); { 3.10 to 6.1 }
      False: (InProgressLinks: TLIST_ENTRY);           { 6.2 and higher }
  end;

  TFlagBits = bitpacked record
    PackagedBinary: _1bit;
    MarkedForRemoval: _1bit;
    ImageDll: _1bit;
    LoadNotificationsSent: _1bit;
    TelemetryEntryProcessed: _1bit;
    ProcessStaticImport: _1bit;
    InLegacyLists: _1bit;
    InIndexes: _1bit;
    ShimDll: _1bit;
    InExceptionTable: _1bit;
    ReservedFlags1: _2bits;
    LoadInProgress: _1bit;
    LoadConfigProcessed: _1bit;
    EntryProcessed: _1bit;
    ProtectDelayLoad: _1bit;
    ReservedFlags3: _2bits;
    DontCallForThreads: _1bit;
    ProcessAttachCalled: _1bit;
    ProcessAttachFailed: _1bit;
    CorDeferredValidate: _1bit;
    CorImage: _1bit;
    DontRelocate: _1bit;
    CorILOnly: _1bit;
    ChpeImage: _1bit;
    ReservedFlags5: _2bits;
    Redirected: _1bit;
    ReservedFlags6: _2bits;
    CompatDatabaseProcessed: _1bit;
  end;

  TFlagData = bitpacked record
    case Integer of
      0: (Flags: DWord);
      1: (FlagGroup: array[0..3] of byte);
      2: (Bits: TFlagBits)
  end;

  THashData = record
    case Integer of
      1 : (HashLinks: TLIST_ENTRY);      { 3.10 and higher }

      2 : (SectionPointer: pointer;      { 3.10 to 6.1 }
           CheckSum: DWORD);
  end;

  TLDR_DATA_TABLE_ENTRY  = record                     { _LDR_DATA_TABLE_ENTRY }
    {  offsets }

    { 32b  64b }

    {  0     0 }  InLoadOrderLinks            : TLIST_ENTRY;           { 3.10 and higher            }
    {  8    10 }  InMemoryOrderLinks          : TLIST_ENTRY;           { 3.10 and higher            }

    { 10    20 }  Links                       : TLinkData;

    { 18    30 }  DllBase                     : pointer          ;     { 3.10 and higher            }
    { 1C    38 }  EntryPoint                  : PLDR_INIT_ROUTINE;     { 3.10 and higher            }
    { 20    40 }  SizeOfImage                 : DWORD            ;     { 3.10 and higher            }
    { 24    48 }  FullDllName                 : TUNICODE_STRING  ;     { 3.10 and higher            }
    { 2C    58 }  BaseDllName                 : TUNICODE_STRING  ;     { 3.10 and higher            }

    { 34    68 }  Flags                       : TFlagData;

    { 38    6C }  LoadCount                   : word             ; { 3.10 to 6.1                    }
                  { renamed ObsoleteLoadCount                        6.2 and higher                 }

    { 3A    6E }  TlsIndex                    : word             ; { all                            }

    { 3C    70 }  HashUnion                   : THashData;
...
This divides this one mega block into many smaller blocks. Especially for maintainance this can be really helpful, because let's say you find you have a problem with a specific flag, rather to search through a 200 line long data definition, you just need to look into the FlagData record and cut down search to only a few dozen lines. But more importantly it makes getting an overview much easier. So rather than having to scroll through dozens of lines of code that might not interest you, like ever single flag, in your TLDR_DATA_TABLE_ENTRY record you only see: "there are the flags from address x to y" and if you want to know what these flags are exactly, you can look at the corresponding record (also it allows debugging the flags independently as you can just create a unit test that only tests this record without having to manage the whole record)

Variant records might not be useful in the way you used them above, but that's because this is not how they are useful. If you use them together with well-structured data, they massively improve readability. In your example, something like absolute would be a hotfix for a terrible data definition structure that makes it somewhat usable. But restructuring with advanced records gives you much clearer code, to a degree you couldn't achieve with absolute.
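To make the "named block" idea concrete, here is a minimal Free Pascal sketch. TFlagData mirrors the shape of the record discussed in this thread, while TEntry and its fields are hypothetical stand-ins, not the real TLDR_DATA_TABLE_ENTRY. The outer record only says where the flags live; the variant part inside TFlagData gives two views of the same four bytes, and a write through one view is visible through the other.

```pascal
program NamedVariantParts;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  { Hypothetical stand-in for a TFlagData-style record: one named type whose
    variant part gives two views of the same 4 bytes. }
  TFlagData = packed record
    case Integer of
      0: (Flags: DWord);                    { the whole 32-bit word }
      1: (FlagGroup: array[0..3] of Byte);  { the same word, byte by byte }
  end;

  { Hypothetical outer record: it only records WHERE the flags live and
    leaves WHAT they mean to TFlagData. }
  TEntry = packed record
    TlsIndex: Word;
    Flags: TFlagData;
  end;

var
  Entry: TEntry;
begin
  Assert(SizeOf(TFlagData) = 4);
  Assert(SizeOf(TEntry) = 6);

  Entry.Flags.Flags := 0;
  Entry.Flags.FlagGroup[2] := 1;   { write through the byte view ... }
  Assert(Entry.Flags.Flags <> 0);  { ... and the DWord view sees the change }

  { The printed value depends on the machine's byte order. }
  WriteLn('Flags word after setting one byte: ', Entry.Flags.Flags);
end.
```

The check only relies on the two views sharing storage, so it holds on both little- and big-endian targets; which numeric value gets printed does not.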
« Last Edit: June 29, 2022, 09:25:39 pm by Warfley »

BrunoK

  • Sr. Member
  • ****
  • Posts: 452
  • Retired programmer
Re: Can Absolute be used on string / packed record
« Reply #40 on: June 29, 2022, 10:00:50 pm »
Well, the aim of the question was to try to improve speed in one of my apps, which works on the "SEG - SPS" file format, with files containing over 400 000 lines. I have optimized part of it by using "index arrays" to avoid sorting.

However, thanks all for your tips.

Cheers
Do you mean files described in https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_sps_rev0.pdf
What record type(s) are you trying to process (pages in the above document)?

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Can Absolute be used on string / packed record
« Reply #41 on: June 29, 2022, 11:16:24 pm »
I would argue that the additional "data" object you have to go through actually makes the code clearer, as it provides a clear link between the different representations and indicates that they are the same "data" object.

That said, in this example the names are of course chosen quite poorly. Look at it like this:
Code: Pascal
  OptionalHeader: record
    case Integer of
      0: (Raw : PIMAGE_OPTIONAL_HEADER);
      1: (H32 : PIMAGE_OPTIONAL_HEADER32);
      2: (H64 : PIMAGE_OPTIONAL_HEADER64);
    end;

// usage:
OptionalHeader.H32^...
This now clearly indicates that any of those are the OptionalHeader, where H32 is the 32 bit variant and H64 is the 64 bit variant. This is much clearer than having just 3 seemingly independent variables whose connectedness you can only see when looking at the declaration.
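A minimal runnable sketch of the variant-record approach, assuming heavily trimmed, hypothetical header records (THeader32, THeader64 and THeaderCommon are not the real Windows IMAGE_OPTIONAL_HEADER32/64, which have many more fields and put ImageBase at a different offset). Only the Magic values $10B (PE32) and $20B (PE32+) come from the PE format; ImageBaseOf is a hypothetical helper. One named variable carries three typed pointer views of the same address, and the Magic field selects which view to read through.

```pascal
program VariantHeaderView;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

const
  { Magic values of the PE optional header (PE32 and PE32+). }
  HDR32_MAGIC = $10B;
  HDR64_MAGIC = $20B;

type
  { Hypothetical, heavily trimmed stand-ins for IMAGE_OPTIONAL_HEADER32/64. }
  THeaderCommon = packed record
    Magic: Word;
  end;
  THeader32 = packed record
    Magic: Word;
    ImageBase: DWord;
  end;
  THeader64 = packed record
    Magic: Word;
    ImageBase: QWord;
  end;
  PHeaderCommon = ^THeaderCommon;
  PHeader32 = ^THeader32;
  PHeader64 = ^THeader64;

  { One named object, three typed pointer views of the same address. }
  TOptionalHeaderView = record
    case Integer of
      0: (Raw: PHeaderCommon);
      1: (H32: PHeader32);
      2: (H64: PHeader64);
  end;

function ImageBaseOf(P: Pointer): QWord;
var
  OptionalHeader: TOptionalHeaderView;
begin
  OptionalHeader.Raw := PHeaderCommon(P);
  { Pick the view from the Magic field, then read through that view. }
  case OptionalHeader.Raw^.Magic of
    HDR32_MAGIC: Result := OptionalHeader.H32^.ImageBase;
    HDR64_MAGIC: Result := OptionalHeader.H64^.ImageBase;
  else
    Result := 0;  { neither 32-bit nor 64-bit }
  end;
end;

var
  Hdr32: THeader32;
  Hdr64: THeader64;
begin
  Hdr32.Magic := HDR32_MAGIC;
  Hdr32.ImageBase := $400000;
  Hdr64.Magic := HDR64_MAGIC;
  Hdr64.ImageBase := QWord($140000000);

  Assert(ImageBaseOf(@Hdr32) = $400000);
  Assert(ImageBaseOf(@Hdr64) = QWord($140000000));
  WriteLn(ImageBaseOf(@Hdr32), ' ', ImageBaseOf(@Hdr64));
end.
```

Since all three variants are pointers, the whole view record is exactly one pointer wide; the variant part adds names, not storage.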
It takes reading the entire structure, which is twice as many lines as the definition I have using absolute. In addition, the identifier OptionalHeader is not the real optional header; it's a structure that doesn't exist in Windows. That causes problems, which I will show next.

The line:
Code: Pascal
OptionalHeader.H32^...
is a source of confusion because the real optional header has no "H32" field. This immediately raises the question "what's this extraneous H32 thing?", which in turn means the programmer has to go read the definition of "OptionalHeader" to find out that it is _not_ the Windows structure. That problem is _not_ present using absolute.

On the other hand, when a programmer sees:
Code: Pascal
OptionalHeader32^.NumberOfRvaAndSizes
that's _exactly_ what is expected: a dereferenced optional header pointer accessing a _real_ optional header field instead of some "H32" thing. Additionally, the name "OptionalHeader32" makes it crystal clear that it is a pointer to the 32-bit version of the optional header. No "H32, H64, Raw" needed (nor wanted, for that matter).

But... wait... there is more... the problems with that variant don't stop there.  During debugging, if the programmer hovers the cursor over "OptionalHeader", he/she will see 3 fields, namely Raw, H32 and H64 instead of fields that are in the optional header.

Since I use the definition I posted in my code, I can show exactly how much cleaner the code is using absolute.
Code: Pascal
  case OptionalHeader^.Magic of
    IMAGE_NT_OPTIONAL_HDR32_MAGIC :
    begin
      pe_Bitness       := _eb_x32;
      OptionalHeader32 := PIMAGE_OPTIONAL_HEADER32(OptionalHeader);

      pe_ImageBase     := OptionalHeader32^.ImageBase;

      DataDirectory           := pointer(@OptionalHeader32^.DataDirectory);
      DataDirectoryEntryCount := OptionalHeader32^.NumberOfRvaAndSizes;
    end;

    IMAGE_NT_OPTIONAL_HDR64_MAGIC :
    begin
      pe_Bitness       := _eb_x64;
      OptionalHeader64 := PIMAGE_OPTIONAL_HEADER64(OptionalHeader);

      pe_ImageBase     := OptionalHeader64^.ImageBase;

      DataDirectory           := pointer(@OptionalHeader64^.DataDirectory);
      DataDirectoryEntryCount := OptionalHeader64^.NumberOfRvaAndSizes;
    end;
  else
    Newline();
    OutputToDeviceA(FUNCTION_NAME, 'the PE file is neither 64bit nor 32bit');
    Newline();

    exit;
  end;
First, the explicit typecasts make it crystal clear how OptionalHeader will be interpreted and, second, if the cursor hovers over either OptionalHeader variable (the actual variable name, not a three-letter abbreviation), the debugger will display, surprise!, the fields that are part of the optional header, not "H32, H64 and Raw". You get _real_ fields, and you don't have to move the cursor onto three-letter abbreviations to see what you really want to see: the optional header fields. Just for fun, I attached a screenshot obtained by hovering over the _variable_ name (not some three-letter abbreviation).

The problem with this structure is not the use of variant records, but the fact that this is just one large pile of code. This is the data description equivalent of spaghetti code.
That's what having too many variants in a definition leads to.  Variants should be few, otherwise it becomes a mess.  In addition to that, for untagged variants, the debugger shows the data as interpreted for each variant, therefore, if there are 5 variants, the debugger will show 5 times the same information using the 5 different interpretations.  That problem does _not_ occur with absolute, you get one and only one interpretation and, lo and behold, it's always the one you want.

I want to make something very clear again: the definition of the structure I posted earlier is trash and totally unusable, and its only remedy is to never use it, but it is a good example of what having too many variants leads to.

On the other hand, using absolute, that structure can be split into 8 (an estimate) definitions, each specific to the Windows version to which it applies, with the one to use determined at run time, just as in the example I've shown using the optional header, ending up with squeaky clean code that programmers and debugger alike can use.
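For comparison, a minimal sketch of the absolute-based style, again assuming hypothetical, heavily trimmed header records rather than the real Windows definitions (only the Magic values $10B and $20B come from the PE format; ImageBaseOf is a hypothetical helper). The typed pointers are declared absolute on one untyped-view pointer, so all three names share one storage location, and the Magic field decides at run time which definition to read through.

```pascal
program AbsoluteHeaderView;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

const
  HDR32_MAGIC = $10B;  { PE32 }
  HDR64_MAGIC = $20B;  { PE32+ }

type
  { Hypothetical, heavily trimmed per-bitness definitions; the real
    IMAGE_OPTIONAL_HEADER32/64 have many more fields at other offsets. }
  THeaderCommon = packed record
    Magic: Word;
  end;
  THeader32 = packed record
    Magic: Word;
    ImageBase: DWord;
  end;
  THeader64 = packed record
    Magic: Word;
    ImageBase: QWord;
  end;
  PHeaderCommon = ^THeaderCommon;
  PHeader32 = ^THeader32;
  PHeader64 = ^THeader64;

function ImageBaseOf(P: Pointer): QWord;
var
  OptionalHeader  : PHeaderCommon;
  { Both typed pointers occupy the same storage as OptionalHeader. }
  OptionalHeader32: PHeader32 absolute OptionalHeader;
  OptionalHeader64: PHeader64 absolute OptionalHeader;
begin
  OptionalHeader := PHeaderCommon(P);

  { All three names share one storage location, so they always hold the
    same pointer value. }
  Assert(Pointer(@OptionalHeader32) = Pointer(@OptionalHeader));
  Assert(Pointer(@OptionalHeader64) = Pointer(@OptionalHeader));

  case OptionalHeader^.Magic of
    HDR32_MAGIC: Result := OptionalHeader32^.ImageBase;
    HDR64_MAGIC: Result := OptionalHeader64^.ImageBase;
  else
    Result := 0;  { neither 32-bit nor 64-bit }
  end;
end;

var
  Hdr32: THeader32;
  Hdr64: THeader64;
begin
  Hdr32.Magic := HDR32_MAGIC;
  Hdr32.ImageBase := $400000;
  Hdr64.Magic := HDR64_MAGIC;
  Hdr64.ImageBase := QWord($140000000);

  Assert(ImageBaseOf(@Hdr32) = $400000);
  Assert(ImageBaseOf(@Hdr64) = QWord($140000000));
  WriteLn(ImageBaseOf(@Hdr32), ' ', ImageBaseOf(@Hdr64));
end.
```

Unlike the variant-record version, nothing at the use site says the three names are one object; that link lives only in the var section, which is exactly the trade-off being argued over in this thread.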
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

jamie

  • Hero Member
  • *****
  • Posts: 6091
Re: Can Absolute be used on string / packed record
« Reply #42 on: June 29, 2022, 11:23:38 pm »
What I find interesting is that this generates an internal error!

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
Var
  S:string[100];
  A:Char absolute S[1];
begin
  //
end;

S is a shortstring, so it should allow this, but I get an internal compiler error.
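One way to reach the characters without indexing inside the absolute clause is to overlay a packed record on the whole shortstring. This is a minimal sketch, not a fix for the compiler error: TShortStr100Overlay and Demo are hypothetical names, and the layout assumes a string[100] is one length byte followed by 100 AnsiChar elements (101 bytes in total).

```pascal
program ShortStringOverlay;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  { Hypothetical overlay matching the layout of string[100]:
    one length byte followed by 100 characters. }
  TShortStr100Overlay = packed record
    Len: Byte;
    Chars: array[1..100] of AnsiChar;
  end;

procedure Demo;
var
  S: string[100];                          { explicit length: always a shortstring }
  Overlay: TShortStr100Overlay absolute S; { absolute on the whole variable, not on S[1] }
begin
  Assert(SizeOf(Overlay) = SizeOf(S));

  S := 'Hello';
  Assert(Overlay.Len = 5);        { the length byte }
  Assert(Overlay.Chars[1] = 'H'); { same storage as S[1] }

  Overlay.Chars[1] := 'J';        { writing through the overlay changes S }
  Assert(S = 'Jello');
  WriteLn(S);
end;

begin
  Assert(SizeOf(TShortStr100Overlay) = 101);
  Demo;
end.
```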

Please report this as a bug with a self-contained example (without LCL dependencies). Whether this should be allowed or not is up for discussion, but an internal error definitely shouldn't happen.

Done.
The only true wisdom is knowing you know nothing

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Can Absolute be used on string / packed record
« Reply #43 on: June 30, 2022, 10:10:33 am »
The line:
Code: Pascal
OptionalHeader.H32^...
is a source of confusion because the real optional header has no "H32" field. This immediately raises the question "what's this extraneous H32 thing?", which in turn means the programmer has to go read the definition of "OptionalHeader" to find out that it is _not_ the Windows structure. That problem is _not_ present using absolute.

On the other hand, when a programmer sees:
Code: Pascal
OptionalHeader32^.NumberOfRvaAndSizes
that's _exactly_ what is expected: a dereferenced optional header pointer accessing a _real_ optional header field instead of some "H32" thing. Additionally, the name "OptionalHeader32" makes it crystal clear that it is a pointer to the 32-bit version of the optional header. No "H32, H64, Raw" needed (nor wanted, for that matter).
Well, most Pascal programmers know that .XXX does not necessarily reference a discrete field. It can be a method or a property or, in the case of a variant record, a different memory layout of the fields.
If you use two different variables with absolute, and you look at the code and see OptionalHeader32 used and then, a few lines later, OptionalHeader64, you would assume that these are two different objects, not related to the same object. The variant record, by accessing different members of the same object, makes this clear.

But... wait... there is more... the problems with that variant don't stop there.  During debugging, if the programmer hovers the cursor over "OptionalHeader", he/she will see 3 fields, namely Raw, H32 and H64 instead of fields that are in the optional header.
And if you hover over those you see the actual fields. While if you hover over your OptionalHeader32, you won't even know that there exists a 64 option. It literally gives you more information, how can this be bad?

That's what having too many variants in a definition leads to.  Variants should be few, otherwise it becomes a mess.  In addition to that, for untagged variants, the debugger shows the data as interpreted for each variant, therefore, if there are 5 variants, the debugger will show 5 times the same information using the 5 different interpretations.  That problem does _not_ occur with absolute, you get one and only one interpretation and, lo and behold, it's always the one you want.

I want to make something very clear again: the definition of the structure I posted earlier is trash and totally unusable, and its only remedy is to never use it, but it is a good example of what having too many variants leads to.

On the other hand, using absolute, that structure can be split into 8 (an estimate) definitions, each specific to the Windows version to which it applies, with the one to use determined at run time, just as in the example I've shown using the optional header, ending up with squeaky clean code that programmers and debugger alike can use.
That's why you should split it up into multiple smaller records. All the problems you have with it are due to the fact that you use anonymous variant records within one large definition. It's as if you removed the seats from your car and then claimed that driving is less comfortable than walking because of all the problems you got from having to stand in your car.
Having 8 definitions sounds really horrible. Let's say a new Windows version updates exactly one of the fields: now you need 9 definitions for a one-line change. With variant records you just need to add exactly one case to the variant record. If you realize you made a mistake in the definition (e.g. you mixed up the order of the flags or forgot a flag), then you need to fix it in 8 places in the code, with variant records in just one.
The fact that it shows you all the overlapping fields, not just the relevant ones, is because you don't use records for encapsulation. Take my flags example from above:
Code: Pascal
  TFlagData = bitpacked record
    case Integer of
      0: (Flags: DWord);
      1: (FlagGroup: array[0..3] of byte);
      2: (Bits: TFlagBits)
  end;
When you hover over it, you see Flags, FlagGroup or Bits as the different representations. When you then hover over Bits, you see the individual bits, but as these are encapsulated in their own representation, you no longer see Flags and FlagGroup.

This is like the most basic of structured programming, put things that logically belong together in a discrete named block and just combine these blocks. On a control flow level this is the difference between procedural code and spaghetti code, and with data this is the difference between using many named records versus one big unnamed composition.

The problems you have with variant records are because your example is simply bad code, and yes, bad code with variants is still bad code. But if your "solution" to this is to have 8 definitions of mostly the same thing, differing only in a few places, instead of one flexible one, you are doing something wrong.

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Can Absolute be used on string / packed record
« Reply #44 on: June 30, 2022, 11:32:04 am »
Well, most Pascal programmers know that .XXX does not necessarily reference a discrete field. It can be a method or a property or, in the case of a variant record, a different memory layout of the fields.
If you use two different variables with absolute, and you look at the code and see OptionalHeader32 used and then, a few lines later, OptionalHeader64, you would assume that these are two different objects, not related to the same object. The variant record, by accessing different members of the same object, makes this clear.
On the contrary, there is no indication at all that Raw, H32 and H64 are overlays of each other.  Just reading the code would give the impression that they are distinct objects and, it's only after inspecting the variant definition that it becomes clear they reference one single object.

On the other hand, with absolute, if you hover over OptionalHeader, OptionalHeader32 or OptionalHeader64, you'll notice that the address is the _same_ for all three of them. When debugging, you don't even need to look at the definition to see they are the same thing (anyone who knows the PE file format would have suspected that anyway).


And if you hover over those you see the actual fields.
And you better have good aim to hover over a 3 character identifier.

While if you hover over your OptionalHeader32, you won't even know that there exists a 64 option. It literally gives you more information, how can this be bad?
On the contrary, as I mentioned above, you'll see the address is the same for all three of them showing they are the same thing _without_ having to even look at the definition (unlike with the variant.)

That's why you should split it up into multiple smaller records. All the problems you have with it are due to the fact that you use anonymous variant records within one large definition.
The real problem with that definition is that it mixes Apples and Oranges.  There should be _one_ and only _one_ definition per Windows version instead of one definition that attempts to cover all versions.

Having 8 definitions sounds really horrible. Let's say a new Windows version updates exactly one of the fields: now you need 9 definitions for a one-line change.
It's not horrible, it's the right way.  One definition per Windows version.  That way there can be no confusion.  The only time a definition between Windows versions should be shared is if more than one version use _exactly_ the _same_ structure.  Otherwise, there should be distinct definitions (Apples to Apples and Oranges to Oranges.)

With variant records you just need to add exactly one case to the variant record.
And that is exactly how you end up with the atrocity I posted earlier.  One small change here... one small change there... over time that mess accumulates.

If you realize you made a mistake in the definition (e.g. you mixed up the order of the flags or forgot a flag), then you need to fix it in 8 places in the code, with variant records in just one.
The only way that could happen is if the mistake was made in a field that is shared among _all_ the different versions.  If such a mistake existed, it would have been caught when used for the first version it applied to.  IOW, extremely unlikely to spread over to other versions (but, admittedly, possible.)

This is like the most basic of structured programming, put things that logically belong together in a discrete named block and just combine these blocks.
I agree with that and, another thing that is basic programming is, keep things that are separate, _separate_.  IOW, don't mix Apples and Oranges.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

 
