
Author Topic: Can Absolute be used on string / packed record  (Read 6285 times)

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Can Absolute be used on string / packed record
« Reply #45 on: June 30, 2022, 12:31:05 pm »
On the contrary, there is no indication at all that Raw, H32 and H64 are overlays of each other.  Just reading the code would give the impression that they are distinct objects and, it's only after inspecting the variant definition that it becomes clear they reference one single object.

In that case use more meaningful names, as you do in your example below.

Quote
On the other hand, with absolute, if you hover over any of OptionalHeader, OptionalHeader32 or OptionalHeader64, you'll notice the address for all three of them is the _same_.  When debugging, you don't even need to look at the definition to see they are the same thing.  (anyone who knows the format of the PE file would have suspected that anyway.)

That's a canard, since you're talking about the behaviour of the IDE while trying to argue details of the fundamental language.

Pascal has (I believe) supported untagged variant records as a way of overlaying different data types ab initio. There is no need to augment them by pulling in "absolute", which was rooted in the CP/M and DOS era (and is obviously also applicable to embedded systems).
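For readers who have not met the construct, here is a minimal sketch of an untagged variant record used as an overlay. The record name THeaderOverlay is hypothetical; the field names Raw, H32 and H64 are the ones mentioned earlier in the thread, and the field types are assumptions chosen only to make the overlap visible.

```pascal
program OverlayDemo;
{$mode objfpc}{$H+}

type
  { Untagged variant record: every variant starts at offset 0, so the
    three fields are three views of the same eight bytes. }
  THeaderOverlay = packed record
    case Integer of
      0: (Raw: array[0..7] of Byte);
      1: (H32: DWord);
      2: (H64: QWord);
  end;

var
  V: THeaderOverlay;

begin
  V.H64 := 0;       { clear all eight bytes through the widest view }
  V.Raw[0] := 1;    { write one byte through the raw view }

  { The record is as large as its largest variant. }
  WriteLn('SizeOf        = ', SizeOf(V));

  { All three fields share one address. }
  WriteLn('same address  = ',
    (Pointer(@V.Raw) = Pointer(@V.H32)) and
    (Pointer(@V.H32) = Pointer(@V.H64)));

  { The write through Raw is visible through H32 (the exact value
    depends on the byte order of the target). }
  WriteLn('H32 non-zero  = ', V.H32 <> 0);
end.
```

Nothing in the use sites (V.Raw, V.H32, V.H64) says the fields overlap; that information lives only in the type definition, which is the point both sides of the argument keep returning to.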

I do obviously concede that there are areas in (FPC's implementation of) Pascal where misapplication of type checking and conversion inside a statement causes problems (e.g. somebody's issue of a few days ago using fields and indexed properties together). The addition of yet more hacks to "absolute" is no solution.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Can Absolute be used on string / packed record
« Reply #46 on: June 30, 2022, 01:02:58 pm »
On the other hand, with absolute, if you hover over any of OptionalHeader, OptionalHeader32 or OptionalHeader64, you'll notice the address for all three of them is the _same_.  When debugging, you don't even need to look at the definition to see they are the same thing.  (anyone who knows the format of the PE file would have suspected that anyway.)
Most of your time writing code is spent in the code, not in the debugger, so the link should be obvious from the code rather than from the debugger. And if you just look at a piece of code where in one place you use OptionalHeader32 and in another place you use OptionalHeader64, you don't see that they belong together.
But if the 3-character short names are not enough for you, you can give them more meaningful names like StructureOn32Bit and StructureOn64Bit. In fact, with code completion and all the other features, it doesn't even cost you more time to write them out.


And you better have good aim to hover over a 3 character identifier.
A small piece of advice: if you have problems aiming at small pieces of code, increase the font size or slow down your cursor. I have rather bad eyes so I am using font size 12. I never had any aiming problems with that.

On the contrary, as I mentioned above, you'll see the address is the same for all three of them showing they are the same thing _without_ having to even look at the definition (unlike with the variant.)
In the debugger, as stated above, with clear naming you don't even need the debugger to know this.

The real problem with that definition is that it mixes Apples and Oranges.  There should be _one_ and only _one_ definition per Windows version instead of one definition that attempts to cover all versions.

It's not horrible, it's the right way.  One definition per Windows version.  That way there can be no confusion.  The only time a definition between Windows versions should be shared is if more than one version use _exactly_ the _same_ structure.  Otherwise, there should be distinct definitions (Apples to Apples and Oranges to Oranges.)
Are these really apples and oranges? By my count, 19 of the fields are identical, while for the largest configuration (the Windows 8 union) only 7 fields are specific to that version. So this configuration is, compared to Windows 7, about 3/4 the same and 1/4 different. Those aren't apples and oranges; those are just different flavors of apples.

And that is exactly how you end up with the atrocity I posted earlier.  One small change here... one small change there... over time that mess accumulates.
Yes, and with absolute it's one small change here -> a new copy of the whole record with dozens of entries; another change over there, another whole copy. Not to mention that this can grow combinatorially: if a new version introduces two options for a setting, you now have 2 copies; with a second such setting you get 4 copies, and with a third, 8. With variant records you always have exactly one variant record of a few lines.
Having 8 copies of something that is 75% the same is no less of a mess than having one variant record for it.

The only way that could happen is if the mistake was made in a field that is shared among _all_ the different versions.  If such a mistake existed, it would have been caught when used for the first version it applied to.  IOW, extremely unlikely to spread over to other versions (but, admittedly, possible.)
Just from my personal experience, when implementing network protocols, I quite often mixed up some bit fields or encoded the length of a field wrong, and as those protocols usually also have different versions of each header, I have often encountered the problem of fixing something in one version and not the other. These are the worst kind of bugs, because you test one configuration very well and then in production you find a bug you think can't happen because you already fixed it, until you realize that the exact same code exists somewhere else, in a completely different place, and you forgot to fix it there.
And again, 75% of the fields are equal across versions, so if you have an error, the chance that it also exists in 7 other places is quite high.
« Last Edit: June 30, 2022, 01:20:47 pm by Warfley »

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Can Absolute be used on string / packed record
« Reply #47 on: June 30, 2022, 01:18:16 pm »
Just from my personal experience, when implementing network protocols, I quite often mixed up some bit fields or encoded the length of a field wrong, and as those protocols usually also have different versions of each header, I have often encountered the problem of fixing something in one version and not the other.

I wrote a microkernel+network that ran in '286 protected mode or a Z80 (both bare-metal), or on a PC under DOS: this necessitated multiple compilers from (at least) two different vendors, and leaving aside any variation of word length there was an inconsistency in their bitset numbering and different directive/pragma formatting. I have also, recently, been looking at some crypto stuff which was endian-sensitive using FPC. I feel that combination allows me to speak with a level of experience.

I was able to cope with all of the above using untagged variant records and conditional compilation, with almost everything done at the point of definition (the exception being a couple of macros to sort out endianness, including "towards", which was defined as either "to" or "downto" as appropriate). And if "absolute" (or equivalent) was used, it was strictly in its original sense (i.e. "this variable represents something at a known point in the memory space"). Plus, of course, well-understood semantics for type transfers ("casts" etc.).
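A hedged reconstruction of what such a "towards" macro might look like in FPC; this is an assumption about the technique, not the original code. It relies on FPC's macro support ({$macro on}) and on the ENDIAN_LITTLE symbol that FPC predefines for little-endian targets. The names TWord32, MsbIndex and LsbIndex are hypothetical.

```pascal
program TowardsDemo;
{$mode objfpc}{$H+}
{$macro on}

{ Pick the loop direction and the index bounds once, at the point of
  definition, so the code below never mentions endianness again. }
{$ifdef ENDIAN_LITTLE}
  {$define towards:=downto}
const
  MsbIndex = 3;   { most significant byte sits at the highest address }
  LsbIndex = 0;
{$else}
  {$define towards:=to}
const
  MsbIndex = 0;   { most significant byte sits at the lowest address }
  LsbIndex = 3;
{$endif}

type
  { Untagged variant record overlaying a 32-bit value with its bytes. }
  TWord32 = packed record
    case Integer of
      0: (Value: DWord);
      1: (Bytes: array[0..3] of Byte);
  end;

var
  W: TWord32;
  i: Integer;

begin
  W.Value := $11223344;
  { Walk from the most significant byte to the least significant one,
    whatever the byte order of the target. }
  for i := MsbIndex towards LsbIndex do
    Write(HexStr(W.Bytes[i], 2), ' ');
  WriteLn;
end.
```

The variant record does the overlaying and the conditional block does the platform selection, so no "absolute" is needed for this kind of work.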

MarkMLl

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Can Absolute be used on string / packed record
« Reply #48 on: June 30, 2022, 02:03:17 pm »
What I find interesting is this generating an internal error!

Code: Pascal
  procedure TForm1.Button1Click(Sender: TObject);
  var
    S: string[100];
    A: Char absolute S[1];
  begin
    //
  end;

S is a short string so it should allow this but I get an internal compiler error.

Please report as a bug with a self contained example (without LCL dependencies). Whether this should be allowed or not is up to discussion, but an internal error definitely shouldn't happen.

Done.

Thank you. Next time please also mention the number of the internal error.

alpine

  • Hero Member
  • *****
  • Posts: 1038
Re: Can Absolute be used on string / packed record
« Reply #49 on: June 30, 2022, 03:08:33 pm »
The discussion shifted from "What is a proper way to parse tabulated text data..." into "How to define in FPC a proper Windows kernel loader structure...". (proper in the eye of the certain beholder).
It can be observed, also from another topic https://forum.lazarus.freepascal.org/index.php/topic,59758.0.html, that one  wants to get involved with the ntoskernel with whatever intention it might be.
IMHO that should be left that way from a certain point on.

BTW, I remember a word from a Welsh language that I've learnt in that forum - twp.
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Can Absolute be used on string / packed record
« Reply #50 on: June 30, 2022, 03:58:18 pm »
The discussion shifted from "What is a proper way to parse tabulated text data..." into "How to define in FPC a proper Windows kernel loader structure...". (proper in the eye of the certain beholder).
It can be observed, also from another topic https://forum.lazarus.freepascal.org/index.php/topic,59758.0.html, that one  wants to get involved with the ntoskernel with whatever intention it might be.
IMHO that should be left that way from a certain point on.

BTW, I remember a word from a Welsh language that I've learnt in that forum - twp.

Interesting observation, particularly since this thread was subverted around message #15.

However is it accurate to say that data structures which use a length as their "magic number" are peculiar to the low-level NT kernel or loader, as distinct from the app-level Windows API family?

MarkMLl

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Can Absolute be used on string / packed record
« Reply #51 on: June 30, 2022, 07:48:51 pm »
That's a canard, since you're talking about the behaviour of the IDE while trying to argue details of the fundamental language.
Hardly!

Just because a variant can be used to implement a deficient solution (as is the case with the optional header) doesn't mean it's the way it should be done.  Let's look at a very similar case that really shows how inadequate using variants would be. Consider this:

Code: Pascal
  type
    PIMAGE_THUNK_DATA64 = ^TIMAGE_THUNK_DATA64;
    TIMAGE_THUNK_DATA64 = packed record
      case integer of
        0 : (ForwarderString : qword);    { pchar                 }
        1 : (Ordinal         : qword);
        2 : (AddressOfData   : qword);    { PIMAGE_IMPORT_BY_NAME }
        3 : (Entry           : qword);    { for generic access    }
    end;

  type
    PIMAGE_THUNK_DATA32 = ^TIMAGE_THUNK_DATA32;
    TIMAGE_THUNK_DATA32 = packed record
      case integer of
        0 : (ForwarderString : DWORD);    { pchar                 }
        1 : (Ordinal         : DWORD);
        2 : (AddressOfData   : DWORD);    { PIMAGE_IMPORT_BY_NAME }
        3 : (Entry           : DWORD);    { for generic access    }
    end;

By the way, those are not made up structures.  They are part of the PE specification.

If someone tried to "join" these two structures into a variant, the resulting structure would obviously require superfluous "tags" (such as "Raw", "H32", "H64") which only contribute to cluttering the definitions, but it wouldn't stop there.  When inspecting that structure in the debugger (what you characterized as a "canard") there would be 8 fields, instead of 4 and, the programmer would have to mentally weed out the invalid data.  Much simpler and _cleaner_ to have the standalone definitions above and
Code: Pascal
  var
    Thunk        : pointer = nil;
    Thunk32      : PIMAGE_THUNK_DATA32 absolute Thunk;
    Thunk64      : PIMAGE_THUNK_DATA64 absolute Thunk;

With that definition, the debugger isn't going to show data that doesn't apply (32-bit interpretations for a 64-bit structure or vice versa).

A canard ? ... hardly!

The addition of yet more hacks to "absolute" is no solution.
What I just showed above is not a hack.  It's the clean and simple way of doing it.  Using variants in that case, while not a hack, would be a very deficient implementation with undesirable consequences.
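A small runnable sketch of the aliasing described here, under stated assumptions: the two thunk records are trimmed to their generic Entry field to keep the example short, and Sample is hypothetical test data standing in for a thunk read from a PE image.

```pascal
program AbsoluteThunkDemo;
{$mode objfpc}{$H+}

type
  { Trimmed stand-ins for the thunk records quoted above: only the
    generic Entry field is kept. }
  PIMAGE_THUNK_DATA32 = ^TIMAGE_THUNK_DATA32;
  TIMAGE_THUNK_DATA32 = packed record
    Entry: DWord;
  end;

  PIMAGE_THUNK_DATA64 = ^TIMAGE_THUNK_DATA64;
  TIMAGE_THUNK_DATA64 = packed record
    Entry: QWord;
  end;

var
  Sample : QWord = $1122334455667788;   { hypothetical test data }
  Thunk  : Pointer = nil;
  { Both typed pointers occupy the same storage as Thunk. }
  Thunk32: PIMAGE_THUNK_DATA32 absolute Thunk;
  Thunk64: PIMAGE_THUNK_DATA64 absolute Thunk;

begin
  Thunk := @Sample;

  { One pointer variable, three names. }
  WriteLn('same variable: ',
    (Pointer(@Thunk32) = Pointer(@Thunk)) and
    (Pointer(@Thunk64) = Pointer(@Thunk)));

  { Assigning Thunk therefore updated both typed views. }
  WriteLn('same target:   ', Pointer(Thunk32) = Pointer(Thunk64));

  { Each view interprets the target with its own record type. }
  WriteLn('64-bit view:   $', HexStr(Thunk64^.Entry, 16));
  WriteLn('32-bit view:   $', HexStr(QWord(Thunk32^.Entry), 8));
end.
```

The 32-bit view shows whichever half of the sample value sits at the lowest address, so its output depends on the byte order of the target.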



Most of the time writing code you won't spend in the debugger but in your code, the link should be obvious from the code, not the debugger. And if you just look at a piece of code where in one place you use OptionalHeader32 and in another place you use OptionalHeader64, you don't see that they belong together.
But if the 3 char short names are not enough for you, you can give them better names like StructureOn32Bit and StructureOn64Bit. You can give meaningful names. In fact with code completion and all the other features it doesn't even cost you more time to write it out.
I use a debugger not just to debug code.  I use it as a testing tool and, writing a program that is easy to debug, is _always_ one of the driving considerations in how I choose to write the code.  I likely spend about 30% of the time I spend editing, testing code in the debugger.

As far as OptionalHeader32 and OptionalHeader64 "not belonging together" as you put it, anyone who has basic knowledge of the PE format _knows_ they belong together and, while I have no problem hovering over a 3 character identifier, it is definitely much easier to place the cursor over OptionalHeader32/64 than it is to place it over "Hxx".  On that note, I sometimes name indexes using 3 characters instead of just "i" because it is easier to put the cursor over them (and also because 3 characters is usually sufficient to indicate what the index is being used for - instead of a generic "i")

A small piece of advice: if you have problems aiming at small pieces of code, increase the font size or slow down your cursor. I have rather bad eyes so I am using font size 12. I never had any aiming problems with that.
As stated above, it's definitely easier to hover over a long identifier than a short one.  What _really_ needs to be increased is not the font size but the ease of maintaining and debugging a program and, "absolute" can be very helpful in the pursuit of that goal.

In the debugger, as stated above, with clear naming you don't even need the debugger to know this.
Clear naming or otherwise, if you join those two THUNK definitions into a variant, you'll end up with a soup of data and superfluous tags to make the matter even worse.

Are these really apple and oranges? Because after counting I found that 19 of the fields are identical, while for the largest configuration (windows 8 union) only 7 fields are specific to that version. So this configuration is, compared to windows 7 about 3/4th the same and 1/4th different. Those aren't apple and oranges, those are just different flavors of apples
Yes, they are Apples and Oranges because it's not the fields they have in common that matter, it is the differences between versions.  Mixing those fields that don't belong together into a variant makes it much more likely that a field that doesn't apply be used, not to mention that the fields they have in common may not be at the same offsets, which is a critical difference.  Just because the field(s) has/have the same name, doesn't mean they belong there (because of the different offsets.)   Mixing things like that is asking for trouble and making things complicated.

Yes, and with absolute it's one small change here -> a new copy of the whole record with dozens of entries; another change over there, another whole copy. Not to mention that this can grow combinatorially: if a new version introduces two options for a setting, you now have 2 copies; with a second such setting you get 4 copies, and with a third, 8. With variant records you always have exactly one variant record of a few lines.
Having 8 copies of something that is 75% the same is no less of a mess than having one variant record for it.
Sounds like you're getting carried away on that one.  It's simple, the worst case is: one definition per Windows version and only when the definitions differ.   It is only in rare cases that there will be more than 4 or 5 definitions.

And about bugs, mixing fields that apply to one version with fields that only apply to a different version in the same structure (using variants of course) is literally asking for bugs.  With separate definitions, the compiler will flag any attempt at using a field that does not apply to the version, that alone is reason enough to have separate definitions.  It is irrelevant that 75% of the fields may be the same, what's relevant is that, in that case, there is 1 chance in 4, to use the _wrong_ field and, opening that door is simply bad design.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Can Absolute be used on string / packed record
« Reply #52 on: June 30, 2022, 08:47:32 pm »
When inspecting that structure in the debugger (what you characterized as a "canard") there would be 8 fields, instead of 4 and, the programmer would have to mentally weed out the invalid data.  Much simpler and _cleaner_ to have the standalone definitions above and
But you see them categorized by each layout:
Code: Pascal
  type
    TTestA = record
      A1: Integer;
      A2: Double;
      A3: Char;
    end;

    TTestB = record
      B1: Int64;
      B2: Single;
      B3: Char;
    end;

    TTest = record
    case Boolean of
      True: (A: TTestA);
      False: (B: TTestB);
    end;
Hovering over it gives
Code: Text
  t = record TTEST {
    A = {
      A1 = 0,
      A2 = 0,
      A3 = 0 #0},
    B = {
      B1 = 0,
      B2 = 0,
      B3 = 0 #0}}
So for the programmer, "mentally weed out the invalid data" means looking at the subsection that is relevant. It is neatly ordered and makes it very easy to spot what you are looking for. But if that's too hard, simply hover over the respective member, for example over t.A rather than over t:
Code: Pascal
  t.A = record TTESTA {
    A1 = 0,
    A2 = 0,
    A3 = 0 #0}
Oh look, it's only the relevant information: the same information your absolute would give.
It gives you all the information in one place, and if you only need a certain subset you can easily look at it by addressing it.

Btw, just as a side note, using a variant record for pointers to data is imho kinda pointless, you should instead of having a variant record with two different pointers, have a pointer to a variant record containing the data. This way you skip one indirection step

I use a debugger not just to debug code.  I use it as a testing tool and, writing a program that is easy to debug, is _always_ one of the driving considerations in how I choose to write the code.  I likely spend about 30% of the time I spend editing, testing code in the debugger.
I mean this is nice if it's possible, but not possible for all projects, for example, if you are on a project that takes long to compile, like 10 minutes (the delta compilation, not even the full project), you are going to think twice if you are starting a debugger session just for something small. Sure I must say that I had this quite infrequently with pascal, only when I was hacking around in the compiler/RTL it took ages to compile even for small changes, but if you have complicated buildsystems you think twice about any time you want to run it.

As far as OptionalHeader32 and OptionalHeader64 "not belonging together" as you put it, anyone who has basic knowledge of the PE format _knows_ they belong together and, while I have no problem hovering over a 3 character identifier, it is definitely much easier to place the cursor over OptionalHeader32/64 than it is to place it over "Hxx".  On that note, I sometimes name indexes using 3 characters instead of just "i" because it is easier to put the cursor over them (and also because 3 characters is usually sufficient to indicate what the index is being used for - instead of a generic "i")
I don't know what kind of cursor aiming issues you have. I have absolutely no problem hovering over a 3 char identifier. Also I love your "anyone who has basic knowledge of the PE format _knows_ they belong together", you should not write code for people who are already familiar with everything, you should write code such that it is easy to understand even for someone having never worked with anything similar before. Those who already know a lot won't suffer from the additional information provided by the code, but those who do not know much about it will be helped tremendously by it.

Clear naming or otherwise, if you join those two THUNK definitions into a variant, you'll end up with a soup of data and superfluous tags to make the matter even worse.
I hope you don't mean with "soup of data and superfluous tags" a clearly structured data serialization as the one shown above where each member of the variant record is neatly organized, because by soup I don't think about clearly structured and logically ordered as this clearly is.

Yes, they are Apples and Oranges because it's not the fields they have in common that matter, it is the differences between versions.  Mixing those fields that don't belong together into a variant makes it much more likely that a field that doesn't apply be used, not to mention that the fields they have in common may not be at the same offsets, which is a critical difference.  Just because the field(s) has/have the same name, doesn't mean they belong there (because of the different offsets.)   Mixing things like that is asking for trouble and making things complicated.
That's why you give your fields meaningful names and a hierarchical structure using variant records. Let's take your example: the closest thing between two configurations I found was StaticLinks for Vista and VistaStaticLinks for Win7. Sure, at face value they are easy to mistake, but if you write it out (I took the liberty of giving it some meaningful names):
Code: Pascal
  OptionalHeader.PE64Header.VersionSpecifics.Vista.StaticLinks
  vs
  OptionalHeader.PE64Header.VersionSpecifics.Win7.VistaStaticLinks
It's very hard to confuse, you literally have the current version you are talking about written right there in the code. So if you want to know which version you are currently looking at, that's easy: it is a PE64 for Windows 7. All this information is right there in the code, no debugger needed. You can't get confused there.

Well structured code speaks for itself

Sounds like you're getting carried away on that one.  It's simple, the worst case is: one definition per Windows version and only when the definitions differ.   It is only in rare cases that there will be more than 4 or 5 definitions.
You stated that you already have 8 versions, and from my experience, if there are 8 of something, it is probably going to be 9 at some point and then 10 etc.

And about bugs, mixing fields that apply to one version with fields that only apply to a different version in the same structure (using variants of course) is literally asking for bugs.  With separate definitions, the compiler will flag any attempt at using a field that does not apply to the version, that alone is reason enough to have separate definitions.  It is irrelevant that 75% of the fields may be the same, what's relevant is that, in that case, there is 1 chance in 4, to use the _wrong_ field and, opening that door is simply bad design.
What are you talking about? The code is clear, let's say I'm handling a Windows Vista PE and have the following code:
Code: Pascal
  OptionalHeader.PE64Header.VersionSpecifics.Vista.VistaStaticLinks
The compiler will tell me that there is no VistaStaticLinks in Vista, because it is defined only for Win7. You get the exact same error checking, as you are encapsulating your data and the Vista fields are only accessible through the Vista member, while the Win7 fields are clearly separated in their Win7 member.
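A minimal sketch of the nested layout being described, assuming a much smaller structure than the real one. The type names and the fields SharedField and ExtraFlags are hypothetical; only StaticLinks, VistaStaticLinks, VersionSpecifics, Vista and Win7 come from the thread.

```pascal
program VersionedLayoutDemo;
{$mode objfpc}{$H+}

type
  { Fields that exist only in the Vista layout. }
  TVistaFields = packed record
    StaticLinks: DWord;
  end;

  { Fields that exist only in the Win7 layout. }
  TWin7Fields = packed record
    VistaStaticLinks: DWord;
    ExtraFlags: DWord;            { hypothetical }
  end;

  { The version-specific tail, one member per Windows version. }
  TVersionSpecifics = packed record
    case Integer of
      0: (Vista: TVistaFields);
      1: (Win7: TWin7Fields);
  end;

  TLoaderEntry = packed record
    SharedField: DWord;           { hypothetical common field }
    VersionSpecifics: TVersionSpecifics;
  end;

var
  E: TLoaderEntry;

begin
  FillChar(E, SizeOf(E), 0);
  E.VersionSpecifics.Win7.VistaStaticLinks := 42;

  { The next line is rejected at compile time, because TVistaFields
    has no field of that name:
      E.VersionSpecifics.Vista.VistaStaticLinks := 1; }

  { This one compiles: both members overlay the same bytes, so reading
    through the Vista member shows what was written through Win7. }
  WriteLn(E.VersionSpecifics.Vista.StaticLinks);   { prints 42 }
end.
```

The sketch shows both halves of the disagreement: a field name that does not exist for a version is caught by the compiler, while picking the wrong version member with a valid field name is not.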

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Can Absolute be used on string / packed record
« Reply #53 on: June 30, 2022, 09:41:08 pm »
I mean this is nice if it's possible, but not possible for all projects, for example, if you are on a project that takes long to compile, like 10 minutes (the delta compilation, not even the full project), you are going to think twice if you are starting a debugger session just for something small.

 Sure I must say that I had this quite infrequently with pascal, only when I was hacking around in the compiler/RTL it took ages to compile even for small changes, but if you have complicated buildsystems you think twice about any time you want to run it.
You must be talking about a C or C++ compilation.  I don't recall exactly but, I think that even FPCUPDELUXE doesn't spend 10 minutes compiling stuff and, it's doing a lot more than just one project.

I've got one personal project that, a full build requires compiling a little over 350,000 lines and, FPC does it in less than 2 seconds. A make is probably around 1/10 of a second (and I use a below average machine, 2.8 GHz to be exact, a far cry from the 4 to 5 GHz typical of today's machines.)

That said, I use plenty of whitespace when formatting code.  Written the way I see most code written would probably lower that count to a little under 100,000 lines but, still, it's less than 2 seconds.  (Delphi 2 compiles it in eight tenths of a second but, it has "unfair" advantages over FPC.)

I don't know what kind of cursor aiming issues you have. I have absolutely no problem hovering over a 3 char identifier.
I have no issues hovering over a 3 character identifier except when that identifier is superfluous and shouldn't exist if things had been done right.  I have an issue with that.

Also I love your "anyone who has basic knowledge of the PE format _knows_ they belong together", you should not write code for people who are already familiar with everything, you should write code such that it is easy to understand even for someone having never worked with anything similar before.
I write the code for someone who may not have worked with anything similar before but, I also write the code for someone who takes the time to educate themselves on the subject (PE file format in this case) and someone who reads the code.  I don't write code for someone who expects to magically understand code for something they've never seen before and have not even taken the time to read the most basic information on the subject.


Code: Pascal
  OptionalHeader.PE64Header.VersionSpecifics.Vista.StaticLinks
  vs
  OptionalHeader.PE64Header.VersionSpecifics.Win7.VistaStaticLinks

It's very hard to confuse, you literally have the current version you are talking about written right there in the code.
No, it isn't because instead of writing code that is specific to one version, you're using a "one size fits all" variant to write code that is _hopefully_ correct for the specific version you're addressing at the time and, you are foregoing any help the compiler could provide identifying the use of a field that doesn't belong.  You can inadvertently type ThisThat.WindowsVista.MyField instead of ThisThat.Win7.MyField and, using variants the compiler cannot flag that mistake.   Poor design!


You stated that you already have 8 versions, and from my experience, if there are 8 of something, it is probably going to be 9 at some point and then 10 etc.
Yes, and I also said that is a rare occurrence.  Very convenient for you to omit that part.


Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Can Absolute be used on string / packed record
« Reply #54 on: June 30, 2022, 10:36:04 pm »
You must be talking about a C or C++ compilation.  I don't recall exactly but, I think that even FPCUPDELUXE doesn't spend 10 minutes compiling stuff and, it's doing a lot more than just one project.

I've got one personal project that, a full build requires compiling a little over 350,000 lines and, FPC does it in less than 2 seconds. A make is probably around 1/10 of a second (and I use a below average machine, 2.8 GHz to be exact, a far cry from the 4 to 5 GHz typical of today's machines.)

That said, I use plenty of whitespace when formatting code.  Written the way I see most code written would probably lower that count to a little under 100,000 lines but, still, it's less than 2 seconds.  (Delphi 2 compiles it in eight tenths of a second but, it has "unfair" advantages over FPC.)
Some C++, some Java, but you are right that FPC is quite fast by comparison. That said, when I hacked around with FPC and the RTL itself, it took quite some time to compile. I don't remember the exact time anymore, but it was long enough to get annoying.

No, it isn't because instead of writing code that is specific to one version, you're using a "one size fits all" variant to write code that is _hopefully_ correct for the specific version you're addressing at the time and, you are foregoing any help the compiler could provide identifying the use of a field that doesn't belong.  You can inadvertently type ThisThat.WindowsVista.MyField instead of ThisThat.Win7.MyField and, using variants the compiler cannot flag that mistake.   Poor design!
The distance between those two examples you brought up is 9; that is a gigantic mistake to make. It would be like writing "while ... do" instead of "if ... then" (except that this distance is just 8). You would have to not be looking at your screen to make such mistakes, or at least to make them and not find them immediately. In particular, at that distance you could just as well be addressing a completely different field, in which case the compiler can't help you either.
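The "distance" figure can be checked mechanically. The sketch below assumes it refers to plain Levenshtein edit distance (single-character insertions, deletions and substitutions, each costing 1); the thread does not name the metric, so that is an assumption, and the function name Levenshtein is simply a label for this example.

```pascal
program EditDistanceDemo;
{$mode objfpc}{$H+}

uses
  Math;

{ Classic dynamic-programming Levenshtein distance, keeping only two
  rows of the table at a time. }
function Levenshtein(const A, B: string): Integer;
var
  Prev, Curr, Tmp: array of Integer;
  i, j, Cost: Integer;
begin
  SetLength(Prev, Length(B) + 1);
  SetLength(Curr, Length(B) + 1);

  { Turning the empty prefix of A into B[1..j] takes j insertions. }
  for j := 0 to Length(B) do
    Prev[j] := j;

  for i := 1 to Length(A) do
  begin
    Curr[0] := i;   { turning A[1..i] into the empty string: i deletions }
    for j := 1 to Length(B) do
    begin
      if A[i] = B[j] then
        Cost := 0
      else
        Cost := 1;
      Curr[j] := Min(Min(Prev[j] + 1,        { delete A[i]  }
                         Curr[j - 1] + 1),   { insert B[j]  }
                     Prev[j - 1] + Cost);    { match or substitute }
    end;
    { Swap the rows; dynamic arrays are references, so this is cheap. }
    Tmp := Prev;
    Prev := Curr;
    Curr := Tmp;
  end;

  Result := Prev[Length(B)];
end;

begin
  { The two identifiers from the quoted example. }
  WriteLn(Levenshtein('ThisThat.WindowsVista.MyField',
                      'ThisThat.Win7.MyField'));   { prints 9 }
end.
```

Under that metric the two identifiers share the prefix "ThisThat.Win" and the suffix ".MyField", leaving "dowsVista" against "7": one substitution plus eight deletions.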

You are making up a problem here that doesn't exist. When your argument is "what if I completely mistype a whole identifier that does not even look similar to the identifier I want to use", then you had better not use any identifiers at all, if that is a concern you have.

So to recap, using variant records is less code, as you don't have to copy 75% of your codebase 8 times, but only write down the differences. It is therefore also less error prone, because you can make every error exactly once, and not multiple times in the exact same code. And your argument for why it is bad is that, on usage, mistakenly using a completely different identifier can lead to a problem... Not the strongest argument.

Yes, and I also said that is a rare occurrence.  Very convenient for you to omit that part.
I left that out because we were talking about this specific example, so what does rarity mean? If you have 8 versions, you have 8 versions for the example this whole discussion revolves around. But even if it's just 4-5, it would be equally bad. Having code 4 times that is 75% identical is bad.


PS: I know that we will never agree on anything. I remember a few years ago a discussion about OOP, where I literally showed you controlled studies showing that OOP improves readability and maintainability of code, which btw is still the scientific consensus on the issue, and yet you said that you don't believe that because of your personal experience, without any evidence. I therefore know that talking with you about code quality is really pointless. You believe what you believe and no amount of arguing or even empirical measurements in controlled experiments will ever change your opinion. Therefore I will leave it at this, and just hope that I never have to use any program you have written or have to work with any of your code.

alpine

  • Hero Member
  • *****
  • Posts: 1038
Re: Can Absolute be used on string / packed record
« Reply #55 on: July 01, 2022, 12:11:36 am »
@andresayang
Encouraged by the long posts that the participants continue to make, I allowed myself to write something on the original topic. Keeping in mind the original idea of dividing the record into separate fields with absolute (which turned out to be unfeasible because of the short strings, managed strings, etc.), I've managed to lay out some variant records with a wisp of generic flavor, and the result is pretty much acceptable.

Look at the attachment.

Regards,
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Can Absolute be used on string / packed record
« Reply #56 on: July 01, 2022, 12:38:28 am »
The distance between those two examples you brought up is 9. That is a gigantic mistake to make; it would be like writing "while ... do" instead of "if ... then"
<snip>
"what if I completely mistype a whole identifier that does not even look similar to the identifier I want to use"
Sometimes I really wonder if people who make statements like that actually program.  I personally have typed, and have seen other programmers type, something (variable names included) that was incorrect but related in some way, because of getting their wires crossed.  The bottom line is: a well defined structure means the compiler can help catch those mental lapses (which are common in everyone!)


So to recap, using variant records is less code,
Essentially, you're claiming that

Code: Pascal
var
  Data: record
    case Integer of
      0: (OptionalHeader          : PIMAGE_OPTIONAL_HEADER);
      1: (OptionalHeader32        : PIMAGE_OPTIONAL_HEADER32);
      2: (OptionalHeader64        : PIMAGE_OPTIONAL_HEADER64);
    end;

Code: Pascal
var
  OptionalHeader          : PIMAGE_OPTIONAL_HEADER   = nil;
  OptionalHeader32        : PIMAGE_OPTIONAL_HEADER32 absolute OptionalHeader;
  OptionalHeader64        : PIMAGE_OPTIONAL_HEADER64 absolute OptionalHeader;
the first structure is simpler than the second one.  It is visually obvious that is _not_ the case.
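
For readers who want to try both forms side by side, here is a small self-contained sketch. It does not use the PE header types from the posts; TOverlay, V, D and B are made-up names, and the only point is to show that a variant record and "absolute" both end up with two views of one storage location.

```pascal
program OverlayDemo;
{$mode objfpc}

type
  // Variant record: both fields share the same storage.
  TOverlay = record
    case Integer of
      0: (AsDWord: LongWord);
      1: (AsBytes: array[0..3] of Byte);
  end;

var
  V: TOverlay;
  D: LongWord = 0;
  // "absolute" places B at the address of D.
  B: array[0..3] of Byte absolute D;

function VariantFieldsOverlap: Boolean;
begin
  Result := Pointer(@V.AsDWord) = Pointer(@V.AsBytes);
end;

function AbsoluteVarsOverlap: Boolean;
begin
  Result := Pointer(@D) = Pointer(@B);
end;

begin
  V.AsDWord := $01020304;
  D := $01020304;
  WriteLn('variant fields share an address: ', VariantFieldsOverlap);
  WriteLn('absolute vars share an address:  ', AbsoluteVarsOverlap);
  // Same bytes either way (which byte comes first depends on endianness).
  WriteLn('first byte identical:            ', V.AsBytes[0] = B[0]);
end.
```

All three comparisons should hold, so the difference between the two forms is one of notation and of how the overlay is accessed (V.AsBytes versus plain B), not of memory layout.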




you don't have to copy 75% of your codebase 8 times, but only write down the differences.
Good rule to make a mess. Let's see how applying that rule would work.  Here is the optional header (32bit)
Code: C
typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

and the optional header 64bit:

Code: C
typedef struct _IMAGE_OPTIONAL_HEADER64 {
  WORD        Magic;
  BYTE        MajorLinkerVersion;
  BYTE        MinorLinkerVersion;
  DWORD       SizeOfCode;
  DWORD       SizeOfInitializedData;
  DWORD       SizeOfUninitializedData;
  DWORD       AddressOfEntryPoint;
  DWORD       BaseOfCode;
  ULONGLONG   ImageBase;
  DWORD       SectionAlignment;
  DWORD       FileAlignment;
  WORD        MajorOperatingSystemVersion;
  WORD        MinorOperatingSystemVersion;
  WORD        MajorImageVersion;
  WORD        MinorImageVersion;
  WORD        MajorSubsystemVersion;
  WORD        MinorSubsystemVersion;
  DWORD       Win32VersionValue;
  DWORD       SizeOfImage;
  DWORD       SizeOfHeaders;
  DWORD       CheckSum;
  WORD        Subsystem;
  WORD        DllCharacteristics;
  ULONGLONG   SizeOfStackReserve;
  ULONGLONG   SizeOfStackCommit;
  ULONGLONG   SizeOfHeapReserve;
  ULONGLONG   SizeOfHeapCommit;
  DWORD       LoaderFlags;
  DWORD       NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;

It is definitely possible to "join" these two structures into a single one using variants/unions (according to you, it should be done because they certainly have a lot of fields in common) but, doing that would result in a structure that is hard to understand, hard to visually parse and, easy to misuse any one of the fields.  Good for MS for not falling into that trap.
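
To make the discussion concrete, this is roughly what a "joined" declaration could look like in Pascal for just the first divergence between the two headers. It is only a sketch: TOptionalHeaderStart, ImageBase32 and ImageBase64 are hypothetical names that do not exist in the Windows headers, and everything after ImageBase is left out.

```pascal
program MergedHeaderSketch;
{$mode objfpc}

type
  // Hypothetical merged type covering only the start of both headers.
  TOptionalHeaderStart = packed record
    Magic: Word;
    MajorLinkerVersion: Byte;
    MinorLinkerVersion: Byte;
    SizeOfCode: LongWord;
    SizeOfInitializedData: LongWord;
    SizeOfUninitializedData: LongWord;
    AddressOfEntryPoint: LongWord;
    BaseOfCode: LongWord;
    case Integer of
      32: (BaseOfData: LongWord;    // present in the 32-bit header only
           ImageBase32: LongWord);
      64: (ImageBase64: QWord);     // the 64-bit header has no BaseOfData
  end;

begin
  // 24 bytes of common fields plus 8 bytes for either variant.
  WriteLn(SizeOf(TOptionalHeaderStart));
end.
```

Two things are worth noting whichever side of the argument one takes. Field names inside a record must be unique, so the two ImageBase fields need different names. And since a Pascal record can only have its variant part at the end, the fields that are common again after ImageBase would have to be repeated in each variant or moved into nested records.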

It is therefore also less error prone because you can make every error exactly once, and not multiple times in the exact same code,
Looks like I need to point out that the fields those different structures have in common are handled by functions/procedures that, not only can be shared but, should be shared.  Therefore, there is no duplication of code.

your argument on why it is bad is that, on usage, if you mistakenly use completely different identifiers, this can lead to a problem... Not the strongest argument.
It really seems to be a very poor choice to create structures where the compiler cannot catch such simple errors. That's what compilers are for and, one of the best features of the Pascal language, its strong typing means it can catch those small, and apparently more common than some admit or think, errors.

so what does rarity mean?
rarely means that in most cases (well over 90%) one or two structures are sufficient.  In exceptional cases more and, in very exceptional cases as many as 8 (I think that LDR structure is the only case so far where it has required that many and, that's because I wanted to cover from Windows NT 3.1 all the way up to Windows 11.)

Having code 4 times that is 75% identical is bad.
I need to point out that having 4 versions of a data structure is _not_ the same as having 4 instances of code that are equal.  Code that handles structures that have fields whose _meaning_ are the same can easily share the code to handle them using shared functions and/or procedures.
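
A minimal sketch of that idea, with cut-down stand-in records instead of the real headers (THeader32, THeader64 and SubsystemName are hypothetical names; the subsystem values 1, 2 and 3 are the documented native, GUI and console values):

```pascal
program SharedFieldHandling;
{$mode objfpc}{$H+}

type
  // Cut-down, hypothetical stand-ins for the two header types.
  THeader32 = packed record
    ImageBase: LongWord;
    Subsystem: Word;
  end;
  THeader64 = packed record
    ImageBase: QWord;
    Subsystem: Word;
  end;

// One routine serves both headers because it takes the field, not the record.
function SubsystemName(Subsystem: Word): string;
begin
  case Subsystem of
    1: Result := 'native';
    2: Result := 'Windows GUI';
    3: Result := 'Windows console';
  else
    Result := 'other';
  end;
end;

var
  H32: THeader32;
  H64: THeader64;
begin
  H32.ImageBase := $00400000;
  H32.Subsystem := 2;
  H64.ImageBase := $0000000140000000;
  H64.Subsystem := 3;
  WriteLn(SubsystemName(H32.Subsystem));
  WriteLn(SubsystemName(H64.Subsystem));
end.
```

The record declarations are duplicated, but the code that interprets the common field exists only once.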


PS: I know that we will never agree on anything.
I don't know if I'd go as far as saying never but, yes, it's quite common that we don't agree and, so far, it doesn't look like that's going to change.

I remember a few years ago a discussion about OOP, where I literally showed you controlled studies showing that OOP improves readability and maintainability of code, which btw is still the scientific consensus on the issue,
I've always been impressed with Aristotle's principles of Physics and Astronomy that were "scientifically proved" for over 2000 years.  Fortunately, Galileo, Newton and Einstein didn't see it that way but, the guys who gave us the Inquisition (what a solid endorsement but, I give them credit for using rather effective tools to "control" their "studies") certainly bought lock, stock and barrel into all that "scientifically proven" stuff.  No wonder progress is slow.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

andresayang

  • Full Member
  • ***
  • Posts: 108
Re: Can Absolute be used on string / packed record
« Reply #57 on: July 01, 2022, 01:14:15 am »
Is that the file format? https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_sps_rev0.pdf

If you're aiming for speed, the best way to sort items incrementally is to use a heap data structure. Using TStringList requires first to add all items and then to switch to Sorted:=True. That will perform a QuickSort, which is fine, but all subsequent Inserts (in case of Sorted:=True) will need to move all of the pointers behind to make place for just one item. Thus using a heap will perform quicker - after you push all of the items then they will just come out sorted.

There is a heap implemented in FCL: TPriorityQueue<T, TCompare>. It needs to be specialized with appropriate TCompare class written, but it is not such a big deal.
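
To illustrate the idea without depending on the exact interface of the FCL class, here is a small hand-rolled binary min-heap on integers. It is a sketch only: TIntHeap, HeapPush and HeapPop are made-up names, and in real code the items would be records or line pointers compared by their time field.

```pascal
program HeapOrderDemo;
{$mode objfpc}{$H+}

type
  TIntHeap = record
    Items: array of Integer;
    Count: Integer;
  end;

// Insert Value, moving larger parents down until its slot is found.
procedure HeapPush(var H: TIntHeap; Value: Integer);
var
  i, Parent: Integer;
begin
  if H.Count = Length(H.Items) then
    SetLength(H.Items, H.Count * 2 + 4);
  i := H.Count;
  Inc(H.Count);
  while i > 0 do
  begin
    Parent := (i - 1) div 2;
    if H.Items[Parent] <= Value then
      Break;
    H.Items[i] := H.Items[Parent];
    i := Parent;
  end;
  H.Items[i] := Value;
end;

// Remove and return the smallest item (the caller must ensure Count > 0).
function HeapPop(var H: TIntHeap): Integer;
var
  i, Child, Last: Integer;
begin
  Result := H.Items[0];
  Dec(H.Count);
  Last := H.Items[H.Count];
  i := 0;
  // Move the former last item down from the root to restore heap order.
  while 2 * i + 1 < H.Count do
  begin
    Child := 2 * i + 1;
    if (Child + 1 < H.Count) and (H.Items[Child + 1] < H.Items[Child]) then
      Inc(Child);
    if Last <= H.Items[Child] then
      Break;
    H.Items[i] := H.Items[Child];
    i := Child;
  end;
  H.Items[i] := Last;
end;

var
  H: TIntHeap;
begin
  H.Items := nil;
  H.Count := 0;
  HeapPush(H, 5);
  HeapPush(H, 1);
  HeapPush(H, 4);
  HeapPush(H, 2);
  HeapPush(H, 3);
  // Items come out in ascending order.
  while H.Count > 0 do
    Write(HeapPop(H), ' ');
  WriteLn;
end.
```

Each push and pop costs O(log n), whereas an insert into an already sorted list has to shift every item behind the insertion point.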

Just let me know the format and then we can figure out the most efficient way to parse the files. It would be good to work with just PChars to avoid the String manager overhead.

Regards,

Yes, this is the format. I'm using a StringList so that I can use CustomSort on the time fields ... I do not use "Sorted:=True" much, as it is not really good when inserting strings. I prefer to insert all my strings, and then sort once.
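
A minimal sketch of that "add everything, then sort once" approach with TStringList.CustomSort. The line layout is an assumption made up for the example (a 6-character HHMMSS time at the start of each line), not the real file format, and CompareByTime is a hypothetical name.

```pascal
program SortOnceDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Assumed layout: the time field is the first 6 characters (HHMMSS),
// so a plain string comparison of that prefix orders the lines by time.
function CompareByTime(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareStr(Copy(List[Index1], 1, 6), Copy(List[Index2], 1, 6));
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Add everything first, with Sorted left at False...
    Lines.Add('235959 last');
    Lines.Add('000001 first');
    Lines.Add('120000 middle');
    // ...then sort once with the custom comparison.
    Lines.CustomSort(@CompareByTime);
    Write(Lines.Text);
  finally
    Lines.Free;
  end;
end.
```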

Quote
It would be good to work with just PChars to avoid the String manager overhead.

Yes, I think so. But I'm at work at the moment and have no time to search. The tool I wrote takes 11 seconds to process around 500 000 lines, so it is OK for now (I'll try to improve it later).

Thanks
Linux, Debian 12
Lazarus: always latest release

andresayang

  • Full Member
  • ***
  • Posts: 108
Re: Can Absolute be used on string / packed record
« Reply #58 on: July 01, 2022, 02:17:50 am »
@andresayang
Encouraged by the long posts that the participants continue to make, I allowed myself to write something on the original topic. Keeping in mind the original idea of dividing the record into separate fields with absolute (which turned out to be unfeasible because of the short strings, managed strings, etc.), I've managed to lay out some variant records with a wisp of generic flavor, and the result is pretty much acceptable.

Look at the attachment.

Regards,

Thanks a lot for the code.
In fact, I never guessed my question would go so far and so deep ...
The only thing I forgot at the start was that string[0] = short string length.
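
For anyone landing here later, a tiny sketch of that point (TRec and its field names are made up): a string[N] field occupies N + 1 bytes, because element 0 holds the current length, which is why overlaying such fields byte for byte does not line up the way one might first expect.

```pascal
program ShortStringLayout;
{$mode objfpc}

type
  // Hypothetical record with two short string fields.
  TRec = packed record
    Name: string[10];  // 1 length byte + 10 characters
    Code: string[3];   // 1 length byte + 3 characters
  end;

var
  R: TRec;
begin
  R.Name := 'abc';
  R.Code := 'xy';
  WriteLn(Ord(R.Name[0]));  // the length byte, same value as Length(R.Name)
  WriteLn(Length(R.Name));
  WriteLn(SizeOf(TRec));    // 11 + 4 bytes
end.
```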
« Last Edit: July 01, 2022, 02:21:45 am by andresayang »
Linux, Debian 12
Lazarus: always latest release

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Can Absolute be used on string / packed record
« Reply #59 on: July 01, 2022, 10:06:37 am »
I've always been impressed with Aristotle's principles of Physics and Astronomy that were "scientifically proved" for over 2000 years.  Fortunately, Galileo, Newton and Einstein didn't see it that way but, the guys who gave us the Inquisition (what a solid endorsement but, I give them credit for using rather effective tools to "control" their "studies") certainly bought lock, stock and barrel into all that "scientifically proven" stuff.  No wonder progress is slow.
As I stated, I am done arguing about variant records, as I think everything has been said and we won't agree on that. Just as a side note, the scientific method, which is what we call science today, has really only existed for around 300 years. There were no controlled studies or anything comparable before the 17th century. So there was no scientific proof for astronomy back then, because the thing we call science today did not exist before the 17th century.
That said, of course the explanations given by those papers can be wrong, but the measurements, e.g. the fact that code had fewer bugs when using OOP patterns rather than procedural patterns, are still empirical measurements, and any new explanation needs to explain those as well. So any new findings can only refine these findings; it is highly unlikely that all of those findings are wrong (a 5% confidence interval on the 6 or so studies I found back then gives a probability of 0,0000015% that they are all wrong).
This is the same with Newtonian gravity and Einsteinian relativity. Sure, Newton is not correct, we know that it is wrong for some cases, close to the speed of light or for massive objects, but it is still valid for anything human sized, so while not being correct, and proven to be wrong after a few hundred years, it is still correct enough to be useful.
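
For reference, the percentage quoted above can be reproduced as follows, under the simplifying assumptions that the six studies are independent and that each has a 5% chance of being wrong:

$$P(\text{all six wrong}) = 0.05^{6} = 1.5625 \times 10^{-8} \approx 0.0000016\,\%$$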

I personally try to enhance my coding practices according to such findings, e.g. after the thread about OOP, where I found that inheritance only improves code quality up to 4 levels of inheritance and is best at 2-3 levels, I changed my coding style to avoid any large inheritance chains if possible, where before I was very happy to use them.
Recently I've found out that functional programming reduces code size by a factor of 10 and development time (including testing and bug fixing) by a factor of 5-10, so I learned a few functional languages like Haskell and tried to incorporate some of the functional paradigms into my programming in other languages (e.g. in Pascal with the new features like generic functions, implicit specialization and function references).

I try to always find better ways to write code and when looking at my old code on GitHub I see how much my coding style has changed over the past years. It sometimes even scares me a bit that there are people that don't even want to change their coding style and seem to be completely uninterested in what the current state of the art is.
