Author Topic: OS-es, obfuscation and strings  (Read 2410 times)

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 635
OS-es, obfuscation and strings
« on: March 11, 2019, 01:27:43 am »
Operating systems are written in C. Well, long ago, they were written in assembler. But after that, C.

Many people think that's stupid, because, obviously C++ is a better choice. It's newer. It's designed to be a better C. So, it would be better to use that. Right?

Eh, no. C++ requires things like a memory manager and scheduler to be able to function. Or, in other words: an OS. And C++ is very broken.

There are very many courses, seminars and trainings that help you navigate the vast number of programming pitfalls the average programmer encounters daily. And they are all about C++. Because other programming languages rarely suffer from those problems.

There are many problems with C++. But the main ones are: speed over all, templates, void pointers, no finalizers, no modules and strings.

Speed: If anything that would make programming better and safer takes longer to execute, it is automatically disregarded. Programmers are expected to know in detail how everything works. The compiler isn't going to help you.

Templates: They can redefine everything. So, some random line of code, somewhere in the project, can change what any keyword or variable does. So, you have no idea what your code is going to do in a large project. And this is regarded as one of the best features of C++!

Void pointers: There are very many ways in C++ to address things. Like, if you want to change the value of a variable, you can use the variable name and have it dereferenced automatically, you can use an explicit reference, or simply put an asterisk in front. There are more, like arrays. And while they all seem to do the same thing, they don't. You have to use the same pattern that was used to declare the variable. Often, the only way to do that is to cast it to a generic ("void") pointer. And that throws away all the type information. The compiler cannot check the correctness.

Finalizers: In Free Pascal, you have try..finally. Which will execute the code after "finally" no matter what (well, unless your app is already killed by the OS). And that's why destructors work. You are assured that they will run. Not so in C++. It's a minefield. Stuff might get freed, depending. Lots of papers have been written about this.

Modules: In C++, all your code is built as a single unit. You can compile parts, if you supply the object file and a header file. And the right versions of all the parts. That's why it is very hard to use any downloaded code. It probably won't work, because there's a discrepancy in one of the hundreds of header files used. Which you cannot fix, because your version is slightly different. And because it evolved from a C pre-processor, all the names are mangled. That's why we have namespaces, instead of modules. And namespaces can be distributed throughout the source code. If there are a hundred people working on a project, they can all be maintaining part of all of those namespaces...

Strings: Or, dynamic arrays in general. This should be the big showstopper. In C and C++, most arrays are unlimited in size, and end when an item has value 0. So, they never know how long an array is. To find out, you have to read all the items until you encounter one that is 0. This is what most malware uses to penetrate, and why C and C++ software is so very buggy.

So, why is C++ used so much, if it is so bad? Well, it was seen as an upgrade to C, the programming language most programmers used at that time, and so it is what most senior programmers have used most of their life. It's what they're familiar with. To them, it is how it should be.


So, why is it that most programming languages use the C syntax, instead of something verbose, like Pascal or Fortran? Two reasons. First, it seems that there are fewer characters to type in C: "{" instead of "begin", for example. That isn't really true, because you have to specify more meta-data. And you have to provide many more comments, because C and C++ are hard to read and understand.

But, the cryptic nature is seen as a good thing. It makes your job sound much better, if it looks like incomprehensible mumbo-jumbo. Also, getting a job is easy, but to keep it you have to make yourself indispensable. Or, at least, that's a common rhetoric. And the best way to do that, is to make sure nobody understands your code.

This is also reinforced by the need for speed. No matter that the speed of 99% of your code is irrelevant and barely measurable, many (C and C++) programmers feel the need to make sure that every bit of their code is smoking. They don't trust the compiler to get it right, which is probably a good thing because even compilers find C++ really hard to comprehend. Writing your code in a totally incomprehensible way that might be a bit faster is seen as mastering your profession.


So, what does this have to do with Free Pascal? Simple. It doesn't have those problems. If you rewrote Linux in it, it would be nearly bullet-proof. But then again, no OSes are written in it, and you don't need to create a Free Pascal compiler to roll out your tool chain. Because that is written in C and C++.

jamie

  • Hero Member
  • *****
  • Posts: 2072
Re: OS-es, obfuscation and strings
« Reply #1 on: March 11, 2019, 02:53:27 am »
I believe long ago WINDOWS was written in a Pascal-based language.
Number 1 at blue screen app creations!

lucamar

  • Hero Member
  • *****
  • Posts: 2076
Re: OS-es, obfuscation and strings
« Reply #2 on: March 11, 2019, 03:03:46 am »
I believe long ago WINDOWS was written in a Pascal-based language.

Windows? No: MS's Pascal was never that good. But great swaths of the first versions of MacOS? Damn yeah :)
« Last Edit: March 11, 2019, 03:05:28 am by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus 2.0.2/2.0.4  - FPC 3.0.4 on:
(K|L)Ubuntu 12..16, Windows XP SP3, various DOSes.

440bx

  • Hero Member
  • *****
  • Posts: 1197
Re: OS-es, obfuscation and strings
« Reply #3 on: March 11, 2019, 07:25:12 am »
Eh, no. C++ requires things like a memory manager and scheduler to be able to function. Or, in other words: an OS. And C++ is very broken.
No, that simply isn't the case.  C++ can be used as a strongly typed C.  There is nothing in the language that forces the programmer to use C++ facilities he/she doesn't want.

There are very many courses, seminars and trainings that help you navigate the vast number of programming pitfalls the average programmer encounters daily. And they are all about C++. Because other programming languages rarely suffer from those problems.
I have to admit you are right there.  Both C and C++ are filled with "subtleties" (commonly referred to as "implementation dependent" constructs) that can bite a programmer.

There are many problems with C++. But the main ones are: speed over all, templates, void pointers, no finalizers, no modules and strings.
The lack of modularity is a problem.  The rest is a problem because the programmers using it either don't know how to use the features or can't help themselves from showing off.

Speed: If anything that would make programming better and safer takes longer to execute, it is automatically disregarded. Programmers are expected to know in detail how everything works. The compiler isn't going to help you.
The speed problem isn't the compiler's problem, it's a programmer problem.  You can't fault the language for that but, that said, C/C++ do encourage poor programming (hard-to-maintain code).

Programmers should be expected to know in detail how everything works.  When I get on an airplane I expect the pilot to know how everything works.  Programmers should be held to the same standards as in any other profession: know your stuff. Period.  No excuses.

Templates: They can redefine everything. So, some random line of code, somewhere in the project, can change what any keyword or variable does. So, you have no idea what your code is going to do in a large project. And this is regarded as one of the best features of C++!
I don't think templates are that great of a feature but, C++ programmers are far from being the only ones addicted to them.  They do tend to be misused and abused.  That's more a programmer flaw than a language flaw.  If someone uses a toothpick to blow their nose, the poor judgment didn't come from the toothpick.

Void pointers: There are very many ways in C++ to address things. Like, if you want to change the value of a variable, you can use the variable name and have it dereferenced automatically, you can use an explicit reference, or simply put an asterisk in front. There are more, like arrays. And while they all seem to do the same thing, they don't. You have to use the same pattern that was used to declare the variable. Often, the only way to do that is to cast it to a generic ("void") pointer. And that throws away all the type information. The compiler cannot check the correctness.
Void pointers are not a problem.  A void pointer is just a way of declaring that the variable contains an address.  As for automatic dereferencing, you can blame the programmers who are addicted to syntactic sugar for that and there are quite a few of those.
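
To make the point about lost type information concrete, here is a minimal C++ sketch (the function name read_through_void is made up for illustration). It assumes nothing beyond the standard library: an object pointer converts to void* implicitly, but getting the type back takes an explicit cast that the compiler cannot verify.

```cpp
#include <cassert>
#include <type_traits>

// C++ converts any object pointer to void* implicitly...
static_assert(std::is_convertible<int*, void*>::value,
              "int* -> void* is implicit");
// ...but not back: recovering the type needs an explicit cast.
static_assert(!std::is_convertible<void*, int*>::value,
              "void* -> int* needs a cast");

// Hypothetical helper: stores an address without its type, then restores it.
int read_through_void(int value) {
    void* untyped = &value;                   // type information is dropped here
    int* typed = static_cast<int*>(untyped);  // the programmer must name the right type
    return *typed;
}
```

Calling read_through_void(42) gives back 42, but only because the cast names the type the address really had. Casting to a different pointer type would compile just the same.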

Finalizers: In Free Pascal, you have try..finally. Which will execute the code after "finally" no matter what (well, unless your app is already killed by the OS). And that's why destructors work. You are assured that they will run. Not so in C++. It's a minefield. Stuff might get freed, depending. Lots of papers have been written about this.
Finalizers didn't exist for decades, yet programmers managed to create software without them.   
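
For comparison with try..finally, this is a small C++ sketch of what happens to a local object when an exception is thrown and caught. The Guard type and run_with_failure are hypothetical names used only for this illustration; the sketch shows the destructor running during stack unwinding, before the handler is entered.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper type: records when it is constructed and destroyed.
struct Guard {
    std::vector<std::string>& log;
    explicit Guard(std::vector<std::string>& l) : log(l) { log.push_back("acquire"); }
    ~Guard() { log.push_back("release"); }  // plays the role of the "finally" part
};

// Throws while a Guard is alive; the destructor still runs as the stack unwinds.
std::vector<std::string> run_with_failure() {
    std::vector<std::string> log;
    try {
        Guard g(log);
        throw std::runtime_error("boom");
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

The returned log reads acquire, release, caught. The sketch only covers an exception that is caught; with an uncaught exception or a call to std::abort, local destructors are not guaranteed to run.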

Modules: In C++, all your code is built as a single unit. You can compile parts, if you supply the object file and a header file. And the right versions of all the parts. That's why it is very hard to use any downloaded code. It probably won't work, because there's a discrepancy in one of the hundreds of header files used. Which you cannot fix, because your version is slightly different. And because it evolved from a C pre-processor, all the names are mangled. That's why we have namespaces, instead of modules. And namespaces can be distributed throughout the source code. If there are a hundred people working on a project, they can all be maintaining part of all of those namespaces...
Modularization has never been one of the best features of C, and C++ inherited that from C because it tries hard to be very compatible with C.  The name mangling is just a hack to avoid losing parameter information.  Not exactly elegant.
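
A rough sketch of the parameter information that mangling preserves, with made-up function names (describe, plain_c_symbol). Two overloads share one source name and the compiler keeps them apart by parameter type, while extern "C" requests a plain C-style symbol.

```cpp
#include <cassert>
#include <string>

// Two functions share the name "describe"; the compiler tells them apart by
// parameter type, which is the information a mangled symbol name encodes.
std::string describe(int)    { return "int"; }
std::string describe(double) { return "double"; }

// extern "C" asks for an unmangled, C-style symbol. Only one function with
// C linkage can use a given name, so it cannot be overloaded this way.
extern "C" int plain_c_symbol(int x) { return x + 1; }
```

The exact mangled spelling depends on the compiler and ABI, so the sketch deliberately shows only the effect: describe(1) and describe(1.0) resolve to different functions.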

Strings: Or, dynamic arrays in general. This should be the big showstopper. In C and C++, most arrays are unlimited in size, and end when an item has value 0. So, they never know how long an array is. To find out, you have to read all the items until you encounter one that is 0. This is what most malware uses to penetrate, and why C and C++ software is so very buggy.
What you state there applies mostly to arrays of characters, not to all arrays as your statement implies.
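
The difference can be sketched in a few lines of C++ (the function names are invented for this example). A C string has to be scanned for its terminating 0, while std::string and std::array keep their length, so a 0 inside them is just another element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A C string carries no length; strlen walks the bytes until it meets the 0.
std::size_t c_string_length(const char* text) { return std::strlen(text); }

// std::string stores its length, so size() does not scan for a terminator.
std::size_t cpp_string_length(const std::string& text) { return text.size(); }

// Other arrays are not zero-terminated; std::array knows its size at compile time.
std::size_t int_array_length() {
    std::array<int, 4> values{{1, 0, 3, 0}};  // the zeros are ordinary elements
    return values.size();
}
```

A std::string built from the three bytes 'a', 0, 'b' reports a size of 3, while strlen on the same bytes stops at 1. Mismatches like that are where the zero-terminated convention tends to go wrong.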

So, why is C++ used so much, if it is so bad? Well, it was seen as an upgrade to C, the programming language most programmers used at that time, and so it is what most senior programmers have used most of their life. It's what they're familiar with. To them, it is how it should be.
One reason is that C++ catches a lot more errors at compile time than a regular C compiler.  That is a very nice feature of C++ and it doesn't "assume": if the programmer failed to specify something, C++ will output an error or at the very least a warning.

So, why is it that most programming languages use the C syntax, instead of something verbose, like Pascal or Fortran? Two reasons. First, it seems that there are fewer characters to type in C: "{" instead of "begin", for example. That isn't really true, because you have to specify more meta-data. And you have to provide many more comments, because C and C++ are hard to read and understand.
I believe you have a valid point there, except for the comments part.  C/C++ programmers are notorious for not adequately commenting their code and producing verbal cubism when it comes to naming variables.

But, the cryptic nature is seen as a good thing. It makes your job sound much better, if it looks like incomprehensible mumbo-jumbo. Also, getting a job is easy, but to keep it you have to make yourself indispensable. Or, at least, that's a common rhetoric. And the best way to do that, is to make sure nobody understands your code.
You have a point there. 

This is also reinforced by the need for speed. No matter that the speed of 99% of your code is irrelevant and barely measurable, many (C and C++) programmers feel the need to make sure that every bit of their code is smoking. They don't trust the compiler to get it right, which is probably a good thing because even compilers find C++ really hard to comprehend. Writing your code in a totally incomprehensible way that might be a bit faster is seen as mastering your profession.
A responsive program that works as it should is usually a pleasure to use, therefore attention to speed isn't that bad.  The kind of attention it gets is often where the problem resides.  When speed is obtained at the expense of maintainability then it can quickly become a problem and it's true that too many C/C++ programmers seem to have been bitten by the speed-over-maintainability bug.

So, what does this have to do with Free Pascal? Simple. It doesn't have those problems. If you rewrote Linux in it, it would be nearly bullet-proof. But then again, no OSes are written in it, and you don't need to create a Free Pascal compiler to roll out your tool chain. Because that is written in C and C++.
O/S code must be fast.  That's one of the very few areas where speed is occasionally more important than simplicity (which implies maintainability.)  The compiler has to be able to generate solid optimized code and FPC could definitely be improved in that area.

FPC is good and, with some effort it could probably be used to write an O/S but, there would be a fair number of bumps on the road to getting there.
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 9167
Re: OS-es, obfuscation and strings
« Reply #4 on: March 11, 2019, 08:57:28 am »
FPC is good and, with some effort it could probably be used to write an O/S but, there would be a fair number of bumps on the road to getting there.
I agree with almost everything except the above: Pascal is and was widely used to write OS's including of course embedded systems. Those OS's are not as mainstream as they used to be, but industrial OS code is still often Pascal code.
I can't see the bumps....I have even seen RTOS systems fully written in Pascal (not object pascal) and FPC supports this and is fast enough. ('fast' is a speed-trap that can go for any high-level language implementation... :D )
A computer scientist will not tie a high-level language with speed at all: such correlation is simply not there -spurious, if you want-. It is the implementation, which is a wholly different thing.
« Last Edit: March 11, 2019, 09:02:01 am by Thaddy »
also related to equus asinus.

440bx

  • Hero Member
  • *****
  • Posts: 1197
Re: OS-es, obfuscation and strings
« Reply #5 on: March 11, 2019, 09:38:34 am »
FPC is good and, with some effort it could probably be used to write an O/S but, there would be a fair number of bumps on the road to getting there.
Pascal is and was widely used to write OS's including of course embedded systems. Those OS's are not as mainstream as they used to be, but industrial OS code is still often Pascal code.
I can't see the bumps....I have even seen RTOS systems fully written in Pascal (not object pascal) and FPC supports this and is fast enough. ('fast' is a speed-trap that can go for any high-level language implementation... :D )
A computer scientist will not tie a high-level language with speed at all: such correlation is simply not there -spurious, if you want-. It is the implementation, which is a wholly different thing.
I wasn't talking about Pascal, I mentioned FPC specifically.

Among the bumps is the lack of alignment control in FPC.  For instance, it is not possible to convince FPC to align null-terminated character arrays on byte boundaries, even though the documentation clearly states that this is the alignment the compiler is supposed to use by default.
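
As an illustration of what alignment control means here, the following is a C++ sketch, not FPC code, and the struct names are hypothetical. It shows a character array defaulting to byte alignment, a per-field request for a stricter alignment, and a byte-aligned array sitting right next to a one-byte tag.

```cpp
#include <cassert>
#include <cstddef>

// A plain char array only needs byte alignment.
static_assert(alignof(char[8]) == 1, "char arrays are byte-aligned");

// Hypothetical record: alignas raises the alignment of the buffer on request.
struct AlignedName {
    alignas(16) char text[8];
};
static_assert(alignof(AlignedName) == 16, "the request propagates to the struct");

// Next to a one-byte tag, a byte-aligned array needs no padding in front of it.
struct Tagged {
    char tag;
    char name[7];
};

std::size_t tagged_size() { return sizeof(Tagged); }
```

On the usual compilers tagged_size() returns 8, one byte for the tag plus seven for the name. A compiler that forced wider alignment on the array would make the record larger.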

There are a few things where FPC simply cheats.  One such case is when you tell FPC to export a function or procedure by ordinal only, it won't do it.  What it does instead is create a null name.  The problem with that is, the export table ends up having duplicate names (nulls/nils) for different functions/procedures and the count of ordinals always equals the count of named functions (those counts should not be the same when a function or procedure is exported by ordinal only.) To be fair FPC inherited that from Delphi (Delphi does the same thing.)

Another one: it is not possible in FPC to declare an array of pointer constants that point to arrays of character constants.  The pointers are always considered variables and because of that cannot be used in other structures at compile time.  This is another thing FPC inherited from Delphi.

FPC's flow control analyzer is flawed.  Sometimes it concludes some section of code is unreachable when it is not.  The worst part is, because it believes the code is unreachable, it doesn't generate code for those statements.  I reported that problem and that is the one bug report that got _deleted_ (not only did it not get fixed, it got deleted... beautiful !!).  I guess whoever coded that got sensitive about it... chuckle.

Those are just some of those that routinely get in my way at one time or another and, I'm not writing an O/S.

ETA:

Correction: the bug report didn't get deleted.  It got closed and the bug tracker doesn't include closed reports by default.


« Last Edit: March 12, 2019, 03:51:57 pm by 440bx »
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7498
Re: OS-es, obfuscation and strings
« Reply #6 on: March 11, 2019, 09:50:25 am »
I believe long ago WINDOWS was written in a Pascal-based language.

Windows? No: MS's Pascal was never that good. But great swaths of the first versions of MacOS? Damn yeah :)

Afaik Windows 1.0 was indeed in Pascal. But it was more a dosshell/filemanager than an OS.

Thaddy

  • Hero Member
  • *****
  • Posts: 9167
Re: OS-es, obfuscation and strings
« Reply #7 on: March 11, 2019, 10:10:42 am »
I believe long ago WINDOWS was written in a Pascal-based language.

Windows? No: MS's Pascal was never that good. But great swaths of the first versions of MacOS? Damn yeah :)

Afaik Windows 1.0 was indeed in Pascal. But it was more a dosshell/filemanager than an OS.
Correct. Remember Wingdings fonts? That's all WIN 1 and 2 GUI ("stolen" from GEM, where's my C64?)
Win 1.0, MacOS (and Lisa) are all Pascal.
Note FPC is probably a much better compiler than those that were available at the time. And perfectly OK to write -academic if you will - OS's.
Aside: since OS programming relies heavily on basic trees, AKI proved at least to a certain extent that FPC programmed trees are very fast indeed. (note what I wrote about "fast" before....)
Aside2: I do not understand the alignment issues 440bx has: at such a low level, alignment is done by hand. Even in e.g. the Linux kernel. It is language agnostic, not programmer agnostic.

But I don't believe he has any formal computer science education at a level like a B.Sc. or an M.Sc. (I may be wrong....) He is just very interested and partly knowledgeable in how things get done. (which is good!)
« Last Edit: March 11, 2019, 10:14:40 am by Thaddy »
also related to equus asinus.

PascalDragon

  • Hero Member
  • *****
  • Posts: 668
  • Compiler Developer
Re: OS-es, obfuscation and strings
« Reply #8 on: March 11, 2019, 12:12:05 pm »
Operating systems are written in C. Well, long ago, they were written in assembler. But after that, C.

Many people think that's stupid, because, obviously C++ is a better choice. It's newer. It's designed to be a better C. So, it would be better to use that. Right?

Eh, no. C++ requires things like a memory manager and scheduler to be able to function. Or, in other words: an OS. And C++ is very broken.
Strange that the operating system we're using at work is written in C++...  :-[

Cyrax

  • Hero Member
  • *****
  • Posts: 758
Re: OS-es, obfuscation and strings
« Reply #9 on: March 11, 2019, 12:36:38 pm »
...
FPC's flow control analyzer is flawed.  Sometimes it concludes some section of code is unreachable when it is not.  The worst part is, because it believes the code is unreachable, it doesn't generate code for those statements.  I reported that problem and that is the one bug report that got _deleted_ (not only did it not get fixed, it got deleted... beautiful !!).  I guess whoever coded that got sensitive about it... chuckle.
...

You mean this report? https://bugs.freepascal.org/view.php?id=34140

Thaddy

  • Hero Member
  • *****
  • Posts: 9167
Re: OS-es, obfuscation and strings
« Reply #10 on: March 11, 2019, 02:01:59 pm »
Strange that the operating system we're using at work is written in C++...  :-[
I am more amazed by the fact that people hold on to perceptions, even after they are proven to be untrue.
Who cares? If it is well written? Use it... %) %) :-[ O:-) O:-) 8-)
also related to equus asinus.

440bx

  • Hero Member
  • *****
  • Posts: 1197
Re: OS-es, obfuscation and strings
« Reply #11 on: March 11, 2019, 03:48:12 pm »
You mean this report? https://bugs.freepascal.org/view.php?id=34140
Yes, that's the one I was thinking about.  I believed it got deleted because one day it stopped showing in my list of "Reported by Me" and even now, it doesn't show.  I looked for that bug report and couldn't find it.

Now, I'm really curious, how did you get to it ?
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 9167
Re: OS-es, obfuscation and strings
« Reply #12 on: March 11, 2019, 03:55:12 pm »
Well... (Don't laugh) https://bugs.freepascal.org/view.php?id=34140

Searching the bug tracker is a pain, that's true.
Your own reports are easy to follow and never disappear when logged in.
« Last Edit: March 11, 2019, 03:57:04 pm by Thaddy »
also related to equus asinus.

440bx

  • Hero Member
  • *****
  • Posts: 1197
Re: OS-es, obfuscation and strings
« Reply #13 on: March 11, 2019, 04:10:35 pm »
Well... (Don't laugh) https://bugs.freepascal.org/view.php?id=34140

Searching the bug tracker is a pain, that's true.
Your own reports are easy to follow and never disappear when logged in.
you quoted the link Cyrax posted... I guess you received a formal education in doing that.  Well done!... your abilities never cease to impress me.

I shouldn't have to search for that bug report, it should be on the list of bugs "reported by me" and the fact is, it isn't.
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 9167
Re: OS-es, obfuscation and strings
« Reply #14 on: March 11, 2019, 04:28:03 pm »

you quoted the link Cyrax posted... I guess you received a formal education in doing that.  Well done!... your abilities never cease to impress me.

Nahhh.. I just quote me, myself, alone   :D .
But other people make valid points... you seem to forget that a little.... On many occasions.... O:-)

Still curious about your education: are you a dentist? That happens quite often.... :P
« Last Edit: March 11, 2019, 04:30:24 pm by Thaddy »
also related to equus asinus.