
Author Topic: AI translations  (Read 6248 times)

circular

  • Hero Member
  • *****
  • Posts: 4377
    • Personal webpage
Re: AI translations
« Reply #30 on: January 01, 2024, 04:26:49 am »
Quote:
As I understand it, to "train" a programming bot using present AI methods, we would need a collection of C programs that had already been translated into FPC.  Is that right?  And if so, I doubt that there would ever be enough incentive to create such a training set.  So perhaps the "programming bot" translator will have to wait for an entirely different approach to machine learning -- with roots maybe in the reverse engineering programs noted by 440bx.
No, I don't think such a training set is necessary. The AI only needs to properly understand what each language and its standard library do (setting aside the problem of missing external references). It already has enough general intelligence to translate, but it misses some details. In my test, a missing function that scans a string would include the delimiter in the result, but the AI assumed that this was not the case. The rest of the code then did not work properly, and the AI did not realize it.

There is a certain level of complexity beyond which it does not yet see that something is inconsistent. A programming bot could write unit tests for each translated function, check for such side effects, and then correct the mistakes. This means the translation would not be done in one go. That is not an obstacle, though: it means the AI needs to be managed, and this can be done by AI as well, either by having multiple agents or by integrating planning into the AI.
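A minimal sketch of such a check, in Python purely for brevity. The two functions are hypothetical stand-ins: `scan_reference` plays the original routine (it keeps the delimiter, as in the test described above) and `scan_translated` plays a first-pass translation that wrongly drops it. Neither is taken from the actual test; the point is only that a small set of cases run against both versions exposes the wrong assumption.

```python
# Hypothetical reference behaviour: the scanned token INCLUDES the delimiter.
def scan_reference(text: str, delimiter: str) -> str:
    end = text.find(delimiter)
    return text if end == -1 else text[: end + len(delimiter)]


# Hypothetical first-pass translation: wrongly assumes the delimiter is excluded.
def scan_translated(text: str, delimiter: str) -> str:
    end = text.find(delimiter)
    return text if end == -1 else text[:end]


def find_mismatches(reference, candidate, cases):
    """Return (args, expected, actual) for every case where the two disagree."""
    mismatches = []
    for args in cases:
        expected, actual = reference(*args), candidate(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


CASES = [("key=value", "="), ("no delimiter here", "="), ("a;b;c", ";")]

for args, expected, actual in find_mismatches(scan_reference, scan_translated, CASES):
    print(f"{args!r}: expected {expected!r}, got {actual!r}")
```

The mismatch list is what a managing agent would feed back to the translator for the next correction round.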

As Lainz points out, there is documentation that presents the same code in various languages, serving as a Rosetta stone. So between certain languages, there is already a lot of training data. Even without that, the AI already has a good understanding of language features. For example, it knows that a Pascal string is different from a char* in C or a String class in C++. But as previously stated, it can make mistakes and so needs a mechanism to check and improve what it has already generated.
Conscience is the debugger of the mind

Curt Carpenter

  • Hero Member
  • *****
  • Posts: 578
Re: AI translations
« Reply #31 on: January 01, 2024, 05:36:42 pm »
I need to think about this some more.

Presumably, the programming bot could
  o  compile program P in language L1 to a specific machine language M;
  o  attempt to write P in a different language L2;
  o  compile the program in L2 to machine language M;
  o  compare the two machine language representations.

If the two machine language representations are identical, I think you can conclude that the translation from L1 to L2 is functionally perfect.  But can you still conclude that if the two machine language representations are different? I guess we need to define what it means for two programs to be "provably identical."  (In that regard: do Lainz's Java and Kotlin codes compile to identical machine language?  And if not, how does one prove that they don't embody at least one corner case (a bug, perhaps) in which they behave differently?)
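The comparison step can be sketched in Python under one loud assumption: CPython bytecode stands in for the machine language M, and both "languages" are Python itself, so this shows only the shape of the check, not a real cross-language setup. The helper names and the two sample programs are made up for the illustration.

```python
def load_function(source: str, name: str):
    """Compile `source` and return the function called `name` defined in it."""
    namespace = {}
    exec(compile(source, "<program>", "exec"), namespace)
    return namespace[name]


def same_compiled_form(src1: str, src2: str, name: str) -> bool:
    """Strong check: are the two compiled representations byte-for-byte equal?"""
    code1 = load_function(src1, name).__code__.co_code
    code2 = load_function(src2, name).__code__.co_code
    return code1 == code2


def agree_on(src1: str, src2: str, name: str, inputs) -> bool:
    """Weak check: do both versions return the same value on sample inputs?"""
    f1, f2 = load_function(src1, name), load_function(src2, name)
    return all(f1(x) == f2(x) for x in inputs)


# Two hypothetical versions of the same routine.
P1 = "def double(x):\n    return x + x\n"
P2 = "def double(x):\n    return 2 * x\n"

print(same_compiled_form(P1, P1, "double"))       # identical compiled form
print(same_compiled_form(P1, P2, "double"))       # compiled forms differ...
print(agree_on(P1, P2, "double", range(-5, 6)))   # ...yet results match on samples
```

Different compiled forms therefore do not imply different behaviour, and agreement on samples is evidence rather than proof, which is exactly the gap raised above.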

I can see how the programming bot might be able to reason from differences in the machine language representations to a better translation.  But then the question is whether it would ever (provably) halt in the reverse engineering process -- which sounds familiar of course :).

Interesting problem!  To be a perfect translation, does the L2 version have to have the same bugs as the L1 version?
 

circular

Re: AI translations
« Reply #32 on: January 03, 2024, 10:54:54 am »
The idea of using an intermediate language makes sense. In a way, the two versions of the program in language L1 and L2 are supposedly equivalent to a language that encompasses both L1 and L2.

Regarding the choice of the M language, I would not lean towards a machine language as such. On the one hand, I agree that we need a language that expresses all the capabilities of both L1 and L2 unambiguously. On the other hand, it should be concise so that it is easy for LLMs to handle. It is OK if the code does not match exactly at the processor level, as long as it gives the same result in a similar amount of time.

The generic language I am thinking about would need to have all the basic features of all languages, so one way or another it would be able to handle char*, reference counted strings, garbage collection, explicit (de)allocation on the heap, lambda functions, etc.

Having one generic language would reduce the documentation needed. Basically, the parts that would require some thinking would be how to translate features into a language that does not have them by default. Whether we use an intermediate language or leave to the LLM the task of noticing the differences between the languages, there remains the choice of what to do when basic features of the languages differ.

When converting from C to Pascal, would the translation use Pascal strings or continue with PChar? For internal values in methods, switching to Pascal strings could be simpler, but for memory structures that are exported, one needs to keep PChar for compatibility. The translation could rely on an external libc, on a package, etc.
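To make the difference concrete, here is a small Python sketch of the two memory layouts. It assumes the classic length-prefixed ShortString layout (one length byte, at most 255 bytes of data). FPC's reference-counted AnsiString uses a different layout that is not modelled here, and the function names are invented for the illustration.

```python
def to_c_string(data: bytes) -> bytes:
    """NUL-terminated layout, as pointed to by a C char* or a Pascal PChar."""
    return data + b"\x00"


def from_c_string(buffer: bytes) -> bytes:
    """Read up to, but not including, the first NUL byte."""
    return buffer.split(b"\x00", 1)[0]


def to_short_string(data: bytes) -> bytes:
    """Length-prefixed layout (assumed: classic ShortString, max 255 bytes)."""
    if len(data) > 255:
        raise ValueError("a ShortString holds at most 255 bytes")
    return bytes([len(data)]) + data


def from_short_string(buffer: bytes) -> bytes:
    """Read exactly as many bytes as the length prefix announces."""
    length = buffer[0]
    return buffer[1 : 1 + length]


payload = b"ab\x00cd"  # contains an embedded NUL byte

print(from_short_string(to_short_string(payload)))  # round-trips unchanged
print(from_c_string(to_c_string(payload)))          # truncated at the embedded NUL
```

A translator that silently swaps one representation for the other changes behaviour for such inputs, which is why exported structures have to keep the original layout.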

Conversely, when converting from Pascal to C, it may be necessary to have reference-counted strings in C to make the program work the same way. It could rely on GLib to do that.

The AI could search the web for resources on specific subjects, get the most up-to-date method to do something, etc.

Curt Carpenter

Re: AI translations
« Reply #33 on: January 03, 2024, 05:20:25 pm »
Compiler C1 maps language L1 into machine language M:  C1(L1) --> M.
Compiler C2 maps language L2 into machine language M:  C2(L2) --> M.

If the mappings C1 and C2 are invertible, that is, if there exist mappings C1' and C2' such that C1'(M) --> L1 and C2'(M) --> L2, then clearly C1(L1) --> M; C2'(M) --> L2 is a feasible translation procedure.
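A toy version of that procedure, in Python. Everything here is an assumption made for illustration: L1 is prefix arithmetic such as "+ 1 * 2 3", M is a tiny stack machine, and L2 is fully parenthesised infix. The function `compile_l1` plays C1 and `decompile_l2` plays C2'.

```python
OPS = {"+": "add", "*": "mul"}                        # L1 operator -> M instruction
SYMBOLS = {name: symbol for symbol, name in OPS.items()}


def compile_l1(source: str) -> list:
    """C1: compile prefix notation to a list of stack-machine instructions."""
    tokens = source.split()

    def expr(pos: int):
        token = tokens[pos]
        if token in OPS:
            left, pos = expr(pos + 1)
            right, pos = expr(pos)
            return left + right + [(OPS[token],)], pos
        return [("push", int(token))], pos + 1

    code, end = expr(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return code


def decompile_l2(code: list) -> str:
    """C2': rebuild fully parenthesised infix source from the instructions."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(str(instr[1]))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {SYMBOLS[instr[0]]} {right})")
    (result,) = stack  # exactly one expression must remain
    return result


machine_code = compile_l1("+ 1 * 2 3")
print(machine_code)
print(decompile_l2(machine_code))  # (1 + (2 * 3))
```

The inverse exists here only because this M keeps the whole expression structure. Real machine code discards names and structure, which is where the invertibility question becomes hard.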

So the theoretical question would seem to boil down to whether the inverse mappings exist for languages L1 and L2 (and that's the "reverse engineering" problem mentioned by 440bx earlier in this conversation for all intents and purposes I think).

Compilers, unlike assemblers, are one-to-many mappings though, so the core question, I guess, boils down to under what conditions one-to-many mappings are invertible.  I've been away from the books too long to remember the available knowledge on that question.

If a compiler's source language has a BNF definition, does that ensure the existence of an inverse compiler?  And if so: do both GNU C and Free Pascal, as written in normal use (to take just one example), conform strictly to their BNF definitions?  (I suspect the answer in both cases is no.)

circular

Re: AI translations
« Reply #34 on: January 03, 2024, 11:00:00 pm »
I would say that compilers, given fixed compilation options, are one-to-one. So the inverse transformation would involve guessing the original code and the compiler options.

The problem is rather that compilers are many-to-one. There are various ways of writing the same thing. For example:
  • using while or using repeat until could give the same machine code
  • using whatever name for variables or procedures (you can get this information if you keep debugging information but I would not take it for granted)
  • redundant code that gets optimised
That's another reason, on top of conciseness of resulting code, for an intermediate language that would not actually be compiled using a typical compiler. It would be better to have an intermediate language that keeps the structure.

For example, we know that some allocated memory corresponds to a class, and is thus grouped under a name. What could be left for the decompiler to guess is whether it is an allocated record, or class, or whatever structure is closest in the destination language.

Curt Carpenter

Re: AI translations
« Reply #35 on: January 04, 2024, 12:24:21 am »
Quote from: circular
I would say that compilers, given fixed compilation options, are one-to-one. So the inverse transformation would involve guessing the original code and the compiler options.

Can that be right?  It's been a ridiculously long time since I wrote anything in assembly, but something as primitive as assignment may produce multiple statements at the assembly (machine) level depending on where the arguments are and what machine resources are available at execution time.   So not 1:1 at all for most conventional ISAs.   And flow control keys like "while" and "repeat" are probably much better examples of things that the compiler translates into a whole cascade of machine instructions.  I may be misunderstanding you though.


circular

Re: AI translations
« Reply #36 on: January 04, 2024, 12:44:00 am »
If you take a portion of code, yes, it can be compiled in multiple ways depending on the context. But compilation is independent of what is available at execution time. The registers available for the generated instructions are determined by the preceding code, whichever branch is actually taken at runtime.

But if you have a whole program and you don't change it, then, supposing the compiler is deterministic (it doesn't use randomness to generate the code) and you keep the same options, you will get the same machine code.

Rather than one-to-one, it is even many-to-one, because you can find two programs that have the exact same meaning and thus could be translated to the same machine code. So in fact, given a machine code, it is plausible to find various ways of writing the equivalent program.
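Both properties can be observed on a small scale with CPython, taking its bytecode as a stand-in for machine code (an assumption made for illustration; a native compiler such as FPC or GCC is not involved). The two sample programs are invented and differ only in the names of their parameter and local variables.

```python
def bytecode_of(source: str, name: str) -> bytes:
    """Compile `source` and return the raw bytecode of the function `name`."""
    namespace = {}
    exec(compile(source, "<program>", "exec"), namespace)
    return namespace[name].__code__.co_code


PROGRAM_A = (
    "def total(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc = acc + v\n"
    "    return acc\n"
)

# Same program with every local name changed.
PROGRAM_B = (
    "def total(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result = result + item\n"
    "    return result\n"
)

# Deterministic: compiling the same source twice gives the same bytecode.
print(bytecode_of(PROGRAM_A, "total") == bytecode_of(PROGRAM_A, "total"))

# Many-to-one: two different sources give the same bytecode.
print(bytecode_of(PROGRAM_A, "total") == bytecode_of(PROGRAM_B, "total"))
```

The instruction stream refers to local variables by slot number only. The names survive solely in side metadata (`co_varnames`), much like debugging information, so a decompiler working from the instructions alone has to invent them.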

VisualLab

  • Hero Member
  • *****
  • Posts: 620
Re: AI translations
« Reply #37 on: January 04, 2024, 04:40:43 pm »
Quote:
So the answer to "when" would seem to be decades, not months or years (assuming anyone undertakes the development of a programming bot at all) -- and the possibility that such translations may be theoretically impossible remains open.

After the next two "waves" of A.I. There have been two, and now there is a third. The next two will still fail, but perhaps the sixth will achieve the goal sufficiently. That probably gives you an "estimate" :):

100 years, 2 months, and 3 days

Curt Carpenter

  • Hero Member
  • *****
  • Posts: 578
Re: AI translations
« Reply #38 on: January 04, 2024, 10:01:21 pm »
I think the question of "universal program translatability" is going to take some careful definition of the problem and its bounds.

circular

Re: AI translations
« Reply #39 on: January 05, 2024, 09:42:25 am »
Another way of looking at it is that it may not make sense to translate a program at all if AI becomes able to make changes to existing programs and humans merely explain in natural language what they want. Of course, we would still sometimes need to look into the code, but having a language we like to use may become less important.

 
