Josh H: How do you program a programming language?
Obviously you have to start with a base language, so wouldn’t every programming language just be a dialect of the first? To clarify: how could they have made Perl without using another programming language to program Perl’s functionality?
Answers and Views:
Answer by James
I have often pondered this question myself… I have no REAL knowledge about it, so this is just speculation:
A programming language is basically just a system for converting words/commands that we can understand (if x then ….) into commands the computer understands. I assume that the ORIGINAL language was written purely in binary, and was then somehow used to build other languages. All modern languages are probably written in other lower-level languages, but I am not positive.
These are the questions that keep me up at night =)
Answer by ItachisXeyes
Long story short, there are generations of programming languages. Most people today NEVER deal with straight binary (thank God). There is a small group who deal with Assembly, then a slightly larger group that deals with higher-level languages like C and C++, and again an even higher level like Perl, Python and so on.
Everything boils down to binary and Assembly, basically.
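To make that concrete, here's a trivial C function and, in a comment, roughly the x86-64 assembly a typical compiler might emit for it (the exact output depends on the compiler and its flags):

/* add.c: a trivial function, just to show the layers */
int add(int a, int b)
{
    return a + b;
}

/*
 * Compiling with something like cc -O2 -S add.c on x86-64
 * usually produces assembly close to this (Intel syntax):
 *
 *   add:
 *       lea  eax, [rdi + rsi]   ; eax = a + b
 *       ret
 *
 * The assembler then turns those mnemonics into the raw
 * machine-code bytes (the binary) that the CPU actually runs.
 */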
https://en.wikipedia.org/wiki/First-generation_programming_language
Use the little guide at the bottom to go through the different levels of languages.
Answer by TaZ
The primary compiler of any language will be written in another language.
Most modern compilers are written in C (some in C++). In the olden days (maybe 25-30 years back), when there was no C compiler on the PC, someone sat down with an assembler and wrote a small C compiler; once he got it right, he rewrote the C compiler in C itself! And interestingly, the C compiler produced Assembly language code, which was then assembled into the new C compiler.
But that relationship does not make one language a dialect of the other. The Visual Basic compiler is written in C, but VB is in no way a dialect of C!
Good question!
Answer by Dan
A language compiler contains three distinct sections:
– a Parser: this analyzes what the programmer has typed in and checks the syntax and so on;
– a Tokenizer: this takes the verified source code and converts it into internal tokens, checking for logical inconsistencies in the code;
– an Encoder: this takes the tokenized source code and creates object code from it (the compiled result).
More often than not, the Encoder generates machine code, but there are exceptions: for example, one of the compilers I wrote takes table layout descriptions and writes SQL to generate tables, stored procedures, relationships, etc. in a database.
The Parser uses a syntax table to do its analysis. The most common type of syntax table is a Backus-Naur table (https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form). The first language defined this way was Algol-60 (the notation was in fact invented to describe it); Edsger Dijkstra co-wrote one of its first compilers, and Tony Hoare later famously remarked that it was “not only an improvement on its predecessors but also on nearly all its successors.” Most modern programming languages (C, Pascal, Java, C#, etc.) owe their design to Algol-60. A tiny example of a grammar in this style, with a matching tokenizer and parser, appears after the next two paragraphs.
The Tokenizer relies on a translation table in the compiler which matches a language’s reserved words and symbols to easily recognizable tokens, substituting the one for the other. In most of the compilers I’ve written, the Parse and Tokenize operations happen together: it’s usually easier to tokenise something as soon as you recognise it, rather than in a second pass. If you read the Wikipedia article on Backus-Naur form, you’ll probably be able to see why.
If you write a decent parser and tokenizer, you generally only ever need to write one: the Backus-Naur table you give it determines the structure of the language, and the parser/tokenizer should be able to handle any Backus-Naur table you give it.
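To make that a little more concrete, here is a deliberately tiny sketch of a tokenizer and a recursive-descent parser working together in the way described above. The grammar, the token names and the function names are all invented for this example (no real language is this small), and the parser simply evaluates the expression instead of handing its output on to an Encoder, but the shape is the same: the Backus-Naur rules drive the parse functions, and each construct is tokenized the moment it is recognized.

/* expr.c: toy tokenizer + recursive-descent parser for arithmetic.
 *
 * Grammar (Backus-Naur style), invented for this example:
 *
 *   <expr>   ::= <term>   { "+" <term> }
 *   <term>   ::= <factor> { "*" <factor> }
 *   <factor> ::= NUMBER | "(" <expr> ")"
 */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* ---- tokenizer: characters in, tokens out ---- */
typedef enum { TOK_NUM, TOK_PLUS, TOK_STAR, TOK_LPAR, TOK_RPAR, TOK_END } Token;

static const char *src;   /* current position in the source text */
static Token tok;         /* the token just recognized           */
static long  tok_value;   /* its value, when tok == TOK_NUM      */

static void next_token(void)
{
    while (isspace((unsigned char)*src)) src++;
    switch (*src) {
    case '\0': tok = TOK_END;  return;
    case '+':  tok = TOK_PLUS; src++; return;
    case '*':  tok = TOK_STAR; src++; return;
    case '(':  tok = TOK_LPAR; src++; return;
    case ')':  tok = TOK_RPAR; src++; return;
    default:
        if (isdigit((unsigned char)*src)) {
            tok_value = strtol(src, (char **)&src, 10);
            tok = TOK_NUM;
            return;
        }
        fprintf(stderr, "unexpected character '%c'\n", *src);
        exit(1);
    }
}

/* ---- parser: one function per grammar rule ---- */
static long parse_expr(void);

static long parse_factor(void)    /* <factor> ::= NUMBER | "(" <expr> ")" */
{
    if (tok == TOK_NUM) { long v = tok_value; next_token(); return v; }
    if (tok == TOK_LPAR) {
        next_token();
        long v = parse_expr();
        if (tok != TOK_RPAR) { fprintf(stderr, "expected ')'\n"); exit(1); }
        next_token();
        return v;
    }
    fprintf(stderr, "expected a number or '('\n");
    exit(1);
}

static long parse_term(void)      /* <term> ::= <factor> { "*" <factor> } */
{
    long v = parse_factor();
    while (tok == TOK_STAR) { next_token(); v *= parse_factor(); }
    return v;
}

static long parse_expr(void)      /* <expr> ::= <term> { "+" <term> } */
{
    long v = parse_term();
    while (tok == TOK_PLUS) { next_token(); v += parse_term(); }
    return v;
}

int main(void)
{
    const char *input = "2 + 3 * (4 + 1)";
    src = input;
    next_token();
    printf("%s = %ld\n", input, parse_expr());  /* prints: 2 + 3 * (4 + 1) = 17 */
    return 0;
}

In a real compiler the parse functions would build the tokenized (or tree) form for the Encoder described next, rather than computing the answer directly.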
The Encoder takes previously defined combinations of tokens (there are a limited number in any language) and substitutes object code for them, adding in any numbers, strings and other literals that the programmer has defined in the source code. It works based on a number of rules that the compiler writer has determined will be needed in order to make the language work. In any given language, there are actually surprisingly few of these: programming languages rarely contain more than 20 or 30 inbuilt commands.
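As a rough sketch of that substitution (the PUSH/ADD/MUL opcodes and the little stack machine are invented for illustration, not any real instruction set): the Encoder walks the tokenized form and emits object code for each combination it recognizes, dropping in the programmer’s literals; here the “object code” is just printed as text.

/* encode.c: toy "encoder" that turns an already-tokenized expression
 * (here in postfix order, as a parser might hand it over) into
 * instructions for an imaginary stack machine. */
#include <stdio.h>

typedef enum { T_NUM, T_ADD, T_MUL } Kind;

typedef struct {
    Kind kind;
    long value;               /* used when kind == T_NUM */
} Tok;

static void encode(const Tok *toks, int n)
{
    for (int i = 0; i < n; i++) {
        switch (toks[i].kind) {
        case T_NUM: printf("    PUSH %ld\n", toks[i].value); break;
        case T_ADD: printf("    ADD\n"); break;   /* pop two, push the sum     */
        case T_MUL: printf("    MUL\n"); break;   /* pop two, push the product */
        }
    }
}

int main(void)
{
    /* postfix form of 2 + 3 * 4:  2 3 4 * +  */
    Tok program[] = {
        { T_NUM, 2 }, { T_NUM, 3 }, { T_NUM, 4 }, { T_MUL, 0 }, { T_ADD, 0 }
    };
    encode(program, 5);       /* prints PUSH 2, PUSH 3, PUSH 4, MUL, ADD */
    return 0;
}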
The holy grail of compiler writers is to write an encoder which can encode anything. Over the years I’ve convinced myself that this is in fact impossible, since the encoder would have to understand every combination of every possible programming language ahead of time in order to be able to function. It’s logically conceivable that you could construct your tokenizer so that it always generates structures that the encoder understands, but one only has to think about writing a language for some purpose hitherto unsuspected, and that idea also falls short. In other words, a universal encoder would have to be as intelligent and subjective as a programmer: some kind of AI, perhaps.
Some compilers attempt to optimize the encoded result by analyzing the tokenized source code and trying to find the best possible way to translate it. For example, Sun Microsystems wrote a C compiler back in the 80s which, if it detected a loop which always left variables in the same state after execution, would remove the loop entirely in the object code and simply set the variables to the correct values. This is a very, very tricky type of thing to write, and can only be achieved with a lot of practice in compiler writing.
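Here is a small before/after sketch of that kind of transformation (my own example in C, not Sun’s actual compiler): if the optimizer can prove the loop always leaves its variables in the same state, it can drop the loop entirely and just assign the final values, assuming the loop runs at least once.

/* Before optimization: for n >= 1, the result never depends on how
 * many times the loop runs, so an optimizer that can prove this
 * may remove the loop entirely. */
int before(int n)
{
    int x = 0, y = 0;
    for (int i = 0; i < n; i++) {
        x = 5;        /* the same value on every iteration */
        y = x * 2;    /* therefore also the same           */
    }
    return x + y;
}

/* After optimization: the loop is gone and the variables are simply
 * set to the values the loop would have left them with. */
int after(int n)
{
    (void)n;          /* n no longer affects the result (for n >= 1) */
    int x = 5, y = 10;
    return x + y;
}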
If you’re truly interested in the subject, I’d suggest reading the “Revised Report on the Algorithmic Language ALGOL 60” (edited by Peter Naur), which is available all over the internet these days, along with a few books on compiler writing.
As an aside, it is, fascinatingly, impossible to write a perfect Algol-60 compiler. This has to do with the structure of the FOR loop: the Report leaves certain ways of writing the FOR statement ambiguous, which makes fully deterministic compilation impossible. The compiler writer simply has to decide ahead of time what’s going to happen when the compiler finds that particular structure and make the compiler work that way and that way only. We can’t ask the original authors what they meant… I guess we’ll never know.