What makes a programming language whole?
Programming language is an immensely overloaded term. It can describe anything from a certain style of logical structuring (such as object-oriented, functional, or script programming) to different layers of the abstraction stack (such as systems, application, or web programming). This long-form post builds a basic framework for a mental model of what a language is, and clarifies various related terms along the way.
In essence, the term can be split into four major and two minor components. The major ones are the language specification, the built-ins, the standard library, and the compiler/interpreter. The minor ones are the community packages and the writing guidelines. These terms will all be explained in the explicit context of the Go programming language, but they apply similarly to most other common programming languages.
Major Programming Language Components
Programming Language Specification
This is the official specification for the Go programming language. It defines the correct syntax for text files written in Go. It explains things like what counts as a proper variable name, how a variable can be initialized, and how to define the return type of a function. It does not, however, do any computing based on the content itself, i.e. it does not spell out which machine code should explicitly be run when a particular syntax token appears in the text. It is merely a high-level description that specifies things like: Go uses the := operator for short variable declarations, whereas Python uses a plain = for assignment.
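To make the kind of rules the specification covers more concrete, here is a minimal sketch showing the two declaration forms and a function with a declared return type. The function name area and the variable names are made up for illustration only.

```go
package main

import "fmt"

// area declares its return type (int) after the parameter list,
// which is where Go's syntax places it.
func area(width, height int) int {
	return width * height
}

func main() {
	// Long form: the var keyword with an explicit type.
	var width int = 3

	// Short form: := declares and initializes in one step,
	// and the type is inferred from the right-hand side.
	height := 4

	fmt.Println(area(width, height)) // prints 12
}
```

Both declaration forms are valid Go; which one to prefer is a matter of style rather than of the specification.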
The Built-ins
Built-ins1 are pieces of functionality that can be used without explicitly importing anything into the code. For example, the len() built-in function can be used to figure out the length of a given string in bytes. It can be used anywhere, and at any time in a program, without requiring special syntax to first make it available. In contrast, fmt.Println() first has to be imported via import "fmt" before it can be used in a piece of code. Built-ins are usually the core building blocks used to describe logic in new custom software. They are the first level of actual runnable code provided by the creators of a language.
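The difference between a built-in and an imported function can be sketched as follows. Note that for strings, len() reports the number of bytes, which only matches the number of characters for plain ASCII text; counting characters requires a standard library package. The helper names byteLen and runeLen are made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLen uses the len built-in, which is available without any import.
func byteLen(s string) int {
	return len(s)
}

// runeLen counts characters (runes) instead, which requires importing
// the unicode/utf8 package from the standard library.
func runeLen(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo": the é takes two bytes in UTF-8

	// fmt.Println is not a built-in; it only works because of the import above.
	fmt.Println(byteLen(s), runeLen(s)) // prints 6 5
}
```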
The Standard Library
Other code that is provided directly by the language developers is considered the standard library. This is also runnable code that is normally shipped alongside the language specification. It is generally meant to provide common, useful tools that any program might need, such as printing text to the terminal, as in the above example of fmt.Println().2 Since it is always shipped directly with the language itself, it is generally easier to import than external community packages. In the case of Go, packages in the standard library can be imported directly by name, such as import "fmt", whereas community packages have to be imported with a path specifier, such as github.com/modernc. However, being easily accessible does not necessarily make them the best solution to every problem. The standard library's update frequency is often limited by the release cycle of the language itself, so community packages have sometimes solved a problem better than the standard library implementation has.3
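The difference in import paths can be illustrated with a small heuristic. This is a rough sketch that relies on a convention rather than on anything in the language specification: standard library paths have no dot in their first path element, while community packages usually start with a host name. The function name looksLikeStdlib is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeStdlib guesses whether an import path belongs to the standard
// library. Assumption: a first path element without a dot (no host name)
// indicates the standard library. This is a convention, not a guarantee.
func looksLikeStdlib(importPath string) bool {
	first, _, _ := strings.Cut(importPath, "/")
	return !strings.Contains(first, ".")
}

func main() {
	fmt.Println(looksLikeStdlib("fmt"))                // prints true
	fmt.Println(looksLikeStdlib("net/http"))           // prints true
	fmt.Println(looksLikeStdlib("github.com/modernc")) // prints false
}
```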
The Compiler/Interpreter
The compiler/interpreter4 is the part of a language that allows the actual execution of code written according to the language specification. Since the Go language is based on compilation,5 I will stick to compilers in this section, but interpreters largely cover the same concepts. Many different programming languages exist, but any one specific CPU architecture only has a specific set of instructions it knows how to execute, known as machine code. The compiler converts the text written according to the language specification into this machine code. Most programming languages have an official reference compiler. In the case of Go, this is the Go compiler. However, once a language specification exists, other compilers can be created that also understand this specification. For example, GCC also includes a frontend for the Go language, gccgo.6
Minor Programming Language Components
The Community Packages
Beyond the standard library, given time, other programmers will also write and publish code written according to a particular language specification. This code becomes the community packages for the programming language. In the case of Go, a limited overview can be found at the Go package index. Unlike some other programming languages, the Go index does not store packages directly, but rather maintains references to the code repositories that hold the community packages. However, the working principle remains the same. It allows coders to download code published (and licensed7) by other creators and incorporate it into their own source code. The only difference between this code and the standard library code is that it is not published directly alongside the specification and compiler, but rather has to be specially downloaded on an as-needed basis.8
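Because community packages are downloaded separately, the Go toolchain records which ones went into a binary. As a rough sketch, a program can list them via the standard library's runtime/debug package. The function name dependencyPaths is made up for this example, and a program that only uses the standard library will simply report no dependencies.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// dependencyPaths returns the module paths of all community packages
// compiled into this binary. It returns nil if no build information
// is embedded in the binary.
func dependencyPaths() []string {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return nil
	}
	var paths []string
	for _, dep := range info.Deps {
		paths = append(paths, dep.Path)
	}
	return paths
}

func main() {
	deps := dependencyPaths()
	fmt.Println("number of external modules:", len(deps))
	for _, path := range deps {
		fmt.Println(path)
	}
}
```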
The Writing Guidelines
Lastly, programming languages come with varying levels of rigor in relation to their writing guidelines. Go has a fairly strict system, largely enforced via the go fmt program, but not all languages have such a strict set of rules.9 These guidelines can easily be confused with the language specification itself, but technically they are design choices that should have no effect on the output created by the compiler.10 To give an example, it is considered more proper to write the following in Go code:
import (
	"fmt"
	"net/http"
)
whereas writing the following instead would still be completely valid according to the language specification alone:
import "fmt"
import "net/http"
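That both styles mean the same thing to the language can be checked with the parser that ships in Go's standard library. The sketch below parses both variants and collects the imported paths; the constant names grouped and separate, the function name importPaths, and the file name demo.go are made up for this example.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// grouped uses the parenthesized import form preferred by the guidelines.
const grouped = `package demo

import (
	"fmt"
	"net/http"
)
`

// separate uses one import statement per package.
const separate = `package demo

import "fmt"
import "net/http"
`

// importPaths parses src and returns its import paths as written in the
// source, including the surrounding quotes.
func importPaths(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range file.Imports {
		paths = append(paths, spec.Path.Value)
	}
	return paths, nil
}

func main() {
	a, errA := importPaths(grouped)
	b, errB := importPaths(separate)
	if errA != nil || errB != nil {
		fmt.Println("parse error:", errA, errB)
		return
	}
	// Both lines list the same two packages in the same order.
	fmt.Println(a)
	fmt.Println(b)
}
```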
SHA256 for this post: e3c3049eec492303a868e41e193941d22b59c176b74878b29a1a228727d3e926
Endnotes
A perhaps easier to digest listing of the Go language's built-ins can be found here. However, it should be noted that the canonical definition is found in the language specification, and not in this package.↩
Even though the standard library is often seen as omnipresent, this is not always the case. For example, in the Rust programming language, the crate (package) containing the standard library (std) is not guaranteed to be usable on embedded hardware devices.↩
Different programming languages have starkly differing opinions about what constitutes a good standard library. For example, Go focuses on having a lot of functionality built into the standard library directly to prevent the need for external packages. In contrast, JavaScript and Rust adhere to the principle that most functionality should be provided by external packages and only absolute necessities should be provided by the official standard library. There is a lot of literature on the question of which approach is best (link, link); I do not wish to comment on the subject any more than necessary.↩
Going into the details of what distinguishes a compiler from an interpreter would be outside the scope of this blog post. To put it simply, a compiler takes in text and outputs machine code directly executable by a computer's CPU, without needing the compiler to be present at runtime. In contrast, an interpreter is a program that is run on the code, executing the instructions of the program in real time, and it therefore must be present to be able to run the program.↩
Technically, it should be noted that a given language specification does not imply that a programming language necessarily must be compiled or interpreted. Based on the same spec, it would be entirely possible to offer both alternatives. However, writing either a reference compiler or interpreter is quite difficult, so in practice most languages are limited to one or the other. In the case of Go, both can be seen somewhat: a Go program can be executed directly via go run (which still compiles it to a temporary binary behind the scenes), while a compiled version can be created with go build.↩
It can be an interesting field of research to understand the advantages/disadvantages of different compiler options for a given language specification. Some outdated comparisons between go build and gccgo can be seen here. It should also be noted that different compilers do not always produce identical machine code for the same input code, because they apply different optimizations to their input.↩
Program licenses are a topic all of their own. In essence, any code written is copyrighted by default to the person who originally wrote it. In order for others to be allowed to reuse it, they must obtain a license for use. Some commonly known licenses are MIT, GPLv3, and BSD-3-Clause, but there are many others as well. For some more info, see here.↩
It also comes with other differences, such as a higher likelihood of supply-chain troubles, varying licenses, different authors, versioning separate from changes to the language specification, etc.↩
For example, Google publishes style guides used internally for many different programming languages, even though these are completely separate from the actual language specifications.↩
Technically, writing guideline changes should not have an effect on the execution of the program. However, this is not always the case. One notable example would be Python code, in which reordering import statements can lead to a difference in code output due to the way different namespaces can be shadowed by a later import. See the section on 5. Shadowing Standard Library Modules for an example here.↩