Introduction to computer programming

Computer components

When first learning to program, it helps to understand the inner workings of a computer. This is a very brief introduction.

The “heart” of a modern computer is the central processing unit (CPU), which can perform only a small set of operations on numbers. The numbers on which the CPU operates are stored in computer memory. Memory commonly takes two forms:

random-access memory (RAM), which has lower capacity but is fast to access and
storage, which has higher capacity but is slow to access (e.g. a solid-state drive or a hard drive).

There are also a number of ports through which the CPU can communicate with other components, called peripherals. Peripherals include keyboards, monitors, and external storage. A port that carries information from a peripheral to the CPU is called an input port; a port that carries information from the CPU to a peripheral is called an output port.

Numbers represent everything in a computer

The CPU only deals in numbers, so we use numbers to represent everything. This includes basic computing elements such as characters: a character, such as the letter “a”, is encoded as a number. A string, a data type we will learn more about later that is often used to represent a word, is therefore processed and stored as several numbers.
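To make the idea of characters as numbers concrete, here is a minimal sketch in Python (a language chosen for illustration; the text itself does not prescribe one). It uses the built-in functions ord and chr, which convert between a character and its Unicode code point. The word "cab" is just an example.

```python
# Encode a word as the numbers a computer would store for it.
word = "cab"
codes = [ord(ch) for ch in word]   # character -> number (Unicode code point)
print(codes)                       # [99, 97, 98]

# Decode the numbers back into the original string.
decoded = "".join(chr(n) for n in codes)
print(decoded)                     # cab
```

The exact numbers depend on the character encoding in use; the code points shown here coincide with ASCII for these letters.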

The numbers processed by a CPU represent such objects as characters, memory addresses (the “location” in memory where a piece of information is stored), and even instructions for what to do next.

Number systems

The numbers we most often use are in a very specific numeral system: base-10. In base-10, we use ten distinct numerals (0-9) to represent numbers. There are other common numeral systems, however, such as base-2, which only uses the numerals 0 and 1. This is the familiar binary numeral system, and it is ubiquitous in computing. In fact, it is found at the hardware level, because most computer memory stores a piece of information with a physical mechanism that can be in only one of two possible states. This naturally maps to the binary numeral system.

Another common numeral system for computing is the hexadecimal or base-16 numeral system. It uses the numerals 0-9 and the letters a-f, which stand for the values ten through fifteen.
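As a quick illustration, the same number can be written in base-10, base-2, and base-16. The sketch below uses Python's built-in bin, hex, and int functions; the value 202 is an arbitrary example.

```python
n = 202                      # written in base-10

# Show the same number in base-2 and base-16.
print(bin(n))                # 0b11001010
print(hex(n))                # 0xca

# Parse strings written in another base back into an integer.
print(int("11001010", 2))    # 202
print(int("ca", 16))         # 202
```

The prefixes 0b and 0x are Python's way of marking binary and hexadecimal notation; they are not part of the number itself.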

Machine code

Machine code is the lowest-level description of a program. Since processors only work with numbers, this code is nothing but a series of numbers that encode instructions.

There are only a limited number of operations that a processor can perform. The essential functions are:

loading from memory: retrieve a number stored at a given address in memory;
storing to memory: store a number at a given address;
loading from port: get a number from a given port address;
storing to port: store a number to a given port address;
performing arithmetic: perform basic arithmetic on given numbers such as add, subtract, multiply, and divide;
testing: compare numbers with operations like less-than, equal-to, etc.;
conditional jumping: given the outcome of a test, jump to a memory address.
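The operations listed above can be sketched as a toy machine in Python. Everything here is an assumption made for illustration: the opcode numbers, the single accumulator, the two-cells-per-instruction layout, and the sample program do not correspond to any real processor, and port operations are left out for brevity. The program and its data share one memory, so the instructions themselves are just numbers.

```python
# Hypothetical opcodes for a toy machine (not those of any real processor).
HALT, LOAD, STORE, ADD, TEST_LT, JUMP_IF = 0, 1, 2, 3, 4, 5

def run(memory):
    """Execute the program stored in `memory`, starting at address 0."""
    acc = 0        # accumulator: the one number the toy CPU is working on
    flag = False   # outcome of the most recent test
    pc = 0         # program counter: address of the next instruction
    while True:
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2    # each instruction occupies two memory cells
        if opcode == HALT:
            return memory
        elif opcode == LOAD:       # loading from memory
            acc = memory[operand]
        elif opcode == STORE:      # storing to memory
            memory[operand] = acc
        elif opcode == ADD:        # performing arithmetic
            acc += memory[operand]
        elif opcode == TEST_LT:    # testing
            flag = acc < memory[operand]
        elif opcode == JUMP_IF:    # conditional jumping
            if flag:
                pc = operand
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 2}")

# Add up 1 + 2 + ... + 5. Addresses 0-19 hold instructions, 20-23 hold data.
program = [
    LOAD, 22,      # address 0:  acc = i
    ADD, 20,       # address 2:  acc = acc + 1
    STORE, 22,     # address 4:  i = acc
    LOAD, 23,      # address 6:  acc = total
    ADD, 22,       # address 8:  acc = acc + i
    STORE, 23,     # address 10: total = acc
    LOAD, 22,      # address 12: acc = i
    TEST_LT, 21,   # address 14: flag = (i < limit)
    JUMP_IF, 0,    # address 16: if flag, go back to address 0
    HALT, 0,       # address 18: stop
    1,             # address 20: the constant 1
    5,             # address 21: limit
    0,             # address 22: i
    0,             # address 23: total
]

result = run(program)
print(result[23])  # 15
```

Notice that the loop is built from nothing more than a test followed by a conditional jump.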

Hierarchy of software

Machine code is the lowest-level encoding of the program, and technically it is all that is required to write any program. However, it is extremely difficult to use for higher-order operations. Therefore, we create higher-level programming languages that allow us to express what we would like the machine to do, and a translator such as an assembler, compiler, or interpreter turns this into machine code.

The next level above machine code is assembly language, which is usually specific to a given processor. This is very easy for an assembler to translate because it essentially gives a mnemonic, like “add” or “divide”, to each machine code instruction. Assembly language is still cumbersome for programming, so we typically use higher-level languages.
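To show why translating mnemonics is straightforward, here is a minimal sketch of an assembler in Python. The mnemonics, the opcode numbers, and the one-operand instruction format are all hypothetical, invented for this example; real assembly languages are richer and differ from processor to processor.

```python
# Hypothetical mnemonic-to-opcode table for an imaginary processor.
OPCODES = {"halt": 0, "load": 1, "store": 2, "add": 3, "test_lt": 4, "jump_if": 5}

def assemble(source):
    """Translate lines like 'add 11' into a flat list of numbers."""
    machine_code = []
    for line in source.splitlines():
        line = line.split("#")[0].strip()   # drop comments and whitespace
        if not line:
            continue                        # skip blank lines
        parts = line.split()
        mnemonic = parts[0]
        operand = int(parts[1]) if len(parts) > 1 else 0
        machine_code += [OPCODES[mnemonic], operand]
    return machine_code

source = """
load 10    # fetch the number at address 10
add 11     # add the number at address 11
store 12   # write the sum to address 12
halt
"""

print(assemble(source))  # [1, 10, 3, 11, 2, 12, 0, 0]
```

The translation is little more than a table lookup, which is the sense in which assembly is a thin layer over machine code.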

High-level languages are the ones we typically hear about: C, C++, Java, Python, Ruby, MATLAB, etc. These allow us to define data types of rather high complexity (an array, for instance) and operate on this data with complex operations, like transposition.
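As a small illustration of how much a high-level language can express in one line, the sketch below transposes a two-dimensional array in Python, represented here as a list of rows. The variable names and values are illustrative only.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# zip(*matrix) groups the first items of every row, then the second items,
# and so on, which turns rows into columns.
transposed = [list(column) for column in zip(*matrix)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```

Expressed in machine code, the same operation would take many individual loads, stores, and jumps.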