Vinícius Vidal
Full-Stack Developer
Hardware Basics - Processor (CPU)
I believe that, besides writing code, a programmer should have a minimal understanding of the environment they work in.
I used to think these things didn’t matter either. But over time, I realized that having this knowledge makes you a better programmer and helps with seemingly inexplicable problems.
“With 20% of the knowledge, you can solve 80% of the problems. But to solve the last 20%, you’ll need the 80% of knowledge you don’t yet have.”
Words I heard in a Fabio Akita video. Having the missing 20% will make you exceptional.
What follows is a very simple, high-level look at the processor, because fully understanding how a modern CPU works is overwhelming. “Real processors, memories, disks, and other devices are very complicated and present difficult, awkward, idiosyncratic, and inconsistent interfaces to the people who have to write software to use them.”
Words from the book "Modern Operating Systems, 4th ed", which I’m using as a guide while writing.
I hope those who read this will come away knowing something new. 😁
What is a CPU
The processor, or CPU, short for “Central Processing Unit”, is considered the “brain” of the computer. It runs programs by continuously repeating the following cycle: fetch an instruction from memory, understand what needs to be done (decode), execute the action, and then move on to the next instruction. This process repeats until the end of the program.
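As a mental model, here’s a minimal sketch of that cycle in C. The three-instruction “ISA”, the tiny program, and all the names are invented for illustration; a real CPU does this in hardware, not in a C loop.

#include <stdio.h>

/* A made-up three-instruction ISA, just to illustrate the cycle */
enum { LOAD = 0, ADD = 1, HALT = 2 };

int main(void) {
    int memory[] = { LOAD, 5, ADD, 3, HALT };  /* the "program" in memory */
    int pc = 0;   /* program counter: position of the next instruction */
    int acc = 0;  /* accumulator register */

    for (;;) {
        int instruction = memory[pc++];           /* fetch */
        switch (instruction) {                    /* decode */
            case LOAD: acc = memory[pc++]; break; /* execute */
            case ADD:  acc += memory[pc++]; break;
            case HALT: printf("result: %d\n", acc); return 0;
        }
    }
}

Running it prints result: 8: the loop fetches LOAD 5, then ADD 3, then stops at HALT, exactly the fetch-decode-execute rhythm described above.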
CPUs can have different architectures, so an x86 processor cannot run ARM programs, and an ARM processor cannot run x86 programs. Each architecture has its own set of instructions it can execute.
What exactly is an instruction and how is it executed?
Instructions are basically machine language, a sequence of bits (zeros and ones) that represent commands encoded according to the rules of the processor’s architecture.
But how does the processor understand that this seemingly random sequence (it isn’t!) means a command?
The answer lies in what we call the ISA (Instruction Set Architecture), which works like the processor’s dictionary and defines exactly which instructions it recognizes and how they are represented in bits. That’s why an ARM program won’t run on an x86 processor: the program’s instructions are not present in that “dictionary”.
It’s important to understand: the CPU does not interpret code. When we say it “understands” a command, we mean that the bits of the instruction follow physical paths on the chip and trigger specific transistors that carry out the corresponding operation.
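To make this concrete: on x86, the instruction mov eax, 1 (“put the value 1 in the eax register”) is encoded as just five bytes. The snippet below only stores and prints those bytes; it doesn’t execute them.

#include <stdio.h>

int main(void) {
    /* x86 encoding of "mov eax, 1": opcode 0xB8 followed by the
       value 1 as a 4-byte little-endian integer */
    unsigned char code[] = { 0xB8, 0x01, 0x00, 0x00, 0x00 };

    for (int i = 0; i < 5; i++)
        printf("%02X ", code[i]);
    printf("\n");
    return 0;
}

An ARM chip reading those same bytes would decode something entirely different, or an invalid instruction, which is exactly why the “dictionary” analogy holds.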
Assembly 😰
Writing programs directly in zeros and ones is impractical for humans, so an intermediary is needed. That’s where a slightly friendlier language comes in: Assembly, along with its translator, the assembler, which handles the task of transforming the text into machine language.
Assembly is a low-level language that represents these binary instructions in textual form. Each command in Assembly usually translates directly into a machine instruction, making it an almost 1:1 mirror of what will actually be executed by the processor.
Different Assemblers
As already mentioned, each architecture (x86, ARM, RISC-V...) has its own instruction set. The same command in C can become very different code in each of them, even though the final result is the same.
For each architecture, there is a specific Assembler responsible for translating that architecture’s Assembly into its respective machine code. For example, an x86 assembler doesn’t understand ARM Assembly, and vice versa.
Even if you never write Assembly code in practice, it’s helpful to understand that all high-level code (C, JavaScript, or Python, among others) is ultimately translated into a sequence of simple instructions that the processor actually understands.
Registers
Fetching data from memory is slow compared with executing an instruction, so processors have registers: small, extremely fast storage areas inside the CPU itself. They hold temporary variables, intermediate values, and addresses.
The instruction set includes commands that transfer data between registers and memory. The CPU also contains specialized execution units, such as the ALU (Arithmetic Logic Unit), which performs integer arithmetic and logical operations, and the FPU (Floating Point Unit), which handles operations on real (floating-point) numbers, among others.
Additionally, caches (L1, L2, L3) help reduce memory access time by acting as intermediate layers between the registers and main memory.
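The cache is easy to feel in practice. The sketch below (sizes are arbitrary) sums the same matrix twice: the first loop walks it row by row, in the same order it’s laid out in memory, so the cache is used well; the second jumps between rows and misses the cache far more often. Timing the two separately (for example with the time command, commenting one loop out) typically shows a large difference on real hardware.

#include <stdio.h>

#define N 4096
static int matrix[N][N];  /* zero-initialized; the values don't matter here */

int main(void) {
    long sum = 0;

    /* Row-major walk: consecutive addresses, cache-friendly */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += matrix[i][j];

    /* Column-major walk: jumps N * sizeof(int) bytes per step,
       causing far more cache misses */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += matrix[i][j];

    printf("%ld\n", sum);
    return 0;
}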
Some special registers
In addition to the general-purpose registers, there are some special ones visible to the programmer, such as the program counter, the stack pointer, and the PSW (Program Status Word).
The program counter holds the memory address of the next instruction to be executed, acting as a kind of guide that tells the CPU "what’s next" in the program.
The stack pointer is a register that points to the top of the stack in memory. The stack is a LIFO (Last In, First Out) structure used mainly to manage function calls. Whenever a function is called, the computer creates a stack frame, a reserved area on the stack that stores the function’s input parameters, local variables, and temporary values. When the function ends, this frame is removed from the stack and the pointer is updated.
For example:
function sum(a, b) {
  const result = a + b;
  return result;
}
In the call sum(2, 3), a frame with a = 2, b = 3, and result = 5 is allocated on the stack. The stack pointer points to this frame while the function executes; when the function returns, the frame is popped off the stack and the pointer goes back to its previous position.
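One way to “see” the stack pointer moving is to print the address of a local variable at different call depths. On most common platforms the stack grows toward lower addresses, so the inner frame prints a smaller address (the exact values change from run to run).

#include <stdio.h>

void inner(void) {
    int local = 0;
    printf("inner frame at %p\n", (void *)&local);  /* deeper call, lower address */
}

void outer(void) {
    int local = 0;
    printf("outer frame at %p\n", (void *)&local);
    inner();  /* pushes a new frame on top of outer's */
}

int main(void) {
    outer();
    return 0;
}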
Finally, the PSW (Program Status Word) is another essential register, as it stores various pieces of information about the CPU’s current state, including: condition bits (set by comparison instructions, such as whether one number is greater than another), the CPU priority, the operation mode (user mode or kernel mode), and other control bits. User programs can usually read the entire PSW but can only write to certain fields. The PSW is especially important for the functioning of system calls (explained later) and I/O, as it helps control the behavior and privileges of the code being executed.
32-bit vs. 64-bit
You’ve probably heard of 32-bit and 64-bit processors, and that 64-bit ones are better. But why?
These two options refer to the width of the CPU’s registers, which directly affects memory addressing capacity, the size of operands that can be handled, and the overall system performance.
Keep in mind that when we talk about width, we’re referring to the number of bits the registers can store, not their physical size.
A 32-bit system can address a maximum of 2³² bytes, which is about 4 GB (gigabytes) of RAM. In other words, the processor can only access and use up to 4 GB of memory at once. Because of this, these systems aren’t suitable for running heavy software or applications that demand lots of resources.
A 64-bit system, on the other hand, can theoretically address up to 2⁶⁴ bytes, an impressive 16 million terabytes of RAM! Obviously, no current device comes anywhere close to that limit, and probably won’t for many decades, but the important point is that 64-bit systems can use far more than the 4 GB that caps 32-bit systems.
Beyond memory, the width of the registers also influences performance. With 64-bit registers, the processor can handle larger numbers and perform operations with greater precision and speed, reducing the number of instructions needed to work with large volumes of data. This means programs compiled for 64-bit can be faster and more efficient than their 32-bit versions.
Finally, 64-bit operating systems can run both 64-bit and 32-bit programs (thanks to compatibility modes), but the reverse isn’t true: 32-bit systems can’t run 64-bit programs or use more than 4 GB of RAM, regardless of how much is installed on the machine.
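A quick way to check which of the two worlds a C program was compiled for is to print the size of a pointer: 4 bytes on a 32-bit target, 8 bytes on a 64-bit one.

#include <stdio.h>

int main(void) {
    /* A pointer must be able to hold any address the program can use,
       so its size reflects the width of the address space */
    printf("pointer size: %zu bytes (%zu bits)\n",
           sizeof(void *), sizeof(void *) * 8);
    return 0;
}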
Kernel mode vs. User mode
The processor operates in two main modes: kernel mode and user mode, and this distinction is essential for the security, stability, and functioning of the operating system.
In kernel mode, the processor has full access to all system resources, such as memory, input/output (I/O) devices, and privileged operations. This mode is used by the operating system to manage hardware, control processes, and allocate resources. Code executed in this mode can perform critical operations, such as direct memory manipulation and device control.
User mode is restricted and used for running programs and applications. In this mode, code has limited access to system resources, without the ability to directly access memory or critical devices. This ensures that failures in user programs do not compromise the integrity of the operating system or other processes.
When a program in user mode needs to perform an operation that requires elevated permissions, such as accessing system files or allocating memory, it makes a system call. When a system call is made, the user-mode code triggers a software interrupt, causing a transition from user mode to kernel mode. The operating system then performs the requested task and, after completing it, the processor returns to user mode, allowing the program to resume execution. The execution flow is as follows:
- Program in user mode requests access to a privileged resource.
- System call is made → transition to kernel mode.
- Kernel performs the operation.
- Return to user mode.
An example of a system call in C:
#include <stdio.h>
#include <unistd.h>

int main() {
    char *pathname = "example.txt";

    /* access() asks the kernel whether the file can be read (R_OK) */
    if (access(pathname, R_OK) == -1) {
        printf("%s: File not found or not readable\n", pathname);
    } else {
        printf("%s: File exists and can be read\n", pathname);
    }

    return 0;
}
In this example, the user-mode program uses the access function to check whether the file example.txt can be read. This function makes a system call, which triggers the transition to kernel mode. The kernel checks the permissions and returns the result to the program. After that, the program continues executing in user mode.
The PSW, mentioned earlier, controls the transition between kernel and user modes. It records the current operation mode and, when a system call or interrupt occurs (executions can be interrupted), it is updated to reflect the context switch. After executing the system call, the PSW ensures that the processor safely returns to user mode, preserving the system state.
Interrupts
During program execution, events may occur that require the CPU’s immediate attention. To handle these events properly, processors have a mechanism called interrupts. Interrupts can be classified into three main categories:
- Hardware interrupts
These are generated by physical devices (such as keyboard, mouse, disk, network card). Examples:
- Keyboard: key pressed → interrupt → CPU reads the character.
- Disk: when it finishes reading/writing data.
Such events require immediate and efficient handling.
- Software interrupts
These are triggered by programs in execution that request services from the operating system—such as system calls, as explained earlier.
- Exceptions (or internal interrupts)
These are generated by the CPU itself during the execution of instructions that result in some abnormal condition.
Examples:
- Division by zero.
- Invalid memory access.
- Illegal instruction.
In such cases, the interrupt serves as a signal that something went wrong, and the system may react in different ways: terminating the process, throwing an error, or even restarting the system in more critical situations.
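On Unix-like systems, many of these events eventually reach user programs as signals. As a minimal sketch, the program below installs a handler for SIGINT, the signal delivered when you press Ctrl+C: the keyboard generates a hardware interrupt, the kernel handles it, and the process is notified asynchronously.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>

volatile sig_atomic_t interrupted = 0;

void handle_sigint(int signum) {
    (void)signum;
    interrupted = 1;  /* just set a flag; do real work outside the handler */
}

int main(void) {
    signal(SIGINT, handle_sigint);

    while (!interrupted)
        pause();  /* sleep until a signal arrives */

    printf("Got Ctrl+C (SIGINT), exiting cleanly\n");
    return 0;
}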
Pipelining
In order to optimize performance, processor designers created a technique called pipelining. This approach allows multiple instructions to be processed simultaneously, but at different stages. Instead of waiting for one instruction to be fully executed before starting the next, pipelining breaks down the execution process into several steps (such as fetch, decode, and execute), and each step can work on part of an instruction while other instructions move to subsequent stages. This enables the CPU to perform more operations in less time, significantly improving performance and efficiency.
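An idealized model shows why this pays off: with S stages and one instruction completing per cycle once the pipeline is full, N instructions take S + (N − 1) cycles instead of S × N. The numbers below are illustrative and ignore real-world stalls and branch mispredictions.

#include <stdio.h>

int main(void) {
    long stages = 5;          /* e.g., fetch, decode, execute, memory, write-back */
    long instructions = 1000;

    long without_pipeline = stages * instructions;     /* one instruction at a time */
    long with_pipeline = stages + (instructions - 1);  /* stages overlapped */

    printf("without pipelining: %ld cycles\n", without_pipeline);  /* 5000 */
    printf("with pipelining:    %ld cycles\n", with_pipeline);     /* 1004 */
    return 0;
}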
What is a core
A core is an independent processing unit within a CPU. Each core is basically a miniature CPU, with its own ability to fetch, decode, and execute instructions. In a processor with multiple cores, these cores can work together on different tasks, or even split a larger task between them; on the software side, this parallelism is exploited through multithreading.
Multithreading
In 1965, Gordon Moore, co-founder of Intel, formulated what’s known as Moore’s Law, which predicted that the number of transistors on a silicon chip would double approximately every two years, resulting in exponential increases in computing power.
With this growing abundance of transistors, the next step was to increase parallel execution capacity within processors. Instead of relying solely on a single line of execution, engineers began designing CPUs capable of handling multiple tasks simultaneously, giving rise to modern multithreading.
A thread is the smallest unit of processing that can run independently within a program (I’ll talk more about it in another post). That’s where the term multithreading comes from: more than one thread executing in parallel.
Dual-core processors have two cores, meaning they can execute two threads in parallel. Quad-core processors have four cores, and so on.
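Here’s a minimal sketch of multithreading in C using POSIX threads (the thread count and the work function are arbitrary). Each thread runs work() independently, and on a multi-core CPU the operating system is free to schedule them on different cores. Compile with -pthread.

#include <stdio.h>
#include <pthread.h>

void *work(void *arg) {
    long id = (long)arg;
    printf("thread %ld running\n", id);  /* possibly on its own core */
    return NULL;
}

int main(void) {
    pthread_t threads[4];

    for (long i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, work, (void *)i);

    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);  /* wait for every thread to finish */

    return 0;
}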
Hyperthreading (or SMT - Simultaneous Multithreading)
Imagine that a processor core is like a workstation with all the tools needed to perform tasks. When it’s processing something, not all of its internal resources are in use at every moment: some parts may sit idle, waiting on memory or on a previous operation to finish.
Hyperthreading, Intel’s trademarked name for SMT, tries to exploit this. It allows a single core to run two threads at the same time, sharing internal resources as efficiently as possible. Instead of letting parts of the core sit idle, it uses them to execute instructions from the other thread while the first one waits.
In practice, the operating system sees two logical cores, but physically it’s just one. The actual performance gain depends on the workload: in some cases, the boost is small; in others, it can be significant.
It’s important to note that it doesn’t double the performance, since both threads still compete for the same physical resources—but it helps make better use of what already exists.
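You can see the difference between logical and physical cores from software. On Linux and macOS (this is a common extension, not strictly standard POSIX), sysconf reports the number of logical processors; on a 4-core CPU with hyperthreading it typically prints 8.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Logical processors currently online; with SMT this is usually
       twice the number of physical cores */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical cores: %ld\n", cores);
    return 0;
}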
And that’s it for today.
See you next time, dev! 🤠