Registers & Memory

If you've stayed with me so far, do you remember our first warmup code? We used some registers there, but we only saw numbers being moved into registers (mov ax, 6 and mov bx, 9) and then combined. What are registers? Why do we use them, and how many types of registers are out there?

We don't know any of that yet, so now we dive deep into just that. Buckle up, pick up your bottle of water, take a sip, and let's learn.

Registers

In the CPU (Central Processing Unit), we have small but very fast storage locations called registers. Registers are used to hold data temporarily while the CPU is processing it.

The size of registers is much smaller compared to the main memory (RAM), because the CPU only needs to keep small chunks of data at a time for immediate operations. The size of a register is decided by the CPU’s design. For example:

On a 32-bit CPU (x86), the general-purpose registers are 32 bits wide (4 bytes).
On a 64-bit CPU (x64), the general-purpose registers are 64 bits wide (8 bytes).

This means a register can only hold data up to its width. If the data is larger, the CPU must use multiple registers or memory to handle it.
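To make the width limit concrete, here is a small Python sketch (Python rather than assembly, so we can run it anywhere). The helper names `fits_in_register` and `split_into_words` are made up for this illustration; they only model the arithmetic, not any real CPU instruction.

```python
def fits_in_register(value, width_bits):
    """Return True if an unsigned value fits in a register of the given width."""
    return 0 <= value < (1 << width_bits)


def split_into_words(value, width_bits, count):
    """Split an unsigned value into `count` register-sized pieces, low piece first."""
    mask = (1 << width_bits) - 1
    return [(value >> (i * width_bits)) & mask for i in range(count)]


print(fits_in_register(0xFFFF, 16))   # True: the largest 16-bit value
print(fits_in_register(0x12345, 16))  # False: needs more than 16 bits

# A value that is too wide gets spread over two 16-bit registers.
low, high = split_into_words(0x12345, 16, 2)
print(f"high={high:04X}H low={low:04X}H")  # high=0001H low=2345H
```

The same idea shows up later in real code, where one register holds the high part of a value and another holds the low part.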

That’s why CPUs don’t have just one register, but several — each with a specific purpose. The most common categories are the following:

Categories of Registers

General Purpose Registers
Index Registers
Segment Registers
Instruction Pointer Register
Stack Registers
Flag Registers

All x86 CPUs have registers in these categories. Their sizes vary with the processor generation (16-bit, 32-bit, or 64-bit), and later generations add more registers, but these categories carry over.

Now we start learning about each of the above categories one by one, but before that, here's a warning.

Each register category contains several registers of its own. Take a quick break to grab some water if you need it.

1. General Purpose Registers

General Purpose Registers are the registers that the CPU uses to store temporary data while performing operations. They are called “general purpose” because they are not fixed for one single task. The programmer can use them for calculations, data storage, or memory addressing as needed.

In 16-bit assembly programming (which we are focusing on in DOSBox), the general-purpose registers are AX, BX, CX, and DX. Each of these registers is 16 bits wide, and each one can also be divided into two separate 8-bit registers: the higher byte and the lower byte.

AX (Accumulator Register)
The AX register is often used in arithmetic, logic, and data transfer operations. Some instructions have shorter encodings when they use AX, and others, such as MUL and DIV, use it implicitly.
- AH = higher 8 bits of AX
- AL = lower 8 bits of AX
BX (Base Register)
The BX register is commonly used as a base pointer for memory access. It can hold the address of data in memory and is often used in addressing modes.
- BH = higher 8 bits of BX
- BL = lower 8 bits of BX
CX (Count Register)
The CX register is mainly used for counting in loops and for shift/rotate instructions. Loop instructions like LOOP decrement it automatically, and its low byte CL can hold the count for shifts and rotates.
- CH = higher 8 bits of CX
- CL = lower 8 bits of CX
DX (Data Register)
The DX register is often used in multiplication, division, and input/output operations. For example, when a 16-bit multiplication produces a 32-bit result, DX stores the higher 16 bits while AX stores the lower 16 bits.
- DH = higher 8 bits of DX
- DL = lower 8 bits of DX
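The way DX and AX share a 32-bit result can be sketched in Python. This is only a model of the arithmetic behind an unsigned 16-bit multiply; the function name `mul16` is hypothetical and chosen for this example.

```python
def mul16(ax, operand):
    """Model an unsigned 16-bit multiply.

    Returns (dx, ax): the high and low 16-bit halves of the 32-bit product.
    """
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    return (product >> 16) & 0xFFFF, product & 0xFFFF


dx, ax = mul16(0x1234, 0x0100)
print(f"DX={dx:04X}H AX={ax:04X}H")  # DX=0012H AX=3400H
```

The full product 123400H does not fit in 16 bits, so the upper digits land in DX and the lower four hex digits stay in AX.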

Before moving to the next section, let me clarify the use of the higher and lower parts of a register. As explained above, a 16-bit register like AX can be divided into two 8-bit parts (AH and AL). In the code, we can write it like this:

mov ah, 9
mov al, 6

Notice how we are using the two separate divisions of AX. This allows us to work with only the high byte or the low byte depending on our data, which can sometimes help in optimizing the program. The same applies to the other general-purpose registers as well.
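If you want to see what those two moves leave in the full register, here is a minimal Python model. The class `Reg16` is an assumption made up for this sketch; it just treats a 16-bit register as one number with a high-byte view and a low-byte view.

```python
class Reg16:
    """A 16-bit register with separately addressable high and low bytes."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    @property
    def high(self):
        return (self.value >> 8) & 0xFF

    @high.setter
    def high(self, byte):
        # Replace the upper 8 bits, keep the lower 8 bits untouched.
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self):
        return self.value & 0xFF

    @low.setter
    def low(self, byte):
        # Replace the lower 8 bits, keep the upper 8 bits untouched.
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Reg16()
ax.high = 9  # models: mov ah, 9
ax.low = 6   # models: mov al, 6
print(f"AX = {ax.value:04X}H")  # AX = 0906H
```

Writing to AH or AL changes only that half, so after both moves the full register reads 0906H.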

Phew, that’s a lot, but hopefully it makes sense. If it does, then we shouldn’t have any problem understanding our previous warm-up code — specifically the register part. I know I haven’t explained what mov actually does yet, but now we should at least have no trouble understanding what AX and BX mean in that code.

2. Index Registers

Index Registers are mainly used for addressing memory locations, especially when working with arrays or strings. Their main purpose is to hold offsets (distances from the start of a segment) that point into memory.

In 16-bit assembly, the index registers are:

SI (Source Index)
Usually holds the source offset in memory, often used in string operations.
DI (Destination Index)
Usually holds the destination offset in memory, also important for string operations.

When we practically start working with memory, we will see how SI and DI are used together with segment registers to access different parts of memory. For now, just remember that these registers exist and are mainly related to memory handling. We will understand them more clearly once we start writing memory-related code.
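As a rough preview, the Python sketch below models how a source index and a destination index walk through memory together during a copy. It is a simplification under stated assumptions: segments are ignored, memory is a plain `bytearray`, both indexes move upward, and the name `copy_bytes` is invented for this example.

```python
def copy_bytes(memory, si, di, cx):
    """Copy cx bytes from offset si to offset di, advancing both indexes.

    Returns the final (si, di) values.
    """
    while cx > 0:
        memory[di] = memory[si]
        si += 1
        di += 1
        cx -= 1
    return si, di


memory = bytearray(32)
memory[0:5] = b"HELLO"          # source data at offset 0
si, di = copy_bytes(memory, si=0, di=16, cx=5)
print(memory[16:21].decode(), si, di)  # HELLO 5 21
```

SI always says where the next byte comes from and DI says where it goes, which is exactly the role the two registers play in string operations.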

3. Segment Registers

In the early days of computing, CPUs like the 8080 and 8085 were limited in size. They were 8-bit processors, and their buses were also small. For example:

8080 → had an 8-bit data bus and a 16-bit address bus, which allowed access to only $2^{16} = 65536$ memory locations (64 KB).
8085 → also an 8-bit processor, with the same 16-bit address bus and a slightly improved design.

At that time, the CPU used a flat addressing model: the processor placed a complete address directly on the address bus and could reach memory only up to the limit of that bus. This created a problem: even if more physical memory was added, the CPU still could not generate larger addresses because it was constrained by its fixed address width.

As technology advanced, new processors were built — such as the 8086 family (the ancestors of modern x86 CPUs). These CPUs introduced a new idea called the segmented memory model. In this model, memory is divided into different segments: code, data, stack, and extra. Each segment is referenced via a segment register.

This design had two main goals:

To make it easier to bring programs written for older CPUs over to newer CPUs, since each 64 KB segment looks like the whole address space those programs expected.
To let the CPU access more memory than what was possible with a single 16-bit address.

The main segment registers are:

CS (Code Segment) → points to where the program instructions are stored. Works together with IP (Instruction pointer).
DS (Data Segment) → points to where general data is stored.
SS (Stack Segment) → points to the stack area in memory. Works with SP (Stack pointer) and BP (Base pointer).
ES (Extra Segment) → provides an additional area for data, often used with DI (Destination index).

When we practically work with memory, we will see how these segment registers combine with offsets (like IP, SI, DI, etc.) to calculate the actual memory address. For now, just keep in mind that segment registers were introduced to overcome the memory-limitations of older CPUs and are still part of the x86 design today.

4. Instruction Pointer Register

The Instruction Pointer (IP) register always holds the memory address of the next instruction to be executed by the CPU. After an instruction is executed, the CPU automatically updates IP to point to the following instruction in memory.

In 16-bit assembly, the IP register works together with the CS (Code Segment) register. The CS register identifies where the current code segment starts, and the IP register provides the offset within that segment. Together, CS:IP points to the exact location of the next instruction in memory.

The IP register cannot be directly modified using normal instructions like MOV. Instead, it is changed indirectly when control flow instructions are executed, such as:

JMP → to jump to a different part of the code.
CALL → to call a procedure.
RET → to return from a procedure.
Interrupt instructions → which also update IP to point to the interrupt handler.

We will learn about these instructions fully in later sections. For now, just press the "I believe" button and try to understand the purpose behind each.

This makes the Instruction Pointer one of the most important registers, as it controls the sequence in which the program runs.
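To picture how IP moves, here is a toy Python model. Everything in it is hypothetical: the four-instruction program, its layout, and the function name `run` exist only for this sketch. The one rule it demonstrates is the real one: IP normally advances by the length of the current instruction, and a jump overwrites it with a new offset.

```python
# Hypothetical toy program: offset -> (mnemonic, length in bytes, jump target or None)
program = {
    0x0000: ("mov", 3, None),
    0x0003: ("jmp", 2, 0x0008),
    0x0005: ("add", 3, None),   # never reached: the jump skips it
    0x0008: ("hlt", 1, None),
}


def run(program, ip=0):
    """Follow IP through the program and return the list of (ip, mnemonic) visited."""
    trace = []
    while True:
        mnemonic, length, target = program[ip]
        trace.append((ip, mnemonic))
        if mnemonic == "hlt":
            return trace
        # Normal flow: step past this instruction. Jump: load the target into IP.
        ip = target if target is not None else ip + length


for ip, mnemonic in run(program):
    print(f"IP={ip:04X}H  {mnemonic}")
```

The trace visits offsets 0000H, 0003H, and 0008H, so the instruction at 0005H never runs. We never wrote to IP directly; the jump did it for us.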

5. Stack Registers

Stack Registers are used to manage the stack in memory. A stack is a special area in memory that works on the principle of Last In, First Out (LIFO). It is mainly used for storing temporary data such as function parameters, return addresses, and local variables.

There are two important stack-related registers in 16-bit assembly:

SP (Stack Pointer)
The Stack Pointer register always points to the current top of the stack. Whenever data is pushed onto the stack, SP decreases (because the stack in x86 grows downward in memory). When data is popped from the stack, SP increases.
BP (Base Pointer)
The Base Pointer register is mainly used to access data stored in the stack. Unlike SP, the BP register does not change automatically during push or pop operations. Instead, it is used by the programmer (or compiler) to reliably reference variables or parameters inside the stack frame.

Both SP and BP work together when dealing with stack operations, especially in function calls and local variable management. These things will make more sense later, when we work with the stack in practice. For now, just try to absorb what I have written above.
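Here is a small Python sketch of how SP behaves during push and pop. It is a model under simplifying assumptions: the class name `Stack16` is made up, the stack area is a tiny `bytearray`, segments are ignored, and there are no overflow checks.

```python
class Stack16:
    """Toy model of a downward-growing stack of 16-bit words."""

    def __init__(self, size=16):
        self.memory = bytearray(size)
        self.sp = size  # empty stack: SP sits just past the end of the area

    def push(self, word):
        self.sp -= 2                                   # SP decreases first
        self.memory[self.sp] = word & 0xFF             # low byte at the lower address
        self.memory[self.sp + 1] = (word >> 8) & 0xFF  # high byte above it

    def pop(self):
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2                                   # SP increases afterwards
        return word


stack = Stack16()
stack.push(0x1234)
stack.push(0xABCD)
print(stack.sp)                  # 12: two words pushed, SP moved down by 4
print(f"{stack.pop():04X}H")     # ABCDH: last in, first out
print(f"{stack.pop():04X}H")     # 1234H
print(stack.sp)                  # 16: back where we started
```

Notice that the value pushed last comes out first, and that SP ends up exactly where it began once every push has been matched by a pop.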

6. Flag Registers

Flag Registers are special registers that store the status of the CPU after performing an operation. Each flag is a single bit that can either be 0 (cleared) or 1 (set). These flags help the CPU make decisions during program execution, such as whether the result of an operation was zero, negative, or if an overflow occurred.

In 16-bit assembly, this register is simply called FLAGS. Some of the important flags inside it are:

ZF (Zero Flag): Set if the result of an operation is zero.
SF (Sign Flag): Indicates the sign of the result. It copies the most significant bit, so it is set when the result is negative in signed arithmetic.
CF (Carry Flag): Set if an arithmetic operation generates a carry out of the most significant bit, or a borrow in subtraction.
OF (Overflow Flag): Set if the signed result of an operation is too large or too small to fit in the destination register (signed overflow).
PF (Parity Flag): Set if the number of 1-bits in the low byte of the result is even. It was originally intended for detecting data corruption.
AF (Auxiliary Carry Flag): Set if there is a carry out of the lower nibble (from bit 3 into bit 4). Mostly used in Binary Coded Decimal (BCD) arithmetic.
IF (Interrupt Enable Flag): Controls whether the CPU will respond to hardware interrupts.
DF (Direction Flag): Determines the direction for string operations. If set, strings are processed from high memory to low memory; if cleared, from low to high.

The status flags (ZF, SF, CF, OF, PF, AF) change automatically depending on the result of an instruction, while the control flags (IF, DF) are set or cleared explicitly by the programmer. The programmer can also test the status flags using conditional instructions like JZ (Jump if Zero), JC (Jump if Carry), and so on.
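To see the status flags in action, here is a Python sketch that works out what a 16-bit addition would leave in them. The function name `add16_flags` is invented for this example, and it models only the six status flags for an ADD, not the whole FLAGS register.

```python
def add16_flags(a, b):
    """Model a 16-bit ADD: return (result, flags) where flags maps names to booleans."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    flags = {
        "CF": total > 0xFFFF,                            # carry out of bit 15
        "ZF": result == 0,                               # result is zero
        "SF": bool(result & 0x8000),                     # copy of bit 15
        # Signed overflow: operands share a sign, result has the other sign.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x8000),
        "PF": bin(result & 0xFF).count("1") % 2 == 0,    # even 1-bits in low byte
        "AF": (a & 0xF) + (b & 0xF) > 0xF,               # carry out of bit 3
    }
    return result, flags


result, flags = add16_flags(0xFFFF, 0x0001)
print(f"{result:04X}H", [name for name, on in flags.items() if on])
# 0000H ['CF', 'ZF', 'PF', 'AF']

result, flags = add16_flags(0x7FFF, 0x0001)
print(f"{result:04X}H", [name for name, on in flags.items() if on])
# 8000H ['SF', 'OF', 'PF', 'AF']
```

The two cases show the difference between CF and OF: FFFFH + 1 wraps around as an unsigned number (carry), while 7FFFH + 1 flips the sign of a signed number (overflow).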

Now, let's circle back to that Fig 1.2. After everything we've covered, do the different sections make more sense to you?

I know this is a lot, but what can we expect? Assembly is a very powerful language, and with great power comes great learning (sorry Spiderman). We have to be responsible, and to do that, we have to learn each topic completely. Just be honest with yourself. If you do that, I assure you, things will get interesting, slowly but surely.

Memory

Earlier in the debugger (Fig 1.2), you saw the Data Window, which showed the content stored in memory. That was our first visual interaction with memory, although it didn't make much sense at the time. Now we go deeper and understand what memory actually is, how it works with the CPU, and how assembly language lets us access it. Hopefully things will make sense then.

What is Memory?

Memory is the storage area of the computer where instructions and data are kept. Unlike registers, which are tiny and very fast storage locations inside the CPU, memory is large and relatively slower.

In our warmup code, values like 6 and 9 were stored inside registers (AX, BX). But what if we need to work with bigger data sets or store things for longer than just a few instructions? That’s where memory comes in.

So the key difference is:

Registers → small, extremely fast, but limited in size.
Memory (RAM) → large, slower, but can store a lot more data.

The CPU cannot directly access everything in memory all at once. It uses a set of system buses to communicate with memory.

System Buses

The CPU connects with memory and other devices using special pathways called buses. Think of buses as electrical highways that carry signals back and forth.

There are three main buses:

Data Bus
- Transfers the actual data between the CPU, memory, and I/O devices.
- Works in half-duplex mode, i.e. data can travel in both directions, but it is either read or written at any one time.
- For example, if the CPU wants to read a number from memory, that number travels through the data bus.
- The width of the data bus (8-bit, 16-bit, 32-bit, etc.) defines how many bits can be transferred at once.

Address Bus
- Used to specify the location in memory where the CPU wants to read from or write to.
- Works in simplex mode, i.e. the address is sent in one direction only, from the CPU to memory and devices.
- For example, if the CPU wants to access memory cell 0x2000, that address will be placed on the address bus.
- The width of the address bus decides the maximum addressable memory. A 16-bit address bus can access 2^16 = 65,536 locations (64 KB of memory).

Control Bus
- Carries control signals that tell whether the CPU is reading, writing, or performing some other operation.
- Works in half-duplex mode, i.e. the CPU sends read/write signals to devices, and devices can send signals back as well, one at a time of course.
- For example: “Read from memory” or “Write to memory”.

Together, these buses allow the CPU to interact with memory step by step:

CPU places the memory address on the address bus.
CPU signals whether it wants to read or write on the control bus.
The actual data travels over the data bus.
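The three steps can be sketched as a toy Python model. This is only an illustration: the class name `Bus`, the method `cycle`, and the `"READ"`/`"WRITE"` strings are assumptions made up here, and real hardware does this with electrical signals and timing that the sketch ignores.

```python
class Bus:
    """Toy memory reached through an address, a control signal, and a data value."""

    def __init__(self, address_bits=16):
        self.memory = bytearray(1 << address_bits)
        self.address_mask = (1 << address_bits) - 1

    def cycle(self, address, control, data=None):
        # Step 1: address bus. Only `address_bits` wires exist, so extra bits are lost.
        address &= self.address_mask
        # Step 2: control bus says which way the data will flow.
        if control == "WRITE":
            # Step 3: data bus carries the byte from the CPU to memory.
            self.memory[address] = data & 0xFF
            return None
        if control == "READ":
            # Step 3: data bus carries the byte from memory to the CPU.
            return self.memory[address]
        raise ValueError(f"unknown control signal: {control}")


bus = Bus(address_bits=16)
bus.cycle(0x2000, "WRITE", 0x42)
print(f"{bus.cycle(0x2000, 'READ'):02X}H")   # 42H
print(f"{bus.cycle(0x12000, 'READ'):02X}H")  # 42H: bit 16 does not fit on a 16-bit bus
```

The last line shows why the address bus width matters: an address that needs 17 bits simply cannot be expressed, so it lands on cell 2000H again.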

Logical Address and Physical Address

Earlier, when we talked about Segment Registers, we learned about the segmented memory model, which divides memory into different parts such as Code, Data, Stack, and Extra. Each of these segments has a starting location that is identified by a segment register (CS, DS, SS, ES). At that point, we only discussed their role in program organization.

Now we go a step deeper. Memory is simply a collection of storage cells. Each cell has a unique number, which we call an Address. The CPU uses these addresses to read or write data in memory.

Registers store data directly, but memory is much larger than registers. To work with memory, we don’t refer to the content directly — instead, we refer to it through its address. That is why addressing becomes so important.

The Addressing Problem

The 8086 CPU was designed with a 20-bit address bus. That means it could generate addresses from 00000H up to FFFFFH, giving access to 1 MB of memory.

(The H suffix means the number is written in hexadecimal notation.)
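If you want to check these sizes yourself, here is a quick Python sketch. The function name `address_space` is hypothetical; it just computes how many locations a given number of address bits can name.

```python
def address_space(address_bits):
    """Return (number of addressable bytes, highest address) for a given address width."""
    size = 1 << address_bits
    return size, size - 1


for bits in (16, 20):
    size, last = address_space(bits)
    print(f"{bits}-bit: {size} bytes ({size // 1024} KB), last address {last:X}H")
# 16-bit: 65536 bytes (64 KB), last address FFFFH
# 20-bit: 1048576 bytes (1024 KB), last address FFFFFH
```

Four extra address bits multiply the reachable memory by 16, from 64 KB to 1024 KB (1 MB).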

But here is the issue:

Registers inside the CPU (including segment registers) are only 16 bits wide.
A 16-bit value can represent only 0 to FFFFh, which is 64 KB.

So how can a CPU with only 16-bit registers handle memory that needs 20-bit addresses?

The solution is the segmented memory model, which we already introduced. The CPU does not store the full 20-bit address in a single register. Instead, it breaks it into two parts:

Segment Register (CS, DS, SS, ES) → holds the upper portion: it selects where a 64 KB segment starts.
Offset (like IP, SI, DI, BP, SP) → holds the lower portion: the distance from the start of that segment.

Together, they form what we call a Logical Address:

Segment : Offset

But the memory hardware cannot work with segment:offset directly. Inside the CPU, this logical address is converted into a single Physical Address (the actual 20-bit number sent out on the address bus).

The conversion is done by the formula:

Physical Address = (Segment × 10h) + Offset

This way, even though the registers are only 16 bits wide, the CPU can still form 20-bit addresses and reach the entire 1 MB memory space.
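The formula is easy to try out in Python. This is a minimal sketch, with `physical_address` as a made-up name; the final mask assumes the 8086 behaviour where the result is limited to the 20 address lines.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit physical address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # (Segment x 10H) + Offset, kept to 20 bits (assumption: 8086-style wraparound).
    return (segment * 0x10 + offset) & 0xFFFFF


print(f"{physical_address(0x2000, 0x0010):05X}H")  # 20010H
print(f"{physical_address(0x2001, 0x0000):05X}H")  # 20010H
print(f"{physical_address(0xFFFF, 0x000F):05X}H")  # FFFFFH
```

The first two lines give the same answer on purpose: different segment:offset pairs can name the same physical cell, because segments overlap every 16 bytes.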

Now we look at a couple of exercises that convert logical addresses into physical addresses. The objective is to get familiar with these concepts. The method I'm going to share is a shortcut I learned; the actual hardware conversion works slightly differently, but rest assured we get the same result either way.

Numerical Problems

Problem 1: If the Code Segment register holds 1234H and the Instruction Pointer holds 0022H, find the physical address.

Solution

To convert a logical address into a physical address by hand, we need a table of decimal, binary, and hexadecimal numbers.

Decimal   Binary   Hexadecimal
0         0000     0
1         0001     1
2         0010     2
3         0011     3
4         0100     4
5         0101     5
6         0110     6
7         0111     7
8         1000     8
9         1001     9
10        1010     A
11        1011     B
12        1100     C
13        1101     D
14        1110     E
15        1111     F

The table above is very important for our conversion. We have CS = 1234H and IP = 0022H, and we want to add them in binary. To do that, we follow these steps.

Step 1

In the first step we write both the segment value and the offset as 5 hex digits, because physical addresses are 20 bits (5 hex digits).

Right now CS = 1234H, so we write an extra 0 at the end of the segment value, which gives CS = 12340H. (This is the same as multiplying by 10H.)

For the offset we have IP = 0022H. Here we add an extra zero at the start of the number, which gives IP = 00022H. (A leading zero does not change the value.)

Result:

CS = 12340H

IP = 00022H

After that our next step is to convert each digit in both (segment & offset) into binary numbers shown in the table.

Step 2

CS = 0001 0010 0011 0100 0000 (Binary)

IP = 0000 0000 0000 0010 0010 (Binary)

In the next step we add these digits using binary addition. I’m not going deeper into how to do binary addition; you can look it up separately. Here I’m just writing the results.

CS        = 0001 0010 0011 0100 0000
IP        = 0000 0000 0000 0010 0010
-------------------------------------
Phy       = 0001 0010 0011 0110 0010      (Binary)
-------------------------------------

So our physical address in binary is Phy = 0001 0010 0011 0110 0010. Converting it back to hexadecimal using the table above, we get Phy = 12362H.

Our final answer is:

CS = 12340H

IP = 00022H

Physical = 12362H

Notice that without the extra digit, CS and IP are each 16 bits in binary, but after padding they become 20 bits. The physical address we calculated above is also 20 bits in binary.
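The padding shortcut for CS = 1234H and IP = 0022H can be reproduced in Python if you want to check the hand calculation. The helper `nibbles` is a made-up name; it only prints a number as groups of four binary digits, the way we wrote them by hand.

```python
def nibbles(value, digits=5):
    """Format a number as `digits` groups of 4 binary digits, separated by spaces."""
    bits = f"{value:0{digits * 4}b}"
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))


segment_padded = int("1234" + "0", 16)   # extra 0 at the end of the segment
offset_padded = int("0" + "0022", 16)    # extra 0 at the start of the offset
physical = segment_padded + offset_padded

print("CS  =", nibbles(segment_padded))  # CS  = 0001 0010 0011 0100 0000
print("IP  =", nibbles(offset_padded))   # IP  = 0000 0000 0000 0010 0010
print("Phy =", nibbles(physical))        # Phy = 0001 0010 0011 0110 0010
print(f"Phy = {physical:05X}H")          # Phy = 12362H
```

Reading the last group of the binary result against the table, 0010 is the hex digit 2, so the address ends in ...62H.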

Problem 2: What offset maps to the physical address 003C3H if the segment register contains 003AH?

Solution

This problem is a little bit different from the previous one. Here we are given the physical address (003C3H) and the segment address (003AH). We are asked to find the offset address.

Things are almost the same as before, if you remember the formula we used previously:

Physical = (Segment × 10H) + Offset

From this formula, since we already have the physical and segment addresses, we need to find the offset. To do that, we rearrange the equation by moving the (Segment × 10H) part to the other side, which makes the formula:

Offset = Physical - (Segment × 10H)

Now we just substitute the given values into the equation and calculate the offset address.

Phy = 003C3H
Seg = 003AH

As we know, we have to make the segment address 5 digits. Just like in the previous problem, we add an extra 0 at the end of the segment address, which makes it 003A0H.

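Note: The same logic applies if we are given the physical and offset addresses but need to find the segment. In that case, we add an extra 0 at the start of the offset address, subtract it from the physical address, and then remove the trailing 0 from the result to get the 16-bit segment value.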

Now, like we did before, use the table from the previous problem to convert both of the above addresses into binary digits.

Phy = 0000 0000 0011 1100 0011
Seg = 0000 0000 0011 1010 0000

Next, according to our new equation, we subtract both of these values. Again, I’m just writing the result here. If you don’t know how binary subtraction works, please take time to learn it first.

Phy        = 0000 0000 0011 1100 0011
Seg        = 0000 0000 0011 1010 0000
--------------------------------------
Offset     = 0000 0000 0000 0010 0011
--------------------------------------

Our result in binary is Offset = 0000 0000 0000 0010 0011. Converting this back to hexadecimal gives 00023H. Since an offset is a 16-bit value, we drop the padding zero and write it as 0023H.

So our final offset address is:

Offset = 0023H
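The rearranged formula can also be checked with a short Python sketch. The function name `offset_from` is hypothetical, and the range check is my own addition: it rejects cases where the physical address cannot be reached from the given segment with a 16-bit offset.

```python
def offset_from(physical, segment):
    """Return the 16-bit offset that reaches `physical` from `segment`.

    Raises ValueError if no 16-bit offset can do it.
    """
    offset = physical - segment * 0x10   # Offset = Physical - (Segment x 10H)
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("physical address is not reachable from this segment")
    return offset


print(f"{offset_from(0x003C3, 0x003A):04X}H")  # 0023H
```

The subtraction is exactly the one we did in binary by hand: 003C3H minus 003A0H.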

Note:

The official formula to calculate a physical address is

Physical = (Segment × 10H) + Offset.

In these notes I’m using a shortcut: adding a 0 at the end of the segment and padding the offset with a leading 0. This makes the manual process shorter, but remember the real hardware method is bit shifting (which is what multiplying by 10H actually does).
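A few lines of Python are enough to convince ourselves that the three views agree. The names `shift_segment` and `append_hex_zero` are invented for this check: one shifts the segment left by 4 bits, the other literally writes a 0 after the hex digits.

```python
def shift_segment(segment):
    """Shift a 16-bit segment value left by 4 bits (one hex digit)."""
    return (segment & 0xFFFF) << 4


def append_hex_zero(segment):
    """Write the segment as 4 hex digits and put a 0 at the end, as in the shortcut."""
    return int(f"{segment & 0xFFFF:04X}" + "0", 16)


for segment in (0x1234, 0x003A, 0xFFFF):
    # Multiplying by 10H, shifting by 4 bits, and appending a hex 0 all agree.
    assert shift_segment(segment) == segment * 0x10 == append_hex_zero(segment)

print(f"{shift_segment(0x003A):05X}H")  # 003A0H
```

One hex digit is 4 bits, so sliding every digit one place to the left is the same thing as shifting by 4 bits.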

Here is an example problem for you to solve on your own, which you can do if you follow what I have taught you above.

Homework Problem: Given a physical address 1A2B0H and an offset 00B0H, find the segment address value.

And just like that, we're at the end of this section! We've covered a lot of ground, and having my water bottle has really helped me stay focused and hydrated. Don't you feel the same way? If you haven't already, grab your water bottle now—it's not too late. Let's get hydrated and move on to the next section!

If you made it this far, that’s awesome. Let me give you some context. Up till now, whatever we covered, I call it our warmup sections. Remember the very first heading — Warmup? Yeah, that wasn’t random. The reason I called it warmup was to give you the bigger picture first: show you some snippets, walk you through the structure of a program, explain registers, memory, addressing, and all that stuff.

Why did I do that? Because I know if I had just jumped straight into writing code after code and only later explained registers or memory addresses, you would have gone like: “Oh my God, this is too much, it’s boring, I can’t keep up anymore.” Well… you might still say that. But hey, at least now you’ve got the context.

So here’s the deal: from this point onward, we officially end our Warmup section and enter what I call the Assembly Nerd Section. This is where we dive into the real thing — learning the different ways to access memory, understanding instructions like mov, add, loop, figuring out how to assign variables, and much more.

Fair Warning: From here on, it’s nerdy knowledge ahead. Water bottle recommended — stay hydrated because things are about to get heavier.
