Memory Management: How Programs Are Loaded into Memory

```go
var a []int64
for i := int64(0); i <= 1000000; i++ {
	if i%2 == 0 {
		a = append(a, i) // a keeps growing as the loop runs
	}
}
```
The following article comes from the WeChat account "Pretending to Understand Programming", by Master Kang.

Memory is one of a computer's most important resources; its main job is to bridge the speed gap between the CPU and the disk. But memory is not infinite. We know that the code we write is ultimately loaded from disk into memory and then executed by the CPU. Have you ever wondered why a game as large as 10 GB can still run on a computer with only 8 GB of memory? And while the game is running, we can still chat on WeChat and listen to music, so many processes are running at the same time. How are they all managed in memory? Let's look at how a computer system manages memory with these questions in mind.


Memory swapping


If memory were infinitely large, none of these worries would exist. In reality, our machines often run many processes at once, and memory is too small to hold them all: one process may need tens of megabytes, another hundreds. It is impossible to load all of them into memory at the same time, yet from the user's point of view all of these processes do appear to be running. What is going on?

This is where a new technique comes in:


Swapping

As the name suggests, swapping exchanges processes in and out of memory. Suppose a process is idle while another process needs to run, but unfortunately there is not enough free memory. The idle process is just squatting on memory it is not using, so we can swap it out of memory onto the disk; when it needs to run again, it is swapped back in. With this technique, a limited amount of memory can run more processes: they are constantly swapped back and forth, and all of them appear to be running.

As shown in the figure, process A is swapped into memory first, leaving plenty of free space; then process B is swapped in, leaving less. When process C wants to be swapped in, there is no longer enough room, so process A, which has been running for a while, is swapped out to disk, and process C is swapped in.

Memory fragmentation

Swapping processes in and out lets a small memory run more processes, but it creates a problem. Did you notice? After process C is swapped in, small gaps of memory remain between process B and process C, and above process B. Frankly, these small gaps may never fit a process of the right size, so they are wasted, and in some workloads many such memory fragments can accumulate.

To reclaim this wasted memory, we can use

Memory compaction

This technique moves all processes downward so that the fragments join together into one larger contiguous block of free memory.

However, the cost of this moving is roughly proportional to the amount of active memory. For example, on a machine with 16 GB of memory that can copy 8 bytes every 8 ns, compacting all of memory would take about 16 seconds, so the CPU time it consumes is considerable.
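Plugging in the numbers from the text (16 GB of memory, copying 8 bytes every 8 ns) shows why compaction is expensive; a minimal sketch of the arithmetic:

```go
package main

import "fmt"

// compactSeconds estimates how long full-memory compaction takes,
// given total memory in bytes, bytes moved per copy, and ns per copy.
func compactSeconds(memBytes, bytesPerCopy, nsPerCopy float64) float64 {
	copies := memBytes / bytesPerCopy // number of copy operations
	return copies * nsPerCopy / 1e9   // nanoseconds -> seconds
}

func main() {
	// 16 GB (~1.6e10 bytes), copying 8 bytes every 8 ns:
	fmt.Println(compactSeconds(16e9, 8, 8), "seconds") // 16 seconds
}
```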

Dynamic growth


In fact, the process loading described above is idealized: normally, when a process is created or swapped in, we know exactly how much space to allocate for it. But if a process's space requirement grows dynamically, things get troublesome. For example, while our program is running, a

for loop

may use a temporary variable to accumulate data (like the variable a in the loop at the beginning of this article, which keeps growing as the program runs):
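The snippet at the top of the article sketches this growth; here it is as a complete, runnable Go program (the helper function is my own wrapper, added so the result can be checked):

```go
package main

import "fmt"

// collectEvens rebuilds the article's loop: the slice a keeps
// growing as the program runs, so its memory footprint grows too.
func collectEvens(n int64) []int64 {
	var a []int64
	for i := int64(0); i <= n; i++ {
		if i%2 == 0 {
			a = append(a, i)
		}
	}
	return a
}

func main() {
	a := collectEvens(1000000)
	fmt.Println(len(a)) // 500001 even numbers from 0 to 1000000
}
```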

When the process needs to grow:

If the process's neighbor is a

Free area

then luckily the free area can simply be allocated to the growing process.

If the process's neighbor is another process, the only solution is to move the growing process into a larger free area. But if no larger free area exists, swapping must be triggered: one or more other processes are swapped out to make room. Clearly this overhead is not small.


To ease the problem of dynamically growing process space, we can allocate some extra room in advance: if a process itself needs 10 MB, we give it 12 MB, so that when it grows it can use the extra 2 MB. Of course, if even that is not enough, the same moving and swapping logic is triggered.
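Go's own append does something analogous: it reserves extra capacity in advance so that most growth steps need no reallocation (no move). A small sketch that counts how often the backing array actually has to be moved:

```go
package main

import "fmt"

// grow appends n ints one by one and counts how often the backing
// array had to be reallocated and copied (analogous to moving a
// process into a larger free area). Because append reserves extra
// headroom, reallocations are far rarer than appends.
func grow(n int) (reallocs int) {
	var a []int
	for i := 0; i < n; i++ {
		before := cap(a)
		a = append(a, i)
		if cap(a) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	fmt.Println(grow(1000)) // far fewer than 1000 reallocations
}
```

The exact count depends on the Go version's growth policy, which is why the sketch measures it rather than assuming it.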

How to manage free memory

Earlier, when we discussed swapping, its purpose was to free up memory. But how do we know whether a piece of memory is in use or free? We need a mechanism to distinguish free memory from memory in use. Operating systems manage memory in two ways:

The bitmap method

The linked-list method




Let's talk about the bitmap method first. Yes, the bitmap method uses bits to manage memory: each block of memory corresponds to one bit, and that bit expresses its state:


If a block of memory is in use, its bit is 1.

If a block of memory is free, its bit is 0.


How much memory one bit covers depends on how the operating system manages it: it may be one byte, several bytes, or even kilobytes. But that is not the point; the point is to know that memory is divided up this way.

The advantage of the bitmap method is that it is simple and clear: the state of any memory block can be found instantly from the bitmap, with O(1) time complexity. Its drawbacks are just as obvious: the bitmap itself takes up space, and the smaller the managed memory block, the larger the bitmap. Worse, an allocation is not necessarily an integer multiple of the block size, so part of the last memory block is inevitably wasted.

As shown in the figure, neither process A nor process B completely fills its last memory block, so the remainder of each of those blocks is wasted.
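The bitmap described above can be sketched in a few lines of Go (a toy illustration; real allocators pack and tune this far more carefully):

```go
package main

import "fmt"

// Bitmap tracks one bit per memory block: 1 = in use, 0 = free.
type Bitmap []byte

func (b Bitmap) Set(i int)   { b[i/8] |= byte(1) << (i % 8) }  // mark block i used
func (b Bitmap) Clear(i int) { b[i/8] &^= byte(1) << (i % 8) } // mark block i free

// InUse answers "is block i used?" in O(1), the bitmap's key advantage.
func (b Bitmap) InUse(i int) bool { return b[i/8]&(byte(1)<<(i%8)) != 0 }

func main() {
	b := make(Bitmap, 4) // 4 bytes are enough to track 32 blocks
	b.Set(5)
	fmt.Println(b.InUse(5), b.InUse(6)) // true false
}
```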

Compared with the bitmap method, the linked-list method uses space more sensibly. As you may have guessed, it links the used and free segments of memory together in a linked list. Roughly speaking, the list should have the following properties:

Each node knows whether its memory is free or in use

Each node knows the start and end address of its memory

Given these properties, a linked-list node for a piece of memory looks roughly like this:

P: the memory corresponding to this node is in use;

H: the memory corresponding to this node is free (a hole);

the start address of this memory;

the length of this memory;

and finally Prev and Next pointers to the neighboring nodes.
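The fields above can be sketched as a Go struct (the field names are my own; the P/H markers follow the text):

```go
package main

import "fmt"

// Node is one entry in the linked list that tracks memory regions.
type Node struct {
	InUse  bool // true: in use (P), false: free hole (H)
	Start  int  // start address of this region
	Length int  // length of this region
	Prev   *Node
	Next   *Node
}

// End returns the last address covered by the node.
func (n *Node) End() int { return n.Start + n.Length - 1 }

func main() {
	hole := &Node{Start: 0, Length: 4096} // a free 4 KB hole
	fmt.Println(hole.End())               // 4095
}
```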


So, for a process, there are four possible combinations with its neighbors:

Both its previous and next nodes are in use

Its previous node is free, and its next node is in use

Its previous node is in use, and its next node is free

Both its previous and next nodes are free

When a process is swapped out or terminates, its memory becomes free. At that point, if a neighbor is also free, the two are merged: two free memory blocks become one larger free memory block.
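The merging rule above can be sketched as a small Go function over the node struct (names are my own; eviction of list heads and other bookkeeping is omitted):

```go
package main

import "fmt"

type Node struct {
	InUse         bool // true: in use (P), false: free hole (H)
	Start, Length int
	Prev, Next    *Node
}

// release marks n as a hole and merges it with any free neighbors,
// so adjacent holes always collapse into one larger hole.
func release(n *Node) {
	n.InUse = false
	if nx := n.Next; nx != nil && !nx.InUse { // merge with next hole
		n.Length += nx.Length
		n.Next = nx.Next
		if nx.Next != nil {
			nx.Next.Prev = n
		}
	}
	if pv := n.Prev; pv != nil && !pv.InUse { // merge with previous hole
		pv.Length += n.Length
		pv.Next = n.Next
		if n.Next != nil {
			n.Next.Prev = pv
		}
	}
}

func main() {
	a := &Node{Start: 0, Length: 100}
	b := &Node{InUse: true, Start: 100, Length: 50, Prev: a}
	c := &Node{Start: 150, Length: 100, Prev: b}
	a.Next, b.Next = b, c
	release(b) // b's neighbors are both holes: all three merge into one
	fmt.Println(a.Length, a.Next == nil) // 250 true
}
```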

OK, we now manage memory with a linked list. The next question: when creating a process, or swapping one in from disk, how do we find a suitable piece of free memory in the list?

The first-fit algorithm

The simplest way to find free memory is to walk along the linked list until we meet a node big enough for the request. If the first free block found is exactly the size we need, it is used directly. But that is too ideal; in most cases the first suitable block is larger than what we need, and the free block is then split into two pieces.

A process that needs 3 MB will split a 4 MB free block into 3 MB and 1 MB.
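First fit is a one-pass scan; a minimal sketch over a list of hole sizes (sizes in MB, matching the example above):

```go
package main

import "fmt"

// firstFit walks the hole sizes in order and returns the index of
// the first hole large enough for the request, or -1 if none fits.
func firstFit(holes []int, need int) int {
	for i, size := range holes {
		if size >= need {
			return i
		}
	}
	return -1 // no hole fits: swapping would be needed
}

func main() {
	holes := []int{2, 4, 1, 6}
	i := firstFit(holes, 3)
	fmt.Println(i, holes[i]-3) // 1 1: the 4 MB hole is split into 3 MB + 1 MB
}
```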

The next-fit algorithm


Next fit is very similar to first fit, except that after finding a target block it records the position, so the next search starts from there instead of from the head of the list. The problem with this algorithm: a suitable block that lies before the recorded position will be skipped.

A process that needs 2 MB finds suitable space at position 5. The next request, for 1 MB, starts searching from position 5 and finds suitable space at position 7, skipping the suitable block back at position 1.

The best-fit algorithm

Compared with first fit, best fit does not stop at the first suitable block; it keeps searching, possibly all the way to the tail of the list, because it wants the most suitable block. What counts as most suitable? An exact size match, if one exists; otherwise, the smallest block that can still hold the process. Clearly the average search time of best fit is slower, and it also tends to produce many tiny fragments.

Suppose the process needs 2 MB. Best fit keeps searching even after reaching position 3 (3 MB) and finally selects the free block at position 5.
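Best fit is the same scan as first fit, but it remembers the tightest candidate instead of stopping early; a minimal sketch (sizes in MB, my own example values):

```go
package main

import "fmt"

// bestFit scans the whole list and picks the smallest hole that
// still fits, which can leave very small fragments behind.
func bestFit(holes []int, need int) int {
	best := -1
	for i, size := range holes {
		if size >= need && (best == -1 || size < holes[best]) {
			best = i
		}
	}
	return best
}

func main() {
	holes := []int{5, 3, 4, 2}
	fmt.Println(bestFit(holes, 2)) // 3: the 2 MB hole at index 3 is an exact fit
}
```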


The worst-fit algorithm

The "best" in best fit means finding a free block closest to the requested size, but this produces many tiny fragments, and without memory compaction those fragments will most likely be wasted. To avoid this, the worst-fit algorithm does the opposite of best fit: it picks the largest suitable block, so that in theory the leftover free area is also relatively large, the fragments are not so tiny, and they can still be reused.

A process needs 1.5 MB. Under worst fit, it will not choose the free block at position 3 (2 MB) but the larger one at position 5 (3 MB).

The quick-fit algorithm

All of the algorithms above share one trait: free blocks and used blocks live in the same linked list. What is the problem with that? Normally, when looking for a free block, there is no need to scan the blocks already in use. So if we separate used from free and maintain two linked lists, search speed improves, and nodes no longer need the P and H markers. Separation has a cost, though: when a process terminates or is swapped out, its block must be deleted from the used list and inserted into the free list, which is slightly more expensive. And if the free list is kept sorted by size, first fit performs as well as best fit.

The quick-fit algorithm exploits this further: it maintains separate free lists for the commonly requested sizes, for example a 4 KB free list, an 8 KB free list, and so on. To allocate a 7 KB space, we can simply go to the suitable size list, such as the 8 KB one.

Its advantage is obvious: finding a hole of a common size is fast. But when a process terminates or is swapped out, finding its neighbors to check whether they can be merged is relatively slow; and if no merging is done, memory quickly fragments into many small holes that may be unusable and wasted.
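The size-class idea can be sketched as picking the smallest class whose free list can serve a request (a toy version under the assumption that a request is always rounded up to one class):

```go
package main

import (
	"fmt"
	"sort"
)

// pickClass returns the smallest size class (in bytes) that can
// hold the request, e.g. a 7 KB request goes to the 8 KB free list.
func pickClass(classes []int, need int) int {
	sort.Ints(classes) // ensure classes are in ascending order
	for _, c := range classes {
		if c >= need {
			return c
		}
	}
	return -1 // larger than every class: fall back to the general list
}

func main() {
	classes := []int{4096, 8192, 16384}
	fmt.Println(pickClass(classes, 7*1024)) // 8192
}
```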

Virtual memory: running large programs in a small memory

You might think that running large programs in a small memory was already covered above: just swap out idle processes and swap in the ones that need to run, and memory swapping seems to solve it. But note two things. First, swapping writes processes out to disk when space is tight and reads them back later, and the moment you see "disk" you should think "slow". Second, did you notice that

the entire process is swapped in and out?

We know a process is made of code, that is, many machine instructions. With memory swapping,

all of a process's instructions are either in memory, or all of them are out of it.

Doesn't that seem rather crude? Don't worry, keep reading.

Later, a more powerful technique appeared:

Virtual memory

Its basic idea: each program has its own address space, note, its own address space, and this space is divided into multiple blocks called

pages

Within a page, addresses are contiguous, but they are virtual addresses, not real physical memory addresses. Then how does the program reach real physical memory when it runs? That requires a mapping mechanism, and this is when the MMU appears.


MMU stands for Memory Management Unit. Normally, when the CPU reads data at a memory address, it puts that address on the memory bus; with virtual memory, putting the virtual address directly on the bus would not find the right memory. Instead, the CPU hands the virtual address to the MMU, which finds the corresponding physical address for us. Yes, the MMU is the relay station for address translation.

Paging



Paging a program's address space has two advantages:

A program no longer needs all of its instructions loaded into memory, as with swapping, in order to run; it can run with just a single page loaded.

While a missing page of one process is being loaded, the CPU can execute other processes.

Of course, when virtual memory is paged, the corresponding physical memory is paged as well, but its unit is called a

page frame

Pages and page frames are usually the same size. Let's look at an example. Suppose pages and page frames are both 4 KB. Then a 64 KB virtual address space yields 64/4 = 16 virtual pages, while a 32 KB physical address space yields 32/4 = 8 page frames. Clearly there are not enough page frames: some virtual pages will never find a corresponding page frame.

Let's first see how the physical address corresponding to virtual address 20500 is found:

First, virtual address 20500 falls in page 5 (20480–24575)

Starting from page 5's start address 20480, move 20 bytes forward (20500 − 20480 = 20): that is the offset within the page

Page 5 maps to physical page frame 3

Page frame 3 starts at address 12288, and 12288 + 20 = 12308, so 12308 is the actual physical address we want.
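The four steps above can be sketched as a translation function (a toy model: the page table holds only the article's example mapping, page 5 → frame 3):

```go
package main

import "fmt"

const pageSize = 4096 // 4 KB pages and page frames

// pageTable maps virtual page number -> physical frame number.
var pageTable = map[int]int{5: 3}

// translate converts a virtual address to a physical address;
// ok is false for unmapped pages (where a page fault would occur).
func translate(vaddr int) (paddr int, ok bool) {
	page, offset := vaddr/pageSize, vaddr%pageSize
	frame, ok := pageTable[page]
	if !ok {
		return 0, false
	}
	return frame*pageSize + offset, true
}

func main() {
	paddr, ok := translate(20500)
	fmt.Println(paddr, ok) // 12308 true
}
```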

But the figure also shows red areas: as we mentioned, some virtual pages have no corresponding page frame, meaning that part of the virtual address space has no physical address behind it. When the program accesses an unmapped virtual address (a red area), a

page fault

occurs: the operating system finds a page frame that has been little used recently, writes its contents out to disk, reads the faulting page from disk into the reclaimed frame, updates the mapping from that virtual page to the frame, and finally restarts the instruction that caused the fault.

In the end, the paging mechanism makes our programs more fine-grained: the unit of operation is the page rather than the whole process, which greatly improves efficiency.


We said there is a mapping from virtual memory to physical memory, and we know the MMU does it, but how? The simplest approach is a hash-table-like structure, something like

page[1] = 10 (virtual page 1 maps to page frame 10)

but that only locates the page frame, not the exact position within it; an offset is still missing. So how does the MMU really do it? Take a 16-bit virtual address with 4 KB pages and page frames: the MMU treats the first 4 bits as an index, locating the page frame number, and the last 12 bits as the offset. Why 12 bits? Very cleverly, 2^12 = 4K, exactly enough to address every byte within a page frame. So the first 4 bits find the page frame, and the last 12 bits give the offset. Finding the page frame goes through the page table, which we are about to discuss.

Looking inside the page table: besides the page frame corresponding to each page, each entry carries a flag bit indicating whether the page is actually mapped to a frame. Page faults are raised based on this flag.


As you can see, the page table is critical: it tells us not only the page frame and whether the page is present, but also carries several other bits:

The protection bit

The modified bit

The referenced bit

The cache-disable bit

Protection bit: what kinds of access a page allows, commonly three bits for read, write, and execute.




Modified bit: sometimes called the dirty bit, set automatically by the hardware. When a page has been modified, it is inconsistent with the data on disk, and the dirty page must be written back to disk. If the bit is 0 the page was not modified, so there is no need to write it back; the data can simply be discarded.

Referenced bit: set to 1 whenever a page is read or written, indicating that it is being used. Its role: when a page fault occurs and one or more victim pages must be chosen, the operating system uses this flag to prefer evicting pages that have not been accessed.

Cache-disable bit: important for pages that map to device registers rather than ordinary memory. If the operating system is busy-waiting for an I/O device to respond, it must read the device's latest value, not a stale cached copy, so caching must be disabled for such pages.
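The flag bits above can be sketched as bitmask constants (the names mirror the text, but the bit positions here are my own illustration; real layouts are hardware-specific):

```go
package main

import "fmt"

// Illustrative page-table-entry flags (positions are assumptions,
// not any particular CPU's layout).
const (
	Present    = 1 << 0 // page is mapped to a page frame
	Modified   = 1 << 1 // dirty: must be written back before reuse
	Referenced = 1 << 2 // set on each read/write; guides eviction
	CacheOff   = 1 << 3 // bypass the cache (device registers)
)

func main() {
	pte := Present | Referenced
	fmt.Println(pte&Modified == 0) // true: clean page, safe to discard
}
```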

The TLB: speeding up access

With the page table we can translate virtual addresses into physical addresses. But consider a 32-bit virtual address with 4 KB pages: the page table needs 2^20 = 1,048,576 entries. Both the size of such a table and the speed of searching it are problematic, and with a 64-bit virtual address the number of entries grows beyond imagination. Worst of all,

every process has its own such page table.

Searching a huge page table on every memory access clearly cannot be efficient. And computer designers observed a phenomenon:

Most programs access a small number of pages many times

If we build a small, fast lookup table for just these frequently accessed pages, speed improves greatly. This is the

TLB, or "fast table"

The TLB holds only a small number of page-table entries, usually no more than 256. To translate a virtual address, we first look in the TLB; on a hit, the corresponding page frame is returned directly. On a miss, we fall back to the page table, then evict one entry from the TLB and replace it with the newly found page.

Overall, the TLB is like a small cache of the page table: it stores recently accessed pages, not all pages.
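The hit/miss behavior can be sketched as a tiny cache in front of the page table (a toy model: real TLBs are fixed-size hardware with an eviction policy, which is omitted here):

```go
package main

import "fmt"

// TLB is a small cache of page -> frame translations, consulted
// before the (much larger) page table.
type TLB struct {
	entries map[int]int
}

// Lookup returns the frame for a page, using the TLB on a hit and
// falling back to the page table (and caching the result) on a miss.
func (t *TLB) Lookup(page int, pageTable map[int]int) (int, bool) {
	if f, ok := t.entries[page]; ok {
		return f, true // TLB hit: no page-table walk needed
	}
	f, ok := pageTable[page]
	if ok {
		t.entries[page] = f // cache the translation (eviction omitted)
	}
	return f, ok
}

func main() {
	pt := map[int]int{5: 3}
	tlb := &TLB{entries: map[int]int{}}
	f, _ := tlb.Lookup(5, pt) // first access misses, fills the TLB
	_, cached := tlb.entries[5]
	fmt.Println(f, cached) // 3 true
}
```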

Multi-level page tables

Although the TLB solves the translation-speed problem to some extent, it does not shrink the page table itself. Think about it: does a typical program use all of its pages? Actually, no. A process's address space in memory is generally divided into a code segment, a data segment, and a stack segment. In memory, the stack segment grows from high addresses toward low addresses, while the other two grow from low addresses toward high.

Notice that the middle part is empty, i.e. that part of the address space is unused, so we do not need page-table entries for it. That is the idea behind multi-level page tables. Take a 32-bit address as an example: the last 12 bits are the offset, and the first 20 bits are split into two 10-bit fields, called the top-level page table index and the second-level page table index. Each 10-bit field can address 2^10 = 1024 entries, so the structure looks roughly like this:

In the top-level page table, the gray middle portion is the unused memory space. The top-level index is like the leading digits of an ID number, which locate the city or county; the second-level index and the offset are like the street address and name. This splitting can greatly reduce the space needed. A simple example:

Without splitting into top-level and second-level tables, the page table needs 2^20 entries. With splitting, we need one top-level table of 2^10 entries plus only the second-level tables (2^10 entries each) for the address ranges actually in use.

Comparing the storage of the two makes it obvious that splitting saves space, and that is the benefit of multi-level page tables.
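The 10 + 10 + 12 decomposition can be sketched with a few bit shifts (the example address is my own):

```go
package main

import "fmt"

// split decomposes a 32-bit virtual address into a top-level index,
// a second-level index, and an offset (10 + 10 + 12 bits).
func split(vaddr uint32) (top, second, offset uint32) {
	return vaddr >> 22, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF
}

func main() {
	top, second, off := split(0x00403004)
	fmt.Println(top, second, off) // 1 3 4
}
```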

Of course, two levels can be split further into three, four, or even more. More levels give more flexibility, but each extra level also makes lookup slower, which needs to be kept in mind.

Finally

To make everything easier to understand, this article includes 20 figures and nearly 7,000 words. Creating it was not easy, so a like, share, and follow would be the greatest support and motivation for the author.
