Hardware Context Explained
To control the execution of processes, the kernel must be able to suspend the process running on the CPU and resume some other process that was previously suspended. This activity is variously called a process switch, task switch, or context switch.
- Hardware Context
While each process can have its own address space, all processes have to share the CPU registers. So before resuming the execution of a process, the kernel must ensure that each register is loaded with the value it had when the process was suspended.
The set of data that must be loaded into the registers before the process resumes execution on the CPU is called the hardware context. The hardware context is a subset of the process execution context, which includes all the information needed for the process to execute. In Linux, part of the hardware context of a process is stored in the process descriptor, while the rest is saved on the Kernel Mode stack.
In the description that follows, we assume that the prev local variable refers to the process descriptor of the process being switched out and that next refers to the one being switched in to replace it. We can thus define a process switch as the activity of saving the hardware context of prev and replacing it with the hardware context of next. Because process switches occur quite often, it is important to minimize the time spent saving and loading hardware contexts.
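The definition above can be sketched as a small user-space simulation. This is only an illustration under stated assumptions: the types hw_context, task, and cpu and the function process_switch are hypothetical names invented here, the context is cut down to three registers, and real kernel code manipulates the actual CPU registers rather than a struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified hardware context.
 * A real 80x86 context holds many more registers. */
struct hw_context {
    uint32_t eip;   /* instruction pointer */
    uint32_t esp;   /* stack pointer */
    uint32_t ebp;   /* frame pointer */
};

/* Hypothetical stand-in for a process descriptor. */
struct task {
    int pid;
    struct hw_context saved;   /* meaningful only while the task is suspended */
};

/* Stand-in for the single register set that all processes share. */
struct cpu {
    struct hw_context regs;
    struct task *current;
};

/* Save the hardware context of prev, then replace it with that of next. */
void process_switch(struct cpu *cpu, struct task *prev, struct task *next)
{
    assert(cpu->current == prev);
    prev->saved = cpu->regs;    /* save the context of the outgoing process */
    cpu->regs = next->saved;    /* load the context of the incoming process */
    cpu->current = next;
}
```

Switching from a task to another and back again leaves the first task's registers exactly as they were when it was suspended, which is the property the kernel has to guarantee.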
Old versions of Linux took advantage of the hardware support offered by the 80x86 architecture and performed a process switch through a far jmp instruction to the selector of the Task State Segment Descriptor of the next process. While executing the instruction, the CPU performs a hardware context switch by automatically saving the old hardware context and loading a new one. Linux 2.6, however, performs the process switch in software, for the following reasons:
Step-by-step switching performed through a sequence of mov instructions allows better control over the validity of the data being loaded. In particular, it is possible to check the values of the ds and es segmentation registers, which might have been forged by a malicious user. This type of checking is not possible when using a single far jmp instruction.
The old approach and the new approach take about the same amount of time. However, a hardware context switch cannot be optimized, whereas the current switching code may still have room for improvement.
Process switching occurs only in Kernel Mode. The contents of all registers used by a process in User Mode have already been saved on the Kernel Mode stack before the process switch is performed. This includes the contents of the ss and esp pair, which specifies the User Mode stack pointer address.
- Task State Segment (exactly one TSS per CPU)
When an 80x86 CPU switches from User Mode to Kernel Mode, it fetches the address of the Kernel Mode stack from the TSS.
When a User Mode process attempts to access an I/O port by means of an in or out instruction, the CPU may need to access an I/O Permission Bitmap stored in the TSS to verify whether the process is allowed to address the port.
More precisely, when a process executes an in or out I/O instruction in User Mode, the control unit performs the following operations:
1. It checks the 2-bit IOPL field in the eflags register. If it is set to 3, the control unit executes the I/O instruction. Otherwise, it performs the next check.
2. It accesses the tr register to determine the current TSS, and thus the proper I/O Permission Bitmap.
3. It checks the bit of the I/O Permission Bitmap corresponding to the I/O port specified in the I/O instruction. If it is cleared, the instruction is executed; otherwise, the control unit raises a "General protection" exception.
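The three steps above can be sketched in C as follows. This is a simplified model, not what the hardware literally does: the struct io_state and the function io_access_allowed are hypothetical names, the lookup through tr is collapsed into a pointer argument, and only a single-byte port access is considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_PORTS 65536u   /* size of the 80x86 I/O port space */

/* Hypothetical view of the state the control unit consults. */
struct io_state {
    unsigned iopl;                  /* 2-bit IOPL field from eflags */
    uint8_t bitmap[IO_PORTS / 8];   /* I/O Permission Bitmap, one bit per port */
};

/* Returns true if a User Mode in/out on `port` would be executed,
 * false if it would raise a "General protection" exception. */
bool io_access_allowed(const struct io_state *s, uint32_t port)
{
    assert(port < IO_PORTS);
    assert(s->iopl <= 3);

    /* Step 1: IOPL equal to 3 lets the instruction run with no further check. */
    if (s->iopl == 3)
        return true;

    /* Steps 2 and 3: find the bit for this port; a cleared bit grants access. */
    return (s->bitmap[port / 8] & (1u << (port % 8))) == 0;
}
```

Note the inverted sense of the bitmap: a set bit denies access, so a bitmap filled with ones blocks every port for a process whose IOPL is below 3.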
The tss_struct structure describes the format of the TSS. The init_tss array stores one TSS for each CPU on the system. At each process switch, the kernel updates some fields of the TSS so that the corresponding CPU's control unit can safely retrieve the information it needs. Thus, the TSS reflects the privileges of the process currently running on the CPU, but there is no need to maintain a TSS for a process while it is not running.
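A minimal sketch of the "one TSS per CPU, updated at each switch" idea follows. It is an illustration only: mini_tss, mini_task, cpu_tss, update_tss, and the NR_CPUS value are names and numbers assumed here, and the sketch tracks just the Kernel Mode stack pointer rather than every field the kernel actually refreshes.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* illustrative value, not taken from the text */

/* Hypothetical cut-down TSS holding only the Kernel Mode stack pointer
 * that the CPU fetches on a User Mode to Kernel Mode transition. */
struct mini_tss {
    uint32_t esp0;
};

/* Hypothetical stand-in for the per-process data the kernel keeps. */
struct mini_task {
    uint32_t kernel_stack_top;
};

/* One TSS per CPU; processes that are not running own no TSS at all. */
struct mini_tss cpu_tss[NR_CPUS];

/* On each process switch, point this CPU's TSS at the stack of next. */
void update_tss(unsigned cpu, const struct mini_task *next)
{
    assert(cpu < NR_CPUS);
    cpu_tss[cpu].esp0 = next->kernel_stack_top;
}
```

Because the slot is overwritten on every switch, the number of TSSs stays fixed at the number of CPUs no matter how many processes exist.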
Each TSS has its own 8-byte Task State Segment Descriptor (TSSD). This descriptor includes a 32-bit Base field that points to the TSS starting address and a 20-bit Limit field. The S flag of a TSSD is cleared to denote that the corresponding TSS is a System Segment.
The Type field is set to either 9 or 11 to denote that the segment is actually a TSS. In Intel's original design, each process in the system should refer to its own TSS; the second least significant bit of the Type field is called the Busy bit, and it is set to 1 if the process is being executed by a CPU and to 0 otherwise. In the Linux design, there is just one TSS for each CPU, so the Busy bit is always set to 1.
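The descriptor fields just described can be made concrete with a short packing routine. This is a sketch, not kernel code: the function names are hypothetical, the bit positions follow the standard 80x86 segment descriptor layout, and the Present bit set below is part of that layout rather than something the text above discusses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a TSSD: Base is split 24 + 8 bits, Limit is split 16 + 4 bits. */
uint64_t make_tssd(uint32_t base, uint32_t limit, bool busy)
{
    uint64_t type = busy ? 11 : 9;   /* Type 9 = available TSS, 11 = busy TSS */
    uint64_t d = 0;

    assert(limit <= 0xFFFFFu);       /* Limit is a 20-bit field */

    d |= (uint64_t)(limit & 0xFFFFu);              /* Limit bits 15..0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* Base bits 23..0   */
    d |= type << 40;                               /* Type field        */
    /* Bit 44 is the S flag; it stays 0 to mark a System Segment. */
    d |= 1ULL << 47;                               /* Present bit (standard layout) */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* Limit bits 19..16 */
    d |= (uint64_t)(base >> 24) << 56;             /* Base bits 31..24  */
    return d;
}

uint32_t tssd_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu) | ((uint32_t)((d >> 56) & 0xFFu) << 24);
}

uint32_t tssd_limit(uint64_t d)
{
    return (uint32_t)(d & 0xFFFFu) | ((uint32_t)((d >> 48) & 0xFu) << 16);
}

unsigned tssd_type(uint64_t d)
{
    return (unsigned)((d >> 40) & 0xFu);
}

/* The Busy bit is the second least significant bit of the Type field. */
bool tssd_busy(uint64_t d)
{
    return ((d >> 41) & 1u) != 0;
}
```

Types 9 (binary 1001) and 11 (binary 1011) differ only in the Busy bit, which is why a descriptor that is always busy, as in Linux, always carries Type 11.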
The TSSDs created by Linux are stored in the Global Descriptor Table (GDT), whose base address is stored in the gdtr register of each CPU. The tr register of each CPU contains the TSSD Selector of the corresponding TSS. The register also includes two hidden, nonprogrammable fields: the Base and Limit fields of the TSSD. In this way, the processor can address the TSS directly without having to retrieve the TSS address from the GDT.
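The role of the hidden fields in tr can be illustrated with a small model. Everything here is an assumption made for the sketch: the struct and function names are hypothetical, descriptors are kept unpacked for brevity, and the GDT size and the index used in the usage note are illustrative values.

```c
#include <assert.h>
#include <stdint.h>

#define GDT_ENTRIES 32   /* illustrative table size */

/* Hypothetical unpacked descriptor; a real TSSD packs these into 8 bytes. */
struct desc {
    uint32_t base;
    uint32_t limit;
};

/* Model of the tr register: one visible selector plus two hidden fields. */
struct task_register {
    uint16_t selector;   /* visible: TSSD Selector */
    uint32_t base;       /* hidden: copy of the TSSD Base field */
    uint32_t limit;      /* hidden: copy of the TSSD Limit field */
};

/* Build a selector: index in bits 15..3, TI in bit 2 (0 = GDT), RPL in bits 1..0. */
uint16_t gdt_selector(unsigned index, unsigned rpl)
{
    assert(index < GDT_ENTRIES && rpl <= 3);
    return (uint16_t)((index << 3) | rpl);
}

/* Loading tr reads the descriptor once and caches its Base and Limit. */
void load_tr(struct task_register *tr, const struct desc *gdt, uint16_t selector)
{
    unsigned index = selector >> 3;

    assert(index < GDT_ENTRIES);
    tr->selector = selector;
    tr->base = gdt[index].base;
    tr->limit = gdt[index].limit;
}
```

For example, with the TSSD placed at GDT index 16 (an assumed slot), gdt_selector(16, 0) yields 0x80, and after load_tr the TSS address is available from tr.base without consulting the table again.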
struct tss_struct {
    unsigned short back_link, __blh;
    unsigned long esp0;
    unsigned short ss0, __ss0h;
    unsigned long esp1;
    unsigned short ss1, __ss1h;
    unsigned long esp2;
    unsigned short ss2, __ss2h;
    unsigned long __cr3;
    unsigned long eip;
    unsigned long eflags;
    unsigned long eax, ecx, edx, ebx;
    unsigned long esp;
    unsigned long ebp;
    unsigned long esi;
    unsigned long edi;
    unsigned short es, __esh;
    unsigned short cs, __csh;
    unsigned short ss, __ssh;
    unsigned short ds, __dsh;
    unsigned short fs, __fsh;
    unsigned short gs, __gsh;
    unsigned short ldt, __ldth;
    unsigned short trace, io_bitmap_base;
    unsigned long io_bitmap[IO_BITMAP_LONGS + 1];
    unsigned long io_bitmap_max;
    struct thread_struct *io_bitmap_owner;
    unsigned long __cacheline_filler[35];
    unsigned long stack[64];
} __attribute__((packed));
- The thread field (every process has one)
struct task_struct {
    ...
    struct thread_struct thread;
    ...
};
At every process switch, the hardware context of the process being replaced must be saved somewhere. It cannot be saved in the TSS, as in the original Intel design, because Linux uses a single TSS for each processor instead of one for every process.
Thus, each process descriptor includes a field called thread, of type thread_struct, in which the kernel saves the hardware context whenever the process is switched out.
As we'll see later, this data structure includes fields for most of the CPU registers, except the general-purpose registers such as eax, ebx, etc., which are stored in the Kernel Mode stack.
struct thread_struct {
    struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES];
    unsigned long esp0;
    unsigned long sysenter_cs;
    unsigned long eip;
    unsigned long esp;
    unsigned long fs;
    unsigned long gs;
    unsigned long debugreg[8];
    unsigned long cr2, trap_no, error_code;
    union i387_union i387;
    struct vm86_struct __user *vm86_info;
    unsigned long screen_bitmap;
    unsigned long v86flags, v86mask, saved_esp0;
    unsigned int saved_fs, saved_gs;
    unsigned long *io_bitmap_ptr;
    unsigned long io_bitmap_max;
};