Author: Martin Ruckert
Situation:
Virtualizing means moving from an application running on a true OS, which runs on real hardware, to an application running on a virtual OS, which itself runs on a host OS, while making it look as if the virtual OS ran directly on the real (or virtual) hardware.
There are two separate issues to consider:
1. The interface between application and OS must be replaced by an interface between application and virtual OS.
2. The virtual OS must run as an application in the host OS as if it ran directly on the hardware.
Finally, supporting virtualization on MMIX implies:
3. The necessary overhead should be minimal.
Achieving 1. with MMIX is not very difficult. The interface is limited to forced and dynamic TRAPS and RESUME 1. With some help from the host operating system, TRAPS from the application can lead to a RESUME 1 into the virtual OS, just as if the TRAP had been routed there directly.
A RESUME 1 by the Virtual OS will TRAP by itself, since RESUME 1 is reserved for the host OS. The host OS can redirect the attempted RESUME 1 to the application.
Achieving 2. with MMIX requires some changes:
a) The virtual OS needs to run in the negative address space. This is currently not possible and requires an extension to the handling of virtual addresses.
b) The virtual OS will write to special registers (rI, rK, rQ, rT, rU, rV, and rTT) with the permission bit in rK set, and it will change other special registers (rBB, rWW, rXX, rYY, rZZ), unaware that these might change at any time due to a dynamic trap. It will further read these registers and expect to find the values it has written there.
c) The virtual OS wants to organize the page tables for the application.
d) The virtual OS expects dynamic traps to occur when e.g. the virtual devices trigger the respective interrupts.
Proposal:
a) Negative addresses, while not normally used, are allowed for applications and are considered virtual addresses in a separate segment, if the protection bit in rK is set.
A small detail regarding achieving 1. (see above): the address that the RESUME 1 must jump to is in the guest OS address space, so it is most probably a negative virtual address. So first rK must be set from $255, setting the protection bit, and only then is the address in rWW used to fetch the instruction, translating rWW using rV even if rWW is negative.
The page tables need to accommodate another segment. The register rV currently stores four 4-bit nibbles b1, b2, b3, and b4 in its two most significant bytes. bi specifies the offset from the page table root where the page tables for segment i start. The number of supported indirection levels for each segment is implicitly given by b(i+1)-bi. This definition already has the disadvantage that certain configurations in user space cannot be realized. An application that wants access (even if infrequently) to one page at the very end of each segment will need 5 root pages for each segment. This yields a value of 20 for b4, which will not fit into 4 bits.
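The overflow can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function names are made up for illustration, and b0 = 0 is assumed so that the formula b(i+1)-bi also covers the first segment.

```python
def cumulative_offsets(levels_per_segment):
    """Return [b1, b2, b3, b4]: running sums of the levels per segment."""
    offsets, total = [], 0
    for levels in levels_per_segment:
        total += levels
        offsets.append(total)
    return offsets


def fits_in_nibbles(offsets):
    """True if every offset can be stored in a 4-bit field."""
    return all(0 <= b <= 0xF for b in offsets)


# Five root pages for each of the four segments, as in the example above.
worst_case = cumulative_offsets([5, 5, 5, 5])
worst_case_fits = fits_in_nibbles(worst_case)
```

With five levels per segment, b4 comes out as 20, so `worst_case_fits` is false.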
I propose to use five 3-bit numbers c0, c1, c2, c3, c4 instead of b1, b2, b3, and b4. This leaves one extra bit (for future use). The number ci gives the number of indirection levels or root pages for segment i. The offset from the start of the page table is given only indirectly by the sum of the preceding ci's. Values of ci greater than 5 (that is, 6 and 7) would allocate extra root pages which are just not used. A value of zero would indicate that the corresponding segment is not mapped at all. For example, c4 (for the negative segment) would typically be zero. To be compatible with the present setting and to allow for very simple page tables, a value of c0=c1=c2=c3=c4=0 could be allowed to represent a single page table entry at the root location that is shared by all segments (including the negative segment). This setting requires a different handling of values in rV, and of negative addresses when the protection bit is set. As long as c4 is 0, applications still do not have access to negative virtual addresses. Instead of a protection fault, a page fault is issued.
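A minimal sketch of the proposed encoding follows. The text above fixes only the field widths; the bit positions used here (c0 in the highest bits, the spare bit lowest) and all function names are assumptions for illustration.

```python
FIELD_BITS = 3
NUM_SEGMENTS = 5  # segments 0..3 plus the negative segment (c4)


def pack_c(c):
    """Pack [c0..c4] into 16 bits; c0 highest, spare bit lowest (assumed)."""
    assert len(c) == NUM_SEGMENTS and all(0 <= x < 8 for x in c)
    value = 0
    for x in c:
        value = (value << FIELD_BITS) | x
    return value << 1  # the spare bit is left as zero


def unpack_c(value):
    """Inverse of pack_c."""
    value >>= 1
    c = []
    for _ in range(NUM_SEGMENTS):
        c.append(value & 0b111)
        value >>= FIELD_BITS
    return c[::-1]


def root_offsets(c):
    """Offset of each segment's first root page: sum of the preceding ci."""
    offsets, total = [], 0
    for x in c:
        offsets.append(total)
        total += x
    return offsets


# The configuration that did not fit into nibbles, plus a guest OS segment.
config = [5, 5, 5, 5, 2]
packed = pack_c(config)
offsets = root_offsets(config)
```

Since each ci is stored on its own and the offsets are derived by summation, five levels per segment fit into the 16 bits without difficulty.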
b) A virtual machine monitor built into the host OS can allow the virtual guest OS to write its own copies of the protected special registers: when the PUT instruction issues a protection fault, the host OS gets control and can store the value intended for the special register in RAM. To also allow reading these private copies, the corresponding GET instruction should cause a trap as well (preferably with ropcode=2). The virtual machine monitor in the host OS can then retrieve the private copy and return it. This mechanism should be extended to registers that currently do not cause a protection fault (rWW, rXX, rYY, rZZ), because these might change unexpectedly due to a dynamic trap and hence the host OS needs to maintain private copies. The present mode of operation, where these registers are readable and writable by the application, can easily be restored by an OS that installs an appropriate TRAP handler moving the values given by the application in and out of the special registers.
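The private copies can be pictured as a small table kept by the virtual machine monitor. The following Python model is a sketch only; the class and method names are hypothetical, and the set of registers is the one listed under b) above.

```python
MASK64 = (1 << 64) - 1

# Registers for which the guest OS gets a private copy (from the text above).
SHADOWED = ("rI", "rK", "rQ", "rT", "rU", "rV", "rTT",
            "rBB", "rWW", "rXX", "rYY", "rZZ")


class PrivateRegisters:
    """Private special-register copies of one guest OS (hypothetical model)."""

    def __init__(self):
        self.copy = {name: 0 for name in SHADOWED}

    def on_put_trap(self, name, value):
        """Called when a guest PUT traps: remember the intended value."""
        if name not in self.copy:
            raise KeyError(name)
        self.copy[name] = value & MASK64

    def on_get_trap(self, name):
        """Called when a guest GET traps: return the remembered value."""
        return self.copy[name]


guest = PrivateRegisters()
guest.on_put_trap("rT", 0x8000000000001000)
seen_by_guest = guest.on_get_trap("rT")
```

The guest reads back exactly what it wrote, regardless of what the real rT holds while the host OS is in control.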
c) While the virtual OS will maintain the page tables for the application, the host OS needs ultimate control over the page tables. Two mappings are involved in this process: The host OS will use mapping f1 to map the guest OS addresses to physical addresses and the guest OS will establish a mapping f2 to map application addresses to its own virtual addresses. To be efficient, the host OS must compute the composition of f1 and f2 and run the application with page tables representing this composition.
Because changes that the guest OS makes to rV will TRAP, the host OS always knows where the page tables for f2 are located. It can make the entries in this virtual page table read-only for the guest OS. Any attempt by the guest OS to write to these tables will then trap, informing the host OS about changes in f2. The host OS can complete the write and propagate the changes to the real page tables for the application.
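The composition of f1 and f2 can be sketched with two dictionaries that map page numbers. This is a simplified model: real page tables are multi-level and carry permission bits, and the name `compose` is chosen here for illustration.

```python
def compose(f1, f2):
    """Map application pages straight to physical pages.

    f2: application page -> guest OS page (maintained by the guest OS)
    f1: guest OS page    -> physical page (maintained by the host OS)
    Application pages whose guest page is not mapped by f1 are left out;
    an access to them would fault into the host OS.
    """
    return {app: f1[guest] for app, guest in f2.items() if guest in f1}


f1 = {0x10: 0x700, 0x11: 0x701, 0x12: 0x9A0}
f2 = {0x0: 0x11, 0x1: 0x12, 0x2: 0x55}  # 0x55 is not backed by the host
real_table = compose(f1, f2)
```

The application then runs on `real_table` directly, so each access needs only one translation.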
d) A dynamic trap in the host OS might occur at three locations: inside the host OS, inside the guest OS, or inside the application. In any case control is transferred to the host OS, and control can return normally to the location before the interrupt. There is one special case: the interrupt might lead to a virtual interrupt that the host OS needs to generate for the guest OS. The host OS can then set the appropriate bit in the guest OS private copy of rQ and, depending on the private rK, transfer control to the guest OS trap handler (according to its private copy of rT).
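The special case in d) amounts to a bit operation on the private copies. A minimal sketch, assuming the private registers are kept in a dictionary; the function name and the bit numbers in the example are arbitrary.

```python
def post_virtual_interrupt(private, bit):
    """Set interrupt `bit` in the guest's private rQ.

    Returns True if the guest's private rK enables a pending interrupt,
    in which case the host OS should transfer control to the guest OS
    trap handler; otherwise the interrupt just stays pending.
    """
    private["rQ"] |= 1 << bit
    return (private["rQ"] & private["rK"]) != 0


guest = {"rQ": 0, "rK": 1 << 40}
masked = post_virtual_interrupt(guest, 41)      # bit 41 is not enabled
delivered = post_virtual_interrupt(guest, 40)   # bit 40 is enabled
```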
Achieving 3. with MMIX requires fast forwarding of traps and updates to page tables. All the rest is probably not too relevant for overall performance.
To forward a TRAP from the application to the guest OS requires:
- No dispatching, because all TRAPS are forwarded.
- Saving rBB, rWW, rXX, rYY, rZZ to private copies.
- Changing rV to match the guest OS page tables (changes only c4).
- Changing rT and rTT so that they now handle guest OS interrupts.
- Establishing an rK suitable for the guest OS (might be different from rK for the application).
- Jumping to the private rT using a RESUME 1.
Example:
        SETH    $255,#8000              OS RAM
        ORMH    $255,#0001              OS RAM
        LDO     $255,$255,CurrentTask   get address of task control block
        STO     $0,$255,#00             scratch
        GET     $0,rBB                  store private copies
        STO     $0,$255,#08
        GET     $0,rWW
        STO     $0,$255,#10
        GET     $0,rXX
        STO     $0,$255,#18
        GET     $0,rYY
        STO     $0,$255,#20
        GET     $0,rZZ
        STO     $0,$255,#28
        LDO     $0,$255,#30
        PUT     rV,$0                   load runtime environment for guest OS
        LDO     $0,$255,#38
        PUT     rT,$0
        LDO     $0,$255,#40
        PUT     rTT,$0
        LDO     $0,$255,#48             private rT
        PUT     rWW,$0                  jump there on resume
        LDO     $0,$255,#00             restore $0
        LDO     $255,$255,#50           this will become rK for guest OS
        RESUME  1
25 instructions
To forward a RESUME 1 from the guest OS to the application requires:
- Restoring rBB, rWW, rXX, rYY, rZZ from private copies.
- Changing rV to match the application's page tables (set c4 to 0).
- Changing rT and rTT so that they now handle application interrupts.
- Establishing an rK suitable for the application.
- Jumping to the restored rWW using a RESUME 1.
A TRAP from the guest OS is typically due to a protection violation in a PUT or GET. Handling such a TRAP requires:
- Dispatching to the proper handler depending on the opcode in rXX and ropcode.
- PUT: Storing the value in rZZ to the private special register copy depending on the X field in rXX.
- GET: Loading the value from the private special register copy depending on the Z field in rXX to rZZ.
- Resuming with ropcode=2 to store rZZ in the correct target register.
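The dispatch steps above can be modelled in Python as follows. The opcode values (#F6 PUT, #F7 PUTI, #FE GET) and the special register numbers follow the standard MMIX encoding; everything else is an assumption for illustration: the function names, the dictionary of private copies keyed by register number, and the choice to look only at the low 32 bits of rXX, which hold the trapped instruction.

```python
OP_PUT, OP_PUTI, OP_GET = 0xF6, 0xF7, 0xFE
R_T, R_K, R_V = 13, 15, 18  # special register numbers of rT, rK, rV


def decode(rxx):
    """Split the instruction in the low 32 bits of rXX into OP, X, Y, Z."""
    insn = rxx & 0xFFFFFFFF
    return insn >> 24, (insn >> 16) & 0xFF, (insn >> 8) & 0xFF, insn & 0xFF


def emulate(rxx, rzz, private):
    """Emulate a trapped PUT or GET; return the new value for rZZ."""
    op, x, _, z = decode(rxx)
    if op in (OP_PUT, OP_PUTI):
        private[x] = rzz            # X names the special register
        return rzz
    if op == OP_GET:
        return private.get(z, 0)    # Z names the special register
    raise ValueError("not a PUT or GET: opcode %#x" % op)


private = {}
emulate(0xF60D0000, 0x1234, private)      # PUT rT,$0 with $0 = #1234
value = emulate(0xFE01000D, 0, private)   # GET $1,rT
```

After the GET case the host OS resumes with ropcode=2, so that the value placed in rZZ ends up in the target register named by the X field.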
Another source of TRAPS from the guest OS are store operations to the application's page tables. Handling requires:
- Dispatching to the proper handler depending on the rQ bit (no write access).
- Extracting the virtual physical address from the PTE or PTP.
- Translating the virtual physical address using the page table of the guest OS into a physical address already assigned to the guest OS.
- Constructing a new real PTE or PTP containing a true physical address.
- Storing the virtual PTE/PTP into the virtual page table
- Storing the new real PTE/PTP in the real page table of the application.
- Storing the new real PTE/PTP in the real page table of the guest OS.
- Resuming.
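The handling steps above, minus dispatching and resuming, can be sketched as one function. Entries are modelled as (page, permissions) pairs, which is a deliberate simplification of the real PTE and PTP layout; all names are hypothetical.

```python
def on_page_table_store(index, virtual_entry, virtual_table,
                        real_app_table, real_guest_table, f1):
    """Handle a trapped guest OS store of `virtual_entry` at `index`.

    f1 maps guest OS ("virtual physical") pages to true physical pages.
    """
    guest_page, permissions = virtual_entry        # extract the address
    if guest_page not in f1:
        raise LookupError("page is not assigned to the guest OS")
    real_entry = (f1[guest_page], permissions)     # translate and rebuild
    virtual_table[index] = virtual_entry           # complete the guest's write
    real_app_table[index] = real_entry
    real_guest_table[index] = real_entry
    return real_entry


f1 = {0x11: 0x701}
virtual_table, real_app, real_guest = {}, {}, {}
entry = on_page_table_store(3, (0x11, "rw"), virtual_table,
                            real_app, real_guest, f1)
```

The guest OS keeps seeing its own entry in the virtual table, while both real tables receive the translated one.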
Application and guest OS actually share the application's page table. The guest OS just has an extended version containing mappings for negative addresses, so sharing should be possible. If a guest OS runs several applications, all these applications have different tables for the non-negative addresses. More sharing would be possible if the root of the page table covering the negative addresses were located independently of the regular page table. This would just need another special register with at least 27 bits. Then the rV register could remain as it is now.
Other TRAPs from the guest OS are possible, because the guest OS can save away its special registers and use TRAP instructions. These must be forwarded back to the guest OS in the same way as TRAPS from the application. There is just the additional need to dispatch the TRAP to the proper handler.
So all in all, it is mostly a few load and store operations. If only we had one more register available at the time a trap occurs: two registers would be enough for store operations.
Further Considerations
Ideas for efficiency
- have some sort of trap dispatch implemented in hardware.
- Have one additional register freed ($254->rBBB ?) to allow simple handlers to do load/store operations with minimal overhead.
Discussion
mail comments to ruckert@cs.hm.edu