Situation:
Page faults occur when the processor translates virtual addresses to physical addresses and discovers that the page tables do not provide a translation.
The (internal) LDPTE instruction will the return zero. Currently the implementation (mmix-pipe) will just check the three premission bits, beeing zero as well, and depending on whether it was a load, a store, or an instruction fetch, will raise the read write or execute bit of the "program" bits in rQ.
This causes a dynamic trap with ropcode 0, the "program" bits of rQ that indicate the problem set in rXX and the instruction itself (could be anything since an ADD $3,$2,1 can cause a page fault if it turns the register $3 from margiinal to local causing the register stack to spill registers to memory) in the lower half of rXX. Then a Jump to rTT occurs.
So far so good. What can the dynamic TRAP handler do now?
It can use $255, which is backed up in rBB and contains a Copy of rJ, as long as it does not use a PUSHJ (overwriting rJ). The handler can not use other registers, all of them, locals and globals, might be in use by the application. Nor can it use the special registers; the page fault might be inside a TRIP, and the TRAP registers just got fresh values through the page fault. Just working with $255 alone is not very attractive, but possible.
The application can check if $0 is a local register, then load $255 with a fixed location and store $0 there. From there on we have two registers. In case $0 is marginal, we can use a global register for the same purpose. If there are no local and no global registers (besides $255) we can make some registers (e.g. $254) global. Consider, however, that all this is just the beginning of the general purpose dynamic trap handler, that is invoked for each end every interrupt. May be it is completely save to use a PUSHJ and then use all the local registers - but at that point it could also be a page fault and we dont know yet.
The use ofmarginal registers, best by using a PUSHJ $255, YZ is the most attractive possibility.
But what if the page fault was caused by moving a value from the register stack to M8[rS]? This is a write operation to the stack segment and if the RAM used in the stack segment grows past the allocated page boundary---page fault.
First a note on POP: The situation with POP is not critical. A POP will issue decrease gamma Instructions as soon as rO reaches rS. If one of the load instructions causes a page fault, a trap handling routine can use a PUSHJ to increase the stack. When the Trap is called, rS will be at the very bottom of a page, and the operating system can use it without problems to swap out registers. At the time the operating system returns, of course the current page and the one below it should be mapped.
Now we return to the PUSHJ. In this case, if the operating system uses a PUSHJ $255,YZ it will try to write rL to the register stack and since there were no available marginal registers in the first place, this will again cause a spill of registers to memory, to a virtual location that still has no physical counterpart...page fault again.
The operating system goes into an infinite loop.
The problem is really bad. Almost any instruction, e.g. ADD $10,$1,$0, might cause a page fault if it promotes a marginal register to a local one and causes the register stack to spill to memory. Even the instructions inside a trap handler are not protected against this. Nested traps, however, require the saving of the special registers rWW, rXX, ...
This raises the question how to save these registers without risking a page fault. Or how to switch to a operating system owned stack (by changing rS) that is sure not to cause a page fault because it is kept big enough. So far no good solution came to my mind. If I had an idea, I started to dislike it after some hours or a day (the most). Do I just miss the simple solution?
Problem
At the beginning of a forced trap, interrupts are disabled. Hence, we have to avoid page faults because we can not handle them. Before we have found out if we have a page fault and if so where, what instructions are possible? All instructions that do not involve user memory access. Unfortunately due to the register stack, simple instructions like SET $3,5 can involve memory access since this might turn a marginal register $3 into a local register $3 and this might cause some hidden registers to spill to memory.
There is one free register, $255, which is saved to rBB and contains a copy of rJ. So you can try to get by with $255 only. A PUSHJ is not possible because it involves a memory access.
(Still all the operating systems I wrote so far do just that, use a PUSHJ.)
Proposal 1:
The $255 can be used to store away registers by loading it with a physical address in RAM and storing registers $0 to $31 there, then using them, and restoring afterwards.
rO is the memory address of $0. So if we make an address translation for rO each time we change it, and generate a TRAP interrupt if we can not write to rO, we can safely disable interrupts and still use all the physically available local registers $0 up to $254 since the hidden registers can be swapped to memory without problems. The operating system must make sure that the stack segment of a running process is completely mapped and loaded up to rO before resuming or starting a process.
This idea has still one flaw. If the user Program uses all the registers up to $254 as local Registers, using a PUSHJ will hide these and move the value 254 just above the former register $254 on the register stack. rL is now zero. Assuming a register Ring with 256 registers, this will swap out all the registers hidden before, which is possible, since the Stack is mapped to memory up to rO. Unfortunately now there is no more mapped memory in the stack segment and the register ring is filled completely. So the first access to e.g. $0 will need further register spills to memory.
To correct his problem, I propose a restriction on rL similar to the existing restriction on rG.
As rG must not be smaller than 32., rL must not be larger that 256-32. Instead of one register, $255, that is always global, we have reserved 32 register to be global (or marginal). In practice this should not be a serious restriction. Even medium sized Applications will usually require 32 global registers. This restriction will give us enough room for the implementation of simple TRAP Handlers. We can use a PUSHJ to push down the local registers of the application, swapping hidden registers to memory up to rO, and still have available at least registers $0 to $30 (31 local registers) without the need to swap into stack space beyond rO.
Of course it might increment rO into non allocated memory, but as long as rO is not used, that's OK. I silently assume that page faults are not enabled while inside the operating system. This poses more problems.
If page faults are not enabled inside the Operating system, it requires constant checking before loading or storing data to user space. E.g. implementing a TRAP 0,Fputs,StdOut, requires reading characters at address $255. This is not desirable.
TRAP handlers that need to access user space should then store away rWW, possible also rXX, rYY, rZZ, and rBB, reenable Page faults, and then go about their business.
TRAP handlers that need more than the 30 local registers or TRAP handlers that use SAVE should enable page faults as well.
Proposal 2
by Nils Asmussen
Thanks for the notice about the problem with page-faults during stack-extension! But I had already discovered that a few weeks ago and, recently, I've implemented a solution for it in my simulator.
Perhaps it's interesting for you to know how my solution works:
The basic idea is to give the kernel a different stack than the user-applications. To achieve that I've added a new special register, rSS, that holds the address of the "kernel-stack". The kernel will setup that register for every process individually. Additionally I've extended the instructions SAVE and UNSAVE. SAVE specifies in the Z-operand (1 instead of 0) whether the stack should be switched, and UNSAVE specifies that in the X-operand. If SAVE should switch the stack, it backups rS and rO first, changes rS and rO to start at rSS and writes all values to save on the new stack including the backup of rS and rO. Of course, UNSAVE does the opposite: it loads all values from the current stack, including rS and rO, and restores rS and rO appropriately to continue on the "old" stack. Of course, in contrast to the default version we have to load all values back into the local registers (including the values that had been pushed down but not yet stored on the stack before the SAVE) from the kernel-stack because otherwise they would be lost. To be able to allow nested interrupts, the switch in SAVE is done only if its specified by Z and rS has not set the sign-bit (so, it forces the kernel to put a privileged address in rSS). It stores in the rA/rG-value on the stack whether the switch has been done, so that the corresponding UNSAVE knows it. So, the kernel should do a SAVE $255,1 at the beginning of the interrupt-handler and an UNSAVE 1,$255 at the end. It can still use the default version of SAVE and UNSAVE to switch between processes "in the kernel". This way, the kernel has a different stack and we won't store anything on the user-stack when an exception occurs, so that exceptions because of a stack-overflow in a user-application are no problem. Of course, the kernel has to take care that the kernel-stack is big enough.
It sounds pretty complicated, but I think from the user-perspective itâs very clear and easy to use and - especially nice - its completely downward compatible. That means, all programs that run on the "old" MMIX will run on GIMMIX (that has this "feature").
Proposal 3
by Donald E. Knuth
I think that the simplest way to solve the "infinite loop on stack overflow" problem with comparatively few changes to the present MMIX-PIPE might be something like the following:
NEVER INTERRUPT when storing stack entries at the time rS increases. Instead, if there is any page fault, do the store into a fixed preallocated page within the physical memory. Then cause some kind of interrupt on the next "real" instruction (i.e., the instruction that triggered the change or sequence of changes to rS).
In this way no data is lost, and the OS has time to fix the page tables and shuffle everything around. Or to know that the problem is hopeless without losing control due to infinite looping.
I know this means that a "really tiny OS" has to either be larger, or more strict in what it allows users to do. But it is something that I think is adequate for the time being.
Proposal 4
by Martin Ruckert
Page faults are not faults, they are regular and common occurences. They should be handled nicely.
It could be a good idea to have an rBBB, rWWW, rXXX and rZZZ register set aside for page faults and allow page faults by default to interrupt even the operating system dispatching them to a convenient offset of rT. It would make programming definitely simpler. Masking the extra bit in rK for the case that page faults are not desired is then needed only at a few places.
The only critical case to handle is a growing stack. It should be possi ble to detect this situation easily and fix it. E.g. load the address of a preallocated and zeroed page in case of a write fault, insert it into the page table. Then the more lengthy code could start to do all the other things handling a page fault. I will think abot how this could look like.
|