MMIX Bug Report copy_block |
||
Content
|
MMIXware Versionmmix-20131017Bug ReportedInitial: 27/11/2014AuthorLaurent DesnoguesDescriptionThe function copy_block copies only dirty parts of a cache block.static void copy_block(cache *c, cacheblock *p, cache *cc, cacheblock *pp) { int j, jj, i, ii, lim; int off = p->tag & (cc->bb - 1); if (c->g != cc->g || p->tag - off != pp->tag) panic(confusion("copy block")); for (j = 0, jj = off >> c->g; j < c->bb >> c->g; j++, jj++) if (p->dirty[j]) { pp->dirty[jj] = true; for (i = j << (c->g - 3), ii = jj << (c->g - 3), lim = (j + 1) << (c->g - 3); i < lim; i++, ii++) pp->data[ii] = p->data[i]; } }This doesn't make sense and it indeed breaks in the following case:
Note that the tag check in copy_block won't fail due to this: case flush_to_S: ... case 3: if (Scache->filler.next) wait(1); p = alloc_slot(Scache, c->outbuf.tag); if (!p) wait(1); data->ptr_b = (void *) p; -> p->tag = c->outbuf.tag & (-Scache->bb); if (block_diff) { octa *d = p->data; p->data = Scache->inbuf.data; Scache->inbuf.data = d; } case 4: copy_block(c, &(c->outbuf), Scache, p);The line prefixed with '->' is making sure the tags will be the same when calling copy_block. Proposed PatchThe trivial fix to copy all of the 'dirty' and 'data' arrays doesn't work. To make my test pass I have to do the following:for (j = 0, jj = off >> c->g; j < c->bb >> c->g; j++, jj++) { pp->dirty[jj] |= p->dirty[j]; for (i = j << (c->g - 3), ii = jj << (c->g - 3), lim = (j + 1) << (c->g - 3); i < lim; i++, ii++) pp->data[ii] = p->data[i]; }That is 'or' pp and p dirty arrays. This is due to the fact that you can have in Scache a line that is dirty, though it's not dirty in the same places as in the Dcache. I'm not sure this is expected: a line in Scache can only be dirty after being evicted from Dcache; then later the line can go back to Dcache but I'd expect the dirty flags to be replicated from Scache to Dcache. In real CPU that'd be mostly mandatory (for instance to ensure coherency at L1 level). Anyway the code above fixes the issue I have seen. But I think it's slightly overkill and there might be better ways to fix it (for instance keep the old way and only do a full copy if the tags differed). My larger test still fails and I have to investigate, but it's becoming harder and harder. I'm afraid I can't share that test since it's from SPEC CPU2000. DiscussionLaurent Desnogues: Here is a new proposed fix: the idea uses the fact that alloc_slot will mark the clean lines in Scache as invalid; that information is saved when copying the tag in flush_to_S case 3; in copy_block this is used to force a replacement of the line. In terms of code this looks like this:
Further DiscussionThe copy_block bug manifests itself in the flush_to_S coroutine.There are four execution pathes through this coroutine which use the copy_block function and are potentially affected by this bug. After the coroutine accquires the Scache lock, there are four different execution pathes leading to copy_block:
Path 1: There is a target block with valid data and valid dirty bits. and a source block with valid dirty bits, and valid data in the dirty granules; the result is a block with valid dirty bits and valid data. Note that the implementation of the write_from_wbuf coroutine will write in state 3 (Handle write-around when writing to the Dcache) just one octabyte to the outbuf of Dcache, setting all dirty bits to zero except for the one octabyte written. This occurs only if the granule size is 8 (one octa), there is no hit in the Dcache, and the Dcache is in WRITE_AROUND mode or the instruction is STUNC (store-uncached). As a consequence, the Dcache outbuf will contain invalid data, except for one granule of 8 byte marked dirty. Hence, copy_block must not copy "clean" data; it might be invalid. Path 4: First all granules in the Scache outbuf are marked "clean", and then the dirty granules in the Dcache outbuf will be copied to the Scache outbuf. Later, flush_to_mem will write only dirty data to memory. The present implementation of copy_block is not correct for path 2 and 3: Path 2: After swapping the inbuf data with the data in the allocated slot in state 3, the newly allocated slot contains valid data. Unfortunately, the present code does not reset the dirty information, so we find the old dirty information in the newly allocated block. After copy_block, we will have valid data, but some granules might be marked dirty, where in fact these granules are clean. This will not cause the processor to malfunction, but might cause extra flushing to memory. Path 3: (This bug was found by Laurent Desnogues; his bug report started my investigation into this.) The alloc_slot function will try to find an available block in the Scache. If all candidates are in use, one of them is flushed to memory and without clearing data or dirty bits, this block is returned; only the sign bit in the tag is set to mark the block as invalid. The flush_to_S will set the new tag and will copy the dirty granules from the Dcache outbuf into the newly allocated block. If the outbuf was only partially dirty, some data from the old block content will remain. Proposal (Laurent): Keep the invalid bit in the newly allocated block in state 3 and augment copy_block as follows: check for a newly allocated block by testing the invalid bit and in this case copy all the data (including the "clean" data) and all the dirty bits. Unfortunately, this will not work in all cases, since the "clean" data in the Dcache outbuf might be invalid, for instance after a STUNC instruction (see note above). The bug in path 3 is difficult to fix. We have to distinguish two cases: either all data in the Dcache outbuf is valid or only the dirty granules contain valid data. In the first case, we do not need to load data from memory, in the second, loading data is necessary. The present flush_to_S coroutine in state 2 takes a liberal approach to loading data from memory: The complete cache block is always loaded but the delay is computed only for the amount of octabyte realy needed. Further, the memory is locked only for the required time. In this spirit, we should inform the flush_to_S coroutine if only a single octabyte in the Dcache outbuf is valid, compute block_diff accordingly and take path 2 instead of path 3 in this case. We take path 3 only if the Dcache outbuf is completely valid and in this case copy all data and all dirty bits to the newly allocated slot. To indicate that not all data in Dcache outbuf is valid, we could
So we change the following: In the subroutine flush_cache, we change @x startup(&c->flusher,c->copy_out_time); /* will not be aborted */ @y p->rank=c->bb; /* this many valid bytes */ startup(&c->flusher,c->copy_out_time); /* will not be aborted */ @z In flush_to_S, we change @x register int block_diff=Scache->bb-c->bb; @y register int block_diff=Scache->bb-c->outbuf.rank; @zand in state 3 of flush_to_S, we change @x if (block_diff) @In @<Handle write-around when writing to the D-cache@> we add at the beginning if (Dcache->filler.next) goto write_restart; if ((Scache&&Scache->lock) || (!Scache&&mem_lock)) goto write_restart;and before starting the flusher, we record the number of valid bytes: @x set_lock(self,wbuf_lock); startup(&Dcache->flusher,Dcache->copy_out_time); @y Dcache->outbuf.rank=Dcache->gg; /* this many valid bytes */ set_lock(self,wbuf_lock); startup(&Dcache->flusher,Dcache->copy_out_time); @z These changes will replace path 3 by path 2 in all cases where the data in the Dcache outbuf is not sufficent to fill an Scache slot and should also fix the bug in path 2.
StatusFixed in the repository, Rev. 92, on 2015-05-28. |
Please help to keep this site up to date! If you want to point out important material or projects that are not listed here, if you find errors or want to suggest improvements, please send email to