-
Notifications
You must be signed in to change notification settings - Fork 18
Immix Garbage Collection
Immix Garbage Collector is a type of mark-region garbage collection algorithm, that splits the heap space into blocks and lines, which make the finding of holes, and the freeing of unused objects faster than traditional mark-sweep or mark-copy algorithms
const TAG_SLOT = 0;
const SIZE_SLOT = 1;
const FIRST_CHILD_SLOT = 2;
const LAST_CHILD_SLOT = 3;
const MARK_SLOT = 4;Nodes in the Immix Garbage Collector differ from regular SVML nodes by including a MARK slot for all objects that live in the heap. This slot is required in order for us to resolve circular or multiple references when traversing live nodes.
// block nodes layout
//
// 0: tag = BLOCK_TAG
// 1: size = total size of block
// 2: offset of first child from the tag: 6
// 3: offset of last child from the tag: (NUM_OF_LINES_PER_BLOCK - 1) * LINE_BK_SIZE (3)
// 4: mark (free: 0, occupied: 1)
// 5: block state (occupied: 0, recyclable: 1, free: 2)
// 6: line 0 start address
// 7: line 0 mark bit (marked: 1, unmarked: 0)
// 8: line 0 end of occupied address (for block occupancy profiling purpose)
// 9: line 1 address
// ...
// NUM_OF_LINES_PER_BLOCK * LINE_BK_SIZE + BLOCK_BK_SIZE: start of line 0 usable memory
Blocks contain bookkeeping slots for lines, in order to reserve a contiguous memory space in the body of the block to allow for allocation of objects that are more than the size of a line.
Line address refers to start of the line metadata with reference to the heap.
- Line contains start and end addresses in order to provide more information and allow for flexibility of extension in the future, for example block occupancy profiling.
- We keep the start address even though it can be calculated from a given line address in order to reduce the use of registers to calculate this address, and also to provide convenience to the programmer.
The Immix Garbage Collector is essentially a Mark-Region Garbage collector with 2 levels of granularity as well as a compaction mechanism. There are also several other bells and whistles like overflow allocation and opportunistic defragmentation.
The garbage collection mechanism goes something like this:
function garbage_collect() {
map(roots(heap), mark);
map(blocks(heap), free);
unmark_all_nodes(heap);
}
function mark(node) {
node.mark_slot = MARKED;
for (child in node.children) {
if (is_forwarding_address(child)) {
child = heap[forwarding_address];
}
if (child.mark_slot === UNMARKED) {
const b = get_block(child);
if (need_to_defrag) {
const forwarding_address = copy(child);
child = heap[forwarding_address];
}
get_block(child).mark_slot = MARKED;
get_line(child).mark_slot = MARKED;
mark(child);
}
}
}
function free(block) {
if (block.mark_slot === MARKED) {
for (line in block.lines) {
if (line.mark_slot === MARKED) {
free_line(line);
}
}
} else {
free_block(block);
}
}All live objects that are reachable will be traversed via dfs and marked. If the defragmentation is active, the object will be copied to a free block and the garbage collector will leave behind a forwarding address. If this forwarding address is encountered again during the marking phase, the parent node of the object will have it's reference updated accordingly. At the end of the marking phase, we can be sure that there are no more loose pointers and the candidate block can then be freed and turned into the new headroom block.
Freeing a node consists of freeing the lines that it resides in. This is done by setting the LIMIT_ADDRESS of the line to its START_ADDRESS. No values are actually overwritten in favor of speed, although this can be done during development.
Freeing a block simply consists of changing the block state to FREE.
Debugging is a tricky thing to do considering the possible propagation of unwanted behavior before the program fails and the absence of a stack trace. In order to improve the programmer productivity, this writer has decided to include helpful and general assertions that can be used throughout the program in order to ensure correctness and faster debugging.
Assertions in Source makes use of the built-in error function. Assertion functions also take in a list of string that acts as a rudimentary stack trace that allows the user to identify where a violation of the assertion takes place. Upon encountering an assertion, the SVM will print out current register values as well as a visualization of the heap.
Note that the assertions listed here are only a part of
*-with-mark-region-gc.jsprograms. (at the point of writing, May 2020)
Assertions begin with assert_*. These functions do not return anything but check the validity of some given input.
Reference functions begin with ref_*. These functions have return values depending on the specification, and are meant to be correct functions not written in machine code, so that virtual machine macros can be checked against this output.
Assert input boolean to be true
-
boolboolean to check -
msgstring to display if boolean is not true -
stacklist of strings that act as stack trace
Assert input boolean to be false
-
boolboolean to check -
msgstring to display if boolean is true -
stacklist of strings that act as stack trace
Assert 2 input values to be equal (===)
-
v1first value to check -
v2second value to check -
stacklist of strings that act as stack trace
Assert a given input to be equal to the output of a reference-implementation function. Useful to check correctness of one's implementation, or discover bugs in the reference implementation.
-
v1value to check -
ref_wrapperfunction such that when applied to a stack, returns the output of the reference implementation. ie,const ref_wrapper = stack => ref_mark(pair("more info on stack_trace", stack)); -
stacklist of strings that act as stack trace
Reference implementation for the mark algorithm, returns a list of all live nodes
-
stacklist of strings that act as stack trace
List of marked node addresses
Makes sure that input value is an address that is not undefined and not used for Block or Line metadata.
-
addressaddress number -
stacklist of strings that act as stack trace
Makes sure that input value is a node that is not undefined and its children are all addressable and defined.
-
node_addressaddress of node -
stacklist of strings that act as stack trace
Makes sure that input value is an environment node and all its parent are also environment nodes.
-
env_addressaddress of environment node -
stacklist of strings that act as stack trace
Ensures operand stack (OS) has a LAST_CHILD value that is within the bounds of the size of the OS node.
-
OSaddress of the OS -
stacklist of strings that act as stack trace
Reference implementation for GET_BLOCK. Returns the address of the block that a given address is in.
-
addressinput address -
stacklist of strings that act as stack trace
Address of block
Reference implementation for GET_LINE. Returns the address of the line that a given address is in.
-
addressinput address -
stacklist of strings that act as stack trace
Address of line
Helper function that copies current RTS values into an array
-
topnumber that is used as the top of RTS
Array of copied RTS values.
That an input array is the same as current RTS, also considering the current TOP_RTS value.
-
copyarray of values, usually a copied RTS before executing some function. -
stacklist of strings that act as stack trace
Ensures that all lines a given node is residing in, are marked.
-
node_addressaddress of a given node -
stacklist of strings that act as stack trace
Asserts that a node is marked
-
mark_offsetnumber that determines the mark slot of the node. This can be MARK_SLOT or LINE_MARK_SLOT. Other values may work but results are not guaranteed to be meaningful. -
addressaddress of a given node -
stacklist of strings that act as stack trace
Asserts that a node and its corresponding lines and block are marked
-
node_addressaddress of a given node -
stacklist of strings that act as stack trace
Ensure that a given node and its corresponding lines and block are not freed.
-
addressaddress of a given node -
stacklist of strings that act as stack trace