The rt0 code implements a dedicated function for initializing the Go runtime
structures. Instead of reserving space for a dummy g struct, the rt0
code now uses the g0 and m0 symbols defined by the runtime package. In
addition to setting up g0, the rt0 also sets up the m0 struct and links
it to g0.
Setting up m0 is a requirement for properly bootstrapping the
malloc-related code in the following commits.
The Makefile contains rules for invoking the offsets tool to generate
the offset definitions for members of the g, m and stack structs. The
definitions are stored in BUILD_DIR and BUILD_DIR is passed as an
include target to nasm.
The offsets tool is essentially a wrapper around "go build -a -n". It
creates a temporary folder with a dummy go file and runs the above
command using the target OS/ARCH for the kernel and captures the output.
The use of the "-a" flag forces go build to generate a build script for
rebuilding all packages including the runtime ones. As a by-product of
building the runtime package, the compiler emits the "go_asm.h" file
that contains (among other things) the offsets for each element of the
g, m and stack structures (see src/runtime/runtime2.go).
These offsets are used in Go assembly files instead of hardcoded
offsets. For example the following snippet accesses the pointer to m in
the g struct address stored at register CX:
    MOVQ TLS, CX
    MOVQ g_m(CX), BX
The offsets tool modifies the captured output from the go build command
so it only includes the steps up to building the runtime package,
executes the build script and post-processes the generated go_asm.h file
to retain the entries relevant to g, m and stack and then formats them
so they are compatible with nasm definitions (name equ value).
Depending on the value of the "-out" option, the tool outputs the
generated definitions either to STDOUT (default value for -out) or to a
file.
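The post-processing step can be illustrated with a minimal sketch. It assumes that go_asm.h contains lines of the form "#define g_m 48" and that filtering by name prefix is enough to select the g, m and stack entries; the function name toNasmDefs and the prefix list are hypothetical and not taken from the actual tool.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// toNasmDefs converts "#define name value" lines into nasm-style
// "name equ value" lines, keeping only names that start with one of
// the given prefixes. (Hypothetical helper, not the tool's real code.)
func toNasmDefs(header string, prefixes []string) []string {
	var defs []string
	sc := bufio.NewScanner(strings.NewReader(header))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip anything that is not a three-field #define line.
		if len(fields) != 3 || fields[0] != "#define" {
			continue
		}
		for _, p := range prefixes {
			if strings.HasPrefix(fields[1], p) {
				defs = append(defs, fmt.Sprintf("%s equ %s", fields[1], fields[2]))
				break
			}
		}
	}
	return defs
}
```

Calling toNasmDefs with the prefixes "g_", "m_" and "stack_" drops every definition that belongs to other runtime structs.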
The generated DWARF information contains absolute file paths for the
source files, which causes issues when debugging on OSX as GDB cannot
look up the source files.
Summary of changes:
- when building the gdb target, the source is built with optimizations
and inlining disabled (-N -l)
- source Go gdb helpers when running the gdb target
- set split layout (asm + code)
All calls (but one) to kernel.Panic have been replaced by calls to
panic. A call to kernel.Panic is still required to prevent the compiler
from treating kernel.Panic as dead code and eliminating it.
The rt0_64 code reserves space for _rt0_redirect_table using the output
from the redirect tool's "count" command as a hint to the size of the
table. The table itself is located in the .goredirectstbl section which
the linker moves to a dedicated section in the final ELF image.
When the kernel boots, the _rt0_install_redirect_trampolines function
iterates the _rt0_redirect_table entries (populated as a post-link step)
and overwrites the original function code with a trampoline that
redirects control to the destination function.
The trampoline is implemented as a 14-byte sequence (a 6-byte jump
followed by an 8-byte absolute address) that exploits rip-relative
addressing to ensure that no registers are clobbered. The
actual trampoline code looks like this:
    jmp [rip+0]                ; 6 bytes
    dq abs_address_to_jump_to  ; 8 bytes
The _rt0_install_redirect_trampolines function sets up the abs_address
to "dst" for each (src, dst) tuple and then copies the trampoline to
"src". After the trampoline is installed, any calls to "src" will be
transparently redirected to "dst". This hack (modifying code in the
.text section) is only possible because the code runs in supervisor mode
before memory protection is enabled.
The tool scans all go sources (excluding tests) in the "kernel" package
and its subpackages looking for functions with a "go:redirect-from
symbol_name" comment. The go:redirect-from directive implies that a
function serves as a redirect target for a symbol name. For example,
the following block:
    //go:redirect-from runtime.gopanic
    func foo(_ interface{}) {
        ...
    }
specifies that calls to "runtime.gopanic" should be redirected to "foo".
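One way to detect such directives is to parse each file with comments enabled and inspect the doc comments attached to function declarations. The sketch below follows that approach using the standard go/parser and go/ast packages; the redirect type and the findRedirects function are hypothetical names, and the real tool may scan sources differently.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// redirect pairs a symbol to intercept with the function replacing it.
type redirect struct {
	src string // symbol named by the directive, e.g. "runtime.gopanic"
	dst string // name of the annotated function
}

const directive = "//go:redirect-from "

// findRedirects parses a single Go source file and returns one entry
// for every function whose doc comment carries the directive.
// (Hypothetical helper, shown for illustration.)
func findRedirects(filename, source string) ([]redirect, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var out []redirect
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Doc == nil {
			continue
		}
		// Doc.List holds the raw comment lines, directives included.
		for _, c := range fn.Doc.List {
			if strings.HasPrefix(c.Text, directive) {
				src := strings.TrimSpace(strings.TrimPrefix(c.Text, directive))
				out = append(out, redirect{src: src, dst: fn.Name.Name})
			}
		}
	}
	return out, nil
}
```

The length of the returned slice, summed over all files, corresponds to the number that a "count"-style command would print.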
The tool provides two commands:
- count: prints the count of redirections
- populate-table: resolve redirect symbols and populate the
_rt0_redirect_table entries in the kernel image.
As the final virtual addresses for the symbols are only known after
linking, populating this table is a 2-step process. At first, the
"count" command is used to allocate enough space for 2 x NUM_REDIRECTS
pointers. The table itself is placed with the help of the linker script
in a separate section making it easy to find its offset in the ELF
image.
After the kernel is linked, the "populate-table" command uses the
debug/elf package to scan the image file and resolve the addresses for
the src and dst redirection symbols. The tool will then open the image
file in RW mode, seek to the location of the table and write the symbol
addresses for each (src, dst) tuple.
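The table contents written in the second step can be sketched as follows. The sketch assumes that each entry is stored as two consecutive little-endian 64-bit addresses (src first, then dst) and that the symbol addresses have already been resolved into a map, for example from the output of debug/elf; the names redirectPair and encodeRedirectTable are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// redirectPair names the source and destination symbols of a redirect.
type redirectPair struct{ src, dst string }

// encodeRedirectTable serializes the resolved (src, dst) addresses in
// the order they should appear in the table. The entry layout is an
// assumption made for this sketch.
func encodeRedirectTable(pairs []redirectPair, symAddr map[string]uint64) ([]byte, error) {
	table := make([]byte, 0, len(pairs)*16)
	var buf [8]byte
	for _, p := range pairs {
		for _, name := range []string{p.src, p.dst} {
			addr, ok := symAddr[name]
			if !ok {
				return nil, fmt.Errorf("symbol %q not found in image", name)
			}
			binary.LittleEndian.PutUint64(buf[:], addr)
			table = append(table, buf[:]...)
		}
	}
	return table, nil
}
```

The resulting byte slice occupies exactly 2 x NUM_REDIRECTS pointers, which matches the space reserved using the "count" command.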
The vmm package exports ReservedZeroedFrame which can be used to set up
a lazy physical page allocation scheme. This is implemented by mapping
ReservedZeroedFrame to each page in a virtual memory region using the
following flag combination: FlagPresent | FlagCopyOnWrite.
This has the effect that all reads from the virtual address region
target the contents of ReservedZeroedFrame (always returning zero). On
the other hand, writes to the virtual address region trigger a page
fault which is resolved as follows:
- a new physical frame is allocated and the contents of ReservedZeroedFrame
are copied to it (effectively clearing the new frame).
- the page entry for the virtual address that caused the fault is
updated to point to the new frame and its flags are changed to:
FlagPresent | FlagRW
- execution control is returned back to the code that caused the fault
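The scheme can be modelled in user space with a toy address space. The sketch below is only a simulation under stated assumptions: the flag names match the ones above, but their values, the addressSpace type and its methods are hypothetical, and the "page fault" is an ordinary branch inside write rather than a CPU exception.

```go
package main

const pageSize = 4096

type pageFlag uint8

// Flag values are arbitrary here; only the names come from the text.
const (
	FlagPresent pageFlag = 1 << iota
	FlagRW
	FlagCopyOnWrite
)

type frame [pageSize]byte

type entry struct {
	frame *frame
	flags pageFlag
}

// reservedZeroedFrame stands in for the shared, always-zero frame.
var reservedZeroedFrame = &frame{}

// addressSpace maps page numbers to page entries.
type addressSpace struct {
	pages map[uintptr]*entry
}

// mapLazy points every page of the region at the shared zero frame.
func (as *addressSpace) mapLazy(firstPage, count uintptr) {
	for p := firstPage; p < firstPage+count; p++ {
		as.pages[p] = &entry{frame: reservedZeroedFrame, flags: FlagPresent | FlagCopyOnWrite}
	}
}

// read never faults on a lazily mapped page; it sees zeroes until the
// page is written to. Reading an unmapped page is not handled here.
func (as *addressSpace) read(addr uintptr) byte {
	return as.pages[addr/pageSize].frame[addr%pageSize]
}

// write simulates the fault path for read-only copy-on-write pages.
func (as *addressSpace) write(addr uintptr, v byte) {
	e := as.pages[addr/pageSize]
	if e.flags&FlagRW == 0 {
		if e.flags&FlagCopyOnWrite == 0 {
			panic("write to read-only page")
		}
		fresh := new(frame)
		*fresh = *e.frame // copying the zero frame clears the new one
		e.frame = fresh
		e.flags = FlagPresent | FlagRW
	}
	e.frame[addr%pageSize] = v
}
```

In this model only the pages that are actually written to consume a private frame; untouched pages keep sharing the zero frame.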
Page faults occurring on RO pages with the CopyOnWrite flag set will be
handled by the page handler as follows:
- allocate new frame
- establish temporary mapping for new frame
- copy original page to new frame
- update entry for the page where the fault occurred:
  - set physical frame address to the allocated frame
  - clear CoW flag and set Present, RW flags
- return from the fault handler to resume execution at the instruction
  that caused the fault
Any other page faults will still cause a kernel panic.
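The entry update performed by the handler can be sketched at the bit level. On x86_64 the Present and RW flags are bits 0 and 1 of a page table entry and bits 12-51 hold the frame address; which software-available bit carries the CoW flag is an assumption here (bit 9). The function resolveCoW is hypothetical and leaves out frame allocation and the page copy.

```go
package main

import "errors"

const (
	flagPresent     uint64 = 1 << 0
	flagRW          uint64 = 1 << 1
	flagCopyOnWrite uint64 = 1 << 9 // assumption: one of the OS-available bits
	frameMask       uint64 = 0x000ffffffffff000
)

var errUnrecoverable = errors.New("unrecoverable page fault")

// resolveCoW returns the updated page table entry for a write fault on
// a read-only page marked copy-on-write. Any other combination is
// reported as an error, which the caller would turn into a panic.
func resolveCoW(pte, newFrame uint64) (uint64, error) {
	if pte&flagPresent == 0 || pte&flagRW != 0 || pte&flagCopyOnWrite == 0 {
		return pte, errUnrecoverable
	}
	// Drop the old frame address and the CoW flag...
	pte &^= frameMask | flagCopyOnWrite
	// ...then point the entry at the new frame and make it writable.
	pte |= (newFrame & frameMask) | flagPresent | flagRW
	return pte, nil
}
```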
The rt0_64 code will load a blank IDT with 256 entries (the max number
of supported interrupts in the x86_64 architecture). Each IDT entry is
set as *not present* but its handler is set to a dedicated gate entrypoint
defined in the rt0 code.
A gate entrypoint is defined for each interrupt number using a nasm
macro. Each entrypoint will then use the interrupt number to index a
list of pointers (defined and managed by the Go assembly code in
the irq pkg) to the registered interrupt handlers and push its address
on the stack before jumping to one of the two available gate dispatching
functions (some interrupts also push an error code to the stack which
must be popped before returning from the interrupt handler):
- _rt0_64_gate_dispatcher_with_code
- _rt0_64_gate_dispatcher_without_code
Both dispatchers operate in the same way:
- they save the original registers
- they invoke the interrupt handler
- they restore the original registers
- they ensure that the stack pointer (rsp) points to the exception frame
pushed by the CPU
The difference between the dispatchers is that the "with_code" variant
will invoke a handler with signature `func(code, &frame, &regs)` and
ensure that the code is popped off the stack before returning from the
interrupt while the "without_code" variant will invoke a handler with
signature `func(&frame, &regs)`.
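The choice between the two dispatchers depends only on the interrupt number. The sketch below encodes the classic set of x86 exceptions that push an error code (#DF, #TS, #NP, #SS, #GP, #PF and #AC); newer CPUs define a few more, so the list is not exhaustive. The Go functions are hypothetical and merely mirror a decision that the nasm macro makes at assembly time.

```go
package main

// pushesErrorCode reports whether the CPU pushes an error code for the
// given exception vector. Only the classic vectors are listed.
func pushesErrorCode(vector uint8) bool {
	switch vector {
	case 8, 10, 11, 12, 13, 14, 17:
		return true
	}
	return false
}

// dispatcherFor returns the name of the gate dispatcher that an
// entrypoint for the given vector would jump to.
func dispatcherFor(vector uint8) string {
	if pushesErrorCode(vector) {
		return "_rt0_64_gate_dispatcher_with_code"
	}
	return "_rt0_64_gate_dispatcher_without_code"
}
```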
Function EarlyReserveRegion reserves contiguous virtual address space
regions beginning at the end of the available kernel space and moving
towards lower virtual addresses. The only state that is tracked by this
function is the last allocated virtual page address which is adjusted
after each reservation request.
Starting at the end of the kernel address space ensures that we will not
step on the virtual addresses used by the kernel code and data sections.
This allows us to remove the allocFn argument from the vmm functions;
the compiler's escape analysis sometimes incorrectly flags that
argument as escaping to the heap.
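A minimal sketch of the reservation logic is shown below. It assumes 4 KiB pages and that request sizes are rounded up to a page multiple; the earlyReserver type and its reserve method are hypothetical stand-ins for EarlyReserveRegion and its package-level state.

```go
package main

import "errors"

const pageSize uint64 = 4096

var errOutOfSpace = errors.New("early reserve: not enough virtual address space")

// earlyReserver tracks the only state needed: the lowest virtual page
// address handed out so far. It starts at the end of kernel space.
type earlyReserver struct {
	lastPage uint64
}

// reserve carves a contiguous region of at least size bytes out of the
// space below lastPage and returns the region's start address.
func (r *earlyReserver) reserve(size uint64) (uint64, error) {
	// Round the request up to a whole number of pages.
	size = (size + pageSize - 1) &^ (pageSize - 1)
	if size == 0 || size > r.lastPage {
		return 0, errOutOfSpace
	}
	r.lastPage -= size
	return r.lastPage, nil
}
```

Each call moves the tracked address further down, so successive reservations never overlap.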
The linked.ld script is extended to include the _kernel_start and
_kernel_end symbols which are passed by the rt0 code to Kmain. The
allocator converts these addresses to a start/end frame index by
rounding down the kernel start address to the nearest page and rounding
up the kernel end address to the nearest page.
When allocating frames, the allocator will treat the region defined by
these 2 indices as reserved and skip over it.
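The address-to-frame conversion can be sketched as follows, assuming 4 KiB pages. Treating the end index as an exclusive upper bound is a choice made for this sketch; the function names are hypothetical.

```go
package main

const (
	pageShift = 12
	pageSize  = 1 << pageShift
)

// kernelFrameRange converts the kernel's start and end addresses to a
// half-open range [first, end) of physical frame indices.
func kernelFrameRange(kernelStart, kernelEnd uint64) (first, end uint64) {
	first = kernelStart >> pageShift            // round down to a page
	end = (kernelEnd + pageSize - 1) >> pageShift // round up to a page
	return first, end
}

// isReserved reports whether the allocator must skip the given frame.
func isReserved(frameIndex, first, end uint64) bool {
	return frameIndex >= first && frameIndex < end
}
```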