Thursday, January 31, 2008

Supporting "fork"

System calls that create child processes, such as fork, need special handling. A child spawned by one variant must be synchronized with the corresponding children in the other variants, so the monitor groups corresponding children together for synchronization and supervision. Monitoring the main variants and their children in a single-threaded monitor can impose significant overhead. Consider this scenario: the main variants invoke a system call that the monitor must execute itself and that takes a long time. In the meantime, the children invoke a system call that only needs a quick approval from the monitor. The children then have to wait until the monitor finishes executing the main variants' system call.
To tackle this problem, our monitor spawns a new thread every time the variants create child processes and hands the monitoring of the newly created children over to that thread. The thread terminates when the children terminate. Handing control over to the new thread is not straightforward, since ptrace is not designed to be used in a multi-threaded debugger. The new thread is not allowed to trace the children unless the parent thread first detaches from them and lets the new thread attach. When the parent thread detaches, the kernel sends a signal to the children and lets them continue execution normally, without notifying the monitor at system call invocations. As a result, some system calls could escape monitoring in the window between the detach and the new attach.

We solved the problem by letting the parent thread monitor the new child processes until they invoke their first system call. At that point, the parent thread saves the system call and its arguments and replaces the call with sigsuspend. The signal mask passed to sigsuspend blocks all signals; since SIGSTOP and SIGKILL cannot be blocked, the children can still receive these two. Besides replacing the system call, we also decrement the instruction pointer by 2 (the length of the int 0x80 instruction) so that int 0x80 is executed again after the child is resumed. This is required so that the original system call can be restored and run. After these changes, the parent thread detaches from the children, and the children run sigsuspend and get suspended. The parent thread then spawns a new monitoring thread and passes it the process IDs of the child processes. The new monitoring thread attaches to the children; the kernel sends SIGSTOP to the children and wakes them up. They execute int 0x80 again, which is intercepted by the new monitoring thread. The monitoring thread restores the original system call that was replaced by sigsuspend and starts monitoring the new children without missing any system call.
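The register rewrite described above can be sketched without touching a live process. The struct below is a hypothetical stand-in for the i386 registers a tracer would read and write back with ptrace, and the names regs32, saved_call, park_in_sigsuspend and restore_call are made up for this illustration. The sigsuspend system call number (72 on i386) and the use of edx for the mask are assumptions that should be checked against the target kernel's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the i386 registers a tracer sees at a
 * system-call-entry stop (not the real struct user_regs_struct). */
struct regs32 {
    int32_t ebx, ecx, edx, esi, edi, ebp; /* system call arguments */
    int32_t orig_eax;                     /* system call number */
    uint32_t eip;                         /* points just past int 0x80 */
};

/* The intercepted call, kept so the new monitoring thread can restore it. */
struct saved_call {
    int32_t nr;
    int32_t args[6];
};

enum {
    NR_SIGSUSPEND_I386 = 72, /* assumption: legacy sigsuspend on i386 */
    INT80_LEN = 2            /* int 0x80 encodes as the two bytes CD 80 */
};

/* Step done by the parent thread: save the call, swap in sigsuspend,
 * and rewind eip so int 0x80 runs again once the child wakes up. */
struct saved_call park_in_sigsuspend(struct regs32 *r)
{
    assert(r != NULL);
    struct saved_call s = {
        r->orig_eax,
        { r->ebx, r->ecx, r->edx, r->esi, r->edi, r->ebp }
    };
    r->orig_eax = NR_SIGSUSPEND_I386;
    r->edx = -1;          /* assumption: mask argument, all bits set */
    r->eip -= INT80_LEN;
    return s;
}

/* Step done by the new monitoring thread at the re-executed int 0x80:
 * put the original call and its arguments back. */
void restore_call(struct regs32 *r, const struct saved_call *s)
{
    assert(r != NULL && s != NULL);
    r->orig_eax = s->nr;
    r->ebx = s->args[0];
    r->ecx = s->args[1];
    r->edx = s->args[2];
    r->esi = s->args[3];
    r->edi = s->args[4];
    r->ebp = s->args[5];
}
```

In a real monitor, both functions would sit between a register read and a register write on the traced child; the detach, thread creation and re-attach happen between the two calls.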

In a multi-threaded monitor, any thread can receive signals raised in any traced process, meaning that a thread can receive signals raised for processes monitored by another thread. To solve this problem, we use wait4 with an explicit process ID instead of wait. This way, each thread only receives signals raised in the processes it monitors itself.
The above mechanism also works for tracing children created with clone. To receive signals raised in cloned children, wait4 must be called with the __WALL option.
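As a rough illustration of waiting on one specific process, the sketch below forks a child and collects only that child's status with wait4 and __WALL. The helper name wait_for_own_child is hypothetical, and a real monitor would be waiting on traced processes rather than on a child that exits immediately.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a child that exits with `code`, then wait
 * for that PID only. Passing a concrete PID (instead of -1, which is what
 * plain wait uses) keeps this caller from consuming status changes that
 * belong to processes handled elsewhere. */
int wait_for_own_child(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);              /* child: terminate right away */

    int status = 0;
    struct rusage usage;
    /* __WALL (Linux-specific) also covers children created with clone. */
    pid_t got = wait4(pid, &status, __WALL, &usage);
    if (got != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each monitoring thread would run a loop around a call like this for the PIDs it owns.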

Tuesday, January 8, 2008

Signal Handling in Reverse Stack Executables

One of the challenges in reverse stack manipulation is signal handling. If a handler is defined for a signal, then when the signal is raised, the kernel sets up a signal frame, saves the context of the process on the stack, and calls the corresponding handler. Since the kernel expects the normal stack growth direction (downward on x86, for example), the context saved by the kernel overwrites data on a reverse-growing stack. To tackle this problem, we allocate a small block of memory and call sigaltstack to tell the OS to use this block as the signal stack when it sets up the signal frame and saves the process's context.

The problem is that the handler, which is defined by the programmer, is compiled for a reverse stack. When the signal is raised, the kernel saves the context on the signal stack and calls the handler. The handler uses the same signal stack, and when it starts executing, the stack pointer is located just below the context saved by the OS. Therefore, a handler compiled for a reverse-growing stack can overwrite and destroy the saved context, causing a crash when the handler returns.

To solve this problem, we change the interface to the sigaction system call in libc. sigaction registers a new handler for a specified signal number. We change the interface so that, whenever it is called, it registers a wrapper function that we have defined in libc instead of the user-defined handler. The wrapper function increments the stack pointer to skip the area used for saving the process's context and then calls the user-defined function. After the user-defined function returns, the wrapper decrements the stack pointer back to its original location and returns. With this method, the saved context remains intact and the kernel is able to restore it, while everything remains transparent to the kernel.

As mentioned above, we allocate a block of memory to use as the alternative stack. We pass a pointer close to the beginning of the block to sigaltstack. The kernel uses this pointer as the start of the alternative stack and saves the context downward from this point, toward the beginning of the block. The pointer is far enough from the start of the block to leave adequate room for the saved context. After the context has been saved, our wrapper function increments the stack pointer past the context and uses the rest of the memory block as an upward-growing stack for the signal handler.