Thursday, January 31, 2008

Supporting "fork"

System calls that create child processes, such as fork, must be handled specially. A child spawned by one variant must be synchronized with its corresponding children in the other variants, so the monitor has to group corresponding children together for synchronization and supervision. Monitoring the main variants and all of their children from a single-threaded monitor, however, can impose significant overhead. Consider this scenario: the main variants invoke a system call that must be executed by the monitor and takes a long time. Meanwhile, the children invoke a system call that needs only a quick approval from the monitor. The children still have to wait until the monitor has finished executing the main variants' system call.
To tackle this problem, our monitor spawns a new thread every time the variants create child processes, and hands the monitoring of the newly created children over to that thread. The new thread terminates when the children terminate. Handing control over to the new thread is not straightforward, because ptrace was not designed for multi-threaded debuggers: a traced process can only be controlled by the thread that attached to it. The new thread therefore cannot trace the children unless the parent thread first detaches from them and lets the new thread attach. However, when the parent thread detaches, the kernel sends a signal to the children and lets them continue executing normally, without notifying the monitor when they invoke system calls. As a result, some system calls could escape monitoring.

We solved the problem by letting the parent thread keep monitoring the new child processes until they invoke their first system call. At that point, the parent thread saves the system call and its arguments and replaces the call with sigsuspend, using a mask that blocks all signals. Since SIGSTOP and SIGKILL cannot be blocked, the children can still receive these two signals. Besides replacing the system call, we also decrement the instruction pointer by 2 (the length of the int 0x80 instruction) so that int 0x80 is executed again after the child is resumed; this second execution is what gives the monitor a chance to restore the original system call and run it. After these changes, the parent thread detaches from the children, which run sigsuspend and are suspended.

The parent thread then spawns a new monitoring thread and passes it the process IDs of the child processes. The new monitoring thread attaches to the children; the kernel sends them SIGSTOP, which wakes them up. They execute int 0x80 again, and this time the call is intercepted by the new monitoring thread. The monitoring thread restores the original system call that had been replaced by sigsuspend and starts monitoring the new children without missing a single system call.

In a multi-threaded monitor, any thread can receive the signals raised in any traced process, which means a thread can receive signals raised in processes monitored by another thread. To solve this problem, we simply use wait4, which lets the caller specify the process to wait for, instead of wait. This way, each thread only receives signals raised in the processes it monitors.
The same mechanism also works for tracing children created with clone. To receive the signals of cloned children, the __WALL option must be passed to wait4.
