Thursday, February 5, 2009

Low-overhead access to the memory space of a traced process, Part IV

Low-overhead access to the memory space of a traced process is a major challenge when using a user-space tracer. I have devoted a few posts to mechanisms that allow us to access a traced process's memory with low overhead. My last post on low-overhead memory access suggested using FIFOs. However, FIFOs have a limitation: their buffer size is often small and cannot be changed without recompiling the Linux kernel. For example, it is 4KB in Ubuntu 8.10. This means that transferring a buffer larger than 4KB requires multiple iterations over the FIFO, and each iteration involves several context switches, which increases the overhead.
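
To make that overhead concrete, here is a rough, hypothetical sketch of the writer side of such a FIFO transfer (the function name, the FIFO path argument, and the 4KB chunk size are assumptions for illustration, not code from the tracer):

    /* Hypothetical sketch (not the tracer's actual code): pushing a large
     * buffer to the monitor through a FIFO.  With a 4KB pipe buffer, a
     * 128KB transfer takes many chunks, and each chunk can block until the
     * reader drains the FIFO, adding context switches. */
    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>

    #define CHUNK 4096  /* assumed kernel FIFO buffer size */

    static int send_over_fifo(const char *fifo_path, const char *buf, size_t len)
    {
        int fd = open(fifo_path, O_WRONLY);
        if (fd < 0)
            return -1;

        size_t off = 0;
        while (off < len) {
            size_t chunk = len - off < CHUNK ? len - off : CHUNK;
            ssize_t n = write(fd, buf + off, chunk);  /* may block on each chunk */
            if (n < 0) {
                close(fd);
                return -1;
            }
            off += (size_t)n;
        }

        close(fd);
        return 0;
    }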

Recently, I replaced the FIFOs with shared memory and obtained very good results. The maximum size of a shared memory segment is several megabytes by default (use "cat /proc/sys/kernel/shmmax" in Linux to see the limit on your system). This is more than enough to read system call arguments from the traced processes and to write the system call results back to them.
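
As a rough illustration (not the exact code from our monitor), the monitor side could set up such a block with System V shared memory along these lines; the key value 0x7261 and the 128KB size are made-up numbers:

    /* Hypothetical sketch: the monitor creates one shared block large enough
     * to hold system call arguments and results in a single shot, instead of
     * pushing them through a 4KB FIFO in several round trips. */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define SHM_KEY  0x7261        /* made-up key shared with the traced process */
    #define SHM_SIZE (128 * 1024)  /* well below the shmmax limit mentioned above */

    int main(void)
    {
        /* 0600: only the user running the monitor may read or write the block */
        int shmid = shmget(SHM_KEY, SHM_SIZE, IPC_CREAT | IPC_EXCL | 0600);
        if (shmid < 0) {
            perror("shmget");
            return 1;
        }

        char *buf = shmat(shmid, NULL, 0);   /* map the block into our address space */
        if (buf == (void *)-1) {
            perror("shmat");
            return 1;
        }

        /* ... read syscall arguments from buf, write results back into buf ... */

        shmdt(buf);
        shmctl(shmid, IPC_RMID, NULL);       /* mark the segment for removal */
        return 0;
    }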

As with FIFOs, the downside of using shared memory is the security risk: in principle, any process can try to attach to the segments and access their contents. However, each shared memory segment is identified by a key, and a process can attach to a segment only if it uses the correct key and passes the segment's permission check. When we create the shared memory segments, we set their permissions so that only the user who executed the monitor can read from or write to them. The risk is therefore limited to a malicious program running as the same user or as the superuser, and both cases are only possible when the system is already compromised.
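
To make that concrete, here is a hedged sketch of the other side: a process that tries to attach the segment using the same (made-up) key as in the sketch above. With mode 0600 on the segment, the lookup fails with EACCES for any process running under a different, non-root user:

    /* Hypothetical sketch: a process attaching the monitor's shared block.
     * Because the segment was created with mode 0600, this only succeeds when
     * the caller runs as the same user as the monitor (or as root). */
    #include <errno.h>
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define SHM_KEY  0x7261        /* same made-up key as in the monitor sketch */
    #define SHM_SIZE (128 * 1024)

    int main(void)
    {
        int shmid = shmget(SHM_KEY, SHM_SIZE, 0600);  /* no IPC_CREAT: just look up */
        if (shmid < 0) {
            if (errno == EACCES)
                fprintf(stderr, "attach denied: running as a different user\n");
            else
                perror("shmget");
            return 1;
        }

        char *buf = shmat(shmid, NULL, 0);
        if (buf == (void *)-1) {
            perror("shmat");
            return 1;
        }

        /* ... place syscall arguments in buf for the monitor to read ... */

        shmdt(buf);
        return 0;
    }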

Evaluating the performance of shared memory versus FIFOs versus ptrace, I observed that shared memory is about 20 times faster than FIFOs and about 900 times faster than ptrace when transferring a 128KB buffer.

4 comments:

Wei Hu said...

Very interesting stuff. I read all your posts after reading your EuroSys 09 paper. Although the N-Variant idea isn't new, I can see that you jumped through a hell of a lot of hoops to make the implementation work. Please keep posting so that we can see how the design decisions were made.

Babak said...

Thanks for your comment, Wei.
I have also read all the papers from your research group on n-variant execution. Although the concept of n-variant or multi-variant execution is no longer new, there are still a few open problems that should be addressed before the technique becomes usable in production systems. We should join forces to address these problems.

Wei Hu said...

Last year Prof. Evans invited Lorenzo Cavallaro to give a talk at our school. He also applied this multi-variant idea, so you might want to look up his work (in case you haven't). I cannot recall his status, but your work seems to be more complete.

My personal understanding of this n-variant idea is that it has many practical issues (some of which were addressed in your paper), and more importantly, we're running out of ideas in creating variants.

PS. Have you ever considered implementing the monitor inside the processes?

Babak said...

Yes, I became familiar with Lorenzo's work recently and found it interesting; however, as you said, we managed to build a more robust multi-variant execution layer. Frankly, I no longer believe that multi-variant execution has many practical issues. With the help of our synchronous signal delivery mechanism, we are close to running programs as low-level as User Mode Linux in a multi-variant environment.
As I mentioned before, I absolutely agree that there are still some issues, but not many.

Creating new variants is a slightly different story. However, I believe that the currently available variation mechanisms are enough to cover almost all of the vulnerabilities found so far. So, I am not sure we really need fundamentally new variation mechanisms. Instead, we need to modify some of the available variation mechanisms so that they run almost as fast as a normal executable. This is needed to minimize the overhead of synchronizing the variants and to get one step closer to practical deployment of the mechanism.

By "implementing the monitor inside the processes", do you mean implementing the monitor inside the variants? If you do, I should say "no, we haven't".
I am not sure what the advantages of this would be. Don't we need a separate third process to monitor and synchronize the variants anyway?