Recently, I replaced the FIFOs with shared memory and obtained very good results. By default, the maximum size of a shared memory segment is several megabytes (run "cat /proc/sys/kernel/shmmax" on Linux to see the limit on your system). This is more than enough to read system call arguments from traced processes and to write system call results back to them.
As with FIFOs, the downside of shared memory is the security risk: in principle, any process can try to attach to a block and access its contents. However, each shared memory block is identified by a key, and a process can attach to a block only if it has the correct key and the block's permissions allow it. When we create shared memory blocks, we set their permissions so that only the user who started the monitor can read from or write to them. The risk is therefore limited to a malicious program running as the same user or as the superuser. Both cases are possible only when the system is already compromised.
Comparing the performance of shared memory, FIFOs, and ptrace, I observed that shared memory is about 20 times faster than FIFOs and 900 times faster than ptrace when transferring a 128KB buffer.
