Plash



Architecture overview

Plash limits the ability of a process to open files by running it in a chroot environment, under dynamically-allocated user IDs. The chroot environment contains only one file, an executable that is exec'd to start the program running in the process.

Rather than using the open() syscall to open files, the client process sends messages to a server process. One of the file descriptors that the client is started with is a socket which is connected to the server. The environment variable PLASH_COMM_FD gives the file descriptor number. The server can send the client open file descriptors across the socket in response to `open' requests (see cmsg(3)).
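The descriptor-passing step can be illustrated with the standard SCM_RIGHTS mechanism described in cmsg(3). This is only a minimal sketch, not Plash's actual wire protocol: the function names send_fd() and recv_fd() and the single-byte payload are assumptions made for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open file descriptor over a Unix domain socket.
   Returns 0 on success, -1 on error. (Hypothetical helper.) */
int send_fd(int sock, int fd_to_send)
{
    char payload = 'F';   /* one byte of ordinary data travels with the FD */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };

    /* The union keeps the control buffer suitably aligned. */
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor sent by send_fd().
   Returns the new descriptor, or -1 on error. (Hypothetical helper.) */
int recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };

    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In a setup like the one described, the server side would call something like send_fd() with the file it opened on the client's behalf, and the client's replacement open() would return the result of recv_fd(). The received descriptor has its own number in the receiving process but refers to the same open file.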

The server can handle multiple connections. If the client wishes to fork() off another process, it first asks the server to send it another socket for a duplicate connection.

GNU libc is re-linked so that open() etc. send requests to the server rather than using the usual Unix system calls. The dynamic linker (/lib/ld.so or, equivalently, /lib/ld-linux.so.2) is similarly re-linked. execve() is changed so that it always invokes the dynamic linker directly, since the chroot environment does not contain the main executable and the kernel does not provide an fexecve() system call. The dynamic linker is passed the executable via a file descriptor.

The file server uses its own filesystem object abstraction internally. Filesystem objects may be files, directories or symbolic links on the underlying filesystem provided by the Unix kernel. They may also be implemented entirely in the server. The server has its own functions for resolving pathnames and following symbolic links which do not use the kernel's facility for following symbolic links.

The shell starts up a new server process for each command the user enters. The shell and the file server are linked into the same executable and the shell uses the same filesystem object abstraction. The shell simply uses fork() to start a new server.

User IDs are allocated by the setuid program run-as-anonymous. It picks IDs in the range 0x100000 to 0x200000 (configurable by changing config.sh), and opens lock files in the lock directory /usr/lib/plash-chroot-jail/plash-uid-locks so that the same UID is not allocated twice. The lock directory goes inside the chroot jail so that the sandboxed processes can also spawn processes with reduced authority (though this is not done yet). Therefore `chroot-jail' needs to go on a writable filesystem, so you may need to move it.
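One plausible way to allocate a UID with lock files is to create a file named after each candidate UID using O_CREAT | O_EXCL, which fails if the file already exists. The sketch below shows that idea only; it is not taken from run-as-anonymous. The function name alloc_uid() and the `uid-<hex>' file naming are assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try each UID in [uid_min, uid_max) and claim the first one whose lock
   file does not exist yet. Returns the UID, or -1 if none is free or an
   unexpected error occurs. (Hypothetical helper; the "uid-%lx" naming
   scheme is an assumption.) */
long alloc_uid(const char *lock_dir, long uid_min, long uid_max)
{
    char path[4096];

    for (long uid = uid_min; uid < uid_max; uid++) {
        int n = snprintf(path, sizeof(path), "%s/uid-%lx", lock_dir, uid);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;

        /* O_EXCL makes creation atomic: only one caller can succeed. */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            close(fd);
            return uid;
        }
        if (errno != EEXIST)
            return -1;   /* e.g. lock directory missing or not writable */
    }
    return -1;
}
```

Because the lock file persists after the allocating process exits, a scheme like this needs a separate clean-up pass to release UIDs that are no longer in use.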

The setuid program gc-uid-locks will garbage collect and remove UID lock files for UIDs that are no longer in use. It works by scanning the `/proc' filesystem to list currently-running processes and their UIDs. When the shell starts, it runs gc-uid-locks.

glibc library calls and whether they are altered by Plash

Intercepted and reimplemented entirely:
  • open, mkdir, symlink, unlink, rmdir, stat, lstat, readlink, rename, link, chmod, utimes, chdir, fchdir, getcwd, opendir/readdir/closedir, getuid/getgid

Intercepted but reimplemented using the original system call:
  • fork -- duplicates the connection to the server first
  • execve -- invokes execve syscall on dynamic linker directly
  • connect, bind, getsockname -- changed for Unix domain sockets
  • fstat -- changed for directory FDs
  • close, dup2 -- changed to stop processes overwriting or closing the socket FD that is used to communicate with the server

Not intercepted:
  • read, write, sendmsg, recvmsg, select, dup, kill, wait, getpid (and others)

Symbolic links

Semantics

If we pass a directory as an argument to a program, it may contain symbolic links to anywhere. Since processes may now have different namespaces, we have a choice of namespaces in which to resolve the destinations of the symbolic links. Do we resolve them in the user's namespace, or the process's namespace?

If we resolve symlinks in the user's namespace, and we allow the process to create symlinks to arbitrary destinations, it could create a symlink to `/' and thereby grant itself access to all of the user's filesystem. Instead, we could try to restrict the ability of a process to create symlinks, so that it can only create symlinks to files and directories that it already has access to. But since symlinks are interpreted relative to their position in the filesystem, which can change, it would be difficult to make this robust. Furthermore, the problem of pre-existing symlinks remains. A user should be able to tell what files and directories they're granting access to based on the command invocation. Granting access also to files and directories that are symlinked to, perhaps from deep inside a directory, violates this, because there is little constraint on the destinations of symlinks.

Resolving symlinks in the process's namespace makes more sense. It follows the normal semantics of symlinks under Unix, which is that symlinks are simply a convenience that *could* be implemented by the process itself rather than by the kernel.

Ultimately, the solution is to do away with symbolic links and replace them with object references.

Implementation

If we are to implement these semantics, we must be careful not to use the kernel's ability to follow symlinks. There is not a straightforward option for turning off following symlinks in the underlying filesystem. When we give a pathname such as `a/b/c' to the kernel, if `a/b' is a symbolic link the kernel will always follow it, interpreting it in its namespace.

The approach used in the file server is to set the current working directory to each component of the pathname in turn. For each component, do:

  • lstat() on the leaf name. If it's a symlink, do readlink() and interpret the link.

  • Otherwise, if it's a directory, do open(leaf, O_NOFOLLOW | O_DIRECTORY). If O_NOFOLLOW or O_DIRECTORY are not supported, we can do fstat() to check that the object opened is the same as the one we lstat()'d (it may have changed between the system calls).

  • Do fchdir() to set the current directory to the directory.

Obviously this requires more system calls than allowing the kernel to resolve symlinks.
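The per-component steps listed above can be sketched roughly as follows. This is an illustration under assumptions, not the file server's actual code: the function name step_into() and its return codes are hypothetical, and the fstat() comparison is done unconditionally here rather than only when O_NOFOLLOW or O_DIRECTORY are unavailable.

```c
#define _XOPEN_SOURCE 700   /* for lstat, readlink, fchdir, O_NOFOLLOW */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum { STEP_ERROR = -1, STEP_ENTERED = 0, STEP_SYMLINK = 1 };

/* Process one pathname component relative to the current directory
   without letting the kernel follow a symlink.
   - If `leaf` is a symlink, copy its target into link_buf and return
     STEP_SYMLINK; the caller interprets the link itself.
   - If `leaf` is a directory, fchdir() into it and return STEP_ENTERED.
   - Otherwise return STEP_ERROR. */
int step_into(const char *leaf, char *link_buf, size_t link_buf_size)
{
    struct stat before;
    if (link_buf_size == 0 || lstat(leaf, &before) < 0)
        return STEP_ERROR;

    if (S_ISLNK(before.st_mode)) {
        ssize_t n = readlink(leaf, link_buf, link_buf_size - 1);
        if (n < 0)
            return STEP_ERROR;
        link_buf[n] = '\0';          /* readlink() does not terminate */
        return STEP_SYMLINK;
    }
    if (!S_ISDIR(before.st_mode))
        return STEP_ERROR;

    int fd = open(leaf, O_RDONLY | O_NOFOLLOW | O_DIRECTORY);
    if (fd < 0)
        return STEP_ERROR;

    /* The object may have been replaced between lstat() and open();
       check that we opened the same one we examined. */
    struct stat after;
    if (fstat(fd, &after) < 0 ||
        after.st_dev != before.st_dev || after.st_ino != before.st_ino) {
        close(fd);
        return STEP_ERROR;
    }

    int rc = fchdir(fd);
    close(fd);
    return rc < 0 ? STEP_ERROR : STEP_ENTERED;
}
```

A resolver built on a helper like this would call it once per component and, on STEP_SYMLINK, splice the link target into the remaining pathname and continue in its own namespace.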

Note that the server must never send the clients FDs for directories. A client could use a directory FD to break out of its chroot jail.

Remaining problems

The Unix kernel can be regarded as providing a set of capability registers (file descriptors) that can contain directory object references, along with a special capability register (the current working directory) relative to which pathnames are resolved. References can be copied from a normal register to the special register using fchdir(). References can be copied from the special register to the normal registers using open('.').

Unfortunately, this model falls down in two places:

  • Directories with `execute' but not `read' permission cannot be opened with open(). One can chdir() into them, but not fchdir() into them.

    Arguably, Unix should let you open() such directories but not read their contents using the resulting FD.

    This could be worked around, but no workaround is implemented yet.

  • link() is unusual in that it takes two pathname arguments. It is difficult to use safely (without the kernel following symlinks). We have no guarantee that the source file (or destination) is the one we intended to link. Any check will be vulnerable to race conditions.

    The same applies to rename().

    Under Plash, link() and rename() are only implemented for the same-directory case.

Parent directories: the semantics of dot-dot using dir_stacks

A directory may have different parent directories in different namespaces. Furthermore, a directory may appear multiple times in the same namespace, and so have multiple parents in that namespace. `..' does not fit well into a system based on object references. However, it is widely used by Unix programs, so we have to support it.

Rather than using the `..' parent directory facility provided by the underlying filesystem, the file server interprets `..' itself.

The semantics is that the parent of a directory is the directory that it was reached through, after symlinks have been expanded.

This means that the filename resolver maintains a stack of directory object references, called a dir_stack. When resolving the pathname `/a/b/..', it will first push the root directory onto the stack, then directory objects for `/a' and `/a/b', and then it will pop `/a/b' off the stack, leaving `/a' at the top of the stack as the result.

If `/a/b' is a symlink to `g/h', however, the filename resolver does not push `/a/b' onto the stack (since `/a/b' is not a directory object). It pushes `/a/g' and then `/a/g/h' onto the stack. Then, when it interprets `..' in the pathname, it pops `/a/g/h' off the stack to leave `/a/g' (the result) at the top.
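The two examples above can be reproduced with a toy model of a dir_stack. This sketch simplifies heavily: the stack holds component names rather than directory object references, and a hard-coded table (an assumption, standing in for lstat() and readlink()) says that `/a/b' is a symlink to `g/h'. The names resolve(), stack_to_path() and lookup_symlink() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32
#define MAX_NAME  64
#define PATH_BUF  4096

/* Toy dir_stack: names[0..depth-1] are the directories below the root. */
struct dir_stack {
    char names[MAX_DEPTH][MAX_NAME];
    int depth;                      /* 0 means only the root is on the stack */
};

/* Hypothetical symlink table standing in for the real filesystem. */
struct symlink_entry { const char *path; const char *target; };
const struct symlink_entry symlinks[] = { { "/a/b", "g/h" } };

/* Render the stack as an absolute pathname. */
void stack_to_path(const struct dir_stack *st, char *out, size_t size)
{
    size_t used = 0;
    if (st->depth == 0) { snprintf(out, size, "/"); return; }
    out[0] = '\0';
    for (int i = 0; i < st->depth; i++) {
        int n = snprintf(out + used, size - used, "/%s", st->names[i]);
        if (n < 0 || (size_t)n >= size - used) return;   /* truncated */
        used += (size_t)n;
    }
}

/* Return the link target if `name` inside the top directory is a symlink. */
const char *lookup_symlink(const struct dir_stack *st, const char *name)
{
    char dir[PATH_BUF], path[PATH_BUF];
    stack_to_path(st, dir, sizeof dir);
    snprintf(path, sizeof path, "%s/%s", st->depth ? dir : "", name);
    for (size_t i = 0; i < sizeof symlinks / sizeof symlinks[0]; i++)
        if (strcmp(symlinks[i].path, path) == 0) return symlinks[i].target;
    return NULL;
}

/* Resolve `pathname` starting from `st`, interpreting `..' by popping
   the stack and expanding symlinks in place. Returns 0 or -1. */
int resolve(struct dir_stack *st, const char *pathname, int links_left)
{
    const char *p = pathname;
    if (p[0] == '/') st->depth = 0;          /* absolute: restart at root */

    while (*p) {
        while (*p == '/') p++;
        size_t len = strcspn(p, "/");
        if (len == 0) break;
        if (len >= MAX_NAME) return -1;

        char name[MAX_NAME];
        memcpy(name, p, len);
        name[len] = '\0';
        p += len;

        if (strcmp(name, ".") == 0) continue;
        if (strcmp(name, "..") == 0) {       /* parent = previous stack entry */
            if (st->depth > 0) st->depth--;
            continue;
        }
        const char *target = lookup_symlink(st, name);
        if (target != NULL) {                /* the link itself is never pushed */
            if (links_left <= 0 || resolve(st, target, links_left - 1) < 0)
                return -1;
            continue;
        }
        if (st->depth >= MAX_DEPTH) return -1;
        strcpy(st->names[st->depth++], name);
    }
    return 0;
}
```

With this table, resolving `/a/b/..' from an empty stack leaves the stack at `/a/g', matching the behaviour described above, while a path with no symlinks such as `/x/y/..' yields `/x'.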

The server represents the current working directory as one of these directory stacks. One of the consequences of these semantics is that if the current working directory is renamed or moved, the result of getcwd() will not reflect this.

This approach means that doing chdir("leafname") followed by chdir("..") has no effect (provided that the first call succeeds). This contrasts with the usual Unix semantics, where the 'leafname' directory could be moved between the two calls, giving it a different parent directory. This is partly why programs like 'rm' use fchdir() -- to avoid this problem.

Directory file descriptors

Plash supports open() on directories. It supports the use of fchdir() and close() on the resulting directory file descriptor. However, it doesn't support dup() on directory FDs, and execve() won't preserve them.

Directory file descriptors require special handling. Under Plash, when open() is called on a file, it will return a real, kernel-level file descriptor for a file. The file server passes the client this file descriptor across a socket. But it's not safe to do this with kernel-level directory file descriptors, because if the client obtained one of these it could use it to break out of its chroot jail (using the kernel-level fchdir system call).

A complete solution would be to virtualize file descriptors fully, so that every libc call involving file descriptors is intercepted and replaced. This would be a lot of work, because there are quite a few FD-related calls. It raises some tricky questions, such as what bits of code use real kernel FDs and which use virtualised FDs. It might impact performance. And it's potentially dangerous: if the changes to libc failed to replace one FD-related call, it could lead to the wrong file descriptors being used in some operation, because in this case a virtual FD number would be treated as a real, kernel FD number. (There is no similar danger with virtualising the system calls that use the file namespace, because the use of chroot() means that the process's kernel file namespace is almost entirely empty.)

However, a complete solution is complete overkill. There are probably no programs that pass a directory file descriptor to select(), and no programs that expect to keep a directory file descriptor across a call to execve() or in the child process after fork().

So I have adopted a partial solution to virtualising file descriptors. When open() needs to return a virtualized file descriptor -- in this case, for a directory -- the server returns two parts to the client: it returns the real, kernel-level file descriptor that it gets from opening /dev/null (a 'dummy' file descriptor), and it returns a reference to a dir_stack object (representing the directory).

Plash's libc open() function returns the kernel-level /dev/null file descriptor to the client program, but it stores the dir_stack object in a table maintained by libc. Plash's fchdir() function in libc consults this table; it can only work if there is an entry for the given file descriptor number in the table.

Creating a 'dummy' kernel-level file descriptor ensures that the file descriptor number stays allocated from the kernel's point of view. It provides a FD that can be used in any context where an FD can be used, without -- as far as I know -- any harmful effects. The client program will get a more appropriate error than EBADF if it passes the file descriptor to functions which aren't useful for directory file descriptors, such as select() or write().
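The table lookup described above might look something like the following. This is a simplified sketch, not Plash's libc code: here the client opens /dev/null itself, whereas in Plash the server opens it and passes the descriptor across the socket. The struct dir_ref type, the function names, the fixed table size and the choice of ENOTDIR as the error are all assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define FD_TABLE_SIZE 1024

/* Hypothetical stand-in for a reference to a dir_stack object. */
struct dir_ref { const char *description; };

/* Table indexed by kernel FD number; NULL means "not a directory FD". */
static struct dir_ref *fd_table[FD_TABLE_SIZE];

/* Allocate a dummy kernel FD and associate it with a directory reference.
   Returns the FD number, or -1 on error. */
int register_dir_fd(struct dir_ref *dir)
{
    int fd = open("/dev/null", O_RDONLY);   /* keeps the FD number reserved */
    if (fd < 0)
        return -1;
    if (fd >= FD_TABLE_SIZE) {
        close(fd);
        errno = EMFILE;
        return -1;
    }
    fd_table[fd] = dir;
    return fd;
}

/* What an fchdir() replacement would consult: succeeds only if the FD
   number has an entry in the table. */
struct dir_ref *lookup_dir_fd(int fd)
{
    if (fd < 0 || fd >= FD_TABLE_SIZE || fd_table[fd] == NULL) {
        errno = ENOTDIR;                     /* assumed error code */
        return NULL;
    }
    return fd_table[fd];
}

/* close() replacement: drop the table entry, then release the dummy FD. */
int close_dir_fd(int fd)
{
    if (fd >= 0 && fd < FD_TABLE_SIZE)
        fd_table[fd] = NULL;
    return close(fd);
}
```

Clearing the entry on close matters because the kernel may hand the same FD number out again for an ordinary file, which must not be mistaken for a directory.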

Why not do interception of system calls using, for example, ptrace?

Another way to do what Plash does is to intercept system calls.

One way to do this is to use the ptrace mechanism, which is available in standard versions of Linux. Using ptrace, all the syscalls a process makes can be handled by another process. The problems with ptrace are security and performance. Firstly, fork() cannot be handled securely with ptrace. Secondly, redirecting system calls with ptrace is slow, and it can't be done selectively: ptrace doesn't let you redirect some syscalls (such as 'open') while letting others through (such as 'read'). (See David Wagner's Master's thesis, 'Janus: an approach for confinement of untrusted applications'.)

systrace provides a mechanism that is similar to ptrace. It provides better performance, because it allows system calls to be intercepted selectively. It allows race-free handling of fork(). However, it is not part of standard releases of Linux. Using it requires recompiling your kernel and rebooting. Plash is intended to be immediately usable without recompiling your kernel. That said, it would be useful to add systrace support to Plash in addition to its current approach.

Ostia provides a different mechanism for intercepting system calls. Rather than redirecting a system call to a second process, it will bounce a system call back to the process that issued it. Then, much like in Plash, the process makes the request via a socket. This approach is simpler than systrace. Unlike Plash, it doesn't require modifying libc. A separate library handles the syscalls that get bounced back. Ostia is implemented by a Linux kernel module. Unfortunately, the code is not publicly available. (See 'Ostia: A Delegating Architecture for Secure System Call Interposition' by Tal Garfinkel, Ben Pfaff and Mendel Rosenblum, 2004.)

Plash could benefit by using syscall interception. Using chroot and UIDs, Plash is able to control a process's ability to access the filesystem and interfere with other processes. However, Plash does not prevent a process from connecting to or listening on network sockets. This could be done if there was a way for Plash to prevent a process from doing connect() and bind() system calls.

How does Plash compare with chroot jails?

Plash provides functionality similar to chroot(). The Linux kernel's chroot() system call can be used to run a program in a different file namespace (i.e. root directory). chroot jailing is a well-known technique, though not used very frequently due to its limitations.

The facilities for creating new namespaces for use with chroot are limited. You can only put individual files into the chroot environment by copying or hard linking them. It's not possible to grant read-write access to individual directory entries. Though you can't hard link directories, you can put directories into a chroot environment using 'mount --bind', but this can't be used to grant only read-only access to a directory.

chroot environments are heavyweight. It is not practical to create one for every invocation of a program. To do so, you would have to delete the copied files and directories, and remove any mount point entries, when the process you started had finished. If a program starts child processes, it's hard to tell when this is. As a result, chroot environments are usually static.

Furthermore, the chroot() call is only available to the root user. (This is a consequence of the way chroot() interacts with setuid executables.)

Plash implements its security using a chroot environment, but this is largely just an implementation detail. Plash uses chroot() to take authority away from a process, but it uses file descriptor passing to give limited authority back to the process.

Plash moves the interpretation of filenames so that it is done in user space. It allows directories to be implemented in user space. This allows the creation of file namespaces to be more flexible. Files, directories and directory entries (slots) can be mapped anywhere in a directory tree. Since the directory tree for a file namespace is stored in a server process, tidying up is simple: the server process exits when no clients are connected to it.