"Unix Domain Sockets? "
The often overlooked Unix domain socket facility is one of the most powerful features in any modern Unix. Most socket programming books for Unix discuss the topic merely in an academic sense, without ever explaining why it matters or what it is used for. Besides being the only way to use certain abilities of the operating system, it is an area that programmers new to Linux, BSD, and other Unices definitely need to be aware of. This is not a tutorial on sockets, but rather a review of the features and benefits of one area of sockets programming.
Background and Context
The closest thing to a Unix domain socket would be a pipe. Unix pipes are a cornerstone of the OS at large. Analogous to a water pipe with water flowing in one direction, a stream of bytes flows from the write side of a pipe to the read side. Each side of the pipe is referenced by its own open file descriptor. The different sides of the pipe can be in different processes or threads as long as they reside on the same local computer. Let's review the distinguishing characteristics of pipes in Unix.
- Writes of up to PIPE_BUF bytes are atomic (PIPE_BUF is 4096 on Linux; POSIX only guarantees at least 512)
- Pipes can be created and inherited across a fork() call, as well as shared between threads.
- Pipes can also be given a name in the file system. These FIFOs (or named pipes) exist beyond the lives of processes. Two different processes can obtain a reference to the pipe with open() as opposed to having to inherit a file descriptor.
- Pipes are generally considered to be faster than Unix domain sockets.
- Processes using pipes must still perform context switches with the kernel to use the read() and write() system calls.
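The one-way, inherited-across-fork() behavior in the list above can be sketched in a few lines of C. This is only an illustration: the function name pipe_roundtrip() and its signature are made up for this example, and error handling is kept to a minimum.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process writes msg into a pipe and the
   parent reads it back. Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];      /* fds[0] is the read side, fds[1] is the write side */
    pid_t child;
    ssize_t nbytes;

    if (out_size == 0 || pipe(fds) != 0)
        return -1;

    child = fork();  /* both descriptors are inherited by the child */
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* child: only writes, so it closes the read side */
        close(fds[0]);
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    /* parent: only reads, so it closes the write side */
    close(fds[1]);
    nbytes = read(fds[0], out, out_size - 1);
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (nbytes < 0)
        return -1;
    out[nbytes] = '\0';
    return nbytes;
}
```

Calling pipe_roundtrip("hello", buffer, sizeof(buffer)) should leave "hello" in buffer. Getting an answer back from the parent to the child would need a second pipe, which is exactly the limitation Unix domain sockets remove.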
As an exception to the rule that pipes must be written from one side and read from the other, Solaris pipes are full duplex. On Linux and BSD, for example, full duplex operation requires two different pipes. Named pipes and unnamed pipes are essentially the same thing on Unix. This is not the case with Windows, which provides two very different facilities for what it calls named and anonymous pipes. Anonymous pipes are available in all versions of Windows and behave much like Unix pipes. Besides being dramatically slower, they do have several variations, such as an adjustable pipe cache size that also affects the threshold for atomic writes. Windows named pipes are roughly analogous to Unix domain sockets. They are only available on the NT-derived Windows versions and do not use the Windows networking socket interface, Winsock, at all. They do have the advantage of reaching across multiple computers in an NT domain.
On with It
A Unix domain socket exists only inside a single computer. The word domain here has nothing to do with NIS, LDAP, or Windows, and instead refers to the file system. Unix domain sockets are identified by a file name in the file system, like a named pipe would be. Programs communicating over a Unix domain socket must be on the same computer, so they are not really a networking concept so much as an inter-process communication (IPC) concept. This explains why most networking books ignore them. They are accessed through the same sockets API that is used for TCP/IP, UDP/IP, and other supported network protocols. You should be asking at least two questions right now: "Why would a network program ever support Unix domain sockets as a transport?", and "Why would programs use a Unix domain socket as an IPC mechanism instead of pipes, signals, or shared memory?". Here are some quick answers.
- Unix domain sockets are secure in the network protocol sense of the word, because:
  - they cannot be eavesdropped on by an untrusted network
  - remote computers cannot connect to them without some sort of forwarding mechanism
- They do not require a properly configured network, or even network support at all
- They are full duplex
- Many clients can be connected to the same server using the same named socket
- Both connectionless (datagram) and connection oriented (stream) communication are supported
- Unix domain sockets are secure in the IPC sense of the word, because:
  - File permissions can be configured on the socket to limit access to certain users or groups
  - Because everything that is going on takes place on the same computer controlled by a single kernel, the kernel knows everything about the socket and the parties on both sides. This means server programs that need authentication can find out what user is connecting to them without having to obtain a user name and password.
- Open file descriptors from one process can be sent to another totally unrelated process
- Parties can know what PID is on the other side of a Unix domain socket
Not all of these features are available on every Unix. Worse, there are variations in the way they are interfaced. Basic operations are pretty universally supported though. Let's move on to some examples.
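Before the named client and server, the full duplex point from the list above can be seen with socketpair(), which returns two already connected Unix domain sockets with no name in the file system. This is a minimal sketch; the helper name duplex_demo() and the "ping"/"pong" messages are invented for the illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends one message in each direction over a single
   connected pair of Unix domain sockets. Returns 0 on success, -1 on error. */
int duplex_demo(char *at_b, char *at_a, size_t size)
{
    int sv[2];
    ssize_t nbytes;
    int result = -1;

    if (size == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* side a writes, side b reads */
    if (write(sv[0], "ping", 4) != 4)
        goto out;
    nbytes = read(sv[1], at_b, size - 1);
    if (nbytes < 0)
        goto out;
    at_b[nbytes] = '\0';

    /* side b answers over the very same pair of descriptors */
    if (write(sv[1], "pong", 4) != 4)
        goto out;
    nbytes = read(sv[0], at_a, size - 1);
    if (nbytes < 0)
        goto out;
    at_a[nbytes] = '\0';

    result = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

With pipes the same exchange would take two pipe() calls and four descriptors. Like an unnamed pipe, a socketpair() is normally shared by inheritance across fork().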
Basic Connection-Oriented Client & Server
Let's start with a very basic client and a forking server. A forking server spawns a new process to handle each incoming connection. After a connection is closed, its handler process exits. This type of server frequently gets a bad reputation due to its poor performance as a web server. It performs poorly as a web server because with HTTP, at least without persistent connections, every single request is made with its own connection. The server thus spends a disproportionate amount of time creating and destroying processes versus actually handling requests. What is not commonly understood is that for other types of protocols, which maintain a single connection during the entire time the client uses the server, a forking server is considered an acceptable design. Take OpenSSH for example. The primary problem with this design for non-web server applications is that it is no longer as straightforward to share information between all the various handler instances. Multiplexing, multi-threaded, and all sorts of other designs are out there, but the simple forking server is as good as it gets for illustrating examples. Think of it as the "hello world" of server designs. Take the following sources.
client1.c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd;
    ssize_t nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if(socket_fd < 0)
    {
        printf("socket() failed\n");
        return 1;
    }

    /* the server's address is a path in the file system */
    memset(&address, 0, sizeof(address));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if(connect(socket_fd, (struct sockaddr *) &address, sizeof(address)) != 0)
    {
        printf("connect() failed\n");
        close(socket_fd);
        return 1;
    }

    nbytes = snprintf(buffer, sizeof(buffer), "hello from a client");
    write(socket_fd, buffer, nbytes);

    /* leave room for the terminating null */
    nbytes = read(socket_fd, buffer, sizeof(buffer) - 1);
    if(nbytes < 0)
        nbytes = 0;
    buffer[nbytes] = 0;
    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}
server1.c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

int connection_handler(int connection_fd)
{
    ssize_t nbytes;
    char buffer[256];

    /* leave room for the terminating null */
    nbytes = read(connection_fd, buffer, sizeof(buffer) - 1);
    if(nbytes < 0)
        nbytes = 0;
    buffer[nbytes] = 0;
    printf("MESSAGE FROM CLIENT: %s\n", buffer);

    nbytes = snprintf(buffer, sizeof(buffer), "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if(socket_fd < 0)
    {
        printf("socket() failed\n");
        return 1;
    }

    /* remove a stale socket file left behind by an earlier run */
    unlink("./demo_socket");

    memset(&address, 0, sizeof(address));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if(bind(socket_fd, (struct sockaddr *) &address, sizeof(address)) != 0)
    {
        printf("bind() failed\n");
        return 1;
    }

    if(listen(socket_fd, 5) != 0)
    {
        printf("listen() failed\n");
        return 1;
    }

    /* accept() overwrites address_length, so it is reset before every call */
    address_length = sizeof(address);
    while((connection_fd = accept(socket_fd,
                                  (struct sockaddr *) &address,
                                  &address_length)) > -1)
    {
        child = fork();
        if(child == 0)
        {
            /* now inside newly created connection handling process */
            close(socket_fd);
            return connection_handler(connection_fd);
        }

        /* still inside server process */
        close(connection_fd);
        address_length = sizeof(address);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}
Armed with some basic knowledge of C, beginner level Unix system programming, beginner level sockets programming, how to look up man pages, and Google, the above example will help you create a UDS client and server. To try it out, open a couple of terminal windows, then run the server in one and the client in the other. After that, try adding something like sleep(15) to the server's connection handler, before it write()s back to the client. Bring up two more terminals, one with another instance of the client and the other with top or ps -e, and also netstat -ax (the -x flag selects Unix domain sockets in the Linux netstat; other systems use different flags). Experiment with that for a while. Learn anything?
At this point there are several things we could do with this technology, that is, the ability to have running programs communicate with other arbitrary programs on the same computer. Depending on where in the file system our socket is created and with what permissions, this could include programs running with different credentials, programs started at different times, or even programs in different login sessions (controlling ttys). A common example of a program that works like this is syslogd. On many Unix types, programs use a Unix domain socket to pass log messages to the syslog server.
There are other ways this could be accomplished without Unix domain sockets, but not only are they pretty hard to beat, they allow for even more abilities.
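As a rough sketch of the "with what permissions" point, a server can tighten its umask around bind() so that the socket file is created accessible to its owner only. The helper name listen_private() is hypothetical, and the behavior described is that of Linux, where connecting requires write permission on the socket file.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: creates a listening socket at path that only its
   owner may connect to. Returns the socket descriptor, or -1 on error. */
int listen_private(const char *path)
{
    struct sockaddr_un address;
    mode_t old_mask;
    int fd, rc;

    if (strlen(path) >= sizeof(address.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&address, 0, sizeof(address));
    address.sun_family = AF_UNIX;
    strcpy(address.sun_path, path);
    unlink(path);                 /* remove a stale socket file, if any */

    /* bind() creates the socket file; the umask strips group/other bits */
    old_mask = umask(077);
    rc = bind(fd, (struct sockaddr *) &address, sizeof(address));
    umask(old_mask);

    if (rc != 0 || listen(fd, 5) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Some systems ignore the permission bits on the socket file itself, so a more portable variation on the same idea is to place the socket inside a directory that only the intended users can search.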
An Authenticated Server
Let us imagine a database server like PostgreSQL. The server can force every client program that connects to it to authenticate itself with a user name and password. It does this so that it can enforce its internal security policies based on what account a client is connecting with. Having to authenticate with a user name / password pair every time can get old, so other authentication schemes such as key pair authentication are often used instead. In the case of local logins (the client is on the same machine as the server), a feature of Unix domain sockets known as credentials passing can be used.
This is one area that is going to be different everywhere, so check your reference material. Let's look at how it's done in Linux. OpenBSD does it differently; see its getpeereid() function.
The Real Sockets IO API
Those new and old to sockets programming are often unaware that the sockets API actually has its own IO routines: send(), sendto(), sendmsg(), recv(), recvfrom(), and recvmsg(). On Unix a socket is itself a file descriptor, so IO operations can be performed on it just as with any other file descriptor, that is, with read(), write(), and the like. This is why most of the time the socket IO functions don't need to be used directly. Certain features do require them, such as addressing each datagram on an unconnected UDP socket with sendto() and recvfrom(). The same send/recv socket IO functions are available on Windows through Winsock, whose API was modeled on BSD sockets. Also note that Windows, too, provides a way to use sockets as Windows file HANDLEs.
Linux uses a lower level socket function, the multi-purpose getsockopt(), to grab the credentials of the process on the other side of a Unix domain socket.
Credentials passing on Linux
/* struct ucred is a GNU extension: #define _GNU_SOURCE before including
   <sys/socket.h> */
struct ucred credentials;
socklen_t ucred_length = sizeof(struct ucred);

/* fill in the user data structure */
if(getsockopt(connection_fd, SOL_SOCKET, SO_PEERCRED, &credentials, &ucred_length) != 0)
{
    printf("could not obtain credentials from unix domain socket\n");
    return 1;
}

/* the process ID of the process on the other side of the socket */
credentials.pid;

/* the effective UID of the process on the other side of the socket */
credentials.uid;

/* the effective primary GID of the process on the other side of the socket */
credentials.gid;

/* To get supplemental groups, we will have to look them up in our account
   database, after a reverse lookup on the UID to get the account name.
   We can take this opportunity to check to see if this is a legit account.
*/
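The reverse lookup mentioned in that last comment could look something like the following. It is only a sketch: account_name_for_uid() is a name made up here, and it relies on the standard getpwuid() call to consult the account database.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: copies the account name that owns uid into name.
   Returns 0 if the account exists, -1 if there is no such account. */
int account_name_for_uid(uid_t uid, char *name, size_t size)
{
    /* getpwuid() returns NULL when the UID has no entry */
    struct passwd *entry = getpwuid(uid);

    if (entry == NULL || size == 0)
        return -1;

    snprintf(name, size, "%s", entry->pw_name);
    return 0;
}
```

A server would pass credentials.uid to it and refuse the connection on -1. With the account name in hand, supplemental groups can then be looked up, for example with getgrouplist() where it is available.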
File Descriptor Passing
File descriptors can be sent from one process to another by two means. One way is by inheritance, the other is by passing them through a Unix domain socket. There are three reasons I know of why one might do this. The first is that on platforms that don't have a credentials passing mechanism but do have a file descriptor passing mechanism, an authentication scheme based on file system privilege demonstration could be used instead. The second is if one process has file system privileges that the other does not. The third is scenarios where a server hands a connection's file descriptor to an already started helper process of some kind. Again, this area is different from OS to OS. On Linux this is done with a socket feature known as ancillary data.
It works by one side sending some data to the other (at least 1 byte) with attached ancillary data. Normally this feature is used for odd features of various underlying network protocols, such as per-packet IP header information. This is accomplished with the lower level socket function sendmsg(), which accepts both an array of IO vectors and control data message objects as members of its struct msghdr parameter. Ancillary data, also known as control data, takes the form of a struct cmsghdr. The members of this structure can mean different things based on what type of socket it is used with. Making it even more squirrelly is that most of these structures need to be accessed through macros. Here are two example functions based on the ones available in Warren Gay's book mentioned at the end of this article. A socket's peer that read data sent to it by send_fd() without using recv_fd() would just get a single capital F.
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int send_fd(int socket, int fd_to_send)
{
    struct msghdr message;
    struct iovec iov[1];
    struct cmsghdr *control_message = NULL;
    char buffer[CMSG_SPACE(sizeof(int))], data[1];

    memset(&message, 0, sizeof(struct msghdr));
    memset(buffer, 0, sizeof(buffer));

    /* at least one byte of ordinary data must accompany the descriptor */
    data[0] = 'F';
    iov[0].iov_base = data;
    iov[0].iov_len = 1;
    message.msg_iov = iov;
    message.msg_iovlen = 1;

    /* attach the control buffer before using the CMSG_ macros on it */
    message.msg_control = buffer;
    message.msg_controllen = sizeof(buffer);

    control_message = CMSG_FIRSTHDR(&message);
    control_message->cmsg_level = SOL_SOCKET;
    control_message->cmsg_type = SCM_RIGHTS;
    control_message->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(control_message), &fd_to_send, sizeof(int));

    return sendmsg(socket, &message, 0);
}

int recv_fd(int socket)
{
    int sent_fd;
    struct msghdr message;
    struct iovec iov[1];
    struct cmsghdr *control_message = NULL;
    char buffer[CMSG_SPACE(sizeof(int))], data[1];

    memset(&message, 0, sizeof(struct msghdr));
    memset(buffer, 0, sizeof(buffer));

    /* somewhere to put the one byte of ordinary data */
    iov[0].iov_base = data;
    iov[0].iov_len = sizeof(data);
    message.msg_iov = iov;
    message.msg_iovlen = 1;

    message.msg_control = buffer;
    message.msg_controllen = sizeof(buffer);

    if(recvmsg(socket, &message, 0) < 0)
        return -1;

    /* walk the control messages looking for the passed descriptor */
    for(control_message = CMSG_FIRSTHDR(&message);
        control_message != NULL;
        control_message = CMSG_NXTHDR(&message, control_message))
    {
        if( (control_message->cmsg_level == SOL_SOCKET) &&
            (control_message->cmsg_type == SCM_RIGHTS) )
        {
            memcpy(&sent_fd, CMSG_DATA(control_message), sizeof(int));
            return sent_fd;
        }
    }

    return -1;
}
Datagram Unix Domain Sockets
Most of the time programs that communicate over a network work with stream, or connection oriented, technology. This is when an additional software layer such as TCP creates a virtual communication circuit out of the many single atomic (stateless) packets used by an underlying packet switched network. Sometimes we want to instead simply work with individual packets, as is the case with UDP. This technology is often called datagram communication. This strategy allows for a variety of trade-offs. One is the ability to make a low overhead, high performance server with a single context or "main loop" that handles multiple simultaneous clients. Although Unix domain sockets are not a network protocol, they do use the sockets network interface, and as such also provide datagram features.
Datagram communication works best with an application that can put a complete atomic message of some sort in a single packet. This can be a problem for UDP, as various constraints can limit the safe size of a packet to as little as 512 bytes. The limit for datagrams over a Unix domain socket is much higher. A complete example is beyond the scope of this article. Those interested should find a UDP example (much easier to find) and combine that with the techniques above.
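Short of a complete client and server, the one property that matters most, preserved message boundaries, can be shown with a pair of connected datagram sockets. This is only a sketch under that narrow goal; the helper name datagram_boundaries() and the two test messages are invented for the illustration.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends two datagrams back to back and reports the
   size of each one as it is received. Returns 0 on success, -1 on error. */
int datagram_boundaries(ssize_t *first, ssize_t *second)
{
    int sv[2];
    char buffer[256];
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    /* two separate messages, queued before anything is read */
    if (send(sv[0], "one", 3, 0) == 3 && send(sv[0], "three", 5, 0) == 5) {
        /* each recv() returns exactly one message, never a blend of both */
        *first = recv(sv[1], buffer, sizeof(buffer), 0);
        *second = recv(sv[1], buffer, sizeof(buffer), 0);
        if (*first >= 0 && *second >= 0)
            result = 0;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On a SOCK_STREAM socket the first read could just as well return all eight bytes at once. A real datagram server would instead bind() a SOCK_DGRAM socket to a path and use recvfrom() and sendto() to talk to many clients from one loop.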
Abstract Names
Another Linux specific feature is abstract names for Unix domain sockets. Abstract named sockets are identical to regular UDS except that their name does not exist in the file system. This means two things: file permissions do not apply, and they can be accessed from inside chroot() jails. The trick is to make the first byte of the address name null. Look at the output from netstat -ax to see what it looks like while one of these abstract named sockets is in use. Example:
setting the first byte to null
address.sun_family = AF_UNIX;
address_length = sizeof(address.sun_family) +
sprintf(address.sun_path, "#demo_socket");
address.sun_path[0] = 0;
bind(socket_fd, (struct sockaddr *) &address, address_length);
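Because an abstract name begins with a null byte, string functions cannot be used to measure it, and the length handed to bind() or connect() decides where the name ends. The address set-up above can be wrapped in a small helper along these lines. The name abstract_address() is hypothetical, and offsetof() is used here in place of sizeof(address.sun_family) on the assumption that it is the more portable way to find where sun_path starts.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills in an abstract address for name and returns
   the length to pass to bind() or connect(), or 0 if name does not fit. */
socklen_t abstract_address(struct sockaddr_un *address, const char *name)
{
    size_t name_length = strlen(name);

    if (name_length == 0 || name_length > sizeof(address->sun_path) - 1)
        return 0;

    memset(address, 0, sizeof(*address));
    address->sun_family = AF_UNIX;

    /* sun_path[0] stays 0; the name itself starts at sun_path[1] */
    memcpy(address->sun_path + 1, name, name_length);

    /* family field + leading null byte + name, with no trailing null */
    return (socklen_t) (offsetof(struct sockaddr_un, sun_path)
                        + 1 + name_length);
}
```

Both the server's bind() and the client's connect() would use the returned length. Passing sizeof(struct sockaddr_un) instead would make the trailing null bytes part of the name, so the two sides have to agree on one convention.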
Conclusion
Even if you never need to program Unix domain sockets directly, they are an important facet of understanding both the Unix security model and the inner workings of the operating system. For those that do use them, they open up a world of possibilities.