Sunday, September 5, 2010

Multiplexing filehandles with select() in Perl


The problem

I/O requests such as read() and write() are blocking by default. Suppose a program reads a line from the terminal like this:

    $input = <STDIN>;

Execution blocks until a line of input is available, i.e. until the user types something followed by a newline. In many cases this is the desired behavior. Now suppose you have a program that accepts requests through a socket, does some processing for each request, and then moves on to the next one:

    use strict;
    use warnings;
    use IO::Socket::INET;

    # Create the receiving (listening) socket
    my $s = IO::Socket::INET->new(
        LocalHost => 'thekla',
        LocalPort => 7070,
        Proto     => 'tcp',
        Listen    => 16,
        ReuseAddr => 1,
    );
    die "Could not create socket: $@\n" unless $s;

    my ($ns, $buf);
    while ( $ns = $s->accept() ) {              # wait for and accept a connection
        while ( defined( $buf = <$ns> ) ) {     # read from the socket, line by line
            # do some processing
        }
        close($ns);                             # client is done; release the connection
    }
    close($s);

Although this is a perfectly valid way of handling incoming requests, it suffers from some serious problems, especially when requests arrive frequently and each one needs a lot of processing. Once a request has been accepted, all other requests are left hanging in the queue while we read the request message and process it. Reading from a socket is a blocking call, so if the client takes too long to transmit its request, we just sit there waiting when we could be doing useful work on other requests. Not only is this unacceptable in itself, but when demand is high the program may be unable to meet its operating requirements. Note also that a single client failing at a critical point (in the middle of an ongoing transmission) can make the server block indefinitely.

What can we do about it?

What we need in situations like the above is a way to handle I/O independently for each handle, with some sort of apparent parallelism. (We use sockets in this example, but the same rules apply to any kind of filehandle.) There are two very common approaches. One is to spawn a separate thread of control for each request. This can be done at process level, using fork() to create a new process for each request, or at thread level, using Perl's threading capabilities to create multiple threads within the same process. (Perl's support for threads was introduced in version 5.005.) The other approach, which is the one we will discuss here, is to use select() to multiplex between several filehandles within a single thread of control, thus creating the effect of parallelism in the handling of I/O.
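For comparison, the process-level approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name handle_in_child is made up for this example, a socketpair() stands in for a connection returned by accept(), and the handler serves a single one-line request before the child exits.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

# Hypothetical helper: fork a child that serves one connection.
# The parent gets the child's pid back immediately and can go on accepting.
sub handle_in_child {
    my ($conn, $handler) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {        # child: serve the request, then exit
        $handler->($conn);
        close($conn);
        exit 0;
    }
    close($conn);           # parent: the child holds its own copy
    return $pid;
}

# Stand-in for an accepted connection: a connected pair of sockets.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$client->autoflush(1);

my $pid = handle_in_child($server, sub {
    my ($conn) = @_;
    $conn->autoflush(1);
    my $line = <$conn>;                        # blocking here only stalls the child
    print {$conn} uc $line if defined $line;   # "processing": upper-case the request
});

print {$client} "ping\n";
my $reply = <$client>;
close($client);
waitpid($pid, 0);           # reap the child
```

A slow client now blocks only the child that serves it, at the cost of one process per request. The select() approach discussed next avoids that cost by staying in a single process.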

What does select() do?

The idea behind select() is to avoid blocking calls by making sure that a call will not block before attempting it. How do we do that? Suppose we have two filehandles, and we want to read data from them as it comes in. Let's call them A and B. Now, let's assume that A has no input pending yet, but B is ready to respond to a read() call. If we know this bit of information, we can try reading from B first, instead of A, knowing that our call will not block. select() gives us this bit of information. All we need to do is define sets of filehandles (one for reading, one for writing and one for errors) and call select() on them. It returns as soon as at least one handle is ready for the operation its set stands for, and tells us which handles those are. This gives us the advantage of always picking a filehandle that will not block, so the entire program is never delayed by one lazy filehandle just because it happened to be the first one we picked. Still, it does not guarantee that the selected filehandle is the best choice, because we still don't know how much data can be read from it, or how quickly it can take in data that we write to it. But it is definitely a big step forward from our initial program.
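The A and B scenario can be tried out with a small sketch. It assumes a Unix-like system, where select() also works on pipes, and uses two pipes as stand-ins for A and B. The helper ready_names and the labels are invented for this illustration. The readiness check itself is done with can_read() from the IO::Select wrapper module.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Hypothetical helper: given { label => filehandle }, return the labels of
# the handles that are readable within $timeout seconds.
sub ready_names {
    my ($handles, $timeout) = @_;
    my $set = IO::Select->new(values %$handles);
    my %is_ready;
    $is_ready{ fileno($_) } = 1 for $set->can_read($timeout);
    my @names = grep { $is_ready{ fileno($handles->{$_}) } } sort keys %$handles;
    return @names;
}

# Two pipes play the roles of A and B.
pipe(my $a_read, my $a_write) or die "pipe: $!";
pipe(my $b_read, my $b_write) or die "pipe: $!";
$b_write->autoflush(1);

print {$b_write} "hello from B\n";      # only B has input pending

my @ready = ready_names({ A => $a_read, B => $b_read }, 1);
print "ready: @ready\n";                # prints "ready: B"

my $line = <$b_read>;                   # will not block: B was reported readable
```

Reading from A at this point would block, since nothing has been written to it. select() lets us find that out without trying.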

Using select()

We will now rewrite the example program from the beginning of this article using select(). Instead of calling Perl's built-in select() directly, we will use a wrapper module, IO::Select, that makes life easier for us.

    # ... create socket $s as before ...
    use IO::Select;

    my $read_set = IO::Select->new();   # create handle set for reading
    $read_set->add($s);                 # add the main socket to the set

    while (1) {                         # forever
        # get a set of readable handles; no timeout is given, so this
        # blocks until at least one handle is ready
        my ($rh_set) = IO::Select->select($read_set, undef, undef);
        next unless $rh_set;            # select() was interrupted; try again
        # take all readable handles in turn
        foreach my $rh (@$rh_set) {
            # if it is the main socket then we have an incoming connection and
            # we should accept() it and then add the new socket to the $read_set
            if ($rh == $s) {
                my $ns = $rh->accept();
                $read_set->add($ns) if $ns;
            }
            # otherwise it is an ordinary socket and we should read and process the request
            else {
                my $buf = <$rh>;
                if (defined $buf) {     # we got normal input
                    # ... process $buf ...
                }
                else {                  # the client has closed the socket
                    # remove the socket from the $read_set and close it
                    $read_set->remove($rh);
                    close($rh);
                }
            }
        }
    }

We create an IO::Select object, $read_set, which is our set of handles to test for readability, and add all open handles to it. We start by adding the main socket, and each time a new connection is made, returning a new socket for it, we add that socket to the set. Then we go into a loop where we ask select to give us a list of readable handles and examine each one in turn. If it is the main socket, we call accept() to receive the incoming connection and add the new socket to the read set. Otherwise it must be an ordinary socket, in which case we read from it and process its input. If the read returns nothing, the socket has been closed on the client side, so we remove it from the read set and close it, too.

So we work our way continuously through the incoming requests, attempting I/O on a filehandle only when select() tells us it is ready. As mentioned earlier, this method does not guarantee progress, as it only tests whether a handle is ready to respond to I/O. The question remains whether the handle we pick from the ready ones is the one that will respond fastest, and how much data is available for reading or how much data it is ready to receive. So it is still possible to block for a while after we have picked the handle. In particular, <$rh> waits for a complete line, so a client that sends part of a line and then stalls can still hold up the loop.

Also, we did not take into account the impact on performance of the actual processing of requests. We might just be printing incoming data to a file, but then again, each request might need heavy processing that would slow down the entire handle-processing loop. These are issues that must be considered in the context of the individual application.
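One possible way to reduce the risk of blocking on a half-transmitted line is to replace the line read with a single sysread() and to collect partial input in a per-handle buffer. The sketch below is not part of the program above. The names read_ready_lines and %pending are invented for this example, and a pipe stands in for a client socket (which assumes a Unix-like system).

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

my %pending;    # partial input per handle, keyed by file descriptor

# Call only after select() reported $fh readable: a single sysread() then
# returns whatever is available instead of waiting for a full line.
# Returns (1, @complete_lines) while the handle is open, (0) on EOF or error.
sub read_ready_lines {
    my ($fh) = @_;
    my $key = fileno($fh);
    my $n = sysread($fh, my $chunk, 4096);
    if (!$n) {                          # 0 means EOF, undef means error
        delete $pending{$key};
        return (0);
    }
    $pending{$key} .= $chunk;
    my @lines;
    while ($pending{$key} =~ s/\A([^\n]*\n)//) {    # peel off complete lines
        push @lines, $1;
    }
    return (1, @lines);
}

# Demonstration with a pipe standing in for a client socket.
pipe(my $r, my $w) or die "pipe: $!";
$w->autoflush(1);
my $sel = IO::Select->new($r);

print {$w} "first\nsec";                # one full line plus a fragment
my ($fh) = $sel->can_read(1);
my ($open, @lines) = read_ready_lines($fh);     # @lines is ("first\n")

print {$w} "ond\n";                     # the rest of the second line arrives
($fh) = $sel->can_read(1);
my @more;
($open, @more) = read_ready_lines($fh);         # @more is ("second\n")
```

With this scheme the loop never waits for a newline: an incomplete line simply stays in the buffer until select() reports more data. Note that sysread() bypasses Perl's buffered I/O, so it should not be mixed with <$fh> on the same handle.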
