coleman - web server and C servlet engine
coleman config-file
Coleman is a couple different things. It’s a simple, fast, and small web server. It’s also a servlet engine in C instead of Java.
The servlet architecture is a great idea, a nice simple plug-in interface for web servers, but it has some problems. First, it’s tied to Java. If you don’t like that language then you are stuck. Second, the performance of servlet-based web servers is inherently mediocre because they handle concurrency with threads.
Coleman fixes both of these problems: the first by using plain old C as the API language, the second with a new hybrid concurrency model.
A web server’s concurrency model is how it handles multiple simultaneous requests. It’s fundamental to the server’s performance. The usual alternatives are: processes, threads, and non-blocking I/O. Forking a process for each connection is high-overhead and slow. Threads are cheaper and faster but still not great. Non-blocking I/O has the lowest overhead and is fastest, but it’s complicated to program.
Coleman uses a hybrid model: threads for simplicity and reasonable performance when starting requests, plus an interface to hand off static data back to the main loop where it is handled via non-blocking I/O for top performance. This avoids all the complicated parts of doing non-blocking I/O and just leaves the high speed and low cost.
There’s only one command-line option: the config file. It’s required, but it can be fairly small. Some example config files are included in the source tree.
The config file is in JSON, not because there’s any JavaScript involved in the server but just because it’s a nice simple data format. Here’s a sample file:
{
    "server": {
        "listeners": [
            { "servlets": [ { "name": "file_servlet" } ] }
        ]
    }
}
There’s a server object, which contains a listeners
array with one listener, and the listener contains a
servlets array with one servlet. In more elaborate setups,
servers can include more than one listener and listeners can
include more than one servlet.
Unlike standard JSON, coleman config files may contain comments in the usual JavaScript double-slash form.
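As a small illustration, here is the same minimal config with comments added. Keeping each comment on its own line is just a cautious choice for this sketch; strict JSON tools will reject the file either way.

```json
{
    // Comments in double-slash form are accepted by coleman.
    "server": {
        "listeners": [
            // One listener containing one servlet.
            { "servlets": [ { "name": "file_servlet" } ] }
        ]
    }
}
```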
Note: if you are currently a thttpd or mini_httpd user, the coleman distribution includes a little program called conf_conv that converts thttpd/mini_httpd config files into coleman’s JSON format, to the extent possible.
Servers, listeners, and servlets can all include optional parameters. The example above shows one of them: the servlet has a name parameter. Here’s a list of all the parameters you can use.
Server parameters:
dir
An optional directory to switch to. A typical value might be "/usr/local/www/chroot/". No default.
chroot
Whether to use the chroot(2) system call. If this is true then the server will use chroot to isolate itself in one directory. This is an excellent security measure. Starting as root without specifying chroot is allowed but you get a warning message that it’s a bad idea. Default: false.
logfile
A filename for request log entries, in the usual CERN combined log format. The default is to log to stderr. If you are going to use SIGHUP to close and re-open the log file, there are some subtle interactions with the dir and chroot parameters to be aware of. Note that this log file is only for request log entries. Errors, startup/shutdown notices, and stats all go to syslog. Look for them in /var/log/messages or your system’s equivalent.
servlet_path
A list of directories to search for loadable servlet modules. The default is "modules:." - search the subdirectory called "modules", then search the current directory. For a production system a more appropriate value might be something like "/usr/local/www/chroot/modules".
virtual_host
Whether to do virtual hosting. If this is true then each request’s Host header is used to specify a subdirectory below the main data directory. Default: false.
charset
The character set specifier to insert into text MIME types. Default is UTF-8.
threads
How many threads to run. The default is 10. If there are more simultaneous requests than threads to run them, the extra requests get queued.
background
Whether to run as a background process. Default: false.
pidfile
Filename to write the server’s process-ID to.
subdir
An optional subdirectory to switch to after doing the chroot. A typical value might be "data". No default.
user
If the server is started as root, it will attempt to switch to another user to minimize security exposure. The default is "nobody".
nlr_url_pattern
Disallow non-local referrers for these URLs. Default: non-local referrers are allowed.
nlr_no_empty_referrers
Disallow empty referrers too. Default: false, empty referrers are allowed.
errors_dir
Directory to search for custom error pages. The error files should be named "errNNN.html", where NNN is the error number. If this parameter is not set or if no such file exists, then a default error message is generated.
info
Your server’s name. Defaults to "coleman" plus the version.
url
A URL for the server. It shows up at the end of server-generated pages, such as directory listings and error pages. The default is acme.com’s coleman page.
Listener parameters:
protocol
Either http or https. Default: http.
port
The port number to listen on. Defaults to 80 for http and 443 for https.
local_address
Can be used to listen on a specific local address. Default: listen on all local addresses.
cert
The SSL certificate filename. You can have a different one for each listener.
info
The listener’s name. Defaults to some automatically generated info about the listener.
Servlet parameters:
name
The name of the servlet. Required.
pattern
If there’s more than one servlet in a listener, this pattern gets matched against the URL to determine which servlet to run.
A few parameters use wildcard patterns. These are similar to shell wildcards - ? matches any single character, * matches any sequence of characters except / - plus two additional features: ** matches any sequence including /, and | separates multiple patterns.
The first thing done in the initialization process is changing the current directory to the "dir" parameter, if specified. The second thing done is the chroot() call, if specified. Then the rest of the initialization. The second to last thing to happen is the second directory change to the "subdir" parameter, if specified. And the last thing to happen is switching UIDs to "user".
What this means is any filenames used in initialization - such as the log file, the servlet load path, an SSL certificate - are interpreted relative to "dir", and must be within the chroot tree. It also means sensitive files, such as SSL certificates, can be owned by root and protected against reading.
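To make the ordering concrete, here is a sketch of a config for a chroot setup. The paths and the user name are illustrative assumptions, not values shipped with coleman, and writing chroot as a JSON boolean is inferred from the "Default: false" wording in the parameter list.

```json
{
    "server": {
        // First step: change to this directory. Hypothetical path.
        "dir": "/usr/local/www/chroot",
        // Second step: chroot into the current directory.
        "chroot": true,
        // Used during initialization, so it is looked up relative
        // to "dir" and has to live inside the chroot tree.
        "servlet_path": "modules",
        // Second-to-last step: change into the data subdirectory.
        "subdir": "data",
        // Last step: give up root. Hypothetical account name.
        "user": "www",
        "listeners": [
            { "servlets": [ { "name": "file_servlet" } ] }
        ]
    }
}
```

With a layout like this, the servlet modules would sit in /usr/local/www/chroot/modules and the served files in /usr/local/www/chroot/data.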
Coleman can optionally be compiled with SSL support. If SSL is available, you use it by creating a listener with protocol https instead of http. You will also need to give the cert option, specifying your PEM-format certificate file.
Here’s an example server config file for SSL:
{
    "server": {
        "listeners": [
            {
                "protocol": "http",
                "servlets": [ { "name": "file_servlet" } ]
            }, {
                "protocol": "https",
                "cert": "example_com.pem",
                "servlets": [ { "name": "file_servlet" } ]
            }
        ]
    }
}
If you like, you can have multiple https listeners running in the same server, each with a different local address and certificate file.
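A setup with two https listeners might look something like the sketch below. The IP addresses and certificate filenames are made up for illustration, and giving local_address as a dotted-quad string is an assumption about the value format.

```json
{
    "server": {
        "listeners": [
            {
                "protocol": "https",
                // Hypothetical address and certificate.
                "local_address": "192.0.2.10",
                "cert": "example_com.pem",
                "servlets": [ { "name": "file_servlet" } ]
            }, {
                "protocol": "https",
                // Hypothetical address and certificate.
                "local_address": "192.0.2.11",
                "cert": "example_org.pem",
                "servlets": [ { "name": "file_servlet" } ]
            }
        ]
    }
}
```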
The Makefile includes a "make cert" target for creating self-signed certificates. You can also get a commercial certificate, or a free one from Let’s Encrypt at https://letsencrypt.org/
Coleman’s standard modules directory includes a cgi_servlet that implements the CGI 1.1 spec. You can use it by including it in your listener with an appropriate pattern, for example:
"listeners": [
    {
        "servlets": [
            { "name": "cgi_servlet", "pattern": "**.cgi" },
            { "name": "file_servlet", "pattern": "**" }
        ]
    }
]
Servlets are small bits of C code that the server calls to handle requests. They are normally compiled separately from the server, and are loaded at runtime. As mentioned above, the server option "servlet_path" gives a list of directories to search for servlet modules to load. Coleman comes with a few servlets in the modules subdirectory. file_servlet implements the usual web server behavior of serving files and directories, and cgi_servlet implements the CGI spec. If you just want to use coleman as a web server, that’s all you need.
If, on the other hand, you want to add your own servlets, you’ll want to learn about the servlet API. It’s documented in the servlet(3) man page. Perusing the source code of the included servlets will help get you up to speed. In particular, sample_servlet is a "Hello world" example, and test_servlet exercises all the API calls.
Basic Authentication uses a password file called ".htpasswd", in the directory to be protected. This file is formatted as the familiar colon-separated username/encrypted-password pair, records delimited by newlines. The protection does not carry over to subdirectories. The utility program htpasswd(1) is included to help create and modify .htpasswd files.
chroot(2) is a system call that restricts the program’s view of the filesystem to the current directory and directories below it. It is impossible for remote users to access any file outside of the initial directory. The restriction is inherited by child processes, so CGI programs get it too. This is a very strong security measure, and is recommended. The only downside is that only root can call chroot, so this means the program must be started as root. However, the last thing it does during initialization is to give up root access by becoming another user, so this is safe.
Note that with some other web servers, setting up a directory tree for use with chroot is complicated, involving creating a bunch of special directories and copying in various files. With coleman it’s a lot easier: all you have to do is make sure any shells, utilities, and config files used by your CGI programs and scripts are available inside the tree. If you have CGI disabled, or if you make a policy that all CGI programs must be written in a compiled language such as C and statically linked, then you probably don’t have to do any setup at all.
However, one thing you should do is tell syslogd about the chroot tree, so that coleman can still generate syslog messages. Check your system’s syslogd man page for how to do this. In FreeBSD you would put something like this in /etc/rc.conf:
syslogd_flags="-l /usr/local/www/chroot/dev/log"
Substitute in your own chroot tree’s pathname, of course. Don’t worry about creating the log socket; syslogd wants to do that itself. (You may need to create the dev directory.) In Linux the flag is -a instead of -l, and there may be other differences.
Coleman can serve multiple virtual hosts on the same system. This is different from multiple listeners on different local addresses. In that case, each listener is actually on a different IP address. With virtual hosts, they are all on the same IP address but have different CNAMEs.
Setting this up is pretty easy. First, make the DNS CNAME aliases for each host you want to serve. Second, make subdirectories in the web tree for each host. Finally, set the virtual_host server parameter in coleman’s config file, restart, and you’re good to go.
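As a sketch, the config side of this is a single server parameter. The hostnames in the comments are placeholders, and the assumption that each subdirectory is named exactly after the hostname in the Host header goes beyond what is stated here, so check it against your installation.

```json
{
    "server": {
        "virtual_host": true,
        // Assumed layout below the main data directory, with one
        // subdirectory per hostname (hypothetical names):
        //   www.example.com/index.html
        //   www.example.org/index.html
        "listeners": [
            { "servlets": [ { "name": "file_servlet" } ] }
        ]
    }
}
```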
Sometimes another site on the net will embed your image files in their HTML files, which basically means they’re stealing your bandwidth. You can prevent them from doing this by using non-local referrer filtering. With this option, certain files can only be fetched via a local referrer. The files have to be referenced by a local web page. If a web page on some other site references the files, that fetch will be blocked. There are two config-file server parameters for this feature:
nlr_url_pattern
A wildcard pattern for the URLs that should require a local referrer. This is typically just image files, sound files, and so on. For example:
"nlr_url_pattern": "**.jpg|**.gif|**.png|**.mp3|**.mpg"
nlr_no_empty_referrers
By default, requests with no referrer at all, or a null referrer, or a referrer with no apparent hostname, are allowed. With this variable set, such requests are disallowed.
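Putting the two parameters together, a server that protects its images and also rejects empty referrers could be configured roughly like this. The pattern is just an example, and writing nlr_no_empty_referrers as a JSON boolean is an assumption based on its documented default of false.

```json
{
    "server": {
        // Only these URLs require a local referrer.
        "nlr_url_pattern": "**.jpg|**.gif|**.png",
        // Also block requests that carry no usable referrer.
        "nlr_no_empty_referrers": true,
        "listeners": [
            { "servlets": [ { "name": "file_servlet" } ] }
        ]
    }
}
```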
Coleman handles a few signals, which you can send via the standard Unix kill(1) command:
INT, TERM
These signals tell coleman to shut down immediately. Any requests in progress get aborted.
USR1
Tells coleman to shut down as soon as it’s done servicing all current requests. In addition, the network socket it uses to accept new connections gets closed immediately, which means a fresh coleman can be started up immediately.
USR2
Tells coleman to generate the statistics syslog messages immediately, instead of waiting for the regular hourly update.
HUP
Tells coleman to close and re-open its log file, for instance if you rotated the logs and want it to start using the new one.
The name coleman is from Denholm Elliott’s character in the movie "Trading Places", following the tradition of naming web servers after butlers.
See also: servlet(3), htpasswd(1)
Copyright © 2014 by Jef Poskanzer <jef@mail.acme.com>. All rights reserved.