coleman

NAME

coleman - web server and C servlet engine

SYNOPSIS

coleman config-file

DESCRIPTION

Coleman is a couple of different things. It’s a simple, fast, and small web server. It’s also a servlet engine in C instead of Java.

The servlet architecture is a great idea, a nice simple plug-in interface for web servers, but it has some problems. First, it’s tied to Java. If you don’t like that language then you are stuck. Second, the performance of servlet-based web servers is inherently mediocre because they handle concurrency with threads.

Coleman fixes both of these problems: the first by using plain old C as the API language, and the second with a new hybrid concurrency model.

CONCURRENCY MODEL

A web server’s concurrency model is how it handles multiple simultaneous requests. It’s fundamental to the server’s performance. The usual alternatives are processes, threads, and non-blocking I/O. Forking a process for each connection is high-overhead and slow. Threads are cheaper and faster but still not great. Non-blocking I/O has the lowest overhead and is fastest, but it’s complicated to program.

Coleman uses a hybrid model: threads for simplicity and reasonable performance when starting requests, plus an interface to hand off static data back to the main loop where it is handled via non-blocking I/O for top performance. This avoids all the complicated parts of doing non-blocking I/O and just leaves the high speed and low cost.
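To make the non-blocking half of this concrete, here is a minimal C sketch of how a main loop could finish sending static data that worker threads have handed off. Each pending transfer is written only when poll(2) reports its descriptor writable, so one slow client never stalls the others. The struct handoff type, the drain_handoffs function, and the fixed limit of 64 transfers are illustrative assumptions, not coleman’s internals, and the thread side of the handoff is left out.

```c
#define _DEFAULT_SOURCE   /* POSIX declarations under strict C modes */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical record a worker thread hands to the main loop:
   a client descriptor plus the static data still to be sent. */
struct handoff {
    int fd;
    const char *buf;
    size_t len;
    size_t off;   /* bytes already written */
};

/* Put a descriptor into non-blocking mode. Returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Keep writing every pending handoff until all are done. Each write()
   sends only what the descriptor can take right now; poll() says when
   it can take more. Returns 0 when everything was sent, -1 on error. */
static int drain_handoffs(struct handoff *h, size_t n)
{
    struct pollfd pfds[64];
    size_t idx[64];
    if (n > 64)
        return -1;
    for (size_t i = 0; i < n; ++i)
        if (set_nonblocking(h[i].fd) < 0)
            return -1;

    for (;;) {
        nfds_t npending = 0;
        for (size_t i = 0; i < n; ++i) {
            if (h[i].off < h[i].len) {
                pfds[npending].fd = h[i].fd;
                pfds[npending].events = POLLOUT;
                idx[npending] = i;
                ++npending;
            }
        }
        if (npending == 0)
            return 0;                     /* all data sent */
        if (poll(pfds, npending, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (nfds_t p = 0; p < npending; ++p) {
            struct handoff *cur = &h[idx[p]];
            if (pfds[p].revents & (POLLERR | POLLHUP | POLLNVAL))
                return -1;
            if (!(pfds[p].revents & POLLOUT))
                continue;
            ssize_t w = write(cur->fd, cur->buf + cur->off,
                              cur->len - cur->off);
            if (w < 0) {
                if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                    continue;             /* not ready after all; poll again */
                return -1;
            }
            cur->off += (size_t)w;
        }
    }
}
```

A real server would run this logic inside its main event loop, alongside accepting connections and receiving new handoffs, rather than in a function that returns only once every transfer is finished.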

OPTIONS

There’s only one command-line option: the config file. It’s required, but it can be fairly small. Some example config files are included in the source tree.

CONFIG-FILE

The config file is in JSON, not because there’s any JavaScript involved in the server but just because it’s a nice simple data format. Here’s a sample file:
{
    "server": {
        "listeners": [
            { "servlets": [ { "name": "file_servlet" } ] }
        ]
    }
}
There’s a server object, which contains a listeners array with one listener, and the listener contains a servlets array with one servlet. In more elaborate setups, servers can include more than one listener and listeners can include more than one servlet.

Unlike standard JSON, coleman config files may contain comments in the usual JavaScript double-slash form.
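One simple way to support these comments is to delete everything from // to the end of the line while leaving string literals untouched, then give the result to an ordinary JSON parser. The C sketch below does that; strip_line_comments is a hypothetical name, and coleman’s actual config reader may handle comments differently.

```c
#include <stddef.h>

/* Copy src to dst with JavaScript-style // comments removed, so the result
   can be handed to an ordinary JSON parser. String literals, including
   escaped quotes inside them, are copied untouched, so a value such as
   "http://example.com/" survives. The newline ending each comment is kept,
   so parser error line numbers still match the file. dst needs room for
   strlen(src) + 1 bytes. Returns the length of the result. */
static size_t strip_line_comments(const char *src, char *dst)
{
    char *start = dst;
    int in_string = 0;
    while (*src != '\0') {
        if (in_string) {
            if (*src == '\\' && src[1] != '\0') {   /* escape pair */
                *dst++ = *src++;
                *dst++ = *src++;
                continue;
            }
            if (*src == '"')
                in_string = 0;                       /* closing quote */
            *dst++ = *src++;
        } else if (*src == '"') {
            in_string = 1;                           /* opening quote */
            *dst++ = *src++;
        } else if (src[0] == '/' && src[1] == '/') {
            while (*src != '\0' && *src != '\n')     /* drop to end of line */
                ++src;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - start);
}
```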

Note: if you are currently a thttpd or mini_httpd user, the coleman distribution includes a little program called conf_conv that converts thttpd/mini_httpd config files into coleman’s JSON format, to the extent possible.

PARAMETERS

Servers, listeners, and servlets can all include parameters, most of them optional. The example above shows one: the servlet’s name parameter, which is required. Here’s a list of all the parameters you can use.

Server parameters:

dir

An optional directory to switch to. A typical value might be "/usr/local/www/chroot/". No default.

chroot

Whether to use the chroot(2) system call. If this is true then the server will use chroot to isolate itself in one directory. This is an excellent security measure. Starting as root without specifying chroot is allowed but you get a warning message that it’s a bad idea. Default: false.

logfile

A filename for request log entries, in the usual CERN combined log format. The default is to log to stderr. If you are going to use SIGHUP to close and re-open the log file, there are some subtle interactions with the dir and chroot parameters to be aware of (see INITIALIZATION ORDER). Note that this log file is only for request log entries. Errors, startup/shutdown notices, and stats all go to syslog. Look for them in /var/log/messages or your system’s equivalent.

servlet_path

A list of directories to search for loadable servlet modules. The default is "modules:." - search the subdirectory called "modules", then search the current directory. For a production system a more appropriate value might be something like "/usr/local/www/chroot/modules".

virtual_host

Whether to do virtual hosting. If this is true then each request’s Host header is used to specify a subdirectory below the main data directory. Default: false.

charset

The character set specifier to insert into text MIME types. Default: UTF-8.

threads

How many threads to run. The default is 10. If there are more simultaneous requests than threads, the extra requests get queued.

background

Whether to run as a background process. Default: false.

pidfile

Filename to write the server’s process-ID to.

subdir

An optional subdirectory to switch to after doing the chroot. A typical value might be "data". No default.

user

If the server is started as root, it will attempt to switch to another user to minimize security exposure. The default is "nobody".

nlr_url_pattern

Disallow non-local referrers for these URLs. Default: non-local referrers are allowed.

nlr_no_empty_referrers

Disallow empty referrers too. Default: false, empty referrers are allowed.

errors_dir

Directory to search for custom error pages.

The error files should be named "errNNN.html", where NNN is the error number. If this parameter is not set or if no such file exists, then a default error message is generated.

info

Your server’s name. Defaults to "coleman" plus the version.

url

A URL for the server. It shows up at the end of server-generated pages, such as directory listings and error pages. The default is acme.com’s coleman page.
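For reference on the logfile parameter: a combined-format entry holds the client host, identity, user, timestamp, request line, status, byte count, referrer, and user agent. The C sketch below formats one such line. The format_combined function and its parameters are hypothetical, and always writing the timestamp in UTC is a simplification of this sketch.

```c
#define _DEFAULT_SOURCE   /* gmtime_r() under strict C modes */
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format one combined-log-format line into out:
     host ident user [time] "request" status bytes "referrer" "user-agent"
   Absent values are written as "-". Returns what snprintf returns. */
static int format_combined(char *out, size_t outlen, const char *host,
                           const char *user, time_t when, const char *request,
                           int status, long bytes, const char *referer,
                           const char *agent)
{
    char stamp[64];
    char nbytes[32];
    struct tm tm;

    gmtime_r(&when, &tm);   /* sketch simplification: always UTC */
    strftime(stamp, sizeof stamp, "%d/%b/%Y:%H:%M:%S +0000", &tm);

    if (bytes > 0)
        snprintf(nbytes, sizeof nbytes, "%ld", bytes);
    else
        strcpy(nbytes, "-");

    return snprintf(out, outlen, "%s - %s [%s] \"%s\" %d %s \"%s\" \"%s\"\n",
                    host, user ? user : "-", stamp, request, status, nbytes,
                    referer ? referer : "-", agent ? agent : "-");
}
```

Called with host 192.0.2.1, no user, time 0, request "GET / HTTP/1.1", status 200, 512 bytes, no referrer, and agent curl/8.0, it produces:
192.0.2.1 - - [01/Jan/1970:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"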

Listener parameters:

protocol

Either http or https. Default: http.

port

The port number to listen on. Defaults to 80 for http and 443 for https.

local_address

Can be used to listen on a specific local address. Default: listen on all local addresses.

cert

The SSL certificate filename. You can have a different one for each listener.

info

The listener’s name. Defaults to some automatically generated info about the listener.

Servlet parameters:

name

The name of the servlet. Required.

pattern

If there’s more than one servlet in a listener, this pattern gets matched against the URL to determine which servlet to run.

WILDCARD PATTERNS

A few parameters use wildcard patterns. These are similar to shell wildcards (? matches any single character, and * matches any sequence of characters except /), plus two additional features: ** matches any sequence of characters including /, and | separates multiple alternative patterns.
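These matching rules can be sketched in C as a small recursive matcher. This is an illustrative reimplementation of the documented behavior, not coleman’s own code; wild_match and match_one are hypothetical names, and the sketch takes the text literally in letting ? match any character, / included.

```c
#include <string.h>

/* Match one alternative (plen bytes of pattern, no '|') against all of str. */
static int match_one(const char *pat, size_t plen, const char *str)
{
    const char *end = pat + plen;
    for (; pat < end; ++pat, ++str) {
        if (*pat == '?') {
            if (*str == '\0')
                return 0;                 /* ? needs one character */
        } else if (*pat == '*') {
            int cross_slash = 0;
            ++pat;
            if (pat < end && *pat == '*') {   /* ** may cross '/' */
                cross_slash = 1;
                ++pat;
            }
            /* How far the star may reach, then try each length, longest first. */
            size_t maxrun = cross_slash ? strlen(str) : strcspn(str, "/");
            for (size_t i = maxrun + 1; i-- > 0; )
                if (match_one(pat, (size_t)(end - pat), str + i))
                    return 1;
            return 0;
        } else if (*pat != *str) {
            return 0;                     /* literal mismatch */
        }
    }
    return *str == '\0';                  /* pattern used up: str must be too */
}

/* Return 1 if str matches any '|'-separated alternative in pattern. */
static int wild_match(const char *pattern, const char *str)
{
    for (;;) {
        const char *bar = strchr(pattern, '|');
        size_t len = bar ? (size_t)(bar - pattern) : strlen(pattern);
        if (match_one(pattern, len, str))
            return 1;
        if (!bar)
            return 0;
        pattern = bar + 1;
    }
}
```

In this sketch "**.cgi" accepts /cgi-bin/test.cgi, while "*.cgi" matches test.cgi but not cgi-bin/test.cgi, because a single star stops at the first slash.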

INITIALIZATION ORDER

The first step in initialization is changing the current directory to the "dir" parameter, if specified. The second is the chroot() call, if specified. Then comes the rest of the initialization. The second-to-last step is the second directory change, to the "subdir" parameter, if specified. The last step is switching UIDs to "user".

This means any filenames used during initialization, such as the log file, the servlet load path, or an SSL certificate, are interpreted relative to "dir" and must be within the chroot tree. It also means sensitive files, such as SSL certificates, can be owned by root and protected against reading by other users, since they are opened before the server gives up root.
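The sequence can be sketched in C with standard POSIX calls. This illustrates the documented order and is not coleman’s source: it assumes chroot is applied to the directory reached via "dir", it looks the user up before the chroot because /etc/passwd may not exist inside the tree, and init_sequence is a hypothetical name.

```c
#define _DEFAULT_SOURCE   /* declares chroot() and setgroups() on glibc */
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of the documented startup order. NULL or 0 means "not configured".
   Returns 0 on success, -1 on failure. */
static int init_sequence(const char *dir, int use_chroot,
                         const char *subdir, const char *user)
{
    /* Resolve the user first: /etc/passwd may be absent inside the chroot. */
    int drop = 0;
    uid_t uid = 0;
    gid_t gid = 0;
    if (user && getuid() == 0) {
        struct passwd *pw = getpwnam(user);
        if (!pw) {
            fprintf(stderr, "unknown user: %s\n", user);
            return -1;
        }
        uid = pw->pw_uid;
        gid = pw->pw_gid;
        drop = 1;
    }

    /* 1. chdir to "dir", so later relative filenames resolve against it. */
    if (dir && chdir(dir) < 0) {
        perror("chdir(dir)");
        return -1;
    }
    /* 2. chroot to that directory (assumed: the current directory). */
    if (use_chroot && (chroot(".") < 0 || chdir("/") < 0)) {
        perror("chroot");
        return -1;
    }

    /* 3. The rest of initialization goes here, still as root: open the
          log file, load servlets from servlet_path, read certificates. */

    /* 4. Second-to-last: chdir to "subdir". */
    if (subdir && chdir(subdir) < 0) {
        perror("chdir(subdir)");
        return -1;
    }
    /* 5. Last: give up root. Groups go before the UID, because a
          non-root process can no longer change its groups. */
    if (drop && (setgroups(1, &gid) < 0 || setgid(gid) < 0 ||
                 setuid(uid) < 0)) {
        perror("dropping privileges");
        return -1;
    }
    return 0;
}
```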

SSL

Coleman can optionally be compiled with SSL support. If SSL is available, you use it by creating a listener with protocol https instead of http. You will also need to give the cert parameter, specifying your PEM-format certificate file. Here’s an example server config file for SSL:
{
    "server": {
        "listeners": [
            {
                "protocol": "http",
                "servlets": [ { "name": "file_servlet" } ]
            }, {
                "protocol": "https",
                "cert": "example_com.pem",
                "servlets": [ { "name": "file_servlet" } ]
            }
        ]
    }
}
If you like, you can have multiple https listeners running in the same server, each with a different local address and certificate file.

The Makefile includes a "make cert" target for creating self-signed certificates. You can also get a commercial certificate. And in Summer 2015 a free certificate service will be available at https://letsencrypt.org/

CGI

Coleman’s standard modules directory includes a cgi_servlet that implements the CGI 1.1 spec. You can use it by including it in your listener with an appropriate pattern, for example:
"listeners": [
    {
        "servlets": [
            { "name": "cgi_servlet", "pattern": "**.cgi" },
            { "name": "file_servlet", "pattern": "**" }
        ]
    }
]
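As an illustration of part of what the CGI 1.1 spec (RFC 3875) asks of a server, here is a minimal C sketch that builds the meta-variable environment a CGI program receives through execve(2). Only a handful of the variables are shown, and the struct cgi_request type and helper names are hypothetical rather than taken from cgi_servlet.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request fields; this is not coleman's servlet API. */
struct cgi_request {
    const char *method;       /* e.g. "GET" */
    const char *protocol;     /* e.g. "HTTP/1.1" */
    const char *script_name;  /* e.g. "/cgi-bin/test.cgi" */
    const char *query;        /* text after '?', or NULL */
    const char *remote_addr;
    const char *server_name;
    int server_port;
};

/* Append a malloc'd "NAME=value" string to env; returns the new count. */
static int put_env(char **env, int n, const char *name, const char *value)
{
    size_t len = strlen(name) + strlen(value) + 2;   /* '=' and '\0' */
    char *s = malloc(len);
    if (!s)
        return n;                                    /* skip on failure */
    snprintf(s, len, "%s=%s", name, value);
    env[n] = s;
    return n + 1;
}

/* Fill env with a few RFC 3875 meta-variables plus a terminating NULL,
   ready for execve(). env must have room for 9 pointers. Returns the
   number of variables set. */
static int build_cgi_env(const struct cgi_request *r, char **env)
{
    char port[16];
    int n = 0;
    snprintf(port, sizeof port, "%d", r->server_port);
    n = put_env(env, n, "GATEWAY_INTERFACE", "CGI/1.1");
    n = put_env(env, n, "SERVER_PROTOCOL", r->protocol);
    n = put_env(env, n, "REQUEST_METHOD", r->method);
    n = put_env(env, n, "SCRIPT_NAME", r->script_name);
    n = put_env(env, n, "QUERY_STRING", r->query ? r->query : "");
    n = put_env(env, n, "REMOTE_ADDR", r->remote_addr);
    n = put_env(env, n, "SERVER_NAME", r->server_name);
    n = put_env(env, n, "SERVER_PORT", port);
    env[n] = NULL;
    return n;
}
```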

SERVLETS

Servlets are small bits of C code that the server calls to handle requests. They are normally compiled separately from the server, and are loaded at runtime. As mentioned above, the server parameter "servlet_path" gives a list of directories to search for servlet modules to load. Coleman comes with a few servlets in the modules subdirectory. file_servlet implements the usual web server behavior of serving files and directories, and cgi_servlet implements the CGI spec. If you just want to use coleman as a web server, that’s all you need.
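The search itself is simple. Here is a minimal C sketch that walks a colon-separated list like "modules:." and returns the first readable match. The module file naming (for example a .so suffix) and the find_in_path name are assumptions; a real loader would then presumably hand the resulting path to dlopen(3), which is not shown.

```c
#define _DEFAULT_SOURCE   /* access() under strict C modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search a colon-separated list of directories, such as "modules:.",
   for a readable file called `file`. On success the full path is left
   in `out` and 0 is returned; -1 means no directory had it. An empty
   entry is treated as the current directory (a choice of this sketch). */
static int find_in_path(const char *path, const char *file,
                        char *out, size_t outlen)
{
    const char *p = path;
    for (;;) {
        const char *colon = strchr(p, ':');
        size_t dirlen = colon ? (size_t)(colon - p) : strlen(p);
        int n = dirlen == 0
                    ? snprintf(out, outlen, "./%s", file)
                    : snprintf(out, outlen, "%.*s/%s", (int)dirlen, p, file);
        if (n > 0 && (size_t)n < outlen && access(out, R_OK) == 0)
            return 0;                 /* first readable match wins */
        if (!colon)
            return -1;
        p = colon + 1;
    }
}
```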

If, on the other hand, you want to add your own servlets, you’ll want to learn about the servlet API. It’s documented in the servlet(3) man page. Perusing the source code of the included servlets will help get you up to speed. In particular, sample_servlet is a "Hello world" example, and test_servlet exercises all the API calls.

AUTHENTICATION

Basic Authentication uses a password file called ".htpasswd" in the directory to be protected. Each line of this file holds one record in the familiar colon-separated form: a username, a colon, and an encrypted password. The protection does not carry over to subdirectories. The utility program htpasswd(1) is included to help create and modify .htpasswd files.
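A minimal C sketch of the lookup half of this: find the user’s record in the file’s contents and return the stored encrypted password. Verifying a submitted password would then typically mean passing it and the stored string to crypt(3) and comparing the result; that step, reading the file, and decoding the Basic Authorization header are left out. htpasswd_lookup is a hypothetical name.

```c
#include <string.h>

/* Find `user` in the text of an .htpasswd file (one "username:encrypted"
   record per line) and copy the encrypted password into out.
   Returns 0 if found, -1 if the user is absent or out is too small. */
static int htpasswd_lookup(const char *contents, const char *user,
                           char *out, size_t outlen)
{
    size_t ulen = strlen(user);
    const char *line = contents;
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        /* Whole-name match: "al" must not match the record for "alice". */
        if (len > ulen && strncmp(line, user, ulen) == 0 && line[ulen] == ':') {
            size_t plen = len - ulen - 1;
            if (plen >= outlen)
                return -1;
            memcpy(out, line + ulen + 1, plen);
            out[plen] = '\0';
            return 0;
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return -1;
}
```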

CHROOT

chroot(2) is a system call that restricts the program’s view of the filesystem to the current directory and directories below it. It is impossible for remote users to access any file outside of the initial directory. The restriction is inherited by child processes, so CGI programs get it too. This is a very strong security measure, and is recommended. The only downside is that only root can call chroot, so the program must be started as root. However, the last thing it does during initialization is to give up root access by becoming another user, so this is safe.

Note that with some other web servers, setting up a directory tree for use with chroot is complicated, involving creating a bunch of special directories and copying in various files. With coleman it’s a lot easier: all you have to do is make sure that any shells, utilities, and config files used by your CGI programs and scripts are available inside the tree. If you have CGI disabled, or if you make a policy that all CGI programs must be written in a compiled language such as C and statically linked, then you probably don’t have to do any setup at all.

However, one thing you should do is tell syslogd about the chroot tree, so that coleman can still generate syslog messages. Check your system’s syslogd man page for how to do this. In FreeBSD you would put something like this in /etc/rc.conf:
syslogd_flags="-l /usr/local/www/chroot/dev/log"
Substitute in your own chroot tree’s pathname, of course. Don’t worry about creating the log socket; syslogd wants to do that itself. (You may need to create the dev directory.) In Linux the flag is -a instead of -l, and there may be other differences.

VIRTUAL HOSTS

Coleman can serve multiple virtual hosts on the same system. This is different from multiple listeners on different local addresses, where each listener is actually on a different IP address. With virtual hosts, they all share the same IP address but are reached through different hostnames (CNAME aliases).

Setting this up is pretty easy. First, make the DNS CNAME aliases for each host you want to serve. Second, make subdirectories in the web tree for each host. Finally, set the virtual_host server parameter in coleman’s config file, restart, and you’re good to go.
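Per the virtual_host parameter, each request’s Host header picks a subdirectory below the main data directory. A minimal C sketch of that mapping follows. Lowercasing the name, dropping any :port suffix, and rejecting anything that could climb out of the web tree are assumptions of this sketch, as is the vhost_dir name.

```c
#include <ctype.h>
#include <string.h>

/* Map a Host header such as "WWW.Example.com:8080" to a subdirectory
   name such as "www.example.com". Returns 0 on success, -1 if the value
   is empty, too long, or could escape the web tree. */
static int vhost_dir(const char *host, char *out, size_t outlen)
{
    size_t n = 0;
    if (!host || host[0] == '.')          /* rejects "." and ".." */
        return -1;
    for (const char *p = host; *p != '\0' && *p != ':'; ++p) {   /* drop port */
        unsigned char c = (unsigned char)*p;
        if (!(isalnum(c) || c == '-' || c == '.'))
            return -1;                    /* no '/', backslash, spaces, etc. */
        if (n + 1 >= outlen)
            return -1;
        out[n++] = (char)tolower(c);      /* hostnames are case-insensitive */
    }
    if (n == 0)
        return -1;
    out[n] = '\0';
    return strstr(out, "..") ? -1 : 0;
}
```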

NON-LOCAL REFERRERS

Sometimes another site on the net will embed your image files in their HTML files, which basically means they’re stealing your bandwidth. You can prevent them from doing this by using non-local referrer filtering. With this option, certain files can only be fetched via a local referrer. The files have to be referenced by a local web page. If a web page on some other site references the files, that fetch will be blocked. There are two config-file server parameters for this feature:

nlr_url_pattern

A wildcard pattern for the URLs that should require a local referrer. This is typically just image files, sound files, and so on. For example:
"nlr_url_pattern": "**.jpg|**.gif|**.png|**.mp3|**.mpg"

nlr_no_empty_referrers

By default, requests with no referrer at all, or a null referrer, or a referrer with no apparent hostname, are allowed. With this variable set, such requests are disallowed.
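The rule can be sketched in C as follows. This minimal sketch rests on two assumptions that go beyond the text: that "local" means the referrer’s hostname equals the name in the request’s Host header (ignoring case and port), and that the nlr_url_pattern match has already been computed and is passed in as a flag. nlr_allowed is a hypothetical name.

```c
#define _DEFAULT_SOURCE   /* strncasecmp() under strict C modes */
#include <string.h>
#include <strings.h>

/* Length of the hostname at the start of s, stopping at ':' or '/'. */
static size_t host_len(const char *s)
{
    return strcspn(s, ":/");
}

/* Returns 1 if the request may be served, 0 if it should be blocked.
   url_is_protected: the URL matched nlr_url_pattern (computed elsewhere).
   referer:          the Referer header, or NULL if absent.
   local_host:       the request's Host header, e.g. "www.example.com:8080".
   no_empty:         the nlr_no_empty_referrers setting. */
static int nlr_allowed(int url_is_protected, const char *referer,
                       const char *local_host, int no_empty)
{
    if (!url_is_protected)
        return 1;

    /* Locate the hostname in "scheme://host[:port]/path". */
    const char *sep = referer ? strstr(referer, "://") : NULL;
    size_t rlen = sep ? host_len(sep + 3) : 0;
    if (rlen == 0)                  /* no referrer, or no apparent hostname */
        return !no_empty;

    if (!local_host)
        return 0;
    size_t hlen = host_len(local_host);
    return rlen == hlen && strncasecmp(sep + 3, local_host, hlen) == 0;
}
```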

SIGNALS

Coleman handles a few signals, which you can send via the standard Unix kill(1) command:

INT, TERM

These signals tell coleman to shut down immediately. Any requests in progress get aborted.

USR1

Tells coleman to shut down as soon as it’s done servicing all current requests. In addition, the network socket it uses to accept new connections gets closed immediately, which means a fresh coleman can be started up immediately.

USR2

Tells coleman to generate the statistics syslog messages immediately, instead of waiting for the regular hourly update.

HUP

Tells coleman to close and re-open its log file, for instance if you rotated the logs and want it to start using the new one.
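A common POSIX pattern for this kind of signal handling is for each handler to set a flag that the main loop checks after poll() or accept() returns, and the sketch below shows that pattern. The flag names and install_handlers are hypothetical, not coleman’s source. The handlers are installed without SA_RESTART so that blocking calls such as accept() return with EINTR and the loop notices the flag promptly.

```c
#define _DEFAULT_SOURCE   /* struct sigaction under strict C modes */
#include <signal.h>
#include <string.h>

/* Set by the handler, read and cleared by the main loop. Handlers do
   nothing else, since few library calls are async-signal-safe. */
static volatile sig_atomic_t got_terminate;  /* INT, TERM: stop now      */
static volatile sig_atomic_t got_usr1;       /* USR1: drain, then exit   */
static volatile sig_atomic_t got_usr2;       /* USR2: log stats now      */
static volatile sig_atomic_t got_hup;        /* HUP: reopen the log file */

static void on_signal(int sig)
{
    switch (sig) {
    case SIGINT:
    case SIGTERM: got_terminate = 1; break;
    case SIGUSR1: got_usr1 = 1;      break;
    case SIGUSR2: got_usr2 = 1;      break;
    case SIGHUP:  got_hup = 1;       break;
    }
}

/* Install on_signal for the five signals. sa_flags stays 0 (no
   SA_RESTART), so interrupted blocking calls fail with EINTR and the
   main loop gets a chance to look at the flags. Returns 0 on success. */
static int install_handlers(void)
{
    static const int sigs[] = { SIGINT, SIGTERM, SIGUSR1, SIGUSR2, SIGHUP };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; ++i)
        if (sigaction(sigs[i], &sa, NULL) < 0)
            return -1;
    return 0;
}
```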

NAME

The name is from Denholm Elliott’s character in the movie "Trading Places", following the tradition of naming web servers after butlers.

SEE ALSO

servlet(3), htpasswd(1)

AUTHOR

Copyright © 2014 by Jef Poskanzer <jef@mail.acme.com>. All rights reserved.