mkHttpd 1.0 - A compact web server for applications (and stand-alone)
Introduction
Architecture
Tcl Server Pages
The Http Server Object
The Session Object
Notes
Examples
Installation
Changes
Author
Introduction
This HTTP server is especially designed to be integrated into other applications. It consists of only a little over 400 lines of Tcl code and is hence very compact. Obviously it is nowhere near as complete and feature-rich as the "official" HTTP server TclHttpd, but the latter has simply become too big for what I needed.
There is one nice thing about it, though: It provides a feature for server-side programming that is (IMHO) quite cool. I named it TSP, Tcl Server Pages, since it is similar to the popular Active Server Pages (ASP) on Windows, where one can interweave HTML and VBScript. Only here it is HTML and Tcl that are mixed together. This gives server-side code a clearer structure and frees the user from constructing valid HTML through irksome string operations. See below for more information.
A typical use case for mkHttpd would be an application that controls reading from and writing to a database which, for instance, contains information about products. Users must be able to connect to this database via the application and retrieve or store data. The application would take care of user login, assigning permissions to users, tracing user activities, or simply making sure that the right data is stored in the right place in the right format. Such an application usually provides a GUI that interacts with the user. mkHttpd can replace that GUI and instead act as a proxy for users who connect through a browser. The browser is now the actual GUI, and mkHttpd and TSP provide the interface to the application.
Another example: Your customer's 7x24 production site is running some nifty client/server software of yours, and you need a means to remotely watch all these little programs running in a distributed environment, because the customer pays you for hotline support. Since your programs are all little server processes by nature, they don't have a GUI, and they don't need one because they are supposed to run fast and quietly in the background anyway. Now, by integrating mkHttpd into them and writing some TSPs, you get a "hidden" interface that is accessible from virtually anywhere. Better still, the servers do no extra work unless you actually send them HTTP requests.
Obviously, mkHttpd can run alone as a pure web server, using TSP as the primary server side programming language.
Architecture
The architecture of mkHttpd is not very complex: There is one object, the Http server object, which runs a TCP server socket on a configurable port (usually 80 for HTTP) and accepts all incoming connections. For each connected client, the Http server object creates another object, a Session object, that does all the communication with the client. A client is defined by its IP address, not by the individual TCP socket itself.
Consequently, it is assumed that on the connecting computer there is one application that talks to the server. This is usually a browser. The corresponding Session object on the server side holds state information about the client's session, such as timers to terminate a session after a certain time, and also user-defined data. If the client opens several browsers, there is still only one Session object on the server side and hence one context for all of them. This is usually a feature, not a limitation.
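The following is a minimal plain-Tcl sketch of this per-client bookkeeping. It is not mkHttpd's actual code. Sessions are keyed by the client's IP address, and an `after` timer that is restarted on every request destroys an idle session (compare the -timeout option of the Http server object). The names `touchSession`, `expireSession`, `sessions` and `timers` are hypothetical, and a safe interpreter stands in for a full Session object.

```tcl
# Hypothetical sketch: one session per client IP address. A safe
# interpreter stands in for the Session object and its sandbox.
array set sessions {}
array set timers {}

# Return the session for addr, creating it on first contact, and
# restart its inactivity timer (timeoutMs milliseconds).
proc touchSession {addr timeoutMs} {
    global sessions timers
    if {![info exists sessions($addr)]} {
        set sessions($addr) [interp create -safe]
    }
    if {[info exists timers($addr)]} {
        after cancel $timers($addr)
    }
    set timers($addr) [after $timeoutMs [list expireSession $addr]]
    return $sessions($addr)
}

# Invoked from the event loop when a session has been idle too long.
proc expireSession {addr} {
    global sessions timers
    interp delete $sessions($addr)
    unset sessions($addr) timers($addr)
}
```

A server built on Tcl's `socket -server` receives the client address as the second argument of its accept callback. Such a callback could call `touchSession $addr 120000` on every request, using 120 seconds as the idle limit.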
There are some important details to the Session object: Each Session object creates an interpreter in which all TSP code is evaluated. This is like a sandbox for the code, where it can't do any damage to the real application. If the TSP code raises an exception, a simple "500 Internal Server Error" HTTP response is returned, but the application does not break. Furthermore, the interpreter retains its state for the lifetime of the Session object. The interpreter's context is separated from the application, and the TSPs can do with it whatever they want.
But how does the TSP code get access to the application's context, from which it wants to return some information? The answer is an opening in the sandbox interpreter that leads to the corresponding Session object, and to this object only. That is, the interpreter can call methods of the Session object, which in turn collects the data from the application (because it is running in the same context) and returns it to the sandbox. In other words: From within the TSP sandbox, only information that is explicitly provided by the Session object can be retrieved. This is another level of protection of the application from the TSP code.
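Here is a minimal sketch of such a sandbox with a single opening, using only standard Tcl commands (`interp create -safe` and `interp alias`). The dispatcher `sessionMethod`, the session id, the `appData` array and the methods `getSquare` and `productCount` are hypothetical stand-ins for a Session object and its member functions. A safe interpreter hides commands such as `open`, `exec` and `exit`, so the alias is the only way back into the application.

```tcl
# Hypothetical application data that the sandbox must not touch directly.
set appData(productCount) 42

# Stand-in for the Session object: the host-side handler for "tsp" calls.
# Only the methods listed here are reachable from the sandbox.
proc sessionMethod {sessionId method args} {
    switch -- $method {
        getSquare {
            set n [lindex $args 0]
            return [expr {$n * $n}]
        }
        productCount {
            return $::appData(productCount)
        }
        default {
            return -code error "unknown method \"$method\" for session $sessionId"
        }
    }
}

# Create the sandbox and open exactly one door out of it: the "tsp" alias.
proc createSandbox {sessionId} {
    set sandbox [interp create -safe]
    interp alias $sandbox tsp {} sessionMethod $sessionId
    return $sandbox
}
```

With this in place, `interp eval $sb {tsp getSquare 10}` returns 100. Reading `::appData` directly from inside the sandbox fails, because the variable only exists in the host interpreter.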
Tcl Server Pages
TSP is for server-side programming. It is a mixture of Tcl and HTML, just like ASP is a mixture of VBScript and HTML. That means that Tcl code can be put right into the middle of some HTML code, as long as it is enclosed in the special tags <% and %>. If, for instance, the Tcl code specifies a loop, then the HTML code inside the loop is repeated according to the loop's parameters.
Suppose we want to display the contents of an array as a table with two columns, like this:
Index | Value |
---|---|
Make | Mazda |
Model | Miata |
Year | 2000 |
Price | FREE! |
...then the TSP code that would create that output looks like this:
```
<html>
<head><title>Car Data</title></head>
<body>
<table>
  <tr><th>Index</th><th>Value</th></tr>
  <% foreach { sIndex sValue } [array get carData] { %>
  <tr><td>$sIndex</td><td>$sValue</td></tr>
  <% } %>
</table>
</body>
</html>
```
Got it? The line <tr><td>$sIndex</td><td>$sValue</td></tr> is created once for each element of the array, because it is inside a foreach loop. All Tcl code is inside the <% ... %> tags. The only Tcl constructs allowed outside of the tags are variable references (denoted as usual by a $) and command substitutions (denoted as usual by [...] brackets).
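To make these rules concrete, here is a minimal sketch of how a TSP page could be translated into plain Tcl. It illustrates the idea under stated assumptions and is not mkHttpd's actual parser. Text outside `<% ... %>` is passed through `subst`, which performs the `$` and `[...]` substitutions. Code inside the tags is copied verbatim. The output is collected in a variable whose name, `_out`, is hypothetical, as is `tspCompile`.

```tcl
# Translate a TSP page into Tcl code that builds the page in _out.
proc tspCompile {page} {
    set code "set _out {}\n"
    set pos 0
    while {[set start [string first "<%" $page $pos]] >= 0} {
        set end [string first "%>" $page [expr {$start + 2}]]
        if {$end < 0} {
            error "unterminated <% tag"
        }
        # Text before the tag: substitute variables and commands at run time.
        set html [string range $page $pos [expr {$start - 1}]]
        append code "append _out \[subst [list $html]\]\n"
        # Tcl code between the tags: copied unchanged.
        append code [string range $page [expr {$start + 2}] [expr {$end - 1}]] "\n"
        set pos [expr {$end + 2}]
    }
    # Remaining text after the last tag.
    append code "append _out \[subst [list [string range $page $pos end]]\]\n"
    append code "set _out\n"
    return $code
}
```

Because `list` quotes each text chunk as a single Tcl word, the generated `append` lines stay valid even when they end up inside the body of a `foreach` from the page. The loop example above therefore produces one table row per array element.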
To insert the current time into an HTML page, one could write:
```
<html>
<head><title>Print Time</title></head>
<body>
It is now <b>[clock format [clock seconds] -format {%D %T}]</b>
</body>
</html>
```
...and the result is obviously something like:
It is now 10/31/00 21:23:01
The handling of $ and [] is no different from the TML concept of TclHttpd. The <% ... %> tags and the ability to intermingle Tcl and HTML, however, are. Their advantage is that they make the manual construction of HTML by means of string operations obsolete, and that the code becomes more readable. Well, at least in most cases...
As mentioned earlier, all TSP code is evaluated in a separate interpreter (one for each client) that retains its state across the various TSP calls. Consequently, it is possible to define variables and Tcl procedures in one particular page, e.g. index.html, and then use them in many TSP files throughout the session. Also, by testing variables it is possible to enforce the order in which pages can be visited (e.g. the user has to go through a login page before visiting other pages). To implement a session time limit, a timer can be created after a successful login. That timer changes a global variable when it expires, and testing that variable on each page provides a simple indicator of whether the session has expired.
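The login and time-limit pattern just described is sketched below as plain Tcl procedures, so that it can run on its own. In a real TSP page, the bodies would sit inside `<% ... %>`. A page could then answer an expired session with, for example, `exit 302 login.html`. All variable and procedure names here are hypothetical.

```tcl
# Hypothetical session-level state, shared by all pages of one client.
set ::loggedIn 0
set ::expired 0

# What a login page would do after checking the credentials.
proc doLogin {timeoutMs} {
    set ::loggedIn 1
    set ::expired 0
    # Restart the time limit if the user logs in again.
    if {[info exists ::expiryTimer]} {
        after cancel $::expiryTimer
    }
    # The event loop sets the flag once the time limit is reached.
    set ::expiryTimer [after $timeoutMs {set ::expired 1}]
}

# What every other page would check before producing any output.
proc checkAccess {} {
    if {!$::loggedIn} { return "login required" }
    if {$::expired} { return "session expired" }
    return "ok"
}
```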
There are four global arrays in each interpreter, which are automatically defined by mkHttpd:
The Session array contains information about the session. It does not change during the lifetime of the interpreter.
Session(ipaddress): The IP address of the client
Session(rootdir): The root directory of the document tree
Session(default): The default page to use (often index.html)
The Request array contains information about the current HTTP request. It can change for each request for a TSP file.
Request(method): The HTTP method: Either "GET" or "POST"
Request(url): The URL of the request
Request(query): The (undecoded) query parameters
The Mime array contains all MIME keys and their values. It can change for each request for a TSP file and depends on the type and number of MIME keys sent in the HTTP header. Examples:
Mime(connection): "keep-alive"
Mime(accept): "*/*"
Mime(host): "localhost"
The Query array contains all query parameters and their values. It is volatile and depends on the query string that was passed along with the URL.
E.g., if the query was username=xyz&password=donttell, the array contains:
Query(username): "xyz"
Query(password): "donttell"
Note that POST data is also treated like a query string and put into this array.
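One way the Mime and Query arrays could be filled is sketched below. This is an assumption about the mechanics, not mkHttpd's code. Header names are lower-cased to match the examples above. Query values are URL-decoded: `+` becomes a space, `%XX` becomes the corresponding byte, and the result is interpreted as UTF-8. The procedure names are hypothetical, and Tcl 8.5 or later is assumed.

```tcl
# Decode one application/x-www-form-urlencoded component.
proc urlDecode {s} {
    set s [string map {+ { }} $s]
    set out ""
    # Replace each %XX escape by the byte it encodes.
    while {[regexp -indices {%([0-9A-Fa-f]{2})} $s match hex]} {
        lassign $match from to
        lassign $hex hexFrom hexTo
        append out [string range $s 0 [expr {$from - 1}]]
        append out [format %c [scan [string range $s $hexFrom $hexTo] %x]]
        set s [string range $s [expr {$to + 1}] end]
    }
    append out $s
    # The collected bytes are UTF-8; convert them to characters.
    return [encoding convertfrom utf-8 $out]
}

# Fill the array named arrName from a query string such as "a=1&b=2".
proc parseQuery {query arrName} {
    upvar 1 $arrName arr
    foreach pair [split $query &] {
        if {$pair eq ""} continue
        set eq [string first = $pair]
        if {$eq < 0} {
            set arr([urlDecode $pair]) ""
        } else {
            set key [urlDecode [string range $pair 0 [expr {$eq - 1}]]]
            set arr($key) [urlDecode [string range $pair [expr {$eq + 1}] end]]
        }
    }
}

# Fill the array named arrName from header lines such as "Host: localhost".
proc parseMime {headerLines arrName} {
    upvar 1 $arrName arr
    foreach line $headerLines {
        set colon [string first : $line]
        if {$colon <= 0} continue
        set key [string tolower [string trim [string range $line 0 [expr {$colon - 1}]]]]
        set arr($key) [string trim [string range $line [expr {$colon + 1}] end]]
    }
}
```

For a POST request, the same `parseQuery` could be applied to the request body.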
All of the above happens in the sandbox, the interpreter that was created by the Session object. Again, there is one interpreter for each client, where "client" means "IP address". For two different computers connecting to the web server, there would be two Session objects, two interpreters and two variable contexts.
It hasn't been explained yet how the sandbox accesses the member functions of the Session object. This is done via the tsp command that exists in each interpreter. It is nothing more than an alias for the interpreter's associated Session object, so its methods are called just like those of any other object: tsp method ?args?
Suppose there is a member function of the Session object that accepts a number and returns the square of that number:
```tcl
member Session:getSquare { fNum } {
    # Braced expression: evaluated safely and compiled efficiently.
    return [expr {$fNum * $fNum}]
}
```
From within TSP, this function is called like this:
```
<html>
<head><title>Get Square</title></head>
<body>
<% set fMyNum 10 %>
The square of <b>$fMyNum</b> is <b>[tsp getSquare $fMyNum]</b>
</body>
</html>
```
...and the result is something similar to:
The square of 10 is 100
Note that if this call fails for any reason and raises an exception, it is the sandbox that gets it, not the main application. The Session object simply catches the exception and returns the HTTP code "500 Internal Server Error" (unless the TSP code catches the exception explicitly).
Besides the four arrays and the tsp command, the exit command in the sandbox requires some explanation as well: It is redefined so that it does not terminate the application, but rather interrupts the TSP execution. It is similar to return, except that it a) completely unwinds the call stack, and b) allows a specific HTTP result code to be returned to the client (e.g. "403 Permission denied").
A plain exit simply interrupts the TSP execution wherever it is called (e.g. somewhere inside a nested procedure call) and returns the HTTP code "200 Data follows" to the client, along with the generated HTML code. This is the regular case and is identical to exit 200.
To perform a redirection to another URL, use the syntax exit 302 new-URL. 302 is the HTTP code for "Found" and tells the client where the requested resource really is. A browser would then immediately issue a new request for that URL.
To return other HTTP codes, the syntax is exit code ?string?. The specified code is sent back to the client along with some HTML that contains the optional string.
It is also possible to suppress any HTTP response and let the Session object take care of it. The syntax is: exit 0 function-name ?args ...?. The function name must specify a member function of the Session object, which is called with the given arguments. This function is then responsible for sending a valid HTTP response back to the client (which is supported by the default Session object, see below). If the function call fails, a "500 Internal Server Error" is returned instead.
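The following minimal sketch shows one way such an `exit` could be built. Inside a safe interpreter, `exit` is redefined to raise an error tagged with the error code `TSP_EXIT`, so it unwinds the whole call stack. The host catches it and maps it to an HTTP status. `makeTspSandbox`, `runTsp` and the output variable `_out` are hypothetical names, and Tcl 8.5 or later is assumed. One caveat of this approach is that a `catch` in the TSP code would also intercept the exit.

```tcl
# Create a sandbox whose "exit" unwinds the TSP code instead of ending the process.
proc makeTspSandbox {} {
    set sandbox [interp create -safe]
    interp eval $sandbox {
        proc exit {{code 200} args} {
            return -code error -errorcode [list TSP_EXIT $code $args] "TSP exit $code"
        }
    }
    return $sandbox
}

# Run TSP-generated code that writes its output to _out; return {status payload}.
proc runTsp {sandbox code} {
    interp eval $sandbox {set _out ""}
    set rc [catch {interp eval $sandbox $code} result opts]
    if {$rc == 0} {
        return [list 200 [interp eval $sandbox {set _out}]]
    }
    if {$rc == 1 && [lindex [dict get $opts -errorcode] 0] eq "TSP_EXIT"} {
        lassign [dict get $opts -errorcode] tag status extra
        switch -- $status {
            200     { return [list 200 [interp eval $sandbox {set _out}]] }
            302     { return [list 302 [lindex $extra 0]] }
            0       { return [list 0 $extra] }
            default { return [list $status [join $extra]] }
        }
    }
    # Any other exception from the TSP code becomes a 500.
    return [list 500 "Internal Server Error"]
}
```

For `exit 0 function-name ?args?`, the payload returned here is the function name and its arguments. The host would then call that member function to produce the response.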
One last thing: TSP creates HTML through Tcl. A TSP page is "compiled" into Tcl code, which is then evaluated within the sandbox. The result is (hopefully correct) HTML code that is sent back to the requestor. For performance reasons, mkHttpd stores the Tcl code as a file, so that it does not have to compile the TSP page every time it is requested. It also stores the resulting HTML code as a file, because it uses the fcopy command to transfer it to the client in the background. These files are stored in a directory called .tsp that mkHttpd creates in the same directory as the requested TSP file. Consequently, mkHttpd needs write access to the document tree.
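The caching of compiled code can be sketched as follows. This illustration is based on the description above and is not mkHttpd's code. The compiled Tcl is stored under `.tsp` next to the page and rebuilt only when the page is newer than the cached copy. The cache file name (`<page>.tcl`), the helpers and the `compileCmd` parameter are hypothetical. Note that `file mtime` has a resolution of one second on many file systems, so an edit made within the same second as the cache write can go unnoticed. Tcl 8.5 or later is assumed (for `{*}`).

```tcl
proc readFile {path} {
    set f [open $path r]
    set data [read $f]
    close $f
    return $data
}

proc writeFile {path data} {
    set f [open $path w]
    puts -nonewline $f $data
    close $f
}

# Return the compiled Tcl for tspFile, recompiling only when the page changed.
# compileCmd is a command prefix that takes the page text, e.g. a TSP compiler.
proc cachedTspCode {tspFile compileCmd} {
    set cacheDir [file join [file dirname $tspFile] .tsp]
    file mkdir $cacheDir    ;# requires write access to the document tree
    set cacheFile [file join $cacheDir [file tail $tspFile].tcl]
    if {[file exists $cacheFile] && [file mtime $tspFile] <= [file mtime $cacheFile]} {
        return [readFile $cacheFile]
    }
    set code [{*}$compileCmd [readFile $tspFile]]
    writeFile $cacheFile $code
    return $code
}
```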
The Http Server Object
The application can instantiate as many Http Server objects as required, but naturally only one for each port. The command syntax is:
Httpd name ?options?
The object will immediately create a server socket and listen on the specified port. To make the object stop accepting connections, it must be destroyed. The object does not have any methods. The following options are defined for the Httpd object:
-address ip-address
Specifies the domain-style name or numerical IP address of the server-side network interface to use for the connection. This option may be useful if the server machine has multiple network interfaces.
-port port-number
Specifies the port on which the server will accept connections. The default value is 80, the standard port for HTTP.
-rootdir directory
Specifies the root of the document tree, usually as an absolute path. The default is ".", the current working directory.
-default filename
The file to use if the URL specified by the client is a directory. The server appends the name of this file to the directory and tries to send that file. The default is "index.html".
-session class-name
Specifies the class to use for Session objects. This class must either be the standard class "Session" (the default) or be derived from the Session class.
-timeout seconds
Specifies how many seconds of inactivity may pass before a Session object is destroyed. The default is 120 seconds. After that time, the Session object and its sandbox go away. When the corresponding client connects again, a fresh Session object is created.
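As an illustration of how these options fit together, the plain-Tcl sketch below does three things. It applies the documented defaults. It opens the listening socket with Tcl's `socket -server` command, where `-myaddr` plays the role of -address. It maps a URL to a file using -rootdir and -default. It is not the actual Httpd class. `httpdStart`, `resolveUrl` and the accept-callback argument are hypothetical, -session is left out, and Tcl 8.5 or later is assumed.

```tcl
# Apply the documented defaults, then open the server socket.
# Tcl calls acceptCmd as: acceptCmd channel clientAddress clientPort
# (-rootdir, -default and -timeout are only validated here; a real server
# would keep them for its sessions.)
proc httpdStart {acceptCmd args} {
    array set opt {-address "" -port 80 -rootdir . -default index.html -timeout 120}
    foreach {key value} $args {
        if {![info exists opt($key)]} {
            error "unknown option \"$key\""
        }
        set opt($key) $value
    }
    set cmd [list socket -server $acceptCmd]
    if {$opt(-address) ne ""} {
        lappend cmd -myaddr $opt(-address)
    }
    lappend cmd $opt(-port)
    return [{*}$cmd]
}

# Map a URL path to a file below rootdir; directories get the default file appended.
proc resolveUrl {rootdir default url} {
    set rel [string trimleft $url /]
    if {$rel eq ""} {
        set path $rootdir
    } else {
        set path [file join $rootdir $rel]
    }
    if {[file isdirectory $path]} {
        set path [file join $path $default]
    }
    return $path
}
```

For example, `httpdStart accept -port 1080` would listen on all interfaces on port 1080, calling a user-defined `accept` procedure for each new connection.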
The Session Object
A Session object is instantiated by the Http Server object for each connecting client machine. By default, the Http Server object uses the class "Session" to create Session objects. This class is part of the mkHttpd package and contains the HTTP protocol stack (the part of it that is implemented, to be precise), the TSP parser, and some functions for sending HTTP responses back to the client.
This is already enough to run a plain web server that serves static pages as well as dynamic content via TSP. It has no connection to the context of the application in which it is running, though. To establish one, additional member functions must be defined, which can then be called from inside the TSP sandbox and return information from the application's context.
There are two ways to achieve this: Either the standard Session class is extended by simply defining additional member functions, or a new class is derived from the standard Session class and passed to the Http Server object with the -session option.
Obviously, there is no limit to the complexity of a self-defined member function. It might simply return the value of some variable in the application's global context, or complete chunks of HTML with embedded data. Whether it makes sense to deal with HTML in both TSP and the Session object is a design question, though.
By naming convention, all member functions of the default Session object that start with an underscore are considered private and should not be used. Four member functions are considered protected and provide support for creating HTTP responses. They are designed to be used only when the TSP code suppresses the automatic HTTP response with exit 0. These functions are:
replyFile filename
Uses the HTTP response "200 Data follows" and sends the specified file back to the client. In other words, it creates the most common HTTP response: The requested file was found and can be sent to the client.
replyError code ?reason? ?info?
Returns a user-specified HTTP code. The code must be one of those defined in the constructor of the default Session object. In addition, a small HTML page is returned that includes the reason string plus the info string.
replyRedirect new-URL
Returns "302 Found" along with the specified URL. If the client is a browser, it immediately issues a new HTTP request for that URL.
replyData mime-type data-string
Returns arbitrary data to the client. The mime-type must be one of those defined in the constructor of the default Session object, e.g. "text/html". The data string can be anything, as long as the MIME type describes what it is so that the browser knows how to render it.
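For orientation, the responses these functions produce have roughly the shape sketched below. This is a simplified illustration, not the Session class's code. `httpReply` and `httpRedirect` are hypothetical names, and the HTTP/1.0 status line and the exact header set are assumptions. The body is encoded as UTF-8 so that Content-Length counts bytes.

```tcl
# Build a complete HTTP response: status line, headers, blank line, body.
proc httpReply {code reason mimeType body} {
    # Content-Length must count bytes, so encode the body first.
    set bytes [encoding convertto utf-8 $body]
    set response "HTTP/1.0 $code $reason\r\n"
    append response "Content-Type: $mimeType\r\n"
    append response "Content-Length: [string length $bytes]\r\n"
    append response "\r\n" $bytes
    return $response
}

# A redirect carries the new location in a header and has no body.
proc httpRedirect {url} {
    return "HTTP/1.0 302 Found\r\nLocation: $url\r\nContent-Length: 0\r\n\r\n"
}
```

The resulting string should be written to a socket configured with `fconfigure $sock -translation binary`, so that Tcl neither re-encodes the body nor rewrites the CR/LF line endings.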
Notes
mkHttpd is not multi-threaded. The various Session objects all run in the main context of the application where the Http server object is running. This is intentional, because mkHttpd was written to be integrated into applications where it could deliver status information about that application to an external viewer, the browser. Multiple threads would be possible, but they would require the application to protect all its data structures and serialize access to them, e.g. via mutexes or critical sections. After all, data is transferred back to the client with fcopy in the background, driven by the event loop, so long transfers do not block the application anyway.
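The background transfer mentioned above can be sketched with `fcopy -command`, which returns immediately and reports completion through the event loop. `sendFileAsync`, `copyDone` and the completion variable are hypothetical names. In a server, the destination would be the client socket rather than a file.

```tcl
# Start copying the file at path to chan in the background.
# When done, the global variable named by doneVar receives {bytes errorMsg}.
proc sendFileAsync {path chan doneVar} {
    set in [open $path r]
    fconfigure $in -translation binary
    fconfigure $chan -translation binary -blocking 0
    fcopy $in $chan -command [list copyDone $in $chan $doneVar]
}

# Called from the event loop with the byte count and, on failure, an error message.
proc copyDone {in chan doneVar bytes {errorMsg ""}} {
    close $in
    # Switch back to blocking mode so that close flushes all output before returning.
    fconfigure $chan -blocking 1
    close $chan
    set ::$doneVar [list $bytes $errorMsg]
}
```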
Examples
To integrate mkHttpd into an application, the following additional lines are required:
```tcl
package require mkHttpd   ;# loads the package
mkHttpd www               ;# creates a server object "www"
vwait forever             ;# ONLY REQUIRED FOR TCLSH, NOT FOR WISH!!!
```
The application must somehow enter the event loop for the server socket to work. For Tk applications this is the default case. For a pure Tcl application, the vwait command serves that purpose.
To extend the default Session class with additional member functions, use the following syntax:
member Session:functionName { args } { body }
mkHttpd comes with a little demo application that demonstrates the features described above. It is located in the demo directory of the distribution. Simply run "demo.tcl", then enter the URL http://localhost:1080 to get to the application's home page.
For a small example of how to integrate mkHttpd into a C/C++ application, refer to the file "ctest.c". For Windows, the executable "ctest.exe" is included. To try it, start the program and go to http://localhost:1088 to display data from the program's variable context.
Installation
mkHttpd is written in pure Tcl/Tk (version 8) and requires the packages mkGeneric and mkClasses. These dependencies will be eliminated as soon as Tcl is available with an OOP system (Tcl 8.4).
To install, place the directory "mkHttpd1.0" in one of the directories contained in the global Tcl variable "auto_path". For a standard Tcl/Tk installation, this is commonly "c:/program files/tcl/lib" (Windows) and "/usr/local/lib" (Unix).
Changes
No changes - Initial version.
Author
Michael Kraus
mailto:michael@kraus5.de
http://mkextensions.sourceforge.net