Monthly Archives: January 2007

Evolving Framework, Step 3: Cleaning up the server-side

Once I had my virtual host set up to route all requests into a single script, I had to figure out what to do with each request. I knew that the essential request information was available in the $_SERVER autoglobal, but I wanted a cleaner way of dealing with the request, so I put together a very simple Request class and saved it in the document root as Request.php:

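<?php
// Wraps the parts of the current HTTP request that the framework cares about.
class Request
{
    public $method;               // HTTP method, e.g. GET, POST, PUT or DELETE
    public $url;                  // full URL, e.g. http://example.com/forum/general
    public $parameters = array(); // request data

    public function __construct()
    {
        $this->method = $_SERVER['REQUEST_METHOD'];
        $this->url = $_SERVER['SCRIPT_URI'];
        // $_REQUEST merges query-string and POST data (and, depending on
        // the request_order setting, cookies).
        $this->parameters = $_REQUEST;
    }
}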
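One caveat with the class as written: SCRIPT_URI is one of the variables mod_rewrite sets when its engine is on, so it won't exist in $_SERVER for a request that didn't go through mod_rewrite. A minimal sketch of a fallback that rebuilds the URL from HTTP_HOST, HTTPS and REQUEST_URI instead; buildUrl() is a hypothetical helper, not part of the original Request class, and the constructor could use it as $this->url = buildUrl($_SERVER);

```php
<?php
// Hypothetical helper: return the request URL without the query string,
// preferring mod_rewrite's SCRIPT_URI and rebuilding it when it's missing.
function buildUrl(array $server): string
{
    if (isset($server['SCRIPT_URI'])) {
        return $server['SCRIPT_URI'];
    }

    // HTTPS is set to a non-empty value (other than "off") on SSL requests.
    $https = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme = $https ? 'https' : 'http';
    $host = $server['HTTP_HOST'] ?? 'localhost'; // assumed default

    // REQUEST_URI includes the query string; keep only the path so the
    // result matches what SCRIPT_URI would have held.
    $path = parse_url($server['REQUEST_URI'] ?? '/', PHP_URL_PATH);

    return $scheme . '://' . $host . ($path ?: '/');
}

echo buildUrl([
    'HTTP_HOST'   => 'example.com',
    'REQUEST_URI' => '/some/made/up/path?this=that&foo=bar',
]), PHP_EOL; // http://example.com/some/made/up/path
```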


Evolving Framework, Step 2: Make PHP understand the clean URL

So why am I bothering with “clean” URLs? For several reasons, really. For one thing, they look nicer than un-clean URLs. I would much rather get rid of something like this:

http://www.example.com/forumDisplay.php?fid=17

and instead use something like this:

http://www.example.com/forum/general

But aesthetics aside, why should I bother? My reason is inspired by Roy Fielding’s REST. I won’t try to get into the guts of it here, partly because there’s a lot to it, and partly because my grasp of it is far from complete. Regardless, what I’ve taken from my brief introduction to it is a resolution to build applications from a resource-centric point of view, as opposed to a functional, process-centric point of view. Instead of building a collection of scripts that expect data to be passed in so that they can spit data out, I want to try to build a system of resources that can be manipulated.

In order to manipulate a resource, three essential pieces of information are required:

  1. The location of the resource in question (the URL)
  2. The type of manipulation to be performed on the resource (the HTTP method: PUT, GET, POST, or DELETE)
  3. The data needed to perform the manipulation (request parameters)

One thing I noticed at this point was that the request method could essentially take the place of the action parameter I’ve seen passed around in so many frameworks (and which I’m guilty of having used in the past). With any resource in mind (e.g., a user object, news article, forum thread, or photo album), PUT, GET, POST, and DELETE can express pretty much any operation you need, or at least signal your intention to perform it.

Also, those four major request methods map pretty conveniently to CRUD:

  • PUT == CREATE
  • GET == READ
  • POST == UPDATE
  • DELETE == DELETE
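Here’s that mapping written as a PHP lookup table, so a dispatcher could turn REQUEST_METHOD into the operation it implies. crudOperation() is a hypothetical name; anything other than the four methods above (HEAD or OPTIONS, for instance) simply maps to null in this sketch.

```php
<?php
// The method-to-CRUD mapping from the list above.
const METHOD_TO_CRUD = [
    'PUT'    => 'create',
    'GET'    => 'read',
    'POST'   => 'update',
    'DELETE' => 'delete',
];

// Hypothetical helper: look up the CRUD operation for an HTTP method.
// Method names are compared as-is, since HTTP methods are case-sensitive.
function crudOperation(string $method): ?string
{
    return METHOD_TO_CRUD[$method] ?? null;
}

echo crudOperation('PUT'), PHP_EOL; // create
```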

I figured a good place to start implementing something like this would be to structure the URLs so that they look like nouns. An old-school login form, for example, might have looked like this:

<form action="/login.php" method="post">
  Username <input type="text" name="username" /><br />
  Password <input type="password" name="password" /><br /> 
  <input type="submit" value="Login..." />
</form>

The problem with that code (with RESTfully clean URLs in mind) is the action and the method. The form is POSTing the username and password to login.php. But is login.php really the resource I want to POST to? If I replace the word POST with UPDATE, it makes a little less sense: I’m not updating login.php itself. That script doesn’t represent a resource of any kind; it’s just a procedure, a machine that takes data in one side and spits data out the other.

So I took a step back and thought about what I was actually trying to accomplish. I wanted to authenticate a user: check their username and password against the database and, if they matched a valid user record, store their information in the session for use throughout their visit.

In other words, my goal was to create an authenticated user session. Since “create” maps to the PUT method, I decided to change the form around to look like this:

<form action="/session" method="put">
  Username <input type="text" name="username" /><br />
  Password <input type="password" name="password" /><br /> 
  <input type="submit" value="Login..." /> 
</form>

My intention was to provide the following information to the web server:

  • Location of the resource: /session
  • Operation to be performed on the resource: PUT (i.e., create)
  • Additional information needed to operate on the resource: username and password

It seemed a good bit more logical this way. Instead of telling the web server, “here’s a username and password that I want the login script to authenticate,” I’m telling it, “I want to create a new session based on this username and password.”

The tricky part was actually writing the code to interpret the request the way I wanted. Step 3 is coming up next. In it I’ll explain my initial approach to solving my RESTlessness and some unexpected problems I ran into along the way.
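A minimal sketch of one piece of that interpretation, under two assumptions. First, PHP only fills $_POST (and therefore $_REQUEST) from the body of POST requests, so for a PUT the form-encoded body has to be read from php://input and decoded with parse_str(). Second, the PUT has to actually reach the server: browsers only submit HTML forms as GET or POST, and a form with method="put" goes out as a plain GET, so in practice the PUT would have to come from another client, such as a script using XMLHttpRequest. requestParameters() is a hypothetical helper, and it assumes the body is application/x-www-form-urlencoded.

```php
<?php
// Hypothetical helper: gather a request's parameters. For GET, only the
// query string counts; for other methods the body is assumed to be
// application/x-www-form-urlencoded and is decoded with parse_str().
function requestParameters(string $method, string $rawBody, array $query): array
{
    $params = $query;
    if ($method !== 'GET' && $rawBody !== '') {
        parse_str($rawBody, $bodyParams);
        // Body values win over query-string values with the same name.
        $params = array_merge($params, $bodyParams);
    }
    return $params;
}

// In index.php this might be called as:
// $params = requestParameters($_SERVER['REQUEST_METHOD'],
//                             file_get_contents('php://input'), $_GET);
```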

Evolving Framework, Step 1: mod_rewrite + PHP = Clean URLs

Using clean URLs with Apache is fairly simple with mod_rewrite’s help. I’d be the first to admit that rewrite rules can be as ridiculously complicated as they are powerful, but plenty of people have already admitted this. Luckily, what I’m trying to accomplish is not very difficult, mod_rewrite-wise. Taking a forum as an example, you might see URLs like:

  • http://example.com/ – the site’s main page
  • http://example.com/forum – list of forums
  • http://example.com/forum/general – list of all threads in the general forum
  • http://example.com/forum/general/some_arbitrary_topic – list of all posts in the some_arbitrary_topic thread
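One way index.php might map paths like these onto resources is to split the path on slashes and look at how many segments there are. A minimal sketch; describePath() is hypothetical and just returns a label, where a real dispatcher would load the forum or thread instead.

```php
<?php
// Hypothetical router sketch for the forum URLs listed above: split the
// path into segments and decide which resource is being requested.
function describePath(string $path): string
{
    // Drop the empty strings that leading/trailing slashes leave behind.
    $segments = array_values(array_filter(
        explode('/', $path),
        fn (string $s): bool => $s !== ''
    ));

    if ($segments === []) {
        return 'main page';
    }
    if ($segments[0] !== 'forum') {
        return 'not found';
    }

    switch (count($segments)) {
        case 1:
            return 'list of forums';
        case 2:
            return "threads in forum '{$segments[1]}'";
        case 3:
            return "posts in thread '{$segments[2]}' of forum '{$segments[1]}'";
        default:
            return 'not found';
    }
}

echo describePath('/forum/general'), PHP_EOL; // threads in forum 'general'
```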

To handle URLs similar to this, I set up the virtual host container to simply rewrite every request through a single PHP script, which I’ll lovingly call index.php. To start off, it just looks like this:

<?php
    print_r($_SERVER);

Pretty basic and useless. It just dumps out everything about the request and the server environment in an easy-to-read manner. But we have to tell Apache to use index.php for everything, so the virtual host container in Apache’s config file looks like this:

<VirtualHost *:80>
    ServerName      dev.cholmon.com
    DocumentRoot    /www/vhosts/cholmon.com/dev/

    RewriteEngine   On
    RewriteRule     /*\.(css|js|gif|png|jpe?g)$ - [NC,L]
    RewriteRule     ^/* /index.php

    <Directory "/www/vhosts/cholmon.com/dev/">
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>

(To read up on configuring virtual hosts and figuring out mod_rewrite, check out http://httpd.apache.org/docs/2.2/)

Those three rewrite directives accomplish the following:

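  1. RewriteEngine On: turns on mod_rewrite’s rewriting engine for this virtual host, so the RewriteRule lines take effect
  2. RewriteRule /*\.(css|js|gif|png|jpe?g)$ - [NC,L]: don’t rewrite images, stylesheets, or JavaScript files. The - means “no substitution,” NC makes the match case-insensitive, and L stops processing any further rules for the request.
  3. RewriteRule ^/* /index.php: any other request should just run index.php. Because the substitution has no query string of its own, the original query string is passed along unchanged.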
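To sanity-check which requests fall through to index.php, the two rules can be mimicked in PHP. A sketch under one simplification: the leading /* in the first pattern matches zero or more slashes, so in an unanchored pattern it adds nothing, and the sketch drops it. isStaticAsset() and rewriteTarget() are hypothetical names, and, like RewriteRule in a virtual host, they look at the URL path only, not the query string.

```php
<?php
// Mimics the two RewriteRules in PHP so the routing decision can be
// tried without Apache. $path is the URL path only (no query string).

// First rule: static files are left alone. The /i modifier mirrors [NC].
function isStaticAsset(string $path): bool
{
    return preg_match('/\.(css|js|gif|png|jpe?g)$/i', $path) === 1;
}

// [L] stops rule processing on a match, so only paths that are not
// static assets reach the catch-all rule that sends them to index.php.
function rewriteTarget(string $path): string
{
    return isStaticAsset($path) ? $path : '/index.php';
}

echo rewriteTarget('/forum/general'), PHP_EOL;   // /index.php
echo rewriteTarget('/images/logo.PNG'), PHP_EOL; // /images/logo.PNG
```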

So, for instance, if you type http://example.com/some/made/up/path?this=that&foo=bar into your browser’s address bar, the request would be sent to index.php and you’d see the following (pay particular attention to the REQUEST_METHOD, SCRIPT_NAME, and QUERY_STRING lines):

Array 
( 
    [SCRIPT_URL] => /some/made/up/path 
    [SCRIPT_URI] => http://example.com/some/made/up/path 
    [HTTP_ACCEPT] => image/gif, image/x-xbitmap, image/jpeg, */* 
    [HTTP_ACCEPT_LANGUAGE] => en-us 
    [HTTP_UA_CPU] => x86 
    [HTTP_ACCEPT_ENCODING] => gzip, deflate 
    [HTTP_USER_AGENT] => Mozilla/4.0 (compatible; MSIE 7.0) 
    [HTTP_HOST] => dev.cholmon.com 
    [HTTP_CONNECTION] => Keep-Alive 
    [PATH] => /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin 
    [SERVER_SIGNATURE] =>
    [SERVER_SOFTWARE] => Apache/2.2.0 (Unix) PHP/5.2.0  
    [SERVER_NAME] => dev.cholmon.com 
    [SERVER_ADDR] => 65.111.167.214 
    [SERVER_PORT] => 80 
    [REMOTE_ADDR] => 65.4.76.89 
    [DOCUMENT_ROOT] => /www/vhosts/cholmon.com/dev/ 
    [SERVER_ADMIN] => [no address given] 
    [SCRIPT_FILENAME] => /www/vhosts/cholmon.com/dev/index.php 
    [REMOTE_PORT] => 57503 
    [GATEWAY_INTERFACE] => CGI/1.1 
    [SERVER_PROTOCOL] => HTTP/1.1 
    [REQUEST_METHOD] => GET 
    [QUERY_STRING] => this=that&foo=bar
    [REQUEST_URI] => /some/made/up/path?this=that&foo=bar
    [SCRIPT_NAME] => /some/made/up/path
    [PHP_SELF] => /some/made/up/path
    [REQUEST_TIME] => 1169876370
)

At this point, the main parts of the request that I’m interested in are:

  • The method (GET)
  • The script name (/some/made/up/path)
  • The query string (this=that&foo=bar)
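For a rough idea of pulling those three pieces out of a $_SERVER-style array, here is a sketch (not necessarily how Step 2 ends up doing it). requestParts() is a hypothetical helper; it reads SCRIPT_NAME because that’s the entry used in the list above, and it decodes the query string with parse_str().

```php
<?php
// Hypothetical helper: extract the method, path and decoded query string
// from a $_SERVER-style array.
function requestParts(array $server): array
{
    // parse_str() turns "this=that&foo=bar" into ['this' => 'that', 'foo' => 'bar'].
    parse_str($server['QUERY_STRING'] ?? '', $query);

    return [
        'method' => $server['REQUEST_METHOD'] ?? '',
        'path'   => $server['SCRIPT_NAME'] ?? '',
        'query'  => $query,
    ];
}

$parts = requestParts([
    'REQUEST_METHOD' => 'GET',
    'SCRIPT_NAME'    => '/some/made/up/path',
    'QUERY_STRING'   => 'this=that&foo=bar',
]);
echo $parts['method'], ' ', $parts['path'], ' ', $parts['query']['foo'], PHP_EOL;
// GET /some/made/up/path bar
```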

Step 2 will be up here in the next day or so. In it, I’ll modify index.php so that it parses those three pieces of information and decides what to do with the request. OMG STAY TUNED LOL!!!