
rearranging one's movable type archives

Sometimes you don't get around to doing something until somebody else asks you how to do it (thanks, Steph). I'd been thinking of rearranging my archives so that, rather than having hundreds of posts all in the one directory, they'd be arranged by date (like with my journal). I've been using Movable Type since it was publicly released, and given that search engines have picked up certain posts of mine (e.g. my Linux kernel patch to make my USB CompactFlash card reader work), it'd be Bad Form to stop my original archive URLs from working. Realizing this made me suddenly feel a bit too lazy to do it when I was considering it a few weeks back.

So here's how I did it. Consider the following a "rough as guts" braindump that I'll clean up later on, if anybody's interested.

Apache to the rescue ! I'd never had a use for mod_rewrite's map files before, but they made it laughably easy to make this work. The possibly rather major drawback of this method is that you either need to be able to modify your own Apache config files, or be on nice enough terms with your sysadmin to be able to get them to add a single config line for you. This reflects my existence as somebody who's spent a good part of the last 6 years or so fiddling with Apache configs, perhaps at the expense of reality, or better methods (like a more Movable Type-based solution). I'm sorry if it offends you.

  1. Change your Movable Type archive settings. For each of the archive file templates, you can use something like the following :

    • Individual archive : <$MTArchiveDate format="%Y/%m/%d/"$><$MTEntryID pad="1"$>.html

      This means an entry that was archived at /blog/archives/000123.html will now be archived at (for instance) /blog/archives/2002/08/19/000123.html.

    • Monthly archive : <$MTArchiveDate format="%Y/%m/index.html"$>

    • ...and so on.

  2. Rebuild your whole site, and make sure it looks ok.

  3. Build a map file. If you've got shell access on the machine where your blog's hosted, go and sit yourself in the archive directory, and do something like the following :

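    # create the map in your home directory, the same file the loop appends to
    touch ~/archivemap
    for x in 000*.html
    do
      # old flat filename, then the new dated path of the same entry
      echo "$x" $(ls */*/*/"$x")
    done >> ~/archivemap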
    

    Otherwise, you'll have to build it by hand (urgh). The mapfile should look like the following :

    000001.html 2002/08/18/000001.html
    000002.html 2002/08/18/000002.html
    000003.html 2002/08/19/000003.html
    

    ...which simply maps the old "everything in one directory" file into the new "lots of subdirectories" file.

  4. Now that you've got your mapfile ready, place it somewhere useful on your website - it doesn't have to be in your document root (the bit that's accessible to the web), it just needs to be readable by Apache. Let's say your website lives in /data/websites/mywebsite.org/ - you could place the file at /data/websites/mywebsite.org/lib/archivemap, for instance. You now want to add this to your Apache server config for your virtual host (or ask your sysadmin nicely; we generally respond better to being asked nicely) :

    RewriteMap archivemap txt:/data/websites/mywebsite.org/lib/archivemap
    

    (the server needs to be prodded, with an apachectl graceful or an apachectl restart - of course, you did an apachectl configtest first, right ?)

  5. Once that's in, you can go and place the following RewriteRules in your own .htaccess file (eg. /data/websites/mywebsite.org/.htaccess). We're pretending your blog lives at http://mywebsite.org/blog/ :

    RewriteEngine On
    RewriteBase /
    #
    # monthly archives
    #
    RewriteRule ^blog/archives/([0-9][0-9][0-9][0-9])_([0-9][0-9])\.html$ /blog/archives/$1/$2/  [R]
    #
    # individual archives
    #
    RewriteRule ^blog/archives/([0-9][0-9][0-9][0-9][0-9][0-9]\.html)$ /blog/archives/${archivemap:$1}  [R]
    

    Right. How does this work ?

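    • The monthly archive file used to look like /blog/archives/2002_05.html but now looks like /blog/archives/2002/05/index.html - the first RewriteRule redirects the old URL to /blog/archives/2002/05/, and Apache serves the index.html in that directory (assuming index.html is in your DirectoryIndex, which it is by default).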

    • The individual archives used to look like /blog/archives/000123.html but now live in a subdirectory that depends on the date - we have our mapfile to define where each entry lives, so we use that in the second RewriteRule.
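To make the two rules in step 5 a bit more concrete, here's a rough shell model of what they compute. It is only a sketch of the logic, not how Apache does it internally: `rewrite_target` and `map_path` are names invented for this example, and the function takes the request path without its leading slash, which is how a RewriteRule in a document-root .htaccess sees it.

```shell
# Hypothetical helper: rewrite_target PATH MAP_PATH
# Prints the path that the two RewriteRules from step 5 would redirect to.
rewrite_target() {
  path=$1
  map_path=$2
  case $path in
    # monthly archives: 2002_05.html -> 2002/05/
    blog/archives/[0-9][0-9][0-9][0-9]_[0-9][0-9].html)
      name=${path##*/}      # strip the directories
      name=${name%.html}    # strip the extension, leaving e.g. 2002_05
      echo "/blog/archives/${name%_*}/${name#*_}/"
      ;;
    # individual archives: look the old filename up in the mapfile
    blog/archives/[0-9][0-9][0-9][0-9][0-9][0-9].html)
      key=${path##*/}
      # a key that isn't in the map gives an empty value
      value=$(awk -v k="$key" '$1 == k { print $2; exit }' "$map_path")
      echo "/blog/archives/$value"
      ;;
    # anything else matches neither rule and is left alone
    *)
      echo "/$path"
      ;;
  esac
}
```

Calling `rewrite_target blog/archives/2002_05.html ~/archivemap` should print /blog/archives/2002/05/, and an entry ID that isn't in the mapfile comes out as plain /blog/archives/.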

  6. Now try it out. Test out some URLs. When you're done, you probably want to clean out all the crud in your archive directory.
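If you'd rather test from a shell than click around in a browser, something like the following might help. It's a minimal sketch: `redirect_location` is a made-up helper name, it assumes you have curl installed, and the URL is the pretend site from step 5. A bare [R] flag answers with a 302 and a Location header, so that header is all the helper looks for.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print
# the value of the first Location header, if there is one.
redirect_location() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Example use (needs curl and a live server, so it's left commented out):
# curl -sI http://mywebsite.org/blog/archives/000123.html | redirect_location
```

An old-style URL should print the new dated URL; no output at all means the request wasn't redirected.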

Note that we only build the mapfile as a one-off. If you want new posts to be accessible via the old URL format, you'd have to rebuild the mapfile every time you add a new post (you could probably build an MT template for this, in fact). But in any case, if the map lookup fails, people just get rewritten to /blog/archives/. I've got my "master archive template" pointing to archives/index.html, so that's what they'll get if it all goes wrong - it's an acceptable fallback.
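For the rebuild mentioned above, a small shell function may be easier to rerun than the one-off loop. This is only a sketch: `build_archivemap` and its two arguments are names made up here, and it assumes entries live exactly three directories deep (year/month/day) with six-digit filenames, as in the example paths earlier. It walks the new dated files rather than the old flat ones, so it should keep working after the old copies have been cleaned out.

```shell
# Hypothetical helper: build_archivemap ARCHIVE_DIR OUTPUT_FILE
# Writes one "old-name new-path" line per dated entry, overwriting OUTPUT_FILE.
build_archivemap() {
  archive_dir=$1
  output_file=$2
  (
    cd "$archive_dir" || exit 1
    for path in */*/*/[0-9][0-9][0-9][0-9][0-9][0-9].html
    do
      # if nothing matched, the glob is left as-is and isn't a real file
      [ -f "$path" ] || continue
      # old flat filename, then the path relative to the archive directory
      echo "${path##*/} $path"
    done
  ) > "$output_file"
}
```

Running `build_archivemap /data/websites/mywebsite.org/blog/archives /data/websites/mywebsite.org/lib/archivemap` (both paths assumed from the examples above) after posting regenerates the whole file. mod_rewrite caches txt: map lookups only until the mapfile's modification time changes, so the server shouldn't need another prod.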

* 11:42 * geek