Blog moved to wordpress on openshift

I moved this blog a while back from Blogger to Wordpress. I was looking to move away from Blogger/Blogspot, to something self-hosted. I had come up with the following list to make the move seamless (for me as well as regular visitors):

Red Hat's OpenShift PaaS had just announced support for domain aliases for applications, so I started looking at what would be involved in moving the blog to their platform.

Read on for my experiences and details on deploying this Wordpress blog on OpenShift.

I had already played with OpenShift a bit, and loved their workflow of deploying apps using git. Deploying a wordpress install on OpenShift would mean I wouldn't have to manage my own servers, operating systems, software updates, etc. It's all on the stable and secure RHEL platform, with PHP managed by the RHEL team. So all I would need to worry about is the wordpress installation itself. As long as I routinely check for security updates to wordpress, and push those updates to the site, I should be doing OK.

So I created a new php-5.3 app using 'rhc-create-app'. mysql is needed for the database, so I also added an instance to the app with the command

rhc-ctl-app -e add-mysql-5.1 -a <appname>

To manage the mysql instance, a phpmyadmin cartridge is desirable too:

rhc-ctl-app -e add-phpmyadmin-3.4 -a <appname>

To make sure my custom domain works, let's add aliases as well:

rhc-ctl-app -a <appname> -c add-alias --alias log.amitshah.net
rhc-ctl-app -a <appname> -c add-alias --alias www.amitshah.net

I had used both log. and www. for the blog, so I kept both aliases so that both domains continue working. Of course, I changed the DNS CNAME entries for www. and log. to point to <appname>-<domainname>.rhcloud.com via my DNS provider's site.

Next, using the admin credentials for the mysql db, I created a new db and a new user, and gave the user all permissions on that db. All this is quite simple using the phpmyadmin interface.

That's it, all set with the app on OpenShift.

I then downloaded the latest wordpress release zip file (3.2.1 at the time) and extracted the files into a local directory.

Now here's where I started using the power of git and OpenShift: I created a git repo in the wordpress directory and added all files to it, and made an initial commit. This is my base from where I'll use wordpress.  New wordpress releases can be copied in this directory, and a new commit will map to the upstream release version. Any modifications to files I make in my wordpress installation (e.g. theme changes) are tracked in another branch in the same directory, with that branch being rebased on top of the latest release (the master branch).
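As a rough sketch of that branch layout (not the exact commands used for this blog), the following runs in a throwaway directory with placeholder files standing in for the real wordpress tree. The branch name 'custom', the file contents, and the later release number 3.3 are assumptions made purely for illustration.

```shell
set -e
work=$(mktemp -d)
mkdir "$work/wordpress"
cd "$work/wordpress"

# Placeholder standing in for the extracted 3.2.1 release files
echo "<?php // wordpress 3.2.1 placeholder" > index.php

git init -q
# Make sure the first branch is called master, whatever git's default is
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# master: one commit per upstream release
git add -A
git commit -q -m "wordpress 3.2.1"

# custom (hypothetical name): local modifications such as theme changes
git checkout -q -b custom
echo "/* theme tweak */" > style.css
git add -A
git commit -q -m "Local theme changes"

# A new upstream release is copied over the old files on master
git checkout -q master
echo "<?php // wordpress 3.3 placeholder" > index.php
git commit -q -am "wordpress 3.3"

# Replay the local modifications on top of the new release
git checkout -q custom
git rebase -q master
git log --oneline
```

After the rebase, the local changes sit as the newest commit on top of the latest release commit, so upgrading is a matter of committing the new release on master and rebasing again.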

With this setup, I can just copy the contents of this directory into my app's php directory and push the changes to OpenShift. The 'php' directory is where all the app code resides. I added all the files to the app's git repo and committed the result. I then created the wp-config.php file as a copy of the wp-config-sample.php file, modified it to suit my installation, committed the change, and also added the file to the other wordpress directory created in the first step above. Finally, I pushed the changes; the app was live on the cloud, and I could get started with wordpress's wizard-based installation.
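The copy-and-commit steps might look roughly like the sketch below. It runs in a temporary directory, with 'myapp' as a hypothetical stand-in for the clone of the OpenShift app and placeholder files instead of a real wordpress tree; the database name 'wordpress' is also an assumption. A real wordpress checkout contains a .git directory that should be left out of the copy.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-ins: "wordpress" is the release directory, "myapp" the app clone
mkdir -p wordpress myapp/php
echo "<?php // placeholder" > wordpress/index.php
echo "define('DB_NAME', 'database_name_here');" > wordpress/wp-config-sample.php
git init -q myapp
git -C myapp config user.name "Example Author"       # hypothetical identity
git -C myapp config user.email "author@example.com"

# Copy the wordpress files into the app's php directory and commit them
cp -R wordpress/. myapp/php/
git -C myapp add -A
git -C myapp commit -q -m "Add wordpress"

# Create wp-config.php from the sample and fill in the database name
sed "s/database_name_here/wordpress/" myapp/php/wp-config-sample.php \
    > myapp/php/wp-config.php
git -C myapp add php/wp-config.php
git -C myapp commit -q -m "Add wp-config.php"

# In the real app clone, the last step would be: git push
```

The other database settings in wp-config.php (user, password, host) would be edited the same way or by hand before committing.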

Now here's one oddity of hosting apps on OpenShift: the app directory isn't writable, or isn't the place where the app itself can make changes and assume they'd be preserved (I think this is a good thing). Since the app is deployed via git, any content written to the server app directory can be lost on the next git push. For wordpress, this means the 'uploads' directory has to be given a place where images, etc., can be uploaded without problems.

The OpenShift people have helpfully given us some environment variables and hooks in the app deployment process, which can be used to do this right.

The default wordpress uploads directory is 'wp-content/uploads'.  We can continue using this directory, with the following snippet placed in '.openshift/action_hooks/build':

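cd "$app_dir"
cat >> .openshift/action_hooks/build <<'EOF'
# Keep uploads in the persistent data directory
if [ ! -d "$OPENSHIFT_DATA_DIR/uploads" ]; then
    mkdir "$OPENSHIFT_DATA_DIR/uploads"
fi

# Link the persistent directory into the freshly deployed code
ln -sf "$OPENSHIFT_DATA_DIR/uploads" "$OPENSHIFT_REPO_DIR/php/wp-content/"
EOF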

This ensures the 'wp-content/uploads' location is available for wordpress to put stuff into, and it also ensures the content goes into a place where OpenShift will not destroy the data on the next git push.
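To see why the symlink keeps uploads safe, here is a small local simulation. It does not run on OpenShift: the two variables are set by hand to directories inside a temporary sandbox, and the deploy function is a hypothetical stand-in for what a git push does to the code directory, followed by the same steps as the build hook.

```shell
set -e
sandbox=$(mktemp -d)

# Stand-ins for the variables OpenShift sets on the server
OPENSHIFT_DATA_DIR="$sandbox/data"
OPENSHIFT_REPO_DIR="$sandbox/repo"
mkdir -p "$OPENSHIFT_DATA_DIR"

deploy() {
    # A push replaces the deployed code, so start from a clean tree
    rm -rf "$OPENSHIFT_REPO_DIR"
    mkdir -p "$OPENSHIFT_REPO_DIR/php/wp-content"
    # Same idea as the build hook: persistent directory plus symlink
    mkdir -p "$OPENSHIFT_DATA_DIR/uploads"
    ln -sfn "$OPENSHIFT_DATA_DIR/uploads" "$OPENSHIFT_REPO_DIR/php/wp-content/uploads"
}

deploy
# wordpress writes an upload through the symlink
echo "image bytes" > "$OPENSHIFT_REPO_DIR/php/wp-content/uploads/photo.jpg"

# A second deploy wipes the code directory but not the data directory
deploy
ls "$OPENSHIFT_REPO_DIR/php/wp-content/uploads"
```

The final ls still lists photo.jpg, because the file lives in the data directory and only the symlink is recreated on each deploy.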

OK, having done all this, I was now ready to import my older blog posts. I installed the blogger-to-wordpress and livejournal-to-wordpress plugins (well, since I'm doing this, I thought I might as well import my older lj entries), git push'ed them, and did the import from the web interface.

Comments from livejournal entries and some blogger posts didn't get fetched. I don't know why that happened. I tried the import a couple more times, but those posts didn't show up. I just decided to not bother about that; if there was any frequently-visited post, I could always go back and import it by hand. Since I didn't expect to do any more imports, I removed those plugins and pushed the result again.

There is a blogger-to-wordpress redirect plugin, but that plugin does a lot more than just redirecting: it imports images uploaded to blogger or picasaweb for the blogger posts, generates a blogger template to redirect blogger posts to wordpress, maps blogger posts to wordpress posts, etc. Most of this functionality is one-time: importing pictures, generating the blogger template for redirection, etc., so the plugin doesn't need to be present all the time (can't be too careful with php apps and security). I used the plugin to import all the blogger/picasaweb pictures it could fetch, and removed it as well.

I then enabled wordpress's custom URL structure, which allows blogger-like post URLs, with the year and month as well as post title in the URL. Enabling this needs .htaccess modifications, which wordpress can't make directly in our setup (because it can't write to the app directory). So I created a new .htaccess file in the php/ directory of the OpenShift app and included the snippet wordpress helpfully tells you it would write if the directory were writable (my code is in the snippet below).

I also took some hints from the blogger-to-wordpress plugin and created a minimal plugin that maps blogger URLs to wordpress URLs, and installed this plugin.

Next up was ensuring the older feeds kept working, and that the contents of the wp config file and directory listings weren't displayed. I also searched for some wordpress hardening tips, and compiled a fun-looking .htaccess file, snippet included below:

# Disable directory listing
Options -Indexes

<files .htaccess>
    order allow,deny
    deny from all
</files>

<files wp-config.php>
    order allow,deny
    deny from all
</files>

RewriteEngine On
RewriteBase /

# Most of the following comes from
# http://bloggertowp.org/migrate-from-blogger-to-wordpress-best-tutorial/

# Redirect feeds from labels
RewriteRule feeds/posts/default/-/(.*) category/$1/feed/ [L,R=301]

# Redirect older blogger RSS feeds
RewriteRule ^rss\.xml$ feed/ [L,R=301]
RewriteCond %{QUERY_STRING} ^alt=rss$
RewriteRule feeds/posts/default feed/? [L,R=301]

# Redirect older blogger ATOM feeds
RewriteRule ^atom\.xml$ feed/atom/ [L,R=301]
RewriteRule feeds/posts/default feed/atom/ [L,R=301]

# Redirect older blogger comments feeds
RewriteRule feeds/comments/default comments/feed/ [L,R=301]

# Redirect archives
RewriteRule ^([0-9]{4})_([0-9]{1,2})_([0-9]{1,2})_archive\.html$ $1/$2 [L,R=301]

# Redirect labels
RewriteRule ^search/label/(.*)$ category/$1/ [L,R=301]

# This is WP default: makes pretty URLs possible.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
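To get a feel for what the archive and label rules above do to a path, the same regular expressions can be tried out with sed. This only approximates the pattern matching; it is not how Apache evaluates the rules, and the function name and sample paths are made up for the example.

```shell
# Apply the archive and label patterns from the .htaccess to a request path
# (the path as mod_rewrite sees it in .htaccess, without the leading slash)
map_blogger_path() {
    printf '%s\n' "$1" | sed -E \
        -e 's#^([0-9]{4})_([0-9]{1,2})_([0-9]{1,2})_archive\.html$#\1/\2#' \
        -e 's#^search/label/(.*)$#category/\1/#'
}

map_blogger_path "2011_05_01_archive.html"   # prints 2011/05
map_blogger_path "search/label/linux"        # prints category/linux/
```

A blogger monthly archive page thus ends up on the wordpress month archive, and a label search ends up on the category page of the same name.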

I also installed the WP-Piwik and smart-404 plugins. WP-Piwik is a plugin that adds Piwik javascript code to give me a summary of the visits to the site, and the search keywords people use to land on my site. More on Piwik and its setup in a follow-up blog post. Smart-404 shows, on the 404 page, a list of pages with titles similar to the one requested. I had noticed a few 404 page hits via Piwik.

I've enabled the Akismet plugin that comes with the wordpress distribution, and it has flagged over 600 comments as spam so far, with just 2 false positives. That's impressive, but I intend to look further into this:

  1. Is there a way to reduce spam comments?
  2. Why do wordpress sites get spammed so much?

What I've seen so far is that people search for specific terms on the 'net, land on some post, and post a spam comment. So these are actual humans, not bots. Since they're investing enough effort into finding blogs and adding comments, spam prevention techniques like CAPTCHAs aren't going to work all the time. Akismet is working fine so far, so I'll continue using it, but I'm going to think about and search for ways to mitigate spam.

Overall, the move was really painless and done within a weekend; most of the time was spent learning about Wordpress and moving the existing posts to the new blog. There were hardly any OpenShift issues; it stayed nicely out of the way, and I really like that about the platform.

I still haven't figured out a way to map Blogger labels to Wordpress Categories/Tags; these are new concepts (to me), and I'll probably get something done here with some more htaccess trickery.