This post is part of a series on automatically backing up your website:
- Part 1: Getting Started with Amazon S3 and S3Fox
- Part 2: Using Amazon S3 from the Command Line with s3sync
- Part 3: Backing Up Your Data with Astrails-Safe
- Part 4: Automating Your Backup with Cron
Installing Astrails-Safe
Astrails-safe is a Ruby gem, so you can install it on your web server like so:
sudo gem install astrails-safe
(Note that there are interesting forks of astrails-safe, such as Tom Cocca’s fork that supports Rackspace Cloud files as well as email alerts if there is a problem during the backup process.)
Configuring Astrails-Safe
You need to set up a config file for each backup task. Currently I’m backing up two websites on my server, so I have two config files. The config files are written in Ruby. Follow the example configuration in the documentation. Here’s my configuration file for backing up techiferous.com:
safe do
  local :path => "/var/backups/techiferous/:kind/:id"

  s3 do
    key "your Amazon access key ID would go here"
    secret "your Amazon S3 secret access key would go here"
    bucket "backup.techiferous.com"
    path ":kind/:id"
  end

  keep do
    local 100
    s3 100
  end

  mysqldump do
    options "-ceKq --single-transaction --create-options"
    user "root or some other mysql user"
    password "ain't gonna tell ya"
    socket "/var/run/mysqld/mysqld.sock"

    database :techiferous
  end

  tar do
    archive "techiferous-com" do
      files "/var/www/techiferous.com/"
    end
  end
end
When astrails-safe runs, it fills in :kind with the type of backup it is doing (in my case, “mysqldump” when it is backing up my database and “archive” when it is backing up my website code). It uses :id to distinguish among the backups of a given kind. This would be useful if I were backing up multiple databases or directories in this config file. As it is, the :id is "techiferous" (the name of the database) when it’s backing up the database, and "techiferous-com" when it’s backing up the website code.
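To make the substitution concrete, here is a minimal sketch of how the two placeholders end up in the local path. The helper name `expand_backup_path` is made up for illustration; it only mimics the behavior described above and is not astrails-safe's own code.

```ruby
# Hypothetical helper (not part of astrails-safe): fills the :kind and :id
# placeholders of a path template with concrete values.
def expand_backup_path(template, kind:, id:)
  template.gsub(":kind", kind).gsub(":id", id)
end

template = "/var/backups/techiferous/:kind/:id"

# The database backup and the code archive land in separate directories.
puts expand_backup_path(template, kind: "mysqldump", id: "techiferous")
puts expand_backup_path(template, kind: "archive", id: "techiferous-com")
```

Because :kind and :id are both part of the path, adding a second database or archive to the same config file would not overwrite the existing backups.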
The keep do block lets you specify the number of backups to keep. Be careful that you don’t keep too many or your web server will run out of disk space and your web app will stop working.
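As a rough illustration of what a retention limit means, the sketch below picks out the backups that fall outside the newest N. It is an assumption-laden model, not how astrails-safe implements keep: the helper name `backups_to_prune` is hypothetical, and it relies on the timestamp embedded in the file names (as in the examples later in this post) sorting in chronological order.

```ruby
# Hypothetical helper (not astrails-safe's implementation): given backup file
# names that differ only in an embedded, sortable timestamp, return the ones
# that are older than the newest `keep` backups.
def backups_to_prune(filenames, keep)
  sorted = filenames.sort                      # oldest first
  sorted.first([sorted.size - keep, 0].max)    # everything beyond the limit
end

backups = [
  "mysqldump-techiferous.091027-1912.sql.gz",
  "mysqldump-techiferous.091025-1912.sql.gz",
  "mysqldump-techiferous.091026-1912.sql.gz"
]

# With a limit of 2, only the oldest backup would be removed.
puts backups_to_prune(backups, 2)
```

When you choose a limit, multiply it by the size of one backup to estimate how much disk space the local copies will need.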
Update: You need to be mindful of symlinks since astrails-safe doesn’t follow them automatically. If, for example, you are using Capistrano to deploy a Rails application, then you would expect to put
files "/var/www/yourapp/current"
in the above configuration file. That won’t work because the current directory is a symbolic link. One solution is to do this instead:
files File.readlink("/var/www/yourapp/current")
However, as Vitaly Kushner pointed out, a better solution is to use options "-h" inside the tar do block.
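If you are curious what the File.readlink approach actually resolves to, the following sketch builds a throwaway Capistrano-style layout in a temporary directory and inspects it. The helper `resolve_backup_dir` and the release directory name are hypothetical. One caveat: File.readlink returns the link target exactly as it was stored, which may be a relative path, so this sketch uses File.realpath to get an absolute path.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: returns the real directory behind a path. For a
# `current` symlink this is the release directory it points to; any other
# path is returned unchanged.
def resolve_backup_dir(path)
  File.symlink?(path) ? File.realpath(path) : path
end

Dir.mktmpdir do |root|
  # Recreate the layout: releases/<timestamp> plus a `current` symlink.
  release = File.join(root, "releases", "20091027191200")
  FileUtils.mkdir_p(release)
  current = File.join(root, "current")
  File.symlink(release, current)

  puts File.symlink?(current)        # is `current` a symbolic link?
  puts File.readlink(current)        # the target as stored in the link
  puts resolve_backup_dir(current)   # the fully resolved directory
end
```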
I chose to save my config files in /etc/astrails-safe. If you have a better suggestion of where to put them, let me know, as I’m no expert on where to put things according to the Linux FHS.
Backing Up Your Data
Now we finally get to the part where we back up your website! If you want to, you can do a dry run first:
sudo astrails-safe --dry-run /etc/astrails-safe/techiferous.rb
And now to actually back up your web site:
sudo astrails-safe --verbose /etc/astrails-safe/techiferous.rb
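Since there is one config file per backup task, you may end up running the same command several times. Here is a small sketch of a wrapper that runs astrails-safe once for every config file in a directory. Both function names are hypothetical, and the sketch assumes that the config files end in .rb, that they all live in one directory such as /etc/astrails-safe, and that the script itself is run with sufficient privileges (the commands above use sudo).

```ruby
# Hypothetical wrapper: builds the astrails-safe command line for one config
# file, using the same --dry-run and --verbose flags shown above.
def backup_command(config_path, dry_run: false)
  flag = dry_run ? "--dry-run" : "--verbose"
  ["astrails-safe", flag, config_path]
end

# Hypothetical wrapper: runs astrails-safe for every *.rb file in config_dir
# and prints a warning for each run that does not exit successfully.
def run_all_backups(config_dir, dry_run: false)
  Dir.glob(File.join(config_dir, "*.rb")).sort.each do |config|
    system(*backup_command(config, dry_run: dry_run)) or warn "backup failed: #{config}"
  end
end

# Example call (commented out so that loading this file has no side effects):
# run_all_backups("/etc/astrails-safe", dry_run: true)
```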
Restoring Your Data
We’re not done yet! No backup solution is complete until you’ve also tested the restoration process. So drop your database and rm -rf your web site code! Now that your millions of eager users are wondering why your site is down, let’s see whether we can restore it from your S3 backups.
First, let’s retrieve your database backup file from S3:
s3cmd get backup.techiferous.com:mysqldump/techiferous/mysqldump-techiferous.091027-1912.sql.gz /var/tmp/restore.sql.gz
Unzip it:
gunzip /var/tmp/restore.sql.gz
Create the database. One way is to use the MySQL command line client:
CREATE DATABASE techiferous;
EXIT;
Now from the Linux command line, run this command to restore the data to your database:
mysql techiferous -u root -p < /var/tmp/restore.sql
And don’t forget to clean up:
rm /var/tmp/restore.sql
Now let’s retrieve your backed up web site code from S3:
s3cmd get backup.techiferous.com:archive/techiferous-com/archive-techiferous-com.091027-1912.tar.gz /var/tmp/restore.tar.gz
Unzip it (note: this will unzip it to the current directory, so you might want to be in a temporary directory like /var/tmp):
cd /var/tmp
tar -xzvf /var/tmp/restore.tar.gz
Copy the restored files to wherever you serve your web sites from:
cp -r /var/tmp/var/www/techiferous.com /var/www/techiferous.com
And clean up after yourself:
rm -rf /var/tmp/var
rm /var/tmp/restore.tar.gz
And you’re done! Your web site should be up and running again.
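When you need to work out which object to fetch, it helps to see how the names in the two s3cmd get commands above are put together. The sketch below rebuilds them. The pattern is inferred only from those two example names, and reading the timestamp as YYMMDD-HHMM is an assumption, so check it against the actual contents of your bucket. The helper name `backup_object_key` is hypothetical.

```ruby
# Hypothetical helper: rebuilds the S3 object name of a backup, following the
# pattern visible in the two example commands in this post:
#   <kind>/<id>/<kind>-<id>.<timestamp><extension>
def backup_object_key(kind:, id:, time:, extension:)
  stamp = time.strftime("%y%m%d-%H%M")   # assumed format: YYMMDD-HHMM
  "#{kind}/#{id}/#{kind}-#{id}.#{stamp}#{extension}"
end

taken_at = Time.utc(2009, 10, 27, 19, 12)

puts backup_object_key(kind: "mysqldump", id: "techiferous",
                       time: taken_at, extension: ".sql.gz")
puts backup_object_key(kind: "archive", id: "techiferous-com",
                       time: taken_at, extension: ".tar.gz")
```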
Next Step: Automating The Backup Process
Now let’s set up your web server to back up your website for you.
There is an easier solution for the symlink “problem” than File.readlink.
You can pass arbitrary options to the ‘tar’ command that creates the archive.
tar do
  options "-h" # this will dereference symbolic links
  archive …
end
Also, I wouldn’t back up the ‘current’ directory anyway, since it contains data that is easier to recover with a simple deploy. What you do need to back up is the ‘shared’ dir of your Capistrano installation (probably with an exclude of the ‘logs’ directory).
Astrails-safe is a perfect solution!
Thanks for investing your time into this…