Every server needs to be backed up periodically. The trouble is finding an affordable place to store a filesystem that contains large amounts of data. Amazon S3 solves this with reasonably priced standard storage ($0.0300 per GB) as well as reduced redundancy storage ($0.0240 per GB) at the time of writing this article. Updated pricing can be found at http://aws.amazon.com/s3/pricing/
This short tutorial will show how to back up a server's filesystem using s3cmd, a command-line tool for uploading, retrieving, and managing data in Amazon S3. The implementation uses a cron job to automate the backup process, syncing the filesystem nightly.
Installing s3cmd
This example assumes you are using CentOS or RHEL. The s3cmd package is included in the default RPM repositories.
yum install s3cmd
After installation, the tool is ready to configure.
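To verify the installation succeeded, you can ask s3cmd to print its version:

s3cmd --version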
Configuring s3cmd
An Access Key and a Secret Key from your AWS account are required. These credentials can be found on the IAM page.
Start by logging in to AWS and navigating to the Identity & Access Management (IAM) service, where you will first create a new user.
Next, create a group. This group will hold the permissions that allow your user to access your S3 buckets. Notice that under Permissions the group has been granted “AmazonS3FullAccess”, which means any user in this group can modify any S3 bucket. To grant your new user access to the group, click “Add Users to Group” and select the new user from the list.
For s3cmd to connect to AWS, it requires a set of user security credentials. Generate an access key for the new user by navigating back to the user details page and opening the “Security Credentials” tab near the bottom of the page. Under Access Keys, click “Create Access Key”. This generates an Access Key ID and a Secret Access Key, both of which are required to configure s3cmd.
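As an aside, if you already use the AWS CLI with administrator credentials, the same IAM setup can be scripted from a terminal. This is only a sketch; the names backup-user and s3-backup are placeholders, while AmazonS3FullAccess is the managed policy mentioned above:

# Create the user and group, attach the S3 policy, and issue an access key
aws iam create-user --user-name backup-user
aws iam create-group --group-name s3-backup
aws iam attach-group-policy --group-name s3-backup --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam add-user-to-group --group-name s3-backup --user-name backup-user
aws iam create-access-key --user-name backup-user

The last command prints the Access Key ID and Secret Access Key in its output.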
You now have a user set up with permission to access the S3 API. Back on your server, enter the new access key into s3cmd. To begin configuration, type:
s3cmd --configure
You should now see the prompts below and be able to enter your Access Key ID and Secret Key.
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3
Access Key: xxxxxxxxxxxxxxxxxxxxxx
Secret Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: xxxxxxxxxx
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]: Yes

New settings:
  Access Key: xxxxxxxxxxxxxxxxxxxxxx
  Secret Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  Encryption password: xxxxxxxxxx
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Success. Encryption and decryption worked fine :-)

Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
At this point s3cmd is fully configured and ready to push data to S3. The final step is to create your own S3 bucket, which will serve as the storage location for your filesystem backup.
Setting up your first S3 bucket
Navigate to the AWS S3 service and create a new bucket. You can give the bucket any globally unique name and pick the region where the data will be stored. This bucket name will be used in the s3cmd command.
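Alternatively, since s3cmd is already configured, you can create the bucket from the command line and then list your buckets to confirm it exists (MYBUCKET below is a placeholder):

s3cmd mb s3://MYBUCKET
s3cmd ls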
Each file pushed to S3 is assigned a storage class of either standard or reduced redundancy storage. This is configurable when syncing files; for the purposes of this tutorial, all files will be stored in reduced redundancy storage.
Standard vs Reduced Redundancy Storage
The primary difference between the two options is durability, that is, how well your data is protected against loss. Standard storage is designed for 99.999999999% durability, whereas reduced redundancy storage (RRS) stores fewer copies of each object and is designed for 99.99% durability. As noted previously, RRS is considerably cheaper than standard storage, which makes it a reasonable fit for backups. For the use case of this tutorial, all files are stored in RRS.
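Note that the storage class is chosen per upload rather than per bucket. As a quick illustration with a hypothetical archive file, passing --reduced-redundancy stores the copy in RRS, while omitting it keeps the default standard storage:

s3cmd put --reduced-redundancy backup.tar.gz s3://MYBUCKET/
s3cmd put backup.tar.gz s3://MYBUCKET/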
Configuring a simple cron job
To open the cron job editor, simply type:
crontab -e
Once in the editor, create the cron job below, which will run at 3:30 a.m. every Monday through Friday.
30 3 * * 1-5 /usr/bin/s3cmd sync -rv --config /root/.s3cfg --delete-removed --reduced-redundancy /PATH/TO/FILESYSTEM/LOCATION/ s3://MYBUCKET/ >/dev/null 2>&1
This cron job calls the s3cmd sync command and loads the default configuration you entered above. The --delete-removed option tells s3cmd to scan for locally deleted files and remove them from the remote S3 bucket as well. The --reduced-redundancy option places all files in RRS for cost savings. Any folder location can be synced; just change the path to your desired location, and make sure to change MYBUCKET to the name of your S3 bucket.
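Before trusting the schedule, it is worth running the sync once by hand. Adding --dry-run makes s3cmd report what it would upload or delete without actually transferring anything:

/usr/bin/s3cmd sync -rv --dry-run --config /root/.s3cfg --delete-removed --reduced-redundancy /PATH/TO/FILESYSTEM/LOCATION/ s3://MYBUCKET/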
The server has now been configured to back up its filesystem to AWS S3 nightly using the s3cmd library. Enjoy!
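One last tip: restoring works by running the same sync command with the source and destination reversed. A minimal sketch, using the same placeholder bucket name and a hypothetical restore path:

/usr/bin/s3cmd sync -rv s3://MYBUCKET/ /PATH/TO/RESTORE/LOCATION/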