Back up a filesystem to Amazon S3


Every server needs to be backed up periodically. The trouble is finding an affordable place to store a filesystem that contains large amounts of data. Amazon S3 fits the bill: at the time of writing, standard storage costs $0.0300 per GB per month and reduced redundancy storage costs $0.0240 per GB per month. Updated pricing can be seen at
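To put those rates in perspective, the monthly storage bill is the stored size multiplied by the per-GB rate. The sketch below is a small shell helper (monthly_cost is a hypothetical name) that does the arithmetic with awk. The 500 GB figure is only an example, and the rates are the ones quoted above, which may have changed since.

```shell
# Hypothetical helper: estimated monthly storage cost = size in GB x price per GB-month.
# Rates used below are the ones quoted in this article; current prices may differ.
monthly_cost() {
    size_gb=$1   # data stored, in GB
    rate=$2      # USD per GB per month
    awk -v s="$size_gb" -v r="$rate" 'BEGIN { printf "%.2f\n", s * r }'
}

monthly_cost 500 0.0300   # standard storage
monthly_cost 500 0.0240   # reduced redundancy storage
```

For 500 GB this prints 15.00 for standard storage and 12.00 for reduced redundancy storage. Request and data transfer charges come on top of these figures.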

This short tutorial shows how to back up a server's filesystem using s3cmd, a command line tool for uploading, retrieving, and managing data in Amazon S3. A cron job automates the backup so that the filesystem is synced to S3 on a nightly schedule.

Installing s3cmd

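This example assumes you are using CentOS or RHEL. The s3cmd package is available as an RPM from the EPEL (Extra Packages for Enterprise Linux) repository, so if yum cannot find it, enable EPEL first (on CentOS: yum install epel-release).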

yum install s3cmd

Once installed, s3cmd is ready to configure.
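Before relying on the package in an unattended job, it can help to confirm where the binary was installed. Cron runs jobs with a minimal PATH, which is why the crontab entry in this tutorial calls /usr/bin/s3cmd by its absolute path. The helper below (locate_cmd is a hypothetical name) is a minimal sketch that prints a command's full path or fails if the command is missing.

```shell
# Hypothetical helper: print the full path of a command, or fail if it is missing.
# Cron jobs run with a minimal PATH, so the crontab entry should use this path.
locate_cmd() {
    if cmd_path=$(command -v "$1" 2>/dev/null); then
        printf '%s\n' "$cmd_path"
    else
        echo "error: $1 is not installed or not on PATH" >&2
        return 1
    fi
}

locate_cmd s3cmd   # the crontab entry below assumes this prints /usr/bin/s3cmd
```

If the printed path differs from /usr/bin/s3cmd, use that path in the crontab entry instead.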

Configuring s3cmd

s3cmd needs an Access Key and Secret Key from your AWS account. These credentials are created in the IAM console.

Start by logging in to AWS and navigating to the Identity & Access Management (IAM) service. First, create a new user for the backup.

Next, create a group that gives its members permission to access your S3 buckets. Under permissions, attach the “AmazonS3FullAccess” policy to the group, which means any user in this group can modify any S3 bucket in the account. To add your new user to the group, click “Add Users to Group” and select the user from the list.

For s3cmd to connect to AWS, it requires a set of security credentials for the user. Generate an access key for the new user by returning to the user details page. At the bottom of the page, open the “Security Credentials” tab and, under Access Key, click “Create Access Key”. This generates an Access Key ID and a Secret Access Key, both of which are required to configure s3cmd.

You now have a user set up with permissions to access the S3 API. Back on your server, enter the new access key into s3cmd. To begin configuration, type:

s3cmd --configure

s3cmd then prompts for your Access Key ID and Secret Key, followed by a few other options. A typical session looks like this:

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3
Access Key: xxxxxxxxxxxxxxxxxxxxxx
Secret Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: xxxxxxxxxx
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]: Yes

New settings:
  Access Key: xxxxxxxxxxxxxxxxxxxxxx
  Secret Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  Encryption password: xxxxxxxxxx
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Success. Encryption and decryption worked fine :-)

Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
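The saved file is plain text with one "key = value" setting per line, so it can be checked from the shell. The sketch below assumes that layout, which is how s3cmd writes keys such as use_https and access_key. The function names are hypothetical. Because the file holds the secret key, the second helper restricts it to owner-only permissions.

```shell
# Hypothetical helpers for the config file written by "s3cmd --configure".
# s3cfg_get prints the value of one "key = value" setting. The key is used
# inside a sed pattern, so pass plain names such as use_https.
s3cfg_get() {
    key=$1
    file=${2:-$HOME/.s3cfg}
    sed -n "s/^${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# The file contains the secret key, so keep it readable by its owner only.
secure_s3cfg() {
    chmod 600 "${1:-$HOME/.s3cfg}"
}

# Example (as root, after the configuration above):
#   secure_s3cfg /root/.s3cfg
#   s3cfg_get use_https /root/.s3cfg
```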

At this point s3cmd is fully configured and ready to push data to S3. The next step is to create an S3 bucket, which will serve as the storage location for the filesystem backup.

Setting up your first S3 bucket

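Navigate to the S3 service in the AWS console and create a new bucket. Pick the region where the data should be stored and choose a bucket name. Names must be unique across all of S3 and may contain only lowercase letters, numbers, dots, and hyphens. This bucket name will be used in the s3cmd command.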
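Because an invalid name is only rejected when you try to create the bucket, a quick local check can save a round trip. The function below is a hypothetical helper that covers only the basic rules: 3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no two adjacent dots; and not formatted like an IP address. AWS documents further restrictions, and only AWS can confirm that a name is not already taken.

```shell
# Hypothetical helper: basic S3 bucket-name check (not the complete AWS rule set).
valid_bucket_name() {
    name=$1
    # 3-63 chars of [a-z0-9.-], beginning and ending with a letter or digit.
    printf '%s\n' "$name" | grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$' || return 1
    # No two adjacent periods.
    case $name in
        *..*) return 1 ;;
    esac
    # Must not look like an IPv4 address such as 192.168.5.4.
    if printf '%s\n' "$name" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
        return 1
    fi
    return 0
}

valid_bucket_name my-server-backups && echo "ok"   # prints ok
valid_bucket_name My_Backups || echo "rejected"    # uppercase and underscore: prints rejected
```

Once a name passes, create the bucket in the console as described above. Alternatively, because s3cmd is already configured, its mb (make bucket) command can create it from the server, for example s3cmd mb s3://MYBUCKET.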

Each file pushed to S3 is assigned a storage class, such as standard or reduced redundancy storage. The class can be chosen when syncing files, and for the purposes of this tutorial all files will be stored in reduced redundancy storage.

Standard vs Reduced Redundancy Storage

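The primary difference between the two options is durability, meaning the likelihood that a stored object is lost, rather than how quickly you can access your data. Standard storage is designed for 99.999999999% durability of objects over a given year, whereas reduced redundancy storage (RRS) is designed for 99.99%. Both classes return data with the same low latency, and neither requires a multi-hour retrieval. RRS is therefore best suited to data that can be reproduced if a copy is lost, such as a backup whose originals still live on the server, which is why this tutorial stores all files in RRS. As noted previously, RRS is also cheaper than standard storage.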
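The durability percentages translate into an expected number of lost objects per year: the object count multiplied by one minus the durability. The sketch below (expected_annual_loss is a hypothetical name) does that arithmetic with awk for an example backup of 100,000 files. The results are averages derived from AWS's design targets, not guarantees.

```shell
# Hypothetical helper: expected objects lost per year = objects * (1 - durability).
# Durability is the annual design target AWS publishes, given as a percentage.
expected_annual_loss() {
    objects=$1
    durability_pct=$2
    awk -v n="$objects" -v d="$durability_pct" \
        'BEGIN { printf "%.3g\n", n * (1 - d / 100) }'
}

expected_annual_loss 100000 99.99          # RRS: prints 10
expected_annual_loss 100000 99.999999999   # Standard: prints 1e-06
```

For 100,000 files, RRS averages about 10 lost objects a year, while standard storage averages one lost object every million years.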

Configuring a simple cron job

To open the crontab editor, type the following as root (the job reads /root/.s3cfg):

crontab -e

In the editor, add the entry below, which runs Monday through Friday at 3:30 a.m.

30      3       *       *       1-5     /usr/bin/s3cmd sync -rv --config /root/.s3cfg --delete-removed --reduced-redundancy /PATH/TO/FILESYSTEM/LOCATION/ s3://MYBUCKET/ >/dev/null 2>&1
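The one-line entry discards all output, so a failed backup goes unnoticed. A possible alternative is to point cron at a small wrapper script. The sketch below is hypothetical: the script path, log path, and lock path are placeholders. It appends s3cmd's output and exit status to a log file and uses flock (part of util-linux) so that a slow run is skipped rather than overlapped by the next one.

```shell
#!/bin/sh
# Hypothetical wrapper script (e.g. /usr/local/bin/s3-backup.sh) for the cron job.
# It logs s3cmd's output instead of discarding it and uses flock so that a
# slow run is skipped rather than overlapped. All paths are placeholders.
SRC=${SRC:-/PATH/TO/FILESYSTEM/LOCATION/}
BUCKET=${BUCKET:-s3://MYBUCKET/}
LOG=${LOG:-/var/log/s3-backup.log}
LOCK=${LOCK:-/var/lock/s3-backup.lock}
S3CMD=${S3CMD:-/usr/bin/s3cmd}

run_backup() {
    (
        # Take an exclusive, non-blocking lock on file descriptor 9.
        if ! flock -n 9; then
            echo "$(date) previous backup still running, skipping" >> "$LOG"
            exit 1
        fi
        echo "$(date) backup started" >> "$LOG"
        "$S3CMD" sync -rv --config /root/.s3cfg --delete-removed --reduced-redundancy \
            "$SRC" "$BUCKET" >> "$LOG" 2>&1
        status=$?
        echo "$(date) backup finished with exit status $status" >> "$LOG"
        exit "$status"
    ) 9> "$LOCK"
}

# Only run when invoked as "s3-backup.sh run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    run_backup
fi
```

With the script in place and made executable, the crontab entry would become 30 3 * * 1-5 /usr/local/bin/s3-backup.sh run, and each run's start time, file list, and exit status end up in the log.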

This cron job runs the s3cmd sync command using the configuration saved earlier to /root/.s3cfg. The --delete-removed option tells s3cmd to look for files that were deleted locally and remove them from the remote S3 bucket as well. The --reduced-redundancy option stores all uploaded files in RRS for cost savings. Any folder can be synced; just change the path to the location you want to back up. Make sure to replace MYBUCKET with the name of your S3 bucket.
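Because --delete-removed mirrors local deletions, an empty source directory (for example, a volume that failed to mount) would make every remote file look deleted. The hypothetical guard below is a minimal sketch of a check a wrapper script could run first. It refuses to continue when the directory is missing or empty.

```shell
# Hypothetical guard to run before a --delete-removed sync: refuse to continue
# if the source directory is missing or empty, since every remote file would
# then look locally deleted.
source_ready() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "error: $dir is not a directory" >&2
        return 1
    fi
    # ls -A lists everything except . and ..; no output means the directory is empty.
    if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "error: $dir is empty, refusing to sync" >&2
        return 1
    fi
    return 0
}
```

In a wrapper script, source_ready /PATH/TO/FILESYSTEM/LOCATION/ && followed by the same s3cmd sync command used in the crontab entry runs the sync only when the check passes. It does not catch a directory that is only partly missing its files.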

The server is now configured to back up the filesystem to Amazon S3 every weeknight using s3cmd. Enjoy!