Amazon AWS instance volume backup: automate rolling snapshots and purge schedule

“Amazon, Amazon,
Is it heaven or hell?”
The Band’s song “Amazon (River Of Dreams)” from “Jericho”, 1993

We begin a series of posts on our first-hand experience with the Amazon Web Services (AWS) hosting platform and its accompanying technologies. Reportedly possessing ten times more cloud resources than the other top 14 IaaS providers combined (according to May 2015 reports), Amazon today is the behemoth reigning over them all. Economics aside, what amazes us most about AWS is how little its tons of feature-rich offerings, its abundance of documentation, and its gazillions of online discussions offer a young pioneer taking on her first AWS quest. It is easy to launch an instance with AWS, but most of the subsequent steps require planning and, in some cases, profound research: in other words, a difficult path full of trials, tribulations, and overdrawn accounts. The pay-as-you-go paradigm needs budgeting and verification to be efficient.

Certainly, over the last few years Amazon has extensively developed and enhanced its numerous Web management interfaces and consoles, APIs, backends and frontends, fixed annoying glitches and inconsistencies, and introduced security and network setup improvements. Today you no longer hear about instances mysteriously and irrecoverably lost, dead storage links, forced instance reboots, and scores of Web startups in disarray (those careless enough to bet everything they had on mighty Amazon). However, in our view, what AWS still lacks are detailed recipes and use cases (some of them are so simple!). Precious pieces of that wisdom are scattered across the Web or buried in various online conundrums. So, if you do not mind, let us help you with what we think we are good at: recipes, as detailed, verified, and complete as a successful implementation requires.

First things first: let us talk about backup automation. EBS volumes are persistent by definition, but they are not immune to data loss caused by logical errors or physical failures.

AWS EC2 lets us take volume snapshots, which are stored seamlessly in S3, and create instance images (based on the same snapshot feature). According to the Amazon EBS documentation, a new snapshot adds storage only for the data blocks changed since the previous snapshot:

“If you have a device with 100 GB of data but only 5 GB has changed after your last snapshot, a subsequent snapshot consumes only 5 additional GB and you are billed only for the additional 5 GB of snapshot storage, even though both the earlier and later snapshots appear complete.”

This cost-effective feature makes snapshots a good fit for rolling backup automation. To implement it, we used shell commands on a small Debian Linux instance. While the Web is full of links to third-party tools and scripts, we chose to stick with Amazon's native tools of the trade (the AWS CLI), keeping it simple as a rule of thumb.
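To put the incremental billing quoted above into rough numbers, here is a back-of-the-envelope sketch. It assumes the first snapshot stores the full data set, every later snapshot adds about the same amount of changed data, and nothing has been deleted yet; the helper name estimate_snapshot_gb is hypothetical.

```shell
#!/bin/sh
# Rough estimate of stored (billed) GB for a chain of snapshots.
# Assumptions: the first snapshot holds the full data set, each later
# snapshot adds about the same amount of changed data, no deletions yet.
estimate_snapshot_gb() {
    full_gb=$1      # data on the volume when the first snapshot is taken
    changed_gb=$2   # data changed between two consecutive snapshots
    count=$3        # number of snapshots in the chain
    if [ "$count" -le 0 ]; then
        echo 0
        return
    fi
    # Full copy once, plus one increment per additional snapshot.
    echo $(( full_gb + changed_gb * (count - 1) ))
}

# The documentation example: 100 GB of data, 5 GB changed, two snapshots.
estimate_snapshot_gb 100 5 2    # prints 105
# Five daily snapshots with about 5 GB of daily changes.
estimate_snapshot_gb 100 5 5    # prints 120
```

Deleting the oldest snapshot removes only the data that no later snapshot still references, so once rotation starts the real figure drifts away from this simple sum; treat it as a budgeting starting point rather than a bill.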

For this demo we used two servers: Linux server Lsrv1 (#1) and Windows server Wsrv2 (#2). Linux server #1 has a root volume (/) and a larger /home volume; Windows server #2 has a boot volume C: and a larger D: drive.

We defined rotating snapshot creation on daily and weekly schedules with the following crontab entries:

# Create snapshots:
0 3 * * * /usr/bin/aws ec2 create-snapshot --volume-id vol-bc3158d1 --description Lsrv1_vol_home
0 4 * * 0 /usr/bin/aws ec2 create-snapshot --volume-id vol-4fd23a87 --description Lsrv1_vol_root
0 5 * * 0 /usr/bin/aws ec2 create-snapshot --volume-id vol-b15d2036 --description Wsrv2_vol_C
0 6 * * 0 /usr/bin/aws ec2 create-snapshot --volume-id vol-e273c82c --description Wsrv2_vol_D
# Purge snapshots:
0 22 * * * ~/bin/ec2-delete-snapshots.sh vol-bc3158d1 Lsrv1_vol_home 5
1 22 * * 1 ~/bin/ec2-delete-snapshots.sh vol-4fd23a87 Lsrv1_vol_root 3
2 22 * * 1 ~/bin/ec2-delete-snapshots.sh vol-b15d2036 Wsrv2_vol_C 7
3 22 * * 1 ~/bin/ec2-delete-snapshots.sh vol-e273c82c Wsrv2_vol_D 10

That translates to English as follows:

Rules to create snapshots:
– Linux server #1 /home volume at 3am every day.
– Linux server #1 root volume at 4am every Sunday.
– Windows server #2 drive C: volume at 5am every Sunday.
– Windows server #2 drive D: volume at 6am every Sunday.
Rules to purge outdated snapshots:
– Linux server #1 /home volume at 10pm every day, keeping at least 5 snapshots.
– Linux server #1 root volume at 10:01pm every Monday, keeping at least 3 snapshots.
– Windows server #2 drive C: volume at 10:02pm every Monday, keeping at least 7 snapshots.
– Windows server #2 drive D: volume at 10:03pm every Monday, keeping at least 10 snapshots.
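To see what these keep counts mean in calendar time, a small arithmetic sketch: assuming exactly N snapshots are retained and one is taken every interval, the oldest restore point is (N - 1) intervals older than the newest one. The helper retention_span_days is hypothetical.

```shell
#!/bin/sh
# Age in days of the oldest retained snapshot relative to the newest one.
# Assumption: exactly KEEP snapshots are retained, one taken every
# INTERVAL_DAYS days (1 for the daily schedule, 7 for the weekly ones).
retention_span_days() {
    keep=$1
    interval_days=$2
    if [ "$keep" -le 1 ]; then
        echo 0
        return
    fi
    echo $(( (keep - 1) * interval_days ))
}

retention_span_days 5 1     # /home, daily, keep 5: prints 4
retention_span_days 3 7     # root, weekly, keep 3: prints 14
retention_span_days 7 7     # drive C:, weekly, keep 7: prints 42
retention_span_days 10 7    # drive D:, weekly, keep 10: prints 63
```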

For the aws commands to work, we suggest creating a dedicated AWS IAM user, ec2-admin. The user name does not really matter, but it is better to use a separate IAM user rather than a personal one. Create that user and generate its access keys in the IAM Management console as follows:

IAM console – Users – Create new user – Generate access keys

In the process you obtain a key pair similar to the following example:

Access Key ID: AKIAID36ZRPYB2RH76ZQ
Secret Access Key: Y9xMFHzMr0o4PRuiTmXICSEXs5dXu73U2uA6Et5L

Next, configure the AWS CLI (the aws command) with the new key values in the Linux shell, under the Linux user that will control your EBS snapshot rotation:

$ aws configure
AWS Access Key ID [None]: AKIAID36ZRPYB2RH76ZQ
AWS Secret Access Key [None]: Y9xMFHzMr0o4PRuiTmXICSEXs5dXu73U2uA6Et5L
Default region name [None]: us-east-1
Default output format [None]:
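A quick sanity check of the new configuration is to ask AWS which account the keys belong to; the printed account ID is also the value the purge script further down expects in its owner variable. This is a sketch: aws sts get-caller-identity is a real command that needs no IAM permissions (so it works before any policy is attached), while the helper name aws_account_id is hypothetical.

```shell
#!/bin/sh
# Print the AWS account ID that the configured CLI keys belong to.
# sts get-caller-identity requires no IAM permissions, so this works even
# before the snapshot policy is attached to the ec2-admin user.
aws_account_id() {
    account=$(aws sts get-caller-identity --query Account --output text) || return 1
    # An account ID consists of digits only; anything else means trouble.
    case $account in
        ''|*[!0-9]*)
            echo "unexpected account ID: $account" >&2
            return 1
            ;;
    esac
    echo "$account"
}

# Usage (needs the configured AWS CLI):
#   aws_account_id
```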

Then grant the ec2-admin user the proper permissions through a custom IAM policy attached to that user (this is also done in the IAM Management console):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt201511221",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:CreateTags",
                "ec2:DeleteSnapshot",
                "ec2:DescribeSnapshots",
                "ec2:DescribeVolumes"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
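For readers who prefer the command line to the IAM console, here is a sketch of attaching the same JSON as an inline user policy with aws iam put-user-policy. Assumptions: the policy above is saved as ec2-snapshots-policy.json, the policy name ec2-snapshot-rotation and the helper attach_snapshot_policy are hypothetical, and the command runs under an administrator profile, since the restricted ec2-admin keys have no IAM rights.

```shell
#!/bin/sh
# Attach a policy document as an inline policy of an IAM user.
# Must run with credentials that are allowed to manage IAM (not ec2-admin).
attach_snapshot_policy() {
    user=$1          # IAM user name, e.g. ec2-admin
    policy_file=$2   # path to the JSON policy document
    if [ ! -r "$policy_file" ]; then
        echo "policy file not readable: $policy_file" >&2
        return 1
    fi
    aws iam put-user-policy \
        --user-name "$user" \
        --policy-name ec2-snapshot-rotation \
        --policy-document "file://$policy_file"
}

# Usage:
#   attach_snapshot_policy ec2-admin ec2-snapshots-policy.json
```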

The final touch is a simple shell script that purges one snapshot per run, the oldest one, in a set identified by a common volume ID and snapshot description. Place the script in the bin folder in the home directory of the Linux user that runs the cron jobs and has the AWS CLI configured with the proper access keys, and make it executable (chmod +x). In our case it was /home/ec2admin/bin/ec2-delete-snapshots.sh, run by the Linux user ec2admin (again, the user name does not matter, except that it determines the home folder location).

This purge script takes three positional parameters:

volume_id
snapshot_description
snapshots_to_keep
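Since the script trusts these three values, a typo in a crontab line can make it count or purge the wrong set. A minimal validation sketch, assuming EBS volume IDs look like vol- followed by lowercase hexadecimal digits; validate_purge_args is a hypothetical helper, not part of the original script.

```shell
#!/bin/sh
# Check the three purge-script parameters before any AWS call is made.
# Assumption: volume IDs are "vol-" followed by lowercase hex digits.
validate_purge_args() {
    if [ $# -ne 3 ]; then
        echo "usage: ec2-delete-snapshots.sh volume_id snapshot_description snapshots_to_keep" >&2
        return 1
    fi
    case $1 in
        vol-*[!0-9a-f]*|vol-) echo "bad volume id: $1" >&2; return 1 ;;
        vol-*) ;;  # looks like a volume ID
        *) echo "bad volume id: $1" >&2; return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "empty snapshot description" >&2
        return 1
    fi
    case $3 in
        ''|*[!0-9]*) echo "snapshots_to_keep must be a whole number: $3" >&2; return 1 ;;
    esac
    if [ "$3" -lt 1 ]; then
        echo "snapshots_to_keep must be at least 1" >&2
        return 1
    fi
}
```

In a script, a call such as validate_purge_args "$@" || exit 1 near the top would stop a malformed cron entry before it reaches aws ec2 delete-snapshot.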

Remember to replace the owner variable in the code with your own AWS account ID (the owner ID of your snapshots); otherwise the script will not find your snapshots:

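#!/bin/bash
# Delete the oldest snapshot of a volume once more than N snapshots exist.
# $1 volume id, $2 snapshot description, $3 N snapshots to keep
if [ $# -ne 3 ]; then
    echo "Usage: $0 volume_id snapshot_description snapshots_to_keep" >&2
    exit 1
fi
owner=23250598230   # replace with your own AWS account ID
snapshots=$(aws ec2 describe-snapshots --output text --owner-ids "$owner" | grep "$1" | grep "$2" | wc -l)
echo "Snapshots found for $2: $snapshots."
# Keep at least $3 snapshots: exit unless there are more than that.
if [ -z "$snapshots" ] || [ "$snapshots" -le "$3" ]; then
    exit 2
fi
# Tab-delimited text output: sort on field 8 (start time) and cut field 7
# (snapshot ID), the encrypted-volume column layout (see Correction #2 below).
snapid=$(aws ec2 describe-snapshots --output text --owner-ids "$owner" | grep "$1" | grep "$2" | sort -t$'\t' -k8,8 | cut -f7 | head -n1)
echo "Snapshot ID to purge: $snapid"
if [ -n "$snapid" ]; then
    aws ec2 delete-snapshot --snapshot-id "$snapid"
fi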
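The script above deletes at most one snapshot per run, so if a set ever grows well past its limit (say, after a few manual snapshots with the same description), it catches up only one cron run at a time. A minimal sketch of the alternative, computing every excess snapshot in one pass; it assumes the listing has already been reduced to one "StartTime<TAB>SnapshotId" line per snapshot, and snapshots_to_purge is a hypothetical helper name.

```shell
#!/bin/sh
# Print the IDs of all snapshots beyond the newest KEEP, one per line.
# Input on stdin: one snapshot per line as "StartTime<TAB>SnapshotId".
# ISO 8601 timestamps in a single time zone sort correctly as plain text.
snapshots_to_purge() {
    keep=$1
    tab=$(printf '\t')
    # Newest first, skip the KEEP newest, print the IDs of the rest.
    sort -r -t "$tab" -k1,1 | tail -n "+$((keep + 1))" | cut -f2
}

printf '%s\t%s\n' \
    2016-01-03T03:00:00.000Z snap-33333333 \
    2016-01-01T03:00:00.000Z snap-11111111 \
    2016-01-04T03:00:00.000Z snap-44444444 \
    2016-01-02T03:00:00.000Z snap-22222222 |
    snapshots_to_purge 2
# prints snap-22222222, then snap-11111111

# Possible wiring (not run here):
#   aws ec2 describe-snapshots --owner-ids self \
#       --filters "Name=volume-id,Values=vol-bc3158d1" \
#       --query 'Snapshots[].[StartTime,SnapshotId]' --output text |
#       snapshots_to_purge 5 |
#       while read -r id; do aws ec2 delete-snapshot --snapshot-id "$id"; done
```

The trade-off is deliberate: purging everything at once converges immediately, while the original one-per-run approach limits the damage if a filter or description ever matches the wrong snapshots.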

Notes:
It makes little difference which Linux instance runs the aws cron commands. It might be Linux server #1 from the sample schedule above, or an independent Linux instance, a sort of server #3, if you want.

Depending on your favorite Linux distro and shell, some minor details of this setup may vary.

Correction: the shell script source above contained a bug in a previous edition of this article (fixed on January 15, 2016). Please see our comment below for details.

Correction #2: thanks to our readers, we can now offer a solution for encrypted volumes. As discussed in the comments below, for encrypted volumes the text output of the aws command changes (oh why, Amazon, why?!): an additional column, the encryption key ID, appears in the middle of the output and shifts the following columns by one. To handle that, the line starting with snapid= in ec2-delete-snapshots.sh above sorts on field 8 and cuts field 7 (“sort -k8 | cut -f7”); if your volumes are not encrypted, use the original positions, “sort -k7 | cut -f6”, instead.
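A way to sidestep column positions altogether is to let the AWS CLI filter the snapshots and print only the fields you need, in a fixed order. A sketch, assuming the describe-snapshots volume-id and description filters and the global --query option; oldest_snapshot_id is a hypothetical helper, and "self" is the CLI's shorthand for your own account in --owner-ids.

```shell
#!/bin/sh
# Print the ID of the oldest snapshot for a volume and description without
# depending on text-output column positions (which shift for encrypted
# volumes). --filters narrows the listing on the AWS side; --query prints
# exactly two columns per snapshot: StartTime, then SnapshotId.
oldest_snapshot_id() {
    aws ec2 describe-snapshots --owner-ids "${3:-self}" \
        --filters "Name=volume-id,Values=$1" \
                  "Name=description,Values=$2" \
        --query 'Snapshots[].[StartTime,SnapshotId]' \
        --output text |
        sort -k1,1 | head -n 1 | cut -f2
}

# Usage, e.g. in place of the snapid= pipeline:
#   snapid=$(oldest_snapshot_id vol-bc3158d1 Lsrv1_vol_home)
```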

Comments

Andrei Spassibojko
01/15/2016 at 3:51 pm

Please note: we learned that our scripts ec2-copy-snapshot.sh and ec2-delete-snapshots.sh had a nasty bug that had to be addressed:

When finding the latest snapshot, the listing came out in random order (thanks a bunch, Amazon!). We had to add a “sort -k7” command to the Linux pipe chain to sort the snapshots in chronological order and thus obtain the latest snapshot ID correctly:

snapid=`aws ec2 describe-snapshots --output text --owner-ids $owner | grep "$1" | sort -k7 | cut -f6 | tail -n1`

We have edited the main article with that adjustment.

Sam
02/26/2016 at 12:41 am

It looks like you need to change your cut to column 7; otherwise the Progress % value is grabbed:

snapid=`aws ec2 describe-snapshots --output text --owner-ids $owner | grep "$1" | sort -k7 | cut -f7 | tail -n1`

- Andrei Spassibojko
  02/26/2016 at 12:55 am
  
  Thanks for the comment, but we could not confirm that. In our AWS environment the snapshot ID is field number 6 and Progress is field number 5, so cut -f6 currently produces the snapshot ID. No need to adjust anything. Please explain further if you still see a discrepancy.
  
  - Chris Leipold
    08/30/2016 at 8:25 am
    
    I’m using your script (thanks a lot!) and also noticed that Amazon changed the parameters:
    $ aws ec2 describe-snapshots --output text --owner-ids $owner | grep "$1" | grep "$2" | sort -k7 | cut -f6
    100%
    100%
    100%
    
    My guess is:
    If you only use unencrypted volumes, your output is something like:
    Description, Encrypted, VolumeId, State, VolumeSize, Progress, StartTime, SnapshotId, OwnerId
    Otherwise, it’s:
    Description, Encrypted, VolumeId, *KmsKeyId*, State, VolumeSize, Progress, StartTime, SnapshotId, OwnerId
    
    Any thoughts?
    
    - Andrei Spassibojko
      08/30/2016 at 10:49 am
      
      Hello Chris! It is nice to hear from you. We think this change would simply shift the positional numbers of the field references:
      
      sort -k8 | cut -f7
      
      But we will confirm this for you when we have an opportunity.
      
      - Chris Leipold
        08/31/2016 at 3:41 am
        
        I can confirm that it works with cut -f7 (this was the fix I had already applied). The important point is that users need to know whether they use encryption or not :)
        
        Cheers,
        Chris
Aaron
07/27/2017 at 1:08 am

Good stuff. On the sort, you’ll want to add -t$'\t' to the -k7 switch. It sets a tab delimiter (which cut already uses by default) and helps if you have spaces in your name/description.

Other than that, good stuff. The IAM stuff has changed around a tad, but if you’re determined, you can get through it easy enough.
