Automatically Deploying Website from Git to AWS S3

I am a big fan of Amazon AWS – this blog has been running on it for a few years now. Since moving to AWS S3 (for storage) and CloudFront (as a Content Delivery Network) to host static websites, such as my homepage, I have been trying to work out how to get them to deploy automatically when I update the Git repository I use to manage the source code. I looked into it in some detail last year and concluded that AWS CodePipeline would get me close, but would require a workaround, as it did not support deploying to S3. In the end I decided that a custom AWS Lambda function was needed.

Lambda is a service that hosts your code, ready to run when triggered, without you needing to run a server. You are only billed for the time your code is running (above a free threshold), so it is perfect for small, infrequent jobs, such as deploying changes to a website, or even for use with Alexa for home automation. It seemed like an interesting area to explore and learn about, but I think I went in at the deep end, trying to develop a complex function in an unfamiliar language (Node.js) on an unfamiliar platform. Then other tasks popped up and it fell by the wayside.

Then earlier this year I saw an announcement from AWS that CodePipeline would now support deploying to S3, and thought my problem had been solved. I must admit, though, that I was a bit disappointed not to have the challenge of coding it myself. Fast forward a few months and I had the opportunity to set up the CodePipeline, which was very easy. However, it only copied the code from the Git repository to the S3 bucket. It did not refresh CloudFront, so my problem remained unsolved.

The CodePipeline did allow an extra step to be added at the end of the process, which could be a Lambda function, so I went off in search of a Lambda function to trigger a CloudFront invalidation when an S3 bucket had been updated. The first result I found was a blog post by Miguel Ángel Nieto, which explained the process well, but was designed to work with a single S3 bucket and a single CloudFront distribution. As I have multiple websites, I wanted a solution that I could deploy once and use for all of them, so my search continued. Next I came across a blog post by Yago Nobre, which looked to do exactly what I needed, except that I could not get the source code to work. I tried debugging it for a while, but was not making much progress. It did, however, show me how to link a bucket to a CloudFront distribution, how to trigger the Lambda function from the bucket, and how to use the Boto3 AWS SDK for Python to identify the triggering bucket and find its CloudFront distribution. These were all the things missing from the first blog post and its sample code. Fortunately both were written in Python using Boto3, so I was able to start work on merging them.
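The step of finding "the CloudFront distribution for this bucket" can be sketched as follows. The sketch assumes the distribution's origin domain name begins with the bucket name followed by `.s3`, which holds for the usual `bucket.s3.amazonaws.com` and S3 website endpoint forms. The function name `find_distribution_id` is hypothetical and is not taken from either blog post or from my repository. The client is passed in, so it can be a real `boto3.client("cloudfront")` or a stand-in for testing.

```python
def find_distribution_id(cloudfront, bucket):
    """Return the Id of the first CloudFront distribution that has an origin
    in the given S3 bucket, or None if no distribution uses that bucket.

    `cloudfront` is expected to be a boto3 CloudFront client, for example
    boto3.client("cloudfront").
    """
    # Matches origins such as "example.com.s3.amazonaws.com" or
    # "example.com.s3-website-eu-west-1.amazonaws.com" (an assumption).
    prefix = bucket + ".s3"
    paginator = cloudfront.get_paginator("list_distributions")
    for page in paginator.paginate():
        # "Items" is omitted from the response when there are no distributions
        for dist in page["DistributionList"].get("Items", []):
            for origin in dist["Origins"]["Items"]:
                if origin["DomainName"].startswith(prefix):
                    return dist["Id"]
    return None
```

Matching on the origin rather than on a hard-coded distribution ID is what allows a single function to serve several websites. If one bucket fed more than one distribution, this version would only return the first match.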

I was not terribly familiar with Python, to the point of having to search for how to write comments in the code, but I saw it as a good learning experience. What I actually found harder than the new-to-me language was coding in the Lambda Management Console. I had to work there because both the inputs and outputs of the function were other AWS services, which meant I could not develop locally on my Mac. Discovering the CloudWatch Logs console made things easier, as I could use the print() function to check the values of variables at various stages of the function and work out where the problems were. The comprehensive AWS documentation, particularly the Python Code Samples for S3, was also helpful. Another slight difficulty was the short delay between the bucket being updated and the Lambda function triggering; it was only a few minutes, but enough to add some confusion to the process.
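A small sketch of that debugging approach: it pulls the bucket name and object key out of each record and print()s them, so they appear in the function's CloudWatch log stream. It assumes the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`). The helper name `changed_objects` is hypothetical.

```python
from urllib.parse import unquote_plus


def changed_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded (for example, a space becomes '+'),
    so they are decoded before use.
    """
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        # In Lambda, anything printed is written to the function's CloudWatch log
        print(f"{record.get('eventName')}: s3://{bucket}/{key}")
        pairs.append((bucket, key))
    return pairs
```

Printing the event name alongside the key also makes it easy to tell uploads (`ObjectCreated:*`) from deletions (`ObjectRemoved:*`) when reading the logs.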

Eventually I got to a point where adding or removing a file in an S3 bucket would trigger an invalidation in the correct CloudFront distribution. In the end I did not need to link it to the end of the CodePipeline, as the Lambda function is triggered by the update to the S3 bucket (which is itself done by CodePipeline). All that was left was to tidy up the code, write some documentation, and share it on GitHub for anyone to use or modify. I have kept this post focused on the background to the project; the code and instructions for using it are all on GitHub.
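Here is a sketch of the final step, turning the changed keys into a CloudFront invalidation request. It uses Boto3's `create_invalidation` call. The helper name `invalidate` is hypothetical, and the client is passed in (for example `boto3.client("cloudfront")`). The sketch relies on two CloudFront requirements: invalidation paths must start with `/`, and each request needs a unique `CallerReference`. It also URL-encodes unsafe characters such as spaces.

```python
import uuid
from urllib.parse import quote


def invalidate(cloudfront, distribution_id, keys):
    """Create a CloudFront invalidation for the given S3 object keys and
    return its Id, or None if there was nothing to invalidate.
    """
    # Deduplicate, add the leading '/', and URL-encode characters such as spaces
    paths = sorted({quote("/" + key.lstrip("/"), safe="/") for key in keys})
    if not paths:
        return None
    response = cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": len(paths), "Items": paths},
            # Must be unique for every invalidation request
            "CallerReference": str(uuid.uuid4()),
        },
    )
    return response["Invalidation"]["Id"]
```

Invalidating only the changed paths keeps the rest of the cache warm. A simpler alternative is to invalidate the single wildcard path `/*`, which clears the whole distribution.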

This code probably only saves a few minutes each time I update one of my websites, and it may take a number of years to cancel out the time I spent working on it. That is even more true if I factor in the time spent on the original version before the CodePipeline to S3 announcement, but I find coding so much more rewarding when it solves an actual problem. I also feel like I have levelled up as a geek by publishing my first repository on GitHub. Now, with this little project out of the way, I can start work on a new server and WordPress theme for this blog, which was one of my goals for 2019.

Saved by the Backup

In my last post I explained my backup routine for WordPress. I wasn't planning on testing it out so soon, but it has just saved my bacon! The plan was to spend an hour or so tweaking the blog to make it faster, using the WP Super Cache plugin and Amazon CloudFront, but something went badly wrong! The alarm bells should have started to ring when I noticed that most tutorials about using Amazon CloudFront with WordPress referred to W3 Total Cache, but I preferred the look of WP Super Cache and fancied a challenge…

I was loosely following this guide, but somehow managed to take my website offline, probably by sending requests into a DNS black hole. The problem was that I then couldn't get back onto my website to turn the caching off again. I should also add that I couldn't test this phase on my development server, as CloudFront needed to pull data from the blog, which meant deploying on the live site.

I could still SSH into the server, so I used the WP Super Cache uninstall instructions for "if all else fails and your site is broken". However, that didn't help. At this point I was getting a little more panicked, but was very glad of my new backup strategy, and that I'd had the foresight to make a backup just before I started fiddling with the blog. I feared the worst: that I would have to reinstall WordPress from scratch and reload my data. Reading this troubleshooting guide confirmed my fears.

Reinstalling WordPress isn't the end of the world, as I have done it a number of times, but for some reason I have been having a lot of permission issues on my web server; maybe I had taken security a bit too far. This meant that I couldn't get my FTP client to upload my backup data. I ended up revisiting the AWS WordPress installation guide, and also this blog post, to find the correct settings and set them via SSH. At least I've had a lot of command line practice this evening!

Even with the permissions fixed, I couldn't use the restore tool in UpdraftPlus (possibly due to restrictions I have added on AWS?), but I was able to upload the data via FTP and got the blog up and running again. I still haven't got the caching/CDN set up, but I think I'll take the easy route now and hopefully not need to test my backups again.