Automatically Deploying a Website from Git to AWS S3

I am a big fan of Amazon AWS – this blog has been running on it for a few years now. Since moving to AWS S3 (for storage) and CloudFront (as a Content Delivery Network) to host static websites, such as my homepage, I have been trying to work out how to get them to deploy automatically when I update the Git repository I use to manage the source code. I looked into it in some detail last year and concluded that AWS CodePipeline would get me close, but would require a workaround as it did not support deploying to S3. In the end I decided that a custom AWS Lambda function was needed.

Lambda is a service that hosts your code, ready to run when triggered, without you needing to run a server. You are only billed for the time your code is running (above a free threshold), so it is perfect for small, infrequent jobs, such as deploying changes to a website or even home automation with Alexa. It seemed like an interesting area to explore and gain some knowledge in, but I went in at the deep end, trying to develop a complex function in an unfamiliar language (Node.js) on an unfamiliar platform. Then other tasks popped up and it fell by the wayside.

Then earlier this year I saw an announcement from AWS that CodePipeline would now support deploying to S3, and thought my problem had been solved, although I must admit I was a bit disappointed not to have the challenge of coding it myself. Fast forward a few months and I had the opportunity to set up the CodePipeline, which was very easy. However, it only copied the code from the Git repository to the S3 bucket; it did not refresh CloudFront, so my problem remained unsolved.

The CodePipeline did allow for an extra step to be added at the end of the process, which could be a Lambda function, so I went off in search of a Lambda function that would trigger an invalidation on CloudFront when an S3 bucket is updated. The first result I found was a blog post by Miguel Ángel Nieto, which explained the process well, but was designed for one S3 bucket and one CloudFront distribution. As I have multiple websites, I wanted a solution that I could deploy once and use for all of them, so my search continued. Next I came across a blog post by Yago Nobre, which looked to do exactly what I needed, except that I could not get the source code to work. I tried debugging it for a while without making much progress, but it did give me an understanding of how to link a bucket to a CloudFront distribution, trigger the Lambda function from the bucket, and use the Boto3 AWS SDK for Python to extract the bucket name and matching CloudFront distribution from the triggering event – all the things that were lacking from the first blog post and sample code. Fortunately both were written in Python, using the Boto3 AWS SDK, so I was able to start work on merging them.
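The merged function ends up with roughly the shape sketched below. To be clear, this is my minimal illustration of the idea rather than the exact code from either post or from my repository: the helper name find_distribution_id is made up, it assumes each distribution's origin domain name starts with the bucket name (true for the standard S3 endpoints), and it invalidates everything with a single /* path.

```python
import time
import boto3

cloudfront = boto3.client('cloudfront')


def find_distribution_id(bucket_name):
    """Return the ID of the distribution whose origin points at the bucket.

    Assumes the origin domain name begins with the bucket name, e.g.
    'mybucket.s3.amazonaws.com' or 'mybucket.s3-website-eu-west-1.amazonaws.com'.
    """
    paginator = cloudfront.get_paginator('list_distributions')
    for page in paginator.paginate():
        for distribution in page['DistributionList'].get('Items', []):
            for origin in distribution['Origins']['Items']:
                if origin['DomainName'].startswith(bucket_name + '.'):
                    return distribution['Id']
    return None


def lambda_handler(event, context):
    # The S3 event notification tells us which bucket changed
    bucket = event['Records'][0]['s3']['bucket']['name']
    distribution_id = find_distribution_id(bucket)
    if distribution_id is None:
        print('No CloudFront distribution found for bucket ' + bucket)
        return
    # Invalidate every path; CallerReference just has to be unique per request
    cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            'Paths': {'Quantity': 1, 'Items': ['/*']},
            'CallerReference': str(time.time()),
        },
    )
    print('Invalidation created for distribution ' + distribution_id)
```

Because the function looks up the distribution from the triggering event rather than hard-coding it, the same function can serve every website.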

I was not terribly familiar with the Python language, to the point of having to search for how to write comments, but I saw it as a good learning experience. What I actually found harder than the new-to-me language was coding in the Lambda Management Console, which I had to do because both the inputs and outputs of the function are other AWS services, meaning I could not develop locally on my Mac. Discovering the CloudWatch logs console made things easier, as I could use the print() function to check the values of variables at various stages of the function running and work out where problems were. The comprehensive AWS documentation, particularly the Python Code Samples for S3, was also helpful. Another slight difficulty was the short delay between the bucket being updated and the Lambda function triggering; it was only a few minutes, but enough to add some confusion to the process.
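As an illustration of what I mean by print() debugging (not code from the final function), anything printed inside the handler appears in the function's CloudWatch log group:

```python
def lambda_handler(event, context):
    # Everything printed here ends up in the function's CloudWatch log stream
    print('Received event: ' + str(event))

    bucket = event['Records'][0]['s3']['bucket']['name']
    print('Triggering bucket: ' + bucket)
```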

Eventually I got to the point where adding or removing a file in an S3 bucket would trigger an invalidation in the correct CloudFront distribution. In the end I did not need to link it to the end of the CodePipeline, as the Lambda function is triggered by the update to the S3 bucket (which is itself done by CodePipeline). All that was left to do was to tidy up the code, write some documentation, and share it on GitHub for anyone to use or modify. I have kept this post about the background to the project; the code and the instructions for using it are all on GitHub.
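For reference, the bucket-to-function trigger can be set up in the S3 console, or with a few lines of Boto3 along these lines (the bucket name and function ARN here are placeholders, not values from my setup):

```python
import boto3

s3 = boto3.client('s3')

# Placeholder names, substitute your own bucket and Lambda function ARN.
# S3 must also first be granted permission to invoke the function
# (via lambda add_permission or the Lambda console).
s3.put_bucket_notification_configuration(
    Bucket='example-website-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:eu-west-1:123456789012:function:cloudfront-invalidation',
            'Events': ['s3:ObjectCreated:*', 's3:ObjectRemoved:*'],
        }]
    },
)
```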

This code probably only saves a few minutes each time I update one of my websites, and may take years to cancel out the time I spent working on it, even more if I factor in the time spent on the original version before the CodePipeline-to-S3 announcement. But I find coding so much more rewarding when I am solving an actual problem. I also feel like I have levelled up as a geek by publishing my first repository on GitHub. Now, with this little project out of the way, I can start work on a new server and WordPress theme for this blog, which was one of my goals for 2019.