Optimizing AWS costs with kops and Lambda
Disclaimer: this is not a how-to guide. I'm planning a more in-depth article on setting all of this up; this one is more of an overview of what's been done.
Hey, folks. Let’s talk about AWS costs and Lambda today.
At Pickio we started this year with a bunch of optimizations on both costs and performance.
Somewhere around the end of 2019 I wrote a Facebook post about moving from EBS to EKS. Docker, k8s, and all the fancy stuff, y'know. Recently we moved from EKS to managing k8s ourselves with kops. This move alone cut our AWS bill from $380 to around $290 a month: the EKS control plane alone is about $75, plus all the additional resources and services it requires. Yup, these are not huge numbers, but scale them up to any infrastructure size and you'll see the impact. At the end of the day we managed to get the monthly AWS bill down to $110–115, more than a 3x cut overall. I'd take it as a win.
Managing AWS costs can be tricky: some of the built-in automations rely on paid services that are easy to miss during integration. I'd suggest setting up AWS Cost Explorer, which shows how much you spend and which resources you pay for. Budgets and Forecasts are highly convenient as well, so consider taking a look.
With proper tools and careful resource utilization you can manage your costs effectively. We also plan to explore AWS spot instances, which are much friendlier in terms of pricing but require additional setup. Maybe there will be another article describing the difference between on-demand and spot instances and how to set up automatic management.
Next up, software. We were struggling with performance issues during post creation. It turned out that a significant amount of time was spent uploading and resizing images (we generate a couple of different sizes for every picture). Yup, the upload itself and the variant creation were synchronous and blocked the request to the server. That's bad on its own, but the resize process was also terribly slow, around 2.5–3 seconds per variant (resize + optimization + upload). We needed a way to free up resources and make the whole process faster.
As we're using Rails as the backend for our application, we used Carrierwave to upload our files. The problem is that it's slow, and it uses ImageMagick under the hood, which is also slow as hell. We needed a better way of dealing with files. So, the decision was made: let's get rid of Carrierwave completely.
The first step was to rewrite the uploader. Uploading files is a relatively simple process, especially with AWS S3, where we store all of our files. AWS provides an excellent SDK for all of their services, and S3 is no exception. The uploader was rewritten using just the AWS SDK, which cut the upload time by 3–4 times. Impressive already. But we still need to generate a few more image sizes, and that was the main benefit of Carrierwave: setting up image variants is a piece of cake. Still, we know it's slow. Well, here's where AWS Lambda comes into play.
AWS Lambda is a cloud functions service. In a nutshell, you have a piece of code stored somewhere on Amazon's servers, and you can call it whenever you want. The benefit is that you're not relying on your own EC2 instances and their compute power; instead, your code runs on AWS-managed hardware as fast as possible. The process is simple: you invoke a Lambda via the SDK, AWS spins it up, waits for the execution to complete, and shuts the function down. You can also send whatever data you want along with the call, much like arguments (or parameters) of a regular function.
Lambda, though, is a paid service. Still, it doesn't cost that much if you use it properly: the faster your function is, the less you pay per execution. AWS has a decent cost calculator that can help you estimate your spend.
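A quick back-of-the-envelope check shows why this is cheap. The rates below are the public per-GB-second and per-request prices at the time of writing (they vary by region, so treat them as approximations and check the calculator for your own numbers):

```ruby
# Rough Lambda cost estimate, ignoring the free tier.
# Prices are approximate us-east-1 rates; check the AWS calculator for yours.
GB_SECOND_PRICE = 0.0000166667       # USD per GB-second of compute
REQUEST_PRICE   = 0.20 / 1_000_000   # USD per request

def lambda_monthly_cost(requests:, duration_s:, memory_gb:)
  compute = requests * duration_s * memory_gb * GB_SECOND_PRICE
  compute + requests * REQUEST_PRICE
end

# e.g. 30k resizes a month at 150 ms with 1 GB of memory
lambda_monthly_cost(requests: 30_000, duration_s: 0.15, memory_gb: 1.0)
```

At our volumes that works out to well under a dollar a month before the free tier even kicks in.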
Anyway, we decided to give Lambda a try and prepare our images there. Using Node.js and the Sharp library, we got the average resizing time down to around 150 ms against 2500–3000 ms with Carrierwave, roughly 16–20 times faster. Our Lambda is configured with 1024 MB of memory and a timeout of 5 seconds. Along with that, we adjusted the upload process itself: to avoid heavy operations during the request, we upload only the original image. All the variants are generated on demand and then cached. This not only speeds up loading time but also prevents unnecessary Lambda calls. By bringing these optimizations to the post creation process, we cut the time from 10–15 seconds to around 1.5–2 seconds, roughly 5–10 times faster.
Speaking of additional costs, AWS provides a free tier for Lambda: 400,000 GB-seconds and 1M requests a month. According to their calculator, the service stays free with up to 30k requests at 2.5 seconds per execution; at 1 GB of memory that's 30,000 × 2.5 s × 1 GB = 75,000 GB-seconds, comfortably within the free tier.
With Lambda in place, we had an opportunity to move our primary servers to less expensive instance types. As the load on the servers decreased, we no longer needed to maintain huge compute power and could switch to small instances, just enough to handle requests to the API.
Now, let's wrap up. Bad system design? Well, maybe. But it was an MVP, so we worried more about getting things done than about costs and performance. Were all the moves worth the effort? The numbers speak for themselves: we cut costs and addressed some severe performance issues in about a week with almost no team. Looks good to me.