
Highly Available WordPress on AWS: Don’t Learn The Hard Way

October 17, 2018 · AWS · export · WordPress

Today we’ll go over some potential strategies for hosting WordPress on AWS in a scalable and highly available fashion.

This is a follow up to my previous article on the problems you’ll face implementing a WordPress architecture on AWS. If you haven’t read that yet, I’d suggest taking a short moment to do so.

Let’s dive right in!

Please note: This article is not intended to be a play-by-play tutorial. Each of the following solutions has trade-offs, and it will be your job to determine which is the best fit for your use case.

What we need to cover

We want to have a WordPress site hosted on AWS that will be both highly available and support EC2 auto-scaling functionality.

There are a few problems we’ll need to solve to end up with a setup that covers these bases.

By default, WordPress stores user-uploaded content (images, videos, etc.) on the web/app server’s file system. This content should instead be stored in an object store like S3.
The webroot on the server also contains plugins, themes, and files like wp-config.php. This is a harder problem to solve (well), and there are numerous solutions.

Object Storage

Unless you use a solution that syncs the webroot folder across servers (we’ll get into that later), you will need to store any user-uploaded files in an object store. Since we’re using AWS, the easy choice is S3: it’s inexpensive and has wide plugin support.

I’ve found numerous plugin options, but the two that seem to be the most actively updated are WP Offload Media Lite and Media Cloud.

The configuration for each of these plugins (and many like them) is incredibly simple. It normally involves creating an S3 bucket and an IAM user with the appropriate permissions, then entering that user’s credentials in the plugin. Because it’s so simple, this is left as an exercise for the reader.
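For reference, here’s a minimal sketch of the kind of IAM policy that user might need, generated in Python so you can feed it into whatever tooling you use. The bucket name is hypothetical, and the exact set of actions each plugin requires is an assumption on my part, so check the plugin’s own documentation before relying on it.

```python
import json


def media_offload_policy(bucket: str) -> dict:
    """IAM policy letting a media offload plugin manage objects in one bucket.

    The action list is an assumed minimum; some plugins need more (or less).
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # bucket-level actions apply to the bucket ARN itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {   # object-level actions apply to the keys inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-wp-media" is a placeholder bucket name.
    print(json.dumps(media_offload_policy("example-wp-media"), indent=2))
```

Splitting bucket-level and object-level actions into separate statements matters: S3 evaluates ListBucket against the bucket ARN and object actions against the `/*` ARN.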

WP Offload Media Lite

WP Offload Media Lite seems like a solid solution, but the Lite version cannot import existing media without upgrading to the paid version. If this is a problem, you may want to consider the next option.

Media Cloud

Media Cloud not only handles newly uploaded files, it will also allow you to move over already uploaded files at no added cost.

This plugin seems intended to integrate with a tool called Imgix, but the integration appears to be completely optional.

Since this option is completely free and supports migrating existing files at no cost, it will likely be your best bet.

An important note: Both of the above plugins support the Amazon CloudFront CDN, and it would be a good idea to implement it. This can improve both performance (by serving content from locations closer to the client) and security (by using an Origin Access Identity, or OAI, so the bucket itself doesn’t need to be publicly readable).
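If you go the CloudFront route, the bucket policy that lets an OAI (and only the OAI) read your media looks roughly like the sketch below. It assumes you’ve already created the OAI in CloudFront; the bucket name and OAI ID are placeholders.

```python
import json


def cloudfront_oai_read_policy(bucket: str, oai_id: str) -> dict:
    """S3 bucket policy granting read-only object access to one CloudFront OAI."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",
                "Effect": "Allow",
                # OAIs are referenced as a special CloudFront IAM user principal.
                "Principal": {
                    "AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"
                },
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket and OAI ID.
    print(json.dumps(cloudfront_oai_read_policy("example-wp-media", "E2EXAMPLE123"), indent=2))
```

With this in place you can keep the bucket private; the offload plugin still writes through its IAM user, and visitors only ever read through CloudFront.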

Web Root

So now that the easy part is out of the way, let’s dig into the fun stuff (read: the hard part).

While the “right” method is the one that best fits your use case, I will point out which solutions I believe are particularly bad and which are my preferred method.

All of these methods support EC2 Auto Scaling, and most of them also entail a user data script to properly configure the servers on boot.

In some cases the creation of an AMI may also be possible (or even advisable).

What are our options?

EFS (Elastic File System)

EFS is Amazon’s answer for NFS as a service. It is the solution Amazon suggests in its reference architecture for WordPress, and that reference architecture is implemented in many templates online.

One can use EFS for this purpose by mounting it as the webroot on your EC2 servers during spin-up (in your Auto Scaling group, of course).
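As a rough sketch of what mounting it during spin-up usually means: the user data script adds an NFS entry for the file system to /etc/fstab and mounts it. The helper below only renders that line, using the NFS options AWS recommends for EFS; the file system ID, region, and /var/www/html webroot are all assumptions.

```python
# Render an /etc/fstab entry that mounts an EFS file system as the webroot.
# The file system ID, region, and webroot path are placeholders.
EFS_MOUNT_OPTIONS = ",".join([
    "nfsvers=4.1",
    "rsize=1048576",
    "wsize=1048576",
    "hard",
    "timeo=600",
    "retrans=2",
    "noresvport",
    "_netdev",  # wait for networking before mounting at boot
])


def efs_fstab_line(fs_id: str, region: str, webroot: str = "/var/www/html") -> str:
    """Return a single fstab line mounting the EFS file system at `webroot`."""
    dns_name = f"{fs_id}.efs.{region}.amazonaws.com"
    return f"{dns_name}:/ {webroot} nfs4 {EFS_MOUNT_OPTIONS} 0 0"


if __name__ == "__main__":
    print(efs_fstab_line("fs-12345678", "us-east-1"))
```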

While EFS is the most obvious solution to handling the web root issues (Amazon even suggests it!), it’s also one of the worst possible options in my experience (Ask me how I know!).

Pros

You can largely use WordPress as normal to install plugins/themes/upgrades. It’s an easy drop in solution requiring little further management on your part.
There is no practical limit on the size of an EFS volume.
The file system size scales elastically. There is no need to provision additional space.
EFS can support 10+ GB per second of throughput (That’s a lot of throughput)
Data is stored redundantly across multiple Availability Zones.

Cons

Shares many of the disadvantages of a regular network attached file system.
It’s incredibly expensive compared to other storage methods: expect around $300 per month just to get modest, consistent throughput.
- Previously, EFS only supported increasing throughput by increasing the size of the volume (often with a dummy file). Attaining a consistent throughput of 50 MiB/s meant storing at least 1 TB of data, which cost at least $300 per month. While EFS now supports provisioned throughput, it costs $6 per MiB/s per month, so you’d still be paying at least $300 per month for 50 MiB/s of throughput. Bit of a wash.
It uses burst credits (like T series EC2 instances). I really hope you reviewed the documentation closely! You may find your application works reasonably well without provisioned throughput and with a small file system until you receive a sustained burst of traffic. When that happens, performance completely tanks and you’ll be scrambling to figure out why.
It has high latency. It’s about 3 times slower out of the box than EBS, and many people have complained about this.
- High latency in a web server is generally bad news bears. WordPress reads a lot of small files on every request, so the per-file latency adds up.
- There are ways to work around this, but that’s slapping a band-aid over an already expensive solution.
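To put rough numbers on that cost comparison, here’s a back-of-the-envelope sketch. The prices are the 2018-era figures quoted above ($300 for about 1 TB implies $0.30 per GB-month, plus $6 per MiB/s-month provisioned), it uses the 50 MiB/s-per-TB bursting baseline from the same bullet, and it treats 1 TB as 1,024 GB. Treat all of it as assumptions and check current pricing for your region.

```python
# Back-of-the-envelope EFS throughput costs using the 2018-era figures above.
STORAGE_USD_PER_GB_MONTH = 0.30         # ~$300/month for ~1 TB stored
PROVISIONED_USD_PER_MIBPS_MONTH = 6.00  # provisioned throughput, per MiB/s
BURST_BASELINE_MIBPS_PER_TB = 50        # bursting mode: baseline per TB stored


def bursting_mode_cost(target_mibps: float) -> float:
    """Monthly storage cost to reach `target_mibps` baseline purely by storing data."""
    tb_needed = target_mibps / BURST_BASELINE_MIBPS_PER_TB
    return tb_needed * 1024 * STORAGE_USD_PER_GB_MONTH


def provisioned_mode_cost(target_mibps: float) -> float:
    """Monthly cost of provisioning `target_mibps` (storage is billed separately)."""
    return target_mibps * PROVISIONED_USD_PER_MIBPS_MONTH


if __name__ == "__main__":
    print(f"Bursting:    ${bursting_mode_cost(50):,.2f}/month")     # $307.20/month
    print(f"Provisioned: ${provisioned_mode_cost(50):,.2f}/month")  # $300.00/month
```

Either way you land around $300 a month for 50 MiB/s, which is the "bit of a wash" above.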

In short, EFS seems like a poor solution.

I’ve deployed a number of EFS based infrastructures and found the performance and trouble generally were not worth it.

Most of its benefits aren’t relevant for this use case, its poor latency adversely affects performance, and it’s expensive. Its biggest benefit is that it’s very easy to get going.

Self Hosted NFS/GlusterFS/Etc

Another option that’s similar to the above is simply hosting NFS, GlusterFS, or a similar network file system on your own.

Similar to EFS, one can mount the share as the webroot on your EC2 servers during spin-up (in your Auto Scaling group, of course).

Pros

You can largely use WordPress as normal to install plugins/themes/upgrades. It’s an easy drop in solution on the WordPress side.
I’ve read that many people report better performance and lower latency compared to EFS.
It is likely to be cheaper than EFS for the given performance and storage space.

Cons

Unless you intend to configure a multi-AZ NFS/GlusterFS setup, you’ve now introduced a pretty major single point of failure. This potentially undermines the rest of the work done to create an HA infrastructure.
After considering the time to manage and monitor the system/s (particularly if you try to roll your own HA setup), it may end up even more expensive than simply using EFS.
It’s still going to share many of the other problems with network file systems and will still likely have worse latency than EBS or instance store.

Self-hosting your own NFS setup shares many of the same problems as EFS and adds even more of its own.

My conclusion is that this is likely a poor choice for a scalable/HA WordPress infrastructure.

Version Control It

At this point I feel we start to get to solutions that are not only less expensive but also add some features that can help make WordPress act a bit more like a modern web app.

If you read my previous article (you read it, right?), you’ll remember that the biggest issues are the user-uploaded files (we solve that with object storage), followed by the plugins, themes, and version of WordPress installed on the servers.

The idea here is to use Composer (PHP’s dependency manager) in conjunction with WordPress Packagist and keep the composer.json in a Git repository. Composer then installs the required versions of your plugins and themes. Alternatively, one could use wp-cli to accomplish something similar (or the two can even be used together).
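To make this a bit more concrete, here’s a hedged sketch that generates such a composer.json. The plugin and theme slugs and version constraints are placeholders. The installer-paths mapping relies on the composer/installers plugin (which WordPress Packagist packages depend on) and assumes Composer runs from your WordPress root.

```python
import json

WPACKAGIST = {"type": "composer", "url": "https://wpackagist.org"}


def composer_manifest(plugins: dict, themes: dict) -> dict:
    """Build a composer.json that pins plugins/themes from WordPress Packagist.

    `plugins` and `themes` map slugs to Composer version constraints.
    """
    require = {f"wpackagist-plugin/{slug}": ver for slug, ver in plugins.items()}
    require.update({f"wpackagist-theme/{slug}": ver for slug, ver in themes.items()})
    return {
        "repositories": [WPACKAGIST],
        "require": require,
        # composer/installers places each package type at these paths.
        "extra": {
            "installer-paths": {
                "wp-content/plugins/{$name}/": ["type:wordpress-plugin"],
                "wp-content/themes/{$name}/": ["type:wordpress-theme"],
            }
        },
        # Composer 2.2+ requires Composer plugins to be explicitly allowed.
        "config": {"allow-plugins": {"composer/installers": True}},
    }


if __name__ == "__main__":
    manifest = composer_manifest(
        plugins={"akismet": "^4.0"},         # placeholder plugin and constraint
        themes={"twentyseventeen": "^1.0"},  # placeholder theme and constraint
    )
    print(json.dumps(manifest, indent=4))
```

Commit the resulting file to Git, and have your user data script or CI pipeline run `composer install` so every box ends up with exactly the same plugin and theme versions.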

At this point, each time a new box is spun up (for example, by your EC2 Auto Scaling group), it is installed from a known configuration.

CI/CD tools can be used to update existing servers, or you can do a blue/green deployment when updating and then cut over.

Pros

Everything works as usual on the WordPress side (Uploading images, writing articles, users, etc), minus upgrading WordPress and installing plugins.
More performant (and much cheaper) than the other methods mentioned.
Since you’re installing everything in an automated fashion, it becomes much easier to configure different environments for testing. For example, setting up a local development environment using Docker.

Cons

End users can no longer install/update plugins/themes on production (It’s likely a good idea to prevent them from doing so with permissions). This is arguably a pro.
Initial setup may be slightly more difficult than simply attaching EFS.
Auto-scaling may take slightly longer depending on how the user-data scripts are implemented.
There are occasionally plugins that write configs to local file systems which you’d want to look out for.

I’m of the opinion that if you are truly looking to set up a scalable, HA, and performant WordPress installation (and don’t want to spend a ton on the infrastructure), then this is likely where you should be looking.

If someone needs to install/test plugins this can be done in any of the other environments (Either on local dev or staging) before they are moved to production.

This method makes it far easier to duplicate the install for other environments (local dev, staging), supports CI/CD, and covers all the other bases (HA, scalable, performant).

Other Methods?

I’m certain there are other ways one might handle this.

One approach I’ve heard of is to only allow people to interact with a staging environment, which is then used to create an AMI for auto-scaling in production. I haven’t tested this myself yet.

Is there something else I missed? Are you handling this in another way? Don’t hesitate to share :).

Conclusion

Hosting WordPress in this fashion is not an easy task but it is doable with extensive consideration of how WordPress is managed and of the tools the AWS ecosystem provides you.

I hope you found this article informative and decide to share it.

If you have any questions or comments, don’t hesitate to reach out in the comments. Thanks for reading!

Looking for someone to take care of this stuff for you? I understand completely 🙂


Till next time.

You Shouldn’t Host WordPress on AWS (Except when you should)

May 28, 2018 · AWS · export · WordPress

Or why you should pick the right tool for the right job

A number of past clients have asked me to automate and configure WordPress on AWS (and a ton of potential clients ask the same). After architecting a number of AWS WordPress solutions, I feel pretty comfortable explaining why I advise against their union in the vast majority of cases.

Let’s dig into why WordPress is often a poor choice for AWS and the situations where it may be a good fit.

Short note: This isn’t a hit piece on AWS or WordPress. AWS is an excellent tool, as is WordPress (This website is WordPress based. Surprise!) but unfortunately, they are not tools that often perform well together (though there are exceptions).

Why WordPress is (generally) a poor fit for AWS

Load Balancing

Let’s start with the most important reason: WordPress was, and still is, ill-suited for load-balanced environments because it simply wasn’t designed with these concepts in mind. This is especially true for dynamic scaling (i.e., using EC2 Auto Scaling groups).

WordPress has a good deal of assets and functionality that rely on the web server’s local file system for storage (such as plugin, theme, and media files).

Let’s do a quick thought experiment.

[Figure: Diagram of a load-balanced WordPress setup. Try not to weep at the beauty of my chart.]

You have two web servers with WordPress installed behind a load balancer and backed by RDS.

You log in to the WordPress admin page and decide to install a theme. When you click install, the load balancer happens to route your request to Server A.

Sometimes when you access the site everything works correctly and other times it doesn’t. Why?

Spoiler: WordPress only installed the theme on Server A, leaving Server B without the theme files.
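If it helps to see the failure mode mechanically, here’s a toy model of the thought experiment in Python. It’s deliberately simplified and hypothetical (real WordPress would fall back or error in its own way rather than return "broken"), and the theme names are placeholders. The key point it captures is that the *active theme* setting lives in the shared database while the theme *files* live on each server’s own disk.

```python
from itertools import cycle


class WebServer:
    """Toy web server: its 'disk' is just a set of installed theme directories."""

    def __init__(self, name: str):
        self.name = name
        self.installed_themes = {"twentyseventeen"}

    def handle(self, active_theme: str) -> str:
        # The active theme name comes from the shared database; the files do not.
        return "ok" if active_theme in self.installed_themes else "broken"


servers = [WebServer("A"), WebServer("B")]
load_balancer = cycle(servers)  # naive round-robin

# The admin's "install theme" request happens to land on Server A.
next(load_balancer).installed_themes.add("cat-theme")
active_theme = "cat-theme"  # activation is written to RDS, so every server sees it

# Subsequent visitors are spread across both servers.
responses = [(s.name, s.handle(active_theme)) for s in (next(load_balancer) for _ in range(4))]
print(responses)  # [('B', 'broken'), ('A', 'ok'), ('B', 'broken'), ('A', 'ok')]
```

Half the requests break, which is exactly the "sometimes it works and sometimes it doesn’t" symptom.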

For context, the initial release of WordPress was over 15 years ago (its 15th anniversary was yesterday. Congrats WordPress!).

It was created at a time when public cloud providers like AWS, Azure, and Google Cloud were years away from existing, and most webmasters would never touch (or possibly even know about) concepts like load balancers.

One of AWS’s greatest strengths is its high level of scalability (in particular, the ability to automate that scaling). Scaling cleanly is not easily accomplished (one may even argue impossible) without the ability to scale your application horizontally.

Scale

Now, it’s hard to blame this one on WordPress, but it’s still an important factor when deciding to host WordPress on AWS.

Your idea of “a lot of traffic”, probably isn’t a lot of traffic.

If your website gets anywhere from a few thousand to the low hundreds of thousands of hits a day, it is almost guaranteed that you shouldn’t be hosting it on AWS. You won’t have the traffic to leverage AWS’s biggest weapon (as mentioned earlier, SCALABILITY!).
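A quick sanity check helps here: divide daily hits by the 86,400 seconds in a day. The traffic levels below are just sample values, and averages hide peaks (your busiest minutes can run well above the average), but the order of magnitude is the point.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(hits_per_day: int) -> float:
    """Average request rate if hits were spread evenly across the day."""
    return hits_per_day / SECONDS_PER_DAY


for hits in (5_000, 100_000, 300_000):
    print(f"{hits:>7,} hits/day ≈ {average_requests_per_second(hits):.2f} requests/second")
```

That works out to roughly 0.06, 1.16, and 3.47 requests per second, which a single modest server can usually handle without any auto-scaling at all.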

AWS is a product meant to scale websites to hundreds of thousands or millions of hits or more. While a scalable solution can also be performant (i.e., fast), that isn’t always the case.

Confused about the difference? You can read a follow-up here.

Price

People are told that AWS can save them a lot of money. For the right customer with the right solution (Read: properly architected) that makes heavy use of auto scaling, AWS is likely to save a great deal of money.

If you attempt to use AWS like a traditional VPS or dedicated server provider, you’re likely to find that AWS can be very, very, very expensive.

In general, AWS offers far worse performance per dollar than an equivalent solution on DigitalOcean, Vultr, or even a good old-fashioned dedicated server.

If you look at the above comparison, pay particularly close attention to the last section (sustained performance), where the DigitalOcean solution is significantly superior.

Aside from cost of the EC2 instances, some of the solutions in our mitigation section also have significant price tags attached.

So when is AWS the right choice for WordPress?

There are a few check marks I like to tick off before going “Alright, AWS may be the right choice” when working with WordPress.

The client has a site with a large amount of traffic (hundreds of thousands of hits a day or more).
The client has very spiky traffic (Periods of very high usage followed by periods of very low usage)
The client is expecting to see a very large permanent increase in traffic in the near future.
The client is either willing to make changes to how they manage their site (Largely forgoing the use of the admin panel for tasks like plugin and theme installation) or is willing to take a significant hit to their pocket book with certain mitigation strategies.
The client intends to make good use of AWS scalability features.
The client requires (or at least wishes to have) a high availability configuration.
The client intends to make significant use of AWS products such as RDS, IAM, S3, CloudFront, and others.

In cases where the client meets all or some of the above criteria, AWS may be worthy of consideration for their particular case.

Summary

AWS and WordPress are both terrific products that have their place but heavy consideration should occur before deciding to bring the two together.

In a future article, I’ll go over the pros and cons of various strategies for load-balanced and highly available WordPress deployments on AWS. Stay tuned!

If you’re looking to contract with an experienced Automation Engineer/Linux Sys Admin, I hope you won’t hesitate to schedule a short meeting to talk about how I can help.


Till next time.

AWS EC2 T2 Instances Demystified: Don’t Learn The Hard Way

January 30, 2018 · AWS · export

Summary

In my time as a freelancer I’ve come across a number of clients using T2 instances for their infrastructure requirements.

In my experience, these instances often seem to be chosen based largely on their low price compared to other instance types and are often poorly understood.

While T2 instances can offer great value, they come with a number of advantages and disadvantages that must be considered (and understood) before choosing them for your infrastructure.

Let’s examine what T2 instances are.

What Are T2 Instances?

EC2 T2 instances are CPU “burstable” virtual machine instances offered by AWS, as opposed to the other instance types, which provide a fixed level of CPU performance.

These instances offer baseline CPU performance along with CPU Credits that can be used to “burst” above this baseline performance when required.

The baseline CPU performance, maximum amount of CPU credits that can be earned, and the rate at which these CPU credits are earned are all based on the size of the T2 instance in question.

In the following sections, we’ll use a t2.micro to illustrate these important concepts.

What is a CPU Credit?

A (single) CPU credit allows 1 vCPU to operate at 100% usage for 1 minute.

It’s important to remember this for some of the math in the following sections

When are CPU Credits Used?

Anytime your instance uses any amount of CPU, for any reason, CPU credits will be used.

Yes, this means even if the CPU usage is below the baseline performance rate, you will use CPU credits.

To understand this (already confusing) subject, let’s look at how AWS calculates the use and earning of CPU credits.

How is CPU Credit usage calculated?

Remember from above that a single CPU Credit allows 1 vCPU to run at 100% usage for one minute.

This means that 10% CPU usage (the baseline performance of a t2.micro instance) for 1 minute would use 1/10th (0.1) of a CPU credit. Over an hour, that would be 6 credits (the number of credits a t2.micro earns each hour).

20% CPU usage for a minute would be 1/5th (0.2) of a CPU credit, and so forth.
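The rule is simple enough to express as a one-line function. This is just a sketch of the arithmetic above; the `vcpus` parameter is my addition for larger instance sizes and assumes utilization is measured per vCPU.

```python
def credits_used(cpu_utilization: float, minutes: float, vcpus: int = 1) -> float:
    """CPU credits consumed: 1 credit = 1 vCPU at 100% utilization for 1 minute.

    `cpu_utilization` is a fraction per vCPU (0.10 means 10%).
    """
    return cpu_utilization * minutes * vcpus


print(credits_used(0.10, 1))   # 10% for one minute  -> about 0.1 credits
print(credits_used(0.10, 60))  # 10% for an hour     -> about 6 credits
print(credits_used(0.20, 1))   # 20% for one minute  -> about 0.2 credits
```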

How are CPU credits accrued?

CPU credits are ALWAYS earned while your instance is running, but they only accrue when your T2 instance is using less CPU than its baseline performance.

Continuing our t2.micro example, assuming your server uses no CPU for an hour (An unlikely scenario but this is an example), you would earn 6 CPU credits.

You could continue to earn these credits until you held 144, the maximum that can be held by a t2.micro (24 hours’ worth of earned credits).

The 6 CPU credits earned every hour allow the t2.micro instance to maintain its baseline CPU performance indefinitely.

0.1 (10% baseline CPU performance) * 60 (minutes in an hour) = 6 (The number of CPU credits earned in an hour)
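The same relationship, plus the 144-credit cap (T2 instances can hold at most 24 hours’ worth of earned credits), as a small sketch. Baseline is assumed to be expressed per vCPU.

```python
def hourly_earn_rate(baseline: float, vcpus: int = 1) -> float:
    """Credits earned per hour: baseline fraction (per vCPU) x 60 minutes x vCPUs."""
    return baseline * 60 * vcpus


def max_earned_balance(baseline: float, vcpus: int = 1) -> float:
    """T2 instances can hold at most 24 hours' worth of earned credits."""
    return 24 * hourly_earn_rate(baseline, vcpus)


# t2.micro: 1 vCPU with a 10% baseline -> about 6 credits/hour, 144 max
print(hourly_earn_rate(0.10), max_earned_balance(0.10))
```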

Let’s finally look at a basic calculation of the usage and earning over a multi-hour period.

An Example Calculation

Let’s do a basic illustration that can show both the disadvantages and advantages of this instance type.

Note: This is not looking at other metrics of server performance. We are simply looking at how CPU credits are used and calculated

Assumptions:

The t2.micro instance has 138 CPU credits accrued.
The CPU usage will be consistent. This does not happen in reality but does help us illustrate the concepts easily.
You have a t2 instance hosting a really awesome website about cats.

Hour 1

In the first hour, your micro instance has no traffic at all and uses 0% of the CPU for the entire hour (maybe the server was taking a nap?). Since no CPU time was used, you keep all 6 CPU credits you earned.

138 (CPU credits you already accrued) + 6 (The credits you earned over the hour) = 144 CPU Credits Accrued

You now have the maximum amount of CPU credits you can accrue for a t2.micro so even if you continue to have CPU usage below the baseline performance, you’ll no longer earn credits above this amount.

Hour 2

In the second hour, your micro instance maintains 10% CPU usage (the baseline performance). Since it earns exactly enough CPU credits to sustain this rate, no credits are burned from the 144 you already have and none are accrued (because you used the same number of credits you earned, and you already hold the maximum allowable amount).

Hour 3

In the third hour, you get featured on a popular site and you’re getting hammered with traffic, causing 100% CPU usage for the entire hour (Yay! Traffic!).

1.0 (100% CPU usage) * 60 (number of minutes in the hour) = 60 credits used
144 (Accrued CPU credits) – 60 = 84 credits remaining

But wait, there’s more!

Since you always earn credits, you would actually have more than 84.

84 (Number of remaining CPU credits) + 6 (The amount you earn in an hour) = 90 credits remaining

Assuming there were no other issues with the server and that everything was fine, your website likely continued to work without issue.

Hour 4

After Hour 3 everyone has had their fill of your really awesome cat website and your server sits largely unused again (Who can get enough cat pictures?!). The instance uses 5% of the CPU for the entire hour.

0.05 (5% CPU usage) * 60 = 3 (Credits used that hour)
90 (Accrued credits) + 6 (The number of credits you accrue an hour) – 3 (The number of credits you used that hour) = 93 credits remaining

Since 5% CPU usage is below the 10% baseline CPU performance, the server accrues a few CPU credits (a net gain of 3).

Hour 5

Apparently I was right and people really can’t get enough cat pictures. Your website experiences an increase in traffic causing 100% CPU usage for the entire hour.

1.0 (100% CPU usage) * 60 (Minutes in an hour) = 60 CPU credits used
93 (Accrued credits) – 60 (CPU credits used) + 6 (The CPU credits you earn in an hour) = 39 CPU credits remaining

As we can see above, our credits are really starting to dry up but thankfully the server is still holding up.

Better hope all that traffic dies down.

Hour 6 (uh oh!)

Unfortunately, people REALLY love those cat pictures and the traffic remains at the same level. Your instance continues to use 100% of its CPU for hour 6.

1.0 (100% CPU usage) * 60 (Minutes in an hour) = 60 CPU credits used
39 (Accrued credits) – 60 (CPU credits used) = OH NO! We don’t have enough credits left!

Roughly 43 minutes into Hour 6, your instance runs out of credits (39 credits ÷ a net burn of 0.9 credits per minute, since you spend 1 credit and earn 0.1 each minute). From then on, you’re limited to the baseline performance of the instance, in this case 10% of the vCPU.

At this point your application grinds to a halt. Because it’s under such heavy load, the instance can’t earn credits as fast as it needs to spend them, resulting in near-complete downtime until traffic drops (and since your website is no longer working properly, that’s going to happen pretty quickly).

You will likely find it is difficult or impossible to access the server to take any measures to solve the issue until the CPU credits have accrued.
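To double-check the arithmetic, here’s a minute-by-minute sketch of the same six hours for a single-vCPU t2.micro. It’s a simplification under stated assumptions: credits are earned and spent evenly each minute, launch credits are ignored, and once the balance hits zero the CPU is throttled to whatever the freshly earned credits allow (i.e., the baseline). AWS’s real accounting is more granular than this.

```python
from typing import List, Optional, Tuple


def simulate_t2_credits(
    start_balance: float,
    cpu_by_hour: List[float],
    baseline: float = 0.10,
    max_balance: float = 144.0,
) -> Tuple[List[float], Optional[Tuple[int, int]]]:
    """Minute-by-minute CPU credit balance for a single-vCPU T2 instance.

    Returns the balance at the end of each hour and the (hour, minute) in which
    the instance first could not afford the CPU it wanted, or None.
    """
    balance = start_balance
    end_of_hour = []
    ran_out_at = None
    for hour, demand in enumerate(cpu_by_hour, start=1):
        for minute in range(1, 61):
            balance += baseline          # credits are earned every minute
            spend = demand               # 1 credit = 1 vCPU at 100% for 1 minute
            if spend > balance:          # not enough credits: throttle
                if ran_out_at is None:
                    ran_out_at = (hour, minute)
                spend = balance
            balance = min(max_balance, balance - spend)
        end_of_hour.append(round(balance, 2))
    return end_of_hour, ran_out_at


# Hours 1-6 from the cat-website example: 0%, 10%, 100%, 5%, 100%, 100% CPU.
balances, ran_out = simulate_t2_credits(138, [0.0, 0.10, 1.0, 0.05, 1.0, 1.0])
print(balances)  # [144.0, 144.0, 90.0, 93.0, 39.0, 0.0]
print(ran_out)   # (6, 44)
```

The end-of-hour balances match the hand calculation, and the credits run dry during the 44th minute of hour 6, i.e., a little over 43 minutes in.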

So why is all this important?

The examples above are contrived, but they do illustrate the danger of not properly architecting your AWS system and monitoring it once it’s in place (especially if you’re using T2 instances).
I often see this issue with websites whose baseline traffic grows over long periods of time, with occasional spikes. If the instance type isn’t changed and sized appropriately, this eventually leads to all the accrued CPU credits being used up, followed by downtime and a significant amount of head scratching.
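On the monitoring side, one cheap safeguard is a CloudWatch alarm on the CPUCreditBalance metric. The sketch below only builds the keyword arguments for CloudWatch’s PutMetricAlarm call. The instance ID, threshold, and SNS topic are hypothetical, so pick a threshold that gives you enough warning for your own traffic patterns.

```python
def low_credit_alarm(instance_id: str, threshold: float, sns_topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm on CPUCreditBalance.

    Fires when the minimum credit balance stays below `threshold` for 15 minutes.
    """
    return {
        "AlarmName": f"{instance_id}-low-cpu-credit-balance",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 300,               # CPU credit metrics are published every 5 minutes
        "EvaluationPeriods": 3,      # 3 x 5 minutes = 15 minutes
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


params = low_credit_alarm(
    "i-0123456789abcdef0",                                   # hypothetical instance
    threshold=30,                                            # hypothetical threshold
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ops",  # hypothetical topic
)
# With boto3 installed and credentials configured:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```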

A Note On T2 Unlimited

Amazon (somewhat) recently added a new feature to T2 instances called T2 Unlimited.

T2 Unlimited instances can burst over baseline performance as long as required at an additional cost.

I’ll likely be writing an article explaining T2 Unlimited concepts in the near future so stay tuned!

Thanks go to rosege over at Hacker News for the suggestion!

Closing

Hopefully you walk away with a slightly better understanding of T2 instances and how they can affect your infrastructure if they are not properly managed.

If you have any questions or comments, please don’t hesitate to leave them below. Thanks for reading!

Looking for someone to help architect, manage, or automate your infrastructure?


Till next time.