AWS for Deploying a Standard Web Application
AWS can be quite overwhelming. When starting out, the AWS web console can feel overly complex and unintuitive.
Here’s my take on explaining the capabilities of AWS to those who are new to its services.
What We Want to Build in AWS
In order to provide an idea of what AWS services are capable of, I’ll be going through the process of making a web application on AWS.
Now, like any well-designed application, we’ll require the following:
- Web Hosting
- Edge Caching (CDN)
- Domain Registration (DNS)
- API Hosting
- Database Hosting
- Communication Between Services
- Analytics
- Security
- Monitoring
So what AWS services will we need to get this up and running?
Web Hosting
The first question we want to ask is: where should we host our website?
The service we want to use to store files and raw objects of various sizes is Amazon S3, or Simple Storage Service.
We store our static content (e.g. HTML, CSS, JavaScript, images, etc.) in what we call S3 buckets. A bucket is essentially unlimited cloud file storage hosted by Amazon that many other AWS services can read from and write to.
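To make this concrete, here's a minimal sketch using boto3, the AWS SDK for Python (the sketches throughout this article use it). The bucket name is hypothetical, and bucket names must be globally unique:

```python
import boto3

s3 = boto3.client("s3")

# Create a bucket for our static assets (this simple form works in us-east-1;
# other regions need a CreateBucketConfiguration with a LocationConstraint)
s3.create_bucket(Bucket="my-static-site-example")

# Upload a page; ContentType tells browsers how to render the object
s3.upload_file(
    "index.html",
    "my-static-site-example",
    "index.html",
    ExtraArgs={"ContentType": "text/html"},
)
```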
Edge Caching (CDN)
Alright, we have our files living in S3.
The issue is that an S3 bucket stores those files in a single geographical region.
Normally, we would serve our website through a Content Delivery Network (CDN), which proxies and caches our web data on servers as close as possible to our end users, also known as edge caching. This minimizes the physical distance between our users and our servers.
Previously, we might have served our website through CDNs such as Cloudflare, StackPath, or Akamai.
The service we want to use as a CDN to optimize latency for access to our application is Amazon CloudFront.
Adding a CloudFront layer in front of our assets residing in S3 instantly provides edge caching, where we’ll have nodes distributed all around the world close to where users will access our website from.
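Here's a rough boto3 sketch of wiring CloudFront to our bucket, trimmed to the required fields; the bucket domain and caller reference are placeholders:

```python
import boto3

cloudfront = boto3.client("cloudfront")

cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": "my-site-2024-01-01",  # any unique string per request
        "Comment": "CDN for our static site",
        "Enabled": True,
        "DefaultRootObject": "index.html",
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "s3-origin",
                "DomainName": "my-static-site-example.s3.amazonaws.com",
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-origin",
            "ViewerProtocolPolicy": "redirect-to-https",
            "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
            "MinTTL": 0,  # let the origin's Cache-Control headers drive caching
        },
    }
)
```

CloudFront then hands back a domain like d111111abcdef8.cloudfront.net that serves our bucket from the nearest edge node.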
Domain Registration (DNS)
Now, we want to buy a new domain and set up the DNS records for that domain.
Previously, we might have used a domain name registrar such as NameCheap, DNSimple, or GoDaddy.
The service we want to use to register our domain is Amazon Route 53.
Route 53 is where we do all things in the scope of domain management: register our domain, configure DNS records, route traffic to our services, etc.
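For example, once our domain lives in a Route 53 hosted zone, pointing it at our CloudFront distribution is a single (sketched) API call. The zone ID and CloudFront domain below are placeholders, while Z2FDTNDATAQYW2 is the fixed hosted zone ID AWS uses for all CloudFront aliases:

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z1EXAMPLE",  # hypothetical hosted zone for example.com
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",  # create the record, or update it if it exists
            "ResourceRecordSet": {
                "Name": "www.example.com",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "Z2FDTNDATAQYW2",  # CloudFront's global zone ID
                    "DNSName": "d111111abcdef8.cloudfront.net",  # placeholder distribution
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)
```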
API Hosting
Next, we need a place to host our APIs. Suppose we want to build and host a REST API.
This is where things can get a bit complicated. There are several ways to accomplish this in AWS, and the right choice depends heavily on our use case.
1. Serverless Computing
If we are looking into serverless computing, where we won’t worry about infrastructure or provisioning servers, then we’ll want to use Amazon API Gateway in conjunction with AWS Lambda.
Lambda is a serverless compute service that runs self-contained snippets of code. AWS is responsible for scaling our application and provisioning the machines that run our code.
All we have to do is upload the code we want to run.
Lambda can serve as a RESTful endpoint when combined with API Gateway.
API Gateway serves as our application’s proxy. We define our REST endpoints in API Gateway, point a Route 53 domain at them, and forward any incoming requests to a Lambda function.
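A minimal Lambda handler for API Gateway's proxy integration might look like this; the greeting endpoint is purely illustrative:

```python
import json

def handler(event, context):
    # With proxy integration, API Gateway passes the full HTTP request in `event`
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # The returned dict becomes the HTTP response
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

A GET request to our endpoint with ?name=Ada would then return {"message": "Hello, Ada!"}.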
2. Virtual Private Servers
Another option for API hosting is to serve our endpoints on a virtual private server (VPS).
Previously, we might have used a cloud hosting provider like DigitalOcean or Linode to purchase one.
The service we want to use is Amazon EC2, or Elastic Compute Cloud.
I like to think of EC2 as Amazon’s virtual servers that we can rent out.
Essentially, we can do whatever we want with these servers, so why not pair them with a web server like nginx to host our REST API?
To scale up, we can place these servers behind an Application Load Balancer that distributes incoming traffic across the machines in our cluster.
This option provides more control in how our machines are managed.
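Here's a hedged boto3 sketch of launching one such server with nginx preinstalled; the AMI ID and key pair name are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Shell script the instance runs on first boot: install and start nginx
user_data = """#!/bin/bash
yum install -y nginx
systemctl enable --now nginx
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical Amazon Linux AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",  # assumed existing SSH key pair
    UserData=user_data,
)
```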
3. Containers
The last option is to host our API from a container.
The services we want to use are Amazon ECS, or Elastic Container Service, and Amazon ECR, or Elastic Container Registry.
We would first upload our Docker image to ECR, which will store and manage our container images.
Then, ECS will allow us to run our Docker images and manage our clusters.
Once again, we can put an application load balancer in front of this service to scale our application.
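Sketched with boto3, the flow looks roughly like this; the account ID, subnet, and role ARN are placeholders, and we assume a Fargate launch type so we don't manage the underlying machines:

```python
import boto3

ecr = boto3.client("ecr")
ecs = boto3.client("ecs")

# 1. A registry to hold our image (the actual `docker push` happens outside boto3)
ecr.create_repository(repositoryName="my-api")

# 2. A task definition describing how to run the container
ecs.register_task_definition(
    family="my-api",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # assumed role
    containerDefinitions=[{
        "name": "my-api",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest",
        "portMappings": [{"containerPort": 80}],
    }],
)

# 3. A service that keeps the desired number of copies running
ecs.create_service(
    cluster="default",
    serviceName="my-api",
    taskDefinition="my-api",
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {"subnets": ["subnet-0123456789abcdef0"]}  # placeholder
    },
)
```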
Database Hosting
SQL Database
We can host a SQL database using Amazon RDS, or Relational Database Service.
Just like Heroku Postgres, we can host MySQL, Microsoft SQL Server, Postgres, and Oracle databases in the cloud.
Our first option is to run the database ourselves on unmanaged infrastructure (e.g. an EC2 instance). In this solution, we need to worry about adding things such as read replicas and multiple nodes for scaling. In general, there is more maintenance required to keep the database up and running.
The second option is to use the managed service, RDS. AWS will do all the grunt work to ensure that our database stays in a healthy state (e.g. passes health checks, performs backups). This option even includes a serverless offering, Amazon Aurora Serverless.
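As a sketch, spinning up a managed Postgres instance with boto3 might look like this; the identifiers and credentials are placeholders (in practice, pull the password from a secrets store):

```python
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="my-app-db",
    Engine="postgres",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,                    # GiB of storage
    MasterUsername="appuser",
    MasterUserPassword="change-me-please",  # placeholder only
    MultiAZ=True,                           # standby replica in a second availability zone
    BackupRetentionPeriod=7,                # keep automated backups for 7 days
)
```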
NoSQL Database
The most popular option for hosting a NoSQL database is Amazon DynamoDB. This is our standard NoSQL, key-value database that will scale horizontally. It is incredibly fast and easy to use as well.
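Reading and writing are simple key-based operations; this sketch assumes a table named Users with partition key user_id:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table

# Write an item (a schemaless bag of attributes keyed by user_id)
table.put_item(Item={"user_id": "42", "name": "Ada", "plan": "pro"})

# Read it back by key
response = table.get_item(Key={"user_id": "42"})
print(response["Item"])
```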
For all-purpose caching, AWS provides a service called Amazon ElastiCache that supports engines such as Memcached and Redis.
If we’re looking into graph databases, AWS offers Amazon Neptune.
Communication Between Services
The next hurdle we want to overcome is asynchronous communication between services.
The orchestration of all this communication can be done with Amazon SNS and Amazon SQS.
SNS is our distributed publish-subscribe system. We would use this service to implement push notifications through email and SMS.
Like RabbitMQ, SQS is our distributed queuing system. We can store data for future processing in a queue.
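Here's a small boto3 sketch of both patterns; the topic ARN and queue URL are placeholders:

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Publish-subscribe: every subscriber (email, SMS, SQS, Lambda, ...) gets a copy
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:order-events",  # assumed topic
    Message="order 1001 created",
)

# Queuing: producers enqueue work; a consumer polls, processes, then deletes
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/order-worker"  # assumed queue
sqs.send_message(QueueUrl=queue_url, MessageBody="process order 1001")

messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1).get("Messages", [])
for msg in messages:
    print("working on:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```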
Another very useful service for this kind of orchestration is AWS Step Functions. This is especially handy for managing workflows with lots of sequential steps.
Analytics
Now, let’s handle the analytics portion (e.g. big data, machine learning).
We can perform standard SQL queries and lookups on data stored in S3 using a service called Amazon Athena. It’s fairly cost-effective since we can avoid hosting a dedicated data warehouse like Amazon Redshift, which is a more expensive (but more powerful) data store for analytics. We can keep large amounts of data in S3 and query it with Athena.
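For instance, a query over logs in S3 is one asynchronous call; the database, table, and results bucket here are hypothetical:

```python
import boto3

athena = boto3.client("athena")

# Run standard SQL directly against files in S3; results land in another S3 location
response = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM access_logs GROUP BY page",
    QueryExecutionContext={"Database": "analytics"},                    # assumed database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # assumed bucket
)
print(response["QueryExecutionId"])  # poll this ID for status and results
```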
In conjunction with Athena, we can use Amazon QuickSight, which is like Tableau, but for data sources that exist in AWS. We can graph and filter data as well as easily build out dashboards.
If we’re looking to perform map-reduce operations, we can use Amazon EMR, or Elastic MapReduce. This tool uses Apache Spark, Apache Hive, Presto, and other open-source big data frameworks to process and analyze large amounts of data.
To handle the machine learning side of things, we can look into Amazon SageMaker, which allows us to build, train, and deploy machine learning models.
Monitoring
Monitoring is often neglected by those who are just starting out.
This is how we ensure our service stays reliable and healthy enough to serve traffic.
We can use a service called Amazon CloudWatch for monitoring. CloudWatch has many capabilities: we can set up dashboards, graphs, and alarms on the metrics that establish the health of our resources, and we can view our logs and aggregate them in a way that is helpful to us. We can also schedule cron-like jobs in CloudWatch that trigger other services (e.g. run a Lambda function every hour).
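As a sketch, an alarm on API latency might look like this; the API name and alert topic are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average API latency exceeds 500 ms over a 5-minute window
cloudwatch.put_metric_alarm(
    AlarmName="api-high-latency",
    Namespace="AWS/ApiGateway",
    MetricName="Latency",
    Dimensions=[{"Name": "ApiName", "Value": "my-api"}],  # assumed API name
    Statistic="Average",
    Period=300,                   # seconds
    EvaluationPeriods=1,
    Threshold=500.0,              # milliseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # assumed SNS topic
)
```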
Finally, we can use AWS CloudTrail to audit who is doing what in our AWS stack. This is especially useful in an ecosystem with many AWS users who are constantly accessing resources day to day.
Security
Security is also often neglected by those who are just starting out.
If an attacker were to get hold of our AWS credentials, they could deploy expensive services that our bank account would not enjoy.
We will need to ensure that our application is secure, both in terms of credentials (e.g. access tokens) and vulnerabilities that exist in our system (e.g. open ports).
The first service we want to use is Amazon VPC, or Virtual Private Cloud, to lock down our AWS resources (similar to VLANs).
With VPC, we can build a “digital firewall” around our AWS ecosystem. We can define inbound and outbound traffic rules. For instance, we could allow traffic only on certain ports and from certain machines. We essentially define when and how applications can access and retrieve data.
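A concrete (if simplified) example of such a rule is a security group that only admits HTTPS; the VPC ID is a placeholder:

```python
import boto3

ec2 = boto3.client("ec2")

# A security group acts as a virtual firewall for resources in our VPC
group = ec2.create_security_group(
    GroupName="web-servers",
    Description="Allow HTTPS from anywhere, nothing else inbound",
    VpcId="vpc-0123456789abcdef0",  # hypothetical VPC
)

# Inbound rule: TCP 443 from any address; all other inbound traffic stays blocked
ec2.authorize_security_group_ingress(
    GroupId=group["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)
```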
The second service to use is AWS IAM, or Identity and Access Management. IAM defines which resources users and AWS entities have access to. For instance, we can specify that a Lambda function only has access to a single DynamoDB table. There are lots of concepts in IAM that are helpful to learn (e.g. users, roles, permissions, policies).
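For example, restricting a Lambda's execution role to a single table could be sketched like this; the role and table names are hypothetical:

```python
import json
import boto3

iam = boto3.client("iam")

# Inline policy granting access to ONE DynamoDB table and nothing else
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Users",  # assumed table
    }],
}

iam.put_role_policy(
    RoleName="my-lambda-role",      # assumed existing execution role
    PolicyName="users-table-only",
    PolicyDocument=json.dumps(policy),
)
```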