AWS Essentials 2018
- AWS CLI
- Identity and Access Management (IAM)
- Simple Storage Service (S3)
- S3 Lifecycle Rules
- CloudFront
- Storage Gateway
- Snowball
- S3 Transfer Acceleration
- EC2
- EBS (Elastic Block Storage)
- Load Balancing
- CloudWatch
- CloudTrail
- AWS CLI
- Elastic File System (EFS)
- Lambda
- Route 53 & DNS
- ELBs and IP Addresses
- Databases
- Aurora
- Microsoft SQL Server
- Amazon Virtual Private Cloud (VPC)
- VPC Flow Logs
- Application Services
- Best Practices
- Well Architected Framework (WAF)
- AWS Organizations
- Cross Account Access
- Consolidated Billing
- Tags
- Resource Groups
- VPC Peering
- Direct Connect
- Security Token Service (STS)
- Workspaces
- Elastic Container Service (ECS)
- Amazon EC2 Container Registry (ECR)
- Security
- Support Levels
- AWS Trusted Advisor
- Elastic MapReduce
- TODO
- Reserved Instances
AWS CLI
# install
pip install awscli
# authenticate
# create user on console to get Access Key Id/Secret
aws configure
Identity and Access Management (IAM)
- IAM is global; it applies to all regions
- The initial account is called root and has global access.
- Creating New Users
- New users have no permissions when first created by default
- New users are assigned:
- Access Key ID
- Secret Access Keys
- The above two should be saved immediately (e.g. CSV download) since they cannot be retrieved afterwards
- Structure
- Users
- Groups: Set of users so that policies can be applied collectively
- Roles: Resources (e.g. EC2) rather than user-level access
- Policy Documents: The actual permissions (e.g. Allow, Deny) specified in JSON. (!)
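Policy documents are plain JSON. A minimal sketch of the Version/Statement/Effect/Action/Resource structure (the bucket name is hypothetical, not from these notes):

```python
import json

# A minimal, hypothetical policy allowing read-only access to one S3
# bucket; real policies are attached to users, groups, or roles.
policy_json = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
"""
policy = json.loads(policy_json)
```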
Simple Storage Service (S3)
- Buckets live in a universal namespace for all AWS accounts and users
- The primary interface is HTTP (e.g. a successful upload returns a 200)
- Storage Classes apply per object, not per bucket: (!)
- Standard: 99.99% availability and 99.999999999% (11 nines) durability. (!)
- Standard-IA (Infrequent Access): 99.9% availability
- One Zone-IA: 99.5% availability + potential for 100% of data loss if zone is destroyed
- RRS (Reduced Redundancy Storage): 99.99% availability and durability. 0.01% expected object loss over a year. (!)
- Security (!)
- In-Transit (SSL/TLS)
- Client Side Encryption
- Server Side Encryption
- AWS Managed Keys (SSE-S3)
- Based on AES-256
- KMS (SSE-KMS)
- It includes an additional envelope key
- It provides auditing on key usage
- Customer Provided Keys (SSE-C)
- Access control
- ACL
- Bucket Policies
- Bucket Properties
- Key/Value pairs
- VersionID
- Metadata
- Access Control Lists
- Access is Private by default
- MFA may be used to prevent accidental deletion of files
- Versioning
- Stores all writes/deletes to an object
- It can’t be disabled; only suspended
- It integrates with life cycle rules
- It can also integrate with MFA
- Replication
- Versioning must be enabled on both source and destination buckets
- Replication, once enabled, only applies to new files, not existing ones (these must be copied manually)
- Delete markers are replicated but the deletion of the delete markers themselves is not
- Normal bucket URL syntax: `s3-region.amazonaws.com/bucket`
- Static Web Site Hosting
- Selected via Properties after creating a regular bucket
- URL format for a bucket called garba-static: http://garba-static.s3-website.eu-west-2.amazonaws.com
- Syntax: `bucket.s3-website-region.amazonaws.com` (!)
- Bucket names should use registered domain names for static web hosting
- index.html and error.html documents may be defined
- Consistency Model (!)
- Read-after-write consistency for PUTS of new objects
- Eventual consistency for overwrite PUTS & DELETES
- Access Control List identifies accounts with an email address or the canonical user id (!) http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
# list buckets
aws s3 ls
# list a bucket's files
aws s3 ls bucket1
# copy one bucket to another
aws s3 cp --recursive s3://bucket1 s3://bucket2
# copy static website under ./site to s3 and make public
aws s3 cp --recursive --acl public-read _site/ s3://garba-static
# dealing with InvalidRequest errors (specify the bucket's region)
aws s3 cp s3://sao_paulo_bucket/cowboy.jpg /tmp/ --region eu-west-1
S3 Lifecycle Rules
- Lifecycle management works with and without versioning
- It can be applied to both current and previous versions
- It allows:
- Transitioning objects to Standard-IA (min 30 days)
- Archiving to Glacier (min 30 days)
- Permanently deleting objects
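Lifecycle rules are expressed as JSON (this is the shape passed to `aws s3api put-bucket-lifecycle-configuration`). A sketch combining the three actions above; the day counts and rule ID are illustrative, not AWS minimums:

```python
import json

# Illustrative lifecycle configuration: transition to Standard-IA
# after 30 days, to Glacier after 60, delete after 365.
lifecycle_json = """
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 60, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }
  ]
}
"""
lifecycle = json.loads(lifecycle_json)
```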
CloudFront
- It is a Content Delivery Network (CDN)
- An Edge location is where content will be cached
- Edge locations are not just READ only. They can be written to.
- Origin: A CDN has an origin which provides the files that the CDN will distribute:
- A S3 Bucket (most likely)
- An EC2 Instance
- An Elastic Load Balancer
- Route 53
- Distribution: A collection of Edge Locations for a given CDN configuration. There are two types:
- Web Distribution (for web sites)
- RTMP (for media streaming)
- Objects are cached for the life of the TTL (Time to Live)
- Objects can be purged/cleared explicitly too (but it has a cost)
- AWS WAF can be used to secure an edge location
Storage Gateway
- A virtual appliance (e.g. a VM image for ESXi or Hyper-V) installed on-site
- It replicates data to S3 and Glacier
- Connection Patterns
- Storage Gateway -> Internet -> S3 -> …
- Storage Gateway -> Direct Connect -> S3 -> …
- Storage Gateway -> Amazon VPC -> S3 …
- Types of Storage Gateway
- File Gateway
- Based on NFS
- Files are stored on S3 with their ownership, permissions, and timestamps
- Once transferred to S3 they become native S3 objects
- Volume Gateway (iSCSI): based on block storage (data is not stored as native S3 objects) (!)
- Stored Volumes
- Data written locally at high-speed
- Data synchronised at a block level asynchronously to S3 and then saved as EBS snapshots
- From 1GB to 16TB
- Cached Volumes
- Primary storage is S3
- Most of storage space in the cloud and little on premise
- Frequently accessed data is cached on-premise
- Tape Gateway (VTL)
- Backup and archiving solution based on virtual tapes
- Uses existing tape backup infrastructure
- Examples: NetBackup, Backup Exec, Veeam
Snowball
- It used to be called AWS Import/Export
- Petabyte-scale data transport solution
- Secure appliance (256-bit encryption)
- Use of Trusted Platform Module (TPM)
- Amazon performs software erasure of the Snowball appliance
- It can
- Import to S3
- Export from S3
- It is necessary to enter an on-line generated unlock code onto the appliance
Types
- Regular Snowball
- The size of a briefcase
- Snowball Edge
- “A little AWS on-premise”: It can run Lambda functions
- 100TB with on-board storage and compute
- Use to move data in and out of AWS
- It supports standard storage interfaces
- Snowmobile
- It is a 45-foot long shipping container pulled by a semi-trailer truck
- For petabytes worth of data
- Can transfer up to 100PB
The Snowball client works similarly to the AWS CLI tool. Files must be copied into “buckets” that will then end up in the proper cloud bucket when Amazon gets the appliance back:
./snowball cp hello.txt s3://my_bucket
S3 Transfer Acceleration
- It uses the CloudFront Edge Network to accelerate uploads to S3
- A different URL is used rather than the regular S3 bucket one
- Upload is to local edge node
- Amazon then transfers from the edge node to the actual s3 bucket
- A CloudFront-powered URL is created such as
rato-accelerate.s3-accelerate.amazonaws.com
EC2
- Amazon Elastic Compute Cloud (Amazon EC2)
- Pricing Options
- On-Demand
- By the second for Linux and by the hour for Windows
- Flexible for unpredictable workloads
- Reserved
- Discount on the hourly charge with a 1-3 year commitment
- Steady state/predictable usage
- Sub-types
- Standard RIs (Up to 75% off)
- Convertible RIs (Up to 54% off)
- Allows changing the machines’ properties provided the new configuration is of equal or greater value
- Scheduled RIs (applicable to a time window)
- Spot
- A bidding scheme for buying discounted compute
- For applications that are only feasible at very low compute prices
- If Amazon terminates the instance, the customer will not be charged for the partial hour of usage in which the termination took place. However, if the customer terminates the instance, the charge will apply for the entire hour.
- Dedicated Hosts
- Physical dedicated EC2 Host
- Non multi-tenant
- Adequate for regulatory and/or licensing constraints
- Can be purchased on-demand too
- Instance Types
- F1 - Field Programmable Gate Array (FPGA) for Genomics research, financial analytics, real-time video processing, etc.
- I3 - High Speed Storage for NoSQL DBs, Data Warehousing, etc.
- G3 - Graphics Intensive for video encoding, 3D Application streaming, etc.
- H1 - High Disk Throughput for MapReduce-based workloads, distributed file systems such as HDFS and MAPR-FS
- T2 - Lowest Cost, General Purpose for web servers, small DBs
- D2 - Dense Storage for file servers, data warehousing, Hadoop, etc.
- R4 - Memory Optimized for memory intensive apps
- M5 - General purpose for application servers
- C5 - Compute optimized for CPU intensive apps/DBs/etc
- P3 - Graphics/General Purpose GPU for machine learning, Bitcoin mining, etc.
- X1 - Memory Optimized for SAP HANA, Apache Spark, etc.
- Mnemonics (FIGHT DR MAC PX)
- F: FPGA
- I: IOPS
- G: Graphics
- H: High Disk Throughput
- T: Trashy and cheap general purpose
- D: Density
- R: RAM
- M: Main choice for general purpose apps
- C: Compute
- P: Pics (Graphics)
- X: Xtreme Memory
- Termination Protection is turned off by default
- The root EBS volume is deleted by default when the EC2 instance is terminated
- EBS-backed root volumes may be encrypted as of 2018
- Virtualisation types (!)
- Para-Virtual (PV)
- Hardware Virtual Machine (HVM)
Security Groups
- A Security Group is a Virtual Firewall for EC2 instances
- An EC2 instance may have multiple security groups
- A security group may be applied to multiple instances
- Changes to Security Groups take effect immediately
- Security Groups are Stateful: Inbound traffic is allowed back out again
- Security Groups cannot block specific IP addresses
- It is not possible to deny traffic. It is a whitelist
- All outbound traffic is allowed by default (!)
Recipe for running Apache on an existing EC2 instance
# after downloading key, remove access to group and others
chmod 400 myEC2.pem
# ssh into EC2 instance
ssh ec2-user@4.8.23.237 -i myEC2.pem
# update packages on Linux AMI instance
sudo yum update -y
# install apache
sudo yum install httpd -y
# create page
echo "Hello World" > /var/www/html/index.html
# start httpd
sudo service httpd start
# always start at reboot
sudo chkconfig httpd on
AWS CLI on EC2
# Get EC2 Instances (including terminated ones)
$ aws ec2 describe-instances
# Get instance Ids
$ aws ec2 describe-instances | grep InstanceId
# terminate an instance by id
aws ec2 terminate-instances --instance-ids i-0090856f1626a0928
Get Metadata (!)
curl http://169.254.169.254/latest/meta-data/
curl http://169.254.169.254/latest/user-data/
Placement Groups
- Clustered Placement Group (the default “Placement Group”)
- It is for placing EC2 instances within the same availability zone (!)
- Not all instances can be launched into a Clustered Placement Group
- Spread Placement Group
- Each instance lands on distinct underlying hardware
- General (all)
- The name should be unique within the AWS account
- Placement groups can’t be merged
- Existing instances cannot be moved into a placement group
EBS (Elastic Block Storage)
- Storage volumes attached to EC2 instances
- First volume (where the OS runs) is known as the root device volume
- Types
- GP2 (General Purpose SSD)
- Balance between price and performance
- Ratio of 3 IOPS per GiB, up to 10,000 IOPS
- Ability to burst up to 3,000 IOPS for extended periods for volumes under 1 TiB
- IO1 (Provisioned IOPS SSD)
- For I/O intensive applications such as large relational or NoSQL databases
- Useful if more than 10,000 IOPS is required
- Up to 20,000 IOPS may be provisioned per volume
- ST1 (Throughput Optimized HDD, Magnetic)
- Big data, data warehouses, log processing
- Cannot be a boot volume
- SC1 (Cold HDD, Magnetic)
- Lowest cost storage for infrequently accessed workloads
- File Server
- Cannot be a boot volume
- Standard (Magnetic)
- Lowest cost per GB for a bootable drive
- Ideal for infrequently accessed data
- Volumes exist on EBS
- Volumes and EC2 need to be in the same availability zone
- 1 EBS volume : 1 EC2 instance
- It is preferable to create roles for EC2 instances to access other resources (such as S3) rather than relying on the access key and secret. Such a role may be assigned after an instance has been created
- Detaching rules (!)
- If it is a root volume, it can’t be detached without stopping the instance first
- If it is a non-root volume, it may be detached
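The GP2 figures above imply a simple baseline formula: 3 IOPS per GiB, capped at 10,000. A sketch of that arithmetic (the 100 IOPS floor for very small volumes is an assumption from AWS docs, not stated in these notes):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """GP2 baseline: 3 IOPS per GiB, floored at 100, capped at 10,000."""
    return max(100, min(3 * size_gib, 10_000))

# e.g. a 100 GiB volume gets a 300 IOPS baseline
```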
RAID and EBS
RAID stands for Redundant Array of Independent Disks
- RAID 0 (Striped)
- RAID 1 (Mirrored, Redundancy)
- RAID 5 (Good for reads, bad for writes) - Not recommended by AWS
- RAID 10 - Striped & Mirrored, Good Redundancy, Good Performance
Snapshots
- Snapshots exist on S3 (though not in any user-accessible bucket)
- Snapshots are point in time copies of Volumes
- Snapshots are incremental (only deltas stored in S3)
- AMI images can be created out of snapshots
- Snapshots are the objects that can travel from region to region
- Snapshots of encrypted volumes are encrypted automatically
- Volumes from encrypted snapshots are encrypted automatically
- Only unencrypted snapshots may be shared (to other AWS accounts or made public)
- Amazon recommends stopping an instance before taking a snapshot of its root volume
- On the CLI:
aws ec2 create-snapshot
EBS vs Instance Store
- EBS volumes are created from an EBS snapshot. EBS is essentially network attached storage
- An Instance Store volume is created from a template stored in Amazon S3
- EBS can be preserved upon instance termination unlike Instance Store
AMIs
- They are regional
Load Balancing
- 3 Types of Load Balancers
- Application Load Balancers (Layer 7)
- Network Load Balancers (Layer 4)
- Classic Load Balancers (ELB)
- 504 error: the gateway has timed out
- X-Forwarded-For Header: It is a mechanism for the load balancer to identify the original requestor’s IP address.
CloudWatch
- Features
- Dashboards allow custom visualisation
- Alarms create notifications when particular thresholds are hit
- Events react to changes in the state of AWS resources
- Logs help aggregate logs—it requires an agent to be installed.
- EC2 Metrics (Out of the Box) (!)
- Disk
- Network
- CPU
- Monitoring Types
- Standard = 5 minutes
- Detailed = 1 minute (extra price)
- CloudTrail is for auditing (e.g. user john created an S3 bucket) rather than monitoring and it is not the same as CloudWatch
CloudTrail
AWS CLI
$ aws configure
$ cd ~/.aws
$ ls -la
Elastic File System (EFS)
- It is a storage service for EC2
- Storage capacity is elastic
- It doesn’t need to be pre-provisioned (e.g. like EBS volumes)
- Supports NFS (NFSv4)
- Pay per use
- It can be mounted by multiple EC2 instances
- Data is stored across multiple AZs within a region
- Read After Write Consistency
Lambda
General Points
- Event-driven with multiple trigger sources including HTTP
- Maximum duration is 5 minutes
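A minimal Python handler sketch: Lambda calls the function with the trigger's event payload and a context object (the `name` field here is a made-up example payload, not a Lambda convention):

```python
# Minimal Lambda handler: Lambda invokes handler(event, context) on
# each trigger; the return value goes back to the caller (e.g. an
# API Gateway proxy integration expects statusCode/body).
def handler(event, context):
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}
```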
Triggers
- API Gateway
- AWS IoT
- Alexa Skills
- Alexa Smart Home
- CloudFront
- CloudWatch Events
- CloudWatch Logs
- CodeCommit
- Cognito Sync Trigger
- DynamoDB
- Kinesis
- S3
- SNS
Languages
- C#
- Java
- Node
- Python
Route 53 & DNS
The name originates from the DNS port, 53. An apex record is one at the root of a DNS zone; these are also known as naked domains. There is a default limit of 50 domains, which can be raised by contacting AWS support.
Top Level Domains
* Domains such as .com, .edu, .gov
* Controlled by the Internet Assigned Numbers Authority (IANA)
* Database at http://www.iana.org/domains/root/db
Domain Registrars
* They can assign domain names under one or more top-level domains
* They are registered with InterNIC, a service of ICANN
* Each domain name is registered in the WHOIS database
Start Of Authority Record (SOA)
* The server that supplied the data for the zone
* The zone's administrator
* The current version of the data file
* The default number of seconds for the time-to-live (TTL) on resource records
Name Server Records
Name Server Records (NS) are used by Top Level Domain servers to point to the authoritative DNS that holds the DNS records.
Example of a NS record pointing to Amazon set up at a Registrar (e.g. GoDaddy)
mydomain.com. 86400 IN NS ns.awsdns.com
Common Record Types
- The Address (A) record translates a domain name to an IP address. For example www -> 192.34.34.1
- The Canonical Name (CName) record resolves one name into another. For example www2 -> www
- The Start of Authority (SOA) record defines the boundary for which the DNS server is responsible
- The Name Server (NS) record defines the server(s) that resolve names for a domain
- The Mail Server (MX) record defines the location of the mail server for a given domain
- The Alias record type is unique to Route 53 and maps names to AWS resources such as S3 buckets
- The PTR record is used for reverse DNS lookups
ELBs and IP Addresses
- ELBs do not have pre-defined IPv4 addresses; they are resolved using DNS names
Routing Policies
- Simple
- One single record with multiple IP addresses
- If more than one, all values are returned to the user in random order
- The returned value will be cached so it may behave in a sticky manner
- Weighted
- A percentage of traffic goes to one region, a percentage to another
- For example: 20% eu-east1, 80% sa-east-1
- Set ID must be unique
- Latency
- It routes traffic based on the lowest network latency for the end user
- It uses a latency resource record set in each region associated with the EC2 or ELB resource
- Set ID must be unique
- Failover
- Useful to create an active/passive setup
- A health check can transition from one set of IPs to another
- Geolocation
- It chooses hosts or IPs based on the location of the requesting users
- It has continent-wise and country-wise granularity
- It has a catch-all location called default, identified by `*`
- Multivalue Answer
- Multiple resources per host
- Each host can have a health check
- Up to 8 records
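The Weighted policy above can be pictured as a weighted random choice over record sets. A toy simulation (the endpoint names are made up; real Route 53 weighting happens at resolution time):

```python
import random

# Toy model of a Weighted routing policy: with the weights below,
# roughly 20% of resolutions return one record set and 80% the other.
def resolve(weights, rng=random):
    endpoints, w = zip(*weights.items())
    return rng.choices(endpoints, weights=w, k=1)[0]

weights = {"eu-east-1.example.com": 20, "sa-east-1.example.com": 80}
```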
Databases
- Online Transaction Processing (OLTP) is about row-level transactions (e.g. “Set Order #34234 Delivery Status to Shipped”).
- Online Analytics Processing (OLAP) is about multi-row calculation (e.g. Sum of sold goods).
Elasticache
Amazon managed-service for in-memory caching:
- Memcached
- Redis
Amazon RDS
An OLTP offering:
- SQL Server
- MySQL
- MariaDB
- PostgreSQL
- Aurora (MySQL or PostgreSQL wire compatible)
- Oracle
mysql -u ernie -p -h mydb.cugrv9uf52uw.eu-west-2.rds.amazonaws.com -D my_database
- Replicating from the primary RDS instance to the secondary one is free
- There is no need to specify ports when adding a rule to a RDS security group
- I/O operations are suspended for the duration of the snapshot https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CreateSnapshot.html
For provisioned IOPS SSD Storage the following ranges apply:
Databases | IOPS | Storage |
---|---|---|
MariaDB, MySQL, PostgreSQL | 1k-40k | 100GiB-16TiB |
SQL Server Web/Express | 1k-32k | 100GiB-16TiB |
SQL Server Standard/EE | 1k-32k | 20GiB-16TiB |
Oracle | 1k-40k | 100GiB-32TiB |
The exact IOPS behaviour has caveats. More info at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#USER_PIOPS
Automated Backups
- They can recover data to any point in time within a retention period.
- The retention period is between 1 and 35 days. (!)
- Enabled by default (!)
- Backup data is stored in S3
- Free storage is equal to the size of the database
- Backups take place during a defined window which may be changed to minimise performance impact
- The change of a backup window takes place immediately (!)
- Restoring a backup leads to a new DNS endpoint since a new instance is created
- Snapshots, unlike automated backups, are done manually (user-initiated)
Encryption
- Encryption at rest is supported for nearly all non-free tier RDS databases.
- An existing unencrypted DB instance cannot be encrypted in place
- Snapshots may be encrypted though and then new encrypted instances created out of them
- Encryption is done using the AWS Key Management (KMS) Service
Multi-AZ Replication
- Multi-AZ allows changes to be replicated from one read/write replica in one availability zone to read-only replicas in others, for disaster recovery purposes.
- Multi-AZ is for DR only and not performance.
- The Multi-AZ built-in capability does not provide direct access to the replicas.
- A failover can be forced for RDS instances that have Multi-AZ configured
Read Replicas
- Up to 5 read replicas can be set up in production by default
- Read replicas may be both in different availability zones and regions
- It is based on asynchronous replication
- It is not available for Oracle or SQL Server
- It is for performance only, not DR!
- Both Multi-AZ and multiple replicas capabilities can be applied concurrently. They are not mutually exclusive.
DynamoDB
- A NoSQL offering
- Uses SSD storage
- Spread across 3 geographically distinct data centres
- Provisioned Throughput Capacity
- Write $0.0065 per hour for every 10 units
- Read $0.0065 per hour for every 50 units
- Storage costs of $0.25/GB per month
- It does not allow selecting the availability zone (!)
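The throughput prices above can be turned into a rough monthly estimate. A sketch of that arithmetic, assuming a 730-hour month and billing in blocks of 10 write / 50 read units (the function name is mine):

```python
from math import ceil

def monthly_cost(write_units, read_units, storage_gb, hours=730):
    """Rough DynamoDB monthly cost from the per-block prices above:
    $0.0065/hr per 10 write units, $0.0065/hr per 50 read units,
    plus $0.25 per GB-month of storage."""
    hourly = 0.0065 * ceil(write_units / 10) + 0.0065 * ceil(read_units / 50)
    return hourly * hours + 0.25 * storage_gb
```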
Consistency Model
- Eventually Consistent Reads (Default)
- Changes are propagated within 1 second
- Strongly Consistent Reads
- The result reflects all writes
Redshift
An OLAP offering.
- Single Node (160 GB)
- Multi-Node
- Leader Node (manages client connections)
- Compute Node (store data and perform queries and computations). Up to 128 Compute Nodes
- Column-based system (sequential data storage)
- Massively Parallel Processing (MPP) capability
- Pricing
- Compute Node Hours
- Leader Node is not Charged
- Encrypted in transit using SSL
- Encrypted at rest using AES-256 encryption
- Currently available only in 1 AZ
- Can restore snapshots to new AZs in the event of an outage
- The block size for its columnar storage is 1024 KB (1 MB)
Aurora
- 2 copies of data are kept in each availability zone
- Since there are 3 availability zones, there are 6 copies in total (!)
- 2 copies of data may be lost without affecting write availability
- 3 copies may be lost without affecting read availability
- Aurora Replica Types (2)
- Aurora Replicas (15)
- MySQL Read Replicas (currently 5)
Microsoft SQL Server
- Storage is fixed and cannot be increased (!)
Amazon Virtual Private Cloud (VPC)
Amazon VPC lets you provision a logically isolated section of the AWS cloud and its network, so that resources can be secured and grouped into trust areas.
- 5 VPCs are allowed in each region by default
- Hardware Virtual Private Network (VPNs) may be created between a corporate datacentre and a VPC so that AWS becomes an extension to the corporate data centre.
1 Subnet = 1 Availability Zone
- VPC Components
- A connection method:
- Internet Gateway
- Virtual Private Gateway
- A router
- Route Table
- Network ACL
- Private and Public subnet(s)
- Security Group
- Resources secured using the Security Group
- General Capabilities
- Launching instances into a specific subnet
- Assign custom IP addresses to ranges in each subnet
- Configure route tables between subnets
- Attach an Internet gateway to a VPC
- Establish network access control lists (ACLs) across subnets
- Peering
- VPCs can be interconnected using a direct network route
- VPCs can be peered with other AWS accounts as well as with other VPCs within the same account
- Default VPC
- All subnets in default VPC have a route out to the internet
- Each EC2 instance has both a public and private IP address
- No transitive peering
- An Internet Gateway can only be attached to one VPC at a time.
- Security Groups exists at the VPC Level
- Subnets are associated with only one Network ACL
Subnets within a VPC can communicate with each other by default across availability zones (!)
ELBs and VPCs
- ELBs can only operate on public subnets
- Public subnets must have an Internet Gateway attached to them
- At least two subnets must be specified
Subnet Ranges
CIDR Prefix | First IP | Last IP | Total |
---|---|---|---|
(10/8) | 10.0.0.1 | 10.255.255.255 | 16,777,216 |
(172.16/12) | 172.16.0.1 | 172.31.255.255 | 1,048,576 |
(192.168/16) | 192.168.0.1 | 192.168.255.255 | 65,536 |
http://cidr.xyz/
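The totals in the table follow directly from the prefix length (2^(32-prefix)); Python's `ipaddress` module can verify them:

```python
import ipaddress

# Address counts for the three RFC 1918 private ranges above.
for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"):
    net = ipaddress.ip_network(cidr)
    print(cidr, net.num_addresses)
```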
Creating a new VPC
Creating a new VPC results in the automatic creation of:
- Default Route
- Default Network ACL
- Default Security Group
Unavailable IPs
For example, in a subnet with CIDR block 10.0.0.0/24, the following five IP addresses are reserved:
- 10.0.0.0: Network address.
- 10.0.0.1: Reserved by AWS for the VPC router.
- 10.0.0.2: Reserved by AWS. The IP address of the DNS server is always the base of the VPC network range plus two; however, we also reserve the base of each subnet range plus two. For VPCs with multiple CIDR blocks, the IP address of the DNS server is located in the primary CIDR. For more information, see Amazon DNS Server.
- 10.0.0.3: Reserved by AWS for future use.
- 10.0.0.255: Network broadcast address. We do not support broadcast in a VPC, therefore we reserve this address.
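For any subnet, the five reserved addresses can be derived from the CIDR block; in a /24 that leaves 251 usable addresses. A small sketch using the standard `ipaddress` module:

```python
import ipaddress

def reserved_addresses(cidr):
    """The 5 addresses AWS reserves in every subnet: the network
    address, router (+1), DNS (+2), future use (+3), and broadcast."""
    net = ipaddress.ip_network(cidr)
    base = net.network_address
    return [str(base + i) for i in range(4)] + [str(net.broadcast_address)]

def usable_hosts(cidr):
    return ipaddress.ip_network(cidr).num_addresses - 5
```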
NAT Instances
NAT Instances are AMI virtual machines that work as a NAT router.
- On EC2/Network Settings, the Source/Destination Check option must be disabled for the NAT instance to work
- NAT instances must be deployed in a public subnet
- The instance size affects performance
- Autoscaling Groups are necessary for high availability in multiple subnets
- They are restricted by a security group
- Route tables must be updated
NAT Gateway
A NAT gateway is a cloud native managed service rather than a user-managed EC2 instance
- It scales automatically up to 10 Gbps
- No need to patch
- Not associated with security groups
- It gets an IP address automatically
- Route tables must be updated
- They must be deployed in multiple AZs for high availability
- No need to disable the source/destination check
Network ACL (NACL)
- Since NACL is stateless, both inbound and outbound rules must be created for regular TCP services like HTTP
- Default NACLs allow all outbound and inbound traffic
- New private NACLs have all inbound and outbound rules denied by default
- Amazon recommends rule numbers to be in increments of 100
- There is a many-to-one (*:1) relationship between subnets and NACLs
- If a NACL isn’t specified, a subnet will be associated with the default NACL
- NACLs are evaluated in order
- Because of protocols that use ephemeral ports (e.g. FTP), a rule allowing traffic on ports 1024-65535 is typically defined as an outbound rule.
- NACLs allow blocking IP addresses, unlike Security Groups
VPC Flow Logs
It allows capturing information about IP traffic going to and from network interfaces in a VPC using Amazon CloudWatch.
They can be created at three levels:
- VPC
- Subnet
- Network Interface Level
General
- Flow logs can only be enabled for VPCs under one’s account. This is important for “peered” VPCs
- Flow logs can’t be tagged
- Once created the configuration can’t be changed. They must be deleted and created again.
- Not all IP traffic is monitored:
- Traffic to the Amazon DNS Server (rather than a user-provided one)
- Traffic generated by Windows instances for license activation
- Traffic for metadata access to 169.254.169.254
- DHCP traffic
- Traffic to the reserved IP address for the default VPC router
Endpoints
Two types
- Elastic Network Interface (ENI) serves as an entry point for traffic destined to a service
- A gateway endpoint serves as a target for a route in one’s route table for traffic destined for the service
Internet Gateway
Only 1 Internet Gateway can be attached to a VPC
Application Services
Simple Queue Service (SQS)
- Oldest Amazon Service
- Decouples producers from consumers
- It is a pull rather than a push service
- Messages can be up to 256 KB in size
- Messages may be kept in the queue from 1 minute to 14 days
- Default retention period is 4 days
- Visibility Timeout: the amount of time the message becomes invisible whilst being picked up by a reader client
- Default timeout: 30 seconds (may be increased)
- Maximum timeout: 12 hours
- There is a long polling mechanism which allows a consumer to wait until a message arrives in the queue.
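The visibility-timeout behaviour can be sketched with a toy in-memory queue. This is an illustration of the mechanism, not the SQS API; times are passed explicitly so the behaviour is deterministic:

```python
class ToyQueue:
    """Toy model of the SQS visibility timeout (not the real AWS API)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.messages = []  # each entry: [body, invisible_until]

    def send(self, body):
        self.messages.append([body, 0.0])

    def receive(self, now):
        # Return the first visible message and hide it for the
        # duration of the visibility timeout.
        for msg in self.messages:
            if msg[1] <= now:
                msg[1] = now + self.visibility_timeout
                return msg[0]
        return None

    def delete(self, body):
        self.messages = [m for m in self.messages if m[0] != body]
```

A message received at t=0 stays invisible until t=30; if the consumer never deletes it, it becomes receivable again after the timeout.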
Simple Workflow Service (SWF)
- It is a solution for workflows that involve human or human-like interaction
- It ensures that tasks are only assigned once
- It keeps track of the application state without requiring a user-provided application for this purpose
- Maximum workflow can be 1 year
- Concepts
- Domains: Domains scope related activity types, tasks lists, and so on.
- Workers: Programs that interact with SWF to get tasks, process received tasks, and return results.
- Deciders: The decider is a program that controls the coordination of tasks (their ordering, concurrency, scheduling, etc)
- Differences with SQS
- It is task-oriented rather than message-oriented
- It ensures that tasks are assigned only once and never duplicated
- It ensures that tasks are processed only once
- It keeps track of all tasks and events in an application
Queue Types
- Standard
- Nearly unlimited number of TPS
- At least once delivery
- No ordering guarantees (best effort)
- No once and only once guarantee
- FIFO Queues
- Limited to 300 TPS
- Once and only once guarantee
- No duplicates
- Strict order
Amazon SNS
- Unlike SQS, SNS is a push system
- A capability to send notifications to users (sms, e-mail, etc.)
- Devices/Subscriber types:
- Apple
- Fire OS
- Windows devices
- Baidu Cloud Push
- SMS-Text
- SQS
- HTTP endpoints
- Lambda functions
- Messages are stored redundantly across multiple AZs
- Concepts:
- Topics
- Subscriptions
- Subscribers to Topics
- SNS Pricing
- $0.50 per 1 million SNS requests
- $0.06 per 100,000 notification deliveries over HTTP
- $0.75 per 100 notification deliveries over SMS
- $2.00 per 100,000 notification deliveries over Email
Elastic Transcoder
- Media files (MP3, MP4, etc.)
- Presets for popular formats
API Gateway
- API Caching with TTL
- Security (Auth, etc)
- Throttling
- Cloudwatch hooks for request logging
- Cross-Origin Resource Sharing (CORS)
Kinesis
- Data that is generated continuously
- Numerous but small data chunks/events
- Use Cases/Examples:
- Stock prices
- Game data (as player moves)
- Social network data
- Geospatial data
- IoT sensor data
Kinesis Streams
- It stores the data from producers (EC2, phones, IoT devices)
- Data is stored in shards
- Data retention is between 24 hrs and 7 days
- Properties:
- Reads: 5 TPS at 2MB/s
- Writes: 1000 TPS at 1MB/s
- Streams data back to consumers
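The per-shard limits above imply how many shards a stream needs. A sketch of that sizing arithmetic (the function name is mine):

```python
from math import ceil

def shards_needed(write_mb_s, write_records_s, read_mb_s):
    """Shards needed given the per-shard limits above:
    writes of 1,000 records/s at 1 MB/s, reads at 2 MB/s."""
    return max(
        ceil(write_mb_s / 1),
        ceil(write_records_s / 1000),
        ceil(read_mb_s / 2),
    )
```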
Kinesis Firehose
- It abstracts away from shards
- It can invoke Lambda so that data is processed directly
- Results can be sent directly to S3
- There is no data retention window
- Data is processed by Lambda
- Data is sent to S3
- Data is sent to an Elasticsearch cluster
- Data is sent to Redshift
Kinesis Analytics
- It abstracts away from shards
- It allows running SQL queries on incoming data
- There is no data retention window
- Data is sent to S3
- Data is sent to Redshift
- Data is sent to an Elasticsearch cluster
Simple E-Mail Service (SES)
Best Practices
Cloud Benefits
- Automation/IaaS
- Auto-scaling
- Proactive Scaling
- More efficient SDLC
- Improved Testability
- DR and BC
- “Overflow” the traffic to the cloud
Design for Failure
- Hardware will fail
- Design with automated recovery from failure in mind
- Assume higher TPS than expected
Decoupling
- Components may
- die (fail)
- sleep (not respond)
- remain busy (slow to respond)
- Consumers need to tolerate the above and continue operating as if no failure had occurred
Elasticity
- Proactive cyclic scaling: (daily, weekly, etc)
- Proactive business event-based scaling (e.g. Christmas, product launch, etc.)
- Auto-scaling based on demand: based on metrics and triggers
Security
- Web servers: public access only on ports 80/443
- SSH: open only to developers on the corporate office network
- Only the app layer may have direct access to the DB server
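Those three rules map onto security-group ingress permissions. A sketch of the `IpPermissions` payload that boto3's `ec2.authorize_security_group_ingress` accepts, shown here as plain data rather than executed against AWS (the office CIDR is an assumption):

```python
# IpPermissions payload for ec2.authorize_security_group_ingress (boto3).
# 203.0.113.0/24 stands in for the corporate office network.
WEB_INGRESS = [
    {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},        # public HTTP
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},        # public HTTPS
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},   # SSH: office network only
]

ssh_rules = [p for p in WEB_INGRESS if p["FromPort"] == 22]
print(ssh_rules[0]["IpRanges"][0]["CidrIp"])  # only the office CIDR may SSH
```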
Well Architected Framework (WAF)
Five pillars:
- Security
- Reliability
- Performance Efficiency
- Cost Optimisation
- Operational Excellence
General Design Principles
- Stop guessing your capacity needs
- Test systems at production scale
- Automate to make architectural experimentation easier
- Allow for evolutionary architectures
- Data-Driven architectures so that decisions are fact-based
- Improve through game days (simulation of production-like scenarios such as Black Friday)
Security
Design Principles
- Apply security at all layers
- Enable traceability
- Automate responses to security events
- Focus on securing your system
- Automate security best practices
AWS Shared Responsibility Model
- Customer
- Customer Data
- Platform, Applications, IAM
- Operating System, Network & Firewall Configuration
- Client-side data encryption & authentication
- Server-side encryption (File system and/or Data)
- Network traffic protection (encryption, integrity, identity)
- AWS
- Compute
- Storage
- Database
- Networking
- AWS Global Infrastructure
- Regions
- Availability Zones
- Edge Locations
Security Best Practices
The key areas are data protection, privilege management, infrastructure protection, and detective controls.
Data protection
- Start with a data classification process (public, private, confidential, CEO only, etc.)
- Implement a need-to-know only access policy
- Encrypt everything whenever possible, both data at rest (e.g. EBS, S3, RDS) and in transit (e.g. ELB, SSL)
- AWS can encrypt data and rotate keys automatically
- Use versioning to protect data against accidental modification, overwrites and deletes.
- Transferring data to a different region is never automatic; do it only as an explicit, considered decision, to avoid inadvertently breaking legislation such as the DPA 2018 in the UK.
Privilege management
- Ensure that only authorised and authenticated users (IAM) are able to access your resources by using:
- Access Control Lists (ACLs)
- Role Based Access Controls
- Password Management
- Take extreme care to protect the AWS root account credentials
- Enable MFA
- Define roles and responsibilities (R&R) for system users to control human access to the AWS Management Console and APIs
- Limit automated access to AWS resources (e.g. EC2 instance to S3 bucket)
- Devise a strategy to manage keys and credentials
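A need-to-know policy is ultimately expressed as the JSON policy documents IAM uses; a minimal read-only sketch (the bucket name is a placeholder):

```python
import json

# Least-privilege IAM policy: read-only access to a single S3 bucket.
# "example-bucket" is a placeholder name.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-bucket",     # ListBucket applies to the bucket
            "arn:aws:s3:::example-bucket/*",   # GetObject applies to the objects
        ],
    }],
}

print(json.dumps(policy, indent=2))
```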
Infrastructure protection
- Implement physical access controls on-prem:
- RFID access
- Lockable cabinets
- CCTV
- Enforce network boundary protection
- Public subnets
- Private subnets
- Enforce host-level protection
- User-based access to resources
- Access to hosts through a bastion host
- Implement AWS service-level protection
- Groups and privileges
- MFA
- Protect the integrity of OSs installed on EC2 instances
- Patching
- Anti-virus
Detective controls
- Use detective controls to detect or identify security breaches
- Use services that help carry out investigations and auditing:
- AWS CloudTrail
- Capture and analyse logs for applicable services
- Make sure it is enabled in each relevant region
- Amazon CloudWatch
- AWS Config
- Amazon Simple Storage Service (S3)
- Amazon Glacier
- AWS CloudTrail
Reliability
Design Principles
- Test recovery procedures
- Automatically recover from failure
- Scale horizontally to increase aggregate system availability
- Stop guessing capacity
Best Practices
Key areas are foundations, change management, and failure management.
Foundations
- Be mindful of the communications link between your HQ and datacentre
- Plan your network topology in advance (VPC, Subnets, etc.)
- Be mindful of the service limits set by Amazon to stop customers from over-provisioning resources (Google AWS Service Limits), for example:
- 5 VPCs per region
- 5 Internet gateways per region
- Appoint someone responsible to manage AWS service limits
- Define a path to escalate technical issues
Change Management
- Have a plan to monitor changes to all relevant AWS resources (e.g. using AWS CloudTrail)
- Instrument the detection of changes in the environment to react to them
- Choose automated solutions such as autoscaling whenever possible to adapt to changes on demand
Failure Management
- Architect your systems with the assumptions that failure will occur
- Learn from failures when they do occur and plan how to prevent them in the future
- Have a backup and recovery strategy
- Have a failure coping strategy for each component (e.g. using AWS CloudFormation)
Performance Efficiency
Design Principles
- Democratise advanced technologies (e.g. let teams consume as-a-service databases)
- Go global in minutes (e.g. multi-region services using CloudFormation)
- Use serverless architectures
- Experiment more often (since it is easy using on-demand, pay-per use services)
Best Practices
The four key areas are compute, storage, database, and space-time trade-off.
Compute
- Establish a performance monitoring mechanism
- Change the machine type (EC2) when it no longer suits one’s needs (e.g. CPU and RAM consumption)
- Change the number of instances when vertical scaling (e.g. bigger EC2 instance) is inappropriate (use Autoscaling)
- Be aware of newly emerging machine types (e.g. GPU-optimized)
- Consider moving code to Lambda whenever possible
Storage
- Keep in mind that the optimal storage solution depends on a number of factors:
- Access type
- block (e.g. raw file system) (EBS)
- individual files (S3)
- Access pattern
- Random
- Sequential
- Throughput (i.e. IOPS)
- Read Frequency
- Online
- Offline (e.g. Glacier)
- Ad-hoc
- Update/Write Frequency
- WORM (write once, read many)
- Dynamic
- Constraints/Trade offs
- Availability
- Durability
- Set a system in place to scope the storage requirement and select the most appropriate solution
- Set a system in place to learn about new storage solutions and switch to them whenever price and/or capability appropriate
- For databases
- Select the most appropriate database solution for the use case at hand (e.g. SQL vs No-SQL)
- Monitor database performance
- Capacity
- Throughput
Space-Time Trade-off
- Consider
- CloudFront for content and media caching
- ElastiCache for software object/data caching
- RDS Read Replicas for database read performance
- Direct Connect for lower and stable latency
- Devise a system to select the most appropriate proximity and caching solutions for the problem at hand
- Devise a system to measure performance and tell whether the current solution is still effective or whether a new one should be considered
Cost Optimisation
Design Principles
- Transparently attribute expenditure
- Use managed services to reduce TCO
- Trade capital expense for operating expense
- Benefit from economies of scale
Best Practices
The four key areas are: matched supply and demand, cost-effective resources, expenditure awareness, and optimizing over time.
Matched Supply and Demand
- Don’t over or under provision: align supply with demand
- Take advantage of Autoscaling and pay-per-use services like Lambda
- Use CloudWatch to keep track of actual demand
Cost-effective Resources
- Use the correct EC2 instance types
- A more powerful instance that completes its task in a few minutes may be more effective than a less powerful one that takes longer but is cheaper on a per hour basis
- Select the most appropriate cost model (e.g. reserved vs spot instances)
- Consider managed services to reduce maintenance/administration costs
- Use services such as AWS Trusted Advisor
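The faster-but-pricier trade-off above is simple arithmetic; a sketch with made-up prices, assuming per-second billing so there is no rounding up to whole hours:

```python
def job_cost(price_per_hour, runtime_hours):
    """Cost of one job run (made-up prices, per-second billing assumed)."""
    return price_per_hour * runtime_hours

# A big instance at $0.40/h finishing in 0.5 h beats a small one
# at $0.10/h that needs 3 h for the same job.
big = job_cost(0.40, 0.5)    # $0.20
small = job_cost(0.10, 3.0)  # $0.30
print(big < small)  # True
```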
Expenditure Awareness
- Set up access controls and procedures to control costs
- Use cost allocation tags to keep track of expenditure by different teams
- Set up billing alerts
- Consider consolidated billing if applicable
- Have a mechanism to decommission redundant resources
- Have a mechanism to suspend or stop resources that are temporarily not needed
- Consider data transfer charges into your architectural model
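A billing alert is a CloudWatch alarm on the `EstimatedCharges` metric, which is published only in us-east-1. A sketch of the `put_metric_alarm` parameters as plain data, not executed; the threshold and SNS topic ARN are assumptions:

```python
# Parameters for cloudwatch.put_metric_alarm (boto3), shown as plain data.
# Threshold and topic ARN are illustrative placeholders.
billing_alarm = {
    "AlarmName": "monthly-spend-over-100-usd",
    "Namespace": "AWS/Billing",
    "MetricName": "EstimatedCharges",
    "Dimensions": [{"Name": "Currency", "Value": "USD"}],
    "Statistic": "Maximum",
    "Period": 21600,                    # evaluate every 6 hours
    "EvaluationPeriods": 1,
    "Threshold": 100.0,                 # alert past $100 of estimated charges
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
}

print(billing_alarm["MetricName"])
```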
Optimizing Over Time
- Implement a mechanism to be aware of new, most cost effective and/or capable AWS services
- For example: Aurora, launched in 2014, is, in most cases, cheaper and faster than the traditional MySQL and PostgreSQL alternatives
- Consider subscribing to the AWS Blog
- Consider relying on services such as AWS Trusted Advisor
Operational Excellence
Design Principles
- Perform operations with code
- Align operations processes to business objectives
- Make regular, small, incremental changes
- Test for responses to unexpected events
- Learn from operational events and failures
- Keep operations procedures current
Best Practices
The key areas are preparation, operation, and response.
Preparation
- Use operations checklists to:
- ensure that workloads are ready for production
- prevent unintentional production promotion without effective preparations
- Make sure that workloads have:
- Runbooks: operations guidance that Ops team can refer to
- Playbooks: guidance for responding to unexpected operational events
- Escalation paths
- Stakeholder notifications
- Use AWS services for preparation
- CloudFormation for setting up environments
- Autoscaling to automatically respond to business events
- AWS Config to automatically track and respond to changes in AWS workloads and environments
- Tagging to group related resources in a workload
- AWS Service Catalogue to create a standardized set of service offerings that are aligned to best practices.
- AWS SQS to decouple systems and minimise the effects of failure
Operation
- Make sure documentation is up-to-date
- Make sure operational focus is on:
- Automation
- Small frequent changes
- Quality assurance testing
- Tracking, auditing, roll back and review mechanisms
- Logs and metrics that prove operational health
- Take advantage of AWS services
- CI/CD pipeline
- Release management processes
- Tested
- Based on incremental changes
- Using tracked versions
- With the ability to revert changes without impact
- Automate routine operations and responses to unplanned events
- Align monitoring to business needs so that responses support business continuity
Response
- Responses to unexpected operational events should be automated (e.g CloudWatch)
- Alerting should have automatic triggers for:
- Mitigation
- Remediation
- Rollback
- Recovery
- Quality assurance mechanisms should be in place to automatically roll back failed deployments.
- Responses should follow a pre-defined playbook containing:
- Stakeholders
- Escalation process (automated: e.g. SNS)
- Functional capabilities
- Hierarchical capabilities
- Procedures
AWS Organizations
An account management service that lets you consolidate multiple AWS accounts into a single, centrally managed organisation.
- It allows applying policies to the organisation’s root account
- It allows applying policies to organisation units (OUs) which encompass one or more accounts
- Service Control Policies (SCPs) are the mechanism by which policies are enforced across multiple accounts
- They override account-level IAM settings
- It helps automate AWS account creation and management
- A set of APIs allows creating accounts programmatically
- It embeds Consolidated Billing capabilities
Cross Account Access
- It allows users to “sudo” into different accounts without having to enter those accounts’ credentials
- It is useful to test functionality that is dependent on an account’s specific roles and privileges (e.g. dev privileges vs production ones)
- It provides an intuitive account switching menu on the AWS GUI’s top navigation bar
Consolidated Billing
- It allows linking various discrete accounts to a single paying account to obtain one single bill.
- The paying account is independent. It cannot access resources of other accounts
- There is a limit of 20 accounts by default.
- Volume pricing discounts apply (e.g. volumes are calculated across all accounts)
- CloudTrail Issues
- CloudTrail operates at the account and regional level
- The paying account will not collect data on the linked accounts by default
- The solution is to create an S3 bucket in the paying account and make it available to the linked accounts so that they dump their logs there, to be collected by the paying account
Tags
- They are key value pairs attached to AWS resources
- They are used for metadata purposes
- They are often inherited from controlling services:
- Autoscaling
- CloudFormation
- Elastic Beanstalk
- etc
Resource Groups
AWS Systems Manager
- Resource groups can be created out of tagged resources
- Resource groups are created on a per-region basis
VPC Peering
- A connection between two VPCs so that traffic can be routed between them using IP addresses
- It operates at the regional level
- The two VPCs must not have overlapping CIDR blocks
- Transitive communication is not automatic: if A is peered with B, and B is peered with C, then A is not peered with C unless a separate peer is set up
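Both constraints can be checked mechanically; a small sketch using the standard-library `ipaddress` module, with reachability limited to direct peerings only:

```python
import ipaddress

def cidrs_overlap(a, b):
    """Peering is rejected if the two VPCs' CIDR blocks overlap."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))

def can_route(peerings, src, dst):
    """Traffic flows only over a direct peering; it is never transitive."""
    return {src, dst} in [set(p) for p in peerings]

print(cidrs_overlap("10.0.0.0/16", "10.0.1.0/24"))  # True: cannot peer
peerings = [("A", "B"), ("B", "C")]
print(can_route(peerings, "A", "C"))  # False: A-C needs its own peering
```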
Direct Connect
- Configurations
- 10Gbps
- 1Gbps
- Below 1 Gbps can be purchased through AWS Direct Connect Partners
- It uses Ethernet VLAN trunking (802.1Q)
Direct Connect vs VPN
- VPNs can be configured in minutes
- VPNs have modest bandwidth requirements
- VPNs can tolerate the inherent variability of Internet-based connectivity
- AWS Direct Connect does not involve the Internet
- Dedicated private network between one’s intranet and an Amazon VPC
Security Token Service (STS)
It grants users limited and temporary access to AWS resources.
Users come from three sources:
- Regular Enterprise Federation
- It typically uses Active Directory (AD)
- It uses the Security Assertion Markup Language (SAML)
- It relies on AD credentials
- User does not need to be an IAM user
- It allows single sign-on to the AWS console without IAM credentials
- Federation with Mobile Apps
- OpenID providers
- Examples:
- Cross Account Access
- It lets users from one AWS account to access resources in another
Key terms
- Federation
- Combining or joining a list of users in one domain (such as IAM) with one in another domain (AD, Facebook, etc.)
- Identity Broker
- A service that can take an identity from point A and join it to point B
- Identity Store
- An identity service like Active Directory, Facebook, Google, etc.
- Identities
- A specific user of a service (e.g. a Facebook user)
More facts
- When STS grants access via the GetFederationToken function, four objects are returned:
- an access key
- a secret access key
- a token
- a duration (between 1 and 36 hours)
- Identity Broker always authenticates with LDAP first and then with AWS STS
- Applications get only temporary access to AWS resources
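For orientation, the result of `GetFederationToken` (e.g. via boto3's `sts.get_federation_token`) has roughly the shape below; every value here is a fake:

```python
from datetime import datetime, timedelta, timezone

# Approximate shape of an STS GetFederationToken result (all values fake).
# The duration is expressed as an Expiration timestamp on the credentials.
federation_token = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLEKEY",
        "SecretAccessKey": "fake-secret",
        "SessionToken": "fake-session-token",
        "Expiration": datetime.now(timezone.utc) + timedelta(hours=12),
    },
    "FederatedUser": {
        "FederatedUserId": "123456789012:alice",
        "Arn": "arn:aws:sts::123456789012:federated-user/alice",
    },
}

print(sorted(federation_token["Credentials"]))
```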
Workspaces
- It is a Microsoft Windows VDI solution
- It is a cloud-based replacement for a traditional desktop
- It is possible to connect from any supported device (PC, Mac, Chromebook, iPad, etc)
- It may integrate with an existing Active Directory domain.
- Users can customise their desktop
- Users are given local administrator access by default
- They are persistent
- All data on the D: drive is backed up every 12 hours
- No AWS account is required
Elastic Container Service (ECS)
- Regional service that may be run across one or more AZs
- Container placement may be tuned based on:
- Resource needs
- Isolation policies
- Availability requirements
- Use cases
- Batch/ETL workloads
- Microservices
Task Definitions
- It is a JSON file that describes the container(s) that form an application like a Kubernetes Pod
- Key parameters
- Docker image location
- CPU and Memory
- Coupling for a given task
- Networking details
- Mapping to a host container instance (if any)
- Fail/restart semantics
- Entry command
- Env variables
- Volumes
- IAM role for permissions
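A minimal task definition touching several of those parameters might look like this (family, image, and role names are placeholders):

```python
import json

# Minimal ECS task definition sketch; names, ARNs, and image are placeholders.
task_definition = {
    "family": "web-app",
    "taskRoleArn": "arn:aws:iam::123456789012:role/web-app-task",  # IAM role
    "containerDefinitions": [{
        "name": "web",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
        "cpu": 256,
        "memory": 512,
        "essential": True,  # fail/restart semantics: task dies if this dies
        "portMappings": [{"containerPort": 80, "hostPort": 80}],
        "environment": [{"name": "STAGE", "value": "prod"}],
        "command": ["nginx", "-g", "daemon off;"],  # entry command
    }],
}

print(json.dumps(task_definition)[:30])
```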
ECS Service
- Maintains a desired number of task definition instances like a Kubernetes Deployment
- It handles fail/restart semantics
ECS Clusters
- May contain multiple different container instance types
- Region specific
- Container instances live in one given cluster at any time
- IAM policies may allow/restrict access to specific clusters
Scheduler types:
- Service Scheduler
- Guarantees a minimum number of running tasks
- Handles ELB registration
- Custom Scheduler
- Based on custom business needs
- It integrates with third-party schedulers like Blox
Security
- Security Groups attach at the instance level (i.e. the host, not the task or container)
- The OS for an ECS cluster may be user-selected
Limits
- Soft
- Clusters per region: 1000
- Instances per Cluster: 1000
- Services per Cluster: 500
- Hard
- 1 Load Balancer per Service
- 1000 Tasks per Service
- Max 10 Containers per Task Definition
- Max 10 Tasks per Instance (host)
Amazon EC2 Container Registry (ECR)
- Managed AWS Docker registry
- It supports private Docker repositories
- It supports resource-based permissions using AWS IAM
- The Docker CLI may be used to push, pull, and manage images
- Soft limit 20 instances per region (!)
Security
Security credentials when creating a new user: (!)
- Access Key ID
- Secret Access Key
How to add new administrators to the AWS console:
- Just create users and generate passwords for each user. There is no need for Access Key IDs and Secret Access Keys, which are mainly for programmatic access.
Support Levels
- Enterprise
- Business
- Developer
AWS Trusted Advisor
Security Checks (!)
- Security Groups - Specific Ports Unrestricted
- Security Groups - Unrestricted Access
- IAM Use
- Amazon S3 Bucket Permissions
- MFA on Root Account
- IAM Password Policy
- Amazon RDS Security Group Access Risk
- AWS CloudTrail Logging
- Amazon Route 53 MX and SPF Resource Record Sets
- ELB Listener Security
- ELB Security Groups
- CloudFront Custom SSL Certificates in the IAM Certificate Store
- CloudFront SSL Certificate on the Origin Server
- IAM Access Key Rotation
- Exposed Access Keys
- Amazon EBS Public Snapshots
- Amazon RDS Public Snapshots
Elastic Map Reduce
It allows root access (!)
TODO
- Error codes in Amazon RDS responses
- Minimum and maximum size capacity for various RDS databases, e.g. Microsoft SQL Server Express, which is 10GB
- Can you conduct your own vulnerability scans within your own VPC without alerting AWS first? -> Answer is NO.
Reserved Instances
- Reserved instances are available for multi-AZ deployments -> answer is YES
- Reserved instances can be transferred from one availability zone to another
- OpsWorks -> Chef / Puppet
- AWS Support Levels and SLAs at https://aws.amazon.com/premiumsupport/compare-plans/, especially response times by case severity
- AWS uses the Xen hypervisor
- AWS is PCI DSS 1.0 certified
- AWS number of regions: 14