AWS Essentials 2018
- AWS CLI
- Identity and Access Management (IAM)
- Simple Storage Service (S3)
- S3 Lifecycle Rules
- CloudFront
- Storage Gateway
- Snowball
- S3 Transfer Acceleration
- EC2
- EBS (Elastic Block Storage)
- Load Balancing
- CloudWatch
- CloudTrail
- AWS CLI
- Elastic File System (EFS)
- Lambda
- Route 53 & DNS
- ELBs and IP Addresses
- Databases
- Aurora
- Microsoft SQL Server
- Amazon Virtual Private Cloud (VPC)
- VPC Flow Logs
- Application Services
- Best Practices
- Well Architected Framework (WAF)
- AWS Organizations
- Cross Account Access
- Consolidated Billing
- Tags
- Resource Groups
- VPC Peering
- Direct Connect
- Security Token Service (STS)
- Workspaces
- Elastic Container Service (ECS)
- Amazon EC2 Container Registry (ECR)
- Security
- Support Levels
- AWS Trusted Advisor
- Elastic MapReduce
- TODO
- Reserved Instances
AWS CLI
# install
pip install awscli
# authenticate
# create user on console to get Access Key Id/Secret
aws configure
Identity and Access Management (IAM)
- IAM is global; it applies to all regions
- The initial account is called root and has global access.
- Creating New Users
- New users have no permissions when first created by default
- New users are assigned:
- Access Key ID
- Secret Access Keys
- The above two should be saved immediately (e.g. CSV download) since they cannot be retrieved afterwards
- Structure
- Users
- Groups: Set of users so that policies can be applied collectively
- Roles: Resources (e.g. EC2) rather than user-level access
- Policy Documents: The actual permissions (e.g. Allow, Deny) specified in JSON. (!)
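Policy documents are plain JSON. A minimal sketch of the Version/Statement/Effect/Action/Resource structure (the bucket name is hypothetical, not from these notes):

```python
import json

# A minimal, hypothetical policy allowing read-only access to one S3
# bucket; real policies are attached to users, groups, or roles.
policy_json = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
"""
policy = json.loads(policy_json)
```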
Simple Storage Service (S3)
- Buckets live in a universal namespace for all AWS accounts and users
- The primary interface is HTTP (e.g. a successful upload returns a 200)
- Storage Classes apply per object, not per bucket: (!)
- Standard: 99.99% availability and 99.999999999% (11 nines) durability. (!)
- Standard-IA (Infrequent Access): 99.9% availability
- One Zone-IA: 99.5% availability + potential for 100% of data loss if zone is destroyed
- RRS (Reduced Redundancy Storage): 99.99% availability and durability. 0.01% expected object loss over a year. (!)
- Security (!)
- In-Transit (SSL/TLS)
- Client Side Encryption
- Server Side Encryption
- AWS Managed Keys (SSE-S3)
- Based on AES-256
- KMS (SSE-KMS)
- It includes an additional envelope key
- It provides auditing on key usage
- Customer Provided Keys (SSE-C)
- Access control
- ACL
- Bucket Policies
- Bucket Properties
- Key/Value pairs
- VersionID
- Metadata
- Access Control Lists
- Access is Private by default
- MFA may be used to prevent accidental deletion of files
- Versioning
- Stores all writes/deletes to an object
- It can’t be disabled; only suspended
- It integrates with life cycle rules
- It can also integrate with MFA
- Replication
- Versioning must be enabled on both source and destination buckets
- Replication, once enabled, only applies to new files, not existing ones (these must be copied manually)
- Delete markers are replicated but the deletion of the delete markers themselves is not
- Normal bucket URL syntax: `s3-region.amazonaws.com/bucket`
- Static Web Site Hosting
- Selected via Properties after creating a regular bucket
- URL format for a bucket called garba-static: http://garba-static.s3-website.eu-west-2.amazonaws.com
- Syntax: `bucket.s3-website-region.amazonaws.com` (!)
- Bucket names should use registered domain names for static web hosting
- index.html and error.html documents may be defined
- Consistency Model (!)
- Read-after-write consistency for PUTS of new objects
- Eventual consistency for overwrite PUTS & DELETES
- Access Control List identifies accounts with an email address or the canonical user id (!) http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
# list buckets
aws s3 ls
# list a bucket's files
aws s3 ls bucket1
# copy one bucket to another
aws s3 cp --recursive s3://bucket1 s3://bucket2
# copy static website under ./site to s3 and make public
aws s3 cp --recursive --acl public-read _site/ s3://garba-static
# dealing with InvalidRequest errors (specify the bucket's region)
aws s3 cp s3://sao_paulo_bucket/cowboy.jpg /tmp/ --region eu-west-1
S3 Lifecycle Rules
- Lifecycle management works with and without versioning
- It can be applied to both current and previous versions
- It allows:
- Transitioning objects to Standard-IA (min 30 days)
- Archiving to Glacier (min 30 days)
- Permanently deleting objects
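Lifecycle rules are expressed as JSON (this is the shape passed to `aws s3api put-bucket-lifecycle-configuration`). A sketch combining the three actions above; the day counts and rule ID are illustrative, not AWS minimums:

```python
import json

# Illustrative lifecycle configuration: transition to Standard-IA
# after 30 days, to Glacier after 60, delete after 365.
lifecycle_json = """
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 60, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }
  ]
}
"""
lifecycle = json.loads(lifecycle_json)
```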
CloudFront
- It is a Content Delivery Network (CDN)
- An Edge location is where content will be cached
- Edge locations are not just READ only. They can be written to.
- Origin: A CDN has an origin which provides the files that the CDN will distribute:
- A S3 Bucket (most likely)
- An EC2 Instance
- An Elastic Load Balancer
- Route 53
- Distribution: A collection of Edge Locations for a given CDN configuration. There are two types:
- Web Distribution (for web sites)
- RTMP (for media streaming)
- Objects are cached for the life of the TTL (Time to Live)
- Objects can be purged/cleared explicitly too (but it has a cost)
- AWS WAF can be used to secure an edge location
Storage Gateway
- A virtual appliance (e.g. a VM image for ESXi or Hyper-V) installed on-site
- It replicates data to S3 and Glacier
- Connection Patterns
- Storage Gateway -> Internet -> S3 -> …
- Storage Gateway -> Direct Connect -> S3 -> …
- Storage Gateway -> Amazon VPC -> S3 …
- Types of Storage Gateway
- File Gateway
- Based on NFS
- Files are stored on S3 with their ownership, permissions, and timestamps
- Once transferred to S3 they become native S3 objects
- Volume Gateway (iSCSI): based on block storage (data is not stored as native S3 objects) (!)
- Stored Volumes
- Data written locally at high-speed
- Data synchronised at a block level asynchronously to S3 and then saved as EBS snapshots
- From 1GB to 16TB
- Cached Volumes
- Primary storage is S3
- Most of storage space in the cloud and little on premise
- Frequently accessed data is cached on-premise
- Tape Gateway (VTL)
- Backup and archiving solution based on virtual tapes
- Uses existing tape backup infrastructure
- Examples: NetBackup, Backup Exec, Veeam
Snowball
- It used to be called AWS Import/Export
- Petabyte-scale data transport solution
- Secure appliance (256-bit encryption)
- Use of Trusted Platform Module (TPM)
- Amazon performs software erasure of the Snowball appliance
- It can
- Import to S3
- Export from S3
- It is necessary to enter an on-line generated unlock code onto the appliance
Types
- Regular Snowball
- The size of a briefcase
- Snowball Edge
- “A little AWS on-premise”: It can run Lambda functions
- 100TB with on-board storage and compute
- Use to move data in and out of AWS
- It supports standard storage interfaces
- Snowmobile
- It is a 45-foot long shipping container pulled by a semi-trailer truck
- For petabytes worth of data
- Can transfer up to 100PB
The Snowball client works similarly to the AWS CLI tool. Files must be copied into “buckets” that will then end up in the proper cloud bucket when Amazon gets the appliance back:
./snowball cp hello.txt s3://my_bucket
S3 Transfer Acceleration
- It uses the CloudFront Edge Network to accelerate uploads to S3
- A different URL is used rather than the regular S3 bucket one
- Upload is to local edge node
- Amazon then transfers from the edge node to the actual s3 bucket
- A CloudFront-powered URL is created such as
rato-accelerate.s3-accelerate.amazonaws.com
EC2
- Amazon Elastic Compute Cloud (Amazon EC2)
- Pricing Options
- On-Demand
- By the second for Linux and by the hour for Windows
- Flexible for unpredictable workloads
- Reserved
- Discount on the hourly charge with a 1-3 year commitment
- Steady state/predictable usage
- Sub-types
- Standard RIs (Up to 75% off)
- Convertible RIs (Up to 54% off)
- Allows changing the machines’ properties provided the new configuration is of equal or greater value
- Scheduled RIs (applicable to a time window)
- Spot
- A bidding scheme for buying discounted compute
- For applications that are only feasible at very low compute prices
- If Amazon terminates the instance, the customer will not be charged for the partial hour of usage in which the termination took place. However, if the customer terminates the instance, the charge will apply for the entire hour.
- Dedicated Hosts
- Physical dedicated EC2 Host
- Non multi-tenant
- Adequate for regulatory and/or licensing constraints
- Can be purchased on-demand too
- Instance Types
- F1 - Field Programmable Gate Array (FPGA) for Genomics research, financial analytics, real-time video processing, etc.
- I3 - High Speed Storage for NoSQL DBs, Data Warehousing, etc.
- G3 - Graphics Intensive for video encoding, 3D Application streaming, etc.
- H1 - High Disk Throughput for MapReduce-based workloads, distributed file systems such as HDFS and MAPR-FS
- T2 - Lowest Cost, General Purpose for web servers, small DBs
- D2 - Dense Storage for file servers, data warehousing, Hadoop, etc.
- R4 - Memory Optimized for memory intensive apps
- M5 - General purpose for application servers
- C5 - Compute optimized for CPU intensive apps/DBs/etc
- P3 - Graphics/General Purpose GPU for machine learning, Bitcoin mining, etc.
- X1 - Memory Optimized for SAP HANA, Apache Spark, etc.
- Mnemonics (FIGHT DR MAC PX)
- F: FPGA
- I: IOPS
- G: Graphics
- H: High Disk Throughput
- T: Trashy and cheap general purpose
- D: Density
- R: RAM
- M: Main choice for general purpose apps
- C: Compute
- P: Pics (Graphics)
- X: Xtreme Memory
- Termination Protection is turned off by default
- The root EBS volume is deleted by default when the EC2 instance is terminated
- EBS-backed root volumes may be encrypted as of 2018
- Virtualisation types (!)
- Para-Virtual (PV)
- Hardware Virtual Machine (HVM)
Security Groups
- A Security Group is a Virtual Firewall for EC2 instances
- An EC2 instance may have multiple security groups
- A security group may be applied to multiple instances
- Changes to Security Groups take effect immediately
- Security Groups are Stateful: Inbound traffic is allowed back out again
- Security Groups cannot block specific IP addresses
- It is not possible to deny traffic. It is a whitelist
- All outbound traffic is allowed by default (!)
Recipe for running Apache on an existing EC2 instance
# after downloading key, remove access to group and others
chmod 400 myEC2.pem
# ssh into EC2 instance
ssh ec2-user@4.8.23.237 -i myEC2.pem
# update packages on Linux AMI instance
sudo yum update -y
# install apache
sudo yum install httpd -y
# create page
echo "Hello World" > /var/www/html/index.html
# start httpd
sudo service httpd start
# always start at reboot
sudo chkconfig httpd on
AWS CLI on EC2
# Get EC2 Instances (including terminated ones)
$ aws ec2 describe-instances
# Get instance Ids
$ aws ec2 describe-instances | grep InstanceId
# terminate an instance by id
aws ec2 terminate-instances --instance-ids i-0090856f1626a0928
Get Metadata (!)
curl http://169.254.169.254/latest/meta-data/
curl http://169.254.169.254/latest/user-data/
Placement Groups
- Clustered Placement Group (the default “Placement Group”)
- It is for placing EC2 instances within the same availability zone (!)
- Not all instances can be launched into a Clustered Placement Group
- Spread Placement Group
- Each instance lands on distinct underlying hardware
- General (all)
- The name should be unique within the AWS account
- Placement groups can’t be merged
- Existing instances cannot be moved into a placement group
EBS (Elastic Block Storage)
- Storage volumes attached to EC2 instances
- First volume (where the OS runs) is known as the root device volume
- Types
- GP2 (General Purpose SSD)
- Balance between price and performance
- Ratio of 3 IOPS per GiB, up to 10,000 IOPS
- Ability to burst up to 3,000 IOPS for extended periods for volumes under 1 TiB
- IO1 (Provisioned IOPS SSD)
- For I/O intensive applications such as large relational or NoSQL databases
- Useful if more than 10,000 IOPS is required
- Up to 20,000 IOPS may be provisioned per volume
- ST1 (Throughput Optimized HDD, Magnetic)
- Big data, data warehouses, log processing
- Cannot be a boot volume
- SC1 (Cold HDD, Magnetic)
- Lowest cost storage for infrequently accessed workloads
- File Server
- Cannot be a boot volume
- Standard (Magnetic)
- Lowest cost per GB for a bootable drive
- Ideal for infrequently accessed data
- Volumes exist on EBS
- Volumes and EC2 need to be in the same availability zone
- 1 EBS volume : 1 EC2 instance
- It is preferable to create roles for EC2 instances to access other resources (such as S3) rather than relying on the access key and secret. Such a role may be assigned after an instance has been created
- Detaching rules (!)
- If it is a root volume, it can’t be detached without stopping the instance first
- If it is a non-root volume, it may be detached
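The GP2 figures above imply a simple baseline formula: 3 IOPS per GiB, capped at 10,000. A sketch of that arithmetic (the 100 IOPS floor for very small volumes is an assumption from AWS docs, not stated in these notes):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """GP2 baseline: 3 IOPS per GiB, floored at 100, capped at 10,000."""
    return max(100, min(3 * size_gib, 10_000))

# e.g. a 100 GiB volume gets a 300 IOPS baseline
```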
RAID and EBS
RAID stands for Redundant Array of Independent Disks
- RAID 0 (Striped)
- RAID 1 (Mirrored, Redundancy)
- RAID 5 (Good for reads, bad for writes) - Not recommended by AWS
- RAID 10 - Striped & Mirrored, Good Redundancy, Good Performance
Snapshots
- Snapshots exist on S3 (though not in any user-accessible bucket)
- Snapshots are point in time copies of Volumes
- Snapshots are incremental (only deltas stored in S3)
- AMI images can be created out of snapshots
- Snapshots are the objects that can travel from region to region
- Snapshots of encrypted volumes are encrypted automatically
- Volumes from encrypted snapshots are encrypted automatically
- Only unencrypted snapshots may be shared (to other AWS accounts or made public)
- Amazon recommends stopping an instance before taking a snapshot of its root volume
- On the CLI:
aws ec2 create-snapshot
EBS vs Instance Store
- EBS volumes are created from an EBS snapshot. EBS is essentially network attached storage
- An Instance Store volume is created from a template stored in Amazon S3
- EBS can be preserved upon instance termination unlike Instance Store
AMIs
- They are regional
Load Balancing
- 3 Types of Load Balancers
- Application Load Balancers (Layer 7)
- Network Load Balancers (Layer 4)
- Classic Load Balancers (ELB)
- 504 error: the gateway has timed out
- X-Forwarded-For Header: It is a mechanism for the load balancer to identify the original requestor’s IP address.
CloudWatch
- Features
- Dashboards allow custom visualisation
- Alarms create notifications when particular thresholds are hit
- Events react to changes in the state of AWS resources
- Logs help aggregate logs—it requires an agent to be installed.
- EC2 Metrics (Out of the Box) (!)
- Disk
- Network
- CPU
- Monitoring Types
- Standard = 5 minutes
- Detailed = 1 minute (extra price)
- CloudTrail is for auditing (e.g. user john created an S3 bucket) rather than monitoring and it is not the same as CloudWatch
CloudTrail
AWS CLI
$ aws configure
$ cd ~/.aws
$ ls -la
Elastic File System (EFS)
- It is a storage service for EC2
- Storage capacity is elastic
- It doesn’t need to be pre-provisioned (e.g. like EBS volumes)
- Supports NFS (NFSv4)
- Pay per use
- It can be mounted by multiple EC2 instances
- Data is stored across multiple AZs within a region
- Read After Write Consistency
Lambda
General Points
- Event-driven with multiple trigger sources including HTTP
- Maximum duration is 5 minutes
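A minimal Python handler sketch: Lambda calls the function with the trigger's event payload and a context object (the `name` field here is a made-up example payload, not a Lambda convention):

```python
# Minimal Lambda handler: Lambda invokes handler(event, context) on
# each trigger; the return value goes back to the caller (e.g. an
# API Gateway proxy integration expects statusCode/body).
def handler(event, context):
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}
```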
Triggers
- API Gateway
- AWS IoT
- Alexa Skills
- Alexa Smart Home
- CloudFront
- CloudWatch Events
- CloudWatch Logs
- CodeCommit
- Cognito Sync Trigger
- DynamoDB
- Kinesis
- S3
- SNS
Languages
- C#
- Java
- Node
- Python
Route 53 & DNS
The name originates from the DNS port, 53. An apex record is one at the root of a DNS zone; these are also known as naked domains. There is a default limit of 50 domains, which can be raised by contacting AWS support.
Top Level Domains
* Domains such as .com, .edu, .gov
* Controlled by the Internet Assigned Numbers Authority (IANA)
* Database at http://www.iana.org/domains/root/db
Domain Registrars
* They can assign domain names under one or more top-level domains
* They are registered with InterNIC, a service of ICANN
* Each domain name is registered in the WHOIS database
Start Of Authority Record (SOA)
* The server that supplied the data for the zone
* The zone's administrator
* The current version of the data file
* The default number of seconds for the time-to-live (TTL) on resource records
Name Server Records
Name Server Records (NS) are used by Top Level Domain servers to point to the authoritative DNS that holds the DNS records.
Example of a NS record pointing to Amazon set up at a Registrar (e.g. GoDaddy)
mydomain.com. 86400 IN NS ns.awsdns.com
Common Record Types
- The Address (A) record translates a domain name to an IP address. For example www -> 192.34.34.1
- The Canonical Name (CName) record resolves one name into another. For example www2 -> www
- The Start of Authority (SOA) record defines the boundary for which the DNS server is responsible
- The Name Server (NS) record defines the server(s) that resolve names for a domain
- The Mail Server (MX) record defines the location of the mail server for a given domain
- The Alias record type is unique to Route 53 and maps names to AWS resources such as S3 buckets
- The PTR record is used for reverse DNS lookups
ELBs and IP Addresses
- ELBs do not have pre-defined IPv4 addresses; they are resolved using DNS names
Routing Policies
- Simple
- One single record with multiple IP addresses
- If more than one, all values are returned to the user in random order
- The returned value will be cached so it may behave in a sticky manner
- Weighted
- A percentage of traffic goes to one region, a percentage to another
- For example: 20% eu-east1, 80% sa-east-1
- Set ID must be unique
- Latency
- It routes traffic based on the lowest network latency for the end user
- It uses a latency resource record set in each region associated with the EC2 or ELB resource
- Set ID must be unique
- Failover
- Useful to create an active/passive setup
- A health check can transition from one set of IPs to another
- Geolocation
- It chooses hosts or IPs based on the location of the requesting users
- It has continent-wise and country-wise granularity
- It has a catch-all location called default, identified by `*`
- Multivalue Answer
- Multiple resources per host
- Each host can have a health check
- Up to 8 records
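The Weighted policy above can be pictured as a weighted random choice over record sets. A toy simulation (the endpoint names are made up; real Route 53 weighting happens at resolution time):

```python
import random

# Toy model of a Weighted routing policy: with the weights below,
# roughly 20% of resolutions return one record set and 80% the other.
def resolve(weights, rng=random):
    endpoints, w = zip(*weights.items())
    return rng.choices(endpoints, weights=w, k=1)[0]

weights = {"eu-east-1.example.com": 20, "sa-east-1.example.com": 80}
```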
Databases
- Online Transaction Processing (OLTP) is about row-level transactions (e.g. “Set Order #34234 Delivery Status to Shipped”).
- Online Analytics Processing (OLAP) is about multi-row calculation (e.g. Sum of sold goods).
Elasticache
Amazon managed-service for in-memory caching:
- Memcached
- Redis
Amazon RDS
An OLTP offering:
- SQL Server
- MySQL
- MariaDB
- PostgreSQL
- Aurora (MySQL or PostgreSQL wire compatible)
- Oracle
mysql -u ernie -p -h mydb.cugrv9uf52uw.eu-west-2.rds.amazonaws.com -D my_database
- Replicating from the primary RDS instance to the secondary one is free
- There is no need to specify ports when adding a rule to a RDS security group
- I/O operations are suspended for the duration of the snapshot https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CreateSnapshot.html
For provisioned IOPS SSD Storage the following ranges apply:
Databases | IOPS | Storage |
---|---|---|
MariaDB, MySQL, PostgreSQL | 1k-40k | 100GiB-16TiB |
SQL Server Web/Express | 1k-32k | 100GiB-16TiB |
SQL Server Standard/EE | 1k-32k | 20GiB-16TiB |
Oracle | 1k-40k | 100GiB-32TiB |
The exact IOPS behaviour has caveats. More info at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#USER_PIOPS
Automated Backups
- They can recover data to any point in time within a retention period.
- The retention period is between 1 and 35 days. (!)
- Enabled by default (!)
- Backup data is stored in S3
- Free storage is equal to the size of the database
- Backups take place during a defined window which may be changed to minimise performance impact
- The change of a backup window takes place immediately (!)
- Restoring a backup leads to a new DNS endpoint since a new instance is created
- Snapshots, unlike automated backups, are done manually (user-initiated)
Encryption
- Encryption at rest is supported for nearly all non-free tier RDS databases.
- An existing unencrypted DB instance cannot be encrypted in place
- Snapshots may be encrypted though and then new encrypted instances created out of them
- Encryption is done using the AWS Key Management (KMS) Service
Multi-AZ Replication
- Multi-AZ allows changes to be replicated from one read/write replica in one availability zone to read-only replicas in others, for disaster recovery purposes.
- Multi-AZ is for DR only and not performance.
- The Multi-AZ built-in capability does not provide direct access to the replicas.
- A failover can be forced for RDS instances that have Multi-AZ configured
Read Replicas
- Up to 5 read replicas can be set up in production by default
- Read replicas may be both in different availability zones and regions
- It is based on asynchronous replication
- It is not available for Oracle or SQL Server
- It is for performance only, not DR!
- Both Multi-AZ and multiple replicas capabilities can be applied concurrently. They are not mutually exclusive.
DynamoDB
- A NoSQL offering
- Uses SSD storage
- Spread across 3 geographically distinct data centres
- Provisioned Throughput Capacity
- Write $0.0065 per hour for every 10 units
- Read $0.0065 per hour for every 50 units
- Storage costs of $0.25/GB per month
- It does not allow selecting the availability zone (!)
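The throughput prices above can be turned into a rough monthly estimate. A sketch of that arithmetic, assuming a 730-hour month and billing in blocks of 10 write / 50 read units (the function name is mine):

```python
from math import ceil

def monthly_cost(write_units, read_units, storage_gb, hours=730):
    """Rough DynamoDB monthly cost from the per-block prices above:
    $0.0065/hr per 10 write units, $0.0065/hr per 50 read units,
    plus $0.25 per GB-month of storage."""
    hourly = 0.0065 * ceil(write_units / 10) + 0.0065 * ceil(read_units / 50)
    return hourly * hours + 0.25 * storage_gb
```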
Consistency Model
- Eventually Consistent Reads (Default)
- Changes are propagated within 1 second
- Strongly Consistent Reads
- The result reflects all writes
Redshift
An OLAP offering.
- Single Node (160 GB)
- Multi-Node
- Leader Node (manages client connections)
- Compute Node (store data and perform queries and computations). Up to 128 Compute Nodes
- Column-based system (sequential data storage)
- Massively Parallel Processing (MPP) capability
- Pricing
- Compute Node Hours
- Leader Node is not Charged
- Encrypted in transit using SSL
- Encrypted at rest using AES-256 encryption
- Currently available only in 1 AZ
- Can restore snapshots to new AZs in the event of an outage
- The block size for its columnar storage is 1024 KB (1 MB)
Aurora
- 2 copies of data are kept in each availability zone
- Since there are 3 availability zones, there are 6 copies in total (!)
- 2 copies of data may be lost without affecting write availability
- 3 copies may be lost without affecting read availability
- Aurora Replica Types (2)
- Aurora Replicas (15)
- MySQL Read Replicas (currently 5)
Microsoft SQL Server
- Storage is fixed and cannot be increased (!)
Amazon Virtual Private Cloud (VPC)
Amazon VPC lets you provision a logically isolated section of the AWS cloud and its network, so that resources can be secured and grouped into trust areas.
- 5 VPCs are allowed in each region by default
- Hardware Virtual Private Network (VPNs) may be created between a corporate datacentre and a VPC so that AWS becomes an extension to the corporate data centre.
1 Subnet = 1 Availability Zone
- VPC Components
- A connection method:
- Internet Gateway
- Virtual Private Gateway
- A router
- Route Table
- Network ACL
- Private and Public subnet(s)
- Security Group
- Resources secured using the Security Group
- General Capabilities
- Launching instances into a specific subnet
- Assign custom IP addresses to ranges in each subnet
- Configure route tables between subnets
- Attach an Internet gateway to a VPC
- Establish network access control lists (ACLs) across subnets
- Peering
- VPCs can be interconnected using a direct network route
- VPCs can be peered with other AWS accounts as well as with other VPCs within the same account
- Default VPC
- All subnets in default VPC have a route out to the internet
- Each EC2 instance has both a public and private IP address
- No transitive peering
- An Internet Gateway can only be attached to one VPC at a time.
- Security Groups exists at the VPC Level
- Subnets are associated with only one Network ACL
Subnets within a VPC can communicate with each other by default across availability zones (!)
ELBs and VPCs
- ELBs can only operate on public subnets
- Public subnets must have an Internet Gateway attached to them
- At least two subnets must be specified
Subnet Ranges
CIDR Prefix | First IP | Last IP | Total |
---|---|---|---|
(10/8) | 10.0.0.1 | 10.255.255.255 | 16,777,216 |
(172.16/12) | 172.16.0.1 | 172.31.255.255 | 1,048,576 |
(192.168/16) | 192.168.0.1 | 192.168.255.255 | 65,536 |
http://cidr.xyz/
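The totals in the table follow directly from the prefix length (2^(32-prefix)); Python's `ipaddress` module can verify them:

```python
import ipaddress

# Address counts for the three RFC 1918 private ranges above.
for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"):
    net = ipaddress.ip_network(cidr)
    print(cidr, net.num_addresses)
```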
Creating a new VPC
Creating a new VPC results in the automatic creation of:
- Default Route
- Default Network ACL
- Default Security Group
Unavailable IPs
For example, in a subnet with CIDR block 10.0.0.0/24, the following five IP addresses are reserved:
- 10.0.0.0: Network address.
- 10.0.0.1: Reserved by AWS for the VPC router.
- 10.0.0.2: Reserved by AWS. The IP address of the DNS server is always the base of the VPC network range plus two; however, we also reserve the base of each subnet range plus two. For VPCs with multiple CIDR blocks, the IP address of the DNS server is located in the primary CIDR. For more information, see Amazon DNS Server.
- 10.0.0.3: Reserved by AWS for future use.
- 10.0.0.255: Network broadcast address. We do not support broadcast in a VPC, therefore we reserve this address.
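For any subnet, the five reserved addresses can be derived from the CIDR block; in a /24 that leaves 251 usable addresses. A small sketch using the standard `ipaddress` module:

```python
import ipaddress

def reserved_addresses(cidr):
    """The 5 addresses AWS reserves in every subnet: the network
    address, router (+1), DNS (+2), future use (+3), and broadcast."""
    net = ipaddress.ip_network(cidr)
    base = net.network_address
    return [str(base + i) for i in range(4)] + [str(net.broadcast_address)]

def usable_hosts(cidr):
    return ipaddress.ip_network(cidr).num_addresses - 5
```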
NAT Instances
NAT Instances are AMI virtual machines that work as a NAT router.
- On EC2/Network Settings, the Source/Destination Check option must be disabled for the NAT instance to work
- NAT instances must be deployed in a public subnet
- The instance size affects performance
- Autoscaling Groups are necessary for high availability in multiple subnets
- They are restricted by a security group
- Route tables must be updated
NAT Gateway
A NAT gateway is a cloud native managed service rather than a user-managed EC2 instance
- It scales automatically up to 10 Gbps
- No need to patch
- Not associated with security groups
- It gets an IP address automatically
- Route tables must be updated
- They must be deployed in multiple AZs for high availability
- No need to disable the source/destination check
Network ACL (NACL)
- Since NACL is stateless, both inbound and outbound rules must be created for regular TCP services like HTTP
- Default NACLs allow all outbound and inbound traffic
- New private NACLs have all inbound and outbound rules denied by default
- Amazon recommends rule numbers to be in increments of 100
- There is a many-to-one (*:1) relationship between subnets and NACLs
- If a NACL isn’t specified, a subnet will be associated with the default NACL
- NACLs are evaluated in order
- Because of protocols that use ephemeral ports (e.g. FTP), a rule allowing traffic on ports 1024-65535 is typically defined as an outbound rule.
- NACLs allow blocking IP addresses, unlike Security Groups
VPC Flow Logs
It allows capturing information about IP traffic going to and from network interfaces in a VPC using Amazon CloudWatch.
They can be created at three levels:
- VPC
- Subnet
- Network Interface Level
General
- Flow logs can only be enabled for VPCs under one’s account. This is important for “peered” VPCs
- Flow logs can’t be tagged
- Once created the configuration can’t be changed. They must be deleted and created again.
- Not all IP traffic is monitored:
- Traffic to the Amazon DNS Server (rather than a user-provided one)
- Traffic generated by Windows instances for license activation
- Traffic for metadata access to 169.254.169.254
- DHCP traffic
- Traffic to the reserved IP address for the default VPC router
Endpoints
Two types
- Elastic Network Interface (ENI) serves as an entry point for traffic destined to a service
- A gateway endpoint serves as a target for a route in one’s route table for traffic destined for the service
Internet Gateway
Only 1 Internet Gateway can be attached to a VPC
Application Services
Simple Queue Service (SQS)
- Oldest Amazon Service
- Decouples producers from consumers
- It is a pull rather than a push service
- Messages can be up to 256 KB in size
- Messages may be kept in the queue from 1 minute to 14 days
- Default retention period is 4 days
- Visibility Timeout: the amount of time the message becomes invisible whilst being picked up by a reader client
- Default timeout: 30 seconds (may be increased)
- Maximum timeout: 12 hours
- There is a long polling mechanism which allows a consumer to wait until a message arrives in the queue.
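The visibility-timeout behaviour can be sketched with a toy in-memory queue. This is an illustration of the mechanism, not the SQS API; times are passed explicitly so the behaviour is deterministic:

```python
class ToyQueue:
    """Toy model of the SQS visibility timeout (not the real AWS API)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.messages = []  # each entry: [body, invisible_until]

    def send(self, body):
        self.messages.append([body, 0.0])

    def receive(self, now):
        # Return the first visible message and hide it for the
        # duration of the visibility timeout.
        for msg in self.messages:
            if msg[1] <= now:
                msg[1] = now + self.visibility_timeout
                return msg[0]
        return None

    def delete(self, body):
        self.messages = [m for m in self.messages if m[0] != body]
```

A message received at t=0 stays invisible until t=30; if the consumer never deletes it, it becomes receivable again after the timeout.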
Simple Workflow Service (SWF)
- It is a solution for workflows that involve human or human-like interaction
- It ensures that tasks are only assigned once
- It keeps track of the application state without requiring a user-provided application for this purpose
- Maximum workflow can be 1 year
- Concepts
- Domains: Domains scope related activity types, tasks lists, and so on.
- Workers: Programs that interact with SWF to get tasks, process received tasks, and return results.
- Deciders: The decider is a program that controls the coordination of tasks (their ordering, concurrency, scheduling, etc)
- Differences with SQS
- It is task-oriented rather than message-oriented
- It ensures that tasks are assigned only once and never duplicated
- It ensures that tasks are processed only once
- It keeps track of all tasks and events in an application
Queue Types
- Standard
- Nearly unlimited number of TPS
- At least once delivery
- No ordering guarantees (best effort)
- No once and only once guarantee
- FIFO Queues
- Limited to 300 TPS
- Once and only once guarantee
- No duplicates
- Strict order
Amazon SNS
- Unlike SQS, SNS is a push system
- A capability to send notifications to users (sms, e-mail, etc.)
- Devices/Subscriber types:
- Apple
- Fire OS
- Windows devices
- Baidu Cloud Push
- SMS-Text
- SQS
- HTTP endpoints
- Lambda functions
- Messages are stored redundantly across multiple AZs
- Concepts:
- Topics
- Subscriptions
- Subscribers to Topics
- SNS Pricing
- $0.50 per 1 million SNS requests
- $0.06 per 100,000 notification deliveries over HTTP
- $0.75 per 100 notification deliveries over SMS
- $2.00 per 100,000 notification deliveries over Email
Elastic Transcoder
- Media files (MP3, MP4, etc.)
- Presets for popular formats
API Gateway
- API Caching with TTL
- Security (Auth, etc)
- Throttling
- Cloudwatch hooks for request logging
- Cross-Origin Resource Sharing (CORS)
Kinesis
- Data that is generated continuously
- Numerous but small data chunks/events
- Use Cases/Examples:
- Stock prices
- Game data (as player moves)
- Social network data
- Geospatial data
- IoT sensor data
Kinesis Streams
- It stores the data from producers (EC2, phones, IoT devices)
- Data is stored in shards
- Data retention is between 24 hrs and 7 days
- Properties:
- Reads: 5 TPS at 2MB/s
- Writes: 1000 TPS at 1MB/s
- Streams data back to consumers
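The per-shard limits above imply how many shards a stream needs. A sketch of that sizing arithmetic (the function name is mine):

```python
from math import ceil

def shards_needed(write_mb_s, write_records_s, read_mb_s):
    """Shards needed given the per-shard limits above:
    writes of 1,000 records/s at 1 MB/s, reads at 2 MB/s."""
    return max(
        ceil(write_mb_s / 1),
        ceil(write_records_s / 1000),
        ceil(read_mb_s / 2),
    )
```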
Kinesis Firehose
- It abstracts away from shards
- It can invoke Lambda so that data is processed directly
- Results can be sent directly to S3
- There is no data retention window
- Data is processed by Lambda
- Data is sent to S3
- Data is sent to an Elasticsearch cluster
- Data is sent to Redshift
Kinesis Analytics
- It abstracts away from shards
- It allows running SQL queries on incoming data
- There is no data retention window
- Data is sent to S3
- Data is sent to Redshift
- Data is sent to an Elasticsearch cluster
Simple E-Mail Service (SES)
Best Practices
Cloud Benefits
- Automation/IaaS
- Auto-scaling
- Proactive Scaling
- More efficient SDLC
- Improved Testability
- DR and BC
- “Overflow” the traffic to the cloud
Design for Failure
- Hardware will fail
- Design with automated recovery from failure in mind
- Assume higher TPS than expected
Decoupling
- Components may
- die (fail)
- sleep (not respond)
- remain busy (slow to respond)
- Consumers need to tolerate the above and continue operating as if no failure had occurred
Elasticity
- Proactive cyclic scaling: (daily, weekly, etc)
- Proactive business event-based scaling (e.g. Christmas, product launch, etc.)
- Auto-scaling based on demand: based on metrics and triggers
Security
- Web servers: public access only on ports 80/443
- SSH: open only to developers on the corporate office network
- Only the app layer may have direct access to the DB server
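Those three rules map onto security-group ingress permissions. A sketch of the `IpPermissions` payload that boto3's `ec2.authorize_security_group_ingress` accepts, shown here as plain data rather than executed against AWS (the office CIDR is an assumption):

```python
# IpPermissions payload for ec2.authorize_security_group_ingress (boto3).
# 203.0.113.0/24 stands in for the corporate office network.
WEB_INGRESS = [
    {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},        # public HTTP
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},        # public HTTPS
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},   # SSH: office network only
]

ssh_rules = [p for p in WEB_INGRESS if p["FromPort"] == 22]
print(ssh_rules[0]["IpRanges"][0]["CidrIp"])  # only the office CIDR may SSH
```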
Well Architected Framework (WAF)
Five pillars:
- Security
- Reliability
- Performance Efficiency
- Cost Optimisation
- Operational Excellence
General Design Principles
- Stop guessing your capacity needs
- Test systems at production scale
- Automate to make architectural experimentation easier
- Allow for evolutionary architectures
- Data-Driven architectures so that decisions are fact-based
- Improve through game days (simulation of production-like scenarios such as Black Friday)
Security
Design Principles
- Apply security at all layers
- Enable traceability
- Automate responses to security events
- Focus on securing your system
- Automate security best practices
AWS Shared Responsibility Model
- Customer
- Customer Data
- Platform, Applications, IAM
- Operating System, Network & Firewall Configuration
- Client-side data encryption & authentication
- Server-side encryption (File system and/or Data)
- Network traffic protection (encryption, integrity, identity)
- AWS
- Compute
- Storage
- Database
- Networking
- AWS Global Infrastructure
- Regions
- Availability Zones
- Edge Locations
Security Best Practices
The key areas are data protection, privilege management, infrastructure protection, and detective controls.
Data protection
- Start with a data classification process (public, private, confidential, CEO only, etc.)
- Implement a need-to-know only access policy
- Encrypt everything whenever possible, both data at rest (e.g. EBS, S3, RDS) and in transit (e.g. ELB, SSL)
- AWS can encrypt data and rotate keys automatically
- Use versioning to protect data against accidental modification, overwrites and deletes.
- Transferring data to a different region is never automatic; do it only as an explicit, considered decision, to avoid inadvertently breaking legislation such as the DPA 2018 in the UK.
Privilege management
- Ensure that only authorised and authenticated users (IAM) are able to access your resources by using:
- Access Control Lists (ACLs)
- Role Based Access Controls
- Password Management
- Take extreme care to protect the AWS root account credentials
- Enable MFA
- Define roles and responsibilities (R&R) for system users to control human access to the AWS Management Console and APIs
- Limit automated access to AWS resources (e.g. EC2 instance to S3 bucket)
- Devise a strategy to manage keys and credentials
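A need-to-know policy is ultimately expressed as the JSON policy documents IAM uses; a minimal read-only sketch (the bucket name is a placeholder):

```python
import json

# Least-privilege IAM policy: read-only access to a single S3 bucket.
# "example-bucket" is a placeholder name.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-bucket",     # ListBucket applies to the bucket
            "arn:aws:s3:::example-bucket/*",   # GetObject applies to the objects
        ],
    }],
}

print(json.dumps(policy, indent=2))
```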
Infrastructure protection
- Implement physical access controls on-prem:
- RFID access
- Lockable cabinets
- CCTV
- Enforce network boundary protection
- Public subnets
- Private subnets
- Enforce host-level protection
- User-based access to resources
- Access to hosts through a bastion host
- Implement AWS service-level protection
- Groups and privileges
- MFA
- Protect the integrity of OSs installed on EC2 instances
- Patching
- Anti-virus
Detective controls
- Use detective controls to detect or identify security breaches
- Use services that help carry out investigations and auditing:
- AWS CloudTrail
- Capture and analyse logs for applicable services
- Make sure it is enabled in each relevant region
- Amazon CloudWatch
- AWS Config
- Amazon Simple Storage Service (S3)
- Amazon Glacier
- AWS CloudTrail
Reliability
Design Principles
- Test recovery procedures
- Automatically recover from failure
- Scale horizontally to increase aggregate system availability
- Stop guessing capacity
Best Practices
Key areas are foundations, change management, and failure management.
Foundations
- Be mindful of the communications link between your HQ and datacentre
- Plan your network topology in advance (VPC, Subnets, etc.)
- Be mindful of the service limits set by Amazon to stop customers from over-provisioning resources (Google AWS Service Limits), for example:
- 5 VPCs per region
- 5 Internet gateways per region
- Appoint someone responsible to manage AWS service limits
- Define a path to escalate technical issues
Change Management
- Have a plan to monitor changes to all relevant AWS resources (e.g. using AWS CloudTrail)
- Instrument the detection of changes in the environment to react to them
- Choose automated solutions such as autoscaling whenever possible to adapt to changes on demand
Failure Management
- Architect your systems with the assumptions that failure will occur
- Learn from failures when they do occur and plan how to prevent them in the future
- Have a backup and recovery strategy
- Have a failure coping strategy for each component (e.g. using AWS CloudFormation)
Performance Efficiency
Design Principles
- Democratise advanced technologies (e.g. let teams consume as-a-service databases)
- Go global in minutes (e.g. multi-region services using CloudFormation)
- Use serverless architectures
- Experiment more often (since it is easy using on-demand, pay-per use services)
Best Practices
The four key areas are compute, storage, database, and space-time trade-off.
Compute
- Establish a performance monitoring mechanism
- Change the machine type (EC2) when it no longer suits one’s needs (e.g. CPU and RAM consumption)
- Change the number of instances when vertical scaling (e.g. bigger EC2 instance) is inappropriate (use Autoscaling)
- Be aware of newly emerging machine types (e.g. GPU-optimized)
- Consider moving code to Lambda whenever possible
Storage
- Keep in mind that the optimal storage solution depends on a number of factors:
- Access type
- block (e.g. raw file system) (EBS)
- individual files (S3)
- Access pattern
- Random
- Sequential
- Throughput (i.e. IOPS)
- Read Frequency
- Online
- Offline (e.g. Glacier)
- Ad-hoc
- Update/Write Frequency
- WORM (write once, read many)
- Dynamic
- Constraints/Trade offs
- Availability
- Durability
- Set a system in place to scope the storage requirement and select the most appropriate solution
- Set a system in place to learn about new storage solutions and switch to them whenever price and/or capability appropriate
- For databases
- Select the most appropriate database solution for the use case at hand (e.g. SQL vs No-SQL)
- Monitor database performance
- Capacity
- Throughput
Space-Time Trade-off
- Consider
- CloudFront for content and media caching
- ElastiCache for software object/data caching
- RDS Read Replicas for database read performance
- Direct Connect for lower and stable latency
- Devise a system to select the most appropriate proximity and caching solutions for the problem at hand
- Devise a system to measure performance and tell whether the current solution is still effective or whether a new one should be considered
Cost Optimisation
Design Principles
- Transparently attribute expenditure
- Use managed services to reduce TCO
- Trade capital expense for operating expense
- Benefit from economies of scale
Best Practices
The four key areas are: matched supply and demand, cost-effective resources, expenditure awareness, and optimizing over time.
Matched Supply and Demand
- Don’t over or under provision: align supply with demand
- Take advantage of Autoscaling and pay-per-use services like Lambda
- Use CloudWatch to keep track of actual demand
Cost-effective Resources
- Use the correct EC2 instance types
- A more powerful instance that completes its task in a few minutes may be more effective than a less powerful one that takes longer but is cheaper on a per hour basis
- Select the most appropriate cost model (e.g. reserved vs spot instances)
- Consider managed services to reduce maintenance/administration costs
- Use services such as AWS Trusted Advisor
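The faster-but-pricier trade-off above is simple arithmetic; a sketch with made-up prices, assuming per-second billing so there is no rounding up to whole hours:

```python
def job_cost(price_per_hour, runtime_hours):
    """Cost of one job run (made-up prices, per-second billing assumed)."""
    return price_per_hour * runtime_hours

# A big instance at $0.40/h finishing in 0.5 h beats a small one
# at $0.10/h that needs 3 h for the same job.
big = job_cost(0.40, 0.5)    # $0.20
small = job_cost(0.10, 3.0)  # $0.30
print(big < small)  # True
```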
Expenditure Awareness
- Set up access controls and procedures to control costs
- Use cost allocation tags to keep track of expenditure by different teams
- Set up billing alerts
- Consider consolidated billing if applicable
- Have a mechanism to decommission redundant resources
- Have a mechanism to suspend or stop resources that are temporarily not needed
- Consider data transfer charges into your architectural model
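A billing alert is a CloudWatch alarm on the `EstimatedCharges` metric, which is published only in us-east-1. A sketch of the `put_metric_alarm` parameters as plain data, not executed; the threshold and SNS topic ARN are assumptions:

```python
# Parameters for cloudwatch.put_metric_alarm (boto3), shown as plain data.
# Threshold and topic ARN are illustrative placeholders.
billing_alarm = {
    "AlarmName": "monthly-spend-over-100-usd",
    "Namespace": "AWS/Billing",
    "MetricName": "EstimatedCharges",
    "Dimensions": [{"Name": "Currency", "Value": "USD"}],
    "Statistic": "Maximum",
    "Period": 21600,                    # evaluate every 6 hours
    "EvaluationPeriods": 1,
    "Threshold": 100.0,                 # alert past $100 of estimated charges
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
}

print(billing_alarm["MetricName"])
```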
Optimizing Over Time
- Implement a mechanism to be aware of new, most cost effective and/or capable AWS services
- For example: Aurora, launched in 2014, is, in most cases, cheaper and faster than the traditional MySQL and PostgreSQL alternatives
- Consider subscribing to the AWS Blog
- Consider relying on services such as AWS Trusted Advisor
Operational Excellence
Design Principles
- Perform operations with code
- Align operations processes to business objectives
- Make regular, small, incremental changes
- Test for responses to unexpected events
- Learn from operational events and failures
- Keep operations procedures current
Best Practices
The key areas are preparation, operation, and response.
Preparation
- Use operations checklists to:
- ensure that workloads are ready for production
- prevent unintentional production promotion without effective preparations
- Make sure that workloads have:
- Runbooks: operations guidance that Ops team can refer to
- Playbooks: guidance for responding to unexpected operational events
- Escalation paths
- Stakeholder notifications
- Use AWS services for preparation
- CloudFormation for setting up environments
- Autoscaling to automatically respond to business events
- AWS Config to automatically track and respond to changes in AWS workloads and environments
- Tagging to group related resources in a workload
- AWS Service Catalogue to create a standardized set of service offerings that are aligned to best practices.
- AWS SQS to decouple systems and minimise the effects of failure
Operation
- Make sure documentation is up-to-date
- Make sure operational focus is on:
- Automation
- Small frequent changes
- Quality assurance testing
- Tracking, auditing, roll back and review mechanisms
- Logs and metrics that prove operational health
- Take advantage of AWS services
- CI/CD pipeline
- Release management processes
- Tested
- Based on incremental changes
- Using tracked versions
- With the ability to revert changes without impact
- Automate routine operations and responses to unplanned events
- Align monitoring to business needs so that responses support business continuity
Response
- Responses to unexpected operational events should be automated (e.g CloudWatch)
- Alerting should have automatic triggers for:
- Mitigation
- Remediation
- Rollback
- Recovery
- Quality assurance mechanisms should be in place to automatically roll back failed deployments.
- Responses should follow a pre-defined playbook containing:
- Stakeholders
- Escalation process (automated: e.g. SNS)
- Functional capabilities
- Hierarchical capabilities
- Procedures
AWS Organizations
An account management service that lets you consolidate multiple AWS accounts into a single, centrally managed organisation.
- It allows applying policies to the organisation’s root account
- It allows applying policies to organisation units (OUs) which encompass one or more accounts
- Service Control Policies (SCPs) are the mechanism by which policies are enforced across multiple accounts
- They override account-level IAM settings
- It helps automate AWS account creation and management
- A set of APIs allows creating accounts programmatically
- It embeds Consolidated Billing capabilities
Cross Account Access
- It allows users to “sudo” into different accounts without having to enter those accounts’ credentials
- It is useful to test functionality that is dependent on an account’s specific roles and privileges (e.g. dev privileges vs production ones)
- It provides an intuitive account switching menu on the AWS GUI’s top navigation bar
Consolidated Billing
- It allows linking various discrete accounts to a single paying account to obtain one single bill.
- The paying account is independent. It cannot access resources of other accounts
- There is a limit of 20 accounts by default.
- Volume pricing discounts apply (e.g. volumes are calculated across all accounts)
- CloudTrail Issues
- CloudTrail operates at the account and regional level
- The paying account will not collect data on the linked accounts by default
- The solution is to create an S3 bucket in the paying account and make it available to the linked accounts so that they dump their logs there, to be collected by the paying account
Tags
- They are key value pairs attached to AWS resources
- They are used for metadata purposes
- They are often inherited from controlling services:
- Autoscaling
- CloudFormation
- Elastic Beanstalk
- etc
Resource Groups
AWS Systems Manager
- Resource groups can be created out of tagged resources
- Resource groups are created on a per-region basis
VPC Peering
- A connection between two VPCs so that traffic can be routed between them using IP addresses
- It operates at the regional level
- The two VPCs must not have overlapping CIDR blocks
- Transitive communication is not automatic: if A is peered with B, and B is peered with C, then A is not peered with C unless a separate peer is set up
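Both constraints can be checked mechanically; a small sketch using the standard-library `ipaddress` module, with reachability limited to direct peerings only:

```python
import ipaddress

def cidrs_overlap(a, b):
    """Peering is rejected if the two VPCs' CIDR blocks overlap."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))

def can_route(peerings, src, dst):
    """Traffic flows only over a direct peering; it is never transitive."""
    return {src, dst} in [set(p) for p in peerings]

print(cidrs_overlap("10.0.0.0/16", "10.0.1.0/24"))  # True: cannot peer
peerings = [("A", "B"), ("B", "C")]
print(can_route(peerings, "A", "C"))  # False: A-C needs its own peering
```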
Direct Connect
- Configurations
- 10Gbps
- 1Gbps
- Below 1 Gbps can be purchased through AWS Direct Connect Partners
- It uses Ethernet VLAN trunking (802.1Q)
Direct Connect vs VPN
- VPNs can be configured in minutes
- VPNs have modest bandwidth requirements
- VPNs can tolerate the inherent variability of Internet-based connectivity
- AWS Direct Connect does not involve the Internet
- Dedicated private network between one’s intranet and an Amazon VPC
Security Token Service (STS)
It grants users limited and temporary access to AWS resources.
Users come from three sources:
- Regular Enterprise Federation
- It typically uses Active Directory (AD)
- It uses the Security Assertion Markup Language (SAML)
- It relies on AD credentials
- User does not need to be an IAM user
- It allows single sign-on to the AWS console without IAM credentials
- Federation with Mobile Apps
- OpenID providers
- Examples:
- Cross Account Access
- It lets users from one AWS account to access resources in another
Key terms
- Federation
- Combining or joining a list of users in one domain (such as IAM) with one in another domain (AD, Facebook, etc.)
- Identity Broker
- A service that can take an identity from point A and join it to point B
- Identity Store
- An identity service like Active Directory, Facebook, Google, etc.
- Identities
- A specific user of a service (e.g. a Facebook user)
More facts
- When STS grants access via the GetFederationToken function, four objects are returned:
- an access key
- a secret access key
- a token
- a duration (between 1 and 36 hours)
- Identity Broker always authenticates with LDAP first and then with AWS STS
- Applications get only temporary access to AWS resources
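For orientation, the result of `GetFederationToken` (e.g. via boto3's `sts.get_federation_token`) has roughly the shape below; every value here is a fake:

```python
from datetime import datetime, timedelta, timezone

# Approximate shape of an STS GetFederationToken result (all values fake).
# The duration is expressed as an Expiration timestamp on the credentials.
federation_token = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLEKEY",
        "SecretAccessKey": "fake-secret",
        "SessionToken": "fake-session-token",
        "Expiration": datetime.now(timezone.utc) + timedelta(hours=12),
    },
    "FederatedUser": {
        "FederatedUserId": "123456789012:alice",
        "Arn": "arn:aws:sts::123456789012:federated-user/alice",
    },
}

print(sorted(federation_token["Credentials"]))
```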
Workspaces
- It is a Microsoft Windows VDI solution
- It is a cloud-based replacement for a traditional desktop
- It is possible to connect from any supported device (PC, Mac, Chromebook, iPad, etc)
- It may integrate with an existing Active Directory domain.
- Users can customise their desktop
- Users are given local administrator access by default
- They are persistent
- All data on the D: drive is backed up every 12 hours
- No AWS account is required
Elastic Container Service (ECS)
- Regional service that may be run across one or more AZs
- Container placement may be tuned based on:
- Resource needs
- Isolation policies
- Availability requirements
- Use cases
- Batch/ETL workloads
- Microservices
Task Definitions
- It is a JSON file that describes the container(s) that form an application like a Kubernetes Pod
- Key parameters
- Docker image location
- CPU and Memory
- Coupling for a given task
- Networking details
- Mapping to a host container instance (if any)
- Fail/restart semantics
- Entry command
- Env variables
- Volumes
- IAM role for permissions
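A minimal task definition touching several of those parameters might look like this (family, image, and role names are placeholders):

```python
import json

# Minimal ECS task definition sketch; names, ARNs, and image are placeholders.
task_definition = {
    "family": "web-app",
    "taskRoleArn": "arn:aws:iam::123456789012:role/web-app-task",  # IAM role
    "containerDefinitions": [{
        "name": "web",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
        "cpu": 256,
        "memory": 512,
        "essential": True,  # fail/restart semantics: task dies if this dies
        "portMappings": [{"containerPort": 80, "hostPort": 80}],
        "environment": [{"name": "STAGE", "value": "prod"}],
        "command": ["nginx", "-g", "daemon off;"],  # entry command
    }],
}

print(json.dumps(task_definition)[:30])
```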
ECS Service
- Maintains a desired number of task definition instances like a Kubernetes Deployment
- It handles fail/restart semantics
ECS Clusters
- May contain multiple different container instance types
- Region specific
- Container instances live in one given cluster at any time
- IAM policies may allow/restrict access to specific clusters
Scheduler types:
- Service Scheduler
- Guarantees a minimum number of running tasks
- Handles ELB registration
- Custom Scheduler
- Based on custom business needs
- It integrates with third-party schedulers like Blox
Security
- Security Groups attach at the instance level (i.e. the host, not the task or container)
- The OS for an ECS cluster may be user-selected
Limits
- Soft
- Clusters per region: 1000
- Instances per Cluster: 1000
- Services per Cluster: 500
- Hard
- 1 Load Balancer per Service
- 1000 Tasks per Service
- Max 10 Containers per Task Definition
- Max 10 Tasks per Instance (host)
Amazon EC2 Container Registry (ECR)
- Managed AWS Docker registry
- It supports private Docker repositories
- It supports resource-based permissions using AWS IAM
- The Docker CLI may be used to push, pull, and manage images
- Soft limit 20 instances per region (!)
Security
Security credentials when creating a new user: (!)
- Access Key ID
- Secret Access Key
How to add new administrators to the AWS console:
- Just create users and generate passwords for each user. There is no need for Access Key IDs and Secret Access Keys, which are mainly for programmatic access.
Support Levels
- Enterprise
- Business
- Developer
AWS Trusted Advisor
Security Checks (!)
- Security Groups - Specific Ports Unrestricted
- Security Groups - Unrestricted Access
- IAM Use
- Amazon S3 Bucket Permissions
- MFA on Root Account
- IAM Password Policy
- Amazon RDS Security Group Access Risk
- AWS CloudTrail Logging
- Amazon Route 53 MX and SPF Resource Record Sets
- ELB Listener Security
- ELB Security Groups
- CloudFront Custom SSL Certificates in the IAM Certificate Store
- CloudFront SSL Certificate on the Origin Server
- IAM Access Key Rotation
- Exposed Access Keys
- Amazon EBS Public Snapshots
- Amazon RDS Public Snapshots
Elastic Map Reduce
It allows root access (!)
TODO
- Error codes in Amazon RDS responses
- Minimum and maximum size capacity for various RDS databases, e.g. Microsoft SQL Server Express, which is 10GB
- Can you conduct your own vulnerability scans within your own VPC without alerting AWS first? -> Answer is NO.
Reserved Instances
- Reserved instances are available for multi-AZ deployments -> answer is YES
- Reserved instances can be transferred from one availability zone to another
- OpsWorks -> Chef / Puppet
- AWS Support Levels and SLAs at https://aws.amazon.com/premiumsupport/compare-plans/, especially response times by case severity
- AWS uses the Xen hypervisor
- AWS is PCI DSS 1.0 certified
- AWS number of regions: 14