AWS Certified Solutions Architect Exam Notes - Databases

Databases

RDS

RDS runs on virtual machines that you cannot access directly.
RDS is not serverless (except Aurora Serverless): the database servers exist, but Amazon maintains them.

Why not run SQL on an EC2 instance?

To avoid database administration. (In some use cases you need to manage the database yourself, and then EC2 may be the right choice.)

Are there hybrid or on-premises deployment options for Amazon RDS?

Amazon RDS on AWS Outposts
Amazon RDS on VMware

Backups

Automated Backups
- Recover your database to any point in time within the retention period (1-35 days).
- Enabled by default.
- Take a full daily snapshot and also store the transaction logs throughout the day.
- Stored in S3. Backup storage up to the size of the database is free.
Database Snapshots
- Taken manually (user-initiated).
- They persist even after the RDS instance is deleted (unlike automated backups).
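
The retention-window rule for automated backups can be sketched in a few lines. This is a simplified model, not an AWS API: the function name `can_restore_to` and its parameters are made up for illustration, and it ignores the short delay before the most recent transaction logs become restorable.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 1
MAX_RETENTION_DAYS = 35


def can_restore_to(target, now, retention_days):
    """Return True if `target` lies inside the automated-backup retention window."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention period must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= now


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    # With a 7-day retention period, 3 days ago is restorable, 10 days ago is not.
    print(can_restore_to(now - timedelta(days=3), now, 7))
    print(can_restore_to(now - timedelta(days=10), now, 7))
```

A manual snapshot has no such window: it stays available until you delete it.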

Multi-AZ

Used for disaster recovery, not for performance.
Keeps an exact copy of the production database in another AZ.
AWS handles the replication.
Writes are replicated synchronously to the standby.
If the primary DB fails or undergoes maintenance, AWS automatically fails over to the standby database.
Available on:
- SQL Server
- Oracle
- MySQL
- PostgreSQL
- MariaDB
Aurora is fault-tolerant by default.

Read Replicas

Read replicas serve read-only traffic.
They improve performance by offloading reads from the primary.
Used for read-heavy workloads.
Automated backups must be turned on to deploy a read replica.
Up to 5 read replicas per database (the exact limit depends on the engine).
You can have read replicas of read replicas, but watch for replication latency.
Available on:
- Aurora
- Oracle
- MySQL
- Postgres
- MariaDB
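
How an application might use read replicas can be sketched as a small router that sends writes to the primary and spreads reads across the replicas. This is an illustrative sketch only: `ReadWriteRouter` and the endpoint names are hypothetical, and the "starts with SELECT" check is a deliberate simplification of real query classification.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send SELECTs to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        return next(self._readers) if is_read else self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary-endpoint", ["replica-1", "replica-2"])
    for statement in ("SELECT * FROM orders", "UPDATE orders SET paid = 1", "select 1"):
        print(statement, "->", router.endpoint_for(statement))
```

Because replication to read replicas is asynchronous, a read routed this way may briefly return stale data.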

DynamoDB

NoSQL. Data is spread across 3 geographically distinct data centers.
Two read consistency models:
- Eventually consistent reads (default): a read shortly after a write may return stale data; consistency is usually reached within 1 second
  - Best performance
- Strongly consistent reads: a read returns the result of all writes that succeeded before it
  - Use it when the app needs to read new data less than 1 second after the write
  - Increases cost
  - Improves consistency
Serverless
Handles frequent schema changes
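
The cost difference between the two read types can be made concrete with read capacity units (RCUs). The sketch below assumes the standard rule that one RCU covers one strongly consistent read per second of an item up to 4 KB, and that an eventually consistent read costs half as much. The function name is made up for illustration.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers up to 4 KB per read


def read_capacity_units(item_size_bytes, strongly_consistent=False):
    """RCUs consumed by reading one item of the given size once per second."""
    blocks = max(1, math.ceil(item_size_bytes / RCU_BLOCK_BYTES))
    # Eventually consistent reads cost half of a strongly consistent read.
    return blocks if strongly_consistent else blocks / 2


if __name__ == "__main__":
    for size in (3000, 9000):
        print(size, read_capacity_units(size), read_capacity_units(size, True))
```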

DynamoDB Performance

Stored on SSD. Partitioned across many nodes. No read replicas.

DynamoDB Accelerator (DAX)

In-memory cache
Improves read performance from milliseconds to microseconds.

DynamoDB Streams (Connect with Lambda)

Associate the stream with a Lambda function.
Immediately after an item in the table is modified, a new record appears in the table’s stream. AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records.
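
A minimal sketch of a Lambda function attached to a stream is shown below. It relies on the documented shape of a stream event (a `Records` list whose entries carry an `eventName` of `INSERT`, `MODIFY`, or `REMOVE` and a `dynamodb` section with the item `Keys`). The counting logic and the sample key `Id` are illustrative assumptions.

```python
def lambda_handler(event, context):
    """Count stream records by event type and log the key of each changed item."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        event_name = record.get("eventName")
        if event_name in summary:
            summary[event_name] += 1
        keys = record.get("dynamodb", {}).get("Keys", {})
        print(f"{event_name}: {keys}")
    return summary


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"S": "101"}}}},
            {"eventName": "MODIFY", "dynamodb": {"Keys": {"Id": {"S": "101"}}}},
        ]
    }
    print(lambda_handler(sample_event, None))
```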

DynamoDB Pricing

DynamoDB charges for reading, writing, and storing data in your DynamoDB tables, along with any optional features you choose to enable.
On-Demand: pay per request, at a higher price per request
- Better for startups and development, new tables with unknown workloads, and unpredictable application traffic
Provisioned: specify the number of reads and writes per second. Choose it when you:
- Have predictable application traffic.
- Run applications whose traffic is consistent or ramps gradually.
- Can forecast capacity requirements to control costs.
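
The trade-off between the two modes can be sketched as a rough monthly cost comparison. Everything here is an assumption for illustration: the prices are placeholders (check the current price list), the function names are made up, and the model assumes one capacity unit serves one request per second and that provisioned capacity is sized for the peak.

```python
HOURS_PER_MONTH = 730


def on_demand_cost(requests_per_month, price_per_million_requests):
    """Monthly cost when paying per request."""
    return requests_per_month / 1_000_000 * price_per_million_requests


def provisioned_cost(capacity_units, price_per_unit_hour):
    """Monthly cost when paying for capacity units around the clock."""
    return capacity_units * price_per_unit_hour * HOURS_PER_MONTH


def cheaper_mode(requests_per_month, peak_requests_per_second,
                 price_per_million_requests, price_per_unit_hour):
    """Compare both modes, provisioning enough capacity for the peak rate."""
    od = on_demand_cost(requests_per_month, price_per_million_requests)
    pv = provisioned_cost(peak_requests_per_second, price_per_unit_hour)
    return "on-demand" if od <= pv else "provisioned"


if __name__ == "__main__":
    # Placeholder prices, not real AWS prices.
    spiky = cheaper_mode(1_000_000, 100, 1.0, 0.001)
    steady = cheaper_mode(100 * 3600 * HOURS_PER_MONTH, 100, 1.0, 0.001)
    print("spiky traffic:", spiky)
    print("steady traffic:", steady)
```

With these placeholder numbers, the table that is mostly idle but peaks at 100 requests per second is cheaper on-demand, while the table that sustains 100 requests per second all month is cheaper provisioned.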

Redshift

Data warehouse. OLAP. Used for business intelligence
Available in one AZ
Can be configured as:
- Single node
- Multi-node
  - Leader Node: manages client connections and receives queries
  - Compute Node: stores data and performs queries and computations
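
The division of labor in a multi-node cluster can be pictured as scatter-gather: the leader hands the query to every compute node, each node aggregates its own slice of the data, and the leader combines the partial results. The classes below are a conceptual toy, not how Redshift is actually implemented or accessed.

```python
class ComputeNode:
    """Holds a slice of the table and computes partial results on it."""

    def __init__(self, rows):
        self.rows = rows

    def partial_sum(self, column):
        return sum(row[column] for row in self.rows)


class LeaderNode:
    """Receives the query, fans it out to compute nodes, and merges the results."""

    def __init__(self, compute_nodes):
        self.compute_nodes = compute_nodes

    def total(self, column):
        return sum(node.partial_sum(column) for node in self.compute_nodes)


if __name__ == "__main__":
    nodes = [
        ComputeNode([{"sales": 10}, {"sales": 20}]),
        ComputeNode([{"sales": 5}]),
    ]
    print(LeaderNode(nodes).total("sales"))
```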

Redshift backups

Enabled by default with a 1-day retention period (configurable up to 35 days).
Redshift maintains at least 3 copies of your data (the original, a replica on the compute nodes, and a backup in S3).
Can asynchronously replicate snapshots to S3 in another region for disaster recovery.

Redshift Logs

When Enhanced VPC routing is enabled, Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC.
You can then use standard VPC features (security groups, network ACLs, VPC endpoints, VPC endpoint policies, internet gateways, and DNS servers) to control that traffic, and VPC flow logs to monitor it.

Redshift Spectrum

Enables you to query and analyze all of your data in Amazon S3 using the open data formats you already use, with no data loading or transformations needed.

Redshift Spectrum vs Athena

An analyst that already works with Redshift will benefit most from Redshift Spectrum because it can quickly access data in the cluster and extend out to infrequently accessed, external tables in S3. It’s also better suited for fast, complex queries on multiple data sets.
Athena is a simpler way to run interactive, ad hoc queries on data stored in S3. It doesn’t require any cluster management, and an analyst only needs to define a table to make a standard SQL query.

Aurora

MySQL- and PostgreSQL-compatible, with better performance than either. Much lower price than commercial databases.
Storage auto-scales: starts at 10 GB and grows up to 64 TB.
2 copies of the data in each of 3 AZs = 6 copies minimum.
Tolerates the loss of up to 2 copies without affecting writes and up to 3 copies without affecting reads.
Replication lag is in milliseconds, instead of seconds as with MySQL replicas. Replication does not affect performance.
Aurora Replicas, however, must be in the same region as the primary.
On failover, there is no data loss. Automated Failover recovery.
Backups are always enabled
Can take snapshots and share them with other accounts
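
The fault-tolerance numbers above follow from quorum arithmetic: with 6 copies, surviving 2 losses for writes implies a write quorum of 4, and surviving 3 losses for reads implies a read quorum of 3. A minimal sketch of that reasoning (the function name is made up for illustration):

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # writes survive the loss of 2 copies
READ_QUORUM = 3    # reads survive the loss of 3 copies


def availability(lost_copies):
    """Report whether reads and writes are still possible after losing copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError("lost_copies must be between 0 and 6")
    healthy = TOTAL_COPIES - lost_copies
    return {"writes": healthy >= WRITE_QUORUM, "reads": healthy >= READ_QUORUM}


if __name__ == "__main__":
    for lost in range(5):
        print(lost, availability(lost))
```

Losing a whole AZ removes 2 copies, so both reads and writes continue; losing an AZ plus one more copy still leaves reads available.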

Aurora Serverless

Cheap option that auto-scales. Good for infrequent or unpredictable workloads, such as a website or app where you do not know whether people will access it.
Pay for the database capacity consumed while it is in use, not by the instance-hour.

ElastiCache

Improves performance for popular queries by caching results in memory.
When to use it? When the database is overloaded with reads.
Two engines: Memcached and Redis
- Memcached
  - Really simple cache
  - Multi-threaded performance
- Redis
  - Advanced data types
  - Ranking/sorting
  - Backup and restore
  - Multi-AZ
  - Publish/subscribe
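
One common way to put such a cache in front of a database is the cache-aside pattern: look in the cache first, and only query the database on a miss. The sketch below is an assumption-laden illustration, not ElastiCache client code: a plain dict stands in for Redis or Memcached, and `CacheAside`, `load_from_db`, and the TTL value are hypothetical.

```python
import time


class CacheAside:
    """Serve repeated lookups from memory; hit the database only on a miss or expiry."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value); stand-in for Redis/Memcached

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = self._load(key)  # cache miss: query the database
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_query(key):
        print("querying database for", key)
        return key.upper()

    cache = CacheAside(slow_query, ttl_seconds=30)
    print(cache.get("top-products"))
    print(cache.get("top-products"))  # second call is served from the cache
```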

AWS Caching Strategies

CloudFront. Cache in Edge Locations
API Gateway
ElastiCache
DynamoDB Accelerator DAX

EMR – Elastic Map Reduce

Solution for Big Data. Hadoop.
Key component: Cluster -> collection of nodes (EC2 instances)
Node types:
- Master Node – manages the cluster, tracks task status, and monitors health
- Core Node – runs tasks and stores data in HDFS
- Task Node (optional) – only runs tasks, does not store data
Architecture is Star, everything is talking with everything
By default, logs are stored on the master node.
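
The programming model that Hadoop runs across the cluster's nodes is MapReduce. The sketch below shows its three phases (map, shuffle, reduce) for a word count, in a single process. It illustrates the model only and does not use any EMR or Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "Data nodes"]))
```

On a real cluster, the map and reduce calls run in parallel on different nodes and the shuffle moves data between them over the network.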

DMS – Database Migration Service

Migrates databases between on-premises and AWS, or within AWS, in any combination, as long as at least one endpoint is on AWS.
AWS DMS runs on a replication instance (a server in the AWS Cloud). You specify the source and target endpoints.
You can create the target tables yourself and let DMS migrate the data.
You can also use the Schema Conversion Tool (SCT) to create the target schema. You need it when the source and target engines are different; you do not need it when they are the same.
It also supports heterogeneous migrations (between different database engines).
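
The SCT rule can be written down as a tiny decision helper. This is only a sketch of the rule stated above: `migration_plan` is a made-up name, and comparing engine names as strings is a simplification of how source and target compatibility is judged in practice.

```python
def migration_plan(source_engine, target_engine):
    """List the migration steps, adding schema conversion only when engines differ."""
    same_engine = source_engine.strip().lower() == target_engine.strip().lower()
    steps = []
    if not same_engine:
        # Heterogeneous migration: convert the schema before moving data.
        steps.append("convert schema with SCT")
    steps.append("migrate data with DMS")
    return steps


if __name__ == "__main__":
    print(migration_plan("MySQL", "MySQL"))
    print(migration_plan("Oracle", "PostgreSQL"))
```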

George Antonopoulos
