Databases
RDS
- Runs on virtual machines. We have no access to those.
- RDS is not serverless (except Aurora Serverless). The databases run on servers, but Amazon maintains them.
Why not run a SQL database on an EC2 instance?
- To avoid administration. (Some use cases require you to manage your own DB, and EC2 may be the right choice.)
Are there hybrid or on-premises deployment options for Amazon RDS?
- Amazon RDS on Outposts
- Amazon RDS on VMware
Backups
- Automated Backups
- Recover your database to any point in time within the retention period (1-35 days).
- Enabled by default.
- Takes a full daily snapshot and stores transaction logs throughout the day.
- Stored in S3. Free backup storage equal to the size of the database.
- Database Snapshots
- Taken manually (user-initiated).
- Persist even after the RDS instance is deleted (unlike automated backups).
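The point-in-time rule above can be sketched as a small check. This is an illustration only: `can_restore_to` is a hypothetical helper, not an AWS API, and it assumes the restore window runs right up to the current moment (in practice the latest restorable time lags slightly behind).

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 1
MAX_RETENTION_DAYS = 35


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """Return True if `target` lies inside the automated-backup retention window."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention period must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= now


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(days=3), now, retention_days=7))   # True
print(can_restore_to(now - timedelta(days=10), now, retention_days=7))  # False
```

A manual snapshot has no such window: it stays available until you delete it.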
Multi-AZ
- Used for Disaster Recovery, not for performance.
- Keeps an exact copy of the production database in another AZ.
- AWS handles the replication.
- Writes are synchronously replicated to the standby.
- If the main DB fails or undergoes maintenance, AWS automatically switches to the standby database.
- Available on:
- SQL Server
- Oracle
- MySQL
- Postgres
- MariaDB
- Aurora is completely fault-tolerant by default
Read Replicas
- Read-only copies of the database that serve read traffic.
- Improved performance.
- Used for read-heavy workloads.
- Automated backups must be enabled to deploy a read replica.
- Up to 5 read replicas per database
- Can have read replicas of read replicas, but watch for latency.
- Available on:
- Aurora
- Oracle
- MySQL
- Postgres
- MariaDB
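How an application might take advantage of read replicas can be sketched as follows. This is a minimal illustration, not an RDS feature: `ReplicaRouter` and the hostnames are hypothetical, and the "starts with SELECT" test is a deliberately naive way to spot read-only statements.

```python
from itertools import cycle


class ReplicaRouter:
    """Send writes to the primary and spread reads across the replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Round-robin iterator over replica endpoints (None if there are none).
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no replica exists, go to the primary.
        return self.primary


router = ReplicaRouter(
    primary="primary.db.example.internal",
    replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))        # replica-1...
print(router.endpoint_for("SELECT * FROM customers"))     # replica-2...
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # primary...
```

Because replication to read replicas is asynchronous, a read routed to a replica may briefly lag behind the primary.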
DynamoDB
- NoSQL. Spread across 3 geographically distinct data centers
- 2 read consistency types:
- Eventually consistent reads (default). All copies usually become consistent within 1 second, so a read right after a write may return stale data.
- Best read performance
- Strongly consistent reads. Return the result of all writes that succeeded before the read.
- Use it when the app needs to read new data less than 1 second after the write
- Increases cost
- Improves consistency
- Serverless
- Handles frequent schema changes
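Why strongly consistent reads increase cost can be shown with read capacity units (RCUs): one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The helper below is a sketch of that arithmetic; `read_capacity_units` is a hypothetical name, not part of any SDK.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool) -> float:
    """Estimate the RCUs needed for a steady read rate."""
    # Each read is billed in 4 KB blocks, rounded up.
    units_per_read = math.ceil(item_size_kb / 4)
    if not strongly_consistent:
        # An eventually consistent read costs half as much.
        units_per_read /= 2
    return units_per_read * reads_per_second


# 10 reads per second of 6 KB items:
print(read_capacity_units(6, 10, strongly_consistent=True))   # 20
print(read_capacity_units(6, 10, strongly_consistent=False))  # 10.0
```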
DynamoDB Performance
- Stored on SSD. Partitioned across many nodes. No read replicas.
DynamoDB Accelerator (DAX)
- In-memory cache
- Improves read performance from milliseconds to microseconds.
DynamoDB Streams (Connect with Lambda)
- Associate the stream with a Lambda function.
- Immediately after an item in the table is modified, a new record appears in the table’s stream. AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records.
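A Lambda function attached to a stream receives batches of records. The handler below is a minimal sketch: the function name `handler`, the key name `pk`, and the sample event are illustrative assumptions, while `Records`, `eventName` (`INSERT`, `MODIFY`, `REMOVE`), and `dynamodb.Keys` follow the stream record format.

```python
def handler(event, context):
    """Count the changes in one batch of DynamoDB stream records."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        event_name = record["eventName"]
        keys = record["dynamodb"]["Keys"]  # primary key of the changed item
        summary[event_name] += 1
        print(f"{event_name}: {keys}")
    return summary


# Hypothetical batch, shaped like the records Lambda passes to the function.
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"pk": {"S": "user#1"}},
                      "NewImage": {"pk": {"S": "user#1"}, "plan": {"S": "free"}}}},
        {"eventName": "MODIFY",
         "dynamodb": {"Keys": {"pk": {"S": "user#1"}},
                      "NewImage": {"pk": {"S": "user#1"}, "plan": {"S": "pro"}}}},
    ]
}
print(handler(sample_event, None))
```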
DynamoDB Pricing
- DynamoDB charges for reading, writing, and storing data in your DynamoDB tables, along with any optional features you choose to enable.
- On-Demand: pay only per request, but at a higher price per request
- Better for startups and dev environments, new tables with unknown workloads, and unpredictable application traffic
- Provisioned: specify the number of reads and writes per second. A good option if you:
- Have predictable application traffic.
- Run applications whose traffic is consistent or ramps gradually.
- Can forecast capacity requirements to control costs.
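The trade-off between the two modes can be sketched with a rough monthly estimate. Everything here is an assumption for illustration: the function names are hypothetical, the prices are placeholders (not real AWS prices), and one request is treated as one capacity unit, which ignores item size and consistency.

```python
HOURS_PER_MONTH = 730


def on_demand_cost(requests_per_month: int, price_per_million_requests: float) -> float:
    """Pay per request actually made."""
    return requests_per_month / 1_000_000 * price_per_million_requests


def provisioned_cost(capacity_units: int, price_per_unit_hour: float) -> float:
    """Pay for the provisioned capacity every hour, used or not."""
    return capacity_units * price_per_unit_hour * HOURS_PER_MONTH


def cheaper_mode(requests_per_month: int, capacity_units: int,
                 price_per_million_requests: float, price_per_unit_hour: float) -> str:
    od = on_demand_cost(requests_per_month, price_per_million_requests)
    prov = provisioned_cost(capacity_units, price_per_unit_hour)
    return "on-demand" if od < prov else "provisioned"


# Placeholder prices: 1.00 per million requests, 0.001 per capacity unit-hour.
# Steady traffic: 100 requests/second all month.
print(cheaper_mode(262_800_000, 100, 1.0, 0.001))  # provisioned
# Spiky traffic: same peak of 100/second, but only 5 million requests in total.
print(cheaper_mode(5_000_000, 100, 1.0, 0.001))    # on-demand
```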
Redshift
- Data warehouse. OLAP. Used for business intelligence
- Available in one AZ
- Can be configured as:
- Single node
- Multi-node
- Leader Node: manages client connections and receives queries
- Compute Node: stores data and performs queries and computations
Redshift backups
- Enabled by default with a 1-day retention period (configurable up to 35 days)
- Redshift maintains at least 3 copies of your data (the original, a replica on the compute nodes, and a backup in S3)
- Can asynchronously copy snapshots to S3 in another region for Disaster Recovery
Redshift Enhanced VPC Routing
- When Enhanced VPC Routing is enabled, Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC.
- Lets you use standard VPC features (security groups, network ACLs, VPC endpoints, VPC endpoint policies, internet gateways, and DNS servers) to control that traffic, and VPC flow logs to monitor it.
Redshift Spectrum
- Enables you to query and analyze all of your data in Amazon S3 using the open data formats you already use, with no data loading or transformations needed.
Redshift Spectrum vs Athena
- An analyst that already works with Redshift will benefit most from Redshift Spectrum because it can quickly access data in the cluster and extend out to infrequently accessed, external tables in S3. It’s also better suited for fast, complex queries on multiple data sets.
- Athena is a simpler way to run interactive, ad hoc queries on data stored in S3. It doesn’t require any cluster management, and an analyst only needs to define a table to make a standard SQL query.
Aurora
- Better performance than standard MySQL and PostgreSQL, at a much lower price than commercial databases.
- Storage auto-scales. Starts at 10 GB and scales up to 64 TB.
- 2 copies of your data in each of 3 AZs = 6 copies minimum
- Handles the loss of up to 2 copies without affecting writes, and up to 3 copies without affecting reads
- Replication lag is in milliseconds instead of seconds (MySQL). Replication does not affect performance.
- Replicas, however, must be in the same region.
- No data loss on failover. Failover and recovery are automated.
- Backups are always enabled
- Can take snapshots and share them with other accounts
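The fault-tolerance numbers above amount to a quorum rule over the 6 copies. The sketch below derives the thresholds from the stated tolerances (6 - 2 = 4 healthy copies for writes, 6 - 3 = 3 for reads); the constant and function names are hypothetical.

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # writes survive the loss of up to 2 copies
READ_QUORUM = 3    # reads survive the loss of up to 3 copies


def availability(lost_copies: int) -> dict:
    """Report whether reads and writes remain available after losing copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError("lost_copies must be between 0 and 6")
    healthy = TOTAL_COPIES - lost_copies
    return {"writes": healthy >= WRITE_QUORUM, "reads": healthy >= READ_QUORUM}


print(availability(2))  # {'writes': True, 'reads': True}
print(availability(3))  # {'writes': False, 'reads': True}
print(availability(4))  # {'writes': False, 'reads': False}
```

Losing an entire AZ removes 2 copies, so both reads and writes keep working.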
Aurora Serverless
- Cheap option that auto-scales. Suits a website or app where you do not know whether or when people will access it. Good for unpredictable workloads.
- Pay for the database capacity you actually consume, not for a fixed instance by the hour.
ElastiCache
- Improves performance for frequently run queries
- When to use it? -> When the database is overloaded.
- 2 engines: Memcached and Redis
- Memcached
- Really simple cache
- Multi-Threaded performance
- Redis
- Advanced data types
- Ranking/Sorting
- Backup
- MultiAZ
- Publish/Subscribe (Pub/Sub)
AWS Caching Strategies
- CloudFront. Cache in Edge Locations
- API Gateway
- ElastiCache
- DynamoDB Accelerator DAX
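A common way to put a cache such as ElastiCache in front of an overloaded database is the cache-aside pattern: check the cache first and only query the database on a miss. The sketch below uses a plain dictionary as a stand-in for the cache; `CacheAside` and `slow_query` are hypothetical names, and a real setup would talk to Redis or Memcached through a client library instead.

```python
import time


class CacheAside:
    """Serve repeated lookups from memory; fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: query the database and cache the result.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (self._clock() + self._ttl, value)
        return value


def slow_query(product_id):
    """Hypothetical stand-in for an expensive database query."""
    return {"id": product_id, "name": f"product-{product_id}"}


cache = CacheAside(slow_query, ttl_seconds=30)
cache.get(42)  # miss: goes to the database
cache.get(42)  # hit: served from the cache
print(cache.hits, cache.misses)  # 1 1
```

The TTL bounds how stale a cached value can get, which matters when the underlying data changes.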
EMR – Elastic MapReduce
- Solution for Big Data. Hadoop.
- Key Component: Cluster -> Collection of Nodes (EC2 instances)
- Node types:
- Master Node – Manages the cluster, tracks task status, and monitors health
- Core Node – Runs tasks and stores data in HDFS (the Hadoop file system)
- Task Node (Optional) – Only runs tasks, does not store data
- Architecture is Star, everything is talking with everything
- By default, logs are stored on the master node.
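The MapReduce model that Hadoop on EMR runs can be sketched on a single machine as a word count. This is a toy illustration of the three phases, not EMR or Hadoop code; on a cluster the map and reduce work is spread across the core and task nodes. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data", "Big cluster"]))
# {'big': 2, 'data': 1, 'cluster': 1}
```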
DMS – Database Migration Service
- Migrates databases from on-premises to AWS, from AWS to on-premises, or between AWS databases. At least one endpoint must be on an AWS service.
- AWS DMS is a server in the AWS Cloud that runs replication software. You specify the source and the target.
- You can create the target tables yourself and let DMS do the migration.
- You can also use the Schema Conversion Tool (SCT) to create the target schema. You need it when the source and target database engines differ; you don't need it when they are the same.
- Also supports heterogeneous migrations (between different database engines).