AWS Certified Developer Exam Notes - Dynamo

DynamoDB

Fast, flexible NoSQL database service.
Supports horizontal scaling. Does not support joins or aggregations (such as SUM).
Scales to massive workloads (millions of requests per second).
Tables
- Each table has one primary key and many attributes
- Attributes can be null, and they can change over time
- Each item (row) has a max size of 400KB
Partitions
- The WCUs and RCUs are divided across partitions
Throttling
- Caused by hot keys (a very popular item) or hot partitions
- Solutions:
  - Better partition keys. Add a random number suffix to the partition key values so writes spread across partitions.
  - DynamoDB Accelerator (DAX) to absorb hot reads
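
The random-suffix idea above is often called write sharding. A minimal sketch, assuming a hypothetical shard count of 10 and a "#" separator (both chosen for illustration, neither is required by DynamoDB):

```python
import random
from typing import List

SHARD_COUNT = 10  # assumption: illustrative value, tune for your workload


def sharded_key(base_key: str, shard_count: int = SHARD_COUNT) -> str:
    """Return the base key with a random shard suffix appended."""
    return f"{base_key}#{random.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> List[str]:
    """Return every possible sharded key for a base key.

    A reader has to query each of these keys and merge the results.
    """
    return [f"{base_key}#{i}" for i in range(shard_count)]
```

The trade-off is that writes spread out, but reads for one logical key now need one query per shard.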
APIs
- PutItem – create a new item or fully replace an existing item
- UpdateItem – update some attributes of an existing item, or create the item if it does not exist
- GetItem
- Query
  - KeyConditionExpression
  - FilterExpression
- Scan
- DeleteItem
- DeleteTable
- TransactWriteItems
- BatchWriteItem
  - Up to 25 put or delete per call
  - It is possible that only some of the actions in the batch succeed while the others do not. (No transaction)
- BatchGetItem (up to 100 items)
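
Because BatchWriteItem accepts at most 25 put or delete requests per call, a larger set of items has to be split into chunks. A minimal sketch, assuming a hypothetical table name and items already in the low-level attribute-value format:

```python
from typing import Dict, Iterator, List

MAX_BATCH_WRITE = 25  # BatchWriteItem limit noted above


def batch_write_requests(table_name: str, items: List[dict]) -> Iterator[Dict]:
    """Yield RequestItems payloads, each holding at most 25 put requests."""
    for start in range(0, len(items), MAX_BATCH_WRITE):
        chunk = items[start:start + MAX_BATCH_WRITE]
        yield {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}


# Hypothetical data: 60 items for a table called "Users".
items = [{"id": {"S": str(i)}} for i in range(60)]
batches = list(batch_write_requests("Users", items))
```

Each payload could be passed as RequestItems to a BatchWriteItem call. Since a batch is not a transaction, anything the service returns under UnprocessedItems should be retried.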
CLI
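- The following are arguments for aws dynamodb scan
- --projection-expression "…" : one or more attributes to retrieve
- --filter-expression "…" : filter the results before they are returned
- --page-size : retrieve all items, but in smaller batches per API call
- --max-items : maximum number of items to return
- --starting-token : token from a previous call, used to fetch the next page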
Transactions
- Read Modes: Eventual Consistency, Strong Consistency, Transactional
- Write Modes: Standard, Transactional
- Transactional reads and writes consume 2x the RCU and WCU. Account for this in capacity calculations.
Session State
- Common use case of DynamoDB or ElastiCache. Both are key/value.
- Use ElastiCache if you need an in-memory cache
- Use DynamoDB if you need serverless with auto scaling
Writes
- Conditional Writes
- Concurrent Writes
- Atomic Writes
Copy table
- Use AWS Data Pipeline
- Or back up and restore into a new table (slower)
Backups
- Two built-in backup methods that write to Amazon S3
  - On-demand
  - Point-in-time recovery
- You don’t have access to the S3 buckets that are used for these backups.
- For globally dispersed users, consider using global tables

Parallel scans

A Scan operation reads the entire table, which is very expensive. Improvements:
- Use the Limit parameter to reduce the page size, so DynamoDB does not read the entire table in one request.
- Use parallel scans.
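
A parallel scan splits the table into segments using the Scan parameters Segment and TotalSegments, with one worker per segment. A minimal sketch that only builds the per-segment request parameters; the table name and worker count are hypothetical:

```python
from typing import Dict, List, Optional


def parallel_scan_params(
    table_name: str, total_segments: int, page_limit: Optional[int] = None
) -> List[Dict]:
    """Build one set of Scan parameters per segment."""
    params = []
    for segment in range(total_segments):
        request = {
            "TableName": table_name,
            "Segment": segment,              # which slice this worker reads
            "TotalSegments": total_segments,  # how many slices exist in total
        }
        if page_limit is not None:
            request["Limit"] = page_limit    # smaller pages per request
        params.append(request)
    return params


# Hypothetical usage: four workers, 100 items per page.
requests = parallel_scan_params("Users", total_segments=4, page_limit=100)
```

Each dictionary could be handed to a separate thread or process that runs its own Scan loop. A parallel scan finishes sooner but consumes read capacity faster.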

Conditional writes

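Used for concurrency control (optimistic locking). Ensure the item hasn’t changed before an update or delete.
Typically implemented with a version number attribute on each item.
If two people try to write at the same time, the second write fails because its condition no longer holds.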
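
The version-number check can be expressed as a ConditionExpression on UpdateItem. A minimal sketch that builds the request parameters; the attribute names "status" and "version" and the table name are assumptions for illustration:

```python
from typing import Dict


def versioned_update_params(
    table_name: str, key: Dict, new_status: str, expected_version: int
) -> Dict:
    """Build UpdateItem parameters that succeed only if the version is unchanged."""
    return {
        "TableName": table_name,
        "Key": key,
        # Write the new value and bump the version in the same request.
        "UpdateExpression": "SET #s = :new_status, #v = :next_version",
        # The write is rejected if someone else already changed the version.
        "ConditionExpression": "#v = :expected_version",
        "ExpressionAttributeNames": {"#s": "status", "#v": "version"},
        "ExpressionAttributeValues": {
            ":new_status": {"S": new_status},
            ":expected_version": {"N": str(expected_version)},
            ":next_version": {"N": str(expected_version + 1)},
        },
    }


params = versioned_update_params(
    "Orders", {"order_id": {"S": "o-1"}}, "SHIPPED", expected_version=3
)
```

When the condition does not hold, the service rejects the write with a ConditionalCheckFailedException, and the caller can re-read the item and retry.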

Primary Keys

Partition Key
- must be unique when used alone (no sort key)
- String, number, or binary
- something like user_id for users
Partition Key + Sort key
- data is grouped by partition key
- We can have two items with the same partition key and different sort key
  - The uniqueness is on partition + sort
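
A small sketch of both ideas: how a key schema is declared (HASH for the partition key, RANGE for the sort key), and a toy in-memory model of uniqueness on the composite key. The attribute names and values are hypothetical:

```python
from typing import Dict, List, Optional, Tuple


def key_schema(partition_key: str, sort_key: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a KeySchema list: HASH is the partition key, RANGE is the sort key."""
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return schema


# Toy model only: a dict keyed on (partition key, sort key).
table: Dict[Tuple[str, str], dict] = {}
table[("user-1", "2024-01-01")] = {"total": 10}
table[("user-1", "2024-01-02")] = {"total": 25}  # same partition key, new sort key: new item
table[("user-1", "2024-01-02")] = {"total": 30}  # same composite key: replaces the item
```

The dict ends up with two items for "user-1", which mirrors how a put with an existing partition and sort key replaces the stored item.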

Capacity

Read/Write capacity
- Provisioned mode (default)
  - Plan capacity beforehand
  - Option to set up auto-scaling
- On-demand
  - Scales up and down automatically with the workload
  - Pay for what you use (more expensive)
  - No capacity planning is needed. No need to set up WCU / RCU
If you exceed the provisioned capacity, burst capacity is used. If that is also consumed, you get a ProvisionedThroughputExceededException.
WCU Write capacity units
- One WCU is one write per second for an item up to 1KB
  - 10 items per second * 2KB = 20 WCU
  - 6 items per second * 4.5KB = 6 * 5 = 30 WCU
  - 120 items per minute * 2KB = 2 per second * 2 = 4 WCU
- Item size is rounded up to the next 1KB in the calculations (4.5KB becomes 5KB)
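
The WCU arithmetic above can be written as a small helper. This is a sketch of the calculation only; the function name is made up for these notes, and the transactional flag applies the 2x rule:

```python
import math


def wcu_needed(items_per_second: float, item_size_kb: float, transactional: bool = False) -> int:
    """Return the WCU needed for a steady write rate."""
    units_per_item = math.ceil(item_size_kb)  # 1 WCU per 1KB, size rounded up
    if transactional:
        units_per_item *= 2  # transactional writes cost double
    return math.ceil(items_per_second * units_per_item)


print(wcu_needed(10, 2))        # 20
print(wcu_needed(6, 4.5))       # 30
print(wcu_needed(120 / 60, 2))  # 4
```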
RCU Read capacity units
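- Strongly Consistent Read (SCR): one RCU is 1 SCR per second for an item up to 4KB
  - 10 SCR per second * 4KB = 10 RCU
  - 10 SCR per second * 6KB = 10 * 2 = 20 RCU (6KB is rounded up to 8KB)
- Eventually Consistent Read (ECR): one RCU is 2 ECR per second for an item up to 4KB
  - 16 ECR per second * 12KB = 16/2 * 12/4 = 24 RCU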
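
The same RCU arithmetic as a helper. Again a sketch of the calculation only; the function name and the mode strings are made up for these notes:

```python
import math

# Cost of one read of an item up to 4KB, in RCU, per read mode.
RCU_MULTIPLIER = {"eventual": 0.5, "strong": 1, "transactional": 2}


def rcu_needed(reads_per_second: float, item_size_kb: float, mode: str = "strong") -> int:
    """Return the RCU needed for a steady read rate."""
    units_per_read = math.ceil(item_size_kb / 4)  # size rounded up to the next 4KB
    return math.ceil(reads_per_second * units_per_read * RCU_MULTIPLIER[mode])


print(rcu_needed(10, 4))               # 10
print(rcu_needed(10, 6))               # 20
print(rcu_needed(16, 12, "eventual"))  # 24
```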

Read Types

Strongly consistent read
- Set ConsistentRead to true on GetItem (also available on Query and Scan)
- Slower and more expensive
Eventually consistent read (default)
- We may read stale data because replication has not completed yet

Index

Local Secondary Index LSI
- Extra Sort Key, up to 5 LSI per table
- Created on table creation
Global Secondary Index GSI
- Alternative primary key (partition key plus optional sort key)
- It’s like a new table
- Must provision RCU + WCU
- Can be added after table creation
- If the writes are throttled on GSI, the main table will be throttled as well!

TTL

Items are deleted after the time stored in the TTL attribute (a number holding a Unix epoch timestamp in seconds)
Items are deleted within 48 hours from expiration
- They may still appear in queries during this period
They are deleted from everywhere, including indexes
Does not consume any WCU – no extra cost
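
A minimal sketch of working with TTL values: computing the epoch timestamp to store, and filtering out items that have expired but have not been deleted yet. The attribute name "expires_at" is an assumption; any numeric attribute can be configured as the TTL attribute:

```python
import time
from typing import Optional


def ttl_timestamp(ttl_seconds: int, now: Optional[float] = None) -> int:
    """Return the Unix epoch time (seconds) at which an item should expire."""
    now = time.time() if now is None else now
    return int(now) + ttl_seconds


def is_expired(item: dict, ttl_attribute: str = "expires_at", now: Optional[float] = None) -> bool:
    """Return True if the item carries a TTL value that is already in the past."""
    now = time.time() if now is None else now
    return ttl_attribute in item and item[ttl_attribute] <= now


# Hypothetical session item that expires one hour from now.
session = {"session_id": "abc", "expires_at": ttl_timestamp(3600)}
```

Because expired items can linger before deletion, a reader can apply a check like is_expired (or an equivalent filter expression) to ignore them.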

DynamoDB Accelerator DAX

In-memory cache. Caches the most popular reads and queries. Microsecond latency for cached data.
Solves the hot key problem (too many reads)
TTL 5 minutes default
Up to 10 nodes in the cluster
Supports multiple nodes. Configure Subnets, VPC, AZ, SG, Encryption
We can add nodes after creation, but not change the node type.
Can be used together with ElastiCache
- In DAX we store individual objects and query results
- In ElastiCache we store aggregation results

Streams

Ordered list of item modifications (create/update/delete)
Retention 24 hours
Records can be consumed by Lambda, Kinesis Data Streams, or applications using the Kinesis Client Library
Used for analytics or to react to changes in real time
Stream view types
- Keys only (KEYS_ONLY): only the item key attributes
- New image (NEW_IMAGE): the item after the update
- Old image (OLD_IMAGE): the item before the update
- New and old images (NEW_AND_OLD_IMAGES): the item before and after the update
Only changes made after the stream is enabled are written to it. Existing items are not backfilled.
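
A minimal sketch of a Lambda handler that reads stream records. It assumes the standard event shape for DynamoDB Streams (a Records list, with eventName and a dynamodb section per record); the summary format and the sample event are made up for illustration:

```python
from typing import Dict, List


def handler(event: Dict, context=None) -> List[Dict]:
    """Summarise each stream record: what happened, to which key, before and after."""
    summary = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        summary.append({
            "action": record["eventName"],    # INSERT, MODIFY or REMOVE
            "keys": change["Keys"],
            "before": change.get("OldImage"),  # present only if the view type includes it
            "after": change.get("NewImage"),   # present only if the view type includes it
        })
    return summary


# Hypothetical event with a single MODIFY record.
sample_event = {
    "Records": [{
        "eventName": "MODIFY",
        "dynamodb": {
            "Keys": {"user_id": {"S": "u-1"}},
            "OldImage": {"user_id": {"S": "u-1"}, "plan": {"S": "free"}},
            "NewImage": {"user_id": {"S": "u-1"}, "plan": {"S": "pro"}},
        },
    }]
}
result = handler(sample_event)
```

Which images are available depends on the stream view type chosen when the stream was enabled, so the handler treats both as optional.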

George Antonopoulos
