DynamoDB
- Fast, flexible NoSQL database service.
- Scales horizontally. Does not support joins or aggregations (such as SUM).
- Scales to massive workloads (millions of requests per second)
- Tables
- Each table has one primary key and many attributes
- Attributes can be null, and they can change over time
- Each item (row) has a max size of 400KB
- Partitions
- The WCUs and RCUs are divided across partitions
- Throttling
- Hot keys (popular item) or hot partitions
- Solutions:
- Choose partition keys that spread traffic evenly, for example by adding a random number suffix to the partition key values.
- DAX (DynamoDB Accelerator) to absorb reads on hot keys
- APIs
- PutItem – creates a new item or fully replaces an existing item with the same key
- UpdateItem – updates only some attributes of an existing item, or creates the item if it does not exist
- GetItem
- Query
- KeyConditionExpression
- FilterExpression
- Scan
- DeleteItem
- DeleteTable
- TransactWriteItems
- BatchWriteItem
- Up to 25 put or delete per call
- It is possible that only some of the actions in the batch succeed while the others do not. (No transaction)
- BatchGetItem (up to 100 items)
- CLI
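- Arguments for aws dynamodb scan:
- --projection-expression "…": one or more attributes to retrieve
- --filter-expression "…": filter applied to the items after they are read
- --page-size: still retrieves the full result set, but in smaller API calls (fewer items per call)
- --max-items: maximum number of items returned by the command
- --starting-token: resume from the NextToken returned by a previous call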
- Transactions
- Read Modes: Eventual Consistency, Strong Consistency, Transactional
- Write Modes: Standard, Transactional
- Transactional reads and writes consume 2x the RCUs and WCUs. Account for this in capacity calculations.
- Session State
- Storing session state is a common use case for both DynamoDB and ElastiCache. Both are key/value stores.
- Use ElastiCache if you need an in-memory cache
- Use DynamoDB if you need a serverless, auto-scaling store
- Writes
- Conditional Writes
- Concurrent Writes
- Atomic Writes
- Copy table
- Use AWS Data Pipeline
- Or backup and restore into a new table. Slower.
- Backups
- Two built-in backup methods that write to Amazon S3
- On-demand
- Point-in-time recovery
- You don’t have access to the S3 buckets that are used for these backups.
- For globally dispersed users, consider using global tables
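The BatchWriteItem notes above (at most 25 put or delete requests per call, and partial success without a transaction) can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `client` is assumed to behave like `boto3.client("dynamodb")`, and `batch_put`, `chunked` and the retry settings are hypothetical names chosen for this example.

```python
import time

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunked(items, size=MAX_BATCH):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_put(client, table_name, items, max_retries=5):
    """Write items (already in DynamoDB attribute-value format) in batches.

    `client` is assumed to expose batch_write_item like the boto3 client.
    Returns the requests still unprocessed after all retries.
    """
    leftover = []
    for chunk in chunked(items):
        pending = {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        for attempt in range(max_retries):
            response = client.batch_write_item(RequestItems=pending)
            # Some requests may fail while others succeed: retry only the rest.
            pending = response.get("UnprocessedItems") or {}
            if not pending:
                break
            time.sleep(min(0.05 * 2 ** attempt, 1.0))  # simple exponential backoff
        leftover.extend(pending.get(table_name, []))
    return leftover
```

Retrying only `UnprocessedItems` avoids rewriting items that already succeeded; anything returned by `batch_put` still needs handling by the caller.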
Parallel scans
- A Scan operation reads the entire table, which is very expensive. Ways to reduce the impact:
- Use the Limit parameter to reduce the page size, so each request reads fewer items and consumes less capacity at once.
- Use parallel scans.
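A parallel scan splits the table into segments that workers read independently. The sketch below shows one possible shape, assuming `client` behaves like `boto3.client("dynamodb")`; `scan_segment`, `parallel_scan` and the default values are hypothetical names for this example, while `Segment`, `TotalSegments`, `Limit`, `ExclusiveStartKey` and `LastEvaluatedKey` are the Scan API fields.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments, page_limit=100):
    """Read every item in one segment, following LastEvaluatedKey page by page."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        "Limit": page_limit,  # smaller pages, so less capacity consumed per request
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=4):
    """Scan all segments concurrently, one worker thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

A parallel scan finishes sooner but consumes read capacity faster, so the number of segments is a trade-off rather than a free speed-up.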
Conditional writes
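- Used for concurrency control (optimistic locking): ensure the item hasn't changed before an update or delete
- Keep a version number attribute on each item and make the write conditional on it.
- If two clients try to write the same version at the same time, the second write fails its condition check.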
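The version-number approach above could be expressed as an UpdateItem request whose condition checks the version that was read. This is a sketch under assumptions: the attribute name `version` and the helper `versioned_update` are hypothetical, and the returned dictionary is meant to be passed to something like `client.update_item(TableName=..., **request)` on a boto3 DynamoDB client.

```python
def versioned_update(key, attribute, new_value, expected_version):
    """Build UpdateItem arguments that succeed only if the version is unchanged.

    `key` and `new_value` use DynamoDB attribute-value format,
    for example {"user_id": {"S": "u1"}} and {"S": "Ada"}.
    """
    return {
        "Key": key,
        # Set the new value and bump the version in the same write.
        "UpdateExpression": "SET #attr = :new_value, #ver = :next_version",
        # The write is rejected if someone else already changed the version.
        "ConditionExpression": "#ver = :expected_version",
        "ExpressionAttributeNames": {"#attr": attribute, "#ver": "version"},
        "ExpressionAttributeValues": {
            ":new_value": new_value,
            ":expected_version": {"N": str(expected_version)},
            ":next_version": {"N": str(expected_version + 1)},
        },
    }
```

When the condition fails, DynamoDB rejects the write with a ConditionalCheckFailedException; the usual reaction is to re-read the item and retry with the new version.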
Primary Keys
- Partition Key
- unique
- String, number, or binary
- something like user_id for users
- Partition Key + Sort key
- data is grouped by partition key
- We can have two items with the same partition key and different sort key
- The uniqueness is on partition + sort
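As an illustration of the two key layouts, the sketch below builds CreateTable arguments for either a partition key alone or a partition key plus sort key. The helper `table_definition` and the table and attribute names in the usage note are hypothetical; `KeySchema`, `AttributeDefinitions`, `HASH` and `RANGE` are the names the CreateTable API uses, and on-demand billing is chosen here only to keep the example short.

```python
def table_definition(table_name, partition_key, sort_key=None):
    """Build CreateTable arguments.

    Each key is a (name, type) pair, with type "S" (string),
    "N" (number) or "B" (binary).
    """
    keys = [(partition_key, "HASH")]  # HASH marks the partition key
    if sort_key:
        keys.append((sort_key, "RANGE"))  # RANGE marks the sort key
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": name, "KeyType": role} for (name, _), role in keys
        ],
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": kind} for (name, kind), _ in keys
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }
```

For example, `table_definition("UserGames", ("user_id", "S"), ("game_ts", "N"))` describes a table where several items may share a `user_id` as long as their `game_ts` differs.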
Capacity
- Read/Write capacity
- Provisioned mode (default)
- Plan capacity beforehand
- Option to set up auto-scaling
- On-demand
- Scales up and down automatically
- Pay for what you use (more expensive per request)
- No capacity planning is needed. No need to set up WCU / RCU
- In provisioned mode, if you exceed the capacity, burst capacity is used. If that is also consumed, you get a ProvisionedThroughputExceededException
- WCU Write capacity units
- One WCU is one write per second for an item up to 1KB
- 10 items per second * 2KB = 20 WCU
- 6 items per second * 4.5KB = 6 * 5 = 30 WCU
- 120 items per minute * 2KB = 2 items per second * 2KB = 4 WCU
- Item size is rounded up to the next 1KB in the calculations (4.5KB becomes 5KB)
- RCU Read capacity units
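- Strongly consistent read (SCR): 1 RCU = 1 SCR per second for an item up to 4KB
- 10 SCR per second * 4KB = 10 RCU
- 10 SCR per second * 6KB = 10 * 8/4 = 20 RCU (6KB is rounded up to 8KB)
- Eventually consistent read (ECR): 1 RCU = 2 ECR per second for an item up to 4KB
- 16 ECR per second * 12KB = 16/2 * 12/4 = 24 RCU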
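The WCU and RCU arithmetic above can be checked with a small calculator. This is only a sketch of the rules as stated in these notes (1KB write units, 4KB read units, sizes rounded up, eventually consistent reads at half cost, transactional operations at double cost); the function names `wcu` and `rcu` are hypothetical.

```python
import math


def wcu(writes_per_second, item_kb, transactional=False):
    """Write capacity units: item size rounds up to the next 1 KB."""
    units = writes_per_second * math.ceil(item_kb)
    return math.ceil(units * (2 if transactional else 1))


def rcu(reads_per_second, item_kb, mode="strong"):
    """Read capacity units: item size rounds up to the next 4 KB."""
    blocks = math.ceil(item_kb / 4)
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[mode]
    return math.ceil(reads_per_second * blocks * factor)


print(wcu(10, 2))                # 20
print(wcu(6, 4.5))               # 30
print(wcu(120 / 60, 2))          # 4
print(rcu(10, 6))                # 20
print(rcu(16, 12, "eventual"))   # 24
```

Rates given per minute have to be converted to per second first, as in the third call.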
Read Types
- Strongly consistent read
- Set ConsistentRead to true on GetItem, Query, or Scan
- Slower and more expensive
- Eventually consistent read (default)
- We may read stale data because replication has not completed yet
Index
- Local Secondary Index LSI
- Extra Sort Key, up to 5 LSI per table
- Created on table creation
- Global Secondary Index GSI
- Extra Primary Key
- It’s like a new table
- Must provision RCU + WCU for the index
- Can be added after table creation
- If the writes are throttled on GSI, the main table will be throttled as well!
TTL
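- Deletes items after their expiry time, stored in a Number attribute as a Unix epoch timestamp (in seconds)
- Deletion is not immediate: items are typically removed within 48 hours of expiration
- Expired items can still appear in queries during this period, so filter them out on read if needed
- They are deleted from the table and from its indexes (LSI and GSI)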
- Does not consume any WCU – no extra cost
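To make the TTL points concrete, the sketch below stamps an item with an expiry time in epoch seconds and checks expiry on read, since expired items may still be returned until they are actually deleted. The attribute name `expires_at` and both helper names are hypothetical; the TTL attribute is whichever Number attribute is configured on the table.

```python
from datetime import datetime, timedelta, timezone


def with_ttl(item, lifetime, now=None, attribute="expires_at"):
    """Return a copy of `item` with a TTL attribute (epoch seconds, type N)."""
    now = now or datetime.now(timezone.utc)
    expires = int((now + lifetime).timestamp())
    return {**item, attribute: {"N": str(expires)}}


def is_expired(item, now=None, attribute="expires_at"):
    """True if the item's TTL has passed, even if DynamoDB has not deleted it yet."""
    now = now or datetime.now(timezone.utc)
    if attribute not in item:
        return False  # items without the TTL attribute never expire
    return int(item[attribute]["N"]) <= int(now.timestamp())


session = with_ttl({"session_id": {"S": "abc"}}, timedelta(hours=1))
print(is_expired(session))  # False: the item expires one hour from now
```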
Dynamo Accelerator DAX
- In-memory cache for the most frequently read items and queries, with microsecond latency
- Solves the hot key problem (too many reads)
- TTL 5 minutes default
- Up to 10 nodes in the cluster
- Supports multiple nodes. Configure Subnets, VPC, AZ, SG, Encryption
- We can add nodes after creation, but not change the node type.
- Can be used together with ElastiCache
- In DAX we cache individual objects and query results
- In ElastiCache we store aggregation results
Streams
- Ordered list of item modifications (create/update/delete)
- Records are retained for 24 hours
- Consumers: Kinesis Data Streams, Lambda, or applications using the Kinesis Client Library
- Used for analytics or for reacting to changes in real time
- Types
- KEYS_ONLY: only the key attributes of the modified item
- NEW_IMAGE: the entire item as it appears after the change
- OLD_IMAGE: the entire item as it appeared before the change
- NEW_AND_OLD_IMAGES: the item both before and after the change
- Only changes made after the stream is enabled are written to it; earlier changes are not
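A Lambda consumer of a stream might look roughly like the sketch below. It assumes the standard DynamoDB Streams event shape delivered to Lambda (`Records`, `eventName`, and a `dynamodb` section holding `Keys`, `OldImage`, `NewImage`); which images are present depends on the stream view type, so the code treats them as optional. The function name `handler` and the summary format are hypothetical.

```python
def handler(event, context=None):
    """Summarise each change record delivered by a DynamoDB stream."""
    summary = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        summary.append({
            "event": record["eventName"],   # INSERT, MODIFY, or REMOVE
            "keys": change["Keys"],         # present for every view type
            "old": change.get("OldImage"),  # only with OLD_IMAGE / NEW_AND_OLD_IMAGES
            "new": change.get("NewImage"),  # only with NEW_IMAGE / NEW_AND_OLD_IMAGES
        })
    return summary
```

Comparing `old` and `new` is only possible with the new and old images view type; with keys only, the consumer has to read the item back from the table.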