AWS Certified Developer Exam Notes – Dynamo

DynamoDB

  • Fast, flexible NoSQL database service.
  • Supports horizontal scaling. Does not support joins or aggregations (such as SUM).
  • Scales to massive workloads (millions of requests per second)
  • Tables
    • Each table has one primary key and many attributes
    • Attributes can be null, and they can change over time
    • Each item (row) has a max size of 400KB
  • Partitions
    • The WCUs and RCUs are divided across partitions
  • Throttling
    • Caused by hot keys (a very popular item) or hot partitions
    • Solutions:
      • Choose better-distributed partition keys, e.g. add a random number suffix to the partition key values (write sharding).
      • DAX (DynamoDB Accelerator) for read-heavy hot keys
  • APIs
    • PutItem – creates a new item, or fully replaces an existing item with the same key
    • UpdateItem – updates only the given attributes of an existing item, or creates the item if it does not exist
    • GetItem
    • Query
      • KeyConditionExpression
      • FilterExpression
    • Scan
    • DeleteItem
    • DeleteTable
    • TransactWriteItems
    • BatchWriteItem
      • Up to 25 put or delete per call
      • Not transactional: some actions in the batch may succeed while others fail. Unprocessed actions are returned in UnprocessedItems and should be retried.
    • BatchGetItem (up to 100 items)
  • CLI
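    • Useful arguments for aws dynamodb scan:
    • --projection-expression "…" : one or more attributes to retrieve
    • --filter-expression "…"
    • --page-size: still retrieves all items, but in smaller batches per API call (helps avoid timeouts)
    • --max-items: maximum number of items to return; the output includes a NextToken if more remain
    • --starting-token: the NextToken from a previous call, to continue where it stopped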
  • Transactions
    • Read Modes: Eventual Consistency, Strong Consistency, Transactional
    • Write Modes: Standard, Transactional
    • Transactional reads and writes consume 2x the RCU / WCU of standard ones. Account for this in capacity calculations.
  • Session State
    • Common use case for DynamoDB or ElastiCache. Both are key/value stores.
    • Use ElastiCache if you need an in-memory cache
    • Use DynamoDB if you need serverless with auto-scaling
  • Writes
    • Conditional Writes
    • Concurrent Writes
    • Atomic Writes
  • Copy table
    • Use AWS Data Pipeline
    • Or back up and restore into a new table (slower)
  • Backups
    • Two built-in backup methods that write to Amazon S3
      • On-demand
      • Point-in-time recovery
    • You don’t have access to the S3 buckets that are used for these backups.
    • For globally dispersed users, consider using global tables
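
The random-suffix idea from the Throttling notes above can be sketched in a few lines. This is a minimal illustration, not an AWS API: the function names, the "#" separator, and the shard count of 10 are all assumptions made for the example.

```python
import random

SHARD_COUNT = 10  # assumption: how many suffixes one hot key is spread over


def sharded_key(base_key, shard_count=SHARD_COUNT, rng=random):
    """Return the partition key to write to: base key plus a random suffix."""
    return f"{base_key}#{rng.randrange(shard_count)}"


def all_shard_keys(base_key, shard_count=SHARD_COUNT):
    """Return every partition key a reader must query to see all the items."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


if __name__ == "__main__":
    print(sharded_key("2024-01-01"))     # one of the ten keys below, chosen at random
    print(all_shard_keys("2024-01-01"))
```

The trade-off is on the read side: writes are spread across partitions, but a reader has to query every suffix and merge the results.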

 

Parallel scans

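  • A Scan operation reads the entire table, which is very expensive. Improvements:
    • Use the Limit parameter to reduce the page size, so DynamoDB does not scan the entire table in one request.
    • Use parallel scans (Segment and TotalSegments parameters) to split the table across several workers. Faster, but consumes more RCUs.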
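
A parallel scan can be sketched as one worker per segment, each paging through its own segment. The parameter names (Segment, TotalSegments, Limit, ExclusiveStartKey) and the LastEvaluatedKey response field are those of the Scan API; the function names and the `scan_fn` argument are assumptions for this example. `scan_fn` stands for any callable with the Scan signature, for instance a boto3 DynamoDB client's `scan` method.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan_fn, table_name, segment, total_segments, page_size=100):
    """Read one segment to the end, following LastEvaluatedKey page by page."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        "Limit": page_size,
    }
    while True:
        page = scan_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages in this segment
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(scan_fn, table_name, total_segments=4):
    """Scan all segments concurrently and concatenate the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_fn, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

Each worker still consumes read capacity, so more segments finish sooner but draw more RCUs per second.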

Conditional writes

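  • Used for concurrency control (optimistic locking): ensure the item hasn’t changed before an update/delete
  • Typically done with a version number attribute on each item, checked in a ConditionExpression
  • If two clients try to write the same version at the same time, the second one fails with ConditionalCheckFailedException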
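
The version-number check can be expressed as the parameters of an UpdateItem call. The sketch below only builds the request dictionary. The helper name and the attribute name "version" are assumptions, while the parameter names are those of the UpdateItem API in its low-level attribute-value format.

```python
def build_versioned_update(table_name, key, attr_name, new_value, expected_version):
    """Build UpdateItem parameters that only apply if the version is unchanged."""
    return {
        "TableName": table_name,
        "Key": key,
        # Write the new value and bump the version in the same request.
        "UpdateExpression": "SET #attr = :new_value, #ver = :next_version",
        # The write is rejected if someone else already bumped the version.
        "ConditionExpression": "#ver = :expected_version",
        "ExpressionAttributeNames": {"#attr": attr_name, "#ver": "version"},
        "ExpressionAttributeValues": {
            ":new_value": new_value,
            ":expected_version": {"N": str(expected_version)},
            ":next_version": {"N": str(expected_version + 1)},
        },
    }


if __name__ == "__main__":
    params = build_versioned_update(
        "Users", {"user_id": {"S": "u1"}}, "email", {"S": "a@example.com"}, 3
    )
    print(params["ConditionExpression"])
```

A caller that receives ConditionalCheckFailedException would normally re-read the item, reapply its change on top of the new version, and try again.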

Primary Keys

  • Partition Key (simple primary key)
    • Must be unique for each item
    • Type: String, Number, or Binary
    • e.g. user_id for a users table
  • Partition Key + Sort Key (composite primary key)
    • Data is grouped by partition key
    • We can have two items with the same partition key and different sort keys
      • The uniqueness is on partition + sort
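
The two key styles map onto the KeySchema of a table definition, where the partition key has KeyType HASH and the sort key has KeyType RANGE. The helpers below are a small sketch, and the attribute names user_id and game_id are hypothetical.

```python
def key_schema(partition_key, sort_key=None):
    """Build a KeySchema list: HASH = partition key, RANGE = sort key."""
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return schema


def primary_key_of(item, schema):
    """Return the tuple of key values that must be unique within the table."""
    return tuple(item[part["AttributeName"]] for part in schema)


if __name__ == "__main__":
    schema = key_schema("user_id", "game_id")
    a = {"user_id": "u1", "game_id": "g1", "score": 10}
    b = {"user_id": "u1", "game_id": "g2", "score": 7}
    # Same partition key, different sort key: two distinct items.
    print(primary_key_of(a, schema) != primary_key_of(b, schema))
```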

Capacity

  • Read/Write capacity
    • Provisioned mode (default)
      • Plan capacity beforehand
      • Option to set up auto-scaling
    • On-demand
      • Scales up / down automatically with the workload
      • Pay for what you use (more expensive per request)
      • No capacity planning is needed. No need to set up WCU / RCU
  • If you exceed the provisioned capacity, burst capacity is used first. Once that is also consumed, you get a ProvisionedThroughputExceededException
  • WCU (write capacity units)
    • One WCU is one write per second for an item of up to 1KB
      • 10 items/sec * 2KB = 20 WCU
      • 6 items/sec * 4.5KB = 6 * 5 = 30 WCU
      • 120 items/min * 2KB = 2 items/sec * 2 = 4 WCU
    • Item size is rounded up to the next 1KB in the calculations (4.5KB becomes 5KB)
  • RCU (read capacity units)
    • Strongly consistent read (SCR): 1 RCU = 1 SCR per second for an item of up to 4KB
      • 10 SCR/sec * 4KB = 10 RCU
      • 10 SCR/sec * 6KB = 10 * 8/4 = 20 RCU (6KB is rounded up to 8KB)
    • Eventually consistent read (ECR): 1 RCU = 2 ECR per second for an item of up to 4KB
      • 16 ECR/sec * 12KB = 16/2 * 12/4 = 24 RCU
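
The capacity arithmetic above can be checked with two small functions. This is a sketch of the rounding rules as stated in these notes (item size rounded up to 1KB for writes and to a multiple of 4KB for reads, transactional operations counted twice). The function and parameter names are made up for the example.

```python
import math


def wcu(writes_per_second, item_size_kb, transactional=False):
    """Write capacity units: item size is rounded up to the next 1KB."""
    units = writes_per_second * math.ceil(item_size_kb)
    if transactional:
        units *= 2  # transactional writes cost double
    return math.ceil(units)


def rcu(reads_per_second, item_size_kb, consistency="strong"):
    """Read capacity units: item size is rounded up to the next 4KB block."""
    units = reads_per_second * math.ceil(item_size_kb / 4)
    if consistency == "eventual":
        units /= 2  # eventually consistent reads cost half
    elif consistency == "transactional":
        units *= 2  # transactional reads cost double
    elif consistency != "strong":
        raise ValueError(f"unknown consistency: {consistency}")
    return math.ceil(units)


if __name__ == "__main__":
    print(wcu(10, 2))               # 20
    print(wcu(6, 4.5))              # 30
    print(wcu(120 / 60, 2))         # 4
    print(rcu(10, 6))               # 20
    print(rcu(16, 12, "eventual"))  # 24
```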

Read Types

  • Strongly consistent read
    • Set ConsistentRead to true on GetItem (also available on Query and Scan)
    • Higher latency and twice the RCU cost of an eventually consistent read
  • Eventually consistent read (default)
    • May return stale data because the replication has not happened yet

Index

  • Local Secondary Index (LSI)
    • Alternative sort key on the same partition key; up to 5 LSIs per table
    • Must be defined at table creation
  • Global Secondary Index (GSI)
    • Alternative primary key (partition key, optionally with a sort key)
    • It’s like a new table
    • Must provision RCU + WCU for the index
    • Can be added after table creation
    • If the writes are throttled on GSI, the main table will be throttled as well!

TTL

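  • Automatically deletes items after an expiry time stored in a TTL attribute (a Number holding a Unix epoch timestamp in seconds)
  • Items are typically deleted within 48 hours of expiration
    • They may still appear in queries and scans during this period, so filter them out if that matters
  • Expired items are also removed from the LSIs and GSIs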
  • Does not consume any WCU – no extra cost
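
Because expired items can linger until the background delete runs, a reader may want to filter them out itself. The sketch below shows how the TTL value is computed and how such a filter might look. The attribute name expires_at and both helper names are assumptions for the example.

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical name of the table's TTL attribute


def with_ttl(item, ttl_seconds, now=None):
    """Return a copy of the item with an epoch-seconds expiry time added."""
    now = int(time.time()) if now is None else int(now)
    return {**item, TTL_ATTRIBUTE: now + ttl_seconds}


def is_live(item, now=None):
    """True if the item has no TTL value or has not expired yet."""
    now = int(time.time()) if now is None else int(now)
    expires = item.get(TTL_ATTRIBUTE)
    return expires is None or expires > now


if __name__ == "__main__":
    session = with_ttl({"session_id": "abc"}, ttl_seconds=3600)
    print(session[TTL_ATTRIBUTE], is_live(session))
```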

DynamoDB Accelerator (DAX)

  • In-memory cache for DynamoDB. Caches the most popular reads and queries, with microsecond latency
  • Solves the hot key problem (too many reads on the same item)
  • TTL 5 minutes default
  • Up to 10 nodes in the cluster
  • Supports multiple nodes. Configure Subnets, VPC, AZ, SG, Encryption
  • We can add nodes after creation, but not change the node type.
  • Can be used together with ElastiCache
    • DAX caches individual objects and query results
    • ElastiCache stores aggregation results

Streams 

  • Ordered log of item modifications (create/update/delete)
  • Retention 24 hours
  • Records can be processed with Lambda or the Kinesis Client Library, or sent to Kinesis Data Streams
  • Used for analytics or for reacting to changes in real time
  • Stream view types
    • Keys only (KEYS_ONLY): only the key attributes of the modified item
    • New image (NEW_IMAGE): the item after the update
    • Old image (OLD_IMAGE): the item before the update
    • New and old images (NEW_AND_OLD_IMAGES): the item before and after the update
  • Only changes made after the stream is enabled are written to it; existing data is not back-filled
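
A Lambda function attached to a stream receives batches of records. The sketch below walks such a batch and summarises each change. The record fields used (eventName, dynamodb.Keys, OldImage, NewImage) follow the stream record format, and which images are present depends on the stream view type. The shape of the returned summary is an assumption for illustration.

```python
def handler(event, context=None):
    """Summarise each stream record as event name, keys, old and new image."""
    changes = []
    for record in event.get("Records", []):
        data = record["dynamodb"]
        changes.append({
            "event": record["eventName"],  # INSERT, MODIFY or REMOVE
            "keys": data["Keys"],
            "old": data.get("OldImage"),   # absent for INSERT or keys-only streams
            "new": data.get("NewImage"),   # absent for REMOVE or keys-only streams
        })
    return changes


if __name__ == "__main__":
    sample = {"Records": [{
        "eventName": "MODIFY",
        "dynamodb": {
            "Keys": {"user_id": {"S": "u1"}},
            "OldImage": {"user_id": {"S": "u1"}, "score": {"N": "1"}},
            "NewImage": {"user_id": {"S": "u1"}, "score": {"N": "2"}},
        },
    }]}
    print(handler(sample))
```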