AWS Certified Solutions Architect Exam Notes – S3

S3 – Simple Storage Service

  • Object-based storage (objects are addressed by key)
  • Cannot run an OS or a DB on S3. Can only store files.
  • S3 has a universal namespace. The name of the bucket has to be unique. A web address is created for the bucket using that name.
  • A successful upload returns HTTP 200.
  • Unlimited storage. 
  • Object size: 0 bytes – 5 TB
    • Smallest object: 0 bytes
    • Largest object: 5 TB
    • Largest object that can be uploaded in a single PUT: 5 GB
  • By default all newly created buckets are Private 


You can change a bucket's access settings using one of the following:

  • Bucket Policies
  • ACLs
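
As a rough illustration of the bucket policy option, the sketch below builds a policy document that allows anyone to read the objects in a bucket. The bucket name my-bucket and the function name are made up for the example, and account-level Block Public Access settings may still override such a policy.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Return a bucket policy (as JSON text) that lets anyone read objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The trailing /* targets the objects, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # "my-bucket" is a hypothetical bucket name.
    print(public_read_policy("my-bucket"))
```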

Where is the data stored?

  • Data is spread across multiple facilities.
  • Default: 3 AZs in 1 Region.
  • Exception: One Zone IA class. It uses only 1 AZ instead of 3.

Data Consistency

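  • Strong read-after-write consistency -> PUTs of new objects
  • Since December 2020, strong consistency also applies to overwrite PUTs (updates) and DELETEs. Before that, they were eventually consistent.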

S3 – Cross-Account Access
3 ways:

  • Bucket Policies & IAM for the entire bucket – Programmatic access
  • Bucket ACLs & IAM for individual objects – Programmatic access
  • Cross Account IAM Roles – Programmatic/Console access


S3 – Cross-Region Replication

  • Objects already in the bucket are not replicated automatically. Only new objects are.
  • Versioning must be enabled on both source and destination buckets.
  • Delete markers are not replicated by default.

S3 – Static website

Here are the prerequisites for routing traffic to a website that is hosted in an Amazon S3 Bucket:

  • An S3 bucket that is configured to host a static website. The bucket must have the same name as your domain or subdomain.
  • A registered domain name. You can use Route 53 as your domain registrar, or you can use a different registrar.
  • Route 53 as the DNS service for the domain. If you register your domain name by using Route 53, we automatically configure Route 53 as the DNS service for the domain.

 

S3 – URL styles

  • Virtual hosted style: https://my-bucket.s3.us-west-2.amazonaws.com/file.csv
  • Path style: https://s3.us-west-2.amazonaws.com/my-bucket/file.csv
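
The two URL styles differ only in where the bucket name goes. A small sketch that builds both forms (the function names are illustrative, not part of any AWS SDK):

```python
from urllib.parse import quote


def virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    """Bucket name is part of the host name."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


def path_style_url(bucket: str, region: str, key: str) -> str:
    """Bucket name is the first segment of the path."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key, safe='/')}"


if __name__ == "__main__":
    print(virtual_hosted_url("my-bucket", "us-west-2", "file.csv"))
    print(path_style_url("my-bucket", "us-west-2", "file.csv"))
```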

 

S3 – Encryption

The main types are: 

  • Encryption in Transit (SSL/TLS)
  • Encryption at Rest
    • Server-Side Encryption
    • Client-Side Encryption 

Server-Side Encryption

You request Amazon S3 to encrypt your object before saving it on disks in its data centers and decrypt it when you download the objects.

  • Amazon S3-Managed Keys (SSE-S3): AWS manages both data key and master key
  • AWS KMS-Managed Keys (SSE-KMS): AWS manages data key and you manage the master key
  • Customer-Provided Keys (SSE-C): You manage both the data key and master key
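
On the wire, the three server-side options are selected with request headers on the upload. The sketch below only builds those headers and sends nothing. The helper name sse_headers and the example key alias are assumptions made for illustration.

```python
import base64
import hashlib
from typing import Dict, Optional


def sse_headers(
    mode: str,
    kms_key_id: Optional[str] = None,
    customer_key: Optional[bytes] = None,
) -> Dict[str, str]:
    """Return the request headers that select a server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        # The key travels with every request; S3 does not store it.
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
                hashlib.md5(customer_key).digest()
            ).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(sse_headers("SSE-S3"))
    print(sse_headers("SSE-KMS", kms_key_id="alias/demo-key"))  # hypothetical alias
```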

Client-Side Encryption 
Encrypt data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools.

  • AWS KMS–Managed Customer Master Key (CMK)
  • Using a Client-Side Master Key

S3 – Versioning

  • Backup tool
  • Stores all versions of the object
  • Once enabled, cannot be disabled, only suspended
  • MFA Delete supported
  • Lifecycle Integration

S3 – Lifecycle

  • Add rules to move objects between storage classes. Can be applied to current or previous versions.
    • Example: Move to Glacier after 60 days.
  • Minimum days before the transition.
    • Objects must be stored for at least 30 days in S3 Standard before they can transition to S3 Standard-IA or S3 One Zone-IA.
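
A lifecycle rule is just a small configuration document. The sketch below builds one rule and checks the 30-day minimum mentioned above. The dictionary shape is assumed to match what the SDK call for putting a bucket lifecycle configuration expects under "Rules", and the rule ID is made up.

```python
from typing import Any, Dict

# Minimum age (days) before an object may move to these classes.
MIN_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30}


def lifecycle_rule(rule_id: str, days: int, storage_class: str, prefix: str = "") -> Dict[str, Any]:
    """Build one transition rule, rejecting transitions that are too early."""
    minimum = MIN_DAYS.get(storage_class, 0)
    if days < minimum:
        raise ValueError(f"{storage_class} requires at least {minimum} days")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


if __name__ == "__main__":
    # Example from the notes: move to Glacier after 60 days.
    print({"Rules": [lifecycle_rule("archive-after-60-days", 60, "GLACIER")]})
```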

S3 – Object Lock

  • Store objects using write once read many WORM model.
  • Do not allow modify/delete for a period of time or permanently.
  • Used in objects or in entire buckets.
  • Two types:
    • Governance. Only users with special permissions (for example, root) can modify or delete objects during the retention period.
    • Compliance. No user (not even root) can modify or delete objects during the retention period.

Legal Hold vs Retention Period

  • Both are applied to objects and prevent modification
  • The legal hold does not have a time expiration. You need to remove it.
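
The way the two modes and the legal hold interact can be written as a small decision function. This is a simplified model for study purposes, not an AWS API. The parameter names are made up, and can_bypass_governance stands for a user holding the permission to bypass governance retention.

```python
from datetime import datetime, timezone
from typing import Optional


def can_delete_version(
    mode: Optional[str],              # "GOVERNANCE", "COMPLIANCE" or None
    retain_until: Optional[datetime],
    legal_hold: bool,
    can_bypass_governance: bool,
    now: datetime,
) -> bool:
    """Return True if an object version may be deleted or overwritten."""
    if legal_hold:
        return False  # a legal hold never expires; it must be removed first
    if mode is None or retain_until is None or now >= retain_until:
        return True   # no retention, or the retention period is over
    if mode == "GOVERNANCE":
        return can_bypass_governance
    return False      # COMPLIANCE: nobody, not even root


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    later = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(can_delete_version("GOVERNANCE", later, False, True, now))
    print(can_delete_version("COMPLIANCE", later, False, True, now))
```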

S3 – Performance

You can improve the S3 Performance by using the following features:

  • S3 Prefixes
    • Folder paths.
    • You achieve better performance if you spread your requests across multiple prefixes.
    • The more prefixes, the better performance.
  • Multipart uploads (Upload)
    • Allow Pause/resume of upload.
    • Quick recovery from network issues.
    • Use it to begin an upload before knowing the final object size.
    • Improved throughput / Improve upload speed
    • Should be used on any file > 100MB
  • S3 Byte-Range Fetches (Download)
    • Improve download speed
  • S3 – Transfer Acceleration (Upload)
    • Improve uploads.
    • Use Edge Locations to accelerate uploads.
    • Upload to edge location instead of S3. The file then gets transferred through AWS backbone.
  • S3 – Select
    • Pulling data from S3 using SQL.
    • Drastic performance increases.
    • A simpler Athena
      • Athena is an analytical tool that allows you to perform many actions on your objects.
      • S3 Select is just a way to filter your objects.
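
Both multipart uploads and byte-range fetches rely on splitting an object into chunks that can be transferred in parallel. A minimal sketch of the splitting step, producing values for the HTTP Range header (the function name and chunk size are illustrative):

```python
from typing import List


def byte_ranges(total_size: int, chunk_size: int) -> List[str]:
    """Split an object of total_size bytes into HTTP Range header values."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    ranges = []
    for start in range(0, total_size, chunk_size):
        # Range ends are inclusive, hence the -1.
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


if __name__ == "__main__":
    # A 10-byte object fetched in 4-byte chunks.
    print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each range can then be requested on its own connection, and a failed chunk can be retried without restarting the whole transfer.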

S3 – Storage Classes

S3 Standard

  • Availability: 99.99%
  • Durability: 99.999999999% (11 nines)
  • Notes: Can sustain the concurrent loss of 2 facilities.
  • Price: Expensive.
  • Retrieval time: Rapid access.

S3 IA (Infrequent Access)

  • Availability: 99.9%
  • Durability: 99.999999999% (11 nines)
  • Notes: Use it for infrequently accessed files.
  • Price: Lower than Standard. There is a retrieval fee.
  • Retrieval time: Rapid access.

S3 One Zone IA (Replaced RRS)

  • Availability: 99.5%
  • Durability: 99.999999999% (11 nines), but within a single AZ
  • Notes: Use it when high resilience is not needed, such as for non-important or re-creatable data.
  • Price: Lower than Standard and IA.
  • Retrieval time: Rapid access.

S3 Intelligent Tiering

  • Availability: 99.9%
  • Durability: 99.999999999% (11 nines)
  • Notes: Uses machine learning.
  • Price: Optimizes costs by moving files between storage tiers based on their usage. Choose this over S3 Standard most of the time, unless you have millions of objects (there is a per-object monitoring fee).
  • Retrieval time: Rapid access.

S3 Glacier

  • Availability: 99.99%
  • Durability: 99.999999999% (11 nines)
  • Notes: Option for a Vault Lock policy: lock data for a specific time for regulatory reasons.
  • Price: Cheap.
  • Retrieval time: three flavors:
    • Expedited. Quick access for urgent requests. 1–5 minutes.
    • Standard. Typically completes within 3–5 hours. Default option.
    • Bulk. Lowest-cost retrieval option for large amounts, even petabytes. 5–12 hours.

S3 Glacier Deep Archive

  • Availability: 99.99%
  • Durability: 99.999999999% (11 nines)
  • Price: Cheapest.
  • Retrieval time: Within 12 hours.
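
To get a feel for the availability figures above, it helps to convert a percentage into the downtime it permits per year. A quick calculation, treating the figure as a design target over a 365-day year (that simplification is an assumption of this sketch):

```python
def max_downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at a given availability."""
    hours_per_year = 365 * 24
    return (1 - availability_percent / 100) * hours_per_year


if __name__ == "__main__":
    for pct in (99.99, 99.9, 99.5):
        print(f"{pct}% -> {max_downtime_hours_per_year(pct):.2f} hours/year")
```

This works out to roughly 53 minutes per year at 99.99%, under 9 hours at 99.9%, and about 44 hours at 99.5%.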

CloudFront

  • Content Delivery Network (CDN)
  • Distribution – Collection of Edge Locations
  • Potential origins:  S3, EC2, ELB, Route53
  • The user's request goes to the nearest Edge Location first, which checks whether it already has the file.
    • If not, the Edge Location gets the file from the Origin and caches it for the TTL.
  • You can write to Edge Locations
  • You can clear the cache but you will be charged!
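
The request flow above (check the edge, fall back to the origin, cache for the TTL) can be modelled in a few lines. This is a toy simulation for study purposes, not how CloudFront is implemented. The class and method names are made up.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one edge location with a fixed TTL."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (content, expiry)

    def get(self, path: str) -> Tuple[bytes, bool]:
        """Return (content, served_from_cache)."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], True          # cache hit
        content = self._fetch(path)        # cache miss: go to the origin
        self._store[path] = (content, now + self._ttl)
        return content, False

    def invalidate(self, path: str) -> None:
        """Clear one cached file before its TTL expires."""
        self._store.pop(path, None)


if __name__ == "__main__":
    cache = EdgeCache(lambda path: f"content of {path}".encode(), ttl_seconds=60)
    print(cache.get("/index.html"))  # first request goes to the origin
    print(cache.get("/index.html"))  # second request is served from the edge
```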

CloudFront Types

  • Web – cache content
  • RTMP – used for streaming data


Signed URLs vs Signed Cookies

  • The Signed URL points to a file (1 URL = 1 file)
    • Policy: 
      • URL expiration
      • IP ranges of customers
      • Trusted signers 
    • How does it work?
      • The file cannot be accessed directly from S3. CloudFront fetches it from S3, and users access it through CloudFront.
  • The Signed Cookie allows access to multiple files (1 cookie = multiple files)
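
The policy attached to a signed URL or signed cookie is a small JSON document. The sketch below builds a custom policy with an expiration time and an optional IP range. It does not perform the signing step, which needs the trusted signer's private key. The distribution domain is a placeholder.

```python
import json
from typing import Optional


def custom_policy(resource_url: str, expires_epoch: int, source_ip_cidr: Optional[str] = None) -> str:
    """Build a CloudFront custom policy statement as compact JSON."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_ip_cidr:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip_cidr}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    # The policy is signed without whitespace, so emit compact JSON.
    return json.dumps(policy, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical distribution domain and expiry timestamp.
    print(custom_policy("https://example.cloudfront.net/file.csv", 1700000000, "203.0.113.0/24"))
```

A wildcard in the resource (for example a path ending in /*) lets one policy cover several files, which is how a single cookie can grant access to multiple files.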


Lambda@Edge

  • Feature of CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency. 
  • No need to provision or manage infrastructure in multiple locations around the world.
  • Pay only for the compute time you consume 

Transfer on-premises data to AWS

Snowball

  • 50TB or 80TB
  • Big physical disks
  • Used for Data transfer

Snowball Edge 

  • 100TB
  • Compute capabilities also. Can run Lambda.
  • It is actually a mini AWS

Snowmobile

  • 100 PB 
  • Truck


S3 – DataSync

  • Synchronizes your data between the on-premises data center and AWS.
  • Use it to move your system from on premises to AWS.

Storage Gateway 

Storage Gateway supports basically two types: File Gateway and Volume Gateway.

File Gateway

  • Flat files
  • Stored on S3
  • Transfer data from premises to S3 


File gateway vs DataSync

  • Use AWS DataSync to migrate existing data to Amazon S3 (First time)
  • Use File Gateway to retain access to the migrated data and for ongoing updates from your on-premises file-based applications (Continuous)


Volume Gateway

Volume Gateway supports two types: Stored Volumes and Cached Volumes.

Stored Volumes  

  • Dataset stored on-site and async backed up to S3

Cached Volumes

  • Dataset stored on S3 and most frequently accessed data cached on site

Direct Connect 

  • Create a dedicated connection from your premises to AWS
  • 2 cages inside Direct Connect
    • AWS, connected to AWS Region
    • Customer, connected to customer premises

When to use it?
When you need high throughput and a stable, reliable connection.

Direct Connect vs Storage Gateway

  • Direct Connect sets up a connection between AWS and premises. 
  • Storage Gateway mainly connects premises with S3.

Direct Connect vs VPN

  • Direct Connect bypasses the internet, but a VPN does not: VPN traffic is encrypted but still travels over the public internet.