S3 – Simple Storage Service
- Object-based storage: each object is stored and retrieved by its key (key-value).
- Cannot run an OS or a DB on S3; it can only store files (objects).
- S3 has a universal namespace: bucket names must be globally unique, and each bucket gets a web address based on its name.
- A successful upload returns HTTP 200.
- Unlimited storage.
- Object size: 0 – 5 TB
- Smallest object: 0 bytes
- Largest object: 5 TB
- Largest object that can be uploaded in a single PUT: 5 GB (use multipart upload above that)
- By default, all newly created buckets are private.
You can change that using one of the following:
- Bucket Policies
- ACLs
Where is the data stored?
- Data is spread across multiple facilities.
- Default: at least 3 AZs in 1 Region.
- Exception: One Zone IA class. It uses only 1 AZ instead of 3.
Data Consistency
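- Historically: read-after-write consistency for PUTs of new objects; eventual consistency for overwrite PUTs and DELETEs.
- Since December 2020: strong read-after-write consistency for all PUTs (including overwrites) and DELETEs, at no extra cost.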
S3 – Cross-Account Access
3 ways:
- Bucket Policies & IAM for the entire bucket – Programmatic access
- Bucket ACLs & IAM for individual objects – Programmatic access
- Cross Account IAM Roles – Programmatic/Console access
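For the "Bucket Policies & IAM" approach, the bucket owner attaches a policy that names the other account as a principal. The sketch below builds such a policy as JSON; the bucket name, account ID and the helper `cross_account_read_policy` are hypothetical, and it assumes read-only access (`s3:GetObject` on objects, `s3:ListBucket` on the bucket) is what you want to grant.

```python
import json


def cross_account_read_policy(bucket: str, other_account_id: str) -> str:
    """Hypothetical helper: bucket policy granting another AWS account read access."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOtherAccountRead",
                "Effect": "Allow",
                # The root principal delegates to the other account; its admins
                # must still grant their own IAM users access via IAM policies.
                "Principal": {"AWS": f"arn:aws:iam::{other_account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # ListBucket applies to the bucket ARN, GetObject to the objects.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(cross_account_read_policy("my-bucket", "111122223333"))
```

The bucket policy alone is not enough: the IAM side in the other account must also allow the same actions, which is why the notes pair bucket policies with IAM.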
S3 – Cross-Region Replication
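- Objects that already exist in the source bucket are not replicated; only new objects are (existing objects can be replicated with S3 Batch Replication).
- Versioning must be enabled on both the source and destination buckets.
- Delete markers are not replicated by default.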
S3 – Static website
Here are the prerequisites for routing traffic to a website that is hosted in an Amazon S3 Bucket:
- An S3 bucket that is configured to host a static website. The bucket must have the same name as your domain or subdomain.
- A registered domain name. You can use Route 53 as your domain registrar, or you can use a different registrar.
- Route 53 as the DNS service for the domain. If you register your domain name with Route 53, Route 53 is automatically configured as the DNS service for the domain.
S3 – URL styles
- Virtual hosted style: https://my-bucket.s3.us-west-2.amazonaws.com/file.csv
- Path style: https://s3.us-west-2.amazonaws.com/my-bucket/file.csv
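The two styles differ only in where the bucket name goes: in the host name (virtual hosted) or as the first path segment (path style). A minimal sketch, assuming a Region-specific endpoint as in the examples above; the helper names are hypothetical.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    # Bucket name is part of the host name.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def path_style_url(bucket: str, region: str, key: str) -> str:
    # Bucket name is the first segment of the path.
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


if __name__ == "__main__":
    print(virtual_hosted_url("my-bucket", "us-west-2", "file.csv"))
    print(path_style_url("my-bucket", "us-west-2", "file.csv"))
```

`quote` percent-encodes characters such as spaces in the key while leaving `/` intact, so keys with "folder" paths still map to readable URLs.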
S3 – Encryption
The main types are:
- Encryption in Transit (SSL/TLS)
- Encryption at Rest
  - Server-Side Encryption
  - Client-Side Encryption
Server-Side Encryption
You ask Amazon S3 to encrypt your object before saving it on disks in its data centers, and to decrypt it when you download it.
- Amazon S3-Managed Keys (SSE-S3): AWS manages both the data key and the master key.
- AWS KMS-Managed Keys (SSE-KMS): AWS manages the data keys; you control the master key (a KMS key) and can audit its use in CloudTrail.
- Customer-Provided Keys (SSE-C): you provide and manage the key; S3 performs the encryption and decryption but does not store your key.
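With SSE-C, the key travels with every request as three HTTP headers: the algorithm, the base64-encoded key, and the base64-encoded MD5 digest of the key. The sketch below builds those headers with the standard library only. The helper name is hypothetical, and actually sending the request (over HTTPS, which SSE-C requires) is left out.

```python
import base64
import hashlib
import os
from typing import Dict


def sse_c_headers(key: bytes) -> Dict[str, str]:
    """Hypothetical helper: SSE-C request headers for a 256-bit customer key."""
    if len(key) != 32:
        raise ValueError("SSE-C with AES256 expects a 32-byte (256-bit) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The raw key, base64-encoded. S3 uses it for this request, then discards it.
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # Base64 of the 128-bit MD5 digest of the raw key, used as an integrity check.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


if __name__ == "__main__":
    customer_key = os.urandom(32)  # you must store this key yourself
    for name, value in sse_c_headers(customer_key).items():
        print(f"{name}: {value}")
```

The same headers (same key) must be sent again on every GET of the object; if the key is lost, the object cannot be decrypted.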
Client-Side Encryption
Encrypt data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools.
- AWS KMS-Managed Customer Master Key (CMK)
- Using a Client-Side Master Key
S3 – Versioning
- Backup tool
- Stores all versions of the object
- Once enabled, cannot be disabled, only suspended
- Supports MFA Delete (MFA is required to permanently delete a version or to change the versioning state)
- Integrates with lifecycle rules
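The points above can be modeled in a few lines: every PUT adds a version, a plain DELETE only adds a delete marker, and older versions stay readable by version ID. This is a hypothetical in-memory model for illustration, not the S3 API. Version IDs like `v1` are made up, and real S3 returns HTTP errors rather than raising `KeyError`.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: str
    data: Optional[bytes]  # None means this version is a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.data is None


class VersionedBucket:
    """Hypothetical in-memory model of a versioning-enabled bucket."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._ids = itertools.count(1)

    def _add(self, key: str, data: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append(Version(version_id, data))
        return version_id

    def put(self, key: str, data: bytes) -> str:
        # Every PUT creates a new version; older versions are kept.
        return self._add(key, data)

    def delete(self, key: str) -> str:
        # A DELETE without a version ID only stacks a delete marker on top.
        return self._add(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        versions = self._versions.get(key, [])
        if version_id is None:
            # The latest version wins; a delete marker makes the object look deleted.
            if not versions or versions[-1].is_delete_marker:
                raise KeyError(key)
            return versions[-1].data
        for v in versions:
            if v.version_id == version_id and not v.is_delete_marker:
                return v.data
        raise KeyError((key, version_id))

    def delete_version(self, key: str, version_id: str) -> None:
        # Deleting a specific version is permanent (what MFA Delete can protect).
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v.version_id != version_id
        ]
```

Removing the delete marker itself with `delete_version` makes the previous version current again, which is how an accidental delete is undone in a versioned bucket.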
S3 – Lifecycle
- Add rules to move objects between storage classes. Can be applied to current or previous versions.
- Example: Move to Glacier after 60 days.
- Minimum days before a transition:
  - Before transitioning objects from S3 Standard/S3 Standard-IA to S3 Standard-IA/S3 One Zone-IA, you must store them for at least 30 days in the S3 Standard storage class.
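A lifecycle rule is a small document: a filter (here a key prefix), a status, and a list of transitions with a day count and target storage class. The sketch below builds one in the shape used by the S3 lifecycle configuration API (`Rules`, `Filter`, `Status`, `Transitions`, `NoncurrentVersionTransitions`) and checks it against the 30-day minimum from the notes. The prefix, day counts and the helpers `lifecycle_rule` and `violations` are hypothetical. The check covers only that one rule, not every constraint S3 enforces.

```python
import json
from typing import Any, Dict, List

# Transitions into these classes require at least 30 days in S3 Standard first.
IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}


def lifecycle_rule(prefix: str, ia_days: int, glacier_days: int) -> Dict[str, Any]:
    """Hypothetical helper: Standard -> Standard-IA -> Glacier for one prefix."""
    return {
        "ID": f"archive-{prefix.rstrip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Applies to current versions.
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        # Applies to previous (noncurrent) versions.
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": glacier_days, "StorageClass": "GLACIER"},
        ],
    }


def violations(rule: Dict[str, Any]) -> List[str]:
    """Report transitions to an IA class that come earlier than 30 days."""
    problems = []
    for t in rule.get("Transitions", []):
        if t["StorageClass"] in IA_CLASSES and t["Days"] < 30:
            problems.append(
                f"{t['StorageClass']} after {t['Days']} days; minimum is 30"
            )
    return problems


if __name__ == "__main__":
    config = {"Rules": [lifecycle_rule("logs/", ia_days=30, glacier_days=60)]}
    print(json.dumps(config, indent=2))
    print(violations(config["Rules"][0]))
```

The 60-day Glacier transition mirrors the "Move to Glacier after 60 days" example above.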
S3 – Object Lock
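- Stores objects using a write once, read many (WORM) model.
- Prevents objects from being deleted or overwritten for a fixed period of time or indefinitely.
- Can be applied to individual objects or set as a default for an entire bucket (requires versioning).
- Two retention modes:
  - Governance: only users with special permissions (e.g. s3:BypassGovernanceRetention) can overwrite or delete an object version, or change its lock settings, during the retention period.
  - Compliance: no user, not even the root user, can overwrite or delete a protected object version during the retention period.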
Legal Hold vs Retention Period
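- Both are applied to object versions and prevent them from being overwritten or deleted.
- A retention period protects an object version for a fixed amount of time. A legal hold has no expiration: it stays in place until someone explicitly removes it.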
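The rules above combine into a single decision: a legal hold always blocks deletion, and an unexpired retention period blocks it unless the mode is Governance and the caller has the bypass permission. A hypothetical sketch of that decision follows. `LockState` and `can_delete` are illustrative names, not an AWS API. In real S3, the bypass also requires the request header `x-amz-bypass-governance-retention: true`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LockState:
    mode: Optional[str] = None             # "GOVERNANCE", "COMPLIANCE", or None
    retain_until: Optional[datetime] = None
    legal_hold: bool = False


def can_delete(state: LockState, now: datetime, bypass_governance: bool = False) -> bool:
    """Hypothetical check: may this object version be deleted right now?"""
    if state.legal_hold:
        return False  # no expiry; the hold must be removed first
    in_retention = state.retain_until is not None and now < state.retain_until
    if not in_retention:
        return True  # retention period over (or never set)
    if state.mode == "GOVERNANCE":
        # Allowed only for callers with s3:BypassGovernanceRetention.
        return bypass_governance
    return False  # COMPLIANCE: nobody, not even root


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    until = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(can_delete(LockState("GOVERNANCE", until), now, bypass_governance=True))
    print(can_delete(LockState("COMPLIANCE", until), now, bypass_governance=True))
```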
S3 – Performance
You can improve S3 performance with the following features:
- S3 Prefixes
  - A prefix is the folder path part of the object key.
  - S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.
  - Spread your requests across multiple prefixes for better performance: the more prefixes, the higher the aggregate request rate.
- Multipart uploads (upload)
  - Allow pausing and resuming an upload.
  - Quick recovery from network issues: only the failed part needs to be retried.
  - Let you begin an upload before you know the final object size.
  - Improved throughput: parts are uploaded in parallel.
  - Recommended for objects larger than 100 MB; required for objects larger than 5 GB.
- S3 Byte-Range Fetches (download)
  - Fetch byte ranges of an object in parallel to improve download speed, or fetch only the part of the object you need.
- S3 Transfer Acceleration (upload)
  - Uses CloudFront edge locations to accelerate uploads.
  - You upload to an edge location instead of directly to the bucket; the data then travels to S3 over the AWS backbone network.
- S3 Select
  - Retrieves a subset of an object's data using simple SQL expressions.
  - Can greatly improve performance, because S3 returns only the data you need instead of the whole object.
  - A simpler Athena:
    - Athena is an analytics service that runs full SQL queries across many objects.
    - S3 Select only filters the contents of a single object.
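Multipart uploads and byte-range fetches both split one object into byte ranges that can be transferred in parallel. The sketch below picks a part size that respects the multipart limits and turns the object into inclusive `(first, last)` ranges that can be used directly as HTTP `Range` headers. The limits are 5 MiB to 5 GiB per part (the last part may be smaller), at most 10,000 parts, and 5 TiB per object. The 8 MiB preferred part size and the helper names are assumptions for illustration.

```python
import math
from typing import List, Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB           # every part except the last must be >= 5 MiB
MAX_PART = 5 * GIB
MAX_PARTS = 10_000
MAX_OBJECT = 5 * 1024 ** 4   # 5 TiB


def choose_part_size(object_size: int, preferred: int = 8 * MIB) -> int:
    """Hypothetical helper: smallest allowed part size >= preferred that fits 10,000 parts."""
    if object_size > MAX_OBJECT:
        raise ValueError("object larger than 5 TiB")
    needed = math.ceil(object_size / MAX_PARTS)  # grow parts so they fit in 10,000
    return min(max(MIN_PART, preferred, needed), MAX_PART)


def part_ranges(object_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, object_size) into inclusive (first_byte, last_byte) pairs."""
    return [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]


def range_header(first: int, last: int) -> str:
    # Value for the HTTP Range header used by byte-range fetches; both ends inclusive.
    return f"bytes={first}-{last}"


if __name__ == "__main__":
    size = 1 * GIB
    part = choose_part_size(size)
    ranges = part_ranges(size, part)
    print(part, len(ranges), range_header(*ranges[0]))
```

For a 1 GiB object this gives 128 parts of 8 MiB. For a 5 TiB object the part size grows to about 524 MiB, so the upload still fits in 10,000 parts.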
S3 – Storage Classes
Storage class | Availability | Durability | Notes | Price | Retrieval time
S3 Standard | 99.99% | 99.999999999% (11 nines) | Sustains the concurrent loss of 2 facilities. | Most expensive | Rapid access
S3 Standard-IA (Infrequent Access) | 99.9% | 99.999999999% | For infrequently accessed files that still need rapid access. | Lower than Standard; there is a retrieval fee. | Rapid access
S3 One Zone-IA (replaced RRS) | 99.5% | 99.999999999% (within a single AZ) | Use it when you do not need multi-AZ resilience, e.g. non-critical or re-creatable data. | Lower than Standard and Standard-IA | Rapid access
S3 Intelligent-Tiering | 99.9% | 99.999999999% | Monitors access patterns and automatically moves objects between access tiers. | Optimizes costs by moving objects to the right tier based on usage. Choose it over S3 Standard most of the time, unless you have millions of objects (there is a per-object monitoring and automation fee). | Rapid access
S3 Glacier | 99.99% | 99.999999999% | Vault Lock policy option: lock data for a specific time for regulatory reasons. | Cheap | Three options: Expedited, 1–5 minutes (urgent requests); Standard, 3–5 hours (default); Bulk, 5–12 hours (lowest cost, for large amounts of data, even petabytes)
S3 Glacier Deep Archive | 99.99% | 99.999999999% | | Cheapest | Within 12 hours
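What 11 nines of durability means in practice: if durability is read as the yearly probability that a given object survives, then the expected number of objects lost per year is the object count times (1 − durability). AWS's own illustration is 10,000,000 objects losing one object every 10,000 years on average. The sketch below uses exact fractions to avoid floating-point error. The helper name is hypothetical.

```python
from fractions import Fraction

# 99.999999999% expressed exactly as a fraction (11 nines).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects: int, durability: Fraction = DURABILITY) -> Fraction:
    """Expected objects lost per year, treating durability as yearly per-object survival."""
    return num_objects * (1 - durability)


if __name__ == "__main__":
    losses = expected_annual_losses(10_000_000)
    print(losses, "objects per year; one loss every", 1 / losses, "years")
```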
CloudFront
- Content Delivery Network (CDN)
- Distribution – Collection of Edge Locations
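- Potential origins: S3 bucket, EC2 instance, ELB, or any custom HTTP server.
- A user's request goes to the nearest edge location first: does it have this file cached?
- If not, the edge location fetches the file from the origin and caches it for the TTL (time to live).
- You can also write to edge locations, not just read from them (CloudFront forwards requests such as PUT and POST to the origin).
- You can clear the cache before the TTL expires (an invalidation), but beyond a free monthly allowance you will be charged!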
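The request flow above (hit, miss, expiry after the TTL, invalidation) can be sketched as a tiny TTL cache for one edge location. This is a hypothetical toy model, not how CloudFront is implemented. The origin is any function that returns the file body, and the clock is injectable so the behavior can be tested.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one edge location: serve from cache, else fetch from origin for TTL seconds."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: the edge already has the file
        body = self._fetch(path)  # miss or expired: go to the origin
        self._entries[path] = (body, now + self._ttl)
        return body

    def invalidate(self, path: str) -> None:
        # Like an invalidation: drop the cached copy before its TTL expires.
        self._entries.pop(path, None)
```

Because the TTL is checked on every read, stale content is served until either the TTL runs out or the path is invalidated. That is why invalidations exist, and why they cost money beyond the free allowance.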
CloudFront Types
- Web – cache content
- RTMP – used for streaming media (RTMP distributions were discontinued on December 31, 2020)
Signed URLs vs Signed Cookies
- A signed URL grants access to one file (1 URL = 1 file).
  - Policy:
    - URL expiration
    - IP ranges of customers
    - Trusted signers
  - How does it work?
    - The file cannot be accessed directly from S3. The bucket allows access only from CloudFront (via an origin access identity or origin access control), so users reach the file through CloudFront.
- A signed cookie grants access to multiple files (1 cookie = multiple files).
  - Policy: the same options as signed URLs.
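A signed URL with a custom policy carries three query parameters: `Policy` (the JSON policy), `Signature` (the policy signed with your private key), and `Key-Pair-Id`. Both values use a CloudFront-specific base64 variant in which `+`, `=` and `/` are replaced by `-`, `_` and `~`. The sketch below builds the policy and the URL. The RSA signing step is deliberately left as a hypothetical `sign` callback, because it needs a private key and a crypto library. The domain, key pair ID and helper names are also hypothetical.

```python
import base64
import json
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlencode


def custom_policy(resource: str, expires_epoch: int, source_ip: Optional[str] = None) -> str:
    """Policy JSON: URL expiration plus an optional IP range restriction."""
    condition: Dict[str, Any] = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_ip:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip}
    policy = {"Statement": [{"Resource": resource, "Condition": condition}]}
    return json.dumps(policy, separators=(",", ":"))  # compact, no whitespace


def cloudfront_b64(data: bytes) -> str:
    # Standard base64, then swap characters that are unsafe in a query string.
    return (
        base64.b64encode(data).decode("ascii")
        .replace("+", "-").replace("=", "_").replace("/", "~")
    )


def signed_url(
    resource: str,
    expires_epoch: int,
    key_pair_id: str,
    sign: Callable[[bytes], bytes],  # hypothetical: RSA-sign with your private key
    source_ip: Optional[str] = None,
) -> str:
    policy = custom_policy(resource, expires_epoch, source_ip).encode("utf-8")
    params = {
        "Policy": cloudfront_b64(policy),
        "Signature": cloudfront_b64(sign(policy)),
        "Key-Pair-Id": key_pair_id,
    }
    separator = "&" if "?" in resource else "?"
    return resource + separator + urlencode(params)
```

Signed cookies carry the same three values as the `CloudFront-Policy`, `CloudFront-Signature` and `CloudFront-Key-Pair-Id` cookies. The only difference is that the policy's `Resource` can use a wildcard to cover many files.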
Lambda@Edge
- Feature of CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency.
- No need to provision or manage infrastructure in multiple locations around the world.
- Pay only for the compute time you consume
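A Lambda@Edge function is an ordinary Lambda handler that receives the CloudFront request (under `Records[0].cf.request`) and returns it, possibly modified, so CloudFront can continue processing. A minimal sketch of a viewer-request or origin-request function, assuming the Python runtime. The rewrite rule (append `index.html` to directory-style paths) is a hypothetical example.

```python
def lambda_handler(event, context):
    """Hypothetical Lambda@Edge handler: rewrite directory paths to an index document."""
    # CloudFront delivers the request object under Records[0].cf.request.
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    if uri.endswith("/"):
        request["uri"] = uri + "index.html"
    # Returning the request lets CloudFront continue to the cache/origin.
    return request
```

The logic runs at the edge location that handles the request, so no servers need to be deployed per Region.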
Transfer on-premises data to AWS
Snowball
- 50 TB or 80 TB
- A physical storage appliance shipped to you
- Used for data transfer
Snowball Edge
- 100 TB
- Also has compute capabilities; can run Lambda functions.
- It is essentially a mini AWS.
Snowmobile
- 100 PB
- A shipping container on a truck
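Choosing between a Snow device and the network mostly comes down to arithmetic: how many days would the same data take over your link? A minimal sketch, assuming decimal terabytes and a sustained link utilization that you pick yourself (80% by default here, an arbitrary assumption). The helper name is hypothetical.

```python
def transfer_days(terabytes: float, gbps: float, utilization: float = 0.8) -> float:
    """Days needed to send `terabytes` (decimal TB) over a `gbps` link at the given utilization."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbps * 1e9 * utilization)
    return seconds / 86_400


if __name__ == "__main__":
    # One Snowball-sized batch over a 1 Gbps link, at full and at 80% utilization.
    print(round(transfer_days(80, 1, 1.0), 1), round(transfer_days(80, 1), 1))
```

Even at 100% utilization, 100 TB over 1 Gbps takes a little over 9 days, which is why shipping a device is often faster for tens of terabytes and beyond.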
AWS DataSync
- Synchronizes data between an on-premises data center and AWS storage services such as S3.
- Use it to migrate data from on-premises to AWS.
Storage Gateway
The Storage Gateway types covered here are File Gateway and Volume Gateway (there is also Tape Gateway).
File Gateway
- Flat files
- Stored on S3
- Transfer data from premises to S3
File gateway vs DataSync
- Use AWS DataSync to migrate existing data to Amazon S3 (First time)
- Use File Gateway to retain access to the migrated data and for ongoing updates from your on-premises file-based applications (Continuous)
Volume Gateway
Volume Gateway supports two types: Stored Volumes and Cached Volumes.
Stored Volumes
- The entire dataset is stored on-site and asynchronously backed up to S3.
Cached Volumes
- The entire dataset is stored on S3, and the most frequently accessed data is cached on-site.
Direct Connect
- Create a dedicated connection from your premises to AWS
- A Direct Connect location has 2 cages:
  - AWS cage, connected to the AWS Region
  - Customer cage, connected to the customer premises
When to use it?
When you need high throughput and a stable, reliable connection.
Direct Connect vs Storage Gateway
- Direct Connect sets up a connection between AWS and premises.
- Storage Gateway mainly connects premises with S3.
Direct Connect VS VPN
- Direct Connect uses a dedicated private connection that bypasses the public internet; a VPN sends encrypted traffic over the public internet.