Amazon S3 Blob Storage Notes
Purpose
Amazon Simple Storage Service (Amazon S3) is an object storage service for storing and protecting any amount of data. It supports use cases like data lakes, websites, mobile apps, backups, archiving, enterprise applications, IoT devices, and big data analytics. It also provides management features for optimizing, organizing, and configuring access to data to meet business, organizational, and compliance requirements.
Features
- Storage Classes:
  - S3 Standard: Frequent access, mission-critical data.
  - S3 Express One Zone: High-performance, single-zone storage (consistent single-digit millisecond latency, up to 10x faster than S3 Standard, 50% lower request costs; uses directory buckets).
  - S3 Standard-IA / One Zone-IA: Infrequent access at a lower storage price, with a per-GB retrieval charge.
  - S3 Glacier (Instant Retrieval, Flexible Retrieval, Deep Archive): Archival storage; retrieval ranges from milliseconds (Instant Retrieval) to minutes or hours (Flexible Retrieval) to hours (Deep Archive, the lowest-cost class).
  - S3 Intelligent-Tiering: Auto-moves data between tiers (frequent/infrequent access, archive) based on access patterns.
- Storage Management:
  - Lifecycle policies: Transition objects to other classes or expire them.
  - Object Lock: WORM (Write Once, Read Many) to prevent deletions/overwrites.
  - Replication: Copy objects/metadata across buckets/Regions.
  - Batch Operations: Manage billions of objects (e.g., copy, invoke Lambda, restore).
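- Access Management & Security:
  - Buckets and objects are private by default; S3 Block Public Access is enabled by default on new buckets and can be set at the bucket or account level.
  - IAM and bucket policies: IAM handles authentication/authorization for identities; bucket policies are JSON-based, resource-level permissions.
  - Access points: Named endpoints with their own access policies; can be restricted to a VPC.
  - ACLs: Grant read/write permissions on individual buckets and objects (legacy; prefer policies).
  - Object Ownership: The "bucket owner enforced" setting (the default) disables ACLs and makes the bucket owner the owner of every object.
  - IAM Access Analyzer for S3: Evaluates bucket policies to flag access granted to external entities.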
- Data Processing:
  - S3 Object Lambda: Run your own code to modify data returned by GET, HEAD, and LIST requests (e.g., filter rows, resize images).
  - Event notifications: Trigger SNS, SQS, or Lambda when objects change.
- Logging & Monitoring:
  - CloudWatch: Metrics for health/billing.
  - CloudTrail: API tracking.
  - Server access logging: Request records.
  - Trusted Advisor: Optimization checks.
- Analytics & Insights:
  - Storage Lens: Metrics/dashboards across accounts/Regions/buckets.
  - Storage Class Analysis: Access patterns for class optimization.
  - S3 Inventory: Reports on objects, metadata, replication, encryption.
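- Consistency: Strong read-after-write consistency for object PUT and DELETE requests; reads through S3 Select, ACLs, object tags, and object metadata are also strongly consistent. Updates to a single key are atomic. Bucket configurations are eventually consistent.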
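The lifecycle policies listed under Storage Management can be sketched as follows. This is a minimal illustration, not a recommended policy: the helper names, the rule ID format, and the choice of STANDARD_IA then GLACIER are assumptions, and `client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`).

```python
def build_lifecycle_rule(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return one lifecycle rule: transition to Standard-IA, then Glacier, then expire.

    Hypothetical helper. The dict layout follows the Rules entries accepted by
    boto3's put_bucket_lifecycle_configuration.
    """
    if not (ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # rule applies only to keys under this prefix
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


def apply_lifecycle(client, bucket, rules):
    """Replace the bucket's lifecycle configuration with `rules`.

    `client` is assumed to be a boto3 S3 client.
    """
    return client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```

A call such as `apply_lifecycle(client, "example-bucket", [build_lifecycle_rule("logs/", 30, 90, 365)])` would tier and then expire objects under `logs/`; the bucket name is a placeholder. Note that this call replaces any existing lifecycle configuration on the bucket.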
Storage Classes
- General Purpose (Standard): Multi-AZ, high availability.
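- Intelligent-Tiering: Automatically moves objects among three access tiers (Frequent, Infrequent, Archive Instant Access), with two optional archive tiers (Archive Access, Deep Archive Access).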
- Infrequent Access: Standard-IA (multi-AZ), One Zone-IA (single-AZ).
- Archival: Glacier variants for cost-effective long-term storage.
- Express One Zone: Latency-sensitive apps, single-AZ.
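The storage class is chosen per object at upload time. The sketch below shows one way to do that; the access-pattern labels and the mapping are illustrative assumptions rather than AWS guidance, and `client` is assumed to be a boto3 S3 client. Express One Zone is left out because it requires a directory bucket.

```python
# Hypothetical mapping from a coarse access pattern to a storage class.
# The values are StorageClass strings accepted by PutObject; the keys are
# labels invented for this example.
STORAGE_CLASS_FOR = {
    "frequent": "STANDARD",
    "unknown": "INTELLIGENT_TIERING",
    "infrequent": "STANDARD_IA",
    "infrequent_recreatable": "ONEZONE_IA",
    "archive_instant": "GLACIER_IR",
    "archive": "GLACIER",
    "archive_cold": "DEEP_ARCHIVE",
}


def upload_with_class(client, bucket, key, body, access_pattern):
    """Upload `body` under `key` using the class mapped to `access_pattern`.

    Returns the storage class that was requested. Raises KeyError for an
    access pattern that is not in the mapping.
    """
    storage_class = STORAGE_CLASS_FOR[access_pattern]
    client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        StorageClass=storage_class,
    )
    return storage_class
```

If `StorageClass` is omitted, general purpose buckets store the object in S3 Standard.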
Security
- Default privacy; explicit access grants required.
- Block Public Access: Prevents public exposure.
- IAM/Bucket Policies: Control based on requester, actions, conditions (e.g., IP).
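- Object Lock/Replication: Object Lock enforces WORM retention for compliance; replication keeps copies in other buckets or Regions.
- PCI DSS: S3 is in scope for AWS's PCI DSS Level 1 Service Provider validation, so it can be used to store cardholder data.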
- Access Analyzer: Detects unintended external access.
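A bucket policy with an IP condition, as mentioned above, might look like the following sketch. The function names, the statement ID, and the choice of an explicit Deny with `NotIpAddress` are assumptions made for illustration; `client` is assumed to be a boto3 S3 client.

```python
import json


def ip_restricted_read_policy(bucket, allowed_cidr):
    """Build a policy that denies s3:GetObject from outside `allowed_cidr`.

    Hypothetical helper. An explicit Deny overrides any Allow, so reads from
    other source addresses are rejected even if another policy grants them.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadsOutsideAllowedRange",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidr}},
            }
        ],
    }


def apply_policy(client, bucket, policy):
    """Attach `policy` to the bucket; the API expects the policy as a JSON string."""
    client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

For example, `apply_policy(client, "example-bucket", ip_restricted_read_policy("example-bucket", "203.0.113.0/24"))` uses a documentation-only address range and a placeholder bucket name. This policy only denies; callers still need an Allow from IAM or another statement to read objects.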
Scalability
- Unlimited objects per bucket; no provisioning needed.
- Directory Buckets: 100 per account, horizontal scaling, no prefix limits.
- Table Buckets: 10 per account/Region, up to 10K tables/bucket (for Iceberg tabular data).
- Vector Buckets: For ML vector storage/querying with similarity search APIs.
- Batch Operations: Scale to billions of objects.
- High request rates (e.g., 100K+/sec with Express One Zone).
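Because a bucket can hold an unbounded number of objects while a single list request returns at most 1,000 keys, listings have to be paginated. A minimal sketch, assuming `client` is a boto3 S3 client; the function name is hypothetical.

```python
def count_objects(client, bucket, prefix=""):
    """Return (object_count, total_bytes) for keys under `prefix`.

    Uses the list_objects_v2 paginator so that every page of results is
    visited, not just the first 1,000 keys.
    """
    paginator = client.get_paginator("list_objects_v2")
    total_objects = 0
    total_bytes = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no keys match.
        for obj in page.get("Contents", []):
            total_objects += 1
            total_bytes += obj["Size"]
    return total_objects, total_bytes
```

For buckets with very large object counts, walking every page is slow and incurs request charges; the S3 Inventory reports listed under Analytics & Insights are usually a better fit there.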
Durability
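- Designed for 99.999999999% (11 nines) durability by storing data redundantly across multiple devices and AZs.
- A PUT returns success only after the data is safely stored; objects are stored redundantly across multiple data centers (except in single-zone classes).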
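To get a feel for what 11 nines means, the figure can be turned into an expected loss rate. This is a simplified model that treats the durability figure as an independent per-object, per-year survival probability; the function names are hypothetical.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(object_count):
    """Expected number of objects lost per year under the simplified model."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(object_count)
```

With 10,000,000 stored objects, `expected_annual_loss` gives 1/10,000 of an object per year, so `years_per_lost_object` returns 10,000: on average one object lost every 10,000 years.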
Availability
- S3 Standard is designed for 99.99% availability; data is replicated across AZs.
- Express One Zone: Single AZ with device redundancy.
- Directory buckets can be created in AWS Local Zones to meet data residency requirements.
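An availability percentage is easier to reason about as a downtime budget. The sketch below does that conversion; it illustrates the design target only and says nothing about SLA terms, and the function name is hypothetical.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def max_downtime_minutes(availability_percent):
    """Minutes per year a service may be unavailable at the given availability.

    Pass the percentage as a string (e.g. "99.99") so it is parsed exactly.
    """
    unavailable_fraction = 1 - Fraction(availability_percent) / 100
    return float(unavailable_fraction * MINUTES_PER_YEAR)
```

For S3 Standard's 99.99% figure, `max_downtime_minutes("99.99")` works out to about 52.56 minutes per year.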
Other Important Developer Notes
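- How It Works: Data is stored as objects (a file plus its metadata) in buckets. General purpose bucket names are globally unique, and each bucket lives in a single Region. A key uniquely identifies an object within a bucket and can look like a path. Bucket types: General purpose (multi-AZ), directory (low-latency/single-zone), table (tabular/Iceberg), vector (ML embeddings).
- Versioning: Keeps multiple versions of an object in the same bucket, each with a unique version ID, so overwritten or deleted objects can be recovered.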
- Access Methods: Console (UI), CLI (commands), SDKs (programmatic, e.g., Python boto3), REST API (HTTP).
- Data Consistency: Strong for single-key operations; eventual for bucket-level configuration (e.g., a deleted bucket may still appear in listings briefly).
- Pricing: Pay-per-use (storage, requests, transfer); free tier for new users.
- Related Services: EC2 (compute), EMR (processing), Snow Family (offline), Transfer Family (SFTP/FTPS/FTP).
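The versioning note above can be exercised through the SDK roughly as follows. This is a sketch under assumptions: `client` is a boto3 S3 client, the helper names are hypothetical, and only the first page of versions is read.

```python
def enable_versioning(client, bucket):
    """Turn on versioning so overwrites and deletes keep earlier versions."""
    client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def list_version_ids(client, bucket, key):
    """Return the version IDs stored for exactly `key` (first page only)."""
    response = client.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can return other keys that start with `key`, so filter.
    return [
        version["VersionId"]
        for version in response.get("Versions", [])
        if version["Key"] == key
    ]


def read_version(client, bucket, key, version_id):
    """Fetch the bytes of one specific version of an object."""
    response = client.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    return response["Body"].read()
```

Recovering an overwritten object then amounts to picking an earlier ID from `list_version_ids` and passing it to `read_version`.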