S3
Amazon S3 is a managed cloud storage solution that enables you to store data as objects in a bucket.
Amazon S3 is object-level storage.
Objects can be almost any data file, such as documents, images or videos.
Buckets are logical containers for objects.
When you add objects to a bucket, you must give them a unique name, which is called an object key.
Amazon S3 is designed to scale seamlessly and provide over 11 9s (99.999999999 percent) of durability.
Amazon S3 is designed to store, process, and retrieve unstructured data in virtually unlimited amounts over the web, from anywhere at any time.
The S3 namespace is global, unlike EC2 instances, which are Region-specific (although each bucket is created in a Region that you choose).
The data that you store in Amazon S3 isn't associated with any particular server, and you don't need to manage any infrastructure yourself.
Through its versioning feature, S3 can also act as a version control system.
This is also used for static website hosting.
You can virtually store as many objects as you want.
The data is stored redundantly.
By default, none of your data is shared publicly.
You can also encrypt your data in transit and choose to enable server-side encryption on your objects.
Amazon S3 also provides low-latency access to the data over the internet by HTTP or HTTPS, so you can retrieve data anytime from anywhere.
Bucket names are universal and must be unique across all existing bucket names in Amazon S3.
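To make the bucket, object, and key ideas concrete, here is a minimal sketch using the boto3 Python SDK; the bucket name, Region, and file names are placeholders, and the bucket name must be replaced with a globally unique one:

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Create a bucket (in us-east-1, no location constraint is needed).
s3.create_bucket(Bucket="my-example-bucket-12345")  # placeholder name

# Upload a local file; the object key "docs/report.txt" is its unique name
# within the bucket.
s3.upload_file("report.txt", "my-example-bucket-12345", "docs/report.txt")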
Some of the benefits of AWS S3 are:
Durability: S3 provides 99.999999999 percent durability.
Low cost: S3 lets you store data in a range of “storage classes.” These classes are based on the frequency and immediacy you require in accessing files.
Scalability: S3 charges you only for what resources you actually use, and there are no hidden fees or overage charges. You can scale your storage resources to easily meet your organization’s ever-changing demands.
Availability: S3 offers 99.99 percent availability of objects. Amazon S3 automatically stores multiple copies of your data, making loss extremely unlikely.
Security: High security involves encryption features and access management tools that prevent unauthorized access.
Flexibility: S3 is ideal for a wide range of uses like data storage, data backup, software delivery, data archiving, disaster recovery, website hosting, mobile applications, IoT devices, and much more.
Simple data transfer: You don’t have to be an IT genius to execute data transfers on S3. The service revolves around simplicity and ease of use.
Amazon S3 has the following components:
Bucket
Object
The bucket is the top-level element of S3 and can be thought of as a container for objects, where an object is the basic storage unit.
In other words, a bucket is a logical container for the data stored in Amazon S3.
It does not support nesting, meaning a bucket can’t contain another.
Users can choose the location of their choice for the creation of a bucket.
Bucket names must be globally unique, between 3 and 63 characters long, and may contain only lowercase letters, numbers, hyphens, and periods; no uppercase or special characters are allowed, and a name can begin with a number.
Users can create policies and permissions which restrict access to buckets.
The bucket can be deleted only when it contains no objects, meaning it should be empty.
An object is composed of data and any metadata that describes that file. The object has the following components:
Key: It is the unique identifier of each object in a bucket; the key is the object's name. Keys can include slashes to suggest a hierarchy of directories (for example, logs/2024/app.log). Using the key, it is possible to retrieve an object from the bucket.
Value: It is the actual content that needs to be stored. It is made up of a sequence of bytes.
Metadata: It is the data about data that are being stored. It denotes a set of name-value pairs that stores data about an object.
Version ID: It is the system-generated string that uniquely identifies a specific version of an object.
Access control list: It controls access to objects or files stored in S3 by granting permissions.
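The components above map directly onto the S3 API. Below is a hedged boto3 sketch (bucket, key, and metadata values are all placeholders) that stores a value under a key with user-defined metadata, then reads the metadata back:

import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-example-bucket-12345",            # placeholder bucket
    Key="reports/2024/q1.csv",                   # the object key
    Body=b"date,revenue\n2024-01-01,100",        # the value: a sequence of bytes
    Metadata={"department": "finance"},          # name-value pairs about the object
)

# head_object returns the metadata (and a VersionId when versioning is enabled).
resp = s3.head_object(Bucket="my-example-bucket-12345", Key="reports/2024/q1.csv")
print(resp["Metadata"])  # {'department': 'finance'}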
Amazon S3 offers a range of object-level storage classes that are designed for different use cases. These classes include:
Amazon S3 Standard
Amazon S3 Intelligent-Tiering
Amazon S3 Standard-Infrequent Access (Amazon S3 Standard-IA)
Amazon S3 One Zone-Infrequent Access (Amazon S3 One Zone-IA)
Amazon S3 Glacier
Instant Retrieval
Flexible Retrieval
Amazon S3 Glacier Deep Archive
1. Amazon S3 Standard:
It is the default and the most expensive storage class.
It supports many use cases, including cloud applications, dynamic websites, content distribution, mobile and gaming applications, and big data analytics.
This is the best option when frequent data access is necessary.
2. Amazon S3 Standard-Infrequent Access (Amazon S3 Standard-IA):
Amazon S3 Standard-IA is good for long-term storage and backups, and as a data store for disaster recovery (DR) files.
It has a lower storage price and is used for data that is rarely accessed.
3. Amazon S3 Intelligent-Tiering:
It delivers automatic cost savings by moving data to the most cost-effective access tier, without having an impact on performance.
In other words, it moves data between the Infrequent Access Tier and Frequent Access Tier.
It moves objects that haven't been accessed for 30 consecutive days to the Infrequent Access tier.
If an object in the Infrequent Access tier is accessed, it’s automatically moved back to the Frequent Access tier.
4. Amazon S3 One Zone-Infrequent Access (Amazon S3 One Zone-IA):
It is good for data that is used infrequently but requires rapid access.
Unlike other Amazon S3 storage classes, which store data in a minimum of three Availability Zones, Amazon S3 One Zone-IA stores data in a single Availability Zone.
It is helpful in secondary backup storage.
5. Amazon S3 Glacier:
This service stores data as archives where data access is infrequent.
It provides low-cost, long-term archive storage.
Also, it uses server-side encryption to encrypt all data.
6. Amazon S3 Glacier Deep Archive:
It is the lowest-cost storage class and provides long-term data retention where data access is infrequent.
It's designed for customers, particularly those in highly regulated industries (such as financial services, healthcare, and the public sector), that retain datasets for 7-10 years or more to meet regulatory compliance requirements.
Also, it has a minimum storage duration of 180 days.
It is designed for 99.99 percent availability over a given year, with a 99.9 percent availability SLA.
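A storage class is chosen per object at upload time. The following is a minimal boto3 sketch, with placeholder bucket and key names; valid StorageClass values include STANDARD, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER_IR, GLACIER, and DEEP_ARCHIVE:

import boto3

s3 = boto3.client("s3")

# Frequently accessed data: S3 Standard is the default, so StorageClass is optional.
s3.put_object(Bucket="my-example-bucket-12345", Key="hot/data.json", Body=b"{}")

# Rarely accessed, long-term archive: Glacier Deep Archive.
with open("2015-records.zip", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket-12345",
        Key="archive/2015-records.zip",
        Body=f,
        StorageClass="DEEP_ARCHIVE",
    )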
Some of the features that Amazon S3 provides are:
Versioning:
Amazon S3 provides a versioning feature to protect objects from accidental overwrites and deletes.
Versioning enables you to recover from both unintended user actions and application failures.
Versioning is enabled at the bucket level.
Each object in a bucket has a version ID; when versioning is disabled, its value is set to null. When versioning is enabled, Amazon S3 keeps the existing object and assigns a new, unique version ID each time a new version is uploaded (version IDs are unique strings, not incrementing counters).
Versioning is disabled by default while creating an Amazon S3 bucket.
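Here is a minimal boto3 sketch of versioning in action (placeholder bucket and key names): enable versioning, upload the same key twice, and list the versions that result:

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket-12345"  # placeholder

s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"v1")
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"v2")  # new version; v1 is kept

for v in s3.list_object_versions(Bucket=bucket, Prefix="notes.txt")["Versions"]:
    print(v["VersionId"], v["IsLatest"])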
Object lifecycle management:
This feature allows you to manage your objects so that they are stored cost-effectively throughout their lifecycle.
By creating a lifecycle configuration, you can define rules that specify actions that Amazon S3 automatically applies to a group of objects.
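For example, a lifecycle configuration might transition objects under a logs/ prefix to cheaper classes as they age and expire them after a year. This boto3 sketch is illustrative; the bucket name, prefix, and day counts are assumptions to adapt to your own retention needs:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket-12345",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},  # delete after one year
            }
        ]
    },
)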
Presigned object URL:
By default, all S3 objects are private, and only the object owner has permission to access them.
If the object owner wants to share the object with a user who does not have AWS security credentials or permissions, they can generate a presigned URL for it.
The presigned URL allows a recipient to access the object for a specified duration.
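Generating a presigned URL is a single boto3 call. In this minimal sketch (placeholder bucket and key), anyone holding the printed URL can download the object until the hour expires:

import boto3

s3 = boto3.client("s3")

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket-12345", "Key": "docs/report.txt"},
    ExpiresIn=3600,  # validity period in seconds
)
print(url)  # share this link with the recipient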
Bucket Policy:
A bucket policy is a resource-based policy, written in the same JSON language as IAM policies, with which you can allow or deny access to your Amazon S3 resources.
With a bucket policy, you can also define security rules that apply to more than one object within a bucket.
For example, if you do not want a particular user to access your bucket, you can set permissions with a JSON policy; as a result, that user would be denied access to the bucket.
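That example could look like the following hedged sketch, which attaches a policy denying all S3 actions on the bucket to one IAM user; the account ID, user name, and bucket name are placeholders:

import json

import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOneUser",
            "Effect": "Deny",
            "Principal": {"AWS": "arn:aws:iam::111122223333:user/some-user"},
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-example-bucket-12345",    # the bucket itself
                "arn:aws:s3:::my-example-bucket-12345/*",  # every object in it
            ],
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="my-example-bucket-12345",
    Policy=json.dumps(policy),
)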
Amazon S3 Cross-Region Replication (CRR):
It is a feature that automatically duplicates your data from one Amazon S3 storage location to another in a different region.
This ensures that your files are not only stored in their original location but also have a backup in a separate geographic area.
By setting up rules for replication, new files added to the original storage are automatically copied to the backup storage.
This redundancy enhances data durability and provides a safety net in case of unforeseen events or disasters affecting the primary storage.
However, it's essential to configure permissions correctly, and users should be mindful of potential additional costs associated with cross-region data transfer.
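Setting up a replication rule can be sketched as follows. This assumes versioning is already enabled on both buckets (a CRR prerequisite) and that the IAM role referenced below exists and permits S3 to replicate; every ARN and name here is a placeholder:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="my-example-bucket-12345",  # source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter: replicate all new objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-backup-bucket-eu"},
            }
        ],
    },
)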
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.
Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps.
With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.
Problem Statement:
Consider an on-premises big data application. Its total storage capacity is 100 GB, and it produces about 2 GB of data per day in the form of application logs and other files.
A monitoring setup is predefined to alert the application owner when CPU or memory utilization breaches the 80 percent limit (the high-criticality threshold).
Based on the alert, the maintenance team has to raise a service request (SR) and increase the application's storage volume to manage the load.
Challenge:
More employees must be deployed to monitor the system and act on alerts.
If this is not done in time, the system starts responding very slowly to end users.
This, in turn, impacts business growth and profit.
Solution:
Integrate S3 with EC2 and enable automatic backup via lifecycle management in S3; this handles everything, and the system performance issue no longer occurs (see the sketch below).
S3 is designed for 99.99 percent availability (accessibility) and 99.999999999 percent, 11 9s, durability (persistence).
You can store virtually unlimited data.
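A minimal sketch of that integration: a script on the EC2 instance periodically ships application logs to S3, so local disk usage stays flat and the lifecycle rule shown earlier ages the logs out cheaply. The paths and bucket name are placeholders:

from datetime import date

import boto3

s3 = boto3.client("s3")

log_file = "/var/log/app/app.log"                 # placeholder path
key = f"logs/{date.today().isoformat()}/app.log"  # date-partitioned key

s3.upload_file(log_file, "my-example-bucket-12345", key)
# After a successful upload, the local log can be rotated or truncated.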
Some of the use cases of amazon S3 include:
Static Website Hosting:
Amazon S3 can host static websites, and users can serve them under their own domain. Serverless web applications can be developed using S3, and users access the application through the URLs that S3 generates (see the sketch below).
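Enabling website hosting on a bucket is one configuration call, as in this minimal boto3 sketch; the bucket name is a placeholder, and index.html and error.html are assumed to be uploaded and publicly readable:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="my-example-bucket-12345",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # served at the site root
        "ErrorDocument": {"Key": "error.html"},     # served on errors
    },
)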
Backup & Recovery:
Amazon S3 helps create backups and archive critical data by supporting Cross Region Replication. Due to versioning, which stores multiple versions of each file, it is easy to recover the files.
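Recovery with versioning can be sketched as follows: restore an earlier version of an object by copying that version on top of the current one. The version ID would normally come from list_object_versions; all names and the ID below are placeholders:

import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket-12345", "notes.txt"

old_version_id = "EXAMPLEVERSIONIDxxxxxxxxxxxxxxxx"  # placeholder VersionId

s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key, "VersionId": old_version_id},
)
# The copied version becomes the new latest version of the object.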
Low-cost data archiving:
It is possible to move data archives to low-cost tiers of Amazon S3, such as the Glacier storage classes, which are cheap, durable archiving solutions for compliance purposes; thus, data can be retained for a longer time.
Security and Compliance:
Amazon S3 provides multiple levels of security, including Data Access Security, Identity and Access Management (IAM) policies, Access Control Lists (ACLs), etc. It supports compliance features for HIPAA/HITECH, Data Protection Directive, FedRAMP, and others.
Some of the limitations to remember while you work with Amazon S3 are:
Storage is virtually unlimited, but a single object can hold up to 5 TB of data.
Bucket names must be globally unique.
An Amazon S3 bucket is owned by the AWS account that created it, and bucket ownership is not transferable to another account.
When creating a bucket, you select its name and the AWS Region for its creation. Once a bucket is created, its name or Region cannot be changed.
If you want to change a part of a file, you must make the change and then re-upload the entire modified file.
In a single AWS account, a maximum of 100 buckets can be created by default (this quota can be raised through a service quota increase).
The default storage class in Amazon S3 is Amazon S3 Standard.
Free-tier eligibility allows storage of up to 5GB.
Amazon S3 supports all types of data, including .js, .java, .xml, .html, .txt, .xls, .py, and so on; because objects are stored as raw bytes, even .exe files can be stored.
Amazon S3 cost estimation is based on four parameters:
Storage Class type:
The cost of storing data in Amazon S3 varies depending on the chosen storage class.
For instance, the Amazon S3 Standard storage class is the most expensive, while the Amazon S3 Glacier Deep Archive is the least costly.
It is advisable to choose the storage class based on the specific use case.
Amount of storage:
The cost of Amazon S3 storage is influenced by the number and size of objects stored in your S3 buckets, as well as the type of storage used.
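As a back-of-the-envelope illustration of how class and volume interact, the rates below are assumptions (roughly the published us-east-1 prices at the time of writing, and subject to change; always check the current S3 pricing page):

# Assumed per-GB monthly rates; verify against current AWS pricing.
STANDARD_RATE = 0.023        # USD per GB-month, S3 Standard (assumed)
DEEP_ARCHIVE_RATE = 0.00099  # USD per GB-month, Glacier Deep Archive (assumed)

hot_gb = 100      # frequently accessed data in S3 Standard
archive_gb = 900  # cold data in Glacier Deep Archive

monthly_cost = hot_gb * STANDARD_RATE + archive_gb * DEEP_ARCHIVE_RATE
print(f"Estimated monthly storage cost: ${monthly_cost:.2f}")
# 100 * 0.023 + 900 * 0.00099 = 2.30 + 0.89, about $3.19 per month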
Requests:
Consider the number and type of requests.
GET requests incur charges at different rates than other requests, such as PUT and COPY requests.
GET – Retrieves an object from Amazon S3. You must have READ access to use this operation.
PUT – Adds an object to a bucket. You must have WRITE permissions on a bucket to add an object to it.
COPY – Creates a copy of an object that is already stored in Amazon S3. A COPY operation is the same as performing a GET and then a PUT.
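Each billable request maps to one API call, as in this minimal boto3 sketch (placeholder names):

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket-12345"

# PUT: add an object (requires write permission on the bucket).
s3.put_object(Bucket=bucket, Key="a.txt", Body=b"hello")

# GET: retrieve the object (requires read access).
body = s3.get_object(Bucket=bucket, Key="a.txt")["Body"].read()

# COPY: duplicate an object already in S3 (billed like a GET plus a PUT).
s3.copy_object(
    Bucket=bucket,
    Key="backup/a.txt",
    CopySource={"Bucket": bucket, "Key": "a.txt"},
)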
Data transfer:
Consider the amount of data that’s transferred out of the Amazon S3 Region.
Remember that data transfer in is free, but you will be charged for data transfer out.
Pay only for what you use:
Storage used, in GBs per month.
Transfer OUT to other regions.
PUT, COPY, POST, LIST, and GET requests.
You don't have to pay for:
Transfer IN to Amazon S3
Transfer OUT from Amazon S3 to Amazon CloudFront or Amazon EC2 in the same region.