AWS Simple Storage Service (S3) - Concepts - All about AWS

Amazon Simple Storage Service (S3) 

  • Amazon Simple Storage Service (S3) is a web-service-based storage solution for the Internet.
  • Its purpose is to store and retrieve any amount of data, at any time, from anywhere on the web.
  • It is object storage: files are stored as objects.
  • Files are retrieved ("GET") and stored ("PUT") through the web service API.
  • S3 is not block storage and cannot be used to install an OS, a database, or other software.

Amazon S3 Concepts:- 

Buckets:  A storage container is called a bucket in AWS S3 terminology.
  • A bucket does not follow a file/folder structure, meaning there are no sub-buckets.
  • A bucket contains files as objects, optionally with metadata that describes the object's characteristics.
  • An object can range from 0 bytes to 5 TB in size.
  • There is no size limit on a bucket itself; the 5 TB limit applies to a single object.
  • By default, 100 buckets can be created per AWS account.
  • All objects lie at the same level (a flat namespace).
  • A bucket can store an unlimited number of objects.
  • There is no performance impact from storing all objects in one single bucket; multiple buckets simply help in categorizing and managing objects.
  • To support a higher request rate, it is recommended to introduce some random distribution, such as a hash, into key prefixes.
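The prefix-randomization advice above can be sketched as follows; the short MD5 prefix is an illustrative assumption, not an AWS requirement:

```python
import hashlib

def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash so keys spread across S3 key-space partitions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{digest}/{key}"
```

Sequential keys such as date-based log names otherwise cluster under one prefix; the hash spreads them out while keeping the original key recoverable.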

Now let's discuss the rules for bucket naming in detail:

  • It is always recommended to use a DNS-compliant name for an S3 bucket, and AWS enforces this in all regions except US East (N. Virginia). US East (N. Virginia) currently allows non-compliant bucket names, but it may also enforce a DNS-compliant naming convention soon.
  • The main goal of this is to ensure a single, consistent naming approach for S3 buckets.
  • The minimum length of a bucket name is 3 characters and the maximum is 63 characters (3 to 63 chars).
  • Each bucket is identified by a globally unique key; although S3 is a region-based service, the bucket name/key must be unique globally.
  • As a basic naming convention, use the AWS account ID as a prefix for each bucket name.
  • A bucket name can contain letters, numbers, and labels separated by a period (.) or hyphen (-), but a hyphen cannot appear immediately after a period or vice versa. Names containing a period (.) cause SSL certificate validation issues, because the extra dot is not matched during certificate validation; this is not an issue over HTTP. It is therefore not recommended to use periods (.) or uppercase letters in bucket names.
  • A bucket name must not be formatted like an IP address, e.g. 10.1.1.1.
  • Bucket names are case sensitive and must be matched exactly to access the bucket.
  • If a bucket name contains an uppercase letter, there will be issues accessing the bucket through the virtual-hosted style: the DNS resolver resolves the FQDN in all lowercase, and S3 returns the error message "bucket not found".
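The naming rules above can be approximated in code. This is an illustrative check based only on the rules listed here, not the authoritative AWS validator:

```python
import re

# Sketch of a DNS-compliant bucket-name check (illustrative, not AWS's own).
BUCKET_RE = re.compile(
    r"^(?!\d{1,3}(\.\d{1,3}){3}$)"      # must not be formatted like an IP address
    r"[a-z0-9]"                          # must start with a lowercase letter or digit
    r"(?:[a-z0-9-]|\.(?!\.)){1,61}"      # letters, digits, hyphens, single periods
    r"[a-z0-9]$"                         # must end with a lowercase letter or digit
)

def is_dns_compliant(name: str) -> bool:
    if not (3 <= len(name) <= 63):       # 3 to 63 characters
        return False
    if ".-" in name or "-." in name:     # hyphen adjacent to a period is invalid
        return False
    return BUCKET_RE.match(name) is not None
```

Note that uppercase letters fail the check, matching the virtual-hosted-style DNS issue described above.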

Let's understand accessing a bucket: although all buckets are accessible through the AWS Console, we often need to access a bucket programmatically as well.

  • To access a bucket programmatically, each resource must be uniquely identified by a resource URI. Note that Amazon S3 follows a RESTful architecture in which buckets and objects are resources, each with a resource URI that uniquely identifies it.
  • Amazon S3 supports both virtual-hosted-style and path-style URLs to access a bucket. In a virtual-hosted-style URL, the bucket name is part of the domain name in the URL. For example:
    http://<bucketname>.s3.amazonaws.com
    http://<bucketname>.<s3-aws-regionname>.amazonaws.com
  • In a path-style URL, the bucket name is not part of the domain (unless using a region-specific endpoint). For example:
    US East (N. Virginia) region endpoint: http://s3.amazonaws.com/bucket
    Region-specific endpoint: http://s3-aws-region.amazonaws.com/bucket
  • Amazon S3 also has a set of dual-stack endpoints, which support requests to S3 buckets over both Internet Protocol version 6 (IPv6) and IPv4.
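A small sketch of how the two URL styles above are assembled; the bucket and region values are placeholders:

```python
from typing import Optional

def virtual_hosted_url(bucket: str, region: Optional[str] = None) -> str:
    """Virtual-hosted style: the bucket name is part of the domain."""
    host = (f"{bucket}.s3.amazonaws.com" if region is None
            else f"{bucket}.s3-{region}.amazonaws.com")
    return f"http://{host}"

def path_style_url(bucket: str, region: Optional[str] = None) -> str:
    """Path style: the bucket name follows the endpoint in the path."""
    host = ("s3.amazonaws.com" if region is None
            else f"s3-{region}.amazonaws.com")
    return f"http://{host}/{bucket}"
```

This also makes the lowercase-DNS issue concrete: the virtual-hosted host name goes through DNS, which is case-insensitive and resolved in lowercase.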

Objects:  in very simple terms, an object is just a file, the fundamental unit stored in AWS S3.

  • An object consists of object data and metadata.
  • The metadata is a set of name-value pairs that describe the object, such as the date last modified, and standard HTTP metadata, such as Content-Type. We can also add custom metadata at the time the object is stored.
  • Object metadata cannot be modified after upload. The only way to modify it is to make a copy of the object and set the metadata on the copy.
  • Each object has a unique identifier called the object key.
  • The object key is used to retrieve the object.
  • The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1,024 bytes long.
  • The scope of an object key's uniqueness is a particular bucket; the same key can be used in another bucket.
  • Let's understand the object key naming guidelines to get optimized performance when a bucket serves more than 100 requests per second.
  • Use safe characters. The following are considered safe:
    Alphanumeric characters [0-9a-zA-Z]
    Special characters !, -, _, ., *, ', (, and )
  • The Amazon S3 data model is a flat structure; there is no hierarchy of sub-buckets or subfolders. To infer a logical hierarchy, key name prefixes and a delimiter (/) can be used.
  • The AWS Console supports a subfolder concept achieved by using prefixes and the delimiter (/), e.g. Development/Projects1.xls, Finance/statement1.pdf, Private/taxdocument.pdf.
  • Amazon S3 supports only buckets and objects; there is no hierarchy in Amazon S3. However, the prefixes and delimiters in an object key name enable the Amazon S3 console and the AWS SDKs to infer hierarchy and introduce the concept of folders.
  • Some characters require special handling and will likely need to be URL-encoded or referenced as hex. Some are non-printable characters, and your browser may not handle them.
  • Characters that may require special handling: Ampersand ("&"), Dollar ("$"), ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal), 'At' symbol ("@"), Equals ("="), Semicolon (";"), Colon (":"), Plus ("+"), Space (significant sequences of spaces may be lost in some uses, especially multiple spaces), Comma (","), Question mark ("?").
  • Characters to avoid in a key name because they require significant special handling for consistency across all applications:
    Backslash ("\"), Left curly brace ("{"), Right curly brace ("}"), Non-printable ASCII characters (128–255 decimal), Caret ("^"), Percent character ("%"), Grave accent / back tick ("`"), Left square bracket ("["), Right square bracket ("]"), Quotation marks, 'Greater Than' symbol (">"), 'Less Than' symbol ("<"), Tilde ("~"), 'Pound' character ("#"), Vertical bar / pipe ("|")
  • Object metadata comes in two kinds: system metadata and user-defined metadata.
  • Amazon S3 maintains a set of system metadata for each object uploaded to S3 and processes this system metadata as needed. For example, the object creation date is system controlled; only Amazon S3 can modify its value.
  • Other system metadata, such as the storage class configured for the object and whether the object has server-side encryption enabled, are examples of system metadata whose values the user controls.
  • If your bucket is configured as a website, you might sometimes want to redirect a page request to another page or an external URL.
  • Amazon S3 stores the page redirect value as system metadata whose value the user controls.
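The way prefixes and the delimiter let tools infer folders can be sketched with a minimal in-memory listing function; the keys echo the console examples above, and this is an illustration, not the real S3 list API:

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Split a flat key list into inferred 'folders' and direct 'files'."""
    common_prefixes, contents = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter becomes an inferred folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return sorted(common_prefixes), contents

keys = ["Development/Projects1.xls", "Finance/statement1.pdf",
        "Private/taxdocument.pdf", "readme.txt"]
```

Calling `list_with_delimiter(keys)` groups the first three keys under three inferred folders, while `readme.txt` stays a top-level object, mirroring what the S3 console displays.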
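Encoding a key that contains characters needing special handling can be sketched with Python's standard `urllib.parse.quote`; keeping "/" unencoded preserves the inferred folder structure:

```python
from urllib.parse import quote

def encode_key(key: str) -> str:
    """Percent-encode an object key for use in a request URI.

    The safe set keeps "/" plus the safe special characters listed above.
    """
    return quote(key, safe="/!-_.*'()")
```

A key such as `Finance/Q1 report+summary.pdf` becomes `Finance/Q1%20report%2Bsummary.pdf`, while a key made only of safe characters passes through unchanged.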

S3 Operations

 Operations supported:
  • Create or delete a bucket
  • Write an object
  • Read an object
  • Delete an object
  • List keys in a bucket
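As a toy illustration of these operations and the flat bucket/key data model (an in-memory sketch, not real S3 or boto3):

```python
class MiniS3:
    """In-memory stand-in for the five S3 operations listed above."""

    def __init__(self):
        self.buckets = {}                          # bucket name -> {key: data}

    def create_bucket(self, name):
        self.buckets.setdefault(name, {})

    def delete_bucket(self, name):
        if self.buckets.get(name):
            raise ValueError("bucket not empty")   # S3 also refuses to delete non-empty buckets
        self.buckets.pop(name, None)

    def put_object(self, bucket, key, data):       # write an object
        self.buckets[bucket][key] = data

    def get_object(self, bucket, key):             # read an object
        return self.buckets[bucket][key]

    def delete_object(self, bucket, key):          # delete an object
        self.buckets[bucket].pop(key, None)

    def list_keys(self, bucket):                   # list keys in a bucket
        return sorted(self.buckets[bucket])
```

The single flat dict per bucket mirrors S3's flat namespace: there are no nested containers, only keys.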

Data Consistency

  • Amazon S3 supports read-after-write consistency for all new objects.
  • S3 is eventually consistent for PUT (overwrite) and DELETE operations on any existing object key.
  • S3 will never return a corrupted file or a partially updated file; however, it may return a stale document (the old version, even if the document was updated just a moment ago).
  • If two processes update the same object/document simultaneously, the change is resolved by timestamp: the write with the latest timestamp wins.
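The last-writer-wins rule can be illustrated with a toy example; the timestamps and payloads are made up:

```python
# Two simultaneous writes to the same key; the one with the latest
# timestamp is the version S3 keeps.
writes = [
    {"ts": 1700000001.0, "data": "version-from-process-A"},
    {"ts": 1700000002.5, "data": "version-from-process-B"},
]
winner = max(writes, key=lambda w: w["ts"])
```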

Server Access Logging

  • By default, server access logging is disabled.
  • Access logging helps you learn about your customer base and understand your bill.
  • Access logging provides details about each access request, including who made the request, the bucket name, the request time and action, the response status, and, in case of an error, the error code.
  • Access log information is used for security and access audits.
  • Access logs are delivered to a designated S3 bucket; ensure that the designated bucket grants permission for server access logging to write logs into it. Remember, all S3 buckets are private by default, and explicit permission is required for any read or write action.
  • Access logs are collected periodically and delivered to the target S3 bucket. The recommendation is to use a prefix and to manage logs in a separate S3 bucket created especially for logging purposes.
  • Completeness and timeliness of access logs are not guaranteed; a log record may be delivered long after the actual request was processed.
  • Logs can be deleted at any point in time.

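A simplified parser for one server access log line, assuming the documented space-delimited field order (owner, bucket, time, IP, requester, request ID, operation, key, request URI, status, error code); the sample line is fabricated:

```python
import re

# Matches only the leading fields of an access log record; real records
# carry additional trailing fields (bytes sent, object size, user agent, ...).
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]+)" (?P<status>\S+) (?P<error_code>\S+)'
)

sample = ('79a59df900b949e5 mybucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
          '79a59df900b949e5 3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg '
          '"GET /mybucket/photos/cat.jpg HTTP/1.1" 200 -')

record = LOG_RE.match(sample).groupdict()
```

Parsed this way, each record answers exactly the audit questions above: who requested, which bucket and key, when, what action, and with what response status.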
Amazon S3 Data Replication

  • For durability, S3 data is automatically replicated across multiple Availability Zones.
  • An S3 bucket also supports cross-region data replication, but it is never automatic. This replication must be initiated by the user by:
    • Enabling versioning on the bucket
    • Adjusting the IAM policy to give S3 explicit permission to perform the replication
  • Cross-region S3 bucket replication mainly serves to improve network performance by providing data closer to the customer's location, or to satisfy compliance and regulatory requirements.
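The replication setup can be expressed as a configuration document of roughly this shape (as accepted by, e.g., boto3's put_bucket_replication); the role and bucket ARNs are placeholders:

```python
# Sketch of a cross-region replication configuration; the IAM role grants
# S3 the explicit permission to replicate, as described above.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Prefix": "",                # empty prefix: replicate all keys
            "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
        }
    ],
}
```

Versioning must already be enabled on the bucket before a configuration like this is applied.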



