Object-based storage is a newer type of storage system designed for cloud-scale growth. Objects are stored in and retrieved from an object store through web-based APIs such as REST and SOAP. Each object can carry extensive metadata that can be indexed and searched. Object storage is ideal for rich content that changes infrequently and does not require high performance, which makes it well suited to cloud-based solutions and architectures.
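To make the idea of objects plus searchable metadata concrete, here is a minimal in-memory sketch. It is not a real object store or any vendor's API: the class name ToyObjectStore, its methods, and the metadata keys are all hypothetical, and a real system would expose these operations over REST or SOAP rather than as local method calls.

```python
import uuid


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, for illustration).

    Objects live in a flat namespace and are addressed by ID, not by path.
    """

    def __init__(self):
        self._objects = {}  # object ID -> (data bytes, metadata dict)

    def put(self, data, metadata):
        """Store data with its metadata and return the generated object ID."""
        object_id = uuid.uuid4().hex  # assumed ID scheme; real systems differ
        self._objects[object_id] = (bytes(data), dict(metadata))
        return object_id

    def get(self, object_id):
        """Return the data stored under object_id."""
        return self._objects[object_id][0]

    def search(self, **criteria):
        """Return IDs of objects whose metadata matches every key/value given."""
        return [
            oid
            for oid, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ToyObjectStore()
scan_id = store.put(b"image bytes", {"type": "x-ray", "patient": "1234"})
store.put(b"report text", {"type": "report", "patient": "1234"})

xray_ids = store.search(type="x-ray")          # only the scan
patient_ids = store.search(patient="1234")     # both objects
```

The client never tracks where the bytes live; it keeps only the object ID or queries by metadata.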
Key features of object-based storage systems
Scale-out architecture: Scalability has always been the most important characteristic of enterprise storage systems, since the rationale for consolidating storage assumes that the system can easily grow with aggregate demand. An object-based storage device (OSD) uses a distributed scale-out architecture in which each node in the cluster contributes its resources to the total capacity and performance. Nodes can be added to the cluster independently, which allows scaling to petabytes and even exabytes of capacity with billions of objects and makes the architecture suitable for cloud environments.
Multi-tenancy: Enables multiple applications to be securely served from the same infrastructure. Each application is securely partitioned and data is neither co-mingled nor accessible by other tenants. This feature is ideal for businesses providing cloud services for multiple customers or departments within an enterprise.
Metadata-driven policy: Metadata and policy-based information management capabilities combine to intelligently automate data placement, data protection, and other data services (compression, deduplication, retention, and deletion) based on the service requirements. For example, when an object is created, it is created on one node and subsequently copied to one or more additional nodes, depending on the policies in place. The nodes can be within the same data center or geographically dispersed.
Global namespace: A global namespace abstracts storage from the application and provides a common view that is independent of location, which makes scaling seamless. This relieves client applications of the need to keep track of where data is stored. The global namespace provides the ability to transparently spread data across storage systems for greater performance, load balancing, and non-disruptive operation. It is especially important when the infrastructure spans multiple sites and geographies.
Flexible data access methods: An OSD supports REST/SOAP APIs for web and mobile access, and file-sharing protocols (CIFS and NFS) for file service access. Some OSD systems also support an HDFS interface for big data analytics.
Automated system management: An OSD provides self-configuring and auto-healing capabilities to reduce administrative complexity and downtime. The services and processes running in the OSD have no single point of failure. If a service goes down, or a node or an entire site becomes unavailable, redundant components and services keep normal operations running.
Data protection: Objects stored in an OSD are protected using two methods: replication and erasure coding. Replication provides data redundancy by creating an exact copy of an object. Each replica requires the same storage space as the source object. Based on the policy configured for the object, one or more replicas are created and distributed across different locations.
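The interaction between metadata-driven policy and replication described in the features above can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular product places data: the policy names, the "class" metadata key, the node and site names, and the hash-based ordering are all made up for the example.

```python
import hashlib

# Hypothetical policies, selected by the object's "class" metadata field.
# "copies" is the total number of copies, including the first one written.
POLICIES = {
    "critical": {"copies": 3, "geo_dispersed": True},
    "standard": {"copies": 2, "geo_dispersed": False},
}

# Hypothetical cluster layout: node name -> site.
NODES = {
    "node-a": "site-1",
    "node-b": "site-1",
    "node-c": "site-2",
    "node-d": "site-3",
}


def place_replicas(object_id, metadata):
    """Choose nodes for an object according to the policy named in its metadata."""
    policy = POLICIES[metadata.get("class", "standard")]
    # Rank nodes by a per-object hash so placement is repeatable and spread out.
    ranked = sorted(
        NODES,
        key=lambda node: hashlib.sha256((object_id + node).encode()).hexdigest(),
    )
    chosen, used_sites = [], set()
    for node in ranked:
        if len(chosen) == policy["copies"]:
            break
        if policy["geo_dispersed"] and NODES[node] in used_sites:
            continue  # keep every copy at a different site
        chosen.append(node)
        used_sites.add(NODES[node])
    return chosen


critical_nodes = place_replicas("obj-1", {"class": "critical"})
standard_nodes = place_replicas("obj-1", {"class": "standard"})
```

Changing an object's policy metadata changes where and how many times it is stored, without the application choosing nodes itself.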
What is Erasure Coding?
Object storage systems support erasure coding, a technique that provides space-efficient data redundancy to protect against data loss from multiple drive failures. In storage systems, erasure coding can also protect data without using RAID. This avoids the capacity overhead of keeping multiple copies and the processing overhead of running RAID calculations on very large data sets. The result is data protection for very large storage systems without the risk of very long RAID rebuild cycles.
In general, erasure coding breaks the data into fragments, encodes them with redundant data, and stores the resulting fragments across a set of different locations, such as disks, storage nodes, or geographic locations.
In a typical erasure-coded storage system, a set of n disks is divided into m disks that hold data and k disks that hold coding information, where n, m, and k are integers and n = m + k. The coding information is calculated from the data. If up to k of the n disks fail, their contents can be recomputed from the surviving disks.
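The recompute-from-survivors idea can be shown with the simplest possible erasure code: a single XOR parity fragment, which corresponds to k = 1. This is only a sketch with illustrative function names. Systems that tolerate more than one failure (k > 1) typically rely on more general codes such as Reed-Solomon, which this example does not implement.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, m):
    """Split data into m equal-size fragments and append one XOR parity fragment."""
    size = -(-len(data) // m)                 # ceiling division
    padded = data.ljust(size * m, b"\0")      # pad so all fragments match in size
    fragments = [padded[i * size:(i + 1) * size] for i in range(m)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]               # n = m + 1 fragments in total


def recover(fragments, lost_index):
    """Rebuild the fragment at lost_index by XOR-ing all surviving fragments."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return reduce(xor_bytes, survivors)


stored = encode(b"object storage!", m=3)      # 3 data fragments + 1 parity
damaged = list(stored)
damaged[1] = None                             # simulate one failed disk
rebuilt = recover(damaged, lost_index=1)      # equals the original stored[1]
```

Because the parity is the XOR of all data fragments, XOR-ing the survivors cancels everything except the missing fragment. Losing two fragments at once is not recoverable with this scheme.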
For example, suppose the data is divided into nine data fragments (m = 9) and three coding fragments (k = 3). The maximum number of drive failures tolerated in this example is three. Erasure coding tolerates k failures at a lower storage cost than replication would need for the same fault tolerance. The additional storage required for the coding fragments increases as the ratio k/m increases.
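The storage cost comparison can be worked through with a few lines of arithmetic. The function names are illustrative, and the replication figure assumes one additional full copy per tolerated failure.

```python
def erasure_overhead(m, k):
    """Extra capacity used by coding fragments, as a fraction of the data size."""
    return k / m


def replication_overhead(failures_tolerated):
    """Extra capacity used by full copies, assuming one extra replica per
    tolerated failure."""
    return float(failures_tolerated)


ec = erasure_overhead(m=9, k=3)
rep = replication_overhead(3)

print(f"9+3 erasure coding, 3 failures tolerated: {ec:.0%} extra capacity")
print(f"Replication, 3 failures tolerated:        {rep:.0%} extra capacity")
# prints 33% for erasure coding and 300% for replication
```

Raising k while keeping m fixed increases both the fault tolerance and the overhead, since the overhead is exactly k/m.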