Ceph has emerged as one of the leading distributed storage platforms. By using commodity hardware and software-defined controls, Ceph has proven its worth as an answer to the scaling data needs of today’s businesses.
Ceph allows storage to scale seamlessly. When properly deployed and configured, it is capable of streamlining data allocation and redundancy. Automated replication and rebalancing keep data protected in the event of hardware loss. New servers can be added to an existing cluster in a timely and cost-efficient manner. Fast and accurate read/write capabilities, along with high-throughput capacity, make Ceph a popular choice for today’s object, block, and file storage needs.
Ceph – A General Overview
Ceph was conceived by Sage Weil during his doctoral studies at the University of California, Santa Cruz. Weil realized that Lustre, the accepted system of the time, presented a “storage ceiling” due to the finite number of storage targets it could configure. He designed Ceph to scale across a virtually unlimited number of nodes to reach petabyte-level storage capacity, with decentralized request management improving performance by processing requests on individual nodes. Weil started the Ceph project in 2004 to accomplish these goals, released the first open source version in 2006, and continued refining Ceph after completing his doctorate in 2007, with backing from DreamHost, the web hosting company he had co-founded.
Components Used in a Ceph Deployment
When looking to understand Ceph, one must look at both the hardware and software that underpin it.
Ceph is designed to use commodity hardware in order to eliminate expensive proprietary solutions that can quickly become dated. Storage clusters can make use of either dedicated servers or cloud servers. The ability to use a wide range of servers allows the cluster to be customized to any need. High-speed network switching provided by an Ethernet fabric is needed to maintain the cluster’s performance.
Ceph is a software-defined, Linux-based storage system that runs on Ubuntu, Debian, CentOS, Red Hat Enterprise Linux, and other Linux distributions. Ceph’s core utilities allow all servers (nodes) within the cluster to manage the cluster as a whole. Each storage drive in a node, typically one hard drive or SSD, is configured as a logical Object Storage Device (OSD) and managed by its own Object Storage Daemon. Together, the OSDs hold all of the objects (chunks of data, such as whole files or pieces of larger files) stored in the Ceph cluster. Ceph’s core utilities and associated daemons are what make it highly flexible and scalable. For the rest of this article we will explore Ceph’s core functionality a little deeper.
How Ceph Works
Before jumping into the nuances of Ceph, it is important to note that Ceph is a “Reliable Autonomic Distributed Object Store” (RADOS) at its core. RADOS is a dependable, autonomous object store that is made up of self-managed, self-healing, and intelligent nodes.
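Ceph’s CRUSH algorithm determines how data is distributed across the OSDs in the cluster. CRUSH stands for Controlled Replication Under Scalable Hashing. It is highly configurable and allows for maximum flexibility when designing your data architecture. CRUSH works from a description of the cluster called the CRUSH map, which lists every OSD, arranges them into a hierarchy of failure domains (such as hosts, racks, and rooms), and defines placement rules. Rather than looking up object locations in a central table, any client or daemon can calculate where an object lives: the object’s name is hashed into a placement group, and CRUSH maps that placement group to a set of OSDs according to the rules. Those rules express the desired redundancy, for example keeping each replica on a different node so that a single server failure cannot destroy every copy.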
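The calculation described above can be illustrated with a simplified sketch. This is not CRUSH itself: it swaps in rendezvous (highest-random-weight) hashing, uses SHA-256 rather than Ceph’s own hash function, and picks a placement group with a plain modulo. The pool ID, `pg_num`, and host and OSD names are hypothetical. What it shares with CRUSH is the key property: anyone holding the same cluster description computes the same OSDs for an object, with no lookup table, and replicas land on distinct hosts.

```python
import hashlib


def stable_hash(*parts: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    """Map an object name to a placement group ID such as '1.2a' (pool.hex)."""
    return f"{pool_id}.{stable_hash(object_name) % pg_num:x}"


def pg_to_osds(pg_id: str, hosts: dict[str, list[str]], replicas: int) -> list[str]:
    """Choose `replicas` OSDs on distinct hosts using rendezvous hashing.

    A stand-in for CRUSH, not CRUSH itself: every caller with the same
    `hosts` description gets the same answer, so no lookup table is needed.
    """
    # Rank hosts (the failure domain) by a per-PG score and keep the top N.
    ranked_hosts = sorted(hosts, key=lambda h: stable_hash(pg_id, h), reverse=True)
    chosen = []
    for host in ranked_hosts[:replicas]:
        # Within each chosen host, take the OSD with the highest score for this PG.
        chosen.append(max(hosts[host], key=lambda osd: stable_hash(pg_id, osd)))
    return chosen


# Hypothetical three-node cluster with two OSDs per node.
cluster = {
    "node1": ["osd.0", "osd.1"],
    "node2": ["osd.2", "osd.3"],
    "node3": ["osd.4", "osd.5"],
}
pg = object_to_pg(1, "photos/cat.jpg", pg_num=128)
print(pg, pg_to_osds(pg, cluster, replicas=3))
```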
Because object locations are computed rather than stored in a central lookup table, and every node works from the same CRUSH map distributed by the monitors, additional nodes can be brought online without creating a central bottleneck or destabilizing existing servers in the cluster. This is how Ceph retains its ability to scale seamlessly. Some adjustments to the CRUSH configuration may be needed when new nodes are added, and a proportional share of data migrates to the new hardware in the background. The rest of the data stays where it is, so scaling remains flexible and existing nodes keep serving requests during integration.
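To see why computed placement scales gracefully, the sketch below counts how many placement groups change hosts when a fifth host joins a four-host cluster. It assumes one copy per placement group for simplicity, uses rendezvous hashing as a stand-in for CRUSH, and uses hypothetical host names; naive hash-modulo placement is included for comparison.

```python
import hashlib


def stable_hash(*parts: str) -> int:
    """Deterministic 64-bit hash of the joined string parts."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_hrw(pg: str, hosts: list[str]) -> str:
    """Rendezvous hashing: the host with the highest score for this PG wins."""
    return max(hosts, key=lambda h: stable_hash(pg, h))


def place_modulo(pg: str, hosts: list[str]) -> str:
    """Naive placement for comparison: hash modulo the number of hosts."""
    return hosts[stable_hash(pg) % len(hosts)]


def moved_fraction(placer, pgs: list[str], old_hosts: list[str], new_hosts: list[str]) -> float:
    """Fraction of PGs whose host differs between the old and new layouts."""
    moved = sum(placer(pg, old_hosts) != placer(pg, new_hosts) for pg in pgs)
    return moved / len(pgs)


pgs = [f"1.{i:x}" for i in range(1024)]
old = ["node1", "node2", "node3", "node4"]
new = old + ["node5"]
print("rendezvous:", moved_fraction(place_hrw, pgs, old, new))
print("modulo:    ", moved_fraction(place_modulo, pgs, old, new))
```

With rendezvous hashing, only the placement groups that the new host now wins are moved, roughly a fifth of them, and every one of them moves to the new host. With modulo placement most placement groups move, many of them between hosts that were already in the cluster.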
CRUSH weights and placement rules can also steer particular data or workloads toward specific hardware. For example:
- Primary object copies can be placed on SSDs to gain performance advantages.
- Nodes with faster processors can be used for requests that are more resource-intensive.
- Object types (like media, photos, etc.) can be evenly distributed across the cluster to avoid performance issues from request spikes.
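Weighting can be sketched with weighted rendezvous hashing, which is similar in spirit to the selection CRUSH performs inside a bucket: each OSD draws a pseudo-random value scaled by its weight, so an OSD with twice the weight wins about twice as many placement groups. The OSD names and weights below are hypothetical (weights here simply equal capacity in TiB). In a real cluster, steering data to SSDs is normally done with CRUSH rules that select a device class, while weights usually reflect capacity.

```python
import hashlib
import math
from collections import Counter


def unit_hash(*parts: str) -> float:
    """Deterministic pseudo-random float in [0, 1) derived from the inputs."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def place_weighted(pg: str, weights: dict[str, float]) -> str:
    """Weighted rendezvous hashing: P(osd chosen) is proportional to its weight."""
    def draw(osd: str) -> float:
        # -ln(1 - u) is an Exp(1) sample; dividing by the weight gives an
        # Exp(weight) sample, and the smallest sample wins.
        return -math.log(1.0 - unit_hash(pg, osd)) / weights[osd]
    return min(weights, key=draw)


# Hypothetical OSDs: two 4 TiB drives and one 8 TiB drive.
weights = {"osd.0": 4.0, "osd.1": 4.0, "osd.2": 8.0}
counts = Counter(place_weighted(f"1.{i:x}", weights) for i in range(4000))
print(counts)
```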
Ceph Core Daemons
Ceph utilizes four core daemons to facilitate the storage, replication, and management of objects across the cluster (recent releases also require a Manager daemon, ceph-mgr, which runs alongside the monitors). These daemons are strategically installed on various servers in your cluster. Typically, several types of daemons run on a server along with some allocated OSDs, although a heavily utilized daemon may require a server all to itself. Each type of daemon you use should be installed on at least two nodes, and most use cases benefit from three or more. Monitors in particular should run in an odd number, at least three, because they must maintain a majority quorum to operate. Since these daemons are redundant and decentralized, requests can also be processed in parallel, drastically improving response times. Here is an overview of Ceph’s core daemons.
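Monitors are the daemon where the count matters most. They agree on the cluster maps through a Paxos-based majority quorum, so the arithmetic below shows how many monitor failures a cluster can survive. The function names are illustrative only.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of the monitor count."""
    return monitors // 2 + 1


def failures_tolerated(monitors: int) -> int:
    """How many monitors can fail while a majority can still be formed."""
    return monitors - quorum_size(monitors)


for n in range(1, 7):
    print(f"{n} monitors: quorum {quorum_size(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Two monitors tolerate no failures at all, and four tolerate no more than three do, which is why odd counts such as three or five are the usual choice.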
RADOS Gateway Daemon (RGW) – This daemon provides an object storage interface for applications, exposing RESTful APIs compatible with Amazon S3 and OpenStack Swift. When an application submits a request, the gateway translates it into RADOS operations and uses Ceph’s librados library to read or write the underlying objects on the appropriate OSDs. Applications that do not need an HTTP interface can skip the gateway and use librados to communicate with the OSDs directly.
OSD Daemon – An OSD daemon reads and writes objects to and from its corresponding OSD, so a separate OSD daemon runs for each OSD in the cluster. Clients (including the RADOS Gateway and CephFS clients) use the CRUSH map to calculate which OSDs hold an object and send requests directly to the primary OSD for that object’s placement group. For writes, the primary OSD forwards the data to the other OSDs holding replicas and confirms the write once they have stored it. OSD daemons report their status to the monitor daemons, exchange heartbeats with their peers, and act on the updated cluster maps they receive. In the event of a failure, the remaining OSD daemons work on restoring the preconfigured durability guarantee.
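The primary-copy write path can be modeled in a few lines. This is a simplified, in-memory sketch assuming a replicated pool with three copies; the `OSD` class and `replicated_write` function are hypothetical, and networking, journaling, rollback, and map updates are left out. The client receives an acknowledgment only if every OSD in the acting set stored the write. In a real cluster, a failed replica would be marked down by the monitors and the placement group would carry on with an updated acting set rather than simply rejecting writes.

```python
from dataclasses import dataclass, field


@dataclass
class OSD:
    """Toy stand-in for an OSD: a name, an up/down flag, and an object store."""
    name: str
    up: bool = True
    store: dict[str, bytes] = field(default_factory=dict)

    def apply_write(self, obj: str, data: bytes) -> bool:
        if not self.up:
            return False
        self.store[obj] = data
        return True


def replicated_write(acting_set: list[OSD], obj: str, data: bytes) -> bool:
    """The client sends the write to the primary (first OSD in the acting set).

    The primary writes locally, forwards the write to every replica, and the
    write is acknowledged only if all of them stored it. Simplified: there is
    no rollback of copies that did succeed.
    """
    primary, *replicas = acting_set
    if not primary.apply_write(obj, data):
        return False
    results = [replica.apply_write(obj, data) for replica in replicas]
    return all(results)


acting = [OSD("osd.0"), OSD("osd.2"), OSD("osd.4")]
print(replicated_write(acting, "photos/cat.jpg", b"...jpeg bytes..."))
```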
Monitor Daemon (MON) – MONs maintain the authoritative maps of the cluster, including the CRUSH map and the status of each OSD. When an OSD fails, the MONs mark it down and, after a configurable interval, out of the cluster, then publish the updated map. Following the established CRUSH rules, the affected OSDs recalculate placement and re-replicate the objects from the failed drive using the surviving copies. A similar process takes place when a node is added to the cluster, allowing data to be rebalanced.
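The recovery step can be sketched by computing placement before and after an OSD is marked out. This assumes a flat layout of eight hypothetical OSDs, three copies per placement group, and rendezvous hashing as a stand-in for CRUSH. Only placement groups that had a copy on the failed OSD are affected, and each gains exactly one new OSD, which can be filled from the two surviving copies.

```python
import hashlib


def stable_hash(*parts: str) -> int:
    """Deterministic 64-bit hash of the joined string parts."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def acting_set(pg: str, osds: list[str], size: int = 3) -> list[str]:
    """Top-`size` OSDs by rendezvous score (a stand-in for CRUSH)."""
    return sorted(osds, key=lambda o: stable_hash(pg, o), reverse=True)[:size]


def recovery_plan(pgs: list[str], osds: list[str], failed: str) -> dict[str, tuple[str, list[str]]]:
    """For each PG that had a copy on `failed`, return the OSD that must
    receive a new copy and the surviving OSDs it can copy from."""
    survivors = [o for o in osds if o != failed]
    plan = {}
    for pg in pgs:
        before = acting_set(pg, osds)
        if failed not in before:
            continue  # unaffected: placement is unchanged
        after = acting_set(pg, survivors)
        (target,) = set(after) - set(before)  # exactly one newcomer
        plan[pg] = (target, [o for o in before if o != failed])
    return plan


osds = [f"osd.{i}" for i in range(8)]
pgs = [f"1.{i:x}" for i in range(64)]
plan = recovery_plan(pgs, osds, failed="osd.3")
print(f"{len(plan)} of {len(pgs)} PGs need recovery")
```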
MONs can be used to obtain real-time status updates from the cluster. Device status, storage capacity, and IOPS are metrics that typically need to be tracked. A history of this data is not logged by default, but logging can be configured if desired.
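A minimal sketch of pulling headline metrics from `ceph status` with JSON output is shown below. It assumes the `ceph` CLI and an admin keyring are available on the host running it. The JSON field names (`health.status`, `osdmap.num_up_osds`, `pgmap.bytes_used`, and so on) are assumptions that vary between releases, so check them against the output of your own cluster; the sample dictionary is hand-written, not real cluster output.

```python
import json
import subprocess


def read_status() -> dict:
    """Run `ceph status` with JSON output (requires the ceph CLI and a keyring)."""
    out = subprocess.run(["ceph", "status", "--format", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def summarize(status: dict) -> dict:
    """Pull a few headline metrics. Field names are assumptions; verify them
    against the JSON your Ceph release actually emits."""
    pgmap = status.get("pgmap", {})
    osdmap = status.get("osdmap", {})
    osdmap = osdmap.get("osdmap", osdmap)  # some releases nest this object
    used, total = pgmap.get("bytes_used", 0), pgmap.get("bytes_total", 0)
    return {
        "health": status.get("health", {}).get("status", "UNKNOWN"),
        "osds_up": f'{osdmap.get("num_up_osds", 0)}/{osdmap.get("num_osds", 0)}',
        "percent_used": round(100 * used / total, 1) if total else None,
    }


sample = {  # hand-written sample, not real cluster output
    "health": {"status": "HEALTH_OK"},
    "osdmap": {"num_osds": 6, "num_up_osds": 6},
    "pgmap": {"bytes_used": 3 * 2**40, "bytes_total": 12 * 2**40},
}
print(summarize(sample))
```

On a live cluster, `summarize(read_status())` could be run on a schedule and written to a log or monitoring system to keep the history that Ceph does not retain by default.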
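Metadata Server Daemon (MDS) – This daemon manages file system metadata, such as the directory hierarchy, file names, permissions, and file sizes, for CephFS, a POSIX-compliant file system built on top of RADOS. When a POSIX client opens or lists a file, the MDS supplies the metadata, and the client then reads and writes the file’s contents directly on the OSDs, where they are stored as ordinary RADOS objects. This division of labor is what allows CephFS to serve POSIX environments without routing file data through the metadata servers.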
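Once the MDS has supplied a file’s inode number and layout, the client can work out which RADOS objects hold any byte range on its own. The sketch below assumes the default CephFS layout (4 MiB objects, one object per stripe) and the data-object naming scheme of inode number in hex, a dot, and an 8-digit hex object index; custom layouts with different stripe settings would need a more general calculation. The inode number used is hypothetical.

```python
OBJECT_SIZE = 4 * 2**20  # assumed default CephFS layout: 4 MiB objects, stripe_count 1


def data_objects(inode: int, offset: int, length: int) -> list[str]:
    """Names of the RADOS objects holding bytes [offset, offset + length) of a file."""
    if length <= 0:
        return []
    first = offset // OBJECT_SIZE
    last = (offset + length - 1) // OBJECT_SIZE
    return [f"{inode:x}.{i:08x}" for i in range(first, last + 1)]


# A 10 MiB read starting at 3 MiB, for a file whose inode number is 0x10000000000.
print(data_objects(0x10000000000, 3 * 2**20, 10 * 2**20))
# ['10000000000.00000000', '10000000000.00000001', '10000000000.00000002', '10000000000.00000003']
```

Each of those object names is then placed by CRUSH like any other object, so the read goes straight to the OSDs.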
From its beginnings at UC Santa Cruz, Ceph was designed to overcome the scalability and performance issues of existing storage systems. Its power comes from its configurability and self-healing capabilities, which it achieves through loosely coupled components and decentralized control. Properly utilizing the Ceph daemons will allow your data to be replicated across multiple servers and provide the redundancy and performance your storage system needs.
While there are many options available for storing your data, Ceph provides a practical and effective solution that should be considered. Proper implementation will ensure your data’s security and your cluster’s performance. To learn more about Genesis Adaptive’s Ceph storage offerings, feel free to explore our Storage Consulting section or reach out to us. Genesis Adaptive’s certified IT professionals draw from a wide range of hosting, consulting, and IT experience. Our experts will provide you with the best service and resources that will meet and exceed your storage needs.