The key storage management operations consist of Storage Monitoring, Storage Alerting, and Storage Reporting. Storage Monitoring provides the performance and availability status of various infrastructure components and services. It also helps to trigger alerts when thresholds are reached, security policies are violated, or service performance deviates from the SLA. These functions are explained below.
Storage Management Operations Overview
1) Storage Monitoring
Monitoring forms the basis for performing management operations. Monitoring provides the performance and availability status of various infrastructure components and services. It also helps to measure the utilization and consumption of various storage infrastructure resources by the services. This measurement facilitates the metering of services, capacity planning, forecasting, and optimal use of these resources. Monitoring events in the storage infrastructure, such as a change in the performance or availability state of a component or a service, may be used to trigger automated routines or recovery procedures.
Such procedures can reduce both the downtime caused by known infrastructure errors and the level of manual intervention needed to recover from them. Further, monitoring helps in generating reports on service usage and trends. Additionally, monitoring data center environment parameters such as heating, ventilating, and air-conditioning (HVAC) helps in tracking any deviation from their normal status. A storage infrastructure is primarily monitored for the following:
- Configuration Monitoring
- Availability Monitoring
- Capacity Monitoring
- Performance Monitoring
- Security Monitoring
Monitoring configuration involves tracking configuration changes and deployment of storage infrastructure components and services. It also detects configuration errors, non-compliance with configuration policies, and unauthorized configuration changes.
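As a rough illustration of detecting configuration changes, the sketch below compares a current configuration snapshot against an approved baseline. The function name `config_drift` and the zoning-style keys and values are hypothetical and chosen only for this example; real monitoring tools collect and compare configuration in their own formats.

```python
def config_drift(baseline: dict, current: dict) -> dict:
    """Compare a current configuration snapshot with an approved baseline.

    Both arguments are plain dictionaries (a hypothetical snapshot format).
    """
    # Settings present now but absent from the baseline.
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    # Settings in the baseline that have disappeared.
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    # Settings present in both snapshots whose values differ.
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only: zone names and members are made up.
baseline = {"zone_a": "host1;array1", "lun_masking": "enabled"}
current = {
    "zone_a": "host1;array1;host9",
    "lun_masking": "enabled",
    "zone_b": "host2;array1",
}
drift = config_drift(baseline, current)
print(drift)
```

Any non-empty entry in the result could then be checked against change records to decide whether the change was authorized.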
Availability refers to the ability of a component or a service to perform its desired function during its specified time of operation. Monitoring the availability of hardware components (for example, a port, an HBA, or a storage controller) or software components (for example, a database instance or orchestration software) involves checking their availability status by reviewing the alerts generated by the system. For example, a port failure might result in a chain of availability alerts. A storage infrastructure commonly uses redundant components to avoid a single point of failure. Failure of a component might cause an outage that affects service availability, or it might cause performance degradation even though availability is not compromised. Continuously monitoring the expected availability of each component and reporting any deviation helps the administrator to identify failing services and plan corrective action to maintain SLA requirements.
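One simple way to express deviation from expected availability is to compute measured availability over an observation window and compare it with an SLA target. The sketch below assumes uptime and downtime are already known in hours; the function names and the 99.9 percent target are illustrative assumptions, not values from any particular SLA.

```python
def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as a percentage of the observation window."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation window must be positive")
    return 100.0 * uptime_hours / total


def meets_sla(measured_percent: float, sla_target_percent: float) -> bool:
    """True when measured availability is at or above the agreed target."""
    return measured_percent >= sla_target_percent


# Hypothetical 30-day window (720 hours) with 2 hours of downtime.
measured = availability_percent(uptime_hours=718, downtime_hours=2)
print(round(measured, 2), meets_sla(measured, 99.9))
```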
Capacity refers to the total amount of storage infrastructure resources available. Inadequate capacity leads to degraded performance or even service unavailability. Monitoring capacity involves examining the amount of storage infrastructure resources used and available, such as the free space on a file system or a storage pool, the number of ports available on a switch, or the utilization of storage space allocated to a service. Monitoring capacity helps an administrator to ensure uninterrupted data availability and scalability by averting outages before they occur. For example, if 90 percent of the ports are utilized in a particular SAN fabric, this could indicate that a new switch might be required if more servers and storage systems need to be attached to the same fabric. Monitoring usually leverages analytical tools to perform capacity trend analysis. These trends help to understand future resource requirements and provide an estimation of the time required to deploy them.
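The port-utilization check and the idea of capacity trend analysis can be sketched as follows. This is a minimal example that assumes usage grows roughly linearly and fits a least-squares line to past samples; actual analytical tools may use more sophisticated models. All names and sample values are hypothetical.

```python
def utilization_percent(used: float, total: float) -> float:
    """Return used capacity as a percentage of total capacity."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def days_until_full(samples, capacity):
    """Estimate days from the last sample until capacity is reached.

    samples: list of (day, used) pairs. Returns None if usage is not growing.
    Assumes linear growth (ordinary least-squares fit).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # growth in capacity units per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    last_day = max(x for x, _ in samples)
    return day_full - last_day


# Hypothetical fabric: 44 of 48 ports in use, checked against a 90% threshold.
ports_near_limit = utilization_percent(44, 48) >= 90.0

# Hypothetical pool usage in GB, sampled every 10 days; pool size is 1000 GB.
history = [(0, 500), (10, 550), (20, 600), (30, 650)]
remaining_days = days_until_full(history, capacity=1000)
print(ports_near_limit, remaining_days)
```

The estimate is only as good as the linear assumption, so it is best treated as an early warning rather than a precise deadline.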
Performance monitoring evaluates how efficiently different storage infrastructure components and services are performing and helps to identify bottlenecks. It measures and analyzes behavior in terms of response time, throughput, and I/O wait time, and identifies whether the behavior of infrastructure components and services meets the acceptable and agreed performance level. It also deals with the utilization of resources, which affects the way resources behave and respond. For example, if a VM continuously experiences 80 percent processor utilization, this suggests that the VM may be running out of processing power, which can lead to degraded performance and slower response time. Similarly, if the cache and controllers of a storage system are consistently overutilized, performance may degrade.
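The distinction between a brief spike and continuous high utilization can be sketched by requiring a minimum number of consecutive samples at or above a threshold. The function name, the sample values, and the choice of four consecutive samples are assumptions made for illustration.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if at least min_consecutive samples in a row
    are at or above the threshold."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical CPU utilization samples (percent) for one VM.
cpu = [72, 81, 85, 83, 90, 60]
print(sustained_breach(cpu, threshold=80, min_consecutive=4))
```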
Monitoring a storage infrastructure for security includes tracking unauthorized access, whether accidental or malicious, and unauthorized configuration changes. For example, monitoring tracks and reports the initial zoning configuration performed and all the subsequent changes. Another example of monitoring security is to track login failures and unauthorized access to switches for performing administrative changes. IT organizations typically comply with various information security policies that may be specific to government regulations, organizational rules, or deployed services. Monitoring detects all operations and data movement that deviate from predefined security policies. Monitoring also detects unavailability of information and services to authorized users due to security breach. Further, physical security of a storage infrastructure can also be continuously monitored using badge readers, biometric scans, or video cameras.
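Tracking login failures on switches can be sketched as counting failed login events per device and user and flagging those that exceed a limit. The event format (dictionaries with `device`, `user`, `action`, and `success` keys) and the limit of three failures are hypothetical; real switches and log collectors use their own log formats.

```python
from collections import Counter


def flag_repeated_login_failures(events, max_failures=3):
    """Return sorted (device, user) pairs with more than max_failures
    failed login attempts."""
    failures = Counter(
        (e["device"], e["user"])
        for e in events
        if e["action"] == "login" and not e["success"]
    )
    return sorted(key for key, count in failures.items() if count > max_failures)


# Illustrative events only: device and user names are made up.
failed = {"device": "switch01", "user": "admin", "action": "login", "success": False}
events = [dict(failed) for _ in range(4)] + [
    {"device": "switch02", "user": "ops", "action": "login", "success": False},
    {"device": "switch02", "user": "ops", "action": "login", "success": True},
]
suspects = flag_repeated_login_failures(events)
print(suspects)
```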
2) Storage Alerting
An alert is a system-to-user notification that provides information about events or impending threats or issues. Alerting of events is an integral part of monitoring. Alerting keeps administrators informed about the status of various components and processes. For example, conditions such as the failure of power, storage drives, memory, switches, or an availability zone can impact the availability of services and require immediate administrative attention. Other conditions, such as a file system reaching a capacity threshold, an operation breaching a configuration policy, or a soft media error on storage drives, are considered warning signs and may also require administrative attention.
Monitoring tools enable administrators to define various alerted conditions and assign different severity levels for these conditions based on the impact of the conditions. Whenever a condition with a particular severity level occurs, an alert is sent to the administrator, an orchestrated operation is triggered, or an incident ticket is opened to initiate a corrective action. Alert classifications can range from information alerts to fatal alerts.
- Information alerts provide useful information but do not require any intervention by the administrator. The creation of a zone or LUN is an example of an information alert.
- Warning alerts require administrative attention so that the alerted condition is contained and does not affect service availability. For example, if an alert indicates that a storage pool is approaching a predefined threshold value, the administrator can decide whether additional storage drives need to be added to the pool.
- Fatal alerts require immediate attention because the condition might affect the overall performance or availability. For example, if a service fails, the administrator must ensure that it is restored quickly.
As every IT environment is unique, most monitoring systems require initial set-up and configuration, including defining what types of alerts should be classified as informational, warning, and fatal. Whenever possible, an organization should limit the number of truly critical alerts so that important events are not lost amidst informational messages. Continuous monitoring, with automated alerting, enables administrators to respond to failures quickly and proactively. Alerting provides information that helps administrators prioritize their response to events.
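The mapping from a monitored condition to a severity level and a response can be sketched as below. The three severity names follow the classifications described above; the threshold values (80 and 95 percent) and the response assigned to each level are assumptions that each organization would configure for itself.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFORMATION = 1
    WARNING = 2
    FATAL = 3


def classify_pool_usage(used_percent, warning_at=80.0, fatal_at=95.0):
    """Classify storage pool usage; thresholds are illustrative defaults."""
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent >= fatal_at:
        return Severity.FATAL
    if used_percent >= warning_at:
        return Severity.WARNING
    return Severity.INFORMATION


# Hypothetical responses per severity level.
ACTIONS = {
    Severity.INFORMATION: "log only",
    Severity.WARNING: "notify administrator",
    Severity.FATAL: "open incident ticket",
}


def respond(severity):
    """Look up the configured response for a severity level."""
    return ACTIONS[severity]


for usage in (50, 85, 97):
    level = classify_pool_usage(usage)
    print(usage, level.name, respond(level))
```

Using an ordered enumeration makes it straightforward to filter or sort alerts so that fatal alerts are not lost among informational messages.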
3) Storage Reporting
Like alerting, reporting is also associated with monitoring. Reporting on a storage infrastructure involves keeping track of and gathering information from the various components and processes that are monitored. The gathered information is compiled to generate reports for trend analysis, capacity planning, chargeback, performance, and security breaches.
- Capacity planning reports contain current and historic information about the utilization of storage, file systems, database tablespace, ports, etc.
- Configuration and asset management reports include details about device allocation, local or remote replicas, and fabric configuration. These reports also list all the equipment, with details such as purchase date, lease status, and maintenance records.
- Chargeback reports contain information about the allocation or utilization of storage infrastructure resources by various users or user groups.
- Performance reports provide current and historical information about the performance of various storage infrastructure components and services as well as their compliance with agreed service levels.
- Security breach reports provide details on security violations, the duration of each breach, and its impact.
Reports are commonly displayed on a digital dashboard, which provides real-time tabular or graphical views of the gathered information. Dashboard reporting helps administrators to make prompt and informed decisions on resource procurement, plans for modifications to the existing infrastructure, policy enforcement, and improvements in management processes.
Chargeback is the ability to measure storage resource consumption per business unit or user group and charge them back accordingly. It aligns the cost of deployed storage services with the organization’s business goals, such as recovering costs, making a profit, justifying new capital spending, influencing consumption behavior of the business units, and making IT more service-aware, cost-conscious, and accountable.
To perform chargeback, the storage usage data is collected by a billing system that generates a chargeback report for each business unit or user group. The billing system is responsible for accurately measuring the number of units of storage used and reporting the cost or charge for the consumed units.
Chargeback reports can be extended to include a pre-established cost of other resources, such as the number of switch ports, HBAs and storage system ports, and service level requested by the users. Chargeback reports enable metering of storage services, providing transparency for both the provider and the consumer of the utilized services.
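The basic chargeback arithmetic can be sketched as units consumed multiplied by a pre-established rate, optionally extended with a per-port cost. The business unit names, usage figures, and rates below are made up for illustration, and real billing systems typically apply more detailed pricing, for example by service level.

```python
def chargeback_report(usage_gb, rate_per_gb, ports=None, rate_per_port=0.0):
    """Return the charge per business unit.

    usage_gb: mapping of unit name to storage consumed in GB.
    ports: optional mapping of unit name to number of ports used.
    Rates are hypothetical pre-established costs per GB and per port.
    """
    ports = ports or {}
    report = {}
    for unit, gb in usage_gb.items():
        storage_cost = gb * rate_per_gb
        port_cost = ports.get(unit, 0) * rate_per_port
        # Round to two decimals, as for a currency amount.
        report[unit] = round(storage_cost + port_cost, 2)
    return report


# Illustrative usage data and rates only.
usage = {"finance": 1200, "engineering": 3500}
storage_only = chargeback_report(usage, rate_per_gb=0.05)
with_ports = chargeback_report(
    usage, rate_per_gb=0.05, ports={"finance": 4, "engineering": 8}, rate_per_port=10.0
)
print(storage_only)
print(with_ports)
```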