The transition to cloud-based applications appears unstoppable. As the number of cloud users grows day by day due to various economic and performance benefits, data centers are continuing to grow in terms of hardware resources, virtual resources, and traffic volume, making cloud operation and management more complicated and heterogeneous.
This creates several crucial challenges in cloud security and monitoring that involve reviewing, controlling, and managing the operational workflow and processes within a cloud-based IT asset or infrastructure.
Cloud monitoring is a key tool for controlling and managing the three cloud computing layers: infrastructure, platform, and application. It collects information from different probes, aggregates related information, filters out unrelated or unwanted data, and finally analyzes or evaluates the performance of the cloud. It can also take control actions to improve that performance.
Cloud monitoring is essential for maintaining high system availability and performance, and it matters to both providers and consumers. Primarily, monitoring is key for:
- managing software and hardware resources
- providing continuous information for those resources as well as for consumer-hosted applications on the cloud.
For effective and smooth system operations, cloud activities such as resource planning, resource management, data center management, SLA management, billing, troubleshooting, performance management, and security management all require monitoring. Because cloud resources are elastic and can be provisioned or released on demand, the need for continuous monitoring is even stronger.
An efficient cloud monitoring strategy has five phases:
- Data collection – Gather different types of information or metrics from the different cloud components to identify issues such as single points of failure, performance degradation, replication problems, and fault-tolerance gaps.
- Data filtering – Use filtering methods such as data preprocessing, data deduplication, data compression, and dimension reduction to remove redundant, invalid, conflicting, and irrelevant data from the collected information and deliver more relevant data.
- Data aggregation – Summarize the collected information for statistical analysis at regular intervals, chosen according to the requirements of the application.
- Data analysis – Inspect, transform, and model the aggregated data to discover information that supports decision making and improves performance by identifying present resource status, predicting future status, and detecting critical and abnormal conditions.
- Alert and reporting – Use the results of the analysis to generate a complete report, in graphical or descriptive form, of the status of the cloud at a particular point in time. Notifications are sent to the admin by email or by raising an alarm.
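The five phases above can be sketched as a small pipeline. This is only an illustrative sketch: the `Sample` type, the function names, the 60-second window, and the 90% threshold are all assumptions made for the example, and the collection phase is represented by whatever list of samples the caller passes in rather than by real probes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """One collected reading (hypothetical shape, for illustration only)."""
    resource: str    # e.g. a VM or service name
    metric: str      # e.g. "cpu_percent"
    timestamp: int   # seconds
    value: float


def filter_samples(samples):
    """Data filtering: drop out-of-range values and exact duplicates."""
    seen = set()
    kept = []
    for s in samples:
        if not (0.0 <= s.value <= 100.0):  # invalid for a percentage metric
            continue
        if s in seen:                      # redundant reading
            continue
        seen.add(s)
        kept.append(s)
    return kept


def aggregate(samples, window=60):
    """Data aggregation: mean value per (resource, metric, window start)."""
    buckets = {}
    for s in samples:
        key = (s.resource, s.metric, s.timestamp - s.timestamp % window)
        buckets.setdefault(key, []).append(s.value)
    return {key: mean(values) for key, values in buckets.items()}


def analyze(aggregates, threshold=90.0):
    """Data analysis: flag windows whose mean exceeds the threshold."""
    return [(key, avg) for key, avg in sorted(aggregates.items()) if avg > threshold]


def report(findings):
    """Alert and reporting: render one alert line per finding."""
    return [
        f"ALERT {resource}/{metric} window={start}: mean={avg:.1f}"
        for (resource, metric, start), avg in findings
    ]


def run_pipeline(collected):
    """Run filtering, aggregation, analysis, and reporting on collected samples."""
    return report(analyze(aggregate(filter_samples(collected))))
```

In a real deployment the `collected` list would come from probes on each cloud component, and `report` would hand its output to an email or alarm channel instead of returning strings.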
The key characteristics of a good cloud monitoring system include accuracy, adaptability, autonomy, availability, customizability, elasticity, extensibility, non-intrusiveness, scalability, resilience, reliability, portability, and multi-tenancy.
Cloud monitoring tools face major challenges because of the architectural complexity, the computational and network workload, the volume of monitoring parameters, and the changing computing environment of cloud infrastructure. Let’s discuss some of the key challenges and future directions in cloud monitoring strategy.
1. Cloud dependency
Many public cloud providers now offer monitoring tools to their customers so they can keep track of their applications’ CPU, storage, and network usage. These tools are frequently tightly integrated with the cloud provider’s own services. For instance, Amazon’s CloudWatch is a monitoring tool that allows users to manage and monitor their applications hosted on AWS EC2 compute services. However, it cannot monitor an application component hosted on the infrastructure of other cloud providers, such as GoGrid and Azure.
2. Cloud agnostic
In contrast to single-cloud monitoring, engineering cloud-agnostic monitoring tools is challenging. This is primarily because there is no common, unified application programming interface (API) for retrieving run-time QoS statistics from cloud computing services. Recent cloud programming APIs, including Simple Cloud, Delta Cloud, JCloud, and Dasein Cloud, simplify interaction with services (CPU, storage, and network) that may belong to multiple clouds, but they have limited or no ability to monitor run-time QoS statistics and application behavior.
Cloud-agnostic monitoring tools therefore need to retrieve QoS data from services and applications across multiple clouds themselves. They are also required if a hybrid cloud architecture with services from private and public clouds is to be realized. Monitis is a monitoring tool that gives you access to various clouds, including Amazon EC2, Rackspace, and GoGrid. It uses a widget concept that allows customers to view multiple widgets on a single page. To access monitoring data for cloud applications running on different providers’ infrastructure, consumers only need to provide their cloud account credentials in Monitis and choose which instances to keep an eye on. A customer can then see, for example, two different cloud instances in two different widgets on a single page.
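One common way to approach the missing unified API is an adapter layer that normalizes each provider's metrics behind a single interface. The sketch below illustrates that idea only. `MetricSource`, `ProviderAStub`, `ProviderBStub`, and `dashboard` are hypothetical names, and the stubs return canned readings instead of calling any real provider API.

```python
from abc import ABC, abstractmethod


class MetricSource(ABC):
    """Hypothetical provider-neutral interface; not a real cloud API."""

    @abstractmethod
    def cpu_percent(self, instance_id: str) -> float:
        """Return CPU utilization of an instance as a 0-100 percentage."""


class ProviderAStub(MetricSource):
    """Stand-in for a provider that already reports CPU as a percentage."""

    def __init__(self, readings):
        self._readings = readings

    def cpu_percent(self, instance_id):
        return self._readings[instance_id]


class ProviderBStub(MetricSource):
    """Stand-in for a provider that reports CPU as a 0-1 fraction."""

    def __init__(self, readings):
        self._readings = readings

    def cpu_percent(self, instance_id):
        # Normalize the provider-specific unit to the common one.
        return self._readings[instance_id] * 100.0


def dashboard(widgets):
    """Build a single view from several clouds.

    `widgets` maps a widget label to a (source, instance_id) pair.
    """
    return {
        label: source.cpu_percent(instance_id)
        for label, (source, instance_id) in widgets.items()
    }
```

Each adapter hides its provider's units and naming, so the code that builds the single-page view never needs to know which cloud a reading came from.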
3. Cross-layer monitoring
Consider a multimedia streaming application whose components (streaming server, web server, indexing server, compute service, storage service, and network) are distributed across cloud layers, including PaaS and IaaS. To ensure that the QoS targets for the application as a whole are met, QoS parameters must be monitored across all of these layers. Developing monitoring tools that can capture and reason about the QoS parameters of application components across the IaaS and PaaS layers is challenging.
Cloud services are distributed among three layers, namely SaaS, PaaS, and IaaS, but monitoring tools have traditionally been designed to monitor services in only one of them. Most present-day commercial tools keep track of the performance of resources provisioned at the IaaS layer. CloudWatch, for example, is unable to track information about the load, availability, and throughput of each core of CPU services and their impact on the QoS (latency, availability, and so on) provided by hosted PaaS services (e.g., a J2EE application server). As a result, there is a significant gap in tools that can monitor QoS statistics across multiple layers of the cloud stack, and closing it poses significant research challenges.
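To make the cross-layer idea concrete, the sketch below joins IaaS-level CPU readings with PaaS-level latency readings by timestamp and labels each latency violation by whether the underlying CPU was saturated at that moment. It is a simplified illustration: the function names, the 200 ms latency target, and the 90% saturation level are assumptions, and a real tool would have to reason over many more parameters.

```python
def join_layers(iaas_cpu, paas_latency_ms):
    """Pair readings from both layers that share a timestamp.

    Both arguments map a timestamp to a reading from one layer.
    """
    shared = sorted(iaas_cpu.keys() & paas_latency_ms.keys())
    return [(t, iaas_cpu[t], paas_latency_ms[t]) for t in shared]


def explain_violations(rows, latency_target_ms=200.0, cpu_saturation=90.0):
    """Label each PaaS latency violation using the IaaS reading beside it."""
    findings = []
    for t, cpu, latency in rows:
        if latency > latency_target_ms:
            if cpu >= cpu_saturation:
                cause = "cpu saturation"
            else:
                cause = "not cpu-bound"
            findings.append((t, cause))
    return findings
```

A tool that watches only one layer would see either the high latency or the saturated CPU, but not the link between them.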