Fig 4: CPU Temperature Data Output

Source publication
Conference Paper
Full-text available
Server clustering is a common design principle employed by many organisations that require high availability, scalability and easier management of their infrastructure. Servers are typically clustered according to the service they provide, whether that be the application(s) installed, the role of the server or server accessibility, for example. In order...

Context in source publication

Context 1
... Instructions on how to set up the C2MS are included in the C2MS downloadable file [1]. When an administrator creates a cloudlet via the C2MS interface, the server and cloudlet names specified are recorded in a file named clusters within the /etc/ganglia/ folder; this file contains a list of cloudlets and their member servers. The C2MS interface then displays this grouping by reading and parsing the clusters file. At this point, however, Ganglia will not display cloudlet-based monitoring data, as it is unaware that any cloudlets exist.

To enable cloudlet-based monitoring, Ganglia requires each Ganglia cluster to have a directory in /var/lib/ganglia/rrds/, named after the cloudlet, containing one directory per remote server; these directories hold the monitoring data (.rrd files) for that server. The stored .rrd files are created by RRDtool (Round Robin Database tool), an open-source tool for data logging and for graphing historical data between specified times. Upon cloudlet creation, the C2MS creates the appropriate cloudlet folder within the /var/lib/ganglia/rrds/ directory and links to the original .rrd files of each server within the Initial folder. We therefore do not need to replicate any data, which would otherwise introduce overheads; a sketch of this step is shown below.
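The following is a minimal sketch of this linking step, assuming a hypothetical cloudlet named 'MySQL' with members server1 and server2; the actual C2MS code may differ in detail.

<?php
// Create the cloudlet directory and link each member's existing .rrd
// directory from the Initial folder; no monitoring data is copied.
// The cloudlet name and hostnames below are illustrative only.
$rrdRoot  = '/var/lib/ganglia/rrds';
$cloudlet = 'MySQL';
$members  = array('server1', 'server2');   // parsed from /etc/ganglia/clusters

mkdir("$rrdRoot/$cloudlet", 0755);
foreach ($members as $host) {
    // Ganglia now sees a "new cluster" whose hosts share the original .rrd files.
    symlink("$rrdRoot/Initial/$host", "$rrdRoot/$cloudlet/$host");
}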
Hence, with the creation of a new cloudlet, Ganglia is led to believe that it has received monitoring data from a new cluster containing the servers listed in the /var/lib/ganglia/rrds/<cloudlet name> directory. When a cloudlet is created or deleted, or a server's membership changes from one cloudlet to another, the C2MS only needs to modify configuration files belonging to our tool and none related to the operation of Ganglia; this allows us to avoid restarting the Ganglia daemons. These configuration changes are hidden from the administrator and are performed automatically by the C2MS interface.

The information we are interested in displaying to the administrator is the entire state of multiple cloudlets via summary graphs per Ganglia metric. Each page displaying monitoring data for a cloudlet allows users either to view a summary of the current cloudlet state or to select individual servers and examine their resource usage in more detail. Figure 3 shows both of these features, which are inherited from Ganglia. First, we see that four servers exist within the 'MySQL' cloudlet, evident both from the number of 'hosts up' and the total number of CPUs. The graphs shown are specific to the 'MySQL' cloudlet, with colours distinguishing the individual servers present in the cloudlet. To create cloudlet summary graphs, data aggregation is used; this is apparent in the graphs above, where data from one cloud server is stacked upon another, in turn displaying the total resource use for the selected cloudlet. Different cloudlets can be selected on the 'Overview' page of Figure 2. The depicted graphs change automatically when servers are added to or removed from the cloudlet. To create aggregated graphs dynamically, Ganglia calls the file /var/www/ganglia-web/stacked.php when the page is viewed; this is a default file of the Ganglia implementation. It has been modified to create stacked graphs only for the servers present in a cloudlet, rather than for the entire system as regular Ganglia would; the same has been applied to the counts of 'hosts up', 'hosts down' and 'CPUs Total'. The PHP file returns PNG files of the created graphs, which are displayed via the Ganglia interface.

Graph data aggregation can be achieved easily with RRDtool. We implement this through PHP calls to RRDtool, but it is most easily explained via rrdtool's graph function, sketched below. First we define variables, one for each of the server .rrd files to be aggregated (e.g. one and two). The data sum is then plotted using average values. We use the AREA shape to plot the variable values in different colours, in the form of a STACK where one dataset is placed on top of another. We also pass a start and end time specified by the administrator, allowing historical cloudlet monitoring data to be accessed. Other arguments relating to the appearance of graphs, such as width, height and labels, are omitted here for clarity.
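A minimal sketch of such a call follows; the metric (cpu_user), file paths, colours, the 'sum' data-source name and the shell_exec wrapper are illustrative assumptions rather than the exact C2MS code.

<?php
// Sketch of an aggregated rrdtool graph for the 'MySQL' cloudlet. Paths,
// metric and colours are illustrative; Ganglia is assumed to store each
// metric in an .rrd data source named 'sum'.
$dir   = '/var/lib/ganglia/rrds/MySQL';
$start = '-1h';    // start time chosen by the administrator
$end   = 'now';    // end time chosen by the administrator

$cmd = "rrdtool graph /tmp/MySQL_cpu.png --start $start --end $end "
     // One DEF variable per server's .rrd file, consolidated with AVERAGE.
     . "DEF:one=$dir/server1/cpu_user.rrd:sum:AVERAGE "
     . "DEF:two=$dir/server2/cpu_user.rrd:sum:AVERAGE "
     // AREA plots each variable; STACK places 'two' on top of 'one', so the
     // top edge of the stacked area shows the cloudlet total.
     . "AREA:one#3366CC:server1 "
     . "AREA:two#DC3912:server2:STACK";
shell_exec($cmd);  // writes the PNG that is returned to the Ganglia interface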
The C2MS not only measures basic resource usage such as CPU and memory; by installing additional modules, one can also monitor power consumption and temperature. Monitoring temperature requires a server's CPU(s) to possess built-in temperature monitoring capabilities, such as those found in Intel Core based processors and others [13]. The C2MS collects temperature data by adding a monitoring module to the gmond daemon of every cloud server, which periodically polls the CPU's Digital Thermal Sensor. This data is then available to the gmetad daemon, which in turn can display it. As before, this information can be aggregated to show data for single cloudlets or for single servers. Figure 4 shows one other method we use for displaying this data, where servers are presented as a heat map in rack format.

Similar to temperature monitoring, power observation requires appropriate power monitoring hardware or a Power Distribution Unit (PDU). The data recorded by the PDU is periodically queried and stored in RRD files following the Ganglia RRD structure. These are then exported to graphs and added to the Ganglia interface for viewing. As the purpose of our tool is to dynamically monitor sets of cloud servers, we also allow power consumption to be monitored per cloudlet via graph aggregation. To distinguish power usage for servers connected to the same PDU, the administrator must identify each PDU and its connected servers in a file accessed by the C2MS. These details include the server name and MAC address, as well as the PDU identifier and the outlet the server is connected to. The tool therefore allows per-server or per-cloudlet power usage monitoring.

Our final contribution incorporates a server management component into the C2MS. Administrators are not only able to control individual servers but can issue specified instructions over cloudlets or individual servers. The instructions may be input manually or selected from a list of popular commands, as shown in Figure 5, where the MySQL cloudlet is selected. In order to introduce this functionality, we investigated a number of popular tools to determine whether they satisfied our requirements for use within the C2MS. Such a tool must: 1) allow the grouping of servers and concurrent command execution upon these groups; 2) not require the installation of software on remote servers within cloudlets; 3) be easy to integrate into the C2MS.

The tools we investigated were Webmin, Capistrano and cexec. Firstly, Webmin is a browser-based system administration tool for Unix [5]. Webmin allows the grouping of servers into cloudlets and the execution of commands per cloudlet. However, Webmin requires the installation of software on remote servers, and integrating Webmin into the C2MS would not be simple, as Webmin itself would have to be modified. For example, the creation of a cloudlet via the C2MS interface would have to be reflected in the Webmin service to avoid administrators having to create a cloudlet twice, once in the monitoring component and once in the control component.

Secondly, Capistrano is an open-source tool for running scripts and commands concurrently over multiple servers. It allows the grouping of servers simply by specifying these groups within its configuration capfile; this file can therefore be easily accessed and modified by the C2MS. Furthermore, Capistrano does not require any installation of software on remote servers, meaning it satisfies all requirements. Capistrano does, however, use SSH and assumes that SSH keys are exchanged between the central and remote servers to allow password-less login for commands to execute [7]. This is in contrast to Webmin, which uses the software installed on remote servers to create tunnels over which instructions are sent.

Finally, cexec is a cluster tool that simply executes commands over multiple servers concurrently [6]. To execute a command over a set of servers, cexec requires that a configuration file exists listing the hostname or IP address of the servers in a cloudlet alongside the cloudlet name; multiple cloudlets can exist, allowing the administrator to specify the cloudlet over which to execute the command. Like Capistrano, we can generate this file automatically by entering the hostnames of the cloudlet members, taken from the /etc/ganglia/clusters file, into cexec's configuration file, making integration into the C2MS easy; a sketch of this generation step is shown below. Furthermore, cexec does not require any software installation on target machines, as it uses SSH and assumes SSH keys are exchanged between servers. The C2MS currently uses cexec as its control component, based on the simplicity of the tool as well as its performance, as explored in the following section.
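The sketch below illustrates this generation step. Both file formats are assumptions made for illustration: the clusters file is taken to hold lines of the form "cloudlet: host1 host2 ...", and the output mimics a C3-style "cluster name { ... }" layout; consult the cexec documentation for the exact syntax it expects.

<?php
// Sketch: regenerate a cexec-style configuration from the C2MS clusters file.
// The clusters-file line format and the output layout are illustrative.
$out   = '';
$lines = file('/etc/ganglia/clusters', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    if (strpos($line, ':') === false) { continue; }   // skip malformed lines
    list($cloudlet, $hosts) = explode(':', $line, 2);
    $out .= 'cluster ' . trim($cloudlet) . " {\n";
    foreach (preg_split('/\s+/', trim($hosts)) as $host) {
        $out .= "    $host\n";                        // one member server per line
    }
    $out .= "}\n";
}
file_put_contents('/etc/c3.conf', $out);              // output path assumed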
V. EVALUATION

Ganglia is commonly used in the HPC and Grid communities, where clusters, like cloud infrastructures, typically contain a large number of servers. We now investigate how effectively the C2MS can monitor such systems by determining whether our implementation introduces any overhead beyond that already introduced by regular Ganglia. We then determine the optimal method of server management and whether the C2MS can execute administrator commands over a large number of machines quickly. We perform these experiments on Amazon EC2, with the Ganglia/C2MS interface running on a Large Ubuntu 12.04 instance and the remote servers running on Micro instances of the same type. Ganglia is well known for its scalable implementation, hence the modifications we have made must also cope with an increase in the number of servers. We use at most 130 servers, the maximum number of instances we could instantiate on Amazon EC2. First we test if our method of graph aggregation and ...

Similar publications

Article
Full-text available
Inspired by the success of BESDIRAC, the distributed computing environment based on DIRAC for the BESIII experiment, several other experiments operated by the Institute of High Energy Physics (IHEP), such as the Circular Electron Positron Collider (CEPC), the Jiangmen Underground Neutrino Observatory (JUNO), the Large High Altitude Air Shower Observatory (LHAASO) and...

Citations

... The system also supported alerting via Grafana. Another system, C2MS: Dynamic monitoring and management of cloud infrastructures [8], proposed by Gary McGilvary et al., uses another open-source performance monitoring tool called Ganglia. Ganglia allows monitoring components to be added to the infrastructure without a restart. ...
Conference Paper
Full-text available
Predictive analysis of time-series data is of tremendous importance to any organization. It is used in many scenarios like capacity and resource planning, especially in cloud-based infrastructure where resources have a cost. Predictive analysis of resources such as disks allows users or administrators to optimize costs by predicting future values to manage the resources better. The proposed paper describes each stage of the predictive analysis of disk space usage, from data generation using open-source tools, to extracting data from these tools, to preparing a model that predicts future values for a Kubernetes-based application.
Keywords: predictive monitoring, SARIMA, cloud, disk analysis.
... • Availability: Availability is the capability to provide monitoring services, based on the cloud monitoring system design, whenever customers, SPs and InPs request such services. Cloud monitoring systems must provide availability because it is essential to guaranteeing crucial aspects of cloud computing such as the fulfilment of SLAs, on-demand self-service and high availability (MCGILVARY et al., 2013) (HADLEY et al., 2015). ...
... Bellavista [2] presents a novel framework for Easy Monitoring and Management of IaaS (EMMI) solutions. McGilvary et al. [6] propose the Cloudlet Control and Management System (C2MS), a system for monitoring and controlling dynamic groups of physical or virtual servers within cloud infrastructures. This system allows administrators to monitor group and individual server metrics on large-scale dynamic cloud infrastructures where the roles of servers may change frequently. ...
Conference Paper
Full-text available
CloudStack is an open source IaaS cloud that provides compute, network and storage services to users. Efficient management of the available resources in the cloud is required in order to improve resource utilization and offer predictable performance to customers. To facilitate better quality of service, high availability and good performance, a comprehensive, reliable, centralized and accurate monitoring system is required. For this, data needs to be collected from the components of CloudStack and analyzed in an efficient manner. In this paper, we present a detailed list of attributes required for monitoring the infrastructure associated with CloudStack. We identify the processes related to the compute services and their associated parameters that need to be monitored. We categorize infrastructure monitoring and list the associated monitoring parameters. Further, the proposed list is applied to three monitoring tools commonly used for monitoring resources and processes associated with CloudStack. Developers and system administrators can benefit from this list when selecting monitoring software for their systems. The list is also useful during the development of new monitoring software for CloudStack, as the functionality to be monitored can be selected from the list.