Towards AIOps enabled services in continuously evolving software-intensive embedded systems

June 2023
Journal of Software: Evolution and Process

DOI:10.1002/smr.2592

Authors:

Anas Dakkak

Ericsson

Jan Bosch

Chalmers University of Technology

Helena Holmstrom Olsson

Malmö University

Continuous deployment has been practiced for many years by companies developing web‐ and cloud‐based applications. To succeed with continuous deployment, these companies have a strong collaboration culture between the operations and development teams. In addition, these companies use AI, analytics, and big data to assist with time‐consuming postdeployment activities such as continuous monitoring and fault identification. Thus, the term AIOps has evolved to highlight the importance and difficulty of maintaining highly available applications in a complex and dynamic environment. In contrast, software‐intensive embedded systems often provide customer product‐related services, such as maintenance, optimization, and support. These services are critical for these companies as they provide significant revenue and increase customer satisfaction. Therefore, the objective of our study is to gain an in‐depth understanding of the impact of continuous deployment on product‐related services provided by software‐intensive embedded systems companies. In addition, we aim to understand how AIOps can support continuous deployment in the context of software‐intensive embedded systems. To address this objective, we conducted a case study at a large and multinational telecommunications systems provider focusing on the radio access network (RAN) systems for 4G and 5G networks. The company provides RAN products and three complementing services: rollout, optimization, and customer support. The results from the case study show that the boundaries between product‐related services become blurry with continuous deployment. In addition, product‐related services, which were conducted in sequence by independent projects, converge with continuous deployment and become part of the same project. Further, AIOps platforms play an important role in reducing costs and increasing postdeployment activities' efficiency and speed. 
These results show that continuous deployment has a profound impact on the software‐intensive system's provider service organization. The service organization becomes the connection between the R&D organization and the customer. In order to cope with the increased speed of releases, deployment and postdeployment activities need to be largely automated. AIOps platforms are seen as a critical enabler in managing the increasing complexity without increasing human involvement.


Towards AIOps enabled services in continuously evolving software-intensive embedded systems

Anas Dakkak (a,∗), Jan Bosch (b), Helena Holmstrom Olsson (c)

(a) Ericsson AB, Torshamnsgatan 21, Stockholm, 164 83, Sweden

(b) Department of Computer Science and Engineering, Chalmers University of Technology, Chalmersplatsen 1, Gothenburg, 412 96, Sweden

(c) Department of Computer Science and Media Technology, Malmo University, Nordenskioldsgatan, Malmo, 211 19, Sweden

Abstract

Context: Continuous deployment has been practiced for many years by companies developing web and cloud-based applications. To succeed with continuous deployment, these companies have a strong collaboration culture between the operations and development teams. In addition, these companies use AI, analytics, and big data to assist with time-consuming post-deployment activities such as continuous monitoring and fault identification. Thus, the term AIOps has evolved to highlight the importance and difficulty of maintaining highly available applications in a complex and dynamic environment. In contrast, suppliers of software-intensive embedded systems often provide product-related services to their customers, such as maintenance, optimization, and support. These services are critical for these companies as they provide significant revenue and increase customer satisfaction.

Objectives: The objective of our study is to gain an in-depth understanding of the impact of continuous deployment on product-

related services provided by software-intensive embedded systems companies. In addition, we aim to understand how AIOps can

support continuous deployment in the context of software-intensive embedded systems.

Method: We conducted a case study at a large and multinational telecommunications systems provider focusing on the radio access

network (RAN) systems for 4G and 5G networks. The company provides RAN products and three complementing services: rollout,

optimization, and customer support.

Results: With continuous deployment, the boundaries between product-related services become blurry. Product-related services, which were conducted in sequence by independent projects, converge with continuous deployment and become part of the same project. In addition, AIOps platforms play an important role in reducing costs and increasing the efficiency and speed of post-deployment activities.

Conclusion: Continuous deployment has a profound impact on the service organization of the software-intensive system provider. The service organization becomes the connection between the R&D organization and the customer. To cope with the increased speed of releases, deployment and post-deployment activities need to be largely automated. AIOps platforms are seen as a critical enabler in managing the increasing complexity without increasing human involvement.

Keywords: Software-Intensive Embedded Systems, AIOps, Continuous Deployment, Product Service Systems

1. Introduction

Software-intensive embedded systems have changed our so-

cieties. Since their advent, these systems have become integral

to our lives. We use them daily and ﬁnd them almost every-

where around us. We ﬁnd them in the skies, such as airplanes,

on the roads, such as cars, or hidden in cabinets, such as mo-

bile network equipment [1,2]. These systems have been his-

torically mechanical and electrical driven; however, in the age

where software is eating the world [3], they have become in-

creasingly software-driven.

Therefore, traditional product manufacturing companies are transitioning from producers of mechanical and electrical systems to software development companies, producing software-intensive embedded systems instead [4]. A software-intensive embedded system implies that the software is the critical component of the system and shapes its functionality. Consequently, these companies have been advancing their software development process by embracing continuous practices [5,6]. The ultimate goal is to frequently satisfy customers with new functionality by utilizing continuous deployment.

∗Corresponding author

Email addresses: anas.dakkak@ericsson.com (Anas Dakkak), jan.bosch@chalmers.se (Jan Bosch), helena.holmstrom.olsson@mau.se (Helena Holmstrom Olsson)

Providing new and improved functionality to customers con-

tinuously via software upgrades is a phenomenon that has been

around for a while in the software industry. It has been applied

in web and cloud-based applications for years and has become

the norm. For example, companies like Facebook deploy hun-

dreds of software changes daily to their production environment

[7]. In order to facilitate continuous deployment, these compa-

nies had to undergo a signiﬁcant change in various areas, not

least how the organization operates, people skills, and ways of

working [8,9].

Preprint submitted to Journal of Software: Evolution and Process February 15, 2024

Authors' version

Two notable factors stand out in how these companies work

to ensure high service availability while simultaneously deploy-

ing new software versions continuously. First, the development

and operation teams work closely to ensure code changes are

safely deployed to production [10,11]. Hence, the term opera-

tion is often used in conjunction with development as in DevOps

to highlight the importance of collaboration between the ones

doing the development of the software and the ones performing

the operations of the software [12].

Second, these companies use intelligent platforms assist-

ing with the tedious and time-consuming post-deployment ac-

tivities such as continuous monitoring, fault isolation, and

root cause analysis [13]. For example, Facebook uses a sys-

tem called Scuba for continuous monitoring, troubleshooting

problems as they happen, trend analysis, and pattern mining

[14,15]. Therefore, acknowledging the importance and diﬃ-

culty of maintaining highly available applications in a complex

and dynamic environment, the term AIOps was coined by Gart-

ner in 2016 [16]. AIOps advocates for using AI, analytics, and

big data to perform activities such as fault isolation, monitoring,

and anomaly detection [17,16].

In the same fashion, continuous deployment in software-

intensive embedded systems requires close collaboration be-

tween the customer and the system’s supplier [18,19]. How-

ever, unlike web and cloud-based applications deployed in

data centers or dedicated servers that users access remotely,

software-intensive embedded systems are physical products op-

erated and used by customers. These systems contain many

coupled components and increasingly complex software which

should work together to meet high reliability and performance

requirements [20]. In addition, software-intensive embedded

systems are often highly conﬁgurable to meet diﬀerent cus-

tomer customization needs [21,22].

Therefore, suppliers of software-intensive embedded sys-

tems often provide several product-related services, such as

deployment, maintenance, and optimization. Product-related

services are critical to support customers and ensure their

satisfaction during the system’s lifetime [23,24]. In addi-

tion, product-related services provide considerable revenue to

software-intensive embedded systems companies, often with

a higher margin than product sales [25,26]. Thus, to de-

liver product-related services, companies often have a dedicated

service organization that interfaces with customers and works

closely with them [27].

To succeed with continuous deployment, all of the supplier’s

organizational functions need to work in shorter cycles [5].

Thus, continuous deployment demands greater alignment be-

tween the R&D organization and other organizational functions

[28]. However, while many studies have investigated diﬀerent

aspects of continuous deployment in software-intensive embed-

ded systems, these studies focused primarily on the R&D orga-

nization [13]. To the authors’ knowledge, no previous empirical

study addressed the relationship between continuous deploy-

ment and the service organization in software-intensive embed-

ded systems companies. Therefore, this study has three objec-

tives:

• First, to address the gap identified in the literature, this study investigates the relationship between continuous deployment and product-related services. In our earlier study [29], we focused on customer support service and its role in continuous deployment. In this study, we take a more holistic view by addressing all product-related services.

• Second, to explore how companies producing software-intensive embedded systems utilize AIOps to support continuous deployment.

• Third, to identify the challenges faced by software-intensive embedded systems companies when using AIOps to support continuous deployment.

To address these objectives, we conducted a case study at

one of the largest telecommunications systems suppliers in the

world. The company produces telecommunications systems

consisting of dedicated hardware and complex software. In ad-

dition, it provides several product-related services to support its

customers.

The contribution of this paper is threefold. First, it provides critical insights, not previously addressed in the literature, into the relationship between continuous deployment and product-related services in a software-intensive embedded systems context. Second, it explores how AIOps can be used to support continuous deployment and the challenges associated with using AIOps in software-intensive embedded systems. This perspective allows us to examine how AIOps can support the evolution of services with continuous deployment. To the authors’ knowledge, no previous study has investigated AIOps applications and challenges when supporting continuous deployment in software-intensive embedded systems. Third, the case study company has practiced continuous deployment for several years with large-scale and complex software-intensive embedded system products. Thus, the findings of this paper are derived from the real-life case of a company performing continuous deployment.

The remainder of the paper is organized as follows. Section 2 provides an overview and a literature review of software-intensive product-related services, AIOps, and continuous deployment. Section 3 details the research method used in this study, including data collection, analysis, and threats to validity. Section 4 provides a background about the case study company. Sections 5, 6, and 7 present the empirical findings for each of the three research questions, respectively. Section 8 discusses these findings. Finally, Section 9 summarizes and concludes the paper.

2. Background

This section provides a literature review of services and soft-

ware evolution in product-oriented companies. In addition, it

highlights the role and classiﬁcation of services as discussed in

the literature. Also, the section provides a literature review of

AIOps and continuous deployment.

2.1. Software and services evolution in product-oriented com-

panies

Product-oriented companies have been subject to two dra-

matic transformations during the last decades. The ﬁrst is the

rise of the embedded software from a complementary compo-

nent to the electronics and mechanics of the product to become

the central part shaping the product’s function. Consequently,

the embedded software has been proliferating in size and com-

plexity [30,31]. For example, modern cars are now running

on code, and the complexity of embedded software in a ﬁghter

jet is beyond humans’ cognitive ability [32,33]. As a result,

the notion of software-intensive embedded systems has evolved to highlight that these products constitute multiple connected components (systems) and to stress the importance of the embedded software in shaping the product’s function (embedded software-intensive) [34]. As a consequence, software-intensive embedded systems companies started to follow the same path as web and cloud-based software development companies, adopting agile software development practices such as continuous integration and delivery [35]. Once agile software development capabilities and continuous integration are established internally, software-intensive embedded systems companies start transitioning to continuous deployment, where the software is frequently and rapidly deployed to customers’ systems [5,6].

The second is the elevation of product-related services. During the 1990s, product-oriented companies faced several challenges, such as the commoditization of products and reduced profit, which led these companies to look for sources of revenue other than product sales [36]. As a result, product-oriented companies looked at services as an opportunity to substitute for declining product sales and to increase their competitiveness. Since then, several terms have emerged to highlight the importance of services with products, such as “product services” and “after-sales services”. Later, the term Product Service System (PSS) came into common use to indicate the integrated nature of products and services. According to Goedkoop et al. [37], a PSS is a marketable set of products and services capable of jointly fulfilling a user’s need. It is provided by either a single company or an alliance of companies. It can comprise one or more products plus additional services, or a service plus an additional product, and product and service can be equally important for fulfilling the function.

There are three categories of PSS [38,26]. First, product-oriented PSS, where product ownership is transferred to the customer while the vendor provides additional services, either free during a specific period, by contract, or based on a service transaction. Second, user-oriented PSS, where the vendor owns the product and performs additional services on it but sells its function to the customer, for example, by leasing. Third, result-oriented PSS, where the vendor owns and services the product but sells the product’s capability, or result, to the customer, who pays only per usage.

2.1.1. The role of services in software-intensive embedded sys-

tems companies

The role of services has attracted researchers from multiple disciplines, such as business management, industrial engineering, and information and communications technology (ICT) [39]. Business management researchers were early in

describing the movement of product-oriented companies to-

wards services, covering topics such as oﬀering development

strategies, business models, and criteria for the evolution to a

service-based business [40,41]. Researchers from industrial

engineering have investigated development methods for design-

ing products and services among other topics [42,43], while

ICT researchers have recently joined, focusing on the chang-

ing ecosystem as PSS become more intelligent and connected

[44,45].

Researchers across disciplines agree that services play a sig-

niﬁcant role in companies developing software-intensive sys-

tems. From a business perspective, services provide consider-

able revenue with an often higher margin than product sales.

Szwejczewski et al. [25] have investigated the role of services

in six companies, among them a company developing passenger

cars and another developing regional passenger aircraft. The

authors found that services sales account for 15-52% of the

revenue of the investigated companies and often have a higher

margin than product sales. A similar observation was raised

by Gebauer et al. [26], who highlighted the signiﬁcant revenue

contribution of services to companies like IBM, ABB, and Er-

icsson.

From an engineering perspective, services are important in

improving the product’s availability. McPhail et al. [46] indi-

cated that increasing the availability of a telecommunications

system beyond what the product is designed for requires a

highly responsive and technically excellent customer support

team. As described by Windley et al. [47], delivering high-

availability services relies not only on technology but also on

organization, processes, and people surrounding the technol-

ogy.

Furthermore, from a customer experience perspective, ser-

vices are a vital contributor to customer satisfaction [25,48].

Prior et al. [49] highlighted that services could contribute to customers’ satisfaction by reducing the lead time for fault detection and correction.

2.1.2. Services classiﬁcation

Services can be divided into two categories: services supporting the product and services supporting the customer [50].

The services supporting the product aim to ensure adequate

product functionality, while the ones supporting the customer

aim to advance the client’s mission while utilizing the product.

Another approach to classifying services is highlighted by Parida et al. [51], who conducted a survey study of 30 companies, including Ericsson and Volvo Cars. The authors distinguish four service categories: essential services, maintenance services, R&D services, and functional services. Furthermore, Chowdhury et

al. [44] highlighted new service types associated with smart

PSS, such as remote monitoring and diagnostics. The word

smart refers to the intelligent nature of these systems as they

are connected and are increasingly self-driven.

2.2. Services and continuous deployment in software-intensive

embedded systems

Continuous deployment is the ability to bring valuable soft-

ware features to customers in shorter cycles than traditional lead

times, from a couple of weeks to days or even hours [13]. Ac-

cording to Stahl et al. [52], continuous deployment is an oper-

ations practice where release candidates evaluated in continu-

ous delivery are frequently and rapidly placed in a production

environment.

In addition to the actual deployment of the new software ver-

sion, continuous deployment involves several post-deployment

activities, such as monitoring the system and customer behav-

ior, identifying unexpected patterns and run-time issues, and

collecting real-time data to feed both business and technical

planning [13,28]. These activities are often conducted by the

operations team in the context of web and cloud-based appli-

cations. However, in the case of software-intensive embedded

systems, they are often delivered by the service organization.

2.2.1. Product-related services challenges with continuous de-

ployment

Gerostathopoulos et al. [53] observed that continuous deployment increases the speed at which features and new content are delivered, which is challenging for customer service organizations, as they need to be aware of all changes and feature variations when responding to customer queries. Similarly, Rodríguez et al. [13] highlighted the difficulty of propagating information and increasing learning with continuous deployment. In addition, communicating changes between different parts of the organization, as well as to customers, becomes more challenging [54].

Moreover, with a continuous supply of software, new fea-

tures might not be discovered by customers, as continuous

deployment does not necessarily mean that new features are

adopted and used [55]. Therefore, Fitzgerald et al. included continuous use as one of the continuous practices in the operations phase of the continuous software engineering framework [56]. In addition, the authors included continuous run-time monitoring to enable early detection of service quality issues and to ensure the fulfillment of Service Level Agreements (SLAs). However, the authors did not elaborate on how these practices can be performed [28].
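As an illustration only, one simple form of run-time monitoring against an SLA is to compute availability from periodic health-check samples and compare it with an agreed target. The sketch below is not taken from the cited works; the function names, the boolean sample format, and the 99.9% target are assumptions made for this example.

```python
def availability(samples):
    """Fraction of health-check samples that reported the service as up.

    `samples` is a hypothetical list of booleans, one per probe
    (True = service responded correctly, False = it did not).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for up in samples if up) / len(samples)


def sla_breached(samples, target=0.999):
    """Return True when measured availability falls below the SLA target.

    The default target of 99.9% is an assumed value, not one from the paper.
    """
    return availability(samples) < target


if __name__ == "__main__":
    # 1000 probes, 2 of which failed: measured availability is 99.8%.
    probes = [True] * 998 + [False] * 2
    print(availability(probes), sla_breached(probes))
```

A real monitoring service would also have to decide how probes are scheduled and over which time window availability is measured; those choices are left open here.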

Furthermore, continuous deployment increases the system’s

complexity as a consequence of more features being introduced

in the code leading to more sophisticated interactions between

the features, even if not used [57]. This increased complexity

makes post-deployment activities such as troubleshooting, fault

identiﬁcation, and root cause analysis more diﬃcult. There-

fore, to succeed in transitioning to continuous deployment, it is

critical to establish a dedicated support team equipped with in-

telligent tools to perform continuous monitoring of live systems

[18].

2.2.2. The evolvement of AIOps

Acknowledging the importance and difficulty of maintaining highly available applications in a complex and dynamic environment, Gartner coined the term AIOps in 2016 [16]. The acronym originally stood for Algorithmic IT Operations and was later changed to Artificial Intelligence for IT Operations [58].

AIOps has been applied to several problems, such as anomaly detection on the Key Performance Indicators (KPIs) of data centers and on distributed systems’ traces [59,60], failure root cause analysis [61], fault localization [62], and closed-loop Service Level Agreement (SLA) assurance [63].
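To make the idea of KPI anomaly detection concrete, the following minimal sketch flags KPI samples that deviate strongly from their recent history using a trailing-window z-score. It is a generic textbook technique chosen for illustration, not the method of the cited studies; the function name, window size, and threshold are assumptions.

```python
from statistics import mean, stdev


def detect_anomalies(kpi_values, window=5, threshold=3.0):
    """Return the indices of KPI samples that look anomalous.

    A sample is flagged when it lies more than `threshold` standard
    deviations away from the mean of the preceding `window` samples.
    Both parameters are illustrative defaults, not values from the paper.
    """
    anomalies = []
    for i in range(window, len(kpi_values)):
        history = kpi_values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if kpi_values[i] != mu:
                anomalies.append(i)
            continue
        if abs(kpi_values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical KPI series with one spike at index 7.
    series = [100, 101, 99, 100, 102, 100, 101, 150, 100, 99]
    print(detect_anomalies(series))
```

Production AIOps platforms typically handle seasonality, trends, and noisy data with more elaborate models; the z-score rule only shows the basic shape of the problem.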

Because AIOps is both recent and cross-disciplinary, Notaro et al. [64] conducted a mapping study to structure the AIOps domain. The authors proposed a taxonomy consisting of two macro-domains: Failure Management and Resource Provisioning. Failure Management consists of five categories: failure prediction, failure detection, failure prevention, root cause analysis, and remediation. Similarly, Resource Provisioning is divided into five categories: resource consolidation, scheduling, power management, service composition, and workload estimation. In a subsequent study focusing solely on Failure Management, Notaro et al. [65] characterized the five categories of Failure Management and broke them down further into 14 subcategories.
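The two-level taxonomy described above can be written down as a small lookup structure, which may help when tagging AIOps use cases by category. This is only a sketch of the taxonomy as summarized in this section; the variable and function names are hypothetical, and the 14 subcategories are omitted because they are not listed here.

```python
# Macro-domains and categories as summarized in the text above.
AIOPS_TAXONOMY = {
    "Failure Management": [
        "failure prediction",
        "failure detection",
        "failure prevention",
        "root cause analysis",
        "remediation",
    ],
    "Resource Provisioning": [
        "resource consolidation",
        "scheduling",
        "power management",
        "service composition",
        "workload estimation",
    ],
}


def macro_domain_of(category):
    """Return the macro-domain a category belongs to, or None if unknown."""
    wanted = category.strip().lower()
    for domain, categories in AIOPS_TAXONOMY.items():
        if wanted in categories:
            return domain
    return None


if __name__ == "__main__":
    print(macro_domain_of("Root cause analysis"))
    print(macro_domain_of("scheduling"))
```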

Dang et al. [17] highlighted the role of AIOps in supporting service engineers working in a cloud-based infrastructure. The authors indicated three areas that AIOps can support: high service intelligence, customer satisfaction, and engineering productivity. In the service intelligence area, AIOps can detect quality degradation and predict future status. AIOps can also contribute to higher customer satisfaction by proactively suggesting changes in the system to meet customers’ needs, such as parameter tuning and optimizations. On the engineering productivity front, service engineers are relieved of tedious manual work, such as data collection and fixing recurring issues. Furthermore, the authors summarize the real-world challenges of building an AIOps application based on their learnings and experience from Microsoft. They identified three significant challenges: gaps in innovation methodologies and mindset, engineering changes needed to support AIOps, and the difficulty of building ML models for AIOps [17].

3. Research Method

This study aims to gain an in-depth understanding of the impact of continuous deployment on product-related services in the context of software-intensive embedded systems. It also explores how AIOps can be used to support continuous deployment and the challenges of doing so. Therefore, we formulated the following research questions:

• RQ1: What is the impact of continuous deployment on product-related services in software-intensive embedded systems?

• RQ2: How can AIOps be used to support product-related services in a continuously evolving software-intensive embedded system?

• RQ3: What challenges do software-intensive embedded system companies face when using AIOps to support continuous deployment?

To answer these questions, we conducted a qualitative ex-

ploratory case study. The reasons for this choice of methods

were the following. First, product-related services involve both

human and technological aspects. Thus, qualitative methods

provide rich results when humans and technology are involved

as they force the researcher to examine the complexity of the

problem rather than abstract it away [66]. Second, case studies

are a suitable research method to evaluate software engineering

practices in industrial settings [67,68]. Third, the purpose of this study is exploratory [69], as we wanted to find out what is happening and seek new insights about the role of product-related services and AIOps in supporting continuous deployment.

The steps applied in this study are based on the guidelines outlined by Runeson and Höst [69]. These guidelines follow the recommendations defined by Yin [68]. The steps we followed are:

• Case study design: we set the objective, documented the research questions, and identified the case study company and product line.

• Preparation for data collection: we documented the case study protocol, prepared the interview guide, created an initial list of interviewees to invite, and identified sources of relevant documentation.

• Data collection: we conducted interviews and collected supporting documentation for analysis.

• Analysis of the collected data: we transcribed the interviews and analyzed the transcripts. In addition, we analyzed the collected documents. This step ran in parallel with the data collection, which allowed us to adjust the interview guide based on new insights emerging from previous interviews and to review related documents based on the analysis of those already collected.

• Reporting: we documented our findings and analysis in this paper.

3.1. The case company

The case study company is a large multinational telecom-

munications systems supplier. The company has more than

a hundred thousand employees distributed across the globe.

The case company is divided into several large-scale domain-

speciﬁc business units responsible for developing products re-

lated to the unit’s domain. In addition, the case study company

has several service business units focusing on diﬀerent product

domains. Services account for roughly 40% of the company’s

overall revenue.

This paper focuses on the 4G and 5G Radio Access Network

(RAN) product line. The RAN part of the mobile network con-

sists of radio base stations which are embedded systems prod-

ucts responsible for providing radio coverage to mobile users.

Each base station consists of several interconnected hardware

units such as antennas, base-band processing boards, and trans-

mission equipment.

4G and 5G RAN software is developed by a dedicated R&D

development organization with a few thousand employees. The

activities of the R&D organization include designing, coding,

testing, and releasing the software. Product-related services are provided by a dedicated service organization with its own personnel, tools, and systems. The service organization often has a local presence, where support staff reside in the same geography as the customers. In addition, it has several global service centers supporting customers remotely across different regions.

3.2. Data collection

Data was collected from three primary sources: interviews with selected participants, a review of related internal documents, and meeting notes. Using multiple data sources helps to achieve triangulation, as they provide a broader picture and increase the precision of the research [69]. The first author of this study works as an R&D manager who has been closely involved in multiple continuous deployment projects. Thus, we had extensive access to internal documentation and participated in many meetings.

• Interviews: we conducted 16 interviews with 20 participants from the R&D organization and the services organization on both local and global levels. We also interviewed developers and solutions architects working with service systems to ensure we covered technical details related to AIOps challenges and how AIOps can be used for continuous deployment. The interviews were conducted virtually using Microsoft Teams between October 2021 and August 2022. The interviews lasted between 40 minutes and 1 hour and were recorded and transcribed. At least two authors were present in each interview. The participants were selected by the first author based on their relationship with services, service systems development, and continuous deployment. A preliminary list of persons was identified and invited to interview. The invitation included a description of the research objectives, highlighting that participation is voluntary. Table 1 shows an overview of the interview participants and their roles.

• Documents review: we reviewed a significant number of documents covering product-related service and evolution strategy. These documents were either already known to the first author, shared by interviewees, or available as training or documents in the internal network.

• Meeting notes: the first author participated in more than 30 meetings discussing service delivery flow efficiency with continuous deployment. These meetings were primarily focused on building intelligent tools and capabilities to improve the efficiency of continuous deployment service activities, such as continuous monitoring and troubleshooting.

Table 1: Interview participants and roles

Interview  Interviewee  Role
1          A            Customer support portfolio manager
2          B            Customer support strategy manager
3          C            Customer operations manager
4          D            Customer support consultancy manager
5          E            Software release technical coordinator
6          F            Software release technical coordinator
           G            Customer support engagement manager
7          H            Customer support team lead
           I            Senior customer support engineer
8          J            Senior customer support engineer
           K            Regional customer support manager
9          L            Customer support line manager
           M            Services Systems solution architect
10         N            Services Systems solution architect
11         O            Services systems principal developer
12         P            Services architecture expert
13         Q            Global customer services manager
14         R            Services transformation lead
15         S            Customer DevOps expert
16         T            Customer services engagement lead

3.3. Data analysis

The collected data was analyzed using the six-phase inductive thematic coding procedure proposed by Braun and Clarke [70]. We first transcribed the interviews and gathered the related documentation. We then analyzed the data and extracted an initial set of codes, after which we identified the main themes related to the three research questions. Next, the authors reviewed the themes and their related codes, which resulted in new codes being added. Finally, we defined the themes before writing this paper.

3.4. Threats to validity

The validity of a study concerns the trustworthiness of the results. Different classifications of validity threats are used by software engineering researchers [71]. In this study, we selected the classification used by Runeson et al. [69], which adopts the classification proposed by Yin [68], as it is widely used by the software engineering community. This classification divides validity threats into four categories: construct validity, external validity, internal validity, and reliability.

• Construct validity refers to the extent to which the operational measures used in the study reflect what the researchers aim to study, as represented by the research questions. In this study, all researchers are familiar with continuous deployment, AIOps, and the research related to these topics. However, while all participants have practical knowledge about continuous deployment and AIOps, they do not always use the same terminology as the research literature. Thus, to address the threat to construct validity, the first author, who is employed by the case study company and has several years of experience there, was present in all interviews, accompanied by one of the other authors. When a question was not understood by an interviewee or clarifications were needed, the first author used internal terms and examples to explain the question. Furthermore, we used triangulation with multiple sources of data as another measure to address construct validity.

• External validity refers to the degree to which the case study findings can be generalized, and how. This study’s results are

extracted from a case study in the telecommunications do-

main; therefore, we do not claim the generalizability of

the results. More empirical research is needed to achieve

external validity. However, we believe there are many sim-

ilarities between the case study company and other large-

scale software-intensive embedded systems vendors, espe-

cially the ones working in a business-to-business context.

In this study, we aimed to provide as much information

as possible to help the reader decide if the context of the

case study is similar to other industry segments or compa-

nies without compromising the non-disclosure agreement

(NDA) the authors have with the case company.

• Internal validity is concerned with the risk of failing to identify all factors that contribute to the results when causal relationships are investigated. Thus, as described by Yin [68], internal validity is not applicable to descriptive or exploratory case studies. Although the inferences made in this case study are not causal, we used peer debriefing among the authors of this paper, in addition to member checking, where the results of this research were shared with the participants of the study for feedback and comments, as validity mitigation techniques.

• Reliability refers to the degree to which the data collection and the analysis depend on the specific researchers. As the

ﬁrst author is employed by the case study company, there

is a risk that his own views and interpretations might inﬂu-

ence the data collection and analysis. To reduce the risk of

bias during data collection and analysis, we interviewed

20 persons representing diﬀerent roles, such as solutions

architects, managers and service leaders. In addition, we

also used triangulation with multiple data sources, and the

three authors have been involved during the triangulation

and analysis of the data.

4. The case study context

This section provides background about the case study com-

pany and 4G/5G RAN. In addition, it describes the RAN soft-

ware development, release, and product-related services.

4.1. 4G and 5G Radio Access Network

Mobile telecommunications networks can be divided into

two parts, Radio Access Network (RAN) and Core Network

(CN). The RAN and CN are connected via standardized inter-

faces. In this study, we focus only on the RAN part of a mobile

network.

4G RAN consists of connected Evolved Node B (eNodeB) systems. Each eNodeB comprises several interconnected components, such as antennas, baseband processing units, and transmission equipment, and provides 4G radio coverage for the surrounding geographical area. The 5G architecture is similar to 4G in that one system provides radio coverage. However, the radio node is called gNodeB instead, where the g stands for “next generation”. 5G RAN has two major deployment scenarios: Non-Stand-Alone (NSA) and Stand-Alone (SA) [72]. Figure 1 shows a simplified architecture of 4G RAN, 5G SA, and 5G NSA and the interfaces between different nodes.

Figure 1: Simpliﬁed 4G, 5G NSA, and 5G SA RAN architecture [72]

4.2. Software development and release

RAN software is a large-scale software built by thousands of

developers. The development process has undergone dramatic

changes over the years. As a result, hundreds of development

teams have been established across several R&D centers dis-

tributed across the globe. Continuous integration and testing is

a 24/7 activity using a highly automated process.

Due to the dependency between 4G and 5G RAN technolo-

gies, especially with the 5G NSA conﬁguration, the case study

company has one software mainline for both 4G and 5G soft-

ware. Thus, one software release can be used for both tech-

nologies. By changing the software conﬁguration parameters,

the radio node can act as an eNodeB or a gNodeB. In addition to the software configuration, the hardware in the node must support the configured technology. Older hardware does not support 5G, while newer hardware supports both 4G and 5G. Additionally,

the case study company oﬀers many hardware variants that en-

able the nodes to be highly customizable to support various ra-

dio frequencies and deployment scenarios, such as deployments

in urban and countryside areas.

As depicted in Figure 2, the case study company has three

types of software releases: major, maintenance, and Continuous

Deployment (CD) releases. New major releases become gener-

ally available to all customers every three months. The release

date of each major release is referred to as General Availability

(GA) date. Following each major release, a maintenance period

starts, which lasts for 18 months. During the maintenance pe-

riod, the case study company releases a number of maintenance

releases. Each maintenance release contains only bug ﬁxes.

CD releases are released every second week. They are branched directly from the mainline and thus contain both bug fixes and new development. CD releases are deployed only to a representative subset of each customer’s network, called the CD zone, while major and maintenance releases are deployed to the rest of the network.
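The release cadence described above implies a few simple figures, which the following sketch works out. The three input values come from the text. The 13-week quarter is our own simplifying assumption, and the results should be read as approximate.

```python
# Release cadence figures taken from the text.
MAJOR_INTERVAL_MONTHS = 3   # a new major release every three months
MAINTENANCE_MONTHS = 18     # maintenance period following each major release
CD_INTERVAL_WEEKS = 2       # a CD release every second week

# Assumption (not from the text): one quarter is about 13 weeks.
WEEKS_PER_QUARTER = 13


def majors_in_maintenance() -> int:
    """Approximate number of major releases under maintenance at the same time."""
    return MAINTENANCE_MONTHS // MAJOR_INTERVAL_MONTHS


def cd_releases_per_major() -> int:
    """Whole CD releases that fit between two consecutive major releases."""
    return WEEKS_PER_QUARTER // CD_INTERVAL_WEEKS
```

Under these assumptions, roughly six major releases are in their maintenance period at any time, and about six CD releases are produced between two consecutive major releases.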

It can be observed that the case study company’s deployment

context can be considered as 1:many:many (one-to-many-to-

many), which means that each release is intended to be used

by many customers, and each customer has many base stations

where the software is deployed. This is a major diﬀerence to

the deployment context in web and cloud-based applications,

where a new release is deployed to one or a few production en-

vironments only.

Therefore, the case study company has two types of cus-

tomers: customers with CD zones and customers without CD

zones. To speed up the introduction of 5G networks, more cus-

tomers are establishing a CD zone in their networks. CD cus-

tomers account for more than 15% of the total customer base

for both 4G and 5G RAN.

4.3. RAN product-related services

The case study company oﬀers three main product-related

services to their customers: network rollout, customer support,

and network optimization.

4.3.1. Network rollout

Network rollout aims at helping customers with radio site

acquisitions, hardware build, installation, commissioning, and

civil work. These activities are conducted during network mod-

ernization projects where site locations need to be changed or

during swap projects between diﬀerent vendors. In addition,

network rollout provides the project setup and execution to roll-

out new major software releases into the network. Rolling a

new software release into the network includes activities such

as lab testing, impact analysis, and the actual deployment and

acceptance procedure of the new software.

4.3.2. Customer support

Customer support is responsible for supporting customers. This is done by responding to the customers’ Customer Support Requests (CSRs). Customers raise CSRs when they require further technical information or assistance while operating the products. Thus, customer support performs activities such as answering technical questions, troubleshooting, fault identification, and root cause analysis.

Figure 2: Releases and customers

4.3.3. Network design and optimization

Network design and optimization comprise a design part, of-

ten done when a new network is built or during modernization

projects. The design involves software and hardware dimen-

sions, high- and low-level network design, and related docu-

mentation. Network design is often associated with a new net-

work buildup.

On the other hand, network optimization consists of activities

to optimize the network’s performance, such as radio parame-

ters and feature tuning to meet the operator’s objectives. Net-

work optimization is conducted with new software or hardware

upgrades.

5. Continuous deployment impact on product-related ser-

vices (RQ1)

This section presents the empirical findings for the first research question. To understand how continuous deployment impacts product-related services, we compared the service activities conducted with customers without CD zones, which represent the legacy services flow, with the service activities conducted with CD zone customers, which represent the flow and activities of the services with continuous deployment.

5.1. Legacy services and their activities

The three product-related services for customers without CD zones start in sequence: rollout, optimization, and customer support. As depicted in Figure 3, the rollout and optimization projects begin after a new major release becomes generally available. These two services are conducted as projects, each with a defined start and end date. The rollout project’s role is to ensure that a new major release is introduced to the customer network. Introducing a new major release into the network is referred to as a software upgrade activity.

The rollout project conducts an impact analysis of the release

on the customer’s network. The impact analysis reviews the

software’s legacy changes and prepares the customer’s opera-

tions team to handle these changes. This includes, for example,

adjusting for non-backward compatible changes in the conﬁg-

uration parameters of the software, including new performance

counters in the KPIs formulas, replacing deprecated counters

from the formulas, default-on improvements of existing func-

tionality that might impact the network’s KPIs, and operational

procedures or 3rd party products connected to the RAN net-

work. In addition, the rollout project conducts extensive lab

validation according to the customers’ testing scope.

“The rollout project starts with the GA [General Availability]. It starts with extensive testing lasting between

anytime from 1 to 1.5 months to 2.5 months” — Intervie-

wee Q - Global customer service manager

Figure 3: Services ﬂow without continuous deployment

In parallel with the rollout project, an optimization project

starts to optimize the customer network based on new features

available in the major software. Since these features are new,

their parameters and attributes often need to be adjusted to ﬁt

the customer’s network topology and conﬁguration. Therefore,

the network optimization project tests new features which the

customer is interested in and identifies the appropriate configuration parameter values. The optimization and rollout projects interact with each other to ensure that the recommendations of the optimization project are reflected in what is referred to as the parameters and features baseline that will be applied during the rollout.

If a software bug is identiﬁed in the rollout or optimization, a

bug ticket is ﬁled and sent to the R&D organization. If the R&D

organization conﬁrms the bug, a correction is included in the

upcoming maintenance release and merged into the mainline.

Thus, software bugs impact the rollout and optimization project

timelines, as they need to wait for the next maintenance release

to ensure customer acceptance.

After completing a rollout project and establishing the new

parameters and features baseline, a handover is executed from

the rollout project to the customer support project. The cus-

tomer support project is responsible for responding to Customer

Support Requests (CSR) tickets. Customers raise these tickets

by logging an issue in a dedicated web portal, calling the case

company’s front desk, or by email. Once these requests are re-

ceived, they are routed to an appropriate support team. Based

on the context of the CSR and its severity, the CSR is routed to a

local or global support team based on pre-deﬁned routing crite-

ria. In addition, the CSR will be given a priority tag to indicate

how fast the team should act on it. The priority also decides the

Service Level Agreements (SLAs) which should be applied.

In addition, the customer support project is responsible for

introducing a newer maintenance version from the same major

release branch. When the nodes are running on an older main-

tenance release, and a more recent maintenance version from

the same major release branch is introduced, the activity is con-

sidered a software update rather than software upgrade as used

in the rollout project. This is because the customer’s parame-

ters and features baseline do not change between maintenance

releases of the same major release, as the diﬀerence between

the running release and the newer maintenance release is only

bug ﬁxes. The customer support project performs lab validation

activities of the maintenance release in a similar fashion as per-

formed in the rollout project. However, the scope of testing is

often less than what is performed in the rollout project.

“When the rollout project is ﬁnished with the network up-

grade, a handover is executed to the customer support

project. Our responsibility is to respond to customer CSRs

and to update the network to newer maintenance releases

if needed” — Interviewee G - Customer support engage-

ment manager

The customer operations organization is constantly engaged with the case study company’s rollout, optimization, and customer support projects. It plays a critical role as it provides the lab testing infrastructure, decides the testing scope, reviews new features, determines new features of interest, follows up on support tickets, and supervises the project activities. For example, one dedicated team is responsible for reviewing and validating new features, another for rollout preparation, and yet another for customer support.

5.2. Continuous deployment services and activities

Continuous deployment releases are branched directly from

the mainline every second week. Thus, the diﬀerence between

a software upgrade and update vanishes. As depicted in Figure

4, continuous deployment releases contain both bug ﬁxes and

new functionality.

Therefore, one of the signiﬁcant consequences of continuous

deployment is that there is no “maintenance” release anymore.

If a bug is identiﬁed post-deployment of the CD release, a roll-

back is to be executed, or the fault has to be tolerated till an

upcoming CD release with a ﬁx.

Figure 4: Software releases and mainline growth

In addition, within a two-week release cycle, there is not enough time to conduct service activities in sequence. Therefore, service activities are performed in one project, as depicted in Figure 5. Consequently, the boundaries between rollout, optimization, and customer support projects disappear.

“With frequent releases and deployment, the borders are

removed between network rollout project and customer

support” — Interviewee J - Senior customer support en-

gineer

Figure 5: Services with continuous deployment

This has also led to changes to service activities. Therefore,

service activities in continuous deployment can be divided into

two groups: ﬁrst-time and continuous.

5.2.1. First-time service activities

Moving customers’ ways of working to embrace continuous deployment requires an initial engagement effort. Customers’ operations organizations are often structured to support major software release deployments only, with teams or units dedicated to working solely with rollout, optimization, or support. Therefore, several interviewees highlighted the importance of evolving customers’ ways of working as a prerequisite for continuous deployment. Similarly, in several meetings we participated in, the approach of working with customers in a consultative way was discussed. Thus, the first step for the case study company is often to evolve the way of working and structure of the customer’s operations organization to support continuous deployment. This can be summarized by the following four activities:

The ﬁrst is reducing lab testing scope while automating the

test cases as much as possible. In this case, the case study com-

pany shares the scope of the testing executed before releasing

the software with the customer to identify duplicate test cases

that can be removed from the customer’s lab scope. In addition,

the case study company provides an automation solution that

helps customers automate the execution of test cases in their lab

environment. Despite the reduction of scope, lab validation is

necessary to ensure that basic functionality, such as emergency

calls, works with the customers’ network and to validate the

interfaces between the radio base stations and the customer’s

third-party products (3PPs) in the network.

The second is to evaluate the customer’s network to design a suitable Continuous Deployment Zone (CD zone) representing the entire network. This process involves identifying the different hardware and software configurations the customer has. eNodeBs and gNodeBs are highly configurable products from both hardware and software perspectives. On the hardware side, customers often choose between different radio units, antennas, and digital processors to match the site’s specific needs, such as radio coverage distance, radio frequency, and node capacity. On the software side, there are thousands of configurable parameters that customers can adjust based on their needs. For example, a 4G eNodeB has over 7000 configurable software parameters and more than 200 software features. As a result, each customer has hundreds or thousands of configurations in their network if all possible combinations of hardware and software are considered jointly. Thus, identifying a suitable and representative CD zone requires a detailed analysis of the customer’s network.
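To make the notion of a representative CD zone more concrete, the sketch below picks a small set of nodes that together cover every hardware and software trait present in a network. It is only an illustration under our own assumptions: the case study does not describe the selection method used, the greedy set-cover heuristic is our choice, and the node identifiers and trait names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def select_cd_zone(nodes: Dict[str, FrozenSet[str]]) -> List[str]:
    """Greedily pick nodes until every trait seen in the network is covered.

    `nodes` maps a (hypothetical) node id to the set of configuration
    traits it exhibits, e.g. a radio unit type or an activated feature.
    """
    uncovered: Set[str] = set().union(*nodes.values()) if nodes else set()
    zone: List[str] = []
    while uncovered:
        # Choose the node covering the most still-uncovered traits;
        # iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(nodes), key=lambda n: len(nodes[n] & uncovered))
        zone.append(best)
        uncovered -= nodes[best]
    return zone
```

Because operators’ networks change over time, such a selection would need to be re-run whenever the node inventory changes. A real selection would also weigh factors that this sketch ignores, such as traffic volume and geography.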

The third is to define and agree on continuous deployment-specific Service Level Agreements (SLAs), stipulating how long it takes for an issue to be identified, analyzed, and corrected. Continuous deployment SLAs must be faster than legacy SLAs. In addition, the customer and the case study company agree on the rollback criteria. Due to the mobile network’s critical role, the slightest degradation in the network KPIs would justify a rollback. However, should a rollback be triggered if a minor performance degradation is observed in a secondary indicator without impacting any of the key performance indicators (KPIs)? For how long can the degradation be allowed in order to enable troubleshooting in such cases? Such questions are discussed and agreed upon with the customer.
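The rollback questions above can be expressed as an explicit, agreed rule. The following sketch shows one possible encoding. The field names, the percentage-based thresholds, and the grace period are hypothetical, since the actual criteria are negotiated per customer and are not disclosed in the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackCriteria:
    """Hypothetical rollback criteria agreed with one customer."""
    kpi_tolerance_pct: float        # tolerated drop in any primary KPI
    secondary_tolerance_pct: float  # tolerated drop in a secondary indicator
    secondary_grace_hours: float    # time allowed for troubleshooting


def should_roll_back(criteria: RollbackCriteria,
                     kpi_drop_pct: float,
                     secondary_drop_pct: float,
                     hours_degraded: float) -> bool:
    """Return True when the agreed criteria call for a rollback."""
    # Any KPI degradation beyond tolerance justifies an immediate rollback.
    if kpi_drop_pct > criteria.kpi_tolerance_pct:
        return True
    # A secondary degradation is tolerated only during the grace period,
    # which leaves time for troubleshooting on the running release.
    if secondary_drop_pct > criteria.secondary_tolerance_pct:
        return hours_degraded > criteria.secondary_grace_hours
    return False
```

Setting `kpi_tolerance_pct` to zero reproduces the strict position that the slightest KPI degradation justifies a rollback.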

The fourth is to establish a pipeline supporting both deployment and data collection. From a deployment perspective, the pipeline securely connects the customer network to the case study company’s network, allowing new software releases to be downloaded to the customer network once released. After downloading the new software, the deployment pipeline checks the software’s integrity and automatically upgrades the lab nodes. The lab test cases are then triggered, and the results are stored and made available for analysis. The deployment pipeline stops at the lab stage, and deployment to the CD zone is not triggered automatically by the pipeline. This is because the customer and the case study company’s service team supporting continuous deployment manually review the impact of the continuous deployment software and the new features delivered.
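The lab part of the deployment pipeline described above can be sketched as a short sequence of stages that ends with a manual review gate. This is an illustration only. The function and stage names are hypothetical, the lab upgrade and test cases are passed in as placeholders, and the use of a SHA-256 digest for the integrity check is our assumption, as the case study does not state which mechanism is used.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineResult:
    stages: List[str] = field(default_factory=list)
    lab_results: Dict[str, bool] = field(default_factory=dict)
    awaiting_manual_review: bool = False


def run_lab_pipeline(package: bytes,
                     expected_sha256: str,
                     upgrade_lab: Callable[[bytes], None],
                     lab_tests: Dict[str, Callable[[], bool]]) -> PipelineResult:
    """Run the downloaded release through the lab stages, then stop."""
    result = PipelineResult(stages=["download"])

    # Integrity check (assumed here to be a SHA-256 digest comparison).
    if hashlib.sha256(package).hexdigest() != expected_sha256:
        raise ValueError("integrity check failed")
    result.stages.append("integrity_check")

    # Automatically upgrade the lab nodes with the verified package.
    upgrade_lab(package)
    result.stages.append("lab_upgrade")

    # Trigger the lab test cases and store the results for analysis.
    result.lab_results = {name: test() for name, test in lab_tests.items()}
    result.stages.append("lab_tests")

    # The pipeline stops here: the CD zone deployment follows a manual review.
    result.awaiting_manual_review = True
    return result
```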

In addition, from a data collection perspective, the pipeline collects performance, configuration, and diagnostic data from the customer’s CD zone. However, the data collection pipeline architecture differs depending on the degree to which the customer allows data to be shared with the case study company. For example, customers might allow only certain data types to be shared and block other types, or they might allow data to be shared but limit its movement by demanding its storage in specific geographies. Therefore, during the onboarding phase, the data pipeline architecture is agreed upon between the case study company and the customer.
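The two kinds of restrictions mentioned above, namely which data types may be shared and where the data must be stored, can be captured in a per-customer policy. The sketch below is a minimal illustration with hypothetical type, field, and region names. The actual pipeline architecture is agreed with each customer and is not described in detail in the case study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (data type, payload)


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical data-sharing agreement for one customer."""
    allowed_types: FrozenSet[str]
    storage_region: Optional[str] = None  # None means no residency constraint


def plan_export(records: Iterable[Record],
                policy: SharingPolicy,
                default_region: str) -> Tuple[str, List[Record]]:
    """Return the storage region and the records that may leave the network."""
    region = policy.storage_region or default_region
    shared = [r for r in records if r[0] in policy.allowed_types]
    return region, shared
```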

5.2.2. Continuous activities

After the first-time activities are done, several continuous

activities start. These activities are conducted by one service

project composed of customer-speciﬁc teams with local and

global presence. Each team has cross-functional competence,

such as support, optimization, and lab testing. Each team works

closely with its assigned customer and synchronizes at least

once daily with the customer’s operations team. Further, the

project manages the cross-team alignment and knowledge shar-

ing. The service project also includes members from the soft-

ware release program, which is responsible for the software’s

quality, documentation, and feature content. The release program members act as a feedback proxy to the rest of the R&D organization, as they relay feedback to the hundreds of development teams working with the software.

The ﬁrst continuous activity is the CD zone software deploy-

ment procedure. The software deployment procedure involves

deployment preparations such as download, integrity check, lab

test, backup of the running conﬁguration, software upgrade,

customizing the software parameters according to the customer’s configuration, and ensuring that the network KPIs are all within acceptance levels following the upgrade. While the deployment procedure does not differ from the one performed in the rollout project, the frequency of the activity is much higher. Therefore, to be able to conduct these activities without increasing

the number of services people involved, the case study com-

pany has developed several tools to automate the software de-

ployment procedure. However, the deployment procedure does

not run entirely without human supervision. The procedure is

supervised by a service engineer in case there is a need for re-

mote manual intervention in the node. An example would be if

a node does not start after the upgrade or shows faulty behavior

requiring immediate troubleshooting and analysis.

Second, continuous monitoring of the nodes’ performance. There are two flavors of monitoring: one is referred to as babysitting, while the other is fine monitoring. Babysitting monitoring takes place for three hours immediately after the new software has been deployed. The main objective is to identify whether there is a large deviation in the major KPIs following the software deployment. However, because upgrades often happen late at night when there is less traffic in the network, minor KPI deviations need more time to be identified. Thus, fine monitoring aims to find minor deviations in the nodes’ performance by considering traffic seasonality. This is done by comparing traffic behavior during weekends, busy hours, and different weekdays with the KPI values before the software change. Fine monitoring is therefore a continuous activity that spans the entire two-week lifetime of the continuous deployment release. To perform continuous monitoring, several types of data, such as alarms, traffic performance, and program execution logs and traces, are collected. The collected data are then analyzed automatically to identify deviations.
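The difference between the two monitoring flavors can be illustrated with a small sketch. It assumes a KPI for which higher values are better (for example, a success rate). The thresholds and the (weekday, hour) bucketing are hypothetical choices of ours, since the case study does not describe the analysis method actually used.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[int, int, float]  # (weekday 0-6, hour 0-23, KPI value)


def babysitting_alert(baseline: float,
                      first_three_hours: List[float],
                      max_drop: float = 0.10) -> bool:
    """Flag a large KPI drop in the hours right after deployment."""
    return mean(first_three_hours) < baseline * (1 - max_drop)


def fine_deviations(before: Iterable[Sample],
                    after: Iterable[Sample],
                    max_drop: float = 0.02) -> List[Tuple[int, int]]:
    """Flag minor drops against the same weekday and hour before the change."""
    history: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for weekday, hour, value in before:
        history[(weekday, hour)].append(value)

    flagged: List[Tuple[int, int]] = []
    for weekday, hour, value in after:
        past = history.get((weekday, hour))
        # Compare like with like to account for traffic seasonality.
        if past and value < mean(past) * (1 - max_drop):
            flagged.append((weekday, hour))
    return flagged
```

Comparing each sample with its own weekday and hour bucket is what allows a small degradation during, for instance, a Saturday busy hour to stand out, even though it would be invisible in the low-traffic night hours covered by babysitting.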

Third, continuous optimization which is based on new fea-

tures made available in the software. The customer operations

and the service team discuss the trials of new features. The ser-

vice team often advises customers of new features suitable for

their network. Thus, the service team plays a critical role in in-

creasing the adoption of new features that customers might not

notice or when the feature value is unclear to them. The op-

timization activity also involves activating new features in the

customer network and quantifying the gain.

Fourth, continuous adjustment of the CD zone. Customers

often add new radio sites to expand the RAN network or adjust

the conﬁguration of some sites based on demographic needs,

such as new roads or buildings. In addition, with the release of

new hardware types, newly built or upgraded nodes might have

new hardware types that do not exist in the initially selected CD

zone. Thus, operators’ networks are not static. This means that

the initially selected CD zone must also change frequently to

reflect changes in the network.

Fifth, continuous root cause analysis and troubleshooting. In an operational RAN network, there are often many alarms, errors, and notification events generated by the eNodeBs and gNodeBs. Some of these are considered operational noise, which is just a consequence of this equipment’s operation. A typical example cited during several meetings is an alarm generated due to a transmission disturbance between the antenna and the base station. If the alarm lasts for a very short period and then ceases, it is likely due to a link disturbance and would not impact the performance KPIs. However, if the alarm lasts for a longer duration or its frequency has changed, it is likely due to other reasons, such as a software bug or configuration mistake. Moreover, as continuous deployment releases contain both bug fixes and new content, the operational noise changes frequently. Thus, the support team is continuously troubleshooting and analyzing network issues to determine what is noise, what is an expected behavior of the software change, and what is not expected and thus could be a software bug or configuration error.
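The alarm example above amounts to a simple triage rule based on duration and frequency. The sketch below encodes it under our own assumptions. The thresholds are hypothetical, and only an increase in frequency (or a previously unseen alarm) is treated as a change, which is a simplification of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmStats:
    duration_s: float      # how long the alarm stayed active
    per_day: float         # occurrences per day on the new release
    per_day_before: float  # occurrences per day on the previous release


def classify_alarm(stats: AlarmStats,
                   short_s: float = 5.0,
                   freq_ratio: float = 2.0) -> str:
    """Return "noise" or "investigate" for one alarm type on one node."""
    # A long-lasting alarm is unlikely to be a transient link disturbance.
    if stats.duration_s > short_s:
        return "investigate"
    # An alarm that did not occur before the release is a change in behavior.
    if stats.per_day_before == 0:
        return "investigate" if stats.per_day > 0 else "noise"
    # A marked increase in frequency also suggests a bug or misconfiguration.
    if stats.per_day / stats.per_day_before >= freq_ratio:
        return "investigate"
    return "noise"
```

Since each continuous deployment release can shift the noise level, the baseline values (`per_day_before`) would have to be refreshed after every release for such a rule to remain meaningful.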

“Continuous deployment releases come with many im-

provements and changes to our software that continuously

impact this noise level at the customer network. We, there-

fore, need to continuously troubleshoot and evaluate if an

alarm, event, or log is a consequence of the new software,

is a new bug, has the customer changed something man-

ually on the conﬁguration, or is it part of the new noise

level” — Interviewee H - Customer support team lead

To be able to perform these continuous activities, the service team conducts a daily synchronization with the customer’s operations team, in which both teams share observations and discuss actions and mitigations. Thus, the team delivering continuous activities becomes close to the customer, as it works as part of the customer’s operations team. As the service team continuously monitors nodes, it works proactively with the customer rather than reactively, as in the legacy services delivery flow. Further, the service team provides consultative advice as it engages with activities typically decided by the customer, such as lab testing scope and CD zone structure.

In addition, these continuous activities are considered labor-intensive. During several meetings, the relationship between labor costs and the number of support teams was discussed. The

case study company started with one team to support the ﬁrst

continuous deployment customer when the company began re-

leasing software more frequently. The team performed contin-

uous activities manually with the ﬁrst customer. However, with

the number of continuous deployment customers starting to in-

crease, the case study company has realized the importance of

both automation and intelligence to increase the eﬃciency of

continuous deployment while maintaining the cost. Therefore,

to scale continuous deployment, the usage of both automation

and intelligence capabilities is seen as critical.

“If you do something much more often than before, you

need to reduce the cost every time you do it, and then

you need to automate manual work” — Interviewee T -

Customer services engagement lead

Therefore, the data collection and deployment pipeline is es-

sential to automate software deployment and data collection ac-

tivities which are repetitive activities with deﬁned procedures.

However, activities such as anomaly detection, troubleshoot-

ing, and root cause identiﬁcation are not only repetitive but also

complex. Thus, they are often conducted by domain experts.

Hence, the case study company considered introducing an in-

telligent system to assist the service team in performing these

activities as a critical enabler to scale continuous deployment

without scaling the number of domain experts needed to sup-

port customers.

6. AIOps and continuous deployment (RQ2)

To support the service team with time-consuming and com-

plex continuous activities, several interviewees highlighted the

need for a dedicated, intelligent platform. Therefore, the case

study company uses a dedicated and intelligent platform that

collects, parses, correlates, and potentially acts on various types

of network data. While the platform has a dedicated name used

within the case study company, we will refer to it in this paper

as AIOps Platform.

This section presents the ﬁndings of the second research

question addressing how AIOps can be used to support con-

tinuous deployment, which we present from three perspectives:

AIOps platform architecture, AIOps usage with continuous de-

ployment, and AIOps platform deployment scenarios.

6.1. AIOps platform architecture

From an architectural perspective, the results from the case

study show that the AIOps platform has a layered architecture

to separate diﬀerent functions in the system, as depicted in Fig-

ure 6. The AIOps platform is designed for speedy data collec-

tion and interactions with the radio nodes. In addition, visu-

alization is key to allowing customers and the service team to

interact with the platform.

Figure 6: AIOps platform with four layers

The platform consists of four layers: data collection, data

management, data intelligence, and visualization layer. The

data collection layer is responsible for connecting to the RAN

nodes, on which software is continuously deployed every second week,

and for collecting various data types. The collected data include

performance, configuration, alarm, and event data, as well as internal data such

as traces and program crashes. The data collection is executed

by a collection agent, which resides in the customer network,

representing the data collection layer. Further, the collection

layer not only collects data but can also interact with

the nodes by sending actions or commands. Thus, the pipeline

established by the data collection layer is a two-way pipeline.
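
The two-way pipeline described above can be pictured with a small sketch. This is only an illustration under assumptions: the `Node` interface, the `FakeNode` stand-in, and the command string are hypothetical and do not reflect the case study company's actual agent or any vendor O&M API.

```python
from typing import Protocol


class Node(Protocol):
    """Hypothetical node interface (assumption, not a real O&M API)."""
    def read(self, data_type: str) -> list: ...
    def execute(self, command: str) -> str: ...


class FakeNode:
    """In-memory stand-in used only to make the sketch runnable."""
    def __init__(self):
        self.data = {"alarm": ["link down"], "performance": [0.97]}
        self.log = []

    def read(self, data_type):
        return list(self.data.get(data_type, []))

    def execute(self, command):
        self.log.append(command)
        return "ok"


class CollectionAgent:
    """Agent residing in the customer network, serving both directions."""
    def __init__(self, node: Node):
        self.node = node

    def collect(self, data_types):
        # Upstream direction: node -> platform
        return {t: self.node.read(t) for t in data_types}

    def send(self, command):
        # Downstream direction: platform -> node
        return self.node.execute(command)
```

Keeping both directions behind one agent mirrors the text's point that the same layer that collects data is also the one that delivers actions or commands to the nodes.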

After the data is collected, it is passed to the data manage-

ment layer, which reads, parses, and stores data. In addition,

the layer controls data access, privacy, and life cycles of the

collected data by destroying older data after the retention

period has elapsed or by archiving it. The data collection and

management layers are designed to achieve fast data collection

and ingestion.

“We need to collect data with speed. We designed our

system to allow for quick data collection and ingestion

while considering the bandwidth required on the O&M

interface” — Interviewee M - Services systems solution

architect

The data intelligence layer holds the analytical and machine

learning capabilities. The layer is programmable where users

can deﬁne use cases, either AI or analytics based, and specify

their execution criteria. The use cases have three main types.

First, report generation use cases, where a report is produced

if a particular trigger is identified, such as when an event log

in the collected data contains specific information. Report generation

use cases can also be run on demand or be scheduled

periodically. Second, open-loop automation use cases where an

alert is sent to subscribed users if a speciﬁc trigger is met. This

is often where an undesirable situation is identiﬁed and requires

a human to investigate further. Third, closed-loop automation

use cases where the intelligence layer automatically attempts to

recover from the incident by interacting with the radio base sta-

tions. In this case, no human is involved in the loop, and the

AIOps platform takes the end-to-end automated action.
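
As a rough illustration of the three use case types, the following sketch dispatches on a use case's kind once its trigger fires. All names (`UseCaseKind`, `UseCase`, `run_use_case`) and the returned strings are assumptions made for this example; the paper does not describe the platform's programming interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class UseCaseKind(Enum):
    REPORT = "report"            # produce a report when triggered
    OPEN_LOOP = "open_loop"      # alert a human, who decides what to do
    CLOSED_LOOP = "closed_loop"  # act on the node without a human


@dataclass
class UseCase:
    name: str
    kind: UseCaseKind
    trigger: Callable[[dict], bool]  # evaluated against one collected record


def run_use_case(use_case: UseCase, record: dict) -> Optional[str]:
    """Return the action the platform would take, or None if not triggered."""
    if not use_case.trigger(record):
        return None
    if use_case.kind is UseCaseKind.REPORT:
        return f"report:{use_case.name}"
    if use_case.kind is UseCaseKind.OPEN_LOOP:
        return f"alert:{use_case.name}"
    return f"remediate:{use_case.name}"
```

The only difference between the open-loop and closed-loop branches is who acts next: a subscribed user in the first case, the platform itself in the second.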

The visualization layer provides an interface for users to

query the data on demand. In addition, it provides various dash-

boards aggregating performance metrics such as the number

of alerts per day, closed-loop use cases’ execution status, and

the base stations’ health status. The visualization layer also

shows the status of the underlying layers and the health of

the AIOps platform. Furthermore, the layer allows the user to

schedule use cases, terminate them, or trigger a new execution.

“The visualization is key because, without that, it is hard

for the customer to understand. The customer sees great

value when the impact is visualized, which action has

been applied, what are the top oﬀenders and so forth”

— Interviewee T - Customer services engagement lead

6.2. AIOps usage with continuous deployment

We have identiﬁed six areas where AIOps can be used with

continuous deployment, which are: reducing the number of

support tickets, reducing the investigation time and eﬀort, re-

ducing the number of site visits, continuous monitoring, con-

tinuous adjustment of the CD zone, and additional automatic

data collection and correlation.

6.2.1. Reducing the number of support tickets

AIOps is used to reduce the number of support tickets received

from the customer. Using AIOps to predict, detect, and

correct issues without human involvement is seen as a critical

capability to support continuous deployment as it reduces the

cost per deployment, enables eﬃciencies in service delivery,

and ensures customer satisfaction. Thus, the case study com-

pany has implemented many closed-loop use cases that would

automatically predict operational issues before they happen and

ﬁx the underlying problems. In this case, the closed-loop use

cases provide preemption capabilities. If the closed-loop use

case detects an issue after it happens and applies a workaround

or correction on the node without human intervention, the use

case is considered reactive as it does not prevent the issue from

happening but rather reduces the impact of the issue.

“If we can solve 20% of the issues with closed-loop use

cases, it becomes extremely beneﬁcial” — Interviewee O

- Services systems principal developer

In addition to closed-loop use cases, the case company uses

a number of open-loop use cases. The main diﬀerence between

open and closed-loop is the degree of human involvement. Un-

like closed-loop automation, where humans are not in the loop,

in open-loop automation, an alert is sent to a service agent if an

issue has been detected or predicted. The human agent will re-

view the alert and signature of the issue and determine the next

set of actions.

6.2.2. Reducing the investigation time and eﬀort

The AIOps platform is used to shorten the investigation time

of issues while reducing human effort. With closed-loop

use cases, the actual investigation time is zero, and the

human eﬀort to investigate and conduct the corrective action

is also zero. In open-loop cases, the Mean Time To Detect

(MTTD) is relatively short, as the use case will alert once an

unwanted behavior is present. Thus, this eliminates the man-

ual eﬀort needed to detect unwanted behavior. In addition, the

alerts generated by the AIOps platform and sent to the service

engineers contain a set of pre-collected configuration parameters,

such as the node’s hardware type, its currently running software

version, type of radio, historical events on the node, and

any recent changes on the conﬁgurations. Furthermore, the alert

might contain links to knowledge articles with matching symp-

toms in the services knowledge management system. There-

fore, the alerts contain a prepared set of information that helps

the support engineer to have a heads up in the investigation as

relevant information is pre-collected and linked to the alert.
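
A minimal sketch of such an enriched alert is shown below. The field names, the inventory dictionary, and the keyword-based lookup of knowledge articles are assumptions made for illustration; the actual alert format and knowledge management system are not described in the study.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    node_id: str
    symptom: str
    context: dict = field(default_factory=dict)          # pre-collected node facts
    knowledge_links: list = field(default_factory=list)  # matching article IDs


def enrich(alert, inventory, knowledge_base):
    """Attach pre-collected node facts and matching knowledge articles.

    inventory: node_id -> dict of facts (hardware type, software version, ...)
    knowledge_base: symptom keyword -> article identifier (hypothetical)
    """
    alert.context = dict(inventory.get(alert.node_id, {}))
    alert.knowledge_links = [
        article for keyword, article in knowledge_base.items()
        if keyword in alert.symptom.lower()
    ]
    return alert
```

The point of the design is that the engineer receives the alert and its context together, rather than having to query several data sources after the alert arrives.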

“We design the use cases to contain as much information

as possible to help the receiver quickly understand where

and what the problem is. This requires us to correlate

between multiple data sources and represent the informa-

tion in a usable way” — Interviewee N - Services systems

Solution Architect

6.2.3. Reducing the number of site visits

RAN nodes are exposed to various environmental factors

such as rain, snow, and heat. Thus, the hardware might be

impacted and require replacement. While environmental issues

cannot be controlled, software-related faults or wrong

configurations might manifest as hardware issues in the system’s logs.

Therefore, before sending someone to the site to replace the

hardware, an extensive check on the site’s conﬁguration needs

to be run to identify any software issues or faulty changes in

the conﬁguration that might have been applied intentionally or

by mistake. This process is often tedious, requiring much manual

effort. Thus, the AIOps platform conducts extensive configuration

checks, compares the symptoms against known software faults that might

manifest as faulty hardware, and provides a report to the service

engineer. This allows the case study company to recover

the hardware using remote actions rather than sending someone

onsite to replace non-faulty hardware.
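
The check performed before a site visit can be sketched as a lookup against known fault signatures. The signature fields (`alarm`, `software`, `id`) and the exact-match rule are simplifying assumptions; the platform's real checks are described only at a high level in the study.

```python
def triage_hardware_alarm(alarm, node_config, known_faults):
    """Return ('remote', fault_id) if a known software fault explains the
    alarm on this node's release, otherwise ('site_visit', None)."""
    for fault in known_faults:
        same_alarm = fault["alarm"] == alarm
        same_release = fault["software"] == node_config.get("software")
        if same_alarm and same_release:
            return "remote", fault["id"]
    return "site_visit", None
```

A `remote` outcome means the apparent hardware issue matches a known software fault, so a remote action is tried before anyone is dispatched.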

“One of our main objectives is to reduce site visits. Last

year, we reduced our site visits by 32% to 33%. In this

case, we are helping both the customer and our unit by

increasing the eﬃciency of our Services” — Interviewee

T - Customer services engagement lead

While reducing site visits is an applicable objective for

any customer, regardless of whether they are continuous deployment

customers, the pace of configuration changes is much higher in

continuous deployment. Thus, this use case ensures that unnec-

essary site visits are avoided.

6.2.4. Continuous monitoring

Continuous monitoring is one of the main usage areas for

the AIOps platform. The AIOps platform is built to ensure

that RAN nodes’ data are collected as soon as they are pro-

duced. The data collection layer collects and sends the data

further to the data management layer, which ingests the data

and makes it readable. The intelligent layer then executes the

use cases against the new information to identify if there is any

KPI anomaly. In addition to performance data, conﬁguration

and alarm data are continuously collected and correlated with

the performance data.

Therefore, continuous monitoring is able to detect new soft-

ware changes automatically by correlating performance and

conﬁguration data. Once a change in the software version is

identiﬁed, continuous monitoring immediately starts to com-

pare the performance of the new software with the performance

of the previous one.

In order to perform this comparison, the AIOps platform

scans through hundreds of different KPIs, configurations, and

alarms continuously. Comparing KPIs manually is considered a

tedious and error-prone activity. Therefore, a machine

learning-based use case has been developed

in the AIOps platform that understands the traﬃc trends and

seasonality before the software change, then performs a be-

fore/after comparison automatically once a software change

is detected. This is performed by applying diﬀerent time se-

ries analysis and forecasting methods to traﬃc KPIs, such as

Long Short-Term Memory (LSTM) and Autoregressive Inte-

grated Moving Average (ARIMA).
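
To make the before/after comparison concrete, the sketch below uses a deliberately simple seasonal-average baseline in place of the LSTM or ARIMA models mentioned above. The function names, the threshold `k`, and the assumption that a higher KPI value is better are all illustrative choices, not the case study company's method. It also assumes the pre-change history covers a whole number of seasonal periods.

```python
from statistics import mean, stdev


def seasonal_baseline(history, period):
    """Average the pre-change KPI at each position in the seasonal cycle."""
    return [mean(history[i::period]) for i in range(period)]


def degraded(history, after, period, k=3.0):
    """Flag a regression if the mean post-change residual falls more than
    k standard deviations below the pre-change residuals.

    history: KPI samples before the software change (whole periods)
    after:   KPI samples after the change, continuing the same cycle
    """
    base = seasonal_baseline(history, period)
    pre = [x - base[i % period] for i, x in enumerate(history)]
    post = [x - base[i % period] for i, x in enumerate(after)]
    return mean(post) < mean(pre) - k * stdev(pre)
```

Removing the seasonal pattern first matters because traffic KPIs vary with time of day; comparing raw averages before and after a change would confuse that variation with a software effect.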

6.2.5. Continuous adjustment of the CD zone

To keep track of how much the CD zone represents the en-

tire network, the case study company has developed a use case

that continuously checks conﬁgurations that exist in the entire

network versus the ones in the CD zone. The result of the

comparison is a report providing a representation index, which

indicates the degree to which the CD zone mimics the entire network. The

use case scans both the hardware and software attributes, such

as the radio antennas and base-band processing unit types for

the hardware, in addition to active software features and con-

ﬁguration parameter values for the software. After that, the use

case creates clusters of conﬁgurations and compares the ones in

the CD zones versus the entire network. For this purpose, ma-

chine learning clustering algorithms such as hierarchical clus-

tering and K-means are used.
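
The study does not define how the representation index is calculated. One possible reading, sketched below, is the share of network nodes whose configuration group also occurs in the CD zone. For brevity the sketch groups nodes by exact attribute match instead of the K-means or hierarchical clustering named above, and the attribute names are hypothetical. It assumes a non-empty network.

```python
from collections import Counter


def representation_index(network_nodes, cd_zone_nodes, attributes):
    """Share of network nodes whose configuration group appears in the CD zone."""
    def key(node):
        # A node's group is the tuple of its selected attribute values.
        return tuple(node[a] for a in attributes)

    network_groups = Counter(key(n) for n in network_nodes)
    zone_groups = {key(n) for n in cd_zone_nodes}
    covered = sum(
        count for group, count in network_groups.items() if group in zone_groups
    )
    return covered / sum(network_groups.values())
```

An index near 1.0 would suggest the CD zone covers most configurations found in the network, while a falling index would signal that the zone needs adjustment.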

“The use case that identiﬁes major conﬁgurations in the

entire network and continuously indicates how represen-

tative the CD zone is, is, I would say, a critical use case.

We would have to spend a lot of time trying to ﬁnd this,

and we would not get into the same level of details as the

use case does” — Interviewee S - Customer DevOps Ex-

pert

6.2.6. Additional automatic data collection and correlation

Both eNodeBs and gNodeBs produce many types of data.

While some data are continuously collected, such as perfor-

mance, conﬁguration, and alarm data, detailed diagnostic data

are collected on demand. This is because the diagnostic data are

extensive in size; thus, their continuous collection impacts the

node’s performance. In addition, diagnostic data might contain

personal information; thus, appropriate approvals should be in

place before starting the collection. Therefore, the ability to collect

in-depth diagnostic data on demand, either manually or as part of

open-loop or closed-loop use cases, is considered important in the

architecture of the AIOps platform.

6.3. AIOps platform deployment scenarios

The case study company uses two deployment scenarios for

the AIOps platform. The ﬁrst deployment scenario is called

white box, which indicates that the system’s data management,

data intelligence, and visualization layers are hosted in the case

company’s network, while the data collection layer is hosted in

the customers’ network as illustrated in Figure 7. Therefore,

the raw data are transferred from the customer’s network to the

case study company’s network to be processed.

This solution has two ﬂavors: ﬁrst, the AIOps platform is lo-

cated in the same country or geography as the customer. In this

case, while the raw data leaves the customer’s network for the

case study company’s network, the data is still located in the same country or

geography as the customer. Due to legal limitations, some cus-

tomers wish to keep their data in the same country or geography

as the data origin.

Figure 7: White box deployment scenario

The second deployment scenario is called grey box, where the

raw data does not leave the customer’s network as depicted in

Figure 8. In this case, the AIOps platform, including the data

collection layer, is hosted in the customer’s network. The data

management layer interacts with an SMTP proxy at the case

study company over a secure VPN tunnel established between

the customer’s and the case study company’s network. This al-

lows email alerts to be forwarded to the support team, who can

then do some oﬄine analysis or request a remote manual con-

nection to the customer’s network to continue the investigation

if needed.

Figure 8: Grey box deployment scenario

7. AIOps challenges (RQ3)

This section presents the ﬁndings of the third research ques-

tion addressing the challenges when using AIOps to support

continuous deployment. During our study, we have identiﬁed

eight challenges when using AIOps to enable continuous de-

ployment: continuous data correlation, ecosystem data collec-

tion, multi-collection of the same data, digitalized information

ﬂow, creating trusted closed-loop use cases, alerts fatigue, es-

tablishing a two-way pipeline and services team mindset.

7.1. Continuous data correlation

The data generated by 4G and 5G RAN nodes are of different

types, come in many forms, and are generated by many

components within the node. Thus, the AIOps system needs

to have the capability to correlate diﬀerent data types to cre-

ate meaningful results. For example, in an open-loop anomaly

detection use case, if performance degradation is identiﬁed in

one node, the AIOps system needs to correlate the performance

data with other data types, such as alarms, conﬁguration, and

system events, before generating an alert. However, the chal-

lenge is that troubleshooting data, such as the system’s internal

constants and logs, changes often and is not backward com-

patible. Thus, correlating data involving internal logs becomes

diﬃcult as these data mean diﬀerent things in each release.

“System’s constants are challenging. Their values change

regularly, and new constants are added or removed. We

always need to update our use cases to make sure we cor-

relate data with the correct identiﬁers and values” — In-

terviewee N - Services systems solution architect
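
One way to cope with constants that change meaning between releases is to keep a separate lookup table per release and decode each record with the table of the release that produced it. The release labels, codes, and meanings below are invented for illustration only.

```python
CONSTANTS_BY_RELEASE = {
    # Hypothetical values: the same internal code can mean different
    # things in different releases.
    "R1": {17: "cell_setup_failure"},
    "R2": {17: "license_expired", 18: "cell_setup_failure"},
}


def decode_event(release, code):
    """Translate an internal constant using the table of the node's own release."""
    table = CONSTANTS_BY_RELEASE.get(release)
    if table is None:
        raise KeyError(f"no constant table for release {release!r}")
    return table.get(code, "unknown")
```

Failing loudly on an unknown release, rather than falling back to another release's table, avoids silently correlating data with the wrong identifiers.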

7.2. Ecosystem data collection

Data collection from the RAN nodes alone is insuﬃcient for

detailed root cause analysis and fault isolation of issues that

require end-to-end tracing and troubleshooting. For example,

detailed network end-to-end troubleshooting is needed if degra-

dation is observed in mobile connections’ access to the base sta-

tion. This involves data collection and analysis from the RAN

nodes, the core network, and the mobile equipment. Thus, for a

speedy failure cause identiﬁcation, data must be collected from

the entire mobile network ecosystem, including mobile equip-

ment and core network, not only the RAN.

7.3. Multi-collection of the same data

In many customer networks, there are often existing collec-

tions of some RAN data that overlap with the data the AIOps

system would need. An example cited in one of the meetings

is the performance data which are collected by customers to

support network planning, business, and operations. Therefore,

customers have existing data collectors which connect to the

network elements and do the collections. However, these collectors

cannot be used for the AIOps system for three reasons.

First, they lack the possibility to establish a two-way pipeline,

which means that they can collect data only without the abil-

ity to interact with the nodes in the opposite direction. Second,

they are also slow in collecting data which impacts the possi-

bility of doing timely observations, especially with continuous

monitoring. Third, connecting 3rd party data collectors to the

AIOps platform takes time and eﬀort, and it is not a feasible

option since every customer has customized collectors.

Several data collectors extracting the same data from the

nodes impact the bandwidth of the O&M interface. Thus, to

address the issue in the short term, the case study company as-

sesses the O&M bandwidth with the customer when the sys-

tem is installed to ensure that the O&M interface does not be-

come congested. Several interviewees highlighted that this is

not enough for the long term.
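
The kind of bandwidth assessment mentioned above can be approximated with simple arithmetic. The sketch below estimates the average share of the O&M link consumed by several collectors; the payload sizes, intervals, and link capacity in the usage note are invented numbers, and averaging ignores bursts, so this is a first-order estimate only.

```python
def om_link_utilization(collectors, link_kbps):
    """Average fraction of the O&M link used when all collectors pull on schedule.

    collectors: list of (payload_kilobytes, interval_seconds) tuples
    link_kbps:  link capacity in kilobits per second
    """
    # kilobytes * 8 = kilobits; dividing by the interval gives kilobits per second
    demand_kbps = sum(kb * 8 / interval for kb, interval in collectors)
    return demand_kbps / link_kbps
```

For example, two collectors pulling 900 kB and 450 kB every 900 seconds over a 100 kbit/s link use about 12% of the link on average, since 8 kbit/s plus 4 kbit/s is 12 kbit/s.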

7.4. Digitized information ﬂow

With the continuous ﬂow of new features, corrections, and

software improvements, existing use cases need to be updated.

In addition, new use cases need to be created to cater to new

functionality. This would require a considerable amount of

work from the services team. In several meetings, the service

team members highlighted the diﬃculty of understanding the

content of the new software. Although the software is released

every two weeks, it still contains hundreds of changes and sev-

eral new features. The list of changes and new features becomes

known a few days before the release of the software, thus giv-

ing the service team little time to understand the consequences

of the new release on the AIOps system and adjust running use

cases.

Therefore, digitizing the information flow from when a software

change is triggered in the code repository until the software

is released is needed to enable the continuous deployment

service team to update use cases on time and with the least

effort. To achieve that, the code change commit shall contain

enough information describing what has changed and the ex-

pected impact. The information shall be aggregated and pre-

sented in machine-readable format; thus, it becomes possible

for the AIOps platform to read these changes and possibly ad-

just existing use cases accordingly. In several discussions, it

was highlighted that information ﬂow is mainly manual, where

developers highlight changes they think will cause an impact on

the system to the release program. The release program ensures

that these changes are accounted for in continuous integration

and delivery, and their impact is documented in the release doc-

umentation. However, this process is subject to human mistakes

where developers forget to communicate changes on time.
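
A machine-readable change record could look like the JSON consumed in the sketch below, which maps each use case to the changes that touch data it depends on. The record schema (`id`, `affects`) and the dependency identifiers are assumptions for illustration; the study states the need for such a format but does not specify one.

```python
import json


def impacted_use_cases(change_records_json, use_cases):
    """Map each use case to the change IDs that touch data it depends on.

    change_records_json: JSON list of {"id": str, "affects": [str, ...]}
    use_cases: use case name -> list of data identifiers it reads
    """
    changes = json.loads(change_records_json)
    impacted = {}
    for name, depends_on in use_cases.items():
        hits = [c["id"] for c in changes if set(c["affects"]) & set(depends_on)]
        if hits:
            impacted[name] = hits
    return impacted
```

With records like these aggregated per release, the service team could see which running use cases need review without reading every change description by hand.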

“One of the major challenges is that the software includes

many changes that impact customers’ networks. Some

of these changes are described in release documentation,

while many others do not get visibility to be documented”

— Interviewee H - Customer support team lead

7.5. Trusted closed-loop use cases

Creating closed-loop use cases is challenging as they need

to be trustworthy with a proven track record of successful exe-

cution. In addition, these use cases should work with diﬀerent

customers’ configurations. Thus, customers are often not confident

enough to allow closed-loop use cases to be executed in their network

before making sure that they work as they should at all

times. As indicated by several interviewees, if a closed-loop

automation use case does not work as it should, the customer

would perceive it negatively.

“Customers are often not conﬁdent enough to allow

closed-loop use cases. Since it means tweaking the net-

work elements by, for example, resetting the radios and

changing the conﬁguration. With these actions, there are

always risk factors” — Interviewee S - Customer DevOps

expert

7.6. Alerts fatigue

Open-loop use cases might generate a flood of alerts. This

creates “alert fatigue” for the services team. To reduce the

number of false positives, a lot of work is needed to qualify

the alerts and label them. However, as the baseline changes

frequently with new software levels, new conﬁguration param-

eters, and new features, this makes it more diﬃcult.
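
One common mitigation, sketched here as an assumption and not as the case study company's approach, is to collapse alerts that share a use case and symptom into a single summary listing the affected nodes. The alert fields are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts):
    """Collapse alerts that share a use case and symptom into one summary each."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert["use_case"], alert["symptom"])].append(alert["node"])
    return [
        {"use_case": uc, "symptom": s, "nodes": sorted(set(nodes)), "count": len(nodes)}
        for (uc, s), nodes in groups.items()
    ]
```

Grouping reduces the number of items to review but does not address false positives, which still require the qualification and labeling work described above.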

“If an open loop use case runs several times a day, gen-

erating many alerts with each run, then it means a huge

amount of alerts a day to be investigated. So, we try to

keep the alerts informative and easy to digest ... However,

as the software change quite frequently, we have to ad-

just the use case logic continuously” — Interviewee N -

Services systems solution architect

7.7. Establishing a two-way pipeline

Establishing a two-way pipeline requires the AIOps system

to access the nodes to send customized commands, giving the

system higher control over the nodes. Customers often want

to keep this to the absolute minimum, as access to the nodes

poses a security risk. Customers often have a thorough process

to evaluate who would need access, when and for how long.

Providing functional user access to the AIOps system requires the

customer’s security team to be involved and to assess the AIOps

platform from a security perspective. Aspects such as whether the

platform poses security threats, or whether someone could use it as

a proxy to gain access to the nodes, need to be

checked and verified.

Thus, to interact with the nodes in the downstream direction

(i.e., from the AIOps system to the nodes), detailed planning is

needed: what the system will trigger, how the system will be

identified, and how security credentials are exchanged are

discussed with the security team continuously. This slows down

the introduction of new use cases that involve new actions or

commands requiring security approval.

7.8. Service team mindset

Developing use cases in the AIOps platform requires the ser-

vice team to think proactively and preemptively and create use

cases that would address future scenarios. This is a change from

service engineers’ previous way of working, in which they

investigated issues after they happened, prompted by customer-initiated tickets.

With AIOps, the service team needs to anticipate how things

could go wrong and prepare suitable use cases for that. Several

interviewees have highlighted that such a mindset change is

not easy for many service engineers who spent years working

in a reactive fashion. In addition, these engineers also need to

evolve their programming and data science knowledge when

working with the AIOps system, which requires them to invest

in learning new competences in addition to their domain knowl-

edge.

8. Discussion

In this section, we discuss the empirical ﬁndings from the

case study company, which are structured in three sections ad-

dressing each research question.

8.1. Continuous deployment impact on product-related services

Continuous deployment impacts product-related services in

three ways. First, in how these services are conducted. With-

out continuous deployment, services are delivered by indepen-

dent projects and executed in sequence. However, introduc-

ing continuous deployment results in service convergence,

as the boundaries between diﬀerent services become blurry.

When the software release cycle is long, there is enough time

for services to be conducted in sequence. However, with con-

tinuous deployment, there is little time to do activities in se-

quence. Thus, service activities need to be done in parallel.

Therefore, services that have been traditionally separated, in-

dependent, and conducted by dedicated teams or organizations

converge. This is similar to the impact of agile software devel-

opment practices on how software development activities are

performed. Software development activities used to be con-

ducted in sequence by diﬀerent teams following a waterfall ap-

proach. However, in an agile software development context,

software development activities are conducted in parallel by the

same team. In the same agile sprint, the team performs activ-

ities such as coding, testing, and releasing. In addition, agile

teams are composed of diﬀerent competencies, such as devel-

opment, testing, and integration [73].

Second, services that used to be conducted in a reactive way

transition to becoming proactive and preemptive with con-

tinuous deployment. While services can be proactive and pre-

emptive without continuous deployment, as in the case of data-

driven product service systems [45], empirical results from the

case study company show that continuous deployment needs

proactive and preemptive services. Providing proactive and pre-

emptive services requires new ways of working and technical

capabilities.

Regarding the ways of working, the service team shall work

closely on two fronts: the R&D organization and the

customers’ operation team. On the R&D front, changes in the

product shall be communicated to the services team. This re-

quires strong synchronization between the service and R&D or-

ganizations, especially with information ﬂow. This is achieved

by having members of the R&D’s release program in the ser-

vices team. On the customer front, the service team holds a

daily meeting with the customer’s operations team. In addi-

tion, the service team works proactively to identify features

suitable for the customer conﬁguration and suggest them to the

customer. In addition, the service team continuously monitors

the customer’s network to evaluate the value of new features

and identify any software-related issue in the customer network.

While the emphasis on collaboration has been highlighted ex-

tensively in the DevOps context, the focus has been on devel-

opment and operation teams [12,74]. However, in software-

intensive embedded systems, the customer is the one operating

it [75], while services are often the touch point between the cus-

tomers and product suppliers [76]. Therefore, services play a

crucial role in bridging between the R&D organization and cus-

tomers. This study reveals that continuous deployment requires

strong collaboration between three entities: the R&D organi-

zation representing development, the customer representing the

operation, and the service organization being the connector

between development and operation.

Furthermore, the service organization needs the technical

capabilities to enable the continuous deployment service team

to work proactively and preemptively. Continuous monitoring

provides the means to identify issues as soon as they happen

and thus gives the teams the visibility needed to work proac-

tively. Additionally, with the availability of historical data, it is

possible to move further to provide preemptive support where

issues are predicted and a workaround is applied before they

happen. To achieve that, the capability to perform continuous

data collection and interact with the systems in the opposite di-

rection is needed. Thus, a reliable two-way pipeline needs to

be established. In addition, the AIOps platform has a leading

role in enabling services activities to be conducted proactively

and preemptively utilizing the massive amount of collected data

and the ability to interact with the systems.

Third, continuous deployment comes with new activities,

such as establishing and maintaining the data pipeline, con-

tinuous monitoring, and adjustment of the CD zone. While

continuous monitoring has been highlighted as an important ac-

tivity in the context of DevOps, and Continuous Software En-

gineering [77,28], the continuous adjustment of the CD zone

is a new aspect that needs to be considered in the context of

software-intensive embedded systems. As these systems have

many instances in the ﬁeld, with often diﬀerent hardware com-

positions and customized conﬁgurations, it is important to con-

tinuously adjust the deployment targets to factor in the changes

in conﬁgurations and newly introduced hardware.

8.2. AIOps and continuous deployment

With the convergence of services, the need to be more proac-

tive and preemptive, and reduced operational time of new soft-

ware releases, services need to be conducted much faster and

more intelligently than before. In a traditional case, product-

related services are often manually delivered; however, deliver-

ing services by relying on human force only becomes challeng-

ing. As depicted by El Sawy et al. [78], the required level

of customer support rises exponentially with the increase of

complexity, connectivity, and criticality of the product. From

this perspective, software-intensive embedded systems are of-

ten critical systems with high availability and reliability require-

ments, and with the advent of the Internet of Things (IoT) they

are often connected. Further, the embedded software complex-

ity is increasing rapidly with new features and generations of

the product, for example, 5G is more complex than 4G, and 6G

is expected to be even more complex [79]. In addition, continuous

deployment further increases the complexity of the software

as a consequence of increased complexity in feature

interactions [57]. Therefore, delivering services by relying on

human force only becomes challenging. Thus, as depicted in

Figure 9, AIOps becomes critical to break the complexity curve

while delivering faster and more intelligent services.

Figure 9: Breaking the eﬀorts vs. complexity curve

Further, software-intensive systems are becoming more

intelligent. These systems can perform tasks such as

self-configuration and self-optimization, for example. In mobile

networks, a significant focus is dedicated to

self-organizing network (SON) features [80,81]. However, while

the system’s self-service capabilities are increasing, AIOps is

used to support several aspects. In this study, we identified

six usage scenarios where AIOps is used to support continu-

ous deployment: reduce the number of support tickets, reduce

investigation time and eﬀort, reduce the number of site visits,

continuous monitoring, continuous adjustment of the CD zone

and ﬁnally, additional automatic data collection and correlation.

Therefore, while the product’s self-service capabilities are in-

creasing, AIOps can be seen as ﬁlling a considerable space in

the overall service scope of the product, as depicted in Figure

10.

Figure 10: System’s self-service capabilities, product-related services, and AIOps

In addition, the AIOps platform must be flexible enough to be deployed in different scenarios, for example, within the customer's network when data sharing is not allowed for legal reasons or because of customer preferences. In the case study company, this is achieved through two deployment scenarios, white box and grey box. While having two deployment scenarios increases the effort needed to develop, operate, and maintain the platform, it allows the case study company to provide customized service solutions that fulfill different customers' needs. In this context, customized service solutions increase perceived service quality, customer satisfaction, customer trust, and customer loyalty [82]. This shows that using AIOps to support continuous deployment should take customers' customization needs into account when delivering product-related services.

8.3. AIOps challenges

In this study, we have identified several challenges related to AIOps. Continuous data correlation and ecosystem data collection are two of them, as the AIOps platform should be able to collect, process, and correlate different data types not only from the nodes where the software is being deployed continuously but also from the ecosystem surrounding them. In addition, multi-data collection and the establishment of a two-way pipeline are closely related challenges. As we described in our earlier research, data generated by RAN nodes have different dimensions [83], which makes correlating the data to produce meaningful results a challenging task, especially when the content of the data changes with each new software release. Furthermore, collecting the same data multiple times might not be practical, especially when the bandwidth of the data pipeline is limited. There seems to be no common solution that addresses these challenges. However, one possible way forward is to provide customized solutions that address each customer's specific situation.
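As an illustration of the repeated-collection concern, the following minimal sketch shows one way several use cases could share a single transfer of the same data. It is not the case study company's implementation: the class name, the `fetch` callable, and the choice of key (node, data type, software release) are all assumptions made for this example. The release is part of the key because the content of the data may change with a new software release.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (node id, data type, software release).
CacheKey = Tuple[str, str, str]


@dataclass
class CollectionCache:
    """Collect each (node, data type, release) once and reuse the result."""

    # Hypothetical function that transfers data over the bandwidth-limited pipeline.
    fetch: Callable[[str, str], bytes]
    _store: Dict[CacheKey, bytes] = field(default_factory=dict)
    transfers: int = 0  # number of transfers actually performed

    def get(self, node: str, data_type: str, release: str) -> bytes:
        key = (node, data_type, release)
        if key not in self._store:
            # Only the first request for this key uses the pipeline.
            self._store[key] = self.fetch(node, data_type)
            self.transfers += 1
        return self._store[key]
```

With such a cache, two use cases asking for the same counters from the same node on the same release would trigger one transfer, while a new release would trigger a fresh collection.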

Another challenge identified in our study is the establishment of a digitalized information flow in which the impacts of code changes introduced by the R&D development teams are quickly propagated to the service organization. With continuous deployment, there is little time for the service team to update the use cases based on the new software content. Thus, parity has to be established between the release content and the AIOps platform logic, i.e., service readiness has to be achieved with the software release, not after it, which makes a digitalized information flow important. However, this is challenging, especially in large organizations where the software has many components and is built from the contributions of thousands of developers.
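The parity requirement can be pictured as a simple set comparison. The sketch below is only an illustration under stated assumptions: the function name, the idea of a machine-readable list of changed features per release, and the feature and use case names in the usage note are hypothetical and do not come from the case study.

```python
from typing import Dict, Set


def service_readiness_gaps(
    release_changes: Set[str],
    use_case_coverage: Dict[str, Set[str]],
) -> Set[str]:
    """Return changed features that no AIOps use case has been updated for.

    release_changes: features touched by the release (assumed to be
        published by R&D in machine-readable form).
    use_case_coverage: maps each AIOps use case to the features its
        logic has been updated to handle.
    """
    covered: Set[str] = set().union(*use_case_coverage.values())
    return release_changes - covered
```

For example, if a release changes `{"handover", "carrier-aggregation"}` and the only updated use case covers `{"handover"}`, the function returns `{"carrier-aggregation"}`. Under this reading, a release is service-ready only when the returned set is empty.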

In addition, the service teams need to continuously chase alerts generated by open-loop use cases. Thus, keeping the AIOps use cases up to date, reducing the number of false-positive alerts, and, at the same time, increasing the trustworthiness of closed-loop use cases are major challenges for the case study company.
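One way to make the link between false-positive alerts and trust in closed-loop use cases concrete is to track, per use case, how many alerts engineers confirmed. The sketch below is a hypothetical illustration only: the class, the function, and both thresholds are assumptions chosen for the example, not values or mechanisms reported by the case study company.

```python
from dataclasses import dataclass


@dataclass
class UseCaseStats:
    """Running alert counts for one open-loop use case."""

    alerts: int = 0
    false_positives: int = 0

    def record(self, confirmed: bool) -> None:
        """Register one alert and whether an engineer confirmed it."""
        self.alerts += 1
        if not confirmed:
            self.false_positives += 1

    @property
    def false_positive_rate(self) -> float:
        return self.false_positives / self.alerts if self.alerts else 0.0


def eligible_for_closed_loop(
    stats: UseCaseStats,
    min_alerts: int = 50,       # hypothetical minimum evidence
    max_fp_rate: float = 0.05,  # hypothetical tolerated false-positive rate
) -> bool:
    """Suggest closed-loop operation only after enough confirmed alerts."""
    return stats.alerts >= min_alerts and stats.false_positive_rate <= max_fp_rate
```

The design choice here is that a use case has to earn closed-loop status through its open-loop track record, so stale use case logic that starts producing false positives would lose eligibility again.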

Changing the mindset of the service team to adopt a proactive and preemptive way of working is another challenge. The social challenges associated with adopting continuous deployment have been highlighted in multiple studies, such as [55,84]. However, these studies have primarily focused on the social challenges from the software developers' point of view. In this study, we found that embracing AIOps is also challenging for the service teams, as it requires a mindset change from a reactive way of working to a proactive and preemptive one.

9. Conclusion

Continuous deployment profoundly impacts the product-related services offered by software-intensive embedded systems companies. In this study, we have explored how continuous deployment impacts these services. Our results show that services transition from being conducted in a waterfall approach to becoming continuous and conducted in parallel. In addition, the service organization needs to work in a proactive and preemptive way while maintaining a close relationship with the customer. Further, the service organization acts as a proxy connecting the R&D organization and the customer.

Furthermore, to support continuous deployment, an AIOps platform is used to automate manual activities, thus reducing the cost per software deployment. In addition, the AIOps platform is used to perform continuous monitoring, to track how well the continuous deployment zone represents the entire network, and to collect diagnostic data on demand. The AIOps platform has a layered architecture and supports different deployment scenarios to suit different customer preferences and legal limitations on data collection.

Finally, using AIOps with continuous deployment comes with many challenges. In this paper, we have identified several challenges related to AIOps, such as continuous data correlation and ecosystem data collection, the establishment of two-way data pipelines, and the mindset of the service team.

10. Acknowledgment

We would like to express our gratitude to everyone at the case study company who contributed valuable input and insights.

References

[1] M. V. Stringfellow, N. G. Leveson, B. D. Owens, Safety-driven design for software-intensive aerospace and automotive systems, Proceedings of the IEEE 98 (4) (2010) 515–525.

[2] M. Broy, The ‘grand challenge’ in informatics: engineering software-intensive systems, Computer 39 (10) (2006) 72–80.

[3] M. Andreessen, Why software is eating the world, Wall Street Journal 20 (2011) (2011) C2.

[4] J. Bosch, H. H. Olsson, Digital for real: A multicase study on the digital transformation of companies in the embedded systems domain, Journal of Software: Evolution and Process 33 (5) (2021) e2333.

[5] H. H. Olsson, J. Bosch, Climbing the “stairway to heaven”: evolving from agile development to continuous deployment of software, in: Continuous software engineering, Springer, 2014, pp. 15–27.

[6] H. H. Olsson, H. Alahyari, J. Bosch, Climbing the “stairway to heaven” – a multiple-case study exploring barriers in the transition from agile development towards continuous deployment of software, in: 2012 38th Euromicro Conference on Software Engineering and Advanced Applications, IEEE, 2012, pp. 392–399.

[7] T. Savor, M. Douglas, M. Gentili, L. Williams, K. Beck, M. Stumm, Continuous deployment at facebook and oanda, in: 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C), IEEE, 2016, pp. 21–30.

[8] L. Riungu-Kalliosaari, S. Mäkinen, L. E. Lwakatare, J. Tiihonen, T. Männistö, Devops adoption benefits and challenges in practice: a case study, in: International conference on product-focused software process improvement, Springer, 2016, pp. 590–597.

[9] D. G. Feitelson, E. Frachtenberg, K. L. Beck, Development and deployment at facebook, IEEE Internet Computing 17 (4) (2013) 8–17.

[10] C. Parnin, E. Helms, C. Atlee, H. Boughton, M. Ghattas, A. Glover, J. Holman, J. Micco, B. Murphy, T. Savor, et al., The top 10 adages in continuous deployment, IEEE Software 34 (3) (2017) 86–95.

[11] F. M. Erich, C. Amrit, M. Daneva, A qualitative study of devops usage in practice, Journal of Software: Evolution and Process 29 (6) (2017) e1885.

[12] L. E. Lwakatare, P. Kuvaja, M. Oivo, Dimensions of devops, in: International conference on agile software development, Springer, 2015, pp. 212–217.

[13] P. Rodríguez, A. Haghighatkhah, L. E. Lwakatare, S. Teppola, T. Suomalainen, J. Eskeli, T. Karvonen, P. Kuvaja, J. M. Verner, M. Oivo, Continuous deployment of software intensive products and services: A systematic mapping study, Journal of Systems and Software 123 (2017) 263–291.

[14] G. J. Chen, J. L. Wiener, S. Iyer, A. Jaiswal, R. Lei, N. Simha, W. Wang, K. Wilfong, T. Williamson, S. Yilmaz, Realtime data processing at facebook, in: Proceedings of the 2016 International Conference on Management of Data, 2016, pp. 1087–1098.

[15] L. Abraham, J. Allen, O. Barykin, V. Borkar, B. Chopra, C. Gerea, D. Merl, J. Metzler, D. Reiss, S. Subramanian, et al., Scuba: diving into data at facebook, Proceedings of the VLDB Endowment 6 (11) (2013) 1057–1067.

[16] A. Masood, A. Hashmi, Aiops: Predictive analytics & machine learning in operations, in: Cognitive Computing Recipes, Springer, 2019, pp. 359–382.

[17] Y. Dang, Q. Lin, P. Huang, Aiops: real-world challenges and research innovations, in: 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), IEEE, 2019, pp. 4–5.

[18] A. Dakkak, D. Issa Mattos, J. Bosch, Success factors when transitioning to continuous deployment in software-intensive embedded systems, in: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), IEEE, 2021, pp. 129–137.

[19] S. G. Yaman, T. Sauvola, L. Riungu-Kalliosaari, L. Hokkanen, P. Kuvaja, M. Oivo, T. Männistö, Customer involvement in continuous deployment: a systematic literature review, in: International Working Conference on Requirements Engineering: Foundation for Software Quality, Springer, 2016, pp. 249–265.

[20] K. Beetz, W. Böhm, Challenges in engineering for software-intensive embedded systems, in: Model-Based Engineering of Embedded Systems, Springer, 2012, pp. 3–14.

[21] S. Fischer, R. Ramler, C. Klammer, R. Rabiser, Testing of highly configurable cyber-physical systems – a multiple case study, in: 15th International Working Conference on Variability Modelling of Software-Intensive Systems, 2021, pp. 1–10.

[22] J. Bosch, R. Capilla, Dynamic variability in software-intensive embedded system families, Computer 45 (10) (2012) 28–35.

[23] S. Shokouhyar, S. Shokoohyar, S. Safari, Research on the influence of after-sales service quality factors on customer satisfaction, Journal of Retailing and Consumer Services 56 (2020) 102139.

[24] V. Bindroo, B. J. Mariadoss, R. Echambadi, K. R. Sarangee, Customer satisfaction with consumption systems, Journal of Business-to-Business Marketing 27 (1) (2020) 1–17.

[25] M. Szwejczewski, K. Goffin, Z. Anagnostopoulos, Product service systems, after-sales service and new product development, International Journal of Production Research 53 (17) (2015) 5334–5353.

[26] H. Gebauer, S. Joncourt, C. Saul, Services in product-oriented companies: past, present, and future, Universia Business Review (49) (2016) 32–53.

[27] M. Wilson, K. Wnuk, L. Bengtsson, Business model flexibility and software-intensive companies: Opportunities and challenges, e-Informatica Software Engineering Journal 15 (1) (2021).

[28] B. Fitzgerald, K.-J. Stol, Continuous software engineering: A roadmap and agenda, Journal of Systems and Software 123 (2017) 176–189.

[29] A. Dakkak, A. R. Munappy, J. Bosch, H. H. Olsson, Customer support in the era of continuous deployment: A software-intensive embedded systems case study, in: 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), IEEE, 2022, pp. 914–923.

[30] P. Liggesmeyer, M. Trapp, Trends in embedded software engineering, IEEE Software 26 (3) (2009) 19–25.

[31] C. Ebert, C. Jones, Embedded software: Facts, figures, and future, Computer 42 (4) (2009) 42–52.

[32] R. N. Charette, This car runs on code, IEEE Spectrum 46 (3) (2009) 3.

[33] N. Lakemond, G. Holmberg, A. Pettersson, Digital transformation in complex systems, IEEE Transactions on Engineering Management (2021).

[34] M. Müllerburg, Software intensive embedded systems, Information and Software Technology 41 (14) (1999) 979–984.

[35] J. Bosch, Continuous software engineering: An introduction, in: Continuous software engineering, Springer, 2014, pp. 3–13.

[36] A. R. Tan, Service-oriented product development strategies, Danmarks Tekniske Universitet. DTU Management, DTU Management Engineering (2010).

[37] M. J. Goedkoop, C. J. Van Halen, H. R. Te Riele, P. J. Rommens, et al., Product service systems, ecological and economic basics, Report for Dutch Ministries of environment (VROM) and economic affairs (EZ) 36 (1) (1999) 1–122.

[38] M. Salwin, A. Kraslawski, State-of-the-art in product-service system classification, Design, Simulation, Manufacturing: The Innovation Exchange (2020) 187–200.

[39] M. Boehm, O. Thomas, Looking beyond the rim of one’s teacup: a multidisciplinary literature review of product-service systems in information systems, business management, and engineering & design, Journal of Cleaner Production 51 (2013) 245–260.

[40] M. G. de Oliveira, G. H. de Sousa Mendes, A. A. de Albuquerque, H. Rozenfeld, Lessons learned from a successful industrial product service system business model: emphasis on financial aspects, Journal of Business & Industrial Marketing (2018).

[41] A. Annarelli, C. Battistella, F. Nonino, Product service system: A conceptual framework from a systematic review, Journal of Cleaner Production 139 (2016) 1011–1032.

[42] G. H. Mendes, M. G. Oliveira, H. Rozenfeld, C. A. N. Marques, J. M. H. Costa, et al., Product-service system (pss) design process methodologies: a systematic literature review, in: DS 80-7 Proceedings of the 20th International Conference on Engineering Design (ICED 15) Vol 7: Product Modularisation, Product Architecture, Systems Engineering, Product Service Systems, Milan, Italy, 27-30.07.15, 2015, pp. 291–300.

[43] T. A. Tran, J. Y. Park, Development of integrated design methodology for various types of product-service systems, Journal of Computational Design and Engineering 1 (1) (2014) 37–47.

[44] S. Chowdhury, D. Haftor, N. Pashkevich, Smart product-service systems (smart pss) in industrial firms: a literature review, Procedia CIRP 73 (2018) 26–31.

[45] M. Zambetti, F. Adrodegari, G. Pezzotta, R. Pinto, M. Rapaccini, C. Barbieri, From data to value: Conceptualising data-driven product service system, Production Planning & Control (2021) 1–17.

[46] C. E. McPhail, Respond, restore, resolve: Achieving 7-nines availability telecommunications systems in the field, Bell Labs Technical Journal 11 (3) (2006) 173–189.

[47] P. J. Windley, Delivering high availability services using a multi-tiered support model, Windley’s Technometria 16 (2002) 1–9.

[48] N. Ramasubbu, S. Mithas, M. S. Krishnan, High tech, high touch: The effect of employee skills and customer heterogeneity on customer satisfaction with enterprise system support services, Decision Support Systems 44 (2) (2008) 509–523.

[49] M. Prior, “You want to do what?” breaking the rules to increase customer satisfaction, in: 2011 Agile Conference, IEEE, 2011, pp. 269–273.

[50] V. Mathieu, Product services: from a service supporting the product to a service supporting the client, Journal of Business & Industrial Marketing (2001).

[51] V. Parida, D. R. Sjödin, J. Wincent, M. Kohtamäki, A survey study of the transitioning towards high-value industrial product-services, Procedia CIRP 16 (2014) 176–180.

[52] D. Stahl, T. Martensson, J. Bosch, Continuous practices and devops: beyond the buzz, what does it all mean?, in: 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA), IEEE, 2017, pp. 440–448.

[53] I. Gerostathopoulos, M. Konersmann, S. Krusche, D. I. Mattos, J. Bosch, T. Bures, B. Fitzgerald, M. Goedicke, H. Muccini, H. H. Olsson, et al., Continuous data-driven software engineering-towards a research agenda: Report on the joint 5th international workshop on rapid continuous software engineering (rcose 2019) and 1st international works, ACM SIGSOFT Software Engineering Notes 44 (3) (2019) 60–64.

[54] T. E. Cardoso, A. R. Santos, R. Chanin, A. Sales, Communication of changes in continuous software development, in: International Conference on Software Business, Springer, 2020, pp. 86–101.

[55] G. G. Claps, R. B. Svensson, A. Aurum, On the journey to continuous deployment: Technical and social challenges along the way, Information and Software Technology 57 (2015) 21–31.

[56] B. Fitzgerald, K.-J. Stol, Continuous software engineering and beyond: trends and challenges, in: Proceedings of the 1st International Workshop on Rapid Continuous Software Engineering, 2014, pp. 1–9.

[57] A. Fabijan, H. H. Olsson, J. Bosch, Time to say ‘good bye’: Feature lifecycle, in: 2016 42th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), IEEE, 2016, pp. 9–16.

[58] L. Rijal, R. Colomo-Palacios, M. Sánchez-Gordón, Aiops: A multivocal literature review, Artificial Intelligence for Cloud and Edge Computing (2022) 31–50.

[59] H. Su, Q. He, B. Guo, Kpi anomaly detection method for data center aiops based on gru-gan, in: 2021 10th International Conference on Internet Computing for Science and Engineering, 2021, pp. 23–29.

[60] S. Nedelkoski, J. Cardoso, O. Kao, Anomaly detection and classification using distributed tracing and deep learning, in: 2019 19th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID), IEEE, 2019, pp. 241–250.

[61] Y. Zhang, Z. Guan, H. Qian, L. Xu, H. Liu, Q. Wen, L. Sun, J. Jiang, L. Fan, M. Ke, Cloudrca: A root cause analysis framework for cloud computing platforms, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 4373–4382.

[62] R. Harper, P. Tee, Cookbook, a recipe for fault localization, in: NOMS 2018-2018 IEEE/IFIP Network Operations and Management Symposium, IEEE, 2018, pp. 1–6.

[63] P. Naik, C. Govindarajan, S. Goel, K. Govindarajan, D. Behl, A. Singh, M. Thomas, U. Mangla, P. Jayachandran, Closed-loop automation for 5g slice assurance, in: 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), IEEE, 2022, pp. 424–426.

[64] P. Notaro, J. Cardoso, M. Gerndt, A systematic mapping study in aiops, in: International Conference on Service-Oriented Computing, Springer, 2020, pp. 110–123.

[65] P. Notaro, J. Cardoso, M. Gerndt, A survey of aiops methods for failure management, ACM Transactions on Intelligent Systems and Technology (TIST) 12 (6) (2021) 1–45.

[66] C. B. Seaman, Qualitative methods in empirical studies of software engineering, IEEE Transactions on Software Engineering 25 (4) (1999) 557–572.

[67] C. Wohlin, M. Höst, K. Henningsson, Empirical research methods in software engineering, in: Empirical methods and studies in software engineering, Springer, 2003, pp. 7–23.

[68] R. K. Yin, Case study research and applications: Design and methods, Sage publications, 2017.

[69] P. Runeson, M. Höst, Guidelines for conducting and reporting case study research in software engineering, Empirical Software Engineering 14 (2) (2009) 131.

[70] V. Braun, V. Clarke, Using thematic analysis in psychology, Qualitative Research in Psychology 3 (2) (2006) 77–101.

[71] K. Petersen, C. Gencel, Worldviews, research methods, and their relationship to validity in empirical software engineering research, in: 2013 joint conference of the 23rd international workshop on software measurement and the 8th international conference on software process and product measurement, IEEE, 2013, pp. 81–89.

[72] A. El Rhayour, T. Mazri, 5g architecture: Deployment scenarios and options, in: 2019 International Symposium on Advanced Electrical and Communication Technologies (ISAECT), IEEE, 2019, pp. 1–6.

[73] S. Kim, H. Lee, Y. Kwon, M. Yu, H. Jo, Our journey to becoming agile: Experiences with agile transformation in samsung electronics, in: 2016 23rd Asia-Pacific Software Engineering Conference (APSEC), IEEE, 2016, pp. 377–380.

[74] L. E. Lwakatare, T. Kilamo, T. Karvonen, T. Sauvola, V. Heikkilä, J. Itkonen, P. Kuvaja, T. Mikkonen, M. Oivo, C. Lassenius, Devops in practice: A multiple case study of five companies, Information and Software Technology 114 (2019) 217–230.

[75] L. E. Lwakatare, T. Karvonen, T. Sauvola, P. Kuvaja, H. H. Olsson, J. Bosch, M. Oivo, Towards devops in the embedded systems domain: Why is it so hard?, in: 2016 49th Hawaii International Conference on System Sciences (HICSS), IEEE, 2016, pp. 5437–5446.

[76] T. Sauvola, L. E. Lwakatare, T. Karvonen, P. Kuvaja, H. H. Olsson, J. Bosch, M. Oivo, Towards customer-centric software development: a multiple-case study, in: 2015 41st Euromicro Conference on Software Engineering and Advanced Applications, IEEE, 2015, pp. 9–17.

[77] L. E. Lwakatare, P. Kuvaja, M. Oivo, An exploratory study of devops extending the dimensions of devops with practices, ICSEA 104 (2016) 2016.

[78] O. A. El Sawy, Redesigning it-enabled customer support processes for dynamic environments, in: Business Process Transformation, Routledge, 2015, pp. 173–200.

[79] A. Imran, A. Zoha, A. Abu-Dayya, Challenges in 5g: how to empower son with big data for enabling 5g, IEEE Network 28 (6) (2014) 27–33.

[80] H. Hu, J. Zhang, X. Zheng, Y. Yang, P. Wu, Self-configuration and self-optimization for lte networks, IEEE Communications Magazine 48 (2) (2010) 94–100.

[81] A. Alhammadi, M. Roslee, M. Y. Alias, I. Shayea, A. Alquhali, Velocity-aware handover self-optimization management for next generation networks, Applied Sciences 10 (4) (2020) 1354.

[82] P. S. Coelho, J. Henseler, Creating customer loyalty through service customization, European Journal of Marketing (2012).

[83] A. Dakkak, H. Zhang, D. Issa Mattos, J. Bosch, H. Holmström Olsson, Towards continuous data collection from in-service products: Exploring the relation between data dimensions and collection challenges, in: 2021 28th Asia-Pacific Software Engineering Conference (APSEC), IEEE, 2021, pp. 200–209.

[84] M. Leppänen, S. Mäkinen, M. Pagels, V.-P. Eloranta, J. Itkonen, M. V. Mäntylä, T. Männistö, The highways and country roads to continuous deployment, IEEE Software 32 (2) (2015) 64–72.
