Towards AIOps enabled services in continuously evolving software-intensive embedded systems

Anas Dakkak (a), Jan Bosch (b), Helena Holmstrom Olsson (c)
(a) Ericsson AB, Torshamnsgatan 21, Stockholm, 164 83, Sweden
(b) Department of Computer Science and Engineering, Chalmers University of Technology, Chalmersplatsen 1, Gothenburg, 412 96, Sweden
(c) Department of Computer Science and Media Technology, Malmo University, Nordenskioldsgatan, Malmo, 211 19, Sweden
Abstract
Context: Continuous deployment has been practiced for many years by companies developing web and cloud-based applica-
tions. To succeed with continuous deployment, these companies have a strong collaboration culture between the operations and
development teams. In addition, these companies use AI, analytics, and big data to assist with time-consuming post-deployment
activities such as continuous monitoring and fault identification. Thus, the term AIOps has evolved to highlight the importance and
difficulty of maintaining highly available applications in a complex and dynamic environment. In contrast, suppliers of software-intensive
embedded systems often provide product-related services to customers, such as maintenance, optimization, and support. These services
are critical for these companies as they provide significant revenue and increase customer satisfaction.
Objectives: The objective of our study is to gain an in-depth understanding of the impact of continuous deployment on product-
related services provided by software-intensive embedded systems companies. In addition, we aim to understand how AIOps can
support continuous deployment in the context of software-intensive embedded systems.
Method: We conducted a case study at a large and multinational telecommunications systems provider focusing on the radio access
network (RAN) systems for 4G and 5G networks. The company provides RAN products and three complementing services: rollout,
optimization, and customer support.
Results: With continuous deployment, the boundaries between product-related services become blurry. Product-related services,
which were conducted in sequence by independent projects, converge with continuous deployment and become part of the same
project. In addition, AIOps platforms play an important role in reducing costs and increasing the efficiency and speed of
post-deployment activities.
Conclusion: Continuous deployment has a profound impact on the service organization of software-intensive systems providers. The
service organization becomes the connection between the R&D organization and the customer. In order to cope with the increased
speed of releases, deployment and post-deployment activities need to be largely automated. AIOps platforms are seen as a critical
enabler in managing the increasing complexity without increasing human involvement.
Keywords: Software-Intensive Embedded Systems, AIOps, Continuous Deployment, Product Service Systems
1. Introduction
Software-intensive embedded systems have changed our so-
cieties. Since their advent, these systems have become integral
to our lives. We use them daily and find them almost everywhere
around us: in the skies as airplanes, on the roads as cars,
or hidden in cabinets as mobile network equipment [1,2].
These systems have historically been mechanically and
electrically driven; however, in an age where software is
eating the world [3], they have become increasingly
software-driven.
Corresponding author.
Email addresses: anas.dakkak@ericsson.com (Anas Dakkak),
jan.bosch@chalmers.se (Jan Bosch),
helena.holmstrom.olsson@mau.se (Helena Holmstrom Olsson)
Therefore, traditional product manufacturing companies are
transitioning from producers of mechanical and electrical
systems to software development companies, producing
software-intensive embedded systems instead [4]. A software-intensive
embedded system implies that the software is the critical com-
ponent of the system and shapes its functionality. Conse-
quently, these companies have been advancing their software
development process by embracing continuous practices [5,6].
The ultimate goal is to frequently satisfy customers with new
functionality by utilizing continuous deployment.
Providing new and improved functionality to customers con-
tinuously via software upgrades is a phenomenon that has been
around for a while in the software industry. It has been applied
in web and cloud-based applications for years and has become
the norm. For example, companies like Facebook deploy hun-
dreds of software changes daily to their production environment
[7]. In order to facilitate continuous deployment, these compa-
nies had to undergo a significant change in various areas, not
least how the organization operates, people skills, and ways of
working [8,9].
Preprint submitted to Journal of Software: Evolution and Process February 15, 2024
Authors' version
Two notable factors stand out in how these companies work
to ensure high service availability while simultaneously deploy-
ing new software versions continuously. First, the development
and operation teams work closely to ensure code changes are
safely deployed to production [10,11]. Hence, the term DevOps
combines development and operations to highlight the
importance of collaboration between those who develop the
software and those who operate it [12].
Second, these companies use intelligent platforms assist-
ing with the tedious and time-consuming post-deployment ac-
tivities such as continuous monitoring, fault isolation, and
root cause analysis [13]. For example, Facebook uses a sys-
tem called Scuba for continuous monitoring, troubleshooting
problems as they happen, trend analysis, and pattern mining
[14,15]. Therefore, acknowledging the importance and difficulty
of maintaining highly available applications in a complex
and dynamic environment, the term AIOps was coined by Gart-
ner in 2016 [16]. AIOps advocates for using AI, analytics, and
big data to perform activities such as fault isolation, monitoring,
and anomaly detection [17,16].
In the same fashion, continuous deployment in software-
intensive embedded systems requires close collaboration be-
tween the customer and the system’s supplier [18,19]. How-
ever, unlike web and cloud-based applications deployed in
data centers or dedicated servers that users access remotely,
software-intensive embedded systems are physical products op-
erated and used by customers. These systems contain many
coupled components and increasingly complex software which
should work together to meet high reliability and performance
requirements [20]. In addition, software-intensive embedded
systems are often highly configurable to meet different
customer customization needs [21,22].
Therefore, suppliers of software-intensive embedded sys-
tems often provide several product-related services, such as
deployment, maintenance, and optimization. Product-related
services are critical to support customers and ensure their
satisfaction during the system’s lifetime [23,24]. In addi-
tion, product-related services provide considerable revenue to
software-intensive embedded systems companies, often with
a higher margin than product sales [25,26]. Thus, to de-
liver product-related services, companies often have a dedicated
service organization that interfaces with customers and works
closely with them [27].
To succeed with continuous deployment, all of the supplier’s
organizational functions need to work in shorter cycles [5].
Thus, continuous deployment demands greater alignment be-
tween the R&D organization and other organizational functions
[28]. However, while many studies have investigated different
aspects of continuous deployment in software-intensive embed-
ded systems, these studies focused primarily on the R&D orga-
nization [13]. To the authors’ knowledge, no previous empirical
study addressed the relationship between continuous deploy-
ment and the service organization in software-intensive embed-
ded systems companies. Therefore, this study has three objec-
tives:
First, to address the gap identified in the literature, this
study investigates the relationship between continuous de-
ployment and product-related services. In our earlier study
[29], we focused on customer support service and its role
in continuous deployment. In this study, we take a more
holistic view by addressing all product-related services.
Second, to explore how companies producing software-
intensive embedded systems utilize AIOps to support con-
tinuous deployment.
Third, to identify the challenges faced by software-
intensive embedded systems companies when using
AIOps to support continuous deployment.
To address these objectives, we conducted a case study at
one of the largest telecommunications systems suppliers in the
world. The company produces telecommunications systems
consisting of dedicated hardware and complex software. In ad-
dition, it provides several product-related services to support its
customers.
The contribution of this paper is threefold. First, it provides
critical insights, not previously addressed in the literature,
into the relationship between continuous deployment and
product-related services in a software-intensive embedded
systems context. Second, we explored how AIOps can
be used to support continuous deployment and the challenges
associated with using AIOps in software-intensive embedded
systems. By bringing this perspective to our research, we are
able to explore how AIOps can support the evolution of ser-
vices with continuous deployment. To the authors’ knowledge,
no previous study has investigated AIOps applications and chal-
lenges when supporting continuous deployment in software-
intensive embedded systems. Third, the case study company
has practiced continuous deployment for several years with
large-scale and complex software-intensive embedded system
products. Thus, the findings of this paper are derived from the
real-life case of a company performing continuous deployment.
The remainder of the paper is organized as follows. Section 2
provides an overview and a literature review of software-intensive
product-related services, AIOps, and continuous deployment.
Section 3 details the research method used in this
study, including data collection, analysis, and threats to validity.
Section 4 provides a background about the case study company.
Sections 5, 6, and 7 present the empirical findings for each of
the three research questions, respectively. Section 8 discusses
these findings. Finally, Section 9 summarizes and concludes
the paper.
2. Background
This section provides a literature review of services and soft-
ware evolution in product-oriented companies. In addition, it
highlights the role and classification of services as discussed in
the literature. Also, the section provides a literature review of
AIOps and continuous deployment.
2.1. Software and services evolution in product-oriented companies
Product-oriented companies have undergone two dramatic
transformations during the last decades. The first is the
rise of embedded software from a component complementing
the product’s electronics and mechanics to the central part
shaping the product’s function. Consequently, embedded
software has been growing in size and complexity
[30,31]. For example, modern cars are now running
on code, and the complexity of embedded software in a fighter
jet is beyond humans’ cognitive ability [32,33]. As a result,
the notion of software-intensive embedded systems has evolved
to highlight that these products constitute multiple connected
components (systems) and to stress the importance of the em-
bedded software in shaping the product’s function (embedded
software-intense) [34]. As a consequence, software-intensive
embedded systems started to follow the same path as web and
cloud-based software development companies, adopting agile
software development practices such as continuous integration
and delivery [35]. Once agile software development capabilities
and continuous integration are established internally,
software-intensive embedded systems companies start
transitioning to continuous deployment, where the software is
frequently and rapidly deployed to customers’ systems [5,6].
The second is the elevation of product-related services.
During the 1990s, product-oriented companies faced several
challenges, such as the commoditization of products and
reduced profit, which led these companies to look for
sources of revenue other than product sales
[36]. As a result, product-oriented companies looked at services
as an opportunity to compensate for declining product sales and to
increase their competitiveness. Since then, several terms have
emerged to highlight the importance of services with products,
such as “product services” and “after-sales services”. Later, the
term Product Service System (PSS) became the one primarily used
to indicate the integrated nature of products and services. Goedkoop
et al. defined PSS as a marketable set of products and services
capable of jointly fulfilling a user’s need. The PS system is
provided by either a single company or by an alliance of com-
panies. It can enclose products (or just one) plus additional
services. It can enclose a service plus an additional product.
And product and service can be equally important for the func-
tion fulfilment [37].
There are three categories of PSS [38,26]: first,
product-oriented PSS, where product ownership is transferred
to the customer while additional services are provided by the
vendor, either free during a specific period, by contract, or based
on a service transaction. Second, user-oriented PSS, where the
vendor owns the product and performs additional services on the
product but sells its function to the customer, for example, by
leasing. Third, result-oriented PSS, where the vendor owns and
services the product but sells the product’s capability, or result,
to the customer, who pays only per usage.
2.1.1. The role of services in software-intensive embedded systems companies
The role of services has attracted researchers from multiple
disciplines, such as business management, industrial
engineering, and information and communications technology
(ICT) [39]. Business management researchers were early in
describing the movement of product-oriented companies
towards services, covering topics such as offering development
strategies, business models, and criteria for the evolution to a
service-based business [40,41]. Researchers from industrial
engineering have investigated development methods for design-
ing products and services among other topics [42,43], while
ICT researchers have recently joined, focusing on the chang-
ing ecosystem as PSS become more intelligent and connected
[44,45].
Researchers across disciplines agree that services play a sig-
nificant role in companies developing software-intensive sys-
tems. From a business perspective, services provide consider-
able revenue with an often higher margin than product sales.
Szwejczewski et al. [25] have investigated the role of services
in six companies, among them a company developing passenger
cars and another developing regional passenger aircraft. The
authors found that services sales account for 15-52% of the
revenue of the investigated companies and often have a higher
margin than product sales. A similar observation was raised
by Gebauer et al. [26], who highlighted the significant revenue
contribution of services to companies like IBM, ABB, and Er-
icsson.
From an engineering perspective, services are important in
improving the product’s availability. McPhail et al. [46] indi-
cated that increasing the availability of a telecommunications
system beyond what the product is designed for requires a
highly responsive and technically excellent customer support
team. As described by Windley et al. [47], delivering high-
availability services relies not only on technology but also on
organization, processes, and people surrounding the technol-
ogy.
Furthermore, from a customer experience perspective, ser-
vices are a vital contributor to customer satisfaction [25,48].
Prior et al. [49] highlighted that services could contribute to
customers’ satisfaction by reducing the lead time for fault
detection and correction.
2.1.2. Services classification
Services can be divided into two categories: services supporting
the product and services supporting the customer [50].
The services supporting the product aim to ensure adequate
product functionality, while the ones supporting the customer
aim to advance the client’s mission while utilizing the product.
Another approach to classifying services is highlighted by
Parida et al. [51], who conducted a survey study of 30 companies,
including Ericsson and Volvo Cars. The authors identify
four service categories: essential services, maintenance services,
R&D services, and functional services. Furthermore, Chowdhury et
al. [44] highlighted new service types associated with smart
PSS, such as remote monitoring and diagnostics. The word
smart refers to the intelligent nature of these systems as they
are connected and are increasingly self-driven.
2.2. Services and continuous deployment in software-intensive embedded systems
Continuous deployment is the ability to bring valuable soft-
ware features to customers in shorter cycles than traditional lead
times, from a couple of weeks to days or even hours [13]. Ac-
cording to Stahl et al. [52], continuous deployment is an oper-
ations practice where release candidates evaluated in continu-
ous delivery are frequently and rapidly placed in a production
environment.
In addition to the actual deployment of the new software ver-
sion, continuous deployment involves several post-deployment
activities, such as monitoring the system and customer behav-
ior, identifying unexpected patterns and run-time issues, and
collecting real-time data to feed both business and technical
planning [13,28]. These activities are often conducted by the
operations team in the context of web and cloud-based appli-
cations. However, in the case of software-intensive embedded
systems, they are often delivered by the service organization.
2.2.1. Product-related services challenges with continuous deployment
Gerostathopoulos et al. [53] observed that continuous
deployment increases the speed of feature and new
content delivery, which is challenging for customer service
organizations, as they need to be aware of all changes
and the different variations of features when responding to
customer queries. Similarly, Rodríguez et al. [13] highlighted
the difficulty of propagating information and increasing
learning with continuous deployment. In addition, communicating
changes between different parts of the organization, as well
as to customers, becomes more challenging [54].
Moreover, with a continuous supply of software, new fea-
tures might not be discovered by customers, as continuous
deployment does not necessarily mean that new features are
adopted and used [55]. Therefore, Fitzgerald et al. included
continuous use as one of the continuous practices in the op-
erations phase of the continuous software engineering frame-
work [56]. In addition, the authors included continuous run-
time monitoring to enable early detection of service quality is-
sues and to ensure the fulfillment of Service Level Agreements
(SLAs). However, the authors did not elaborate on how these
practices can be performed [28].
Furthermore, continuous deployment increases the system’s
complexity as a consequence of more features being introduced
in the code leading to more sophisticated interactions between
the features, even if not used [57]. This increased complexity
makes post-deployment activities such as troubleshooting, fault
identification, and root cause analysis more difficult. Therefore,
to succeed in transitioning to continuous deployment, it is
critical to establish a dedicated support team equipped with in-
telligent tools to perform continuous monitoring of live systems
[18].
2.2.2. The evolvement of AIOps
Acknowledging the importance and difficulty of maintaining
highly available applications in a complex and dynamic
environment, Gartner coined the term AIOps in 2016 [16].
Initially, the acronym stood for Algorithmic IT Operations;
it was later changed to Artificial Intelligence for IT
Operations [58].
AIOps has been applied to several tasks, such as
anomaly detection on the Key Performance Indicators
(KPIs) of data centers and on distributed systems’ traces [59,60],
failure root cause analysis [61], fault localization [62], and
closed-loop Service Level Agreement (SLA) assurance [63].
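To make the first of these applications concrete, the following is a minimal sketch of KPI anomaly detection using a trailing-window z-score. It is an illustration only and is not the method of the cited studies [59,60]; the function name, the window size, the threshold, and the sample KPI values are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect_kpi_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard
    deviations. All parameter defaults are illustrative assumptions."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical hourly success-rate KPI (%), with one sudden drop.
kpi = [99.1, 99.0, 99.2, 99.1, 99.0, 99.2, 93.5, 99.1, 99.0]
print(detect_kpi_anomalies(kpi))  # prints [6], the drop to 93.5
```

A trailing window is used here so that each sample is judged only against earlier observations, which is what a monitoring system has available at run time. Production AIOps platforms would typically also account for seasonality and trends, which this sketch does not.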
Because AIOps is both recent and cross-disciplinary,
Notaro et al. [64] conducted a mapping study to structure
the AIOps domain. The authors proposed a taxonomy consisting
of two macro-domains: Failure Management and Resource
Provisioning. Failure Management consists of five categories:
failure prediction, failure detection, failure prevention,
root cause analysis, and remediation. Similarly, Resource
Provisioning is divided into five categories: resource
consolidation, scheduling, power management, service composition,
and workload estimation. In a subsequent study focusing solely on
Failure Management, Notaro et al. [65] characterized the
five categories of Failure Management and broke them down
further into 14 subcategories.
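The taxonomy described above can be captured in a small data structure. The sketch below is illustrative: the dictionary and helper names are hypothetical, and the category labels are lightly normalized from the description above rather than taken verbatim from Notaro et al. [64].

```python
# Two macro-domains, each with the five categories listed in the text.
# Names of the constant and the helper are assumptions for illustration.
AIOPS_TAXONOMY = {
    "Failure Management": [
        "failure prediction",
        "failure detection",
        "failure prevention",
        "root cause analysis",
        "remediation",
    ],
    "Resource Provisioning": [
        "resource consolidation",
        "scheduling",
        "power management",
        "service composition",
        "workload estimation",
    ],
}


def macro_domain_of(category):
    """Return the macro-domain containing `category`, or None if absent."""
    for domain, categories in AIOPS_TAXONOMY.items():
        if category in categories:
            return domain
    return None


print(macro_domain_of("root cause analysis"))  # prints Failure Management
```

Such a mapping could be used, for example, to tag a service tool by the category it addresses; the 14 subcategories of Failure Management [65] are not listed in the text and are therefore omitted.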
Dang et al. [17] have highlighted the role of AIOps in sup-
porting service engineers working in a cloud-based infrastruc-
ture. The authors indicated three areas AIOps can support:
high service intelligence, customer satisfaction, and engineering
productivity. In the service intelligence area, AIOps can
detect quality degradation and predict future
status. AIOps can also contribute to higher customer satisfaction
by proactively suggesting changes to the system that meet
customers’ needs, such as parameter tuning and optimizations.
On the engineering productivity front, service engineers are
relieved of tedious manual work, such as data collection
and fixing recurring issues. Furthermore, the
authors summarize the real-world challenges of building
an AIOps application based on their learnings and experience
at Microsoft. They identified three significant
challenges: gaps in innovation methodologies and mindset,
engineering changes needed to support AIOps, and the difficulty
of building ML models for AIOps [17].
3. Research Method
This study aims to gain an in-depth understanding of the impact
of continuous deployment on product-related services in
the context of software-intensive embedded systems. It also
explores how AIOps can be used to support continuous
deployment and the challenges of doing so. Therefore, we
formulated the following research questions:
RQ1: What is the impact of continuous deployment on
product-related services in software-intensive embedded
systems?
RQ2: How can AIOps be used to support product-related
services in a continuously evolving software-intensive em-
bedded system?
RQ3: What challenges do software-intensive embedded
system companies face when using AIOps to support con-
tinuous deployment?
To answer these questions, we conducted a qualitative ex-
ploratory case study. The reasons for this choice of methods
were the following. First, product-related services involve both
human and technological aspects. Thus, qualitative methods
provide rich results when humans and technology are involved
as they force the researcher to examine the complexity of the
problem rather than abstract it away [66]. Second, case studies
are a suitable research method to evaluate software engineering
practices in industrial settings [67,68]. Third, the purpose of
this study is exploratory [69], as we wanted to find out what
is happening and seek new insights about the role of product-
related services and AIOps in supporting continuous deploy-
ment.
The steps applied in this study are based on the guidelines
outlined by Runeson and Höst [69]. These guidelines follow the
recommendations defined by Yin [68]. The steps we followed
are:
Case study design: we set the objective, documented the
research questions, and identified the case study company
and product line.
Preparation for data collection: we documented the case
study protocol, prepared the interview guide, created an
initial list of interviewees to invite for interview, and iden-
tified sources of relevant documentation.
Data collection: we conducted interviews and collected
supporting documentation for analysis.
Analysis of the collected data: we created a transcript of
the interviews and analyzed them. In addition, we analyzed
the collected documents. This step ran in parallel
with the data collection, which allowed us to adjust
the interview guide based on new insights emerging from
previous interviews and review related documents based
on the analysis of the collected ones.
Reporting: we documented our findings and analysis in
this paper.
3.1. The case company
The case study company is a large multinational telecom-
munications systems supplier. The company has more than
a hundred thousand employees distributed across the globe.
The case company is divided into several large-scale domain-specific
business units responsible for developing products related
to the unit’s domain. In addition, the case study company
has several service business units focusing on different product
domains. Services account for roughly 40% of the company’s
overall revenue.
This paper focuses on the 4G and 5G Radio Access Network
(RAN) product line. The RAN part of the mobile network con-
sists of radio base stations which are embedded systems prod-
ucts responsible for providing radio coverage to mobile users.
Each base station consists of several interconnected hardware
units such as antennas, base-band processing boards, and trans-
mission equipment.
4G and 5G RAN software is developed by a dedicated R&D
development organization with a few thousand employees. The
activities of the R&D organization include designing, coding,
testing, and releasing the software. Product-related services
are provided by a dedicated organization responsible solely for
product-related services. The service organization has dedicated
personnel, tools, and systems. The service organization
often has a local presence, where support staff reside in the
same geography as the customers. In addition, it has several
global service centers supporting customers remotely across
different regions.
3.2. Data collection
Data was collected from three primary sources: interviews
with selected participants, a review of related internal documents,
and meeting notes. Using multiple data sources helps to
achieve triangulation, as they provide a broader picture and in-
crease the research precision [69]. The first author of this study
works as an R&D manager who has been closely involved in
multiple continuous deployment projects. Thus, we had exten-
sive access to internal documentation and participated in many
meetings.
Interviews: we conducted 16 interviews with 20 partici-
pants from the R&D organization and the services organi-
zation on both local and global levels. We also interviewed
developers and solutions architects working with service
systems to ensure we covered technical details related to
AIOps challenges and how AIOps can be used for continuous
deployment. The interviews were conducted virtually
using Microsoft Teams between October 2021 and August
2022. The interviews lasted between 40 minutes and 1 hour
and were recorded and transcribed. At least two authors
were present in each interview. The participants were selected
by the first author based on their relationship with
services, service systems development, and continuous
deployment. A preliminary list of persons was identified and
invited to interview. The invitation included a description
of the research objectives, highlighting that participation is
voluntary. Table 1 shows an overview of the interview
participants and their roles.
Documents review: we reviewed a significant number of
documents covering product-related service and evolution
strategy. These documents were either already known to
the first author, shared by interviewees, or available as
training or documents in the internal network.
Meetings notes: the first author participated in more than
30 meetings discussing service delivery flow eciency
5
Table 1: Interview participants and roles
Interview Interviewee Role
1 A Customer support portfolio manager
2 B Customer support strategy manager
3 C Customer operations manager
4 D Customer support consultancy manager
5 E Software release technical coordinator
6 F Software release technical coordinator
G Customer support engagement manager
7H Customer support team lead
I Senior customer support engineer
8J Senior customer support engineer
K Regional customer support manager
9L Customer support line manager
M Services Systems solution architect
10 N Services Systems solution architect
11 O Services systems principal developer
12 P Services architecture expert
13 Q Global customer services manager
14 R Services transformation lead
15 S Customer DevOps expert
16 T Customer services engagement lead
with continuous deployment. These meetings were primarily
focused on building intelligent tools and capabilities
to improve the efficiency of continuous deployment
service activities, such as continuous monitoring and trou-
bleshooting.
3.3. Data analysis
The collected data was analyzed using the six-phase inductive
thematic coding procedure proposed by Braun and Clarke [70].
First, we transcribed the interviews and gathered the related
documentation. Second, we analyzed the data. Third, we extracted
an initial set of codes. Fourth, we identified the main themes
related to the three research questions. Fifth, the authors
reviewed the themes and their related codes, which resulted in
new codes being added. Finally, before writing this paper, we
defined the themes.
3.4. Threats to validity
The validity of a study implies the trustworthiness of the
results. Different classifications of threats to validity are
used by software engineering researchers [71]. In this study,
we selected the classification chosen by Runeson et al. [69],
which adopts the classification proposed by Yin [68], as they
are widely used by the software engineering community. This
classification divides validity threats into four categories: con-
struct, external, and internal validity, in addition to reliability.
Construct validity refers to the extent to which the operational
measures used in the study reflect what the researchers
aim to study and what is represented by the research questions.
In this study, all researchers are familiar with continuous
deployment, AIOps, and the research related to these
topics. However, while all participants have practical
knowledge about continuous deployment and AIOps,
they do not always utilize the same terminology used in
research. Thus, to address the threat to construct validity,
the first author, who is employed by the case study com-
pany and has several years of experience in the company,
has been present in all interviews accompanied by one of
the other authors. When a question was not understood
by an interviewee or clarifications were needed, the
first author used internal terms and examples to clarify
the question. Furthermore, in this study, we used triangu-
lation with multiple sources of data as another measure to
address construct validity.
External validity refers to the degree to which the case study
findings can be generalized, and how. This study’s results are
extracted from a case study in the telecommunications do-
main; therefore, we do not claim the generalizability of
the results. More empirical research is needed to achieve
external validity. However, we believe there are many sim-
ilarities between the case study company and other large-
scale software-intensive embedded systems vendors, espe-
cially the ones working in a business-to-business context.
In this study, we aimed to provide as much information
as possible to help the reader decide if the context of the
case study is similar to other industry segments or compa-
nies without compromising the non-disclosure agreement
(NDA) the authors have with the case company.
Internal validity concerns the risk of failing to identify all
factors that contribute to the results when causal relationships
are investigated. Thus, as described by Yin [68], internal
validity is not applicable to descriptive or exploratory
case studies. While the inferences made in this case study
are not causal, we used peer debriefing among the authors
of this paper, in addition to member checking, where the
results of this research were shared with the participants
of the study for feedback and comments, as validity
mitigation techniques when conducting case study
research.
Reliability refers to the degree to which the data collection and
the analysis depend on the specific researchers. As the
first author is employed by the case study company, there
is a risk that his own views and interpretations might influ-
ence the data collection and analysis. To reduce the risk of
bias during data collection and analysis, we interviewed
20 persons representing different roles, such as solutions
architects, managers and service leaders. In addition, we
also used triangulation with multiple data sources, and the
three authors have been involved during the triangulation
and analysis of the data.
4. The case study context
This section provides background about the case study com-
pany and 4G/5G RAN. In addition, it describes the RAN soft-
ware development, release, and product-related services.
4.1. 4G and 5G Radio Access Network
Mobile telecommunications networks can be divided into
two parts, Radio Access Network (RAN) and Core Network
(CN). The RAN and CN are connected via standardized inter-
faces. In this study, we focus only on the RAN part of a mobile
network.
4G RAN consists of connected Evolved Node B (eNodeB)
systems. Each eNodeB comprises several interconnected com-
ponents, such as antennas, baseband processing units, and
transmission equipment. Thus, each eNodeB provides 4G ra-
dio coverage for the surrounding geographical area. 5G archi-
tecture is similar to the 4G in that one system provides radio
coverage. However, the radio node is called gNodeB instead,
where the g stands for “Next generation”. 5G RAN has two major
deployment scenarios: Non-Stand-Alone (NSA) and Stand-Alone
(SA) [72]. Figure 1 shows a simplified architecture of 4G
RAN, 5G SA, and 5G NSA and the interfaces between different
nodes.
Figure 1: Simplified 4G, 5G NSA, and 5G SA RAN architecture [72]
4.2. Software development and release
RAN software is large-scale software built by thousands of
developers. The development process has undergone dramatic
changes over the years. As a result, hundreds of development
teams have been established across several R&D centers dis-
tributed across the globe. Continuous integration and testing is
a 24/7 activity using a highly automated process.
Due to the dependency between 4G and 5G RAN technolo-
gies, especially with the 5G NSA configuration, the case study
company has one software mainline for both 4G and 5G soft-
ware. Thus, one software release can be used for both tech-
nologies. By changing the software configuration parameters,
the radio node can act as eNodeB or gNodeB. In addition to the
software configuration, the hardware in the node must support
the configured technology. Older hardware does not support
5G, while newer hardware supports 4G and 5G. Additionally,
the case study company offers many hardware variants that enable
the nodes to be highly customizable to support various radio
frequencies and deployment scenarios, such as deployments
in urban and countryside areas.
As depicted in Figure 2, the case study company has three
types of software releases: major, maintenance, and Continuous
Deployment (CD) releases. New major releases become gener-
ally available to all customers every three months. The release
date of each major release is referred to as General Availability
(GA) date. Following each major release, a maintenance period
starts, which lasts for 18 months. During the maintenance pe-
riod, the case study company releases a number of maintenance
releases. Each maintenance release contains only bug fixes.
CD releases are released every second week. They are
branched directly from the mainline and thus contain both bug
fixes and new development. CD releases are deployed to a rep-
resentative subset of the customers’ networks, called CD zone.
Continuous deployment releases are only deployed in the CD
zone, while major and maintenance releases are deployed to
the rest of the network.
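As a rough illustration of the release cadence described above, the following sketch computes how many major releases are in their maintenance period at the same time and how many complete CD cycles fit between two major releases. The intervals are taken from the text. The function names and the approximation of three months as 13 weeks are assumptions made for illustration only.

```python
# Illustrative arithmetic only; all names are hypothetical.
MAJOR_INTERVAL_MONTHS = 3   # a new major release every three months
MAINTENANCE_MONTHS = 18     # maintenance period following each major release
CD_INTERVAL_WEEKS = 2       # a CD release every second week
WEEKS_PER_QUARTER = 13      # assumption: three months is roughly 13 weeks


def majors_in_maintenance() -> int:
    """Number of major releases whose maintenance periods overlap at steady state."""
    return MAINTENANCE_MONTHS // MAJOR_INTERVAL_MONTHS


def cd_releases_per_major_cycle() -> int:
    """Number of complete two-week CD cycles between two major releases."""
    return WEEKS_PER_QUARTER // CD_INTERVAL_WEEKS
```

Under these assumptions, about six major release branches are maintained in parallel, and roughly six CD releases are produced between two consecutive major releases.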
It can be observed that the case study company’s deployment
context can be considered as 1:many:many (one-to-many-to-
many), which means that each release is intended to be used
by many customers, and each customer has many base stations
where the software is deployed. This is a major difference from
the deployment context in web and cloud-based applications,
where a new release is deployed to one or a few production en-
vironments only.
Therefore, the case study company has two types of cus-
tomers: customers with CD zones and customers without CD
zones. To speed up the introduction of 5G networks, more cus-
tomers are establishing a CD zone in their networks. CD cus-
tomers account for more than 15% of the total customer base
for both 4G and 5G RAN.
4.3. RAN product-related services
The case study company offers three main product-related
services to their customers: network rollout, customer support,
and network optimization.
4.3.1. Network rollout
Network rollout aims at helping customers with radio site
acquisitions, hardware build, installation, commissioning, and
civil work. These activities are conducted during network mod-
ernization projects where site locations need to be changed or
during swap projects between different vendors. In addition,
network rollout provides the project setup and execution to roll-
out new major software releases into the network. Rolling a
new software release into the network includes activities such
as lab testing, impact analysis, and the actual deployment and
acceptance procedure of the new software.
4.3.2. Customer support
Customer support is responsible for supporting customers.
This is done by responding to the customers’ Customer Sup-
port Requests (CSRs). Customers raise CSRs when they require
Figure 2: Releases and customers
further technical information or assistance while operating the
products. Thus, customer support performs activities such as
answering technical questions, troubleshooting, fault identifica-
tion, and root cause analysis.
4.3.3. Network design and optimization
Network design and optimization comprise a design part, of-
ten done when a new network is built or during modernization
projects. The design involves software and hardware dimen-
sions, high- and low-level network design, and related docu-
mentation. Network design is often associated with a new net-
work buildup.
On the other hand, network optimization consists of activities
to optimize the network’s performance, such as radio parame-
ters and feature tuning to meet the operator’s objectives. Net-
work optimization is conducted with new software or hardware
upgrades.
5. Continuous deployment impact on product-related ser-
vices (RQ1)
This section presents the empirical findings for the first re-
search question. To understand how continuous deployment
impacts product-related services, we compared the activities
conducted with customers without CD zones representing the
legacy services flow versus service activities conducted with
CD zones customers, thus representing the flow and activities
of the service with continuous deployment.
5.1. Legacy services and their activities
The three product-related services for customers without CD
zones start in sequence: rollout, optimization, and customer
support. As depicted in Figure 3, the rollout and optimiza-
tion projects begin after a new major release becomes generally
available. These two services are conducted as projects, each
with a defined start and end date. The rollout project’s role is to
ensure that a new major release is introduced to the customer
network. Introducing a new major release into the network is
referred to as software upgrade activity.
The rollout project conducts an impact analysis of the release
on the customer’s network. The impact analysis reviews the
software’s legacy changes and prepares the customer’s opera-
tions team to handle these changes. This includes, for example,
adjusting for non-backward compatible changes in the config-
uration parameters of the software, including new performance
counters in the KPIs formulas, replacing deprecated counters
from the formulas, default-on improvements of existing func-
tionality that might impact the network’s KPIs, and operational
procedures or 3rd party products connected to the RAN net-
work. In addition, the rollout project conducts extensive lab
validation according to the customers’ testing scope.
“The rollout project starts with the GA [General Availability].
It starts with extensive testing lasting between
anytime from 1 to 1.5 months to 2.5 months” Intervie-
wee Q - Global customer service manager
Figure 3: Services flow without continuous deployment
In parallel with the rollout project, an optimization project
starts to optimize the customer network based on new features
available in the major software. Since these features are new,
their parameters and attributes often need to be adjusted to fit
the customer’s network topology and configuration. Therefore,
the network optimization project tests new features which the
customer is interested in and identifies the appropriate config-
uration parameter values. The optimization project and rollout
project interact with each other to ensure that what is recommended
by the optimization project is reflected in what is referred
to as the parameters and features baseline that will be
applied during the rollout.
If a software bug is identified in the rollout or optimization, a
bug ticket is filed and sent to the R&D organization. If the R&D
organization confirms the bug, a correction is included in the
upcoming maintenance release and merged into the mainline.
Thus, software bugs impact the rollout and optimization project
timelines, as they need to wait for the next maintenance release
to ensure customer acceptance.
After completing a rollout project and establishing the new
parameters and features baseline, a handover is executed from
the rollout project to the customer support project. The cus-
tomer support project is responsible for responding to Customer
Support Requests (CSR) tickets. Customers raise these tickets
by logging an issue in a dedicated web portal, calling the case
company’s front desk, or by email. Once these requests are re-
ceived, they are routed to an appropriate support team. Based
on the context of the CSR and its severity, the CSR is routed to a
local or global support team based on pre-defined routing crite-
ria. In addition, the CSR will be given a priority tag to indicate
how fast the team should act on it. The priority also decides the
Service Level Agreements (SLAs) which should be applied.
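The routing step described above can be sketched as a small function. This is a minimal sketch under stated assumptions: the severity tags, the response targets in hours, and the rule that sends critical tickets and suspected software faults to the global team are hypothetical, since the case company's actual routing criteria and SLAs are not disclosed.

```python
from dataclasses import dataclass

# Hypothetical priority tags and response targets (hours); real SLAs are contract-specific.
SLA_HOURS = {"critical": 1, "major": 8, "minor": 72}


@dataclass
class CSR:
    ticket_id: str
    context: str    # e.g. "configuration" or "software-fault" (assumed labels)
    severity: str   # one of the keys in SLA_HOURS


def route(csr: CSR) -> dict:
    """Return the assigned team, priority tag, and SLA target for a CSR."""
    if csr.severity not in SLA_HOURS:
        raise ValueError(f"unknown severity: {csr.severity}")
    # Assumed criterion: critical issues and suspected software faults go to global support.
    is_global = csr.severity == "critical" or csr.context == "software-fault"
    return {
        "ticket_id": csr.ticket_id,
        "team": "global" if is_global else "local",
        "priority": csr.severity,
        "sla_hours": SLA_HOURS[csr.severity],
    }
```

Keeping the priority tag and the SLA lookup in one table mirrors the statement that the priority decides which SLA applies.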
In addition, the customer support project is responsible for
introducing a newer maintenance version from the same major
release branch. When the nodes are running on an older main-
tenance release, and a more recent maintenance version from
the same major release branch is introduced, the activity is con-
sidered a software update rather than software upgrade as used
in the rollout project. This is because the customer’s parame-
ters and features baseline do not change between maintenance
releases of the same major release, as the difference between
the running release and the newer maintenance release is only
bug fixes. The customer support project performs lab validation
activities of the maintenance release in a similar fashion as per-
formed in the rollout project. However, the scope of testing is
often less than what is performed in the rollout project.
“When the rollout project is finished with the network up-
grade, a handover is executed to the customer support
project. Our responsibility is to respond to customer CSRs
and to update the network to newer maintenance releases
if needed” Interviewee G - Customer support engage-
ment manager
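The distinction between a software upgrade and a software update can be expressed as a comparison of release identifiers. The sketch below assumes a hypothetical numbering scheme with a major release number and a maintenance index; the case company's real version format is not given in the source.

```python
from typing import NamedTuple


class Release(NamedTuple):
    major: int         # hypothetical major release number
    maintenance: int   # hypothetical maintenance index within that major branch


def classify_change(running: Release, target: Release) -> str:
    """Classify a software change as 'upgrade', 'update', or 'none'."""
    if target.major > running.major:
        # A new major release changes the parameters and features baseline.
        return "upgrade"
    if target.major == running.major and target.maintenance > running.maintenance:
        # Same major branch, newer maintenance release: bug fixes only.
        return "update"
    # Same or older release: not a forward change.
    return "none"
```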
The customer operations organization is constantly engaged
with the case study company’s rollout, optimization, and cus-
tomer support project. The customer operations organization
plays a critical role as it provides the lab testing infrastructure,
decides the testing scope, reviews new features, determines new
features of interest, follows up on support tickets, and supervises
the project activities. For example, a dedicated team is respon-
sible for new features’ review and validation, another dedicated
team is responsible for rollout preparation, and yet another team
is responsible for customer support.
5.2. Continuous deployment services and activities
Continuous deployment releases are branched directly from
the mainline every second week. Thus, the difference between
a software upgrade and update vanishes. As depicted in Figure
4, continuous deployment releases contain both bug fixes and
new functionality.
Therefore, one of the significant consequences of continuous
deployment is that there is no “maintenance” release anymore.
If a bug is identified post-deployment of the CD release, a roll-
back is to be executed, or the fault has to be tolerated till an
upcoming CD release with a fix.
Figure 4: Software releases and mainline growth
In addition, within a two-week release cycle, there is not
enough time to conduct service activities in sequence. There-
fore, service activities are performed in one project, as depicted
in Figure 5. Consequently, the boundaries between a rollout,
optimization, and customer support project disappear.
“With frequent releases and deployment, the borders are
removed between network rollout project and customer
support” Interviewee J - Senior customer support en-
gineer
Figure 5: Services with continuous deployment
This has also led to changes to service activities. Therefore,
service activities in continuous deployment can be divided into
two groups: first-time and continuous.
5.2.1. First-time service activities
Moving customers’ ways of working to embrace continuous
deployment requires an initial engagement effort. Customers’
operations organization is often structured to support
major software release deployments only. There are teams or
units dedicated to working solely with rollout, optimization, or
support. Therefore, several interviewees highlighted the importance
of evolving customers’ way of working as a prerequisite
for continuous deployment. Similarly, in several meetings we
participated in, the approach to working with customers in a
consultative way was discussed. Thus, the first step for the case
study company is often to elevate the customer’s operations or-
ganization’s way of working and structure to support continu-
ous deployment. This can be summarized by the following four
activities:
The first is reducing lab testing scope while automating the
test cases as much as possible. In this case, the case study com-
pany shares the scope of the testing executed before releasing
the software with the customer to identify duplicate test cases
that can be removed from the customer’s lab scope. In addition,
the case study company provides an automation solution that
helps customers automate the execution of test cases in their lab
environment. Despite the reduction of scope, lab validation is
necessary to ensure that basic functionality, such as emergency
calls, works with the customers’ network and to validate the
interfaces between the radio base stations and the customer’s
third-party products (3PPs) in the network.
The second is to evaluate the customer’s network to design
a suitable Continuous Deployment Zone (CD Zone) represent-
ing the entire network. This process involves identifying the
different hardware and software configurations the customer
has. The eNodeBs and gNodeBs are highly configurable products
from both hardware and software perspectives. On the
hardware side, customers often choose between different radio
units, antennas, and digital processors to match the site’s specific
needs, such as radio coverage distance, radio frequency,
and node capacity. On the software side, there are
thousands of configurable parameters that customers can adjust
based on their needs. For example, a 4G eNodeB has over 7000
configurable software parameters and more than 200 software
features. As a result, each customer has hundreds or thousands
of configurations in their network if all possible combinations
of hardware and software are considered jointly. Thus, identi-
fying a suitable and representative CD zone requires a detailed
analysis of the customer’s network.
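One simple way to approximate a representative CD zone is to group nodes by their combined hardware and software configuration and pick a few nodes from each group. The sketch below illustrates this idea only; it is not the case company's selection method, and the dictionary keys `id`, `hardware`, and `sw_config` are hypothetical.

```python
from collections import defaultdict


def select_cd_zone(nodes, per_config=1):
    """Pick up to `per_config` nodes for each distinct (hardware, software) configuration.

    `nodes` is an iterable of dicts with the hypothetical keys
    'id', 'hardware', and 'sw_config'.
    """
    groups = defaultdict(list)
    for node in nodes:
        groups[(node["hardware"], node["sw_config"])].append(node["id"])
    zone = []
    # Sort for a deterministic result regardless of input order.
    for signature in sorted(groups):
        zone.extend(sorted(groups[signature])[:per_config])
    return zone
```

A real selection would likely also weigh traffic volume, geography, and customer preferences, which this sketch ignores.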
The third is to define and agree on continuous deployment-
specific Service Level Agreements (SLAs), stipulating how
long it takes for an issue to be identified, analyzed, and cor-
rected. Continuous deployment SLAs shall be faster than
legacy SLAs. In addition, the customer and the case study
company agree on the rollback criteria. Due to the mobile net-
work’s critical role, the slightest degradation in the network
KPIs would justify a rollback. However, should a rollback be
triggered if a minor performance degradation is observed in a
secondary indicator without impacting any of the primary key
performance indicators (KPIs)? For how long can the degradation be
allowed to enable troubleshooting in such cases? Such questions are
discussed and agreed upon with the customer.
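Once agreed, rollback criteria of this kind can be captured as an explicit decision rule. The following is a minimal sketch, assuming hypothetical thresholds: any primary KPI degradation beyond its tolerance triggers a rollback, while a degradation seen only in secondary indicators is tolerated for a limited troubleshooting window. The actual criteria are negotiated per customer and are not stated in the source.

```python
def should_roll_back(kpi_drop_pct, secondary_drop_pct, hours_degraded,
                     kpi_tolerance_pct=0.0, secondary_tolerance_pct=5.0,
                     max_hours_degraded=24.0):
    """Decide on a rollback using hypothetical, customer-agreed thresholds.

    kpi_drop_pct: worst relative drop among primary KPIs (percent).
    secondary_drop_pct: worst relative drop among secondary indicators (percent).
    hours_degraded: how long the degradation has been observed.
    """
    if kpi_drop_pct > kpi_tolerance_pct:
        # Any primary KPI degradation beyond tolerance triggers a rollback.
        return True
    if secondary_drop_pct > secondary_tolerance_pct:
        # Secondary-only degradation is tolerated for a limited troubleshooting window.
        return hours_degraded > max_hours_degraded
    return False
```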
The fourth is the establishment of a pipeline supporting both
deployment and data collection. From a deployment perspective,
the pipeline securely connects the customer network to the case
study company’s network, allowing new software releases to
be downloaded to the customer network once released. Af-
ter downloading the new software, the deployment pipelines
check the software’s integrity and automatically upgrade the lab
nodes. The lab test cases are triggered, and the results are stored
and made available for analysis. The deployment pipeline stops
at the lab stage without deployment automation to the CD zone.
This is because the customer and the case study company ser-
vice team supporting continuous deployment manually review
the impact of the continuous deployment software and new fea-
tures delivered.
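The lab stage of the deployment pipeline described above can be sketched as follows. The use of SHA-256 for the integrity check, the function names, and the status strings are assumptions for illustration; the source does not specify how integrity is verified. The sketch deliberately ends in a manual review state rather than deploying to the CD zone.

```python
import hashlib


def verify_integrity(package: bytes, expected_sha256: str) -> bool:
    """Compare the SHA-256 digest of the downloaded package with the published one."""
    return hashlib.sha256(package).hexdigest() == expected_sha256


def run_lab_pipeline(package, expected_sha256, lab_tests):
    """Run the automated lab stage; deployment to the CD zone is never triggered here.

    `lab_tests` maps a test name to a callable returning True on success.
    """
    if not verify_integrity(package, expected_sha256):
        return {"stage": "integrity_check", "status": "failed", "results": {}}
    # Upgrading the lab nodes is outside the scope of this sketch.
    results = {name: bool(test()) for name, test in lab_tests.items()}
    status = "awaiting_manual_review" if all(results.values()) else "lab_tests_failed"
    return {"stage": "lab", "status": status, "results": results}
```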
In addition, from a data collection perspective, the pipeline
collects performance, configuration, and diagnostic data from
the customer’s CD zone. However, the data collection pipeline
architecture differs depending on the degree to which the customer
allows data to be shared with the case study company. For example,
customers might allow only certain data types to be shared
and block other types, or the customer might allow data to be
shared but limit its movement by demanding its storage in spe-
cific geographies. Nevertheless, during the onboarding phase,
the data pipeline architecture is agreed upon between the case
study company and the customer.
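Customer-specific sharing constraints such as those above could be encoded as a policy that the data collection pipeline consults before moving data. This is a sketch under assumptions: the `SharingPolicy` fields, the data type labels, and the region identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharingPolicy:
    allowed_types: set = field(default_factory=set)    # e.g. {"performance", "alarm"}
    allowed_regions: set = field(default_factory=set)  # empty set = no storage restriction


def plan_transfer(records, policy, storage_region):
    """Return the records that may be stored in `storage_region` under `policy`."""
    if policy.allowed_regions and storage_region not in policy.allowed_regions:
        return []  # data must stay in the customer-approved geographies
    return [r for r in records if r["type"] in policy.allowed_types]
```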
5.2.2. Continuous activities
After the first-time activities are done, several continuous
activities start. These activities are conducted by one service
project composed of customer-specific teams with local and
global presence. Each team has cross-functional competence,
such as support, optimization, and lab testing. Each team works
closely with its assigned customer and synchronizes at least
once daily with the customer’s operations team. Further, the
project manages the cross-team alignment and knowledge shar-
ing. The service project also includes members from the soft-
ware release program, which is responsible for the software’s
quality, documentation, and feature content. The release program
members act as a feedback proxy to the rest of the R&D organization
as they coordinate feedback information to the hundreds
of development teams working with the software.
The first continuous activity is the CD zone software deploy-
ment procedure. The software deployment procedure involves
deployment preparations such as download, integrity check, lab
test, backup of the running configuration, software upgrade,
customizing the software parameters per customer’s configuration,
and ensuring that the network KPIs are all within the acceptance
levels following the upgrade. While the deployment procedure
does not differ from the ones performed in the rollout
project, the frequency of the activity is much higher. There-
fore, to be able to conduct these activities without increasing
the number of services people involved, the case study com-
pany has developed several tools to automate the software de-
ployment procedure. However, the deployment procedure does
not run entirely without human supervision. The procedure is
supervised by a service engineer in case there is a need for re-
mote manual intervention in the node. An example would be if
a node does not start after the upgrade or shows faulty behavior
requiring immediate troubleshooting and analysis.
Second, continuous monitoring of the node’s performance.
There are two flavors of monitoring: one is referred to as
babysitting, while the other is fine monitoring. Babysitting
monitoring takes place for three hours immediately after the new
software has been deployed. The main objective is to identify
whether there is a large deviation in the major KPIs following the
software deployment. The focus is on large deviations because
upgrades often happen late at night, when there is less traffic in
the network. Therefore, minor KPI deviations need more time to be
identified. Thus, fine monitoring aims to find minor deviations in
the nodes’ performance by considering traffic seasonality. This
is done by comparing traffic behavior during weekends, busy
hours, and different weekdays with the KPI values before the
software change. Fine monitoring is therefore a continuous activity
that spans the entire two-week lifetime of the continuous
deployment release. To perform continuous monitoring, several
types of data, such as alarms, traffic performance, and program
execution logs and traces, are collected. The collected data are
then analyzed automatically to identify deviations.
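The seasonality-aware comparison used in fine monitoring can be illustrated by comparing each post-change KPI sample with the pre-change average for the same weekday and hour. This is a minimal sketch, not the case company's detection method; the sample format and the tolerance parameter are assumptions. Babysitting could reuse the same comparison with a larger tolerance over the first three hours.

```python
from collections import defaultdict
from statistics import mean


def seasonal_baseline(samples):
    """Average KPI per (weekday, hour) bucket from pre-change samples.

    Each sample is a (weekday, hour, value) tuple; weekday is 0-6.
    """
    buckets = defaultdict(list)
    for weekday, hour, value in samples:
        buckets[(weekday, hour)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def find_deviations(baseline, post_samples, tolerance_pct):
    """Return post-change samples deviating from their bucket baseline by more than the tolerance."""
    deviations = []
    for weekday, hour, value in post_samples:
        reference = baseline.get((weekday, hour))
        if not reference:
            continue  # no comparable pre-change data (or a zero baseline) for this bucket
        change_pct = (value - reference) / reference * 100.0
        if abs(change_pct) > tolerance_pct:
            deviations.append((weekday, hour, round(change_pct, 2)))
    return deviations
```

Comparing like with like (for example, Saturday 09:00 against earlier Saturdays at 09:00) avoids flagging ordinary weekend or night-time traffic patterns as degradations.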
Third, continuous optimization, which is based on new features
made available in the software. The customer operations
and the service team discuss the trials of new features. The ser-
vice team often advises customers of new features suitable for
their network. Thus, the service team plays a critical role in in-
creasing the adoption of new features that customers might not
notice or when the feature value is unclear to them. The op-
timization activity also involves activating new features in the
customer network and quantifying the gain.
Fourth, continuous adjustment of the CD zone. Customers
often add new radio sites to expand the RAN network or adjust
the configuration of some sites based on demographic needs,
such as new roads or buildings. In addition, with the release of
new hardware types, newly built or upgraded nodes might have
new hardware types that do not exist in the initially selected CD
zone. Thus, operators’ networks are not static. This means that
the initially selected CD zone must also change frequently to
reflect changes in the network.
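A simple check for when the CD zone needs adjusting is to look for configurations that exist in the network but are not represented in the zone. The sketch below assumes hypothetical node records with the keys `id`, `hardware`, and `sw_config`; it only flags gaps and leaves the choice of new zone members to the service team and the customer.

```python
def uncovered_configurations(network_nodes, cd_zone_ids):
    """Return configurations present in the network but absent from the CD zone."""
    zone = set(cd_zone_ids)
    all_configs = {(n["hardware"], n["sw_config"]) for n in network_nodes}
    covered = {(n["hardware"], n["sw_config"]) for n in network_nodes if n["id"] in zone}
    return sorted(all_configs - covered)
```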
Fifth, continuous root cause analysis and troubleshooting.
In an operational RAN network, there are often many alarms,
errors, and notification events generated by the eNodeBs and
gNodeBs. Some of these are considered operational noise,
which is just a consequence of this equipment’s operation. A
typical example cited during several meetings is an alarm gen-
erated due to a transmission disturbance between the antenna
and the base station. If the alarm lasts for a very short pe-
riod and then ceases, it is likely to be due to a link distur-
bance, and it would not impact the performance KPIs. How-
ever, if the alarm lasts for a longer duration or its frequency has
changed, it is likely due to other reasons, such as a software
bug or configuration mistake. However, as continuous deployment
releases contain both bug fixes and new content, the operational
noise changes frequently. Thus, the support team is
continuously troubleshooting and analyzing network issues to
determine what is noise, what is an expected behavior of the
software change, and what is not expected and thus could be a
software bug or configuration error.
“Continuous deployment releases come with many im-
provements and changes to our software that continuously
impact this noise level at the customer network. We, there-
fore, need to continuously troubleshoot and evaluate if an
alarm, event, or log is a consequence of the new software,
is a new bug, has the customer changed something man-
ually on the configuration, or is it part of the new noise
level” Interviewee H - Customer support team lead
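The transmission-disturbance example suggests a first-pass triage rule based on alarm duration and frequency. The sketch below encodes that rule with hypothetical thresholds; in practice the noise level shifts with every CD release, so such thresholds would need to be re-derived continuously rather than fixed.

```python
def classify_alarm(duration_s, count_last_day, baseline_count_per_day,
                   short_duration_s=5.0, frequency_factor=2.0):
    """Label an alarm as 'noise' or 'investigate' using hypothetical thresholds.

    duration_s: how long the alarm stayed active.
    count_last_day: occurrences in the last 24 hours.
    baseline_count_per_day: typical daily occurrences before the software change.
    """
    frequency_changed = count_last_day > frequency_factor * max(baseline_count_per_day, 1)
    if duration_s <= short_duration_s and not frequency_changed:
        # Short-lived and as frequent as usual, e.g. a link disturbance.
        return "noise"
    # Long-lasting or unusually frequent: possible software bug or misconfiguration.
    return "investigate"
```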
To be able to perform these continuous activities, the service
team conducts a daily synchronization with the customer’s operations
team. Both the customer’s operations team and the service
team share observations and discuss actions and mitigation.
Thus, the team delivering continuous activities becomes close
to the customer as they are part of their operations team. As
the service team continuously monitors nodes, the team works
proactively with the customer rather than reactively, as in the
legacy services delivery flow. Further, the service team provides
consultative advice as they engage with activities typically de-
cided by the customer, such as lab testing scope and CD zone
structure.
In addition, these continuous activities are considered labor-
intensive. During several meetings, the relationship between
labor costs and the number of support teams was discussed. The
case study company started with one team to support the first
continuous deployment customer when the company began re-
leasing software more frequently. The team performed contin-
uous activities manually with the first customer. However, with
the number of continuous deployment customers starting to in-
crease, the case study company has realized the importance of
both automation and intelligence to increase the efficiency of
continuous deployment while maintaining the cost. Therefore,
to scale continuous deployment, the usage of both automation
and intelligence capabilities is seen as critical.
“If you do something much more often than before, you
need to reduce the cost every time you do it, and then
you need to automate manual work” Interviewee T -
Customer services engagement lead
Therefore, the data collection and deployment pipeline is es-
sential to automate software deployment and data collection ac-
tivities which are repetitive activities with defined procedures.
However, activities such as anomaly detection, troubleshoot-
ing, and root cause identification are not only repetitive but also
complex. Thus, they are often conducted by domain experts.
Hence, the case study company considered introducing an in-
telligent system to assist the service team in performing these
activities as a critical enabler to scale continuous deployment
without scaling the number of domain experts needed to sup-
port customers.
6. AIOps and continuous deployment (RQ2)
To support the service team with time-consuming and com-
plex continuous activities, several interviewees highlighted the
need for a dedicated, intelligent platform. Therefore, the case
study company uses a dedicated and intelligent platform that
collects, parses, correlates, and potentially acts on various types
of network data. While the platform has a dedicated name used
within the case study company, we will refer to it in this paper
as AIOps Platform.
This section presents the findings of the second research
question addressing how AIOps can be used to support con-
tinuous deployment, which we present from three perspectives:
AIOps platform architecture, AIOps usage with continuous de-
ployment, and AIOps platform deployment scenarios.
6.1. AIOps platform architecture
From an architectural perspective, the results from the case
study show that the AIOps platform has a layered architecture
to separate different functions in the system, as depicted in Figure 6.
The AIOps platform is designed for speedy data collection
and interactions with the radio nodes. In addition, visualization
is key to allowing customers and the service team to
interact with the platform.
Figure 6: AIOps platform with four layers
The platform consists of four layers: data collection, data
management, data intelligence, and visualization layer. The
data collection layer is responsible for connecting to the RAN
nodes, where the software is continuously deployed every second
week, to collect various data types. The data collected are
performance, configuration, alarms, events, or internal data such
as traces and program crashes. The data collection is executed
by a collection agent, which resides in the customer network,
representing the data collection layer. Further, the collection
layer does not perform collection only but can also interact with
the nodes by sending actions or commands. Thus, the pipeline
established by the data collection layer is a two-way pipeline.
After the data is collected, it is passed to the data management
layer, which reads, parses, and stores data. In addition, the
layer controls data access, privacy, and the life cycle of the
collected data by destroying or archiving older data after the
retention period has elapsed. The data collection and
management layers are designed to achieve fast data collection
and ingestion.
“We need to collect data with speed. We designed our
system to allow for quick data collection and ingestion
while considering the bandwidth required on the O&M
interface” Interviewee M - Services systems solution
architect
The data intelligence layer holds the analytical and machine
learning capabilities. The layer is programmable where users
can define use cases, either AI or analytics based, and specify
their execution criteria. The use cases have three main types.
First, report generation use cases, where a report is produced
if a particular trigger is identified, such as when an event log
in the collected data contains specific information. Report
generation use cases can also be run on demand or be scheduled
periodically. Second, open-loop automation use cases, where an
alert is sent to subscribed users if a specific trigger is met. This
is often where an undesirable situation is identified and requires
a human to investigate further. Third, closed-loop automation
use cases where the intelligence layer automatically attempts to
recover from the incident by interacting with the radio base sta-
tions. In this case, no human is involved in the loop, and the
AIOps platform takes the end-to-end automated action.
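The three use case types can be illustrated with a minimal sketch. It is not the case study company's implementation: the `UseCase` structure, the trigger predicates, the record fields, and the `restart_radio` command are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class UseCaseType(Enum):
    REPORT = "report"            # produce a report when the trigger fires
    OPEN_LOOP = "open_loop"      # alert a human, who decides what to do
    CLOSED_LOOP = "closed_loop"  # act on the node with no human in the loop


@dataclass
class UseCase:
    name: str
    kind: UseCaseType
    trigger: Callable[[Dict], bool]  # hypothetical predicate over one collected record
    action: str = ""                 # hypothetical node command, closed-loop only


def run_use_cases(use_cases: List[UseCase], record: Dict) -> List[str]:
    """Evaluate every use case against one record and list what would happen."""
    outcomes = []
    for uc in use_cases:
        if not uc.trigger(record):
            continue
        if uc.kind is UseCaseType.REPORT:
            outcomes.append(f"report:{uc.name}")
        elif uc.kind is UseCaseType.OPEN_LOOP:
            outcomes.append(f"alert:{uc.name}")
        else:
            outcomes.append(f"command:{uc.action}@{record['node']}")
    return outcomes


use_cases = [
    UseCase("crash-summary", UseCaseType.REPORT, lambda r: "crash" in r["event"]),
    UseCase("kpi-drop", UseCaseType.OPEN_LOOP, lambda r: r["success_rate"] < 0.95),
    UseCase("radio-stuck", UseCaseType.CLOSED_LOOP,
            lambda r: r["event"] == "radio_stuck", action="restart_radio"),
]

record = {"node": "node-17", "event": "radio_stuck", "success_rate": 0.91}
print(run_use_cases(use_cases, record))
# ['alert:kpi-drop', 'command:restart_radio@node-17']
```

In this sketch, the only difference between an open-loop and a closed-loop use case is what follows the trigger: a notification to a person, or a command sent back through the two-way pipeline.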
The visualization layer provides an interface for users to
query the data on demand. In addition, it provides various dash-
boards aggregating performance metrics such as the number
of alerts per day, closed-loop use cases’ execution status, and
the base stations’ health status. Moreover, the visualization
layer shows the status of the underlying layers and the health of
the AIOps platform. Furthermore, the layer allows the user to
schedule use cases, terminate them, or trigger new executions.
“The visualization is key because, without that, it is hard
for the customer to understand. The customer sees great
value when the impact is visualized, which action has
been applied, what are the top offenders and so forth”
Interviewee T - Customer services engagement lead
6.2. AIOps usage with continuous deployment
We have identified six areas where AIOps can be used with
continuous deployment: reducing the number of support tickets,
reducing the investigation time and effort, reducing the number
of site visits, continuous monitoring, continuous adjustment of
the CD zone, and additional automatic data collection and
correlation.
6.2.1. Reducing the number of support tickets
AIOps is used to reduce the number of received support tick-
ets from the customer. Using AIOps to predict, detect, and
correct issues without human involvement is seen as a critical
capability to support continuous deployment as it reduces the
cost per deployment, enables efficiencies in service delivery,
and ensures customer satisfaction. Thus, the case study com-
pany has implemented many closed-loop use cases that would
automatically predict operational issues before they happen and
fix the underlying problems. In this case, the closed-loop use
cases provide preemption capabilities. If the closed-loop use
case detects an issue after it happens and applies a workaround
or correction on the node without human intervention, the use
case is considered reactive as it does not prevent the issue from
happening but rather reduces the impact of the issue.
“If we can solve 20% of the issues with closed-loop use
cases, it becomes extremely beneficial” Interviewee O
- Services systems principal developer
In addition to closed-loop use cases, the case company uses
a number of open-loop use cases. The main difference between
open and closed-loop is the degree of human involvement. Un-
like closed-loop automation, where humans are not in the loop,
in open-loop automation, an alert is sent to a service agent if an
issue has been detected or predicted. The human agent will re-
view the alert and signature of the issue and determine the next
set of actions.
6.2.2. Reducing the investigation time and effort
The AIOps platform is used to shorten the investigation time
of issues while reducing human effort. With closed-loop use
cases, the actual investigation time is zero, and the human
effort to investigate and conduct the corrective action is also
zero. In open-loop cases, the Mean Time To Detect (MTTD) is
relatively short, as the use case will alert once an unwanted
behavior is present. This eliminates the manual effort needed
to detect unwanted behavior. In addition, the alerts generated
by the AIOps platform and sent to the service engineers contain
a set of pre-collected configuration parameters such as the
node’s hardware type, its current running software version,
type of radio, historical events on the node, and any recent
changes to the configuration. Furthermore, the alert might
contain links to knowledge articles with matching symptoms in
the services knowledge management system. Therefore, the alerts
give the support engineer a head start in the investigation, as
relevant information is pre-collected and linked to the alert.
“We design the use cases to contain as much information
as possible to help the receiver quickly understand where
and what the problem is. This requires us to correlate
between multiple data sources and represent the informa-
tion in a usable way” Interviewee N - Services systems
Solution Architect
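The enrichment of an alert with pre-collected context can be sketched as follows. This is an illustrative assumption, not the platform's actual data model: the inventory, change log, and knowledge article structures, and all field names and values, are hypothetical in-memory stand-ins.

```python
from typing import Dict, List

# Hypothetical stand-ins for the configuration store and the knowledge
# management system; the real data sources are not described in the study.
NODE_INVENTORY = {
    "node-17": {"hardware_type": "baseband-A", "software_version": "R2",
                "radio_type": "radio-X"},
}
RECENT_CHANGES = {"node-17": ["parameter txPower changed"]}
KNOWLEDGE_ARTICLES = [
    {"id": "KA-1", "symptoms": {"accessibility_drop", "high_load"}},
    {"id": "KA-2", "symptoms": {"sleeping_cell"}},
]


def build_alert(node: str, symptoms: List[str]) -> Dict:
    """Attach node context and matching knowledge articles to a detection."""
    matching = [a["id"] for a in KNOWLEDGE_ARTICLES
                if a["symptoms"] & set(symptoms)]  # any shared symptom counts
    return {
        "node": node,
        "symptoms": symptoms,
        **NODE_INVENTORY.get(node, {}),
        "recent_changes": RECENT_CHANGES.get(node, []),
        "knowledge_articles": matching,
    }


alert = build_alert("node-17", ["accessibility_drop"])
print(alert["software_version"], alert["knowledge_articles"])
# R2 ['KA-1']
```

The point of the sketch is that the correlation across data sources happens once, when the alert is built, so the receiving engineer does not have to repeat it.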
6.2.3. Reducing the number of site visits
RAN nodes are exposed to various environmental factors such as
rain, snow, and heat. Thus, the hardware might be impacted and
require a replacement. While environmental issues cannot be
controlled, software-related faults or wrong configurations
might manifest as hardware issues in the system’s logs.
Therefore, before sending someone to the site to replace the
hardware, an extensive check on the site’s configuration needs
to be run to identify any software issues or faulty changes in
the configuration that might have been applied intentionally or
by mistake. This process is often tedious, requiring much
manual effort. Thus, the AIOps platform conducts extensive
configuration checks, checks for known software faults that
might manifest as faulty hardware, and provides a report to the
service engineer. This allows the case study company to recover
the hardware using remote actions rather than sending someone
onsite to replace non-faulty hardware.
“One of our main objectives is to reduce site visits. Last
year, we reduced our site visits by 32% to 33%. In this
case, we are helping both the customer and our unit by
increasing the efficiency of our Services” Interviewee
T - Customer services engagement lead
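A pre-visit check of this kind could look like the sketch below. It assumes, purely for illustration, that known software and configuration faults are stored as signatures keyed by the hardware alarm they produce; the signature format, alarm names, versions, and the `txPower` parameter are hypothetical.

```python
from typing import Dict, List

# Hypothetical signatures of software or configuration faults that are
# assumed to surface as hardware alarms.
KNOWN_SIGNATURES = [
    {"id": "sig-1", "alarm": "radio_unit_failure",
     "software_version": "R2", "reason": "known software fault in R2"},
    {"id": "sig-2", "alarm": "radio_unit_failure",
     "parameter": ("txPower", 0), "reason": "transmit power configured to zero"},
]


def pre_visit_check(alarm: str, software_version: str,
                    config: Dict[str, int]) -> List[str]:
    """Return reasons why a hardware alarm may not need a site visit."""
    findings = []
    for sig in KNOWN_SIGNATURES:
        if sig["alarm"] != alarm:
            continue
        # Signature tied to a software version with a known fault.
        if sig.get("software_version") == software_version:
            findings.append(sig["reason"])
        # Signature tied to a specific configuration value.
        param = sig.get("parameter")
        if param and config.get(param[0]) == param[1]:
            findings.append(sig["reason"])
    return findings


print(pre_visit_check("radio_unit_failure", "R3", {"txPower": 0}))
# ['transmit power configured to zero']
```

An empty result would mean that no known software or configuration explanation was found, so a site visit remains a candidate action.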
While reducing site visits is an applicable objective for any
customer, regardless of whether they are continuous deployment
customers, the pace of configuration changes is much higher in
continuous deployment. Thus, this use case ensures that
unnecessary site visits are avoided.
6.2.4. Continuous monitoring
Continuous monitoring is one of the main usage areas for
the AIOps platform. The AIOps platform is built to ensure
that RAN nodes’ data are collected as soon as they are pro-
duced. The data collection layer collects and sends the data
further to the data management layer, which ingests the data
and makes it readable. The data intelligence layer then executes the
use cases against the new information to identify if there is any
KPI anomaly. In addition to performance data, configuration
and alarm data are continuously collected and correlated with
the performance data.
Therefore, continuous monitoring is able to detect new soft-
ware changes automatically by correlating performance and
configuration data. Once a change in the software version is
identified, continuous monitoring immediately starts to com-
pare the performance of the new software with the performance
of the previous one.
In order to perform this comparison, the AIOps platform scans
through hundreds of different KPIs, configurations, and alarm
data continuously. KPI comparison is considered a tedious and
error-prone activity if done manually. Therefore, a machine
learning-based use case has been developed in the AIOps
platform that learns the traffic trends and seasonality before
the software change, then performs a before/after comparison
automatically once a software change is detected. This is
performed by applying different time series analysis and
forecasting methods, such as Long Short-Term Memory (LSTM) and
Autoregressive Integrated Moving Average (ARIMA), to traffic
KPIs.
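The idea of a seasonality-aware before/after comparison can be shown with a much simpler stand-in than LSTM or ARIMA. The sketch below uses a seasonal-average baseline instead of those models; the function names, the threshold of three standard deviations, and the sample values are assumptions made for illustration only.

```python
from statistics import mean, stdev
from typing import List


def seasonal_baseline(history: List[float], period: int) -> List[float]:
    """Average the history per position in the seasonal cycle (e.g. hour of day)."""
    return [mean(history[i::period]) for i in range(period)]


def degraded_after_change(before: List[float], after: List[float],
                          period: int, threshold: float = 3.0) -> bool:
    """Flag the new software if the KPI falls well below the seasonal baseline.

    `before` must start at position 0 of the cycle, and `after` must
    continue right where `before` ends.
    """
    baseline = seasonal_baseline(before, period)
    residuals = [x - baseline[i % period] for i, x in enumerate(before)]
    spread = stdev(residuals) or 1e-9  # avoid dividing by zero on flat history
    offset = len(before)
    after_residuals = [x - baseline[(offset + i) % period]
                       for i, x in enumerate(after)]
    return mean(after_residuals) / spread < -threshold


# Three cycles of a KPI with period 4, collected before the software change.
before = [10, 20, 30, 20, 11, 19, 31, 21, 9, 21, 29, 19]
print(degraded_after_change(before, [10, 20, 30, 20], period=4))  # False
print(degraded_after_change(before, [5, 14, 25, 15], period=4))   # True
```

Comparing against the seasonal baseline rather than a plain average keeps a normal dip in the cycle from being reported as a regression of the new software.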
6.2.5. Continuous adjustment of the CD zone
To keep track of how well the CD zone represents the entire
network, the case study company has developed a use case that
continuously checks configurations that exist in the entire
network versus the ones in the CD zone. The result of the
comparison is a report providing a representation index, which
indicates the degree to which the CD zone mimics the entire
network. The
use case scans both the hardware and software attributes, such
as the radio antennas and base-band processing unit types for
the hardware, in addition to active software features and con-
figuration parameter values for the software. After that, the use
case creates clusters of configurations and compares the ones in
the CD zones versus the entire network. For this purpose, ma-
chine learning clustering algorithms such as hierarchical clus-
tering and K-means are used.
“The use case that identifies major configurations in the
entire network and continuously indicates how represen-
tative the CD zone is, is, I would say, a critical use case.
We would have to spend a lot of time trying to find this,
and we would not get into the same level of details as the
use case does” Interviewee S - Customer DevOps Ex-
pert
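One possible way to compute a representation index is sketched below. The study does not define the index, so this is an assumption: exact-match grouping of configurations stands in for the clustering algorithms mentioned above, and the index is taken to be the share of network nodes whose configuration also occurs in the CD zone. The attribute names and values are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (radio type, baseband type, feature set): hypothetical attributes
Config = Tuple[str, str, str]


def representation_index(network: Dict[str, Config], cd_zone: Set[str]) -> float:
    """Share of network nodes whose configuration also occurs in the CD zone."""
    zone_configs = {network[node] for node in cd_zone}
    counts = Counter(network.values())
    covered = sum(n for config, n in counts.items() if config in zone_configs)
    return covered / len(network)


network = {
    "n1": ("radio-X", "bb-A", "featureset-1"),
    "n2": ("radio-X", "bb-A", "featureset-1"),
    "n3": ("radio-Y", "bb-A", "featureset-1"),
    "n4": ("radio-Y", "bb-B", "featureset-2"),
}
print(representation_index(network, {"n1", "n3"}))  # 0.75
```

Here the configuration of `n4` is not present in the CD zone, so one of four nodes is unrepresented. With real data, near-identical configurations would need to be grouped by a clustering step rather than by exact match.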
6.2.6. Additional automatic data collection and correlation
Both eNodeBs and gNodeBs produce many types of data.
While some data are continuously collected, such as perfor-
mance, configuration, and alarm data, detailed diagnostic data
are collected on demand. This is because the diagnostic data are
extensive in size; thus, their continuous collection impacts the
node’s performance. In addition, diagnostic data might contain
personal information; thus, appropriate approvals should be in
place before starting the collection. Thus, the ability to collect
in-depth diagnostic data on demand, either manually or as part of
open-loop or closed-loop use cases, is considered important in the
architecture of the AIOps platform.
6.3. AIOps platform deployment scenarios
The case study company uses two deployment scenarios for
the AIOps platform. The first deployment scenario is called
white box, which indicates that the system’s data management,
data intelligence, and visualization layers are hosted in the case
company’s network, while the data collection layer is hosted in
the customers’ network as illustrated in Figure 7. Therefore,
the raw data are transferred from the customer’s network to the
case study company’s network to be processed.
This solution has two flavors: first, the AIOps platform is lo-
cated in the same country or geography as the customer. In this
case, while the raw data leaves the customer’s network for the
case study company’s network, the data remains in the same country or
geography as the customer. Due to legal limitations, some cus-
tomers wish to keep their data in the same country or geography
as the data origin.
Figure 7: White box deployment scenario
The second deployment mode is called grey box, where the
raw data does not leave the customer’s network as depicted in
Figure 8. In this case, the AIOps platform, including the data
collection layer, is hosted in the customer’s network. The data
management layer interacts with an SMTP proxy at the case
study company over a secure VPN tunnel established between
the customer’s and the case study company’s network. This al-
lows email alerts to be forwarded to the support team, who can
then do some offline analysis or request a remote manual
connection to the customer’s network to continue the investigation
if needed.
Figure 8: Grey box deployment scenario
7. AIOps challenges (RQ3)
This section presents the findings of the third research ques-
tion addressing the challenges when using AIOps to support
continuous deployment. During our study, we have identified
eight challenges when using AIOps to enable continuous de-
ployment: continuous data correlation, ecosystem data collec-
tion, multi-collection of the same data, digitalized information
flow, creating trusted closed-loop use cases, alerts fatigue, es-
tablishing a two-way pipeline and services team mindset.
7.1. Continuous data correlation
The data generated by 4G and 5G RAN nodes have different types,
come in many forms, and are generated by many components within
the node. Thus, the AIOps system needs to have the capability
to correlate different data types to create meaningful results.
For example, in an open-loop anomaly detection use case, if
performance degradation is identified in one node, the AIOps
system needs to correlate the performance data with other data
types, such as alarms, configuration, and system events, before
generating an alert. However, the challenge is that
troubleshooting data, such as the system’s internal constants
and logs, change often and are not backward compatible. Thus,
correlating data involving internal logs becomes difficult, as
these data mean different things in each release.
“System’s constants are challenging. Their values change
regularly, and new constants are added or removed. We
always need to update our use cases to make sure we cor-
relate data with the correct identifiers and values” In-
terviewee N - Services systems solution architect
7.2. Ecosystem data collection
Data collection from the RAN nodes alone is insufficient for
detailed root cause analysis and fault isolation of issues that
require end-to-end tracing and troubleshooting. For example,
detailed network end-to-end troubleshooting is needed if degra-
dation is observed in mobile connections’ access to the base sta-
tion. This involves data collection and analysis from the RAN
nodes, the core network, and the mobile equipment. Thus, for a
speedy failure cause identification, data must be collected from
the entire mobile network ecosystem, including mobile equip-
ment and core network, not only the RAN.
7.3. Multi-collection of the same data
In many customer networks, there are often existing collec-
tions of some RAN data that overlap with the data the AIOps
system would need. An example cited in one of the meetings
is the performance data which are collected by customers to
support network planning, business, and operations. Therefore,
customers have existing data collectors which connect to the
network elements and do the collections. However, these
collectors cannot be utilized for the AIOps system for three reasons.
First, they lack the possibility to establish a two-way pipeline,
which means that they can collect data only without the abil-
ity to interact with the nodes in the opposite direction. Second,
they are also slow in collecting data which impacts the possi-
bility of doing timely observations, especially with continuous
monitoring. Third, connecting third-party data collectors to the
AIOps platform takes time and effort, and it is not a feasible
option since every customer has customized collectors.
Several data collectors extracting the same data from the
nodes impact the bandwidth of the O&M interface. Thus, to
address the issue in the short term, the case study company as-
sesses the O&M bandwidth with the customer when the sys-
tem is installed to ensure that the O&M interface does not be-
come congested. Several interviewees highlighted that this is
not enough for the long term.
7.4. Digitized information flow
With the continuous flow of new features, corrections, and
software improvements, existing use cases need to be updated.
In addition, new use cases need to be created to cater to new
functionality. This would require a considerable amount of
work from the services team. In several meetings, the service
team members highlighted the difficulty of understanding the
content of the new software. Although the software is released
every two weeks, it still contains hundreds of changes and sev-
eral new features. The list of changes and new features becomes
known a few days before the release of the software, thus giv-
ing the service team little time to understand the consequences
of the new release on the AIOps system and adjust running use
cases.
Therefore, digitizing the information flow from when a software
change is triggered in the code repository until the software
is released is needed to enable the continuous deployment
service team to update use cases on time and with the least
effort. To achieve that, the code change commit shall contain
enough information describing what has changed and the ex-
pected impact. The information shall be aggregated and pre-
sented in machine-readable format; thus, it becomes possible
for the AIOps platform to read these changes and possibly ad-
just existing use cases accordingly. In several discussions, it
was highlighted that information flow is mainly manual, where
developers highlight changes they think will cause an impact on
the system to the release program. The release program ensures
that these changes are accounted for in continuous integration
and delivery, and their impact is documented in the release doc-
umentation. However, this process is subject to human mistakes
where developers forget to communicate changes on time.
“One of the major challenges is that the software includes
many changes that impact customers’ networks. Some
of these changes are described in release documentation,
while many others do not get visibility to be documented”
Interviewee H - Customer support team lead
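A machine-readable change description of the kind called for above could be consumed as in the following sketch. The record schema, the change types, the identifiers, and the mapping from use cases to the identifiers they depend on are all hypothetical; the study does not specify a format.

```python
import json
from typing import Dict, List

# Hypothetical change records, as they might be aggregated from commit metadata.
release_changes = json.loads("""
[
  {"id": "chg-1", "type": "constant_renamed", "old": "RRC_FAIL_7", "new": "RRC_FAIL_07"},
  {"id": "chg-2", "type": "feature_added", "name": "carrier-aggregation-v2"},
  {"id": "chg-3", "type": "constant_removed", "old": "LEGACY_TIMER"}
]
""")

# Hypothetical mapping from use case name to the identifiers its logic relies on.
use_case_dependencies: Dict[str, List[str]] = {
    "rrc-failure-alert": ["RRC_FAIL_7", "RRC_FAIL_9"],
    "timer-report": ["LEGACY_TIMER"],
    "kpi-drop": ["success_rate"],
}


def impacted_use_cases(changes: List[Dict],
                       deps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each use case to the change ids that touch identifiers it depends on."""
    impacted: Dict[str, List[str]] = {}
    for change in changes:
        old = change.get("old")
        if old is None:
            continue  # additions do not break existing identifiers
        for name, identifiers in deps.items():
            if old in identifiers:
                impacted.setdefault(name, []).append(change["id"])
    return impacted


print(impacted_use_cases(release_changes, use_case_dependencies))
# {'rrc-failure-alert': ['chg-1'], 'timer-report': ['chg-3']}
```

With such a list available before the release, the service team could see which running use cases need attention without reading the full release documentation.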
7.5. Trusted closed-loop use cases
Creating closed-loop use cases is challenging as they need
to be trustworthy, with a proven track record of successful
execution. In addition, these use cases should work with
different customers’ configurations. Thus, customers are often
not confident enough to allow closed-loop use cases to be
executed in their network before making sure that they work as
they should at all times. As indicated by several interviewees,
if a closed-loop
automation use case does not work as it should, the customer
would perceive it negatively.
“Customers are often not confident enough to allow
closed-loop use cases. Since it means tweaking the net-
work elements by, for example, resetting the radios and
changing the configuration. With these actions, there are
always risk factors” Interviewee S - Customer DevOps
expert
7.6. Alerts fatigue
Open-loop use cases might generate a flood of alerts. This
creates “alert fatigue” for the services team. To reduce the
number of false positives, a lot of work is needed to qualify
the alerts and label them. However, this becomes more difficult
as the baseline changes frequently with new software levels,
new configuration parameters, and new features.
“If an open loop use case runs several times a day, gen-
erating many alerts with each run, then it means a huge
amount of alerts a day to be investigated. So, we try to
keep the alerts informative and easy to digest ... However,
as the software change quite frequently, we have to ad-
just the use case logic continuously” Interviewee N -
Services systems solution architect
7.7. Establishing a two-way pipeline
Establishing a two-way pipeline requires the AIOps system
to access the nodes to send customized commands, giving the
system higher control over the nodes. Customers often want
to keep this to the absolute minimum, as access to the nodes
poses a security risk. Customers often have a thorough process
to evaluate who would need access, when and for how long.
Providing functional user access to the AIOps system requires
the customer’s security team to be involved and assess the AIOps
platform from a security perspective. Aspects such as whether
the platform poses security threats, or whether someone could
use the platform as a proxy to gain access to the nodes, need
to be checked and verified.
Thus, interacting with the nodes in the downstream direction
(i.e., from the AIOps system to the nodes) requires detailed
planning and understanding of what the system will trigger, how
the system will be identified, and how security credentials are
exchanged. These aspects are discussed with the security team
continuously, which slows down the introduction of new use
cases that involve new actions or commands needing security
approval.
7.8. Service team mindset
Developing use cases in the AIOps platform requires the service
team to think proactively and preemptively and create use cases
that address future scenarios. This is a change from service
engineers’ previous way of working, in which they investigated
issues after they happened, triggered by customer-initiated
tickets. With AIOps, the service team needs to anticipate how
things could go wrong and prepare suitable use cases for that.
Several interviewees highlighted that such a mindset change is
not easy for many service engineers who have spent years working
in a reactive fashion. In addition, these engineers also need
to evolve their programming and data science knowledge when
working with the AIOps system, which requires them to invest in
learning new competences in addition to their domain knowledge.
8. Discussion
In this section, we discuss the empirical findings from the
case study company, which are structured in three sections ad-
dressing each research question.
8.1. Continuous deployment impact on product-related services
Continuous deployment impacts product-related services in
three ways. First, it changes how these services are conducted.
Without continuous deployment, services are delivered by
independent projects and executed in sequence. However,
introducing continuous deployment results in service
convergence, as the boundaries between different services
become blurry.
When the software release cycle is long, there is enough time
for services to be conducted in sequence. However, with con-
tinuous deployment, there is little time to do activities in se-
quence. Thus, service activities need to be done in parallel.
Therefore, services that have been traditionally separated, in-
dependent, and conducted by dedicated teams or organizations
converge. This is similar to the impact of agile software devel-
opment practices on how software development activities are
performed. Software development activities used to be conducted
in sequence by different teams following a waterfall approach.
However, in an agile software development context, software
development activities are conducted in parallel by the same
team. In the same agile sprint, the team performs activities
such as coding, testing, and releasing. In addition, agile
teams are composed of different competencies, such as
development, testing, and integration [73].
Second, services that used to be conducted in a reactive way
transition to becoming proactive and preemptive with con-
tinuous deployment. While services can be proactive and pre-
emptive without continuous deployment, as in the case of data-
driven product service systems [45], empirical results from the
case study company show that continuous deployment needs
proactive and preemptive services. Providing proactive and pre-
emptive services requires new ways of working and technical
capabilities.
From the ways-of-working perspective, the service team shall work
closely on two fronts: the R&D organization and the cus-
tomers’ operation team. On the R&D front, changes in the
product shall be communicated to the services team. This re-
quires strong synchronization between the service and R&D or-
ganizations, especially with information flow. This is achieved
by having members of the R&D’s release program in the ser-
vices team. On the customer front, the service team holds a
daily meeting with the customer’s operations team. In addi-
tion, the service team works proactively to identify features
suitable for the customer configuration and suggest them to the
customer. In addition, the service team continuously monitors
the customer’s network to evaluate the value of new features
and identify any software-related issue in the customer network.
While the emphasis on collaboration has been highlighted ex-
tensively in the DevOps context, the focus has been on devel-
opment and operation teams [12,74]. However, in software-
intensive embedded systems, the customer is the one operating
it [75], while services are often the touch point between the cus-
tomers and product suppliers [76]. Therefore, services play a
crucial role in bridging between the R&D organization and cus-
tomers. This study reveals that continuous deployment requires
strong collaboration between three entities: the R&D organi-
zation representing development, the customer representing the
operation, and the service organization being the connector
between development and operation.
Furthermore, the service organization needs the technical
capabilities to enable the continuous deployment service team
to work proactively and preemptively. Continuous monitoring
provides the means to identify issues as soon as they happen
and thus gives the teams the visibility needed to work proac-
tively. Additionally, with the availability of historical data, it is
possible to move further to provide preemptive support where
issues are predicted and a workaround is applied before they
happen. To achieve that, the capability to perform continuous
data collection and interact with the systems in the opposite di-
rection is needed. Thus, a reliable two-way pipeline needs to
be established. In addition, the AIOps platform has a leading
role in enabling services activities to be conducted proactively
and preemptively utilizing the massive amount of collected data
and the ability to interact with the systems.
Third, continuous deployment comes with new activities,
such as establishing and maintaining the data pipeline, con-
tinuous monitoring, and adjustment of the CD zone. While
continuous monitoring has been highlighted as an important ac-
tivity in the context of DevOps, and Continuous Software En-
gineering [77,28], the continuous adjustment of the CD zone
is a new aspect that needs to be considered in the context of
software-intensive embedded systems. As these systems have
many instances in the field, often with different hardware
compositions and customized configurations, it is important to
continuously adjust the deployment targets to factor in the changes
in configurations and newly introduced hardware.
8.2. AIOps and continuous deployment
With the convergence of services, the need to be more proactive
and preemptive, and the reduced operational time of new software
releases, services need to be conducted much faster and more
intelligently than before. In a traditional case,
product-related services are often manually delivered. As
described by El Sawy et al. [78], the required level of
customer support rises exponentially with the increase of
complexity, connectivity, and criticality of the product. From
this perspective, software-intensive embedded systems are often
critical systems with high availability and reliability
requirements, and with the advent of the Internet of Things
(IoT) they are often connected. Further, the embedded software
complexity is increasing rapidly with new features and
generations of the product; for example, 5G is more complex
than 4G, and 6G is expected to be even more complex [79]. In
addition, continuous deployment further increases the
complexity of the software as a consequence of increased
complexity in feature interactions [57]. Therefore, delivering
services by relying on human force only becomes challenging.
Thus, as depicted in Figure 9, AIOps becomes critical to break
the complexity curve while delivering faster and more
intelligent services.
Figure 9: Breaking the efforts vs. complexity curve
Further, software-intensive systems are getting more
intelligent. These systems can perform tasks such as
self-configuration and self-optimization. In mobile networks, a
significant focus is dedicated to self-organizing network (SON)
features [80,81]. However, while the system’s self-service
capabilities are increasing, AIOps is used to support several
aspects. In this study, we identified six usage scenarios where
AIOps is used to support continuous deployment: reducing the
number of support tickets, reducing investigation time and
effort, reducing the number of site visits, continuous
monitoring, continuous adjustment of the CD zone, and
additional automatic data collection and correlation.
Therefore, while the product’s self-service capabilities are
increasing, AIOps can be seen as filling a considerable space
in the overall service scope of the product, as depicted in
Figure 10.
Figure 10: System’s self-service capabilities, product-related service, and AIOps
In addition, the AIOps platform has to have the flexibility to
be deployed in different scenarios, for example, in the
customer’s network if data sharing is not allowed due to legal
reasons or customers’ preferences. In the case study company,
this is achieved by having two deployment scenarios, white and
grey box. While having two deployment scenarios increases the
effort needed to develop, operate, and maintain the platform,
it allows the case study company to provide customized service
solutions fulfilling different customers’ needs. In this
context, customized service solutions increase perceived
service quality, customer satisfaction, customer trust, and
customer loyalty [82]. Thus, this shows that the usage of AIOps
to support continuous deployment should consider the
customization needs of customers when used to deliver
product-related services.
8.3. AIOps challenges
In this study, we have identified several challenges related
to AIOps. Continuous data correlation and ecosystem data
collection are two of the challenges, as the AIOps platform
should be able to collect, process, and correlate different data
types not only from the nodes where the software is being de-
ployed continuously but also from the ecosystem surrounding
them. In addition, multi-data collection and the establish-
ment of a two-way pipeline are closely related challenges in
this study. As we described in our earlier research, data
generated by RAN nodes have different dimensions [83], which
makes data correlation in order to provide meaningful results
a challenging task, especially when the content of the data
changes more frequently with a new software release. Fur-
thermore, collecting the same data multiple times might not be
a practical option, especially when the bandwidth of the data
pipeline is limited. There seems to be no common solution that
can be used to address these challenges. However, one pos-
sible way is to provide customized solutions to address each
customer’s specific situation.
Furthermore, another challenge identified in our study is the
establishment of a digitalized information flow where the
impacts of the code changes introduced by the R&D development
teams are quickly propagated to the service organization. With
continuous deployment, there is little time for the service team
to update the use cases based on the new software content.
Thus, parity has to be established between the release content
and the AIOps platform logic, i.e., service readiness has to be
achieved with the software release, not after it, which makes a
digitalized information flow important. However, this is
challenging, especially in large organizations where the
software has many components and is built from the contributions
of thousands of developers.
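As an illustration only, the parity between release content and AIOps use-case logic could be checked automatically once the release content is available in machine-readable form. The following minimal sketch assumes that each release lists the data items it changes and that each use case declares the data items it reads and the releases it has been reviewed against. The names Release, UseCase, and readiness_gaps, and all example values, are hypothetical and are not taken from the case study company.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Release:
    """Hypothetical release descriptor published by the R&D pipeline."""
    version: str
    changed_items: frozenset  # counters, events, or parameters changed by the release


@dataclass
class UseCase:
    """Hypothetical AIOps use case maintained by the service team."""
    name: str
    depends_on: frozenset  # data items the use-case logic reads
    verified_for: set = field(default_factory=set)  # releases it was reviewed against


def readiness_gaps(release, use_cases):
    """Return use cases affected by the release but not yet verified for it."""
    gaps = {}
    for uc in use_cases:
        affected = uc.depends_on & release.changed_items
        if affected and release.version not in uc.verified_for:
            gaps[uc.name] = sorted(affected)
    return gaps


release = Release("R2", frozenset({"counter_a", "event_x"}))
use_cases = [
    UseCase("throughput_degradation", frozenset({"counter_a", "counter_b"})),
    UseCase("handover_failure", frozenset({"event_y"})),
    UseCase("restart_detection", frozenset({"event_x"}), verified_for={"R2"}),
]
gaps = readiness_gaps(release, use_cases)
print(gaps)  # {'throughput_degradation': ['counter_a']}
```

An empty result would indicate that every affected use case has been reviewed for the release, so service readiness is reached together with the release rather than after it.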
In addition, the service teams need to chase alerts gener-
ated from open-loop use cases continuously. Thus, keeping the
AIOps use cases up to date, reducing the number of false-
positive alerts, and at the same time, increasing the trust-
worthiness of closed-loop use cases are major challenges for
the case study company.
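One possible way to make the trustworthiness of a use case measurable is to record the service engineers’ verdict on every alert and to consider a use case for closed-loop operation only when enough alerts have been reviewed and few of them were false positives. The sketch below is a hypothetical illustration of this idea. The class AlertFeedback, the use-case names, and the thresholds of 50 alerts and 95% precision are assumptions and were not reported by the case study company.

```python
from collections import defaultdict


class AlertFeedback:
    """Tallies service-engineer verdicts on alerts per use case (hypothetical names)."""

    def __init__(self):
        self._confirmed = defaultdict(int)
        self._false_positive = defaultdict(int)

    def record(self, use_case, confirmed):
        """Store one verdict: True for a real fault, False for a false positive."""
        if confirmed:
            self._confirmed[use_case] += 1
        else:
            self._false_positive[use_case] += 1

    def total(self, use_case):
        """Number of reviewed alerts for the use case."""
        return self._confirmed[use_case] + self._false_positive[use_case]

    def precision(self, use_case):
        """Share of reviewed alerts that were confirmed, or None if none were reviewed."""
        total = self.total(use_case)
        return self._confirmed[use_case] / total if total else None

    def closed_loop_candidate(self, use_case, min_alerts=50, min_precision=0.95):
        """Assumed policy: enough reviewed alerts and a low share of false positives."""
        precision = self.precision(use_case)
        return (
            precision is not None
            and self.total(use_case) >= min_alerts
            and precision >= min_precision
        )


feedback = AlertFeedback()
for _ in range(58):
    feedback.record("sleeping_cell", confirmed=True)
for _ in range(2):
    feedback.record("sleeping_cell", confirmed=False)
for _ in range(30):
    feedback.record("link_flap", confirmed=True)
for _ in range(10):
    feedback.record("link_flap", confirmed=False)

print(feedback.closed_loop_candidate("sleeping_cell"))  # True
print(feedback.closed_loop_candidate("link_flap"))      # False
```

The same tallies also show which open-loop use cases generate the most false positives and therefore should be updated first.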
Changing the mindset of the service team to adopt a proactive
and preemptive way of working is another challenge.
The social challenges associated with adopting continuous de-
ployment have been highlighted before in multiple studies, such
as [55,84]. However, these studies have primarily focused on
the social challenges from software developers’ point of view.
In this study, embracing AIOps is also challenging for the ser-
vice teams as it requires a mindset change from a reactive way
of working to a proactive and preemptive one.
9. Conclusion
Continuous deployment profoundly impacts product-related
services offered by software-intensive embedded systems
companies. In this study, we have explored how continuous
deployment impacts product-related services. Our results show
that services transition from being conducted in a waterfall
approach to becoming continuous and conducted in parallel. In
addition, the service organization needs to work in a proactive
and preemptive way while maintaining a close relationship with
the customer. Further, the service organization acts as a proxy
connecting the R&D organization and the customer.
Furthermore, to support continuous deployment, an AIOps
platform is used to automate manual activities, thus reducing
the cost per software deployment. In addition, the AIOps
platform is used to perform continuous monitoring, track how
well the continuous deployment zone represents the entire
network, and collect diagnostic data on demand. The AIOps
platform has a layered architecture and supports different
deployment scenarios to suit different customer preferences
and legal limitations on data collection.
Finally, using AIOps with continuous deployment comes
with many challenges. In this paper, we have identified several
challenges related to AIOps, such as continuous data correlation
and ecosystem data collection, the establishment of two-way
data pipelines, and changing the service team’s mindset.
10. Acknowledgment
We would like to express our gratitude to everyone who con-
tributed with valuable input and insights from the case study
company.
References
[1] M. V. Stringfellow, N. G. Leveson, B. D. Owens, Safety-driven design for
software-intensive aerospace and automotive systems, Proceedings of the
IEEE 98 (4) (2010) 515–525.
[2] M. Broy, The ‘grand challenge’ in informatics: engineering software-
intensive systems, Computer 39 (10) (2006) 72–80.
[3] M. Andreessen, Why software is eating the world, Wall Street Journal
20 (2011) (2011) C2.
[4] J. Bosch, H. H. Olsson, Digital for real: A multicase study on the digital
transformation of companies in the embedded systems domain, Journal
of Software: Evolution and Process 33 (5) (2021) e2333.
[5] H. H. Olsson, J. Bosch, Climbing the “stairway to heaven”: evolving from
agile development to continuous deployment of software, in: Continuous
software engineering, Springer, 2014, pp. 15–27.
[6] H. H. Olsson, H. Alahyari, J. Bosch, Climbing the “stairway to heaven” – a
multiple-case study exploring barriers in the transition from agile devel-
opment towards continuous deployment of software, in: 2012 38th eu-
romicro conference on software engineering and advanced applications,
IEEE, 2012, pp. 392–399.
[7] T. Savor, M. Douglas, M. Gentili, L. Williams, K. Beck, M. Stumm, Con-
tinuous deployment at facebook and oanda, in: 2016 IEEE/ACM 38th In-
ternational Conference on Software Engineering Companion (ICSE-C),
IEEE, 2016, pp. 21–30.
[8] L. Riungu-Kalliosaari, S. Mäkinen, L. E. Lwakatare, J. Tiihonen,
T. Männistö, Devops adoption benefits and challenges in practice: a case
study, in: International conference on product-focused software process
improvement, Springer, 2016, pp. 590–597.
[9] D. G. Feitelson, E. Frachtenberg, K. L. Beck, Development and deploy-
ment at facebook, IEEE Internet Computing 17 (4) (2013) 8–17.
[10] C. Parnin, E. Helms, C. Atlee, H. Boughton, M. Ghattas, A. Glover,
J. Holman, J. Micco, B. Murphy, T. Savor, et al., The top 10 adages in
continuous deployment, IEEE Software 34 (3) (2017) 86–95.
[11] F. M. Erich, C. Amrit, M. Daneva, A qualitative study of devops usage in
practice, Journal of software: Evolution and Process 29 (6) (2017) e1885.
[12] L. E. Lwakatare, P. Kuvaja, M. Oivo, Dimensions of devops, in: Inter-
national conference on agile software development, Springer, 2015, pp.
212–217.
[13] P. Rodríguez, A. Haghighatkhah, L. E. Lwakatare, S. Teppola, T. Suomalainen,
J. Eskeli, T. Karvonen, P. Kuvaja, J. M. Verner, M. Oivo, Continuous
deployment of software intensive products and services: A systematic
mapping study, Journal of Systems and Software 123 (2017) 263–291.
[14] G. J. Chen, J. L. Wiener, S. Iyer, A. Jaiswal, R. Lei, N. Simha, W. Wang,
K. Wilfong, T. Williamson, S. Yilmaz, Realtime data processing at face-
book, in: Proceedings of the 2016 International Conference on Manage-
ment of Data, 2016, pp. 1087–1098.
[15] L. Abraham, J. Allen, O. Barykin, V. Borkar, B. Chopra, C. Gerea,
D. Merl, J. Metzler, D. Reiss, S. Subramanian, et al., Scuba: diving into
data at facebook, Proceedings of the VLDB Endowment 6 (11) (2013)
1057–1067.
[16] A. Masood, A. Hashmi, Aiops: Predictive analytics & machine learning
in operations, in: Cognitive Computing Recipes, Springer, 2019, pp. 359–
382.
[17] Y. Dang, Q. Lin, P. Huang, Aiops: real-world challenges and research
innovations, in: 2019 IEEE/ACM 41st International Conference on Soft-
ware Engineering: Companion Proceedings (ICSE-Companion), IEEE,
2019, pp. 4–5.
[18] A. Dakkak, D. Issa Mattos, J. Bosch, Success factors when transitioning
to continuous deployment in software-intensive embedded systems, in:
2021 47th Euromicro Conference on Software Engineering and Advanced
Applications (SEAA), IEEE, 2021, pp. 129–137.
[19] S. G. Yaman, T. Sauvola, L. Riungu-Kalliosaari, L. Hokkanen, P. Kuvaja,
M. Oivo, T. Männistö, Customer involvement in continuous deployment:
a systematic literature review, in: International Working Conference on
Requirements Engineering: Foundation for Software Quality, Springer,
2016, pp. 249–265.
[20] K. Beetz, W. Böhm, Challenges in engineering for software-intensive
embedded systems, in: Model-Based Engineering of Embedded Systems,
Springer, 2012, pp. 3–14.
[21] S. Fischer, R. Ramler, C. Klammer, R. Rabiser, Testing of highly con-
figurable cyber-physical systems–a multiple case study, in: 15th In-
ternational Working Conference on Variability Modelling of Software-
Intensive Systems, 2021, pp. 1–10.
[22] J. Bosch, R. Capilla, Dynamic variability in software-intensive embedded
system families, Computer 45 (10) (2012) 28–35.
[23] S. Shokouhyar, S. Shokoohyar, S. Safari, Research on the influence of
after-sales service quality factors on customer satisfaction, Journal of Re-
tailing and Consumer Services 56 (2020) 102139.
[24] V. Bindroo, B. J. Mariadoss, R. Echambadi, K. R. Sarangee, Customer
satisfaction with consumption systems, Journal of Business-to-Business
Marketing 27 (1) (2020) 1–17.
[25] M. Szwejczewski, K. Gon, Z. Anagnostopoulos, Product service sys-
tems, after-sales service and new product development, International
Journal of Production Research 53 (17) (2015) 5334–5353.
[26] H. Gebauer, S. Joncourt, C. Saul, Services in product-oriented companies:
past, present, and future, Universia Business Review (49) (2016) 32–53.
[27] M. Wilson, K. Wnuk, L. Bengtsson, Business model flexibility
and software-intensive companies: Opportunities and challenges, e-
Informatica Software Engineering Journal 15 (1) (2021).
[28] B. Fitzgerald, K.-J. Stol, Continuous software engineering: A roadmap
and agenda, Journal of Systems and Software 123 (2017) 176–189.
[29] A. Dakkak, A. R. Munappy, J. Bosch, H. H. Olsson, Customer support in
the era of continuous deployment: A software-intensive embedded sys-
tems case study, in: 2022 IEEE 46th Annual Computers, Software, and
Applications Conference (COMPSAC), IEEE, 2022, pp. 914–923.
[30] P. Liggesmeyer, M. Trapp, Trends in embedded software engineering,
IEEE software 26 (3) (2009) 19–25.
[31] C. Ebert, C. Jones, Embedded software: Facts, figures, and future, Com-
puter 42 (4) (2009) 42–52.
[32] R. N. Charette, This car runs on code, IEEE spectrum 46 (3) (2009) 3.
[33] N. Lakemond, G. Holmberg, A. Pettersson, Digital transformation
in complex systems, IEEE Transactions on Engineering Management
(2021).
[34] M. Müllerburg, Software intensive embedded systems, Information and
Software Technology 41 (14) (1999) 979–984.
[35] J. Bosch, Continuous software engineering: An introduction, in: Contin-
uous software engineering, Springer, 2014, pp. 3–13.
[36] A. R. Tan, Service-oriented product development strategies, Danmarks
Tekniske Universitet. DTU Management, DTU Management Engineering
(2010).
[37] M. J. Goedkoop, C. J. Van Halen, H. R. Te Riele, P. J. Rommens,
et al., Product service systems, ecological and economic basics, Report
for Dutch Ministries of environment (VROM) and economic affairs (EZ)
36 (1) (1999) 1–122.
[38] M. Salwin, A. Kraslawski, State-of-the-art in product-service system clas-
sification, Design, Simulation, Manufacturing: The Innovation Exchange
(2020) 187–200.
[39] M. Boehm, O. Thomas, Looking beyond the rim of one’s teacup: a multi-
disciplinary literature review of product-service systems in information
systems, business management, and engineering & design, Journal of
cleaner production 51 (2013) 245–260.
[40] M. G. de Oliveira, G. H. de Sousa Mendes, A. A. de Albuquerque,
H. Rozenfeld, Lessons learned from a successful industrial product ser-
vice system business model: emphasis on financial aspects, Journal of
Business & Industrial Marketing (2018).
[41] A. Annarelli, C. Battistella, F. Nonino, Product service system: A concep-
tual framework from a systematic review, Journal of cleaner production
139 (2016) 1011–1032.
[42] G. H. Mendes, M. G. Oliveira, H. Rozenfeld, C. A. N. Marques, J. M. H.
Costa, et al., Product-service system (pss) design process methodologies:
a systematic literature review, in: DS 80-7 Proceedings of the 20th In-
ternational Conference on Engineering Design (ICED 15) Vol 7: Product
Modularisation, Product Architecture, systems Engineering, Product Ser-
vice Systems, Milan, Italy, 27-30.07. 15, 2015, pp. 291–300.
[43] T. A. Tran, J. Y. Park, Development of integrated design methodology
for various types of product—service systems, Journal of Computational
Design and Engineering 1 (1) (2014) 37–47.
[44] S. Chowdhury, D. Haftor, N. Pashkevich, Smart product-service systems
(smart pss) in industrial firms: a literature review, Procedia Cirp 73 (2018)
26–31.
[45] M. Zambetti, F. Adrodegari, G. Pezzotta, R. Pinto, M. Rapaccini, C. Bar-
bieri, From data to value: Conceptualising data-driven product service
system, Production Planning & Control (2021) 1–17.
[46] C. E. McPhail, Respond, restore, resolve: Achieving 7-nines availabil-
ity telecommunications systems in the field, Bell Labs Technical Journal
11 (3) (2006) 173–189.
[47] P. J. Windley, Delivering high availability services using a multi-tiered
support model, Windley’s Technometria 16 (2002) 1–9.
[48] N. Ramasubbu, S. Mithas, M. S. Krishnan, High tech, high touch: The ef-
fect of employee skills and customer heterogeneity on customer satisfac-
tion with enterprise system support services, Decision Support Systems
44 (2) (2008) 509–523.
[49] M. Prior, “you want to do what?” breaking the rules to increase customer
satisfaction, in: 2011 Agile Conference, IEEE, 2011, pp. 269–273.
[50] V. Mathieu, Product services: from a service supporting the product to a
service supporting the client, Journal of Business & Industrial Marketing
(2001).
[51] V. Parida, D. R. Sjödin, J. Wincent, M. Kohtamäki, A survey study of
the transitioning towards high-value industrial product-services, Procedia
CIRP 16 (2014) 176–180.
[52] D. Stahl, T. Martensson, J. Bosch, Continuous practices and devops: be-
yond the buzz, what does it all mean?, in: 2017 43rd Euromicro Con-
ference on Software Engineering and Advanced Applications (SEAA),
IEEE, 2017, pp. 440–448.
[53] I. Gerostathopoulos, M. Konersmann, S. Krusche, D. I. Mattos, J. Bosch,
T. Bures, B. Fitzgerald, M. Goedicke, H. Muccini, H. H. Olsson, et al.,
Continuous data-driven software engineering-towards a research agenda:
Report on the joint 5th international workshop on rapid continuous soft-
ware engineering (rcose 2019) and 1st international works, ACM SIG-
SOFT Software Engineering Notes 44 (3) (2019) 60–64.
[54] T. E. Cardoso, A. R. Santos, R. Chanin, A. Sales, Communication of
changes in continuous software development, in: International Confer-
ence on Software Business, Springer, 2020, pp. 86–101.
[55] G. G. Claps, R. B. Svensson, A. Aurum, On the journey to continuous
deployment: Technical and social challenges along the way, Information
and Software technology 57 (2015) 21–31.
[56] B. Fitzgerald, K.-J. Stol, Continuous software engineering and beyond:
trends and challenges, in: Proceedings of the 1st International Workshop
on Rapid Continuous Software Engineering, 2014, pp. 1–9.
[57] A. Fabijan, H. H. Olsson, J. Bosch, Time to say ‘good bye’: Feature lifecycle,
in: 2016 42th Euromicro Conference on Software Engineering and
Advanced Applications (SEAA), IEEE, 2016, pp. 9–16.
[58] L. Rijal, R. Colomo-Palacios, M. Sánchez-Gordón, Aiops: A multivocal
literature review, Artificial Intelligence for Cloud and Edge Computing
(2022) 31–50.
[59] H. Su, Q. He, B. Guo, Kpi anomaly detection method for data center
aiops based on gru-gan, in: 2021 10th International Conference on Inter-
net Computing for Science and Engineering, 2021, pp. 23–29.
[60] S. Nedelkoski, J. Cardoso, O. Kao, Anomaly detection and classification
using distributed tracing and deep learning, in: 2019 19th IEEE/ACM in-
ternational symposium on cluster, cloud and grid computing (CCGRID),
IEEE, 2019, pp. 241–250.
[61] Y. Zhang, Z. Guan, H. Qian, L. Xu, H. Liu, Q. Wen, L. Sun, J. Jiang,
L. Fan, M. Ke, Cloudrca: A root cause analysis framework for cloud com-
puting platforms, in: Proceedings of the 30th ACM International Confer-
ence on Information & Knowledge Management, 2021, pp. 4373–4382.
[62] R. Harper, P. Tee, Cookbook, a recipe for fault localization, in: NOMS
2018-2018 IEEE/IFIP Network Operations and Management Sympo-
sium, IEEE, 2018, pp. 1–6.
[63] P. Naik, C. Govindarajan, S. Goel, K. Govindarajan, D. Behl, A. Singh,
M. Thomas, U. Mangla, P. Jayachandran, Closed-loop automation for 5g
slice assurance, in: 2022 14th International Conference on COMmunica-
tion Systems & NETworkS (COMSNETS), IEEE, 2022, pp. 424–426.
[64] P. Notaro, J. Cardoso, M. Gerndt, A systematic mapping study in aiops,
in: International Conference on Service-Oriented Computing, Springer,
2020, pp. 110–123.
[65] P. Notaro, J. Cardoso, M. Gerndt, A survey of aiops methods for failure
management, ACM Transactions on Intelligent Systems and Technology
(TIST) 12 (6) (2021) 1–45.
[66] C. B. Seaman, Qualitative methods in empirical studies of software engi-
neering, IEEE Transactions on software engineering 25 (4) (1999) 557–
572.
[67] C. Wohlin, M. Höst, K. Henningsson, Empirical research methods in
software engineering, in: Empirical methods and studies in software
engineering, Springer, 2003, pp. 7–23.
[68] R. K. Yin, Case study research and applications: Design and methods,
Sage publications, 2017.
[69] P. Runeson, M. Höst, Guidelines for conducting and reporting case study
research in software engineering, Empirical software engineering 14 (2)
(2009) 131.
[70] V. Braun, V. Clarke, Using thematic analysis in psychology, Qualitative
research in psychology 3 (2) (2006) 77–101.
[71] K. Petersen, C. Gencel, Worldviews, research methods, and their rela-
tionship to validity in empirical software engineering research, in: 2013
joint conference of the 23rd international workshop on software measure-
ment and the 8th international conference on software process and prod-
uct measurement, IEEE, 2013, pp. 81–89.
[72] A. El Rhayour, T. Mazri, 5g architecture: Deployment scenarios and
options, in: 2019 International Symposium on Advanced Electrical and
Communication Technologies (ISAECT), IEEE, 2019, pp. 1–6.
[73] S. Kim, H. Lee, Y. Kwon, M. Yu, H. Jo, Our journey to becoming agile:
Experiences with agile transformation in samsung electronics, in: 2016
23rd Asia-Pacific Software Engineering Conference (APSEC), IEEE,
2016, pp. 377–380.
[74] L. E. Lwakatare, T. Kilamo, T. Karvonen, T. Sauvola, V. Heikkilä,
J. Itkonen, P. Kuvaja, T. Mikkonen, M. Oivo, C. Lassenius, Devops in practice:
A multiple case study of five companies, Information and Software Tech-
nology 114 (2019) 217–230.
[75] L. E. Lwakatare, T. Karvonen, T. Sauvola, P. Kuvaja, H. H. Olsson,
J. Bosch, M. Oivo, Towards devops in the embedded systems domain:
Why is it so hard?, in: 2016 49th hawaii international conference on sys-
tem sciences (hicss), IEEE, 2016, pp. 5437–5446.
[76] T. Sauvola, L. E. Lwakatare, T. Karvonen, P. Kuvaja, H. H. Olsson,
J. Bosch, M. Oivo, Towards customer-centric software development: a
multiple-case study, in: 2015 41st Euromicro Conference on Software
Engineering and Advanced Applications, IEEE, 2015, pp. 9–17.
[77] L. E. Lwakatare, P. Kuvaja, M. Oivo, An exploratory study of devops
extending the dimensions of devops with practices, ICSEA 104 (2016)
2016.
[78] O. A. El Sawy, Redesigning it-enabled customer support processes for
dynamic environments, in: Business Process Transformation, Routledge,
2015, pp. 173–200.
[79] A. Imran, A. Zoha, A. Abu-Dayya, Challenges in 5g: how to empower
son with big data for enabling 5g, IEEE network 28 (6) (2014) 27–33.
[80] H. Hu, J. Zhang, X. Zheng, Y. Yang, P. Wu, Self-configuration and self-
optimization for lte networks, IEEE Communications Magazine 48 (2)
(2010) 94–100.
[81] A. Alhammadi, M. Roslee, M. Y. Alias, I. Shayea, A. Alquhali, Velocity-
aware handover self-optimization management for next generation net-
works, Applied Sciences 10 (4) (2020) 1354.
[82] P. S. Coelho, J. Henseler, Creating customer loyalty through service cus-
tomization, European Journal of Marketing (2012).
[83] A. Dakkak, H. Zhang, D. Issa Mattos, J. Bosch, H. Holmström Olsson,
Towards continuous data collection from in-service products: Exploring
the relation between data dimensions and collection challenges,
in: 2021 28th Asia-Pacific Software Engineering Conference (APSEC),
IEEE, 2021, pp. 200–209.
[84] M. Leppänen, S. Mäkinen, M. Pagels, V.-P. Eloranta, J. Itkonen, M. V.
Mäntylä, T. Männistö, The highways and country roads to continuous
deployment, IEEE software 32 (2) (2015) 64–72.