Cell Outage Management in LTE Networks

EUROPEAN COOPERATION
IN THE FIELD OF SCIENTIFIC
AND TECHNICAL RESEARCH
—————————————————
EURO-COST
—————————————————
COST 2100 TD(09)941
Vienna, Austria
2009/Sep/28-30
SOURCE: FP7-SOCRATES
c/o IFN, Institut für Nachrichtentechnik
Technische Universität Braunschweig,
Braunschweig, Germany
Cell Outage Management in LTE Networks
This paper has been presented at ISWCS 2009, Siena, Italy
Contact:
Prof. Dr. -Ing Thomas Kürner
IFN/Institut für Nachrichtentechnik
Technische Universität Braunschweig
Schleinitzstr.22
D-38106 Braunschweig
Germany
Phone: + 49(0)531-391 2416
Fax: + 49(0)531-391 5192
Email: kuerner@ifn.ing.tu-bs.de
Cell Outage Management in LTE Networks
M. Amirijoo1, L. Jorguseski2, T. Kürner3, R. Litjens2, M. Neuland3, L. C. Schmelz4, U. Türke5
1 Ericsson, Linköping, Sweden, 2 TNO ICT, Delft, The Netherlands, 3 TU Braunschweig, Braunschweig, Germany, 4 Nokia
Siemens Networks, Munich, Germany, 5 Atesio, Berlin, Germany
mehdi.amirijoo@ericsson.com, ljupco.jorguseski@tno.nl, t.kuerner@tu-bs.de,
remco.litjens@tno.nl, m.neuland@tu-bs.de, lars.schmelz@nsn.com, tuerke@atesio.de
Abstract: Cell outage management is a functionality aiming
to automatically detect and mitigate outages that occur in radio
networks due to unexpected failures. We envisage that future
radio networks autonomously detect an outage based on
measurements, from e.g., user equipment and base stations, and
alter the configuration of surrounding radio base stations in
order to compensate for the outage-induced coverage and service
quality degradations and satisfy the operator-specified
performance requirements as much as possible. In this paper we
present a framework for cell outage management and outline the
key components necessary to detect and compensate outages as
well as to develop and evaluate the required algorithms.
I. INTRODUCTION
The standardisation body 3rd Generation Partnership
Project (3GPP) has finalised the first release (Release 8) of the
UMTS successor named Evolved UTRAN (E-UTRAN),
commonly known as Long Term Evolution (LTE). In parallel
with the LTE development, the Next Generation Mobile
Network (NGMN) association of operators brings forward
requirements on management simplicity and cost
efficiency [1][2][3]. A promising approach for achieving these
requirements is the introduction of self-organisation
functionalities into the E-UTRAN [4]. One aspect that
benefits from self-organisation is cell outage management
(COM), which can be divided into cell outage detection (COD)
and cell outage compensation (COC).
There are multiple causes for a cell outage, e.g., hardware
and software failures (radio board failure, channel processing
implementation error, etc.), external failures such as power
supply or network connectivity failures, or even erroneous
configuration. While some cell outage cases are detected by
the operations & management (O&M) system through
performance counters and/or alarms, others may not be
detected for hours or even days. It is often through relatively
long term performance analyses and/or subscriber complaints
that these outages are detected. Currently, discovery and
identification of some errors involves considerable manual
analysis and may require unplanned site visits, which makes
cell outage detection rather costly. It is the task of the
automated cell outage detection function to timely trigger
appropriate compensation methods, in order to alleviate the
degraded performance due to the resulting coverage gap and
loss in throughput by appropriately adjusting radio parameters
in surrounding sites. Moreover, if required, an immediate
alarm should be raised indicating the occurrence and cause of
an outage, in order to allow swift manual repair.
Cell outage detection has previously been studied and
reported in literature, e.g., [5][6][7]. In [5], historic
information is used in a Bayesian analysis to derive the
probability of a cause (fault that initiates the problem) given
the symptoms (manifestations of the causes). Knowledge of
radio network experts is needed as well as databases from a
real network in order to diagnose the problems in the network.
In [7], a cell outage detection algorithm based on the
neighbour cell list reporting of mobile terminals is presented
and evaluated. The detection algorithm acts on changes in
neighbour reporting patterns, e.g., if a cell is no longer
reported for a certain period of time then the cell is likely to
be in outage. In comparison with previous approaches we
take a broader perspective and consider not only detection
but also cell outage compensation. Off-line optimisation of
coverage and capacity has been reported in e.g., [8][9][10][11]
and [12]. Coverage and capacity are optimised by
appropriately tuning, e.g., the pilot power, antenna tilt and
azimuth. Although such off-line optimisation methods may
provide useful suggestions, for cell outage compensation it is
important to develop methods that adjust involved parameters
on-line and in real-time in order to timely respond to the
outage. Further, some approaches consider single-objective
optimisation (e.g., capacity), whereas we believe that multiple
objectives (e.g., combination of coverage and quality) need to
be considered. As such, in contrast to previous work, we
intend to develop methods for real-time and multi-objective
control and optimisation of LTE networks.
In this paper we present a framework for COM and
describe the needed functionalities and their
interrelations (Section II). Further, we address a
number of key aspects that play a role in the development of
cell outage management algorithms. This includes a definition
of operator policies regarding the trade-off between various
performance objectives (Section III), an overview of
potentially useful measurements (Section IV), and a set of
appropriate control parameters (Section V). For development
and evaluation of detection and compensation algorithms we
propose a set of suitable scenarios (Section VI), and present
our assessment methodology and associated criteria
(Section VII). The paper is concluded in Section VIII, where
we also present our future work.
II. OVERVIEW OF CELL OUTAGE MANAGEMENT
The goal of cell outage management is to minimise the
network performance degradation when a cell is in outage
through quick detection and compensation measures. The
latter is done by automatic adjustment of network parameters
in surrounding cells in order to meet the operator’s
performance requirements based on coverage and other
quality indicators, e.g., throughput, to the largest possible
extent. Cell outage compensation algorithms may alter, e.g.,
the antenna tilt and the cell transmit power, in order to cover
the outage area.
Altering the radio parameters of the neighbouring cells
means that some of the user equipments (UEs) served by those
cells may be affected. This should be taken into account and
an appropriate balance should be achieved between the
capacity/coverage offered to the outage area and the
unavoidable performance degradation experienced in the
surrounding cells. This balance is indicated by means of an operator
policy that governs the actions taken by the cell outage
compensation function (see also Section III).
Figure 1 shows the components and workflow of cell
outage management. Various measurements are gathered from
the UEs and the base stations (called eNodeBs in LTE). The
measurements are then fed into the cell outage detection
function, which decides whether at the current time an outage
has occurred and triggers the cell outage compensation
function to take appropriate actions. In the example given in
Figure 1, the base station in the center is in outage, resulting in
a coverage hole. The neighbouring cells have increased their
coverage in order to alleviate the degradation in coverage and
quality.
Figure 1 Overview of the cell outage management components. The center
site is in outage. Red area indicates the previous coverage of the outage cell.
Cell outage compensation is typically characterised by an
iterative process of radio parameter adjustment and evaluation
of the performance impact. In this process there is a clear need
to estimate the performance in the vicinity of the outage area.
This is useful in order to determine to what degree the
compensation actions are successful in terms of satisfying the
given operator policy during an outage. This can be provided
by the so-called X-map estimation function, which
continuously monitors the network and, possibly using
other sources of information such as propagation prediction
data, estimates the spatial characteristics of the network, e.g.,
coverage and quality. Essentially, an X-map is a geographic
map with overlay performance information.
III. OPERATOR POLICY
The configuration changes for compensating a cell outage
influence the network performance. For the network
performance every operator has its own policy, which may
range from just providing coverage up to guaranteeing high
quality in the network. This policy may even be different for
various cells in the network and may vary for cell outage
situations and normal operation situations. Furthermore, the
policies may be declared differently depending on the
time/day of the week. Below we propose a general framework
for defining an operator policy, taking into account the above
mentioned aspects.
In a cell outage situation an operator may still target the
ideal goal of achieving the best possible coverage, providing
the highest accessibility, and delivering the best possible
quality in the cell outage area and all surrounding cells. In
most cases, not all these goals can be fulfilled at the same time.
As a consequence the targets have to be weighted and/or
ranked in order to provide quantitative input to an
optimisation procedure. Depending on the operator’s policy
the optimisation goal itself may vary, i.e., the weighting and
the ranking of the targets may differ. Hence, the policy
definition should be modular/flexible enough to capture
different operator strategies, e.g., coverage-oriented strategy
vs. capacity-oriented strategy.
When defining the cost function for optimisation and
assessment purposes, it is important to include all cells that
are affected in one way or another by the COC algorithm. In
this light, three groups of cells can be distinguished (see
Figure 2): (1) the cell in outage, (2) cells whose parameters
may be adapted by the COC algorithm, and (3) those
surrounding cells whose parameters are not adapted, but
whose performance may be affected by the COC actions.
Figure 2 Cells that are considered when defining a cost function. The legend
distinguishes the cell in outage, cells whose parameters may be adapted by the
COC algorithm, and surrounding cells whose parameters are not adapted but whose
performance may be affected by the COC actions; cells are further marked by
their focus on quality, coverage, or accessibility.
The optimisation goals considered in the cost function may
vary for different cells. For example, in cells covering large
areas the coverage should be kept high, whereas in high-
capacity cells located in the same region the focus will be
more on accessibility and/or quality. As a consequence, the
optimisation goals have to be defined on a per-cell basis.
Furthermore, some cells might be more important for the
network operator than other cells due to, e.g., the number of
customers or the traffic generated by the customers in these cells.
As such, in addition to the different optimisation goals, cells
may have different importance or priorities. The cost function
has therefore to take into account all optimisation goals per
cell, i.e., coverage, accessibility, and quality, as well as the
corresponding importance. Since it is not possible to derive
the coverage, accessibility, and quality for the cell in outage,
the cost function may consider only the accessibility and
quality of the surrounding cells and the coverage of the whole
sub-network.
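As an illustration of how such a policy could be turned into a number, the following minimal sketch combines per-cell goal weights and cell priorities into a single cost. It is not the cost function of the framework: the names CellPolicy and policy_cost, the normalisation of all indicators to the range 0 to 1 (1 being ideal), and the linear weighting are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CellPolicy:
    """Hypothetical per-cell policy: a priority and one weight per optimisation goal."""
    priority: float            # relative importance of the cell to the operator
    weights: Dict[str, float]  # e.g. {"accessibility": 0.5, "quality": 0.5}


def policy_cost(policies: Dict[str, CellPolicy],
                kpis: Dict[str, Dict[str, float]],
                coverage: float,
                coverage_weight: float) -> float:
    """Weighted shortfall from the ideal value 1.0 (lower is better).

    Assumption: every indicator is normalised to [0, 1]. The cell in outage is
    left out of `policies`; only the coverage of the whole sub-network and the
    accessibility/quality of the surrounding cells enter the cost.
    """
    cost = coverage_weight * (1.0 - coverage)
    for cell, policy in policies.items():
        shortfall = sum(w * (1.0 - kpis[cell][goal])
                        for goal, w in policy.weights.items())
        cost += policy.priority * shortfall
    return cost


# Example: a high-priority capacity cell and a coverage-oriented neighbour.
policies = {
    "c1": CellPolicy(priority=2.0, weights={"accessibility": 0.5, "quality": 0.5}),
    "c2": CellPolicy(priority=1.0, weights={"accessibility": 1.0}),
}
kpis = {
    "c1": {"accessibility": 0.9, "quality": 0.8},
    "c2": {"accessibility": 0.7},
}
example_cost = policy_cost(policies, kpis, coverage=0.95, coverage_weight=4.0)
```

A coverage-oriented operator would raise coverage_weight, whereas a capacity-oriented operator would raise the priorities and quality weights of the high-capacity cells.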
IV. MEASUREMENTS
The continuous collection of measurements and analysis of
radio parameters, counters, KPIs, statistics, alarms, and timers,
are an indispensable precondition for the detection and
compensation of a cell outage. In the following these various
information sources are described as measurements. These
measurements may be obtained from various sources (refer
to [13] for an overview of the LTE architecture): (1a) the
eNodeB which is affected by the cell outage (as far as there
are still measurements available), (1b) the neighbouring
eNodeBs, (2) UEs, and (3) O&M system and access
gateway(s). Note that it is to be further investigated to which
extent these measurements are suitable for use in COM
functions. Some major examples are given below:
A. Measurements from eNodeBs:
• Cell load, e.g., the load can be used to indicate the
  degree of traffic that is carried during outage.
• Radio Link Failure (RLF) counter, e.g., a sudden
  increase in the number of RLFs may indicate a cell outage.
• Handover failure rate, e.g., a high handover failure rate
  may indicate a cell outage.
• Inter-cell interference, e.g., a sudden change of the
  interference level may be used to indicate outages.
  Further, interference measurements are required to
  detect incorrect settings of tilt or cell power during
  compensation.
• Blocked / dropped calls, e.g., a high number of dropped
  calls may indicate a coverage hole.
• Cell throughput and per-UE throughput.
B. Measurements from UE
• Reference Signal Received Power (RSRP)
  measurements taken by the UE of the serving and
  surrounding cells. For example, if a neighbouring cell
  is no longer reported then this may be an indication of
  an outage.
• Failure reports are generated by the UE after
  connection or handover failures and sent to the eNodeB
  for cause analysis. Note that failure reports are not
  standardised yet.
C. Measurements from O&M
KPIs and statistics are continuously calculated at the O&M
system. Some of these measurements can be used for the
purpose of outage detection and to validate performance
during and after outage compensation. Alarms appearing at
the O&M system may also be used for outage detection.
For outage detection, taking all the described measurements
into account, a dedicated algorithm is required that combines
the measurements and uses an appropriate decision logic to
determine whether an outage has occurred or not.
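To make the idea of a combining decision logic concrete, the sketch below flags a suspected outage when two of the indicators listed above coincide: the cell has disappeared from the UE neighbour reports for several consecutive reporting periods, and at least one neighbouring cell shows a surge in RLFs. This is only one possible rule, not the detection algorithm of the framework; the function names, the thresholds min_missing and rlf_factor, and the AND combination are assumptions for illustration.

```python
from typing import Dict, List, Set


def missing_streak(reports: List[Set[str]], cell: str) -> int:
    """Count the most recent consecutive reporting periods without `cell`.

    `reports` holds, per reporting period (oldest first), the set of cell
    identifiers that appeared in UE RSRP measurement reports.
    """
    streak = 0
    for reported in reversed(reports):
        if cell in reported:
            break
        streak += 1
    return streak


def outage_suspected(cell: str,
                     reports: List[Set[str]],
                     rlf_counts: Dict[str, int],
                     rlf_baseline: Dict[str, float],
                     min_missing: int = 3,       # assumed threshold
                     rlf_factor: float = 2.0) -> bool:  # assumed threshold
    """Hypothetical rule: cell no longer reported AND RLF surge in a neighbour."""
    no_longer_reported = missing_streak(reports, cell) >= min_missing
    # Neighbours without a known baseline never trigger the surge condition.
    rlf_surge = any(count > rlf_factor * rlf_baseline.get(n, float("inf"))
                    for n, count in rlf_counts.items())
    return no_longer_reported and rlf_surge


# Example: cell "B" vanishes from the reports while neighbour "A" sees more RLFs.
reports = [{"A", "B", "C"}, {"A", "C"}, {"A", "C"}, {"C"}]
rlf_counts = {"A": 12, "C": 3}
rlf_baseline = {"A": 4.0, "C": 3.0}
suspect_b = outage_suspected("B", reports, rlf_counts, rlf_baseline)
suspect_a = outage_suspected("A", reports, rlf_counts, rlf_baseline)
```

Requiring both conditions trades detection delay for fewer false detections; a rule that triggers on either condition would react faster but raise more false alarms.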
Regarding COC, it is clear that the goals presented in
Section III need to be measurable. Accessibility and quality
can be assessed by measurements such as call block ratio and
per-UE throughput. Coverage presents by far the greater
challenge. One idea that we are currently pursuing is to
estimate coverage using UE measurement reports. Estimates
of the geographic coordinates of the mobile position may be
assigned to the obtained measurement reports in order to
derive a map which relates geo-reference data to
performance-related metrics, e.g., path loss (cf. also Figure 1). We
intend to investigate to which degree coverage can be
estimated based on UE reports (utilising position information)
as well as what accuracy is needed as input to the COC
algorithms.
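A minimal sketch of how geo-referenced UE reports could be aggregated into such a map is given below. It assumes that each report already carries a position estimate in metres and a path loss value in dB, and it simply averages the dB values per square bin; the function name build_x_map, the report layout and the averaging rule are illustrative assumptions, not the X-map estimation method itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical report layout: (x in metres, y in metres, path loss in dB).
Report = Tuple[float, float, float]
Bin = Tuple[int, int]


def build_x_map(reports: Iterable[Report], bin_size_m: float) -> Dict[Bin, float]:
    """Average the reported path loss per square bin of side `bin_size_m`.

    Simplification: dB values are averaged directly; bins without any report
    are absent from the result and would need interpolation or prediction data.
    """
    sums: Dict[Bin, float] = defaultdict(float)
    counts: Dict[Bin, int] = defaultdict(int)
    for x, y, path_loss_db in reports:
        key = (int(x // bin_size_m), int(y // bin_size_m))
        sums[key] += path_loss_db
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Example: three reports on a grid of 50 m bins.
x_map = build_x_map(
    [(10.0, 10.0, 100.0), (40.0, 20.0, 110.0), (60.0, 10.0, 120.0)],
    bin_size_m=50.0,
)
```

The bin size bounds the useful position accuracy: position errors much larger than bin_size_m spread reports over neighbouring bins and blur the map.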
V. CONTROL PARAMETERS
All radio parameters that have an impact on coverage and
capacity are relevant from a cell outage compensation point of
view. This includes transmit power and antenna parameters.
The power allocated to the physical channels dictates the
cell size. On the one hand, by increasing the physical channel
power the coverage area of a cell can be increased (in order to
compensate for outage). On the other hand, by lowering the
cell power the cell area is reduced and as a consequence load
and interference caused by the cell can also be reduced.
Further, modern antenna design allows influencing the
antenna pattern and the orientation of the main lobe by
electrical means (e.g., remote electrical tilt and beam forming).
Extensive studies for WCDMA systems reveal that antenna
tilt is a highly responsive lever when it comes to shaping the
cell footprint and the interference coupling with other
cells [14]. Beam forming may be used to steer the direction of
the antenna gain toward the area that is in outage (given that
the position and orientation of the base stations are known, at
least to some degree).
Besides the above-mentioned primary control parameters,
which are employed to improve the coverage area, there are
secondary control parameters. These are parameters that might
require an update as a consequence of a cell outage as such, or
as a consequence of the alteration of primary control parameters.
For example, cell outages as well as the triggered adjustment
of, e.g., reference signal power and antenna tilt are likely to
induce new neighbour relations and hence neighbour cell lists
need to be updated.
VI. SCENARIOS
In this section several scenarios are described that will be
considered in the development and assessment of cell outage
compensation methods. The scenarios are appropriate in that
they capture a diversity of case studies representing different
network situations in which a significant impact on COD and COC
performance is anticipated and which, as such, are likely to
shape the specifics of the developed algorithms. For cell outage
compensation, we limit ourselves
to cases where an entire site or sector fails and do not consider
particular channel or transport network failures. The following
scenario descriptions specify assumptions regarding network,
traffic, and environment aspects:
• Impact of eNodeB density and traffic load: In a sparse,
  coverage-driven network layout, little potential is
  likely to exist for compensating outage-induced
  coverage/capacity loss. In a dense, capacity-driven
  network layout, however, this potential is significantly
  higher, particularly when traffic loads are low.
• Impact of service type: The distinct quality of service
  requirements of different services affect the
  compensation potential. For instance, compensation
  actions may be able to alleviate local outage effects to
  handle only low bandwidth services.
• Impact of outage location: If cell outages occur at the
  edge of an ‘LTE island’ fewer neighbours exist to
  enable compensation. For outages in the core of such
  an ‘LTE island’, the compensation potential is larger.
• Impact of user mobility: If mobility is low, few users
  spend a relatively long time in an outage area.
  Alternatively, if the degree of mobility is high, many
  users spend a relatively short time in an outage area.
  The perceived outage impact depends on the
  delay-tolerance and elasticity of the service.
• Impact of spatial traffic distribution: If traffic is
  concentrated near sites, it is typically relatively far
  away from neighbouring sites and hence the
  compensation potential is limited. Alternatively, if
  traffic is concentrated ‘in between’ sites, the potential
  is larger.
Other possible scenarios that are deemed somewhat less
significant are related to propagation aspects and the UE
terminal class. As an example for the first aspect, note that
e.g., a higher shadowing variation generally causes the outage
area to be more scattered, providing more potential for
surrounding cells to capture the traffic. Regarding the latter,
note that the higher a UE’s maximum transmit power is, the
lower the need for outage compensation, since the UE may
still be able to attach to a more distant cell even without
compensation measures.
VII. ASSESSMENT CRITERIA AND METHODOLOGY
Appropriate assessment criteria are required for comparison
of different candidate COD and COC algorithms, and to
assess the network performance improvement when the COD
and COC algorithms are activated. The assessment criteria
should also quantify the trade-off between the gains in
network performance and deployment impact in terms of
signalling and processing overhead, complexity, etc.
Denote with Tfail and Tdetect the time instant when the failure
occurred and when it is detected, respectively. A failure
duration interval starts with the occurrence of a failure and
ends with the elimination of the failure (e.g., by repairing the
error involved), see Figure 3. A true detection is a detection
which is reported by the cell outage detection mechanism
during the failure duration interval. In contrast a false
detection is reported outside the failure duration interval and
is, as such, an erroneous detection.
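Figure 3 Cell outage detection events: failure occurrence, failure detection
and failure duration, with examples of detection delay, true detection, missed
detection and false detection.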
Denote with Nfail the number of failures during the
observation period, and with Ndetect and Nfalse the number of
true and false detections, respectively. Suitable assessment
criteria to evaluate the outage detection performance are:
detection delay Tdetect - Tfail; detection probability Ndetect / Nfail;
and the false detection probability Nfalse / (Nfalse + Ndetect).
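The three detection criteria can be computed from a log of failure intervals and detection time instants, as in the following sketch. The Failure record and the function detection_metrics are hypothetical names; the sketch further assumes that at most one true detection is counted per failure and that the detection delay is averaged over the detected failures, choices that the definitions above leave open.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Failure:
    t_fail: float  # time instant at which the failure occurred (Tfail)
    t_end: float   # time instant at which the failure was eliminated


def detection_metrics(failures: List[Failure], detections: List[float]
                      ) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (mean detection delay, detection probability, false detection probability).

    A detection inside a failure duration interval is a true detection; only
    the first one per failure is counted (assumption). Any detection outside
    all intervals is a false detection. Undefined ratios are returned as None.
    """
    delays: List[float] = []
    detected = set()
    n_false = 0
    for t in sorted(detections):
        hit = next((i for i, f in enumerate(failures)
                    if f.t_fail <= t <= f.t_end), None)
        if hit is None:
            n_false += 1
        elif hit not in detected:
            detected.add(hit)
            delays.append(t - failures[hit].t_fail)  # Tdetect - Tfail
    n_fail, n_detect = len(failures), len(detected)
    mean_delay = sum(delays) / len(delays) if delays else None
    p_detect = n_detect / n_fail if n_fail else None
    total = n_false + n_detect
    p_false = n_false / total if total else None
    return mean_delay, p_detect, p_false


# Example: two failures, one of which is detected, plus two false detections.
mean_delay, p_detect, p_false = detection_metrics(
    [Failure(10, 50), Failure(100, 120)],
    detections=[5, 14, 20, 200],
)
```

In this example the second failure is a missed detection, which lowers the detection probability but leaves the false detection probability unaffected.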
The assessment criteria for performance evaluation of the
COC algorithms can be defined from a subset of the UE and
eNodeB measurements, presented in Section IV, or based on
the performance information available in the simulation
models related to, e.g., system capacity, provided coverage and
quality of service (QoS). For example, an important
assessment criterion is coverage, which is defined as
(Nbin - Nbin_outage) / Nbin, where Nbin is the number of pixels or
bins in the investigated area while Nbin_outage is the number of
pixels having an average Signal to Interference plus Noise Ratio
(SINR) or throughput lower than a pre-defined threshold.
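As a worked example of this coverage criterion, the sketch below counts the bins whose average SINR falls below the threshold and returns the fraction of bins that are not in outage. The function name coverage and the example values are assumptions for illustration; the same computation applies when throughput is used instead of SINR.

```python
from typing import Sequence


def coverage(avg_sinr_db: Sequence[float], threshold_db: float) -> float:
    """Fraction of bins whose average SINR is not below the threshold."""
    n_bin = len(avg_sinr_db)
    if n_bin == 0:
        raise ValueError("the investigated area must contain at least one bin")
    # A bin is in outage when its average SINR is lower than the threshold.
    n_bin_outage = sum(1 for sinr in avg_sinr_db if sinr < threshold_db)
    return (n_bin - n_bin_outage) / n_bin


# Example: four bins, one of which lies below a threshold of -5 dB.
example_coverage = coverage([-8.0, -2.0, 3.5, 10.0], threshold_db=-5.0)
```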
Suitable criteria for assessing the deployment impact for
cell outage detection and compensation are the signalling
overhead, defined as number of messages (or bytes) per time
unit, and processing overhead that is defined as amount of
processing needed to detect or compensate the outage. Note
that the signalling overhead can be assessed on the transport
network (e.g. X2, S1, and O&M interfaces) and the radio
network (e.g. between the UE and the eNodeB).
In order to illustrate the envisaged evaluation methodology
for the COC we define three system states as presented in
Figure 4. State A denotes the pre-outage situation, state B is
the post-outage situation without COC, and state C is the post-
outage situation with COC. The red arrow indicates the time
instant at which the cell outage occurs.
Figure 4 The system states before and after a cell outage event (supported
traffic over time, with states A, B and C and the outage instant marked).
The proposed evaluation methodology for the assessment
of the COC algorithm consists of the following steps:
Step 1 ‘Controllability and observability analysis’: Given
the post-outage system state B, it should be determined which
control parameters (see Section V) have the highest impact on
the system performance. Furthermore, the most important UE
and eNodeB measurements should be identified that can be
used to timely observe the system state and performance with
sufficient accuracy.
Step 2 ‘Design of COC algorithm’: Design of the
self-optimisation algorithm aided by static/dynamic simulations.
Given system state B and starting from the parameter settings
in system state A, perform step-wise adjustments of the
control parameters deemed most effective in the
controllability study of Step 1. After each COC step the
system performance should be measured in order to evaluate
the impact of the previous COC decision(s). The COC steps
consecutively adjust the control parameters until the algorithm
decides that no further adjustments are needed and the system
converges into a stable working state. It is important to
determine the convergence time of the COC algorithm.
Additionally, it is important here to compare the resulting
system performance (with the COC adjustments) to either the
optimal system performance that can be achieved or the
situation when no COC adjustments are made in the system.
Step 3 ‘COC deployment assessment’: As a final step it is
important to assess the performance of the algorithm
developed in Step 2, considering the full dynamics of the state
transition from state A to state B as well as the practical
constraints such as e.g. availability of the measurements,
duration of measurement intervals, measurement accuracy,
time needed for the parameter adjustments etc. The impact on
the resulting system performance (see state C in Figure 4)
should be determined when the state transition and the
practical constraints are taken into account.
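The step-wise adjustment described in Step 2 can be sketched as a simple greedy loop: starting from the settings of state A, each iteration evaluates all candidate parameter steps, applies the one that lowers the cost most, and stops when no step improves the cost. This is an illustrative stand-in, not the COC algorithm to be developed; the function compensate, the parameter names tilt_deg and tx_power_dbm, and the toy cost function that replaces a static or dynamic simulation are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Settings = Dict[str, float]


def compensate(initial: Settings,
               steps: List[Tuple[str, float]],
               limits: Dict[str, Tuple[float, float]],
               evaluate: Callable[[Settings], float],
               max_iterations: int = 50) -> Tuple[Settings, int]:
    """Greedy step-wise adjustment; returns final settings and steps applied."""
    current = dict(initial)
    cost = evaluate(current)  # performance measured before any adjustment
    for iteration in range(max_iterations):
        best, best_cost = None, cost
        for name, delta in steps:
            low, high = limits[name]
            value = current[name] + delta
            if not low <= value <= high:
                continue  # respect the allowed parameter range
            candidate = {**current, name: value}
            candidate_cost = evaluate(candidate)  # measure impact of the step
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
        if best is None:
            return current, iteration  # converged: no step improves the cost
        current, cost = best, best_cost
    return current, max_iterations


def toy_cost(settings: Settings) -> float:
    """Hypothetical stand-in for a simulation, with its optimum at 4 deg / 43 dBm."""
    return (settings["tilt_deg"] - 4.0) ** 2 + (settings["tx_power_dbm"] - 43.0) ** 2


final_settings, n_steps = compensate(
    initial={"tilt_deg": 6.0, "tx_power_dbm": 40.0},
    steps=[("tilt_deg", -1.0), ("tilt_deg", 1.0),
           ("tx_power_dbm", 1.0), ("tx_power_dbm", -1.0)],
    limits={"tilt_deg": (0.0, 10.0), "tx_power_dbm": (30.0, 46.0)},
    evaluate=toy_cost,
)
```

The number of applied steps, multiplied by the time needed per measurement interval and parameter adjustment, gives a first estimate of the convergence time that Step 3 is meant to assess under practical constraints.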
VIII. CONCLUSION
The detection and compensation of outages in current
mobile access networks is rather slow, costly and suboptimal.
Herein, costly refers to the considerable manual analysis and
required unplanned site visits that are generally involved,
while suboptimal hints at the typical revenue and performance
loss due to delayed and poor mitigation of the reduced
coverage and capacity. The framework described in this paper
addresses the shortcomings of today’s practice and proposes
an approach for automatic cell outage management and
outlines the key components necessary to detect and
compensate for outages. We argue that the detection and
compensation of outages, which aims at alleviating the
performance degradation, can be made autonomous, requiring
minimal operator intervention. This means not only a faster
and better reaction to outages, which decreases the revenue
and performance losses, but also less manual work and, as
such, lower operational expenditure (OPEX).
We will continue our work with the following three steps.
First, a controllability study will be carried out in order to
assess to what degree the application of the different control
parameters is effective in the compensation of outages, as well
as to understand the relation between control parameters and
performance indicators, e.g., coverage and throughput. In the
subsequent observability study we will investigate which
measurements, counters, etc., are most suitable for use in
outage detection and compensation algorithms. As part of this,
one line of work is to derive methods for obtaining X-maps.
Lastly, we will develop algorithms for cell outage detection
and compensation and evaluate their performance using the
scenarios and assessment criteria presented in this paper.
ACKNOWLEDGMENT
The presented work was carried out within the FP7
SOCRATES project [15], which is partially funded by the
Commission of the European Union.
REFERENCES
[1] 3GPP TR 36.902, “Self-configuring and self-optimizing network use
cases and solutions”, version 1.0.1, September 2008
[2] NGMN, “Use Cases related to Self Organising Network. Overall
Description”, version 1.53
[3] 3GPP S5-090009, “NGMN Recommendation on SON & O&M
Requirements”, 3GPP RAN3 & SA5 Joint Meeting, 12 - 13 January
2009, Sophia Antipolis, France.
[4] J. L. Van den Berg, R. Litjens, A. Eisenblätter, M. Amirijoo, O. Linnell,
C. Blondia, T. Kürner, N. Scully, J. Oszmianski, L. C. Schmelz, “Self-
Organisation in Future Mobile Communication Networks”, in
Proceedings of ICT Mobile Summit 2008, Stockholm, Sweden, 2008.
[5] R. Barco, V. Wille, L. Diez, “System for Automated Diagnosis in
Cellular Networks based on Performance Indicators”, European
Transactions on Telecommunications, Vol. 16, Issue 5, pp. 399 - 409,
Sept. 2005.
[6] R. Khanafer, B. Solana, J. Triola, R. Barco, L. Nielsen, Z. Altman, P.
Lázaro, “Automated Diagnosis for UMTS Networks using Bayesian
Network Approach”, IEEE Transactions on Vehicular Technology, Vol.
57, Issue 4, pp. 2451-2461, July 2008.
[7] C. Mueller, M. Kaschub, C. Blankenhorn, S.Wanke, “A Cell Outage
Detection Algorithm Using Neighbor Cell List Reports”, International
Workshop on Self-Organizing Systems, pp. 218-229, 2008.
[8] I. Siomina, P. Varbrand and D. Yuan, “Automated Optimisation of
Service Coverage and Base Station Antenna Configuration in UMTS
Networks”, IEEE Wireless Communications Magazine, Vol. 13, Issue
6, pp. 16-25, Dec. 2006.
[9] K. Valkealahti, A. Höglund, J. Pakkinen, and A. Flanagan, “WCDMA
Common Pilot Power Control for Load and Coverage Balancing”,
IEEE International Symposium on Personal, Indoor and Mobile Radio
Communications, Vol. 3, pp. 1412-1416, Sept. 2002.
[10] J. Yang and J. Lin, “Optimisation of Pilot Power Management in a
CDMA Radio Network”, Vehicular Technology Conference, pp. 2642-
2647, Sept. 2000.
[11] D. Fagen, P. Vicharelli, J. Weitzen, “Automated Coverage
Optimisation in Wireless Networks”, Vehicular Technology
Conference, pp. 1-5, Sept. 2006.
[12] K. Valkealahti, A. Höglund and T. Novosad, “UMTS Radio Network
Multiparameter Control”, International Symposium on Personal,
Indoor and Mobile Radio Communications, Vol. 1, pp. 616-621, 2003.
[13] 3GPP TS 36.300, “Evolved Universal Terrestrial Radio Access (E-
UTRA) and Evolved Universal Terrestrial Radio Access Network (E-
UTRAN); Overall description”, version 8.7.0, December 2008
[14] J. Niemelä, J. Lempiäinen, “Impact of Mechanical Antenna Downtilt
on Performance of W-CDMA Cellular Network”, Vehicular
Technology Conference 2004 Spring, pp. 2091-2095.
[15] SOCRATES website (2009). Available: www.fp7-socrates.eu.