Deep Learning-Based Scenario Recognition with GNSS Measurements on Smartphones

Zhiqiang Dai, Chunlei Zhai, Fang Li, Weixiang Chen, Xiangwei Zhu, and Yanming Feng
Abstract—Smartphones are in everyone's hands for applications including navigation and localization-based services, and scenario recognition is critical for seamless indoor and outdoor navigation. How to use smartphone sensing data to recognize different scenarios is a meaningful but challenging problem. To address this issue, we propose a structured grid-based deep-learning scenario recognition technique that uses smartphone GNSS measurements (satellite position, pseudorange, Doppler shift, and C/N0). In this work, the scenarios are grouped into four categories: deep indoors, shallow indoors, semi-outdoors, and open outdoors. The proposed approach utilizes Voronoi tessellations to obtain structured-grid representations from satellite positions and performs computations using convolutional neural networks (CNNs) and convolutional long short-term memory (ConvLSTM) networks. With only spatial information being considered, the CNN model is used to extract the features of Voronoi tessellations for scenario recognition, achieving a high accuracy of 98.82%. Then, to enhance the robustness of the algorithm, the ConvLSTM network is adopted, which treats the measurements as spatiotemporal sequences, improving the accuracy to 99.92%. Compared with existing methods, the proposed algorithm is simple and efficient, using only GNSS measurements without the need for additional sensors. Furthermore, the latencies of the CNN and ConvLSTM models on a CPU are only 16.82 ms and 27.94 ms, respectively. Therefore, the proposed algorithm has potential for real-time applications.
Index Terms—Scenario recognition, GNSS measurements, CNN, ConvLSTM, smartphone

This work was supported by the Science and Technology Planning Project of Guangdong Province of China (Grant No. 2021A0505030030). The associate editor coordinating the review of this article and approving it for publication was Fabrizio Santi. (Corresponding author: Zhiqiang Dai.)
Zhiqiang Dai, Chunlei Zhai, Fang Li, and Xiangwei Zhu are with the School of Electronic and Communication Engineering, Sun Yat-sen University, Guangzhou 510006, China (e-mail: daizhiqiang@mail.sysu.edu.cn; zhaichlei@mail2.sysu.edu.cn; lifang63@mail2.sysu.edu.cn; zhuxw666@mail.sysu.edu.cn).
Weixiang Chen is with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China (e-mail: cwx22@mails.tsinghua.edu.cn).
Yanming Feng is with the School of Computer Science, Queensland University of Technology, Brisbane, QLD 4000, Australia (e-mail: y.feng@qut.edu.au).
I. INTRODUCTION
HUMAN activities in modern cities emphasize the importance of mobile terminals in navigation and positioning services, placing high requirements on accurate, robust, and efficient positioning in diverse environments. Smartphone-based seamless indoor and outdoor navigation and localization has received increasing attention [1]. However, a single technology cannot satisfy all navigation and positioning requirements in different scenarios. For example, the widely used Global Navigation Satellite System (GNSS) can achieve accurate
positioning in open environments. However, in some complex environments (such as urban canyons, wooded areas, and tunnels), GNSS signals can easily be blocked, reflected, or attenuated by the surrounding buildings and vegetation [2]–[4]. Thus, the inherent vulnerability of GNSS in complex environments severely restricts the navigation and positioning performance of these systems.
A practical solution to this problem is to design a multi-source fusion positioning system that employs additional sensors to obtain auxiliary information to improve positioning accuracy and robustness [5]–[9]. Although such a scheme performs well in challenging environments, considerable hardware resources and power are wasted in open environments that do not require these additional sensors. Therefore, a more efficient method is to switch between different positioning technologies in different navigation environments. Navigation environments are usually divided into four categories: deep indoors, shallow indoors, semi-outdoors, and open outdoors [10]–[12]. Many algorithms have been developed to adapt to these scenarios. In open outdoor scenarios, GNSS is the most widely used technology and can meet the needs of high-precision positioning. In semi-outdoor scenarios, such as in urban canyons and under tree canopies, the shadow matching method has been proposed to achieve meter-level positioning precision [13]. In shallow indoor scenarios, positioning can usually be based on cellular signals [14] or the fusion of satellite and terrestrial wireless systems [15]. In deep indoor scenarios, WiFi and Bluetooth-based positioning methods are the most widely used technologies [16]–[18]. Ultrawideband (UWB) and visible light positioning also provide practical solutions for indoor positioning problems [19]–[21]. However, choosing the appropriate positioning strategy according to the given scenario depends heavily on the location of the user. Thus, scenario recognition plays a key role in seamless indoor and outdoor positioning technology.
Scenario recognition methods can be divided into two
categories: vision-based methods and signal-based methods.
Vision-based methods identify the scenario type by extracting
features from photos of the environment [22]–[24]. Although
vision-based methods can achieve high accuracy, they need
cameras to continuously capture environmental images, which
is not suitable for smartphones. Some signal-based recognition
methods use various types of sensors that can be integrated
into smartphones [25]–[27], including GNSS, accelerometers,
gyroscopes, barometers, magnetometers and light sensors.
However, these multi-sensor scenario recognition algorithms
may misjudge when switching sensors to adapt to changes in
scenarios. Therefore, a desirable solution is to use a single
sensor for scenario recognition. Modern smartphones capable
of receiving GNSS measurements hold rich information about
surrounding environments. Therefore, GNSS-based scenario
recognition methods offer great potential. Several GNSS-based
scenario recognition algorithms have been developed in recent
years. Chen and Tan [28] simply extracted the number of
visible satellites from raw GPS data as a feature for scenario
recognition. Experimental results from various environments
showed a recognition accuracy of 85.6%. Gao and Groves
[29] extracted the number of visible satellites and the sum
of C/N0 over 25 dB-Hz as features and then used a hidden
Markov model (HMM) for scenario recognition. This method
was tested in various locations across the city of London and
achieved an overall accuracy of 88.2%. Lai et al. [30] input
eight multi-satellite GNSS measurement features to a support
vector machine (SVM) to divide the positioning environment
into three categories: open outdoors, occluded outdoors, and
indoors. The recognition accuracy of this method on the testing
dataset reached 90.3%. Zhu et al. [11] proposed a scenario
detection method that used satellite geometric distributions, the
number of visible satellites, the dilution of precision (DOP),
and C/N0 as the observed variables. Moreover, they utilized
a stacked machine learning model and an HMM for scenario
detection, achieving an accuracy of 97.02%. Xia et al. [12]
introduced a deep learning method that utilized a feature
vector containing the number of visible satellites, C/N0, and
its statistics. This vector was then fed into an LSTM network,
achieving an overall accuracy of 98.65%.
The above GNSS-based scenario recognition methods have achieved excellent results. However, the recognition accuracy can be further improved because they do not utilize both the spatial and temporal characteristics of GNSS measurements. Due to the spatial distribution of satellites and the continuity in their measurements, various kinds of GNSS measurements can be regarded as spatiotemporal sequences. Making full use of this spatiotemporal information can effectively improve the accuracy and robustness of scenario recognition algorithms. Thus, in this paper, we utilize four types of GNSS measurements: satellite position, pseudorange, Doppler shift, and C/N0. These features are mapped to image sequences using Voronoi tessellations to fully consider the spatiotemporal characteristics of the measurements. Then, the images are fed into a neural network, a convolutional neural network (CNN) or a convolutional long short-term memory (ConvLSTM) network, to recognize the navigation scenarios. The results of extensive experiments demonstrate that the proposed algorithm performs well in terms of accuracy, robustness, and real-time performance.
The remainder of this paper is organized as follows. Section II introduces the proposed scenario recognition method, as well as the Voronoi tessellations, the CNN model, the ConvLSTM model, and the scenario categories. Section III presents the dataset and an analysis of the results of the proposed method. Section IV summarizes our conclusions and provides directions for future work.
II. METHODOLOGY
In this section, we first present the flow of our scenario recognition algorithm. Then, we introduce the scenario categories and the GNSS measurements used in the algorithm.
A. Overview
To optimize the spatiotemporal information contained in the GNSS measurements, we propose a deep learning-based scenario recognition method. Fig. 1 presents the concept and processing flow of the proposed method. First, a satellite projection module maps visible satellites (white dots) and blocked satellites (black dots) onto a square image, which serves as the fourth channel in the 4-channel images. Then, the Voronoi tessellation module utilizes three GNSS measurements (the pseudorange, Doppler shift, and C/N0) to interpolate the satellite projection image and obtain the other three channels. Finally, the 4-channel images are fed into a neural network (CNN or ConvLSTM) to identify four typical scenarios.
B. Satellite Projection Module
This module generates the projection images that are used as the fourth channel in the 4-channel images and assists the Voronoi tessellations. During each epoch, all received and blocked satellites are projected onto a square image plane based on the elevation and azimuth, which are computed according to the broadcast ephemeris. As shown in Fig. 2, the coordinates of a satellite on the image plane $\mathbf{x}_s = [x, y]^T$ can be expressed as

$$x = r\cos\theta\sin\varphi, \qquad y = r\cos\theta\cos\varphi \tag{1}$$

where $r$ represents half of the side length of the square image and $\theta$ and $\varphi$ are the elevation and azimuth of the satellite, respectively.
Fig. 1. Concept and processing flow of the proposed system, which maps GNSS measurements to 4-channel images. The images are then fed
into a CNN or ConvLSTM to identify the four typical scenarios.
Then, the projection image $I_P = I_P(\{\mathbf{x}_{s_i}\}_{i=1}^{n}) \in \mathbb{R}^{2r \times 2r}$, $i = 1, 2, \cdots, n$ (Fig. 3(a)), which indicates the projection of the satellites on the square image, is defined as

$$I_P(\{\mathbf{x}_{s_i}\}_{i=1}^{n})(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} = \mathbf{x}_{s_i} \text{ for any } i \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
Fig. 2. Satellite projection method, where the blue dot is the satellite
and the green dot is its projection. The gray areas represent locations
without satellite distribution.
C. Voronoi Tessellation Module
This module utilizes three GNSS measurements to interpolate the projection image and determine the other three channels in the 4-channel image. The tool we employ is the Voronoi tessellation, which is widely used in meteorology and geography. Voronoi tessellation is a simple and spatially optimal projection of GNSS measurements onto the spatial domain [31]. We first give the typical generic definition of this tool. Let $S$ denote a set of $n$ points $s_i$ (called sites) in the Euclidean plane $E$ [32], and let $p$ be any point in $E$. This tessellation approach optimally partitions the space $E$ into $n$ regions $G = \{g_1, g_2, \cdots, g_n\}$ using boundaries determined by the distances $d$ between $p$ and $s_i$. Using a distance function $d$, the Voronoi tessellation can be expressed as

$$g_i = \{p \in E \mid d(p, s_i) \le d(p, s_j),\ \forall j \ne i\} \tag{3}$$

Then, let the projection of each satellite in $I_P$ be a site. The GNSS measurements from satellite $s_i$ can then be interpolated for each region $g_i$. Since there are no satellites outside the circle of radius $r$, we define the Voronoi image $I_V \in \{I_R, I_D, I_C\}$ (Fig. 3) as follows:

$$I_V = G \odot M \tag{4}$$

where $\odot$ denotes the Hadamard product; $I_R, I_D, I_C \in \mathbb{R}^{2r \times 2r}$ denote the pseudorange, Doppler shift, and C/N0 images, respectively; and $M$ is the masked image, which can be expressed as

$$M(x, y) = \begin{cases} 1 & \text{if } x^2 + y^2 \le r^2 \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$
(a) Projection (b) Pseudorange (c) Doppler (d) C/N0
Fig. 3. Example of a 4-channel image, which consists of a projection
image, a pseudorange image, a Doppler image, and a C/N0 image.
Then, we obtain a 4-channel image $I = [I_P, I_R, I_D, I_C] \in \mathbb{R}^{2r \times 2r \times 4}$, which is fed into a neural network (CNN or ConvLSTM) to recognize the four scenarios.
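Because a Voronoi tessellation over point sites coincides with nearest-site interpolation, the measurement channels can be rendered without constructing polygons explicitly. The sketch below is a minimal illustration of one channel under that assumption, using a k-d tree for the nearest-site query; it is not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def voronoi_channel(sites, values, r=50):
    """Render one measurement channel as a Voronoi image (Eqs. (3)-(5)).

    sites  : (n, 2) pixel coordinates of satellite projections.
    values : (n,) normalized measurement per satellite (0 for blocked ones).
    """
    yy, xx = np.mgrid[0:2 * r, 0:2 * r]
    pixels = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
    # Eq. (3): each pixel belongs to the region of its nearest site.
    _, nearest = cKDTree(sites).query(pixels)
    img = values[nearest].reshape(2 * r, 2 * r)
    # Eq. (5): mask out pixels outside the circle of radius r,
    # then Eq. (4): Hadamard product of the region image with the mask.
    mask = (xx - r) ** 2 + (yy - r) ** 2 <= r ** 2
    return img * mask
```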
D. CNN Model
The transformation from raw GNSS measurements to image-like data allows us to apply a convolutional neural network (CNN) [33]. A CNN is a powerful neural network that is designed for image processing and dominates the computer vision field. A CNN can be intuitively understood as a stack of convolutional (conv), pooling, and fully connected (FC) layers. The main operation of the CNN is the convolution, which extracts the features of the input image:

$$I_j^l(m, n) = \Psi\left[ b_j + \sum_{i=0}^{C_{l-1}-1} \sum_{p=0}^{K-1} \sum_{q=0}^{K-1} W_{ij}^l(p, q)\, I_i^{l-1}(m + p - s,\ n + q - s) \right] \tag{6}$$
Fig. 4. Block diagram of the CNN model. A 4-channel image is fed into the network. Two conv layers and two max pooling layers are then used for feature extraction, and two FC layers are added to the network. Finally, a one-hot vector $Y_i$ that represents the scenario label is obtained.
where $s = \lfloor K/2 \rfloor$, $W$ denotes a convolution kernel of size $K \times K$, and $b_j$ indicates the bias. In the $l$-th layer, $I_i^{l-1}$ is the $i$-th channel in the input feature map, and $I_j^l$ is the output of the $j$-th channel. $\Psi$ represents the activation function. Our CNN model is shown in Fig. 4.
In the input layer, the radius $r$ is set to 50 pixels, yielding an image size of 100×100×4. To reduce the algorithm complexity, we reduce the number of conv layers and the number of channels of the feature map as much as possible without losing accuracy, because deeper and wider networks have higher hardware requirements and larger latencies. To ensure the real-time ability of the algorithm, we add only 2 conv layers. The numbers of feature map channels in the first and second layers are 16 and 32, respectively. The conv kernel size in each layer is set to 3×3. The conv layers are followed by two FC layers. The output of the network $Y_i$ is a one-hot vector corresponding to a particular scenario. We use softmax as the nonlinear activation function in the last layer and the rectified linear unit (ReLU) function in all other layers:

$$\mathrm{ReLU}(x) = \max(x, 0) \tag{7}$$
Our model uses the cross-entropy loss function, which usually performs well in multi-classification tasks. The parameters (weights and biases) of the network are optimized by stochastic gradient descent (SGD). We train the network on a training and validation dataset for approximately 2000 epochs. Throughout the training process, a batch size of 32 and a learning rate of $10^{-6}$ are used. To prevent overfitting, batch normalization (BN) is added after each conv layer. Another regularization technique is early stopping with a patience of 15, which means that if the validation loss is not reduced within 15 epochs, the training is stopped early.
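The description above pins down most of the architecture (two 3×3 conv layers with 16 and 32 channels, two max pooling layers, BN after each conv layer, two FC layers, softmax) and the training setup (cross-entropy, SGD, batch size 32, learning rate 10^-6, early stopping with patience 15). A Keras sketch consistent with that description follows; details the paper leaves open, such as the pooling size and the width of the first FC layer, are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(r=50, num_classes=4, fc_width=128):
    # fc_width and the 2x2 pooling are assumptions; the paper does not state them.
    model = models.Sequential([
        layers.Input(shape=(2 * r, 2 * r, 4)),           # 100x100x4 input image
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),                     # BN after each conv layer
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(fc_width, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # one-hot scenario output
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=15, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=32, epochs=2000, callbacks=[early_stop])
```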
E. ConvLSTM Model
Although CNNs exhibit excellent performance in scenario recognition tasks, they do not consider correlations among consecutive scenarios over time. GNSS measurements are sequential data, and 4-channel images are generated at each sampling time. Therefore, these correlations can be included in the algorithm to further improve the robustness. We define scenario recognition as a spatiotemporal sequential classification problem. To model the spatiotemporal relationship, we utilize ConvLSTM [34], which embeds conv structures into an LSTM network [35]. In the ConvLSTM model, a processor known as a cell determines whether the information is useful (Fig. 5). Three gates are included in each cell: an input gate $I_t$, a forget gate $F_t$, and an output gate $O_t$. $H_{t-1}$ is the hidden state, and $C_t$ is the memory passed to the next cell. The key expressions of the ConvLSTM cell are shown below:

$$\begin{aligned}
I_t &= \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \odot C_{t-1} + b_i) \\
F_t &= \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \odot C_{t-1} + b_f) \\
C_t &= F_t \odot C_{t-1} + I_t \odot \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c) \\
O_t &= \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \odot C_{t-1} + b_o) \\
H_t &= O_t \odot \tanh(C_t)
\end{aligned} \tag{8}$$

where $\odot$ is the Hadamard product and $*$ represents the convolution operation. $W_{xi}$, $W_{xf}$, $W_{xc}$, and $W_{xo}$ are the weight matrices connected to the input vector $X_t$ in each layer; $W_{hi}$, $W_{hf}$, $W_{hc}$, and $W_{ho}$ are the weight matrices connected to the previous hidden state $H_{t-1}$; $W_{ci}$, $W_{cf}$, and $W_{co}$ are the weight matrices connected to $C_{t-1}$ in each layer; and $b_i$, $b_f$, $b_c$, and $b_o$ are the biases.
Fig. 5. Block diagram of the ConvLSTM cell. The difference between the ConvLSTM and LSTM models is that the ConvLSTM introduces the convolution operation.
To perform scenario recognition with the ConvLSTM network, the 4-channel image sequence needs to be transformed into time segments with a sliding window. Consider the following sequence:

$$\{I_0, I_1, I_2, \cdots, I_t, I_{t+1}, I_{t+2}, \cdots\} \tag{9}$$

Let the sliding window size be $T$. The $t$-th ($t \ge 0$) time segment is defined as

$$\{I_t, I_{t+1}, \cdots, I_{t+T-2}, I_{t+T-1}\} \tag{10}$$
This approach has a drawback: scenario recognition cannot be performed within the first $T-1$ sampling times. However, as we discuss later, $T$ is usually very small. Therefore, this factor has little effect on the algorithm.
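As an illustration, a minimal sliding-window segmentation consistent with Eq. (10) might look as follows; the array shapes and names are our own assumptions.

```python
import numpy as np

def sliding_windows(images, T=4):
    """Turn a chronologically ordered image sequence into time segments.

    images : (N, H, W, 4) array of 4-channel images, one per sampling time.
    Returns (N - T + 1, T, H, W, 4); segment t covers times t .. t+T-1
    (Eq. (10)), so no prediction exists for the first T-1 epochs.
    """
    return np.stack([images[t:t + T] for t in range(len(images) - T + 1)])
```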
A ConvLSTM network can be built from ConvLSTM cells. Its structure is shown in Fig. 6. A 4-channel image sequence with a sliding window width $T$ is fed into $T$ ConvLSTM cells with the same parameters. These $T$ cells form a layer in the network. A stronger feature extraction capability can be obtained by stacking $n$ layers. After several FC layers and a softmax layer, the final one-hot vector representing the scenario category is obtained.
Fig. 6. Block diagram of the ConvLSTM network with $T$ time steps, including $n$ ConvLSTM layers, 2 FC layers, a softmax layer, and an output layer.
To ensure the real-time performance of the algorithm, we add only two ConvLSTM layers. The numbers of feature map channels in the first and second layers are 8 and 16, respectively. The size of the conv kernel in each layer is set to 3×3. The sliding window size $T$ is set to 4. To prevent overfitting, we add a BN layer after each ConvLSTM layer. As with the CNN, we use the cross-entropy loss function and optimize the parameters of the network with SGD. The network is trained on the training and validation datasets for approximately 1,000 epochs. The batch size is set to 32, and the learning rate is set to $10^{-6}$. Moreover, the early stopping regularization technique with a patience of 15 is applied.
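Keras provides a ConvLSTM2D layer implementing a cell of the form in Eq. (8) (without the peephole terms $W_{c\cdot} \odot C$, which Keras omits). A sketch matching the stated configuration (two layers with 8 and 16 filters, 3×3 kernels, T = 4, BN after each ConvLSTM layer, SGD with learning rate 10^-6) follows; the FC width is again our assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_convlstm(r=50, T=4, num_classes=4, fc_width=64):
    # fc_width is an assumption; Fig. 6 only labels the first FC layer "m neurons".
    model = models.Sequential([
        layers.Input(shape=(T, 2 * r, 2 * r, 4)),   # T-step 4-channel image sequence
        layers.ConvLSTM2D(8, 3, padding="same", return_sequences=True),
        layers.BatchNormalization(),                # BN after each ConvLSTM layer
        layers.ConvLSTM2D(16, 3, padding="same",
                          return_sequences=False),  # keep only the final time step
        layers.BatchNormalization(),
        layers.Flatten(),
        layers.Dense(fc_width, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```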
F. Scenario Categories
Complex environments are divided into four categories: deep indoors, shallow indoors, semi-outdoors, and open outdoors, as shown in Fig. 7.
Deep indoors are almost completely closed environments in which only a few satellite signals are received. It should be noted that scenarios where no satellites are visible do not need to be considered in the algorithm: once a scenario has been identified as not receiving any signals, it is immediately determined to be deep indoors. Shallow indoors are scenarios near windows, balconies, and doors. Semi-outdoors include half-obstructed outdoor areas that are typically near tall buildings. Open outdoors are scenarios with almost no obstruction; thus, open outdoors have the highest GNSS signal strength and the most visible satellites. In the proposed algorithm, we encode the scenario categories as one-hot vectors.
(a) Deep indoors (b) Shallow indoors
(c) Semi-outdoors (d) Open outdoors
Fig. 7. Scenario categories. Complex environments are divided into four categories: deep indoors, shallow indoors, semi-outdoors, and open outdoors.
$$Y_0 = [1, 0, 0, 0]^T, \quad Y_1 = [0, 1, 0, 0]^T, \quad Y_2 = [0, 0, 1, 0]^T, \quad Y_3 = [0, 0, 0, 1]^T \tag{11}$$
G. Feature Vector Definition
In the proposed algorithm, the satellite position (elevation and azimuth), pseudorange, Doppler shift, and C/N0 are utilized as features for scenario recognition. In the above four scenarios, the number of visible satellites and their distributions vary substantially due to different locations and occlusions (Fig. 8). For example, satellite signals are rarely received in the deep indoor scenario, while dozens of satellites are visible in the open outdoor scenario. The pseudorange between the satellite and the smartphone also varies when a smartphone is at different latitudes, longitudes, or altitudes. Moreover, the Doppler shift is sensitive to the relative velocity between the satellite and the smartphone. Different relative velocities, that is, different Doppler shifts, can be used to distinguish distinct scenarios to some extent. In terms of C/N0, when the scenario changes, the average C/N0 varies considerably due to the difference in the number of visible satellites. In addition, due to the multipath effect and NLOS signals, diverse occlusions also have a large impact on C/N0 [11], [12]. Therefore, C/N0 is an ideal feature for scenario recognition. Another reason for using these measurements, and no others, is that they can be read directly from the RINEX file logged by the smartphone without additional calculations. Obtaining other measurements requires additional computation, which increases the algorithm complexity. These measurements already give our algorithm high accuracy, so adding more would only increase the amount of computation without improving performance.
Crucially, both visible and blocked satellites are used in the Voronoi tessellations. The measurements of the latter are set to 0; that is, the Voronoi polygons containing these blocked satellites are black after interpolation. The value ranges of the features vary widely. To ensure that the network converges, the training and testing sets need to be normalized before the Voronoi tessellations are performed [36]. The normalization formula is defined as follows:

$$m_t' = \frac{m_t - m_{\min}}{m_{\max} - m_{\min}} \tag{12}$$

where $m_{\max}$ and $m_{\min}$ are the maximum and minimum values of the features among all theoretically visible satellites at time $t$, respectively, and $m_t$ is the original feature value at that time.

Fig. 8. Sky plots of the four scenarios.
III. EXPERIMENTS AND ANALYSIS
In this section, extensive experiments are conducted to analyze the performance of our algorithm. First, the dataset is introduced. Then, we study the results of the CNN and ConvLSTM models. Finally, we analyze their real-time performance.
A. Dataset
The experimental data were collected using an Android
smartphone (HONOR 30) with a sampling interval of 1 s.
The data were divided into training and testing datasets. The
training set was used for training and validation, and the testing
set was used to evaluate the generalizability of the proposed
algorithm. To increase the robustness of the algorithm, the
training set needs to be obtained over a wide time range.
Therefore, we collected data over three time periods in a day.
During each time period, we collected data for more than 10
minutes for each scenario, yielding a total duration of 2 hours.
The ratio of the training and validation sets is 7:3; that is, 1.4 hours of data were used for model training, and the remainder of the data were used for validation. For the testing set, 10 minutes
of data were collected for each scenario, yielding a total time
of more than 40 minutes. The collection time distribution of
the dataset is shown in Fig. 9.
Fig. 9. Distribution of the data collection time. The width of each bar
represents the amount of data. Note that the training and testing data
were not collected on the same day.
To demonstrate the generalizability of our method, we
consider two challenging measures. First, the testing set was
collected a few days after the training set. Moreover, as shown
in Fig. 9, the collection time points of the testing set did
not coincide with those of the training set. Second, the data
were not static. The volunteers stopped and walked freely in
the various scenarios while collecting the data. Therefore, our
algorithm has strong adaptability to time and space.
B. CNN-Based Scenario Recognition Results
In this section, we utilize all constellations (GPS, BDS, Galileo, GLONASS, and QZSS) and all measurements (position, pseudorange, Doppler shift, and C/N0) to determine the optimal scenario recognition accuracy. The accuracy and loss curves of the training and validation sets are shown in Fig. 10.
Fig. 10. Accuracy and loss of the CNN at different epochs.
Fig. 10 shows the accuracy and loss at different epochs. As the epochs increase, the loss curve falls smoothly, which means that the recognition results of the CNN model tend to be consistent with the ground truth and the model tends to converge; accordingly, the accuracy gradually improves, which is why the accuracy and loss curves are almost mirror images of each other. This behavior indicates that our model is stable. The training and validation curves are essentially consistent, demonstrating that the model is not overfitted. The accuracy on the validation set converges to 99.51% after the CNN has been trained for approximately 100 epochs. However, since the loss continues to decrease slowly, training was continued to improve the generalizability of the model. After training, the testing data were fed into the model, and the confusion matrix of the testing set (Table I) was obtained.
TABLE I
CONFUSION MATRIX OF THE TESTING SET BASED ON THE CNN

True \ Predicted | Deep indoor | Shallow indoor | Semi-outdoor | Open outdoor
Deep indoor      | 100.00%     | 0.00%          | 0.00%        | 0.00%
Shallow indoor   | 0.00%       | 95.74%         | 4.26%        | 0.00%
Semi-outdoor     | 0.00%       | 0.49%          | 99.51%       | 0.00%
Open outdoor     | 0.00%       | 0.00%          | 0.00%        | 100.00%
The recognition accuracy for the deep indoor and open outdoor scenarios is 100%, which is consistent with the expected results. The deep indoor scenario is an almost completely closed environment with very few visible satellites; thus, the channels in the images of this scenario are expected to be nearly black. In contrast, the open outdoor scenario is almost entirely unobstructed and can usually receive dozens of satellite signals. Therefore, these two scenarios are easier to identify than the other two. The recognition accuracies of the shallow indoor and semi-outdoor scenarios are 95.74% and 99.51%, respectively. The overall accuracy of the CNN model reaches 98.82%, which is higher than that of existing scenario recognition algorithms. It is worth mentioning that the testing set was acquired several days after the training set, and the smartphone was randomly moving and stopping during data collection. Therefore, the proposed algorithm is robust in both time and space.
C. Optimal Image Resolution
One of the key steps in our approach is to use the satellites' elevations and azimuths to project them onto an image. The resolution of this image has a great impact on the scenario recognition accuracy. If the resolution is too low, the Voronoi polygons cannot be clearly distinguished. In contrast, if the resolution is too high, the calculation time increases while the accuracy is not improved. Therefore, using a smaller image resolution can substantially improve the real-time performance of our algorithm while maintaining a high recognition accuracy. Thus, we studied radii between 10 and 100 pixels in steps of 10 pixels in our experiment. The experimental results are shown in Fig. 11.
Fig. 11. The impact of the image resolution on accuracy.
The highest recognition accuracy is obtained when r = 50. If the radius is less than 50, the accuracy may be reduced because the neighboring edges of the Voronoi polygons are not clear, thereby decreasing the ability of the model to characterize spatial information. If the radius is larger than 50, the higher-resolution images require a larger perceptual field to extract effective features; thus, the network depth must be increased. To ensure that the algorithm can meet the real-time requirements, increasing the network depth is not desirable. Therefore, 100×100 (r = 50) is a relatively ideal image resolution. The subsequent experiments are all performed at this resolution.
D. Contribution of Multi-Constellations
In this section, we successively increase the number of constellations to evaluate the contribution of each constellation to the scenario recognition task. Fig. 12 shows the stacked bar chart of this experiment, and Fig. 13 shows the confusion matrices. As the number of constellations increases, the recognition accuracy of individual scenarios and the overall accuracy both increase. When only GPS is used, the accuracy for shallow indoors is only 53.68% and that for semi-outdoors is only 73.54%. These low accuracies occur because the number of visible GPS satellites is relatively small, while more signals are received from the BDS satellites. Therefore, the recognition accuracy is greatly improved by introducing BDS. Similarly, few Galileo, GLONASS, and QZSS signals were received; however, these signals still contribute to improving the accuracy. Moreover, even when only GPS signals are used, the accuracy for deep indoors remains high. The reason is that the 4-channel images of this scenario are obviously distinct because few satellites are visible in the deep indoor scenario.
Fig. 12. Stacked bar chart for the contribution of multi-constellations.
All: GPS, BDS, Galileo, GLONASS, and QZSS.
The confusion matrices show that scenario recognition errors occur mainly in the shallow indoor and semi-outdoor scenarios. The former is easily misidentified as deep indoors or semi-outdoors, while the latter is easily misidentified as shallow indoors or open outdoors.
Fig. 13. Confusion matrices for the contribution of multi-constellations. Heatmaps are used to characterize the confusion matrices; the darker the color, the higher the accuracy.

E. Contribution of Multi-Measurements
In addition to exploring the contribution of multiple measurements to scenario recognition, we investigated the role of individual measurements. We increased the number of measurements one by one in the order of satellite projection, pseudorange, Doppler shift, and C/N0 to explore their roles. Fig. 14 shows the stacked bar chart of the recognition accuracy, and Fig. 15 shows the confusion matrices. When only satellite projection images are used, the recognition accuracy is very low, reaching only 39.42%. After the pseudorange images are added, the overall accuracy improves greatly, reaching 71.68%. This is because, for the same satellite, diverse pseudoranges are obtained when the smartphone is at different latitudes, longitudes, or altitudes. The Doppler shift images also differ because of the large difference in the relative velocity between the smartphone and the satellite when the smartphone is at different positions. Because the distribution of occlusions differs in each scenario, the signals emitted by the same satellite are diverse, resulting in different C/N0 values. Therefore, the accuracy tends to increase incrementally as the Doppler shift and C/N0 images are considered.
Fig. 14. Stacked bar chart for the contribution of multi-measurements. Proj: satellite projection, PR: pseudorange, Doppler: Doppler shift, All: satellite projection, pseudorange, Doppler shift, and C/N0.
Fig. 15. Confusion matrices for the contribution of multi-measurements. Heatmaps are used to characterize the confusion matrices; the darker the color, the higher the accuracy.

F. ConvLSTM-Based Scenario Recognition Results
Since CNNs do not consider correlations among consecutive scenarios over time, we utilize the ConvLSTM network for scenario recognition to ensure that spatial and temporal information are both considered. The GNSS measurements are therefore transformed into image sequences and fed into a ConvLSTM network. The output of the last ConvLSTM cell at the final time step is used as the recognition result.
Fig. 16. Accuracy and loss of the ConvLSTM network at different epochs.
Fig. 16 shows the accuracy and loss curves of the training and validation sets for the ConvLSTM network. Although these curves exhibit smooth convergence trends, they are insufficient for proving the advantage of the ConvLSTM network over the CNN. As shown in the confusion matrix (Table II), the recognition accuracy of the ConvLSTM network for each scenario is higher than that of the CNN. The accuracy improves because the ConvLSTM network considers correlations among consecutive scenarios over time.

Since the time step is a key factor affecting the performance of the ConvLSTM network, its effect was analyzed. With a time step $T$ and the current time $t$, the algorithm considers the GNSS measurements from $t-T+1$ to $t$. If the time step is too short, the information before the current moment cannot be fully utilized. If the time step is too long, some useless information will also be used for scenario recognition. Both cases will reduce the recognition accuracy, so it is necessary
to find the most suitable time step $T$ through experiments. We adjusted the time step from 1 to 10 and trained each model separately. The overall recognition accuracy is shown in Fig. 17.
TABLE II
CONFUSION MATRIX OF THE TESTING SET BASED ON THE CONVLSTM NETWORK

True \ Predicted | Deep indoor | Shallow indoor | Semi-outdoor | Open outdoor
Deep indoor      | 100.00%     | 0.00%          | 0.00%        | 0.00%
Shallow indoor   | 0.00%       | 100.00%        | 0.00%        | 0.00%
Semi-outdoor     | 0.00%       | 0.33%          | 99.67%       | 0.00%
Open outdoor     | 0.00%       | 0.00%          | 0.00%        | 100.00%
Fig. 17. The impact of time steps on accuracy.
The highest recognition accuracy of 99.92% is achieved when the time step is 4, which is 1.1% higher than that of the CNN model and previous work (Table III). Using more time steps requires a larger computational effort. However, even with fewer time steps, the accuracy exceeds 99%. Nonetheless, the ConvLSTM network has an unavoidable drawback: scenario recognition cannot be performed in the first $T-1$ sampling moments. One solution to this issue is to use the CNN model for scenario recognition at these moments. Although the recognition accuracy decreases slightly, the model is at least usable during these moments.
TABLE III
SCENARIO RECOGNITION ALGORITHMS BASED ON GNSS MEASUREMENTS IN RECENT YEARS

Authors             | Year | Method                   | Accuracy
Chen and Tan [28]   | 2017 | Threshold judgment       | 85.6%
Gao and Groves [29] | 2018 | HMM                      | 88.2%
Lai et al. [30]     | 2021 | SVM                      | 90.3%
Zhu et al. [11]     | 2019 | Stacked machine learning | 97.02%
Xia et al. [12]     | 2020 | HMM and LSTM             | 98.65%
Ours                | 2022 | CNN                      | 98.82%
Ours                | 2022 | ConvLSTM                 | 99.92%
G. Real-Time Ability
In this section, we investigate the computing time of the CNN and ConvLSTM models (Fig. 18). The times for constructing the images and for model inference are both considered. We first calculated the total computing time for the testing set of approximately 2,400 data samples and then determined the average computing time per sample. It is worth mentioning that we tested the real-time performance only on a CPU (Intel Xeon Platinum 8160T), not on a GPU. Nevertheless, our model still takes only tens of milliseconds.
Fig. 18. Real-time ability of the CNN and ConvLSTM models.
As $r$ increases, the number of pixels ($2r \times 2r$) grows quadratically, and the computing time of the CNN shows the same increasing trend; hence, its time complexity is $O(r^2)$. For a ConvLSTM network of depth $n$, each additional time step $T$ means an increase of $n$ ConvLSTM cells (Fig. 6). Since the computation time of each cell is fixed, the time complexity of the ConvLSTM is $O(T)$. Therefore, we observe a linear growth trend.
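The per-sample latency can be estimated by averaging wall-clock inference time over the test set. A simple sketch of that measurement (our own, not the paper's benchmarking code) is given below, assuming a Keras model as in the earlier sketches:

```python
import time
import numpy as np

def mean_latency_ms(model, samples):
    """Average single-sample inference time in milliseconds on the CPU."""
    model.predict(samples[:1])           # warm-up run, excluded from timing
    start = time.perf_counter()
    for s in samples:
        model.predict(s[np.newaxis])     # one sample per call, as in deployment
    return 1000 * (time.perf_counter() - start) / len(samples)
```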
As shown in Fig. 18, when r = 50, the computation time of the CNN is only 16.82 ms. The computation time of the ConvLSTM is longer than that of the CNN (the red broken line is above the blue dashed line); even so, it is still only 27.94 ms. If a delay of 27.94 ms is unacceptable in practical applications, the time step can be reduced to obtain a lower delay, and the recognition accuracy can still be guaranteed to be above 99%. Therefore, the CNN and ConvLSTM models both have the potential to run in real time.
IV. CONCLUSION
Using smartphones for seamless indoor and outdoor navigation and positioning usually requires switching between different positioning techniques in different indoor and outdoor scenarios. Therefore, how to use built-in GNSS measurements for efficient scenario recognition has become a key issue. In this paper, we develop a deep learning-based scenario recognition algorithm using GNSS measurements from smartphones. It maps GNSS measurements to 4-channel images with Voronoi tessellations. The images are then fed into a CNN to recognize four scenarios. Alternatively, considering correlations among consecutive scenarios over time, the ConvLSTM model is introduced to further improve the recognition accuracy. Experiments were designed to evaluate the proposed algorithm in terms of recognition accuracy and robustness. The results showed that the CNN and ConvLSTM algorithms achieved overall recognition accuracies of 98.82% and 99.92%, respectively, which are higher than those of existing algorithms. We also compared the computation time of the two networks and analyzed their real-time performance. The latencies of the CNN and ConvLSTM models on a CPU were only 16.82 ms and 27.94 ms, respectively. In future work, hardware resource and energy consumption issues need to be examined. In addition, how to combine our algorithm with navigation methods into a complete system requires further research.
ACKNOWLEDGMENT
This work was supported by the Science and Technology
Planning Project of Guangdong Province of China (Grant
No. 2021A0505030030). The authors would like to thank the
developers of the Android application Geo++ RINEX Logger.
REFERENCES
[1] H. S. Maghdid, I. A. Lami, K. Z. Ghafoor, and J. Lloret, "Seamless outdoors-indoors localization solutions on smartphones: Implementation and challenges," ACM Computing Surveys (CSUR), vol. 48, no. 4, pp. 1-34, 2016.
[2] A. T. Balaei, "Statistical inference technique in pre-correlation interference detection in GPS receivers," in Proceedings of the 19th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS 2006), 2006, pp. 2232-2240.
[3] F. Bastide, E. Chatre, and C. Macabiau, "GPS interference detection and identification using multicorrelator receivers," in Proceedings of the 14th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GPS 2001), 2001, pp. 872-881.
[4] Y. Dong, T. Arslan, and Y. Yang, "Real-time NLOS/LOS identification for smartphone-based indoor positioning systems using WiFi RTT and RSS," IEEE Sensors Journal, vol. 22, no. 6, pp. 5199-5209, 2022.
[5] S. Cao, X. Lu, and S. Shen, "GVINS: Tightly coupled GNSS-visual-inertial fusion for smooth and consistent state estimation," IEEE Transactions on Robotics, 2022.
[6] P. D. Groves, Z. Jiang, L. Wang, and M. K. Ziebart, "Intelligent urban positioning using multi-constellation GNSS with 3D mapping and NLOS signal detection," in Proceedings of the 25th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS 2012), 2012, pp. 458-472.
[7] F. Santi, F. Pieralice, and D. Pastina, "Joint detection and localization of vessels at sea with a GNSS-based multistatic radar," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 8, pp. 5894-5913, 2019.
[8] T. Li, H. Zhang, Z. Gao, Q. Chen, and X. Niu, "High-accuracy positioning in urban environments using single-frequency multi-GNSS RTK/MEMS-IMU integration," Remote Sensing, vol. 10, no. 2, p. 205, 2018.
[9] K. Xu, Y. Chen, T. A. Okhai et al., "Micro optical sensors based on avalanching silicon light-emitting devices monolithically integrated on chips," Optical Materials Express, vol. 9, no. 10, pp. 3985-3997, 2019.
[10] W. Wang, Q. Chang, Q. Li, Z. Shi, and W. Chen, "Indoor-outdoor detection using a smart phone sensor," Sensors, vol. 16, no. 10, p. 1563, 2016.
[11] Y. Zhu et al., "A fast indoor/outdoor transition detection algorithm based on machine learning," Sensors, vol. 19, no. 4, p. 786, 2019.
[12] Y. Xia et al., "Recurrent neural network based scenario recognition with multi-constellation GNSS measurements on a smartphone," Measurement, vol. 153, p. 107420, 2020.
[13] P. D. Groves, "Shadow matching: A new GNSS positioning technique for urban canyons," The Journal of Navigation, vol. 64, no. 3, pp. 417-430, 2011.
[14] Z. Z. M. Kassas, J. Khalife, K. Shamaei, and J. Morales, "I hear, therefore I know where I am: Compensating for GNSS limitations with cellular signals," IEEE Signal Processing Magazine, vol. 34, no. 5, pp. 111-124, 2017.
[15] M. A. Caceres, F. Penna, H. Wymeersch, and R. Garello, "Hybrid cooperative positioning based on distributed belief propagation," IEEE Journal on Selected Areas in Communications, vol. 29, no. 10, pp. 1948-1958, 2011.
[16] C. Yang and H.-R. Shao, "WiFi-based indoor positioning," IEEE Communications Magazine, vol. 53, no. 3, pp. 150-157, 2015.
[17] D. Yu and C. Li, "An accurate WiFi indoor positioning algorithm for complex pedestrian environments," IEEE Sensors Journal, vol. 21, no. 21, pp. 24440-24452, 2021.
[18] J.-H. Huh and K. Seo, "An indoor location-based control system using bluetooth beacons for IoT systems," Sensors, vol. 17, no. 12, p. 2917, 2017.
[19] J. Tiemann, F. Schweikowski, and C. Wietfeld, "Design of an UWB indoor-positioning system for UAV navigation in GNSS-denied environments," in 2015 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 2015: IEEE, pp. 1-7.
[20] B. Yang, J. Li, Z. Shao, and H. Zhang, "Robust UWB indoor localization for NLOS scenes via learning spatial-temporal features," IEEE Sensors Journal, vol. 22, no. 8, pp. 7990-8000, 2022.
[21] W. Gu, M. Aminikashani, P. Deng, and M. Kavehrad, "Impact of multipath reflections on the performance of indoor visible light positioning systems," Journal of Lightwave Technology, vol. 34, no. 10, pp. 2578-2587, 2016.
[22] I. Tang and T. P. Breckon, "Automatic road environment classification," IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 2, pp. 476-484, 2010.
[23] A. Bosch, A. Zisserman, and X. Munoz, "Scene classification using a hybrid generative/discriminative approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 712-727, 2008.
[24] L. Fei-Fei and P. Perona, "A Bayesian hierarchical model for learning natural scene categories," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005, vol. 2: IEEE, pp. 524-531.
[25] M. Ali, T. ElBatt, and M. Youssef, "SenseIO: Realistic ubiquitous indoor outdoor detection system using smartphones," IEEE Sensors Journal, vol. 18, no. 9, pp. 3684-3693, 2018.
[26] P. Zhou, Y. Zheng, Z. Li, M. Li, and G. Shen, "IODetector: A generic service for indoor outdoor detection," in Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems, 2012, pp. 113-126.
[27] S. Li et al., "A lightweight and aggregated system for indoor/outdoor detection using smart devices," Future Generation Computer Systems, vol. 107, pp. 988-997, 2020.
[28] K. Chen and G. Tan, "SatProbe: Low-energy and fast indoor/outdoor detection based on raw GPS processing," in IEEE INFOCOM 2017 - IEEE Conference on Computer Communications, 2017: IEEE, pp. 1-9.
[29] H. Gao and P. D. Groves, "Environmental context detection for adaptive navigation using GNSS measurements from a smartphone," NAVIGATION: Journal of the Institute of Navigation, vol. 65, no. 1, pp. 99-116, 2018.
[30] Q. Lai et al., "Research on GNSS/INS combined positioning method in urban environment based on scenario detection," Navigation Positioning and Timing, vol. 8, no. 1, pp. 151-162, 2021.
[31] K. Fukami, R. Maulik, N. Ramachandra, K. Fukagata, and K. Taira, "Global field reconstruction from sparse sensors with Voronoi tessellation-assisted deep learning," Nature Machine Intelligence, vol. 3, no. 11, pp. 945-951, 2021.
[32] F. Aurenhammer, "Voronoi diagrams—a survey of a fundamental geometric data structure," ACM Computing Surveys (CSUR), vol. 23, no. 3, pp. 345-405, 1991.
[33] Y. LeCun et al., "Handwritten digit recognition with a back-propagation network," Advances in Neural Information Processing Systems, vol. 2, 1989.
[34] X. Shi et al., "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," Advances in Neural Information Processing Systems, vol. 28, 2015.
[35] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[36] K. Xu, "Silicon electro-optic micro-modulator fabricated in standard CMOS technology as components for all silicon monolithic integrated optoelectronic systems," Journal of Micromechanics and Microengineering, vol. 31, no. 5, p. 054001, 2021.
Zhiqiang Dai received his Ph.D. degree from Wuhan University, China. He is currently an assistant professor at the School of Electronics and Communication Engineering, Sun Yat-sen University. He is mainly engaged in the theory of GNSS/SBAS precise data processing, real-time PPP, multi-sensor navigation data fusion, and algorithm and software development.
Chunlei Zhai received his B.S. degree in electronic information science and technology from Sun Yat-sen University, China, in 2020. He is currently pursuing an M.S. degree in electronic information at the School of Electronics and Communication Engineering, Sun Yat-sen University, China. His research interests include intelligent navigation and vision-based autonomous navigation of UAVs.

Fang Li received her B.S. degree from Central South University in 2020 and her M.S. degree from Sun Yat-sen University in 2022. She is currently pursuing a Ph.D. degree in communication engineering at the School of Electronics and Communication Engineering, Sun Yat-sen University, China. Her current research interests include reliable navigation in complex scenarios and multi-source fusion navigation.

Weixiang Chen received his B.S. degree from the School of Electronic and Communication Engineering, Sun Yat-sen University, China, in 2021. He is currently pursuing an M.S. degree in the Department of Electronic Engineering at Tsinghua University, China. His research interests include pseudolite positioning, multi-sensor information fusion, and SLAM.

Xiangwei Zhu received his Ph.D. degree from the National University of Defense Technology, China. He is currently a professor at the School of Electronics and Communication Engineering, Sun Yat-sen University. His current research interests include global navigation satellite systems, time synchronization, intelligent signal processing, and instrument design.

Yanming Feng received his Ph.D. degree in satellite geodesy and spatial science from the Wuhan Technical University of Surveying and Mapping (Wuhan University since 2000), Wuhan, China. He is currently a professor in data science and navigation with the School of Computer Science, Queensland University of Technology, Brisbane, Australia. His active research areas include global navigation satellite system (GNSS) algorithms, geodetic data analytics, satellite orbit determination and space debris monitoring, the Internet of Things, precise positioning and deformation monitoring, vehicular networks and communications, and machine learning applications. He has published articles in journals on topics such as geodesy, navigation, aerospace, sensors, remote sensing, vehicular networks, and intelligent transport systems.
... For specific navigation applications, such as autonomous driving, a dedicated context categorization framework should be proposed based on the navigation requirements and the characteristics of different contexts. To the best of our knowledge, the existing literature (Dai et al., 2022;Xia et al., 2020;Y. Zhu et al., 2020) mainly divides the environments under consideration into four categories or less, namely deep indoor, shallow indoor, semi-outdoor, and open-sky. ...
... Existing classification models mainly include fuzzy inference (Zadeh, 1996), support vector machine (SVM) (Suthaharan, 2016), and long-short term memory (LSTM) (Sherstinsky, 2020). With the development of big data technology, large-scale parallel computing, and the popularity of graphics processing unit (GPU) devices, deep learning has emerged as a promising field, with algorithms such as Convolutional Neural Networks (CNN, Dai et al., 2022), Transformers (Vaswani et al., 2017) and Gated Recurrent Unit (GRU, Chung et al., 2014). ...
... Moreover, most smartphones' GNSS modules have a sampling rate of only 1Hz, which hampers the responsiveness to scenario transition. More recently, Dai et al. (2022) proposed a grid-based recognition approach that utilizes GNSS measurements such as pseudorange, Doppler shift, and C/N0. They represented the GNSS measurements with Voronoi diagrams and fed them into CNN networks, and achieved an accuracy of 99.92%. ...
Conference Paper
Full-text available
Recent years, people have put forward higher and higher requirements for context-adaptive navigation (CAN). CAN system realizes seamless navigation in complex environments by recognizing the ambient surroundings of vehicles, and it is crucial to develop a fast, reliable, and robust navigational context recognition (NCR) method to enable CAN systems to operate effectively. Environmental context recognition based on Global Navigation Satellite System (GNSS) measurements has attracted widespread attention due to its low cost because it does not require additional infrastructure. The performance and application value of NCR methods depend on three main factors: context categorization, feature extraction, and classification models. In this paper, a finegrained context categorization framework comprising seven environment categories (open sky, tree-lined avenue, semi-outdoor, urban canyon, viaduct-down, shallow indoor, and deep indoor) is proposed, which currently represents the most elaborate context categorization framework known in this research domain. To improve discrimination between categories, a new feature called the C/N0-weighted azimuth distribution factor, is designed. Then, to ensure real-time performance, a lightweight gated recurrent unit (GRU) network is adopted for its excellent sequence data processing capabilities. A dataset containing 59,996 samples is created and made publicly available to researchers in the NCR community on Github. Extensive experiments have been conducted on the dataset, and the results show that the proposed method achieves an overall recognition accuracy of 99.41% for isolated scenarios and 94.95% for transition scenarios, with an average transition delay of 2.14 seconds.
... Demonstrating the algorithm's broad applicability, datasets for both training and testing were gathered from various cities, achieving an overall recognition accuracy of 89.3 % across diverse environments. In [124], the authors improve scenario recognition for mobile applications by classifying environments into four categories and using a Hidden Markov Model and an RNN. The RNN method effectively handles scenario transitions and environmental changes, achieving an overall accuracy of 98.65 % and a transition recognition accuracy of 90.94 %, with minimal transition delay. ...
Preprint
Full-text available
Global Navigation Satellite Systems (GNSS)-based positioning plays a crucial role in various applications, including navigation, transportation, logistics, mapping, and emergency services. Traditional GNSS positioning methods are model-based and they utilize satellite geometry and the known properties of satellite signals. However, model-based methods have limitations in challenging environments and often lack adaptability to uncertain noise models. This paper highlights recent advances in Machine Learning (ML) and its potential to address these limitations. It covers a broad range of ML methods, including supervised learning, unsupervised learning, deep learning, and hybrid approaches. The survey provides insights into positioning applications related to GNSS such as signal analysis, anomaly detection, multi-sensor integration, prediction, and accuracy enhancement using ML. It discusses the strengths, limitations, and challenges of current ML-based approaches for GNSS positioning, providing a comprehensive overview of the field.
Conference Paper
The interest in machine learning (ML) research and its potential applications in many fields has also led to several studies on its use in the Indian Regional Navigation Satellite System (IRNSS). Traditional IRNSS algorithms and models are being further developed with machine learning techniques to improve their reliability and efficiency. We review how ML can improve the efficiency and usability of IRNSS and discuss the areas of IRNSS where ML algorithms have already been applied. Potential areas of IRNSS where ML can be applied to improve efficiency, accuracy, and robustness are explored, providing fertile ground for new research. The results show reasonable performance of machine learning techniques for several IRNSS applications; however, the use of ML models in industry remains limited. In addition, we discuss the application areas, challenges, risks, and future of using ML techniques in IRNSS.
Article
Sidewalk-level positions are required for a growing number of pedestrian applications. However, in urban canyons, buildings along both sides of the street severely obstruct Global Navigation Satellite System (GNSS) signals, and the lack of redundant fault-free measurements leads to poor accuracy in the cross-street direction, posing challenges in determining the side of the street solely from GNSS positions. While 3D building models have been utilized to improve position accuracy, particularly in the cross-street direction, techniques relying on these models face issues such as position ambiguity, high computational load, and low accuracy in the along-street direction. In this study, we aim to develop a novel intelligent urban positioning system using smartphone sensors and the pedestrian network. An algorithm is proposed to determine the side of the street by analyzing in which half of the sky most of the line-of-sight (LOS) signals are observed. An additional virtual measurement derived from the sidewalk is combined with real measurements to solve for the GNSS position. The system can achieve sidewalk-level positioning because the redundancy in the cross-street direction is significantly improved. It offers several advantages, including eliminating the need for LOS/NLOS signal identification for each satellite and for 3D building models. Extensive datasets were utilized to train the classification model and evaluate the system's performance. The results demonstrate a correct identification rate of better than 96% using single-epoch GNSS observations. More importantly, the proposed positioning system achieves an accuracy of better than 5 meters in urban canyons.
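The side-of-street cue described above can be illustrated with a short sketch: split the sky in two along the street axis and count the LOS satellite azimuths falling in each half. The simple majority vote below is a stand-in for the trained classification model the study actually uses.

```python
import numpy as np

def open_sky_half(los_az_deg, street_heading_deg):
    """Vote on which half of the sky, split by the street axis, holds
    the majority of LOS signals. In an urban canyon most LOS signals
    arrive from the open half of the sky above the street, which is
    the evidence the paper's classifier exploits; this majority vote
    is an illustrative assumption, not the paper's model.
    """
    rel = (np.asarray(los_az_deg, dtype=float) - street_heading_deg) % 360.0
    right = int(np.count_nonzero(rel < 180.0))  # right of the street axis
    left = len(rel) - right
    return ('right', right) if right >= left else ('left', left)

# Street running north-south (heading 0 deg); most LOS azimuths eastward.
print(open_sky_half([20, 60, 100, 300], street_heading_deg=0.0))  # ('right', 3)
```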
Article
Full-text available
The automated inspection and mapping of engineering structures are mainly based on photogrammetry and laser scanning. Mobile robotic platforms like unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), but also handheld platforms, allow efficient automated mapping. Engineering structures like bridges shadow global navigation satellite system (GNSS) signals, which complicates precise localization. Simultaneous localization and mapping (SLAM) algorithms offer a sufficient solution, since they do not require GNSS. However, testing and comparing SLAM algorithms in GNSS-denied areas is difficult due to missing ground-truth data. This work presents an approach to measuring the performance of SLAM in indoor and outdoor GNSS-denied areas using a Leica RTC360 terrestrial scanner and a tachymeter to acquire point cloud and trajectory information. The proposed method is independent of time synchronization between robot and tachymeter and also works on sparse SLAM point clouds. For the evaluation of the proposed method, three LiDAR-based SLAM algorithms, KISS-ICP, SC-LIO-SAM, and MA-LIO, are tested using a UGV equipped with two light detection and ranging (LiDAR) sensors and an inertial measurement unit (IMU). KISS-ICP relies solely on a single LiDAR scanner, and SC-LIO-SAM also uses an IMU. MA-LIO, which allows multiple (different) LiDAR sensors, is tested with a horizontal and a vertical LiDAR and an IMU. Time synchronization between the tachymeter and SLAM data during post-processing allows calculating the root mean square (RMS) absolute trajectory error, the mean relative trajectory error, and the mean point cloud to reference point cloud distance. The results show that the proposed method is an efficient approach to measure the performance of SLAM in GNSS-denied areas. Additionally, the method shows the superior performance of MA-LIO in four of six test tracks with 5 to 7 cm RMS trajectory error, followed by SC-LIO-SAM and KISS-ICP in last place. SC-LIO-SAM reaches the lowest point cloud to reference point cloud distance in four of six test tracks, with 4 to 12 cm.
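For reference, once the tachymeter trajectory and a SLAM trajectory are time-synchronized and expressed in a common frame, the RMS absolute trajectory error reported above reduces to a short computation; linear interpolation of the reference onto the SLAM timestamps is an assumption of this sketch.

```python
import numpy as np

def rms_ate(t_slam, p_slam, t_ref, p_ref):
    """RMS absolute trajectory error between a SLAM trajectory and a
    tachymeter reference, both assumed already expressed in the same
    frame (e.g., after rigid alignment). The reference is linearly
    interpolated onto the SLAM timestamps.

    t_*: (N,) timestamps in seconds; p_*: (N, 3) positions in meters.
    """
    p_ref_i = np.column_stack([np.interp(t_slam, t_ref, p_ref[:, k])
                               for k in range(3)])
    err = np.linalg.norm(p_slam - p_ref_i, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic check: a straight 60 s track with 5 cm noise per axis.
t = np.linspace(0.0, 60.0, 120)
ref = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])
slam = ref + np.random.default_rng(0).normal(scale=0.05, size=ref.shape)
print(f"RMS ATE: {rms_ate(t, slam, t, ref):.3f} m")  # close to 0.05 * sqrt(3)
```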
Article
Full-text available
Achieving accurate and robust global situational awareness of a complex time-evolving field from a limited number of sensors has been a long-standing challenge. This reconstruction problem is especially difficult when sensors are sparsely positioned in a seemingly random or unorganized manner, which is often encountered in a range of scientific and engineering problems. Moreover, these sensors could be in motion and could come online or go offline over time. The key leverage in addressing this scientific issue is the wealth of data accumulated from the sensors. As a solution to this problem, we propose a data-driven spatial field recovery technique founded on a structured grid-based deep-learning approach for arbitrarily positioned sensors of any number. It should be noted that naive use of machine learning becomes prohibitively expensive for global field reconstruction and is furthermore not adaptable to an arbitrary number of sensors. In this work, we consider the use of Voronoi tessellation to obtain a structured-grid representation from sensor locations, enabling the computationally tractable use of convolutional neural networks. One of the central features of our method is its compatibility with deep learning-based super-resolution reconstruction techniques for structured sensor data that are established for image processing. The proposed reconstruction technique is demonstrated for unsteady wake flow, geophysical data, and three-dimensional turbulence. The current framework is able to handle an arbitrary number of moving sensors and thereby overcomes a major limitation of existing reconstruction methods. Our technique opens a new pathway toward the practical use of neural networks for real-time global field estimation.
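A minimal sketch of the structured-grid input construction this abstract describes, assuming a unit-square domain and a 64x64 grid: each cell takes its nearest sensor's value (the Voronoi tessellation), and a second channel marks the sensor locations. Treat the exact channel layout as an assumption of this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def voronoi_input(sensor_xy, sensor_vals, nx=64, ny=64):
    """Two-channel structured-grid input: channel 0 holds each cell's
    nearest-sensor value (a Voronoi tessellation of the domain) and
    channel 1 is a binary mask of cells containing a sensor. The grid
    resolution and unit domain are illustrative assumptions.
    """
    gx, gy = np.meshgrid(np.linspace(0.0, 1.0, nx), np.linspace(0.0, 1.0, ny))
    cells = np.column_stack([gx.ravel(), gy.ravel()])

    _, idx = cKDTree(sensor_xy).query(cells)      # nearest sensor per cell
    field = np.asarray(sensor_vals, dtype=float)[idx].reshape(ny, nx)

    mask = np.zeros((ny, nx))
    ix = np.clip(np.round(sensor_xy[:, 0] * (nx - 1)).astype(int), 0, nx - 1)
    iy = np.clip(np.round(sensor_xy[:, 1] * (ny - 1)).astype(int), 0, ny - 1)
    mask[iy, ix] = 1.0
    return np.stack([field, mask])                # (2, ny, nx) CNN input

rng = np.random.default_rng(0)
xy = rng.random((8, 2))                           # 8 arbitrarily placed sensors
x = voronoi_input(xy, rng.standard_normal(8))
print(x.shape)  # (2, 64, 64)
```

Because the grid is rebuilt at every snapshot, sensors may move, appear, or drop out without changing the input shape the CNN sees.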
Article
Full-text available
Silicon avalanche light-emitting devices (Si Av LEDs) offer various possibilities for realizing micro- and even nano-optical biosensors directly on chip. The light-emitting devices (LEDs) operate in the wavelength range of about 450-850nm, and their emitted optical power is on the order of a few hundred nW/µm². These LEDs can be fabricated at micro- and nano-scale dimensions using mainstream silicon semiconductor fabrication technologies. Through a series of experiments, dispersion phenomena in the Si Av LED were observed, and its light emission point was shown to be located about one micron below the silicon-silicon oxide interface. Subsequently, a micro-fluidic channel sensor was designed using the dispersion characteristics exhibited by the Si Av LED. Analytes flowing through a micro-fluidic channel can be studied via their specific transmittance and absorption spectra. Moreover, simulations verify that a novel waveguide-based sensor could be fabricated on chip between the Si optical source and the Si P-I-N detector.
Article
Full-text available
The widespread popularity of smartphones makes it possible to provide Location-Based Services (LBS) in a variety of complex scenarios. The location and contextual status, especially indoor/outdoor (IO) switching, provide a direct indicator for seamless indoor and outdoor positioning and navigation. It is challenging to quickly detect indoor and outdoor transitions with high confidence due to the variety of signal variations in complex scenarios and the similarity of indoor and outdoor signal sources in IO transition regions. In this paper, we address the challenge of switching quickly in IO transition regions with high detection accuracy in complex scenarios. Towards this end, we analyze and extract spatial geometry distribution, time-sequence, and statistical features under different sliding windows from GNSS measurements on Android smartphones and present a novel IO detection method employing a stacking-based ensemble model, filtering the detection results with a Hidden Markov Model (HMM). We evaluated our algorithm on four datasets. The results show that our proposed algorithm is capable of identifying the IO state with 99.11% accuracy in indoor and outdoor environments where we had collected data and 97.02% accuracy in new indoor and outdoor scenarios. Furthermore, in IO transition scenarios where we had collected data, the recognition accuracy reaches 94.53% and the probability of a switching delay within 3 s exceeds 80%. In the new scenario, the recognition accuracy reaches 92.80% and the probability of a switching delay within 4 s exceeds 80%.
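The HMM filtering stage this abstract mentions can be illustrated with a two-state Viterbi smoother over the ensemble's per-epoch probabilities. The sticky transition prior and the use of classifier outputs as emission likelihoods are simplifying assumptions.

```python
import numpy as np

def viterbi_smooth(probs, p_stay=0.95):
    """Smooth per-epoch classifier outputs with a two-state HMM.

    probs: (T, 2) per-epoch [P(indoor), P(outdoor)] from a classifier;
    p_stay: prior probability of remaining in the same state between
    epochs. Returns the most likely state path (0 = indoor, 1 = outdoor).
    """
    logA = np.log(np.array([[p_stay, 1.0 - p_stay],
                            [1.0 - p_stay, p_stay]]))
    logB = np.log(np.clip(probs, 1e-9, 1.0))
    T = logB.shape[0]
    delta = logB[0] + np.log(0.5)         # uniform initial state prior
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        trans = delta[:, None] + logA     # (from_state, to_state)
        back[t] = np.argmax(trans, axis=0)
        delta = np.max(trans, axis=0) + logB[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):         # backtrace
        path.append(back[t][path[-1]])
    return path[::-1]

noisy = np.array([[.9, .1], [.8, .2], [.4, .6], [.85, .15], [.2, .8], [.1, .9]])
print(viterbi_smooth(noisy))  # the isolated flip at epoch 3 is suppressed
```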
Article
Ultra-wideband (UWB) localization systems suffer from deteriorating performance in complex scenes, especially in non-line-of-sight (NLOS) conditions. To improve the accuracy and robustness of localization in NLOS environments, we propose an end-to-end deep neural network that uses both distance and received signal strength (RSS) measurements. On one hand, high-level spatial-temporal features can be learned by the proposed network from both RSS and distance data, which benefits localization performance. On the other hand, the proposed network is robust to variation in the number of available anchors, giving it high adaptability to different scenes. Specifically, three modules are designed in the deep network: 1) a module based on a convolutional neural network (CNN) extracts local spatial features from the input data, and the structure of this module lends itself to varying-dimension input; 2) to capture the correlations between consecutive frames, a deep long short-term memory (LSTM) model extracts temporal features and provides a high-level representation for a series of input data; 3) finally, fully-connected layers estimate the 3D position of the UWB tag. We conduct extensive experiments in three real-world scenarios to evaluate the proposed deep network. The experimental results indicate that the proposed network significantly improves the accuracy and robustness of UWB localization results, especially in NLOS situations.
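A minimal PyTorch sketch of the three-module design summarized above; all layer sizes are assumptions, and global pooling over anchors stands in for the paper's handling of varying-dimension input.

```python
import torch
import torch.nn as nn

class UwbNet(nn.Module):
    """Sketch of the CNN + LSTM idea: a small Conv1d block extracts
    spatial features from each frame's per-anchor (distance, RSS) pairs,
    pooling over anchors makes the feature independent of the anchor
    count, an LSTM models consecutive frames, and a fully-connected
    head regresses the 3D tag position. Layer sizes are assumptions."""

    def __init__(self, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))               # pool over anchors
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)

    def forward(self, x):                          # x: (batch, frames, 2, anchors)
        b, t, c, a = x.shape
        f = self.conv(x.reshape(b * t, c, a)).squeeze(-1)
        out, _ = self.lstm(f.reshape(b, t, -1))
        return self.head(out[:, -1])               # 3D position at last frame

# 4 sequences of 20 frames with 6 anchors each; the anchor count may vary
# between batches without changing the network.
pos = UwbNet()(torch.randn(4, 20, 2, 6))
print(pos.shape)  # torch.Size([4, 3])
```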
Article
Visual–inertial odometry (VIO) is known to suffer from drifting, especially over long-term runs. In this article, we present GVINS, a nonlinear optimization-based system that tightly fuses global navigation satellite system (GNSS) raw measurements with visual and inertial information for real-time and drift-free state estimation. Our system aims to provide accurate global six-degree-of-freedom estimation under complex indoor–outdoor environments, where GNSS signals may be intermittent or even inaccessible. To establish the connection between global measurements and local states, a coarse-to-fine initialization procedure is proposed to efficiently calibrate the transformation online and initialize GNSS states from only a short window of measurements. The GNSS code pseudorange and Doppler shift measurements, along with visual and inertial information, are then modeled and used to constrain the system states in a factor graph framework. For complex and GNSS-unfriendly areas, the degenerate cases are discussed and carefully handled to ensure robustness. Thanks to the tightly coupled multisensor approach and system design, our system fully exploits the merits of three types of sensors and is able to seamlessly cope with the transition between indoor and outdoor environments, where satellites are lost and reacquired. We extensively evaluate the proposed system by both simulation and real-world experiments, and the results demonstrate that our system substantially suppresses the drift of the VIO and preserves the local accuracy in spite of noisy GNSS measurements. The versatility and robustness of the system are verified on large-scale data collected in challenging environments. In addition, experiments show that our system can still benefit from the presence of only one satellite, whereas at least four satellites are required for its conventional GNSS counterparts.
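As background for the factor-graph formulation mentioned above, the standard GNSS code-pseudorange model that such a system constrains its states with looks as follows; atmospheric delay terms are omitted, and this is the textbook model rather than GVINS's exact residual.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def pseudorange_residual(p_rx_ecef, rx_clk_s, p_sat_ecef, sat_clk_s, pr_m):
    """Residual of one code-pseudorange factor: geometric range plus
    clock terms minus the measured pseudorange. Ionospheric and
    tropospheric delays are omitted for brevity.
    """
    geom = np.linalg.norm(p_sat_ecef - p_rx_ecef)        # geometric range, m
    return geom + C * (rx_clk_s - sat_clk_s) - pr_m

p_sat = np.array([15_600e3, 7_540e3, 20_140e3])          # satellite ECEF, m
p_rx = np.array([-2_694e3, -4_293e3, 3_857e3])           # receiver ECEF, m
pr = np.linalg.norm(p_sat - p_rx) + C * 1e-3             # 1 ms receiver bias
print(pseudorange_residual(p_rx, 1e-3, p_sat, 0.0, pr))  # ~0 for a perfect fit
```

In a tightly coupled system, residuals of this form (and analogous Doppler residuals) enter the factor graph alongside visual reprojection and IMU preintegration factors.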
Article
The accuracy of smartphone-based positioning systems using WiFi usually suffers from ranging errors caused by non-line-of-sight (NLOS) conditions. Previous research usually exploits several distribution features from a long time series (hundreds of samples) of WiFi received signal strength (RSS) or WiFi round-trip time (RTT) to achieve high identification accuracy. However, the long time series, or large sample size, leads to high power and time consumption in data collection for both training and testing. This is also detrimental to the user experience, as the wait to collect enough samples is long. Therefore, this paper proposes three new real-time NLOS/LOS identification methods for smartphone-based indoor positioning systems using WiFi RSS and RTT distance measurements (RDM). Based on our extensive analysis of RSS and RDM dispersion features, three machine learning algorithms were chosen and developed to separate the samples into NLOS/LOS conditions. Experiments show that our best method achieves a discrimination accuracy of over 96% with a sample size of 10. Given the theoretical minimum WiFi ranging interval of 100ms on RTT-enabled smartphones, our algorithm is able to deliver a result within a latency of 1s, the shortest among state-of-the-art methods.
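A sketch of the short-window idea in this abstract: dispersion features computed from only 10 samples of RSS and RDM, fed to an off-the-shelf classifier. The concrete feature set, the random-forest choice, and the synthetic data are all assumptions for illustration, not the paper's three algorithms.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def dispersion_features(rss_dbm, rdm_m):
    """Mean/std/range over one short window (10 samples) of RSS and of
    RTT distance measurements (RDM); representative assumptions rather
    than the paper's exact feature set."""
    feats = []
    for x in (np.asarray(rss_dbm, dtype=float), np.asarray(rdm_m, dtype=float)):
        feats += [x.mean(), x.std(), x.max() - x.min()]
    return np.array(feats)

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=200)            # 0 = LOS, 1 = NLOS (synthetic)
X = np.array([dispersion_features(
        rng.normal(-55.0, 1.0 + 3.0 * y, 10),    # NLOS -> larger RSS spread
        rng.normal(8.0, 0.3 + 1.5 * y, 10))      # NLOS -> noisier ranging
    for y in labels])
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X[:150], labels[:150])
print(f"held-out accuracy: {clf.score(X[150:], labels[150:]):.2f}")
```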
Article
This paper proposes a precise WiFi fingerprinting indoor positioning algorithm for complex pedestrian environments. We transform the disturbed received signal strength (RSS) from the original space to a latent space using improved probabilistic linear discriminant analysis (PLDA). In the latent space, Bayes' rule is used to calculate the posterior probability of the similarity between the test point and the reference points, and the K reference points with the highest posterior probability are weighted to estimate the position. Actual on-site experiments involving three floors demonstrate that the mean localization error of the proposed algorithm is 1.38m, outperforming the Horus algorithm by 29% under the same test conditions. In addition, by studying the variability of the mean RSS in different pedestrian environments, fingerprint maps under different states of personnel movement are simulated. Using these simulated maps, the average localization error of the proposed algorithm increases slightly to 1.63m, while the workload required during the offline training phase is significantly reduced.
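The final weighting step described above, estimating the position from the K reference points with the highest posterior probability, can be sketched independently of the PLDA front end, which is assumed here to have already produced the log-posteriors.

```python
import numpy as np

def weighted_k_estimate(ref_xy, log_posterior, k=4):
    """Weighted position estimate from the K reference points with the
    highest posterior probability. Computing the posteriors themselves
    (via PLDA in the latent space) is outside this sketch."""
    top = np.argsort(log_posterior)[-k:]            # indices of the best K
    w = np.exp(log_posterior[top] - log_posterior[top].max())
    w /= w.sum()                                    # normalized weights
    return w @ ref_xy[top]

refs = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [5.0, 5.0]])
logp = np.array([-1.2, -0.3, -0.5, -2.0, -8.0])     # hypothetical posteriors
print(weighted_k_estimate(refs, logp))              # pulled toward the best points
```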
Article
In this paper, the optoelectronic characteristics and related switching behavior of a monolithically integrated silicon light-emitting device (LED) covering the interesting wavelength range of 400–900 nm are studied. By comparing two geometries, a Si avalanche-based LED and a Si field-effect LED (Si FE LED), in the same device, we establish the dimensional dependence of the LED's switching speed. An almost-linear modulation curve, implying lower distortion, is observed for the Si FE LED along with enhanced light emission, and technology computer-aided design (TCAD) simulations are in line with the experimental results. Our findings indicate that ON-OFF keying up to GHz frequencies should be feasible with such diodes. Potential applications include Si FE LEDs integrated into micro-photonic systems.
Article
As an upper-layer context-aware mobile application capability, fast and accurate scenario recognition is essential for seamless indoor and outdoor localization and robust positioning in complex environments. With the popularity of multi-constellation smartphones, scenario recognition based on smartphone Global Navigation Satellite System (GNSS) measurements has become desirable. In this paper, we divide complex environments into four categories (deep indoor, shallow indoor, semi-outdoor, and open outdoor) and conduct research in two areas. First, we analyze in detail the influence of multi-constellation satellite signals on scenario recognition performance using a Hidden Markov Model (HMM) algorithm. The experimental results show that scenario recognition accuracy improves significantly as the number of constellations received by the smartphone increases. Second, to address the degradation of traditional models caused by scenario transitions and by environmental changes around the scenario, we propose a new scenario recognition method based on a Recurrent Neural Network (RNN). Considering computational complexity and the availability of feature values, we use position-independent features as the input to the RNN model and then evaluate the model's performance using test sets from new places. The results indicate that our proposed algorithm has high recognition accuracy in both isolated scenarios and transition regions, with an overall accuracy of 98.65%. In particular, the recognition accuracy in scenario transitions reaches 90.94%, and among the three correctly recognized scenario transitions (out of four in total) the maximum transition delay is only 3 s.
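The abstract does not enumerate its position-independent features; a plausible minimal set built only from the per-epoch C/N0 values of tracked satellites, suitable as RNN input, might look like the following sketch (every feature choice here is an assumption).

```python
import numpy as np

def epoch_features(cn0_dbhz):
    """Position-independent per-epoch features from the C/N0 values of
    all tracked satellites (assumed non-empty); an illustrative feature
    set, not the paper's."""
    c = np.asarray(cn0_dbhz, dtype=float)
    return np.array([
        c.size,                                   # number of tracked satellites
        c.mean(),                                 # mean C/N0
        c.max(),                                  # strongest signal
        np.count_nonzero(c >= 35.0) / c.size,     # share of strong signals
    ])

# A 3-epoch sequence, e.g., walking from outdoors toward deep indoors.
seq = np.stack([epoch_features(e)
                for e in ([45, 41, 38, 22], [30, 25, 19], [18, 15])])
print(seq.shape)  # (3, 4): one feature vector per epoch for the RNN
```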
Article
This paper addresses the exploitation of global navigation satellite systems as opportunistic sources for the joint detection and localization of vessels at sea in a passive multistatic radar system. A single receiver mounted on a suitable platform (e.g., a moored buoy) can collect the signals emitted by multiple navigation satellites and reflected from ship targets of interest. This paper puts forward a single-stage approach to jointly detect and localize ship targets by making use of long integration times (tens of seconds) and properly exploiting the spatial diversity offered by such a configuration. A strategy is defined to form a long-time, multistatic range and Doppler (RD) map, in which the total target power is reinforced with respect to both an RD map obtained over a short dwell and one obtained with a single transmitter. Exploiting both the long integration time and the multiple transmitters can greatly enhance the performance of the system, counteracting the low power budget provided by the considered sources, which represents the main bottleneck of this technology. Moreover, the proposed single-stage approach can reach detection performance superior to a conventional two-stage process, in which peripheral decisions are taken at each bistatic link and localization is subsequently achieved by multilateration methods. A theoretical and simulated performance analysis is presented and validated by means of experimental results with Galileo transmitters and different types of targets of opportunity in different scenarios. The results prove the effectiveness of the proposed method for the detection and localization of ship targets of interest.
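A heavily simplified sketch of forming one bistatic RD map by correlating the surveillance channel against Doppler-shifted replicas of the reference signal. The random-phase stand-in code, snapshot length, and Doppler grid are assumptions; the paper's long-time, multistatic scheme would combine such maps across satellites and dwells before detection.

```python
import numpy as np

def range_doppler_map(rx, ref, fs, dopplers_hz):
    """One bistatic range-Doppler map: circular cross-correlation of the
    surveillance signal with Doppler-shifted copies of the reference,
    giving one range profile per Doppler hypothesis."""
    t = np.arange(len(rx)) / fs
    rows = []
    for fd in dopplers_hz:
        shifted = ref * np.exp(2j * np.pi * fd * t)
        rows.append(np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(shifted))))
    return np.abs(np.array(rows))                 # (n_doppler, n_range_lags)

fs, n = 1.0e6, 1 << 16                            # ~65.5 ms snapshot at 1 MS/s
rng = np.random.default_rng(2)
ref = np.exp(1j * rng.uniform(0, 2 * np.pi, n))   # stand-in pseudo-random code
echo = 0.5 * np.roll(ref, 100) * np.exp(2j * np.pi * 50.0 * np.arange(n) / fs)
rx = echo + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
rd = range_doppler_map(rx, ref, fs, np.arange(-100, 101, 25))
print(np.unravel_index(rd.argmax(), rd.shape))    # (6, 100): 50 Hz, lag 100
```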