Available via license: CC BY 4.0
Machine learning-based multipath modelling in
spatial-domain: a demonstration on GNSS short
baseline processing
Yuanxin Pan ( yxpan@ethz.ch )
ETH Zurich
Gregor Möller
ETH Zurich
Benedikt Soja
ETH Zurich
Research Article
Keywords: GNSS, Multipath, Spatial-domain, Machine learning, XGBoost
Posted Date: February 9th, 2023
DOI: https://doi.org/10.21203/rs.3.rs-2555284/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
Multipath is the main unmodeled error source hindering high-precision Global Navigation Satellite System
(GNSS) data processing. Conventional multipath mitigation methods, such as sidereal filtering (SF) and
multipath hemispherical map (MHM), have certain disadvantages: they are either too complicated for
implementation or not effective enough for multipath mitigation. In this study, we propose a machine
learning (ML)-based multipath mitigation method. Multipath modelling was formulated as a regression
task, and the multipath errors were fitted with respect to azimuth and elevation in the spatial-domain. We
collected 30 days of 1 Hz GPS data to validate the proposed method. In total, five short baselines were
formed and multipath errors were extracted from the post residuals. ML-based multipath models, as well
as observation-domain SF and MHM models, were constructed using 5 days of residuals before the
target day and later applied for multipath correction. It was found that the XGBoost (XGB) method
outperformed SF and MHM. It achieved the highest residual reduction rates, which were 24.9%, 36.2%,
25.5% and 20.4% for GPS P1, P2, L1 and L2 observations, respectively. After applying the XGB-based
multipath corrections, kinematic positioning precisions of 1.6 mm, 1.9 mm and 4.5 mm could be
achieved in east, north and up components, respectively, corresponding to 20.0%, 17.4% and 16.7%
improvements compared to the original solutions. The effectiveness of the ML-based multipath model
was further validated using 30 s sampling data. We conclude that the ML-based multipath mitigation
method is effective, easy-to-use, and can be easily extended by adding auxiliary input features, such as
signal-to-noise ratio (SNR), during model training.
1 Introduction
Global Navigation Satellite System (GNSS) has already become an essential part of our daily life and a
crucial part of the geodetic infrastructure (Rebischung et al., 2016). With the refinement of error correction
models and the improvement of precise products provided by the International GNSS Service (IGS),
GNSS-based positioning precision can reach mm-level in static mode and cm-level in kinematic mode
(Bock et al., 2004; Kouba, 2015; Choy et al., 2016). However, multipath still remains the main unmodelled
error source due to its nonlinear nature. It degrades the contribution of GNSS to applications demanding
high precision, such as earthquake early warning (Larson, 2009).
Multipath is the effect of simultaneous reception of direct and reflected GNSS signals. It is almost
inevitable due to the nondirectional nature of GNSS antennas. Apart from choosing a less reflective
environment, hardware- and software-based measures are usually adopted to reduce multipath. The
hardware-based methods can be divided into antenna design and receiver improvement, such as choke
ring and narrow-band correlation (Van Dierendonck et al., 1992; McGraw et al., 2004). However, they can
only reduce part of the multipath error (Park et al., 2004). The software-based approaches include various
filtering methods utilizing the frequency signature of multipath (Satirapod and Rizos, 2005). However, it is
hard to apply such filtering when the multipath frequency range overlaps with that of signals of interest.
Signal-to-noise ratio (SNR) measured by GNSS receivers can be used for multipath characterization or
observation weighting (Bilich et al., 2008; Su et al., 2021). However, its efficiency is dependent on the SNR
data quality and antenna gain pattern. Sidereal filtering (SF) is a widely used method to mitigate
multipath for high-precision GNSS data processing (Genrich and Bock, 1992; Bock et al., 2004). The idea
is that the geometric relation between the Global Positioning System (GPS) constellation and a static
station will repeat every sidereal day, and the positioning error induced by multipath will also repeat after
the same period. Hence, the coordinate time series of previous days, with proper time shifts, can be used
to correct the multipath for the target day. The key to implementing SF is to calculate the correct orbit repeat
period for each satellite, since the actual orbit repeat period of GPS is not exactly one sidereal day and
even varies with different satellites (Choi et al., 2004; Agnew and Larson, 2006). When it comes to multi-
GNSS, the case is more complicated and SF can no longer be applied in the coordinate-domain. In order
to solve this problem and use the individual orbit repeat period for each satellite, observation-domain SF
was first proposed by Zhong et al. (2010) for baseline processing and was also successfully applied to
precise point positioning (PPP) and multi-GNSS processing (Atkins and Ziebart, 2015; Ye et al., 2014;
Geng et al., 2018). It extracts multipath corrections from postfit residuals of previous days and applies
them to the observations of each satellite on the target day after shifting the corrections by individual
orbit repeat periods. Although SF can effectively mitigate multipath errors, it is cumbersome to implement
due to the different orbit repeat periods of GNSS satellites, and it is less effective for observations of low
sampling rate.
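The observation-domain SF idea described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the cited studies: the function name `sf_correction`, the dictionary layout and the example repeat period of 86155 s are assumptions made for the example, and in practice the repeat period has to be computed per satellite.

```python
def sf_correction(residuals, epoch, repeat_period, n_days=5):
    """Average one satellite's residuals from previous days at the repeated geometry.

    residuals     -- dict mapping absolute epoch (integer seconds) to a filtered residual
    epoch         -- target epoch on the same continuous time scale
    repeat_period -- orbit repeat period of this satellite in seconds (assumed known)
    n_days        -- number of previous days to stack (hypothetical default)
    """
    samples = []
    for k in range(1, n_days + 1):
        earlier = epoch - k * repeat_period  # same satellite geometry, k days earlier
        if earlier in residuals:
            samples.append(residuals[earlier])
    if not samples:
        return None  # no sample at the shifted epochs
    return sum(samples) / len(samples)


# Illustrative repeat period only; the real value differs between satellites.
T = 86155
history = {1_000_000 - T: 0.010, 1_000_000 - 2 * T: 0.030}
print(sf_correction(history, 1_000_000, T))
```

The lookup only succeeds when the shifted epoch coincides with a stored sample, which hints at why the approach becomes less effective as the sampling interval grows.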
The spatiotemporal repeatability of multipath can also be modelled in the spatial domain. It is based on
the fact that multipath errors mainly depend on satellite positions in a skyplot, and thus a multipath
correction model can be established with respect to azimuth and elevation angles in a topocentric
coordinate system. Cohen and Parkinson (1991) proposed a multipath lookup table to model the
reflective environment around the station. Fuhrmann et al. (2014) used congruent grids with similar
shapes and sizes to generate multipath maps. Dong et al. (2015) named this kind of spatial domain-
based multipath model as multipath hemispherical map (MHM) and compared its performance with SF
using 1 Hz GPS data from a dual-antenna receiver. It was concluded that similar multipath mitigation
performance could be achieved with both methods but MHM was less effective for high-frequency
multipath. However, MHM is satellite independent and is easy to implement and use (Zheng et al., 2019;
Lu et al., 2021). Wang et al. (2019) modified the MHM method by introducing a set of trend surface
coefficients for each grid to capture the multipath variation within a grid. It was found that the modified
MHM method achieved about 5% more residual reduction rate than MHM, but it complicated the
application of the original MHM method.
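A grid-based model of the MHM kind can be sketched as follows, assuming equal-angle cells and a plain average per cell. The function names, the default 1° cell size and the choice of returning a zero correction for empty cells are assumptions for illustration; published MHM variants differ in their grid layout.

```python
import math
from collections import defaultdict


def cell_index(az_deg, el_deg, cell_deg=1.0):
    """Map an azimuth/elevation pair to the integer index of its grid cell."""
    return (math.floor((az_deg % 360.0) / cell_deg), math.floor(el_deg / cell_deg))


def build_mhm(samples, cell_deg=1.0):
    """Average residuals per cell; samples are (azimuth, elevation, residual) tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for az, el, res in samples:
        key = cell_index(az, el, cell_deg)
        sums[key] += res
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def mhm_correction(grid, az_deg, el_deg, cell_deg=1.0):
    """Look up the correction of the cell containing the satellite (0.0 if empty)."""
    return grid.get(cell_index(az_deg, el_deg, cell_deg), 0.0)


grid = build_mhm([(10.2, 30.5, 1.0), (10.8, 30.1, 3.0), (200.0, 45.0, -2.0)])
print(mhm_correction(grid, 10.5, 30.9))
```

Because every point inside a cell receives the same value, variation within a cell is lost, which is the limitation the trend-surface modification addresses.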
Over the last decade, artificial intelligence, especially machine learning (ML), has become more and more
prominent in geosciences (Li et al., 2011; Beroza et al., 2021; Crocetti et al., 2021; Aichinger et al., 2022).
Such data-driven algorithms are suitable for solving nonlinear problems, including classification and
regression tasks. ML algorithms have already been applied to GNSS multipath and non-line-of-sight
(NLOS) signal classification. It was shown that 75%-90% classification accuracy could be achieved with a
support vector machine (SVM) when appropriate input features were used (Hsu, 2017; Xu et al., 2020).
Suzuki et al. (2020) trained a convolutional neural network (CNN) to detect NLOS signals based on the
output of multiple GNSS signal correlators of a software-defined receiver and reported a 98%
classification accuracy. Li et al. (2022) demonstrated the advantage of deep neural network (DNN)-based
signal correlation schemes in a receiver tracking loop over standard correlation schemes regarding
multipath mitigation. Tao et al. (2020, 2021) used neural networks to mine the multipath features in
coordinate and frequency domains, respectively, and reported better multipath mitigation performance
than conventional methods. However, currently there is no research that studies the possibility of
multipath modelling in the spatial-domain with ML.
The focus of this paper is to investigate the potential of ML algorithms on multipath modelling in the
spatial-domain. We formulate multipath modelling as a regression task for ML algorithms. The multipath
errors are fitted with respect to azimuth and elevation angles in the skyplot. The benefit of SNR
measurements for multipath modelling is also examined. Three widely used ML methods, i.e., random
forest (RF), extreme gradient boosting (XGB) and multilayer perceptron (MLP), are tested regarding
multipath mitigation for short baselines. The remainder of this paper is organized as follows: principles
of multipath modelling are introduced in Section 2. The data used in this study is described in Section 3.
The ML-based multipath mitigation results are displayed and discussed in Section 4. Conclusions and
outlooks are given in Section 5.
2 Multipath Modelling
Multipath is the composite of direct and reflected GNSS signals, and it cannot be modelled thoroughly
due to its nonlinearity. But under the assumption of specular reflection, the multipath errors of
pseudorange and carrier phase can be modelled as (Bilich et al., 2007):
$$
\begin{cases}
P_{MP} = \dfrac{\alpha \cdot \delta \cdot \cos\varphi}{1 + \alpha\cos\varphi} \\[2ex]
\Phi_{MP} = \tan^{-1}\left(\dfrac{\alpha\sin\varphi}{1 + \alpha\cos\varphi}\right)
\end{cases}
\tag{1}
$$
where $P_{MP}$ and $\Phi_{MP}$ are multipath errors of pseudorange and carrier phase, respectively. We denote the
reflection coefficient as $\alpha$, which is the amplitude ratio between the reflected signal and the direct signal.
The geometric path delay is denoted as $\delta$, and $\varphi$ is the phase offset of the reflected signal, which is
caused by the extra path delay and phase shift due to the reflection.
The SNR measured by a receiver is a useful indicator of multipath errors. It contains the reflection
information of the environment:
$$
SNR^2 = A_d^2 + A_m^2 + 2 A_d A_m \cos\varphi
\tag{2}
$$
where $A_d$ and $A_m$ are the amplitudes of direct and reflected GNSS signals. The symbol $\varphi$ has the same
meaning as in Eq. (1). It can be noted that $\varphi$ is the common underlying parameter for multipath and SNR,
which means that SNR measurements could be beneficial for multipath modelling.
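To make the role of the shared phase offset concrete, Eqs. (1) and (2) can be evaluated numerically. The sketch below is illustrative only: the function names and the sample values are assumptions, the carrier phase error is returned in radians, and `atan2` is used in place of the arctangent, which gives the same result whenever the reflection coefficient is below 1.

```python
import math


def multipath_errors(alpha, delta, phi):
    """Evaluate Eq. (1): pseudorange error (units of delta) and phase error (radians)."""
    denom = 1.0 + alpha * math.cos(phi)
    p_mp = alpha * delta * math.cos(phi) / denom
    phi_mp = math.atan2(alpha * math.sin(phi), denom)  # equals atan(...) when denom > 0
    return p_mp, phi_mp


def snr_squared(a_d, a_m, phi):
    """Evaluate Eq. (2) for direct and reflected amplitudes a_d and a_m."""
    return a_d ** 2 + a_m ** 2 + 2.0 * a_d * a_m * math.cos(phi)


# Hypothetical values: reflection coefficient 0.5, path delay 3 m.
for phi in (0.0, math.pi / 2, math.pi):
    print(phi, multipath_errors(0.5, 3.0, phi), snr_squared(1.0, 0.5, phi))
```

As the phase offset sweeps through a cycle, the SNR and both multipath errors oscillate together, which is the link the text refers to.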
Multipath extraction
GPS data is processed in baseline mode to extract multipath errors. The mathematical models for short
baseline processing can be formulated as follows:
$$
\begin{cases}
\nabla\Delta P_i = \nabla\Delta\rho + \nabla\Delta M_i + \nabla\Delta\varepsilon_i \\
\nabla\Delta L_i = \nabla\Delta\rho + \lambda_i \nabla\Delta N_i + \nabla\Delta m_i + \nabla\Delta\sigma_i
\end{cases}
\tag{5}
$$
where $\nabla\Delta$ is the double-differencing operator, $P_i$ and $L_i$ are pseudorange and carrier phase
measurements on frequency $f_i$, respectively, $\rho$ is the geometric distance between the receiver and the
satellite, $\lambda_i$ is the signal wavelength, and $N_i$ is the integer ambiguity. Pseudorange and carrier phase
multipath errors are denoted as $M_i$ and $m_i$, respectively. The observation noises of pseudorange and
carrier phase are denoted as $\varepsilon_i$ and $\sigma_i$, respectively. Receiver and satellite clocks are eliminated by
double-differencing. Troposphere and ionosphere delays can be neglected in short baseline processing.
Earth tides, phase windup and satellite antenna phase center offset/variation (PCO/PCV) need not be
corrected for short baselines. Receiver antenna PCO/PCV can also be neglected if the
same type of antenna is used.
The baseline data is usually processed in static mode to squeeze all the multipath errors into posterior
residuals. Then these double-differenced (DD) residuals are converted to single-differenced (SD) ones by
adding a zero-mean hypothesis for each epoch (Alber et al., 2000). Otherwise, each DD residual is related
to two satellites with different azimuth and elevation angles, which is not suitable for multipath model
construction. The converted SD residuals are further low-pass filtered with an empirical corner frequency
of 0.1 Hz to remove the observation noise (Choi et al., 2004; Geng et al., 2018). These filtered residuals
can be predominantly treated as multipath errors and will be used for multipath modelling.
ML-based multipath modelling
ML algorithms have been proven to be powerful tools for regression tasks. The main advantage of ML
over classic spatial interpolation algorithms is that not only azimuth and elevation angles but also
auxiliary information, such as SNR measurements, can be utilized for interpolation. There are a lot of ML
algorithms suitable for regression tasks. Among them, ensemble learning algorithms, including bagging
and boosting, are usually ranked among the best-performing methods. Besides, artificial neural networks
(ANN) are also commonly used ML algorithms. Hence, three representative ML algorithms, including
random forest (RF), extreme gradient boosting (XGB) and multilayer perceptron (MLP) are selected as the
candidate methods for multipath modelling in this study. RF is an ensemble learning method that outputs
the average results of a set of randomized decision trees (Breiman, 2001). It can overcome the overfitting
issue of a single decision tree and usually can achieve high accuracy without complex configuration.
XGB is an open-source gradient boosting framework (Friedman, 2001; Chen and Guestrin, 2016). The
basic idea is that a set of decision trees are trained sequentially to better fit the samples with larger
residuals, and it is widely used due to its high performance. MLP is a type of ANN with fully connected
nodes. It consists of three parts, i.e., input layer, hidden layer(s) and output layer. Nonlinear activation
functions are used at each node, so the network can model the nonlinear relation between input and output.
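The sequential idea behind gradient boosting can be illustrated with a toy example. The sketch below is not XGBoost: it is a plain squared-error boosting loop with depth-1 trees on a single feature, and the function names, learning rate, number of rounds and sample values are all assumptions chosen for illustration.

```python
from statistics import fmean


def fit_stump(x, r):
    """Least-squares depth-1 tree on a 1-D feature: returns (threshold, left, right)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = order[:k], order[k:]
        if x[lo[-1]] == x[hi[0]]:
            continue  # no threshold separates equal feature values
        left = fmean(r[i] for i in lo)
        right = fmean(r[i] for i in hi)
        sse = (sum((r[i] - left) ** 2 for i in lo)
               + sum((r[i] - right) ** 2 for i in hi))
        if best is None or sse < best[0]:
            best = (sse, 0.5 * (x[lo[-1]] + x[hi[0]]), left, right)
    if best is None:  # all feature values identical: fall back to a constant
        mean = fmean(r)
        return x[0], mean, mean
    return best[1:]


def boost(x, y, n_rounds=50, lr=0.3):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        thr, left, right = fit_stump(x, resid)
        stumps.append((thr, left, right))
        pred = [p + lr * (left if xi <= thr else right) for p, xi in zip(pred, x)]
    return base, stumps, lr


def predict(model, xi):
    base, stumps, lr = model
    return base + sum(lr * (left if xi <= thr else right) for thr, left, right in stumps)


# Hypothetical samples: elevation (degrees) against a multipath magnitude.
elev = [5.0, 10.0, 20.0, 40.0, 60.0, 80.0]
magn = [3.0, 2.5, 1.0, 0.4, 0.2, 0.1]
model = boost(elev, magn)
print([round(predict(model, e), 2) for e in elev])
```

Production frameworks add regularization, second-order gradient information and multi-feature trees on top of this basic loop.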
Data preparation is the key to ML model training. Two sets of data, i.e., input features and a target vector
need to be provided. Apart from azimuth and elevation angles, SNR is also tested as an additional input
feature for model training in this study. The SD residuals extracted from baseline processing form the
target vector. ML models are trained to best fit the relation between input features and the target vector.
Note that all the training data should be cleaned for outliers and normalized to improve training stability
and model performance. The basic procedures of data processing and ML-based multipath mitigation
are illustrated in Fig. 1. More details about optimal ML model construction can be found in Section 4.2.
The observation-domain SF and MHM multipath models are built on the same data set for performance
comparison. Note that observation-domain SF is simply denoted as SF in the remaining text for clarity.
Specifically, the actual orbit repeat period required by SF is calculated using broadcast ephemerides for
each satellite (Choi et al., 2004), and the MHM model is constructed with 1° by 1° grids for improved
model stability and effectiveness (Dong et al., 2015). Multipath models for GPS P1, P2, L1 and L2
observations are constructed individually. Although pseudorange multipath can be quantied with the so-
called multipath combination if dual-frequency measurements are available, it is still meaningful to
investigate pseudorange multipath modelling since the multipath model can be applied for single-
frequency data.
3 Data
We obtained 30 days (DOY 244–273 of 2021) of 1 Hz high-rate GPS data from Curtin University to test
the ML-based multipath mitigation method. There were four GPS antennas on the rooftop of Curtin
University (Fig. 2). The antenna connected to a Trimble NetR9 receiver (CUT0) was used as the reference
station. The other three antennas were each connected to two different receivers, forming six rover stations,
i.e., CUAA/CUTA, CUBB/CUTB and CUCC/CUTC (see details in Table 1). Here, a station denotes the
combination of an antenna and a receiver. Station CUBB was excluded in the following studies since
there were many gaps in its data. Hence, five short baselines were formed between CUT0 and rover
stations for multipath mitigation experiments. The baselines were processed with a modified version of
the RTKLIB software (Takasu, 2009). The coordinates of the reference station were fixed to a mean single
point positioning (SPP) solution (~5 m precision). Uncombined observations of pseudorange and carrier
phase on dual frequencies were utilized for parameter estimation. Since the distance between the reference
station and any rover station was less than 10 m, tropospheric and ionospheric delays were eliminated by
differencing between stations. Only rover positions and DD ambiguities were estimated in a Kalman filter.
It is worth noting that all the stations had the same type of antenna (Table 1), which meant receiver PCV
errors could be eliminated through differencing and would not affect multipath modelling. Finally, multipath
errors were extracted from the postfit residuals according to the method described in Section 2 and were
used for multipath modelling in the following experiments.
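The DD-to-SD conversion step of the extraction procedure can be sketched for a single epoch as follows. The sketch assumes equal weights for all satellites and the sign convention DD = SD(satellite) − SD(reference); the original zero-mean approach of Alber et al. (2000) allows weighting, so this is a simplified illustration with a hypothetical function name.

```python
def dd_to_sd(dd_residuals):
    """Convert one epoch of DD residuals to SD residuals under a zero-mean constraint.

    dd_residuals -- DD residuals of the non-reference satellites, each formed as
                    SD(satellite) - SD(reference); equal weights are assumed.
    Returns the SD residuals as [reference, satellite 1, satellite 2, ...].
    """
    n_sats = len(dd_residuals) + 1
    sd_ref = -sum(dd_residuals) / n_sats  # follows from requiring sum(SD) = 0
    return [sd_ref] + [dd + sd_ref for dd in dd_residuals]


sd = dd_to_sd([1.0, 2.0, 3.0])
print(sd, sum(sd))
```

Each resulting SD residual belongs to exactly one satellite, so it can be paired with that satellite's azimuth and elevation for modelling.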
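Table 1
Detailed configurations of GPS stations used in this study
Station  Receiver type        Firmware version  Antenna type
CUT0     Trimble NetR9        5.45              TRM59800.00 SCIS (choke ring antenna)
CUTA     Trimble NetR9        5.22              TRM59800.00 SCIS (choke ring antenna)
CUTB     Trimble NetR9        5.22              TRM59800.00 SCIS (choke ring antenna)
CUTC     Trimble NetR9        5.45              TRM59800.00 SCIS (choke ring antenna)
CUAA     Javad TRE_G3T DELTA  3.7.9             TRM59800.00 SCIS (choke ring antenna)
CUBB     Javad TRE_G3T DELTA  3.7.9             TRM59800.00 SCIS (choke ring antenna)
CUCC     Javad TRE_G3T DELTA  3.7.9             TRM59800.00 SCIS (choke ring antenna)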
4 Results
We first analyze the characteristics of the extracted multipath errors. Then, different input features and
ML algorithms are explored to establish the multipath models. Finally, the ML-based multipath mitigation
method is compared to the conventional observation-domain SF and MHM methods regarding residual
reduction and positioning improvement. Considering the similar data quality and multipath environment,
we take the station CUCC as the example for specific analysis, and present the statistical results for all
the stations.
4.1 Multipath characteristics analysis
The key to setting up a reliable multipath model is the spatiotemporal repeatability of multipath. Figure 3
shows the low-pass filtered residuals of satellite G04 at CUCC station on DOY 244, 245 and 249. Note
that the residual time series of DOY 245 and 249 are shifted by the corresponding orbit repeat times to better
show the correlation with the residuals of DOY 244. The Pearson correlation coefficients between DOY
244 and 249 reach 0.68, 0.70, 0.82 and 0.86 for P1, P2, L1 and L2, respectively. This indicates that the
environment around the stations was quite stable during the experiment period, and it also confirms that
the zero-mean constraint for residual conversion is effective according to the repeatability characteristics
of multipath. Considering the good temporal correlation, we stacked 5 days of residuals before the target
day to enhance the multipath signals during modelling (Dong et al., 2016; Wang et al., 2019). We also
checked the multipath correlation between both frequencies for pseudorange and carrier phase,
respectively. The correlation coefficients between P1 and P2 residuals are below 0.2 and most of the time
close to 0. This indicates the multipath errors of different pseudorange measurements are not correlated.
For carrier phase residuals on dual-frequency, the correlation coefficients vary between 0 and 0.5. Hence,
there is no definite relation between L1 and L2 multipath effects. Considering the same path delay but
different wavelengths of L1 and L2 carriers, the phase of the two reflected carriers is usually
unsynchronized, and thus the disturbance on the direct carrier signals will be different and uncorrelated
(Bilich et al., 2007).
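The day-to-day correlation check described in this section can be sketched as below. The helper names are hypothetical, the series are assumed to be evenly sampled lists, and the lag (in samples) by which a later day leads the reference day is assumed to be known and non-negative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def align(ref, later, lag):
    """Pair later[i] with ref[i + lag], i.e. the later day leads by `lag` samples."""
    n = min(len(ref) - lag, len(later))
    return ref[lag:lag + n], later[:n]


# Hypothetical series: the later day repeats the reference pattern 2 samples earlier.
ref_day = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
later_day = [1.0, 2.0, 3.0, 4.0, 9.0, 9.0]
print(pearson(*align(ref_day, later_day, 2)))
```

Without the shift the two series would be compared at different satellite geometries and the coefficient would understate the repeatability.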
4.2 Optimal ML-based multipath model setup
To select the best ML algorithms and input feature combinations, we used 30 days of data from the
CUCC station to evaluate the performance of each combination regarding the residual reduction rate. The
three candidate ML algorithms included RF, XGB and MLP, and the potential input features included
azimuth, elevation and SNR. Since azimuth and elevation were necessary for spatial interpolation, there
were only two choices for input features, i.e., with or without SNR. ML-based multipath models with the
six possible combinations of algorithms and input features were trained on the residuals of DOY 244–248
and validated using the residuals of DOY 249. Grid search was adopted for
hyperparameter tuning. The optimal hyperparameters (Table 2) were determined based on the residual
reduction rates for DOY 249. Note that adding SNR as an additional input feature had little impact on the
optimal hyperparameters for each ML algorithm (i.e., RF, XGB or MLP) according to our experiments. After
determining the best hyperparameters, model training and testing were repeated for the remaining 24
days (see Fig. 1) and mean multipath reduction rates were calculated for each combination. That is, the 5
days of residuals before each target day were used to train the model, which was then applied for multipath
mitigation on the target day.
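The sliding train/test schedule described above can be written out explicitly. The function name is hypothetical; the day numbers follow the text (data from DOY 244 to 273, 5 training days per target day).

```python
def rolling_windows(first_doy, last_doy, n_train=5):
    """Return (training_days, target_day) pairs for a sliding n_train-day window."""
    return [
        (list(range(target - n_train, target)), target)
        for target in range(first_doy + n_train, last_doy + 1)
    ]


windows = rolling_windows(244, 273)
print(len(windows))   # number of target days
print(windows[0])     # first window: DOY 244-248 -> 249
print(windows[-1])    # last window: DOY 268-272 -> 273
```

This yields 25 target days in total: DOY 249, used for hyperparameter selection, plus the remaining 24 days.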
Figure 4 shows the average residual reduction rates for all six combinations. It indicates that including
SNR as an additional feature does not help to improve the model performance, especially for RF and
XGB. The reason might be that the numerical precision of SNR values in the RINEX (Receiver Independent
Exchange Format) files is not high enough. The Trimble and Javad receivers involved in this study only
record SNR measurements with a precision of 0.2 dB-Hz and 0.25 dB-Hz, respectively. Such coarse SNR
increments are not precise enough to improve multipath modelling. Bilich et al. (2007) also found this
issue and reported that it was receiver model dependent. Still, adding SNR as an additional feature for
MLP can improve its performance for pseudorange multipath mitigation. Using only azimuth and
elevation angles as input features is sufficient for RF and XGB models. Residual reduction rates of 25%,
36%, 30% and 25% can be achieved for P1, P2, L1 and L2, respectively. It is worth pointing out that
although RF can conduct multivariate regression (i.e., one multipath model for four observables), no
obvious improvement can be observed compared to the results of building multipath models individually.
Since XGB with azimuth and elevation as input features can achieve the highest residual reduction rates, we
only present the results of this combination for ML-based methods in the following experiments.
Table 2
Optimal set of hyperparameters derived from grid search
Algorithm  Hyperparameter      Value
RF         n_estimators        10
RF         max_depth           30
RF         criterion           squared_error
XGB        n_estimators        40
XGB        max_depth           20
XGB        criterion           squared_error
MLP        hidden_layer_sizes  (128,128,128,128,128,128,128)
MLP        activation          relu
MLP        solver              adam
4.3 Multipath mitigation test
After picking the optimal ML algorithm and input features, we tested and compared the multipath
mitigation performance for three different methods: SF, MHM and XGB. The improvements in kinematic
relative positioning were also evaluated.
4.3.1 Multipath model
The multipath models based on the XGB method are visualized in Fig. 5 for station CUCC on DOY 251. It
can be seen that the most severe multipath errors concentrate in the low elevation areas, and there is no
obvious pattern difference with respect to azimuth. This is because there is no strong reflection source
around the station, and most reflected signals come from the surrounding ground (Fig. 2). Such an
observation environment is similar to most IGS stations and can make the conclusions of this study
generally applicable. We further plotted the multipath correction time series of SF, MHM and XGB in Fig. 6
(a) to directly compare their capability of modelling multipath. The corresponding low-pass filtered
residuals of satellite G10 are also included as the reference. It can be found that the multipath models of
SF and XGB are in good agreement with the low-pass filtered residuals for both pseudorange and carrier
phase on dual frequencies. They successfully replicate both the long- and short-term variations induced
by multipath. However, the multipath model from MHM can only capture the long-term tendency but not
the short-term changes, i.e., high-frequency components. This drawback is most obvious during the
period from 14 h to 16 h when the satellite is at low elevation and multipath changes fast. The MHM
multipath model in this period resembles a low-resolution version of SF and XGB models.
Figure 6 (b) exhibits the power spectral density (PSD) for L1 multipath models generated by the three
methods as well as the low-pass filtered residuals. The MHM model has a lower power density for
periods between 17 s and 60 s. Compared to XGB, MHM is about 7 dB lower for the high-frequency
multipath components, which corresponds to around 80.5% lower signal power. This explains the low
resolution of the MHM model in Fig. 6 (a). For lower frequencies, MHM agrees well with the other two
methods. In contrast, the PSDs of SF and XGB are in good agreement with that of the low-pass filtered
residuals from a period of 17 s to the lowest frequency. The PSD drop before 0.1 Hz of the SF model and the
low-pass filtered residuals is caused by the low-pass filter used to remove the white noise in raw residuals.
The higher noise level in the XGB model between 2 s and 17 s is most probably caused by spatial
interpolation errors, but it does not affect the multipath mitigation effect since the noise magnitude is
very small compared to multipath errors. A similar phenomenon is also observed for MHM, although it is
not expected since the MHM model averages the residuals within each grid and the noise level should be
lower. Hence, the higher noise level is possibly an artifact caused by the step signals generated by the
low-resolution MHM model as shown in Fig. 6 (a). Overall, the PSD analysis confirms that the ML-based
multipath model can achieve similar performance to SF and outperforms the MHM model due to the
advantage of spatial interpolation.
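The relation between a PSD difference in decibels and the fraction of signal power lost follows from the definition of the decibel. The short sketch below uses a hypothetical function name. Exactly 7 dB corresponds to about 80.0% lower power, and the quoted value of around 80.5% corresponds to roughly 7.1 dB, which is consistent with "about 7 dB".

```python
def power_fraction_lost(db_drop):
    """Fraction of power lost for a level that is db_drop decibels lower."""
    return 1.0 - 10.0 ** (-db_drop / 10.0)


for db in (7.0, 7.1):
    print(db, round(100.0 * power_fraction_lost(db), 1))
```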
4.3.2 Residual reduction
Figure 7 shows the posterior residuals of G04 at station CUCC on DOY 251 and those corrected using SF,
MHM and XGB methods. The multipath errors are effectively reduced by all three methods for
pseudorange and carrier phase on dual frequencies, especially for periods when the satellite is at low
elevations. The RMS of the residuals corrected with XGB is the smallest among the three methods,
reaching 0.55 m, 0.27 m, 2.25 mm and 2.52 mm for P1, P2, L1 and L2, respectively. Compared to the raw
residuals, the improvements are 26.7%, 41.3%, 36.6% and 30.6%, respectively. Here, it can be found that
the reduction rate for P1 is much smaller than for P2. This can be explained by the higher noise level of
P1 residuals. The multipath mitigation performance of SF is similar to that of XGB, and the RMS
differences between them are only 0.01 m, 0.01 m, 0.00 mm and 0.02 mm for P1, P2, L1 and L2,
respectively. In contrast, MHM can only achieve improvements of 10.6%, 26.1%, 25.1% and 25.3% for the
four observables. That is because the MHM method models multipath using 1° by 1° grids, and it cannot
capture the high-frequency multipath components. This can be seen in the L1 residual time series
between 14 h and 16 h in Fig. 7. There are still obvious fluctuations in this period after being corrected
with MHM, especially the variations near 15 h. Such fluctuations nearly disappear when XGB or SF
models are applied.
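The residual reduction rate used throughout these comparisons can be sketched as the relative decrease in RMS after correction. The exact formula is not spelled out in the text, so the definition below is an assumption consistent with how the improvements are reported, and the sample values are hypothetical.

```python
import math


def rms(values):
    """Root mean square of a sequence of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def reduction_rate(raw, corrected):
    """Relative RMS decrease in percent (assumed definition of the reduction rate)."""
    return 100.0 * (rms(raw) - rms(corrected)) / rms(raw)


# Hypothetical residuals: the correction halves every sample.
raw = [3.0, -4.0, 3.0, -4.0]
corrected = [1.5, -2.0, 1.5, -2.0]
print(reduction_rate(raw, corrected))
```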
The mean residual reduction rates using SF, MHM and XGB over all five stations and 25 days are
displayed in Fig. 8. Overall, the results are consistent with those shown in Fig. 7, i.e., XGB performs
similarly to SF and better than MHM regarding residual reduction rates. After multipath mitigation with
XGB, RMS improvements of 24.9%, 36.2%, 25.5% and 20.4% can be achieved for P1, P2, L1 and L2
residuals, respectively. The reduction is 0.1–0.7% and 2.0–2.8% larger than for SF for carrier phase and
pseudorange residuals, respectively. In contrast, the reduction rates achieved with MHM are 13.7%, 14.3%,
8.0% and 3.5% less than for XGB for P1, P2, L1 and L2, respectively, due to its deficiency in modelling
high-frequency multipath signals.
Up to now, we have always used the residuals of the latest 5 days before the target day to set up the
multipath model. Although the best correction effect can be obtained in this way, updating the model
daily is cumbersome. It would be beneficial, especially for real-time applications, if the multipath
model could be applied to subsequent days without too much precision loss. Hence, we tested the model
validity period for all five stations and compared the performance among SF, MHM and XGB. Residuals
from DOY 244–248 were used to set up the multipath model, which was then applied for multipath correction
on DOY 249 to 273. The mean residual reduction rates over five stations on each day are
plotted in Fig. 9. It can be found that the multipath correction effect of the three different methods
gradually degrades over the whole test period. XGB achieves the highest residual reduction rates for all
four observables on the 25 test days. The reduction rates using XGB drop from 25.2%, 35.6%, 25.1% and
19.4% to 9.0%, 17.7%, 16.1% and 14.4% for P1, P2, L1 and L2 residuals, respectively. This means the multipath
correction effect on DOY 273 is only roughly half of that on DOY 249. An update interval of 5–7 days seems
to be a good trade-off between model validity and the workload of data processing. In this case, the model
can still achieve 90% of the multipath correction effect of the daily updated model. SF performs similarly to
XGB on the first day, but its performance rapidly drops on the subsequent three days and then decreases at a
linear pace. That is mainly because the effectiveness of the SF model heavily depends on the accurate orbit
repeat time for each satellite on each day, and any deviation between the computed and real orbit repeat
time will impact the model performance. The MHM model performance is more stable, with the reduction
rates dropping by 5.9%, 8.7%, 6.3% and 5.6% over the test period for P1, P2, L1 and L2, respectively. It
confirms that the MHM model mainly captures the lower-frequency multipath signals, as they are more stable
over time.
4.3.3 Positioning improvement
We further applied the three different multipath models for kinematic relative positioning. The model
performance was evaluated regarding the positioning precision improvement compared to the solutions
without correction. The daily static coordinates were used as the benchmark for calculation of
positioning RMS. Note that multipath correction was not applied for static solutions as the impact of
multipath on daily static positioning could be neglected. We finally obtained 125 time series (at five
stations over 25 days) for each type of solution, i.e., raw (without correction), corrected with SF, MHM and
XGB.
Figure 10 depicts the displacements for four different solutions at station CUCC on DOY 273. The raw
solution contains many variations induced by multipath spanning from tens of seconds to half an hour,
which are evident in all three coordinate components. These variations are effectively mitigated by
applying the multipath models of SF, MHM and XGB. The RMS values of the solution corrected by XGB are 1.4
mm, 1.9 mm and 4.4 mm for east, north and up components, respectively, which are equal to those of SF.
It is interesting that the MHM model can reach comparable positioning precisions, especially considering
its disadvantage in capturing high-frequency multipath signals. The RMS values are only 0.1 mm, 0.1 mm
and 0.2 mm larger than the other two models in east, north and up components, respectively. Usually, the
high-frequency multipath occurs when a satellite is at low elevations. Such low-elevation observations
are down-weighted during data processing. This can explain the reasonable positioning precision of
MHM although it is deficient in high-frequency multipath modelling. The mean positioning precisions
over all five stations and 25 days are listed in Table 3. Again, XGB and SF achieve the highest
precisions, which are 1.6 mm, 1.9 mm and 4.5 mm for east, north and up components, respectively.
Compared to the raw solutions, the improvements are about 20.0%, 17.4% and 16.7% for the three
components. The performance of MHM is a bit worse compared to XGB and SF, but it can still reach
15.0%, 13.0% and 13.0% precision improvements for east, north and up components, respectively.
Table 3
Mean RMS of 1 Hz displacements in east, north and up components for four types of solutions over all five stations and 25 days

Method      Kinematic positioning precision (mm)
            East    North   Up
Raw         2.0     2.3     5.4
Sidereal    1.6     1.9     4.5
MHM         1.7     2.0     4.7
XGB         1.6     1.9     4.5
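The improvement percentages quoted above follow directly from the mean RMS values in Table 3. The short check below reproduces them, assuming the improvement is defined as the relative RMS reduction with respect to the raw solution; the variable and function names are illustrative.

```python
# Mean RMS values (mm) taken from Table 3.
raw = {"east": 2.0, "north": 2.3, "up": 5.4}
xgb = {"east": 1.6, "north": 1.9, "up": 4.5}
mhm = {"east": 1.7, "north": 2.0, "up": 4.7}

def improvement_percent(raw_rms, corrected_rms):
    """Relative RMS reduction with respect to the raw solution, in percent."""
    return 100.0 * (raw_rms - corrected_rms) / raw_rms

xgb_gain = {c: round(improvement_percent(raw[c], xgb[c]), 1) for c in raw}
mhm_gain = {c: round(improvement_percent(raw[c], mhm[c]), 1) for c in raw}
print(xgb_gain)
print(mhm_gain)
```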
4.4 Multipath mitigation for 30 s sampling data
In the previous section, we demonstrated the multipath mitigation performance of the XGB model using
1 Hz GPS data. However, 30 s is the more common sampling interval for most geodetic stations, such as those of the
IGS network. Hence, we further validated the XGB model using GPS data at a 30 s interval.
We reprocessed the data at a 30 s sampling interval for all five stations and used the 30 s
residuals from the 5 days before each target day to set up the SF, MHM and XGB multipath models.
The P1, P2, L1 and L2 residuals were then corrected with the corresponding models. The
daily residual reduction rates are plotted in Fig. 11 and the mean values over 25 days are given in Table 4.
The reduction rates of XGB are the highest among the three methods. The
reduction rate of XGB is almost consistently higher than those of SF and MHM, by about 1.9% and 2.6% for P1 residuals
and by 2.7% and 4.5% for P2. For carrier phase residuals, the reduction rates of XGB are about 1.0% and
2.4% higher than those of SF and MHM for L1, and 0.4% and 1.3% higher for L2. This demonstrates the superiority of
ML methods for multipath mitigation. At a 30 s sampling interval, the residuals can only be shifted by whole
epochs, so the error in applying the orbit repeat time can be up to 15 s, which degrades the SF model performance. For 30 s data,
multipath signals with periods shorter than 60 s cannot be resolved. Hence, the performance of the MHM
model is closer to that of SF and XGB. However, there are also fewer data points within each grid cell, which might
degrade the stability of the MHM model. The corresponding kinematic positioning results are listed in
Table 5. XGB achieves 0.1 mm higher precision in both the north and up components than
those of SF and MHM. Compared to the raw solutions, precision improvements of 15.0%, 13.0% and
11.3% can be achieved with the XGB model in east, north and up components, respectively.
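The 15 s and 60 s figures mentioned above can be illustrated with a small calculation. The sketch assumes that SF shifts residuals by the nearest whole number of epochs, so that the remaining misalignment is at most half a sampling interval, and that the shortest resolvable signal period is twice the sampling interval. The repeat periods listed are hypothetical values chosen for illustration, not the per-satellite values used in this study.

```python
def epoch_alignment_error(repeat_period_s, sampling_interval_s):
    """Offset left after shifting by the nearest whole number of epochs."""
    n_epochs = round(repeat_period_s / sampling_interval_s)
    return abs(repeat_period_s - n_epochs * sampling_interval_s)

# Hypothetical orbit repeat periods in seconds (illustrative only).
repeat_periods_s = [86145.0, 86150.0, 86155.0, 86160.0]

errors_30s = [epoch_alignment_error(t, 30.0) for t in repeat_periods_s]
errors_1s = [epoch_alignment_error(t, 1.0) for t in repeat_periods_s]

# Shortest signal period that 30 s sampling can resolve (Nyquist limit).
shortest_period_30s = 2 * 30.0

print(errors_30s, errors_1s, shortest_period_30s)
```

With 1 s data the misalignment vanishes for these integer-second periods, whereas with 30 s data it can reach half the sampling interval.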
Table 4
Mean residual reduction rates for 30 s data over all five stations and 25 days

Model   Residual reduction rate (%)
        P1      P2      L1      L2
SF      10.2    21.8    16.3    15.6
MHM     9.5     20.0    14.9    14.7
XGB     12.1    24.5    17.3    16.0
Table 5
Mean RMS of 30 s displacements in east, north and up components for four types of solutions over all five stations and 25 days

Method      Kinematic positioning precision (mm)
            East    North   Up
Raw         2.0     2.3     5.3
Sidereal    1.7     2.1     4.8
MHM         1.7     2.1     4.8
XGB         1.7     2.0     4.7
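The remark in this section that 30 s data leave fewer data points within each MHM grid cell can be illustrated with a minimal sketch. It assumes a grid-averaging MHM with 1° × 1° cells and a synthetic satellite track; the cell size, the track and the function name `build_mhm` are illustrative assumptions and not the configuration used in this study.

```python
import math
from collections import defaultdict

def build_mhm(samples, cell_deg=1.0):
    """Average residuals per azimuth/elevation cell; returns (means, counts)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for az, el, res in samples:
        cell = (int(az // cell_deg), int(el // cell_deg))
        sums[cell] += res
        counts[cell] += 1
    means = {cell: sums[cell] / counts[cell] for cell in sums}
    return means, dict(counts)

# Synthetic 10-minute track: (azimuth deg, elevation deg, residual m) at 1 s steps.
track_1s = [(100.0 + 0.004 * t, 30.0 + 0.005 * t,
             0.002 * math.sin(math.radians(40.0 * t)))
            for t in range(600)]
track_30s = track_1s[::30]  # the same track decimated to a 30 s interval

_, counts_1s = build_mhm(track_1s)
_, counts_30s = build_mhm(track_30s)
print(max(counts_1s.values()), max(counts_30s.values()))
```

Each cell mean is then estimated from far fewer samples in the 30 s case, which is why the stability of the grid model may suffer.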
5 Conclusion
Multipath is the main unmodelled error in high-precision GNSS data processing. It hinders the
achievement of mm-level kinematic positioning precision, which particularly affects the application of
GNSS to seismology and structural health monitoring. In this study, we proposed an ML-based
multipath mitigation method. It takes azimuth and elevation as input features and outputs multipath
corrections for pseudorange and carrier phase on both frequencies. Owing to its capability for spatial
interpolation, it overcomes the shortcoming of the conventional MHM method, i.e., its deficiency in
capturing high-frequency multipath signals. Using 30 days of 1 Hz GPS data from five baselines on the
rooftop of Curtin University, we validated the superiority of the ML-based multipath mitigation method
over conventional methods. The best ML algorithm (XGB) and the optimal input features (azimuth and
elevation angles) were selected based on the residual variance reduction rate. Adding SNR
measurements as an input feature did not improve the model performance, which might be attributed to their insufficient
numeric precision. The conventional SF and MHM models were used for comparison. We demonstrated
that the XGB model can achieve reduction rates of 24.9%, 36.2%, 25.5% and 20.4% for P1, P2, L1 and L2
residuals, respectively. This performance is similar to that of SF, but without the inconvenience of computing the
orbit repeat period for each satellite. The reduction rates of XGB are 14.0% and 5.8% higher than those of MHM for
pseudorange and carrier phase residuals, respectively. After applying the XGB model, kinematic
positioning precisions of 1.6 mm, 1.9 mm and 4.5 mm can be achieved in the east, north and up
components, respectively, which are 20.0%, 17.4% and 16.7% improvements compared to the raw
solutions. The effectiveness of the XGB model for 30 s sampling data was also evaluated and compared
to that of SF and MHM. The results confirm that the advantage of spatial interpolation still holds for low-rate
data. Residual reduction rates of 12.1%, 24.5%, 17.3% and 16.0% were reached for P1, P2, L1 and L2,
respectively, which are better than those of SF and MHM.
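To make the idea of spatial-domain regression concrete, the following is a minimal, dependency-free sketch of gradient boosting with small regression trees that maps (azimuth, elevation) to a multipath correction. It is a simplified stand-in for XGBoost (which additionally uses regularisation and second-order gradient information) and is not the implementation used in this study. The synthetic multipath function, the hyperparameters and all function names are assumptions made for illustration.

```python
import math

def build_tree(X, r, idx, depth):
    """Greedy regression tree on targets r; a leaf is a float, a node is a tuple."""
    leaf = sum(r[i] for i in idx) / len(idx)
    if depth == 0 or len(idx) < 2:
        return leaf
    total = sum(r[i] for i in idx)
    best = None
    for j in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, len(order)):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            # A larger score means a lower squared error after the split.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (len(order) - k)
            if best is None or score > best[0]:
                best = (score, j, 0.5 * (lo + hi), order, k)
    if best is None:
        return leaf
    _, j, thr, order, k = best
    return (j, thr,
            build_tree(X, r, order[:k], depth - 1),
            build_tree(X, r, order[k:], depth - 1))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node

def fit_boosted_trees(X, y, n_rounds=100, learning_rate=0.3, depth=3):
    """Plain gradient boosting on squared error: each tree fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    idx = list(range(len(y)))
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = build_tree(X, resid, idx, depth)
        trees.append(tree)
        pred = [p + learning_rate * predict_tree(tree, x) for p, x in zip(pred, X)]
    return base, learning_rate, trees

def predict(model, x):
    base, lr, trees = model
    return base + lr * sum(predict_tree(t, x) for t in trees)

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def synthetic_multipath(az, el):
    """Hypothetical multipath error (m) as a smooth function of az/el (deg)."""
    return (0.004 * math.sin(math.radians(6 * el)) * math.cos(math.radians(3 * az))
            + 0.002 * math.cos(math.radians(2 * az)))

# Training samples on a coarse azimuth/elevation grid (degrees).
X = [(float(az), float(el)) for az in range(0, 360, 15) for el in range(10, 90, 10)]
y = [synthetic_multipath(az, el) for az, el in X]

model = fit_boosted_trees(X, y)
rms_before = rms(y)
rms_after = rms([yi - predict(model, x) for yi, x in zip(y, X)])
print(f"residual RMS before: {rms_before:.5f} m, after: {rms_after:.5f} m")
```

In practice one model of this kind would be trained per station and observation type on residuals from previous days and evaluated on the target day; the sketch only reports the fit on its own training samples.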
Although we only demonstrated ML-based multipath mitigation using baseline data in this study, it is
also valid for PPP multipath modelling and mitigation according to our preliminary internal tests. Since
the ML-based model is easy to use, it is also promising for real-time applications, such as
structural health monitoring. In addition, the ML-based model can be extended to include additional input
features, such as environmental information and SNR measurements with higher numerical precision.
This might help to further improve the model, especially its long-term performance. Finally, more
sophisticated ML algorithms are worth further investigation. As demonstrated in this study, basic
tree-based ML algorithms perform well for multipath modelling. More powerful algorithms, such as deep
learning and reinforcement learning, should be examined for multipath mitigation in future
studies.
Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: All authors approved the manuscript for publication.
Availability of data and materials: The high-rate GNSS data used in this study is available at
http://saegnss2.curtin.edu.au/ldc/.
Competing interests: The authors declare no conflict of interest.
Funding: This research received no external funding.
Authors’ contributions: YP designed the study, analyzed the data and wrote the paper. GM and BS
supervised the study and revised the manuscript. All authors reviewed the manuscript and approved it for
publication.
Acknowledgements: The authors would like to thank Curtin GNSS-SPAN Group for the access to the high-
rate GNSS data and Amir Allahvirdi-Zadeh for providing the station photos.
Figures
Figure 1
Flow chart of GPS data processing and ML-based multipath mitigation. The ML model is trained using
the data from previous days and it predicts the multipath errors for the target day
Figure 2
(a) Bird's-eye view of the GPS station distribution. CUT0 is the reference station used for relative positioning.
Each of the three rover antennas is connected to two different receivers. (b) View of the observation
environment on the rooftop of Curtin University. The copyright of the photos is retained by the Curtin
GNSS-SPAN Group
Figure 3
Low-pass filtered residuals of satellite G04 at station CUCC. Pseudorange and carrier phase residuals are
shown in the upper and bottom panels, respectively. Residuals on DOY 244, 245 and 249 are denoted by
red, blue and green curves, respectively, and they are shifted by the corresponding orbit repeat periods along
the x-axis to better show the temporal correlation. Shifts along the y-axis are made to avoid overlapping.
Pearson correlation coefficients with respect to the residuals of DOY 244 are denoted above each curve
for DOY 245 and 249
Figure 4
Mean residual reduction rates for station CUCC over DOY 250-273. The reduction rates using MLP, RF and
XGB (only azimuth and elevation as input features) are denoted by blue, orange and green bars,
respectively. The shaded bars represent the corresponding models with SNR as an additional input
feature
Figure 5
Visualization of ML-based (XGB) multipath models for station CUCC on DOY 251
Figure 6
(a) Low-pass filtered residuals and different multipath models for satellite G10 at station CUCC on DOY
251. The filtered residuals and multipath models of SF, MHM and XGB are displayed in black, red, blue
and green, respectively. The individual curves are shifted vertically to avoid overlapping. (b) L1 power
spectral density (relative to 1 m²/Hz) for low-pass filtered residuals and different multipath models. The
two orange vertical lines represent the period range from 17 s to 60 s
Figure 7
Raw and multipath corrected residuals of satellite G10 at station CUCC on DOY 251. The raw residuals
and those corrected with SF, MHM and XGB are represented by black, red, blue and green curves,
respectively. The individual curves are shifted along the y-axis to avoid overlapping. The RMS value is
denoted above each curve
Figure 8
Mean residual reduction rates using different multipath mitigation methods over all five stations and 25
days
Figure 9
Comparison of validity periods for different multipath models. Data from DOY 244-248 are used to set up
the multipath models, which are later applied for multipath mitigation on DOY 249-273. The residual
reduction rate on each day is the mean value of all five stations
Figure 10
1 Hz kinematic positioning results at station CUCC on DOY 273 with multipath mitigated by different
methods. The raw positioning result and those corrected with SF, MHM and XGB are shown as black, red,
blue and green curves, respectively. RMS values of displacements are denoted above each curve. Only the
last 6 h of displacements are displayed to show the detailed positioning errors induced by multipath
Figure 11
Residual reduction rates for 30 s sampling data using different multipath mitigation methods. The
residual reduction rate on each day is the mean value of all five stations