Computer Engineering and Applications (计算机工程与应用), 2024, 60(7). www.ceaj.org
Progress of Instantiated Reality Augmentation Method for Smart Phone Indoor Scene Elements

LIU Jianhua, WANG Nan, BAI Mingchen
Mobile Geospatial Big Data Cloud Service Innovation Team, School of Geomatics and Urban Spatial Information, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

Abstract: Indoor mobile phone navigation and location services are current research hotspots, of which scene element instantiation reality augmentation methods are an important part. Instance segmentation is a challenging, fundamental task in scene element perception, and augmented reality is an effective way to apply digital twin building maps; both are of great importance in the field of indoor location navigation. At present, augmented reality technology is mainly applied to semantic enhancement of scenes, and AR enhancement for smartphone indoor navigation stops at the level of visual effects; it has not yet truly reached the level of augmenting element instances in scenes. To address this problem, this paper proposes an AR research idea of mobile phone scene element instantiation: by recognizing objects in indoor scenes and matching them against building maps, the element information stored in the building map is augmented and displayed using AR technology, thereby assisting pedestrians in indoor navigation, location services, and related applications, and improving the information level of location services such as indoor positioning and navigation. This paper provides a systematic overview of instance segmentation and augmented reality methods for smartphone-side video, analyzes the characteristics and applicable scenarios of the relevant methods, summarizes the research progress of instance segmentation and augmented reality on mobile devices, and finally discusses the application prospects of instantiated reality augmentation methods for indoor scene elements in the field of navigation and location services.

Key words: augmented reality; instance segmentation; deep learning; mobile indoor location-based navigation; building map matching

Funding: Key Project of the Beijing Higher Education Society (ZD202244); Key Project of Educational Science Research of Beijing University of Civil Engineering and Architecture (Y2111).
Author: LIU Jianhua (1981-), corresponding author, male, Ph.D., associate professor, distinguished professor of the Big Data Application Research Center of Beijing University of Civil Engineering and Architecture, master's supervisor, member of the Construction Engineering Branch of the China Association of Remote Sensing Applications; research interests: geographic information science and remote sensing application technology. E-mail: liujianhua@bucea.edu.cn
Document code: A; CLC number: TP301; doi: 10.3778/j.issn.1002-8331.2309-0376; Article ID: 1002-8331(2024)07-0058-12. Received: 2023-09-21; revised: 2024-01-05.

In the era of intelligent information, location-based services have received wide attention. In indoor environments, signal occlusion by walls, multipath effects, and other complicating factors mean that outdoor satellite navigation and positioning systems cannot provide effective service. In large, complex indoor scenes such as transport hubs, shopping malls, large hospitals, museums, and underground stations, indoor navigation and location services are particularly important. Smartphone visual positioning has attracted wide attention because it requires only simple equipment and is comparatively insensitive to environmental factors. Smartphones offer rich sensor resources and the processing capability for multi-source signal enhancement, so indoor navigation and location services carried on the phone have become a current research hotspot. In indoor scene element recognition, correctly perceiving the element information of the user's surroundings (spatial positions, attribute information, etc.), or further visually positioning the user on that basis, remains highly challenging.
Mobile augmented reality (MAR) uses the camera on a mobile device to recognize specific images, performs computing tasks such as target recognition and rendering on cloud or edge servers, and downloads the rendered imagery to the mobile client for display. Xu et al. [1] deployed a trained SSD network model on a mobile client to develop an augmented navigation system that recognizes individual-building information. Roh et al. [2] combined deep learning with AR to effectively recognize personal mobility users and provide driving assistance information. Wang [3] used computer vision methods to recognize and automatically extract point and line map elements, using AR to realize an augmented representation of 2D planar maps. However, these methods have not been applied to the indoor navigation field, nor have they truly reached the level of augmenting element instances within scenes. Research on an instantiation AR method for mobile phone indoor scene elements is therefore of practical significance for indoor navigation and location services.
An instance is an object of a class; instantiation here means obtaining the objects in a scene through instance segmentation. The scene element instantiation AR method proposed in this study first constructs map localization anchor points on top of the building map; these anchors are universal ancillary facility points inside the building, and they store the augmentation information of element instances. Element photos are then collected with the phone's rear camera, and a sample dataset is produced with the labelimg image annotation tool. The sample dataset is used to train a deep learning network model, which is lightweighted through format conversion and deployed to the mobile client for recognition; recognized objects are matched against the building map to achieve instantiated recognition of scene elements. Finally, mobile augmented reality is used to display, as augmentation, the element information (text, images, video, etc.) stored in the building map for each recognition result, thereby assisting pedestrians in indoor navigation, location services, and related applications. The intended overall effect is shown in Fig. 1. The indoor scene element instantiation method is designed to be universal: ancillary facilities common to building interiors are chosen as the recognition elements, so the method can be applied to indoor scene recognition across many building types.
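The matching step described above, recognized label against building-map anchor, can be sketched as a simple lookup. This is an illustrative sketch only: the anchor IDs, field names, and AR payloads below are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch: match a recognized scene element against building-map
# localization anchors and return the stored AR augmentation payload.
# All anchor IDs, fields, and contents are illustrative assumptions.

BUILDING_MAP_ANCHORS = {
    "fire_cabinet_3F_01": {
        "class": "fire_cabinet",
        "floor": 3,
        "position": (12.4, 8.9),           # local map coordinates (m)
        "ar_content": {"text": "Fire cabinet, corridor east wing",
                       "image": "fire_cabinet_3F_01.png"},
    },
    "exit_sign_3F_02": {
        "class": "safety_exit",
        "floor": 3,
        "position": (20.1, 8.7),
        "ar_content": {"text": "Safety exit towards stairwell B"},
    },
}

def match_detection(label, floor, approx_pos):
    """Return (anchor_id, ar_content) for the nearest anchor of the same
    class on the same floor, or None if no anchor matches."""
    best = None
    for aid, anchor in BUILDING_MAP_ANCHORS.items():
        if anchor["class"] != label or anchor["floor"] != floor:
            continue
        dx = anchor["position"][0] - approx_pos[0]
        dy = anchor["position"][1] - approx_pos[1]
        d2 = dx * dx + dy * dy
        if best is None or d2 < best[0]:
            best = (d2, aid, anchor["ar_content"])
    return None if best is None else (best[1], best[2])
```

The returned `ar_content` is what the AR layer would render next to the detected object; a production system would of course query the building-map database rather than an in-memory dictionary.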
In recent years, research on instance segmentation and AR methods for mobile phone video has attracted considerable attention, but survey literature systematically summarizing the state of research in this area is scarce. This paper reviews scene element instance segmentation and augmented reality methods for smartphone-side video from the perspective of indoor navigation and positioning applications, analyzes the characteristics and applicable scenarios of the relevant methods, summarizes the research progress of mobile video instance segmentation and augmented reality, and finally discusses the application prospects of the indoor scene element instantiation AR method in the field of navigation and location services.
1 Research Progress
1.1 Instance Segmentation
1.1.1 Overview
Instance segmentation is a challenging, fundamental task in computer vision, widely applied in medical image analysis, autonomous driving, surveillance systems, navigation and positioning [4-5], and augmented reality [6]. Deploying high-performance, low-latency object detectors on edge devices is receiving growing attention. Over the past few years, researchers have extensively studied detection networks based on convolutional neural networks (CNNs) [7-8], and many instance segmentation frameworks have been proposed. Bolya et al. [9] proposed YOLACT, a fully convolutional real-time (>30 fps) instance segmentation framework that generates a set of prototype masks in parallel and predicts per-instance mask coefficients to produce instance masks.

[Figure 1: Overall flowchart of mobile phone indoor scene element instantiation AR. Element photos (camera, fire alarm, door plate, fire cabinet, safety exit, WLAN, elevator, display board, light, electrical box, spotlight) are collected and manually annotated with an image annotation tool into a component sample dataset (original images, target labels, target positions); a scene element deep learning model is built, its parameters optimized and its format converted until the model accuracy is acceptable, yielding an indoor scene element instance recognition model on the phone; recognized instances are matched by geocoding against building map localization anchors that store element instance augmentation information, and AR augmentation renders the result on the AR building map.]

Wu et al. [10] proposed EfficientVIS, an end-to-end framework that associates and segments regions of interest (RoIs) across space and time through iterative query-video interaction. Li et al. [11] proposed YOFO, a single-stage referring video object segmentation (RVOS) framework whose meta-transfer (MT) module reconstructs the image and language features needed for VOS using only image features. Kini et al. [12] proposed a tag-based attention module that learns instance propagation in video and achieves instance mask prediction at pixel-level granularity. Zhu et al. [13] proposed the instance-as-identity (IAI) online VIS paradigm, which models the temporal information of detection and tracking in an efficient way. Ganesh et al. [14] proposed the raw feature collection and redistribution (RFCR) module, together with a truncated backbone for improved transfer learning in object detection, to raise the accuracy and efficiency of various lightweight architectures.
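The prototype-and-coefficient mask assembly used by YOLACT, as described above, can be sketched with NumPy. The shapes and random inputs below are illustrative, not the paper's configuration.

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """YOLACT-style mask assembly: linearly combine k prototype masks
    of shape (h, w, k) with per-instance coefficients of shape (n, k),
    then apply a sigmoid. Returns n instance masks of shape (n, h, w)."""
    lin = np.tensordot(coefficients, prototypes, axes=([1], [2]))  # (n, h, w)
    return 1.0 / (1.0 + np.exp(-lin))                              # sigmoid

rng = np.random.default_rng(0)
protos = rng.normal(size=(8, 8, 4))   # 4 prototype masks over an 8x8 grid
coeffs = rng.normal(size=(2, 4))      # coefficients for 2 detected instances
masks = assemble_masks(protos, coeffs)
```

Because the prototypes are shared across instances and only a small coefficient vector is predicted per detection, the assembly itself is a single matrix product, which is what makes the approach real-time friendly.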
Meanwhile, with the rise of the Transformer [15-17], the vision Transformer (ViT) [18] has, as a state-of-the-art model, gradually been applied to instance segmentation. The VisTR video instance segmentation framework proposed by Wang et al. [19] treats the VIS task as a direct end-to-end parallel sequence decoding and prediction problem, and was the first framework to apply the Transformer to video instance segmentation. Zhou et al. [20] proposed a family of fully attentional networks (FANs) to study the robustness of Transformers. The ReferFormer framework proposed by Wu et al. [21] treats language as queries; object tracking is achieved by linking the corresponding queries. Jin et al. [22] proposed ANDT, a Transformer-based method for anomaly detection in aerial videos: it treats consecutive video frames as a tube sequence, uses a Transformer encoder to learn feature representations from the sequence, and uses a decoder to predict the next frame. Li et al. [23] proposed MSFA-T, a robust Transformer-based sparse feature matching network model that exploits image semantic information and optimal-confidence features to overcome viewpoint distortion and weak texture, achieving accurate image matching for visual localization in large-view indoor scenes.
Lightweight CNNs are widely used in vision tasks, but their representations are local; ViTs can learn global representations, but the quadratic complexity of self-attention makes them demanding in computation and model size [24]. Fusing the strengths of CNNs and ViTs to build mobile vision models has therefore become a new trend. Heo et al. [25] proposed the pooling-based PiT, applying ResNet-style dimension design to ViT and markedly improving the performance of the ViT architecture. The EdgeViTs model proposed by Pan et al. [26] aggregates information through a local-global-local module that optimally integrates self-attention and convolution. The Mobile-Former model proposed by Chen et al. [27] runs a lightweight MobileNet and a Transformer in parallel, achieving bidirectional fusion of local and global features. Yang et al. [28] proposed the MOAT model, built on mobile convolution and attention.
Current instance segmentation frameworks have not been effectively combined with AR augmentation for application to indoor positioning and navigation; that is, they do not draw on the building map to semantically augment the instances recognized in video, for example by enriching the instance objects in the video with corresponding text, speech, image, and multimedia information. The chief reason is the lack of an engineered solution and supporting technology for instantiating video scene elements.
1.1.2 Method Genealogy
The proposal and application of the above instance segmentation frameworks have driven the rapid development of instance segmentation technology. As shown in Table 1, building on these frameworks, this paper further analyzes and summarizes representative methods for smartphone video instance segmentation.

Table 1 Representative methods for smartphone video instance segmentation

| Method | Year | Author | Characteristics |
| MnasNet | 2019 | Tan | (1) Casts network design as a multi-objective optimization problem that considers accuracy and real inference latency together, measuring latency by running on actual mobile devices. (2) Proposes a factorized hierarchical search space that allows layers to differ structurally while still balancing flexibility and search space size. |
| ESPNetV1 | 2018 | Mehta | Its core is the ESP module, which contains a point-wise convolution and a spatial pyramid of dilated convolutions, used respectively to reduce computational complexity and to resample the features of each effective receptive field. |
| ESPNetV2 | 2019 | Mehta | (1) A general-purpose lightweight architecture that supports both visual and sequential data, i.e., both vision and natural language processing tasks. (2) Extends ESPNet with depthwise separable dilated convolutions, achieving better accuracy with fewer parameters than ESPNet. |
| MobileNetV1 | 2017 | Howard | (1) Builds lightweight deep neural networks by replacing standard convolutions with depthwise separable convolutions. (2) Introduces two simple global hyperparameters that effectively trade off latency and accuracy. |
| MobileNetV2 | 2018 | Sandler | (1) Introduces the inverted residual structure (expand then reduce dimensions), strengthening gradient propagation and significantly reducing the memory footprint required during inference. (2) Proposes linear bottleneck layers. |
| MobileNetV3 | 2020 | Howard | (1) Uses the NetAdapt algorithm to obtain the optimal number of kernels and channels. (2) Introduces the SE channel attention structure. (3) Uses a new activation function, h-swish[x]. |
| MobileViTv1 | 2021 | Mehta | Combines the strengths of CNNs and ViTs to build a lightweight, general-purpose, mobile-friendly network for low-latency mobile vision tasks. |
| MobileViTv2 | 2022 | Mehta | (1) A separable self-attention method with linear complexity O(k). (2) Computes self-attention with element-wise operations, making it a good choice for resource-constrained devices. |
| EdgeNeXt | 2022 | Maaz | (1) A new lightweight architecture that is efficient in model size, parameters, and MADDs while delivering higher accuracy on mobile vision tasks. (2) Introduces the split depthwise transposed attention (SDTA) encoder, which effectively learns local and global representations to address the limited receptive field of CNNs without increasing the number of parameters or MADD operations. |
As shown in Fig. 2, this paper divides smartphone video instance segmentation methods into three broad classes from the perspective of the recognition model: CNN-based methods, vision-Transformer-based methods, and CNN-ViT fusion methods.
(1) MnasNet
MnasNet [29] is an automated neural architecture search method for designing mobile CNN models. Its search framework consists of three components: a recurrent neural network (RNN) based controller, a trainer that obtains model accuracy, and a phone-based inference engine that measures latency. It proposes a novel factorized hierarchical search space to achieve layer diversity throughout the network. Under mobile latency constraints, the method achieves higher accuracy and lower inference latency on ImageNet classification and COCO object detection than other advanced models (MobileNetV1/MobileNetV2).
(2) ESPNets
ESPNetV1 [30] is a fast and efficient convolutional neural network. The efficient spatial pyramid (ESP) module is its core component: it decomposes a standard convolution into a point-wise convolution and a spatial pyramid of dilated convolutions, which substantially reduces the parameters and memory of the ESP module. According to the study, the ESP module is more efficient than other convolution factorization methods (MobileNet/ShuffleNet).
ESPNetV2 [31] first replaces the computationally expensive dilated convolutions with depthwise separable dilated convolutions and uses hierarchical feature fusion to remove gridding artifacts, yielding the EESP unit; it further replaces the corresponding strided dilated convolutions with depthwise dilated convolutions and replaces additive feature fusion with concatenation, yielding the strided EESP unit. Studies show that the network delivers state-of-the-art performance across tasks such as object classification, detection, segmentation, and language modeling. However, the model lacks global interaction between pixels, and its accuracy leaves room for improvement [32].
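The benefit of the dilated-convolution pyramid in the ESP module can be made concrete with the standard receptive-field formula: a k x k convolution with dilation d has an effective receptive field of k + (k-1)(d-1), so stacking branches with growing dilation covers large fields at constant per-branch parameter cost. A small sketch:

```python
def dilated_receptive_field(k, d):
    """Effective receptive field of a single k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)

# A spatial pyramid of 3x3 dilated convolutions, as in the ESP module,
# covers growing receptive fields while each branch keeps 3x3 weights.
# The dilation rates below are illustrative, not ESPNet's exact choice.
pyramid = {d: dilated_receptive_field(3, d) for d in (1, 2, 4, 8)}
```

With dilation rates 1, 2, 4, 8 the branches see 3, 5, 9, and 17 pixels across, all with nine weights each.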
(3) MobileNets
MobileNetV1 [33] is an efficient convolutional neural network for mobile vision. Its core is the depthwise separable convolution, which decomposes the standard convolution operation into a depthwise convolution followed by a point-wise convolution, cutting the parameter count and computation dramatically (to roughly 1/8 to 1/9 of a standard convolution).
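The 1/8 to 1/9 figure follows directly from counting weights: a standard k x k convolution has k*k*c_in*c_out weights, while the depthwise-plus-pointwise pair has k*k*c_in + c_in*c_out, a ratio of 1/c_out + 1/k^2 (about 1/9 for 3x3 kernels with many output channels). The channel counts below are illustrative:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (c_in filters) plus 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 128, 256   # illustrative layer sizes
ratio = dw_separable_params(k, c_in, c_out) / conv_params(k, c_in, c_out)
# ratio = 1/c_out + 1/k**2, here about 0.115, i.e. between 1/9 and 1/8
```
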
MobileNetV2 [34] improves on MobileNetV1 by proposing the inverted residual with linear bottleneck, a structure with a ResNet-like shortcut: the low-dimensional compressed input representation is first expanded to high dimension by a projection convolution, features are extracted with a lightweight depthwise convolution, and a linear bottleneck then projects the features back to a low-dimensional compressed representation. The module also significantly reduces the memory footprint required during inference. According to the study, the architecture retains the simplicity of MobileNetV1 while improving its accuracy.
MobileNetV3 [35] addresses the resource consumption of different scenarios by separately designing MobileNetV3-Large and MobileNetV3-Small. It obtains its lightweight networks by combining two network search methods, NAS (network architecture search) and NetAdapt; improves network performance with the h-swish nonlinear activation function; and introduces the efficient semantic decoder LR-ASPP (lite reduced atrous spatial pyramid pooling) for semantic segmentation tasks. Studies show [35] that the network has clear advantages over the previous two MobileNet versions in both latency and accuracy.
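The h-swish activation mentioned above replaces the sigmoid in swish with the piecewise-linear ReLU6, making it cheap on mobile hardware. A minimal scalar sketch:

```python
def relu6(x):
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return max(0.0, min(6.0, x))

def h_swish(x):
    """Hard swish used in MobileNetV3: x * ReLU6(x + 3) / 6, a cheap
    piecewise-linear approximation of the swish activation."""
    return x * relu6(x + 3.0) / 6.0
```

For x >= 3 the function passes x through unchanged, for x <= -3 it outputs 0, and in between it interpolates smoothly enough for training while avoiding the exponential in a true sigmoid.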
(4) MobileViTs
MobileViTv1 [24] is a lightweight general-purpose ViT for mobile devices that combines the strengths of CNNs and ViTs. The MobileViT block models local and global information with fewer parameters, letting the Transformer replace the local processing of convolution with global processing. However, the multi-head self-attention (MHA) in the Transformer is the main efficiency bottleneck of MobileViT [36].
MobileViTv2 [36] builds on MobileViT with a separable self-attention method of linear complexity to address the MHA bottleneck: it replaces the computation-heavy operations in MHA (e.g., batch matrix multiplication) with element-wise operations (e.g., summation and multiplication) to compute self-attention, making it a preferred choice for resource-constrained mobile devices. According to the study, the improved model (MobileViTv2) outperforms existing lightweight and ViT-based methods on ImageNet object classification, MS-COCO object detection, and semantic segmentation, achieving SOTA performance.
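The linear-complexity idea behind separable self-attention can be sketched in NumPy: instead of an n x n attention matrix, a single learned projection produces per-token context scores, the keys are pooled into one context vector, and that vector gates the values. This is a simplified illustration of the idea, with hypothetical projection names, not the library implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def separable_self_attention(x, w_i, w_k, w_v):
    """x: (n, d) tokens. Compute per-token context scores (n,) from a
    single projection w_i, pool the keys into one context vector, and
    gate the ReLU-activated values with it. Cost is O(n * d), not
    O(n^2 * d) as in standard multi-head self-attention."""
    scores = softmax(x @ w_i)            # (n,)  context scores over tokens
    context = scores @ (x @ w_k)         # (d,)  pooled context vector
    values = np.maximum(x @ w_v, 0.0)    # (n, d) ReLU-gated values
    return values * context              # (n, d) broadcast element-wise gating

rng = np.random.default_rng(1)
n, d = 16, 8
x = rng.normal(size=(n, d))
out = separable_self_attention(x, rng.normal(size=d),
                               rng.normal(size=(d, d)),
                               rng.normal(size=(d, d)))
```

Only element-wise multiplications, sums, and thin matrix-vector products appear, which is precisely what makes the scheme friendly to resource-constrained devices.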
(5) EdgeNeXt
EdgeNeXt [32] is an efficient lightweight hybrid architecture for mobile vision applications. Starting from the need to reduce computational resource consumption for deployment on end devices, the researchers introduce the split depthwise transposed attention (SDTA) encoder, which splits the input tensor into multiple channel groups and uses depthwise convolutions together with cross-channel self-attention to implicitly enlarge the receptive field and encode multi-scale features, improving resource utilization. Comparing the EdgeNeXt variants with the corresponding MobileViT models, EdgeNeXt offers higher accuracy and lower latency at every model size [32].

[Figure 2: Classification of video instance segmentation methods on smartphones into CNN-based, vision-Transformer-based, and CNN-ViT fusion methods; methods shown include MnasNet, ESPNet V1/V2, MobileNet V1-V3, MobileViT V1/V2, EdgeNeXt, BoxeR, SepViT, EdgeFormer, Mask SSD, and SK-MobileNet.]
1.1.3 Problem Analysis
The performance of models deployed on mobile devices is generally examined in terms of generalization ability, robustness, light weight, and low latency, the basic criteria for judging the merit of a method. The lightweight network architectures above, applied to intelligent mobile devices (phones, pads) and edge computing, have all achieved good results in scene-element detection and recognition. However, problems remain in going deeper, semantically augmenting scene element recognition results, for example deeply enriching the instance objects obtained by segmentation with text, speech, and image information. This paper raises the mobile phone scene element instantiation AR method as a technical innovation problem, proposing to recognize indoor scene elements through instantiation, match them against the elements of the building-map spatial database, and use AR to display the instance recognition results as augmentation, raising the information level of indoor positioning, navigation, and related location services for pedestrians.
1.2 Augmented Reality (AR)
With the development of real-scene 3D [37], 5G [38], AI [39], and other technologies, using augmented reality to improve the service experience of indoor positioning and navigation has been widely adopted. Augmented reality (AR) [40-43] is a technology that ingeniously fuses virtual information with the real world.
Current representative mobile phone augmented reality methods fall roughly into commercial AR SDKs, SDK-based AR methods, and self-developed AR frameworks. The main commercial AR software development kits are Apple's ARKit [44-45] and Google's ARCore [46-47]. However, commercial SDKs may require modification or adaptation on different platforms, so their portability is relatively poor.
AR methods based on commercial AR SDKs are increasingly common. The indoor positioning system of Varelas et al. [48], built on ARCore, gives users a fully markerless experience but still suffers from slow position initialization and sensitivity to ambient brightness. The ARCore-based augmented reality campus navigation system designed by Lu et al. [49] can provide users with text, voice, video, and other augmented reality content. The indoor navigation application developed by Martin et al. [50], based on Google ARCore, obtains real-time information from the phone camera for navigation in the AR view. To address problems such as object matching accuracy and registration error in augmented reality systems, AR methods that fuse other technologies on top of an SDK are becoming a trend. One direction is fusing sensors for AR navigation. The ARBIN navigation system developed by Huang et al. [51] builds AR 3D models on Google ARCore and reads the gyroscope sensor; field tests show an accuracy of 3 to 5 m, but ARBIN's augmented display of the information around the positioning environment is incomplete. Zhou et al. [52] proposed a method that fuses Bluetooth Low Energy (BLE) and pedestrian dead reckoning (PDR) positioning for AR camera tracking, rendering indoor maps and semantic information into the real world. The pedestrian augmented reality navigator (PAReNt) iOS application designed by Mahapatra et al. [53] fuses information from ARKit, GNSS, and a Bluetooth-based positioning system to improve positioning accuracy. Sharin et al. [54] combined the accelerometer sensor, step-counting techniques, and AR to design GoMap, a mobile indoor positioning map application. Most methods of this class, however, are limited by sensor accuracy and sensitivity. The other direction combines AR with deep learning models for more precise detection and recognition. Li et al. [55] combined ARCore with an SSD-MobileNet model to improve 2D object detection, but the method still leaves room for development in combining instance segmentation with augmented reality. Kaul et al. [56] combined the mobile scene detection framework Apple ARKit with MobileNetv2 and 3D spatial audio to recognize detected objects in the scene and provide auditory scene descriptions for visually impaired people. Chen et al. [57] combined detection algorithms (YOLO, SSD, Mask RCNN, etc.) with ARKit and SiriKit to design object detection and tracking, speech, and AR-augmented interfaces offering voice interaction and AR visualization. Methods of this class, however, are sensitive to scene depth, illumination, and similar factors, which affect object matching accuracy. SLAM technology has also been integrated into AR system development. The on-site visual construction management system developed by Hsieh et al. [58], integrating ARKit with SLAM, achieves precise alignment of BIM models on the physical construction site, but its dependence on feature-point mapping and scanning can be affected by environmental conditions such as poor lighting or occlusion.
Self-developed AR frameworks are mostly built on Unity3d, OpenCV, or OpenGL ES. The AR mobile application developed by Verma et al. [59] with Unity3d renders AR arrow guidance during navigation, but its augmentation content is limited. Verykokou et al. [60] developed a MAR system with the Android NDK, OpenCV in C++, and OpenGL ES 2.0, combining feature-based image matching and pose estimation with fast rendering of 3D textured models for planar surface tracking and 3D textured model overlay. The ARIAS interactive advertising system developed by Wang et al. [61] with OpenCV and OpenGL displays advertising videos through gesture operations, but the system still has limitations, such as no audio playback and some difficult gestures. Tsuboki et al. [62] developed an AR virtual-space system that dynamically replaces the background based on depth estimation and motion information, but its process allocation method still needs further study.
Current smartphone indoor navigation AR augmentation stays at the level of visual effects and has not been applied at scale to the augmentation of scene elements; the chief reason is the lack of an engineered solution and supporting technology for scene element instantiation AR.
1.2.1 Method Genealogy
As mobile phone AR methods have been proposed and realized, augmented reality has been applied ever more widely to positioning, navigation, and other location services. This subsection further analyzes and summarizes representative smartphone AR methods, as shown in Table 2.
As shown in Fig. 3, this paper divides representative smartphone AR software into three broad classes from the perspective of commercial products: commercial AR SDKs, SDK-based mobile AR implementations, and self-developed mobile AR frameworks.
(1) ARKit
ARKit uses visual inertial odometry (VIO) to accurately track real-world scenes [63]. Compared with other device platforms, ARKit's VIO can fuse sensor data with CoreMotion data to provide more accurate information. ARKit already supports Unity, Unreal Engine, and SceneKit. However, the system still cannot cleanly separate ego-motion from object motion, causing serious problems during matching [64] and limiting its applicability in large-scale industry.
(2) ARCore
ARCore offers APIs for several development platforms; at present, augmented reality applications based on ARCore can be built for Android (Java), Web (JavaScript), Unreal (C++), Unity (C#), and more, giving developers ample flexibility and options [47]. Typical camera-based problems (poor lighting conditions, insufficient geometry, low environmental structural complexity, dynamics in the image, i.e., motion blur) cause drift in environment-map scaling and divergence between the real and virtual SLAM maps [64], so it is not well suited to large-scale industrial environments.
(3) Deep learning-based AR-HUD
The deep learning-based AR-HUD [2] consists of two subsystems: deep-learning-based recognition of pedestrians and personal mobility users, and AR-HUD-based visualization of driving information. The first subsystem uses deep-learning-based anomaly detection and body orientation estimation to identify the positions and movement directions of pedestrians and personal mobility users. The second subsystem visualizes driving assistance information on the AR-HUD, where pedestrians and personal mobility users can be highlighted and brought into focus. However, the volume of information the system provides may hinder driving, and the yellow and red indicators may be distracting in crowded areas [2].
(4) CollabAR
CollabAR [65] is an edge-assisted system that provides mobile AR with distortion-tolerant image recognition at imperceptible system latency. CollabAR comprises three components, a distortion-tolerant image recognizer, a correlated-image lookup module, and an auxiliary multi-view integrator, deployed on edge servers to ease the trade-off between recognition accuracy and end-to-end system latency. The client also runs an anchor-based pose estimation module that uses the cloud anchors provided by Google ARCore to track the position and orientation of the mobile device. Because a non-negligible gap exists between artificial and real-world distortions, recognition performance may suffer, and the heterogeneity of the underlying hardware and signal-processing pipelines may cause feature mismatches between images captured by different devices [65].
(5) EdgeXAR
EdgeXAR [66] is a mobile AR framework that exploits edge computing, offloading tasks to support flexible camera-based AR interaction. It designs a hybrid tracking system for mobile devices that provides lightweight tracking with six degrees of freedom and hides offloading latency from the user's perception, adopts a practical and reliable communication mechanism for fast, consistent delivery of key information, and proposes a multi-target image retrieval pipeline that performs fast and accurate image recognition on cloud and edge servers. However, the system is not robust to occlusion, its optical-flow-based tracking still needs improvement, and its image segmentation is unstable against complex backgrounds [66].
Table 2 Representative implementation methods for smartphone AR

| Method | Year | Author | Strengths | Limitations | Applicable scenarios |
| ARKit 6 | 2022 | Apple | iOS devices only; key capabilities include motion tracking, light estimation, scene understanding, and rendering | Cannot cleanly separate ego-motion from object motion, leading to unstable matching performance | Indoor, outdoor, and other varied scenes |
| ARCore 1.7 | 2022 | Google | Supports AR application development on Android devices; key capabilities include motion tracking, environment understanding, light estimation, and image recognition | (1) Drift in environment-map scaling; (2) divergence of real and virtual SLAM maps | Indoor, outdoor, and other varied scenes |
| CollabAR | 2020 | Guohao Lan | (1) Distortion-tolerant image recognizer addressing the domain adaptation problem caused by image distortion; (2) collaborative multi-view image recognition | (1) Poor recognition performance in "in-the-wild" AR scenes; (2) device heterogeneity may cause feature mismatches between images captured by different devices | Multi-distortion scenes |
| EdgeXAR | 2021 | Wenxiao Zhang | (1) Hybrid tracking system for mobile devices; (2) multi-target image recognition pipeline; (3) fine-grained edge offloading pipeline | (1) Insufficient latency compensation in extreme use cases; (2) optical-flow-based tracking not robust to occlusion; (3) unstable image segmentation against complex backgrounds | Cinemas and related scenes |
| Deep learning-based AR-HUD | 2023 | Dong Hyeon Roh | Combines deep learning with AR to provide driver assistance information on the AR-HUD | Excessive information may impair the driver's ability to drive | Car driving scenes |
[Figure 3: Classification of representative methods for smartphone AR into commercial AR SDKs, SDK-based mobile AR methods, and self-developed mobile AR frameworks; methods shown include ARKit, ARCore, Wikitude, CollabAR, EdgeXAR, ARBIN, ARCore-Based CNS, MIRAR, ModAR, ARIANNA+, and ANSVIP.]
1.2.2 Problem Analysis
At present, the design of AR augmentation methods and systems for indoor navigation remains at the level of visual effects, such as the augmented display of landmarks, arrows, and similar elements; the problem of multi-source information augmentation of instantiated objects in indoor scenes has not been thoroughly solved. Object matching accuracy and registration error also remain the main problems limiting the application of current augmented reality systems. The mobile phone instantiation AR method proposed in this paper will effectively combine instance segmentation with AR: indoor scene elements are first recognized through instantiation, then the results of instance recognition and map matching are augmented with AR, raising the application level of indoor positioning, navigation, and other location services.
1.3 Map Matching
1.3.1 Building Map
Buildings are the main scene of indoor location service applications, and the building map is the foundation and information carrier of those applications [67]. In recent years various new digital maps have emerged; among them, real-scene maps and pan-information location maps are widely discussed by researchers for their good visual effect and multidimensional data organization. Academician Li Deren [68] proposed digital measurable imagery and its integration with 4D products as a new digital surveying and mapping product, advancing on-demand spatial information services. Academician Zhou Chenghu et al. [69] regard the pan-information location map as a location-based digital map that comprehensively reflects the location itself and the various features, events, or things related to it. Lv Guonian et al. [70] proposed a six-element expression model of geographic information covering spatial position, semantic description, attribute characteristics, geometric form, evolutionary process, and inter-element relationships. Zhu et al. [71] developed a model-driven method that fully converts industry foundation classes (IFC) data into a labeled property graph (IFC-graph) to support effective and efficient building information access and query, attempting to address the challenges of building information extraction.
Building maps provide the data and information foundation for much research; as an important means of expressing smart-city spatial elements, they play vital roles in urban construction, intelligent management, emergency management, and indoor location services. IFC is the most discussed model data specification in the building industry's BIM field; it contains a large amount of the geometric and semantic information needed to construct building maps and is an ideal data source for generating them [72]. The most representative data specifications in the GIS field are CityGML and IndoorGML. CityGML mainly reflects the generic semantic information of 3D city objects [73] and defines five distinct levels of detail (LoDs); the LoD classification has been further extended and refined for indoor building environments [74]. IndoorGML is a data specification oriented to indoor positioning and navigation applications, focusing on the representation and exchange of 2D and 3D indoor environment data [75], especially spatial topological relationships. Luo Jingyan et al. [76] proposed BIMPN, a hybrid building map model composed of an entity model and a network model. The building map entity model presents the building in polygon form, intuitively expressing the building's spatial information; it also improves the visual expression of indoor location-based service applications and gives users a better roaming experience in 3D indoor building scenes [77]. The building map network model is a further abstract expression, on a graph structure, of elements such as the road network and their spatial topological relationships [78]. By interconnecting the two models, the BIMPN hybrid model not only integrates their respective strengths but can also provide more convenient geometric and semantic constraint information for indoor positioning and navigation applications. In addition, the building map can not only visualize indoor maps and positioning results but also assist with map matching, mobile phone indoor scene element instantiation reality augmentation, navigation path planning, and related indoor location service applications.
In general, the spatial and visual positioning constraint information for indoor scene element instantiation AR augmentation comes from building map localization anchors (MLA) [79], which mainly cover 11 classes of universal elements: doors, door plates, fire cabinets, fire alarms, safety exits, cameras, WLAN access points, electrical boxes, elevators, display boards, and lights. On the basis of the building map [80], this study further uses the proposed map localization anchors to organize and express the geometric and semantic information needed by indoor positioning and navigation location services, assisting the constrained matching of indoor positioning results and the distance computation of indoor navigation paths. The classification of instance element information in different scenes and the data content of AR augmentation are also analyzed in depth; AR augmentation data are stored as text, images, audio, and video to construct the AR building map. The AR building map construction method is shown in Fig. 4.
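The network model's role in navigation path distance computation can be sketched as shortest-path search over a graph of corridor nodes and anchors. The node names and edge lengths below are hypothetical illustrations:

```python
import heapq

# Hypothetical building-map network model: adjacency list with edge
# lengths in metres between corridor nodes / map localization anchors.
NETWORK = {
    "elevator_3F":   {"corridor_3F_A": 5.0},
    "corridor_3F_A": {"elevator_3F": 5.0, "corridor_3F_B": 8.0},
    "corridor_3F_B": {"corridor_3F_A": 8.0, "exit_3F": 4.0},
    "exit_3F":       {"corridor_3F_B": 4.0},
}

def path_distance(graph, start, goal):
    """Dijkstra shortest-path distance over the network model."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")
```

Here the walking distance from the elevator to the exit is 5 + 8 + 4 = 17 m; the same graph also supplies the topological constraints used for matching positioning results.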
[Figure 4: AR building map construction method. Multi-source data acquisition (BIM models, smartphone close-range photogrammetry, UAV nap-of-the-object photogrammetry) supplies an entity model (geometric information, texture information, semantic attributes), a network model (node information, topological relationships), and map localization anchors (MLA) carrying augmentation information (text, audio, video, images); a building map engine then visualizes the AR building map.]

1.3.2 Map Matching
Map matching is the process of matching positioning results to the corresponding planned path on the map, and it is a prerequisite for indoor navigation applications [81]. Existing map-matching algorithms fall roughly into two classes: traditional model-based methods and learning-based methods [82].
Traditional model-based methods mainly include geometric analysis [83], Kalman filtering [84], and hidden Markov models (HMMs) [85]. For HMM-based map matching in dense road networks, Cui et al. [86] proposed a segment-based hidden Markov model (SHMM): the GNSS trajectory is divided into several sub-trajectories, candidate road segment sequences are searched for each GNSS sub-trajectory, and an HMM then matches each sub-trajectory against the segment sequences, identifying the sequence with the highest probability; the method still needs to consider more factors in its feature design. Harder et al. [87] developed a real-time map-matching method that uses a backtracking particle filter, reducing the complexity of spatial queries and offering flexibility in the use of different types of spatial constraints, together with a map-based optimization that uses inter-floor transition areas. Guo et al. [88] proposed a hybrid indoor real-time positioning method using virtual wireless devices on smartphones, constrained by pedestrian accessibility and floor maps.
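The HMM idea described above, candidate road segments as hidden states and position fixes as observations, can be sketched with a toy Viterbi decoder. The distances, Gaussian emission model, and transition penalty below are illustrative assumptions, not the SHMM's actual parameterization:

```python
import numpy as np

def viterbi_map_match(emission_dist, transition_penalty=1.0):
    """Toy HMM map matching. emission_dist[t, s] is the distance from
    observation t to candidate segment s; emissions favour near segments,
    transitions penalise switching segments. Returns the most likely
    segment index for each observation."""
    T, S = emission_dist.shape
    log_em = -0.5 * emission_dist ** 2             # Gaussian log-likelihood (up to constants)
    trans = -transition_penalty * (1 - np.eye(S))  # penalty for changing segment
    score = log_em[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans              # (prev_state, cur_state)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_em[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                  # backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Three fixes near segment 0, then two near segment 1:
dists = np.array([[0.1, 3.0], [0.2, 2.5], [0.3, 2.0], [2.8, 0.2], [3.0, 0.1]])
matched = viterbi_map_match(dists)
```

The transition penalty is what distinguishes this from independently snapping each fix to its nearest segment: a single noisy fix near the wrong segment will not cause a spurious switch unless the evidence outweighs the penalty.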
With the development of deep learning, research has increasingly approached map matching from a data-driven perspective. Based on the seq2seq learning framework, Feng et al. [89] proposed the DeepMM algorithm for matching sparse and noisy trajectories, designing two trajectory augmentation methods to enrich the data and improve the model's map-matching performance. Hong et al. [90] used a multi-head self-attention (MHSA) neural network that integrates location features, temporal features, and functional land-use context for next-location prediction. Hong et al. [91] proposed a Transformer-decoder-based neural network to predict a person's next visited location from historical locations, time, and travel mode, behavioral dimensions often ignored in previous work. Jiang et al. [82] proposed L2MM, a deep-learning-based map-matching model that uses multiple deep models to learn the mapping function from trajectories to corresponding paths, though problems remain in the topological continuity of the mapped paths. Li et al. [92] proposed a coarse-to-fine fast localization framework for dark environments through deep learning and keypoint-based geometric arrangement.
1.3.3 Problem Analysis
The building map is the foundation of indoor location service applications, and the augmented information (text, speech, image, video, etc.) generally needs to be restructured according to scene type (transport, hospital, exhibition hall, supermarket, etc.). At present, producing building-map data with LiDAR and indoor real-scene 3D modeling is costly, and the most pressing issue is how to raise the production level of building maps by drawing on the informatization of the construction industry. In addition, the way building maps organize thematic elements into layers and classes directly affects map matching, as well as the efficient scheduling, loading, and personalized configuration of resources along routes planned toward user points of interest; this is also a challenge on the way to practical mobile phone indoor scene instantiation reality augmentation.
2 Conclusions and Outlook
At present, instance segmentation and AR methods are attracting ever more attention, and existing methods have achieved good recognition and augmentation results. This paper has reviewed scene element instance segmentation and augmented reality methods on smartphones, analyzed the research progress of smartphone video instance segmentation and augmented reality, summarized the characteristics and open problems of the relevant methods, and proposed the research idea of mobile phone scene element instantiation AR: recognize indoor scene elements through instantiation, match them against the building map, and display the instance recognition results as AR augmentation, raising the information level of indoor positioning, navigation, and other location services.
In summary, instance segmentation and AR technology are in a period of rapid development, and the instantiation AR method has broad application prospects in indoor navigation. Instantiation AR belongs to the visual domain and relies mainly on the phone's built-in camera as the sensor, so instance recognition accuracy and augmentation effects are easily affected by scene lighting, phone performance, camera precision, and similar factors; training and optimizing models with universal indoor element samples can improve the flexibility of instantiation AR across scene types. It is foreseeable that continued optimization of algorithms and hardware will raise the processing efficiency of instantiation AR. For mis-recognition and missed recognition in scene recognition, future work will consider adding building road-network trajectory constraints to improve the robustness of indoor scene recognition. Current visual recognition and localization methods still require large image sample libraries; moreover, given the limitations of the various indoor positioning technologies, one can try fusing the phone's built-in multi-source sensors, designing efficient cooperative scheduling schemes, and establishing multi-level coupled associations between different indoor scenes and the feature patterns of the phone's multi-source sensors to further improve the universality of positioning applications. In the future, smartphone instantiation AR methods are expected to be applied at scale to the augmentation of indoor scene element instances, realizing the engineering application of solutions and supporting technologies for video scene element instantiation.
References:
[1] XU S T, ZHENG X W, XIE X, et al. Real-time building instance recognition for vector map and real scene fusion[J]. Geomatics and Information Science of Wuhan University, 2023, 48(4): 542-549.
[2] ROH D, LEE J. Augmented reality-based navigation using deep
learning-based pedestrian and personal mobility user recognitiona
comparative evaluation for driving assistance[J]. IEEE Access,
2023, 11: 62200-62211.
[3] WANG Z. An AR map virtual-real fusion method based on
element recognition[J]. ISPRS International Journal of Geo-
Information, 2023, 12: 126.
[4] CHEN R Z, WANG L, LI D R, et al. A survey on the fusion of the navigation and the remote sensing techniques[J]. Acta Geodaetica et Cartographica Sinica, 2019, 48(12): 1507-1522.
[5] CHEN R Z, QIAN L, NIU X G, et al. Fusing acoustic ranges and inertial sensors using a data and model dual-driven approach[J]. Acta Geodaetica et Cartographica Sinica, 2022, 51(7): 1160-1171.
[6] GAO X, AN H, CHEN W, et al. A survey on mobile augmented reality visualization[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(1): 1-8.
[7] CHEN L Y, LI S B, BAI Q, et al. Review of image classifica-
tion algorithms based on convolutional neural networks[J].
Remote Sensing, 2021, 13(22): 4712.
[8] XU Y S, ZHANG H Z. Convergence of deep convolutional
neural networks[J]. Neural Networks, 2022, 153: 553-563.
[9] BOLYA D, ZHOU C, XIAO F, et al. YOLACT++: better real-
time instance segmentation[J]. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 2022, 44(2): 1108-1121.
[10] WU J, YARRAM S, LIANG H, et al. Efficient video instance
segmentation via tracklet query and proposal[C]//Proceed-
ings of the 2022 IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR), 2022: 949-958.
[11] LI D, LI R, WANG L, et al. You only infer once: cross-modal
meta- transfer for referring video object segmentation[C]//
Proceedings of the AAAI Conference on Artificial Intelli-
gence, 2022: 1297-1305.
[12] KINI J, SHAH M. Tag- based attention guided bottom-up
approach for video instance segmentation[C]//Proceedings
of the 2022 26th International Conference on Pattern Recog-
nition (ICPR), 2022: 3536-3542.
[13] ZHU F, YANG Z, YU X, et al. Instance as identity: a generic
online paradigm for video instance segmentation[J]. arXiv:
2208.03079, 2022.
[14] GANESH P, CHEN Y, YANG Y, et al. YOLO-ReT: towards
high accuracy real-time object detection on edge GPUs[C]//
Proceedings of the 2022 IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV), 2021: 1311-1321.
[15] SHIN A, ISHII M, NARIHIRA T. Perspectives and prospects
on Transformer architecture for cross-modal tasks with lan-
guage and vision[J]. International Journal of Computer Vision,
2022, 130(2): 435-454.
[16] YAO H Y, WAN W G, LI X. End-to-end pedestrian trajectory
forecasting with Transformer network[J]. ISPRS International
Journal of Geo-Information, 2022, 11(1): 44.
[17] TIAN Y L, WANG Y T, WANG J G, et al. Key problems and progress of vision Transformers: the state of the art and prospects[J]. Acta Automatica Sinica, 2022, 48(4): 957-979.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention
is all you need[C]//Proceedings of the 31st International
Conference on Neural Information Processing Systems, Long
Beach, California, USA, 2017: 6000-6010.
[19] WANG Y, XU Z, WANG X, et al. End-to-end video instance
segmentation with Transformers[C]//Proceedings of the 2021
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2020: 8737-8746.
[20] ZHOU D, YU Z, XIE E, et al. Understanding the robustness
in vision Transformers[J]. arXiv:2204.12451, 2022.
[21] WU J, JIANG Y, SUN P, et al. Language as queries for refer-
ring video object segmentation[C]//Proceedings of the 2022
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2022: 4964-4974.
[22] JIN P, MOU L, XIA G S, et al. Anomaly detection in aerial
videos with transformers[J]. IEEE Transactions on Geosci-
ence and Remote Sensing, 2022, 60: 1-13.
[23] LI N, TU W, AI H. A sparse feature matching model using a
Transformer towards large-view indoor visual localization[J].
Wireless Communications and Mobile Computing, 2022:
1243041.
[24] MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision Transformer[J]. arXiv:2110.02178, 2021.
[25] HEO B, YUN S, HAN D, et al. Rethinking spatial dimen-
sions of vision Transformers[C]//Proceedings of the 2021
IEEE/CVF International Conference on Computer Vision
(ICCV), 2021: 11916-11925.
[26] PAN J, BULAT A, TAN F, et al. EdgeViTs: competing light-
weight CNNs on mobile devices with vision Transformers
[C]//Proceedings of the European Conference on Computer
Vision, 2022.
[27] CHEN Y, DAI X, CHEN D, et al. Mobile-former: bridging
MobileNet and Transformer[C]//Proceedings of the 2022
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2022: 5260-5269.
[28] YANG C, QIAO S, YU Q, et al. MOAT: alternating mobile
convolution and attention brings strong vision models[C]//
Proceedings of the International Conference on Learning
Representations, 2023.
[29] TAN M, CHEN B, PANG R, et al. MnasNet: platform-aware
neural architecture search for mobile[C]//Proceedings of the
2019 IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR), 2018: 2815-2823.
[30] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: effi-
cient spatial pyramid of dilated convolutions for semantic
segmentation[C]//Proceedings of the European Conference on
Computer Vision (ECCV 2018), Cham, 2018: 561-580.
[31] MEHTA S, RASTEGARI M, SHAPIRO L G, et al. ESPNetv2:
a light-weight, power efficient, and general purpose convo-
lutional neural network[C]//Proceedings of the 2019 IEEE/
CVF Conference on Computer Vision and Pattern Recogni-
tion (CVPR), 2018: 9182-9192.
[32] MAAZ M, SHAKER A M, CHOLAKKAL H, et al. EdgeNeXt:
efficiently amalgamated CNN-Transformer architecture for
mobile vision applications[C]//Proceedings of the European
Conference on Computer Vision, 2022.
[33] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: effi-
cient convolutional neural networks for mobile vision appli-
cations[J]. arXiv:1704.04861, 2017.
[34] SANDLER M, HOWARD A G, ZHU M, et al. MobileNetV2:
inverted residuals and linear bottlenecks[C]//Proceedings of
the 2018 IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2018: 4510-4520.
[35] HOWARD A G, SANDLER M, CHU G, et al. Searching for
MobileNetV3[C]//Proceedings of the 2019 IEEE/CVF Inter-
national Conference on Computer Vision (ICCV), 2019:
1314-1324.
[36] MEHTA S, RASTEGARI M. Separable self- attention for
mobile vision Transformers[J]. arXiv:2206.02680, 2022.
[37] MA L. Application of AR in 3D model[C]//Proceedings of the
2021 2nd International Conference on Control, Robotics
and Intelligent System, Qingdao, China, 2021: 261-265.
[38] ZHOU Y, SUN B, QI Y, et al. Mobile AR/VR in 5G based
on convergence of communication and computing[J]. Tele-
communications Science, 2018, 34(8): 19-33.
[39] LI R P, ZHAO Z F, ZHOU X, et al. Intelligent 5G: when
cellular networks meet artificial intelligence[J]. IEEE Wire-
less Communications, 2017, 24(5): 175-183.
[40] GHASEMI Y, JEONG H, CHOI S H, et al. Deep learning-
based object detection in augmented reality: a systematic
review[J]. Computers in Industry, 2022, 139: 103661.
[41] HWANG S, LEE J, KANG S. Enabling product recognition
and tracking based on text detection for mobile augmented
reality[J]. IEEE Access, 2022, 10: 98769-98782.
[42] ZHOU B, GUVEN S. Fine-grained visual recognition in mobile
augmented reality for technical support[J]. IEEE Transac-
tions on Visualization and Computer Graphics, 2020, 26(12):
3514-3523.
[43] WANG W, WANG Z Q, ZHAO J J, et al. Research of augmented reality based on mobile platform[J]. Computer Science, 2015, 42(Z11): 510-519.
[44] LE H, NGUYEN M, YAN W Q, et al. Augmented reality
and machine learning incorporation using YOLOv3 and
ARKit[J]. Applied Sciences-Basel, 2021, 11(13): 6006.
[45] LO VALVO A, CROCE D, GARLISI D, et al. A navigation
and augmented reality system for visually impaired people
[J]. Sensors, 2021, 21(9): 3061.
[46] REAL S, ARAUJO A. VES: a mixed-reality system to assist
multisensory spatial perception and cognition for blind and
visually impaired people[J]. Applied Sciences- Basel, 2020,
10(2): 523.
[47] ZHANG X C, YAO X Y, ZHU Y, et al. An ARCore based
user centric assistive navigation system for visually impaired
people[J]. Applied Sciences-Basel, 2019, 9(5): 989.
[48] VARELAS T, PENTEFOUNDAS A, GEORGIADIS C, et al.
An AR indoor positioning system based on anchors[J].
MATTER: International Journal of Science and Technology,
2020, 6: 43-57.
[49] LU F, ZHOU H, GUO L, et al. An ARCore-based augmented
reality campus navigation system[J]. Applied Sciences, 2021,
11(16): 7515.
[50] MARTIN A, CHERIYAN J, GANESH J J, et al. Indoor nav-
igation using augmented reality[J]. EAI Endorsed Transac-
tions on Creative Technologies, 2021: 168718.
[51] HUANG B C, HSU J, CHU E T, et al. ARBIN: augmented
reality based indoor navigation system[J]. Sensors (Basel,
Switzerland), 2020, 20(20): 5890.
[52] ZHOU B, GU Z, MA W, et al. Integrated BLE and PDR
indoor localization for geo- visualization mobile augmented
reality[C]//Proceedings of the 2020 16th International Con-
ference on Control, Automation, Robotics and Vision (ICARCV),
2020: 1347-1353.
[53] MAHAPATRA T, TSIAMITROS N, ROHR A, et al. Pedes-
trian augmented reality navigator[J]. Sensors, 2023, 23: 1816.
[54] SHARIN N A, NOROWI N, ABDULLAH L, et al. GoMap:
combining step counting technique with augmented reality
for a mobile- based indoor map locator[J]. Indonesian Jour-
nal of Electrical Engineering and Computer Science, 2023,
29: 1792.
[55] LI X, TIAN Y, ZHANG F, et al. Object detection in the con-
text of mobile augmented reality[C]//Proceedings of the
2020 IEEE International Symposium on Mixed and Aug-
mented Reality (ISMAR), 2020: 156-163.
[56] KAUL O, BEHRENS K, ROHS M. Mobile recognition and
tracking of objects in the environment through augmented
reality and 3D audio cues for people with visual impair-
ments[C]//Proceedings of the CHI Conference on Human
Factors in Computing Systems, 2021: 1-7.
[57] CHEN J, ZHU Z. Real- time 3D object detection, recogni-
tion and presentation using a mobile device for assistive
navigation[J]. SN Computer Science, 2023, 4: 543.
[58] HSIEH C C, CHEN H M, WANG S K. On-site visual con-
struction management system based on the integration of
SLAM-based AR and BIM on a handheld device[J]. KSCE
Journal of Civil Engineering, 2023, 27: 4688-4707.
[59] VERMA P, AGRAWAL K, SARASVATHI V. Indoor naviga-
tion using augmented reality[C]//Proceedings of the 2020
4th International Conference on Virtual and Augmented
Reality Simulations, 2020.
[60] VERYKOKOU S, BOUTSIA M, IOANNIDIS C. Mobile aug-
mented reality for low-end devices based on planar surface
recognition and optimized vertex data rendering[J]. Applied
Sciences, 2021, 11(18): 8750.
[61] WANG Q, XIE Z. ARIAS: an AR-based interactive advertis-
ing system[J]. PLoS One, 2023, 18: e0285838.
[62] TSUBOKI Y, KAWAKAMI T, MATSUMOTO S, et al. A real-
time background replacement system based on estimated
depth for AR applications[J]. Journal of Information Pro-
cessing, 2023, 31: 758-765.
[63] BARUCH G, CHEN Z, DEHGHAN A, et al. ARKitScenes: a diverse real-world dataset for 3D indoor scene understanding using mobile RGB-D data[J]. arXiv:2111.08897, 2021.
[64] FEIGL T, PORADA A, STEINER S, et al. Localization limi-
tations of ARCore, ARKit, and hololens in dynamic large-
scale industry environments[C]//Proceedings of the 15th
International Conference on Computer Graphics Theory and
Applications, 2020: 307-318.
[65] LIU Z, LAN G, STOJKOVIC J, et al. CollabAR: edge-assisted
collaborative image recognition for mobile augmented reality
[C]//Proceedings of the 2020 19th ACM/IEEE International
Conference on Information Processing in Sensor Networks
(IPSN), 2020: 301-312.
[66] ZHANG W, LIN S, BIJARBOONEH F H, et al. EdgeXAR:
a 6-DoF camera multi-target interaction framework for MAR
with user-friendly latency compensation[C]//Proceedings of
the ACM on Human-Computer Interaction, 2021: 1-24.
[67] XIAO Y, AI T, YANG M, et al. A multi-scale representation
of point-of-interest (POI) features in indoor map visualization[J]. International Journal of Geo-Information, 2020, 9(4): 239.
[68] 李德仁. 论可量测实景影像的概念与应用——从4D产品到5D产品[J]. 测绘科学, 2007, 32(4): 5-7.
LI D R. On concept and application of digital measurable
images-from 4D production to 5D production[J]. Science of
Surveying and Mapping, 2007, 32(4): 5-7.
[69] 朱欣焰,周成虎,呙维,等.全息位置地图概念内涵及其关
键技术初探[J]. 武汉大学学报(信息科学版), 2015, 40(3):
285-295.
ZHU X Y, ZHOU C H, GUO W, et al. Preliminary study on
conception and key technologies of the location-based pan-
information map[J]. Geomatics and Information Science of
Wuhan University, 2015, 40(3): 285-295.
[70] 闾国年,袁林旺,俞肇元.地理学视角下测绘地理信息再
透视[J]. 测绘学报, 2017, 46(10): 1549-1556.
LV G N, YUAN L W, YU Z Y. Surveying and mapping geo-
graphical information from the perspective of geography[J].
Acta Geodaetica et Cartographica Sinica, 2017, 46(10):
1549-1556.
[71] ZHU J, WU P, LEI X. IFC-graph for facilitating building infor-
mation access and query[J]. Automation in Construction,
2023, 148: 104778.
[72] LIU L, LI B, ZLATANOVA S, et al. Indoor navigation sup-
ported by the industry foundation classes (IFC): a survey
[J]. Automation in Construction, 2021, 121: 103436.
[73] WEI Z, LI X, HE Z. Semantic urban vegetation modelling
based on an extended CityGML description[C]//Proceedings
of the 2022 Digital Landscape Architecture Conference,
2022.
[74] TANG L, YING S, LI L, et al. An application-driven LOD
modeling paradigm for 3D building models[J]. ISPRS Journal
of Photogrammetry and Remote Sensing, 2020, 161: 194-207.
[75] DIAKITÉ A, DÍAZ-VILARIÑO L, BILJECKI F, et al.
IFC2INDOORGML: an open-source tool for generating
IndoorGML from IFC[J]. The International Archives of the
Photogrammetry, Remote Sensing and Spatial Information
Sciences, 2022: 295-301.
[76] 罗竟妍. 建筑物实景全息地图模型构建方法研究[D]. 北京: 北京建筑大学, 2021.
LUO J Y. Research on the construction method of realistic
holographic map model of building[D]. Beijing: Beijing Uni-
versity of Civil Engineering and Architecture, 2021.
[77] WANG Q, YE L, YUN L, et al. Pedestrian walking distance
estimation based on smartphone mode recognition[J]. Remote
Sensing, 2019, 11: 1140.
[78] WU Y, CHEN P, GU F, et al. HTrack: an efficient heading-
aided map matching for indoor localization and tracking[J].
IEEE Sensors Journal, 2019, 19(8): 3100-3110.
[79] JIANHUA L, GUOQIANG F, JINGYAN L, et al. Mobile phone
indoor scene features recognition localization method based
on semantic constraint of building map location anchor[J].
Open Geosciences, 2022, 14: 1268-1289.
[80] 刘建华.手机室内导航与位置服务[M]. ResearchGate,
2022: 69-79.
LIU J H. Mobile indoor navigation and location services
[M]. ResearchGate, 2022: 69-79.
[81] 于娟,杨琼,鲁剑锋,.高级地图匹配算法:研究现状和
趋势[J]. 电子学报, 2021, 49(9): 1818-1829.
YU J, YANG Q, LU J F, et al. Advanced map matching algo-
rithms: a survey and trends[J]. Acta Electronica Sinica, 2021,
49(9): 1818-1829.
[82] JIANG L, CHEN C, CHEN C. L2MM: learning to map match-
ing with deep models for low- quality GPS trajectory data
[J]. ACM Transactions on Knowledge Discovery from Data,
2022, 17: 1-25.
[83] 郑诗晨,盛业华,吕海洋.基于粒子滤波的行车轨迹路网
匹配方法[J]. 地球信息科学学报, 2020, 22(11): 2109-2117.
ZHENG S C, SHENG Y H, LV H Y. Vehicle trajectory-map
matching based on particle filter[J]. Journal of Geo-information
Science, 2020, 22(11): 2109-2117.
[84] OBRADOVIC D, LENZ H, SCHUPFNER M. Fusion of map
and sensor data in a modern car navigation system[J]. Jour-
nal of VLSI Signal Processing Systems for Signal Image &
Video Technology, 2006, 45(1/2): 111-122.
[85] 毛江云,吴昊,孙未未 .路网空间下基于马尔可夫决策过
程的异常车辆轨迹检测算法[J]. 计算机学报, 2018, 41(8):
1928-1942.
MAO J Y, WU H, SUN W W. Vehicle trajectory anomaly
detection in road network via Markov decision process[J].
Chinese Journal of Computers, 2018, 41(8): 1928-1942.
[86] CUI G, BIAN W, WANG X. Hidden Markov map matching
based on trajectory segmentation with heading homogeneity
[J]. GeoInformatica, 2021, 25(1): 179-206.
[87] HARDER D, SHOUSHTARI H, STERNBERG H. Real-time
map matching with a backtracking particle filter using geospatial analysis[J]. Sensors, 2022, 22(9): 3289.
[88] GUO G, YAN K, LIU Z, et al. Virtual wireless device-
constrained robust extended Kalman filters for smartphone
positioning in indoor corridor environment[J]. IEEE Sen-
sors Journal, 2023, 23(3): 2815-2822.
[89] FENG J, LI Y, ZHAO K, et al. DeepMM: deep learning based
map matching with data augmentation[J]. IEEE Transac-
tions on Mobile Computing, 2022, 21(7): 2372-2384.
[90] HONG Y, ZHANG Y, SCHINDLER K, et al. Context-aware
multi-head self-attentional neural network model for next location prediction[J]. arXiv:2212.01953, 2022.
[91] HONG Y, MARTIN H, RAUBAL M. How do you go where?
Improving next location prediction by learning travel mode
information using transformers[C]//Proceedings of the 30th
International Conference on Advances in Geographic Infor-
mation Systems, 2022: 1-10.
[92] LI Q, CAO R, ZHU J, et al. Learn then match: a fast coarse-
to-fine depth image-based indoor localization framework for
dark environments via deep learning and keypoint-based
geometry alignment[J]. ISPRS Journal of Photogrammetry
and Remote Sensing, 2023, 195: 169-177.
刘建华,等:手机室内场景要素实例化现实增强方法研究进展 69