Computer Engineering and Applications (计算机工程与应用), 2024, 60(7). www.ceaj.org
Progress of Instantiated Reality Augmentation Method for Smart Phone Indoor Scene Elements

LIU Jianhua, WANG Nan, BAI Mingchen
Mobile Geospatial Big Data Cloud Service Innovation Team, School of Geomatics and Urban Spatial Information, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

Abstract: Indoor mobile phone navigation and location services are current research hotspots, of which scene element instantiation reality augmentation methods are an important part. Instance segmentation is a challenging, fundamental task in scene element perception, and augmented reality is an effective way to apply digital twin building maps; both are of great importance in the field of indoor location navigation. At present, augmented reality technology is mainly applied to semantic enhancement of scenes, and AR enhancement for smartphone indoor navigation stops at the level of visual effects; it has not yet truly reached the level of augmenting element instances in scenes. To address this problem, this paper proposes an AR research idea of mobile phone scene element instantiation: by recognizing objects in indoor scenes and matching them against building maps, the element information stored in the building map is augmented and displayed using AR technology, thereby assisting pedestrians in indoor navigation, location services, and related applications, and improving the information level of location services such as indoor positioning and navigation. This paper provides a systematic overview of instance segmentation and augmented reality methods for smartphone-side video, analyzes the characteristics and applicable scenarios of the relevant methods, summarizes the research progress of instance segmentation and augmented reality on mobile devices, and finally discusses the application prospects of instantiated reality augmentation methods for indoor scene elements in the field of navigation and location services.

Key words: augmented reality; instance segmentation; deep learning; mobile indoor location-based navigation; building map matching

Funding: Key Project of the Beijing Higher Education Society (ZD202244); Key Project of Educational Science Research of Beijing University of Civil Engineering and Architecture (Y2111).
Author: LIU Jianhua (1981-), corresponding author, male, Ph.D., associate professor, distinguished professor of the Big Data Application Research Center of Beijing University of Civil Engineering and Architecture, master's supervisor, member of the Construction Engineering Branch of the China Association of Remote Sensing Applications; research interests: geographic information science and remote sensing application technology. E-mail: liujianhua@bucea.edu.cn
Document code: A; CLC number: TP301; doi: 10.3778/j.issn.1002-8331.2309-0376; Article ID: 1002-8331(2024)07-0058-12. Received: 2023-09-21; revised: 2024-01-05.

In the era of intelligent information, location-based services have received wide attention. In indoor environments, signal occlusion by walls, multipath effects, and other complicating factors mean that outdoor satellite navigation and positioning systems cannot provide effective service. In large, complex indoor scenes such as transport hubs, shopping malls, large hospitals, museums, and underground stations, indoor navigation and location services are particularly important. Smartphone visual positioning has attracted wide attention because it requires only simple equipment and is comparatively insensitive to environmental factors. Smartphones offer rich sensor resources and the processing capability for multi-source signal enhancement, so indoor navigation and location services carried on the phone have become a current research hotspot. In indoor scene element recognition, correctly perceiving the element information of the user's surroundings (spatial positions, attribute information, etc.), or further visually positioning the user on that basis, remains highly challenging.
Mobile augmented reality (MAR) uses the camera on a mobile device to recognize specific images, performs computing tasks such as target recognition and rendering on cloud or edge servers, and downloads the rendered imagery to the mobile client for display. Xu et al. [1] deployed a trained SSD network model on a mobile client to develop an augmented navigation system that recognizes individual-building information. Roh et al. [2] combined deep learning with AR to effectively recognize personal mobility users and provide driving assistance information. Wang [3] used computer vision methods to recognize and automatically extract point and line map elements, using AR to realize an augmented representation of 2D planar maps. However, these methods have not been applied to the indoor navigation field, nor have they truly reached the level of augmenting element instances within scenes. Research on an instantiation AR method for mobile phone indoor scene elements is therefore of practical significance for indoor navigation and location services.
An instance is an object of a class; instantiation here means obtaining the objects in a scene through instance segmentation. The scene element instantiation AR method proposed in this study first constructs map localization anchor points on top of the building map; these anchors are universal ancillary facility points inside the building, and they store the augmentation information of element instances. Element photos are then collected with the phone's rear camera, and a sample dataset is produced with the labelimg image annotation tool. The sample dataset is used to train a deep learning network model, which is lightweighted through format conversion and deployed to the mobile client for recognition; recognized objects are matched against the building map to achieve instantiated recognition of scene elements. Finally, mobile augmented reality is used to display, as augmentation, the element information (text, images, video, etc.) stored in the building map for each recognition result, thereby assisting pedestrians in indoor navigation, location services, and related applications. The intended overall effect is shown in Fig. 1. The indoor scene element instantiation method is designed to be universal: ancillary facilities common to building interiors are chosen as the recognition elements, so the method can be applied to indoor scene recognition across many building types.
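The matching step described above, recognized label against building-map anchor, can be sketched as a simple lookup. This is an illustrative sketch only: the anchor IDs, field names, and AR payloads below are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch: match a recognized scene element against building-map
# localization anchors and return the stored AR augmentation payload.
# All anchor IDs, fields, and contents are illustrative assumptions.

BUILDING_MAP_ANCHORS = {
    "fire_cabinet_3F_01": {
        "class": "fire_cabinet",
        "floor": 3,
        "position": (12.4, 8.9),           # local map coordinates (m)
        "ar_content": {"text": "Fire cabinet, corridor east wing",
                       "image": "fire_cabinet_3F_01.png"},
    },
    "exit_sign_3F_02": {
        "class": "safety_exit",
        "floor": 3,
        "position": (20.1, 8.7),
        "ar_content": {"text": "Safety exit towards stairwell B"},
    },
}

def match_detection(label, floor, approx_pos):
    """Return (anchor_id, ar_content) for the nearest anchor of the same
    class on the same floor, or None if no anchor matches."""
    best = None
    for aid, anchor in BUILDING_MAP_ANCHORS.items():
        if anchor["class"] != label or anchor["floor"] != floor:
            continue
        dx = anchor["position"][0] - approx_pos[0]
        dy = anchor["position"][1] - approx_pos[1]
        d2 = dx * dx + dy * dy
        if best is None or d2 < best[0]:
            best = (d2, aid, anchor["ar_content"])
    return None if best is None else (best[1], best[2])
```

The returned `ar_content` is what the AR layer would render next to the detected object; a production system would of course query the building-map database rather than an in-memory dictionary.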
In recent years, research on instance segmentation and AR methods for mobile phone video has attracted considerable attention, but survey literature systematically summarizing the state of research in this area is scarce. This paper reviews scene element instance segmentation and augmented reality methods for smartphone-side video from the perspective of indoor navigation and positioning applications, analyzes the characteristics and applicable scenarios of the relevant methods, summarizes the research progress of mobile video instance segmentation and augmented reality, and finally discusses the application prospects of the indoor scene element instantiation AR method in the field of navigation and location services.
1 Research Progress
1.1 Instance Segmentation
1.1.1 Overview
Instance segmentation is a challenging, fundamental task in computer vision, widely applied in medical image analysis, autonomous driving, surveillance systems, navigation and positioning [4-5], and augmented reality [6]. Deploying high-performance, low-latency object detectors on edge devices is receiving growing attention. Over the past few years, researchers have extensively studied detection networks based on convolutional neural networks (CNNs) [7-8], and many instance segmentation frameworks have been proposed. Bolya et al. [9] proposed YOLACT, a fully convolutional real-time (>30 fps) instance segmentation framework that generates a set of prototype masks in parallel and predicts per-instance mask coefficients to produce instance masks.

[Figure 1: Overall flowchart of mobile phone indoor scene element instantiation AR. Element photos (camera, fire alarm, door plate, fire cabinet, safety exit, WLAN, elevator, display board, light, electrical box, spotlight) are collected and manually annotated with an image annotation tool into a component sample dataset (original images, target labels, target positions); a scene element deep learning model is built, its parameters optimized and its format converted until the model accuracy is acceptable, yielding an indoor scene element instance recognition model on the phone; recognized instances are matched by geocoding against building map localization anchors that store element instance augmentation information, and AR augmentation renders the result on the AR building map.]

Wu et al. [10] proposed EfficientVIS, an end-to-end framework that associates and segments regions of interest (RoIs) across space and time through iterative query-video interaction. Li et al. [11] proposed YOFO, a single-stage referring video object segmentation (RVOS) framework whose meta-transfer (MT) module reconstructs the image and language features needed for VOS using only image features. Kini et al. [12] proposed a tag-based attention module that learns instance propagation in video and achieves instance mask prediction at pixel-level granularity. Zhu et al. [13] proposed the instance-as-identity (IAI) online VIS paradigm, which models the temporal information of detection and tracking in an efficient way. Ganesh et al. [14] proposed the raw feature collection and redistribution (RFCR) module, together with a truncated backbone for improved transfer learning in object detection, to raise the accuracy and efficiency of various lightweight architectures.
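The prototype-and-coefficient mask assembly used by YOLACT, as described above, can be sketched with NumPy. The shapes and random inputs below are illustrative, not the paper's configuration.

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """YOLACT-style mask assembly: linearly combine k prototype masks
    of shape (h, w, k) with per-instance coefficients of shape (n, k),
    then apply a sigmoid. Returns n instance masks of shape (n, h, w)."""
    lin = np.tensordot(coefficients, prototypes, axes=([1], [2]))  # (n, h, w)
    return 1.0 / (1.0 + np.exp(-lin))                              # sigmoid

rng = np.random.default_rng(0)
protos = rng.normal(size=(8, 8, 4))   # 4 prototype masks over an 8x8 grid
coeffs = rng.normal(size=(2, 4))      # coefficients for 2 detected instances
masks = assemble_masks(protos, coeffs)
```

Because the prototypes are shared across instances and only a small coefficient vector is predicted per detection, the assembly itself is a single matrix product, which is what makes the approach real-time friendly.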
Meanwhile, with the rise of the Transformer [15-17], the vision Transformer (ViT) [18] has, as a state-of-the-art model, gradually been applied to instance segmentation. The VisTR video instance segmentation framework proposed by Wang et al. [19] treats the VIS task as a direct end-to-end parallel sequence decoding and prediction problem, and was the first framework to apply the Transformer to video instance segmentation. Zhou et al. [20] proposed a family of fully attentional networks (FANs) to study the robustness of Transformers. The ReferFormer framework proposed by Wu et al. [21] treats language as queries; object tracking is achieved by linking the corresponding queries. Jin et al. [22] proposed ANDT, a Transformer-based method for anomaly detection in aerial videos: it treats consecutive video frames as a tube sequence, uses a Transformer encoder to learn feature representations from the sequence, and uses a decoder to predict the next frame. Li et al. [23] proposed MSFA-T, a robust Transformer-based sparse feature matching network model that exploits image semantic information and optimal-confidence features to overcome viewpoint distortion and weak texture, achieving accurate image matching for visual localization in large-view indoor scenes.
Lightweight CNNs are widely used in vision tasks, but their representations are local; ViTs can learn global representations, but the quadratic complexity of self-attention makes them demanding in computation and model size [24]. Fusing the strengths of CNNs and ViTs to build mobile vision models has therefore become a new trend. Heo et al. [25] proposed the pooling-based PiT, applying ResNet-style dimension design to ViT and markedly improving the performance of the ViT architecture. The EdgeViTs model proposed by Pan et al. [26] aggregates information through a local-global-local module that optimally integrates self-attention and convolution. The Mobile-Former model proposed by Chen et al. [27] runs a lightweight MobileNet and a Transformer in parallel, achieving bidirectional fusion of local and global features. Yang et al. [28] proposed the MOAT model, built on mobile convolution and attention.
Current instance segmentation frameworks have not been effectively combined with AR augmentation for application to indoor positioning and navigation; that is, they do not draw on the building map to semantically augment the instances recognized in video, for example by enriching the instance objects in the video with corresponding text, speech, image, and multimedia information. The chief reason is the lack of an engineered solution and supporting technology for instantiating video scene elements.
1.1.2 Method Genealogy
The proposal and application of the above instance segmentation frameworks have driven the rapid development of instance segmentation technology. As shown in Table 1, building on these frameworks, this paper further analyzes and summarizes representative methods for smartphone video instance segmentation.

Table 1 Representative methods for smartphone video instance segmentation

| Method | Year | Author | Characteristics |
| MnasNet | 2019 | Tan | (1) Casts network design as a multi-objective optimization problem that considers accuracy and real inference latency together, measuring latency by running on actual mobile devices. (2) Proposes a factorized hierarchical search space that allows layers to differ structurally while still balancing flexibility and search space size. |
| ESPNetV1 | 2018 | Mehta | Its core is the ESP module, which contains a point-wise convolution and a spatial pyramid of dilated convolutions, used respectively to reduce computational complexity and to resample the features of each effective receptive field. |
| ESPNetV2 | 2019 | Mehta | (1) A general-purpose lightweight architecture that supports both visual and sequential data, i.e., both vision and natural language processing tasks. (2) Extends ESPNet with depthwise separable dilated convolutions, achieving better accuracy with fewer parameters than ESPNet. |
| MobileNetV1 | 2017 | Howard | (1) Builds lightweight deep neural networks by replacing standard convolutions with depthwise separable convolutions. (2) Introduces two simple global hyperparameters that effectively trade off latency and accuracy. |
| MobileNetV2 | 2018 | Sandler | (1) Introduces the inverted residual structure (expand then reduce dimensions), strengthening gradient propagation and significantly reducing the memory footprint required during inference. (2) Proposes linear bottleneck layers. |
| MobileNetV3 | 2020 | Howard | (1) Uses the NetAdapt algorithm to obtain the optimal number of kernels and channels. (2) Introduces the SE channel attention structure. (3) Uses a new activation function, h-swish[x]. |
| MobileViTv1 | 2021 | Mehta | Combines the strengths of CNNs and ViTs to build a lightweight, general-purpose, mobile-friendly network for low-latency mobile vision tasks. |
| MobileViTv2 | 2022 | Mehta | (1) A separable self-attention method with linear complexity O(k). (2) Computes self-attention with element-wise operations, making it a good choice for resource-constrained devices. |
| EdgeNeXt | 2022 | Maaz | (1) A new lightweight architecture that is efficient in model size, parameters, and MADDs while delivering higher accuracy on mobile vision tasks. (2) Introduces the split depthwise transposed attention (SDTA) encoder, which effectively learns local and global representations to address the limited receptive field of CNNs without increasing the number of parameters or MADD operations. |
As shown in Fig. 2, this paper divides smartphone video instance segmentation methods into three broad classes from the perspective of the recognition model: CNN-based methods, vision-Transformer-based methods, and CNN-ViT fusion methods.
(1) MnasNet
MnasNet [29] is an automated neural architecture search method for designing mobile CNN models. Its search framework consists of three components: a recurrent neural network (RNN) based controller, a trainer that obtains model accuracy, and a phone-based inference engine that measures latency. It proposes a novel factorized hierarchical search space to achieve layer diversity throughout the network. Under mobile latency constraints, the method achieves higher accuracy and lower inference latency on ImageNet classification and COCO object detection than other advanced models (MobileNetV1/MobileNetV2).
(2) ESPNets
ESPNetV1 [30] is a fast and efficient convolutional neural network. The efficient spatial pyramid (ESP) module is its core component: it decomposes a standard convolution into a point-wise convolution and a spatial pyramid of dilated convolutions, which substantially reduces the parameters and memory of the ESP module. According to the study, the ESP module is more efficient than other convolution factorization methods (MobileNet/ShuffleNet).
ESPNetV2 [31] first replaces the computationally expensive dilated convolutions with depthwise separable dilated convolutions and uses hierarchical feature fusion to remove gridding artifacts, yielding the EESP unit; it further replaces the corresponding strided dilated convolutions with depthwise dilated convolutions and replaces additive feature fusion with concatenation, yielding the strided EESP unit. Studies show that the network delivers state-of-the-art performance across tasks such as object classification, detection, segmentation, and language modeling. However, the model lacks global interaction between pixels, and its accuracy leaves room for improvement [32].
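The benefit of the dilated-convolution pyramid in the ESP module can be made concrete with the standard receptive-field formula: a k x k convolution with dilation d has an effective receptive field of k + (k-1)(d-1), so stacking branches with growing dilation covers large fields at constant per-branch parameter cost. A small sketch:

```python
def dilated_receptive_field(k, d):
    """Effective receptive field of a single k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)

# A spatial pyramid of 3x3 dilated convolutions, as in the ESP module,
# covers growing receptive fields while each branch keeps 3x3 weights.
# The dilation rates below are illustrative, not ESPNet's exact choice.
pyramid = {d: dilated_receptive_field(3, d) for d in (1, 2, 4, 8)}
```

With dilation rates 1, 2, 4, 8 the branches see 3, 5, 9, and 17 pixels across, all with nine weights each.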
(3) MobileNets
MobileNetV1 [33] is an efficient convolutional neural network for mobile vision. Its core is the depthwise separable convolution, which decomposes the standard convolution operation into a depthwise convolution followed by a point-wise convolution, cutting the parameter count and computation dramatically (to roughly 1/8 to 1/9 of a standard convolution).
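The 1/8 to 1/9 figure follows directly from counting weights: a standard k x k convolution has k*k*c_in*c_out weights, while the depthwise-plus-pointwise pair has k*k*c_in + c_in*c_out, a ratio of 1/c_out + 1/k^2 (about 1/9 for 3x3 kernels with many output channels). The channel counts below are illustrative:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (c_in filters) plus 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 128, 256   # illustrative layer sizes
ratio = dw_separable_params(k, c_in, c_out) / conv_params(k, c_in, c_out)
# ratio = 1/c_out + 1/k**2, here about 0.115, i.e. between 1/9 and 1/8
```
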
MobileNetV2 [34] improves on MobileNetV1 by proposing the inverted residual with linear bottleneck, a structure with a ResNet-like shortcut: the low-dimensional compressed input representation is first expanded to high dimension by a projection convolution, features are extracted with a lightweight depthwise convolution, and a linear bottleneck then projects the features back to a low-dimensional compressed representation. The module also significantly reduces the memory footprint required during inference. According to the study, the architecture retains the simplicity of MobileNetV1 while improving its accuracy.
MobileNetV3 [35] addresses the resource consumption of different scenarios by separately designing MobileNetV3-Large and MobileNetV3-Small. It obtains its lightweight networks by combining two network search methods, NAS (network architecture search) and NetAdapt; improves network performance with the h-swish nonlinear activation function; and introduces the efficient semantic decoder LR-ASPP (lite reduced atrous spatial pyramid pooling) for semantic segmentation tasks. Studies show [35] that the network has clear advantages over the previous two MobileNet versions in both latency and accuracy.
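The h-swish activation mentioned above replaces the sigmoid in swish with the piecewise-linear ReLU6, making it cheap on mobile hardware. A minimal scalar sketch:

```python
def relu6(x):
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return max(0.0, min(6.0, x))

def h_swish(x):
    """Hard swish used in MobileNetV3: x * ReLU6(x + 3) / 6, a cheap
    piecewise-linear approximation of the swish activation."""
    return x * relu6(x + 3.0) / 6.0
```

For x >= 3 the function passes x through unchanged, for x <= -3 it outputs 0, and in between it interpolates smoothly enough for training while avoiding the exponential in a true sigmoid.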
(4) MobileViTs
MobileViTv1 [24] is a lightweight general-purpose ViT for mobile devices that combines the strengths of CNNs and ViTs. The MobileViT block models local and global information with fewer parameters, letting the Transformer replace the local processing of convolution with global processing. However, the multi-head self-attention (MHA) in the Transformer is the main efficiency bottleneck of MobileViT [36].
MobileViTv2 [36] builds on MobileViT with a separable self-attention method of linear complexity to address the MHA bottleneck: it replaces the computation-heavy operations in MHA (e.g., batch matrix multiplication) with element-wise operations (e.g., summation and multiplication) to compute self-attention, making it a preferred choice for resource-constrained mobile devices. According to the study, the improved model (MobileViTv2) outperforms existing lightweight and ViT-based methods on ImageNet object classification, MS-COCO object detection, and semantic segmentation, achieving SOTA performance.
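The linear-complexity idea behind separable self-attention can be sketched in NumPy: instead of an n x n attention matrix, a single learned projection produces per-token context scores, the keys are pooled into one context vector, and that vector gates the values. This is a simplified illustration of the idea, with hypothetical projection names, not the library implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def separable_self_attention(x, w_i, w_k, w_v):
    """x: (n, d) tokens. Compute per-token context scores (n,) from a
    single projection w_i, pool the keys into one context vector, and
    gate the ReLU-activated values with it. Cost is O(n * d), not
    O(n^2 * d) as in standard multi-head self-attention."""
    scores = softmax(x @ w_i)            # (n,)  context scores over tokens
    context = scores @ (x @ w_k)         # (d,)  pooled context vector
    values = np.maximum(x @ w_v, 0.0)    # (n, d) ReLU-gated values
    return values * context              # (n, d) broadcast element-wise gating

rng = np.random.default_rng(1)
n, d = 16, 8
x = rng.normal(size=(n, d))
out = separable_self_attention(x, rng.normal(size=d),
                               rng.normal(size=(d, d)),
                               rng.normal(size=(d, d)))
```

Only element-wise multiplications, sums, and thin matrix-vector products appear, which is precisely what makes the scheme friendly to resource-constrained devices.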
(5) EdgeNeXt
EdgeNeXt [32] is an efficient lightweight hybrid architecture for mobile vision applications. Starting from the need to reduce computational resource consumption for deployment on end devices, the researchers introduce the split depthwise transposed attention (SDTA) encoder, which splits the input tensor into multiple channel groups and uses depthwise convolutions together with cross-channel self-attention to implicitly enlarge the receptive field and encode multi-scale features, improving resource utilization. Comparing the EdgeNeXt variants with the corresponding MobileViT models, EdgeNeXt offers higher accuracy and lower latency at every model size [32].

[Figure 2: Classification of video instance segmentation methods on smartphones into CNN-based, vision-Transformer-based, and CNN-ViT fusion methods; methods shown include MnasNet, ESPNet V1/V2, MobileNet V1-V3, MobileViT V1/V2, EdgeNeXt, BoxeR, SepViT, EdgeFormer, Mask SSD, and SK-MobileNet.]
1.1.3 Problem Analysis
The performance of models deployed on mobile devices is generally examined in terms of generalization ability, robustness, light weight, and low latency, the basic criteria for judging the merit of a method. The lightweight network architectures above, applied to intelligent mobile devices (phones, pads) and edge computing, have all achieved good results in scene-element detection and recognition. However, problems remain in going deeper, semantically augmenting scene element recognition results, for example deeply enriching the instance objects obtained by segmentation with text, speech, and image information. This paper raises the mobile phone scene element instantiation AR method as a technical innovation problem, proposing to recognize indoor scene elements through instantiation, match them against the elements of the building-map spatial database, and use AR to display the instance recognition results as augmentation, raising the information level of indoor positioning, navigation, and related location services for pedestrians.
1.2 Augmented Reality (AR)
With the development of real-scene 3D [37], 5G [38], AI [39], and other technologies, using augmented reality to improve the service experience of indoor positioning and navigation has been widely adopted. Augmented reality (AR) [40-43] is a technology that ingeniously fuses virtual information with the real world.
Current representative mobile phone augmented reality methods fall roughly into commercial AR SDKs, SDK-based AR methods, and self-developed AR frameworks. The main commercial AR software development kits are Apple's ARKit [44-45] and Google's ARCore [46-47]. However, commercial SDKs may require modification or adaptation on different platforms, so their portability is relatively poor.
AR methods based on commercial AR SDKs are increasingly common. The indoor positioning system of Varelas et al. [48], built on ARCore, gives users a fully markerless experience but still suffers from slow position initialization and sensitivity to ambient brightness. The ARCore-based augmented reality campus navigation system designed by Lu et al. [49] can provide users with text, voice, video, and other augmented reality content. The indoor navigation application developed by Martin et al. [50], based on Google ARCore, obtains real-time information from the phone camera for navigation in the AR view. To address problems such as object matching accuracy and registration error in augmented reality systems, AR methods that fuse other technologies on top of an SDK are becoming a trend. One direction is fusing sensors for AR navigation. The ARBIN navigation system developed by Huang et al. [51] builds AR 3D models on Google ARCore and reads the gyroscope sensor; field tests show an accuracy of 3 to 5 m, but ARBIN's augmented display of the information around the positioning environment is incomplete. Zhou et al. [52] proposed a method that fuses Bluetooth Low Energy (BLE) and pedestrian dead reckoning (PDR) positioning for AR camera tracking, rendering indoor maps and semantic information into the real world. The pedestrian augmented reality navigator (PAReNt) iOS application designed by Mahapatra et al. [53] fuses information from ARKit, GNSS, and a Bluetooth-based positioning system to improve positioning accuracy. Sharin et al. [54] combined the accelerometer sensor, step-counting techniques, and AR to design GoMap, a mobile indoor positioning map application. Most methods of this class, however, are limited by sensor accuracy and sensitivity. The other direction combines AR with deep learning models for more precise detection and recognition. Li et al. [55] combined ARCore with an SSD-MobileNet model to improve 2D object detection, but the method still leaves room for development in combining instance segmentation with augmented reality. Kaul et al. [56] combined the mobile scene detection framework Apple ARKit with MobileNetv2 and 3D spatial audio to recognize detected objects in the scene and provide auditory scene descriptions for visually impaired people. Chen et al. [57] combined detection algorithms (YOLO, SSD, Mask RCNN, etc.) with ARKit and SiriKit to design object detection and tracking, speech, and AR-augmented interfaces offering voice interaction and AR visualization. Methods of this class, however, are sensitive to scene depth, illumination, and similar factors, which affect object matching accuracy. SLAM technology has also been integrated into AR system development. The on-site visual construction management system developed by Hsieh et al. [58], integrating ARKit with SLAM, achieves precise alignment of BIM models on the physical construction site, but its dependence on feature-point mapping and scanning can be affected by environmental conditions such as poor lighting or occlusion.
Self-developed AR frameworks are mostly built on Unity3d, OpenCV, or OpenGL ES. The AR mobile application developed by Verma et al. [59] with Unity3d renders AR arrow guidance during navigation, but its augmentation content is limited. Verykokou et al. [60] developed a MAR system with the Android NDK, OpenCV in C++, and OpenGL ES 2.0, combining feature-based image matching and pose estimation with fast rendering of 3D textured models for planar surface tracking and 3D textured model overlay. The ARIAS interactive advertising system developed by Wang et al. [61] with OpenCV and OpenGL displays advertising videos through gesture operations, but the system still has limitations, such as no audio playback and some difficult gestures. Tsuboki et al. [62] developed an AR virtual-space system that dynamically replaces the background based on depth estimation and motion information, but its process allocation method still needs further study.
Current smartphone indoor navigation AR augmentation stays at the level of visual effects and has not been applied at scale to the augmentation of scene elements; the chief reason is the lack of an engineered solution and supporting technology for scene element instantiation AR.
1.2.1 Method Genealogy
As mobile phone AR methods have been proposed and realized, augmented reality has been applied ever more widely to positioning, navigation, and other location services. This subsection further analyzes and summarizes representative smartphone AR methods, as shown in Table 2.
As shown in Fig. 3, this paper divides representative smartphone AR software into three broad classes from the perspective of commercial products: commercial AR SDKs, SDK-based mobile AR implementations, and self-developed mobile AR frameworks.
(1) ARKit
ARKit uses visual inertial odometry (VIO) to accurately track real-world scenes [63]. Compared with other device platforms, ARKit's VIO can fuse sensor data with CoreMotion data to provide more accurate information. ARKit already supports Unity, Unreal Engine, and SceneKit. However, the system still cannot cleanly separate ego-motion from object motion, causing serious problems during matching [64] and limiting its applicability in large-scale industry.
(2) ARCore
ARCore offers APIs for several development platforms; at present, augmented reality applications based on ARCore can be built for Android (Java), Web (JavaScript), Unreal (C++), Unity (C#), and more, giving developers ample flexibility and options [47]. Typical camera-based problems (poor lighting conditions, insufficient geometry, low environmental structural complexity, dynamics in the image, i.e., motion blur) cause drift in environment-map scaling and divergence between the real and virtual SLAM maps [64], so it is not well suited to large-scale industrial environments.
(3) Deep learning-based AR-HUD
The deep learning-based AR-HUD [2] consists of two subsystems: deep-learning-based recognition of pedestrians and personal mobility users, and AR-HUD-based visualization of driving information. The first subsystem uses deep-learning-based anomaly detection and body orientation estimation to identify the positions and movement directions of pedestrians and personal mobility users. The second subsystem visualizes driving assistance information on the AR-HUD, where pedestrians and personal mobility users can be highlighted and brought into focus. However, the volume of information the system provides may hinder driving, and the yellow and red indicators may be distracting in crowded areas [2].
(4) CollabAR
CollabAR [65] is an edge-assisted system that provides mobile AR with distortion-tolerant image recognition at imperceptible system latency. CollabAR comprises three components, a distortion-tolerant image recognizer, a correlated-image lookup module, and an auxiliary multi-view integrator, deployed on edge servers to ease the trade-off between recognition accuracy and end-to-end system latency. The client also runs an anchor-based pose estimation module that uses the cloud anchors provided by Google ARCore to track the position and orientation of the mobile device. Because a non-negligible gap exists between artificial and real-world distortions, recognition performance may suffer, and the heterogeneity of the underlying hardware and signal-processing pipelines may cause feature mismatches between images captured by different devices [65].
(5) EdgeXAR
EdgeXAR [66] is a mobile AR framework that exploits edge computing, offloading tasks to support flexible camera-based AR interaction. It designs a hybrid tracking system for mobile devices that provides lightweight tracking with six degrees of freedom and hides offloading latency from the user's perception, adopts a practical and reliable communication mechanism for fast, consistent delivery of key information, and proposes a multi-target image retrieval pipeline that performs fast and accurate image recognition on cloud and edge servers. However, the system is not robust to occlusion, its optical-flow-based tracking still needs improvement, and its image segmentation is unstable against complex backgrounds [66].
Table 2 Representative implementation methods for smartphone AR

| Method | Year | Author | Strengths | Limitations | Applicable scenarios |
| ARKit 6 | 2022 | Apple | iOS devices only; key capabilities include motion tracking, light estimation, scene understanding, and rendering | Cannot cleanly separate ego-motion from object motion, leading to unstable matching performance | Indoor, outdoor, and other varied scenes |
| ARCore 1.7 | 2022 | Google | Supports AR application development on Android devices; key capabilities include motion tracking, environment understanding, light estimation, and image recognition | (1) Drift in environment-map scaling; (2) divergence of real and virtual SLAM maps | Indoor, outdoor, and other varied scenes |
| CollabAR | 2020 | Guohao Lan | (1) Distortion-tolerant image recognizer addressing the domain adaptation problem caused by image distortion; (2) collaborative multi-view image recognition | (1) Poor recognition performance in "in-the-wild" AR scenes; (2) device heterogeneity may cause feature mismatches between images captured by different devices | Multi-distortion scenes |
| EdgeXAR | 2021 | Wenxiao Zhang | (1) Hybrid tracking system for mobile devices; (2) multi-target image recognition pipeline; (3) fine-grained edge offloading pipeline | (1) Insufficient latency compensation in extreme use cases; (2) optical-flow-based tracking not robust to occlusion; (3) unstable image segmentation against complex backgrounds | Cinemas and related scenes |
| Deep learning-based AR-HUD | 2023 | Dong Hyeon Roh | Combines deep learning with AR to provide driver assistance information on the AR-HUD | Excessive information may impair the driver's ability to drive | Car driving scenes |
[Figure 3: Classification of representative methods for smartphone AR into commercial AR SDKs, SDK-based mobile AR methods, and self-developed mobile AR frameworks; methods shown include ARKit, ARCore, Wikitude, CollabAR, EdgeXAR, ARBIN, ARCore-Based CNS, MIRAR, ModAR, ARIANNA+, and ANSVIP.]
1.2.2 Problem Analysis
At present, the design of AR augmentation methods and systems for indoor navigation remains at the level of visual effects, such as the augmented display of landmarks, arrows, and similar elements; the problem of multi-source information augmentation of instantiated objects in indoor scenes has not been thoroughly solved. Object matching accuracy and registration error also remain the main problems limiting the application of current augmented reality systems. The mobile phone instantiation AR method proposed in this paper will effectively combine instance segmentation with AR: indoor scene elements are first recognized through instantiation, then the results of instance recognition and map matching are augmented with AR, raising the application level of indoor positioning, navigation, and other location services.
1.3 Map Matching
1.3.1 Building Map
Buildings are the main scene of indoor location service applications, and the building map is the foundation and information carrier of those applications [67]. In recent years various new digital maps have emerged; among them, real-scene maps and pan-information location maps are widely discussed by researchers for their good visual effect and multidimensional data organization. Academician Li Deren [68] proposed digital measurable imagery and its integration with 4D products as a new digital surveying and mapping product, advancing on-demand spatial information services. Academician Zhou Chenghu et al. [69] regard the pan-information location map as a location-based digital map that comprehensively reflects the location itself and the various features, events, or things related to it. Lv Guonian et al. [70] proposed a six-element expression model of geographic information covering spatial position, semantic description, attribute characteristics, geometric form, evolutionary process, and inter-element relationships. Zhu et al. [71] developed a model-driven method that fully converts industry foundation classes (IFC) data into a labeled property graph (IFC-graph) to support effective and efficient building information access and query, attempting to address the challenges of building information extraction.
Building maps provide the data and information foundation for much research; as an important means of expressing smart-city spatial elements, they play vital roles in urban construction, intelligent management, emergency management, and indoor location services. IFC is the most discussed model data specification in the building industry's BIM field; it contains a large amount of the geometric and semantic information needed to construct building maps and is an ideal data source for generating them [72]. The most representative data specifications in the GIS field are CityGML and IndoorGML. CityGML mainly reflects the generic semantic information of 3D city objects [73] and defines five distinct levels of detail (LoDs); the LoD classification has been further extended and refined for indoor building environments [74]. IndoorGML is a data specification oriented to indoor positioning and navigation applications, focusing on the representation and exchange of 2D and 3D indoor environment data [75], especially spatial topological relationships. Luo Jingyan et al. [76] proposed BIMPN, a hybrid building map model composed of an entity model and a network model. The building map entity model presents the building in polygon form, intuitively expressing the building's spatial information; it also improves the visual expression of indoor location-based service applications and gives users a better roaming experience in 3D indoor building scenes [77]. The building map network model is a further abstract expression, on a graph structure, of elements such as the road network and their spatial topological relationships [78]. By interconnecting the two models, the BIMPN hybrid model not only integrates their respective strengths but can also provide more convenient geometric and semantic constraint information for indoor positioning and navigation applications. In addition, the building map can not only visualize indoor maps and positioning results but also assist with map matching, mobile phone indoor scene element instantiation reality augmentation, navigation path planning, and related indoor location service applications.
In general, the spatial and visual positioning constraint information for indoor scene element instantiation AR augmentation comes from building map localization anchors (MLA) [79], which mainly cover 11 classes of universal elements: doors, door plates, fire cabinets, fire alarms, safety exits, cameras, WLAN access points, electrical boxes, elevators, display boards, and lights. On the basis of the building map [80], this study further uses the proposed map localization anchors to organize and express the geometric and semantic information needed by indoor positioning and navigation location services, assisting the constrained matching of indoor positioning results and the distance computation of indoor navigation paths. The classification of instance element information in different scenes and the data content of AR augmentation are also analyzed in depth; AR augmentation data are stored as text, images, audio, and video to construct the AR building map. The AR building map construction method is shown in Fig. 4.
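The network model's role in navigation path distance computation can be sketched as shortest-path search over a graph of corridor nodes and anchors. The node names and edge lengths below are hypothetical illustrations:

```python
import heapq

# Hypothetical building-map network model: adjacency list with edge
# lengths in metres between corridor nodes / map localization anchors.
NETWORK = {
    "elevator_3F":   {"corridor_3F_A": 5.0},
    "corridor_3F_A": {"elevator_3F": 5.0, "corridor_3F_B": 8.0},
    "corridor_3F_B": {"corridor_3F_A": 8.0, "exit_3F": 4.0},
    "exit_3F":       {"corridor_3F_B": 4.0},
}

def path_distance(graph, start, goal):
    """Dijkstra shortest-path distance over the network model."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")
```

Here the walking distance from the elevator to the exit is 5 + 8 + 4 = 17 m; the same graph also supplies the topological constraints used for matching positioning results.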
[Figure 4: AR building map construction method. Multi-source data acquisition (BIM models, smartphone close-range photogrammetry, UAV nap-of-the-object photogrammetry) supplies an entity model (geometric information, texture information, semantic attributes), a network model (node information, topological relationships), and map localization anchors (MLA) carrying augmentation information (text, audio, video, images); a building map engine then visualizes the AR building map.]

1.3.2 Map Matching
Map matching is the process of matching positioning results to the corresponding planned path on the map, and it is a prerequisite for indoor navigation applications [81]. Existing map-matching algorithms fall roughly into two classes: traditional model-based methods and learning-based methods [82].
Traditional model-based methods mainly include geometric analysis [83], Kalman filtering [84], and hidden Markov models (HMMs) [85]. For HMM-based map matching in dense road networks, Cui et al. [86] proposed a segment-based hidden Markov model (SHMM): the GNSS trajectory is divided into several sub-trajectories, candidate road segment sequences are searched for each GNSS sub-trajectory, and an HMM then matches each sub-trajectory against the segment sequences, identifying the sequence with the highest probability; the method still needs to consider more factors in its feature design. Harder et al. [87] developed a real-time map-matching method that uses a backtracking particle filter, reducing the complexity of spatial queries and offering flexibility in the use of different types of spatial constraints, together with a map-based optimization that uses inter-floor transition areas. Guo et al. [88] proposed a hybrid indoor real-time positioning method using virtual wireless devices on smartphones, constrained by pedestrian accessibility and floor maps.
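The HMM idea described above, candidate road segments as hidden states and position fixes as observations, can be sketched with a toy Viterbi decoder. The distances, Gaussian emission model, and transition penalty below are illustrative assumptions, not the SHMM's actual parameterization:

```python
import numpy as np

def viterbi_map_match(emission_dist, transition_penalty=1.0):
    """Toy HMM map matching. emission_dist[t, s] is the distance from
    observation t to candidate segment s; emissions favour near segments,
    transitions penalise switching segments. Returns the most likely
    segment index for each observation."""
    T, S = emission_dist.shape
    log_em = -0.5 * emission_dist ** 2             # Gaussian log-likelihood (up to constants)
    trans = -transition_penalty * (1 - np.eye(S))  # penalty for changing segment
    score = log_em[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans              # (prev_state, cur_state)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_em[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                  # backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Three fixes near segment 0, then two near segment 1:
dists = np.array([[0.1, 3.0], [0.2, 2.5], [0.3, 2.0], [2.8, 0.2], [3.0, 0.1]])
matched = viterbi_map_match(dists)
```

The transition penalty is what distinguishes this from independently snapping each fix to its nearest segment: a single noisy fix near the wrong segment will not cause a spurious switch unless the evidence outweighs the penalty.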
With the development of deep learning, research has increasingly approached map matching from a data-driven perspective. Based on the seq2seq learning framework, Feng et al. [89] proposed the DeepMM algorithm for matching sparse and noisy trajectories, designing two trajectory augmentation methods to enrich the data and improve the model's map-matching performance. Hong et al. [90] used a multi-head self-attention (MHSA) neural network that integrates location features, temporal features, and functional land-use context for next-location prediction. Hong et al. [91] proposed a Transformer-decoder-based neural network to predict a person's next visited location from historical locations, time, and travel mode, behavioral dimensions often ignored in previous work. Jiang et al. [82] proposed L2MM, a deep-learning-based map-matching model that uses multiple deep models to learn the mapping function from trajectories to corresponding paths, though problems remain in the topological continuity of the mapped paths. Li et al. [92] proposed a coarse-to-fine fast localization framework for dark environments through deep learning and keypoint-based geometric arrangement.
1.3.3 Problem Analysis
The building map is the foundation of indoor location service applications, and the augmented information (text, speech, image, video, etc.) generally needs to be restructured according to scene type (transport, hospital, exhibition hall, supermarket, etc.). At present, producing building-map data with LiDAR and indoor real-scene 3D modeling is costly, and the most pressing issue is how to raise the production level of building maps by drawing on the informatization of the construction industry. In addition, the way building maps organize thematic elements into layers and classes directly affects map matching, as well as the efficient scheduling, loading, and personalized configuration of resources along routes planned toward user points of interest; this is also a challenge on the way to practical mobile phone indoor scene instantiation reality augmentation.
2 Conclusions and Outlook
At present, instance segmentation and AR methods are attracting ever more attention, and existing methods have achieved good recognition and augmentation results. This paper has reviewed scene element instance segmentation and augmented reality methods on smartphones, analyzed the research progress of smartphone video instance segmentation and augmented reality, summarized the characteristics and open problems of the relevant methods, and proposed the research idea of mobile phone scene element instantiation AR: recognize indoor scene elements through instantiation, match them against the building map, and display the instance recognition results as AR augmentation, raising the information level of indoor positioning, navigation, and other location services.
In summary, instance segmentation and AR technology are in a period of rapid development, and the instantiation AR method has broad application prospects in indoor navigation. Instantiation AR belongs to the visual domain and relies mainly on the phone's built-in camera as the sensor, so instance recognition accuracy and augmentation effects are easily affected by scene lighting, phone performance, camera precision, and similar factors; training and optimizing models with universal indoor element samples can improve the flexibility of instantiation AR across scene types. It is foreseeable that continued optimization of algorithms and hardware will raise the processing efficiency of instantiation AR. For mis-recognition and missed recognition in scene recognition, future work will consider adding building road-network trajectory constraints to improve the robustness of indoor scene recognition. Current visual recognition and localization methods still require large image sample libraries; moreover, given the limitations of the various indoor positioning technologies, one can try fusing the phone's built-in multi-source sensors, designing efficient cooperative scheduling schemes, and establishing multi-level coupled associations between different indoor scenes and the feature patterns of the phone's multi-source sensors to further improve the universality of positioning applications. In the future, smartphone instantiation AR methods are expected to be applied at scale to the augmentation of indoor scene element instances, realizing the engineering application of solutions and supporting technologies for video scene element instantiation.
References:
[1] XU S T, ZHENG X W, XIE X, et al. Real-time building instance recognition for vector map and real scene fusion[J]. Geomatics and Information Science of Wuhan University, 2023, 48(4): 542-549.
[2] ROH D, LEE J. Augmented reality-based navigation using deep
learning-based pedestrian and personal mobility user recognitiona
comparative evaluation for driving assistance[J]. IEEE Access,
2023, 11: 62200-62211.
[3] WANG Z. An AR map virtual-real fusion method based on
element recognition[J]. ISPRS International Journal of Geo-
Information, 2023, 12: 126.
[4] CHEN R Z, WANG L, LI D R, et al. A survey on the fusion of the navigation and the remote sensing techniques[J]. Acta Geodaetica et Cartographica Sinica, 2019, 48(12): 1507-1522.
[5] CHEN R Z, QIAN L, NIU X G, et al. Fusing acoustic ranges and inertial sensors using a data and model dual-driven approach[J]. Acta Geodaetica et Cartographica Sinica, 2022, 51(7): 1160-1171.
[6] GAO X, AN H, CHEN W, et al. A survey on mobile augmented reality visualization[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(1): 1-8.
[7] CHEN L Y, LI S B, BAI Q, et al. Review of image classifica-
tion algorithms based on convolutional neural networks[J].
Remote Sensing, 2021, 13(22): 4712.
[8] XU Y S, ZHANG H Z. Convergence of deep convolutional
neural networks[J]. Neural Networks, 2022, 153: 553-563.
[9] BOLYA D, ZHOU C, XIAO F, et al. YOLACT++: better real-
time instance segmentation[J]. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 2022, 44(2): 1108-1121.
[10] WU J, YARRAM S, LIANG H, et al. Efficient video instance
segmentation via tracklet query and proposal[C]//Proceed-
ings of the 2022 IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR), 2022: 949-958.
[11] LI D, LI R, WANG L, et al. You only infer once: cross-modal
meta- transfer for referring video object segmentation[C]//
Proceedings of the AAAI Conference on Artificial Intelli-
gence, 2022: 1297-1305.
[12] KINI J, SHAH M. Tag- based attention guided bottom-up
approach for video instance segmentation[C]//Proceedings
of the 2022 26th International Conference on Pattern Recog-
nition (ICPR), 2022: 3536-3542.
[13] ZHU F, YANG Z, YU X, et al. Instance as identity: a generic
online paradigm for video instance segmentation[J]. arXiv:
2208.03079, 2022.
[14] GANESH P, CHEN Y, YANG Y, et al. YOLO-ReT: towards
high accuracy real-time object detection on edge GPUs[C]//
Proceedings of the 2022 IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV), 2021: 1311-1321.
[15] SHIN A, ISHII M, NARIHIRA T. Perspectives and prospects
on Transformer architecture for cross-modal tasks with lan-
guage and vision[J]. International Journal of Computer Vision,
2022, 130(2): 435-454.
[16] YAO H Y, WAN W G, LI X. End-to-end pedestrian trajectory
forecasting with Transformer network[J]. ISPRS International
Journal of Geo-Information, 2022, 11(1): 44.
[17] TIAN Y L, WANG Y T, WANG J G, et al. Key problems and progress of vision Transformers: the state of the art and prospects[J]. Acta Automatica Sinica, 2022, 48(4): 957-979.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention
is all you need[C]//Proceedings of the 31st International
Conference on Neural Information Processing Systems, Long
Beach, California, USA, 2017: 6000-6010.
[19] WANG Y, XU Z, WANG X, et al. End-to-end video instance
segmentation with Transformers[C]//Proceedings of the 2021
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2020: 8737-8746.
[20] ZHOU D, YU Z, XIE E, et al. Understanding the robustness
in vision Transformers[J]. arXiv:2204.12451, 2022.
[21] WU J, JIANG Y, SUN P, et al. Language as queries for refer-
ring video object segmentation[C]//Proceedings of the 2022
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2022: 4964-4974.
[22] JIN P, MOU L, XIA G S, et al. Anomaly detection in aerial
videos with transformers[J]. IEEE Transactions on Geosci-
ence and Remote Sensing, 2022, 60: 1-13.
[23] LI N, TU W, AI H. A sparse feature matching model using a
Transformer towards large-view indoor visual localization[J].
Wireless Communications and Mobile Computing, 2022:
1243041.
[24] MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision Transformer[J]. arXiv:2110.02178, 2021.
[25] HEO B, YUN S, HAN D, et al. Rethinking spatial dimen-
sions of vision Transformers[C]//Proceedings of the 2021
IEEE/CVF International Conference on Computer Vision
(ICCV), 2021: 11916-11925.
[26] PAN J, BULAT A, TAN F, et al. EdgeViTs: competing light-
weight CNNs on mobile devices with vision Transformers
[C]//Proceedings of the European Conference on Computer
Vision, 2022.
[27] CHEN Y, DAI X, CHEN D, et al. Mobile-former: bridging
MobileNet and Transformer[C]//Proceedings of the 2022
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2022: 5260-5269.
[28] YANG C, QIAO S, YU Q, et al. MOAT: alternating mobile
convolution and attention brings strong vision models[C]//
Proceedings of the International Conference on Learning
Representations, 2023.
[29] TAN M, CHEN B, PANG R, et al. MnasNet: platform-aware
neural architecture search for mobile[C]//Proceedings of the
2019 IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR), 2018: 2815-2823.
[30] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: effi-
cient spatial pyramid of dilated convolutions for semantic
segmentation[C]//Proceedings of the European Conference on
Computer Vision (ECCV 2018), Cham, 2018: 561-580.
[31] MEHTA S, RASTEGARI M, SHAPIRO L G, et al. ESPNetv2:
a light-weight, power efficient, and general purpose convo-
lutional neural network[C]//Proceedings of the 2019 IEEE/
CVF Conference on Computer Vision and Pattern Recogni-
tion (CVPR), 2018: 9182-9192.
[32] MAAZ M, SHAKER A M, CHOLAKKAL H, et al. EdgeNeXt:
efficiently amalgamated CNN-Transformer architecture for
mobile vision applications[C]//Proceedings of the European
Conference on Computer Vision, 2022.
[33] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: effi-
cient convolutional neural networks for mobile vision appli-
cations[J]. arXiv:1704.04861, 2017.
[34] SANDLER M, HOWARD A G, ZHU M, et al. MobileNetV2:
inverted residuals and linear bottlenecks[C]//Proceedings of
the 2018 IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2018: 4510-4520.
[35] HOWARD A G, SANDLER M, CHU G, et al. Searching for
MobileNetV3[C]//Proceedings of the 2019 IEEE/CVF Inter-
national Conference on Computer Vision (ICCV), 2019:
1314-1324.
[36] MEHTA S, RASTEGARI M. Separable self- attention for
mobile vision Transformers[J]. arXiv:2206.02680, 2022.
[37] MA L. Application of AR in 3D model[C]//Proceedings of the
2021 2nd International Conference on Control, Robotics
and Intelligent System, Qingdao, China, 2021: 261-265.
[38] ZHOU Y, SUN B, QI Y, et al. Mobile AR/VR in 5G based
on convergence of communication and computing[J]. Tele-
communications Science, 2018, 34(8): 19-33.
[39] LI R P, ZHAO Z F, ZHOU X, et al. Intelligent 5G: when
cellular networks meet artificial intelligence[J]. IEEE Wire-
less Communications, 2017, 24(5): 175-183.
[40] GHASEMI Y, JEONG H, CHOI S H, et al. Deep learning-
based object detection in augmented reality: a systematic
review[J]. Computers in Industry, 2022, 139: 103661.
[41] HWANG S, LEE J, KANG S. Enabling product recognition
and tracking based on text detection for mobile augmented
reality[J]. IEEE Access, 2022, 10: 98769-98782.
[42] ZHOU B, GUVEN S. Fine-grained visual recognition in mobile
augmented reality for technical support[J]. IEEE Transac-
tions on Visualization and Computer Graphics, 2020, 26(12):
3514-3523.
[43] WANG W, WANG Z Q, ZHAO J J, et al. Research of augmented reality based on mobile platform[J]. Computer Science, 2015, 42(Z11): 510-519.
[44] LE H, NGUYEN M, YAN W Q, et al. Augmented reality
and machine learning incorporation using YOLOv3 and
ARKit[J]. Applied Sciences-Basel, 2021, 11(13): 6006.
[45] LO VALVO A, CROCE D, GARLISI D, et al. A navigation
and augmented reality system for visually impaired people
[J]. Sensors, 2021, 21(9): 3061.
[46] REAL S, ARAUJO A. VES: a mixed-reality system to assist
multisensory spatial perception and cognition for blind and
visually impaired people[J]. Applied Sciences- Basel, 2020,
10(2): 523.
[47] ZHANG X C, YAO X Y, ZHU Y, et al. An ARCore based
user centric assistive navigation system for visually impaired
people[J]. Applied Sciences-Basel, 2019, 9(5): 989.
[48] VARELAS T, PENTEFOUNDAS A, GEORGIADIS C, et al.
An AR indoor positioning system based on anchors[J].
MATTER: International Journal of Science and Technology,
2020, 6: 43-57.
[49] LU F, ZHOU H, GUO L, et al. An ARCore-based augmented
reality campus navigation system[J]. Applied Sciences, 2021,
11(16): 7515.
[50] MARTIN A, CHERIYAN J, GANESH J J, et al. Indoor nav-
igation using augmented reality[J]. EAI Endorsed Transac-
tions on Creative Technologies, 2021: 168718.
[51] HUANG B C, HSU J, CHU E T, et al. ARBIN: augmented
reality based indoor navigation system[J]. Sensors (Basel,
Switzerland), 2020, 20(20): 5890.
[52] ZHOU B, GU Z, MA W, et al. Integrated BLE and PDR
indoor localization for geo- visualization mobile augmented
reality[C]//Proceedings of the 2020 16th International Con-
ference on Control, Automation, Robotics and Vision (ICARCV),
2020: 1347-1353.
[53] MAHAPATRA T, TSIAMITROS N, ROHR A, et al. Pedes-
trian augmented reality navigator[J]. Sensors, 2023, 23: 1816.
[54] SHARIN N A, NOROWI N, ABDULLAH L, et al. GoMap:
combining step counting technique with augmented reality
for a mobile- based indoor map locator[J]. Indonesian Jour-
nal of Electrical Engineering and Computer Science, 2023,
29: 1792.
[55] LI X, TIAN Y, ZHANG F, et al. Object detection in the con-
text of mobile augmented reality[C]//Proceedings of the
2020 IEEE International Symposium on Mixed and Aug-
mented Reality (ISMAR), 2020: 156-163.
[56] KAUL O, BEHRENS K, ROHS M. Mobile recognition and
tracking of objects in the environment through augmented
reality and 3D audio cues for people with visual impair-
ments[C]//Proceedings of the CHI Conference on Human
Factors in Computing Systems, 2021: 1-7.
[57] CHEN J, ZHU Z. Real- time 3D object detection, recogni-
tion and presentation using a mobile device for assistive
navigation[J]. SN Computer Science, 2023, 4: 543.
[58] HSIEH C C, CHEN H M, WANG S K. On-site visual con-
struction management system based on the integration of
SLAM-based AR and BIM on a handheld device[J]. KSCE
Journal of Civil Engineering, 2023, 27: 4688-4707.
[59] VERMA P, AGRAWAL K, SARASVATHI V. Indoor naviga-
tion using augmented reality[C]//Proceedings of the 2020
4th International Conference on Virtual and Augmented
Reality Simulations, 2020.
[60] VERYKOKOU S, BOUTSIA M, IOANNIDIS C. Mobile aug-
mented reality for low-end devices based on planar surface
recognition and optimized vertex data rendering[J]. Applied
Sciences, 2021, 11(18): 8750.
[61] WANG Q, XIE Z. ARIAS: an AR-based interactive advertis-
ing system[J]. PLoS One, 2023, 18: e0285838.
[62] TSUBOKI Y, KAWAKAMI T, MATSUMOTO S, et al. A real-
time background replacement system based on estimated
depth for AR applications[J]. Journal of Information Pro-
cessing, 2023, 31: 758-765.
[63] BARUCH G, CHEN Z, DEHGHAN A, et al. ARKitScenes: a diverse real-world dataset for 3D indoor scene understanding using mobile RGB-D data[J]. arXiv:2111.08897, 2021.
[64] FEIGL T, PORADA A, STEINER S, et al. Localization limi-
tations of ARCore, ARKit, and hololens in dynamic large-
scale industry environments[C]//Proceedings of the 15th
International Conference on Computer Graphics Theory and
Applications, 2020: 307-318.
[65] LIU Z, LAN G, STOJKOVIC J, et al. CollabAR: edge-assisted
collaborative image recognition for mobile augmented reality
[C]//Proceedings of the 2020 19th ACM/IEEE International
Conference on Information Processing in Sensor Networks
(IPSN), 2020: 301-312.
[66] ZHANG W, LIN S, BIJARBOONEH F H, et al. EdgeXAR:
a 6-DoF camera multi-target interaction framework for MAR
with user-friendly latency compensation[C]//Proceedings of
the ACM on Human-Computer Interaction, 2021: 1-24.
[67] XIAO Y, AI T, YANG M, et al. A multi-scale representation
of point-of-interest (POI) features in indoor map visualization[J]. International Journal of Geo-Information, 2020, 9(4): 239.
[68] 李德仁. 论可量测实景影像的概念与应用——从4D产品到5D产品[J]. 测绘科学, 2007, 32(4): 5-7.
LI D R. On concept and application of digital measurable
images-from 4D production to 5D production[J]. Science of
Surveying and Mapping, 2007, 32(4): 5-7.
[69] 朱欣焰,周成虎,呙维,等.全息位置地图概念内涵及其关
键技术初探[J]. 武汉大学学报(信息科学版), 2015, 40(3):
285-295.
ZHU X Y, ZHOU C H, GUO W, et al. Preliminary study on
conception and key technologies of the location-based pan-
information map[J]. Geomatics and Information Science of
Wuhan University, 2015, 40(3): 285-295.
[70] 闾国年,袁林旺,俞肇元.地理学视角下测绘地理信息再
透视[J]. 测绘学报, 2017, 46(10): 1549-1556.
LV G N, YUAN L W, YU Z Y. Surveying and mapping geo-
graphical information from the perspective of geography[J].
Acta Geodaetica et Cartographica Sinica, 2017, 46(10):
1549-1556.
[71] ZHU J, WU P, LEI X. IFC-graph for facilitating building infor-
mation access and query[J]. Automation in Construction,
2023, 148: 104778.
[72] LIU L, LI B, ZLATANOVA S, et al. Indoor navigation sup-
ported by the industry foundation classes (IFC): a survey
[J]. Automation in Construction, 2021, 121: 103436.
[73] WEI Z, LI X, HE Z. Semantic urban vegetation modelling
based on an extended CityGML description[C]//Proceedings
of the 2022 Digital Landscape Architecture Conference,
2022.
[74] TANG L, YING S, LI L, et al. An application-driven LOD
modeling paradigm for 3D building models[J]. ISPRS Journal
of Photogrammetry and Remote Sensing, 2020, 161: 194-207.
[75] DIAKITÉ A, DÍAZ-VILARIÑO L, BILJECKI F, et al.
IFC2INDOORGML: an open-source tool for generating
IndoorGML from IFC[J]. The International Archives of the
Photogrammetry, Remote Sensing and Spatial Information
Sciences, 2022: 295-301.
[76] 罗竟妍. 建筑物实景全息地图模型构建方法研究[D]. 北京: 北京建筑大学, 2021.
LUO J Y. Research on the construction method of realistic
holographic map model of building[D]. Beijing: Beijing Uni-
versity of Civil Engineering and Architecture, 2021.
[77] WANG Q, YE L, YUN L, et al. Pedestrian walking distance
estimation based on smartphone mode recognition[J]. Remote
Sensing, 2019, 11: 1140.
[78] WU Y, CHEN P, GU F, et al. HTrack: an efficient heading-
aided map matching for indoor localization and tracking[J].
IEEE Sensors Journal, 2019, 19(8): 3100-3110.
[79] JIANHUA L, GUOQIANG F, JINGYAN L, et al. Mobile phone
indoor scene features recognition localization method based
on semantic constraint of building map location anchor[J].
Open Geosciences, 2022, 14: 1268-1289.
[80] 刘建华.手机室内导航与位置服务[M]. ResearchGate,
2022: 69-79.
LIU J H. Mobile indoor navigation and location services
[M]. ResearchGate, 2022: 69-79.
[81] 于娟,杨琼,鲁剑锋,.高级地图匹配算法:研究现状和
趋势[J]. 电子学报, 2021, 49(9): 1818-1829.
YU J, YANG Q, LU J F, et al. Advanced map matching algo-
rithms: a survey and trends[J]. Acta Electronica Sinica, 2021,
49(9): 1818-1829.
[82] JIANG L, CHEN C, CHEN C. L2MM: learning to map match-
ing with deep models for low- quality GPS trajectory data
[J]. ACM Transactions on Knowledge Discovery from Data,
2022, 17: 1-25.
[83] 郑诗晨,盛业华,吕海洋.基于粒子滤波的行车轨迹路网
匹配方法[J]. 地球信息科学学报, 2020, 22(11): 2109-2117.
ZHENG S C, SHENG Y H, LV H Y. Vehicle trajectory-map
matching based on particle filter[J]. Journal of Geo-information
Science, 2020, 22(11): 2109-2117.
[84] OBRADOVIC D, LENZ H, SCHUPFNER M. Fusion of map
and sensor data in a modern car navigation system[J]. Jour-
nal of VLSI Signal Processing Systems for Signal Image &
Video Technology, 2006, 45(1/2): 111-122.
[85] 毛江云,吴昊,孙未未 .路网空间下基于马尔可夫决策过
程的异常车辆轨迹检测算法[J]. 计算机学报, 2018, 41(8):
1928-1942.
MAO J Y, WU H, SUN W W. Vehicle trajectory anomaly
detection in road network via Markov decision process[J].
Chinese Journal of Computers, 2018, 41(8): 1928-1942.
[86] CUI G, BIAN W, WANG X. Hidden Markov map matching
based on trajectory segmentation with heading homogeneity
[J]. GeoInformatica, 2021, 25(1): 179-206.
[87] HARDER D, SHOUSHTARI H, STERNBERG H. Real-time
map matching with a backtracking particle filter using geospatial analysis[J]. Sensors, 2022, 22(9): 3289.
[88] GUO G, YAN K, LIU Z, et al. Virtual wireless device-
constrained robust extended Kalman filters for smartphone
positioning in indoor corridor environment[J]. IEEE Sen-
sors Journal, 2023, 23(3): 2815-2822.
[89] FENG J, LI Y, ZHAO K, et al. DeepMM: deep learning based
map matching with data augmentation[J]. IEEE Transac-
tions on Mobile Computing, 2022, 21(7): 2372-2384.
[90] HONG Y, ZHANG Y, SCHINDLER K, et al. Context-aware
multi-head self-attentional neural network model for next location prediction[J]. arXiv:2212.01953, 2022.
[91] HONG Y, MARTIN H, RAUBAL M. How do you go where?
Improving next location prediction by learning travel mode
information using transformers[C]//Proceedings of the 30th
International Conference on Advances in Geographic Infor-
mation Systems, 2022: 1-10.
[92] LI Q, CAO R, ZHU J, et al. Learn then match: a fast coarse-
to-fine depth image-based indoor localization framework for
dark environments via deep learning and keypoint-based
geometry alignment[J]. ISPRS Journal of Photogrammetry
and Remote Sensing, 2023, 195: 169-177.
刘建华,等:手机室内场景要素实例化现实增强方法研究进展 69