Summarization of ICU Patient Motion from Multimodal Multiview Videos
Carlos Torres    Kenneth Rose    Jeffrey C. Fried*    B.S. Manjunath
University of California Santa Barbara *Santa Barbara Cottage Hospital
{carlostorres, rose, manj}@ece.ucsb.edu jfried@sbch.org
Abstract
Clinical observations indicate that during critical care in hospitals, patients' sleep positioning and motion affect recovery. Unfortunately, there is no formal medical protocol to record, quantify, and analyze patient motion. A small number of clinical studies use manual analysis of sleep poses and motion recordings to support the medical benefits of patient positioning and motion monitoring. Manual processes are not scalable, are prone to human error, and strain an already taxed healthcare workforce. This study introduces DECU (Deep Eye-CU): an autonomous multimodal multiview system, which addresses these issues by autonomously monitoring healthcare environments and enabling the recording and analysis of patient sleep poses and motion. DECU uses three RGB-D cameras to monitor patient motion in a medical Intensive Care Unit (ICU). The algorithms in DECU estimate pose direction at different temporal resolutions and use keyframes to efficiently represent pose transition dynamics. DECU combines deep features computed from the data with a modified version of the Hidden Markov Model to more flexibly model sleep pose durations, analyze pose patterns, and summarize patient motion. Extensive experimental results are presented. The performance of DECU is evaluated in ideal (BC: Bright and Clear/occlusion-free) and natural (DO: Dark and Occluded) scenarios at two motion resolutions in a mock-up and a real ICU. The results indicate that deep features allow DECU to match the classification performance of engineered features in BC scenes and increase the accuracy by up to 8% in DO scenes. In addition, the overall pose history summarization tracing accuracy shows an average detection rate of 85% in BC and 76% in DO scenes. The proposed keyframe estimation algorithm allows DECU to reach an average 78% transition classification accuracy.
Keywords: Healthcare, Multimodal, Multiview, Deep
Features, Hidden Markov Models, Multimodal Emission,
Pose Transitions, ICU Monitoring, Motion Summarization.
1. Introduction
The recovery rates of patients admitted to the ICU with similar conditions vary vastly and often inexplicably [4]. ICU patients spend most of their time in a bed, cycling through various decubitus positions. The rate and range of patient motion are believed to be indicators of distress and of increased or decreased recovery [25]. Although patients are continuously monitored by staff, there are no procedures to reliably analyze and understand pose variations from patient observations such as videos. Nevertheless, limited clinical studies [11] suggest that health-dependent patient positioning and controlled motion enhance recovery, while inadequate poses and uncontrolled or erratic motion aggravate wounds and injuries.
While recording and analyzing the motion of patients using human observers is a straightforward solution, it puts strain on an already taxed healthcare workforce. It does not scale with the volume of the data and is prone to human error. This work introduces DECU, a multimodal multiview autonomous system for patient positioning and motion analysis. DECU enables the following analytical features for healthcare:
1. motion quantification (rate and range) to aid the analysis and prevention of decubitus ulcers (DUs) or bed sores;
2. timely detection of erratic, harmful, or distressed motion that can be used to stop patients from pulling intravenous lines or falling off the bed; and
3. summarization of pose sequences (pose history) over extended periods of time, which can be used to evaluate sleep quality without intrusive equipment.
DECU incorporates algorithms for keyframe extraction and pose (state) duration estimation to autonomously and unobtrusively monitor patient motion at different temporal resolutions. DECU combines deep features from multimodal multiview data with Hidden Semi-Markov Models (HSMMs) to more flexibly model pose durations. DECU extracts keyframes from multiple sources to reliably represent transitions and to monitor and summarize patient motion.
DECU is designed, trained, and tested in a mock-up ICU and tested in a real ICU. Fig. 1 shows the major elements of the framework (stages A-H). Stage A (top right) contains the references. Stage B (bottom left) shows frames from a sample sequence recorded using multimodal (RGB and Depth) multiview (three cameras) sources. At stage C, the framework selects the summarization resolution and activates the keyframe identification stage (for training). Stage D contains the motion thresholds (dense optic flow estimated at training) used to distinguish between motion types and to account for depth sensor noise. Deep features are extracted at stage E. Stage F shows the keyframe computation, which compresses motion and encodes motion segments (encoding the durations of poses and transitions). Stage G shows the multimodal multiview HMM trellis with two scene conditions. Finally, stage H shows the results: pose history and pose transition summarizations.
DECU summarization is evaluated in ideal (BC: Bright and Clear/occlusion-free) and natural (DO: Dark and Occluded) scenarios at two motion resolutions in a mock-up and a real ICU. Experimental results indicate that using deep features for pose representation allows DECU to match the classification performance of engineered features in BC scenes and increases the accuracy by up to 8% in DO scenes. The overall pose history summarization (coarser time resolution) tracing accuracy shows an average detection rate of 85% in BC scenes and 76% in DO scenes. The performance of pose transition summarization (finer time resolution) depends directly on the range of motion, the dissimilarity between poses, and the direction of rotation. The proposed multimodal multiview keyframe estimation algorithm allows DECU to reach a mean transition classification accuracy of 78% using a maximum of five pseudo-poses (keyframes) to represent a transition.
1.1. Background
In August 2016, Harvard Medical School published a report stating that monitoring ICUs can save up to $15 billion by saving $20,000 in each of the 750,000 ICU beds in the U.S. through preventive care and by reducing the effects of preventable ICU-related conditions such as poor quality of sleep and DUs. The U.S. Department of Health and Human Services¹ states that U.S. ICU expenditure is about $130 million per year and, at its current state, rises by $5 billion per year. ICUs in the U.S. receive about five million patients per year. The average ICU stay is 9.3 days, and patient mortality rates range from 10 to 30% depending on health conditions.

Clinical studies covering sleep analysis indicate that sleep hygiene directly impacts healthcare. In addition, quality of sleep and effective patient rest are correlated with shorter hospital stays, increased recovery rates, and decreased mortality rates. Clinical applications that correlate body pose and movement to medical conditions include sleep apnea, where obstructions of the airway are affected by supine positions [16]. Pregnant women are advised to sleep on their sides to improve fetal blood flow [10]. The findings of [2], [8], and [26] correlate sleep positions with quality of sleep and its various effects on patient health. Decubitus ulcers (bed sores) appear on bony areas of the body and are caused by sustained decubitus positions². Although harmful, bed sores can be prevented by manipulating patient poses over time. Standards of care require that patients be rotated every two hours. However, this protocol has very low compliance, and a very high number of ICU patients in the U.S. develop DUs [18]. There is little understanding of the set of poses and pose durations that cause or prevent DU incidence. Studies that analyze pose durations, rotation frequency, rotation range, and the duration of weight/pressure off-loading are required, as are the non-obtrusive measuring tools to collect and analyze the relevant data. Additional studies analyze the effects of pose manipulation on the treatment of severe acute respiratory failure, such as ARDS (Adult Respiratory Distress Syndrome), pneumonia, and hemodynamics in patients with various forms of shock. These examples highlight the importance of DECU's autonomous patient monitoring and summarization tasks. They accentuate the need for, and the challenges faced by, the framework, which must be capable of adapting to hospital environments and supporting existing infrastructure and standards of care.

¹U.S. Department of Health and Human Services, online report, Feb. 2016.
²Online Medical Dictionary.
1.2. Related Work
There is a large body of research that focuses on recognizing and tracking human motion. The latest developments in deep features and convolutional neural network architectures achieve impressive performance; however, they require large amounts of data [3], [24], [1], [23]. These methods tackle the recognition of actions performed at the center of the camera plane, except for [19], which uses static cameras to analyze actions. The method in [19] allows actions to be off-center in the image plane; however, it requires scenes with good illumination and no occlusions. At its current stage of development, the DECU framework cannot collect the large number of samples necessary to train a deep network without disrupting the hospital.

Multi-sensor and multi-camera systems and methods have been applied to smart environments [6], [27]. These systems require alterations to existing infrastructure, making their deployment in a hospital logistically impossible. The methods are not designed to account for illumination variations and occlusions, and they do not account for non-sequential, subtle motion. Therefore, these systems and methods cannot be used to analyze patient motion in a real ICU, where patients have limited or constrained mobility and the scenes have random occlusions and unpredictable illumination.

Figure 1: Stages of the DECU framework, which uses multimodal multiview (MM) data and modified Hidden Semi-Markov Models to monitor patient motion. From left to right (A to H): the set of references is shown in stage A (top-left); (A1) a dictionary of poses and pose transitions, and (A2) a lattice showing possible motion dynamics between poses. Stage B (bottom-left) shows the multimodal multiview input video. Stage C (center-left) selects the summarization resolution and activates keyframe identification when required. Stage D (center) integrates the motion thresholds (estimated at training) to account for various levels of motion resolution and sensor noise. Stage E (bottom-center) represents the feature extraction block via a convolutional neural network. Stage F (center-right) shows the keyframe identification process using Algorithm 1. Stage G (bottom-right) shows the multimodal multiview HMM trellis, which encodes illumination and occlusion variations. Stage H (top-right) shows the two possible summarization outputs: (H1) pose history and (H2) pose transitions.
Healthcare applications of pose monitoring include the detection and classification of sleep poses in controlled environments [7]. Static pose classification in a range of simulated healthcare environments is addressed in [22], where the authors use modality trust and RGB, Depth, and Pressure data. In [21], the authors introduce a coupled-constrained optimization technique that allows them to remove the pressure sensor and increase pose classification performance. However, neither method analyzes poses over time or pose transition dynamics. A pose detection and tracking system for rehabilitation is proposed in [12]. The system is developed and tested in ideal scenarios and cannot be used to detect constrained motion. In [13], a controlled study focuses on workflow analysis by observing surgeons in a mock-up operating room. A single depth camera and Radio Frequency Identification Devices (RFIDs) are used in [9] to analyze workflows in a Neo-Natal ICU (NICU) environment. These studies focus on staff actions and disregard patient motion. A literature search indicates that the DECU framework is the first of its kind: it studies patient motion in a mock-up and a real ICU environment. DECU's technical innovation is motivated by the shortcomings of previous studies. It observes the environment from multiple views and modalities, integrates temporal information, and accounts for challenging natural scenes and subtle patient movements using principled statistics.
1.3. Proposed Approach
DECU is a new framework to monitor patient motion in ICU environments at two motion resolutions. Its elements include time-series analysis algorithms and a multimodal multiview data collection system. The algorithms analyze poses at two motion resolutions (sequences of poses and pose transition directions). The system is capable of collecting and representing poses from multiview multimodal data. The views and modalities are shown in Figure 2 (a) and (b). A sample motion summary is shown in Figure 2 (c). Patients in the ICU are often bed-ridden or immobilized. Overall, their motion can be unpredictable, heavily constrained, slow and subtle, or aided by caretakers. The two resolutions address different medical needs. Pose history summarization is the coarser resolution. It provides a pictorial representation of poses over time (i.e., the history). Applications of the pose history include the prevention and analysis of DUs and the analysis of sleep-pose effects on quality of sleep. Pose transition summarization is the finer resolution: DECU looks at the transitory poses that occur while a patient transitions between two clearly defined sleep poses. Physical therapy evaluation is one application of transition summarization.
Main contributions of this work:
1. An adaptive framework capable of monitoring patient motion at various resolutions. The algorithm detects patient motion behavior and summarizes the sequence of sleep poses as well as the subtle motion and direction between two poses using segments.
2. A non-disruptive and non-obtrusive monitoring system robust to natural healthcare scenarios and conditions such as variable illumination and partial occlusions.
3. An algorithm that effectively compresses sleep pose transitions using a subset of the most informative and most discriminative keyframes. The algorithm incorporates data from all views and modalities to identify keyframes and increase monitoring resolution.
4. A fusion technique that incorporates observations from multiple modalities and views into emission probabilities to leverage complementary information and estimate intermediate poses and transitions over time.
2. DECU System Description
The DECU system is modular and adaptive. It is composed of three nodes, and each node has three modalities (RGB, Depth, and Mask). At the heart of each node is a Raspberry Pi 3 running Linux Ubuntu, which controls a Carmine RGB-D camera³. The units are synchronized using TCP/IP communication. DECU combines information from multiple views and modalities to overcome scene occlusions and illumination changes.
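The paper states only that the three Raspberry Pi nodes are synchronized over TCP/IP. As a hedged illustration of one way such a trigger could work, the sketch below has a coordinator send a shared start timestamp to each node; the hostnames, port, and message format are hypothetical and not part of DECU:

```python
import socket
import time

# Hypothetical node addresses; the paper only states that the three
# Raspberry Pi nodes are synchronized over TCP/IP.
NODES = [("decu-node1", 5005), ("decu-node2", 5005), ("decu-node3", 5005)]

def broadcast_trigger(lead_time_s=0.5):
    """Send a shared future timestamp so all nodes start capturing together."""
    start_at = time.time() + lead_time_s  # small lead time for message delivery
    for host, port in NODES:
        with socket.create_connection((host, port), timeout=1.0) as sock:
            sock.sendall(f"CAPTURE_AT {start_at:.6f}\n".encode())

# Each node would compare `start_at` against its own (e.g., NTP-disciplined)
# clock and begin recording at that instant.
```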
2.1. Multiple modalities (Multimodal).
Multimodal studies use complementary modalities to classify static sleep poses in natural ICU scenes with large variations in illumination and occlusions. This study builds on the findings of those studies (which are also validated in the experimental section) regarding the benefits of multimodal systems.

³Primesense, the manufacturer of Carmine sensors, was acquired by Apple Inc. in 2013; however, similar devices can be purchased from structure.io.
RGB (R). Standard RGB video data provides reliable information to represent and classify human sleep poses in scenes with relatively ideal conditions. However, most people sleep in imperfectly illuminated scenarios, using sheets, blankets, and pillows that block and disturb sensor measurements. The system collects RGB color images of dimensions 640×480 from each actor in each of the scene conditions and extracts pose appearance features representative of the lines of the human body (i.e., limbs and extremities).
Depth (D). Infrared depth cameras can be resilient to illumination changes. The Eye-CU system uses Primesense Carmine devices to collect depth data. The devices are designed for indoor use and acquire images of dimensions 640×480. These sensors use 16 bits to represent pixel intensity values, which correspond to the distance from the sensor to a point in the scene. Their operating distance range is 0.8 m to 3.5 m, and their spatial resolution for scenes 2.0 m away is 3.5 mm along the horizontal (x) and vertical (y) axes and 30 mm along the depth (z) axis. The system uses the depth images to represent the 3-dimensional shape of the poses. The usability of these images, however, depends on depth contrast, which is affected by the deformation properties of the mattress and blanket present in ICU environments.
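As a concrete illustration of these specifications, the sketch below converts a raw 16-bit depth frame to meters and masks readings outside the stated 0.8 m to 3.5 m operating range. The millimeter scaling of the raw values is an assumption typical of PrimeSense-class sensors, not a detail given in the text:

```python
import numpy as np

def depth_to_meters(raw_depth, min_m=0.8, max_m=3.5):
    """Convert a 640x480 uint16 depth frame to meters, masking invalid pixels.

    Assumes raw values encode distance in millimeters (typical for
    PrimeSense-class sensors); out-of-range pixels are set to NaN.
    """
    depth_m = raw_depth.astype(np.float32) / 1000.0  # mm -> m (assumed scaling)
    depth_m[(depth_m < min_m) | (depth_m > max_m)] = np.nan
    return depth_m

# Example: a synthetic frame with one valid and one out-of-range pixel.
frame = np.zeros((480, 640), dtype=np.uint16)
frame[0, 0] = 2000   # 2.0 m -> kept
frame[0, 1] = 4000   # 4.0 m -> outside operating range, masked
print(depth_to_meters(frame)[0, :2])  # [2.0, nan]
```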
2.2. Multiple views (Multiview).
The studies in [21] and [15] show that analyzing actions from multiple views and multiple orientations greatly improves detection. In particular, the studies indicate that multiple views provide algorithmic view and orientation independence.
2.3. Time Analysis (Hidden Semi-Markov Models).
ICU patients are often immobilized or recovering. They move subtly and slowly (very differently from walking or running motion). DECU effectively monitors subtle and abrupt patient motion by breaking the motion cues into segments.
3. Data Collection
Pose data is collected in a mock-up ICU with seven actors and tested in a medical ICU with two real patients. The diagram in Figure 2 (b) shows the top view of the rigged mock-up ICU room and the camera views. In the mock-up ICU, actors are asked to follow the same test sequence of poses. The sequence is set at random using a random number generator. Figure 2 (c) shows a sequence of 20 observations, which include ten poses (p1 to p10) and ten transitions (t1 to t10) with random transition directions.

All actors in the mock-up ICU are asked to assume and hold each of the poses while data is being recorded from multiple modalities and views. A total of 28 sessions are recorded: 14 under ideal conditions (BC: bright and clear) and 14 under challenging conditions (DO: dark and occluded). The annotated dataset will be available at vision.ece.ucsb.
3.1. ICU Rooms Infrastructure.
The DECU system and its algorithmic elements are designed, tested, and refined in a mock-up ICU with actors and simulated hospital scenarios. Once ready for real-world testing, DECU is deployed in a medical ICU with real patients and hospital scene conditions, where medical experts evaluate its benefits.

The mock-up ICU room. The mock-up ICU room allows researchers to collect data, design and test algorithms, and evaluate and refine the DECU system and algorithms. Three views of the mock-up ICU are shown in Figure 2.

The real ICU room. DECU is currently deployed in a real ICU at a local community hospital, where medical experts validate its benefits and performance and explore its applications. The system nodes are battery powered, and the three nodes account for unexpected occlusions and illumination changes. Views of the medical ICU are shown in Figure 3.
3.2. Pose Transitions.
The actors follow the sequence of poses and transitions shown in Stage A of Figure 1. Each initial pose has 10 possible final poses (inclusive), and each final pose can be reached by rotating left or right. The combination of pose pairs and transition directions generates a set of 20 sequences for each initial pose. There are 10 possible initial poses. One actor and one recording session generate 200 sequence pairs.
3.3. Feature Selection.
Previous findings indicate that engineered features such as geometric moments (gMOMs) and histograms of oriented gradients (HOG) are suitable for the classification of sleep poses. However, these features are limited in their ability to represent body configurations in dark and occluded scenarios. The latest developments in deep learning and feature extraction led this study to consider deep features extracted from the VGG [17] and Inception [20] architectures. Experimental results (see Sec. 5.1) indicate that Inception features perform better than gMOM, HOG, and VGG features. Parameters for gMOM and HOG extraction are obtained from [22]. Background subtraction and calibration procedures from [5] are applied prior to feature extraction.
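The extraction pipeline itself is not specified in the text. As a hedged sketch, one common way to obtain Inception-style deep features is a pretrained torchvision Inception v3 with the classifier head replaced by an identity, yielding 2048-dimensional pooled activations; the specific variant, weights, and preprocessing used here are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Pretrained Inception v3 as a fixed feature extractor (one plausible choice;
# the paper does not state which Inception variant or weights were used).
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
model.fc = nn.Identity()   # drop the classifier; keep 2048-d pooled features
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),              # Inception v3 input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_feature(image_path):
    """Return a 2048-d deep feature vector for one RGB (or rendered depth) frame."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).squeeze(0)  # shape: (2048,)
```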
4. Problem Description
Patients in the ICU spend most of their time in bed, and their motion is limited to a small set of poses. Practitioners manipulate patient poses to prevent DUs, evaluate sleep hygiene, and enhance recovery rates, among other objectives. DECU uses videos from multiple views and modalities to monitor patient poses and their transitions. However, it is necessary that the system and the algorithms properly handle motion rates (speed) and motion ranges. For instance, pose history summarization analyzes patient poses over a long period of time (e.g., eight hours) at a very low sampling rate. Pose transition summarization is another example: the summarization and analysis involve identifying the set of pseudo-poses associated with a transition and quantifying the direction of rotation. There are three main challenges in DECU. The first arises because conventional algorithms are unable to model pose durations effectively. The second involves detecting the direction of motion and rotation when transitioning between poses. The third involves the accurate representation of pseudo-poses and keyframe estimation. These challenges and the corresponding approaches are discussed in this section.
The DECU system uses $M$ multimodal cameras stationed at different locations to obtain $V$ views of the patients and estimate pose transitions such as the one shown in Figure 2 (c). Note that there are two directions of rotation for a patient or actor to transition from the faller facing up (falU) position to the fetal laying left (fetL) position. Features extracted from the video frames $F = \{f_t\}$, for $1 \le t \le T$, are used to construct feature vectors $X = X_{1:T}$ that represent the non-directly observable poses $Y = Y_{1:T}$. The first objective of DECU is to find the sequence of poses $Y = Y_{1:T}$ that can best represent the observations probabilistically, i.e., $\Pr(Y, X) = \Pr(Y_{1:T}, X_{1:T})$.

Temporal patterns caused by sleep-pose transitions are simulated and analyzed using Hidden Semi-Markov Models (HSMMs), as described in Section 4.2.3. The interactions between the modalities needed to accurately represent a pose using different sensor measurements are encoded into the emission probabilities. Scene conditions are encoded into the set of states (the analysis of two scenes doubles the number of poses). Conventional Markov assumptions support DECU and ideally fit most of its analysis. However, HMMs are limited in their ability to distinguish between poses and pseudo-poses (i.e., transitory, short-duration body configurations observed when transitioning between poses) based on pose duration. This is because, by design, HMMs model the probability of staying in a given pose as a geometric distribution $\Pr_i(d) = (a_{ii})^{d-1}(1 - a_{ii})$, where $d$ is the duration in pose $i$ and $a_{ii}$ is the self-transition probability of pose $i$. More details are discussed in this section and subsequent subsections. Table 1 describes the DECU variables.
Figure 2: The transition data is collected in a mock-up ICU and a real ICU: (a) shows the relative position of the cameras with respect to the ICU room and ICU bed; (b) shows a set of randomly selected poses and pose transitions, which are represented by lines (dashed, dotted, and solid lines defined in the legend box); (c) shows a sample set of sleep-pose transitions and rotation directions.

Figure 3: Top view of the node locations (center of the image) and views of the real medical ICU room and an ICU patient.
4.1. Hidden Markov Models (HMMs)
HMMs are a generative approach that models the various poses (pose history) and pseudo-poses (pose transition summarization) as states. The hidden variable or state at time step $k$ (i.e., $t = k$) is $y_k$ (state$_k$ or pose$_k$), and the observable or measurable variables ($x^{(v)}_{k,m}$, the vector of image features extracted from the $k$-th frame, the $m$-th modality, and the $v$-th view) at time $t = k$ are collected into $x_k$ (i.e., $x_k = x^{(v)}_{k,m} = \{R_k, D_k, \ldots, M_k\}$). The first-order Markov assumption indicates that at time $t$ the hidden variable $y_t$ depends only on the previous hidden variable $y_{t-1}$. At time $t$ the observable variable $x_t$ depends on the hidden variable $y_t$.
DECU VARIABLES

SYMBOL            DESCRIPTION
$A$               Transition probability matrix; $A \in \mathbb{R}^{|P| \times |P|}$ and $A = \{a_{ij}\}$
$a_{ij}$          Probability of transitioning from pose $i$ to pose $j$
$B$               Emission probability matrix $\in \mathbb{R}^{|P|}$ and $B = \{\mu_{in}\}$
$b_u$             Beginning of the $u$-th segment, with $b_1 = 1$
$D_k$             $k$-th frame from the depth modality video
D                 Face-Down patient pose
$d$               Segment duration
$d_u$             Segment duration for the $u$-th segment
HMM               Abbreviation for Hidden Markov Model
HSMM              Abbreviation for Hidden Semi-Markov Model
$K$               Data set size, $K = |X|$
$k$               Data point index, $1 \le k \le K$; also the time step index (i.e., $k = t$)
$KF$              Set of keyframes representing a pose transition
L                 Laying-Left patient pose
$l, m, n$         Dummy variables
$R_k$             $k$-th frame from the RGB modality video
R                 Laying-Right patient pose
$\mu_i$           Probability that state $i$ generates the observation $x$ at time $t$
$\pi$             Initial state probability vector $\in \mathbb{R}^{|P|}$, with $\pi_i \in \pi$
$P$               Set of patient poses $P = \{p_i\}$
$P_{mock}$        Set of actor poses in the mock-up ICU
$P_{micu}$        Set of patient poses in the medical ICU (micu)
$\Pr(Y, X)$       Joint probability distribution between states and observations
$S$               Set of time segments $S = \{s_u\}$ for $1 \le u \le U$
$s$               Segment element $s \in S$
$t$               Time tick with $1 \le t \le T$
$\tau_{t,d}$      Stores the estimated duration ($1 \le d \le D$) at time $t$
$\theta$          HMM model with probabilities $A$, $B$, and $\pi$; also a dummy variable used in inference
$U$               Number of segments, $U = |S|$
U                 Face-Up patient pose
$u$               Segment index: $1 \le u \le U$
$\mathcal{V}$     View set $\mathcal{V} = \{\text{left}, \text{center}, \text{right}\}$
$V$               Number of views, $V = |\mathcal{V}|$
$v$               View index, $1 \le v \le V$
$y_k$             $k$-th hidden state, $y_k \in Y$
$Y$               Sequence of hidden states, $|Y| = T$
$X$               Dataset indexed by $k$ (i.e., $X_k$)
$X_k$             $k$-th data point with $\{f_{N_m}\}_k = \{f_R, f_D, f_P\}_k$
$x_k$             $k$-th observation feature vector
$x^{(v)}_{k,m}$   The $k$-th observable variable from view $v$ and modality $m$
$\delta$          Kronecker delta function
$\delta_t$        The maximum probability duration
$\zeta$           Stores the state label (for a pose) of the previous segment
$\phi$            Stores the best duration
$\psi_t(i)$       Stores the label with the best duration for time $t$ and state $i$

Table 1: DECU variable symbols and their descriptions.
This information is used to compute $P(Y, X)$ via:

$$P(Y_{1:T}, X_{1:T}) = P(y_1) \prod_{t=1}^{T} P(x_t \mid y_t) \prod_{t=2}^{T} P(y_t \mid y_{t-1}) \qquad (1)$$

where $P(y_1)$ is the initial state probability distribution ($\pi$). It represents the probability of the sequence starting ($t = 1$) at pose$_i$ (state$_i$). $P(x_t \mid y_t)$ is the observation or emission probability distribution ($B$) and represents the probability that at time $t$ pose$_i$ (state$_i$) generates the observable multimodal multiview vector $x_t$. Finally, $P(y_t \mid y_{t-1})$ is the transition probability distribution ($A$) and represents the probability of going from pose$_i$ to pose$_o$ (state $i$ to state $o$). The HMM parameters are $A = \{a_{ij}\}$, $B = \{\mu_{in}\}$, and $\pi = \{\pi_i\}$, discussed below.
Initial State Probability Distribution ($\pi$). The initial pose probabilities are obtained from [8] and adjusted to simulate the two scenes considered in this study. The scene-dependent initial state probabilities $\pi$ are shown in Table 2.

State Transition Probability Distribution ($A$). The transition probabilities are estimated from one pose to the next for the Left (L) and Right (R) rotation directions, as indicated in the results of Figs. 10 and 11.

Emission Probability Distribution ($B$). The scene information is encoded into the emission probabilities. This information serves to model moving from one scene condition to the next, as shown in Figure 4. The trellis shows two scenes, which doubles the number of hidden states. The alternating blue and red lines (or solid and dashed lines) indicate transitions from one scene to the next.
One limitation of HMMs is their lack of flexibility in modeling pose and transition (pseudo-pose) durations. Given an HMM in a known pose or pseudo-pose, the probability that it stays there for $d$ time slices is $P_i(d) = (a_{ii})^{d-1}(1 - a_{ii})$, where $P_i(d)$ is the discrete probability density function of duration $d$ in pose $i$ and $a_{ii}$ is the self-transition probability of pose $i$ [14].
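To make this limitation concrete, the short sketch below evaluates the geometric duration model: for any self-transition probability $a_{ii}$, the most probable duration is always $d = 1$, so a conventional HMM cannot favor the multi-second pose durations seen in practice (a minimal illustration, not DECU code):

```python
def geometric_duration_pmf(a_ii, d):
    """P_i(d) = a_ii^(d-1) * (1 - a_ii): probability of staying d steps in pose i."""
    return (a_ii ** (d - 1)) * (1.0 - a_ii)

a_ii = 0.9
for d in (1, 5, 10, 20):
    print(d, round(geometric_duration_pmf(a_ii, d), 4))
# The PMF decays monotonically (0.1, 0.0656, 0.0387, 0.0135): the mode is
# always d = 1, so typical multi-second pose durations are poorly modeled.
```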
4.2. Hidden Semi-Markov Models (HSMMs)
HSMMs are derived from conventional HMMs to provide state duration flexibility. HSMMs represent hidden variables as segments, which have useful properties. Figure 5 shows the structure of the HSMM and its main components. The sequence of states $y_{1:T}$ is represented by the segments ($S$). A segment is a sequence of unique, sequentially repeated symbols. The segments contain the information needed to identify when an observation is first detected and its duration based on the number of observed samples. The elements of the $j$-th segment $S_j$ are the index (from the original sequence) where the observation is first detected ($b_j$), the number of sequential observations of the same symbol ($d_j$), and the state or pose ($y_j$). For example, the sequence $y_{1:8} = \{1,1,1,2,2,1,2,2\}$ is represented by the set of segments $S_{1:J} = \{S_1, S_2, S_3, S_4\} = \{(1,3,1), (4,2,2), (6,1,1), (7,2,2)\}$. The letter $J$ is the total number of segments and the total number of state changes. The elements of the segment $S_1 = (1,3,1)$ are, from left to right: the index of the start of the segment (in the sequence $y_{1:8}$), the number of times the state is observed, and the symbol.
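For concreteness, a minimal sketch of this segment encoding as run-length encoding of a state sequence, reproducing the example above (1-based start indices as in the paper):

```python
def to_segments(y):
    """Encode a state sequence into segments (b, d, y): 1-based start, duration, state."""
    segments = []
    start = 0
    for t in range(1, len(y) + 1):
        if t == len(y) or y[t] != y[start]:
            segments.append((start + 1, t - start, y[start]))
            start = t
    return segments

print(to_segments([1, 1, 1, 2, 2, 1, 2, 2]))
# [(1, 3, 1), (4, 2, 2), (6, 1, 1), (7, 2, 2)] -- matches the example above
```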
4.2.1 HSMM elements
The hidden variables are the segments $S_{1:U}$, and the observable variables are the features $X_{1:T}$. Their joint probability is:
Initial State Probability: $\pi = \{\pi_i\}$

Pose Name      Acronym  Symbol  State (BC)  Probability  State (DO)  Probability
Soldier Up     solU     p1      s1          0.03         s11         0.02
Fetal Right    fetR     p2      s2          0.145        s12         0.07
Fetal Left     fetL     p3      s3          0.145        s13         0.07
Log Right      logR     p4      s4          0.05         s14         0.03
Soldier Down   solD     p5      s5          0.02         s15         0.01
Yearner Left   yeaL     p6      s6          0.04         s16         0.02
Log Left       logL     p7      s7          0.05         s17         0.03
Faller Down    falD     p8      s8          0.05         s18         0.02
Faller Up      falU     p9      s9          0.05         s19         0.03
Yearner Right  yeaR     p10     s10         0.04         s20         0.02
Other          other    p0      s0          0.036        s0          0.073

Table 2: Initial probability for each of the 10 poses. Notice that poses facing up have a higher probability than poses facing down, while left- and right-facing poses are equally probable. Note that there is a category for poses not covered in this study, identified by the label Other and the symbol p0. Also note that one pose can map to two states based on the BC and DO scene conditions.
Figure 4: Multimodal multiview Hidden Markov Model (mmHMM) trellis. The variation in scene illumination between night and day is an example of a scene change.
$$P(S_{1:U}, X_{1:T}) = P(Y_{1:U}, b_{1:U}, d_{1:U}, X_{1:T})$$
$$= P(y_1)\,P(b_1)\,P(d_1 \mid y_1) \prod_{t=b_1}^{b_1+d_1-1} P(x_t \mid y_1) \;\prod_{u=2}^{U} \Big[ P(y_u \mid y_{u-1})\,P(b_u \mid b_{u-1}, d_{u-1})\,P(d_u \mid y_u) \prod_{t=b_u}^{b_u+d_u-1} P(x_t \mid y_u) \Big] \qquad (2)$$

where $S_{1:U} = \{S_1, S_2, \ldots, S_U\}$ is the sequence of segments, with $S_u = (b_u, d_u, y_u)$: $b_u$ is the start position (a bookkeeping variable to track the starting point of a segment), $d_u$ is the duration, and $y_u$ is the hidden state ($\in \{1, \ldots, Q\}$). The range of time slices starting at $b_u$ and ending at $b_u + d_u$ (exclusive) has state label $y_u$. All segments have a positive duration and completely cover the time span $1{:}T$ without overlap. Therefore, the constraints $b_1 = 1$, $\sum_{u=1}^{U} d_u = T$, and $b_{u+1} = b_u + d_u$ hold.

The transition probability $P(y_u \mid y_{u-1})$ represents the probability of going from one segment's state to the next via:

$$A: \; P(y_u = j \mid y_{u-1} = i) = a_{ij} \qquad (3)$$

The first segment always starts at $b_1 = 1$. Consecutive start positions are calculated deterministically from the previous ones via:
$$P(b_u = m \mid b_{u-1} = n, d_{u-1} = l) = \delta(m, n + l) \qquad (4)$$

where $\delta(i, j)$ is the Kronecker delta function (1 for $i = j$ and 0 otherwise). The duration probability is now given by $P(d_u = l \mid y_u = i) = P_i(l)$. DECU uses $P_i(l) = \mathcal{N}(\mu, \sigma)$.

Figure 5: HSMM diagram indicating the hidden segments $S_j$, indexed by $j$, and their elements $\{b_j, d_j, y_j\}$. The variable $b$ is the first detection in a sequence, $y$ is the hidden layer, and $x$ is the observable layer containing samples from time $b$ to $b + d$. The variables $b$ and $d$ are the observation's detection time (time tick) and duration.
4.2.2 Parameter Learning
Learning is based on maximum likelihood estimation (MLE). The training sequence of keyframes is fully annotated, including the exact start and end frames for each segment ($X_{1:T}$, $Y_{1:T}$). To find the parameters that maximize $P(Y_{1:T}, X_{1:T} \mid \theta)$, one maximizes the likelihood of each of the factors in the joint probability. In particular, the observation probability $P(x_n \mid y = i)$ is a Bernoulli distribution whose maximum likelihood estimate is:

$$\mu_{n,i} = \frac{\sum_{t=1}^{T} x_t^{i}\, \delta(y_t, i)}{\sum_{t=1}^{T} \delta(y_t, i)} \qquad (5)$$

where $T$ is the number of data points and $\delta(i, j)$ is the Kronecker delta function. The transition probability $P(y_t = j \mid y_{t-1} = i)$ is a multinomial distribution with:

$$a_{ij} = \frac{\sum_{n=2}^{N} \delta(y_n, j)\, \delta(y_{n-1}, i)}{\sum_{n=2}^{N} \delta(y_{n-1}, i)} \qquad (6)$$
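A minimal sketch of these count-based maximum likelihood estimates on a fully annotated sequence (the Kronecker deltas become indicator counts; smoothing and the multimodal emission structure are omitted for brevity):

```python
import numpy as np

def estimate_hmm_parameters(y, x, num_states):
    """MLE of transition matrix A (Eq. 6) and per-state emission means (Eq. 5).

    y: length-T numpy array of state labels in {0, ..., num_states-1}
    x: T x F numpy array of (binary or real) feature observations
    """
    T, F = x.shape
    A = np.zeros((num_states, num_states))
    for t in range(1, T):
        A[y[t - 1], y[t]] += 1.0            # count transitions i -> j
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # normalize each row

    mu = np.zeros((num_states, F))
    for i in range(num_states):
        mask = (y == i)
        if mask.any():
            mu[i] = x[mask].mean(axis=0)    # per-state mean of observations
    return A, mu
```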
4.2.3 HSMM Inference
HSMM Viterbi. The segment notation is used to represent state sequences. The inference objective is to find the state sequence that maximizes $P(S_{1:U}, X_{1:T} \mid \theta)$. The duration is not known for a new sequence of observations. The sequence corresponding to the duration with the highest probability is determined at each time step by iterating over all possible durations from 1 to a prefix duration $D$. This information is stored as follows:

$$\tau_{t,d} = \max_{s_1, \ldots, s_{k-1}} P\big(X_{1:t},\, s_{1:k} = (t - d + 1,\, d,\, i) \mid \theta\big) \qquad (7)$$

which represents the highest probability of a sequence of $k$ segments, where the final segment starts at $t - d + 1$, has duration $d$, and has label $i$.

NOTE: just as with conventional HMMs, it is sufficient to keep track of only the maximum probability of ending in state $s_{k-1}$ to effectively compute the maximum probability of ending up in state $s_k$.

The state label (for a pose or pseudo-pose) of the previous segment is stored in the array $\zeta_t(d, i)$. The maximum probability duration ($\delta$) is computed via:

$$\delta_t(i) = \max_{s_1, \ldots, s_{k-1}} P\big(x_{1:t},\, s_{1:k} = (t - d + 1,\, d,\, i) \mid \theta\big) \qquad (8)$$

where $d$ is the duration with the highest probability at time $t$ for state $i$. The best duration is stored in $\phi_t(i)$, and the label of the previous segment is stored in $\psi_t(i)$.
4.2.4 Finding the Best Sequence
The complete procedure for finding the best sequence is as follows.

Initialization: The probability of the label of the first segment is given by the initial state distribution $\pi$:
$$\tau_{t,d} = \pi_i\, P_i(d) \prod_{t'=1}^{d} P(x_{t'} \mid y_{t'}), \qquad \zeta_d(d, i) = 0$$

Recursion: Iterate over all possible durations at each step:
$$\tau_{t,d} = \max_{1 \le i \le Q} \delta_{t-d}(i)\, a_{ij}\, P_j(d) \prod_{m = t-d+1}^{t} P(\vec{x}_m \mid y_m = j)$$
$$\zeta_d(d, i) = \arg\max_{1 \le i \le Q} \delta_{t-d}(i)\, a_{ij}$$

The duration with the highest probability is estimated via:
$$\delta_t(i) = \max_{1 \le d \le D} \delta_{t-d}(i)\, a_{ij},$$
which represents the best segment. The variable $d$ is the duration with the highest probability at time $t$ for state $i$. The best duration for state $i$ at time $t$ is estimated via:
$$\phi_t(i) = \arg\max_{1 \le d \le D} \tau_{d,t}(i).$$
Finally, $\psi_t(i) = \zeta_t(\phi_t(i), i)$ is the label corresponding to the best duration for time $t$ and state $i$.

Termination: Estimate the state with the highest probability in the last time slice:
$$P^{*} = \max_{1 \le i \le Q} [\delta_T(i)], \qquad y^{*}_T = \arg\max_{1 \le i \le Q} [\delta_T(i)], \qquad t = T, \qquad u = 0$$

Backtracking: Starting from the termination, look up the durations and previous states stored in the variables $\phi$ and $\psi$:
$$d^{*}_t = \phi_t(y^{*}_t), \qquad s^{*}_u = \big(t - d^{*}_t + 1,\; d^{*}_t,\; y^{*}_t\big), \qquad t \leftarrow t - d^{*}_t, \qquad u \leftarrow u - 1, \qquad y^{*}_t = \psi_{t+d}(y^{*}_{t+d})$$

Negative indexing is used for the segments because the number of segments is not known in advance. However, this is corrected after inference by adding $|S|$ to all indices.
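A compact sketch of this duration-explicit Viterbi decoding, under stated assumptions: log-space scoring, a generic per-state observation log-likelihood matrix, and forward indexing instead of the negative indexing described above. The variable names mirror the procedure ($\delta$, $\phi$, $\psi$), but this is an illustrative reading, not the DECU implementation:

```python
import numpy as np

def hsmm_viterbi(log_obs, log_pi, log_A, log_dur, D_max):
    """Duration-explicit Viterbi decoding.

    log_obs: T x Q array, log P(x_t | y = i)
    log_pi:  Q,  log initial state probabilities
    log_A:   Q x Q, log transition probabilities a_ij
    log_dur: Q x D_max, log P_i(d) for d = 1..D_max
    Returns the best segment list [(b, d, y), ...] with 1-based starts.
    """
    T, Q = log_obs.shape
    cum = np.vstack([np.zeros(Q), np.cumsum(log_obs, axis=0)])  # prefix sums
    delta = np.full((T, Q), -np.inf)   # best score of a segment ending at t in state i
    phi = np.zeros((T, Q), dtype=int)  # best duration for (t, i)
    psi = np.zeros((T, Q), dtype=int)  # best previous state for (t, i)

    for t in range(T):
        for d in range(1, min(D_max, t + 1) + 1):
            seg_ll = cum[t + 1] - cum[t + 1 - d]      # sum of log P(x | y=i) over the segment
            if d == t + 1:                            # first segment: use pi
                score = log_pi + log_dur[:, d - 1] + seg_ll
                prev = np.zeros(Q, dtype=int)
            else:                                     # extend from a previous segment
                trans = delta[t - d][:, None] + log_A  # (i -> j) scores, shape Q x Q
                prev = trans.argmax(axis=0)
                score = trans.max(axis=0) + log_dur[:, d - 1] + seg_ll
            better = score > delta[t]
            delta[t][better] = score[better]
            phi[t][better] = d
            psi[t][better] = prev[better]

    # Backtracking: recover segments from the best final state.
    segments, t, y = [], T - 1, int(delta[T - 1].argmax())
    while t >= 0:
        d = phi[t, y]
        segments.append((t - d + 2, d, y))  # 1-based start index
        y_prev = psi[t, y]
        t, y = t - d, int(y_prev)
    return segments[::-1]
```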
4.3. Key Frame (KF ) Selection.
The data collected from pose transitions is very large and often repetitive, since the motion is relatively slow and subtle. The pre-processing stage incorporates a keyframe estimation step that integrates multimodal and multiview data. The algorithm used to select a set ($KF$) of $K$ transitory frames is illustrated in Figure 6 and detailed in Algorithm 1. The size of the keyframe set is determined experimentally ($K = 5$) in the feature space using Inception vectors.
Let $X = \{x^{(v)}_{m,n}\}$ be the set of training features extracted from $V$ views and $M$ modalities over $N$ frames, and let $P_i$ and $P_o$ represent the initial and final poses. The transition frames are indexed by $n$, $1 \le n \le N$; the views are indexed by $v$, $1 \le v \le V$; and the modalities are indexed by $m$, $1 \le m \le M$. Algorithm 1 uses this information to identify keyframes. An experimental evaluation of the $|KF|$ size is shown in Figure 7. Keyframes are the most informative and most discriminative frames across all views and modalities.
Input: $X$, the set of multimodal features, and a dissimilarity threshold $th$
Result: $KF = \{\text{key frames}\}_K$, $K \ge 1$
Initialize: $KF = \{\text{empty}\}_K$, $K \ge 1$, and $count = 0$

Stage 1: Modality ($m$) and View ($v$) Selection
    for $1 \le v \le V$ and $1 \le m \le M$ do
        $D^{(v)}_m = \mathrm{euclid}(x^{(v)}_{m n_i}, x^{(v)}_{m n_o})$, with $n_i = 1$, $n_o = N$
    end
    $\hat{v}, \hat{m} = \arg\max D^{(v)}_m > th$
    $\{x^{(\hat{v})}_{\hat{m} n_1}, x^{(\hat{v})}_{\hat{m} n_N}\} \rightarrow KF$

Stage 2: Find Complementary Frames to $KF$
    for $1 \le v \le V$, $1 \le m \le M$, and $1 < n < N$ do
        $D_1 = D^{(v)}_{m,n_1} = \mathrm{euclid}(x^{(v)}_{m n_1}, x^{(v)}_{m n})$
        $D_2 = D^{(v)}_{m,n_N} = \mathrm{euclid}(x^{(v)}_{m n_N}, x^{(v)}_{m n})$
    end
    Sort $D_1 = \{d_1 > d_2 > \ldots > d_{N-2}\}$ in descending order
    Sort $D_2 = \{d_1 > d_2 > \ldots > d_{N-2}\}$ in descending order
    $d_i \rightarrow KF$ if $d_i / d_j > th$, for $1 < i, j < N - 2$

Stage 3: Find the Center Frame (i.e., the Motion Peak)
    for $KF_2$ and $KF_{K-1}$ do
        Use Stage 2 to compute $D_3$ and $D_4$
        if $\max(D_3, D_4) > 0$ then
            $\max(D_3, D_4) \rightarrow KF$
        end
    end

Algorithm 1: Multimodal multiview keyframe selection using a Euclidean dissimilarity measure. The algorithm is applied at training time with labeled frames to estimate the number and indexes of keyframes across views and modalities.
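Because the extracted listing above is only partially recoverable, the sketch below gives a hedged Python rendering of the same idea: pick the view/modality pair whose endpoint features are most dissimilar, seed $KF$ with the endpoint frames, and then greedily add intermediate frames that are far (in Euclidean distance) from the frames already selected. The greedy criterion is our reading of Stages 2-3 restricted to the selected view/modality, not a verbatim port of Algorithm 1:

```python
import numpy as np

def select_keyframes(X, K=5):
    """Greedy multimodal multiview keyframe selection (a sketch of Algorithm 1).

    X: dict mapping (view, modality) -> N x F feature array for one transition
    K: number of keyframes (the paper finds K = 5 experimentally)
    Returns a sorted list of frame indexes.
    """
    # Stage 1: choose the (view, modality) whose first/last frames differ most.
    def endpoint_gap(feats):
        return np.linalg.norm(feats[0] - feats[-1])
    best_key = max(X, key=lambda k: endpoint_gap(X[k]))
    feats = X[best_key]
    N = len(feats)
    keyframes = [0, N - 1]                      # seed with the two endpoint frames

    # Stages 2-3: greedily add the frame most dissimilar to current keyframes
    # (farthest-point selection), approximating the motion-peak criterion.
    while len(keyframes) < min(K, N):
        dists = [min(np.linalg.norm(feats[n] - feats[k]) for k in keyframes)
                 for n in range(N)]
        keyframes.append(int(np.argmax(dists)))
    return sorted(set(keyframes))

# Usage with random stand-in features from 3 views x 2 modalities:
rng = np.random.default_rng(0)
X = {(v, m): rng.normal(size=(30, 2048)) for v in range(3) for m in range(2)}
print(select_keyframes(X, K=5))
```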
5. Experimental Results and Analysis
Experiments are performed to validate the feature selection, the keyframe set size (i.e., the number of states) representing a transition, and the summarization performance of DECU in a real and a mock-up ICU environment.
5.1. Static Pose Analysis - Feature Validation
Static sleep-pose analysis is used to compare the DECU method to previous studies. Coupled-Constrained Least-Squares (cc-LS) and DECU are tested on the dataset from [21].
Figure 6: Selection of keyframes for the representation of transitions between two poses. The keyframe selection is based on Algorithm 1. This figure shows an example of how the algorithm is used to identify five keyframes from three views and two modalities. In this example, the first two keyframes are extracted from the RGB video of the first camera (View 1). Subsequent keyframes are selected from the depth video of the second camera (View 2) and from the RGB video of the third camera (View 3).
Feature Suitability Evaluation with cc-LS [21]

Scene   HOG + gMOM   Vgg       Inception
BC      100          100       100
DO      65           69 (+4)   73 (+8)

Table 3: Evaluation of deep features for sleep-pose recognition tasks using the cc-LS method from [21] in dark and occluded (DO) scenes. The performance of HOG and gMOM features is compared to the performance of the Vgg and Inception features.
Combining the cc-LS method with deep features extracted from two common network architectures improved classification performance over the HOG and gMOM features in dark and occluded (DO) scenes by an average of eight percent with Inception and four percent with Vgg. Deep features matched the performance of cc-LS (with HOG and gMOM) in the bright and clear (BC) scenario, as shown in Table 3.
5.2. Key Frame Performance
The keyframe set size ($|KF| = 5$) and the keyframe dissimilarity threshold ($th \approx 0.8$) affect DECU performance. Figure 7 shows the effect of these parameters.
Figure 7: Performance of the DECU framework for fine motion summarization as a function of the number of keyframes used to represent transitions and rotations between poses.
5.3. Summarization Performance
The mock-up ICU allows staging the motion and scene condition variations without disturbing patients in the medical ICU. A sample test sequence is shown in Figure 2 (c), and summarization history results are shown in Figure 9 for (a) the mock-up and (b) the real ICU environments. The pose numerical symbols are shown in Table 4.
5.4. Summarization History
History summarization is the coarser time resolution, and its overall objective is shown in Figure 8.
DECU: Pose History Summarization
Symbol Pose Name
0 Aspiration
+1 / -1 Soldier (+Up / -Down)
+2 / -2 Yearner (+R / -L)
+3 / -3 Log (+R, -L)
+4 / -4 Faller (+Up / -Down)
+5 / -5 Other / Background
+6 / -6 Fetal (+R / -L)
Table 4: Pose symbols and descriptions used for ICU pose
history summarization.
Figure 8: Pose history summarization log for patient motion
analysis in medical ICUs.
Pose History Summarization in the Mock-Up ICU. This summarization requires two parameters: the sampling rate and the pose duration. The experiments are executed with a sampling rate of one second and a pose duration of 10 seconds. A pose is assigned a label if it is consistently detected 80 percent of the time; otherwise, it is assigned the label "other". Poses not consistently detected are ignored (low confidence). The mock-up experiment uses a randomly selected scene and sequence of poses, which can range from two to ten poses. The pose duration is also set at random, and each sequence includes one scene transition (BC to DO or DO to BC). A sample (long) sequence is shown in Figure 2 (c), and its history summarization performance is shown in Table 5 and Figure 9 (a).
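A minimal sketch of this labeling rule, assuming per-second pose detections grouped into 10-second windows; the function name and the "other" fallback follow the description above:

```python
from collections import Counter

def summarize_history(detections, window=10, consistency=0.8):
    """Assign one label per window of per-second pose detections.

    detections: list of per-second pose labels (sampling rate of 1 s)
    Returns one label per 10 s window, or "other" when no pose reaches
    the 80 percent consistency threshold.
    """
    summary = []
    for start in range(0, len(detections) - window + 1, window):
        votes = Counter(detections[start:start + window])
        label, count = votes.most_common(1)[0]
        summary.append(label if count / window >= consistency else "other")
    return summary

# Example: 8/10 agreement -> "fetL"; a 50/50 split -> "other".
print(summarize_history(["fetL"] * 8 + ["solU"] * 2 + ["falU"] * 5 + ["logR"] * 5))
# ['fetL', 'other']
```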
DECU: Pose History Summarization

Scene   Average Detection Rate
BC      85
DO      76

Table 5: Pose history summarization performance (percent accuracy) of the DECU framework in bright and clear (BC) and dark and occluded (DO) scenes. The sequences are composed of 10 poses with durations ranging from 10 seconds to 1 minute. The sampling rate is one second.
Pose Transition Dynamics: Motion Direction. The analysis of pose transitions and rotation directions is important for physical therapy and recovery-rate analysis. The performance of DECU in summarizing fine motion to describe transitions between poses in a bright and clear scene and in a dark and occluded scene is shown in Figs. 10 and 11. Results in each figure are shown for (a) single-view and (b) multiview data. The bottom row (c) shows the gray scale and the color-font legend.
Summarization of Transitions in the Real ICU. Note that it is logistically impossible to control ICU workflows and to account for unpredictable patient motion. ICU patients are not free to rotate, which reduces the set of pose transitions (unavailable transitions are marked N/A). The set of poses for the history summary requires that a new pose be included (aspiration). Figure 8 (b) shows the overall clinical objective behind the pose history summarization.

The real medical ICU environment is shown in Figure 12 (a). DECU's fine motion summarization results for two patients are shown in Figure 12 (b), and the quantified detection accuracies are shown in Figure 12 (c). The blue trace represents the true transition labels, and the red trace indicates the predicted labels. Table 4 lists the pose symbols and descriptions used in the summarization plot.
6. Conclusion
This work introduced the DECU framework to analyze patient poses in natural healthcare environments at two motion resolutions. Extensive experiments and evaluation of the framework indicate that the detection and quantification of pose dynamics are possible. The DECU system and its monitoring algorithms are currently being tested in real ICU environments. The performance results presented in this study support its potential applications and benefits to healthcare analytics. The system is non-disruptive and non-intrusive. It is robust to variations in illumination, view, orientation, and partial occlusions. DECU is non-obtrusive and non-intrusive, but not without a cost. The cost is noticed in the most challenging scenario, where a blanket and poor illumination block sensor measurements. The performance of DECU in monitoring pose transitions in dark and occluded environments is far from perfect; however, most medical applications that analyze motion transitions, such as physical therapy sessions, are carried out under less severe conditions.

Future Work. Future studies will investigate the recognition and analysis of patient motion in similar challenging scenarios using recurrent neural networks, incorporate additional modalities, and integrate natural language understanding to analyze ICU events.
Figure 9: Performance of DECU pose history summarization (a) in the mock-up ICU with bright and clear conditions over a 10-minute time span, and (b) in the real ICU using multimodal data under natural scene conditions over a two-hour time span. Note that the set of patient poses is reduced for the real ICU, and the summarization session is limited to a maximum of two hours to avoid disrupting the Braden-scale protocol.
Figure 10: Performance of DECU in the mock-up ICU under bright and clear conditions. Detection results are obtained using (a) single-view and (b) multiview data. The cells are gray-scaled to indicate detection accuracy. The color-coded scale and the legend are shown in (c). Note that overall detection rates increase with larger rotation angles and decrease when the rotation requires the actors to face the bed (i.e., the cameras record the actors' backs).
Figure 11: Performance of DECU in the mock-up ICU under dark and occluded conditions. Detection results are obtained using (a) single-view and (b) multiview data. The cells are gray-scaled to indicate detection accuracy. The color-coded scale and the legend are shown in (c). Again, detection rates increase with larger rotation angles and decrease when the rotation requires the actors to face the bed (i.e., the cameras record the actors' backs).
Figure 12: Performance of DECU pose transition summarization in the real ICU, shown in (a), using multimodal data under natural scene conditions. The detection scores are shown in (b), where the cells are gray-scaled to indicate detection accuracy. The font color indicates the rotation angle range, and N/A indicates that the pose is not available. The number of poses is reduced due to patient health conditions and inability to move. The grading color scale and font-color legend are shown in (c).
Acknowledgements. This research is funded in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 (the ARL Network Science CTA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The authors want to thank Richard Beswick, PhD (Director of Research), Paula Gallucci (Medical ICU Nurse Manager), Mark Mullenary (Director of Biomedical Engineering), and Leilani Price, PhD (IRB Administrator) from Santa Barbara Cottage Hospital for their help and patience in identifying and recruiting patients and ensuring HIPAA compliance.
References
[1] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt. Sequential deep learning for human action recognition. In Springer Int'l Workshop on Human Behavior Understanding, 2011.
[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.
[3] G. Chéron, I. Laptev, and C. Schmid. P-CNN: Pose-based CNN features for action recognition. In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[4] T. Giraud, J.-F. Dhainaut, J.-F. Vaxelaire, T. Joseph, D. Journois, G. Bleichner, J.-P. Sollet, S. Chevret, and J.-F. Monsallier. Iatrogenic complications in adult intensive care units: a prospective two-center study. Critical Care Medicine, 21(1):40-51, 1993.
[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge Univ. Press, 2nd edition, 2004.
[6] E. Hoque and J. Stankovic. AALO: Activity recognition in smart homes using active learning in the presence of overlapped activities. In IEEE Int'l Conf. on Pervasive Computing Technologies for Healthcare (PervasiveHealth) and Workshops, 2012.
[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.
[8] C. Idzikowski. Sleep position gives personality clue. BBC News (September 16), 2003.
[9] C. Lea, J. Facker, G. Hager, R. Taylor, and S. Saria. 3D sensing algorithms towards building an intelligent intensive care unit. AMIA Summits on Translational Science Proceedings, 2013.
[10] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.
[11] P. E. Morris. Moving our critically ill patients: mobility barriers and benefits. Critical Care Clinics, 2007.
[12] S. Obdržálek, G. Kurillo, J. Han, T. Abresch, R. Bajcsy, et al. Real-time human pose detection and tracking for tele-rehabilitation in virtual reality. Studies in Health Technology and Informatics, 2012.
[13] N. Padoy, D. Mateus, D. Weinland, M.-O. Berger, and N. Navab. Workflow monitoring based on 3D motion features. In IEEE Int'l Conf. on Computer Vision Workshops (ICCV Workshops), 2009.
[14] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989.
[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.
[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.
[17] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[18] L. Soban, S. Hempel, B. Ewing, J. N. Miles, and L. V. Rubenstein. Preventing pressure ulcers in hospitals. Joint Commission Journal on Quality and Patient Safety, 2011.
[19] B. Soran, A. Farhadi, and L. Shapiro. Generating notifications for missing actions: Don't forget to turn the lights off! In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.
[21] C. Torres, V. Fragoso, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Eye-CU: Sleep pose classification for healthcare using multimodal multiview data. In IEEE Winter Conf. on Applications of Computer Vision (WACV), 2016.
[22] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.
[23] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[24] V. Veeriah, N. Zhuang, and G.-J. Qi. Differential recurrent neural networks for action recognition. In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[25] C. L. von Baeyer, M. E. Johnson, and M. J. McMillan. Consequences of nonverbal expression of pain: Patient distress and observer concern. Social Science & Medicine, 1984.
[26] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep-New York Then Westchester, 2006.
[27] C. Wu, A. H. Khalili, and H. Aghajan. Multiview activity recognition in smart homes with spatio-temporal features. In ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2010.