Summarization of ICU Patient Motion from Multimodal Multiview Videos
Carlos Torres    Kenneth Rose    Jeffrey C. Fried*    B.S. Manjunath
University of California Santa Barbara *Santa Barbara Cottage Hospital
{carlostorres, rose, manj}@ece.ucsb.edu jfried@sbch.org
Abstract
Clinical observations indicate that during critical care in hospitals, patients' sleep positioning and motion affect recovery. Unfortunately, there is no formal medical protocol to record, quantify, and analyze patient motion. A small number of clinical studies use manual analysis of sleep poses and motion recordings to support the medical benefits of patient positioning and motion monitoring. Manual processes are not scalable, are prone to human error, and strain an already taxed healthcare workforce. This study introduces DECU (Deep Eye-CU): an autonomous multimodal multiview system, which addresses these issues by autonomously monitoring healthcare environments and enabling the recording and analysis of patient sleep poses and motion. DECU uses three RGB-D cameras to monitor patient motion in a medical Intensive Care Unit (ICU). The algorithms in DECU estimate pose direction at different temporal resolutions and use keyframes to efficiently represent pose transition dynamics. DECU combines deep features computed from the data with a modified version of the Hidden Markov Model to more flexibly model sleep pose durations, analyze pose patterns, and summarize patient motion. Extensive experimental results are presented. The performance of DECU is evaluated in ideal (BC: Bright and Clear/occlusion-free) and natural (DO: Dark and Occluded) scenarios at two motion resolutions in a mock-up and a real ICU. The results indicate that deep features allow DECU to match the classification performance of engineered features in BC scenes and increase the accuracy by up to 8% in DO scenes. In addition, the overall pose history summarization tracing accuracy shows an average detection rate of 85% in BC and 76% in DO scenes. The proposed keyframe estimation algorithm allows DECU to reach an average 78% transition classification accuracy.
Keywords: Healthcare, Multimodal, Multiview, Deep
Features, Hidden Markov Models, Multimodal Emission,
Pose Transitions, ICU Monitoring, Motion Summarization.
1. Introduction
The recovery rates of patients admitted to the ICU with similar conditions vary vastly and often inexplicably [4]. ICU patients spend most of their time in a bed, cycling through various decubitus positions. The rate and range of patient motion are believed to be indicators of distress and of increased or decreased recovery [25]. Although patients are continuously monitored by staff, there are no procedures to reliably analyze and understand pose variations from patient observations such as videos. Nevertheless, limited clinical studies [11] suggest that health-dependent patient positioning and controlled motion enhance recovery, while inadequate poses and uncontrolled or erratic motion aggravate wounds and injuries.
While recording and analyzing the motion of patients using human observers is a straightforward solution, it puts strain on an already taxed healthcare workforce. It does not scale with the volume of the data and is prone to human error. This work introduces DECU, a multimodal multiview autonomous system for patient positioning and motion analysis. DECU enables the following analytical features for healthcare:
1. motion quantification (rate and range) to aid the analysis and prevention of decubitus ulcers (DUs) or bed sores;
2. timely detection of erratic, harmful, or distressed motion that can be used to stop patients from pulling intravenous lines or falling off the bed; and
3. summarization of pose sequences (pose history) over extended periods of time, which can be used to evaluate sleep quality without intrusive equipment.
DECU incorporates algorithms for keyframe extraction and pose (state) duration estimation to autonomously and unobtrusively monitor patient motion at different temporal resolutions. DECU combines deep features from multimodal multiview data with Hidden Semi-Markov Models (HSMMs) to more flexibly model pose durations. DECU extracts keyframes from multiple sources to reliably represent transitions and to monitor and summarize patient motion.
DECU is designed, trained, and tested in a mock-up ICU and tested in a real ICU. Fig. 1 shows the major elements of the framework (stages A-H). Stage A (top right) contains the references. Stage B (bottom left) shows frames from a sample sequence recorded using multimodal (RGB and Depth) multiview (three cameras) sources. At stage C, the framework selects the summarization resolution and activates the keyframe identification stage (for training). Stage D contains the motion thresholds (dense optic flow estimated at training) used to distinguish between motion types and to account for depth sensor noise. Deep features are extracted at stage E. Stage F shows the keyframe computation, which compresses motion and encodes motion segments (encoding the durations of poses and transitions). Stage G shows the multimodal multiview HMM trellis with two scene conditions. Finally, stage H shows the results: pose history and pose transition summarizations.
DECU summarization is evaluated in ideal (BC: Bright and Clear/occlusion-free) and natural (DO: Dark and Occluded) scenarios at two motion resolutions in a mock-up and a real ICU. Experimental results indicate that using deep features for pose representation allows DECU to match the classification performance of engineered features in BC scenes and increases the accuracy by up to 8% in DO scenes. The overall pose history summarization (coarser time resolution) tracing accuracy shows an average detection rate of 85% in BC scenes and 76% in DO scenes. The performance of pose transition summarization (finer time resolution) depends directly on the range of motion, the dissimilarity between poses, and the direction of rotation. The proposed multimodal multiview keyframe estimation algorithm allows DECU to reach a mean transition classification accuracy of 78% using a maximum of five pseudo-poses (keyframes) to represent a transition.
1.1. Background
In August 2016, Harvard Medical School published a report stating that monitoring ICUs can save up to $15 billion by saving $20,000 in each of the 750,000 ICU beds in the U.S. through preventive care and by reducing the effects of preventable ICU-related conditions such as poor quality of sleep and DUs. The U.S. Department of Health and Human Services¹ states that U.S. ICU expenditure is about $130 million per year and, at its current state, rises by $5 billion per year. ICUs in the U.S. receive about five million patients per year. The average ICU stay is 9.3 days, and patient mortality rates range from 10 to 30% depending on health conditions.

Clinical studies covering sleep analysis indicate that sleep hygiene directly impacts healthcare. In addition, quality of sleep and effective patient rest are correlated with shorter hospital stays, increased recovery rates, and decreased mortality rates. Clinical applications that correlate body pose and movement to medical conditions include sleep apnea, where obstructions of the airway are affected by supine positions [16]. Pregnant women are advised to sleep on their sides to improve fetal blood flow [10]. The findings of [2], [8], and [26] correlate sleep positions with quality of sleep and its various effects on patient health. Decubitus ulcers (bed sores) appear on bony areas of the body and are caused by sustained decubitus positions². Although harmful, bed sores can be prevented by manipulating patient poses over time. Standards of care require that patients be rotated every two hours. However, this protocol has very low compliance, and a very high number of ICU patients in the U.S. develop DUs [18]. There is little understanding of the set of poses and pose durations that cause or prevent DU incidence. Studies that analyze pose durations, rotation frequency, rotation range, and the duration of weight/pressure off-loading are required, as are the non-obtrusive measuring tools to collect and analyze the relevant data. Additional studies analyze the effects of pose manipulation on the treatment of severe acute respiratory failure, such as ARDS (Adult Respiratory Distress Syndrome), pneumonia, and hemodynamics in patients with various forms of shock. These examples highlight the importance of DECU's autonomous patient monitoring and summarization tasks. They accentuate the need for, and the challenges faced by, the framework, which must be capable of adapting to hospital environments and supporting existing infrastructure and standards of care.

¹U.S. Department of Health and Human Services, online report, Feb. 2016.
²Online Medical Dictionary.
1.2. Related Work
There is a large body of research that focuses on recognizing and tracking human motion. The latest developments in deep features and convolutional neural network architectures achieve impressive performance; however, they require large amounts of data [3], [24], [1], [23]. These methods tackle the recognition of actions performed at the center of the camera plane, except for [19], which uses static cameras to analyze actions. The method in [19] allows actions to be off-center in the image plane; however, it requires scenes with good illumination and no occlusions. At its current stage of development, the DECU framework cannot collect the large number of samples necessary to train a deep network without disrupting the hospital.

Multi-sensor and multi-camera systems and methods have been applied to smart environments [6], [27]. These systems require alterations to existing infrastructure, making their deployment in a hospital logistically impossible. The methods are not designed to account for illumination variations and occlusions, and they do not account for non-sequential, subtle motion. Therefore, these systems and methods cannot be used to analyze patient motion in a real ICU, where patients have limited or constrained mobility and the scenes have random occlusions and unpredictable illumination.

Figure 1: Stages of the DECU framework, which uses multimodal multiview (MM) data and modified Hidden Semi-Markov Models to monitor patient motion. From left to right (A to H): the set of references is shown in stage A (top-left); (A1) a dictionary of poses and pose transitions, and (A2) a lattice showing possible motion dynamics between poses. Stage B (bottom-left) shows the multimodal multiview input video. Stage C (center-left) selects the summarization resolution and activates keyframe identification when required. Stage D (center) integrates the motion thresholds (estimated at training) to account for various levels of motion resolution and sensor noise. Stage E (bottom-center) represents the feature extraction block via a convolutional neural network. Stage F (center-right) shows the keyframe identification process using Algorithm 1. Stage G (bottom-right) shows the multimodal multiview HMM trellis, which encodes illumination and occlusion variations. Stage H (top-right) shows the two possible summarization outputs: (H1) pose history and (H2) pose transitions.
Healthcare applications of pose monitoring include the detection and classification of sleep poses in controlled environments [7]. Static pose classification in a range of simulated healthcare environments is addressed in [22], where the authors use modality trust and RGB, Depth, and Pressure data. In [21], the authors introduce a coupled-constrained optimization technique that allows them to remove the pressure sensor and increase pose classification performance. However, neither method analyzes poses over time or pose transition dynamics. A pose detection and tracking system for rehabilitation is proposed in [12]. The system is developed and tested in ideal scenarios and cannot be used to detect constrained motion. In [13], a controlled study focuses on workflow analysis by observing surgeons in a mock-up operating room. A single depth camera and Radio Frequency Identification Devices (RFIDs) are used in [9] to analyze workflows in a Neo-Natal ICU (NICU) environment. These studies focus on staff actions and disregard patient motion. A literature search indicates that the DECU framework is the first of its kind: it studies patient motion in a mock-up and a real ICU environment. DECU's technical innovation is motivated by the shortcomings of previous studies. It observes the environment from multiple views and modalities, integrates temporal information, and accounts for challenging natural scenes and subtle patient movements using principled statistics.
1.3. Proposed Approach
DECU is a new framework to monitor patient motion in ICU environments at two motion resolutions. Its elements include time-series analysis algorithms and a multimodal multiview data collection system. The algorithms analyze poses at two motion resolutions (sequences of poses and pose transition directions). The system is capable of collecting and representing poses from multiview multimodal data. The views and modalities are shown in Figure 2 (a) and (b). A sample motion summary is shown in Figure 2 (c). Patients in the ICU are often bed-ridden or immobilized. Overall, their motion can be unpredictable, heavily constrained, slow and subtle, or aided by caretakers. The two resolutions address different medical needs. Pose history summarization is the coarser resolution. It provides a pictorial representation of poses over time (i.e., the history). Applications of the pose history include the prevention and analysis of DUs and the analysis of sleep-pose effects on quality of sleep. Pose transition summarization is the finer resolution: DECU looks at the transitory poses that occur while a patient transitions between two clearly defined sleep poses. Physical therapy evaluation is one application of transition summarization.
Main contributions of this work:
1. An adaptive framework capable of monitoring patient motion at various resolutions. The algorithm detects patient motion behavior and summarizes the sequence of sleep poses as well as the subtle motion and direction between two poses using segments.
2. A non-disruptive and non-obtrusive monitoring system robust to natural healthcare scenarios and conditions such as variable illumination and partial occlusions.
3. An algorithm that effectively compresses sleep pose transitions using a subset of the most informative and most discriminative keyframes. The algorithm incorporates data from all views and modalities to identify keyframes and increase monitoring resolution.
4. A fusion technique that incorporates observations from multiple modalities and views into emission probabilities to leverage complementary information and estimate intermediate poses and transitions over time.
2. DECU System Description
The DECU system is modular and adaptive. It is composed of three nodes, and each node has three modalities (RGB, Depth, and Mask). At the heart of each node is a Raspberry Pi 3 running Linux Ubuntu, which controls a Carmine RGB-D camera³. The units are synchronized using TCP/IP communication. DECU combines information from multiple views and modalities to overcome scene occlusions and illumination changes.
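The paper states only that the three Raspberry Pi nodes are synchronized over TCP/IP. As a hedged illustration of one way such a trigger could work, the sketch below has a coordinator send a shared start timestamp to each node; the hostnames, port, and message format are hypothetical and not part of DECU:

```python
import socket
import time

# Hypothetical node addresses; the paper only states that the three
# Raspberry Pi nodes are synchronized over TCP/IP.
NODES = [("decu-node1", 5005), ("decu-node2", 5005), ("decu-node3", 5005)]

def broadcast_trigger(lead_time_s=0.5):
    """Send a shared future timestamp so all nodes start capturing together."""
    start_at = time.time() + lead_time_s  # small lead time for message delivery
    for host, port in NODES:
        with socket.create_connection((host, port), timeout=1.0) as sock:
            sock.sendall(f"CAPTURE_AT {start_at:.6f}\n".encode())

# Each node would compare `start_at` against its own (e.g., NTP-disciplined)
# clock and begin recording at that instant.
```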
2.1. Multiple modalities (Multimodal).
Multimodal studies use complementary modalities to classify static sleep poses in natural ICU scenes with large variations in illumination and occlusions. This study builds on the findings of those studies (which are also validated in the experimental section) regarding the benefits of multimodal systems.

³Primesense, the manufacturer of Carmine sensors, was acquired by Apple Inc. in 2013; however, similar devices can be purchased from structure.io.
RGB (R). Standard RGB video data provides reliable information to represent and classify human sleep poses in scenes with relatively ideal conditions. However, most people sleep in imperfectly illuminated scenarios, using sheets, blankets, and pillows that block and disturb sensor measurements. The system collects RGB color images of dimensions 640×480 from each actor in each of the scene conditions and extracts pose appearance features representative of the lines of the human body (i.e., limbs and extremities).
Depth (D). Infrared depth cameras can be resilient to illumination changes. The Eye-CU system uses Primesense Carmine devices to collect depth data. The devices are designed for indoor use and acquire images of dimensions 640×480. These sensors use 16 bits to represent pixel intensity values, which correspond to the distance from the sensor to a point in the scene. Their operating distance range is 0.8 m to 3.5 m, and their spatial resolution for scenes 2.0 m away is 3.5 mm along the horizontal (x) and vertical (y) axes and 30 mm along the depth (z) axis. The system uses the depth images to represent the 3-dimensional shape of the poses. The usability of these images, however, depends on depth contrast, which is affected by the deformation properties of the mattress and blanket present in ICU environments.
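As a concrete illustration of these specifications, the sketch below converts a raw 16-bit depth frame to meters and masks readings outside the stated 0.8 m to 3.5 m operating range. The millimeter scaling of the raw values is an assumption typical of PrimeSense-class sensors, not a detail given in the text:

```python
import numpy as np

def depth_to_meters(raw_depth, min_m=0.8, max_m=3.5):
    """Convert a 640x480 uint16 depth frame to meters, masking invalid pixels.

    Assumes raw values encode distance in millimeters (typical for
    PrimeSense-class sensors); out-of-range pixels are set to NaN.
    """
    depth_m = raw_depth.astype(np.float32) / 1000.0  # mm -> m (assumed scaling)
    depth_m[(depth_m < min_m) | (depth_m > max_m)] = np.nan
    return depth_m

# Example: a synthetic frame with one valid and one out-of-range pixel.
frame = np.zeros((480, 640), dtype=np.uint16)
frame[0, 0] = 2000   # 2.0 m -> kept
frame[0, 1] = 4000   # 4.0 m -> outside operating range, masked
print(depth_to_meters(frame)[0, :2])  # [2.0, nan]
```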
2.2. Multiple views (Multiview).
The studies in [21] and [15] show that analyzing actions from multiple views and multiple orientations greatly improves detection. In particular, the studies indicate that multiple views provide algorithmic view and orientation independence.
2.3. Time Analysis (Hidden Semi-Markov Models).
ICU patients are often immobilized or recovering. They move subtly and slowly (very differently from walking or running motion). DECU effectively monitors subtle and abrupt patient motion by breaking the motion cues into segments.
3. Data Collection
Pose data is collected in a mock-up ICU with seven actors and tested in a medical ICU with two real patients. The diagram in Figure 2 (b) shows the top view of the rigged mock-up ICU room and the camera views. In the mock-up ICU, actors are asked to follow the same test sequence of poses. The sequence is set at random using a random number generator. Figure 2 (c) shows a sequence of 20 observations, which include ten poses (p1 to p10) and ten transitions (t1 to t10) with random transition directions.

All actors in the mock-up ICU are asked to assume and hold each of the poses while data is being recorded from multiple modalities and views. A total of 28 sessions are recorded: 14 under ideal conditions (BC: bright and clear) and 14 under challenging conditions (DO: dark and occluded). The annotated dataset will be available at vision.ece.ucsb.
3.1. ICU Rooms Infrastructure.
The DECU system and its algorithmic elements are designed, tested, and refined in a mock-up ICU with actors and simulated hospital scenarios. Once ready for real-world testing, DECU is deployed in a medical ICU with real patients and hospital scene conditions, where medical experts evaluate its benefits.

The mock-up ICU room. The mock-up ICU room allows researchers to collect data, design and test algorithms, and evaluate and refine the DECU system and algorithms. Three views of the mock-up ICU are shown in Figure 2.

The real ICU room. DECU is currently deployed in a real ICU at a local community hospital, where medical experts validate its benefits and performance and explore its applications. The system nodes are battery powered, and the three nodes account for unexpected occlusions and illumination changes. Views of the medical ICU are shown in Figure 3.
3.2. Pose Transitions.
The actors follow the sequence of poses and transitions shown in Stage A of Figure 1. Each initial pose has 10 possible final poses (inclusive), and each final pose can be reached by rotating left or right. The combination of pose pairs and transition directions generates a set of 20 sequences for each initial pose. There are 10 possible initial poses. One actor and one recording session generate 200 sequence pairs.
3.3. Feature Selection.
Previous findings indicate that engineered features such as geometric moments (gMOMs) and histograms of oriented gradients (HOG) are suitable for the classification of sleep poses. However, these features are limited in their ability to represent body configurations in dark and occluded scenarios. The latest developments in deep learning and feature extraction led this study to consider deep features extracted from the VGG [17] and Inception [20] architectures. Experimental results (see Sec. 5.1) indicate that Inception features perform better than gMOM, HOG, and VGG features. Parameters for gMOM and HOG extraction are obtained from [22]. Background subtraction and calibration procedures from [5] are applied prior to feature extraction.
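The extraction pipeline itself is not specified in the text. As a hedged sketch, one common way to obtain Inception-style deep features is a pretrained torchvision Inception v3 with the classifier head replaced by an identity, yielding 2048-dimensional pooled activations; the specific variant, weights, and preprocessing used here are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Pretrained Inception v3 as a fixed feature extractor (one plausible choice;
# the paper does not state which Inception variant or weights were used).
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
model.fc = nn.Identity()   # drop the classifier; keep 2048-d pooled features
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),              # Inception v3 input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_feature(image_path):
    """Return a 2048-d deep feature vector for one RGB (or rendered depth) frame."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).squeeze(0)  # shape: (2048,)
```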
4. Problem Description
Patients in the ICU spend most of their time in bed, and their motion is limited to a small set of poses. Practitioners manipulate patient poses to prevent DUs, evaluate sleep hygiene, and enhance recovery rates, among other objectives. DECU uses videos from multiple views and modalities to monitor patient poses and their transitions. However, it is necessary that the system and the algorithms properly handle motion rates (speed) and motion ranges. For instance, pose history summarization analyzes patient poses over a long period of time (e.g., eight hours) at a very low sampling rate. Pose transition summarization is another example: the summarization and analysis involve identifying the set of pseudo-poses associated with a transition and quantifying the direction of rotation. There are three main challenges in DECU. The first arises because conventional algorithms are unable to model pose durations effectively. The second involves detecting the direction of motion and rotation when transitioning between poses. The third involves the accurate representation of pseudo-poses and keyframe estimation. These challenges and the corresponding approaches are discussed in this section.
The DECU system uses $M$ multimodal cameras stationed at different locations to obtain $V$ views of the patients and estimate pose transitions such as the one shown in Figure 2 (c). Note that there are two directions of rotation for a patient or actor to transition from the faller facing up (falU) position to the fetal laying left (fetL) position. Features extracted from the video frames $F = \{f_t\}$, for $1 \le t \le T$, are used to construct feature vectors $X = X_{1:T}$ that represent the non-directly observable poses $Y = Y_{1:T}$. The first objective of DECU is to find the sequence of poses $Y = Y_{1:T}$ that can best represent the observations probabilistically, i.e., $\Pr(Y, X) = \Pr(Y_{1:T}, X_{1:T})$.

Temporal patterns caused by sleep-pose transitions are simulated and analyzed using Hidden Semi-Markov Models (HSMMs), as described in Section 4.2.3. The interactions between the modalities needed to accurately represent a pose using different sensor measurements are encoded into the emission probabilities. Scene conditions are encoded into the set of states (the analysis of two scenes doubles the number of poses). Conventional Markov assumptions support DECU and ideally fit most of its analysis. However, HMMs are limited in their ability to distinguish between poses and pseudo-poses (i.e., transitory, short-duration body configurations observed when transitioning between poses) based on pose duration. This is because, by design, HMMs model the probability of staying in a given pose as a geometric distribution $\Pr_i(d) = (a_{ii})^{d-1}(1 - a_{ii})$, where $d$ is the duration in pose $i$ and $a_{ii}$ is the self-transition probability of pose $i$. More details are discussed in this section and subsequent subsections. Table 1 describes the DECU variables.
Figure 2: The transition data is collected in a mock-up ICU and a real ICU: (a) shows the relative position of the cameras with respect to the ICU room and ICU bed; (b) shows a set of randomly selected poses and pose transitions, which are represented by lines (dashed, dotted, and solid lines defined in the legend box); (c) shows a sample set of sleep-pose transitions and rotation directions.

Figure 3: Top view of the node locations (center of the image) and views of the real medical ICU room and an ICU patient.
4.1. Hidden Markov Models (HMMs)
HMMs are a generative approach that models the various poses (pose history) and pseudo-poses (pose transition summarization) as states. The hidden variable or state at time step $k$ (i.e., $t = k$) is $y_k$ (state$_k$ or pose$_k$), and the observable or measurable variables ($x^{(v)}_{k,m}$, the vector of image features extracted from the $k$-th frame, the $m$-th modality, and the $v$-th view) at time $t = k$ are collected into $x_k$ (i.e., $x_k = x^{(v)}_{k,m} = \{R_k, D_k, \ldots, M_k\}$). The first-order Markov assumption indicates that at time $t$ the hidden variable $y_t$ depends only on the previous hidden variable $y_{t-1}$. At time $t$ the observable variable $x_t$ depends on the hidden variable $y_t$.
DECU VARIABLES

SYMBOL            DESCRIPTION
$A$               Transition probability matrix; $A \in \mathbb{R}^{|P| \times |P|}$ and $A = \{a_{ij}\}$
$a_{ij}$          Probability of transitioning from pose $i$ to pose $j$
$B$               Emission probability matrix $\in \mathbb{R}^{|P|}$ and $B = \{\mu_{in}\}$
$b_u$             Beginning of the $u$-th segment, with $b_1 = 1$
$D_k$             $k$-th frame from the depth modality video
D                 Face-Down patient pose
$d$               Segment duration
$d_u$             Segment duration for the $u$-th segment
HMM               Abbreviation for Hidden Markov Model
HSMM              Abbreviation for Hidden Semi-Markov Model
$K$               Data set size, $K = |X|$
$k$               Data point index, $1 \le k \le K$; also the time step index (i.e., $k = t$)
$KF$              Set of keyframes representing a pose transition
L                 Laying-Left patient pose
$l, m, n$         Dummy variables
$R_k$             $k$-th frame from the RGB modality video
R                 Laying-Right patient pose
$\mu_i$           Probability that state $i$ generates the observation $x$ at time $t$
$\pi$             Initial state probability vector $\in \mathbb{R}^{|P|}$, with $\pi_i \in \pi$
$P$               Set of patient poses $P = \{p_i\}$
$P_{mock}$        Set of actor poses in the mock-up ICU
$P_{micu}$        Set of patient poses in the medical ICU (micu)
$\Pr(Y, X)$       Joint probability distribution between states and observations
$S$               Set of time segments $S = \{s_u\}$ for $1 \le u \le U$
$s$               Segment element $s \in S$
$t$               Time tick with $1 \le t \le T$
$\tau_{t,d}$      Stores the estimated duration ($1 \le d \le D$) at time $t$
$\theta$          HMM model with probabilities $A$, $B$, and $\pi$; also a dummy variable used in inference
$U$               Number of segments, $U = |S|$
U                 Face-Up patient pose
$u$               Segment index: $1 \le u \le U$
$\mathcal{V}$     View set $\mathcal{V} = \{\text{left}, \text{center}, \text{right}\}$
$V$               Number of views, $V = |\mathcal{V}|$
$v$               View index, $1 \le v \le V$
$y_k$             $k$-th hidden state, $y_k \in Y$
$Y$               Sequence of hidden states, $|Y| = T$
$X$               Dataset indexed by $k$ (i.e., $X_k$)
$X_k$             $k$-th data point with $\{f_{N_m}\}_k = \{f_R, f_D, f_P\}_k$
$x_k$             $k$-th observation feature vector
$x^{(v)}_{k,m}$   The $k$-th observable variable from view $v$ and modality $m$
$\delta$          Kronecker delta function
$\delta_t$        The maximum probability duration
$\zeta$           Stores the state label (for a pose) of the previous segment
$\phi$            Stores the best duration
$\psi_t(i)$       Stores the label with the best duration for time $t$ and state $i$

Table 1: DECU variable symbols and their descriptions.
This information is used to compute $P(Y, X)$ via:

$$P(Y_{1:T}, X_{1:T}) = P(y_1) \prod_{t=1}^{T} P(x_t \mid y_t) \prod_{t=2}^{T} P(y_t \mid y_{t-1}) \qquad (1)$$

where $P(y_1)$ is the initial state probability distribution ($\pi$). It represents the probability of the sequence starting ($t = 1$) at pose$_i$ (state$_i$). $P(x_t \mid y_t)$ is the observation or emission probability distribution ($B$) and represents the probability that at time $t$ pose$_i$ (state$_i$) generates the observable multimodal multiview vector $x_t$. Finally, $P(y_t \mid y_{t-1})$ is the transition probability distribution ($A$) and represents the probability of going from pose$_i$ to pose$_o$ (state $i$ to state $o$). The HMM parameters are $A = \{a_{ij}\}$, $B = \{\mu_{in}\}$, and $\pi = \{\pi_i\}$, discussed below.
Initial State Probability Distribution ($\pi$). The initial pose probabilities are obtained from [8] and adjusted to simulate the two scenes considered in this study. The scene-dependent initial state probabilities $\pi$ are shown in Table 2.

State Transition Probability Distribution ($A$). The transition probabilities are estimated from one pose to the next for the Left (L) and Right (R) rotation directions, as indicated in the results of Figs. 10 and 11.

Emission Probability Distribution ($B$). The scene information is encoded into the emission probabilities. This information serves to model moving from one scene condition to the next, as shown in Figure 4. The trellis shows two scenes, which doubles the number of hidden states. The alternating blue and red lines (or solid and dashed lines) indicate transitions from one scene to the next.
One limitation of HMMs is their lack of flexibility in modeling pose and transition (pseudo-pose) durations. Given an HMM in a known pose or pseudo-pose, the probability that it stays there for $d$ time slices is $P_i(d) = (a_{ii})^{d-1}(1 - a_{ii})$, where $P_i(d)$ is the discrete probability density function of duration $d$ in pose $i$ and $a_{ii}$ is the self-transition probability of pose $i$ [14].
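To make this limitation concrete, the short sketch below evaluates the geometric duration model: for any self-transition probability $a_{ii}$, the most probable duration is always $d = 1$, so a conventional HMM cannot favor the multi-second pose durations seen in practice (a minimal illustration, not DECU code):

```python
def geometric_duration_pmf(a_ii, d):
    """P_i(d) = a_ii^(d-1) * (1 - a_ii): probability of staying d steps in pose i."""
    return (a_ii ** (d - 1)) * (1.0 - a_ii)

a_ii = 0.9
for d in (1, 5, 10, 20):
    print(d, round(geometric_duration_pmf(a_ii, d), 4))
# The PMF decays monotonically (0.1, 0.0656, 0.0387, 0.0135): the mode is
# always d = 1, so typical multi-second pose durations are poorly modeled.
```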
4.2. Hidden Semi-Markov Models (HSMMs)
HSMMs are derived from conventional HMMs to provide state duration flexibility. HSMMs represent hidden variables as segments, which have useful properties. Figure 5 shows the structure of the HSMM and its main components. The sequence of states $y_{1:T}$ is represented by the segments ($S$). A segment is a sequence of unique, sequentially repeated symbols. The segments contain the information needed to identify when an observation is first detected and its duration based on the number of observed samples. The elements of the $j$-th segment $S_j$ are the index (from the original sequence) where the observation is first detected ($b_j$), the number of sequential observations of the same symbol ($d_j$), and the state or pose ($y_j$). For example, the sequence $y_{1:8} = \{1,1,1,2,2,1,2,2\}$ is represented by the set of segments $S_{1:J} = \{S_1, S_2, S_3, S_4\} = \{(1,3,1), (4,2,2), (6,1,1), (7,2,2)\}$. The letter $J$ is the total number of segments and the total number of state changes. The elements of the segment $S_1 = (1,3,1)$ are, from left to right: the index of the start of the segment (in the sequence $y_{1:8}$), the number of times the state is observed, and the symbol.
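For concreteness, a minimal sketch of this segment encoding as run-length encoding of a state sequence, reproducing the example above (1-based start indices as in the paper):

```python
def to_segments(y):
    """Encode a state sequence into segments (b, d, y): 1-based start, duration, state."""
    segments = []
    start = 0
    for t in range(1, len(y) + 1):
        if t == len(y) or y[t] != y[start]:
            segments.append((start + 1, t - start, y[start]))
            start = t
    return segments

print(to_segments([1, 1, 1, 2, 2, 1, 2, 2]))
# [(1, 3, 1), (4, 2, 2), (6, 1, 1), (7, 2, 2)] -- matches the example above
```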
4.2.1 HSMM elements
The hidden variables are the segments $S_{1:U}$, and the observable variables are the features $X_{1:T}$. Their joint probability is:
Initial State Probability: $\pi = \{\pi_i\}$

Pose Name      Acronym  Symbol  State (BC)  Probability  State (DO)  Probability
Soldier Up     solU     p1      s1          0.03         s11         0.02
Fetal Right    fetR     p2      s2          0.145        s12         0.07
Fetal Left     fetL     p3      s3          0.145        s13         0.07
Log Right      logR     p4      s4          0.05         s14         0.03
Soldier Down   solD     p5      s5          0.02         s15         0.01
Yearner Left   yeaL     p6      s6          0.04         s16         0.02
Log Left       logL     p7      s7          0.05         s17         0.03
Faller Down    falD     p8      s8          0.05         s18         0.02
Faller Up      falU     p9      s9          0.05         s19         0.03
Yearner Right  yeaR     p10     s10         0.04         s20         0.02
Other          other    p0      s0          0.036        s0          0.073

Table 2: Initial probability for each of the 10 poses. Notice that poses facing up have a higher probability than poses facing down, while left- and right-facing poses are equally probable. Note that there is a category for poses not covered in this study, identified by the label Other and the symbol p0. Also note that one pose can map to two states based on the BC and DO scene conditions.
Figure 4: Multimodal multiview Hidden Markov Model (mmHMM) trellis. The variation in scene illumination between night and day is an example of a scene change.
$$P(S_{1:U}, X_{1:T}) = P(Y_{1:U}, b_{1:U}, d_{1:U}, X_{1:T})$$
$$= P(y_1)\,P(b_1)\,P(d_1 \mid y_1) \prod_{t=b_1}^{b_1+d_1-1} P(x_t \mid y_1) \;\prod_{u=2}^{U} \Big[ P(y_u \mid y_{u-1})\,P(b_u \mid b_{u-1}, d_{u-1})\,P(d_u \mid y_u) \prod_{t=b_u}^{b_u+d_u-1} P(x_t \mid y_u) \Big] \qquad (2)$$

where $S_{1:U} = \{S_1, S_2, \ldots, S_U\}$ is the sequence of segments, with $S_u = (b_u, d_u, y_u)$: $b_u$ is the start position (a bookkeeping variable to track the starting point of a segment), $d_u$ is the duration, and $y_u$ is the hidden state ($\in \{1, \ldots, Q\}$). The range of time slices starting at $b_u$ and ending at $b_u + d_u$ (exclusive) has state label $y_u$. All segments have a positive duration and completely cover the time span $1{:}T$ without overlap. Therefore, the constraints $b_1 = 1$, $\sum_{u=1}^{U} d_u = T$, and $b_{u+1} = b_u + d_u$ hold.

The transition probability $P(y_u \mid y_{u-1})$ represents the probability of going from one segment's state to the next via:

$$A: \; P(y_u = j \mid y_{u-1} = i) = a_{ij} \qquad (3)$$

The first segment always starts at $b_1 = 1$. Consecutive start positions are calculated deterministically from the previous ones via:
$$P(b_u = m \mid b_{u-1} = n, d_{u-1} = l) = \delta(m, n + l) \qquad (4)$$

where $\delta(i, j)$ is the Kronecker delta function (1 for $i = j$ and 0 otherwise). The duration probability is now given by $P(d_u = l \mid y_u = i) = P_i(l)$. DECU uses $P_i(l) = \mathcal{N}(\mu, \sigma)$.

Figure 5: HSMM diagram indicating the hidden segments $S_j$, indexed by $j$, and their elements $\{b_j, d_j, y_j\}$. The variable $b$ is the first detection in a sequence, $y$ is the hidden layer, and $x$ is the observable layer containing samples from time $b$ to $b + d$. The variables $b$ and $d$ are the observation's detection time (time tick) and duration.
4.2.2 Parameter Learning
Learning is based on maximum likelihood estimation (MLE). The training sequence of keyframes is fully annotated, including the exact start and end frames for each segment ($X_{1:T}$, $Y_{1:T}$). To find the parameters that maximize $P(Y_{1:T}, X_{1:T} \mid \theta)$, one maximizes the likelihood of each of the factors in the joint probability. In particular, the observation probability $P(x_n \mid y = i)$ is a Bernoulli distribution whose maximum likelihood estimate is:

$$\mu_{n,i} = \frac{\sum_{t=1}^{T} x_t^{i}\, \delta(y_t, i)}{\sum_{t=1}^{T} \delta(y_t, i)} \qquad (5)$$

where $T$ is the number of data points and $\delta(i, j)$ is the Kronecker delta function. The transition probability $P(y_t = j \mid y_{t-1} = i)$ is a multinomial distribution with:

$$a_{ij} = \frac{\sum_{n=2}^{N} \delta(y_n, j)\, \delta(y_{n-1}, i)}{\sum_{n=2}^{N} \delta(y_{n-1}, i)} \qquad (6)$$
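A minimal sketch of these count-based maximum likelihood estimates on a fully annotated sequence (the Kronecker deltas become indicator counts; smoothing and the multimodal emission structure are omitted for brevity):

```python
import numpy as np

def estimate_hmm_parameters(y, x, num_states):
    """MLE of transition matrix A (Eq. 6) and per-state emission means (Eq. 5).

    y: length-T numpy array of state labels in {0, ..., num_states-1}
    x: T x F numpy array of (binary or real) feature observations
    """
    T, F = x.shape
    A = np.zeros((num_states, num_states))
    for t in range(1, T):
        A[y[t - 1], y[t]] += 1.0            # count transitions i -> j
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # normalize each row

    mu = np.zeros((num_states, F))
    for i in range(num_states):
        mask = (y == i)
        if mask.any():
            mu[i] = x[mask].mean(axis=0)    # per-state mean of observations
    return A, mu
```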
4.2.3 HSMM Inference
HSMM Viterbi. The segment notation is used to represent state sequences. The inference objective is to find the state sequence that maximizes $P(S_{1:U}, X_{1:T} \mid \theta)$. The duration is not known for a new sequence of observations. The sequence corresponding to the duration with the highest probability is determined at each time step by iterating over all possible durations from 1 to a prefix duration $D$. This information is stored as follows:

$$\tau_{t,d} = \max_{s_1, \ldots, s_{k-1}} P\big(X_{1:t},\, s_{1:k} = (t - d + 1,\, d,\, i) \mid \theta\big) \qquad (7)$$

which represents the highest probability of a sequence of $k$ segments, where the final segment starts at $t - d + 1$, has duration $d$, and has label $i$.

NOTE: just as with conventional HMMs, it is sufficient to keep track of only the maximum probability of ending in state $s_{k-1}$ to effectively compute the maximum probability of ending up in state $s_k$.

The state label (for a pose or pseudo-pose) of the previous segment is stored in the array $\zeta_t(d, i)$. The maximum probability duration ($\delta$) is computed via:

$$\delta_t(i) = \max_{s_1, \ldots, s_{k-1}} P\big(x_{1:t},\, s_{1:k} = (t - d + 1,\, d,\, i) \mid \theta\big) \qquad (8)$$

where $d$ is the duration with the highest probability at time $t$ for state $i$. The best duration is stored in $\phi_t(i)$, and the label of the previous segment is stored in $\psi_t(i)$.
4.2.4 Finding the Best Sequence
The complete procedure for finding the best sequence is as follows.

Initialization: The probability of the label of the first segment is given by the initial state distribution $\pi$:
$$\tau_{t,d} = \pi_i\, P_i(d) \prod_{t'=1}^{d} P(x_{t'} \mid y_{t'}), \qquad \zeta_d(d, i) = 0$$

Recursion: Iterate over all possible durations at each step:
$$\tau_{t,d} = \max_{1 \le i \le Q} \delta_{t-d}(i)\, a_{ij}\, P_j(d) \prod_{m = t-d+1}^{t} P(\vec{x}_m \mid y_m = j)$$
$$\zeta_d(d, i) = \arg\max_{1 \le i \le Q} \delta_{t-d}(i)\, a_{ij}$$

The duration with the highest probability is estimated via:
$$\delta_t(i) = \max_{1 \le d \le D} \delta_{t-d}(i)\, a_{ij},$$
which represents the best segment. The variable $d$ is the duration with the highest probability at time $t$ for state $i$. The best duration for state $i$ at time $t$ is estimated via:
$$\phi_t(i) = \arg\max_{1 \le d \le D} \tau_{d,t}(i).$$
Finally, $\psi_t(i) = \zeta_t(\phi_t(i), i)$ is the label corresponding to the best duration for time $t$ and state $i$.

Termination: Estimate the state with the highest probability in the last time slice:
$$P^{*} = \max_{1 \le i \le Q} [\delta_T(i)], \qquad y^{*}_T = \arg\max_{1 \le i \le Q} [\delta_T(i)], \qquad t = T, \qquad u = 0$$

Backtracking: Starting from the termination, look up the durations and previous states stored in the variables $\phi$ and $\psi$:
$$d^{*}_t = \phi_t(y^{*}_t), \qquad s^{*}_u = \big(t - d^{*}_t + 1,\; d^{*}_t,\; y^{*}_t\big), \qquad t \leftarrow t - d^{*}_t, \qquad u \leftarrow u - 1, \qquad y^{*}_t = \psi_{t+d}(y^{*}_{t+d})$$

Negative indexing is used for the segments because the number of segments is not known in advance. However, this is corrected after inference by adding $|S|$ to all indices.
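A compact sketch of this duration-explicit Viterbi decoding, under stated assumptions: log-space scoring, a generic per-state observation log-likelihood matrix, and forward indexing instead of the negative indexing described above. The variable names mirror the procedure ($\delta$, $\phi$, $\psi$), but this is an illustrative reading, not the DECU implementation:

```python
import numpy as np

def hsmm_viterbi(log_obs, log_pi, log_A, log_dur, D_max):
    """Duration-explicit Viterbi decoding.

    log_obs: T x Q array, log P(x_t | y = i)
    log_pi:  Q,  log initial state probabilities
    log_A:   Q x Q, log transition probabilities a_ij
    log_dur: Q x D_max, log P_i(d) for d = 1..D_max
    Returns the best segment list [(b, d, y), ...] with 1-based starts.
    """
    T, Q = log_obs.shape
    cum = np.vstack([np.zeros(Q), np.cumsum(log_obs, axis=0)])  # prefix sums
    delta = np.full((T, Q), -np.inf)   # best score of a segment ending at t in state i
    phi = np.zeros((T, Q), dtype=int)  # best duration for (t, i)
    psi = np.zeros((T, Q), dtype=int)  # best previous state for (t, i)

    for t in range(T):
        for d in range(1, min(D_max, t + 1) + 1):
            seg_ll = cum[t + 1] - cum[t + 1 - d]      # sum of log P(x | y=i) over the segment
            if d == t + 1:                            # first segment: use pi
                score = log_pi + log_dur[:, d - 1] + seg_ll
                prev = np.zeros(Q, dtype=int)
            else:                                     # extend from a previous segment
                trans = delta[t - d][:, None] + log_A  # (i -> j) scores, shape Q x Q
                prev = trans.argmax(axis=0)
                score = trans.max(axis=0) + log_dur[:, d - 1] + seg_ll
            better = score > delta[t]
            delta[t][better] = score[better]
            phi[t][better] = d
            psi[t][better] = prev[better]

    # Backtracking: recover segments from the best final state.
    segments, t, y = [], T - 1, int(delta[T - 1].argmax())
    while t >= 0:
        d = phi[t, y]
        segments.append((t - d + 2, d, y))  # 1-based start index
        y_prev = psi[t, y]
        t, y = t - d, int(y_prev)
    return segments[::-1]
```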
4.3. Key Frame (KF ) Selection.
The data collected from pose transitions is very large and often repetitive, since the motion is relatively slow and subtle. The pre-processing stage incorporates a keyframe estimation step that integrates multimodal and multiview data. The algorithm used to select a set ($KF$) of $K$ transitory frames is illustrated in Figure 6 and detailed in Algorithm 1. The size of the keyframe set is determined experimentally ($K = 5$) in the feature space using Inception vectors.
Let $X = \{x^{(v)}_{m,n}\}$ be the set of training features extracted from $V$ views and $M$ modalities over $N$ frames, and let $P_i$ and $P_o$ represent the initial and final poses. The transition frames are indexed by $n$, $1 \le n \le N$; the views are indexed by $v$, $1 \le v \le V$; and the modalities are indexed by $m$, $1 \le m \le M$. Algorithm 1 uses this information to identify keyframes. An experimental evaluation of the $|KF|$ size is shown in Figure 7. Keyframes are the most informative and most discriminative frames across all views and modalities.
Input: $X$, the set of multimodal features, and a dissimilarity threshold $th$
Result: $KF = \{\text{key frames}\}_K$, $K \ge 1$
Initialize: $KF = \{\text{empty}\}_K$, $K \ge 1$, and $count = 0$

Stage 1: Modality ($m$) and View ($v$) Selection
    for $1 \le v \le V$ and $1 \le m \le M$ do
        $D^{(v)}_m = \mathrm{euclid}(x^{(v)}_{m n_i}, x^{(v)}_{m n_o})$, with $n_i = 1$, $n_o = N$
    end
    $\hat{v}, \hat{m} = \arg\max D^{(v)}_m > th$
    $\{x^{(\hat{v})}_{\hat{m} n_1}, x^{(\hat{v})}_{\hat{m} n_N}\} \rightarrow KF$

Stage 2: Find Complementary Frames to $KF$
    for $1 \le v \le V$, $1 \le m \le M$, and $1 < n < N$ do
        $D_1 = D^{(v)}_{m,n_1} = \mathrm{euclid}(x^{(v)}_{m n_1}, x^{(v)}_{m n})$
        $D_2 = D^{(v)}_{m,n_N} = \mathrm{euclid}(x^{(v)}_{m n_N}, x^{(v)}_{m n})$
    end
    Sort $D_1 = \{d_1 > d_2 > \ldots > d_{N-2}\}$ in descending order
    Sort $D_2 = \{d_1 > d_2 > \ldots > d_{N-2}\}$ in descending order
    $d_i \rightarrow KF$ if $d_i / d_j > th$, for $1 < i, j < N - 2$

Stage 3: Find the Center Frame (i.e., the Motion Peak)
    for $KF_2$ and $KF_{K-1}$ do
        Use Stage 2 to compute $D_3$ and $D_4$
        if $\max(D_3, D_4) > 0$ then
            $\max(D_3, D_4) \rightarrow KF$
        end
    end

Algorithm 1: Multimodal multiview keyframe selection using a Euclidean dissimilarity measure. The algorithm is applied at training time with labeled frames to estimate the number and indexes of keyframes across views and modalities.
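Because the extracted listing above is only partially recoverable, the sketch below gives a hedged Python rendering of the same idea: pick the view/modality pair whose endpoint features are most dissimilar, seed $KF$ with the endpoint frames, and then greedily add intermediate frames that are far (in Euclidean distance) from the frames already selected. The greedy criterion is our reading of Stages 2-3 restricted to the selected view/modality, not a verbatim port of Algorithm 1:

```python
import numpy as np

def select_keyframes(X, K=5):
    """Greedy multimodal multiview keyframe selection (a sketch of Algorithm 1).

    X: dict mapping (view, modality) -> N x F feature array for one transition
    K: number of keyframes (the paper finds K = 5 experimentally)
    Returns a sorted list of frame indexes.
    """
    # Stage 1: choose the (view, modality) whose first/last frames differ most.
    def endpoint_gap(feats):
        return np.linalg.norm(feats[0] - feats[-1])
    best_key = max(X, key=lambda k: endpoint_gap(X[k]))
    feats = X[best_key]
    N = len(feats)
    keyframes = [0, N - 1]                      # seed with the two endpoint frames

    # Stages 2-3: greedily add the frame most dissimilar to current keyframes
    # (farthest-point selection), approximating the motion-peak criterion.
    while len(keyframes) < min(K, N):
        dists = [min(np.linalg.norm(feats[n] - feats[k]) for k in keyframes)
                 for n in range(N)]
        keyframes.append(int(np.argmax(dists)))
    return sorted(set(keyframes))

# Usage with random stand-in features from 3 views x 2 modalities:
rng = np.random.default_rng(0)
X = {(v, m): rng.normal(size=(30, 2048)) for v in range(3) for m in range(2)}
print(select_keyframes(X, K=5))
```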
5. Experimental Results and Analysis
Experiments are performed to validate the feature selection, the keyframe set size (i.e., the number of states) representing a transition, and the summarization performance of DECU in a real and a mock-up ICU environment.
5.1. Static Pose Analysis - Feature Validation
Static sleep-pose analysis is used to compare the DECU method to previous studies. Coupled-Constrained Least-Squares (cc-LS) and DECU are tested on the dataset from [21].
Figure 6: Selection of keyframes for the representation of transitions between two poses. The keyframe selection is based on Algorithm 1. This figure shows an example of how the algorithm is used to identify five keyframes from three views and two modalities. In this example, the first two keyframes are extracted from the RGB video of the first camera (View 1). Subsequent keyframes are selected from the depth video of the second camera (View 2) and from the RGB video of the third camera (View 3).
Feature Suitability Evaluation with cc-LS [21]

Scene   HOG + gMOM   Vgg       Inception
BC      100          100       100
DO      65           69 (+4)   73 (+8)

Table 3: Evaluation of deep features for sleep-pose recognition tasks using the cc-LS method from [21] in dark and occluded (DO) scenes. The performance of HOG and gMOM features is compared to the performance of the Vgg and Inception features.
Combining the cc-LS method with deep features extracted from two common network architectures improved classification performance over the HOG and gMOM features in dark and occluded (DO) scenes by an average of eight percent with Inception and four percent with Vgg. Deep features matched the performance of cc-LS (with HOG and gMOM) in the bright and clear (BC) scenario, as shown in Table 3.
5.2. Key Frame Performance
The keyframe set size ($|KF| = 5$) and the keyframe dissimilarity threshold ($th \approx 0.8$) affect DECU performance. Figure 7 shows the effect of these parameters.
Figure 7: Performance of the DECU framework for fine motion summarization as a function of the number of keyframes used to represent transitions and rotations between poses.
5.3. Summarization Performance
The mock-up ICU allows staging the motion and scene condition variations without disturbing patients in the medical ICU. A sample test sequence is shown in Figure 2 (c), and summarization history results are shown in Figure 9 for (a) the mock-up and (b) the real ICU environments. The pose numerical symbols are shown in Table 4.
5.4. Summarization History
History summarization is the coarser time resolution, and its overall objective is shown in Figure 8.
DECU: Pose History Summarization
Symbol Pose Name
0 Aspiration
+1 / -1 Soldier (+Up / -Down)
+2 / -2 Yearner (+R / -L)
+3 / -3 Log (+R, -L)
+4 / -4 Faller (+Up / -Down)
+5 / -5 Other / Background
+6 / -6 Fetal (+R / -L)
Table 4: Pose symbols and descriptions used for ICU pose
history summarization.
Figure 8: Pose history summarization log for patient motion
analysis in medical ICUs.
Pose History Summarization in the Mock-Up ICU. This summarization requires two parameters: the sampling rate and the pose duration. The experiments are executed with a sampling rate of one second and a pose duration of 10 seconds. A pose is assigned a label if it is consistently detected 80 percent of the time; otherwise, it is assigned the label "other". Poses not consistently detected are ignored (low confidence). The mock-up experiment uses a randomly selected scene and sequence of poses, which can range from two to ten poses. The pose duration is also set at random, and each sequence includes one scene transition (BC to DO or DO to BC). A sample (long) sequence is shown in Figure 2 (c), and its history summarization performance is shown in Table 5 and Figure 9 (a).
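A minimal sketch of this labeling rule, assuming per-second pose detections grouped into 10-second windows; the function name and the "other" fallback follow the description above:

```python
from collections import Counter

def summarize_history(detections, window=10, consistency=0.8):
    """Assign one label per window of per-second pose detections.

    detections: list of per-second pose labels (sampling rate of 1 s)
    Returns one label per 10 s window, or "other" when no pose reaches
    the 80 percent consistency threshold.
    """
    summary = []
    for start in range(0, len(detections) - window + 1, window):
        votes = Counter(detections[start:start + window])
        label, count = votes.most_common(1)[0]
        summary.append(label if count / window >= consistency else "other")
    return summary

# Example: 8/10 agreement -> "fetL"; a 50/50 split -> "other".
print(summarize_history(["fetL"] * 8 + ["solU"] * 2 + ["falU"] * 5 + ["logR"] * 5))
# ['fetL', 'other']
```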
DECU: Pose History Summarization

Scene   Average Detection Rate
BC      85
DO      76

Table 5: Pose history summarization performance (percent accuracy) of the DECU framework in bright and clear (BC) and dark and occluded (DO) scenes. The sequences are composed of 10 poses with durations ranging from 10 seconds to 1 minute. The sampling rate is one second.
Pose Transition Dynamics: Motion Direction. The analysis of pose transitions and rotation directions is important for physical therapy and recovery-rate analysis. The performance of DECU in summarizing fine motion to describe transitions between poses in a bright and clear scene and in a dark and occluded scene is shown in Figs. 10 and 11. Results in each figure are shown for (a) single-view and (b) multiview data. The bottom row (c) shows the gray scale and the color-font legend.
Summarization of Transitions in the Real ICU. Note that it is logistically impossible to control ICU workflows and to account for unpredictable patient motion. ICU patients are not free to rotate, which reduces the set of pose transitions (unavailable transitions are marked N/A). The set of poses for the history summary requires that a new pose be included (aspiration). Figure 8 (b) shows the overall clinical objective behind the pose history summarization.

The real medical ICU environment is shown in Figure 12 (a). DECU's fine motion summarization results for two patients are shown in Figure 12 (b), and the quantified detection accuracies are shown in Figure 12 (c). The blue trace represents the true transition labels, and the red trace indicates the predicted labels. Table 4 lists the pose symbols and descriptions used in the summarization plot.
6. Conclusion
This work introduced the DECU framework to analyze patient poses in natural healthcare environments at two motion resolutions. Extensive experiments and evaluation of the framework indicate that the detection and quantification of pose dynamics are possible. The DECU system and its monitoring algorithms are currently being tested in real ICU environments. The performance results presented in this study support its potential applications and benefits to healthcare analytics. The system is non-disruptive and non-intrusive. It is robust to variations in illumination, view, orientation, and partial occlusions. DECU is non-obtrusive and non-intrusive, but not without a cost. The cost is noticed in the most challenging scenario, where a blanket and poor illumination block sensor measurements. The performance of DECU in monitoring pose transitions in dark and occluded environments is far from perfect; however, most medical applications that analyze motion transitions, such as physical therapy sessions, are carried out under less severe conditions.

Future Work. Future studies will investigate the recognition and analysis of patient motion in similar challenging scenarios using recurrent neural networks, incorporate additional modalities, and integrate natural language understanding to analyze ICU events.
Figure 9: Performance of DECU pose history summarization (a) in the mock-up ICU with bright and clear conditions over a 10-minute time span, and (b) in the real ICU using multimodal data under natural scene conditions over a two-hour time span. Note that the set of patient poses is reduced for the real ICU, and the summarization session is limited to a maximum of two hours to avoid disrupting the Braden-scale protocol.
Figure 10: Performance of DECU in the mock-up ICU under bright and clear conditions. Detection results are obtained using (a) single-view and (b) multiview data. The cells are gray-scaled to indicate detection accuracy. The color-coded scale and the legend are shown in (c). Note that overall detection rates increase with larger rotation angles and decrease when the rotation requires the actors to face the bed (i.e., the cameras record the actors' backs).
Figure 11: Performance of DECU in the mock-up ICU under dark and occluded conditions. Detection results are obtained using (a) single-view and (b) multiview data. The cells are gray-scaled to indicate detection accuracy. The color-coded scale and the legend are shown in (c). Again, detection rates increase with larger rotation angles and decrease when the rotation requires the actors to face the bed (i.e., the cameras record the actors' backs).
Figure 12: Performance of DECU pose transition summarization in the real ICU, shown in (a), using multimodal data under natural scene conditions. The detection scores are shown in (b), where the cells are gray-scaled to indicate detection accuracy. The font color indicates the rotation angle range, and N/A indicates that the pose is not available. The number of poses is reduced due to patient health conditions and inability to move. The grading color scale and font-color legend are shown in (c).
Acknowledgements. This research is funded in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 (the ARL Network Science CTA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The authors want to thank Richard Beswick, PhD (Director of Research), Paula Gallucci (Medical ICU Nurse Manager), Mark Mullenary (Director of Biomedical Engineering), and Leilani Price, PhD (IRB Administrator) from Santa Barbara Cottage Hospital for their help and patience in identifying and recruiting patients and ensuring HIPAA compliance.
References
[1] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt. Sequential deep learning for human action recognition. In Springer Int'l Workshop on Human Behavior Understanding, 2011.
[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.
[3] G. Chéron, I. Laptev, and C. Schmid. P-CNN: Pose-based CNN features for action recognition. In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[4] T. Giraud, J.-F. Dhainaut, J.-F. Vaxelaire, T. Joseph, D. Journois, G. Bleichner, J.-P. Sollet, S. Chevret, and J.-F. Monsallier. Iatrogenic complications in adult intensive care units: a prospective two-center study. Critical Care Medicine, 21(1):40-51, 1993.
[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge Univ. Press, 2nd edition, 2004.
[6] E. Hoque and J. Stankovic. AALO: Activity recognition in smart homes using active learning in the presence of overlapped activities. In IEEE Int'l Conf. on Pervasive Computing Technologies for Healthcare (PervasiveHealth) and Workshops, 2012.
[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.
[8] C. Idzikowski. Sleep position gives personality clue. BBC News (September 16), 2003.
[9] C. Lea, J. Facker, G. Hager, R. Taylor, and S. Saria. 3D sensing algorithms towards building an intelligent intensive care unit. AMIA Summits on Translational Science Proceedings, 2013.
[10] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.
[11] P. E. Morris. Moving our critically ill patients: mobility barriers and benefits. Critical Care Clinics, 2007.
[12] S. Obdržálek, G. Kurillo, J. Han, T. Abresch, R. Bajcsy, et al. Real-time human pose detection and tracking for tele-rehabilitation in virtual reality. Studies in Health Technology and Informatics, 2012.
[13] N. Padoy, D. Mateus, D. Weinland, M.-O. Berger, and N. Navab. Workflow monitoring based on 3D motion features. In IEEE Int'l Conf. on Computer Vision Workshops (ICCV Workshops), 2009.
[14] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989.
[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.
[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.
[17] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[18] L. Soban, S. Hempel, B. Ewing, J. N. Miles, and L. V. Rubenstein. Preventing pressure ulcers in hospitals. Joint Commission Journal on Quality and Patient Safety, 2011.
[19] B. Soran, A. Farhadi, and L. Shapiro. Generating notifications for missing actions: Don't forget to turn the lights off! In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.
[21] C. Torres, V. Fragoso, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Eye-CU: Sleep pose classification for healthcare using multimodal multiview data. In IEEE Winter Conf. on Applications of Computer Vision (WACV), 2016.
[22] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.
[23] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[24] V. Veeriah, N. Zhuang, and G.-J. Qi. Differential recurrent neural networks for action recognition. In IEEE Int'l Conf. on Computer Vision (ICCV), 2015.
[25] C. L. von Baeyer, M. E. Johnson, and M. J. McMillan. Consequences of nonverbal expression of pain: Patient distress and observer concern. Social Science & Medicine, 1984.
[26] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep-New York Then Westchester, 2006.
[27] C. Wu, A. H. Khalili, and H. Aghajan. Multiview activity recognition in smart homes with spatio-temporal features. In ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2010.