Shot Type Constraints in UAV Cinematography For
Autonomous Target Tracking
Iason Karakostas*, Ioannis Mademlis*, Nikos Nikolaidis and Ioannis Pitas
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Abstract
During the past years, camera-equipped Unmanned Aerial Vehicles (UAVs) have revolutionized aerial cinematography, allowing easy acquisition of impressive footage. In this context, autonomous functionalities based on machine learning and computer vision modules are gaining ground. During live coverage of outdoor events, an autonomous UAV may visually track and follow a specific target of interest, under a specific desired shot type, mainly adjusted by choosing appropriate focal length and UAV/camera trajectory relative to the target. However, the selected UAV/camera trajectory and the object tracker requirements (which impose limits on the maximum allowable focal length) affect the range of feasible shot types, thus constraining cinematography planning. Therefore, this paper explores the interplay between cinematography and computer vision in the area of autonomous UAV filming. UAV target-tracking trajectories are formalized and geometrically modeled, so as to analytically compute the maximum allowable focal length per scenario and avoid 2D visual tracker failure. Based on this constraint, formulas for estimating the appropriate focal length to achieve the desired shot type in each situation are extracted, so as to determine shot feasibility. Such rules can be embedded into practical UAV intelligent shooting systems, in order to enhance their robustness by facilitating on-the-fly adjustment of the cinematography plan.
Keywords: UAV cinematography, shot type, target tracking, autonomous drones
1. Introduction
Automation in applications involving cinematic video footage (e.g., TV/movie pro-
duction, outdoor event coverage, advertising, etc.) is constantly improving, both in the
post-production stage (e.g., shot cut/scene change detection [26], automated editing [3]
or framing [1], etc.) and during production (e.g., [6]). Relevant algorithms typically
* The first two authors contributed equally and are joint first authors.
© 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license:
http://creativecommons.org/licenses/by-nc-nd/4.0/
Preprint submitted to Journal of LaTeX Templates, August 10, 2019
utilize expert knowledge about the film creative process and the cinematic grammar, in
order to assist in footage shooting, indexing, annotation, and/or post-processing.
While filming, the most important creative decisions made by the director pertain
to the shot type and the camera motion type. The shot type is defined mainly by the per-
centage of the video frame area covered by the target being filmed. In traditional film
grammar the target is assumed to be a human subject, but this is not strictly necessary
(for instance, it can be a static or moving vehicle). If the distance between the target
and the camera remains constant, the shot type is controlled primarily by changing the
camera focal length f, hence adjusting the zoom level. The camera motion type refers
to the camera motion trajectory relative to the target for the duration of a shot.
Despite the presence of a large body of research dedicated to automated shot type
and camera motion type recognition in existing footage during post-production (e.g.,
[37] [4] [11] [8]), little work has been performed on autonomously capturing new
videos with desired shot type/camera motion type combinations. Such methods are
typically given the label of intelligent shooting. In dynamic environments, relevant ap-
proaches require robotic cameras that partially rely on real-time machine learning and
computer vision algorithms, for visually detecting/tracking [25] [38] [19] [27] [31] [32]
and physically following a specific desired target (e.g., the lead athlete in a race). How-
ever, to the best of our knowledge, the interplay between 2D visual tracker operation
and cinematographic properties, i.e., shot type and camera motion type, has not been
thoroughly investigated.
An important issue in this respect is determining the range of feasible shot types
at each time point, so that visual tracking algorithms do not fail. The selected shot
type severely affects the perceived 2D displacement of a moving target image between
consecutive video frames, due to the effects of zooming. Thus, real-time visual object
tracking [18] is heavily influenced by cinematography decisions, given that virtually all
trackers search a restricted video frame region for the next target instance, positioned
around the previously found one. Although the size of this search region in pixels is
partially adaptive, according to the target’s image area on the previous video frame, it
is practically limited by the video frame dimensions. Thus, the shot type requested by
the director for a particular scenario at a certain time instance may not be feasible, de-
pending on the specifics of the target and the camera motion velocities and trajectories.
Vertical Take-off and Landing (VTOL) Unmanned Aerial Vehicles (UAVs, or “drones”)
equipped with professional cameras have recently become an indispensable asset in the
cinematographer’s arsenal. They permit rapid capture of impressive footage, flexible
shot setup, novel shot types and access to narrow or hard-to-reach spaces, at a small
fraction of the cost associated with spidercams, helicopters and cranes. Essentially,
they provide a level of camera motion freedom that, so far, was only available in an-
imation. Typically, in professional productions, the UAV and its mounted camera are
manually remote-controlled by two different operators, acting in synchronization under
a rough cinematography plan defined by the director. The latter can be conceived as a
sequence of desired target assignments, shot types and UAV/camera motion trajectories
relative to the target.
There is, however, a growing trend of increasing automation in drone functions,
so as to reduce the challenges arising from fully manual operation [21] [24]. This
is especially important in cinematography applications, where great precision and co-
ordination may be required in order to properly capture the desired shot. Thus, in
the near future, production costs are expected to be significantly reduced, with semi-
autonomous or fully autonomous drones replacing human crews currently required and
shifting production focus to the direct realization of the director’s creative vision, rather
than the minutiae of drone operation.
Autonomous UAV filming is, therefore, a promising emerging offshoot of intelli-
gent shooting with potentially exceptional industrial impact. However, challenges such
as tracking fast and unpredictably moving targets in real-time, as well as the lack of
standardization in UAV shot types and meaningful UAV/camera motion trajectories,
are a reality interfering with the ability to on-the-fly adjust the cinematography plan,
according to dynamic environment conditions. The restrictions imposed on the feasi-
ble shot types by the requirements of the 2D visual tracker, especially, are particularly
significant for autonomous UAVs, when contrasted with indoor robotic cameras, due to
the possibly higher target speed in outdoor settings and the increased camera mobility
offered by a drone.
Therefore, although the above apply to autonomous filming in general, this pa-
per focuses on outdoor target-following UAV cinematography applications (e.g., for
live sports event coverage). By significantly extending preliminary work [23] [40]
[20] [22], it presents a theoretical study of the constraints imposed on cinematography
decision-making during autonomous UAV shooting. The contributions of this paper
are:
• Formalizing and geometrically modelling a range of common, target-following UAV motion types.
• Analytically determining the maximum permissible camera focal length fmax, so that 2D visual object tracking does not get lost, for each UAV motion type.
• Extracting formulas for determining the feasibility of the requested shot type (dependent on fmax and on the appropriate focal length fs for that shot type).
• Providing specific examples and simulated scenarios that showcase the practical applicability of the proposed study.
Current industry practice simply ignores constraints implicitly imposed on zoom
level/shot type by 2D visual tracker requirements. This is problematic, since it dis-
regards the possibility of the target ROI going out of frame (or simply getting too
spatially displaced in 2D pixel coordinates) among consecutive time instances, due to
the target’s abrupt 3D motion and too high a focal length, thus breaking visual track-
ing. Therefore, to the best of our knowledge, our proposed, analytically derived rule
set marks the first time this issue is studied in-depth in the context of autonomous UAV
cinematography.
Incorporating shot type permissibility rules into media production automation soft-
ware, such as intelligent UAV shooting algorithms [15] [16] [30] [35], is expected
to greatly enhance the robustness of autonomous drones deployed in cinematography
applications, by facilitating tracker-aware on-the-fly adjustment of the pre-computed
cinematography plan.
Table 1: Shot types and their corresponding ROI to video frame height ratio percentage.

Shot type                   Video frame height coverage
Extreme Long Shot (ELS)     < 5%
Very Long Shot (VLS)        5 - 20%
Long Shot (LS)              20 - 40%
Medium Shot (MS)            40 - 60%
Medium Close-Up (MCU)       60 - 75%
Close-Up (CU)               > 75%
2. UAV Cinematography Modelling
In cinematography, each camera motion type can be combined with a subset of the
available shot types, so as to achieve an aesthetically pleasing visual result. Thus, a
shot can be described by the combination of a camera motion type and a shot type.
Below, shot types and camera motion types are studied for the specific case of UAV
cinematography.
Each shot type is mainly defined by the ratio of the Region-of-Interest (ROI) height
to the video frame height. The ratio can vary from less than 5% for the Extreme Long
Shot, to more than 75% for a Close-Up shot. The taxonomy presented in Table 1 is
derived/adapted from traditional ground and aerial cinematography [5] [7] [34], based
on extensive visual inspection of professional and semi-professional UAV footage.
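The taxonomy of Table 1 can be encoded as a simple threshold lookup. The following minimal Python sketch (the function and variable names are illustrative, not part of any existing system) maps a measured ROI-to-frame height ratio to a shot type label:

```python
def shot_type(roi_height_px, frame_height_px):
    """Map the ROI-to-frame height ratio (Table 1) to a shot type label."""
    c = roi_height_px / frame_height_px  # video frame height coverage
    if c < 0.05:
        return "ELS"   # Extreme Long Shot
    elif c < 0.20:
        return "VLS"   # Very Long Shot
    elif c < 0.40:
        return "LS"    # Long Shot
    elif c < 0.60:
        return "MS"    # Medium Shot
    elif c < 0.75:
        return "MCU"   # Medium Close-Up
    return "CU"        # Close-Up

print(shot_type(180, 720))  # 25% coverage -> "LS"
```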
In a typical scenario, the on-board camera is mounted on a gimbal that allows rapid
camera rotation around its yaw, pitch and roll axes. Additionally, a zoom lens with
adjustable focal length f (within certain limits) is employed. Simply altering f is
typically sufficient for achieving the shot type desired by the director and prescribed
in the cinematography plan. Thus, any constraints on the maximum permissible focal
length directly correspond to restrictions in the range of feasible shot types at each time
instance.
Regarding UAV/camera motion, several industry-standard types have emerged since
the popularization of UAVs, with most of them being derived/adapted from traditional
ground and aerial cinematography. For outdoor events (e.g., in live sports broadcast-
ing), the most important motion types are relative to a still or moving target being
tracked.
Recent aerial videography literature [7] [34] contains a description of a few such
UAV motion types. However, no systematic analysis has been presented in the literature
so far. Below, 8 UAV industry-standard camera motion types are detailed, geometri-
cally modelled and matched to compatible shot types, based on our extensive visual
survey of professional UAV footage. For instance, in a Chase shot (where the UAV
follows/leads a moving target from behind/from the front, while maintaining a steady
distance), the viewer is meant to experience a “simulation” of the target motion within
its environment, while the target is fully visible. Thus, a CU that excludes most of
the surroundings from the video frame is an unsuitable shot type in this context. Such
findings are summarized in Table 2.
The mathematical treatment in this paper assumes a realistic setting similar to [35],
where the autonomous UAV operates in a consistent, global, Cartesian 3D map, upon
Table 2: Compatibility of UAV camera motion and shot types.
Camera motion Shot types
MAPMT LS, MS, MCU
MATMT LS, MS
LTS VLS, LS, MS, MCU
VTS VLS, LS, MS, MCU
ORBIT LS, MS, MCU, CU
FLYOVER LS, MS, MCU, CU
FLYBY LS, MS, MCU, CU
CHASE VLS, LS, MS
which both the drone itself and the target are constantly localized. This can be achieved
by employing Global Positioning System (GPS) receivers [10] on both the UAV and
the target. For increased robustness, GPS-derived drone localization information can
be aligned and fused with Visual SLAM results [28], preferably derived by jointly
exploiting stereoscopic 3D camera and Inertial Measurement Unit (IMU) [29] inputs,
based on a similarity transformation [13]. Issues such as the possibility of temporarily
losing the GPS signal, or the usual GPS position error (in the range of up to 5 me-
ters [10]), may be overcome by fusing IMU/GPS and Visual SLAM localization, or
by replacing GPS with an Active Radio-Frequency IDentification (RFID) positioning
system [14]. Regarding the target, the output of 2D visual tracking itself can also be
exploited for augmenting target localization precision (assuming a calibrated camera),
thus making it even more imperative to reduce the chance of visual tracker failure.
Below, given a camera frame-rate $F$, time $t$ is discrete and proceeds in steps of $\frac{1}{F}$ seconds. A separate timeline is employed for each shot description, i.e., $t = 0$ indicates the start of a shot shooting session. At each time instance $t$, the 3D positions $\tilde{x}_t = [\tilde{x}_{t1}, \tilde{x}_{t2}, \tilde{x}_{t3}]^T$, $\tilde{p}_t = [\tilde{p}_{t1}, \tilde{p}_{t2}, \tilde{p}_{t3}]^T$ of the UAV and the target, respectively (assuming they are 3D points), as well as an estimated 3D target velocity vector $\tilde{u}_t$, are assumed known (as in [35]) in a fixed, orthonormal, right-handed World Coordinate System (WCS) $\tilde{i}, \tilde{j}, \tilde{k}$, with its $\tilde{k}$-axis perpendicular to a local tangent plane (hereafter shortened to "ground plane"). A local East-North-Up (ENU) coordinate system may be employed [9]. Note that the term "local tangent plane" is employed for a plane parallel to the local sea level, while the term "terrain tangent plane" is reserved for the plane instantaneously tangent to the local terrain surface.

Additionally, at each time instance $t$, a current, orthonormal, right-handed target-centered coordinate system (TCS) $i, j, k$ is defined. Its origin lies on the current target position, its $k$-axis is perpendicular to the ground plane and its $i$-axis is the L2-normalized projection of the current target velocity vector onto the ground plane. In the case of a still target, the TCS $i$-axis is defined as parallel to the projection of the vector $\tilde{p}_0 - \tilde{x}_0$ onto the ground plane. In both coordinate systems, the $ij$-plane is parallel to the ground plane and the $k$-component is called "altitude". Below, vectors expressed in TCS are denoted without the tilde symbol (e.g., $x_t$, $p_t$, $q_t$ and $u_t$).
Transforming between the two coordinate systems is trivial. A subset of the presented motion types requires pre-specification of motion parameters meant to adapt the
UAV motion trajectory to concrete directorial guidelines (e.g., distance to be covered
by the UAV).
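For concreteness, the following minimal NumPy sketch builds the TCS axes and expresses a WCS point in TCS, assuming an ENU-style WCS with $\tilde{k} = [0, 0, 1]^T$; the function names and the near-zero velocity tolerance are illustrative assumptions:

```python
import numpy as np

def tcs_axes(u_wcs, still_target_dir=None):
    """Unit TCS axes (i, j, k) expressed in WCS coordinates.

    u_wcs: estimated target velocity in WCS. For a (nearly) still target, pass the
    WCS vector p0 - x0 via still_target_dir, as prescribed above."""
    u = np.asarray(u_wcs, float)
    k = np.array([0.0, 0.0, 1.0])                 # WCS up axis (ENU assumption)
    ref = u if np.linalg.norm(u[:2]) > 1e-9 else np.asarray(still_target_dir, float)
    i = np.array([ref[0], ref[1], 0.0])           # projection onto the ground plane
    i /= np.linalg.norm(i)
    j = np.cross(k, i)                            # completes a right-handed frame
    return i, j, k

def wcs_to_tcs(point_wcs, target_wcs, axes):
    """Express a WCS point in the target-centered coordinate system."""
    i, j, k = axes
    d = np.asarray(point_wcs, float) - np.asarray(target_wcs, float)
    return np.array([d @ i, d @ j, d @ k])

# Example: UAV 30 m to the side of a target moving east at 10 m/s.
axes = tcs_axes([10.0, 0.0, 0.0])
print(wcs_to_tcs([100.0, 30.0, 5.0], [100.0, 0.0, 5.0], axes))   # ~[0, 30, 0]
```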
In mobile robotics literature, an additional, vehicle-centered coordinate system is
typically employed, having its origin located at a fixed distance from the UAV-mounted
camera. Since the scope of this paper does not include UAV control per se, we do not
make use of such a coordinate frame and limit our analysis to cinematography issues.
Additionally, for reasons of simplicity, the employed modelling ignores the distinction
between the drone and its mounted camera, since it is typically trivial to compute the
3D pose of the one given the other and gimbal feedback.
The 3D scene point where the camera looks at time instance $t$ is denoted by $l_t$ (in TCS). The LookAt vector at time instance $t$ is a scalar multiple of the camera axis and denoted by $o_t = l_t - x_t$ (or $\tilde{o}_t$, when expressed in WCS). Below, it is assumed that $l_t = p_t$ and, therefore, $o_t = -x_t$. As a result, the selected target point is visible at the center of the video frame. This is a simple and common framing approach, called "central composition". Standard measurement units for the implicated quantities are also assumed, i.e., distance is measured in meters, speed in meters per second and the video frame-rate in frames per second.
In a number of cases, the UAV/camera motion type is only meaningful if the target
is moving linearly. Moreover, such an assumption is additionally made below in cases
where the future target or UAV position needs to be predicted, for reasons of modelling
convenience (these cases are appropriately marked in the following analysis). Constant
linear motion is assumed for both these scenarios, although extending the formulas
for the case of constantly accelerated linear motion is trivial (assuming that the target
acceleration vector can be reliably estimated).
The eight target-tracking UAV motion types are illustrated in Figure 1 and de-
scribed below:
1) Lateral Tracking Shot (LTS) [7] [34] and 2) Vertical Tracking Shot (VTS) are non-parametric camera motion types, where the camera gimbal does not rotate and the camera is directly locked on the moving target. In LTS, the camera axis is approximately perpendicular both to the local target trajectory and to the WCS vertical axis vector $\tilde{k}$, while the UAV flies sideways/in parallel to the target, matching its speed (if possible). In VTS, the camera axis is perpendicular to the target trajectory and the UAV flies exactly above the target, matching its speed (if possible). In both cases, $\tilde{p}_t$ refers to a varying target position in WCS. During shooting, the UAV position remains constant in TCS, but varies in WCS.

The base mathematical description for both these UAV/camera motion types is fairly simple:

$\tilde{v}_t = \tilde{u}_t, \quad \tilde{o}_t^T\tilde{u}_t \approx 0, \quad x_t = x_{t-1}, \quad l_t = p_t, \quad \forall t.$  (1)

Additionally, the following relations hold for LTS and VTS, respectively:

$o_t \times j \approx 0, \quad x_{03} \approx 0,$  (2)

$o_t^T j \approx 0, \quad x_{03} > 0.$  (3)
Figure 1: Examples of different target-tracking UAV camera motion types: a) Lateral Tracking Shot (LTS); b) Vertical Tracking Shot (VTS); c) Moving Aerial Pan with Moving Target (MAPMT); d) Moving Aerial Tilt with Moving Target (MATMT); e) Fly-By (FLYBY); f) Fly-Over (FLYOVER); g) Chase/Follow (CHASE); and h) Orbit (ORBIT).
3) Moving Aerial Pan with Moving Target (MAPMT) and 4) Moving Aerial Tilt with Moving Target (MATMT) are parametric camera motion types, where the camera gimbal rotates (mainly with respect to the yaw/pitch axis, for MAPMT/MATMT, respectively) so as to always keep the linearly moving target centrally framed, while the UAV is flying at a linear trajectory with constant velocity. $\tilde{p}_t$ refers to the target position, varying over time in such a manner that the target and the UAV velocity vector projections onto the ground plane are approximately perpendicular/parallel to each other, for MAPMT/MATMT, respectively.

The drone velocity vector $\tilde{v}_t = [\tilde{v}_{t1}, \tilde{v}_{t2}, \tilde{v}_{t3}]^T$ must be specified. The base mathematical description for both these UAV/camera motion types is given by:

$\tilde{v}_t = \tilde{v}_{t-1}, \quad \tilde{x}_t = \tilde{x}_0 + \frac{\tilde{v}_t}{F}t, \quad l_t = p_t, \quad \forall t.$  (4)

Additionally, the following relations hold for MAPMT and MATMT, respectively:

$[\tilde{u}_{t1}, \tilde{u}_{t2}, 0][\tilde{v}_{t1}, \tilde{v}_{t2}, 0]^T \approx 0,$  (5)

$[\tilde{u}_{t1}, \tilde{u}_{t2}, 0]^T \times [\tilde{v}_{t1}, \tilde{v}_{t2}, 0]^T \approx 0.$  (6)
5) Fly-By (FLYBY) and 6) Fly-Over (FLYOVER) [34]. They are parametric camera motion types, where the camera gimbal is rotating, so that the still or linearly moving target is always centrally framed. The UAV intercepts the target from behind/from the front (and to the left/right, in the case of FLYBY), at a steady altitude (in TCS) with constant velocity, flies exactly above it/passes it by (for FLYOVER/FLYBY, respectively) and keeps on flying at a linear trajectory, with the camera still pointing at the receding target. The UAV and target velocity vector projections onto the ground plane remain approximately parallel during shooting. They can have either identical or opposite direction. $\tilde{p}_t$ refers to a varying or static target position in WCS.

The common parameter that must be specified is $K$, i.e., the time (in seconds) until the UAV is located exactly above the target (for FLYOVER), or until the distance between the target and the UAV is minimized (for FLYBY). Additionally, the length $d$ of the projection of that minimum distance vector onto the ground plane must be specified for FLYBY. Below, the target velocity is assumed constant for reasons of modelling convenience. The mathematical description common to both camera motion types is the following one, for $t \in [0, 2KF]$:

$v_0 = \left[\frac{u_{01}K - x_{01}}{K},\ 0,\ u_{03}\right]^T,$  (7)

$\tilde{v}_t = \tilde{v}_{t-1}, \quad \tilde{u}_t = \tilde{u}_{t-1}, \quad l_t = p_t, \quad \forall t,$  (8)

$\tilde{x}_t = \tilde{x}_0 + \frac{t}{KF}\left(\tilde{x}_{KF} - \tilde{x}_0\right),$  (9)

$[\tilde{u}_{t1}, \tilde{u}_{t2}, 0]^T \times [\tilde{v}_{t1}, \tilde{v}_{t2}, 0]^T \approx 0.$  (10)

Additionally, the following relations hold for FLYOVER:

$\tilde{x}_{KF} = [\tilde{p}_{01} + \tilde{u}_{01}K,\ \tilde{p}_{02} + \tilde{u}_{02}K,\ \tilde{x}_{03} + \tilde{u}_{03}K]^T,$  (11)

$x_{t2} \approx 0, \quad x_t^T j \approx 0, \quad \forall t,$  (12)

and the following hold for FLYBY:

$|x_{02}| = d > 0, \quad x_{t2} = x_{02}, \quad \forall t,$  (13)

$x_{KF} = [0,\ x_{02},\ x_{03}]^T.$  (14)
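As a rough illustration of Eqs. (9) and (11), the sketch below generates the WCS waypoint sequence of a FLYOVER under the stated constant-target-velocity assumption (the function name and the frame-sampling convention are illustrative, not part of any existing system):

```python
import numpy as np

def flyover_waypoints(x0_wcs, p0_wcs, u_wcs, K, F):
    """UAV WCS positions for a FLYOVER of duration 2K seconds (Eqs. (9), (11)).

    x0_wcs: initial UAV position, p0_wcs: initial target position,
    u_wcs: (assumed constant) target velocity, K: seconds until the UAV is
    exactly above the target, F: camera frame rate."""
    x0 = np.asarray(x0_wcs, float)
    p0 = np.asarray(p0_wcs, float)
    u = np.asarray(u_wcs, float)
    # Eq. (11): UAV position when exactly above the target; altitude follows u3.
    x_KF = np.array([p0[0] + u[0] * K, p0[1] + u[1] * K, x0[2] + u[2] * K])
    frames = int(round(2 * K * F))
    # Eq. (9): linear interpolation/extrapolation along the constant-velocity path.
    return [x0 + (t / (K * F)) * (x_KF - x0) for t in range(frames + 1)]

wps = flyover_waypoints([-30.0, 0.0, 10.0], [0.0, 0.0, 0.0], [10.0, 0.0, 0.0], K=10, F=25)
print(wps[0], wps[250], wps[-1])   # start, directly above the target, end of the shot
```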
7) Chase/Follow Shot (CHASE) is a non-parametric camera motion type, where the camera gimbal does not rotate and the camera always points at the target [34]. The UAV follows/leads the target from behind/from the front, while maintaining a steady distance by matching its speed, if possible. $\tilde{p}_t$ refers to a varying target position in WCS. The mathematical description is the following:

$\tilde{v}_t \approx \tilde{u}_t,$  (15)

$x_{t2} = x_{02} \approx 0, \quad x_t = x_{t-1}, \quad l_t = p_t, \quad \forall t.$  (16)
8) Orbit (ORBIT). It is a parametric camera motion type, where the camera gimbal is slowly rotating, so as to always keep the still or linearly moving target properly framed, while the UAV (semi-)circles around the target and, simultaneously, follows the target linear trajectory (if the target is moving) [7] [34]. During shooting, the UAV altitude remains constant in TCS, but may vary in WCS. $\tilde{p}_t$ refers to a varying or static target position in WCS.

The parameters that must be specified are the desired 3D Euclidean distance $d_{3D} = \|\tilde{x}_t - \tilde{p}_t\|_2 = \|x_t\|_2$ (constant over time), the rotation angle $\theta$ around the target and the desired UAV angular velocity $\omega$. Additionally, we can easily derive the initial angle $\theta_0$ formed by the TCS $i$-axis (of time instance $t = 0$) and the vector from $p_0$ to the projection of the known initial position $x_0$ onto the TCS $ij$-plane. Then, ORBIT may be described in TCS using a planar circular motion, for $t \in [0, \frac{F\theta}{\omega}]$:

$\theta_0 = \arctan\left(\frac{x_{02}}{x_{01}}\right),$  (17)

$x_{t3} = x_{03}, \quad \forall t,$  (18)

$\lambda = \sqrt{d_{3D}^2 - x_{t3}^2},$  (19)

$x_t = \left[\lambda\cos\left(\frac{t\omega}{F} + \theta_0\right),\ \lambda\sin\left(\frac{t\omega}{F} + \theta_0\right),\ x_{t3}\right]^T,$  (20)

$l_t = p_t.$  (21)
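The ORBIT description of Eqs. (17)-(21) can be turned into a TCS waypoint generator as sketched below; the use of atan2 instead of a plain arctangent and the frame-count convention are implementation assumptions:

```python
import numpy as np

def orbit_positions_tcs(x0_tcs, d3d, omega, theta, F):
    """UAV TCS positions for an ORBIT shot (Eqs. (17)-(21)).

    x0_tcs: initial UAV position in TCS, d3d: desired constant 3D distance,
    omega: angular velocity (rad/s), theta: total rotation angle (rad), F: frame rate."""
    x0 = np.asarray(x0_tcs, float)
    theta0 = np.arctan2(x0[1], x0[0])          # Eq. (17), via atan2 for quadrant safety
    x3 = x0[2]                                 # Eq. (18): constant TCS altitude
    lam = np.sqrt(d3d**2 - x3**2)              # Eq. (19): circle radius on the ij-plane
    frames = int(round(F * theta / omega))
    return [np.array([lam * np.cos(t * omega / F + theta0),
                      lam * np.sin(t * omega / F + theta0),
                      x3])                      # Eq. (20)
            for t in range(frames + 1)]

# Half-circle around the target, starting 30 m away on the i-axis at 10 m TCS altitude.
wps = orbit_positions_tcs([30.0, 0.0, 10.0], d3d=float(np.linalg.norm([30.0, 0.0, 10.0])),
                          omega=np.pi / 20, theta=np.pi, F=25)
print(len(wps), wps[0], wps[-1])
```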
3. Constraints on Maximum Focal Length
In order for a visual tracker to operate properly, the location (in pixel coordinates) of the target ROI should not differ by more than a threshold between successive video frames/time instances. This requirement places a constraint on the maximum target speed and on the maximum camera focal length f (the main factor determining the maximum achievable zoom level), since a given 3D target displacement (in WCS) corresponds to a greater 2D ROI displacement (in pixels) at a greater zoom level. Proper estimation of the maximum allowable f in each shooting case is of utmost importance in cinematography applications, since it directly affects the range of permissible shot types.
Without loss of generality, we always consider time instance $t = 0$ and, thus, examine an entire shooting session as a sequence of repeated transitions between the "first" ($t = 0$) and the "second" video frame ($t + 1 = 1$). We also assume that the target ROI center is always meant to be fixed at the principal point (image center) of all video frames (central composition). Target position $\tilde{p}_t$ is initially known and $\tilde{p}_{t+1}$ can be predicted using the estimated velocity vector $\tilde{u}_t$, i.e., $\tilde{p}_{t+1} = \tilde{p}_t + \tilde{u}_t\frac{1}{F}$. If the prediction is accurate, the target ROI indeed remains at the center of the $(t+1)$-th video frame.

In contrast, if the actual current target motion differs from the predicted one by the unknown velocity deviation vector $\tilde{q}_t = [\tilde{q}_{t1}, \tilde{q}_{t2}, \tilde{q}_{t3}]^T$, the target ROI at time $t+1$ has to be explicitly localized via 2D visual tracking (in pixel coordinates), so that it can be exploited for 3D target position $\tilde{p}_{t+1}$ estimation and/or for adjusting the framing. Since $\tilde{q}_t$ and, therefore, $\tilde{p}_{t+1}$ are unknown, the following analysis utilizes the TCS defined by the expected/predicted target position at time instance $t+1$.

Whenever $\tilde{q}_t$ is a non-zero vector and, therefore, prediction of $\tilde{p}_{t+1}$ fails, the results of 2D visual tracking and actual $\tilde{p}_{t+1}$ estimation must be employed for updating the target velocity vector and, hopefully, achieving a better prediction during the next time instance. Given that tracker behavior varies per algorithm, we simply assume a maximum search radius $R_{max}$ (in pixels) defining the video frame region within which the tracked object ROI of time instance $t+1$ must lie, relative to the video frame center, in order to permit successful tracking. Thus, a distance $R_{t+1}$ between the actual target ROI center of $t+1$ and the center of that video frame, where $R_{t+1} > R_{max}$, implies tracking failure. The case where $R_{t+1} = R_{max}$ marks the limit scenario where the tracker marginally succeeds. Note that $R_{max}$ is not fixed, since modern trackers adapt the size of their search region to the current ROI size.
3.1. Maximum focal length
In order to find the maximum focal length so that there is no target tracking failure, we assume that the expected position of the target in TCS is always at $[0, 0, 0]^T$. Let $o_t = l_t - x_t$ be the LookAt vector at time instance $t$ and $d_t = \sqrt{x_{t1}^2 + x_{t2}^2}$ be the distance between the target and the UAV, projected on the $ij$-plane.

Based on the above and the camera projection equations [36], the following hold:

$x_d(t+1) = o_x - \frac{f}{s_x}\,\frac{r_1^T(p_{t+1} - x_{t+1})}{r_3^T(p_{t+1} - x_{t+1})},$  (22)

$y_d(t+1) = o_y - \frac{f}{s_y}\,\frac{r_2^T(p_{t+1} - x_{t+1})}{r_3^T(p_{t+1} - x_{t+1})},$  (23)
where $x_d(t+1)$, $y_d(t+1)$ are the target center pixel coordinates at time instance $(t+1)$, $o_x$, $o_y$ define the image center in pixel coordinates and $s_x$, $s_y$ denote the pixel size (in mm) along the horizontal and vertical directions. $r_1$, $r_2$ and $r_3$ refer, respectively, to the first, second and third row of the rotation matrix $R$ that orients the camera gimbal according to the LookAt vector.

In general, the coordinate transform matrix from TCS to the camera coordinate system can be found by two rotations and one translation of the unit TCS vectors. The required rotations are around the TCS $k$-axis and $j$-axis. Thus, $R$ can be described as follows [2]:

$R = \begin{pmatrix} \cos(\theta_z)\cos(\theta_y) & -\sin(\theta_z) & \cos(\theta_z)\sin(\theta_y) \\ \sin(\theta_z)\cos(\theta_y) & \cos(\theta_z) & \sin(\theta_z)\sin(\theta_y) \\ -\sin(\theta_y) & 0 & \cos(\theta_y) \end{pmatrix},$  (24)

where $\theta_z$ and $\theta_y$ are the appropriate angles of rotation for $R_z$ and $R_y$, respectively. However, given that $R$ is an orthogonal change-of-basis matrix and that, in most of the motion types, the UAV does not fly exactly above the target, it is easier to obtain the rows of $R$ as follows. Since the camera axis points directly at the target, the unit vector of the $k$-axis of the Camera Coordinate System, i.e., $r_3$, can be obtained from $x_{t+1}$ as follows:

$r_3 = \left(-\frac{x_{t+1}}{\|x_{t+1}\|}\right)^T.$  (25)

For motion types where the UAV does not fly exactly above the target, $r_1$ is the cross product of $r_3$ with the unit vector $k$:

$r_1' = \left(k \times \frac{x_{t+1}}{\|x_{t+1}\|}\right)^T,$  (26)

$r_1 = \frac{r_1'}{\|r_1'\|}.$  (27)

Thus, $r_2$ is given by the cross product $r_3 \times r_1$:

$r_2' = \left(-\frac{x_{t+1}}{\|x_{t+1}\|} \times \left(k \times \frac{x_{t+1}}{\|x_{t+1}\|}\right)\right)^T,$  (28)

$r_2 = \frac{r_2'}{\|r_2'\|}.$  (29)
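A minimal sketch of this row-wise construction is given below, following the reconstruction of Eqs. (25)-(29) above and falling back to the fixed matrix of the special case treated later in Eq. (35) when the UAV is directly above the target (function name and tolerance are illustrative):

```python
import numpy as np

def lookat_rotation(x_next):
    """Rows r1, r2, r3 of the camera rotation matrix R (Eqs. (25)-(29)).

    x_next: UAV position in TCS at t+1, with the expected target at the origin."""
    x = np.asarray(x_next, float)
    k = np.array([0.0, 0.0, 1.0])
    if np.hypot(x[0], x[1]) < 1e-9:            # d_{t'} = 0: UAV directly above target
        return np.array([[-1.0, 0.0, 0.0],
                         [ 0.0, 1.0, 0.0],
                         [ 0.0, 0.0, -1.0]])
    x_hat = x / np.linalg.norm(x)
    r3 = -x_hat                                # camera axis points at the target
    r1 = np.cross(k, x_hat)
    r1 /= np.linalg.norm(r1)
    r2 = np.cross(r3, r1)                      # already unit-length (r3, r1 orthonormal)
    return np.vstack([r1, r2, r3])
```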
In our approach we consider central composition, thus the target ROI center should be located at $(o_x, o_y)$ at all times. Assuming that at time instance $t$ the target ROI center is aligned with the frame center, at time instance $t' = t + 1$ the target ROI center will be translated to new pixel coordinates, due to camera and target movement in the real world. The central pixel translation of the ROI, $R$, can be calculated by employing Eqs. (22) and (23) and simple geometrical rules, as depicted in Fig. 2. By setting a maximum $R$ value, thus applying the limit constraint $R_{t+1} = R_{max}$, we derive the following equation:

$R_{max} = \sqrt{(x_d(t+1) - o_x)^2 + (y_d(t+1) - o_y)^2}.$  (30)

Figure 2: ROI translation between two consecutive video frames, for time instances $t$ and $t' = t + 1$. The distance between the central pixels of the two ROIs, $R$, can be calculated by employing the results of Eqs. (22) and (23).
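For illustration, the following self-contained sketch evaluates this ROI displacement for an actual (deviated) target position, by combining the row construction of Eqs. (25)-(29) with Eqs. (22), (23) and (30); the example geometry, deviation and focal length are arbitrary assumptions, and the UAV is assumed not to be directly above the target:

```python
import numpy as np

def roi_displacement_px(x_next, p_next, f, s_x, s_y):
    """Pixel distance between the projected actual target and the frame centre
    (Eqs. (22), (23), (30)); the camera is oriented at the expected target (TCS origin)."""
    x = np.asarray(x_next, float)
    k = np.array([0.0, 0.0, 1.0])
    x_hat = x / np.linalg.norm(x)
    r3 = -x_hat                                   # camera axis toward the expected target
    r1 = np.cross(k, x_hat); r1 /= np.linalg.norm(r1)
    r2 = np.cross(r3, r1)
    v = np.asarray(p_next, float) - x             # actual target relative to the camera
    xd = (f / s_x) * (r1 @ v) / (r3 @ v)          # Eq. (22) with o_x = 0 (sign immaterial here)
    yd = (f / s_y) * (r2 @ v) / (r3 @ v)          # Eq. (23) with o_y = 0
    return float(np.hypot(xd, yd))                # Eq. (30)

# Example: CHASE-like geometry, 7.5 m/s deviation along the j-axis at F = 25 fps, f = 200 mm.
print(roi_displacement_px((30.0, 0.0, 10.0), (0.0, 7.5 / 25, 0.0), f=200, s_x=0.009, s_y=0.009))
```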
Assuming that $x_{t'} = [x_{t'1}, x_{t'2}, x_{t'3}]^T$ and $p_{t'} = [\frac{q_{t1}}{F}, \frac{q_{t2}}{F}, \frac{q_{t3}}{F}]^T$, where $t' = t + 1$, and substituting Eqs. (22) and (23) in Eq. (30), $R_{max}$ can be obtained by:

$R_{max} = \sqrt{\frac{f_{max}^2\,\|x_{t'}\|^2E_3^2}{s_x^2\,N\,(E_1 + F\|x_{t'}\|^2)^2} + \frac{f_{max}^2\,(q_{t3}N - E_2x_{t'3})^2}{s_y^2\,N\,(E_1 + F\|x_{t'}\|^2)^2}},$  (31)

where

$N = x_{t'1}^2 + x_{t'2}^2.$

Eq. (31) can be solved for $f$ to obtain the maximum focal length $f_{max}$ for motion types having $d_{t'} > 0$:

$f_{max} = \frac{R_{max}\,d_{t'}\,s_xs_y\,\left|E_1 + F\|x_{t'}\|^2\right|}{\sqrt{(s_xq_{t3}d_{t'}^2 - s_xx_{t'3}E_2)^2 + s_y^2E_3^2\|x_{t'}\|^2}},$  (32)

where

$E_1 = -q_{t1}x_{t'1} - q_{t2}x_{t'2} - q_{t3}x_{t'3},$
$E_2 = q_{t1}x_{t'1} + q_{t2}x_{t'2},$
$E_3 = q_{t2}x_{t'1} - q_{t1}x_{t'2}.$
Since most of the UAV motion types are not affected by target altitude changes between successive video frames, which are less likely to happen than direction and speed changes, $p_{t'}$ can be expressed as follows:

$p_{t'} = \left[\frac{q_{t1}}{F},\ \frac{q_{t2}}{F},\ 0\right]^T.$  (33)

In this case, the maximum focal length is given by:

$f_{max} = \frac{R_{max}\,d_{t'}\,s_xs_y\,\left|-E_2 + F\|x_{t'}\|^2\right|}{\sqrt{s_x^2E_2^2x_{t'3}^2 + s_y^2E_3^2\|x_{t'}\|^2}}.$  (34)
When the UAV/camera is located exactly above the target for the $(t+1)$-th video frame, i.e., $x_{t'} = [0, 0, x_{t'3}]^T$, $R$ cannot be derived as described in Eqs. (25)-(29), since $r_1' = k \times \frac{x_{t+1}}{\|x_{t+1}\|} = 0$. In this special case, where $d_{t'} = 0$, it is easier to calculate the rotation matrix using (24), for $\theta_z = 0$ and $\theta_y = 180^{\circ}$:

$R = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$  (35)

Then, the maximum focal length is given by:

$f_{max} = \frac{R_{max}\,F\,x_{t'3}\,s_xs_y}{\sqrt{s_y^2q_{t1}^2 + s_x^2q_{t2}^2}}.$  (36)
As it can be seen from the above, in general, the derived formulas rely on knowing,
predicting or estimating a velocity deviation vector qtthat models the degree to which
instantaneous target 3D motion differs from uniform linear motion. Several options are
available for obtaining qt. A reasonable choice would be to assume an instantaneously
constant acceleration vector at each time instance. A stricter policy would be to
derive fmax for various candidate velocity deviations, which displace the target towards
different spatial directions, and output the minimum among the computed fmax values.
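A minimal sketch of this policy, directly implementing Eqs. (32) and (36) and taking the minimum over a set of externally supplied candidate deviation vectors, could look as follows (the candidate set and the example geometry are assumptions for illustration only):

```python
import numpy as np

def f_max(x_next, q, R_max, s_x, s_y, F):
    """Maximum allowable focal length (mm) for one candidate velocity deviation q,
    per Eqs. (32) and (36); x_next is the UAV TCS position at t+1."""
    x1, x2, x3 = x_next
    q1, q2, q3 = q
    d = np.hypot(x1, x2)
    if d < 1e-9:   # Eq. (36): UAV exactly above the target (assumes q3 = 0 here)
        return R_max * F * x3 * s_x * s_y / np.sqrt(s_y**2 * q1**2 + s_x**2 * q2**2)
    norm2 = x1**2 + x2**2 + x3**2
    E1 = -q1 * x1 - q2 * x2 - q3 * x3
    E2 = q1 * x1 + q2 * x2
    E3 = q2 * x1 - q1 * x2
    num = R_max * d * s_x * s_y * abs(E1 + F * norm2)
    den = np.sqrt((s_x * q3 * d**2 - s_x * x3 * E2)**2 + s_y**2 * E3**2 * norm2)
    return num / den

# Conservative policy: minimum f_max over several candidate deviations (m/s).
candidates = [(7.5, 0, 0), (-7.5, 0, 0), (0, 7.5, 0), (0, -7.5, 0),
              (7.5, 7.5, 0), (-7.5, 7.5, 0), (-7.5, -7.5, 0), (7.5, -7.5, 0)]
print(min(f_max((30.0, 0.0, 10.0), q, R_max=360, s_x=0.009, s_y=0.009, F=25)
          for q in candidates))
```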
3.2. Simulations for specific UAV/camera motion types
In order to investigate the maximum possible focal length for a specific motion type shot, we simulated the motion for various representative UAV shooting scenarios. We studied 8 different cases for the deviation vector $q_t$. In the first two cases, the target linearly accelerates/decelerates, i.e., $q_{t1} = [7.5, 0, 0]^T$, $q_{t2} = [-7.5, 0, 0]^T$. Velocity deviations are expressed in meters/second. In the third and fourth cases, the target is moving along a different direction than the expected one ($q_{t3} = [0, 7.5, 0]^T$, $q_{t4} = [0, -7.5, 0]^T$), but remains on the TCS $j$-axis. In the remaining cases, the target is moving diagonally to the TCS axes, with $q_{t5}$ to $q_{t8}$ covering the four diagonal directions $[\pm 7.5, \pm 7.5, 0]^T$. Figure 3 depicts the expected against the actual position of the target in each case.
Figure 3: The expected against the actual target position in the $(t+1)$-th time instance, for the 8 simulated cases. TCS $i$ and $j$ axes are denoted by black and grey color, respectively.

The following parameters have been used in the performed simulations. Maximum tracker search radius $R_{max}$ was generously fixed to 360 pixels, so as to model the obvious constraint that the central target ROI pixel stays visible among consecutive video frames (when using a High Definition camera sensor); otherwise, visual tracking fails. This is a hard upper bound on $R_{max}$, thus bypassing the need for adaptive $R_{max}$
in this set of experiments. The pixel size was set to $s_x = s_y = 0.009$ mm and the video frame rate to $F = 25$ fps. All of the experiments were carried out on a Linux PC
equipped with an Intel i7 CPU and 32 GB of RAM. However, the proposed rules can
be easily computed in real-time on an embedded system (e.g. nVidia Jetson, Intel NUC,
etc.), in conjunction with a fast 2D visual tracker.
3.2.1. Lateral Tracking Shot
In LTS, the UAV flies alongside the target, as described in Section 2. In this motion type, even small target altitude variations have a great impact on picture framing. Therefore, we assume that $q_{t3} \neq 0$. The UAV position is given by $x_{t+1} = [0, x_{t2}, 0]^T$. As $p_{t+1} = [\frac{q_{t1}}{F}, \frac{q_{t2}}{F}, \frac{q_{t3}}{F}]^T$, Eq. (32) can now be rewritten as follows:

$f_{max} = \frac{R_{max}\,s_xs_y\,|q_{t2} - Fx_{t2}|}{\sqrt{s_y^2q_{t1}^2 + s_x^2q_{t3}^2}}.$  (37)
The LTS simulation was performed for varying values of $q_{t3}$. The horizontal distance between the UAV and the target was chosen to be $\lambda = x_{t2} = 30$ m. Simulation results are shown in Figure 4. As expected, variations in altitude affect all study cases 1 - 8. When the target deviates from its expected TCS position $[0, 0, 0]^T$, but is located on the $j$-axis, i.e., $p_{t+1} = [0, \frac{q_{t2}}{F}, 0]^T$, $f_{max}$ is only affected by altitude changes. This behavior is reasonable, since the camera $k$-axis unit vector can be expressed in TCS as $k_c = [0, -1, 0]^T$. Consequently, the projected ROI center will not change in pixel
as kc= [0,1,0]T. Consequently, the projected ROI center will not change in pixel
coordinates, therefore, this target deviation should have no impact at all on fmax, when
qt3= 0. The other results are affected by linear target acceleration/deceleration along
the TCS i-axis. As expected, fmax is maximized for these cases (1, 2 and 5 - 8) when
the target altitude does not vary between successive video frames. Due to the position
of the UAV, target acceleration and deceleration have identical impact on fmax.
Figure 4: Simulation results for LTS: fmax against qt3.

Figure 5: Simulation results for VTS: fmax against altitude (xt3).
3.2.2. Vertical Tracking Shot
In VTS, the UAV flies exactly above the target, therefore, the maximum focal length
is given by Eq. (36). The UAV is positioned at xt+1 = [0,0, xt3]T. The 8 case studies
were simulated for various UAV TCS altitudes, i.e., for various values of xt3. Thus,
we obtained the maximum focal length allowed in the VTS scenario for various UAV
altitudes, under the assumption that target altitude remains approximately constant be-
tween successive video frames, i.e., $q_{t3} = 0$. Target position at time $t+1$ is given by $p_{t+1} = [\frac{q_{t1}}{F}, \frac{q_{t2}}{F}, 0]^T$. The results are presented in Figure 5, where the horizontal
axis unit is meters and the vertical axis unit is millimetres. As expected, the maximum
focal length increases linearly with xt3. When the target is moving diagonally to the
TCS axes (cases 5 - 8) the maximum possible focal length is lower than in cases 1 -
4. Target motion along the j-axis (cases 3 and 4) and target linear acceleration (cases
1 and 2) have similar effect on the maximum allowed focal length, since the UAV is
positioned exactly above the target.
3.2.3. Moving Aerial Pan with Moving Target/Moving Aerial Tilt with Moving Target
Given the mathematical description for MAPMT/MATMT in (4) and the fact that the target is moving along the $i$-axis, we can assume that $x_{t+1} = [x_{t1}, x_{t2} + \frac{v_{t2}}{F}, x_{t3}]^T$ for MAPMT and $x_{t+1} = [x_{t1} + \frac{v_{t1}}{F}, x_{t2}, x_{t3}]^T$ for MATMT. For the UAV position at time instance $t+1$, the target position in the next video frame is given by Eq. (33). By substituting $x_{t+1}$ in Eq. (34), the following relations hold:

$f_{max} = \frac{R_{max}\,d_{mp}\,s_xs_y\,\left|-E_{mp1} + F\|x_{t+1}\|^2\right|}{\sqrt{s_x^2E_{mp1}^2x_{t3}^2 + s_y^2E_{mp2}^2\|x_{t+1}\|^2}},$  (38)

$f_{max} = \frac{R_{max}\,d_{mt}\,s_xs_y\,\left|-E_{mt1} + F\|x_{t+1}\|^2\right|}{\sqrt{s_x^2E_{mt1}^2x_{t3}^2 + s_y^2E_{mt2}^2\|x_{t+1}\|^2}},$  (39)

for MAPMT and MATMT, respectively, where:

$d_{mp} = \sqrt{x_{t1}^2 + \left(x_{t2} + \frac{v_{t2}}{F}\right)^2},$
$E_{mp1} = q_{t1}x_{t1} + q_{t2}\left(x_{t2} + \frac{v_{t2}}{F}\right),$
$E_{mp2} = -q_{t2}x_{t1} + q_{t1}\left(x_{t2} + \frac{v_{t2}}{F}\right),$
$d_{mt} = \sqrt{x_{t2}^2 + \left(x_{t1} + \frac{v_{t1}}{F}\right)^2},$
$E_{mt1} = q_{t2}x_{t2} + q_{t1}\left(x_{t1} + \frac{v_{t1}}{F}\right),$
$E_{mt2} = -q_{t1}x_{t2} + q_{t2}\left(x_{t1} + \frac{v_{t1}}{F}\right).$
For simulation purposes, $f_{max}$ was studied for varying distances between the target and the UAV, corresponding to consecutive time instances of the UAV/camera motion type execution. The following initial values were selected: $x_{01} = 30$ m, $x_{02} = -60$ m (MAPMT), $x_{01} = -60$ m, $x_{02} = 30$ m (MATMT), $x_{03} = 10$ m, $v_{t2} = 10$ m/s (both). The similarities between Figures 6 and 7, for MAPMT and MATMT, respectively, are evident. As expected, cases 1, 2/3, 4 of MAPMT correspond to cases 3, 4/1, 2 of MATMT, since these two motion types differ only in the UAV motion direction: it is parallel to the $j$-axis/$i$-axis in MAPMT/MATMT, respectively. The impact on $f_{max}$ of target motion deviation along the TCS $j$-axis for MAPMT will be the same as the impact of target motion deviation along the TCS $i$-axis for MATMT, and vice versa, as Figure 8 demonstrates. Therefore, cases 5, 6 and 7, 8 produce identical results in both motion types.
Studying the results of cases 1 and 2 for MAPMT and cases 3 and 4 for MATMT,
fmax takes its maximum value when xt2= 0 and xt1= 0, respectively. The reason
is that, in these positions, the UAV in MAPMT is above the i-axis, while in MATMT
above the j-axis, thus any deviations in target motion affect minimally the ROI location
in the next video frame. On the other hand, in all other cases, these UAV positions are
approximately where any target motion deviations have the greatest impact on the next
ROI location.
Figure 6: Simulation results for MAPMT: fmax against xt2.

Figure 7: Simulation results for MATMT: fmax against xt1.
Figure 8: Target velocity deviation vectors as seen from the UAV camera, when the camera axis lies on: a) the j-axis and b) the i-axis. The black dot denotes the expected target position. Black vectors correspond to cases 1 and 2, grey vectors to cases 3 and 4 and, finally, the dashed vectors to cases 5-8. In a), target velocity deviation along the j-axis affects fmax less than target linear speed changes, while in b) the opposite holds.
3.2.4. Fly-By/Fly-Over
In these motion types, where shot duration is specified by $K$, we can determine the maximum focal length directly over time ($t \in [0, 2K]$). For FLYBY, the UAV position in TCS is given by $x_{t+1} = [-\frac{x_{01}}{K}t + x_{01}, x_{02}, x_{03}]^T$. We study these motion types together, since FLYOVER is a special case of FLYBY, where $x_{02} = 0$.

By substituting $x_{t+1}$ in Eq. (34), $f_{max}$ is given by:
$f_{max} = \frac{R_{max}\,d_{fb}\,s_xs_y\,\left|-E_{fb1} + F\|x_{t+1}\|^2\right|}{\sqrt{s_x^2E_{fb1}^2x_{t3}^2 + s_y^2E_{fb2}^2\|x_{t+1}\|^2}},$  (40)

$f_{max} = \frac{R_{max}\,d_{fo}\,s_xs_y\,\left|-E_{fo1} + F\|x_{t+1}\|^2\right|}{\sqrt{s_x^2E_{fo1}^2x_{t3}^2 + s_y^2E_{fo2}^2\|x_{t+1}\|^2}},$  (41)

for FLYBY and FLYOVER, respectively, where:

$d_{fb} = \sqrt{\left(-\frac{x_{01}}{K}t + x_{01}\right)^2 + x_{02}^2},$
$E_{fb1} = q_{t1}\left(-\frac{x_{01}}{K}t + x_{01}\right) + q_{t2}x_{t2},$
$E_{fb2} = q_{t2}\left(-\frac{x_{01}}{K}t + x_{01}\right) - q_{t1}x_{t2},$
$d_{fo} = \left|-\frac{x_{01}}{K}t + x_{01}\right|,$
$E_{fo1} = q_{t1}\left(-\frac{x_{01}}{K}t + x_{01}\right),$
$E_{fo2} = q_{t2}\left(-\frac{x_{01}}{K}t + x_{01}\right).$
The following parameter values were chosen for the simulation: $x_{01} = -30$ m, $x_{03} = 10$ m, $K = 10$, thus $t \in [0, 20]$. Additionally, $x_{02} = 15$ m for FLYBY. Results are shown in Figures 9 and 10, for FLYBY and FLYOVER, respectively. The gap in FLYOVER for $t = 10$ stems from the fact that the UAV is actually above the target and, thus, the motion type is momentarily converted to VTS.
In cases 1 and 2, both motion types produce similar results. As the UAV approaches
the target, the maximum focal length decreases, before increasing again as the UAV is
flying parallel to the i-axis. When the drone is positioned far from the target, any
change in target speed corresponds to a small change in the distance between the UAV
and the target.
In general, for cases 3 and 4 of FLYBY, where the target deviates from its expected
position but remains on the j-axis, fmax increases with rising distance between the
UAV and the target. Additionally, fmax also slightly increases when the UAV is very
close to the target. Then, the latter’s velocity deviation corresponds to a small change
in distance between the target and the UAV, mapped to a small ROI displacement and,
thus, greater focal length tolerance. In FLYOVER, where any deviation of the target
motion on the j-axis will always displace the target ROI to the left or right of the video
frame, fmax is significantly smaller for cases 3 and 4.
Finally, in cases 5-8 of FLYBY, $f_{max}$ depends on the angle between the LookAt vector and the $i$-axis: it has lower values when this angle is close to $\frac{\pi}{2}$ ($t = 10$ in the simulation). In FLYOVER, the overall minimum values of $f_{max}$ are also obtained
the simulation). In FLYOVER, the overall minimum values of fmax are also obtained
for cases 5-8 when t= 10, since, then, the 3D distance between the expected and
the actual target position is slightly greater compared to cases 1-4, as it can be seen in
Figure 3, leading to greater 2D ROI displacement.
3.2.5. Chase
The focal length constraint for this motion type is a special case of Eq. (34) where $x_{t2} = 0$. Since the UAV is always located in front of/behind the target and at a steady distance, its position at time instance $t+1$ is given by $x_{t+1} = [x_{t1}, 0, x_{t3}]^T$. Target position in the next time instance is given by Eq. (33).

Figure 9: Simulation results for FLYBY: fmax over time t.

Figure 10: Simulation results for FLYOVER: fmax over time t.

Figure 11: Simulation results for CHASE: fmax against distance from target.

By combining Eqs. (33) and (34), the following relation holds:
$f_{max} = \frac{R_{max}\,s_xs_y\,\varphi_c\,\left|-F\varphi_c^2 + x_{t1}q_{t1}\right|}{x_{t1}\sqrt{s_y^2\varphi_c^2q_{t2}^2 + s_x^2x_{t3}^2q_{t1}^2}},$  (42)

where

$\varphi_c = \sqrt{x_{t1}^2 + x_{t3}^2}.$  (43)
For simulation purposes, we studied fmax using varying distances between the
target and the UAV, as well as constant TCS altitude (xt3= 10 m). The results are
shown in Figure 11. As expected, the maximum focal length increases with rising
distance between the UAV and the target. In cases 1 and 2, fmax is much larger than
in the other cases, since an increase or a decrease of the target speed will simply move the target slightly farther from or closer to the UAV. When the distance between the UAV and
the target is increased, the target has to deviate more from its expected position, so
that Rt+1 > Rmax in the next video frame. This is due to the fact that target speed
deviation has less effect on target position in the next video frame, as this UAV/camera
motion type starts to produce a visual result similar to that of LTS, but with the UAV
located ahead/behind the target.
On the contrary, for cases 3 and 4 where the target deviates along the j-axis in
the next video frame, this UAV/camera motion type is highly affected. As Figure 8b
demonstrates, if the target moves along the j-axis, the ROI center in the next video
frame is displaced according to target motion velocity deviation. However, this dis-
placement is also inversely proportional to the distance between the target and the
UAV/camera, due to perspective projection. Thus, lower focal length tolerances and
a more linear increase in fmax as xt1rises is expected. Similar conclusions can be
drawn for cases 5 - 8.
3.2.6. Orbit
For the ORBIT motion type, the target position is given by Eq. (33). By using Eqs. (17) - (21), $f_{max}$ is given by substituting

$x_{t+1} = \left[\lambda\cos\left(\frac{\omega}{F} + \theta_0\right),\ \lambda\sin\left(\frac{\omega}{F} + \theta_0\right),\ x_{t3}\right]^T$  (44)

in (34):

$f_{max} = \frac{R_{max}\,d_{or}\,s_xs_y\,\left|-E_{or1} + F\|x_{t+1}\|^2\right|}{\sqrt{s_x^2E_{or1}^2x_{t3}^2 + s_y^2E_{or2}^2\|x_{t+1}\|^2}},$  (45)

where:

$d_{or} = \sqrt{\left(\lambda\cos\left(\frac{\omega}{F} + \theta_0\right)\right)^2 + \left(\lambda\sin\left(\frac{\omega}{F} + \theta_0\right)\right)^2},$
$E_{or1} = q_{t1}\lambda\cos\left(\frac{\omega}{F} + \theta_0\right) + q_{t2}\lambda\sin\left(\frac{\omega}{F} + \theta_0\right),$
$E_{or2} = -q_{t1}\lambda\sin\left(\frac{\omega}{F} + \theta_0\right) + q_{t2}\lambda\cos\left(\frac{\omega}{F} + \theta_0\right).$
The following parameter values were used in the simulations: $\lambda = 30$ m, $x_{03} = 10$ m, $\omega = \frac{\pi}{20}$ rad/sec. The results are depicted in Figure 12. The horizontal axis
represents the current θ0, i.e., the angle denoting the current UAV position relative
to the target along a circular trajectory. The estimated fmax complies with intuitive
expectations in all cases. For instance, in case 1, the target linearly accelerates. If the UAV lies exactly behind the target ($\theta_0 = 0^{\circ}$), $f_{max}$ takes its maximum value, since, from that perspective, a linear acceleration will not significantly alter the target ROI center pixel coordinates. In contrast, linear acceleration will have a much greater impact from a lateral perspective ($\theta_0 = 90^{\circ}$). Indeed, $f_{max}$ takes its minimum value in this case. As expected, $f_{max}$ varies periodically as the UAV view changes from a
lateral one to a collinear one and vice versa. Similar conclusions can be drawn for the
scenario of linear target deceleration (case 2), where the target trajectory also remains
identical to the expected one.
In cases 3 and 4, if the UAV is positioned collinearly to the estimated target velocity vector ($\theta_0 = 0^{\circ}$), it has in fact a lateral view of the actual target motion. If it is positioned perpendicularly to the estimated velocity vector ($\theta_0 = 90^{\circ}$), it has in fact a collinear (frontal/rear) view of the actual target motion. Therefore, the plots of cases 1, 2 and of cases 3, 4 have a relative phase difference of $\frac{\pi}{2}$, as one would expect.
As shown in Figure 12, in cases 5 and 6, where the target moves diagonally to its expected trajectory, the corresponding plots have an absolute phase difference of $\frac{\pi}{8}$ relative to the previously described plots. Additionally, the $f_{max}$ values are lower than those of cases 3 and 4. These observations are reasonable, since, when $\theta_0 = 45^{\circ}$,
the UAV has in fact a frontal/rear view of the actual target motion. Also, this scenario
presents the greatest difference (in pixel coordinates) between the expected and the
actual target ROI center location. Therefore, greater limitations are naturally imposed
on fmax, so that 2D visual tracking is successful.
Figure 12: Simulation results for ORBIT motion type: fmax against θ0.
Finally, cases 7 and 8 produce similar results, since the target again moves diagonally to the TCS axes. However, when compared to cases 5 and 6, the perpendicularity of the motion directions leads to a phase difference of $\frac{\pi}{4}$.
4. Shot Type Feasibility
In cinematography planning, it is important to be able to determine whether a de-
sired shot type is feasible, given a specific camera motion type and the target’s physical
dimensions. The shot type is primarily defined by the ratio of the target ROI height to
the video frame height, therefore, it is linked to the video frame area being covered by
the target ROI. Thus, below, video frame coverage refers to the ROI-to-video-frame-
height ratio.
In order to examine the feasibility of a shot type, the appropriate focal length fs
leading to the desired target video frame coverage must be calculated. For motion
types where the distance between the target and the UAV varies over time, keeping a
constant target video frame coverage by constantly adjusting the camera focal length
simulates the cinematographic “dolly zoom” effect [5].
The shot type can be achieved without risking 2D visual tracking failure, if the following relation holds:

$f_s \leq f_{max}.$  (46)
In order to calculate the appropriate fsfor achieving the shot types described in
Section 2 with respect to the desired UAV/camera motion type, we model the target as
a sphere, with its center located at the TCS point $[0,0,0]^T$ and having constant radius $R_t$. Simple sphere-modelling allows us to consider its image on the video frame as a circle, with no perspective distortion when $l_t = [0,0,0]^T$.
This rather simplistic target volume modelling facilitates the derivation of closed forms for $f_s$, without much deviation from reality when the object is not very flattened. In the case of significantly flattened targets, which could be better modelled with a rectangular parallelepiped, sphere-based modelling results in an overestimation of $f_s$. Then, a simple solution is to perform the same analysis considering three different sphere radii, i.e., one for each parallelepiped dimension, and use either their mean, their maximum or their minimum. However, in the case of human heads, which is very important in cinematic media imaging, simple bounding-sphere-based modelling is already quite accurate.
Below, the deviation vector $q_t$ is assumed to be equal to $[0,0,0]^T$ for the desired $f_s$ calculations. Thus, no target motion deviations are taken into consideration, since
they do not significantly affect the resulting video frame coverage percentage.
4.1. Constant target video frame coverage
Determining the video frame coverage for every UAV/camera motion type would
normally include projecting the target sphere onto the video frame, finding the cor-
responding radius of the projected circle and computing the resulting coverage. This
requires a search for the radius of the projected circle. The parameters determining
the video frame coverage are the distance between UAV/camera and target, the camera
focal length f and the physical target dimensions. Thus, without loss of generality,
instead of directly projecting the target onto the current image plane, we determine the
video frame coverage as if the UAV/camera was positioned exactly above the target in
an altitude equal to the actual distance between them. Thus, it is trivial to find a 3D
point being projected on the target image circle. Then, the latter’s radius is the distance
between the projection of the above 3D point and the principal point. This projec-
tion can be obtained by Eqs. (22) and (23) in pixel coordinates. The corresponding
continuous coordinates of xim and yim on the image sensor are given by:
xim =xdsx, yim =ydsy.(47)
Thus, the video frame coverage percentage for the circular target ROI is given by:
cs=2Rim
Hsy
, Rim =qx2
im +y2
im.(48)
where His the height of the video frame in pixels and sythe physical height of one
pixel.
The above equations can be further simplified by defining Rim as the perspective
projection of pr= [Rt,0,0]T(in TCS), where Rtis target radius, and by positioning
the UAV/camera at x0=xt+1 = [0,0, zd]Twhere zd=px2
t01+x2
t02+x2
t03is the
distance between the target and the camera. Then, yim = 0, thus, Rim =xim and:
xim =1
2csHsy(49)
By utilizing Eqs. (22) and (47), and setting ox= 0:
xim =fs
r1(prx0)
r3(prx0).(50)
The rotation matrix in this case is described by Eq. (35), and the appropriate focal
length can be obtained by:
fs=csHsyzd
2Rt
.(51)
24
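Eq. (51), together with the feasibility rule of Eq. (46), reduces to a one-line computation; the following sketch is illustrative only (the example target radius, distance, frame height and the externally supplied fmax bound are assumptions):

```python
def f_s(c_s, H, s_y, z_d, R_t):
    """Focal length (mm) achieving frame-height coverage c_s for a spherical target of
    radius R_t (m) at distance z_d (m), per Eq. (51); H in pixels, s_y in mm/pixel."""
    return c_s * H * s_y * z_d / (2.0 * R_t)

# Example: Long Shot coverage (25%) of a 1 m-radius target, 30 m away, 720-line frame.
fs = f_s(0.25, 720, 0.009, 30.0, 1.0)
f_max_current = 194.4            # would come from Eq. (32) for the current geometry
print(fs, fs <= f_max_current)   # shot feasibility check, Eq. (46)
```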
Table 3: Shot type feasibility for UAV/camera motion types with constant distance from the target.

Motion type    min fmax    fs, when cs = 25%    fs, when cs = 85%
LTS            194.4 mm    78.57 mm             267.14 mm
CHASE          142.4 mm    78.57 mm             267.14 mm
ORBIT          241.5 mm    78.57 mm             267.14 mm
Figure 13: Maximum focal length fmax and fs for Medium Shot and Close-Up Shot, against various UAV altitudes, for VTS.
4.2. Simulations for constant target video frame coverage
In order to investigate the target tracking feasibility for specific shot type-UAV/camera motion type combinations, one can repeat the simulations described in Section 3.2 and determine if the desired $f_s$ is below the minimum value of $f_{max}$ for all cases. A trivial addition, which is omitted here for brevity, would include a check for violations of lens-specific upper/lower focal length limits.

For the UAV/camera motion types where the distance between the camera and the target remains constant (i.e., CHASE, ORBIT, LTS), the desired $f_s$ is also constant for the entire shot. On the contrary, when the distance between the target and the UAV/camera varies (i.e., MAPMT, MATMT, FLYBY, FLYOVER, VTS), the appropriate $f_s$ varies correspondingly. Although VTS is normally a UAV/camera motion type where the distance between the UAV and the target remains constant, it was studied for varying $z_d$ in our simulations. Hence, in the first group of camera motion types, shot feasibility can be determined simply by two values, the minimum $f_{max}$ and the desired $f_s$. In the second group, feasibility should be examined for the entire shot duration, or for a range of $z_d$ values in the case of VTS.

For simulation purposes, we assume a sphere-shaped target positioned at $p = [0,0,0]^T$ (in TCS), with radius $R_t = 1$ m (e.g., a racing bicycle during sports event coverage). In all motion types, the UAV and target position/motion/deviation properties comply with the descriptions in Section 3.2. In addition, the video frame resolution was set to $W = 1280$ pixels and $H = 720$ pixels. Simulations were carried out for
two desired video frame coverage percentages, i.e., $c_s = 25\%$ and $c_s = 85\%$, corresponding to a Long Shot and a Close-Up Shot, respectively.

Figure 14: Maximum focal length fmax and fs for Medium Shot and Close-Up Shot, against time t, for FLYBY.

Figure 15: Maximum focal length fmax and fs for Medium Shot and Close-Up Shot, against time t, for FLYOVER.

Figure 16: Maximum focal length fmax and fs for Medium Shot and Close-Up Shot, against various UAV positions, for MAPMT.

Figure 17: Maximum focal length fmax and fs for Medium Shot and Close-Up Shot, against various UAV positions, for MATMT.

Table 3 indicates that a
Long Shot is achievable for the UAV/camera motion types CHASE, ORBIT and LTS, while a Close-Up is not feasible for any of these motion types.
For VTS, FLYBY, FLYOVER, MAPMT and MATMT the results are presented in
Figures 13, 14, 15, 16 and 17 respectively. In these motion types, a Long Shot is
achievable at all times (fs< fmax), but a Close-Up could cause visual tracking failure
in the presence of target velocity deviations.
The simulation results lead to the conclusion that 2D visual tracking of a real target
is indeed a fairly challenging task at greater zoom levels, if the target deviates non-
negligibly from the expected position on the next video frame.
4.3. Maximum permissible velocity deviation vector
By inverting the analysis made for fmax and fixing focal length to the fsneeded
for a specific shot type, we can define the maximum permissible norm of the target
velocity deviation vector qt= [qt1, qt2,0]T. This way, one can pre-determine whether
a shot type is feasible from known/expected target/target route characteristics.
Below, we assume for simplicity that:

$q_t = q_{t1} = q_{t2},$  (52)

to demonstrate the process. By denoting $t' = t + 1$, $q_t$ is then given by solving the following equation, derived from Eq. (34):

$(f_s^2D_q - A_q^2B_q^2)q_t^2 + 2A_q^2B_qC_qq_t - A_q^2C_q^2 = 0,$  (53)

where $A_q = R_{max}d_{t'}s_xs_y$, $B_q = x_{t'1} + x_{t'2}$, $C_q = F\|x_{t'}\|^2$ and $D_q = s_x^2x_{t'3}^2B_q^2 + s_y^2\|x_{t'}\|^2(x_{t'1} - x_{t'2})^2$.

When $q_t > 0$, as in case 5 of the performed simulations, $q_t$ can be directly obtained by:

$q_t = \frac{A_qF\|x_{t'}\|^2}{f_s\sqrt{D_q} + A_q(x_{t'1} + x_{t'2})}.$  (54)

The maximum $q_t$ can be obtained similarly for other cases and UAV/camera motion types, in order to estimate the range of permissible target velocity deviations for a specific shot type-UAV/camera motion type combination.
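As an illustration, the positive-deviation case of Eq. (54) can be evaluated directly as sketched below (the example geometry and focal length are assumptions for illustration only):

```python
import numpy as np

def q_t_max(f_s, x_next, R_max, s_x, s_y, F):
    """Maximum permissible velocity deviation q_t (with q_t1 = q_t2 = q_t, q_t3 = 0, q_t > 0)
    for a fixed shot focal length f_s, per Eq. (54)."""
    x1, x2, x3 = x_next
    d = np.hypot(x1, x2)
    norm2 = x1**2 + x2**2 + x3**2
    A = R_max * d * s_x * s_y
    B = x1 + x2
    D = s_x**2 * x3**2 * B**2 + s_y**2 * norm2 * (x1 - x2)**2
    return A * F * norm2 / (f_s * np.sqrt(D) + A * B)

# Example: CHASE-like geometry at a 200 mm focal length; result in m/s.
print(q_t_max(200.0, (30.0, 0.0, 10.0), R_max=360, s_x=0.009, s_y=0.009, F=25))
```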
4.4. AirSim simulations for evaluating shot feasibility rules
In order to evaluate the presented shot feasibility rules under realistic media production conditions, a simulation was developed that implements the platform setup discussed thus far and incorporates the proposed rules. To this end, AirSim [33] was employed, i.e., an open-source, highly realistic UAV simulation environment based on the Unreal Engine 4 real-time 3D graphics engine. For evaluation purposes, two different scenarios were developed: a cycling scenario and a track-and-field scenario. In both scenarios, the generated shots involve a moving target (a cyclist or a running athlete) and a UAV equipped with a cinematographic camera, controlled by an API script that follows the target according to the desired shot type/camera motion type combination.
Figure 18: Snapshot from the synthetic, realistic evaluation environment. The UAV follows the target (a bicycle) while performing an ORBIT motion type. The focal length of the camera is set to 50 mm, resulting in a Long Shot.
Figure 19: Snapshot from the track-and-field scenario in the synthetic, realistic evaluation environment. The UAV follows a running athlete while performing an ORBIT motion type.
Snapshots from the generated footage are depicted in Figures 18 and 19, while an example 2D plot of the target and UAV trajectories during an ORBIT is shown in Figure 20.
The various parameters (e.g., focal length, UAV height, initial position relative to the target, etc.) were set similarly to the evaluation in Section 3.2. $R_{\max}$ was set adaptively to $\min\left(\frac{1}{2}H, \frac{wk}{s_y}R_{im}\right)$, where the latter term is the search region size, defined by the 2D target ROI radius (in pixels) $\frac{1}{s_y}R_{im}$, a constant scaling factor $w$ (set here to 1.5, as is the default value in [12]) and a varying scaling factor $k \in [0, 1]$ that shrinks the search region according to the proximity of the current ROI to the video frame borders, so as to restrict out-of-frame ROI translations that would cause 2D tracker drift and gimbal control failure.
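A minimal sketch of this adaptive rule follows; it is illustrative only, and the specific border-proximity heuristic used to compute $k$ is an assumption, not the exact implementation.

```python
def adaptive_r_max(W, H, R_im, s_y, roi_center, w=1.5):
    """Adaptive R_max = min(H/2, (w*k/s_y)*R_im), with k in [0, 1] shrinking
    the search region as the current ROI approaches the frame borders.
    W, H: frame width/height in pixels; (1/s_y)*R_im: target ROI radius in
    pixels; roi_center: (x, y) pixel coordinates of the current ROI center."""
    cx, cy = roi_center
    # Assumed heuristic: normalized distance of the ROI center to the
    # nearest frame border (0 at a border, 1 near the frame center).
    k = min(cx, W - cx, cy, H - cy) / (0.5 * min(W, H))
    k = max(0.0, min(1.0, k))
    return min(0.5 * H, (w * k / s_y) * R_im)
```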
Datasets created in such a manner provide fully accurate ground-truth 3D locations for both the target and the UAV. However, this is not in line with a real-world scenario involving noisy GPS sensors. Thus, the 3D positions of both the target and the UAV at every time instance $t$ were distorted by additive Gaussian noise, so as to simulate GPS measurements.
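A sketch of this distortion step is given below; the noise standard deviation is an assumed placeholder, not the value used in the paper's experiments.

```python
import numpy as np

def add_gps_like_noise(positions_xyz, sigma_m=1.0, rng=None):
    """Distort ground-truth 3D positions with zero-mean Gaussian noise
    (sigma_m is an assumed standard deviation, in meters)."""
    rng = rng or np.random.default_rng()
    return positions_xyz + rng.normal(0.0, sigma_m, size=np.shape(positions_xyz))
```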
The experiments were carried out for all motion types, while attempting to achieve three different shot types: Long Shot (LS), Medium Close-Up (MCU) and Close-Up (CU).
Figure 20: 2D plot of the UAV and target trajectories in WCS, during an ORBIT session in the AirSim simulator. (X-Y plot; curves: UAV trajectory, target trajectory.)
For evaluation purposes, we obtained the noisy 3D positions of both the target and the UAV at every time instance $t$. Additionally, the previous noisy 3D position of the target (from time instance $t-1$) was employed to calculate its velocity. Assuming that the target momentarily follows a linear trajectory, we estimate its 3D position at the next time instance ($t' = t + 1$) and adjust the UAV motion, so that the desired central composition framing is maintained. Then, at time instance $t'$, we compare the 2D projection of the estimated 3D target position with the 2D projection of the ground-truth 3D target position. If the distance between the two ROI center points, $R_f$, exceeds the $R_{\max}$ limit, ground-truth tracking failure is assumed ($R_f > R_{\max}$). This is then compared with the predictions of Eq. (32) for the current maximum permissible focal length and Eq. (51) for the desired one, regarding the current shot's feasibility, given the noisy 3D positions of the target and the UAV, the calculated target velocity and the estimated target position on the next video frame. In this manner, the proposed method predicts tracking failure when the desired focal length given by Eq. (51) is greater than the result of Eq. (32), as described by Eq. (46). The velocity deviation vector $\mathbf{q}_t$ in Eq. (32) is simply calculated as the difference between the estimated target velocity at time instance $t-1$ and the actual target velocity at time instance $t$ (distorted by noise).
Therefore, a reasonable assumption of temporally localized constant target acceleration is made. Thus, true/false positive/negative prediction labels (TP, FP, TN, FN) are computed for each time instance. Then, precision is calculated as $P = \frac{TP}{TP+FP}$, the recall rate as $R = \frac{TP}{TP+FN}$ and the F-Measure as $F = \frac{2TP}{2TP+FP+FN}$.
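The per-frame labelling and metric computation can be summarized by the sketch below. It is illustrative only: it assumes that a predicted tracking failure (Eq. (51) focal length exceeding Eq. (32) focal length) is treated as the positive class, and that those two quantities are supplied per frame by the caller.

```python
def evaluate_feasibility_predictions(frames):
    """frames: iterable of per-time-instance records with boolean fields
       'predicted_failure' (rule-based prediction) and 'actual_failure'
       (ground-truth check R_f > R_max). Returns (precision, recall, F)."""
    tp = fp = tn = fn = 0
    for f in frames:
        if f["predicted_failure"] and f["actual_failure"]:
            tp += 1
        elif f["predicted_failure"] and not f["actual_failure"]:
            fp += 1
        elif not f["predicted_failure"] and f["actual_failure"]:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, f_measure

# Hypothetical per-frame records, purely for demonstration:
frames = [
    {"predicted_failure": False, "actual_failure": False},
    {"predicted_failure": True,  "actual_failure": True},
    {"predicted_failure": True,  "actual_failure": False},
]
print(evaluate_feasibility_predictions(frames))
```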
In the first (cycling) evaluation scenario, the mean precision, recall and F-Measure of the proposed rules over all motion types were 0.929, 0.994 and 0.960, respectively. Table 4 depicts the evaluation results per shot type, while Figure 21 contains the F-Measure box-plots for all motion types, separately for each shot type. In the second scenario of the running athlete, the mean precision, recall and F-Measure were 0.927, 0.995 and 0.961, respectively, while the individual results per shot type are depicted in Table 5. Figure 22 demonstrates the F-Measure box-plots for all motion types in the second scenario, separated per shot type.
Figure 21: Box-plot of F-Measure for the three different shot types (LS, MCU, CU) in the AirSim cycling evaluation test. The line inside the boxes demonstrates the median value in each case. Overall, CHASE performed the best and FLYOVER the worst.
Figure 22: Box-plot of F-Measure for the three different shot types (LS, MCU, CU) in the AirSim track and field evaluation test. The line inside the boxes demonstrates the median value in each case. Overall, VTS performed the best and LTS the worst.
Table 4: Mean evaluation results for the proposed shot feasibility rules over all motion types, in the realistic
AirSim cycling setup.
Shot type F-Measure Precision Recall
LS 0.992 0.991 0.997
MCU 0.956 0.923 0.993
CU 0.926 0.872 0.990
Mean 0.960 0.929 0.994
Table 5: Mean evaluation results for the proposed shot feasibility rules over all motion types, in the realistic
AirSim track and field setup.
Shot type F-Measure Precision Recall
LS 0.999 0.991 0.997
MCU 0.971 0.944 0.991
CU 0.913 0.845 0.999
Mean 0.961 0.927 0.995
In addition, the target ROI size calculation methodology was evaluated. As already mentioned, we treat the target as a sphere-shaped object in order to derive the desired focal length $f_s$. This can lead to approximation errors in video frame coverage estimation, especially for flattened targets. The focal length necessary to maintain the desired shot type was calculated for each video frame, using the noisy 3D UAV and target positions, as well as the target ROI prediction for the next video frame.
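For intuition, under a simple pinhole/spherical-target approximation the desired focal length grows with the camera-to-target distance and the requested coverage percentage. The sketch below illustrates this relationship only; it is not a reproduction of Eq. (51), and the sensor height is an assumed value.

```python
def desired_focal_length_mm(c_s, distance_m, target_radius_m=1.0,
                            sensor_height_mm=24.0):
    """Pinhole approximation: pick f_s so the projected target diameter
    covers a fraction c_s of the frame height.
    c_s * h_sensor ~ f_s * 2 * R_t / d  =>  f_s ~ c_s * h_sensor * d / (2 * R_t)."""
    return c_s * sensor_height_mm * distance_m / (2.0 * target_radius_m)

# Hypothetical example: Close-Up (c_s = 0.85) of a 1 m-radius target at 30 m.
print(round(desired_focal_length_mm(0.85, 30.0), 1), "mm")
```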
The actual ROI-to-video-frame-height ratio was calculated at each time instance and compared with the desired value of $c_s$, as defined by each shot type. Figure 23 depicts the distribution of the actual video frame coverage versus the estimated one. Despite variations in the actual target ROI size, the proposed $f_s$ calculation manages to keep the estimated target ROI size within the video frame coverage range of the desired shot type. Table 6 demonstrates the mean video frame coverage values for the three evaluated shot types, over all the simulated motion types.
Figure 23: Box-plot of the estimated vs the actual target video frame coverage for the three desired framing shot types (LS, MCU, CU). Despite the simple sphere-based target modeling and the target/UAV localization noise, the estimated target ROI size lies within the range of the same shot type as the actual target ROI size.
Table 6: Desired, actual and estimated mean video frame coverage.
Shot type    Desired $c_s$    Actual $c_s$    Estimated $c_s$
LS 0.3 0.307 0.310
MCU 0.6 0.606 0.620
CU 0.85 0.872 0.880
Desired $c_s$ is the video frame coverage percentage requested by the director, actual $c_s$ is the video frame coverage percentage achieved by the produced ROIs, while estimated $c_s$ refers to the coverage percentage that would be achieved if ground-truth, non-noisy UAV and target 3D positions were available. The largest deviation is observed in the CU case where, as already demonstrated in Section 4, target tracking is not feasible most of the time.
5. Conclusions
In this paper, a close examination of the shot type constraints arising in computer vision-assisted UAV active target following for cinematography applications has been performed. To this end, a number of industry-standard target-tracking UAV motion types have been strictly defined and geometrically modelled, while compatible shot types have been identified for each case. Subsequently, the maximum permissible camera focal length, so that 2D visual tracking does not fail, as well as shot type feasibility conditions, were analytically determined. The derived formulas can readily be employed as low-level rules in UAV intelligent shooting and cinematography planning systems. Practical simulations showcase the validity of our findings, since the results comply with intuitive expectations in all cases.
Several extensions can be envisioned for the proposed rules. For instance, tighter integration with a specific real-time 2D visual tracker may lead to improvements. Additionally, since our formulas rely on the estimated velocity deviation vector $\mathbf{q}_t$ at each time instance, learning to predict this vector from visual data (e.g., the expected target route) would be a promising avenue for future research. Such a prediction may concurrently benefit the 2D visual tracker itself, as in [17, 39].
6. Acknowledgement
Funding: The research leading to these results has received funding from the Euro-
pean Union’s Horizon 2020 research and innovation programme under grant agreement
No 731667 (MULTIDRONE). This publication reflects the authors’ views only. The
European Commission is not responsible for any use that may be made of the informa-
tion it contains.
References
[1] Computational UAV cinematography for intelligent shooting based on semantic
visual analysis.
[2] J. Angeles. Fundamentals of robotic mechanical systems, volume 2. Springer,
2002.
[3] I. Arev, H. S. Park, Y. Sheikh, J. K. Hodgins, and A. Shamir. Automatic editing
of footage from multiple social cameras. ACM Transactions on Graphics, 33(4):
81, 2014.
[4] S. Bhattacharya, R. Mehran, R. Sukthankar, and M. Shah. Classification of cine-
matographic shots using lie algebra and its application to complex event recogni-
tion. IEEE Transactions on Multimedia, 16(3):686–696, 2014.
[5] B. Brown. Cinematography: Theory and Practice: Image Making for Cinematog-
raphers and Directors. Focal Press, 3rd edition, 2016.
[6] P. Carr, M. Mistry, and I. Matthews. Hybrid robotic/virtual pan-tilt-zoom cam-
eras for autonomous event recording. In Proceedings of the ACM International
Conference on Multimedia. ACM, 2013.
[7] E. Cheng. Aerial Photography and Videography Using Drones. Peachpit Press,
2016.
[8] L.-Y. Duan, J. S. Jin, Q. Tian, and C.-S. Xu. Nonparametric motion characteri-
zation for robust classification of camera motion patterns. IEEE Transactions on
Multimedia, 8(2):323–340, 2006.
[9] H. Fourati and D.E.C. Belkhiat. Multisensor Attitude Estimation: Fundamental
Concepts and Applications. CRC Press LLC, 2016.
[10] M. S. Grewal, L. R. Weill, and A. P. Andrews. Global Positioning Systems,
inertial navigation, and integration. John Wiley & Sons, 2007.
[11] M. A. Hasan, M. Xu, X. He, and C. Xu. CAMHID: Camera motion histogram
descriptor and its application to cinematographic shot classification. IEEE Trans-
actions on Circuits and Systems for Video Technology, 24(10):1682–1695, 2014.
[12] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with
kernelized correlation filters. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, 37(3):583–596, 2015.
[13] B. K. P. Horn. Closed-form solution of absolute orientation using unit quater-
nions. Journal of the Optical Society of America A, 4(4):629–642, 1987.
[14] X. Huang, R. Janaswamy, and A. Ganz. Scout: Outdoor localization using Ac-
tive RFID technology. In Proceedings of the IEEE Conference on Broadband
Communications, Networks and Systems (BROADNETS), pages 1–10, 2006.
[15] N. Joubert, M. Roberts, A. Truong, F. Berthouzoz, and P. Hanrahan. An interac-
tive tool for designing quadrotor camera shots. ACM Transactions on Graphics,
34(6):238, 2015.
[16] N. Joubert, D. B. Goldman, F. Berthouzoz, M. Roberts, J. A. Landay, and P. Han-
rahan. Towards a drone cinematographer: Guiding quadrotor cameras using vi-
sual composition principles. arXiv preprint arXiv:1610.01691, 2016.
[17] T. Li. Single-road-constrained positioning based on deterministic trajectory ge-
ometry. IEEE Communications Letters, 23(1):80–83, 2018.
[18] N. Liang, G. Wu, W. Kang, Z. Wang, and D. D. Feng. Real-time long-term
tracking with prediction-detection-correction. IEEE Transactions on Multimedia,
PP(99):1–1, 2018.
[19] C. Liu, P. Liu, W. Zhao, and X. Tang. Robust tracking and re-detection: Collab-
oratively modeling the target and its context. IEEE Transactions on Multimedia,
2017.
[20] I. Mademlis, V. Mygdalis, C. Raptopoulou, N. Nikolaidis, N. Heise, T. Koch,
J. Grunfeld, T. Wagner, A. Messina, F. Negro, S. Metta, and I. Pitas. Overview
of drone cinematography for sports filming. In European Conference on Visual
Media Production (CVMP) (short), 2017.
[21] I. Mademlis, V. Mygdalis, N. Nikolaidis, and I. Pitas. Challenges in Autonomous
UAV Cinematography: An Overview. In Proceedings of the IEEE International
Conference on Multimedia and Expo (ICME), 2018.
[22] I. Mademlis, V. Mygdalis, N. Nikolaidis, M. Montagnuolo, F. Negro, A. Messina,
and I. Pitas. High-level multiple-UAV cinematography tools for covering outdoor
events. IEEE Transactions on Broadcasting, 2019.
[23] I. Mademlis, N. Nikolaidis, A. Tefas, I. Pitas, T. Wagner, and A. Messina. Au-
tonomous UAV cinematography: A tutorial and a formalized shot type taxonomy.
ACM Computing Surveys, 2019. accepted for publication.
[24] I. Mademlis, N. Nikolaidis, A. Tefas, I. Pitas, T. Wagner, and A. Messina. Au-
tonomous unmanned aerial vehicles filming in dynamic unstructured outdoor en-
vironments. IEEE Signal Processing Magazine, 36(1):147–153, 2019.
[25] S. Minaeian, J. Liu, and Y.-J. Son. Effective and efficient detection of moving
targets from a UAV’s camera. IEEE Transactions on Intelligent Transportation
Systems, 2018.
[26] P. P. Mohanta, S. K. Saha, and B. Chanda. A model-based shot boundary detection
technique using frame transition parameters. IEEE Transactions on Multimedia,
14(1):223–233, 2012.
[27] M. Mueller, N. Smith, and B. Ghanem. A benchmark and simulator for UAV
tracking. In Proceedings of the European Conference on Computer Vision
(ECCV). Springer, 2016.
[28] R. Mur-Artal and J. D. Tardós. ORB-SLAM2: an open-source SLAM system for
monocular, stereo and RGB-D cameras. arXiv preprint arXiv:1610.06475, 2016.
[29] R. Mur-Artal and J. D. Tardós. Visual-inertial monocular SLAM with map reuse.
IEEE Robotics and Automation Letters, 2(2):796–803, 2017.
[30] T. Nägeli, L. Meier, A. Domahidi, J. Alonso-Mora, and O. Hilliges. Real-time
planning for automated multi-view drone cinematography. ACM Transactions on
Graphics, 36(4):132:1–132:10, 2017.
[31] P. Nousi, E. Patsiouras, A. Tefas, and I. Pitas. Convolutional neural networks for
visual information analysis with limited computing resources. In Proceedings of
the IEEE International Conference on Image Processing (ICIP), 2018.
[32] P. Nousi, I. Mademlis, I. Karakostas, A. Tefas, and I. Pitas. Embedded UAV
Real-time Visual Object Detection and Tracking. In Proceedings of the IEEE
International Conference on Real-time Computing and Robotics (RCAR), 2019.
[33] S. Shah, D. Dey, C. Lovett, and A. Kapoor. AirSim: High-Fidelity Visual and
Physical Simulation for Autonomous Vehicles. In Proceedings of the Field and
Service Robotics Conference, 2017.
[34] C. Smith. The Photographer’s Guide to Drones. Rocky Nook, 2016.
[35] A. Torres-González, J. Capitán, R. Cunha, A. Ollero, and I. Mademlis. A mul-
tidrone approach for autonomous cinematography planning. In Proceedings of
the Iberian Robotics Conference (ROBOT), 2017.
[36] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Pren-
tice Hall, 1998.
[37] I. Tsingalis, A. Tefas, N. Nikolaidis, and I. Pitas. Shot type characterization in
2D and 3D video content. In Proceedings of the IEEE International Workshop on
Multimedia Signal Processing (MMSP), 2014.
[38] X. Wang, H. Zhu, D. Zhang, D. Zhou, and X. Wang. Vision-based detection and
tracking of a mobile ground target using a fixed-wing UAV. International Journal
of Advanced Robotic Systems, 11, 2014.
[39] L. Xu, Y. Liang, Z. Duan, and G. Zhou. Route-based dynamics modeling and
tracking with application to air traffic surveillance. IEEE Transactions on Intelli-
gent Transportation Systems, 2019.
[40] O. Zachariadis, V. Mygdalis, I. Mademlis, N. Nikolaidis, and I. Pitas. 2D visual
tracking for sports UAV cinematography applications. In Proceedings of the IEEE
Global Conference on Signal and Information Processing (GlobalSIP), 2017.
In transportation networks, the majority of moving vehicles are route-based or trajectory-scheduled. Taking advantage of such predictive information generally produces more accurate dynamic models and better surveillance performance. This paper is concerned with the route-based dynamic modeling along with the route-aided tracking. First, the evolution of the positions across the route is formulated as a stationary Markov process from the characteristics of the route-based dynamics, which follows that the second- and third-order models of the straight-line route-based motions are constructed. This novel modeling strategy is in reverse to the conventional ones starting from the acceleration and its resultant dynamic models are easy to implement due to the linearity with respect to the system states. Second, an optimal initialization technique for route-aided tracking is proposed by utilizing the stationary process information sufficiently. Furthermore, an extension to the circular route-based dynamic modeling and a combinational modeling structure are also presented. Finally, in the context of aerial surveillance, numerical simulations are provided to show the effectiveness of the proposed dynamic modeling and to verify the theoretical results given in the paper.