PANORAMA IMAGING FOR MOBILE PHONES
MIGUEL BORDALLO
Bordallo M. (2010) Panorama Imaging for Mobile Phones. University of Oulu,
Department of Electrical and Information Engineering. Master's thesis, 58 p.
ABSTRACT
A panoramic image is a wide-angle mosaic image created by combining several still images. Panoramas usually refer to single-viewpoint images, created by rotating the camera around its optical center. Digital panoramas, on the other hand, are assembled from a series of perspective images through a computationally costly and memory-hungry multistage process.
This thesis describes a real-time implementation of a video-based image stitcher, designed to create high-quality image mosaics on a mobile phone, which can be integrated into the imaging pipeline. Handheld devices such as mobile phones have limited memory and processing capabilities. The slow performance or lack of floating-point arithmetic units on mobile processors requires a careful selection of the panorama construction algorithms and their implementations.
In the context of this thesis, several solutions have been developed to adapt the computationally costly application to a small-footprint platform. These include fixed-point implementations of the most expensive algorithms and carefully tailored memory management.
The application runs in real time on multimedia phones, producing 360-degree panorama pictures on the fly. In most practical cases the quality of the pictures for a human viewer is indistinguishable from optical panoramas.
Keywords: panorama, video, mobile phone, stitching, mosaic
Bordallo M. (2010) Panorama Imaging for Mobile Phones. University of Oulu, Department of Electrical and Information Engineering. Master's thesis, 58 p.
TIIVISTELMÄ
A panorama usually refers to an image created by rotating the camera around its optical centre. A panoramic image can also be a wide-angle mosaic composed of separate pictures. Assembling digital panoramas from a series of perspective images is a multistage process that demands considerable processor resources.
This work presents a real-time implementation of a video-based system for creating image mosaics on a mobile phone, which can be integrated into the image processing functions. The memory limitations, low computing power, and lack of a floating-point unit in portable devices require careful selection and implementation of the panorama algorithms.
Several solutions have been developed for the algorithms required by panorama imaging, adapting the processor-intensive application to a platform with limited computing power. In addition, the most resource-intensive algorithms have been implemented with fixed-point techniques, and their memory management has been tailored.
The application runs in real time on multimedia phones and produces 360-degree panorama images at the capture rate. In most practical cases, a human viewer cannot see a difference in quality between the produced images and optical panoramas.
Keywords: panorama, embedded system, mosaic
Bordallo M. (2010) Panoramic Images for Mobile Phones. University of Oulu, Department of Electrical and Information Engineering. Final degree project, 58 p.
RESUMEN
A panoramic image is a mosaic obtained by combining several single images. The term "panorama" usually refers to an image with a single viewpoint, created by rotating the camera around its optical centre. Digital panoramic images, however, are composed from a series of perspective views through a multistage process that is computationally very costly and consumes a large amount of memory.
This final degree project describes the implementation of an application designed to compose high-quality panoramic images from the frames of a video sequence captured with a mobile phone, and which can be integrated as part of the phone's image acquisition system. Small electronic devices such as mobile phones have a limited amount of memory and very reduced processing capabilities. The lack of floating-point units in mobile processors, or their low performance, requires a careful selection of the algorithms and implementations to be used.
In the context of this project, numerous technical solutions have been developed to adapt this kind of application to a platform with reduced capabilities. These include fixed-point arithmetic implementations of the most costly algorithms and careful management of the memory used.
The developed application is capable of producing 360-degree panoramic images in real time. In most practical cases, the quality of the images is indistinguishable to the human eye from that obtained with optical systems.
Keywords: panorama, mobile phone, image stitching, mosaic
CONTENTS
ABSTRACT
TIIVISTELMÄ
RESUMEN
CONTENTS
FOREWORD
GLOSSARY
1. INTRODUCTION
1.1. Scope of the thesis
1.2. Structure of the thesis
2. PANORAMA IMAGING
2.1. Traditional panorama imaging
2.2. Types of digital panorama images
2.3. Different approaches to panorama stitching
2.3.1. Panorama from several pictures
2.3.2. Panorama from a video sequence
2.4. Applications of panorama imaging
2.5. Panorama algorithm types
2.5.1. Batch-type algorithms
2.5.2. Real-time algorithms
2.5.3. Prior Implementations
3. MULTIMEDIA PHONE PLATFORM
3.1. Symbian S60 multi-media framework and camera access
3.2. Nokia N93 mobile phone
4. PANORAMA FROM MOBILE DEVICE VIDEO
4.1. Multistage approach
4.2. Projection and camera motion model
4.3. Frame registration stage
4.3.1. Frame registration sub-tasks
4.3.2. Approaches to frame registration
4.4. Frame stitching stage
4.4.1. Frame stitching tasks
4.4.2. Approaches to frame stitching
4.5. Frame selection subsystem
4.5.1. Frame selection tasks
4.5.2. Approaches to frame selection
4.6. Frame correction subsystem
4.6.1. Frame correction tasks
4.6.2. Approaches to frame correction
5. PANORAMA BUILDER ALGORITHMS AND IMPLEMENTATIONS
5.1. Overview of the algorithm
5.2. Projection and camera motion model
5.3. Image pre-processing
5.4. Image registration
Fast Fourier transform
5.4.1. Shift estimation
5.4.2. Rotation estimation
5.4.3. Image registration comparison
5.5. Moving objects detection and blur estimation
5.6. Frame selection and evaluation
5.7. Image blending
5.7.1. Global luminance correction
5.7.2. Pixel luminance correction
6. USER INTERFACE AND EXPERIMENTS
6.1. PanoramaX Application
6.1.1. User interface
6.1.2. Frame capturing interface
6.2. Stitching DLL
6.3. Memory management
6.4. Error correction
6.5. Adjusting parameters
6.6. Quantitative performance on a N93 phone
6.7. Other Symbian phones
6.8. Further directions
7. SUMMARY
8. REFERENCES
FOREWORD
This Master’s Thesis was done in the Machine Vision Group of the Department of
Electrical and Information Engineering at University of Oulu.
I would like to express my gratitude to Professor Olli Silvén for giving me the
chance to work in the MVG and for supervising this thesis. Without his numerous
valuable ideas and guidance, this thesis would have never been completed.
I would like to thank Dr. Jari Hannuksela for reviewing the thesis manuscript. His
valuable comments greatly improved the quality of the thesis.
Thanks to Dr. Jani Boutellier for advising and helping me at an early stage of my
research and for being always available for questions.
Thanks to Dr. Markku Vehviläinen and Dr. Marius Tico from Nokia Research for
their useful feedback during the project.
Many thanks to my working roommates and the rest of my colleagues in the
Machine Vision group for creating such a nice working environment.
Thanks to the University of Oulu for allowing me to study here, first as an
exchange student and later on in its degree program.
I would like to thank my family for their support from a distance and for their
care and love over the years.
Finally, I also want to thank Hillevi for her love, support and patience even when
she had to read my thesis over and over.
Oulu, April 2010
Miguel Bordallo
GLOSSARY
ARM 32-bit RISC processor architecture widely used in embedded designs
CCD Charge-Coupled Device. Often used as a synonym for a type of image sensor
CPU Central Processing Unit
DLL Dynamic Link Library
DSP Digital Signal Processor
EKA EPOC Kernel Architecture
FFT Fast Fourier Transform
FPS Frames per Second
FPU Floating Point Unit
GPU Graphics Processing Unit
LAN Local Area Network
LCD Liquid Crystal Display
MMF Multi Media Framework
MVG Machine Vision Group
OS Operating System
PDA Personal Digital Assistant
RAM Random Access Memory
RGB A color space that is based on the three additive primary colors, Red,
Green, and Blue
SIFT Scale-invariant feature transform
UI User Interface
Y Luminance of a picture, usually a function of Red, Green and Blue
values of a pixel
1. INTRODUCTION
Smartphones are normal mobile phones with some extra features that make them the
perfect all-in-one handheld device. They offer access to the internet through GPRS,
3G or Wireless LAN adapter, they can record and reproduce sounds, they can be
used as a personal digital assistant (PDA), and it is possible to install in them some
applications that allow mobile banking or online purchase, maps, positioning
systems, music players and so on.
The memory capacity and the speed of the processors of handheld devices are
improving all the time, showing better capabilities that create opportunities for new
applications. Most smartphones have built-in digital cameras with still image and
video capturing. Smartphones have a fast processor and some free RAM memory and
some even offer a floating-point unit and hardware graphics accelerators. With the
available imaging and computational capabilities, we may conclude that computer
vision applications are among the developing paths on mobile platforms. Figure 1
depicts two mobile communications devices both with two cameras: Nokia N93i and
Nokia N900.
Although the imaging capabilities of mobile phones have improved to approach the performance of digital cameras, the lack of space makes it almost impossible to provide enough image detail or very wide-angle capabilities. As a replacement for optics, image sequences can be used as a source for wide-angle images. Taking high-resolution still images is a slow process that requires a large amount of offline processing. Video sequences, on the other hand, can be used interactively in real time to capture and compose a high-resolution panorama, trading off optical performance for computational power.
Figure 1. Two mobile communications devices with two cameras.
Panorama construction from a video sequence is a complex multi-stage process in which several subtasks have to be performed. Each of the tasks presents its own problems that have to be solved in order to achieve a good final image. The main tasks are accurate image registration, the selection of the frames that are going to be used, and the blending of these frames into a single panorama.
The main problems faced in interactive handheld panorama construction depend on the camera and the environment. They include eventualities such as objects moving across the scene, color imbalance, dynamic changes of gain, and the occasional blur of some frames due to motion or autofocus. These cannot be ignored when trying to obtain quality images. Figure 2 shows an example panorama image with some of these typical problems.
Figure 2. Example panorama with moving objects and luminance changes.
1.1. Scope of the thesis
In this thesis, a series of problems on the way to a complete interactive handheld panorama application have been considered. The main contribution of this thesis is the development of a 360-degree real-time video-based panorama application for a mobile phone. This application is also the first of its kind whose implementation is described.
Panorama imaging requires several sub-tasks, such as frame capturing, image registration, and blending. Some of these tasks have been applied to several other purposes and can alone represent a substantial field of study.
Image blending is a computationally costly low-level image processing operation that needs to be performed in real time and that determines the quality of the final picture. The source images for the blending stage have to be carefully selected with image quality assessment methods such as moving object detection or blur computation, which together form a complete frame selection process. The images to be selected need an accurate image registration system that is capable of identifying the matching parts of the individual frames. These frames can be captured as a video sequence in an interactive manner, as opposed to slow high-resolution still image capturing. This approach allows us to take advantage of the small variations between frames and defines the requirements of the panorama algorithms.
The Machine Vision Group (MVG) at the University of Oulu has made significant progress in panorama imaging and has developed different algorithms for each of the sub-tasks. These algorithms cannot be used directly in handheld devices. In this thesis, the goal was to study, adapt and implement some of these algorithms in a handheld embedded system environment, as well as to provide a working solution capable of executing on a mobile platform with good performance.
1.2. Structure of the thesis
This thesis is organized as follows. First, general definitions of panorama imaging and the different approaches to it are given in Chapter 2, where several currently used stitching algorithms are also described. Chapter 3 deals with the multimedia phone platform that has been used for the development. In Chapter 4, the tasks of a stitching algorithm are described and possible solutions are discussed. In Chapter 5, the actual implementation of the main tasks of a panorama algorithm, such as image registration and image stitching, on the target device is described. The chapter also shows how the panorama construction system is improved with a video frame selection stage. Chapter 6 describes the implementation of the application, its architecture and its memory management; some performance experiments and results are also presented. Chapter 7 gives a summary of the whole process.
2. PANORAMA IMAGING
Traditionally, a panorama is a wide-angle image that shows at least a horizontal field of view as wide as the human head-eye combination is capable of. Often, left-to-right widths of up to 360 degrees are used.
To obtain a panorama, multiple images are needed, as well as a stitching method to compose a wide image from the input frames. The number of pictures needed and the aspect of a 360-degree panorama depend on the lens used as well as on the overlap between images. For example, a 360-degree panorama with a standard 35 mm focal length lens can have an aspect ratio of about 57:9, which corresponds to 19 horizontal images. If a wide-angle lens (18 mm) is used, the panorama becomes about 57:14, or 13 horizontal images. Mosaic pictures are generalized panoramas where the camera movement is not restricted to the horizontal direction. Figure 3 shows examples of panorama and mosaic construction.
Figure 3. Example of panorama (top) and mosaic (bottom) construction.
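These counts follow directly from the horizontal field of view and the chosen overlap. As a quick back-of-the-envelope check, the sketch below (Python; the field-of-view and overlap values are illustrative assumptions, here a 35 mm lens held in portrait orientation with roughly 50% overlap) lands on the 19-image figure quoted above:

    import math

    fov_deg = 38.0   # assumed horizontal field of view (35 mm lens, portrait)
    overlap = 0.5    # assumed fractional overlap between consecutive frames
    advance = fov_deg * (1.0 - overlap)   # new view angle gained per frame
    frames = math.ceil(360.0 / advance)   # frames needed for a full sweep
    print(frames)                         # -> 19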
2.1. Traditional panorama imaging
Originally, panoramic imaging had to be done with conventional film cameras and without the aid of computers. Panoramic cameras have existed for some 150 years. A good timeline of the history of panoramic cameras can be found in the archives of the International Association of Panoramic Photographers, in the article written by McBride [1], or in the technical history of panoramas written by Polden [2]. Traditionally, several panorama composing methods have been used:
Conventional method – This method consisted only of cropping or masking the upper and lower parts of a conventional image to give the false impression of a wider field of view. A wide-angle lens gives better results.
Segmented panorama – This panorama is just a group of pictures, usually taken at the same time, each overlapping about one third with the next and the previous one.
Swing lens panorama – This kind of panorama has a big advantage over using a fish-eye lens on a regular camera. The film is held by a curved support while the lens swings from one side of the camera to the other. A horizontal field of view of up to 150 degrees can be obtained.
Rotational panorama – These panoramas are made by cameras that rotate around a single point, which acts as the viewpoint of the images. The film moves in the opposite direction at the same speed as the camera. Some of these cameras have existed since the early 1900s.
Strip panorama – These images are usually taken with stationary cameras. The film moves at the same speed as the moving image. It is also possible to move the camera linearly instead.
2.2. Types of digital panorama images
In the case of digital panorama pictures, we can distinguish several ways of computing the result, and the choice depends on the intended use of the image. The basic types of digital panorama images are:
Flat or planar panorama – A conventional picture taken with a single shot is just a planar picture. These flat images are intended to be viewed without any perspective correction. A panorama made this way is just a wider-angle conventional picture made from a group of stitched images. In planar panoramas we can recognize a starting and an ending point. Flat panoramas are the best choice if we need to print or publish them. Figure 4 depicts a flat panorama image.
Figure 4. Flat Panorama taken with the developed application.
Circular-planar panorama – If the coordinates of a flat panorama are converted into polar coordinates, a circular panorama is obtained (a coordinate-mapping sketch follows this list of panorama types). For a 360-degree panorama, the result is a circle in which the whole field of view is covered. Circular panoramas present deformation in both axes. Figure 5 shows a circular panorama.
Figure 5. Circular Panorama taken with the developed application.
Cylindrical panorama – If the picture is intended to be viewed as if it were curved around one point, like the inner face of a cylinder, a cylindrical panorama is needed. These pictures can be browsed horizontally in an endless left-to-right loop. Images of this kind have to be 360-degree panoramas, and viewing software is needed. Displaying these images flat will show curves in the horizontal axis. Figure 6 represents a cylindrical panorama.
Figure 6. Example of a cylindrical panorama.
Spherical panorama – Spherical panoramas are stored as equirectangular image files. They cover 180 degrees on the vertical axis and 360 degrees on the horizontal axis. If this whole field of view cannot be covered, blank pictures can fill the empty zones. These images are intended to be viewed from the centre of a hypothetical sphere, and when they are shown in a flat format, the horizontal and vertical axes are curved. In a spherical projection of a panorama, the horizon is represented by a circumference around the central point of a flat image. Horizontal distance is then represented as angular distance, and vertical distance is represented by the radial distance to the geometrical centre. The main difference of this format compared to the others is that the very top and the very bottom of the image can also be displayed. Special viewing software is needed to display this kind of image. Figure 7 represents a spherical panorama.
Figure 7. Example of a spherical panorama.
Cubic panorama – This projection maps the 360-degree horizontal axis and the 180-degree vertical axis flat onto the six faces of a cube, and six different images are stored. A special viewer is needed to display them, but each face of the cube can be printed or published in the same way as a planar image. Figure 8 shows an example of a cubic panorama.
Figure 8. Example of a cubic panorama.
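As an illustration of the polar conversion described for the circular-planar type above, the following sketch (Python/NumPy; the output size and inner radius are assumed values, and nearest-neighbour sampling is used for brevity) resamples a 360-degree flat panorama into a circular one:

    import numpy as np

    def flat_to_circular(pan, size=800, r0=100):
        # pan: flat 360-degree panorama (h x w); column -> angle around the
        # circle, row -> radius between r0 and r0 + h (assumed layout).
        h, w = pan.shape
        y, x = np.mgrid[0:size, 0:size] - size // 2
        ang = (np.arctan2(y, x) + np.pi) / (2.0 * np.pi)   # 0..1 around circle
        rad = np.hypot(x, y)
        col = np.minimum((ang * w).astype(int), w - 1)
        row = np.clip((rad - r0).astype(int), 0, h - 1)
        out = pan[row, col]
        out[(rad < r0) | (rad >= r0 + h)] = 0              # blank outside ring
        return out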
2.3. Different approaches to panorama stitching
As digital imaging devices become cheaper and better, more and more possibilities for composing panorama images are appearing. Several approaches to image capturing can be considered. Traditionally, stitching panoramas from several still frames has been the most developed approach. Nowadays, video capabilities are available in most digital imaging devices, and video sequences as a source present pros and cons that deserve to be studied and discussed.
2.3.1. Panorama from several pictures
The most common approach to panorama imaging consists of stitching several pictures that overlap by at least 30% to enable precise registration. The pictures are carefully selected, taken exactly from the desired point and on the same horizontal or vertical plane. A tripod is normally used to avoid undesired rotation and tilting.
Nowadays, digital compact cameras and other handheld devices are so popular that composing a panorama image has become commonplace. Several pictures are taken to match the requirements, and they are then stitched automatically with the help of a computer.
The main advantage of taking a panorama from a set of pictures is the possibility of carefully taking each of the panorama parts. Every picture can have good resolution and quality even after image compression. This is a good approach for users who do not mind storing a set of pictures and stitching them afterwards with a quality algorithm on a personal computer.
2.3.2. Panorama from a video sequence
Another approach to panorama imaging is the use of a video sequence as the source for image capturing. Video applications have usually been designed for content creation, access and replay. Nowadays, it is also possible to employ video to enable new applications. In this case, the motion information can be extracted from the image sequence captured by the camera to construct a real-time panorama builder.
Video sequences usually have lower resolution than still images. However, they have a high frame rate and a large overlapping area between consecutive frames. The high redundancy of the images allows us to construct the panorama and compensates for the lower resolution.
Shooting a video sequence to make a panorama is a very intuitive process. The starting point of the video is the beginning of the mosaic, and the camera can then be moved in the desired directions with smooth transitions. To get a panorama image from a video, a sequence is usually recorded and then post-processed. Each frame of the video is separated and treated as a single image that overlaps, for example, around 90% with the previous one.
In addition to the interactivity benefits, the use of low-resolution video imaging can be defended on purely technical grounds. In low-resolution mode the sensitivity of the camera can be better, as the effective size of the pixels is larger, reducing the illumination requirements and improving the tolerance against motion blur.
This thesis also explains the method developed by Boutellier and Silvén [3], which specifies how to select only the best group of frames in a video sequence for capturing a panorama and to discard the others. The panorama quality improves with sharper input images.
2.4. Applications of panorama imaging
Graphical panoramas are a valuable tool for displaying a 3D world environment in an easily understandable manner. In cartography, panorama techniques have been used to map three-dimensional objects onto a two-dimensional image. With the development of automatic stitching systems, a wide field of applications for these techniques has emerged.
Nowadays, visual surveillance systems employ panorama stitching from multiple-camera data for several industrial and security processes [4]. Another possible application is a handheld scanner for documents or large posters, based on the blending of frames. A multirow scan of a document can yield a high-resolution mosaic picture suitable for digital storage and later use.
2.5. Panorama algorithm types
Generally, panorama stitching algorithms can be separated into two categories. Batch-type algorithms are applied offline, after the images that will compose the mosaic have already been taken. These algorithms process the frames one by one and then stitch them together into a single image.
The other group, on which the proposed solution leans, comprises the real-time algorithms. In these solutions, the frames are processed as they are being captured, and the resulting image is obtained as soon as the capturing ends. The processing requirements for this second group are demanding, and several approximations may be needed due to the constraints of the target device.
2.5.1. Batch-type algorithms
Batch-type panorama algorithms have been used on personal computers for several years and are therefore well known. With these algorithms, the starting point is a series of already-taken images that are to be blended into a single scenario panorama picture.
In fully automatic algorithms, each picture is registered against the others to obtain its relative position in the whole landscape or scenario. Once this position has been determined, the images are blended into the final mosaic, correcting the conflicting points.
2.5.2. Real-time algorithms
Real-time panorama algorithms are relatively new, because substantial amounts of resources are needed. In these algorithms, the starting point is a single frame that is used as the beginning of a panorama image that grows as the frames arrive. Most of the processing sub-tasks for each frame must be done before capturing the next one. These algorithms can be fully automatic, or they may require some interaction from the user to achieve better performance.
The interactivity of real-time algorithms presents a considerable advantage over batch-type realizations, but there are also disadvantages: the quality may suffer due to streamlined stitching techniques. However, the high overlap between frames and the chance to provide user feedback during capture and stitching lead to satisfactory results.
2.5.3. Prior Implementations
There are commercial solutions already on the market in both algorithm categories. Canon PhotoStitch [5] implements a batch-type algorithm as part of the EOS camera software package. PhotoStitch allows a computer user to select, display and merge a series of pictures taken with a digital camera, specifying certain parameters such as the approximate overlap between pictures and the focal length. Other solutions following the same overall scheme are PanoStitch, 3DVistaStitcher, the plug-ins integrated into Photoshop Elements, and the Microsoft stitching software present in Digital Image Pro, Digital Image Suite, and Windows Live Photo Gallery [6].
Real-time algorithms have been developed for embedded systems such as digital cameras or mobile phones. Bitside Panoman, Scalado Autorama [7], the QuickPanorama solution from Morpho Inc. [8], and the panoramic shooting modes of SonyEricsson and Nokia phones are some of the commercial solutions that can be found.
SonyEricsson's application relies mostly on user feedback to get good captures, for a total of 3 or 5 frames. The user is instructed on how to shoot the next frame from a good point, and the blending process is then performed [9].
The first version of Panoman and the Nokia panoramic mode work quite similarly. The user is asked to move the camera smoothly, and some frames are shot in the process, converted into panorama stitching blocks, and blended together into a fixed-size resulting image that can be composed of several frames [10]. The latest version of Panoman works as a mix of the first version and SonyEricsson's application, but it allows high-resolution frames at the cost of several minutes of post-processing time.
3. MULTIMEDIA PHONE PLATFORM
Implementing mosaic stitching for a mobile environment is more challenging than implementing such a system for, e.g., a PC environment. Compared to a generic implementation, the specificity of the mobile environment requires knowing the internals of the chosen platform and environment.
Modern mobile communication devices are becoming attractive platforms for multimedia applications as their display and imaging capabilities improve together with their computational resources. Many of the devices have two built-in cameras, one for high-resolution still and video imaging and the other for obtaining lower-resolution frames, e.g., VGA (640x480 pixels). Handheld devices offer more versatile imaging equipment than laptop computers.
To illustrate current typical designs, we consider two modern cellular phone designs that both contain two cameras (see Figure 1). The flip phone has a high-resolution camera on the cover, on the same side as a display. However, it is intended to be operated with the lid open, exposing a higher-resolution display and an additional camera to the user. The monoblock design is similar to digital still cameras, with the high-resolution display and cameras on opposite sides. It is obvious that with the display-side cameras the designers have aimed at handheld video telephony, while at the same time satisfying the needs of occasional photography, video capture, and playback. Figure 9 depicts a simplified scheme of the organization of a typical portable multimedia device.
Figure 9. Organization of a typical portable multimedia device.
The Nokia N93 smartphone was used as the main target hardware for the development of the panorama application. The N93 runs Symbian OS v9.1 and the Series 60 3rd Edition platform. A previous version of the application, running on a Nokia N90 phone with Symbian OS v8.1 and Series 60 2nd Edition, was also developed for testing purposes. In the development of the panorama stitching application, the quality of the main camera was the most important feature in selecting the target device.
3.1. Symbian S60 multi-media framework and camera access
Symbian OS, produced by Symbian Ltd., is a proprietary operating system, designed
for mobile devices, with associated libraries, user interface frameworks and reference
implementations of common tools. It is a descendant of Psion's EPOC and runs
exclusively on ARM processors.
For hardware, Symbian OS is optimised for low-power battery-based devices and for
ROM-based systems. There is a strong emphasis on conserving resources, using
Symbian-specific programming idioms such as descriptors and a cleanup stack.
There are similar techniques for conserving disk space. Furthermore, all Symbian OS
programming is event-based, and the CPU is switched off when applications are not
directly dealing with an event [11].
The S60 Platform (formerly Series 60 User Interface) is a software platform for mobile phones that uses Symbian OS. S60 consists of a suite of libraries and standard applications, such as telephony, PIM tools, and Helix-based multimedia players. It is intended to power fully featured modern phones with large color screens, commonly known as smartphones. The main characteristics of the S60 platform include multitasking, the standardization of several interfaces such as buttons and device access, and the ability to execute Java MIDP and Symbian C++ applications [12].
A multimedia framework is a software framework that handles media on a
computer and through a network. A good multimedia framework offers an intuitive
API and a modular architecture to easily add support for new codecs or container
formats. It is meant to be used by applications such as media players and audio or
video editors [13].
The Symbian Multi-Media Framework is known as the MMF. It is composed of an MMF Client API that handles all the interaction with the sound, audio and video devices through the MMF Controller Framework. This framework can access the audio and video controller plug-ins, where several codecs and formats can be used. Symbian MMF codecs and formats can easily be extended through the API. Figure 10 presents the organization of the Symbian multimedia framework [14].
Figure 10. Nokia Symbian S60 Multi-Media Framework architecture
The camera device can be accessed through the camera API. With a structure similar to the MMF, a Client API handles all the interaction between the developer and the device through a Camera Controller Framework.
3.2. Nokia N93 mobile phone
The Nokia N93 is a clamshell-shaped, twistable-screen mobile phone that includes several features useful for the stitching application. It includes a three-megapixel camera with both optical and digital zoom. A smaller VGA camera pointing towards the user is also present and is mainly used for video conferencing. Figure 11 depicts the internal organization of a multimedia cellular device [15].
Figure 11. Internal organization of a Multimedia Cellular device.
The application processor of the Nokia N93, the Texas Instruments OMAP2420 controller, includes an integrated ARM1136 processor (330 MHz), a TI TMS320C55x DSP (220 MHz), a 2D/3D graphics accelerator, imaging and video accelerators, high-performance system interconnects and industry-standard peripherals.
The multimedia enhancements of the OMAP2420 include an imaging and video accelerator for higher-resolution still image capture and for video encoding and decoding of VGA-resolution sequences at 30 frames per second. A floating-point unit and an OpenGL ES 1.1 compatible 2D/3D graphics accelerator (GPU) are available. The graphics processing unit can even be used for image processing by feeding camera images to the texture interface.
The available RAM, when the operating system has started and several basic applications are running, can be up to 20 MB, offering enough space to manipulate uncompressed resulting images. Figure 12 shows a Nokia N93 device with the screen flipped for capturing camera images.
Figure 12. A Nokia N93 phone.
N93 imaging capabilities
The two cameras present in the device can be accessed through the Symbian API. The API allows capturing up to three-megapixel images and up to 640x480 raw video frames at a frame rate of up to 15 frames per second. Still frames can be obtained directly from the camera at several resolutions, either as raw bitmaps or as JPEG objects.
The N93 screen has a fixed resolution of 320x240 pixels, but several fast scaling functions are available. The video frames are received as an ARGB viewfinder image that can be drawn directly on the screen or processed first; YUV frames can also be obtained directly from the camera without any conversion. Alternatively, a compressed video interface can be used, resulting in an encoded file stored on the memory card, but with no access to the individual frames.
The N93 offers the possibility of switching between its two cameras. Both cameras, main and videoconference, can be used in the same way, but it is not possible to use them at the same time, and fast switching between them is not implemented in the API.
4. PANORAMA FROM MOBILE DEVICE VIDEO
4.1. Multistage approach
A panorama algorithm needs to go through several stages, each of which can be treated as a separate problem. These stages have been subjects of study in their own right, and several solutions can be chosen to meet the needs of a complete algorithm. Figure 13 shows a multi-stage scheme of a panorama algorithm.
The first stage is the capturing of each frame of a video sequence, while the second stage deals with the preparation of each frame through cropping or pre-processing. The third stage registers each frame against the previous ones. The registration task has several sub-tasks that depend on the selected method.
The fourth stage in the system is the computation of the quality of every frame, to enable the selection of the most suitable ones based on several criteria. Along with the evaluation of quality, corrections can also be applied to each frame to normalize certain values. The final stage of the algorithm blends the selected frames together into one image, producing a result that can be saved on a mass storage device.
Figure 13. Scheme of an example multi-stage panorama algorithm.
4.2. Projection and camera motion model
The first step in designing an algorithm capable of stitching several frames into a single image is defining a camera model and the desired projection. The projection model expresses how a set of real 3D objects is translated into 2D images. Several projections and models can be used to achieve this, but for a video-based panorama algorithm developed in a constrained environment, a simple projection and camera motion model has to be chosen.
The manifold projection, originally introduced by Peleg [16], has been used as the camera motion model. In manifold projection, a thin stripe is taken from the centre of each frame and used in the construction of the mosaic image. If there is no significant change of scale or motion parallax, the frames can be registered accurately with a rigid camera motion model. Manifold projection is quick to process, since frames do not have to be projected onto a different surface, and it offers excellent image quality in the resulting mosaic.
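As a concrete illustration, the stripe-copying idea can be sketched as follows (Python/NumPy; the function, the pure-horizontal-translation assumption and the stripe width are illustrative, not the thesis implementation):

    import numpy as np

    def manifold_stitch(frames, x_offsets, stripe=8):
        # frames: list of equally sized grayscale arrays; x_offsets: global
        # horizontal positions from registration, in ascending order.
        h, w = frames[0].shape
        mosaic = np.zeros((h, x_offsets[-1] + w))
        cx = w // 2                      # centre column of each frame
        half = stripe // 2
        for frame, ox in zip(frames, x_offsets):
            # copy a thin centre stripe to its registered global position
            mosaic[:, ox + cx - half : ox + cx + half] = \
                frame[:, cx - half : cx + half]
        return mosaic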
4.3. Frame registration stage
Frame registration, or image registration, refers to the transformation of two different sets of data containing some coincident information into a single reference coordinate system. In the particular case of panorama stitching, frame registration has to identify the overlapping parts of two frames and the transformations that should be applied to them to match the same view as accurately as possible. Figures 14 and 15 show two examples of image registration procedures.
Figure 14. Feature-based registration scheme.
Figure 15. Phase correlation-based registration scheme.
Image registration methods can be divided into two separate categories. Area-based approaches operate directly on the image intensities, often in the frequency domain (as in the phase correlation scheme of Figure 15), and match whole regions without explicitly detected features. Feature-based approaches search for common feature points in a pair of images and then calculate the parameters that make these points match in the resulting image. They pay attention to small regions of the image and to shapes such as lines, curves, corners or boundaries found in them. The definitions and properties of these two fundamentally different methods are explained well in the survey of Zitová and Flusser [17].
Depending on the chosen method, a transformation model also has to be defined, based on the parameters obtained from the registration of each frame. On a small-footprint platform, the most limiting factor is the computational complexity of the registration algorithms.
4.3.1. Frame registration sub-tasks
Although frame registration can be considered a single task, in many cases it can also be decomposed into several smaller sub-tasks that can be performed simultaneously or in a cascade. Given two frames with different points of view of an overlapping area, a transformation can be defined with up to six parameters: the point of view can be shifted along three axes (vertical, horizontal and scale) and also rotated around them. Some registration methods assume some of these parameters to be zero, thus simplifying the computation of image registration.
The main sub-tasks that can be identified in image registration are shift estimation, planar rotation estimation, change-of-scale estimation and warping estimation. The shift estimation task should measure the camera movement while the camera is kept perpendicular to the scene. Rotation estimation measures the rotation around the axis that is perpendicular to the scene. Change-of-scale and warping estimation must also be capable of measuring any other movement or tilt of the camera. Some other smaller sub-tasks can be considered an image pre-processing stage: a pre-processing stage composed of window filtering and image transformations such as grayscale conversion might be needed to improve the performance.
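When the scale and warping parameters are assumed to be zero, the remaining sub-tasks reduce to estimating the three parameters of the rigid model mentioned in Section 4.2, which in standard notation reads

    \begin{pmatrix} x' \\ y' \end{pmatrix} =
    \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
    \begin{pmatrix} x \\ y \end{pmatrix} +
    \begin{pmatrix} t_x \\ t_y \end{pmatrix},

where \theta is the planar rotation angle and (t_x, t_y) is the translation between the two frames.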
4.3.2. Approaches to frame registration
The reference algorithm selected for this work relies on the recent image registration method of Vandewalle [18], a direct method with many advantages. It is very robust against blur and image intensity changes, and it can handle rotated images. The method is also computationally fast. On the other hand, it does not offer the possibility of detecting or correcting any change of scale or perspective; thus it can provide only three parameters.
This method presents the following characteristics:
Fast to compute – This is probably the most desirable feature for a method that has to be used on a mobile device. Developing a real-time panorama builder requires a method capable of offering good accuracy while allowing several frames per second to be processed in a constrained environment. Vandewalle's method relies on frequency computations via Fast Fourier Transforms (one- and two-dimensional); a sketch of the frequency-domain shift estimation idea follows this list.
Robust against blur – The quality of the individual frames captured from a video sequence with a handheld device is relatively low compared with still frames from a regular digital camera. Blur, and especially motion blur, is present in some of the frames. Although it is possible to minimize this effect by selecting the best frames, the method used for image registration has to be robust against blurry frames. Methods based on frequency and phase analysis, such as Vandewalle's method, give excellent results against blur.
Can handle rotated images – Taking handheld sequences always results in some tilting and undesired small rotations between frames. This feature offers the possibility of achieving a better user experience.
Robust against image intensity changes – Handheld devices such as mobile phones offer very limited access to the camera settings. Using a sequence as the input for a panorama can result in different luminance and exposure times between frames. This can be corrected with frame selection and color balancing; however, the registration method itself has to be robust against image intensity changes.
Highly optimizable – The method relies on well-known operations such as the FFT, which offers the possibility of optimizations and fast routines that result in fast performance. It has also been found experimentally that the method has excellent numerical precision, enabling the use of very short numerical word lengths.
The shortcoming of the method is that it estimates only the shift and rotation of each frame against its neighbours. This is often acceptable for panoramas, but the construction of mosaics becomes almost impossible without a model that can compute and correct the changes in perspective and parallax.
A gradient feature machine has been implemented [19] to obtain a more flexible registration method. The implementation runs in two stages. The feature extraction stage identifies points of interest in both of the images to be registered. The feature matching stage tries to match these points in both images using best linear unbiased estimation. Although the feature machine algorithm is not as robust against blur and image intensity changes, it provides more flexibility and is also able to compute changes of scale and perspective in addition to shifts and rotation.
The gradient feature machine method presents the following characteristics:
Fast to compute – Like Vandewalle's method [18], a gradient feature machine can perform very fast when the regions to be matched are small and the number of features extracted is reduced.
Can handle changes of scale – Taking handheld sequences always results in some small changes of scale between frames. Although this is not always the most important factor in achieving quality panoramas, computing it allows the frame selection subsystem to discard images with excessive scaling.
Can handle changes of perspective – Depending on the selected motion model, computing the change of perspective between two images offers the frame correction subsystem the possibility of achieving a better final frame before blending.
Highly optimizable – The feature-based method relies on the extraction of several points of interest. When capturing images to compose a panorama, the small distance between video frames makes it possible to carefully adapt the number and size of the points to the desired image size and characteristics. It has been found experimentally that the method achieves excellent accuracy with a small number of low-complexity features.
Some other registration methods have also been studied. Zitová [17] surveys and analyses several area-based registration methods, such as the one published by Lucchese [20], which is able to compute full affine transformations from operations in the frequency domain. In feature-based methodology, several other feature types have been surveyed. Lowe [21] proposed the extraction of Scale-Invariant Feature Transform (SIFT) descriptors that can be used for image registration. Speeded-Up Robust Features (SURF) descriptors, published by Bay [22], are inspired by SIFT descriptors; they are several times faster and provide a good alternative to gradient features. Although these methods provide an apparently attractive basis, their computational cost poses several major problems for an actual real-time implementation on mobile platforms.
4.4. Frame stitching stage
The frame stitching stage refers to the process that merges a group of frames together, producing a seamless single image. The frames must be processed in order to create the mosaic. The final image can then be used in print, as a motion picture, or for screen display.
Blending a set of frames usually requires several sub-tasks that allow global and local correction of each individual frame, and methods to eliminate perceivable differences in the seam area. For a given pixel location, and based on one or more pixel values and their neighbours, the stitching function must give the final value of the pixel in the resulting image. Smooth results with no steps, stripes or steep curves between frames are always preferred when composing high-quality panoramas. Figure 16 shows an example of Gaussian mask blending.
Figure 16. Frame stitching example.
4.4.1. Frame stitching tasks
Panorama stitching usually assumes that the sequence of images has similar illumination conditions, but when the luminance changes from frame to frame, a problem becomes apparent. If neighbouring images are stitched with a notable illumination difference, the illumination of the final panorama changes abruptly as we browse it, giving an unnatural impression. To avoid such artefacts in the final image, a color balance method should be applied.
The merging of two images is most critical in the area where the images meet. This seam area can be processed in many ways, but some local luminance correction or blending must be implemented. Seam handling is most critical when the registration step produces erroneous results.
4.4.2. Approaches to frame stitching
In the proposed solution, color balancing is done in the manner used before by Xiao [23]. This blending method adopts a simple statistical analysis to implement the alignment, imposing one image's illumination characteristics on the next one. The mean and the standard deviation of the luminance are computed, and the next frame is corrected by minimizing the squared errors.
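A minimal sketch of this statistical alignment idea follows (Python/NumPy; matching the mean and standard deviation directly is assumed here in place of an explicit least-squares solver):

    import numpy as np

    def match_luminance(next_y, ref_y):
        # Impose the reference frame's luminance statistics on the next
        # frame: scale and offset chosen so mean and standard deviation match.
        gain = ref_y.std() / max(next_y.std(), 1e-6)
        return (next_y - next_y.mean()) * gain + ref_y.mean()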
The paper of Zomet [24] contains a good comparison of image blending methods and also proposes a new approach for optimizing the stitching result with a gradient-based cost function. Szeliski proposed a simple local blending method to eliminate seam artifacts [25]. However, only straightforward methods such as a Gaussian weighting mask [4] or a linear weighting mask are computationally affordable for a real-time approach on current mobile platforms.
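For illustration, a linear weighting mask over the seam region can be sketched as follows (Python/NumPy; the inputs are assumed to be the already-registered overlapping strips of two consecutive frames):

    import numpy as np

    def linear_blend(left, right):
        # Weight falls linearly from 1 to 0 across the overlap, so the
        # result transitions smoothly from the left frame to the right one.
        w = np.linspace(1.0, 0.0, left.shape[1])[None, :]
        return left * w + right * (1.0 - w)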
4.5. Frame selection subsystem
To improve the quality of a panorama stitching system based on video sequences, another subsystem can be incorporated into the solution. A frame selection subsystem should be able to measure the quality of each individual frame, offering the possibility of discarding the worst ones in the panorama construction process.
The frame selection process is a matter of weighting the importance of different frame features. An important criterion is the presence of moving objects, since the most severe problems in the blending process are created by objects that get clipped. Another factor of high importance is the amount of blur in a frame. The frame selection subsystem also gives the chance to automatically discard frames with a possible error from the image registration phase. Figure 17 shows an example of the selection process.
Figure 17. Frame selection scheme.
4.5.1. Frame selection tasks
To be able to select the best frames from a video sequence, several sub-tasks have to be performed. An image stitching algorithm needs to employ motion detection to avoid the distortion of moving objects [26]. A motion detection algorithm analyzes the frames by looking for regions that contain moving objects and provides motion values that can be used as a quality measure. Figure 18 shows the impact of motion detection in panorama construction. Figure 18 a) is a part of a panorama created by our algorithm with motion detection enabled. Figure 18 b) is also from our algorithm, but with motion detection disabled. Figure 18 c) is created by Autostitch [27].
Figure 18. Effect of motion detection in panorama construction.
A blur detection subsystem also needs to be developed to be able to judge the global quality of a frame. Blurry frames, even if registered correctly, lead to the loss of some sharpness in the final image and an overall worse panorama. Blur has to be computed, and each frame must be given a score that contributes to the final quality measurements.
Other factors, such as errors in other stages or sudden luminance changes between two consecutive frames, may also affect the evaluation of the quality of a frame. If a frame with a moving object is not discarded, the region selection must avoid cropping the object in the final panorama. Finally, the frame discarding stage must put together all the already evaluated factors that affect the quality of the individual frames and discard the ones that are not suitable for constructing the final image. Figure 19 shows the effects of blur detection. Figure 19 a) is a detail from a panorama constructed with blur detection enabled. Figure 19 b) is constructed without blur detection.
Figure 19. Effect of blur detection: a) with blur detection, b) without blur detection.
4.5.2. Approaches to frame selection
The frame selection concept applied to panorama stitching was first introduced by
Boutellier and Silvén [3], who explain how to select only the best group of frames in
a video sequence to capture a panorama. With this approach, the image quality
improves notably. Prior to that work, Hsu [28] considered the criterion of a
suitable overlap to select the best subset of images for panorama construction. Li
[29] also considered this selection based on the amount of distortion caused by
rotation and perspective.
Each sub-task of the frame selection subsystem has been widely analyzed over
time. Motion blur can be detected in several ways. One simple way relies on
obtaining a value as a side-product [18] of Vandewalle’s registration method.
It requires the calculation of the amplitude spectrum of each image to be registered.
The spectrum is then also used for blur estimation by calculating the amount of high-
frequency components in it.
Vandewalle states that the frequencies above a certain limit ρmax need to be
discarded in the registration process, since those frequencies contain aliasing if the
image is blurred. The darkened area in Figure 20 depicts the frequency area from ρmax
to the maximum frequency, which we use to estimate the amount of blur. If the sum
of the frequencies in this area is small, sharp image details are absent
due to blurring; a large sum tells us that the frame is free of blur. Figure 20
represents an amplitude spectrum of a fictional image. The darkened area represents
the high-frequency area that is used to estimate the image blur.
31
Figure 20. An amplitude spectrum of a fictional image.
This method produces one single number that expresses the amount of high-
frequency detail in the image. This value can then be used comparatively between
the overlapping parts of two consecutive frames. If an image Ia scores a higher result
than Ib, it usually means that Ia is sharper than Ib, but on some occasions a
difference in the image content can distort the result.
Other methods for blur detection have also been considered. The lack of
resources on the target device suggests the use of straightforward methods.
According to Liang [30], the amount of blur in a frame can be obtained by a summed
derivatives computation. This method consists of summing together the derivatives
of each row and each column to obtain a single number that expresses the amount of
high-frequency detail, and thus the sharpness, of a picture.
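As an illustration, a straightforward form of this summed derivatives measure could look as follows; the function name and buffer layout are assumptions of the sketch.

#include <cstdint>
#include <cstdlib>

// Sum the absolute horizontal and vertical derivatives over a grayscale
// frame; a larger sum means more high-frequency detail, i.e. a sharper frame.
long SummedDerivatives(const uint8_t* img, int width, int height)
{
    long sum = 0;
    for (int y = 0; y < height; ++y)                  // row derivatives
        for (int x = 0; x + 1 < width; ++x)
            sum += std::abs(int(img[y * width + x + 1]) - int(img[y * width + x]));
    for (int y = 0; y + 1 < height; ++y)              // column derivatives
        for (int x = 0; x < width; ++x)
            sum += std::abs(int(img[(y + 1) * width + x]) - int(img[y * width + x]));
    return sum;
}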
To detect moving objects across the scene, a very simple procedure can be used to
keep the process fast. In this method, the difference between the current
frame and the previous one is calculated. The result is a two-dimensional motion
image that covers the overlapping area of the two frames. This motion
image is then low-pass filtered to remove noise and thresholded against a fixed value to
produce a binary motion map.
If the binary image contains a sufficient number of pixels that can be classified as
motion, the dimensions of the moving object are determined statistically. The centre
point of the object is approximated by computing the average coordinates of all
moving pixels, and the standard deviation of the coordinates is used to approximate the
dimensions of the object. However, this method can produce a wrong result if there
are multiple moving objects in the same frame.
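A sketch of this frame differencing scheme is shown below, with the low-pass filtering step omitted for brevity; the names and threshold parameters are illustrative.

#include <cmath>
#include <cstdint>
#include <cstdlib>

// Threshold the difference of two registered frames over their overlap and
// estimate the moving object's centre and extent from the motion pixels.
// Returns false when too few pixels move to claim an object.
bool DetectMotion(const uint8_t* prev, const uint8_t* cur, int w, int h,
                  int threshold, int minPixels,
                  double& cx, double& cy, double& sx, double& sy)
{
    long n = 0;
    double sumX = 0, sumY = 0, sqX = 0, sqY = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (std::abs(int(cur[y * w + x]) - int(prev[y * w + x])) > threshold) {
                ++n; sumX += x; sumY += y;
                sqX += double(x) * x; sqY += double(y) * y;
            }
    if (n < minPixels) return false;
    cx = sumX / n; cy = sumY / n;                     // centre of the object
    sx = std::sqrt(sqX / n - cx * cx);                // spread approximates
    sy = std::sqrt(sqY / n - cy * cy);                // the object dimensions
    return true;
}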
Motion detection can also be performed using sophisticated methods that are
computationally costly. A good survey of this problem and its solutions is given
by Radke [31]. In the method created by Zhu [32], the moving objects are extracted
by differencing three successive frames and defined further by calculating the active
contour of each object. Davis [26] solved the problem by drawing the seams around
the moving objects. However, the computational restrictions of the target platform
suggest that only straightforward methods are suitable for a real-time application.
4.6. Frame correction subsystem
A frame correction subsystem can also be added to the panorama construction
solution in order to improve the quality of the resulting images. A frame correction
subsystem should be able to transform the input frames of the panorama system
according to certain registration parameters and the problems detected in the frame
selection and quality assessment stage. Figure 21 shows an example of luminance,
warp and perspective correction.
Figure 21. Frame correction examples.
4.6.1. Frame correction tasks
To be able to correct the input frames of a video sequence, several sub-tasks have to
be performed. A luminance correction algorithm analyzes the illumination difference
between two matching regions and corrects one of them to match the other. Too dark
or too bright input frames, even if registered correctly, lead to worse overall
resulting panoramas.
A warping and perspective correcting algorithm should be able to perform a full
affine transformation between two matching regions, including rotations and changes
of scale. The algorithm must interpolate the values of the corresponding pixels in the
final image to avoid the loss of any detail. The correction of the perspective on the
input images can be a crucial stage when the difference between subsequent frames
is substantial.
4.6.2. Approaches to frame correction
In the proposed solution several interpolation methods have been tested. Although
nearest neighbour interpolation is computationally the most efficient method, it
has been discarded due to its loss of image quality.
Among the most typical interpolation methods, bilinear interpolation is preferred
to bicubic interpolation due to its computational simplicity. For the luminance
correction, the method simply adds the average intensity difference to the
frame to be corrected.
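A standard bilinear sampling routine of the kind used in such a correction might look as follows; this is an illustrative sketch, not the thesis code.

#include <cstdint>

// Sample a grayscale image at a non-integer position (fx, fy) by bilinear
// interpolation of the four surrounding pixels; the caller guarantees
// 0 <= fx < width - 1 and 0 <= fy < height - 1.
uint8_t SampleBilinear(const uint8_t* img, int width, float fx, float fy)
{
    const int x = int(fx), y = int(fy);
    const float ax = fx - x, ay = fy - y;
    const float p00 = img[y * width + x];
    const float p01 = img[y * width + x + 1];
    const float p10 = img[(y + 1) * width + x];
    const float p11 = img[(y + 1) * width + x + 1];
    const float top = p00 + ax * (p01 - p00);
    const float bottom = p10 + ax * (p11 - p10);
    return uint8_t(top + ay * (bottom - top) + 0.5f);
}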
5. PANORAMA BUILDER ALGORITHMS AND
IMPLEMENTATIONS
5.1. Overview of the algorithm
In Chapter 4, we defined the panorama construction process using a multistage
approach. A panorama construction implementation needs to perform several
operations in each of these stages. A global overview of the panorama
construction algorithm is given in Algorithm 1.
(1) Transform the input ARGB frame into a grayscale image.
(2) Multiply the current frame by a Tukey window to make it circularly symmetric.
(3) Compute the Fourier transform of the current frame.
(4) Compute the rotation estimation.
a) Compute the polar coordinates of the current frame.
b) For every angle a, compute the average value of the Fourier coefficients.
c) Find the maximum of the correlation between the current frame and the
previous one between -15 and 15 degrees. This is the estimated rotation angle.
(5) Compute the vertical and horizontal shift estimation.
a) Compute the phase difference between the current frame and the previous
one.
b) The planar shift corresponds to the maximum of the shift matrix.
(6) Error correction:
a) Check whether the shift or the rotation estimate surpasses a threshold
when compared with the previous one.
b) Use the previous motion vector if an error is found.
(7) Detect moving object with simple difference.
a) Compute simple difference of pixels.
b) Mark regions with movement.
(8) Estimate the blur of an image by the computation of summed derivatives.
(9) Compute the change of luminance between matching regions calculating the
difference of intensity averages.
(10) Calculate quality score.
a) Use moving objects, luminance changes, blur, and rotation values to
compose the quality value.
(11) Select best frame on the stitching range based on quality score.
(12) Correct selected frame.
a) Correct luminance.
b) Rotate frame using bilinear interpolation.
c) Select blending region based on moving objects.
(13) Calculate image global coordinates.
(14) Stitch image using local blending techniques.
a) Calculate the seam location avoiding crossing moving regions.
b) Blend using a Gaussian or linear mask.
Algorithm 1. An overview of the panorama construction algorithm.
5.2. Projection and camera motion model
Traditionally, the camera movement in a panorama capturing process is assumed to
be a pure rotation around the optical center. However, for a handheld panorama, this
assumption is not accurate. The final version of the panorama application uses the
flexible camera motion model provided by the manifold projection [16]. In a
manifold projection, a thin strip is taken from the centre of each frame to be used in
the construction of the panorama image. The camera motion speed is considered to
be approximately linear. Figure 22 shows an example of a typical camera motion
during the panorama capture and the manifold projection used by the application.
Figure 22. Example of the camera motion in a typical handheld panorama.
5.3. Image pre-processing
To conform to the manifold projection model and to be able to register each frame
against the previous one, several operations must be done. First, the frames should be
cropped to keep just their central part, where the image quality is better. Figure
23 shows how the original frames are cropped to ensure a better quality in stitching
and to save memory. Our algorithm crops frames immediately after capturing
them. There is a lot of data redundancy between consecutive frames, and thus
cropping saves both memory space and computational resources in image
registration. The cropped frame has roughly one third of the original frame’s width
and 90% of its height.
Figure 23. Image cropping example.
Most of the existing registration algorithms use grayscale images to compute the
transformation parameters between two pictures. Therefore, the central part of the
source images must be converted into grayscale to be used in the actual registration
process. A downscaling to the processing square resolution can be performed to
achieve a smaller size that results in fewer operations.
Several versions of the RGB to grayscale conversion function have been
implemented. The size of the cropped area and the coefficients of the grayscale
conversion can be arbitrarily chosen, as well as the desired downscaling factor. In the
most straightforward case, the luminance Y is defined as follows:

Y = \frac{R + G + B}{3}   (1)
To be able to use a smaller mask without changing the source image size, a
downscaling function with a factor of N has been coded. The original frames are kept
at the size the camera provides, but the grayscale conversion is done using the
average value of NxN pixels. This average filter acts as a low-pass filter and cuts off
part of the high-frequency content, decreasing the effects of possible aliasing. The
resulting mask that is used for image registration is now NxN times
smaller. The number of operations needed to perform a registration with a smaller
mask is much lower, resulting in a significant decrease of the processing time per
frame.
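A possible form of this combined conversion and downscaling is sketched below, assuming 0xAARRGGBB pixel packing and illustrative names.

#include <cstdint>

// Convert an ARGB frame (0xAARRGGBB) into a grayscale mask downscaled by a
// factor N, averaging the R, G and B channels with equal weights as in
// Equation (1) over each NxN block; the averaging also acts as a low-pass
// filter that reduces aliasing in the smaller mask.
void ArgbToGrayDownscaled(const uint32_t* argb, int width, int height, int N,
                          uint8_t* gray /* (width/N) x (height/N) */)
{
    const int ow = width / N, oh = height / N;
    for (int oy = 0; oy < oh; ++oy)
        for (int ox = 0; ox < ow; ++ox) {
            unsigned sum = 0;
            for (int dy = 0; dy < N; ++dy)
                for (int dx = 0; dx < N; ++dx) {
                    const uint32_t p = argb[(oy * N + dy) * width + (ox * N + dx)];
                    sum += ((p >> 16) & 0xFF) + ((p >> 8) & 0xFF) + (p & 0xFF);
                }
            gray[oy * ow + ox] = uint8_t(sum / (3u * N * N));
        }
}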
Window filtering
To provide robustness in the registration process, a Tukey window function
has been implemented to filter the grayscale images. A Tukey window is a tapered
cosine window defined as follows:
w(x) = \begin{cases}
\frac{1}{2}\left[1 + \cos\left(\frac{2\pi}{\alpha}\left(x - \frac{\alpha}{2}\right)\right)\right], & 0 \le x < \frac{\alpha}{2} \\
1, & \frac{\alpha}{2} \le x < 1 - \frac{\alpha}{2} \\
\frac{1}{2}\left[1 + \cos\left(\frac{2\pi}{\alpha}\left(x - 1 + \frac{\alpha}{2}\right)\right)\right], & 1 - \frac{\alpha}{2} \le x \le 1
\end{cases}   (2)
where α denotes the ratio of the tapered section to the constant section, with 0 ≤ α ≤ 1.
A windowing function applied to each frame guarantees its circular symmetry,
providing a better performance of the phase correlation based registration method.
Figure 24 presents a Tukey window filtering scheme.
Figure 24. Tukey window filtering scheme.
The actual implementation of the Tukey window filter consists of a lookup table
constructed with fixed-value coefficients for the desired size. This table can be pre-
calculated on a desktop computer or in the startup routine of the library,
whenever the mask size or resolution is changed. To filter each image, a table look-
up is performed and the value needed to multiply each pixel is selected.
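A sketch of how such a table could be precomputed, following Equation (2), is shown below; the floating-point variant is shown for clarity, whereas the actual implementation would store fixed-point coefficients.

#include <cmath>
#include <vector>

// Precompute a 1-D Tukey window of 'size' samples (size >= 2) with taper
// ratio alpha, following Equation (2); a 2-D mask is obtained as the outer
// product of two such tables.
std::vector<float> MakeTukeyTable(int size, float alpha)
{
    const float kPi = 3.14159265f;
    std::vector<float> w(size);
    for (int i = 0; i < size; ++i) {
        const float x = float(i) / (size - 1);        // x in [0, 1]
        if (x < alpha / 2)
            w[i] = 0.5f * (1 + std::cos(2 * kPi / alpha * (x - alpha / 2)));
        else if (x <= 1 - alpha / 2)
            w[i] = 1.0f;
        else
            w[i] = 0.5f * (1 + std::cos(2 * kPi / alpha * (x - 1 + alpha / 2)));
    }
    return w;
}

// Filtering a frame is then one multiplication per pixel:
//   gray[y * n + x] = uint8_t(gray[y * n + x] * w[y] * w[x]);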
5.4. Image registration
Vandewalle’s method [18] uses a frequency domain algorithm to estimate the motion
parameters between the reference image and each of the other images. In this
method, only planar motion parallel to the image plane is computed. The motion can
be described as a function of three parameters: horizontal and vertical shifts, Δx1 and
Δx2, and a planar rotation angle φ. A frequency domain approach allows us to
estimate the horizontal and vertical shift and the (planar) rotation separately. Assume
we have a reference signal f1(x) and its shifted and rotated version f2(x):

f_2(\mathbf{x}) = f_1(R(\mathbf{x} + \Delta\mathbf{x})), \quad
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad
\Delta\mathbf{x} = \begin{pmatrix} \Delta x_1 \\ \Delta x_2 \end{pmatrix}, \quad
R = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}   (3)
This can be expressed in the Fourier domain as

F_2(\mathbf{u}) = \int_{\mathbf{x}} f_2(\mathbf{x})\, e^{-j2\pi\mathbf{u}^T\mathbf{x}}\, d\mathbf{x}
= \int_{\mathbf{x}} f_1(R(\mathbf{x} + \Delta\mathbf{x}))\, e^{-j2\pi\mathbf{u}^T\mathbf{x}}\, d\mathbf{x}
= e^{j2\pi\mathbf{u}^T\Delta\mathbf{x}} \int_{\tilde{\mathbf{x}}} f_1(R\tilde{\mathbf{x}})\, e^{-j2\pi\mathbf{u}^T\tilde{\mathbf{x}}}\, d\tilde{\mathbf{x}}   (4)
with F2(u) the Fourier transform of f2(x) and the coordinate transformation x̃ = x +
Δx. After another transformation x′ = Rx̃, the relation between the amplitudes of the
Fourier transforms can be computed as
|F_2(\mathbf{u})| = \left| \int_{\tilde{\mathbf{x}}} f_1(R\tilde{\mathbf{x}})\, e^{-j2\pi\mathbf{u}^T\tilde{\mathbf{x}}}\, d\tilde{\mathbf{x}} \right|
= \left| \int_{\mathbf{x}'} f_1(\mathbf{x}')\, e^{-j2\pi\mathbf{u}^T R^T\mathbf{x}'}\, d\mathbf{x}' \right|
= \left| \int_{\mathbf{x}'} f_1(\mathbf{x}')\, e^{-j2\pi(R\mathbf{u})^T\mathbf{x}'}\, d\mathbf{x}' \right|
= |F_1(R\mathbf{u})|   (5)
where |F2(u)| is a rotated version of |F1(u)| over the same angle φ as the spatial
domain rotation. |F1(u)| and |F2(u)| do not depend on the shift values Δx, because the
spatial domain shifts only affect the phase values of the Fourier transforms.
Therefore, we can first estimate the rotation angle φ from the amplitudes of the
Fourier transforms |F1(u)| and |F2(u)|. After compensating for the rotation, the shift
Δx can be computed from the phase difference between F1(u) and F2(u).
Fast Fourier transform
An FFT computes the discrete Fourier transform (DFT) and produces exactly the same
result as evaluating the DFT definition directly, the only difference being
the faster processing time of the FFT. Let x0, ..., xN-1 be complex numbers.
Evaluating the DFT definition directly requires O(N²) operations: there are N
outputs Xk, and each output requires a sum of N terms. An FFT is any method that
computes the same results in O(N log N) operations.
To perform the image registration in fixed point, the general purpose FFT project
developed by Mark Borgerding has been used [33]. The code consists of a Fast Fourier
Transform implementation based on the “Keep it simple” principle. KISS FFT
is a reasonably efficient mixed-radix FFT library that can use either fixed or floating-
point data types and has a very small footprint.
Several modifications have been made to the routine in order to port it to efficient
Symbian C++ code. The final performance of this operation in fixed point can be up
to 25 times faster than the same algorithm in floating point on a mobile phone
without a hardware floating-point unit. Table 1 shows the computation times of the FFT function
with several profiles.
Table 1. FFT computation times on a Nokia N93 device
Size    | 64-bit floating point | 32-bit floating point | 32-bit fixed point
64x64   | 3200 ms               | 950 ms                | 20 ms
128x128 | 3800 ms               | 1200 ms               | 50 ms
256x256 | 13500 ms              | 4500 ms               | 200 ms
5.4.1. Shift estimation
The shift of the image parallel to the image plane can be expressed in the Fourier
domain as a linear phase shift. The shift parameters Δx can thus be computed as the
slope of the phase difference ∠(F2(u)/F1(u)). For a pure translation (R = I),

F_2(\mathbf{u}) = \int_{\mathbf{x}} f_1(\mathbf{x} + \Delta\mathbf{x})\, e^{-j2\pi\mathbf{u}^T\mathbf{x}}\, d\mathbf{x} = e^{j2\pi\mathbf{u}^T\Delta\mathbf{x}} F_1(\mathbf{u})   (6)
To estimate the shift between two frames with Vandewalle’s method [18], the
frequency components of the two frames are divided element by element, and the
result is normalized by dividing each element by its amplitude. After applying the
inverse FFT, the amplitude of the result is computed to obtain a registration matrix.
The position of the greatest element of that matrix gives the number of pixels to be
taken as the shift.
If a downscaling factor is applied during the construction of the processing square,
the resolution of the estimated shift decreases by the same factor. To deal with
this, the single-pixel resolution can be interpolated linearly with the values of
the neighbours in the shift estimation matrix. The result of this interpolation is a single
real number which is rounded and represents the actual estimated shift.
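The principle can be illustrated with a one-dimensional sketch: the cross spectrum is normalized to unit magnitude and inverse-transformed, and the peak position gives the shift. A naive O(N²) DFT is used here for clarity instead of the FFT of the actual implementation, and the sign of the returned shift depends on the transform convention.

#include <cmath>
#include <complex>
#include <vector>

// 1-D phase correlation: normalize F2 * conj(F1) to unit magnitude, inverse
// transform, and take the peak index as the displacement between the signals.
int EstimateShift1D(const std::vector<double>& f1, const std::vector<double>& f2)
{
    const double kPi = 3.14159265358979;
    const int n = int(f1.size());
    const std::complex<double> j(0.0, 1.0);
    std::vector<std::complex<double> > corr(n);
    for (int k = 0; k < n; ++k) {
        std::complex<double> F1(0.0), F2(0.0);
        for (int m = 0; m < n; ++m) {                 // naive forward DFT
            const std::complex<double> e = std::exp(-j * (2.0 * kPi * k * m / n));
            F1 += f1[m] * e;
            F2 += f2[m] * e;
        }
        const std::complex<double> cross = F2 * std::conj(F1);
        corr[k] = std::abs(cross) > 0.0 ? cross / std::abs(cross)
                                        : std::complex<double>(0.0);
    }
    int best = 0;
    double bestVal = -1.0;
    for (int s = 0; s < n; ++s) {                     // inverse DFT, peak search
        std::complex<double> acc(0.0);
        for (int k = 0; k < n; ++k)
            acc += corr[k] * std::exp(j * (2.0 * kPi * k * s / n));
        if (std::abs(acc) > bestVal) { bestVal = std::abs(acc); best = s; }
    }
    return best > n / 2 ? best - n : best;            // map to a signed shift
}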
5.4.2. Rotation estimation
The rotation angle between two images can be computed as the angle over which
the Fourier transform of one is rotated with respect to the other. Computing a
spectrum rotation for every evaluation of the correlation is computationally heavy.
When the spectra of the images are expressed in polar coordinates, the rotation over
the angle reduces to a circular shift.
The chosen reference algorithm uses a slightly different approach that is
computationally much more efficient. Once the 2-dimensional FFT of the grayscale
images is obtained and the frequency components are identified, a function that
computes the radial sum of the components is performed. The radial sum function adds the frequency
values corresponding to a specific angle range with the desired resolution. An array
of summed values corresponding to the angles is obtained. The computation of the
frequency content h as a function of the angle can be expressed as follows:
h(\alpha) = \int_{\alpha - \Delta\alpha/2}^{\alpha + \Delta\alpha/2} \int_{0}^{\infty} |F(r, \theta)|\, dr\, d\theta   (7)
In order to get the relative rotation between two frames, the two arrays
containing their radial sums are compared. To look for a correlation, a 1-dimensional
FFT is performed on each radial sum, and one of the results is multiplied element-
wise with the conjugate of the other. After performing an inverse FFT of the
multiplied array, the position of the maximum value is obtained, representing the rotation between
the frames. Figure 25 shows on the left the spectrum of an image and its rotated
version. The right side shows the correlation between the frequencies of the radial
sums.
Figure 25. Rotation estimation example.
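An illustrative discretization of the radial sum of Equation (7) is sketched below, binning the magnitude spectrum by angle; the names and binning choices are assumptions of the sketch.

#include <cmath>
#include <vector>

// Discrete version of Equation (7): accumulate the magnitude spectrum into
// angular bins. Correlating two such arrays as circular sequences gives the
// rotation angle between the frames.
std::vector<double> RadialSum(const std::vector<double>& mag, int n, int bins)
{
    const double kPi = 3.14159265358979;
    std::vector<double> h(bins, 0.0);
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x) {
            const double u = x - n / 2;               // centred frequency
            const double v = y - n / 2;
            if (u == 0.0 && v == 0.0) continue;       // skip the DC term
            const double a = std::atan2(v, u);        // angle in (-pi, pi]
            const int bin = int((a + kPi) / (2.0 * kPi) * bins) % bins;
            h[bin] += mag[y * n + x];
        }
    return h;
}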
In the final application, two different profiles are used, and rotation estimation is
disabled by default. Although rotation estimation has theoretically been proven to
enhance the final results of panorama stitching, its actual implementation
presents some drawbacks. Estimating the rotation can be a costly task that
decreases the global speed of the system. This can result in fewer frames obtained
each second, with smaller overlapping areas between frames and subsequently more
registration errors.
5.4.3. Image registration comparison
A feature-based registration algorithm can provide more flexibility and is able to
compute changes of scale or perspective in addition to shifts and rotation. However,
feature-based methods are less robust against blur and intensity changes. Hannuksela
et al. [19] implemented a gradient feature machine that is able to register frames in
real time and runs on mobile devices. The code provided by this implementation has
been adapted to the panorama building application. Table 2 shows the error rate for a
similar processing time of the phase correlation method and the feature machine
method. The error rate is defined as the number of registrations detected as failure
cases divided by the total number of registrations.
Table 2. Time and error rate comparison of registration methods.
Method (64x64 pixels)                                         | Time consumed | Error rate
Phase correlation (FFT): shift                                | 100 ms        | 0.2%
Phase correlation (FFT): shift + rotation                     | 150 ms        | 0.5%
Feature machine (gradient features): shift                    | 100 ms        | 0.3%
Feature machine (gradient features): shift + rotation + scale | 120 ms        | 0.6%
When only the shift and rotation parameters are needed, the phase correlation method
offers a lower error rate for a fixed computation time. However, if more parameters
are needed in the model, a feature machine still offers a low error rate with a good
speed performance.
5.5. Moving object detection and blur estimation
Once the registration process is completed, the quality of each frame has to be
determined. To keep the process fast, motion detection uses a very simple procedure:
the difference between the current frame and the previous one is calculated. The
result is a two-dimensional matrix that covers the overlapping area of the two
frames. This matrix is then low-pass filtered to remove noise and thresholded
against a fixed value to produce a binary motion map. The motion map is used as a
quality measure in the frame selection stage and in determining the blending region.
The computation of blur as a side product of the registration method was
discussed in the previous chapter. However, the lack of resources on the
target device suggests using a less ambitious method that can be performed faster.
The final version of the application acquires the amount of blur in a frame by a
summed derivatives computation [30]. This method of blur calculation produces a
single number that expresses the amount of high-frequency detail in the image. When
taken from the same area of the image, this usually means that the frame with more
high-frequency detail is sharper than the others. On some occasions, differences
between consecutive frames due to changes in the scene, e.g., a moving object,
can distort the result.
The main contributing factor to the selection of the best frame in our final
implementation is the result of the blur detection algorithm, since moving object
estimation is only activated in some of the profiles.
5.6. Frame selection and evaluation
A video frame selection sub-system can increase the quality of a panorama stitcher
application. The frame selection process is a matter of weighting the importance of
different frame quality features. The quality values are computed for a certain scope
of frames at a time. The scope encompasses the frames that have a suitable
translation with respect of the previous selected frame, as depicted in Figure 26.
Figure 26. Frame selection scheme.
Each of the quality evaluation tasks can be switched on and off depending on the
desired speed performance of the final application. In practice, each frame gets a
quality value that is calculated from the aforementioned factors, and this can be
implemented successfully in many different ways. According to experiments, the
presence of moving objects is the most important criterion, since the most severe
artifacts are created in the stitching process by moving objects that get clipped.
However, the motion detection stage is not activated in every profile. The factor of
second highest importance was chosen to be the amount of blur. Finally, frames that
are highly rotated, that present high luminance differences with respect to the previous
one, or that are likely to have registration errors are less preferred.
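As an illustration, the factors could be combined as in the sketch below; the weights are hypothetical placeholders, since the actual values are tuned experimentally per profile (see Section 6.5).

// Per-frame quality factors gathered by the evaluation tasks.
struct FrameQuality {
    double sharpness;      // summed-derivatives result; larger is sharper
    double motionFraction; // fraction of overlap pixels flagged as moving
    double lumaChange;     // luminance difference to the previous frame
    double rotation;       // absolute estimated rotation in degrees
};

// Combine the factors into one score; the weights below are hypothetical
// placeholders, tuned experimentally in the real application.
double QualityScore(const FrameQuality& q)
{
    return q.sharpness
         - 1000.0 * q.motionFraction  // clipped moving objects hurt most
         -   10.0 * q.lumaChange
         -   20.0 * q.rotation;
}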
Blending region selection
The seam location used in the blending stage is determined by the size of the
coincident region of the two images that are to be blended. By default, the seam is
located exactly at the middle point of the coincident region. However, in the
presence of moving objects, the seam is drawn outside the boundaries of the objects.
If no suitable seam can be drawn on the frame, the frame is discarded.
5.7. Image blending
Previously published image blending methods keep all the frames of a sequence
in memory and then proceed to stitch them. The memory constraints of a
mobile phone suggest that the best way to face this task is to blend each
frame before capturing the next one. This way, only the resulting image is kept in
memory, and some post-capture processing time is saved. Figure 27 depicts a
scheme of the blending algorithm data flow. The first row shows how only the
selected regions of the best quality frames are blended. As new frames are selected,
they are merged into the final image.
Figure 27. Sequence scheme of the blending algorithm after registration.
5.7.1. Global Luminance correction
Global luminance and exposure correction is needed because most mobile phone
cameras automatically adjust the aperture and exposure time to prevent over- or
underexposure of the frames. Although automatic adjustment makes a video pleasant
to watch on playback, it makes the creation of a panorama much more complicated.
The best results are obtained when the camera API allows keeping the exposure
fixed. In this case, only local correction that works solely on individual pixels is
needed. This correction is usually referred to as pixel luminance correction and is
described in the next subsection. Figure 28 shows an example of a panorama
construction after luminance correction.
Figure 28. Luminance correction example.
The Symbian S60 camera API does not allow keeping the exposure value fixed. The
frame selection and the low frame rate make the shifts between stitched frames large
enough to consider that the exposure has changed, so a global correction of luminance
must be implemented. To perform this correction, the average luminance value is
computed over the pixels belonging to the overlapping area. Every pixel of the
current frame is corrected according to the average value of the previous one,
adding or subtracting this difference to the green, blue and red values of the frame to
stitch. Figure 29 shows how the global luminance correction routine only uses the
overlapping area.
Figure 29. Global luminance correction.
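A sketch of this per-pixel correction is given below, assuming 0xAARRGGBB packing; the clamping of 'delta' by the thresholds described in Section 6.5 is left to the caller.

#include <cstdint>

// Shift every colour channel of the current frame by the difference between
// the average luminances of the overlapping areas of the previous and the
// current frame (pixels packed as 0xAARRGGBB).
void CorrectGlobalLuminance(uint32_t* argb, int nPixels, int delta)
{
    for (int i = 0; i < nPixels; ++i) {
        const uint32_t p = argb[i];
        int r = int((p >> 16) & 0xFF) + delta;
        int g = int((p >> 8) & 0xFF) + delta;
        int b = int(p & 0xFF) + delta;
        r = r < 0 ? 0 : (r > 255 ? 255 : r);
        g = g < 0 ? 0 : (g > 255 ? 255 : g);
        b = b < 0 ? 0 : (b > 255 ? 255 : b);
        argb[i] = (p & 0xFF000000u) | (uint32_t(r) << 16)
                | (uint32_t(g) << 8) | uint32_t(b);
    }
}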
5.7.2. Pixel Luminance correction
Instead of the Gaussian weighting mask present in the reference algorithm, a
less computationally costly method has been selected: a linear function that
gradually merges one frame into the next, changing the weight assigned to each one.
This method prevents vertical stripes from being visible between the
individual frames that compose the panorama. A seamless image with smooth
transitions between frames is then created.
The RGB image pixels of an N93 Symbian bitmap can be interpreted as 32-bit
unsigned integers. The weighting function is implemented as a floating-point
coefficient that multiplies each channel of each pixel and computes its
contribution. Figure 30 depicts the linear contribution of pixels during the blending
process. The weight of each pixel depends only on its distance to the seam point.
Figure 30. Linear contribution of pixels.
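An illustrative form of this linear blending over the seam columns, again assuming 0xAARRGGBB packing and hypothetical parameter names:

#include <cstdint>

// Blend 'seamWidth' columns of the incoming frame into the panorama; the
// weight of the new frame grows linearly with the column index, so the
// transition between the two images is gradual (Figure 30).
void BlendLinear(uint32_t* pano, const uint32_t* frame, int nRows,
                 int stridePano, int strideFrame, int seamWidth)
{
    for (int c = 0; c < seamWidth; ++c) {
        const float w = float(c + 1) / (seamWidth + 1);    // weight of new frame
        for (int r = 0; r < nRows; ++r) {
            const uint32_t a = pano[r * stridePano + c];
            const uint32_t b = frame[r * strideFrame + c];
            uint32_t out = 0xFF000000u;                    // opaque alpha
            for (int shift = 0; shift <= 16; shift += 8) { // B, G and R channels
                const float ca = float((a >> shift) & 0xFF);
                const float cb = float((b >> shift) & 0xFF);
                out |= uint32_t(ca + w * (cb - ca) + 0.5f) << shift;
            }
            pano[r * stridePano + c] = out;
        }
    }
}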
6. USER INTERFACE AND EXPERIMENTS
The panorama building solution analyses the frames of the video for motion and
moving objects, quantifies the quality of each frame, and stitches up to 360 degree
panoramas from the best available images [34, 35]. The high interactivity of video
sequences and mobile devices makes the design of a user interface particularly
challenging.
The proposed solution follows an intuitive process. In order to get a final
panorama image, the user points the camera at the desired starting point of the
mosaic. The camera is then turned through up to 360 degrees while a sequence of
images is captured. Each image is individually processed to estimate
the shift and rotation. The blurriness of each picture is measured and moving objects
are identified. Based on the quality of each individual frame, a selection process
takes place: each frame is either accepted or discarded. For every selected frame, if a
moving object is present and it fits in the image, the image is blended by drawing a seam
outside the boundaries of the object. If only a partial object is present, the part
of the frame without the object is blended. Figure 31 represents the image
capturing process of the panorama application.
Figure 31. Panorama capturing process.
6.1. PanoramaX Application
The basis for the mobile panorama stitching implementation created in this work is the
PanoramaX application. The application was made to use the actual panorama
stitching modules, which were implemented as Symbian dynamic link libraries
(DLLs). The application also provides a user interface that is capable of
starting and stopping the capture and saving the resulting images.
PanoramaX always calls the stitching algorithms through a fixed interface created
for this purpose. The same application can be utilized to run all the algorithms
that were created in this project, as well as any other application that processes video
frames from the camera. The application works by receiving camera frames captured
by the phone; before drawing them on the display, the shift against the previous
frame is computed. Figure 32 shows the application flow diagram of the panorama
builder.
Figure 32. Application flow.
6.1.1. User interface
The user interface has been kept as simple as possible and modified based on user
feedback. When the application is started, the screen shows a viewfinder with the
images forwarded straight from the camera. Figure 33 shows the user interface
during the capture. In order to take a picture, the user follows this sequence:
- Once the desired starting point for the panorama image has been chosen and
can be seen on the screen, the user presses the capture button.
- The device is slowly and smoothly moved from left to right in order to
capture and process the frames.
- When the desired ending point for the panorama has been reached, the user
presses the capture button again.
- The whole image and a detail of it are shown on the screen. Browsing
through the image is possible using the device’s joystick.
- If the capture button is pressed again, the image is saved in the desired
format with a sequenced name. The cancel button discards the picture
and exits the application.
- An information note is displayed once the image is saved. The viewfinder
starts again, and another capture can be made.
Figure 33. User interface during image capture.
Some additional functions can be accessed with key presses or through menus, such as
selecting the image saving format (.PNG or .JPG), displaying the version
information, or changing the size of the viewfinder. Figure 34 shows the user
interface in the image saving phase.
Figure 34. User interface during image saving phase.
6.1.2. Frame capturing interface
The construction of panoramic images requires a set of single frames to be
captured. To reduce memory needs and improve the user experience, most of the
computations can be done in the time between the captures of frames.
For an application to work in real time, the frame capturing
interface has to offer frames at a sufficient frame rate, while also providing the possibility
of processing each frame before the capture of the next one. To achieve this, a
camera application with a viewfinder has been developed. The interface sends a
callback once the camera is ready to capture a new video frame. After the capture,
when the main thread is idle, the frame is sent to the processing routines. Once the
processing is done, the handle is given back to the frame capturing interface, which
captures and sends another frame. Figure 35 represents the processing diagram and
the application data flow.
Figure 35. Processing diagram and data flow.
This capturing interface is a good solution for a real-time application because it is
able to offer frames asynchronously, depending on the needed processing time. The
interface can adapt to a variable frame rate without any difference among frames and
without any intervention from the user. Several different applications and processing
algorithms based on video analysis can be developed using exactly the same
interface, such as a Point & Find application [36] or a handheld document scanner
[37].
6.2. Stitching DLL
The stitching algorithm module has been implemented as a Symbian static dynamic
link library (DLL). Implementing a module as a DLL is usually the best
solution, especially when several applications can make use of it or when several
versions of one algorithm are going to be developed.
Only one copy of a static DLL is loaded into memory, and this same copy is shared
between every application using it. However, this imposes one restriction in this
version of Symbian OS: static DLLs should not have any writable static data.
This includes both static variables inside functions and global variables that
are defined outside any class, structure or function. This constraint is an obvious
consequence of the fact that, in the case of static DLLs, the same binary code of the
modules is shared between multiple users. Thus, all the binary code of Symbian
static DLLs is read only.
The EKA2 kernel version of Symbian OS allows writable static data in a DLL,
although in most situations the use of this kind of data in Symbian C++ code is not
justified. This is because, as with the old kernels, the DLLs might be used by several
users, and each user of the DLL would need their own memory chunk for their global or
static writable data. The use of writable static data is not recommended even when
developing software for new Symbian OS versions. In the latest Symbian OS
versions, this kind of data is supported mainly in order to facilitate software porting.
For instance, in ANSI C code it is very typical to have global non-const variables as
the “glue” between C function calls. Therefore, software originally developed for
other environments may contain large amounts of static data, which can make
porting this software to Symbian OS a challenging process.
6.3. Memory management
By default, Symbian S60 offers applications only a small chunk of heap memory.
A minor tweak of the compiling and linking options has to be made
in order to be able to store all the intermediate information needed for real-time
performance. Even so, the reduced RAM capacity of mobile phones makes on-the-
fly stitching procedures more complicated and calls for a carefully implemented
memory management scheme. All the image processing should be done following a
cascade structure, with the help of some buffers to save previous data.
In the final version of the program, the first frame is captured and cropped, and its
frequency components and blur are calculated and stored in memory. The next
frames are acquired asynchronously. Just after each frame is acquired, the image blur
is computed, frame registration is performed and a decision about the quality of the
frame is made. If the quality is better than that of the previous frame, based on
blur detection, the exposure correction and blending functions are called.
The resulting image grows frame by frame each time the blending function is
called. There is no need to store several frames in memory; only the last
computed one and its features are kept: its frequency components, its shift estimation and the
total amount of shift at this moment, as well as the evolving final image.
When the user stops capturing the frame sequence that will become the mosaic,
the current resulting image is stored on a mass storage device, the memory objects are
deleted and the application is reset. The application can capture several images in a short
period of time with no need for a restart.
6.4. Error correction
Repeating textures, excessively blurry images or bad illumination can lead to
catastrophic registration errors, as a single small error on a single frame can
accumulate through the panorama stitching. To cope with these errors, a simple
error-correcting method has been implemented: each time a frame is registered, its
planar motion vector is compared to the planar motion vector of the previous frame.
If the difference between the adjacent motion vectors surpasses a certain threshold,
the motion vector is replaced by the previous motion vector. This works very well in
panorama imaging from video, where the usual motion trajectory of the camera is
close to linear.
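The check itself is simple; a minimal sketch with an illustrative city-block distance follows (the thesis does not specify the exact distance measure).

#include <cstdlib>

// Planar motion vector between two consecutive frames.
struct Motion { int dx; int dy; };

// Replace the estimated motion vector with the previous one when it deviates
// by more than 'threshold' pixels (30 in the final application, Section 6.5).
Motion CorrectMotion(const Motion& current, const Motion& previous, int threshold)
{
    const int dist = std::abs(current.dx - previous.dx)
                   + std::abs(current.dy - previous.dy);
    return (dist > threshold) ? previous : current;
}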
6.5. Adjusting parameters
When developing an application like a panorama stitcher under limited resources,
several constraints have to be taken into account. After coding each algorithm, a
fine tuning of the parameters is needed, based on experimentation and on observation
of the specific characteristics of the device.
Description of parameters
The most important parameters that can be changed in the registration and
evaluation of the frames are:
- Video frame size – This is the most important characteristic of the
application. It affects the final resolution and size of the resulting images. It is
strongly limited by the interface that the mobile phone provides. A frame size
of 320x240 pixels (QVGA) was selected as the default configuration, but
other sizes can be selected as well.
- Coefficients for grayscale conversion – All the image processing is done
with grayscale images. To perform an RGB to Y conversion, several weights
can be assigned to each of the RGB channels. Several options have been
tested, deciding finally to use a uniform approach with equal contributions. If
the specifics of the camera sensor are not known, this option is the safest one.
- Mask size for the processing square – This is the parameter with the
most important effect on the global performance. It affects the number of
mathematical operations that must be performed on each frame to obtain a
registration. It has been experimentally found that, with the quality of the raw
frames obtained by the camera, one sixth of the original frame size is a lower
bound for a correct registration process. With a fixed frame size of 320x240
pixels, a square mask of 128x128 pixels is the default choice. However, it is
possible to lower it by downscaling and interpolating the result.
- Downscaling and interpolation factor – If we downscale and filter the
processing square, we can reduce its size significantly. An interpolation
of the registration results gives back the lost precision. A factor of two in
each dimension has been found to offer excellent results. Other values tried
were no downscaling and a factor of three, where the
interpolation of results did not offer exact values.
- Maximum number of frames – Although there is no theoretical limit
on the size of the images, the RAM of a handheld device is a scarce
commodity: on every tested device, 300 frames offer the possibility of a
720 degree panorama.
- Frame selection buffer length – When selecting frames, a set of them should
be kept in memory to be able to discard the worst ones. Several sizes
have been tried, but for computational reasons, a small buffer of only 2
frames is used in the final version of the panorama stitcher.
- Out of reach indicator threshold – While selecting frames, it is necessary to
establish a maximum shift in pixels within which to search for the best one. The size is
related to the processing square size. The experiments showed that a good
indicator threshold is half the size of the processing square.
- Error correction threshold – This threshold is used to detect when a motion
vector is unusual compared to the previous one and should be
corrected. Its value depends on the frame rate. A distance of 30 pixels in the
motion vector has been chosen to score as an error.
- Coefficients for frame quality score – To measure the quality of a frame,
several criteria can be used. As no motion estimation is included in the
Symbian implementation, blur detection has become the most important one,
along with the distance to the previously stitched frames.
After the registration and evaluation of the frames, some other parameters can be
taken into account in order to create the best possible blending, depending
on the specific characteristics of the images obtained from the previous
phase. Some of these parameters are:
- Blending seam size – This parameter determines the number of columns used
in the blending operation. The best setting depends strongly on the
sharpness of the raw images, the size of the overlapping portion of
consecutive frames and the error rate. When global luminance correction is
performed, the error rate is low and the source images are sharp, so not too many
levels of local pixel blending have to be chosen. Smaller values of this
parameter offer sharper images, while high values help to correct errors and
imprecision from other stages.
- Coefficients for pixel weight – In the final implementation, a linear
weighting mask for blending has been selected. The slope of the linear
function is another parameter that has to be taken into account. Steep slopes
give sharper results, while mild slopes offer a softer transition between
frames, ensuring that no vertical stripes are caused by differences in
luminance or registration errors.
- Luminance correction thresholds – When global luminance correction is
performed on each frame, soft transitions between frames give better results
than an exact match of luminance values. A threshold for the maximum
luminance correction value and a threshold defining the tolerance
between frames are needed. It has been experimentally found that tolerances
of about five luminance steps and a maximum change of ten levels offer soft
transitions without degrading each frame too much through loss of dynamic
range.
6.6. Quantitative performance on an N93 phone
The panorama builder can construct high quality images in real time under various
conditions. Figure 36 shows a typical case for panorama capture: a 180 degree
image of a uniformly illuminated landscape is constructed seamlessly. Figure 37
depicts a tough case: a 180 degree image is composed from source frames with
different illuminations and moving objects across the scene.
Figure 36. A typical case for a 180 degree panorama.
Figure 37. A tough case for a 180 degree panorama.
The memory requirements of the application are related to the video frame size and
to the desired final image. Table 3 shows the free RAM needed by the
application when capturing a 360 degree panorama for different video frame sizes.
Table 3. Application memory usage.
Image size | Memory use (M) | Final 360 dg image (F) | Memory needed (1.125 * M + 2 * F)
160x120    | 77 kB          | 1 536 kB               | 3 093 kB
320x240    | 307 kB         | 6 144 kB               | 12 374 kB
480x360    | 691 kB         | 6 912 kB               | 27 842 kB
640x480    | 1 229 kB       | 12 288 kB              | 49 498 kB
The speed of the panorama builder depends on the video frame size and the resulting
image to be stitched. The final image size also determines the time needed to save it
to the memory card. Table 4 shows the computation times of the application for a
single frame and when the capture ends. The overall speed decreases proportionally to
the final size due to the time consumed by the blending stage. The video capturing
interface has a frame rate limit of 15 frames per second on the N93 phone.
Table 4. Application speed performance.
Image size | Time per frame (ms) | Frames per second | Saving time, 360 dg (ms)
160x120    | 30 - 62             | 15.0              | 125
320x240    | 62 - 125            | 8.0 - 15.0        | 500
480x360    | 140 - 250           | 4.0 - 7.5         | 1000
640x480    | 300 - 500           | 2.0 - 3.5         | 2000
The frame processing time consumed by the panorama builder can be decomposed
into the times spent in each of the stages of the algorithm. Table 5 shows the
approximate time consumption of the key functions in the panorama construction
process for different image sizes. The registration time for a pair of frames is
constant during the whole process, while the times for the rest of the stages depend on
the number of frames already captured and the overall quality and size of the source
images.
Table 5. Time consumption of panorama stages.
Image size | Registration | Correction | Selection  | Blending
160x120    | 15 ms        | 0 - 3 ms   | 2 - 4 ms   | 13 - 45 ms
320x240    | 40 ms        | 2 - 10 ms  | 5 - 10 ms  | 15 - 65 ms
480x360    | 75 ms        | 5 - 20 ms  | 12 - 25 ms | 48 - 130 ms
640x480    | 140 ms       | 10 - 50 ms | 25 - 50 ms | 125 - 260 ms
The time of the image registration stage is determined by the registration method, the
size of the desired computation region and the downscaling factor used. The presence
of errors in the registration process also depends on the selected registration
parameters. Table 6 depicts the time differences and error rates for several
parameter choices in a QVGA image registration process. For the final application,
a compromise between the error rate and speed was chosen.
Table 6. Registration speed performance.
Region size | Interpolation factor and region size | Frames per second | Average error rate
70% QVGA    | none, 256x256                        | 4 fps             | 0.1%
70% QVGA    | 2, 128x128                           | 9 fps             | 0.2%
35% QVGA    | none, 128x128                        | 8 fps             | 0.5%
35% QVGA    | 2, 64x64                             | 20 fps            | 0.7%
20% QVGA    | none, 64x64                          | 22 fps            | 2.5%
20% QVGA    | 2, 32x32                             | 40 fps            | >5.0%
6.7. Other Symbian phones
Although the main target of the developed application has been the N93 phone,
several other phones, such as the older N90 or the more modern Nokia N95 and
Nokia 6120, have also been extensively tested with excellent results. The application
has experimentally been shown to be portable.
The Nokia N95 offers a new platform based on the Symbian 9.1 operating system, as
well as some other improved capabilities compared with the N93. The main
camera of the N95 can capture still frames of up to 5 megapixels, and the raw video
frame interface is sped up to 30 frames per second. The processing capabilities of the
phone are quite similar to the N93’s, although the larger amount of available RAM
allows more space for image manipulations.
The Nokia 6120 is a mid-budget phone that offers a fast general purpose processor.
The included ARM processor has a clock frequency of 369 MHz. Most of the extra
chipsets present in more expensive phones have not been included on this platform.
The imaging capabilities of the phone consist of a two megapixel camera that allows
the capture of still frames through the imaging API, and a video interface that can
obtain VGA frames at a 15 frames per second rate. The application shows excellent
performance on this device due to its very limited use of floating-point operations.
6.8. Further directions
The model presented in this thesis is a first approach to the mobile panorama
stitching problem. However, in order to get the best results, the image registration
method has to be able to detect not only global motion and rotation on two axes, but
any free movement that can result in single frame warping. To achieve such an
image registration system, methods other than phase correlation have
to be explored. A carefully coded feature machine registration method could lead to
the estimation of any shift, rotation and tilt between frames, even in real time with
good computational performance.
Blending, warping, rotating and changing the scale of a frame are quite costly
operations if only a general purpose processor is used. To increase the performance
of these operations, the graphics chips included in newer phones can
be used. Taking advantage of a graphics processing unit, when present, saves
processor time that can be used to improve the quality of the computations.
Interpolations and transformations of the frames can be done at high speed on a GPU
[38]. Figure 38 shows an example of a warping correction procedure using OpenGL
ES 1.1.
Figure 38. Image warping with bilinear interpolation on OpenGL ES 1.1.
If the graphics processing unit includes a programmable pipeline, it can also be used
for general purpose calculations in the registration phase, such as feature extraction or
robust estimation. Figure 39 shows the differences between a fixed and a
programmable graphics pipeline.
Figure 39. Fixed graphics pipeline vs. programmable graphics pipeline.
7. SUMMARY
Mobile phone cameras were primarily intended for capturing still images and video,
while the computing and display resources have been optimized for video playback. As a
result, the efficiency of other uses is inferior. However, mobile device cameras
create the need for new applications capable of using them in ways other
than the usual ones. Such devices usually lack good optics, but the advanced processing
capabilities of a mobile phone compared to a normal camera can make image
processing a good way of addressing this issue.
The goal of this thesis was to investigate how a real-time image mosaicking and
panorama imaging application could be implemented on a mobile platform. The
implementations were mainly coded for a Nokia N93, although they have been
proven to work on several other phone models. A floating-point implementation
offered a first approach to the topic, giving the information needed to identify the
most challenging parts to improve and offering a first profile of the final application.
A fixed-point implementation reached the desired real-time performance with the
use of a carefully tailored algorithm and memory management. A library that can
take advantage of this algorithm has also been implemented for Symbian OS during
this project.
As a result, we have a standalone application that can run on several mid-budget
phones and that is able to stitch frames at a speed near 15 frames per second,
resulting in panoramas of more than 360 degrees. The algorithms were tested using
several sets of parameters. Depending on them, the size and quality of the final
image may vary, as well as the frame rate, which can be high enough to give the user
a better experience. Although in this kind of application the user is always
collaborative, minor camera shaking due to the handheld capturing process can be
compensated.
The selection of the proper algorithms turned out to be the key to achieving real-time
performance and good quality images. The results of the development were very
promising, and panorama stitching will probably become a common feature in mobile
phones in the near future.
8. REFERENCES
[1] McBride B. (accessed 30 January 2010) Panoramic Cameras Timeline: 1843-
1994. International Association of Panoramic Photographers. URL:
http://www.panphoto.com/ttime1.html
[2] Polden B. (accessed 30 January 2010) Photography at Length, part 1.
International Association of Panoramic Photographers URL:
http://www.panphoto.com/documentlibrary/pano-at-length.html
[3] Boutellier J. & Silvén O. (2006) Panoramas from partially blurred video. The
International Workshop on Intelligent Computing in Pattern
Analysis/Synthesis, pp. 300-307.
[4] Heikkilä M. & Pietikäinen M. (2005) An image mosaicing module for
wide-area surveillance. VSSN ’05: Proceedings of the third ACM international
workshop on video surveillance and sensor networks.
[5] Canon Photostitch (accessed on December 2009) URL:
http://www.canon.com/camera-museum/tech/report/200206/report.html
[6] Image Composite Editor web page (accessed on 20 January 2010) URL:
http://research.microsoft.com/groups/ivm/ICE/
[7] Scalado autorama. (accessed on November 2009) Press Release. URL:
http://www.scalado.com
[8] Morpho QuickPanorama web page (accessed on 20 January 2010) URL:
http://www.morphoinc.com/QuickPanorama.htm
[9] SonyEricsson (2004) K700i user manual. PDF file.
[10] Bitside Panoman home page (accessed on 15 August 2009) URL:
http://panoman.net
[11] SymbianOS (accessed on 20 January 2010) URL:
http://developer.symbian.org/main/documentation/index.php
[12] S60 platform web page (accessed on 20 January 2010) URL:
http://www.forum.nokia.com/main/platforms/s60/index.html
[13] Hiipakka J. Symbian OS 7.0S (2004), Multimedia Framework. Multimedia
Seminar - Mobile Multimedia Application Platforms.
[14] Symbian MMF web page (accessed on 20 January 2010) URL:
http://forum.nokia.com/index.php/MMF
[15] Neuvo Y. (2007) DSP, from filter design to system design, past and future.
Opening lecture, Finnish Signal Processing Symposium.
[16] Peleg S. & Herman J. (1997) Panoramic mosaics by manifold projection.
Computer Vision and Pattern Recognition, pp. 338-343, IEEE Computer
Society, Washington, DC, USA.
[17] Zitová B. & Flusser J. (2003) Image registration methods: a survey. Image
and Vision Computing 21(11), pp. 977–1000.
[18] Vandewalle P., Süsstrunk S. & Vetterli M. (2006) A Frequency Domain
Approach to Registration of Aliased Images with Application to Super-
Resolution. EURASIP Journal on Applied Signal Processing, special issue on
Super-resolution, pp. 1-14.
[19] Hannuksela J., Sangi P. & Heikkilä J. (2007) Vision-based motion estimation
for interaction with mobile devices. Computer Vision and Image
Understanding: Special Issue on Vision for Human-Computer Interaction
108(1–2), pp. 188–195.
[20] Lucchese L., Doretto G. & Cortelazzo G.M. (2002) A frequency domain
technique for range data registration. IEEE Transactions on Pattern Analysis
and Machine Intelligence 24, pp. 1468–1484.
[21] Lowe D. (2004) Distinctive image features from scale-invariant keypoints. Int.
Journal of Computer Vision, 60(2):91–110.
[22] Bay H., Ess A., Tuytelaars T. & Van Gool L. (2008) SURF: Speeded Up
Robust Features. Computer Vision and Image Understanding, Vol. 110, No. 3,
pp. 346–359.
[23] Xiao F., Wu H.Z., Xiao L., Tang Y. & Ma W.J. (2004) Auto method for
ambient light independent panorama mosaics. Proceedings of International
Conference on Machine Learning and Cybernetics, 6, pp. 3851–3854, IEEE,
Shanghai, China.
[24] Zomet A., Levin A., Peleg S. & Weiss Y. (2006) Seamless image stitching by
minimizing false edges. IEEE Transactions on Image Processing 15(4), pp.
969–977.
[25] Szeliski R. (1996) Video mosaics for virtual environments. IEEE Computer
Graphics & Applications 16, pp. 22-30.
[26] Davis J. (1998) Mosaics of scenes with moving objects. Computer Vision and
Pattern Recognition: Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, pp. 354–360, IEEE Computer
Society, Washington, DC, USA.
[27] Brown M. & Lowe D.G. (2003) Recognising panoramas. International
Conference of Computer Vision: Proceedings of the Ninth IEEE International
Conference on Computer Vision, p. 1218, IEEE Computer Society,
Washington, DC, USA.
[28] Hsu C.T., Cheng T.H., Beuker R.A. & Horng J.K. (2000) Feature-based video
mosaic. Proceedings of the International Conference on Image Processing, 2,
pp. 887–890, Vancouver, Canada.
[29] Li J.S. & Randhawa S. (2004) Improved video mosaic construction by
selecting a suitable subset of video images. Conferences in Research and
Practice in Information Technology: Proceedings of the 27th conference on
Australasian computer science, pp. 143–149, Australian Computer Society,
Inc., Darlinghurst, Australia.
[30] Liang J., DeMenthon D. & Doermann D. (2006) Camera-based document
image mosaicking. International Conference in Pattern Recognition,
Proceedings of the 18th International Conference on Pattern Recognition , pp.
476–479, IEEE Computer Society, Washington, DC, USA.
[31] Radke R.J., Andra S., Al-Kofahi O. & Roysam B. (2005) Image change
detection algorithms: a systematic survey. IEEE Transactions on Image
Processing 14(3), pp. 294–307.
[32] Zhu Z., Xu G., Riseman E. & Hanson A. (1999) Fast generation of dynamic
and multi-resolution 360 degrees panorama from video sequences. Proceedings
of the IEEE International Conference on Multimedia Computing and Systems,
1, pp. 400–406, Florence, Italy.
[33] KISS FFT SourceForge project. (accessed on January 2010) URL:
http://sourceforge.net/projects/kissfft/
[34] Boutellier J., Bordallo López M., Silvén O., Tico M. & Vehviläinen M. (2007)
Creating panoramas in mobile phones. Proceedings of SPIE Electronic Imaging,
vol. 6498, pp. 649807.1-649807.10.
[35] Bordallo López M., Boutellier J. & Silvén O. (2007) Implementing Image
Stitching in Mobile Phones. FINSIG proceedings.
[36] Silvén O., Hannuksela J., Bordallo López M., Turtinen M., Niskanen M.,
Boutellier J., Vehviläinen M. & Tico M. (2008) New video applications on
mobile communication devices. Multimedia on Mobile Devices, Proceedings of
SPIE Electronic Imaging, 6821, pp. 682108.1-682108.14.
[37] Takacs G., Chandrasekhar V., Gelfand N., Xiong Y., Chen W.-C., Bismpigiannis
T., Grzeszczuk R., Pulli K. & Girod B. (2008) Outdoors augmented reality on
mobile phone using loxel-based visual feature organization. Proceedings of the
ACM International Conference on Multimedia Information Retrieval, pp. 427-434.
[38] Bordallo López M., Hannuksela J., Silvén O. & Vehviläinen M. (2009)
Graphics hardware accelerated panorama builder for mobile phones.
Proceedings of SPIE Electronic Imaging, vol. 7256, pp. 72560D.1-72560D.9.
Conference Paper
Full-text available
In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF’s strong performance.
Article
Full-text available
This paper aims to present a review of recent as well as classic image registration methods. Image registration is the process of overlaying images (two or more) of the same scene taken at different times, from different viewpoints, and/or by different sensors. The registration geometrically align two images (the reference and sensed images). The reviewed approaches are classified according to their nature (area-based and feature-based) and according to four basic steps of image registration procedure: feature detection, feature matching, mapping function design, and image transformation and resampling. Main contributions, advantages, and drawbacks of the methods are mentioned in the paper. Problematic issues of image registration and outlook for the future research are discussed too. The major goal of the paper is to provide a comprehensive reference source for the researchers involved in image registration, regardless of particular application areas.
Conference Paper
Full-text available
By stitching together adjacent images from a video sequence surveying a scene, a video mosaic of the entire panorama can be formed. Since a video survey consists of a sequence of images having small relative displacements with respect to each other, there is redundant overlapping information in consecutive images so that not all consecutive video images are required to create a mosaic, and only a subset of suitable images needs to be chosen. Images that are misaligned due to subpixel translation, rotation or shear are difficult to perfectly re-align and stitching of such images can result in a mosaic where discontinuities are noticeable. We propose a new technique for the construction of a seamless mosaic to minimise discontinuities. Our technique partitions an image into four quadrants to register with those of successive images. These registration values are used to form a misalignment index for selecting the best images for stitching.
Conference Paper
A complete method to create a panoramic video mosaic is proposed in this paper. This method includes three principal processes: real-time frame selection, feature based estimation of an eight-parameter projective coordinate transformation, and image mosaic composing. This proposed method works well on many video sequences captured from a PC camera to generate both 360 degree and planar panoramas. Experimental results demonstrate the validity and superior quality of this proposed method.