Conference Paper

A Lightweight Intelligent Virtual Cinematography System for Machinima Production

Authors: David K. Elson and Mark O. Riedl

Abstract

Machinima is a low-cost alternative to full production filmmaking. However, creating quality cinematic visualizations with existing machinima techniques still requires a high degree of talent and effort. We introduce a lightweight artificial intelligence system, Cambot, that can be used to assist in machinima production. Cambot takes a script as input and produces a cinematic visualization. Unlike other virtual cinematography systems, Cambot favors an offline algorithm coupled with an extensible library of specific modular and reusable facets of cinematic knowledge. One of the advantages of this approach to virtual cinematography is a tight coordination between the positions and movements of the camera and the actors.
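
As a rough illustration of the facet library described in the abstract, the sketch below pairs blockings with shots that share a stage for each beat of a script. All names and the scoring callback are hypothetical; Cambot's actual data model and offline search are richer than this.

```python
# Minimal sketch (hypothetical names) of Cambot-style modular facets: stages,
# blockings, and shots are reusable pieces of cinematic knowledge, and an
# offline search scores every compatible (blocking, shot) pair for each beat.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Stage:            # occlusion-free area of the set
    name: str

@dataclass(frozen=True)
class Blocking:         # character placement/movement relative to a stage
    name: str
    stage: str
    actors: int

@dataclass(frozen=True)
class Shot:             # camera placement/motion relative to a stage
    name: str
    stage: str

def best_visualization(beats, blockings, shots, score):
    """For each beat, pick the highest-scoring blocking/shot pair whose facets
    share a stage, keeping camera and actors tightly coordinated."""
    plan = []
    for beat in beats:
        candidates = [(b, s) for b, s in product(blockings, shots)
                      if b.stage == s.stage and b.actors == len(beat["actors"])]
        plan.append(max(candidates, key=lambda bs: score(beat, *bs)))
    return plan
```
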


... However, evaluating the quality of film editing (whether generated by machines or by artists) is a notoriously difficult problem (Lino et al. 2014). Some contributions mention heuristics for choosing between multiple editing solutions without further details (Christianson et al. 1996), while others minimize a cost function that is insufficiently described to be reproduced (Elson and Riedl 2007). Furthermore, the precise timing of cuts has not been addressed, nor has the problem of controlling the rhythm of cutting (number of shots per minute) and its role in establishing film tempo (Adams, Dorai, and Venkatesh 2002). ...
... Darshak goes a long way toward motivating the shots, but the actual cinematography and editing are not evaluated. Cambot (Elson and Riedl 2007) is a movie-making system in which the choice of shots is found as the solution to an optimization problem using dynamic programming. The scene is expressed as a sequence of non-overlapping dramatic beats, and their approach evaluates different placements of characters (blockings) and camera choices for each beat. ...
... The scene is expressed as a sequence of non-overlapping dramatic beats, and their approach evaluates different placements of characters (blockings) and camera choices for each beat. Though we also make use of dynamic programming, our method is very different from that of (Elson and Riedl 2007). Firstly, we search a much larger set of possible solutions, by evaluating a higher number of shot transitions at a finer level of granularity (every frame, rather than every beat). ...
Article
We describe an optimization-based approach for automatically creating well-edited movies from a 3D animation. While previous work has mostly focused on the problem of placing cameras to produce nice-looking views of the action, the problem of cutting and pasting shots from all available cameras has never been addressed extensively. In this paper, we review the main causes of editing errors in literature and propose an editing model relying on a minimization of such errors. We make a plausible semi-Markov assumption, resulting in a dynamic programming solution which is computationally efficient. We also show that our method can generate movies with different editing rhythms and validate the results through a user study. Combined with state-of-the-art cinematography, our approach therefore promises to significantly extend the expressiveness and naturalness of virtual movie-making.
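
The semi-Markov assumption mentioned in the abstract makes the cost of an edit decompose into per-segment costs plus cut costs between adjacent segments, which is what makes dynamic programming tractable. Below is a minimal sketch of such a dynamic program; the seg_cost and cut_cost callables are hypothetical stand-ins for the paper's editing-error terms.

```python
def edit(n_frames, cameras, seg_cost, cut_cost, max_len):
    """Semi-Markov DP sketch: best[t][c] is the cheapest edit of frames 0..t
    whose last segment uses camera c and ends at frame t."""
    INF = float("inf")
    best = [{c: INF for c in cameras} for _ in range(n_frames)]
    back = [{c: None for c in cameras} for _ in range(n_frames)]
    for t in range(n_frames):
        for c in cameras:
            for d in range(1, min(max_len, t + 1) + 1):
                s = t - d + 1                      # segment covers frames s..t
                cost = seg_cost(c, s, t)
                p = None
                if s > 0:                          # pay for the cut p -> c
                    p, prev_cost = min(((q, best[s - 1][q] + cut_cost(q, c))
                                        for q in cameras if q != c),
                                       key=lambda x: x[1])
                    cost += prev_cost
                if cost < best[t][c]:
                    best[t][c], back[t][c] = cost, (s, p)
    # Backtrack the optimal segment list from the cheapest final camera.
    segments, t = [], n_frames - 1
    c = min(cameras, key=lambda q: best[t][q])
    while t >= 0:
        s, p = back[t][c]
        segments.append((s, t, c))
        t, c = s - 1, p
    return segments[::-1]
```
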
... Some approaches based on mass-spring physical systems have been designed to stage subjects in theater plays [Talbot 2015] or for interactive stories [Kapadia et al. 2016], without considering the constraints specific to the camera placement and camera angles. A cinematography system was proposed in [Elson and Riedl 2007], that encompassed staging, blocking and filming, but only relied on a limited collection of combinations. ...
... Finally, in the specific case of cinematographic systems, the machinima approach proposed in [Elson and Riedl 2007] provides a mechanism to express a film script (where there are camera and character specifications) into a placement of characters and cameras in a 3D environment. The approach relies on a library of camera placement and motions, coupled with a library of staging configurations. ...
... Section 2), we make the strong hypothesis of performing all computations in 2D abstractions of 3D scenes. In this we follow the hypotheses adopted by [Elson and Riedl 2007; Lino et al. 2010; Talbot 2015]. Typically, this relieves us from the problem of 3D visibility computation in arbitrary 3D environments, for which exact or approximate techniques remain computationally expensive [Durand et al. 2002]. ...
Conference Paper
Full-text available
While the topic of virtual cinematography has essentially focused on the problem of computing the best viewpoint in a virtual environment given a number of objects placed beforehand, the question of how to place the objects in the environment with relation to the camera (referred to as staging in the film industry) has received little attention. This paper first proposes a staging language for both characters and cameras that extends existing cinematography languages with multiple cameras and character staging. Second, the paper proposes techniques to operationalize and solve staging specifications given a 3D virtual environment. The novelty holds in the idea of exploring how to position the characters and the cameras simultaneously while maintaining a number of spatial relationships specific to cinematography. We demonstrate the relevance of our approach through a number of simple and complex examples.
... In light of this challenge, 360° video is a particularly appealing domain to invoke automatic videography techniques, which aim to convert unedited materials into an effective video presentation that conveys events [7,9,10,18,19,31,36,37,39]. While automatic videography in prior work has largely dealt with virtual environments and handcrafted heuristics [7,9,18,31], recent work shows the potential of learning how to extract informative portions of 360° video as a presentable NFOV video [37]. The authors propose the Pano2Vid problem, which takes a 360° video as input and as output generates NFOV videos that look like they could have been captured by a human observer equipped with a real NFOV camera. ...
... Virtual cinematography Most existing work on virtual cinematography studies virtual camera control in virtual (computer graphics) environments [7,9,18,31] or else a specialized domain such as lecture videos [10,36,39]. Aside from camera control, some prior works also study automatic editing of raw materials like videos or photos [4,12,13,19]. ...
Article
360° video requires human viewers to actively control "where" to look while watching the video. Although it provides a more immersive experience of the visual content, it also introduces an additional burden for viewers; awkward interfaces to navigate the video lead to suboptimal viewing experiences. Virtual cinematography is an appealing direction to remedy these problems, but conventional methods are limited to virtual environments or rely on hand-crafted heuristics. We propose a new algorithm for virtual cinematography that automatically controls a virtual camera within a 360° video. Compared to the state of the art, our algorithm allows more general camera control, avoids redundant outputs, and extracts its output videos substantially more efficiently. Experimental results on over 7 hours of real "in the wild" video show that our generalized camera control is crucial for viewing 360° video, while the proposed efficient algorithm is essential for making the generalized control computationally tractable.
... It then uses this model to identify candidate viewpoints and events of interest to capture in 360° video, before finally stitching them together through optimal camera motions using a dynamic programming formulation for presentation to human viewers. Unlike prior attempts at automatic cinematography, which focus on virtual 3D worlds and employ heuristics to encode popular idioms from cinematography [1][2][3][4][5][6], AutoCam is (a) the first to tackle real video from dynamic cameras and (b) the first to consider directly learning cinematographic tendencies from data. ...
... Virtual cinematography Ours is the first attempt to automate cinematography in complex real-world settings. Existing virtual cinematography work focuses on camera manipulation within much simpler virtual environments/video games [1][2][3][4], where the perception problem is bypassed (3-D positions and poses of all entities are knowable, sometimes even controllable), and there is full freedom to position and manipulate the camera. Some prior work [5,6] attempts virtual camera control within restricted static wide field-of-view video of classroom and video conference settings, by tracking the centroid of optical flow in the scene. ...
... Human-directed camera trajectories are content-based and often present scenes in idiomatic ways that are specific to the situations, and with specific intentions such as to tell a story [40]. Rather than hand-code such characteristics through cinematographic rules/heuristics [1][2][3][4], we propose to learn to capture NFOV videos, by observing HumanCam videos from the web. The following overviews our data collection procedure. ...
Article
Full-text available
We introduce the novel task of Pano2Vid: automatic cinematography in panoramic 360° videos. Given a 360° video, the goal is to direct an imaginary camera to virtually capture natural-looking normal field-of-view (NFOV) video. By selecting "where to look" within the panorama at each time step, Pano2Vid aims to free both the videographer and the end viewer from the task of determining what to watch. Towards this goal, we first compile a dataset of 360° videos downloaded from the web, together with human-edited NFOV camera trajectories to facilitate evaluation. Next, we propose AutoCam, a data-driven approach to solve the Pano2Vid task. AutoCam leverages NFOV web video to discriminatively identify space-time "glimpses" of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories. Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, we show that our method successfully produces informative videos that could conceivably have been captured by human videographers.
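
A dynamic program of the kind AutoCam describes can be sketched as a Viterbi pass over discretized viewing angles: each time step scores candidate glimpses, and a motion penalty discourages jerky trajectories. The score matrix and smoothness weight below are hypothetical inputs, not the paper's learned quantities.

```python
import numpy as np

def autocam_track(scores, angles, smooth_w):
    """scores: (T, K) glimpse scores per time step and candidate angle;
    angles: (K,) candidate viewing directions in degrees.
    Returns the angle-index sequence maximizing score minus motion penalty."""
    T, K = scores.shape
    diff = np.abs(angles[None, :] - angles[:, None])
    penalty = smooth_w * np.minimum(diff, 360.0 - diff)  # wrap-around distance
    best = scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = best[:, None] - penalty       # rows: previous angle, cols: next
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + scores[t]
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):            # backtrack the optimal trajectory
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```
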
... However, a close look at these dedicated tools shows that a lot is still done manually, typically in selecting the appropriate moments, setting the cameras, and performing edits between multiple cameras. In parallel, for the last decade, researchers in computer graphics focusing on automated virtual camera control have been proposing a number of efficient techniques to automatically place and move cameras [Halper et al. 2001;Lino and Christie 2012] as well as editing algorithms to automatically or interactively edit the shots of a movie [Elson and Riedl 2007;Lino et al. 2011b]. ...
... These approaches are mostly founded on what could be referred to as "action-based" camera control, in the sense that a typical idiom is associated to each action occurring in the 3D environment (an idiom is a stereotypical way of shooting the action, either through a single shot or a sequence of shots). A film is then constructed by computing the best sequence of shots portraying a sequence of actions performed by the characters (as in [Elson and Riedl 2007;Lino et al. 2011b;Lino et al. 2011a;Markowitz et al. 2011]). ...
... Idiom-based techniques [Christianson et al. 1996] would typically fail due to the inability to handle complex situations and the necessity to design idioms for many different actions and situations. Finally, optimization-based approaches such as [Elson and Riedl 2007] require the manual specification of cinematographic patterns for each situation, while [Lino et al. 2011a] maps actions to shot preferences in a straightforward way. ...
Article
Full-text available
This paper presents a system that generates cinematic replays for dialogue-based 3D video games. The system exploits the narrative and geometric information present in these games and automatically computes camera framings and edits to build a coherent cinematic replay of the gaming session. We propose a novel importance-driven approach to cinematic replay. Rather than relying on actions performed by characters to drive the cinematography (as in idiom-based approaches), we rely on the importance of characters in the narrative. We first devise a mechanism to compute the varying importance of the characters. We then map importances of characters with different camera specifications, and propose a novel technique that (i) automatically computes camera positions satisfying given specifications, and (ii) provides smooth camera motions when transitioning between different specifications. We demonstrate the features of our system by implementing three camera behaviors (one for master shots, one for shots on the player character, and one for reverse shots). We present results obtained by interfacing our system with a full-fledged serious game (Nothing for Dinner) containing several hours of 3D animated content.
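
As a loose illustration of the importance-driven idea (not the paper's actual mechanism), the following sketch maps per-character importance values to one of the three camera behaviors the system implements.

```python
# Hypothetical sketch: choose among the three implemented camera behaviors
# (master shot, player shot, reverse shot) from per-character importances.
def pick_behavior(importances, player="player"):
    """importances: dict character -> value in [0, 1] for the current beat."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    top = ranked[0]
    if len(ranked) > 1 and importances[ranked[1]] > 0.8 * importances[top]:
        return ("master_shot", ranked[:2])   # two near-equal leads: frame both
    if top == player:
        return ("player_shot", [top])        # focus the player character
    return ("reverse_shot", [top, player])   # speaker vs. player exchange
```
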
... This work offers interesting perspectives yet remains limited by its real-time constraints and the binary aspect of its cost functions. With the Cambot system, Elson and Riedl [ER07] also addressed the challenge of automatic editing using an optimization-based approach. Unlike standard idiom-based solutions, they encoded three distinct layers (or facets) of cinematic knowledge (see Figure 2.24): the stage (an area of space that the system assumes to be free of occlusions and obstructions), the blocking (the placement and movements of the characters within the stage), and the shot (the position, orientation and focal length of a virtual camera relative to the stage; it also handles camera motion). ...
... However, evaluating the quality of film editing (whether generated by machines or by artists) is a notoriously difficult problem [LRGG14]. Some contributions mention heuristics for choosing between multiple editing solutions without further details [CAH*96], while others minimize a cost function that is insufficiently described to be reproduced [ER07]. Furthermore, the precise timing of cuts has not been addressed, nor has the problem of controlling the rhythm of cutting (number of shots per minute) and its role in establishing film tempo [ADV02]. ...
... Idiom-based techniques [CAH*96] would typically fail due to the inability to handle complex situations and the necessity to design idioms for many different actions and situations. Finally, optimization-based approaches such as [ER07] require the manual specification of cinematographic patterns for each situation, while [LCCR11] maps actions to shot preferences in a straightforward way. ...
Thesis
Full-text available
The wide availability of high-resolution 3D models and the facility to create new geometrical and animated content, using low-cost input devices, opens to many the possibility of becoming digital 3D storytellers. To date there is however a clear lack of accessible tools to easily create the cinematography (positioning and moving the cameras to create shots) and perform the editing of such stories (selecting appropriate cuts between the shots created by the cameras). Creating a movie requires knowledge of a significant number of empirical rules and established conventions. In particular continuity editing -- the creation of a sequence of shots ensuring visual continuity -- is a complex endeavor. Most 3D animation packages lack continuity editing tools, calling for automatic approaches that would, at least partially, support users in their creative process. In this thesis we address both challenges of automating cinematography and editing in virtual environments. In a first contribution we propose a system that relies on Reynolds' model of steering behaviors to control and locally coordinate a collection of camera agents in dynamic 3D environments. The second contribution consists of a novel optimization-based approach for automatically creating well-edited movies from a 3D animation. We propose an efficient solution through dynamic programming, by relying on a plausible semi-Markov assumption. The next contribution uses and extends our previous work to propose a novel importance-driven approach to cinematic replay that exploits both the narrative and geometric information in games to automatically compute camera paths and edits. Finally, our last contribution addresses camera control issues by constraining motion on camera rails to ensure realistic shots.
... In attempts to replace this manual endeavor, different techniques have been proposed in the literature to automatically compute cinematographic sequences by relying on film-tree representations [CAH*96, ER07], or evolutions of idiom-based representations [HCS96]. Film-tree representations encode a cinematographic sequence as a set of scenes, further decomposed into shots and into frames. ...
... A complete cinematographic process has been proposed by Elson and Riedl [ER07] that deals simultaneously with the tasks of blocking (placing the characters and the scene), shooting and editing. The authors rely on a combination of search and dynamic programming to identify the best shots and best blockings, by processing a library of canonical shots associated with canonical blockings. ...
... More formally, let $X$ be the set of all possible shot descriptions in a cinematographic language and let $E$ be the set of all event types existing in a given story. The task of any automated editing process as proposed in [HCS96] or [ER07] is to associate the best sequence of shots $\dot{s} = s_1, s_2, \cdots, s_N$, with $s_i \in X$, among all possible shot sequences $\hat{s}$, with a given sequence of events $\hat{e} = e_1, e_2, \cdots, e_N$, with $e_i \in E$ (see equation 1): $\dot{s} = \arg\max_{\hat{s}} \, p(\hat{s} \mid \hat{e})$ ...
Article
Automatically computing a cinematographically consistent sequence of shots over a set of actions occurring in a 3D world is a complex task which requires not only the computation of appropriate shots (viewpoints) and appropriate transitions between shots (cuts), but also the ability to encode and reproduce elements of cinematographic style. Models proposed in the literature, generally based on finite state machines or idiom-based representations, provide limited functionalities to build sequences of shots. These approaches are not designed to easily learn elements of cinematographic style, nor do they allow significant variations in style over the same sequence of actions. In this paper, we propose a model for automated cinematography that can compute significant variations in terms of cinematographic style, with the ability to control the duration of shots and the possibility to add specific constraints to the desired sequence. The model is parameterized in a way that facilitates the application of learning techniques. By using a Hidden Markov Model representation of the editing process, we demonstrate the possibility of easily reproducing elements of style extracted from real movies. Results comparing our model with state-of-the-art first-order Markovian representations illustrate these features, and the robustness of the learning technique is demonstrated through cross-validation.
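
The learning side of such an HMM can be as simple as estimating shot-class transition probabilities from annotated films. A minimal sketch, assuming integer shot-class labels and Laplace smoothing (the paper's model additionally handles shot durations and sequence constraints):

```python
import numpy as np

def fit_shot_transitions(sequences, n_shot_classes, alpha=1.0):
    """Estimate a first-order shot-transition matrix from annotated films.
    sequences: lists of integer shot-class labels; alpha: Laplace smoothing."""
    counts = np.full((n_shot_classes, n_shot_classes), alpha)
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    # Normalize each row into a probability distribution over next shot class.
    return counts / counts.sum(axis=1, keepdims=True)
```
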
... This overcomes the traditional limitations associated with a purely idiom-based system. Visibility is taken into account through the use of "stages", i.e. empty spaces with unlimited visibility, similar to [ER07]. Both systems use a simple algebra of "stages", i.e. intersections and unions of stages, allowing for very fast visibility computation against the static elements of the scene. ...
... From left to right: the methods covered in this survey differ in the required input, working from existing footage, existing animation, or an existing script. Top to bottom: we also distinguish between methods that work offline and methods that work in real-time. [Figure caption; the systems charted include ER07, RRE08, JY05, JY06, JY10, GCR*13, MCB15, GRLC15, and cinematic replay AWCO10.] ...
Article
Full-text available
Over the last forty years, researchers in computer graphics have proposed a large variety of theoretical models and computer implementations of a virtual film director, capable of creating movies from minimal input such as a screenplay or storyboard. The underlying film directing techniques are also in high demand to assist and automate the generation of movies in computer games and animation. The goal of this survey is to characterize the spectrum of applications that require film directing, to present a historical and up‐to‐date summary of research in algorithmic film directing, and to identify promising avenues and hot topics for future research.
... However, the framing and camera movement types they can handle are limited to only a small subset of the various types, meaning that their practical use is highly limited. Furthermore, previous studies mostly focus on designing a specific language model to generate the virtual camera layout [3,11,26,31,35], which hinders their use by novice users. ...
... One popular approach is to provide a shot specification through a high-level camera composition language that describes established filming styles and techniques. These languages, often delivered in the form of text, describe how the shot is composed and what the setup constraints are for virtual camera placement [11,23,26,31,35]. This method is direct and clear yet requires a certain degree of cinematic knowledge. ...
... Another series of papers pose video editing as a discrete optimization problem, solved using dynamic programming [12,21,13,22]. The key idea is to define the importance of each shot based on the narrative, and solve for the best set of shots that maximize viewer engagement. ...
... The key idea is to define the importance of each shot based on the narrative, and solve for the best set of shots that maximize viewer engagement. The work of Elson et al. [12] couples the twin problems of camera placement and camera selection; however, the details are insufficiently described to be reproduced. Meratbi et al. [22] employ a Hidden Markov Model for the editing process, where shot transition probabilities are learned from existing films. ...
Preprint
We present GAZED, eye GAZe-guided EDiting for videos captured by a solitary, static, wide-angle and high-resolution camera. Eye-gaze has been effectively employed in computational applications as a cue to capture interesting scene content; we employ gaze as a proxy to select shots for inclusion in the edited video. Given the original video, scene content and user eye-gaze tracks are combined to generate an edited video comprising cinematically valid actor shots and shot transitions, yielding an aesthetic and vivid representation of the original narrative. We model cinematic video editing as an energy minimization problem over shot selection, whose constraints capture cinematographic editing conventions. Gazed scene locations primarily determine the shots constituting the edited video. The effectiveness of GAZED against multiple competing methods is demonstrated via a psychophysical study involving 12 users and twelve performance videos.
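
To make the energy-minimization formulation concrete, the sketch below computes a hypothetical gaze-driven unary cost for one candidate shot in one frame; GAZED combines terms of this kind with penalties encoding editing conventions and optimizes over the whole sequence.

```python
import numpy as np

def gaze_shot_cost(gaze_xy, shot_rect, lam=1.0):
    """Unary cost of one candidate shot for one frame: the fraction of gaze
    points (N, 2 array) falling outside the shot's crop rectangle
    (x0, y0, x1, y1), so shots containing gazed locations are cheap.
    Hypothetical form, not the paper's exact energy term."""
    x0, y0, x1, y1 = shot_rect
    inside = ((gaze_xy[:, 0] >= x0) & (gaze_xy[:, 0] <= x1) &
              (gaze_xy[:, 1] >= y0) & (gaze_xy[:, 1] <= y1))
    return lam * (1.0 - inside.mean())
```
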
... However, while a video summarization algorithm makes binary decisions on whether to select a frame or not, an agent for 360 piloting needs to operate on a spatial space to steer the viewing angle to consider events of interest in a 360° video. On the other hand, in virtual cinematography, most camera manipulation tasks are performed within relatively simpler virtual environments [8,22,12,40] and there is no need to deal with viewers' perception difficulty because the 3-D positions and poses of all entities are known. However, a practical agent for 360 piloting needs to directly work with raw 360° videos. ...
... Finally, existing virtual cinematography works focused on camera manipulation in simple virtual environments/video games [8,22,12,40] and did not deal with the perception difficulty problem. [14,56,7,6] relaxed the assumption and controlled virtual cameras within restricted static wide field-of-view video of a classroom, video conference, or basketball court, where objects of interest could be easily extracted. ...
Article
Full-text available
Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements. To relieve the viewer from this "360 piloting" task, we propose "deep 360 pilot" -- a deep learning-based agent for piloting through 360° sports videos automatically. At each frame, the agent observes a panoramic image and has the knowledge of previously selected viewing angles. The task of the agent is to shift the current viewing angle (i.e. action) to the next preferred one (i.e., goal). We propose to directly learn an online policy of the agent from data. We use the policy gradient technique to jointly train our pipeline: by minimizing (1) a regression loss measuring the distance between the selected and ground truth viewing angles, (2) a smoothness loss encouraging smooth transitions in viewing angle, and (3) maximizing an expected reward of focusing on a foreground object. To evaluate our method, we build a new 360-Sports video dataset consisting of five sports domains. We train domain-specific agents and achieve the best performance on viewing angle selection accuracy and transition smoothness compared to [51] and other baselines.
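
A rough sketch of how the three training signals could be combined into a single objective (weights and tensor shapes are assumptions, not the paper's implementation):

```python
import numpy as np

def pilot_objective(pred, gt, reward, w_smooth=0.5, w_reward=1.0):
    """Combined training signal: regression loss to ground-truth viewing
    angles, smoothness loss on consecutive predictions, minus an expected
    reward for keeping the foreground object framed.
    pred/gt: (T, 2) pan-tilt angles; reward: (T,) per-frame focus reward."""
    regression = np.mean(np.sum((pred - gt) ** 2, axis=1))
    smoothness = np.mean(np.sum(np.diff(pred, axis=0) ** 2, axis=1))
    return regression + w_smooth * smoothness - w_reward * reward.mean()
```
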
... To get closer to how real movies are built, the additional control of the staging has also been explored. Elson and Riedl [ER07] proposed Cambot, a system to assist users in machinima production. They defined three different levels (or facets) of cinematic knowledge (see Figure 2.21): the stage (area of space that is assumed to be free of occlusion and obstruction), the blocking (geometric placement of subjects w.r.t. the center point of a stage), and the shot (position, rotation, and focal length of a virtual camera w.r.t. the center point of a stage). ...
... As displayed in the table, our approach satisfies most of the properties required in an interactive storytelling system and stands in stark contrast. The closest contribution to our work (Elson & Riedl [ER07]) is not a reactive approach (the system runs offline) and therefore cannot handle interactivity, does not handle composition on the screen (all shots are pre-computed path sequences) and does not offer path-planning capacities for complex camera motions. Table 3.1 - Comparing our cinematography system to the main contributions in the domain. ...
Article
Full-text available
Virtual camera control is nowadays an essential component in many computer graphics applications. Despite its importance, current approaches remain limited in their expressiveness, interactive nature and performances. Typically, elements of directorial style and genre cannot be easily modeled nor simulated due to the lack of simultaneous control in viewpoint computation, camera path planning and editing. Second, there is a lack in exploring the creative potential behind the coupling of a human with an intelligent system to assist users in the complex task of designing cinematographic sequences. Finally, most techniques are based on computationally expensive optimization techniques performed in a 6D search space, which prevents their application to real-time contexts. In this thesis, we first propose a unifying approach which handles four key aspects of cinematography (viewpoint computation, camera path planning, editing and visibility computation) in an expressive model which accounts for some elements of directorial style. We then propose a workflow allowing to combine automated intelligence with user interaction. We finally present a novel and efficient approach to virtual camera control which reduces the search space from 6D to 3D and has the potential to replace a number of existing formulations.
... Research on automatic film-making has been conducted for many years. More specifically, the problem of automatic film-editing has been addressed several times with different approaches [HCS96, ER07, GRLC15]. This paper presents a comparative analysis of human-made editing and automatically computed editing. ...
... Another approach consists of considering film-editing as an optimization problem. The Cambot system, presented in [ER07], optimizes editing using heuristics for shot selection and cuts. Though novel and efficient, this work does not account for pacing and does not provide any details on the heuristics used for the optimization. ...
Conference Paper
Full-text available
Through a precise 3D animated reconstruction of a key scene in the movie "Back to the Future", directed by Robert Zemeckis, we are able to make a detailed comparison of two very different versions of editing. The first version closely follows film editor Arthur Schmidt's original sequence of shots as cut in the movie. The second version is automatically generated using our recent algorithm [GRLC15] with the same choice of cameras. A shot-by-shot and cut-by-cut comparison demonstrates that our algorithm provides a remarkably pleasant and valid solution, even in such a rich narrative context, while differing from the original version more than 60% of the time. Our explanation is that our version avoids stylistic effects while the original version favors such effects and uses them effectively. As a result, we suggest that our algorithm can be thought of as a baseline ("film-editing zero degree") for future work on film-editing style.
... The term 'machinima' denotes a combination of the terms 'machine' and 'cinema' and hence mostly refers to cases where animated content is generated with a certain degree of automation. X. Zhang and Hu (2012) and Elson and Riedl (2007) use this term and present approaches for placing and animating virtual cameras. ...
Thesis
Full-text available
For recorded video content, researchers have proposed advanced concepts and approaches that enable the automatic composition and personalised presentation of coherent videos. This is typically achieved by selecting from a repository of individual video clips and concatenating a new sequence of clips based on some kind of model. However, there is a lack of generic concepts dedicatedly enabling such video mixing functionality for scenarios based on live video streams. This thesis aims to address this gap and explores how a live vision mixing process could be automated in the context of live television production, and, consequently, also extended to other application scenarios. This approach is coined the 'Virtual Director' concept. The name of the concept is inspired by the decision making processes which human broadcast TV directors are conducting when vision mixing live video streams stemming from multiple cameras. Understanding what is currently happening in the scene, they decide which camera view to show, at what point in time to switch to a different perspective, and how to adhere to cinematographic and cinematic paradigms while doing so. While the automation of vision mixing is the focus of this thesis, it is not the ultimate goal of the underlying vision. To automate for many viewers in parallel in a scalable manner allows taking decisions for each viewer or groups of viewers individually. To successfully do so allows moving away from a broadcast model where every viewer gets to see the same output. Particular content adaptation and personalisation features may provide added value for users. Preferences can be expressed dynamically, enabling interactive media experiences. In the course of this thesis, Virtual Director research prototypes are developed for three distinct application domains. Firstly, for distributed theatre performance, a script-based approach and a set of software tools are designed. A basic approach for the decision making process and a pattern how to decouple it into two core components are proposed. A trial validates the technology which does not implement full automation, yet successfully enables a theatre play. The second application scenario is live event 'narrowcast', a term used to denote the personalised equivalent to a 'broadcast'. In the context of this scenario, several computational approaches are considered for the implementation of an automatic Virtual Director with the conclusion to use and recommend a combination of (complex) event processing engines and event-condition-action (ECA) rules to model the decision making behaviour. Several content genres are subject to experimentation. Evaluation interviews provide detailed feedback on the specific research prototypes as well as the Virtual Director concept in general. In the third application scenario, group video communication, the most mature decision making behaviour is achieved. This behaviour needs to be defined in what can be a challenging process and is formalised in a model that is referred to as the 'production grammar'. The aforementioned pattern is realised such that a 'Semantic Lifting' process is processing low-level cue information in order to derive in more abstract, higher-level terms what is currently happening in the scene. The output of the Semantic Lifting process is informing and triggering the second process which is called the 'Director' decision making and eventually takes decisions on how to present the available content on screens. 
Overall, the exploratory research on the Virtual Director concept resulted in its successful application in the three domains, validated by stakeholder feedback and a range of informal and formal evaluation efforts. As a synthesis of the research in the three application scenarios, the thesis includes a detailed description of the Virtual Director concept. This description is contextualised by many detailed learnings that are considered relevant for both scholars and practitioners regarding the development of such technology.
... To this day, AI has been able to compose music (Cope, 2015; De Mantaras & Arcos, 2002), produce visual art (Cohn, 2018) and write literary works such as poems and novels (Liu, Fu, Kato, & Yoshikawa, 2018). In addition, creative decisions in filmmaking (Elson & Riedl, 2007; ScriptBook, 2018) or other organizational fields have been performed by AI-based systems (Aleksander, 2017; Anderson, Rainie, & Luchsinger, 2018; Schwartz, Hagel, Wooll, & Monahan, 2019). One of the leading researchers in computational creativity, Margaret Boden, argues that research on computational creativity has helped towards a better understanding of creativity and that combinatorial and transformational exploration can be performed by computers (Boden, 2009). ...
Article
Full-text available
This article shows how the creative performance of start-ups or established organizations can be improved through the use of AI-based systems for actively promoting creative processes. With insights from two studies conducted with entrepreneurs, innovation managers and workshop facilitators, we provide recommendations for companies and entrepreneurs on the ability of AI to support creative potential to remain innovative and marketable in the long term. Our studies cover aspects such as AI for entrepreneurial activities or creativity workshops and show how to make use of AI-based systems to enhance the creative potential of the person, the process or the press (environment). Our findings also provide theoretical insights into the perception of AI as an equal partner and call for further research on the design of AI for the future creative workplace.
... Machinima, on the other hand, aims to create non-interactive videos based on existing assets and systems, typically from games [12]. These videos are created using scripting and tools [13], and can be augmented by generative systems, including for cinematography [3] or (interactive) narrative [2]. ...
Chapter
The overall goal of VRN is to develop a novel technology solution at Children's Hospital Los Angeles (CHLA) to overcome barriers that prevent the recruitment of diverse patient populations to clinical trials by providing both caregivers and children with an interactive educational experience. This system consists of 1) an intelligent agent called Zippy that users interact with by keyboard or voice input, 2) a series of videos covering topics including Privacy, Consent and Benefits, and 3) a UI that guides users through all available content. Pre- and post-questionnaires assessed willingness to participate in clinical research and found that participants either increased or maintained their level of willingness to participate in research studies. Additionally, qualitative analysis of interview data revealed participants rated the overall interaction favorably and believed Zippy to be more fun, less judgmental and less threatening than interacting with a human. Future iterations are in progress based on the user feedback.
... Due to the very artistic and subjective nature of cinematography, automated cinematic camera systems are often designed with a huge degree of human input. Bridging this gap is Cambot, a lightweight system developed by Elson and Riedl to mimic real filmmaking processes in virtual environments [13]. Cambot is a scriptable cinematic camera system with built-in knowledge of a set of standard camera movements, such as wide shots and shot/reverse-shots for character dialog. ...
Article
E-sports is currently estimated to be a billion dollar industry which is only growing in size from year to year. However, the cinematography of spectated games leaves much to be desired. In most cases, the spectator either gets to control their own freely-moving camera or they get to see the view that a specific player sees. This thesis presents a system for the generation of cinematically pleasing views for spectating real-time graphics applications. A custom real-time engine has been built to demonstrate the effect of this system on several different game modes with varying visual cinematic constraints, such as the rule of thirds. To create the cinematic views, we encode cinematic rules as cost functions that are fed into a non-linear least squares solver. These cost functions rely on the geometry of the scene, minimizing residuals based on the 3D positions and 2D reprojections of the geometry. The final cinematic view is found by altering camera position and angle until a local minimum is met. The system was evaluated by comparing video output from a traditional rigidly constrained camera and the results of our algorithm's optimally solved views. User surveys are then used to qualitatively evaluate the system. The results of these surveys do not find a statistically significant preference between the cinematic views and the rigidly constrained views. In addition, we present performance and timing considerations for the system, reporting that the system can operate within modern expectations of latency when enough constraints are placed on the non-linear least squares solver.
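
The cost-function formulation can be illustrated with an off-the-shelf solver. The sketch below, using scipy rather than the thesis's custom engine, encodes a rule-of-thirds residual: each subject's projection is pulled toward the nearest thirds intersection. The camera parameterization and scene are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

# Normalized image coordinates of the four rule-of-thirds intersections.
THIRDS = np.array([[1/3, 1/3], [1/3, 2/3], [2/3, 1/3], [2/3, 2/3]])

def project(cam, point, f=1.0):
    """Pinhole projection; cam = [x, y, z, yaw, pitch] (hypothetical
    parameterization). Returns normalized image coordinates."""
    pos, yaw, pitch = cam[:3], cam[3], cam[4]
    cy, sy, cp, sp = np.cos(yaw), np.sin(yaw), np.cos(pitch), np.sin(pitch)
    R = np.array([[cy, 0, -sy],
                  [sy * sp, cp, cy * sp],
                  [sy * cp, -sp, cy * cp]])
    p = R @ (np.asarray(point) - pos)
    return 0.5 + f * p[:2] / p[2]            # assumes the point stays in front

def thirds_residuals(cam, subjects):
    """Two residuals per subject: 2D offset to its nearest thirds point."""
    res = []
    for s in subjects:
        uv = project(cam, s)
        target = THIRDS[np.argmin(np.linalg.norm(THIRDS - uv, axis=1))]
        res.extend(uv - target)
    return res

# Example: solve for a camera pose placing two subjects on thirds points.
subjects = [np.array([0.5, 1.7, 5.0]), np.array([-0.5, 1.7, 5.0])]
fit = least_squares(thirds_residuals, x0=np.zeros(5), args=(subjects,))
```
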
... It is with this technology that a cinematic product is produced. Machinima is a lower-cost alternative for producing animation [2], but producing a good cinematic product naturally requires research into the control of virtual cameras. In the real world, a director often uses a storyboard to visualize their ideas so that they can be understood by the animators or camera operators [3]. ...
Article
Full-text available
Computer technology is widely used not only for research and education but also in entertainment. Computer-based entertainment includes computer games and animation, and one of their supporting components is machinima. Machinima is a technology that places cinematic components inside a virtual world. One controllable component is the placement of the camera. Directors can be distinguished by their directing styles, among other things by where they place the camera. Applying a particular directing style to a game or animation can produce a different atmosphere. This study attempts to profile a director's style based on habitual camera placement, using a fuzzy-logic approach. It uses 19 input variables extracted from simulation data and 5 output variables to profile two different directing styles. Area plots and histograms are produced to make it easier to distinguish the directors' styles, which were successfully differentiated based on the mode of the histogram analysis.
... A cinematic product can be produced using this technology. Moreover, machinima is a low-cost alternative to full-production filmmaking [2]. Nevertheless, producing a higher-quality cinematic product requires research on improving camera control languages, incorporating style into camera placement, and related topics. ...
Article
Full-text available
Machinima is a computer imaging technology typically used in games and animation. It prints all movie cast properties into a virtual environment by means of camera positioning. Since cinematography is complementary to machinima, it is possible to simulate a director's style via various camera placements in this environment. In a gaming application, the director's style is one of the most impressive cinematic factors, where a whole different gaming experience can be obtained using different styles applied to the same scene. This paper describes a system capable of automatically profiling a director's style using fuzzy logic. We employed 19 output variables and 15 other calculated variables from the animation extraction data to profile two different directors' styles across five scenes. Area plots and histograms were generated, and, by analyzing the histograms, the different directors' styles could be subsequently classified.
... Film idioms and their associated storytelling meanings are encoded, and a planner chooses the best camera position for each event, while optimizing transitions between shots to improve the fluency of the whole sequence. Elson and Riedl [8] adopted the idea of blockings from film practice, which involves the absolute placement of a number of characters and a camera in a non-occluded environment. The database of blockings, stages, and shots can be expanded. ...
Article
Full-text available
This article introduces Film Editing Patterns (FEP), a language to formalize film editing practices and stylistic choices found in movies. FEP constructs are constraints, expressed over one or more shots from a movie sequence, that characterize changes in cinematographic visual properties, such as shot sizes, camera angles, or the layout of actors on the screen. We present the vocabulary of the FEP language, introduce its usage in analyzing styles from annotated film data, and describe how it can support users in the creative design of film sequences in 3D. More specifically, (i) we define the FEP language, (ii) we present an application to craft filmic sequences from 3D animated scenes that uses FEPs as a high-level means to select cameras and perform cuts between cameras that follow best practices in cinema, and (iii) we evaluate the benefits of FEPs by performing user experiments in which professional filmmakers and amateurs had to create cinematographic sequences. The evaluation suggests that users generally appreciate the idea of FEPs, and that it can effectively help novice and medium-experienced users in crafting film sequences with little training.
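
An FEP-style constraint can be read as a predicate over a sequence of annotated shots. A minimal sketch with a hypothetical annotation scheme (the actual FEP vocabulary is far richer):

```python
# Hypothetical FEP-like constraint: a shot/reverse-shot exchange whose shot
# sizes never decrease, i.e. the framing intensifies as the scene progresses.
SIZES = ["long", "medium", "closeup"]

def matches_intensify(shots):
    """shots: list of dicts with 'size' and 'side' ('left'/'right') keys."""
    sides_alternate = all(a["side"] != b["side"]
                          for a, b in zip(shots, shots[1:]))
    sizes_grow = all(SIZES.index(a["size"]) <= SIZES.index(b["size"])
                     for a, b in zip(shots, shots[1:]))
    return sides_alternate and sizes_grow
```
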
... For camera control various interactive tools exist to control virtual cameras [1,14]. Others also tried to make previs more accessible by automatically compiling scripts to previs shots [5]. Previs tasks mainly require interaction with complex 3D content, which comprises special challenges. ...
Conference Paper
Full-text available
Previsualization (previs) is an essential phase in the design process of narrative media such as film, animation, and stage plays. Digital previs can involve complex technical tasks, e.g. 3D scene creation, animation, and camera work, which require trained skills that are not available to all personnel involved in creative decisions for the production. Interaction techniques such as virtual reality (VR) enable users to interact with 3D content in a natural way compared to classical 2D interfaces. As a first step, we developed VR based prototypes and performed an exploratory user study to evaluate how non-technical professionals from the film, animation, and theater domain assess the use of VR for previs. Our results show that users were able to interact with complex 3D scenes after a short phase of familiarization and rated VR for previs as useful for their professional work.
... As our techniques change the camera orientation in the scene, our work relates to prior work in virtual cinematography which studies virtual camera control in rendered scenes [22,16], and automatic real-world camera control for use in remote meetings or lectures [30,17]. Our problem differs in that we consider already edited 360 • videos. ...
Conference Paper
Virtual reality filmmakers creating 360-degree video currently rely on cinematography techniques that were developed for traditional narrow field of view film. They typically edit together a sequence of shots so that they appear at a fixed orientation irrespective of the viewer's field of view. But because viewers set their own camera orientation they may miss important story content while looking in the wrong direction. We present new interactive shot orientation techniques that are designed to help viewers see all of the important content in 360-degree video stories. Our viewpoint-oriented technique reorients the shot at each cut so that the most important content lies in the viewer's current field of view. Our active reorientation technique lets the viewer press a button to immediately reorient the shot so that important content lies in their field of view. We present a 360-degree video player which implements these techniques and conduct a user study which finds that users spend 5.2-9.5% more time viewing the important points (manually labelled) of the scene with our techniques compared to the traditional fixed-orientation cuts. In practice, 360-degree video creators may label important content, but we also provide an automatic method for determining important content in existing 360-degree videos.
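
The core of the viewpoint-oriented technique reduces to computing, at each cut, the yaw offset that brings the shot's important content into the viewer's current gaze direction. A one-function sketch (the angle convention is an assumption):

```python
def reorient_yaw(important_yaw, viewer_yaw):
    """Viewpoint-oriented cut (sketch): return the yaw offset, wrapped to
    [-180, 180), that rotates the incoming shot so its important content
    (important_yaw, degrees) lands at the viewer's gaze direction."""
    return (viewer_yaw - important_yaw + 180.0) % 360.0 - 180.0
```
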
... The machinima production tool Cambot, developed by [9], implements a library of "blockings", which are stage plans (the scene in the 3D environment) specifying where characters are placed and how they move around the stage. Cameras are then placed by looking for shots that are suitable for a given stage and blocking. ...
Conference Paper
Full-text available
In this work we design a tool for creators of interactive stories to explore the effect of applying camera patterns to achieve high level communicative goals in a 3D animated scenario. We design a pattern language to specify high level communicative goals that are translated into simple or complex camera techniques and transitions, and then flexibly applied over a sequence of character actions. These patterns are linked to a real-movie shot specification database through elements of context such as characters, objects, actions, and emotions. The use of real movies provides rich context information of the film, and allows the users of our tool to replicate the feel and emotion of existing film segments. The story context, pattern language and database are linked through a decision process that we call the virtual director, who reasons on a given story context and communicative goals, translates them into the camera patterns and techniques, and selects suitable shots from the database. Our real-time 3D environment gives the user the freedom to observe and explore the effect of applying communicative goals without worrying about the details of actor positions, scene, or camera placement.
... However, most of them only account for the framing of a single shot and neglect the semantics over multiple shots. Offline approaches can better account for complex story sequences, but are unable to react to dynamic gaming environments [ER07] [GRLC15]. Often the vocabulary of camera idioms defined by these systems is not easily user-extensible. ...
Article
Full-text available
The definitive version is available at http://diglib.eg.org/
... Virtual Cinematography: The majority of the work has focused on active and interactive cinematography, where one has control over the position of the cameras and other parameters such as lighting conditions [34,35]. We are mainly concerned with a passive case where multiple videos of a scene exist, and one needs to decide which camera to use for each frame. ...
Conference Paper
Full-text available
In this paper, we study the problem of recognizing human actions in the presence of a single egocentric camera and multiple static cameras. Some actions are better presented in static cameras, where the whole body of an actor and the context of actions are visible. Some other actions are better recognized in egocentric cameras, where subtle movements of hands and complex object interactions are visible. In this paper, we introduce a model that can benefit from the best of both worlds by learning to predict the importance of each camera in recognizing actions in each frame. By joint discriminative learning of latent camera importance variables and action classifiers, our model achieves successful results in the challenging CMU-MMAC dataset. Our experimental results show significant gain in learning to use the cameras according to their predicted importance. The learned latent variables provide a level of understanding of a scene that enables automatic cinematography by smoothly switching between cameras in order to maximize the amount of relevant information in each frame.
Article
Full-text available
As video games have developed and spread alongside technology, they have moved beyond playability and given rise to new methods of storytelling. One of these methods, machinima, emerged as a hybrid film genre through the application of animation and cinema techniques in 3D game environments. A review of the relevant literature, however, shows that a true definition of this method is difficult to pin down, since it is an intermediate form with blurred, constantly shifting boundaries. In this study, machinima, which has continued to develop since the 1990s, is treated as a transformative technical practice that, while making use of different media tools, causes those tools to evolve through video games and related concepts. Through this technique, the player/viewer, who by the interactive nature of games had been drawn into interactive designs and worlds and taken on an active role, is rendered passive and immobile once again. To understand the transformation that leads to this evolutionary change of role, the debates surrounding the emergence and definition of machinima are discussed first. The aim is thus to reveal the relationship between machinima and the shift of film production environments toward the virtual, and the reflections of this transformation process in game studies. In this context, the possibilities offered by video game and machinima engines such as MovieStorm, iClone, NVIDIA Omniverse Machinima and Unreal 4 are examined, and the effect of machinima on audiovisual storytelling is evaluated. In conclusion, the possibility of creating cinematic works using machinima engines is assessed as an approach that will change the future of filmmaking and traditional communication devices, as well as the boundaries of reality.
Article
Virtual cinematography refers to automatically selecting a natural-looking normal field-of-view (NFOV) from an entire 360° video. In fact, virtual cinematography can be modeled as a deep reinforcement learning (DRL) problem, in which an agent makes actions related to NFOV selection according to the environment of 360° video frames. More importantly, we find from our data analysis that the selected NFOVs attract significantly more attention than other regions, i.e., the NFOVs have high saliency. Therefore, in this paper, we propose an attention-based DRL (A-DRL) approach for virtual cinematography in 360° video. Specifically, we develop a new DRL framework for automatic NFOV selection with the input of both the content and the saliency map of each 360° frame. Then, we propose a new reward function for the DRL framework in our approach, which considers the saliency values, ground-truth, and smooth transition for NFOV selection. Subsequently, a simplified DenseNet (called Mini-DenseNet) is designed to learn the optimal policy via maximizing the reward. Based on the learned policy, the actions of NFOV can be made in our A-DRL approach for virtual cinematography of 360° video. Extensive experiments show that our A-DRL approach outperforms other state-of-the-art virtual cinematography methods over the datasets of Sports-360 video and Pano2Vid.
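
A reward of the kind described, combining saliency, ground-truth proximity, and smooth transitions, might look like the following sketch (weights and coordinate conventions are assumptions, not the paper's definition):

```python
import numpy as np

def adrl_reward(action_xy, gt_xy, saliency, prev_xy, w=(1.0, 1.0, 0.5)):
    """Sketch of a saliency-aware reward: saliency at the chosen NFOV center,
    proximity to the ground-truth center, and a smooth transition from the
    previous center. saliency: 2D map indexed (row, col); weights hypothetical."""
    sal = saliency[int(action_xy[1]), int(action_xy[0])]
    gt_term = -np.linalg.norm(np.asarray(action_xy) - np.asarray(gt_xy))
    smooth = -np.linalg.norm(np.asarray(action_xy) - np.asarray(prev_xy))
    return w[0] * sal + w[1] * gt_term + w[2] * smooth
```
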
Article
We present prominent structures in video, a representation of visually strong, spatially sparse and temporally stable structural units, for use in video analysis and editing. With a novel quality measurement of prominent structures in video, we develop a general framework for prominent structure computation, and an efficient hierarchical structure alignment algorithm between a pair of videos. The prominent structural unit map is proposed to encode both binary prominence guidance and numerical strength and geometry details for each video frame. Even though the detailed appearance of videos could be visually different, the proposed alignment algorithm can find matched prominent structure sub-volumes. Prominent structures in video support a wide range of video analysis and editing applications including graphic match-cut between successive videos, instant cut editing, finding transition portals from a video collection, structure-aware video re-ranking, visualizing human action differences, etc.
Chapter
Deliberation-driven reflective sequences, or DDRSs, are cinematic idioms used by filmmakers to convey the motivations behind a character's adoption of a particular course of action in a story. We report on an experiment in which the cinematic generation system Ember was used to create a cinematic sequence with variants making different choices for DDRS use around a single decision point for a single character.
Chapter
Creating machine-generated cinematics currently requires either a significant amount of human authoring time or manually coding domain operators so that a rendering system can realize them. We present FireBolt, an automated cinematic realization system based on a declarative knowledge representation that supports both human and machine authoring of cinematics with reduced authoring and engineering workloads.
Article
We present Write-A-Video, a tool for the creation of video montage using mostly text-editing. Given an input themed text and a related video repository, either from online websites or personal albums, the tool allows novice users to generate a video montage much more easily than current video editing tools. The resulting video illustrates the given narrative, provides diverse visual content, and follows cinematographic guidelines. The process involves three simple steps: (1) the user provides input, mostly in the form of editing the text, (2) the tool automatically searches for semantically matching candidate shots from the video repository, and (3) an optimization method assembles the video montage. Visual-semantic matching between segmented text and shots is performed by cascaded keyword matching and visual-semantic embedding, which achieve better accuracy than alternative solutions. The video assembly is formulated as a hybrid optimization problem over a graph of shots, considering temporal constraints, cinematography metrics such as camera movement and tone, and user-specified cinematography idioms. Using our system, users without video editing experience are able to generate appealing videos.
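The shot-graph assembly step can be pictured as a dynamic program that picks one candidate shot per text segment while trading off matching quality against transition cost. The sketch below is a minimal illustration under assumed inputs (per-segment match costs and a pairwise transition-cost function); it is not the paper's actual optimizer, which also handles temporal constraints and idioms.

def assemble_montage(match_cost, trans_cost):
    """Pick one candidate shot per text segment by dynamic programming.

    match_cost[t][i] : cost of using candidate shot i for text segment t
                       (e.g. 1 minus a visual-semantic matching score).
    trans_cost(j, i) : cost of cutting from candidate j (segment t-1) to
                       candidate i (segment t), e.g. camera-movement or
                       tone discontinuity. Both inputs are assumptions.
    Returns the index of the chosen shot for each segment.
    """
    T = len(match_cost)
    # best[t][i]: minimal total cost of a montage ending with shot i at segment t.
    best = [list(match_cost[0])]
    back = [[None] * len(match_cost[0])]
    for t in range(1, T):
        row, ptr = [], []
        for i, mc in enumerate(match_cost[t]):
            costs = [best[t - 1][j] + trans_cost(j, i)
                     for j in range(len(best[t - 1]))]
            j_star = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[j_star] + mc)
            ptr.append(j_star)
        best.append(row)
        back.append(ptr)
    # Backtrack the optimal sequence of shots.
    i = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [i]
    for t in range(T - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    return path[::-1]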
Conference Paper
We propose an automatic virtual cinematography method that takes a continuous optimization approach. A suitable camera pose or path is determined automatically by computing the minima of an objective function encoding desired parameters, such as those common in live-action photography or cinematography. Multiple objective functions can be combined into a single optimizable function, which can be extended to model the smoothness of the optimal camera path using an active contour model. Our virtual cinematography technique can be used to find camera paths in either scripted or unscripted scenes, both with and without smoothing, at a relatively low computational cost.
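As a rough illustration of combining several photographic preferences into one optimizable function, the sketch below minimizes a weighted sum of distance, height, and look-at terms over a single camera pose with SciPy. The terms, weights, and Nelder-Mead solver are assumptions made for illustration, and the active-contour path smoothing mentioned above is omitted.

import numpy as np
from scipy.optimize import minimize

def camera_objective(pose, subject, desired_dist=3.0, desired_height=1.6,
                     w_dist=1.0, w_height=0.5, w_lookat=2.0):
    """Combined objective for one camera pose (x, y, z, pan, tilt).

    Each term mimics one photographic preference; the choice of terms
    and weights is illustrative, not the paper's exact functions.
    """
    x, y, z, pan, tilt = pose
    cam = np.array([x, y, z])
    to_subj = subject - cam
    dist = np.linalg.norm(to_subj)
    # Preferred shooting distance (e.g. roughly a medium shot).
    e_dist = (dist - desired_dist) ** 2
    # Preferred camera height (roughly eye level).
    e_height = (z - desired_height) ** 2
    # Keep the subject centered: compare viewing and look-at directions.
    view = np.array([np.cos(tilt) * np.cos(pan),
                     np.cos(tilt) * np.sin(pan),
                     np.sin(tilt)])
    e_lookat = 1.0 - view.dot(to_subj / max(dist, 1e-9))
    return w_dist * e_dist + w_height * e_height + w_lookat * e_lookat

subject = np.array([0.0, 0.0, 1.6])        # subject head position (assumed)
x0 = np.array([2.0, 2.0, 1.6, 0.0, 0.0])   # initial camera guess
res = minimize(camera_objective, x0, args=(subject,), method="Nelder-Mead")
# res.x holds the optimized (x, y, z, pan, tilt).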
Article
We present a system for efficiently editing video of dialogue-driven scenes. The input to our system is a standard film script and multiple video takes, each capturing a different camera framing or performance of the complete scene. Our system then automatically selects the most appropriate clip from one of the input takes, for each line of dialogue, based on a user-specified set of film-editing idioms. Our system starts by segmenting the input script into lines of dialogue and then splitting each input take into a sequence of clips time-aligned with each line. Next, it labels the script and the clips with high-level structural information (e.g., emotional sentiment of dialogue, camera framing of clip, etc.). After this pre-process, our interface offers a set of basic idioms that users can combine in a variety of ways to build custom editing styles. Our system encodes each basic idiom as a Hidden Markov Model that relates editing decisions to the labels extracted in the pre-process. For short scenes (<2 minutes, 8–16 takes, 6–27 lines of dialogue), applying the user-specified combination of idioms to the pre-processed inputs generates an edited sequence in 2–3 seconds. We show that this is significantly faster than the hours of user time skilled editors typically require to produce such edits, and that the quick feedback lets users iteratively explore the space of edit designs.
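To make the idiom-as-HMM idea concrete, the sketch below builds log-score tables for an invented "favor speaker close-ups, avoid jump cuts" idiom. The label names, scores, and structure are hypothetical stand-ins for the paper's learned HMM parameters.

def idiom_scores(lines, clips_per_line):
    """Build score tables for a toy 'speaker close-up, no jump cuts' idiom.

    lines          : one dict per dialogue line, e.g. {"speaker": "ANNA"}.
    clips_per_line : clips_per_line[t] is a list of candidate-clip dicts,
                     e.g. {"framing": "closeup", "subject": "ANNA"}.
    Returns (emission, transition) where
      emission[t][i]      scores clip i for line t, and
      transition[t][j][i] scores cutting from clip j (line t-1) to clip i.
    All labels and score values here are invented for illustration.
    """
    emission, transition = [], [None]  # no transition into the first line
    for t, line in enumerate(lines):
        em = []
        for clip in clips_per_line[t]:
            score = 0.0
            # Idiom part 1: favor a close-up on the current speaker.
            if clip["framing"] == "closeup" and clip["subject"] == line["speaker"]:
                score += 1.0
            em.append(score)
        emission.append(em)
        if t > 0:
            # Idiom part 2: discourage jump cuts, i.e. back-to-back clips
            # with the same framing.
            tr = [[-1.0 if prev["framing"] == cur["framing"] else 0.0
                   for cur in clips_per_line[t]]
                  for prev in clips_per_line[t - 1]]
            transition.append(tr)
    return emission, transition

A Viterbi pass over such tables (the same kind of dynamic program sketched for the shot graph above) then selects one clip per line of dialogue.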
Conference Paper
We introduce the novel task of Pano2Vid: automatic cinematography in panoramic 360° videos. Given a 360° video, the goal is to direct an imaginary camera to virtually capture natural-looking normal field-of-view (NFOV) video. By selecting “where to look” within the panorama at each time step, Pano2Vid aims to free both the videographer and the end viewer from the task of determining what to watch. Towards this goal, we first compile a dataset of 360° videos downloaded from the web, together with human-edited NFOV camera trajectories to facilitate evaluation. Next, we propose AutoCam, a data-driven approach to solve the Pano2Vid task. AutoCam leverages NFOV web video to discriminatively identify space-time “glimpses” of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories. Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, we show that our method successfully produces informative videos that could conceivably have been captured by human videographers.
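The discriminative "glimpse" scoring step can be pictured as training a classifier to separate features of real NFOV web-video clips from randomly sampled glimpses, then scoring every space-time glimpse of a new 360° video. The sketch below uses logistic regression purely as an illustrative stand-in; the feature extraction and the paper's actual scoring model are not reproduced here.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_glimpse_scorer(nfov_feats, random_feats):
    """Fit a toy capture-worthiness classifier.

    nfov_feats   : (N, F) features of real NFOV web-video clips (positives).
    random_feats : (M, F) features of randomly sampled glimpses (negatives).
    """
    X = np.vstack([nfov_feats, random_feats])
    y = np.r_[np.ones(len(nfov_feats)), np.zeros(len(random_feats))]
    return LogisticRegression(max_iter=1000).fit(X, y)

def score_glimpses(model, glimpse_feats):
    """Score every glimpse of a new 360° video.

    glimpse_feats : (num_timesteps, num_directions, F) feature array.
    Returns a (num_timesteps, num_directions) capture-worthiness map.
    """
    T, D, F = glimpse_feats.shape
    probs = model.predict_proba(glimpse_feats.reshape(-1, F))[:, 1]
    return probs.reshape(T, D)

A dynamic program over score(t, d), with a penalty for changing viewing direction d between consecutive steps, then recovers a smooth human-like trajectory, analogous to the montage-assembly sketch above.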
Chapter
This chapter provides a survey of research on the analysis and generation of narrative discourse that deals with effective presentation of story content through the visual medium. It starts with a theoretical grounding in narratology and cognitive science where the distinction between story and discourse is established. Theories of visual discourse that expand the notions of textual discourse to fit the analysis of visual narratives will be described. Finally, a discussion of automatic generation of coherent visual discourse in terms of viewpoint selection in virtual environments will be carried out.
Conference Paper
The popularity of machinima movies has increased greatly in recent years. From a transmedia point of view, there has been little development of tools to assist machinima production; existing tools are still mainly aimed at the gaming community and 3D animators. The tool developed here aims to bring the typical workflow of a conventional movie set into a machinima creation environment, expanding the possibilities for transmedia productions. With Mappets, a plugin for the Unity3D game engine, we enable a translation from the typical movie dimension to a virtual one. This work evaluates the current state of the art of machinima development tools and presents a working solution better suited to transmedia productions and to non-expert users interested in producing machinima.
Article
This paper proposes a physics-based model to simulate a reactive camera that is capable of both high-quality tracking of moving target objects and producing plausible, interactive responses to a variety of game scenarios. The virtual physical rig consists of a motorized pan-tilt head that is controlled to meet desired target look-at directions, as well as an active suspension system that stabilizes the camera assembly against disturbances. To showcase its differences from other camera systems, we contrast our physically based technique with the direct (kinematic) methods that are standard in industry.
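A minimal sketch of one PD-controlled axis of such a pan-tilt head is given below, assuming unit inertia and illustrative gains and torque limits; the paper's full rig, including the active suspension stage, is not reproduced.

def pan_tilt_step(angle, velocity, target_angle, dt,
                  kp=40.0, kd=8.0, max_torque=5.0):
    """One integration step of a PD-controlled motorized axis (pan or tilt).

    The gains, torque limit, and unit inertia are illustrative assumptions.
    """
    error = target_angle - angle
    torque = kp * error - kd * velocity              # PD control law
    torque = max(-max_torque, min(max_torque, torque))  # actuator limit
    velocity += torque * dt                          # unit inertia assumed
    angle += velocity * dt                           # semi-implicit Euler
    return angle, velocity

Stepping this toward a moving look-at target produces the slight lag and overshoot that distinguish a physically simulated camera from a purely kinematic one.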
Article
I review my research activities in Video Indexing and Action Recognition and sketch a research agenda for bringing those two lines of research together to address the difficult problem of recognizing actions in movies. I first present a series of older projects in Video Indexing, starting with the DIVAN project at INA and the MPEG expert group (1998–2000), and continuing at INRIA under the VIBES project (2001–2004). This research falls under the general approach of "computational media aesthetics", where we attempt to recognize film events based on our knowledge of filming styles and conventions (cinematography and editing). This is illustrated with two applications: the automatic segmentation of TV news into topics, and the automatic indexing of movies with their scripts. I then present my more recent research in Action Recognition with the MOVI group at INRIA (2005–2008). Building upon the GRIMAGE infrastructure, I present experiments in (1) learning and recognizing a small repertoire of full-body gestures in 3D using "motion history volumes"; (2) segmenting a raw stream of 3D image sequences into recognizable "primitive actions"; and (3) using statistical models learned in 3D to recover primitive actions and relative camera positions from a single 2D video recording of similar actions.