A Data Driven Approach to Generate Realistic 3D
Tree Barks
Aishwarya Venkataramanan¹, Antoine Richard², Cédric Pradalier³
Abstract
3D models of trees are ubiquitous in video games, movies, and
simulators. It is of paramount importance to generate high quality 3D
models to enhance the visual content, and increase the diversity of the
available models. In this work, we propose a methodology to create
realistic 3D models of tree barks from a consumer-grade hand-held
camera. Additionally, we present a pipeline that makes use of multi-
view 3D Reconstruction and Generative Adversarial Networks (GANs)
to generate the 3D models of the barks. We introduce a GAN referred
to as the Depth-Reinforced-SPADE to generate the surfaces of the
tree barks and the bark color concurrently. This GAN gives extensive
control on what is being generated on the bark: moss, lichen, scars, etc.
Finally, by testing our pipeline on different Northern-European trees
whose barks exhibit radically different color patterns and surfaces, we
show that our pipeline can be used to generate a broad panel of tree
species’ bark.
1 Introduction
Tree modeling is an important aspect of computer graphics: its wide range of
applications in computer-generated scenes and video games makes it a key-
element of today’s digital scene. Furthermore, the forestry industry, where
this project originates from, could benefit from high quality synthetic data to
increase the performances of data-driven, non-destructive quality assessment
techniques.
The computer graphics literature consists of various methods to generate
high quality trees, but only a handful focuses on modeling the tree barks
realistically [19, 38]. The difficulties in generating realistic tree barks arise
¹Georgia Institute of Technology, Atlanta, USA. Present address: Université de Lorraine, CNRS, LIEC, F-57000 Metz, France
²Georgia Institute of Technology, Atlanta, USA
³Georgia Tech Lorraine, CNRS IRL 2958, F-57070 Metz, France
from the complex appearance of the bark surfaces and colors. Within the same
tree species, barks often present a wide variety of patterns and intricate
details. Consequently, realistic bark modeling is an active field of research.
In the literature, the majority of the works [4,9,25, 26,35,39] use traditional
3D modeling methods to generate synthetic tree barks. In contrast to the
classical modeling methods, we explore a deep-learning-based methodology
to generate high quality 3D models of tree barks. More precisely, we propose
a methodology to generate realistic tree barks using 3D Reconstruction and
deep Generative Adversarial Networks (GANs) [8]. 3D Reconstruction is
used to generate tree barks of real trees, while GANs are trained on real tree
barks and used to generate synthetic barks. The GANs used here generate
tiles that need to be merged together to form a continuous map that is then
rolled onto a tree trunk. Because of the rolling and the tiling, there tends
to be a discontinuity at the place where the tiles meet. We alleviate this
using a GAN based tiling method. Furthermore, GANs offer a lot of control
on how the bark is generated, allowing the users to add scars, cut branches,
moss and many other details easily. All in all, the GANs proposed here allow
the users to modify the texture and features of the tree trunks to suit their
needs easily and realistically. Our proposed pipeline can be divided into the
following two steps:
• 3D reconstruction of real barks to obtain their geometry and color.
• GANs to generate intricate bark details and colors in order to create synthetic tree barks.
Using our method, we generate barks of Oak, Beech and Robinia trees that
have strongly contrasting structures, demonstrating the broad application
scope of our method. Furthermore, we showcase the robustness of our
method by training the GANs on a dataset acquired under various light-
ing conditions. Our experimental results show that our approach generates
realistic looking features for trees with both smooth and deeply ridged sur-
faces. When provided with smooth tree bark models with no surface features,
this method could be used to generate thousands of realistic looking 3D bark
models for several types of trees. Overall, our contributions are as follows:
• We propose a new pipeline for generating 3D models of realistic tree barks using 3D Reconstruction and GANs. This method offers extensive control over the generated bark, allowing the synthesis of features beyond bark patterns, such as moss, scars, etc.
• We demonstrate a method to automatically acquire a labelled dataset to perform deep-learning-based background removal on our trees.
• We introduce a GAN network, Depth-Reinforced-SPADE, that can generate multi-modal outputs and synthesizes the surface geometry information and the color of the barks simultaneously.
To the best of our knowledge, this is the first attempt at generating realistic
tree barks using deep learning.
The code is available at https://github.com/vaishwarya96/Tree-barks-generation.git and the dataset at https://www.dropbox.com/scl/fo/19mktx02p49eq2vxx1bf1/h?dl=0&rlkey=ghsaqiznjw5kgdnig4tot2jw5
2 Related Work
2.1 3D Modeling of Trees and Tree Barks
Numerous works have focused on generating complete tree models, including
the branches and the leaves. Amongst these, [39] uses tree-cuts from real
tree models to generate synthetic trees and [4,26, 35] use sketch-based model-
ing. [9,25] use a procedural approach, but they do not model the bark. Many
works [9, 20, 25, 29, 35] model trees or plants from images. [21] uses a com-
bination of GAN and procedural modeling to generate tree models. While
these methods have been successful in generating the overall structures of
the trees, they do not model the realistic bark textures which is important
for identifying the different tree species. Thus, our focus in this study is to
develop a tree-specie specific bark generator.
[6, 18, 36] model cracks and basic structures, but they are not capable
of imitating the bark patterns of a specific species. [2] proposes to use X-
ray images of a real bark to generate Maple bark structures. While this
method is interesting, it is cumbersome and requires X-ray images of trees
which are hard to come by. [38] uses texton analysis to generate barks from
a single photograph, and [19] uses procedural modeling to generate barks.
Contrary to these methods, we use deep learning to generate realistic tree
barks. A learning-based approach enables users to generate several unique
barks without much manual intervention. Additionally, the approach presented
here allows the user to control what will be generated on the trees' barks.
2.2 Generative Adversarial Networks
The GAN introduced in [8] uses plain neural networks to generate images.
However, such networks are limited to simple
datasets. To solve this, DCGAN [30] incorporates Deep Convolutional Neu-
ral Networks (CNN) in the generator and discriminator to generate complex
images. Using deep CNNs boosted the quality of the generated images com-
pared to using neural networks. Since then, novel loss functions [1, 23] and
deep CNN architectures [14–16] have been developed to stabilize the training
and to generate higher-quality images. For instance, [14] could achieve
high resolution images of human faces by training a model on images of
progressively increasing resolution.
With the introduction of pix2pix [12], conditional GANs have attracted
wide-spread attention due to their ability to generate good-quality images
from an input label. Pix2pix is an image-to-image translation network, where
the input label is an image and the network generates an image corresponding
to the input. Various image-to-image translation networks have been pro-
posed in [5, 27, 37, 40, 42] since the success of pix2pix. CycleGAN [42] uses a
cycle consistency loss to perform image-to-image translation on an unpaired
set of training data. StarGAN [5] makes use of mask vector method to trans-
late images from one domain to another, when the training data is from two
different domains. Self-Attention GAN [40] adds self-attention blocks to the
generator and discriminator to generate fine details from all feature locations.
Pix2pixHD [37] uses a multi-scale generator and discriminator to generate
high-resolution images from input images, while SPADE [27] uses a spa-
tially adaptive normalization to retain the semantic information throughout
the generator network while generating photo-realistic images from semantic
maps.
Additionally, as part of our pipeline, we have to tile the images generated
by the GANs to obtain a continuous image. The state-of-the-art GAN net-
work for tiling images is TileGAN [7]. However, TileGAN requires the images
to be generated by Progressive GAN [14], and it tends to modify large parts
of the original images to be tiled, which is not desirable for our objective.
Hence, we developed a GAN based on in-painting for our objective.
3 Generating Tree Barks
In the coming sections, we explain the pipeline to generate realistic tree
barks from sequences of images. The block diagram of our pipeline is given
in Fig. 1. Our pipeline is divided into two stages:
• In the first stage, we acquire the dataset, perform background suppression on the images, conduct 3D reconstruction, and extract the surface geometry and the color of the bark.
• In the second stage, we use GANs to generate the synthetic surface geometry and bark colors, and provide control over the generation of moss, scars, lichens, etc.
Figure 1: Our proposed pipeline for generating 3D models of real and syn-
thetic trees
3.1 3D Reconstruction, Surface Geometry and Color Extraction
3.1.1 Dataset Acquisition
As shown in Fig. 1, the first step in the pipeline is to capture image se-
quences of trees. To perform 3D reconstruction and triangulation, the image
sequences of trees must be captured from multiple camera positions. To make
our method affordable and as general as possible, we rely on consumer-grade
monocular cameras. Initial tests with smartphone cameras suffered from motion
blur due to hand jitter, which deteriorated the quality of the 3D reconstruction.
Hence, we use the DJI Osmo Pocket, an inexpensive gyro-stabilized camera.
Gyro-stabilization counteracts camera shake and yields sharper, better-quality images.
The videos were acquired by slow movement of the camera to avoid mo-
tion blur. First, the vertical portions of the trees were captured, followed by
a slight horizontal rotation of the camera. This sequence was repeated to
cover the entire circumference of the tree.
Since we capture the images using a single monocular camera, the global
scale information of the tree is not preserved in the final 3D point cloud. To
get the scale information, we attach square markers of 3 by 3 cm onto parts
of the trees while capturing the video. We place the markers onto the part
of the tree near the ground so that they do not obstruct the mid portion of
the trunk, which is required for extracting the surface geometry and color
information. Once the 3D point cloud of the reconstructed tree is obtained,
the known marker dimension is used to scale the tree accordingly.
3.1.2 Background Suppression
To perform the reconstruction, we rely on Pix4DMapper [28]: a feature-based
3D reconstruction tool. With this type of tool, removing the background
reduces the number of feature points, which not only makes the reconstruction
faster but also more accurate. Background segmentation reduces the risk
of confusing the object of interest with the associated background. This
is especially true in images, since they are 2D projections of the 3D world
with both object and background ending up on the same plane. Hence,
we developed a semi-supervised learning method to remove the background
portions of the images and retain only the tree barks. In this method we
first create a coarse dataset using traditional computer vision methods, and
then use this dataset to train a Deep Neural Network. The advantage of us-
ing standard computer vision algorithm to make a coarse dataset is that no
manual labeling is required. Considering that our images were acquired us-
ing a gyro-stabilized camera, we can assume that the transition between the
consecutive frames is gradual and smooth. Under this assumption, we de-
veloped a mechanism to automatically label the images as bark (foreground)
or non-bark (background), using optical flow.
To remove the background portions from the images, we use PSPNet
[41], a deep learning segmentation network. It is first trained on the CMU
dataset [31], and fine-tuned on our automatically acquired dataset.
From each of the consecutive image frames, we obtain the horizontal and
vertical flow vectors and use these to calculate the magnitude and direction
of the flow. As our images are acquired using a gyro-stabilized camera,
we can hypothesize the following: when the camera translates along the
tree, the direction of the vector is vertical and the magnitude of the flow
vectors is greater in the foreground than in the background, seeing that the
objects closer to the camera move faster than the far-away objects. On the
contrary, when the camera rotates, the magnitude of the flow is greater in
the background since the far-away objects move faster than the objects closer
to the camera. A sample of the flow magnitude when the camera translates
and rotates is shown in Fig. 2.
Figure 2: Magnitude of optical flow calculated on video sequences: (a) when
the camera translates, the magnitude of the flow is greater in the foreground;
(b) when the camera rotates, the magnitude of the flow is larger in the
background.
We use an adaptive threshold on the magnitude of the flow vectors to
mask the background portions in the images. We calculate the magnitude
of the maximum flow vector for each frame and compare it with the other
flow vectors for the same frame. When the camera translates vertically and
the magnitude of the flow vector is greater than 50% of the maximum
magnitude, we label those regions as foreground and the other regions as
background. The adaptive threshold method worked best when the camera
translates and failed in the rotation case. Hence, we use the labels obtained
when the camera translates and discard the frames in which it rotates. For a more robust
method to perform background suppression than optical flow, we train a
deep learning segmentation network.
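For illustration, the coarse labelling step described above can be sketched as follows. This is a minimal sketch assuming OpenCV's Farneback optical flow; the 50% threshold and the translate-only rule follow the text, while the frame pairing, the grayscale conversion and the vertical-motion test are illustrative choices rather than the exact implementation.

```python
# Minimal sketch of the coarse optical-flow labelling (assumptions noted above).
import cv2
import numpy as np

def coarse_bark_mask(prev_frame, next_frame, ratio=0.5):
    """Return a foreground (bark) mask, or None if the camera is rotating."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    moving = mag > 1e-3
    if not moving.any():
        return None
    # Keep the frame only when the dominant motion is roughly vertical,
    # i.e. the camera is translating along the trunk.
    if np.median(np.abs(np.sin(ang[moving]))) < 0.7:   # heuristic: rotation
        return None
    # Pixels whose flow exceeds 50% of the per-frame maximum are labelled bark.
    return (mag > ratio * mag.max()).astype(np.uint8)
```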
After training, the images extracted from all the trees’ videos are sent
through the network. The output from the segmentation network is a label
map with two classes: bark-regions and non-bark regions. Sometimes, the
segmentation network detects trees in the background and retains them.
To remove those trees, we find the largest connected region, which is the
tree bark to be reconstructed. This region is retained while the others are
masked. Finally, we dilate the final mask to include the edges of the tree
bark and apply the resulting mask onto the original images to suppress the
background regions, as shown in Fig. 3.

Figure 3: Pipeline used for suppressing the background regions in the tree
images. The images are passed through a segmentation network to remove
the background portions of the images. Then, the largest contour is detected
and retained. Finally, the resulting mask is used to acquire tree bark images
with the background portions erased.
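The mask post-processing shown in Fig. 3, keeping the largest connected bark region and dilating it, can be sketched as follows. This is a minimal illustration using OpenCV; the dilation kernel size is an assumption, not a value from the paper.

```python
# Sketch of the mask post-processing: largest connected bark region + dilation.
import cv2
import numpy as np

def clean_bark_mask(seg_mask, dilate_px=15):
    """seg_mask: uint8 array with 1 for bark pixels and 0 for background."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(seg_mask)
    if num <= 1:                                   # nothing was segmented as bark
        return seg_mask
    # Component 0 is the background; keep the largest remaining component.
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    mask = (labels == largest).astype(np.uint8)
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    return cv2.dilate(mask, kernel)                # include the bark edges

# Usage: masked_image = image * clean_bark_mask(seg_mask)[..., None]
```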
If we had acquired the images with a 3D camera, such as Kinect or Re-
alsense, this process would not have been necessary. However, these cameras
are cumbersome and require taking a laptop or small embedded computer to
record the trees, which is impractical in forests. Furthermore, the RGB and
depth images acquired by these cameras are of low quality, particularly in
outdoor conditions. Hence, if one wanted to extract the foreground objects
using depth information, we would favor a small 3D laser scanner, such as a
Livox Mid-40, whose measurements could be remapped to the camera's pixels.
3.1.3 3D Reconstruction
The images with the background portions suppressed are given to Pix4DMapper [28],
as it is faster and more accurate than Colmap [32, 33], to generate a dense
3D point cloud of the tree.
3.1.4 Surface Geometry and Color Extraction
Once the 3D dense point cloud of the tree is obtained, we extract the surface
geometry and the bark color from the point cloud. To do this, we approxi-
mate the 3D point cloud of the tree as a series of circles stacked vertically.
In cylindrical coordinates, we discretize the point cloud along the height
and the angle. The height is discretized in steps of 1 mm and the angle in steps of
0.5°. For each of the circles obtained from the discretized height, we obtain
the center point using a circular least-squares fit.
To extract the surface geometry information, we construct a 2D map in
polar coordinates, with the radius measured from the center point of the tree
as a function of the height and the angle. Since the height and angle are
discretized, there may be many points corresponding to a particular value of
height and angle. Thus, we take the mean value of the radius of these points
to construct the radius map. The mean radius is computed as a weighted sum
of the point radii divided by the sum of the weights, where each weight is a
Gaussian of the distance between the point and the center of the cell in the map.
Similarly, to extract the color of the bark, we calculate the mean values
of R, G and B as a function of height and angle, where instead of the radius,
the R, G, B values of the point are used.
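A simplified sketch of this extraction step is given below. It assumes the point cloud has already been centred per slice with the circular least-squares fit, bins each point into its nearest (height, angle) cell and weights it with a Gaussian of its offset to the cell centre; the value of sigma is an illustrative choice.

```python
# Simplified sketch of the radius/colour map extraction (Section 3.1.4).
import numpy as np

def build_maps(points, colors, dz=0.001, dtheta=np.deg2rad(0.5), sigma=0.5):
    """points: (N, 3) xyz in metres, centred per slice; colors: (N, 3) RGB in [0, 1]."""
    x, y, z = points.T
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    zi = ((z - z.min()) / dz).astype(int)              # height bin (1 mm steps)
    W = int(round(2 * np.pi / dtheta))                 # angle bins (0.5 degree steps)
    ti = np.clip((theta / dtheta).astype(int), 0, W - 1)
    H = zi.max() + 1
    # Gaussian weight of the offset to the cell centre (in cell units).
    dzc = (z - z.min()) / dz - (zi + 0.5)
    dtc = theta / dtheta - (ti + 0.5)
    w = np.exp(-(dzc**2 + dtc**2) / (2 * sigma**2))
    wsum = np.zeros((H, W)); rmap = np.zeros((H, W)); cmap = np.zeros((H, W, 3))
    np.add.at(wsum, (zi, ti), w)
    np.add.at(rmap, (zi, ti), w * r)
    np.add.at(cmap, (zi, ti), w[:, None] * colors)
    valid = wsum > 0
    rmap[valid] /= wsum[valid]
    cmap[valid] /= wsum[valid][:, None]
    return rmap, cmap, valid        # empty cells (valid == False) are filled later
```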
Sometimes, there may be cells in the map where there is no point. This
is much more prevalent in the case of oak and robinia trees as the ridges
on the bark surface create occlusions which are not reconstructed in the
point cloud. As the bark of the beech tree is much smoother, the occlusions
are minimal. The presence of cells with no point creates empty regions in
the final map. To avoid this, we construct a multi-level image pyramid.
The lowest level of the pyramid consists of the original-resolution map, and
the highest level consists of a map whose resolution is 1/2⁷ of the original map.
For each level in the pyramid, if there are cells with no point, the radius and
RGB values are interpolated from the map of the next level up. Fig. 4
demonstrates the need for interpolation. Fig. 4(a) is the oak bark before
interpolation; the black regions in the image are due to empty cells. After
interpolation, the empty cells are filled with values from the higher level in
the pyramid, as shown in Fig. 4(b).
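The pyramid-based hole filling can be sketched as follows; simple 2×2 block averaging and nearest-neighbour upsampling are used here as stand-ins for the paper's pyramid construction and interpolation.

```python
# Sketch of the pyramid hole filling: empty cells at each level are filled
# from a coarser version of the map, down to 1/2^7 of the original resolution.
import numpy as np

def fill_holes(level_map, valid, depth=7):
    """level_map: (H, W) float map; valid: (H, W) bool. Call once per channel."""
    if depth == 0 or valid.all():
        return level_map
    H, W = level_map.shape
    H2, W2 = H // 2, W // 2
    # Build the next (coarser) level by averaging the valid cells of 2x2 blocks.
    m = (level_map * valid)[:2 * H2, :2 * W2].reshape(H2, 2, W2, 2)
    v = valid[:2 * H2, :2 * W2].reshape(H2, 2, W2, 2)
    counts = v.sum(axis=(1, 3))
    coarse = m.sum(axis=(1, 3)) / np.maximum(counts, 1)
    coarse = fill_holes(coarse, counts > 0, depth - 1)
    # Fill the missing cells from the coarser level (nearest upsampling here).
    filled = level_map.copy()
    ys, xs = np.where(~valid)
    filled[ys, xs] = coarse[np.minimum(ys // 2, H2 - 1), np.minimum(xs // 2, W2 - 1)]
    return filled
```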
3.2 Synthetic Bark Surface and Color Generation
In this section, we will discuss the second stage of our pipeline: generating
synthetic surface and color maps of the barks using a conditional GAN (cGAN).
For each tree species, we train a separate GAN that specialises in the particular
tree type and generates the fine bark details corresponding to that tree species.

Figure 4: Example of an oak bark (a) before interpolation, (b) after interpolation
For training the GAN we smooth the ground truth real barks until no fine
bark details are present. We provide the smooth bark as input to the GAN
and the network learns to generate the fine bark details on this by using the
real barks with fine details as ground truth. Later, the user could provide
maps of smooth tree trunks to the network and use the trained GAN models
to generate thousands of synthetic barks.
To test our approach, we use empirical mathematical models to generate
thousands of smooth tree bark structures. These mathematical models are
approximations of the large variety of trees that have a cylinder-like structure
and do not accurately represent any tree in particular.
3.2.1 Generating Smooth Approximate Radius Maps
The smooth surface of the bark represents the overall structure of the tree
trunk without any finer structural details. The map consists of the radius values
r of the tree trunk as a function of the height z and the angle θ. To generate
these maps, we initially create a 2D map with an exponentially decaying radius r₀:

r₀ = a x^z        (1)

where x is a floating-point number in the range 0.8 to 0.9, and a is a random
value chosen between 0.1 and 0.9. To approximate the wavy structure of an
actual bark, we add a series of sinusoids of varying amplitudes and frequencies to r₀:

r = r₀ + Σᵢ₌₁ⁿ bᵢ sin(ωᵢ θᵢ)        (2)

where n is an integer chosen randomly between 1 and 3, and bᵢ is the amplitude
of the i-th sinusoid. ωᵢ = 2πfᵢ, where fᵢ is a value between 20 and 100; this
determines the frequency of the generated sinusoids. θᵢ is chosen based on
ωᵢ to ensure that r is 2π-periodic. The amplitudes and frequencies of the
sinusoids, and the value of a, are randomly chosen in order to generate unique
smooth tree barks every time. The parameter values are hand-tuned for each
tree species based on visual observations of the tree bark. For instance,
the beech trees are mostly cylindrical and hence bᵢ is set to a value close to
zero (range of 0 to 0.005). For oaks, bᵢ has a higher range of values (0.01
to 0.05). Similarly, one could adjust the range of radius values to fit the
tree type. For example, palm trees are mostly thin and so will lie in the
lower range of the radius. The number of parameters to be tuned is limited
(radius, amplitude of the sinusoids), so we could easily find a suitable range
of values for these parameters and use this generic model for our testing.
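A minimal sketch of this empirical model, following Eqs. (1) and (2), is given below. The grid resolution is arbitrary, and the 2π-periodicity is enforced here by drawing integer frequencies, which is one possible reading of the choice of θᵢ.

```python
# Sketch of the empirical smooth-trunk model of Eqs. (1) and (2).
import numpy as np

def smooth_radius_map(height_steps=1024, angle_steps=720, rng=np.random):
    z = np.linspace(0.0, 1.0, height_steps)[:, None]          # normalised height
    theta = np.linspace(0.0, 2 * np.pi, angle_steps, endpoint=False)[None, :]
    a = rng.uniform(0.1, 0.9)
    x = rng.uniform(0.8, 0.9)
    r0 = a * x ** z                                            # Eq. (1)
    n = rng.randint(1, 4)                                      # 1 to 3 sinusoids
    r = np.broadcast_to(r0, (height_steps, angle_steps)).copy()
    for _ in range(n):                                         # Eq. (2)
        b = rng.uniform(0.0, 0.005)        # beech-like; use 0.01-0.05 for oak
        f = rng.randint(20, 101)
        r += b * np.sin(f * theta)         # integer f keeps r 2*pi-periodic
    return r
```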
While we developed the smooth trunk models and parameters empirically,
one can also use one’s own models and use the pipeline to generate the fine
barks details.
Our GANs were trained using the real tree bark models that were smoothed,
but they still work well when tested on the empirically developed bark models,
which shows that the GANs are robust to the input smooth barks and can
generalise well. While they work well for most trees with a cylinder-like
structure, they would fail on non-cylindrical barks such as those of
buttress-root trees.
3.2.2 Generating Bark Surface and Color Using GANs
In this section, we explain the architecture that generates the surface and
color maps from the smooth surface maps. Apart from the ability to generate
barks, we also want to give the user the ability to generate moss, lichen, scars,
and any other particularities that would be labelled in the dataset, which
we provide through semantic labels. Each label value in the semantic labels
indicates the location of moss, lichen and scars.
The main drawback of using most of the state-of-the-art cGANs for bark
generation is that they generate a single image corresponding to the input.
Most of these networks lack multi-modality, i.e., they generate a deterministic
output for an input label. Ideally, the network should generate multiple outputs
for identical inputs to add variability to the generated bark patterns. Thus,
to generate the surface and color maps using these networks, one must use
a two-stage pipeline: first input the smooth surface maps and obtain the
surface maps with the generated details. Second, use the generated surface
maps and semantic labels as input to another network and obtain the color
maps. It is necessary to use the generated surface maps as input in order
to ensure multi-modality in the generated colors. When the networks are
provided with identical semantic labels, they generate deterministic outputs.
Therefore, to obtain unique color maps, one must add variability in the
input, which we do through the surface maps. The uniqueness of surface
maps is ensured through the unique input smooth surface maps.
However, as the input surface maps pass through the deep layers of the
color map generator, the surface information gets lost, which results in the
color maps lacking the necessary sharpness in texture. Additionally, the color
generator network is dependent on the semantic labels (moss, lichens, etc.),
while the surface generator is independent of them. This calls for
two different architecture types for generating the surface and color maps. As
a result, we developed an architecture based on SPADE [27] called "Depth-
Reinforced-SPADE". The network takes two inputs: a smooth surface map
and a semantic map, and outputs a precise surface map and a color image
at the same time.
Figure 5: Encoder architecture of the generator in Depth-Reinforced-SPADE
The architecture of Depth-Reinforced-SPADE is made up of a main gen-
erator and two discriminator networks. The main generator consists of an
encoder and two decoders for generating the surface and color maps. The
encoder is made of ResNet blocks [10]. The input to the encoder is the
smooth surface map. The encoder compresses the input into a concise latent
space, which preserves the input smooth surface bark information, and also
helps in achieving multi-modality in the generated outputs. The decoder
uses the latent space obtained from the encoder to generate the accurate
surface maps and the corresponding color.

Figure 6: Decoder architecture of the generator in Depth-Reinforced-SPADE
(top: Surface Generator; bottom: Color Generator)

Rather than using a single decoder to generate the surface and color, we
separate the generation. This
is for two reasons: first, the surface and color generation are two separate
tasks. Using a single decoder to generate them resulted in unstable training.
Second, the color maps are dependent on semantic labels whereas the surface
maps are independent of them. Thus, the two decoders are architecturally
different. They take in the latent space as input. The surface generator con-
sists of ResNet blocks whereas the color generator consists of SPADE ResNet
blocks [27]. The SPADE ResNet blocks take in the semantic labels and the
output channels from the surface generator’s ResNet blocks. This is done
to reinforce the surface information onto the color generator to enhance the
texture of the generated color maps. Since both the surface and semantic
labels are passed through the SPADE ResNet layers, the information from
surface maps can overwhelm the semantic labels. To maintain a balance
between the two sources of information, we pass the surface generator's feature
maps through convolution layers with an output of 16 channels and append
their output to the input semantic map. Without this convolution, which acts as a
surface regularizer, the color generator was not able to leverage the semantic
input fully, leading to an uncontrolled growth of moss and lichen all over
the generated bark. The network architecture of Depth-Reinforced-SPADE
is shown in Fig. 5 and Fig. 6.
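To make the wiring of Fig. 6 concrete, the sketch below shows a strongly simplified version of the dual decoder in PyTorch: the surface branch is a stack of plain residual blocks, and at each stage its features are projected to 16 channels and appended to the semantic map that modulates the SPADE-style blocks of the color branch. The block definitions are simplified stand-ins for the ResNet [10] and SPADE [27] blocks, not the exact implementation; channel counts and depth are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Plain residual block, standing in for the ResNet blocks of [10]."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(F.relu(x))))

class SPADEBlock(nn.Module):
    """Residual block whose normalisation is modulated by a conditioning map,
    standing in for the SPADE ResNet blocks of [27]."""
    def __init__(self, ch, cond_ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)
        self.gamma = nn.Conv2d(cond_ch, ch, 3, padding=1)
        self.beta = nn.Conv2d(cond_ch, ch, 3, padding=1)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, cond):
        cond = F.interpolate(cond, size=x.shape[-2:], mode="nearest")
        h = self.norm(x) * (1 + self.gamma(cond)) + self.beta(cond)
        return x + self.conv(F.relu(h))

class DualDecoder(nn.Module):
    """Shared-latent dual decoder: the surface branch guides the colour branch."""
    def __init__(self, ch=64, n_labels=4, n_blocks=4):
        super().__init__()
        self.surf_blocks = nn.ModuleList(ResBlock(ch) for _ in range(n_blocks))
        # Surface regulariser: project the surface features down to 16 channels.
        self.surf_proj = nn.ModuleList(nn.Conv2d(ch, 16, 3, padding=1)
                                       for _ in range(n_blocks))
        self.color_blocks = nn.ModuleList(SPADEBlock(ch, n_labels + 16)
                                          for _ in range(n_blocks))
        self.to_surface = nn.Conv2d(ch, 1, 3, padding=1)
        self.to_color = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, latent, semantic):
        s, c = latent, latent
        for surf, proj, col in zip(self.surf_blocks, self.surf_proj, self.color_blocks):
            s = surf(s)
            # Semantic labels + 16-channel surface cue condition the colour branch.
            sem = F.interpolate(semantic, size=s.shape[-2:], mode="nearest")
            c = col(c, torch.cat([sem, proj(s)], dim=1))
        return torch.tanh(self.to_surface(s)), torch.tanh(self.to_color(c))

# Example: a 64x64 latent map and a 4-class semantic map at full resolution.
# dec = DualDecoder()
# surface, color = dec(torch.randn(1, 64, 64, 64), torch.rand(1, 4, 256, 256))
```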
Figure 7: Our GAN architecture for tiling the images
To obtain high quality outputs, we use separate discriminator networks
for the surface and color maps that specialise in discriminating the surface
and the colors, respectively. The architecture of the discriminator networks
is identical to the one used in pix2pixHD [37].
We train the network with the same loss functions used in SPADE [27],
namely, the GAN loss, feature matching loss, perceptual loss [13] and KL-
Divergence Loss.
3.2.3 Tiling the Images and Creating 3D Meshes
The surface and color maps obtained from the GANs are parts of a full
tree bark. To build the full structure of a tree trunk, the images must be
combined together to form a full map. When the image tiles are combined,
there will be discontinuities at the places where the images are joined. The
discontinuities in the surface maps appear as bumps in the 3D models of
the tree barks, while the discontinuities in the color maps result in abrupt
changes in bark colors. Thus, these discontinuities must be removed to obtain
a continuous map. The existing GAN-based tiling method [7] modifies large
parts of the images and it requires the images to be generated by Progressive
GAN [14]. This cannot be used for our objective because we do not rely on
Progressive GAN to generate our images. Also, it is preferred that only
a minimal portion of our generated images is modified. Thus, for tiling
the images, we used an in-painting based method, where the discontinuities
in the tiled image are masked and a conditional GAN network is used to
generate a continuous image by filling only the masked regions.
To tile the images, we combine four image tiles into a single larger image.
The combined images will have discontinuities at the places where the images
are joined. Before feeding the images to our tiling network, we mask 15 pixels
from each side of the discontinuities and fill the masked areas with the mean
values of the image. Thus, contrary to the previous method, only the masked
regions are modified by the GAN. To prevent the GAN from modifying
the whole image we constrain its generation space to the gap regions only.
This was done to prevent it from learning a trivial copy task that can be
implemented separately. Doing so drastically improved the quality of the
generated images and increased its convergence speed. This process can be
seen in Fig. 7.
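The tile assembly and seam masking can be sketched as follows, assuming four equally sized tiles; the 15-pixel band follows the text, while applying the mean fill per channel is an illustrative choice.

```python
# Sketch of the tile assembly and seam masking for the in-painting GAN.
import numpy as np

def make_tiled_input(t00, t01, t10, t11, band=15):
    """Pack four tiles into one image and mask a band around the two seams."""
    top = np.concatenate([t00, t01], axis=1)
    bottom = np.concatenate([t10, t11], axis=1)
    tiled = np.concatenate([top, bottom], axis=0)
    H, W = tiled.shape[:2]
    mask = np.zeros((H, W), dtype=bool)
    mask[H // 2 - band: H // 2 + band, :] = True      # horizontal seam
    mask[:, W // 2 - band: W // 2 + band] = True      # vertical seam
    masked = tiled.copy()
    masked[mask] = tiled.mean(axis=(0, 1))            # fill seams with the mean value
    return masked, mask
```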
The network is a modified pix2pix architecture [12], where the unmasked pixels
of the input image are copied onto the generated image. Thus, the unmasked
portions of the input images bypass the generator, and the resulting images
are given to the discriminator. This way, we can ensure that the generated
images are of the same quality as the original images. Because we copy the
unmasked portions of the original images onto the generated images, we can
be assured that the GAN only fills the masked regions without modifying the
other image parts. This tiling technique is applied to both the surface and
color images to obtain the full maps. Once we obtain the full surface and
color maps, we construct a 3D model of the tree bark using VTK [34].
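The copy-through of the unmasked pixels before the discriminator amounts to a simple composite, sketched below (mask = 1 on the seam bands to be in-painted, 0 elsewhere).

```python
import torch

def composite(generated, tiled_input, mask):
    """Copy the unmasked pixels of the input tiles over the generator output."""
    return mask * generated + (1.0 - mask) * tiled_input

# The discriminator is fed composite(G(x), x, mask) instead of G(x), so the
# generator is only judged on how well it fills the seam bands.
```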
4 Experiments
4.1 Dataset
We conducted experiments on our tree bark dataset. As of today it contains
3 tree-species: Oak, Beech, and Robinia (Locust).
To build this dataset we recorded videos of different trees. In total, we
acquired videos of: 20 oak trees, 15 beech trees, and 18 robinia trees. These
videos were captured with a DJI Osmo camera at 29.99 fps in full HD resolution
(1920×1080). All the trees were captured in local forests and streets
under natural lighting, with automatic exposure. Fig. 8 shows some frames
from the videos of oak trees. Each video consists of approx-
imately 7000 frames. Since we use a gyro-stabilized camera, the transition
between consecutive frames is smooth, with no major changes from one frame to the next.

Figure 8: Samples of oak images with different illuminations.

Most of the consecutive frames contain redundant information, so the images
are extracted from every tenth frame of the whole video sequence. This way,
the number of images for reconstruction is in the range of a few hundred,
which can be handled efficiently by the photogrammetry software Pix4DMapper.
For better reconstruction results, we recommend using hand-held gimbals and
a professional-grade camera with controlled brightness, although, as we
demonstrate here, satisfactory results can still be achieved without any of those.
The videos of the oak trees were acquired on a sunny day, and most of the
oaks in our dataset contain regions in shadow and parts illuminated by direct
sunlight and sun glare. Because of this, the 3D reconstructions and the
color maps extracted from the reconstructed point clouds contain regions
with varying color intensities. Considering this, we acquired the videos of
the beeches and robinias on an overcast day, so their bark colors are free
from drastic changes due to shadows. The contrast between these two datasets
allowed us to evaluate how our approach fared in challenging situations and
helped us assess the reliability of our pipeline.
Our approach requires semantic label maps for training. The semantic
maps for the tree barks were obtained by hand labelling each of the bark
images. We used an online labelling tool, Labelbox: https://labelbox.com.
Each of the bark color images was labelled into four categories, which are
represented with the following colors in the semantic maps:
1. Bark - yellow
2. Defect/scar - orange
3. Moss - red
4. Lichens - blue
The dataset used to train the GANs consists of maps of size 256 × 256.
To generate these images, we take the large surface and color maps and
extract crops from them with a stride of 16. This allowed us to artificially
augment the number of samples and train complex GANs with a fairly limited
amount of data. Once the images are acquired, the entire set of surface/depth
maps is normalized to the range 0 to 65535. Before being fed to the network,
all images are instance-normalized to the range [-1, 1].
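The tile extraction and normalisation can be sketched as follows; per-crop min-max scaling to [-1, 1] is our reading of the instance normalisation described above.

```python
# Sketch of the training-tile extraction: 256x256 crops with a stride of 16,
# each crop rescaled to [-1, 1].
import numpy as np

def extract_tiles(full_map, tile=256, stride=16):
    H, W = full_map.shape[:2]
    tiles = []
    for y in range(0, H - tile + 1, stride):
        for x in range(0, W - tile + 1, stride):
            crop = full_map[y:y + tile, x:x + tile].astype(np.float32)
            lo, hi = crop.min(), crop.max()
            tiles.append(2.0 * (crop - lo) / max(hi - lo, 1e-8) - 1.0)
    return np.stack(tiles)
```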
4.2 Baseline
4.2.1 Generation
To evaluate the generation quality of our network against other GANs, we
compare our Depth-Reinforced SPADE with two state-of-the-art GAN architec-
tures: SPADE [27] and pix2pixHD [37]. SPADE and pix2pixHD can generate
photo-realistic images from input semantic maps. To synthesize the surface
and color maps for bark generation from pix2pixHD and SPADE, we use a
two-stage pipeline. First, we provide the smooth surface map as input to
the network and obtain the generated surface map. Next, we train another
network, where we provide the generated surface map and semantic label as
input and obtain the color map. (1) For pix2pixHD, we provide the smooth
surface map through the encoder and obtain the generated surface maps.
Next, we concatenate the generated surface maps and semantic labels and
pass it through the encoder of another pix2pixHD network and obtain the
color maps. (2) For SPADE, since the network has been designed to work
with semantic labels, we provide the input surface maps through the encoder
and semantic labels through the SPADE ResNet blocks to obtain the surface
maps. Next, we provide the generated surface map through another encoder
and decoder of SPADE to obtain the color maps.
Instead of generating the surface and color maps using two independent
networks for pix2pixHD and SPADE, one could use a single encoder and
decoder network to generate surface and color maps with the outputs con-
catenated. However, synthesizing both the surface and color maps at the
same time using a single generator (encoder and decoder) and discriminator
resulted in the training becoming very unstable. This is due to the inherent
nature of the GANs where the generator and discriminator compete against
each other and are thus unstable while training. Concatenating two different
types of images (surface maps and color maps) for generation resulted in the
network not learning and failing to converge.
4.2.2 Tiling
To evaluate the results of our tiling technique, we compare ourselves to Edge-
Connect [24]. It consists of two networks: the first network adds edges in
the areas to be in-painted so that there is a continuation of the edges from
the surrounding areas. The second network fills the color in the in-painting
areas. Finally, before saving the images, they copy the areas other than the
in-painted ones to the generated images.
5 Results
5.1 3D Reconstruction and Surface Geometry Extraction
After acquiring the images and performing background suppression on them,
they were used to obtain 3D reconstructions of the trees using Pix4DMapper.
Examples of 3D point-clouds of an oak, beech, and robinia trees are shown
in Fig. 9.
Figure 9: 3D point clouds of oak, beech, and robinia trees
Once the 3D point clouds of the trees are acquired, the radius and color
information are extracted from them. The surface maps consist of the geometry
of the tree bark in polar coordinates, where the map contains the radius
values from the center points as a function of the height of the tree and the
angle. This is similar to cutting open a 3D cylinder and flattening it into a
plane. Examples of the maps obtained for oak, beech, and robinia
trees are shown in Fig. 10.
5.2 Architecture
In this section, we compare the results obtained from pix2pixHD, SPADE
and Depth-Reinforced SPADE for the generated surface and color maps.
Figure 10: (a) Real surface and color map of an oak tree (b) Surface and
color map of a beech tree (c) Surface and color map of a robinia tree. The
colorbar shows the radius values of the barks in metres.
Datasets pix2pixHD [37] SPADE [27] Ours
Oak 65.873 97.592 50.432
Beech 55.372 89.320 52.868
Robinia 55.043 75.534 45.096
Table 1: Quantitative evaluation of the different architectures for surface
map generation. The values here are the FID scores, where a lower score
indicates better performance.
5.2.1 Surface-Map Generation
Fig. 11 shows the generation results of the surface generators for pix2pixHD,
SPADE and Depth-Reinforced SPADE. The synthetic maps generated by
pix2pixHD are blurry. When the SPADE network was provided with the se-
mantic labels, it distorted the generated surface maps.
Figure 11: Different methods to generate surface and color tiles for oak,
beech and robinia barks (columns: Input, pix2pixHD, SPADE, Depth-Reinforced
SPADE, Ground Truth). The inputs are the semantic labels and the smooth
surface maps. Under pix2pixHD and SPADE: the first columns are the generated
surface maps; the second columns are the color maps generated when only the
semantic labels are provided as input; the third column is when both semantic
labels and smooth surface maps are provided as input.
Datasets   pix2pixHD [37]            SPADE [27]                Ours
           Acc    mIOU   FID         Acc    mIOU   FID         Acc    mIOU   FID
Oak        87.88  22.49  183.09      89.06  22.87  85.17       89.17  22.57  49.21
Beech      50.77  15.68  113.02      51.73  15.57  65.273      52.23  15.70  54.45
Robinia    N/A    N/A    715.72      N/A    N/A    164.19      N/A    N/A    57.51
Table 2: Quantitative evaluation of the different architectures. Higher ac-
curacy, mIOU and lower FID indicate better performance. The datasets are
highly imbalanced which explains the low mIoU results of the segmentation
networks.
Architecture Number of parameters
pix2pixHD 366M
SPADE 249.8M
Depth-Reinforced SPADE 207.6M
Table 3: Number of parameters in the networks. Note: for pix2pixHD and
SPADE, the values reported here are the total number of parameters for training
both the surface and the color generators.
The semantic labels interfered with the input smooth surface in the deep
layers of the decoder network. In contrast, the surface maps from
Depth-Reinforced SPADE are sharper than those of the other methods. This is
because the KL-Divergence loss preserves the input information in the latent
space. Moreover, there is no interference from the semantic labels, which
allows the network to generate finer bark structures compared to the other
two methods. This is the main reason for separating the surface and color
generation in the Depth-Reinforced SPADE architecture. The synthetic robinia
surface maps are particularly convincing, with realistic patterns that span
the entire generated tile. Overall, the results of the surface maps look good:
for each tree species, our GAN is capable of generating species-specific
patterns from a smooth surface. This can also be verified from the quantitative
metrics shown in Table 1.
We calculate the Fréchet Inception Distance (FID) [11] score between the
generated images and the ground truth images, and a lower score indicates
better performance.
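For reference, the FID compares the Gaussian statistics (mean μ, covariance Σ) of Inception features extracted from real and generated images; its standard definition from [11] is FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}).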
5.2.2 Color-Map Generation
Fig. 11 shows some generation results for the oak, beech and robinia trees.
For pix2pixHD and SPADE we have shown two versions of color map gen-
eration. In the first version, we provide only the semantic labels as input.
This is to see how the network behaves when provided with identical inputs.
In the second version, we provided the surface maps along with the semantic
labels as input to verify that surface maps can help the network generate
multi-modal color maps. The fourth column and seventh column in Fig. 11
are the color maps generated for pix2pixHD and SPADE respectively, when
only the semantic labels are provided as input to the network. Based on the
results, pix2pixHD fails to generate correct outputs with a constant label
map. This is a well known issue of pix2pixHD that the SPADE architec-
ture addresses. Even though SPADE solves the problem encountered with
pix2pixHD, SPADE is deterministic: given a label, the output of the gener-
ator will always be the same, which is demonstrated in column seven for the
robinia barks. To make these networks generate varying outputs, one must
introduce significant variability in the input. Thus, when these networks
were provided with the surface maps, they started to generate multi-modal
outputs as shown in columns five and eight of Fig. 11. In the case of Depth-
Reinforced SPADE, since we generate the surface maps and let it guide the
color generator, we always get unique color maps. Furthermore, if one has a
close look at the color maps synthesized by our GAN the results look sharper,
since we use the generated surface maps to reinforce the color maps which
enhances the texture of the generated maps and make them look sharp. More
results can be seen in appendix D, E, and F.
To compare the different networks quantitatively, we use the same metrics
as [27]: the pixel accuracy, the mean intersection over union (mIOU), and
the Fréchet Inception Distance (FID) [11] score. The semantic segmentation
which gives the pixel accuracy and mIoU scores, is applied to the generated
images and compared to the labels used to generate them. DeepLab
V3+ [3] is used to perform the semantic segmentation. As shown in Table 2,
the results provided by our architecture are superior with a lower FID across
all datasets, a slightly better accuracy and a comparable mIoU. Please note
that on the robinia trees we could not evaluate the mIoU or the accuracy as
there is only one label.
5.2.3 Controlling the Bark Synthesis
Figure 12: Different generation results where we draw unusual shapes on
beech barks.
Using our GAN we can bend mother nature to our will, allowing us to
painlessly draw familiar objects on the bark as illustrated in Fig. 12. In this
figure we chose to draw the Stanford bunny and a heart on the bark of a beech
tree. This allows us to demonstrate the benefits of using GANs to generate
tree barks: using GANs let us change the trees easily and realistically. The
main drawback of using GANs is that to correctly generate some labels they
have to appear quite often in the dataset on which they were trained. In our
case, we could not reproduce those results with the oak or robinia trees, as
their datasets had too few occurrences of moss or lichen.
Finally, we also compare the sizes of the pix2pixHD, SPADE and
Depth-Reinforced SPADE architectures in Table 3. For pix2pixHD and SPADE,
we report the total number of parameters for generating the surface and
the color. Since the surface and color generators of Depth-Reinforced SPADE
share the encoder, the number of parameters is reduced. This, in turn, reduces
the computational cost and the training time.
5.3 Tiling the Images
The generated images from the GAN networks were tiled using our modi-
fied pix2pix architecture. An example of a continuous image obtained by
combining four smaller color images of an oak bark is shown in Fig. 13.
Fig. 13(a) shows an image with four tiles combined together, which creates
discontinuities in the regions where they are joined. The discontinuities were
masked as shown in Fig. 13(b). As can be seen, only a small band surrounding
the discontinuities is masked, and only these regions are filled by the GAN.
Since we use a copying task in the architecture, where the unmasked regions
are retained, the majority of the original images is retained and their quality
is also maintained. Finally, the tiled image is shown in Fig. 13(c). The GAN
has filled the masked regions with suitable values so as to generate a
continuous image.
Fig. 14 shows a situation where the state-of-the-art in-painting method
Edge-Connect [24] fails to join the different tiles. It can be seen that, al-
though the network generates the right edge patterns, it generates radius
values with a range higher than the actual values. Lastly, as the copy task
is performed on the generated images, the difference in the radius values is
clearly visible. However, in our proposed architecture, the copy task is per-
formed before feeding the images to the discriminator. Hence, the generator
is forced to fill the missing parts with the correct range of radius, and so it
generates a continuous map.
5.4 Construction of the 3D Meshes
After the tiled radius and the corresponding color maps were obtained, they
were used to construct 3D meshes. The 3D models of oak, beech and robinia
trees without and with the color are shown in Fig. 15 and Fig. 16. From
the 3D mesh without color, the finer structural details of the tree barks can
be observed on the real and the GAN generated meshes. The GANs have
captured the nuances on the oak, beech and robinia barks, and the synthetic
3D models look as realistic as the original 3D model. Since there is more
control over the bark color generated, it results in realistic looking 3D mod-
els. More results can be seen in appendix D, E and F. To demonstrate the
robustness of the models to the input smooth bark geometry, we have generated
barks with bᵢ = 0 and three different input radius values. The radius values
are 0.20, 0.50 and 0.80, for all of which the network has successfully
generated the fine bark patterns and colors, as shown in Fig. 16.
To assess the quality of our generated 3D models of the tree barks, a
survey was conducted in which we presented a randomly mixed set of oak and
beech trees and asked our participants to differentiate the real barks
(3D meshes of the barks reconstructed from the real tree images) from the
generated barks. 160 participants contributed to the survey and were given
unlimited time to differentiate the barks. Among our participants, 34 were
forestry experts. At the beginning, the participants were shown samples of
real and generated barks of both the oak and beech species.
Figure 13: (a) The tiled image with discontinuities (b) Image after the discon-
tinuities are masked (c) Continuous image obtained from the tiling network
Figure 14: Example of a situation where Edge-Connect [24] fails (left to right: input, Edge-Connect, ours)
Figure 15: Synthetic and real barks of oak, beech and robinia (rows: generated without color, generated with color, real without color, real with color)
Our survey consisted of 12 samples: 6 oak and 6 beech. Based on the survey
analysis, the precision and recall of the participants in detecting real trees
were calculated. The results are given in Table 4. We recall that precision =
TP/(TP + FP) and recall = TP/(TP + FN).

Figure 16: Synthetic oak, beech and robinia barks generated with bᵢ = 0 and
three different radius values in the smooth bark: the value of a in Eq. (1)
is 0.20 for bark 1, 0.50 for bark 2 and 0.80 for bark 3. Irrespective of the
radius values used, the network is able to generate realistic barks and colors.
From Table 4, we can see that the precision of the participants is around
55%, which shows that a large portion of them mistook our synthetic trees for
real trees. Additionally, the recall, which evaluates the ability of the classifier
to find all the positive samples, is close to 0.6, which also indicates that the
participants had trouble differentiating the real barks from the synthetic
ones. All in all, these results show that our GAN-generated trunks are realistic
enough to fool around 40% of people, experts and laymen alike.
6 Limitations and Future Work
As of today, our work faces some limitations. The first limitation comes from
the very nature of our approach.
Table 4: Precision and recall for the detection of real 3D models of oak and
beech trees. Higher is better.

          Laymen                 Experts
          Precision   Recall     Precision   Recall
Oak       0.54        0.60       0.51        0.56
Beech     0.60        0.63       0.55        0.59
Overall   0.57        0.615      0.53        0.585
Because we are using a data-driven methodology, we need a significant amount of data to train our neural networks.
This is particularly true for the robinia and oak trees, which had very little
moss and lichen on them, making the network incapable of synthesizing the
Stanford bunny. The second limitation of this work is related to the tiling
technique, which could be improved: the junctions between the tiles are not
perfectly smooth along the height. New methodologies, such as SRFlow [22]
or GLOW [17], could be investigated. Yet, this tiling problem could also be
partially solved by acquiring the trees under controlled illumination. This
would give uniform colors and illumination across the dataset, making the
GANs' generation much more homogeneous on the color side. Furthermore,
higher-quality recording hardware would provide higher-resolution color and
surface maps and hence sharper GAN results. Our method works well on trees
with a cylindrical structure; however, it cannot handle non-cylindrical trees
such as those with buttress roots. In an effort to increase the realism of our
method, future work will focus on applying different styles to generate barks
of different ages, allowing us to generate a broader diversity of tree barks.
Finally, we will collect a larger dataset with more trees and more species.
7 Conclusion
In this work, we propose a novel pipeline to generate realistic tree barks
from a set of images. Deviating from the conventional methods, we use a
data-driven generation method relying on deep neural networks to generate
the tree barks. Once a deep neural network is trained, little manual
intervention is required to generate new barks. This is in contrast to
traditional methods, which require manual tuning of parameters to generate
barks. Additionally, our method gives extensive control to the users
to generate additional features such as moss, scars, etc. We first created an
efficient method using self-supervised learning to suppress the background
portions of the image sequences, which yields better-quality tree models and,
at the same time, speeds up the reconstruction. We also proposed a GAN
architecture called Depth-Reinforced-SPADE. This GAN takes as input
both a label map and a smooth surface map. From these two inputs, it gener-
ates a detailed bark surface and the color map simultaneously. Instead of
using two separate GANs to generate the surface/depth and color, Depth-
Reinforced-SPADE can generate both simultaneously, thus saving time and
computational resources. We also presented a method to tile smaller images
obtained from our GANs to produce a continuous map of the surface and
color of the tree barks. Our method was successfully tested on trees with
smooth and ridged barks and yielded high-quality tree barks, implying
that our method could easily be extended to other tree bark types to obtain
genuine-looking barks. Finally, we demonstrated that our GANs enable users
to synthesize bark tiles with user-defined inputs like a bunny or a heart, the
only limitation being the diversity of the dataset the GAN was trained on.
8 Acknowledgement
This research was made possible with the support from the French National
Research Agency, in the framework of the project WoodSeer, ANR-19-CE10-
011.
References
[1] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” arXiv
preprint arXiv:1701.07875, 2017.
[2] J. Bloomenthal, “Modeling the mighty maple,” ACM SIGGRAPH Com-
puter Graphics, vol. 19, no. 3, pp. 305–311, 1985.
[3] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-
decoder with atrous separable convolution for semantic image segmen-
tation,” in Proceedings of the European conference on computer vision
(ECCV), 2018, pp. 801–818.
[4] X. Chen, B. Neubert, Y.-Q. Xu, O. Deussen, and S. B. Kang, “Sketch-
based tree modeling using markov random field,” in ACM SIGGRAPH
Asia 2008 papers, 2008, pp. 1–9.
[5] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, “Star-
gan: Unified generative adversarial networks for multi-domain image-to-
image translation,” in Proceedings of the IEEE conference on computer
vision and pattern recognition, 2018, pp. 8789–8797.
[6] B. Desbenoit, E. Galin, and S. Akkouche, “Modeling cracks and frac-
tures,” The Visual Computer, vol. 21, no. 8-10, pp. 717–726, 2005.
[7] A. Frühstück, I. Alhashim, and P. Wonka, “Tilegan: synthesis of
large-scale non-homogeneous textures,” ACM Transactions on Graphics
(TOG), vol. 38, no. 4, pp. 1–11, 2019.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,”
in Advances in neural information processing systems, 2014, pp. 2672–
2680.
[9] J. Guo, S. Xu, D.-M. Yan, Z. Cheng, M. Jaeger, and X. Zhang, “Re-
alistic procedural plant modeling from multiple view images,” IEEE
transactions on visualization and computer graphics, 2018.
[10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochre-
iter, “Gans trained by a two time-scale update rule converge to a local
nash equilibrium,” in Advances in neural information processing sys-
tems, 2017, pp. 6626–6637.
[12] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image transla-
tion with conditional adversarial networks,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2017, pp. 1125–
1134.
[13] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time
style transfer and super-resolution,” in European conference on computer
vision. Springer, 2016, pp. 694–711.
[14] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing
of gans for improved quality, stability, and variation,” arXiv preprint
arXiv:1710.10196, 2017.
[15] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture
for generative adversarial networks,” in Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
[16] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila,
“Analyzing and improving the image quality of stylegan,” arXiv preprint
arXiv:1912.04958, 2019.
[17] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invert-
ible 1x1 convolutions,” in Advances in neural information processing
systems, 2018, pp. 10 215–10 224.
[18] J. Kratt, M. Spicker, A. Guayaquil, M. Fiser, S. Pirk, O. Deussen, J. C.
Hart, and B. Benes, “Woodification: User-controlled cambial growth
modeling,” in Computer Graphics Forum, vol. 34, no. 2. Wiley Online
Library, 2015, pp. 361–372.
[19] P. Laitoch, “Procedural modeling of tree bark,” Bachelor thesis, Charles
University, Prague, 2018.
[20] C. Li, O. Deussen, Y.-Z. Song, P. Willis, and P. Hall, “Modeling and
generating moving trees from video,” ACM Transactions on Graphics
(TOG), vol. 30, no. 6, pp. 1–12, 2011.
[21] Z. Liu, K. Wu, J. Guo, Y. Wang, O. Deussen, and Z. Cheng, “Single
image tree reconstruction via adversarial network,” Graphical Models,
vol. 117, p. 101115, 2021.
[22] A. Lugmayr, M. Danelljan, L. Van Gool, and R. Timofte, “Srflow:
Learning the super-resolution space with normalizing flow,” arXiv
preprint arXiv:2006.14200, 2020.
[23] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least
squares generative adversarial networks,” in Proceedings of the IEEE
International Conference on Computer Vision, 2017, pp. 2794–2802.
[24] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi, “Edge-
connect: Generative image inpainting with adversarial edge learning,”
arXiv preprint arXiv:1901.00212, 2019.
[25] B. Neubert, T. Franken, and O. Deussen, “Approximate image-based
tree-modeling using particle flows,” in ACM SIGGRAPH 2007 papers,
2007, pp. 88–es.
[26] M. Okabe, S. Owada, and T. Igarashi, “Interactive design of botani-
cal trees using freehand sketches and example-based editing,” in ACM
SIGGRAPH 2006 Courses, 2006, pp. 18–es.
[27] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image
synthesis with spatially-adaptive normalization,” in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 2019,
pp. 2337–2346.
[28] S. Pix4D, “Pix4dmapper 4.1 user manual,” Pix4D SA: Lausanne,
Switzerland, 2017.
[29] L. Quan, P. Tan, G. Zeng, L. Yuan, J. Wang, and S. B. Kang, “Image-
based plant modeling,” in ACM SIGGRAPH 2006 Papers, 2006, pp.
599–604.
[30] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation
learning with deep convolutional generative adversarial networks,” arXiv
preprint arXiv:1511.06434, 2015.
[31] T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Sten-
borg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic, et al., “Bench-
marking 6dof outdoor visual localization in changing conditions,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2018, pp. 8601–8610.
[32] J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,”
in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2016, pp. 4104–4113.
[33] J. L. Schönberger, E. Zheng, J.-M. Frahm, and M. Pollefeys, “Pixelwise
view selection for unstructured multi-view stereo,” in European Confer-
ence on Computer Vision. Springer, 2016, pp. 501–518.
[34] W. J. Schroeder, B. Lorensen, and K. Martin, The visualization toolkit:
an object-oriented approach to 3D graphics. Kitware, 2004.
[35] P. Tan, T. Fang, J. Xiao, P. Zhao, and L. Quan, “Single image tree
modeling,” ACM Transactions on Graphics (TOG), vol. 27, no. 5, pp.
1–7, 2008.
[36] B. Tao, Z. Changshui, and S. Wei, “A multi-agent based approach to
modelling and rendering of 3d tree bark textures,” in European Confer-
ence on Artificial Life. Springer, 2003, pp. 572–579.
[37] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catan-
zaro, “High-resolution image synthesis and semantic manipulation with
conditional gans,” in Proceedings of the IEEE conference on computer
vision and pattern recognition, 2018, pp. 8798–8807.
[38] X. Wang, L. Wang, L. Liu, S. Hu, and B. Guo, “Interactive model-
ing of tree bark,” in 11th Pacific Conference on Computer Graphics and
Applications, 2003. Proceedings. IEEE, 2003, pp. 83–90.
[39] K. Xie, F. Yan, A. Sharf, O. Deussen, H. Huang, and B. Chen, “Tree
modeling with real tree-parts examples,” IEEE transactions on visual-
ization and computer graphics, vol. 22, no. 12, pp. 2608–2618, 2015.
[40] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention gen-
erative adversarial networks,” arXiv preprint arXiv:1805.08318, 2018.
[41] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing
network,” in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2017, pp. 2881–2890.
[42] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image
translation using cycle-consistent adversarial networks,” in Computer
Vision (ICCV), 2017 IEEE International Conference on, 2017.
A Learning Objective
We use the same loss functions as pix2pixHD [37], namely the GAN loss, the
feature-matching loss, the perceptual loss, and the KL-divergence loss.
A.1 GAN Loss
The objective of the cGAN is to optimize the minimax loss function of the
GAN. The GAN loss for the surface generator is given by
\mathcal{L}_{GAN_s} = \mathbb{E}_{(x,s)}[\log D_s(x, s)] + \mathbb{E}_{x}[\log(1 - D_s(x, G_s(x)))]   (3)
where $s$ is the real surface map, $D_s$ is the prediction from the surface
discriminator, $x$ is the input smooth surface map, and $G_s(x)$ is the surface
map generated by the surface generator. Similarly, the GAN loss of the color
generator is given by:
\mathcal{L}_{GAN_c} = \mathbb{E}_{(l,c)}[\log D_c(l, c)] + \mathbb{E}_{(l,x)}[\log(1 - D_c(l, G_c(l, x)))]   (4)
where $l$ is the input semantic label, $c$ is the real color image, and
$G_c(l, x)$ is the color map generated by the color generator. $D_c(l, c)$ is
the prediction from the discriminator when provided with the semantic labels
and the real color image, and $D_c(l, G_c(l, x))$ is its prediction when
provided with the semantic labels and the generated color image. The total GAN
loss is the sum of the surface and color GAN losses:
\mathcal{L}_{GAN} = \mathcal{L}_{GAN_s} + \mathcal{L}_{GAN_c}   (5)
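For concreteness, a minimal PyTorch sketch of Eqs. (3)–(4), written as discriminator losses in the vanilla cross-entropy formulation, is given below. The module interfaces (D_s, G_s, D_c, G_c taking these arguments) are illustrative assumptions, and the actual pix2pixHD/SPADE code uses hinge or least-squares variants of the adversarial objective rather than this exact form.

```python
import torch
import torch.nn.functional as F

def gan_loss_surface(D_s, G_s, x, s):
    """Eq. (3) as a loss to minimize for the surface discriminator.
    x: input smooth surface map, s: real surface map."""
    fake_s = G_s(x)
    d_real = D_s(x, s)                # D_s(x, s)
    d_fake = D_s(x, fake_s.detach())  # D_s(x, G_s(x)), detached for the D update
    # BCE-with-logits gives -log D on real and -log(1 - D) on fake samples.
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def gan_loss_color(D_c, G_c, l, x, c):
    """Eq. (4) as a loss to minimize for the color discriminator.
    l: semantic label map, x: smooth surface map, c: real color image."""
    fake_c = G_c(l, x)
    d_real = D_c(l, c)
    d_fake = D_c(l, fake_c.detach())
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
```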
A.2 Feature Matching Loss
The feature-matching loss is calculated between the real and synthesized
images. We extract the features of the real and synthetic images from the
multi-layer discriminator and calculate the L1 distance between the extracted
features. The feature-matching loss for the surface generator is
\mathcal{L}_{FM_s} = \mathbb{E}_{(x,s)} \sum_{i=1}^{T} \frac{1}{N_i} \left[ \lVert D_s^{(i)}(x, s) - D_s^{(i)}(x, G_s(x)) \rVert_1 \right]   (6)
where $T$ is the number of layers in the discriminator and $N_i$ is the number
of elements in each layer. Similarly, the feature-matching loss for the color
generator is
\mathcal{L}_{FM_c} = \mathbb{E}_{(l,c)} \sum_{i=1}^{T} \frac{1}{N_i} \left[ \lVert D_c^{(i)}(l, c) - D_c^{(i)}(l, G_c(l, x)) \rVert_1 \right]   (7)
The total feature-matching loss is
\mathcal{L}_{FM} = \mathcal{L}_{FM_s} + \mathcal{L}_{FM_c}   (8)
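A sketch of Eqs. (6)–(8), assuming the discriminator exposes its intermediate activations as a list of tensors (as the multi-scale discriminators in pix2pixHD do); the list-based interface is an assumption, not the exact API of our training code.

```python
import torch

def feature_matching_loss(feats_real, feats_fake):
    """L1 distance between discriminator features of the real and generated
    images, normalized by the number of elements N_i in each layer.

    feats_real / feats_fake: lists of T intermediate activations,
    e.g. [D^(1)(x, s), ..., D^(T)(x, s)].
    """
    loss = torch.zeros(())
    for f_real, f_fake in zip(feats_real, feats_fake):
        # Real features are detached: only the generator receives this gradient.
        loss = loss + torch.abs(f_real.detach() - f_fake).sum() / f_real.numel()
    return loss
```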
A.3 Perceptual Loss
We use a pre-trained VGG network to extract pair-wise features of the real
and synthetic images at different layers of the network and calculate the L1
distance between the corresponding feature maps. This loss is denoted $\mathcal{L}_p$.
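A minimal sketch of such a VGG-based perceptual loss; the choice of feature layers and the absence of per-layer weights are assumptions of this sketch and may differ from the layer set used in our training code.

```python
import torch
import torchvision

class PerceptualLoss(torch.nn.Module):
    """L_p: L1 distance between VGG features of real and synthetic images."""

    def __init__(self, layer_ids=(3, 8, 17, 26)):
        # layer_ids index into vgg19.features; the exact cut points are assumed.
        super().__init__()
        self.vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features.eval()
        self.layer_ids = set(layer_ids)
        for p in self.vgg.parameters():
            p.requires_grad = False  # VGG is frozen; it only provides features

    def forward(self, fake, real):
        loss, x, y = torch.zeros(()), fake, real
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + torch.nn.functional.l1_loss(x, y.detach())
        return loss
```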
A.4 KL-Divergence Loss
The KL-Divergence Loss is calculated for the encoder of the network. This
is given by
\mathcal{L}_{KD} = D_{KL}\left[\, Q(z \mid x) \,\|\, P(z \mid x) \,\right]   (9)
where $x$ is the input smooth surface map and $z$ is the latent code obtained
from the encoder. $P$ is the true distribution and $Q$ is its simpler
approximation.
The total loss is the sum of all the loss functions:
\mathcal{L}_{total} = \mathcal{L}_{GAN} + \mathcal{L}_{FM} + \mathcal{L}_p + \mathcal{L}_{KD}   (10)
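A short sketch of Eqs. (9)–(10), assuming the encoder predicts a Gaussian posterior through a mean and log-variance and that the prior is a standard normal, which yields the usual closed-form KL term; any relative weighting of the four terms in the actual training code is omitted here.

```python
import torch

def kl_divergence_loss(mu, logvar):
    """Eq. (9) under the assumption Q(z|x) = N(mu, diag(exp(logvar))) and a
    standard-normal prior (closed-form KL divergence)."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

def total_loss(l_gan, l_fm, l_p, l_kd):
    """Eq. (10): unweighted sum of the adversarial, feature-matching,
    perceptual and KL terms."""
    return l_gan + l_fm + l_p + l_kd
```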
B Training Details
B.1 pix2pix
We used the original parameters proposed in [12] for our experiments; the code
can be found at https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git.
The learning rates of the generator and discriminator were 0.0002. We used the
Adam solver with $\beta_1 = 0$ and $\beta_2 = 0.999$, with a batch size of 8.
The experiments were conducted on an 8 GB Nvidia GeForce GTX 1080.
B.2 pix2pixHD
We used the original parameters of [37] for training the network. The learning
rate was 0.0002, and we used the Adam optimizer with $\beta_1 = 0.5$ and
$\beta_2 = 0.999$. The code for the network can be found at:
https://github.com/NVIDIA/pix2pixHD.git. The experiments were conducted on an
11 GB Nvidia GeForce GTX 1080Ti, with a batch size of 4.
B.3 SPADE and Depth-Reinforced-SPADE
We used the original parameters proposed in [27] for our experiments; the code
used to train the SPADE GANs can be found at: https://github.com/NVlabs/SPADE.git.
The learning rates of the generator and discriminator were 0.0001 and 0.0004,
respectively. We used the Adam solver with $\beta_1 = 0$ and $\beta_2 = 0.999$.
The experiments on the tree bark dataset were conducted on an 11 GB Nvidia
GeForce GTX 1080Ti, with a batch size of 4.
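For reference, a minimal sketch of how these optimizers would be set up with the rates quoted above; the `netG`/`netD` modules below are placeholders standing in for the Depth-Reinforced-SPADE networks, and only the optimizer settings reflect the text.

```python
import torch
import torch.nn as nn

# Placeholder generator/discriminator; the real networks are far larger.
netG = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
netD = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))

# Two-time-scale update rule: generator at 1e-4, discriminator at 4e-4,
# Adam with beta1 = 0 and beta2 = 0.999.
optimizer_G = torch.optim.Adam(netG.parameters(), lr=1e-4, betas=(0.0, 0.999))
optimizer_D = torch.optim.Adam(netD.parameters(), lr=4e-4, betas=(0.0, 0.999))
```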
B.4 Edge-Connect
The code for training the network can be found at:
https://github.com/knazeri/edge-connect.git. The learning rate of the
generator was 0.0001. We used the Adam solver with $\beta_1 = 0$ and
$\beta_2 = 0.999$. The experiments on the tree bark dataset were conducted on
an 11 GB Nvidia GeForce GTX 1080Ti, with a batch size of 4.
C Effect of limiting the depth passed to the color generator
To study the effect of the convolution layers that limit the passage of depth
information between the depth and color generators in our decoder architecture,
we conduct an experiment on the Oak bark dataset. When all the layers from the
depth generator are transferred to the color generator unrestricted, the
semantic information gets suppressed by the depth, as seen in Figure 17: the
network generates moss even on the regions specified as bark. In contrast, when
the depth information is transferred selectively through the convolution layers,
the network generates the moss and the bark at their respective locations
without overlap. The convolution layers thus balance the semantic and depth
information, which leads to good quality generation; a minimal sketch of this
channel-limiting idea is given below.
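In the sketch, a 1x1 convolution reduces the number of depth-generator channels before they are merged with the color branch, so the semantic features are not overwhelmed. The class name, channel counts, and the concatenation-based fusion are illustrative assumptions, not the exact Depth-Reinforced-SPADE layer definitions.

```python
import torch
import torch.nn as nn

class DepthToColorBridge(nn.Module):
    """Passes depth features to the color branch through a 1x1 convolution
    that restricts the number of transferred channels."""

    def __init__(self, depth_channels=256, passed_channels=32, color_channels=256):
        super().__init__()
        # Bottleneck: only `passed_channels` worth of depth features go through.
        self.limit = nn.Conv2d(depth_channels, passed_channels, kernel_size=1)
        self.fuse = nn.Conv2d(color_channels + passed_channels, color_channels,
                              kernel_size=3, padding=1)

    def forward(self, depth_feat, color_feat):
        restricted = self.limit(depth_feat)                  # restricted depth cue
        fused = torch.cat([color_feat, restricted], dim=1)   # semantics stay dominant
        return self.fuse(fused)
```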
Figure 17: Effect of limiting the number of channels passed between the
depth generator and the color generator. Left: Input label, indicating moss
and bark. Center: When no convolution layer is used, the semantic informa-
tion is suppressed by the depth. Right: When a convolution layer is used,
both the depth and the semantic information are used to generate the bark.
D Method results: Oak
Figure 18: Point clouds
Figure 19: Color Maps and Surface Maps
Figure 20: Generated Tiles
Figure 21: Synthetic Trunks
E Method results: Beech
Figure 22: Point clouds
Figure 23: Color Maps and Surface Maps
Figure 24: Generated Tiles
Figure 25: Synthetic Trunks
F Method results: Robinia
Figure 26: Point clouds
Figure 27: Color Maps and Surface Maps
Figure 28: Generated Tiles
Figure 29: Synthetic Trunks