A Data Driven Approach to Generate Realistic 3D
Tree Barks
Aishwarya Venkataramanan¹, Antoine Richard², Cédric Pradalier³
Abstract
3D models of trees are ubiquitous in video games, movies, and
simulators. It is of paramount importance to generate high quality 3D
models to enhance the visual content, and increase the diversity of the
available models. In this work, we propose a methodology to create
realistic 3D models of tree barks from a consumer-grade hand-held
camera. Additionally, we present a pipeline that makes use of multi-
view 3D Reconstruction and Generative Adversarial Networks (GANs)
to generate the 3D models of the barks. We introduce a GAN referred
to as the Depth-Reinforced-SPADE to generate the surfaces of the
tree barks and the bark color concurrently. This GAN gives extensive
control on what is being generated on the bark: moss, lichen, scars, etc.
Finally, by testing our pipeline on different Northern-European trees
whose barks exhibit radically different color patterns and surfaces, we
show that our pipeline can be used to generate a broad panel of tree
species’ bark.
1 Introduction
Tree modeling is an important aspect of computer graphics: its wide range of
applications in computer-generated scenes and video games makes it a key-
element of today’s digital scene. Furthermore, the forestry industry, where
this project originates from, could benefit from high quality synthetic data to
increase the performances of data-driven, non-destructive quality assessment
techniques.
The computer graphics literature consists of various methods to generate
high quality trees, but only a handful focuses on modeling the tree barks
realistically [19, 38]. The difficulties in generating realistic tree barks arise
¹Georgia Institute of Technology, Atlanta, USA. Present address: Université de Lorraine, CNRS, LIEC, F-57000 Metz, France
²Georgia Institute of Technology, Atlanta, USA
³Georgia Tech Lorraine, CNRS IRL 2958, F-57070 Metz, France
from the complex appearance of the bark surfaces and colors. Within the same
tree species, barks often present a wide variety of patterns and intricate
details. Consequently, realistic bark modeling is an active field of research.
In the literature, the majority of the works [4,9,25, 26,35,39] use traditional
3D modeling methods to generate synthetic tree barks. In contrast to the
classical modeling methods, we explore a deep-learning-based methodology
to generate high quality 3D models of tree barks. More precisely, we propose
a methodology to generate realistic tree barks using 3D Reconstruction and
deep Generative Adversarial Networks (GANs) [8]. 3D Reconstruction is
used to generate tree barks of real trees, while GANs are trained on real tree
barks and used to generate synthetic barks. The GANs used here generate
tiles that need to be merged together to form a continuous map that is then
rolled onto a tree trunk. Because of the rolling and the tiling, there tends
to be a discontinuity at the place where the tiles meet. We alleviate this
using a GAN based tiling method. Furthermore, GANs offer a lot of control
on how the bark is generated, allowing the users to add scars, cut branches,
moss and many other details easily. All in all, the GANs proposed here allow
the users to modify the texture and features of the tree trunks to suit their
needs easily and realistically. Our proposed pipeline can be divided into the
following two steps:
• 3D reconstruction of real barks to obtain their geometry and color.
• GANs to generate intricate bark details and colors in order to create synthetic tree barks.
Using our method, we generate barks of Oak, Beech and Robinia trees that
have strongly contrasting structures, demonstrating the broad application
scope of our method. Furthermore, we showcase the robustness of our
method by training the GANs on a dataset acquired under various light-
ing conditions. Our experimental results show that our approach generates
realistic looking features for trees with both smooth and deeply ridged sur-
faces. When provided with smooth tree bark models with no surface features,
this method could be used to generate thousands of realistic looking 3D bark
models for several types of trees. Overall, our contributions are as follows:
• We propose a new pipeline for generating 3D models of realistic tree barks using 3D Reconstruction and GANs. This method offers extensive control over the generated bark, allowing the synthesis of features beyond bark patterns, such as moss, scars, etc.
• We demonstrate a method to automatically acquire a labelled dataset to perform deep-learning-based background removal on our trees.
• We introduce a GAN network, Depth-Reinforced-SPADE, that can generate multi-modal outputs and synthesizes the surface geometry information and the color of the barks simultaneously.
To the best of our knowledge, this is the first attempt at generating realistic
tree barks using deep learning.
The code is available at https://github.com/vaishwarya96/Tree-barks-generation.git and the dataset at https://www.dropbox.com/scl/fo/19mktx02p49eq2vxx1bf1/h?dl=0&rlkey=ghsaqiznjw5kgdnig4tot2jw5
2 Related Work
2.1 3D Modeling of Trees and Tree Barks
Numerous works have focused on generating complete tree models, including
the branches and the leaves. Amongst these, [39] uses tree-cuts from real
tree models to generate synthetic trees and [4,26, 35] use sketch-based model-
ing. [9,25] use a procedural approach, but they do not model the bark. Many
works [9, 20, 25, 29, 35] model trees or plants from images. [21] uses a com-
bination of GAN and procedural modeling to generate tree models. While
these methods have been successful in generating the overall structures of
the trees, they do not model the realistic bark textures which is important
for identifying the different tree species. Thus, our focus in this study is to
develop a tree-specie specific bark generator.
[6, 18, 36] model cracks and basic structures, but they are not capable
of imitating the bark patterns of a specific species. [2] proposes to use X-
ray images of a real bark to generate Maple bark structures. While this
method is interesting, it is cumbersome and requires X-ray images of trees
which are hard to come by. [38] uses texton analysis to generate barks from
a single photograph, and [19] uses procedural modeling to generate barks.
Contrary to these methods, we use deep learning to generate realistic tree
barks. A learning-based approach enables users to generate several unique
barks without much manual intervention. Additionally, the approach presented
here allows the user to control what will be generated on the trees' barks.
2.2 Generative Adversarial Networks
The GAN introduced in [8] uses plain neural networks to generate images.
However, such networks are limited to simple
datasets. To solve this, DCGAN [30] incorporates Deep Convolutional Neu-
ral Networks (CNN) in the generator and discriminator to generate complex
images. Using deep CNNs boosted the quality of the generated images com-
pared to using neural networks. Since then, novel loss functions [1, 23] and
deep CNN architectures [14–16] have been developed to stabilize the training
and to generate higher-quality images. For instance, [14] could achieve
high resolution images of human faces by training a model on images of
progressively increasing resolution.
With the introduction of pix2pix [12], conditional GANs have attracted
wide-spread attention due to their ability to generate good-quality images
from an input label. Pix2pix is an image-to-image translation network, where
the input label is an image and the network generates an image corresponding
to the input. Various image-to-image translation networks have been pro-
posed in [5, 27, 37, 40, 42] since the success of pix2pix. CycleGAN [42] uses a
cycle consistency loss to perform image-to-image translation on an unpaired
set of training data. StarGAN [5] makes use of mask vector method to trans-
late images from one domain to another, when the training data is from two
different domains. Self-Attention GAN [40] adds self-attention blocks to the
generator and discriminator to generate fine details from all feature locations.
Pix2pixHD [37] uses a multi-scale generator and discriminator to generate
high-resolution images from input images, while SPADE [27] uses a spa-
tially adaptive normalization to retain the semantic information throughout
the generator network while generating photo-realistic images from semantic
maps.
Additionally, as part of our pipeline, we have to tile the images generated
by the GANs to obtain a continuous image. The state-of-the-art GAN net-
work for tiling images is TileGAN [7]. However, TileGAN requires the images
to be generated by Progressive GAN [14], and it tends to modify large parts
of the original images to be tiled, which is not desirable for our objective.
Hence, we developed a GAN based on in-painting for our objective.
3 Generating Tree Barks
In the coming sections, we explain the pipeline to generate realistic tree
barks from sequences of images. The block diagram of our pipeline is given
in Fig. 1. Our pipeline is divided into two stages:
• In the first stage, we acquire the dataset, perform background suppression on the images, conduct 3D reconstruction, and extract the surface geometry and the color of the bark.
• In the second stage, we use GANs to generate the synthetic surface geometry and bark colors, and provide control over the generation of moss, scars, lichens, etc.
Figure 1: Our proposed pipeline for generating 3D models of real and syn-
thetic trees
3.1 3D Reconstruction, Surface Geometry and Color Extraction
3.1.1 Dataset Acquisition
As shown in Fig. 1, the first step in the pipeline is to capture image se-
quences of trees. To perform 3D reconstruction and triangulation, the image
sequences of trees must be captured from multiple camera positions. To make
our method affordable and as general as possible, we rely on consumer-grade
monocular cameras. Initial tests with smartphone cameras suffered from motion
blur due to hand jitter, which deteriorated the quality of the 3D reconstruction.
Hence, we use the DJI Osmo Pocket, an inexpensive gyro-stabilized camera.
Gyro-stabilization counteracts camera shake and yields sharper, better-quality images.
The videos were acquired by slow movement of the camera to avoid mo-
tion blur. First, the vertical portions of the trees were captured, followed by
a slight horizontal rotation of the camera. This sequence was repeated to
cover the entire circumference of the tree.
Since we capture the images using a single monocular camera, the global
scale information of the tree is not preserved in the final 3D point cloud. To
get the scale information, we attach square markers of 3 by 3 cm onto parts
of the trees while capturing the video. We place the markers onto the part
of the tree near the ground so that they do not obstruct the mid portion of
the trunk, which is required for extracting the surface geometry and color
information. Once the 3D point cloud of the reconstructed tree is obtained,
the known marker dimension is used to scale the tree accordingly.
3.1.2 Background Suppression
To perform the reconstruction, we rely on Pix4DMapper [28]: a feature-based
3D reconstruction tool. With this type of tool, removing the background
reduces the number of feature points, which not only makes the reconstruction
faster but also more accurate. Background segmentation reduces the risk
of confusing the object of interest with the associated background. This
is especially true in images, since they are 2D projections of the 3D world
with both object and background ending up on the same plane. Hence,
we developed a semi-supervised learning method to remove the background
portions of the images and retain only the tree barks. In this method we
first create a coarse dataset using traditional computer vision methods, and
then use this dataset to train a Deep Neural Network. The advantage of us-
ing standard computer vision algorithm to make a coarse dataset is that no
manual labeling is required. Considering that our images were acquired us-
ing a gyro-stabilized camera, we can assume that the transition between the
consecutive frames is gradual and smooth. Under this assumption, we de-
veloped a mechanism to automatically label the images as bark (foreground)
or non-bark (background), using optical flow.
To remove the background portions from the images, we use PSPNet
[41], a deep learning segmentation network. It is first trained on the CMU
dataset [31], and fine-tuned on our automatically acquired dataset.
From each of the consecutive image frames, we obtain the horizontal and
vertical flow vectors and use these to calculate the magnitude and direction
of the flow. As our images are acquired using a gyro-stabilized camera,
we can hypothesize the following: when the camera translates along the
tree, the direction of the vector is vertical and the magnitude of the flow
vectors is greater in the foreground than in the background, seeing that the
objects closer to the camera move faster than the far-away objects. On the
contrary, when the camera rotates, the magnitude of the flow is greater in
the background since the far-away objects move faster than the objects closer
to the camera. A sample of the flow magnitude when the camera translates
and rotates is shown in Fig. 2.
Figure 2: Magnitude of optical flow calculated on video sequences: (a) when
the camera translates, the magnitude of the flow is greater in the foreground;
(b) when the camera rotates, the magnitude of the flow is larger in the
background.
We use an adaptive threshold on the magnitude of the flow vectors to
mask the background portions in the images. We calculate the magnitude
of the maximum flow vector for each frame and compare it with the other
flow vectors for the same frame. When the camera translates vertically and
the magnitude of the flow vector is greater than 50% of the maximum
magnitude, we label those regions as foreground and the other regions as
background. The adaptive threshold method worked best when the camera
translates and failed in the rotation case. Hence, we use the labels obtained
when the camera translates and discard the frames in which it rotates. For a more robust
method to perform background suppression than optical flow, we train a
deep learning segmentation network.
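For illustration, the coarse labelling step described above can be sketched as follows. This is a minimal sketch assuming OpenCV's Farneback optical flow; the 50% threshold and the translate-only rule follow the text, while the frame pairing, the grayscale conversion and the vertical-motion test are illustrative choices rather than the exact implementation.

```python
# Minimal sketch of the coarse optical-flow labelling (assumptions noted above).
import cv2
import numpy as np

def coarse_bark_mask(prev_frame, next_frame, ratio=0.5):
    """Return a foreground (bark) mask, or None if the camera is rotating."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    moving = mag > 1e-3
    if not moving.any():
        return None
    # Keep the frame only when the dominant motion is roughly vertical,
    # i.e. the camera is translating along the trunk.
    if np.median(np.abs(np.sin(ang[moving]))) < 0.7:   # heuristic: rotation
        return None
    # Pixels whose flow exceeds 50% of the per-frame maximum are labelled bark.
    return (mag > ratio * mag.max()).astype(np.uint8)
```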
After training, the images extracted from all the trees’ videos are sent
through the network. The output from the segmentation network is a label
map with two classes: bark-regions and non-bark regions. Sometimes, the
segmentation network detects trees in the background and retains them.
To remove those trees, we find the largest connected region, which is the
tree bark to be reconstructed. This region is retained while the others are
masked. Finally, we dilate the final mask to include the edges of the tree
bark and apply the resulting mask onto the original images to suppress the
background regions, as shown in Fig. 3.

Figure 3: Pipeline used for suppressing the background regions in the tree
images. The images are passed through a segmentation network to remove
the background portions of the images. Then, the largest contour is detected
and retained. Finally, the resulting mask is used to acquire tree bark images
with the background portions erased.
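The mask post-processing shown in Fig. 3, keeping the largest connected bark region and dilating it, can be sketched as follows. This is a minimal illustration using OpenCV; the dilation kernel size is an assumption, not a value from the paper.

```python
# Sketch of the mask post-processing: largest connected bark region + dilation.
import cv2
import numpy as np

def clean_bark_mask(seg_mask, dilate_px=15):
    """seg_mask: uint8 array with 1 for bark pixels and 0 for background."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(seg_mask)
    if num <= 1:                                   # nothing was segmented as bark
        return seg_mask
    # Component 0 is the background; keep the largest remaining component.
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    mask = (labels == largest).astype(np.uint8)
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    return cv2.dilate(mask, kernel)                # include the bark edges

# Usage: masked_image = image * clean_bark_mask(seg_mask)[..., None]
```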
If we had acquired the images with a 3D camera, such as Kinect or Re-
alsense, this process would not have been necessary. However, these cameras
are cumbersome and require taking a laptop or small embedded computer to
record the trees, which is impractical in forests. Furthermore, the RGB and
depth images acquired by these cameras are of low quality, particularly in
outdoor conditions. Hence, if one wanted to extract the foreground objects
using depth information, we would favor a small 3D laser scanner, such as a
Livox Mid-40, whose measurements could be remapped to the camera's pixels.
3.1.3 3D Reconstruction
The images with the background portions suppressed are given to Pix4DMapper [28],
as it is faster and more accurate than Colmap [32, 33], to generate a dense
3D point cloud of the tree.
3.1.4 Surface Geometry and Color Extraction
Once the 3D dense point cloud of the tree is obtained, we extract the surface
geometry and the bark color from the point cloud. To do this, we approxi-
mate the 3D point cloud of the tree as a series of circles stacked vertically.
In cylindrical coordinates, we discretize the point cloud along the height
and the angle. The height is discretized in steps of 1 mm and the angle in steps of
0.5°. For each of the circles obtained from the discretized height, we obtain
the center point using a circular least-squares fit.
To extract the surface geometry information, we construct a 2D map in
polar coordinates, with the radius measured from the center point of the tree
as a function of the height and the angle. Since the height and angle are
discretized, there may be many points corresponding to a particular value of
height and angle. Thus, we take the mean value of the radius of these points
to construct the radius map. The mean radius is computed as a weighted sum
of the point radii divided by the sum of the weights, where each weight is a
Gaussian of the distance between the point and the center of the cell in the map.
Similarly, to extract the color of the bark, we calculate the mean values
of R, G and B as a function of height and angle, where instead of the radius,
the R, G, B values of the point are used.
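A simplified sketch of this extraction step is given below. It assumes the point cloud has already been centred per slice with the circular least-squares fit, bins each point into its nearest (height, angle) cell and weights it with a Gaussian of its offset to the cell centre; the value of sigma is an illustrative choice.

```python
# Simplified sketch of the radius/colour map extraction (Section 3.1.4).
import numpy as np

def build_maps(points, colors, dz=0.001, dtheta=np.deg2rad(0.5), sigma=0.5):
    """points: (N, 3) xyz in metres, centred per slice; colors: (N, 3) RGB in [0, 1]."""
    x, y, z = points.T
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    zi = ((z - z.min()) / dz).astype(int)              # height bin (1 mm steps)
    W = int(round(2 * np.pi / dtheta))                 # angle bins (0.5 degree steps)
    ti = np.clip((theta / dtheta).astype(int), 0, W - 1)
    H = zi.max() + 1
    # Gaussian weight of the offset to the cell centre (in cell units).
    dzc = (z - z.min()) / dz - (zi + 0.5)
    dtc = theta / dtheta - (ti + 0.5)
    w = np.exp(-(dzc**2 + dtc**2) / (2 * sigma**2))
    wsum = np.zeros((H, W)); rmap = np.zeros((H, W)); cmap = np.zeros((H, W, 3))
    np.add.at(wsum, (zi, ti), w)
    np.add.at(rmap, (zi, ti), w * r)
    np.add.at(cmap, (zi, ti), w[:, None] * colors)
    valid = wsum > 0
    rmap[valid] /= wsum[valid]
    cmap[valid] /= wsum[valid][:, None]
    return rmap, cmap, valid        # empty cells (valid == False) are filled later
```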
Sometimes, there may be cells in the map where there is no point. This
is much more prevalent in the case of oak and robinia trees as the ridges
on the bark surface create occlusions which are not reconstructed in the
point cloud. As the bark of the beech tree is much smoother, the occlusions
are minimal. The presence of cells with no point creates empty regions in
the final map. To avoid this, we construct a multi-level image pyramid.
The lowest level of the pyramid consists of the original-resolution map, and
the highest level consists of a map whose resolution is 1/2⁷ of the original map.
For each level in the pyramid, if there are cells with no point, the radius and
RGB values are interpolated from the map of the next level up. Fig. 4
demonstrates the need for interpolation. Fig. 4(a) is the oak bark before
interpolation; the black regions in the image are due to empty cells. After
interpolation, the empty cells are filled with values from the higher level in
the pyramid, as shown in Fig. 4(b).
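The pyramid-based hole filling can be sketched as follows; simple 2×2 block averaging and nearest-neighbour upsampling are used here as stand-ins for the paper's pyramid construction and interpolation.

```python
# Sketch of the pyramid hole filling: empty cells at each level are filled
# from a coarser version of the map, down to 1/2^7 of the original resolution.
import numpy as np

def fill_holes(level_map, valid, depth=7):
    """level_map: (H, W) float map; valid: (H, W) bool. Call once per channel."""
    if depth == 0 or valid.all():
        return level_map
    H, W = level_map.shape
    H2, W2 = H // 2, W // 2
    # Build the next (coarser) level by averaging the valid cells of 2x2 blocks.
    m = (level_map * valid)[:2 * H2, :2 * W2].reshape(H2, 2, W2, 2)
    v = valid[:2 * H2, :2 * W2].reshape(H2, 2, W2, 2)
    counts = v.sum(axis=(1, 3))
    coarse = m.sum(axis=(1, 3)) / np.maximum(counts, 1)
    coarse = fill_holes(coarse, counts > 0, depth - 1)
    # Fill the missing cells from the coarser level (nearest upsampling here).
    filled = level_map.copy()
    ys, xs = np.where(~valid)
    filled[ys, xs] = coarse[np.minimum(ys // 2, H2 - 1), np.minimum(xs // 2, W2 - 1)]
    return filled
```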
3.2 Synthetic Bark Surface and Color Generation
In this section, we will discuss the second stage of our pipeline: generating
synthetic surface and color maps of the barks using a conditional GAN (cGAN).
For each tree species, we train a separate GAN that specialises in the particular
tree type and generates the fine bark details corresponding to that tree species.

Figure 4: Example of an oak bark (a) before interpolation, (b) after interpolation
For training the GAN we smooth the ground truth real barks until no fine
bark details are present. We provide the smooth bark as input to the GAN
and the network learns to generate the fine bark details on this by using the
real barks with fine details as ground truth. Later, the user could provide
maps of smooth tree trunks to the network and use the trained GAN models
to generate thousands of synthetic barks.
To test our approach, we use empirical mathematical models to generate
thousands of smooth tree bark structures. These mathematical models are
approximations of the large variety of trees that have a cylinder-like structure
and do not accurately represent any tree in particular.
3.2.1 Generating Smooth Approximate Radius Maps
The smooth surface of the bark represents the overall structure of the tree
trunk without any finer structural details. The map consists of the radius values
r of the tree trunk as a function of the height z and the angle θ. To generate
these maps, we initially create a 2D map with an exponentially decaying radius r₀:

r₀ = a x^z        (1)

where x is a floating-point number in the range 0.8 to 0.9, and a is a random
value chosen between 0.1 and 0.9. To approximate the wavy structure of an
actual bark, we add a series of sinusoids of varying amplitudes and frequencies to r₀:

r = r₀ + Σᵢ₌₁ⁿ bᵢ sin(ωᵢ θᵢ)        (2)

where n is an integer chosen randomly between 1 and 3, and bᵢ is the amplitude
of the i-th sinusoid. ωᵢ = 2πfᵢ, where fᵢ is a value between 20 and 100; this
determines the frequency of the generated sinusoids. θᵢ is chosen based on
ωᵢ to ensure that r is 2π-periodic. The amplitudes and frequencies of the
sinusoids, and the value of a, are randomly chosen in order to generate unique
smooth tree barks every time. The parameter values are hand-tuned for each
tree species based on visual observations of the tree bark. For instance,
the beech trees are mostly cylindrical and hence bᵢ is set to a value close to
zero (range of 0 to 0.005). For oaks, bᵢ has a higher range of values (0.01
to 0.05). Similarly, one could adjust the range of radius values to fit the
tree type. For example, palm trees are mostly thin and so will lie in the
lower range of the radius. The number of parameters to be tuned is limited
(radius, amplitude of the sinusoids), so we could easily find a suitable range
of values for these parameters and use this generic model for our testing.
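A minimal sketch of this empirical model, following Eqs. (1) and (2), is given below. The grid resolution is arbitrary, and the 2π-periodicity is enforced here by drawing integer frequencies, which is one possible reading of the choice of θᵢ.

```python
# Sketch of the empirical smooth-trunk model of Eqs. (1) and (2).
import numpy as np

def smooth_radius_map(height_steps=1024, angle_steps=720, rng=np.random):
    z = np.linspace(0.0, 1.0, height_steps)[:, None]          # normalised height
    theta = np.linspace(0.0, 2 * np.pi, angle_steps, endpoint=False)[None, :]
    a = rng.uniform(0.1, 0.9)
    x = rng.uniform(0.8, 0.9)
    r0 = a * x ** z                                            # Eq. (1)
    n = rng.randint(1, 4)                                      # 1 to 3 sinusoids
    r = np.broadcast_to(r0, (height_steps, angle_steps)).copy()
    for _ in range(n):                                         # Eq. (2)
        b = rng.uniform(0.0, 0.005)        # beech-like; use 0.01-0.05 for oak
        f = rng.randint(20, 101)
        r += b * np.sin(f * theta)         # integer f keeps r 2*pi-periodic
    return r
```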
While we developed the smooth trunk models and parameters empirically,
one can also use one’s own models and use the pipeline to generate the fine
barks details.
Our GANs were trained using the real tree bark models that were smoothed,
but they still work well when tested on the empirically developed bark models,
which shows that the GANs are robust to the input smooth barks and can
generalise well. While they work well for most trees with a cylinder-like
structure, they would fail on non-cylindrical barks such as those of
buttress-root trees.
3.2.2 Generating Bark Surface and Color Using GANs
In this section, we explain the architecture that generates the surface and
color maps from the smooth surface maps. Apart from the ability to generate
barks, we also want to give the user the ability to generate moss, lichen, scars,
and any other particularities that would be labelled in the dataset, which
we provide through semantic labels. Each label value in the semantic labels
indicates the location of moss, lichen and scars.
The main drawback of using most of the state-of-the-art cGANs for bark
generation is that they generate a single image corresponding to the input.
Most of these networks lack multi-modality, i.e., they generate a deterministic
output for an input label. Ideally, the network should generate multiple outputs
for identical inputs to add variability to the generated bark patterns. Thus,
to generate the surface and color maps using these networks, one must use
a two-stage pipeline: first input the smooth surface maps and obtain the
surface maps with the generated details. Second, use the generated surface
maps and semantic labels as input to another network and obtain the color
maps. It is necessary to use the generated surface maps as input in order
to ensure multi-modality in the generated colors. When the networks are
provided with identical semantic labels, they generate deterministic outputs.
Therefore, to obtain unique color maps, one must add variability in the
input, which we do through the surface maps. The uniqueness of surface
maps is ensured through the unique input smooth surface maps.
However, as the input surface maps pass through the deep layers of the
color map generator, the surface information gets lost, which results in the
color maps lacking the necessary sharpness in texture. Additionally, the color
generator network is dependent on the semantic labels (moss, lichens, etc.),
while the surface generator is independent of them. This calls for
two different architecture types for generating the surface and color maps. As
a result, we developed an architecture based on SPADE [27] called "Depth-
Reinforced-SPADE". The network takes two inputs: a smooth surface map
and a semantic map, and outputs a precise surface map and a color image
at the same time.
Figure 5: Encoder architecture of the generator in Depth-Reinforced-SPADE
The architecture of Depth-Reinforced-SPADE is made up of a main gen-
erator and two discriminator networks. The main generator consists of an
encoder and two decoders for generating the surface and color maps. The
encoder is made of ResNet blocks [10]. The input to the encoder is the
smooth surface map. The encoder compresses the input into a concise latent
space, which preserves the input smooth surface bark information, and also
helps in achieving multi-modality in the generated outputs. The decoder
uses the latent space obtained from the encoder to generate the accurate
surface maps and the corresponding color.

Figure 6: Decoder architecture of the generator in Depth-Reinforced-SPADE
(top: Surface Generator; bottom: Color Generator)

Rather than using a single decoder to generate the surface and color, we
separate the generation. This
is for two reasons: first, the surface and color generation are two separate
tasks. Using a single decoder to generate them resulted in unstable training.
Second, the color maps are dependent on semantic labels whereas the surface
maps are independent of them. Thus, the two decoders are architecturally
different. They take in the latent space as input. The surface generator con-
sists of ResNet blocks whereas the color generator consists of SPADE ResNet
blocks [27]. The SPADE ResNet blocks take in the semantic labels and the
output channels from the surface generator’s ResNet blocks. This is done
to reinforce the surface information onto the color generator to enhance the
texture of the generated color maps. Since both the surface and semantic
labels are passed through the SPADE ResNet layers, the information from
surface maps can overwhelm the semantic labels. To maintain a balance
between the two sources of information, we pass the surface generator's feature
maps through convolution layers with an output of 16 channels and append
their output to the input semantic map. Without this convolution, which acts as a
surface regularizer, the color generator was not able to leverage the semantic
input fully, leading to an uncontrolled growth of moss and lichen all over
the generated bark. The network architecture of Depth-Reinforced-SPADE
is shown in Fig. 5 and Fig. 6.
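To make the wiring of Fig. 6 concrete, the sketch below shows a strongly simplified version of the dual decoder in PyTorch: the surface branch is a stack of plain residual blocks, and at each stage its features are projected to 16 channels and appended to the semantic map that modulates the SPADE-style blocks of the color branch. The block definitions are simplified stand-ins for the ResNet [10] and SPADE [27] blocks, not the exact implementation; channel counts and depth are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Plain residual block, standing in for the ResNet blocks of [10]."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(F.relu(x))))

class SPADEBlock(nn.Module):
    """Residual block whose normalisation is modulated by a conditioning map,
    standing in for the SPADE ResNet blocks of [27]."""
    def __init__(self, ch, cond_ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)
        self.gamma = nn.Conv2d(cond_ch, ch, 3, padding=1)
        self.beta = nn.Conv2d(cond_ch, ch, 3, padding=1)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, cond):
        cond = F.interpolate(cond, size=x.shape[-2:], mode="nearest")
        h = self.norm(x) * (1 + self.gamma(cond)) + self.beta(cond)
        return x + self.conv(F.relu(h))

class DualDecoder(nn.Module):
    """Shared-latent dual decoder: the surface branch guides the colour branch."""
    def __init__(self, ch=64, n_labels=4, n_blocks=4):
        super().__init__()
        self.surf_blocks = nn.ModuleList(ResBlock(ch) for _ in range(n_blocks))
        # Surface regulariser: project the surface features down to 16 channels.
        self.surf_proj = nn.ModuleList(nn.Conv2d(ch, 16, 3, padding=1)
                                       for _ in range(n_blocks))
        self.color_blocks = nn.ModuleList(SPADEBlock(ch, n_labels + 16)
                                          for _ in range(n_blocks))
        self.to_surface = nn.Conv2d(ch, 1, 3, padding=1)
        self.to_color = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, latent, semantic):
        s, c = latent, latent
        for surf, proj, col in zip(self.surf_blocks, self.surf_proj, self.color_blocks):
            s = surf(s)
            # Semantic labels + 16-channel surface cue condition the colour branch.
            sem = F.interpolate(semantic, size=s.shape[-2:], mode="nearest")
            c = col(c, torch.cat([sem, proj(s)], dim=1))
        return torch.tanh(self.to_surface(s)), torch.tanh(self.to_color(c))

# Example: a 64x64 latent map and a 4-class semantic map at full resolution.
# dec = DualDecoder()
# surface, color = dec(torch.randn(1, 64, 64, 64), torch.rand(1, 4, 256, 256))
```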
Figure 7: Our GAN architecture for tiling the images
To obtain high quality outputs, we use separate discriminator networks
for the surface and color maps that specialise in discriminating the surface
and the colors, respectively. The architecture of the discriminator networks
is identical to the one used in pix2pixHD [37].
We train the network with the same loss functions used in SPADE [27],
namely, the GAN loss, feature matching loss, perceptual loss [13] and KL-
Divergence Loss.
3.2.3 Tiling the Images and Creating 3D Meshes
The surface and color maps obtained from the GANs are parts of a full
tree bark. To build the full structure of a tree trunk, the images must be
combined together to form a full map. When the image tiles are combined,
there will be discontinuities at the places where the images are joined. The
discontinuities in the surface maps appear as bumps in the 3D models of
the tree barks, while the discontinuities in the color maps result in abrupt
changes in bark colors. Thus, these discontinuities must be removed to obtain
a continuous map. The existing GAN-based tiling method [7] modifies large
parts of the images and it requires the images to be generated by Progressive
GAN [14]. This cannot be used for our objective because we do not rely on
Progressive GAN to generate our images. Also, it is preferred that only
a minimal portion of our generated images is modified. Thus, for tiling
the images, we used an in-painting based method, where the discontinuities
in the tiled image are masked and a conditional GAN network is used to
generate a continuous image by filling only the masked regions.
To tile the images, we combine four image tiles into a single larger image.
The combined images will have discontinuities at the places where the images
are joined. Before feeding the images to our tiling network, we mask 15 pixels
from each side of the discontinuities and fill the masked areas with the mean
values of the image. Thus, contrary to the previous method, only the masked
regions are modified by the GAN. To prevent the GAN from modifying
the whole image we constrain its generation space to the gap regions only.
This was done to prevent it from learning a trivial copy task that can be
implemented separately. Doing so drastically improved the quality of the
generated images and increased its convergence speed. This process can be
seen in Fig. 7.
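The tile assembly and seam masking can be sketched as follows, assuming four equally sized tiles; the 15-pixel band follows the text, while applying the mean fill per channel is an illustrative choice.

```python
# Sketch of the tile assembly and seam masking for the in-painting GAN.
import numpy as np

def make_tiled_input(t00, t01, t10, t11, band=15):
    """Pack four tiles into one image and mask a band around the two seams."""
    top = np.concatenate([t00, t01], axis=1)
    bottom = np.concatenate([t10, t11], axis=1)
    tiled = np.concatenate([top, bottom], axis=0)
    H, W = tiled.shape[:2]
    mask = np.zeros((H, W), dtype=bool)
    mask[H // 2 - band: H // 2 + band, :] = True      # horizontal seam
    mask[:, W // 2 - band: W // 2 + band] = True      # vertical seam
    masked = tiled.copy()
    masked[mask] = tiled.mean(axis=(0, 1))            # fill seams with the mean value
    return masked, mask
```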
The network is a modified pix2pix architecture [12], where the unmasked pixels
of the input image are copied onto the generated image. Thus, the unmasked
portions of the input images bypass the generator, and the resulting images
are given to the discriminator. This way, we can ensure that the generated
images are of the same quality as the original images. Because we copy the
unmasked portions of the original images onto the generated images, we can
be assured that the GAN only fills the masked regions without modifying the
other image parts. This tiling technique is applied to both the surface and
color images to obtain the full maps. Once we obtain the full surface and
color maps, we construct a 3D model of the tree bark using VTK [34].
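The copy-through of the unmasked pixels before the discriminator amounts to a simple composite, sketched below (mask = 1 on the seam bands to be in-painted, 0 elsewhere).

```python
import torch

def composite(generated, tiled_input, mask):
    """Copy the unmasked pixels of the input tiles over the generator output."""
    return mask * generated + (1.0 - mask) * tiled_input

# The discriminator is fed composite(G(x), x, mask) instead of G(x), so the
# generator is only judged on how well it fills the seam bands.
```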
4 Experiments
4.1 Dataset
We conducted experiments on our tree bark dataset. As of today it contains
3 tree-species: Oak, Beech, and Robinia (Locust).
To build this dataset we recorded videos of different trees. In total, we
acquired videos of: 20 oak trees, 15 beech trees, and 18 robinia trees. These
videos were captured with a DJI Osmo camera at 29.99 fps in full HD resolution
(1920×1080). All the trees were captured in local forests and streets
under natural lighting, with automatic exposure. Fig. 8 shows some frames
from the videos of oak trees. Each video consists of approx-
imately 7000 frames. Since we use a gyro-stabilized camera, the transition
between consecutive frames is smooth, with no major changes from one frame to the next.

Figure 8: Samples of oak images with different illuminations.

Most of the consecutive frames contain redundant information, so the images
are extracted from every tenth frame of the whole video sequence. This way,
the number of images for reconstruction is in the range of a few hundred,
which can be handled efficiently by the photogrammetry software Pix4DMapper.
For better reconstruction results, we recommend using hand-held gimbals and
a professional-grade camera with controlled brightness, although, as we
demonstrate here, satisfactory results can still be achieved without any of those.
The videos of the oak trees were acquired on a sunny day, and most of the
oaks in our dataset contain regions in shadow and parts illuminated by direct
sunlight and sun glare. Because of this, the 3D reconstructions and the
color maps extracted from the reconstructed point clouds contain regions
with varying color intensities. Considering this, we acquired the videos of
the beeches and robinias on an overcast day, so their bark colors are free
from drastic changes due to shadows. The contrast between these two datasets
allowed us to evaluate how our approach fared in challenging situations and
helped us assess the reliability of our pipeline.
Our approach requires semantic label maps for training. The semantic
maps for the tree barks were obtained by hand labelling each of the bark
images. We used an online labelling tool, Labelbox: https://labelbox.com.
Each of the bark color images was labelled into four categories, which are
represented with the following colors in the semantic maps:
1. Bark - yellow
2. Defect/scar - orange
3. Moss - red
4. Lichens - blue
The dataset used to train the GANs consists of maps of size 256 × 256.
To generate these images, we take the large surface and color maps and
extract crops from them with a stride of 16. This allowed us to artificially
augment the number of samples and train complex GANs with a fairly limited
amount of data. Once the images are acquired, the entire set of surface/depth
maps is normalized to the range 0 to 65535. Before being fed to the network,
all images are instance-normalized to the range [-1, 1].
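The tile extraction and normalisation can be sketched as follows; per-crop min-max scaling to [-1, 1] is our reading of the instance normalisation described above.

```python
# Sketch of the training-tile extraction: 256x256 crops with a stride of 16,
# each crop rescaled to [-1, 1].
import numpy as np

def extract_tiles(full_map, tile=256, stride=16):
    H, W = full_map.shape[:2]
    tiles = []
    for y in range(0, H - tile + 1, stride):
        for x in range(0, W - tile + 1, stride):
            crop = full_map[y:y + tile, x:x + tile].astype(np.float32)
            lo, hi = crop.min(), crop.max()
            tiles.append(2.0 * (crop - lo) / max(hi - lo, 1e-8) - 1.0)
    return np.stack(tiles)
```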
4.2 Baseline
4.2.1 Generation
To evaluate the generation quality of our network against other GANs, we
compare our Depth-Reinforced SPADE with two state-of-the-art GAN architec-
tures: SPADE [27] and pix2pixHD [37]. SPADE and pix2pixHD can generate
photo-realistic images from input semantic maps. To synthesize the surface
and color maps for bark generation from pix2pixHD and SPADE, we use a
two-stage pipeline. First, we provide the smooth surface map as input to
the network and obtain the generated surface map. Next, we train another
network, where we provide the generated surface map and semantic label as
input and obtain the color map. (1) For pix2pixHD, we provide the smooth
surface map through the encoder and obtain the generated surface maps.
Next, we concatenate the generated surface maps and semantic labels and
pass it through the encoder of another pix2pixHD network and obtain the
color maps. (2) For SPADE, since the network has been designed to work
with semantic labels, we provide the input surface maps through the encoder
and semantic labels through the SPADE ResNet blocks to obtain the surface
maps. Next, we provide the generated surface map through another encoder
and decoder of SPADE to obtain the color maps.
Instead of generating the surface and color maps using two independent
networks for pix2pixHD and SPADE, one could use a single encoder and
decoder network to generate surface and color maps with the outputs con-
catenated. However, synthesizing both the surface and color maps at the
same time using a single generator (encoder and decoder) and discriminator
resulted in the training becoming very unstable. This is due to the inherent
nature of the GANs where the generator and discriminator compete against
each other and are thus unstable while training. Concatenating two different
types of images (surface maps and color maps) for generation resulted in the
network not learning and failing to converge.
4.2.2 Tiling
To evaluate the results of our tiling technique, we compare ourselves to Edge-
Connect [24]. It consists of two networks: the first network adds edges in
the areas to be in-painted so that there is a continuation of the edges from
the surrounding areas. The second network fills the color in the in-painting
areas. Finally, before saving the images, they copy the areas other than the
in-painted ones to the generated images.
5 Results
5.1 3D Reconstruction and Surface Geometry Extraction
After acquiring the images and performing background suppression on them,
they were used to obtain 3D reconstructions of the trees using Pix4DMapper.
Examples of 3D point-clouds of an oak, beech, and robinia trees are shown
in Fig. 9.
Figure 9: 3D point clouds of oak, beech, and robinia trees
Once the 3D point clouds of the trees are acquired, the radius and color
information are extracted from them. The surface maps consist of the geometry
of the tree bark in polar coordinates, where the map contains the radius
values from the center points as a function of the height of the tree and the
angle. This is similar to cutting open a 3D cylinder and flattening it into a
plane. Examples of the maps obtained for oak, beech, and robinia
trees are shown in Fig. 10.
5.2 Architecture
In this section, we compare the results obtained from pix2pixHD, SPADE
and Depth-Reinforced SPADE for the generated surface and color maps.
Figure 10: (a) Real surface and color map of an oak tree (b) Surface and
color map of a beech tree (c) Surface and color map of a robinia tree. The
colorbar shows the radius values of the barks in metres.
Datasets pix2pixHD [37] SPADE [27] Ours
Oak 65.873 97.592 50.432
Beech 55.372 89.320 52.868
Robinia 55.043 75.534 45.096
Table 1: Quantitative evaluation of the different architectures for surface
map generation. The values here are the FID scores, where a lower score
indicates better performance.
5.2.1 Surface-Map Generation
Fig. 11 shows the generation results of the surface generators for pix2pixHD,
SPADE and Depth-Reinforced SPADE. The synthetic maps generated by
pix2pixHD are blurry. When the SPADE network was provided with the se-
mantic labels, it distorted the generated surface maps.
Figure 11: Different methods to generate surface and color tiles for oak,
beech and robinia barks (columns: Input, pix2pixHD, SPADE, Depth-Reinforced
SPADE, Ground Truth). The inputs are the semantic labels and the smooth
surface maps. Under pix2pixHD and SPADE: the first columns are the generated
surface maps; the second columns are the color maps generated when only the
semantic labels are provided as input; the third column is when both semantic
labels and smooth surface maps are provided as input.
Datasets   pix2pixHD [37]            SPADE [27]                Ours
           Acc    mIOU   FID         Acc    mIOU   FID         Acc    mIOU   FID
Oak        87.88  22.49  183.09      89.06  22.87  85.17       89.17  22.57  49.21
Beech      50.77  15.68  113.02      51.73  15.57  65.273      52.23  15.70  54.45
Robinia    N/A    N/A    715.72      N/A    N/A    164.19      N/A    N/A    57.51
Table 2: Quantitative evaluation of the different architectures. Higher ac-
curacy, mIOU and lower FID indicate better performance. The datasets are
highly imbalanced which explains the low mIoU results of the segmentation
networks.
Architecture Number of parameters
pix2pixHD 366M
SPADE 249.8M
Depth-Reinforced SPADE 207.6M
Table 3: Number of parameters in the networks. Note: for pix2pixHD and
SPADE, the values reported here are the total number of parameters for training
both the surface and the color generators.
The semantic labels interfered with the input smooth surface in the deep
layers of the decoder network. In contrast, the surface maps from
Depth-Reinforced SPADE are sharper than those of the other methods. This is
because the KL-Divergence loss preserves the input information in the latent
space. Moreover, there is no interference from the semantic labels, which
allows the network to generate finer bark structures compared to the other
two methods. This is the main reason for separating the surface and color
generation in the Depth-Reinforced SPADE architecture. The synthetic robinia
surface maps are particularly convincing, with realistic patterns that span
the entire generated tile. Overall, the results of the surface maps look good:
for each tree species, our GAN is capable of generating species-specific
patterns from a smooth surface. This can also be verified from the quantitative
metrics shown in Table 1.
We calculate the Fréchet Inception Distance (FID) [11] score between the
generated images and the ground truth images, and a lower score indicates
better performance.
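For reference, the FID compares the Gaussian statistics (mean μ, covariance Σ) of Inception features extracted from real and generated images; its standard definition from [11] is FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}).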
5.2.2 Color-Map Generation
Fig. 11 shows some generation results for the oak, beech and robinia trees.
For pix2pixHD and SPADE we have shown two versions of color map gen-
eration. In the first version, we provide only the semantic labels as input.
This is to see how the network behaves when provided with identical inputs.
In the second version, we provided the surface maps along with the semantic
labels as input to verify that surface maps can help the network generate
multi-modal color maps. The fourth column and seventh column in Fig. 11
are the color maps generated for pix2pixHD and SPADE respectively, when
only the semantic labels are provided as input to the network. Based on the
results, pix2pixHD fails to generate correct outputs with a constant label
map. This is a well known issue of pix2pixHD that the SPADE architec-
ture addresses. Even though SPADE solves the problem encountered with
pix2pixHD, SPADE is deterministic: given a label, the output of the gener-
ator will always be the same, which is demonstrated in column seven for the
robinia barks. To make these networks generate varying outputs, one must
introduce significant variability in the input. Thus, when these networks
were provided with the surface maps, they started to generate multi-modal
outputs as shown in columns five and eight of Fig. 11. In the case of Depth-
Reinforced SPADE, since we generate the surface maps and let it guide the
color generator, we always get unique color maps. Furthermore, if one has a
close look at the color maps synthesized by our GAN the results look sharper,
since we use the generated surface maps to reinforce the color maps which
enhances the texture of the generated maps and make them look sharp. More
results can be seen in appendix D, E, and F.
To compare the different networks quantitatively, we use the same metrics
as [27]: the pixel accuracy, the mean intersection over union (mIOU), and
the Fréchet Inception Distance (FID) [11] score. The semantic segmentation
which gives the pixel accuracy and mIoU scores, is applied to the generated
images and compared to the labels used to generate them. DeepLab
V3+ [3] is used to perform the semantic segmentation. As shown in Table 2,
the results provided by our architecture are superior with a lower FID across
all datasets, a slightly better accuracy and a comparable mIoU. Please note
that on the robinia trees we could not evaluate the mIoU or the accuracy as
there is only one label.
5.2.3 Controlling the Bark Synthesis
Figure 12: Different generation results where we draw unusual shapes on
beech barks.
Using our GAN we can bend mother nature to our will, allowing us to
painlessly draw familiar objects on the bark as illustrated in Fig. 12. In this
figure we chose to draw the Stanford bunny and a heart on the bark of a beech
tree. This allows us to demonstrate the benefits of using GANs to generate
tree barks: using GANs let us change the trees easily and realistically. The
main drawback of using GANs is that to correctly generate some labels they
have to appear quite often in the dataset on which they were trained. In our
case, we could not reproduce those results with the oak or robinia trees, as
their datasets had too few occurrences of moss or lichen.
Finally, we also compare the sizes of the pix2pixHD, SPADE and
Depth-Reinforced SPADE architectures in Table 3. For pix2pixHD and SPADE,
we report the total number of parameters for generating the surface and
the color. Since the surface and color generators of Depth-Reinforced SPADE
share the encoder, the number of parameters is reduced. This, in turn, reduces
the computational cost and the training time.
5.3 Tiling the Images
The generated images from the GAN networks were tiled using our modi-
fied pix2pix architecture. An example of a continuous image obtained by
combining four smaller color images of an oak bark is shown in Fig. 13.
Fig. 13(a) shows an image with four tiles combined together, which creates
discontinuities in the regions where they are joined. The discontinuities were
masked as shown in Fig. 13(b). As can be seen, only a small band surrounding
the discontinuities is masked, and only these regions are filled by the GAN.
Since we use a copying task in the architecture, where the unmasked regions
are retained, the majority of the original images is retained and their quality
is also maintained. Finally, the tiled image is shown in Fig. 13(c). The GAN
has filled the masked regions with suitable values so as to generate a
continuous image.
Fig. 14 shows a situation where the state-of-the-art in-painting method
Edge-Connect [24] fails to join the different tiles. It can be seen that, al-
though the network generates the right edge patterns, it generates radius
values with a range higher than the actual values. Lastly, as the copy task
is performed on the generated images, the difference in the radius values is
clearly visible. However, in our proposed architecture, the copy task is per-
formed before feeding the images to the discriminator. Hence, the generator
is forced to fill the missing parts with the correct range of radius, and so it
generates a continuous map.
5.4 Construction of the 3D Meshes
After the tiled radius and the corresponding color maps were obtained, they
were used to construct 3D meshes. The 3D models of oak, beech and robinia
trees without and with the color are shown in Fig. 15 and Fig. 16. From
the 3D mesh without color, the finer structural details of the tree barks can
be observed on the real and the GAN generated meshes. The GANs have
captured the nuances on the oak, beech and robinia barks, and the synthetic
3D models look as realistic as the original 3D model. Since there is more
control over the bark color generated, it results in realistic looking 3D mod-
els. More results can be seen in appendix D, E and F. To demonstrate the
robustness of the models to the input smooth bark geometry, we have generated
barks with bᵢ = 0 and three different input radius values. The radius values
are 0.20, 0.50 and 0.80, for all of which the network has successfully
generated the fine bark patterns and colors, as shown in Fig. 16.
To assess the quality of our generated 3D models of the tree barks, a
survey was conducted in which we presented a randomly mixed set of oak and
beech trees and asked our participants to differentiate the real barks
(3D meshes of the barks reconstructed from the real tree images) from the
generated barks. 160 participants contributed to the survey and were given
unlimited time to differentiate the barks. Among our participants, 34 were
forestry experts. At the beginning, the participants were shown samples of
real and generated barks of both the oak and beech species.
Figure 13: (a) The tiled image with discontinuities (b) Image after the discon-
tinuities are masked (c) Continuous image obtained from the tiling network
Figure 14: Example of a situation where Edge-Connect [24] fails (left to right: input, Edge-Connect, ours)
Figure 15: Synthetic and real barks of oak, beech and robinia (rows: generated without color, generated with color, real without color, real with color)
Our survey consisted of 12 samples: 6 oak and 6 beech. Based on the survey
analysis, the precision and recall of the participants in detecting real trees
were calculated. The results are given in Table 4. We recall that precision =
TP/(TP + FP) and recall = TP/(TP + FN).

Figure 16: Synthetic oak, beech and robinia barks generated with bᵢ = 0 and
three different radius values in the smooth bark: the value of a in Eq. (1)
is 0.20 for bark 1, 0.50 for bark 2 and 0.80 for bark 3. Irrespective of the
radius values used, the network is able to generate realistic barks and colors.
From Table 4, we can see that the precision of the participants is around
55%, which shows that a large portion of them mistook our synthetic trees for
real trees. Additionally, the recall, which evaluates the ability of the classifier
to find all the positive samples, is close to 0.6, which also indicates that the
participants had trouble differentiating the real barks from the synthetic
ones. All in all, these results show that our GAN-generated trunks are realistic
enough to fool around 40% of people, experts and laymen alike.
6 Limitations and Future Work
As of today, our work faces some limitations. The first limitation comes from
the very nature of our approach.
Table 4: Precision and recall for the detection of real 3D models of oak and
beech trees. Higher is better.

          Laymen                 Experts
          Precision   Recall     Precision   Recall
Oak       0.54        0.60       0.51        0.56
Beech     0.60        0.63       0.55        0.59
Overall   0.57        0.615      0.53        0.585
Because we are using a data-driven methodology, we need a significant amount of data to train our neural networks.
This is particularly true for the robinia and oak trees, which had very little
moss and lichen on them, making the network incapable of synthesizing the
Stanford bunny. The second limitation of this work is related to the tiling
technique, which could be improved: the junctions between the tiles are not
perfectly smooth along the height. New methodologies, such as SRFlow [22]
or GLOW [17], could be investigated. Yet, this tiling problem could also be
partially solved by acquiring the trees under controlled illumination. This
would give uniform colors and illumination across the dataset, making the
GANs' generation much more homogeneous on the color side. Furthermore,
higher-quality recording hardware would provide higher-resolution color and
surface maps and hence sharper GAN results. Our method works well on trees
with a cylindrical structure; however, it cannot handle non-cylindrical trees
such as those with buttress roots. In an effort to increase the realism of our
method, future work will focus on applying different styles to generate barks
of different ages, allowing us to generate a broader diversity of tree barks.
Finally, we will collect a larger dataset with more trees and more species.
7 Conclusion
In this work, we propose a novel pipeline to generate realistic tree barks
from a set of images. Deviating from the conventional methods, we use a
data-driven generation method relying on deep neural networks to generate
the tree barks. Once a deep neural network is trained, little manual
intervention is required to generate new barks. This is in contrast to
traditional methods, which require manual tuning of parameters to generate
barks. Additionally, our method gives extensive control to the users
to generate additional features such as moss, scars, etc. We first created an
efficient method using self-supervised learning to suppress the background
portions of the image sequences, which yields better-quality tree models and,
at the same time, speeds up the reconstruction. We also proposed a GAN
architecture called Depth-Reinforced-SPADE. This GAN takes as input
both a label map and a smooth surface map. From these two inputs, it gener-
ates a detailed bark surface and the color map simultaneously. Instead of
using two separate GANs to generate the surface/depth and color, Depth-
Reinforced-SPADE can generate both simultaneously, thus saving time and
computational resources. We also presented a method to tile smaller images
obtained from our GANs to produce a continuous map of the surface and
color of the tree barks. Our method was successfully tested on trees with
smooth and ridged barks and yielded high-quality tree barks, implying
that our method could easily be extended to other tree bark types to obtain
genuine-looking barks. Finally, we demonstrated that our GANs enable users
to synthesize bark tiles with user-defined inputs like a bunny or a heart, the
only limitation being the diversity of the dataset the GAN was trained on.
8 Acknowledgement
This research was made possible with the support from the French National
Research Agency, in the framework of the project WoodSeer, ANR-19-CE10-
011.
References
[1] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” arXiv
preprint arXiv:1701.07875, 2017.
[2] J. Bloomenthal, “Modeling the mighty maple,” ACM SIGGRAPH Com-
puter Graphics, vol. 19, no. 3, pp. 305–311, 1985.
[3] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-
decoder with atrous separable convolution for semantic image segmen-
tation,” in Proceedings of the European conference on computer vision
(ECCV), 2018, pp. 801–818.
[4] X. Chen, B. Neubert, Y.-Q. Xu, O. Deussen, and S. B. Kang, “Sketch-
based tree modeling using markov random field,” in ACM SIGGRAPH
Asia 2008 papers, 2008, pp. 1–9.
[5] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, “Star-
gan: Unified generative adversarial networks for multi-domain image-to-
image translation,” in Proceedings of the IEEE conference on computer
vision and pattern recognition, 2018, pp. 8789–8797.
[6] B. Desbenoit, E. Galin, and S. Akkouche, “Modeling cracks and frac-
tures,” The Visual Computer, vol. 21, no. 8-10, pp. 717–726, 2005.
[7] A. Frühstück, I. Alhashim, and P. Wonka, “Tilegan: synthesis of
large-scale non-homogeneous textures,” ACM Transactions on Graphics
(TOG), vol. 38, no. 4, pp. 1–11, 2019.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,”
in Advances in neural information processing systems, 2014, pp. 2672–
2680.
[9] J. Guo, S. Xu, D.-M. Yan, Z. Cheng, M. Jaeger, and X. Zhang, “Re-
alistic procedural plant modeling from multiple view images,” IEEE
transactions on visualization and computer graphics, 2018.
[10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochre-
iter, “Gans trained by a two time-scale update rule converge to a local
nash equilibrium,” in Advances in neural information processing sys-
tems, 2017, pp. 6626–6637.
[12] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image transla-
tion with conditional adversarial networks,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2017, pp. 1125–
1134.
[13] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time
style transfer and super-resolution,” in European conference on computer
vision. Springer, 2016, pp. 694–711.
[14] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing
of gans for improved quality, stability, and variation,” arXiv preprint
arXiv:1710.10196, 2017.
[15] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture
for generative adversarial networks,” in Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
[16] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila,
“Analyzing and improving the image quality of stylegan,” arXiv preprint
arXiv:1912.04958, 2019.
[17] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invert-
ible 1x1 convolutions,” in Advances in neural information processing
systems, 2018, pp. 10 215–10 224.
[18] J. Kratt, M. Spicker, A. Guayaquil, M. Fiser, S. Pirk, O. Deussen, J. C.
Hart, and B. Benes, “Woodification: User-controlled cambial growth
modeling,” in Computer Graphics Forum, vol. 34, no. 2. Wiley Online
Library, 2015, pp. 361–372.
[19] P. Laitoch, “Procedural modeling of tree bark,” Bachelor thesis, Charles
University, Prague, 2018.
[20] C. Li, O. Deussen, Y.-Z. Song, P. Willis, and P. Hall, “Modeling and
generating moving trees from video,” ACM Transactions on Graphics
(TOG), vol. 30, no. 6, pp. 1–12, 2011.
[21] Z. Liu, K. Wu, J. Guo, Y. Wang, O. Deussen, and Z. Cheng, “Single
image tree reconstruction via adversarial network,” Graphical Models,
vol. 117, p. 101115, 2021.
[22] A. Lugmayr, M. Danelljan, L. Van Gool, and R. Timofte, “Srflow:
Learning the super-resolution space with normalizing flow,” arXiv
preprint arXiv:2006.14200, 2020.
[23] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least
squares generative adversarial networks,” in Proceedings of the IEEE
International Conference on Computer Vision, 2017, pp. 2794–2802.
[24] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi, “Edge-
connect: Generative image inpainting with adversarial edge learning,”
arXiv preprint arXiv:1901.00212, 2019.
[25] B. Neubert, T. Franken, and O. Deussen, “Approximate image-based
tree-modeling using particle flows,” in ACM SIGGRAPH 2007 papers,
2007, pp. 88–es.
[26] M. Okabe, S. Owada, and T. Igarashi, “Interactive design of botani-
cal trees using freehand sketches and example-based editing,” in ACM
SIGGRAPH 2006 Courses, 2006, pp. 18–es.
[27] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image
synthesis with spatially-adaptive normalization,” in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, 2019,
pp. 2337–2346.
[28] S. Pix4D, “Pix4dmapper 4.1 user manual,” Pix4D SA: Lausanne,
Switzerland, 2017.
[29] L. Quan, P. Tan, G. Zeng, L. Yuan, J. Wang, and S. B. Kang, “Image-
based plant modeling,” in ACM SIGGRAPH 2006 Papers, 2006, pp.
599–604.
[30] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation
learning with deep convolutional generative adversarial networks,” arXiv
preprint arXiv:1511.06434, 2015.
[31] T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Sten-
borg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic, et al., “Bench-
marking 6dof outdoor visual localization in changing conditions,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2018, pp. 8601–8610.
[32] J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,”
in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2016, pp. 4104–4113.
[33] J. L. Schönberger, E. Zheng, J.-M. Frahm, and M. Pollefeys, “Pixelwise
view selection for unstructured multi-view stereo,” in European Confer-
ence on Computer Vision. Springer, 2016, pp. 501–518.
[34] W. J. Schroeder, B. Lorensen, and K. Martin, The visualization toolkit:
an object-oriented approach to 3D graphics. Kitware, 2004.
[35] P. Tan, T. Fang, J. Xiao, P. Zhao, and L. Quan, “Single image tree
modeling,” ACM Transactions on Graphics (TOG), vol. 27, no. 5, pp.
1–7, 2008.
[36] B. Tao, Z. Changshui, and S. Wei, “A multi-agent based approach to
modelling and rendering of 3d tree bark textures,” in European Confer-
ence on Artificial Life. Springer, 2003, pp. 572–579.
[37] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catan-
zaro, “High-resolution image synthesis and semantic manipulation with
conditional gans,” in Proceedings of the IEEE conference on computer
vision and pattern recognition, 2018, pp. 8798–8807.
[38] X. Wang, L. Wang, L. Liu, S. Hu, and B. Guo, “Interactive model-
ing of tree bark,” in 11th Pacific Conference on Computer Graphics and
Applications, 2003. Proceedings. IEEE, 2003, pp. 83–90.
[39] K. Xie, F. Yan, A. Sharf, O. Deussen, H. Huang, and B. Chen, “Tree
modeling with real tree-parts examples,” IEEE transactions on visual-
ization and computer graphics, vol. 22, no. 12, pp. 2608–2618, 2015.
[40] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention gen-
erative adversarial networks,” arXiv preprint arXiv:1805.08318, 2018.
[41] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing
network,” in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2017, pp. 2881–2890.
[42] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image
translation using cycle-consistent adversarial networks,” in Computer
Vision (ICCV), 2017 IEEE International Conference on, 2017.
A Learning Objective
We use the same loss functions as pix2pixHD [37], namely the GAN loss, the
feature-matching loss, the perceptual loss, and the KL-divergence loss.
A.1 GAN Loss
The objective of the cGAN is to optimize the minimax loss function of the
GAN. The GAN loss for the surface generator is given by
\mathcal{L}_{GAN_s} = \mathbb{E}_{(x,s)}[\log D_s(x, s)] + \mathbb{E}_{x}[\log(1 - D_s(x, G_s(x)))]   (3)
where $s$ is the real surface map, $D_s$ is the prediction from the surface
discriminator, $x$ is the input smooth surface map, and $G_s(x)$ is the surface
map generated by the surface generator. Similarly, the GAN loss of the color
generator is given by:
\mathcal{L}_{GAN_c} = \mathbb{E}_{(l,c)}[\log D_c(l, c)] + \mathbb{E}_{(l,x)}[\log(1 - D_c(l, G_c(l, x)))]   (4)
where $l$ is the input semantic label, $c$ is the real color image, and
$G_c(l, x)$ is the color map generated by the color generator. $D_c(l, c)$ is
the prediction from the discriminator when provided with the semantic labels
and the real color image, and $D_c(l, G_c(l, x))$ is its prediction when
provided with the semantic labels and the generated color image. The total GAN
loss is the sum of the surface and color GAN losses:
\mathcal{L}_{GAN} = \mathcal{L}_{GAN_s} + \mathcal{L}_{GAN_c}   (5)
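For concreteness, a minimal PyTorch sketch of Eqs. (3)–(4), written as discriminator losses in the vanilla cross-entropy formulation, is given below. The module interfaces (D_s, G_s, D_c, G_c taking these arguments) are illustrative assumptions, and the actual pix2pixHD/SPADE code uses hinge or least-squares variants of the adversarial objective rather than this exact form.

```python
import torch
import torch.nn.functional as F

def gan_loss_surface(D_s, G_s, x, s):
    """Eq. (3) as a loss to minimize for the surface discriminator.
    x: input smooth surface map, s: real surface map."""
    fake_s = G_s(x)
    d_real = D_s(x, s)                # D_s(x, s)
    d_fake = D_s(x, fake_s.detach())  # D_s(x, G_s(x)), detached for the D update
    # BCE-with-logits gives -log D on real and -log(1 - D) on fake samples.
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def gan_loss_color(D_c, G_c, l, x, c):
    """Eq. (4) as a loss to minimize for the color discriminator.
    l: semantic label map, x: smooth surface map, c: real color image."""
    fake_c = G_c(l, x)
    d_real = D_c(l, c)
    d_fake = D_c(l, fake_c.detach())
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
```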
A.2 Feature Matching Loss
The feature-matching loss is calculated between the real and synthesized
images. We extract the features of the real and synthetic images from the
multi-layer discriminator and calculate the L1 distance between the extracted
features. The feature-matching loss for the surface generator is
\mathcal{L}_{FM_s} = \mathbb{E}_{(x,s)} \sum_{i=1}^{T} \frac{1}{N_i} \left[ \lVert D_s^{(i)}(x, s) - D_s^{(i)}(x, G_s(x)) \rVert_1 \right]   (6)
where $T$ is the number of layers in the discriminator and $N_i$ is the number
of elements in each layer. Similarly, the feature-matching loss for the color
generator is
\mathcal{L}_{FM_c} = \mathbb{E}_{(l,c)} \sum_{i=1}^{T} \frac{1}{N_i} \left[ \lVert D_c^{(i)}(l, c) - D_c^{(i)}(l, G_c(l, x)) \rVert_1 \right]   (7)
The total feature-matching loss is
\mathcal{L}_{FM} = \mathcal{L}_{FM_s} + \mathcal{L}_{FM_c}   (8)
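A sketch of Eqs. (6)–(8), assuming the discriminator exposes its intermediate activations as a list of tensors (as the multi-scale discriminators in pix2pixHD do); the list-based interface is an assumption, not the exact API of our training code.

```python
import torch

def feature_matching_loss(feats_real, feats_fake):
    """L1 distance between discriminator features of the real and generated
    images, normalized by the number of elements N_i in each layer.

    feats_real / feats_fake: lists of T intermediate activations,
    e.g. [D^(1)(x, s), ..., D^(T)(x, s)].
    """
    loss = torch.zeros(())
    for f_real, f_fake in zip(feats_real, feats_fake):
        # Real features are detached: only the generator receives this gradient.
        loss = loss + torch.abs(f_real.detach() - f_fake).sum() / f_real.numel()
    return loss
```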
A.3 Perceptual Loss
We use a pre-trained VGG network to extract pair-wise features of the real
and synthetic images at different layers of the network and calculate the L1
distance between the corresponding feature maps. This loss is denoted $\mathcal{L}_p$.
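A minimal sketch of such a VGG-based perceptual loss; the choice of feature layers and the absence of per-layer weights are assumptions of this sketch and may differ from the layer set used in our training code.

```python
import torch
import torchvision

class PerceptualLoss(torch.nn.Module):
    """L_p: L1 distance between VGG features of real and synthetic images."""

    def __init__(self, layer_ids=(3, 8, 17, 26)):
        # layer_ids index into vgg19.features; the exact cut points are assumed.
        super().__init__()
        self.vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features.eval()
        self.layer_ids = set(layer_ids)
        for p in self.vgg.parameters():
            p.requires_grad = False  # VGG is frozen; it only provides features

    def forward(self, fake, real):
        loss, x, y = torch.zeros(()), fake, real
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + torch.nn.functional.l1_loss(x, y.detach())
        return loss
```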
A.4 KL-Divergence Loss
The KL-Divergence Loss is calculated for the encoder of the network. This
is given by
\mathcal{L}_{KD} = D_{KL}\left[\, Q(z \mid x) \,\|\, P(z \mid x) \,\right]   (9)
where $x$ is the input smooth surface map and $z$ is the latent code obtained
from the encoder. $P$ is the true distribution and $Q$ is its simpler
approximation.
The total loss is the sum of all the loss functions:
\mathcal{L}_{total} = \mathcal{L}_{GAN} + \mathcal{L}_{FM} + \mathcal{L}_p + \mathcal{L}_{KD}   (10)
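A short sketch of Eqs. (9)–(10), assuming the encoder predicts a Gaussian posterior through a mean and log-variance and that the prior is a standard normal, which yields the usual closed-form KL term; any relative weighting of the four terms in the actual training code is omitted here.

```python
import torch

def kl_divergence_loss(mu, logvar):
    """Eq. (9) under the assumption Q(z|x) = N(mu, diag(exp(logvar))) and a
    standard-normal prior (closed-form KL divergence)."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

def total_loss(l_gan, l_fm, l_p, l_kd):
    """Eq. (10): unweighted sum of the adversarial, feature-matching,
    perceptual and KL terms."""
    return l_gan + l_fm + l_p + l_kd
```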
B Training Details
B.1 pix2pix
We used the original parameters proposed in [12] for our experiments; the code
can be found at https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git.
The learning rates of the generator and discriminator were 0.0002. We used the
Adam solver with $\beta_1 = 0$ and $\beta_2 = 0.999$, with a batch size of 8.
The experiments were conducted on an 8 GB Nvidia GeForce GTX 1080.
B.2 pix2pixHD
We used the original parameters of [37] for training the network. The learning
rate was 0.0002, and we used the Adam optimizer with $\beta_1 = 0.5$ and
$\beta_2 = 0.999$. The code for the network can be found at:
https://github.com/NVIDIA/pix2pixHD.git. The experiments were conducted on an
11 GB Nvidia GeForce GTX 1080Ti, with a batch size of 4.
B.3 SPADE and Depth-Reinforced-SPADE
We used the original parameters proposed in [27] for our experiments; the code
used to train the SPADE GANs can be found at: https://github.com/NVlabs/SPADE.git.
The learning rates of the generator and discriminator were 0.0001 and 0.0004,
respectively. We used the Adam solver with $\beta_1 = 0$ and $\beta_2 = 0.999$.
The experiments on the tree bark dataset were conducted on an 11 GB Nvidia
GeForce GTX 1080Ti, with a batch size of 4.
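For reference, a minimal sketch of how these optimizers would be set up with the rates quoted above; the `netG`/`netD` modules below are placeholders standing in for the Depth-Reinforced-SPADE networks, and only the optimizer settings reflect the text.

```python
import torch
import torch.nn as nn

# Placeholder generator/discriminator; the real networks are far larger.
netG = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
netD = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))

# Two-time-scale update rule: generator at 1e-4, discriminator at 4e-4,
# Adam with beta1 = 0 and beta2 = 0.999.
optimizer_G = torch.optim.Adam(netG.parameters(), lr=1e-4, betas=(0.0, 0.999))
optimizer_D = torch.optim.Adam(netD.parameters(), lr=4e-4, betas=(0.0, 0.999))
```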
B.4 Edge-Connect
The code for training the network can be found at:
https://github.com/knazeri/edge-connect.git. The learning rate of the
generator was 0.0001. We used the Adam solver with $\beta_1 = 0$ and
$\beta_2 = 0.999$. The experiments on the tree bark dataset were conducted on
an 11 GB Nvidia GeForce GTX 1080Ti, with a batch size of 4.
C Effect of limiting the depth passed to the color generator
To study the effect of the convolution layers that limit the passage of depth
information between the depth and color generators in our decoder architecture,
we conduct an experiment on the Oak bark dataset. When all the layers from the
depth generator are transferred to the color generator unrestricted, the
semantic information gets suppressed by the depth, as seen in Figure 17: the
network generates moss even on the regions specified as bark. In contrast, when
the depth information is transferred selectively through the convolution layers,
the network generates the moss and the bark at their respective locations
without overlap. The convolution layers thus balance the semantic and depth
information, which leads to good quality generation; a minimal sketch of this
channel-limiting idea is given below.
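In the sketch, a 1x1 convolution reduces the number of depth-generator channels before they are merged with the color branch, so the semantic features are not overwhelmed. The class name, channel counts, and the concatenation-based fusion are illustrative assumptions, not the exact Depth-Reinforced-SPADE layer definitions.

```python
import torch
import torch.nn as nn

class DepthToColorBridge(nn.Module):
    """Passes depth features to the color branch through a 1x1 convolution
    that restricts the number of transferred channels."""

    def __init__(self, depth_channels=256, passed_channels=32, color_channels=256):
        super().__init__()
        # Bottleneck: only `passed_channels` worth of depth features go through.
        self.limit = nn.Conv2d(depth_channels, passed_channels, kernel_size=1)
        self.fuse = nn.Conv2d(color_channels + passed_channels, color_channels,
                              kernel_size=3, padding=1)

    def forward(self, depth_feat, color_feat):
        restricted = self.limit(depth_feat)                  # restricted depth cue
        fused = torch.cat([color_feat, restricted], dim=1)   # semantics stay dominant
        return self.fuse(fused)
```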
Figure 17: Effect of limiting the number of channels passed between the
depth generator and the color generator. Left: Input label, indicating moss
and bark. Center: When no convolution layer is used, the semantic informa-
tion is suppressed by the depth. Right: When a convolution layer is used,
both the depth and the semantic information are used to generate the bark.
D Method results: Oak
Figure 18: Point clouds
Figure 19: Color Maps and Surface Maps
Figure 20: Generated Tiles
Figure 21: Synthetic Trunks
E Method results: Beech
Figure 22: Point clouds
Figure 23: Color Maps and Surface Maps
Figure 24: Generated Tiles
Figure 25: Synthetic Trunks
F Method results: Robinia
Figure 26: Point clouds
Figure 27: Color Maps and Surface Maps
Figure 28: Generated Tiles
Figure 29: Synthetic Trunks