Sparse-View CT Reconstruction via Generative
Adversarial Networks
Zhongwei Zhao, Student Member, IEEE, Yuewen Sun, Student Member, IEEE, and Peng Cong*
Abstract—Low-dose and sparse-view CT are effective approaches to reducing the radiation dose and accelerating the scan. However, images reconstructed from the insufficient data acquired by low-dose and sparse-view CT suffer from severe streaking artifacts, so reducing the radiation dose degrades the imaging quality. Several attempts have been made to remove these artifacts using deep learning methods such as CNNs. Although deep learning methods for low-dose and sparse-view CT reconstruction have achieved impressive success, the reconstruction results are still over-smoothed. In this work, we propose an artifact reduction method for low-dose and sparse-view CT based on a single model trained with generative adversarial networks (GAN). Several numerical simulation experiments are conducted to test the performance of our network. The results show that our GAN significantly reduces the streaking artifacts compared with the FBP method, and preserves more detailed information than the CNN.
Index Terms—low dose, sparse view, GAN.
I. INTRODUCTION
COMPUTED tomography (CT) is an effective non-destructive testing method that is widely used in clinical diagnosis, security inspection, and industrial detection. Although x-rays enable us to inspect the internal structure of an object, they deliver a radiation dose that is harmful to patients in clinical diagnosis. Thus, decreasing the radiation dose is a perennial topic in medical CT. Reducing the number of views is an effective way to reduce the radiation dose while accelerating the scan, which is a great improvement for medical imaging. However, images reconstructed from such insufficient data are often associated with severe streak artifacts. To address this problem, optimization-based iterative methods, such as total variation minimization, low-rank models, and dictionary learning, have been researched comprehensively. However, iterative reconstruction algorithms require a large amount of time, which is unacceptable for quick checks.
Recently, neural networks, especially deep convolutional neural networks (CNNs), have achieved impressive success in computer vision tasks, and they have also been introduced to solve the low-dose and sparse-view CT reconstruction problem. Most researchers employ CNNs to reduce streaking artifacts, outperforming the traditional methods in reconstruction quality. However,
Manuscript received December 15, 2018; revised January 8, 2019. This work is under the support of China Nuclear Energy Development Project 2015460298.
Z. Zhao, Y. Sun, and P. Cong are with the Institute of Nuclear and New Energy Technology, Tsinghua University.
Fig. 1. Workflow of the proposed method: the sparse-view sinogram is completed by an estimation network (CNN) into a full-view sinogram, reconstructed by FBP/WLS, and refined by an estimation network (GAN) to produce the final prediction.
the target loss functions of those methods are MSE-based, which leads to over-smoothing and the loss of high-frequency details in the reconstructed images. Most recently, Liang et al. [1] proposed a generative adversarial network (GAN) based method, which achieves better perceptual performance.
Although both CNN- and GAN-based methods have been proposed to address the artifacts caused by insufficient data, few attempts have been made to combine their respective advantages. CNNs with MSE loss functions are trained to minimize the differences between outputs and label data, which makes them well suited to completing the missing projection data. GANs with a perceptual loss function produce images of high perceptual quality, which makes them well suited to restoring the reconstructed images.
In this paper, we take three steps to address the problem: 1) train a convolutional neural network in the projection domain to complete the data; 2) reconstruct the images from the completed projection data using the FBP or WLS method; 3) train a generative adversarial network with a perceptual loss function to produce the final outputs from the reconstructed images. The proposed method is evaluated on clinical images and delivers superior performance in both perceptual quality and computing speed compared with the traditional methods.
II. METHOD
A. CT reconstruction and artifact analysis
The FBP algorithm is used to reconstruct a slice image from the sinogram. The sparse-view sinogram is first interpolated to a full-view sinogram. Then, the Ram-Lak filter is applied in the frequency domain along the columns of the full-view sinogram. Finally, the full-view sinogram is weighted by pixel position, and backprojection is performed to obtain the reconstructed image. Due to the information lost during interpolation, the reconstructed image becomes blurry.
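A minimal sketch of this reconstruction pipeline is given below, assuming scikit-image's iradon (whose "ramp" filter corresponds to the Ram-Lak filter) in place of the authors' own implementation; the function name and the linear interpolation choice are illustrative:

```python
import numpy as np
from scipy.interpolate import interp1d
from skimage.transform import iradon

def fbp_from_sparse(sino_sparse, sparse_angles, full_angles):
    """Interpolate a sparse-view sinogram to full view, then apply FBP.

    sino_sparse: array of shape (n_detectors, n_sparse_views);
    angles are in degrees.
    """
    # Fill in the missing views by linear interpolation along the angle axis.
    interp = interp1d(sparse_angles, sino_sparse, axis=1,
                      kind="linear", fill_value="extrapolate")
    sino_full = interp(full_angles)
    # Ram-Lak ("ramp") filtering and weighted backprojection happen inside iradon.
    return iradon(sino_full, theta=full_angles, filter_name="ramp")

# Example: upsample 30 views to 180 views before reconstruction.
sparse_angles = np.linspace(0.0, 180.0, 30, endpoint=False)
full_angles = np.linspace(0.0, 180.0, 180, endpoint=False)
```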
To reduce the information loss and improve the image quality, a CNN is used for interpolation and a GAN is used to recover image details.

Fig. 2. Architecture of the interpolation network: four blocks of 3×3 convolutional layers with internal ReLU activations map the sparse-view data to completed-view data; the feature-map widths follow a 64-64-16-16-64-64 bottleneck.
B. Network Architectures
The aim of the sparse-view CT reconstruction problem is to generate a high-quality tomography image $I_{HT}$ from sparse-view projection data $P_s$. The ground truth image $I_{GT}$, reconstructed from the full-view projection data $P_F$, is the learning target for $I_{HT}$.
Two approaches can be taken to address the problem:
1) Interpolate the sparse-view projection data $P_s$ to completed-view projection data $P_c$, minimizing the difference between $P_c$ and $P_F$; then obtain the reconstructed image from $P_c$ by reconstruction algorithms such as FBP or iterative algorithms.
2) Reconstruct the image $I_R$ from the projection data, then suppress the artifacts of $I_R$ to obtain the tomography image.
In this paper, we combine the two approaches with deep learning. First, an interpolation convolutional network, trained to minimize the difference from $P_F$, is used to obtain the completed-view projection data $P_c$ from the sparse-view projection data $P_s$. Then, the reconstructed image $I_R$ is obtained using the reconstruction algorithms. Finally, a generative adversarial network is proposed to output the high-quality tomography image $I_{HT}$, which has the minimum perceptual difference from the ground truth image $I_{GT}$. The whole process is shown in Fig. 1.
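The three-stage workflow might be expressed schematically as follows; the function names are placeholders rather than the authors' code:

```python
def svgan_reconstruct(sino_sparse, interp_cnn, fbp, estimation_gan):
    """Three-stage pipeline of Fig. 1 (names are placeholders)."""
    # 1. Projection domain: complete the sparse-view sinogram with the CNN.
    sino_completed = interp_cnn(sino_sparse)
    # 2. Analytic reconstruction (FBP or WLS) of the intermediate image.
    image_intermediate = fbp(sino_completed)
    # 3. Image domain: restore perceptual detail with the GAN generator.
    return estimation_gan(image_intermediate)
```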
As Fig. 2 shows, the interpolation network is composed of four blocks. The blocks use skip connections similar to those of ResNet. Furthermore, the input data is added to the last layer, which helps to ease the difficulty of training. To avoid losing information, no pooling layers are used in the network, while ReLU activation layers are placed inside the blocks. A bottleneck architecture is introduced to reduce the computational complexity while maintaining good performance: the first and last two layers of each block have 64 feature maps, while the middle layers have 16. Following common practice in CNNs for low-level computer vision problems, 3×3 kernels are used in each convolutional layer.
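A minimal Keras sketch of such an interpolation network is given below; the exact placement of the 64-64-16-16-64-64 bottleneck within each block is our assumption, since the text leaves it open:

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x):
    """One block: 3x3 convs with 64-64-16-16-64-64 feature maps and a skip."""
    shortcut = x
    for filters in (64, 64, 16, 16, 64, 64):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.ReLU()(x)  # ReLU layers stay inside the block; no pooling
    return layers.Add()([x, shortcut])

def build_interpolation_network(input_shape=(None, None, 1)):
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    for _ in range(4):  # four blocks, as in Fig. 2
        x = bottleneck_block(x)
    x = layers.Conv2D(1, 3, padding="same")(x)
    # The input is added to the last layer to ease training.
    out = layers.Add()([x, inp])
    return tf.keras.Model(inp, out)
```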
Similar to the interpolation network, the generator network is trained to minimize the perceptual loss function, in which the output of the discriminator network is an essential part. The discriminator CNN $D$ is trained to discriminate between generated images and ground truth images. As shown in Fig. 3, $D$ has six convolutional blocks and two fully-connected layers. Each convolutional block consists of a convolutional layer, a batch normalization layer, and a leaky ReLU activation layer. The kernel size $K$ of the convolutional layers is 3×3, and the number of filters $N$ increases from 64 to 256. The stride $S$ of a convolutional layer is 2 whenever the image resolution is reduced as the number of features doubles. A single-output fully-connected layer is applied to the outputs of the last fully-connected layer of 1024 neurons and produces the probability that the input image is a noise-free image.
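The discriminator might be assembled as in the following Keras sketch; the leaky-ReLU slope of 0.2 is an assumed value not stated in the text:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(64, 64, 1)):
    inp = layers.Input(shape=input_shape)
    x = inp
    # Six blocks: conv -> batch norm -> leaky ReLU, filters 64 to 256,
    # with stride 2 halving the resolution after each filter count.
    for filters, stride in [(64, 1), (64, 2), (128, 1), (128, 2),
                            (256, 1), (256, 2)]:
        x = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)  # slope 0.2 is an assumed value
    x = layers.Flatten()(x)
    x = layers.Dense(1024)(x)
    x = layers.LeakyReLU(0.2)(x)
    # Single sigmoid output: probability that the input is a noise-free image.
    out = layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inp, out)
```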
C. Loss Function
The aim of the interpolation network is to generate completed-view projection data $P_c$ as similar as possible to the full-view projection data $P_F$. So we train a convolutional neural network to minimize the MSE loss between $P_c$ and $P_F$:

$$W, b = \arg\min_{W,b} \frac{1}{N}\left\|F_{W,b}(P_s) - P_F\right\|_2^2 \qquad (1)$$

where $W$ and $b$ denote the weights and biases of the network, and $N$ denotes the number of training pairs.
Considering the performance and speed of the proposed method, the reconstruction algorithm can be FBP or various iterative methods. Because the MSE loss tends to over-smooth the details of the output image, it is not a suitable loss function for this problem. Motivated by the work of Ledig et al. [2] on super-resolution, we propose to train a generative adversarial network to minimize the perceptual loss function:

$$W, b = \arg\min_{W,b} \frac{1}{N}\sum_{i} P\left(F_{W,b}(I^i_{NC}),\, I^i_{NF}\right) \qquad (2)$$

where $P$ denotes the perceptual loss function, which consists of four parts: MSE loss, adversarial loss, content loss, and total variation loss. It can be written as:

$$P = L_{MSE} + L_{ADV} + L_{Content} + L_{TV} \qquad (3)$$
Among them, the MSE loss minimizes the pixel-wise differences between the input and label images, acting as a value constraint on the generated image. It can be written as:

$$L_{MSE} = \frac{1}{N^2}\left\|F_{W,b}(I^i_{NC}) - I^i_{NF}\right\|^2 \qquad (4)$$
The adversarial loss is defined based on the probabilities given by a discriminator network $D$:

$$L_{ADV} = -\sum_{i=0}^{N} \log D\left(F_{W,b}(I^i_{NC})\right) \qquad (5)$$
The discriminator network is trained to predict whether the input images are ground truth, which forces the generator network to generate high-frequency information to fool the discriminator. Thus, the generated images contain more texture details, which makes them more perceptually satisfying. However, some details produced by the adversarial loss do not exist in the original image.
The content loss is based on the high-level features of the images extracted by a pre-trained neural network, which helps to make images perceptually satisfying [].

Fig. 3. GAN architecture. The generator mirrors the interpolation network: four blocks of 3×3 convolutions with a 64-64-16-16-64-64 bottleneck and internal ReLU activations. The discriminator consists of six convolutional blocks (K=3; N = 64, 64, 128, 128, 256, 256; S alternating between 1 and 2), each comprising a convolutional layer, batch normalization, and leaky ReLU, followed by FCN(1024) with leaky ReLU and FCN(1), which outputs the probability that the input is the ground truth.

Thus, in this study, we define the content loss based on the activation maps produced by the ReLU layers of the pre-trained VGG-19 network. The content loss is defined as the Euclidean distance between the feature representations of the denoised and noise-free images:
$$L_{Content} = \frac{1}{C_R H_R W_R}\left\|R\left(F_{W,b}(I^i_{NC})\right) - R\left(I^i_{NF}\right)\right\| \qquad (6)$$

where $R$ denotes the feature maps obtained by VGG-19, and $C_R$, $H_R$, $W_R$ denote the number, height, and width of the feature maps.
The total variation (TV) loss is introduced to suppress the artifacts of the generated images caused by the adversarial loss. It is defined as:

$$L_{TV} = \frac{1}{HW}\left\|\nabla_x F_{W,b}(I^i_{NC}) + \nabla_y F_{W,b}(I^i_{NC})\right\| \qquad (7)$$

where $H$ and $W$ denote the height and width of the generated image $F_{W,b}(I^i_{NC})$.
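The four loss terms of Eqs. (3)-(7) could be sketched in TensorFlow as below; the choice of VGG-19 feature layer, the omission of VGG input preprocessing, and the unweighted sum are our assumptions:

```python
import tensorflow as tf

# Feature extractor for the content loss; "block5_conv4" is an assumed choice
# of VGG-19 ReLU-activated layer (the paper does not name one).
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(vgg.input,
                                   vgg.get_layer("block5_conv4").output)

def mse_loss(generated, label):                       # Eq. (4)
    return tf.reduce_mean(tf.square(generated - label))

def adversarial_loss(d_of_generated):                 # Eq. (5)
    # d_of_generated: discriminator probabilities for the generated batch.
    return -tf.reduce_mean(tf.math.log(d_of_generated + 1e-8))

def content_loss(generated, label):                   # Eq. (6)
    # VGG-19 expects 3 channels; grayscale CT slices are tiled.
    g = feature_extractor(tf.image.grayscale_to_rgb(generated))
    l = feature_extractor(tf.image.grayscale_to_rgb(label))
    return tf.reduce_mean(tf.square(g - l))

def tv_loss(generated):                               # Eq. (7)
    return tf.reduce_mean(tf.image.total_variation(generated))

def perceptual_loss(generated, label, d_of_generated):  # Eq. (3)
    return (mse_loss(generated, label) + adversarial_loss(d_of_generated)
            + content_loss(generated, label) + tv_loss(generated))
```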
III. DATASET AND EXPERIMENTS
A. Dataset
The dataset used in the experiments is downloaded from the Data Science Bowl 2017. We use sample images from the dataset as the original images and simulate the projection process with Matlab to generate the sparse-view and original full-view sinograms with Poisson noise. The sparse-view and original full-view sinograms are used as the data pairs to train the CNN. Then, all sparse-view sinograms are interpolated to generated full-view sinograms. Intermediate images are reconstructed from these generated full-view sinograms and, together with the original images, are used as the training dataset for the GAN.
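A sketch of this simulation under simple assumptions is given below: scikit-image's radon stands in for the Matlab projection code, the incident photon count i0 is an assumed noise level, and the image is treated as attenuation values whose line integrals stay moderate:

```python
import numpy as np
from skimage.transform import radon

def make_sinogram_pair(image, n_sparse=30, n_full=360, i0=1e5):
    """Forward-project an image and return (sparse, full) noisy sinograms."""
    full_angles = np.linspace(0.0, 180.0, n_full, endpoint=False)
    sino_full = radon(image, theta=full_angles)
    # Poisson noise in the intensity domain: I ~ Poisson(i0 * exp(-p)).
    intensity = np.random.poisson(i0 * np.exp(-sino_full))
    sino_full_noisy = -np.log(np.maximum(intensity, 1) / i0)
    # Sparse view: keep every (n_full // n_sparse)-th projection.
    sino_sparse = sino_full_noisy[:, :: n_full // n_sparse]
    return sino_sparse, sino_full_noisy
```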
B. Experiment details
Because the sizes of $P_s$ and $P_c$ are inconsistent, $P_s$ is resized to the size of $P_c$ by bicubic interpolation. Considering the GPU memory limitation and the depth of the networks, patches of size 64×64 are randomly cropped from the input data. As a result, 57216 patches are produced for the interpolation network and 19200 patches for the generative adversarial networks. To address the reconstruction of different numbers of views, the networks are trained separately on each dataset. The proposed model is trained using the TensorFlow package on a workstation with an AMD Ryzen 1700 CPU and a GTX 1070 GPU. The parameters of the networks are optimized using the Adam [20] optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.9$, a learning rate of $10^{-4}$, and a batch size of 4. The interpolation network is trained for 5000 epochs, while the generative adversarial networks are initialized with 500 epochs of MSE-loss training and stopped after 5000 epochs. The pre-trained VGG-19 model is fine-tuned on the same dataset for 3000 epochs.
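The optimizer settings and patch cropping described above might look as follows in TensorFlow; the dataset plumbing is illustrative:

```python
import tensorflow as tf

# Adam with beta1 = beta2 = 0.9 and learning rate 1e-4, as stated in the text.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4,
                                     beta_1=0.9, beta_2=0.9)
BATCH_SIZE = 4

def random_patch_pair(inputs, labels, size=64):
    """Crop the same random 64x64 window from an input/label pair."""
    stacked = tf.stack([inputs, labels], axis=0)       # (2, H, W, 1)
    patches = tf.image.random_crop(stacked, size=[2, size, size, 1])
    return patches[0], patches[1]
```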
IV. RESULTS AND DISCUSSION
A. Reconstruction results and quantitative metrics
Most traditional image quality metrics, such as PSNR and SSIM, are based on per-pixel measurements, which are too simple to account for human perception. Thus, we adopt a perceptual similarity metric proposed by Zhang et al. [3], which is based on perceptual distance in a deep feature space. The metric measures the perceptual distance from the reference image and is called PDR in the following paragraphs.
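For reference, the deep-feature distance of Zhang et al. [3] is available today as the lpips PyTorch package; a usage sketch is shown below, though the paper's own PDR implementation may differ:

```python
import torch
import lpips

# AlexNet backbone, as in the original paper; inputs are NCHW tensors
# scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="alex")
img_recon = torch.rand(1, 3, 256, 256) * 2 - 1  # stand-in reconstruction
img_ref = torch.rand(1, 3, 256, 256) * 2 - 1    # stand-in reference image
distance = loss_fn(img_recon, img_ref)  # lower = perceptually closer
```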
To validate the advantages of the proposed method, the different methods are tested on a test dataset using the metrics of PSNR, SSIM, and PDR. The computing time is also measured to evaluate the computational cost of the different methods. The reconstruction results of the different methods are shown in Fig. 4, and the corresponding quantitative metrics are listed in Table I.
As the results show, SVGAN achieves the best results in all the quality metrics, which indicates that its reconstructions are the most perceptually similar to the reference images. The same conclusion can be drawn from the reconstruction results: SVGAN suppresses the artifacts while maintaining the most image details, which is the most perceptually convincing. Although the CNN performs well in SSIM and PSNR, its reconstructions are blurred and over-smoothed compared with SVGAN and GAN. Meanwhile, the GAN reconstructs sharper results and clearer edges, but some artifacts are retained in the images, which decreases the PSNR and SSIM.
Fig. 4. Comparison of reconstruction results (Original, FBP, ART, CNN, GAN, SVGAN) at 30, 60, 90, and 120 views.
TABLE I
PERFORMANCE COMPARISON

Metric    Views   ART     FBP    CNN    GAN    SVGAN
PSNR      30      21.98   18.11  24.96  21.55  26.14
          60      24.52   23.70  28.71  26.23  29.46
          90      23.39   24.42  30.15  27.85  30.54
          120     27.59   26.63  30.92  29.00  32.42
SSIM      30      0.38    0.24   0.68   0.44   0.75
          60      0.55    0.41   0.70   0.55   0.76
          90      0.44    0.47   0.83   0.66   0.86
          120     0.55    0.67   0.86   0.78   0.87
PDR       30      0.31    0.37   0.22   0.30   0.18
          60      0.21    0.30   0.19   0.22   0.16
          90      0.26    0.30   0.13   0.18   0.06
          120     0.21    0.27   0.11   0.16   0.05
Time (s)  30      308     0.7    3.3    4.1    6.9
          60      608     0.7    3.3    4.1    6.9
          90      903     0.7    3.3    4.1    6.9
          120     1205    0.7    3.3    4.1    6.9
B. Discussion
We present a deep learning based method to suppress the artifacts of sparse-view CT images, which consists of two parts: a convolutional neural network and generative adversarial networks.
The convolutional neural network is used to complete the projection data, which removes the artifacts caused by insufficient data. However, high-frequency information in the original data may be lost in the process. The reconstructed results show that although the artifacts are removed well, the images are over-smoothed and details are blurred.
The generative adversarial networks are used to restore the image quality in the image domain, which maintains the image details and keeps edges sharp. However, the GAN may treat artifacts as structures of the image and retain them in the reconstructed images. The reconstructed results show that the images keep clear details but contain many artifacts.
Thus, the proposed method combines the advantages of these methods and obtains perceptually satisfying results. Meanwhile, because it processes the original data and images reconstructed by FBP, the proposed method does not require iteration and is much faster than ART. Although the proposed method outperforms the traditional ART method in perceptual quality and reconstruction speed, the image quality still degrades seriously when the number of views is lower than 60, which should be addressed in future work.
V. CONCLUSIONS
In this work, we proposed a deep learning method named SVGAN to reduce the artifacts of sparse-view CT images. The proposed method consists of an interpolation convolutional neural network in the projection domain and an estimation generative adversarial network in the image domain, both of which help to improve the quality of the reconstructed images. Combining the advantages of these networks, the method suppresses the artifacts while maintaining the image details.
As a result, the proposed method delivers superior performance in both perceptual quality and computing speed compared with the ART method, with both the interpolation network in the projection domain and the estimation generative adversarial network in the image domain contributing to this improvement.
REFERENCES
[1] K. Liang, H. Yang, and Y. Xing, “Comparison of projection domain, image domain, and comprehensive deep learning for sparse-view x-ray CT image reconstruction,” arXiv preprint arXiv:1804.04289, 2018.
[2] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in CVPR, vol. 2, no. 3, 2017, p. 4.
[3] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” arXiv preprint, 2018.
[4] Y. Han and J. C. Ye, “Framing U-Net via deep convolutional framelets: Application to sparse-view CT,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1418–1429, 2018.
[5] H. Lee, J. Lee, H. Kim, B. Cho, and S. Cho, “Deep-neural-network based sinogram synthesis for sparse-view CT image reconstruction,” arXiv preprint arXiv:1803.00694, 2018.
[6] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.