Sparse-View CT Reconstruction via Generative Adversarial Networks

Authors:
Zhongwei Zhao, Student Member, IEEE, Yuewen Sun, Student Member, IEEE, and Peng Cong*
Abstract—Low dose and sparse view CT are effective ap-
proaches to reduce the radiation dose and accelerate scan speed.
Images reconstructed from the insufficient data acquired in low-dose
and sparse-view CT suffer from severe streaking artifacts, and
reducing the radiation dose further degrades the image quality.
Several attempts have been made to remove these artifacts using deep
learning methods such as convolutional neural networks (CNNs).
Although deep learning methods for low-dose and sparse-view CT
reconstruction have achieved impressive results, the reconstructed
images are still over-smoothed. In this work, we propose an artifact
reduction method for low-dose and sparse-view CT via a single model
trained as a generative adversarial network (GAN). Several numerical
simulation experiments are carried out to test the performance of our
network. The results show that our GAN significantly reduces the
streaking artifacts compared with the FBP method and preserves more
detailed information than a CNN.
Index Terms—low dose, sparse view, GAN.
I. INTRODUCTION
COMPUTED tomography (CT) is an effective non-destructive testing
method that is widely used in clinical diagnosis, safety inspection,
and industrial detection. Although X-rays enable us to inspect the
internal structure of an object, they deliver a radiation dose that
is harmful to patients in clinical diagnosis. Thus, decreasing the
radiation dose is a long-standing topic in medical CT. Reducing the
number of views is an effective way to reduce the radiation dose
while also accelerating the scan, which is a great improvement for
medical imaging. However, images reconstructed from the insufficient
data are often associated with severe streak artifacts. To address
the problem, optimization-based iterative methods, such as total
variation minimization, low-rank methods, and dictionary learning,
have been comprehensively researched. However, iterative
reconstruction algorithms require a large amount of time, which is
unacceptable for quick checks.
Recently, neural networks, especially deep convolutional neural
networks (CNNs), have achieved impressive success in computer vision
tasks. They have also been introduced to solve the low-dose and
sparse-view CT reconstruction problems. Most researchers employ CNNs
to reduce streaking artifacts, and these methods outperform the
traditional methods in reconstruction quality. However,
Manuscript received December 15, 2018; revised January 8, 2019. This
work was supported by China Nuclear Energy Development Project
2015460298.
Z. Zhao was with the Institute of Nuclear and New Energy Technology,
Tsinghua University.
Y. Sun was with the Institute of Nuclear and New Energy Technology,
Tsinghua University.
P. Cong was with the Institute of Nuclear and New Energy Technology,
Tsinghua University.
Fig. 1. Workflow of Proposed Method. [Figure: sparse-view sinogram → estimation network (CNN) → full-view sinogram → FBP/WLS → estimation network (GAN) → final prediction.]
the target loss functions of those methods are MSE-based, which
leads to over-smoothing and the loss of high-frequency details in
the reconstructed images. Most recently, Liang et al. [1] proposed a
generative adversarial network (GAN) based method, which achieves
better perceptual performance.
Although both CNN- and GAN-based methods have been proposed to
address the artifacts caused by insufficient data, few attempts have
been made to combine their advantages. CNNs with MSE loss functions
are trained to minimize the differences between outputs and label
data, which makes them suitable for completing the missing
projection data. GANs with a perceptual loss function produce images
with high perceptual quality, so they can be used to restore the
reconstructed images.
In this paper, we take three steps to address the problem: 1.
Train a convolutional neural network in the projection domain to
complete the data. 2. Reconstruct the images from the completed
projection data using the FBP or WLS method. 3. Train a generative
adversarial network with a perceptual loss function to produce the
final outputs from the reconstructed images. The proposed method is
evaluated on clinical images and delivers superior performance in
both perceptual quality and computing speed compared with the
traditional methods.
II. METHOD
A. CT reconstruction and the artifacts analysis
The FBP algorithm is used to reconstruct a slice image from the
sinogram. The sparse-view sinogram is first interpolated to a
full-view sinogram. Then, the Ram-Lak filter is applied in the
frequency domain along each column of the full-view sinogram.
Finally, the filtered full-view sinogram is weighted by pixel
position and back-projected to obtain the reconstructed image. Due
to the information loss during interpolation, the reconstructed
image becomes blurry.
In order to reduce the information loss and improve the image

Fig. 2. CNN Architecture. [Figure: interpolation network mapping sparse-view data to completed-view data through Blocks 1–4; layer labels: Conv 3×3×64 (×2), Conv 3×3×16 (×2), ReLU, Conv 3×3×64 (×2).]
quality, a CNN is used for interpolation and a GAN is used
to complement image details.
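The filtering step described above can be illustrated with a minimal NumPy sketch. It assumes a parallel-beam sinogram stored with detector bins along axis 0 and view angles along axis 1, so that each column is one projection. The function name `ramlak_filter`, the array layout, and the zero-padding length are assumptions made for illustration and are not taken from the authors' implementation.

```python
import numpy as np


def ramlak_filter(sinogram):
    """Apply a ramp (Ram-Lak) filter to every column of a sinogram.

    sinogram: 2-D array of shape (n_detectors, n_views); each column is
    one projection. This layout is an assumption of the sketch.
    """
    sinogram = np.asarray(sinogram, dtype=np.float64)
    n_det = sinogram.shape[0]
    # Zero-pad to a power of two at least twice the detector count to limit
    # wrap-around from the circular convolution implied by the FFT.
    n_pad = 1 << int(np.ceil(np.log2(2 * n_det)))
    # Ideal ramp response |f|, with f in cycles per sample.
    ramp = np.abs(np.fft.fftfreq(n_pad))
    spectrum = np.fft.fft(sinogram, n=n_pad, axis=0)
    filtered = np.fft.ifft(spectrum * ramp[:, None], axis=0).real
    # Crop back to the original detector length.
    return filtered[:n_det]
```

The ramp response is zero at zero frequency, so the filter removes the mean of each padded projection; back projection of the filtered columns would follow as a separate step.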
B. Network Architectures
The aim of the sparse-view CT reconstruction problem is to generate
a high-quality tomography image $I_{HT}$ from sparse-view projection
data $P_s$. The ground truth image $I_{GT}$, reconstructed from the
full-view projection data $P_F$, is the learning target of $I_{HT}$.
Two approaches can be taken to address the problem:
1) Interpolate the sparse-view projection data $P_s$ to
completed-view projection data $P_c$ while minimizing the
differences between $P_c$ and $P_F$. Then, obtain the
reconstructed images from $P_c$ with reconstruction algorithms
such as FBP and iterative algorithms.
2) Reconstruct the image $I_R$ from the projection data, then
suppress the artifacts of $I_R$ to get the tomography image.
In this paper, we combine the two approaches using deep learning.
First, an interpolation convolutional network is used to obtain the
completed-view projection data $P_c$ from the sparse-view projection
data $P_s$; it is trained to minimize the differences from $P_F$.
Then, we obtain the reconstructed image $I_R$ using a reconstruction
algorithm. Finally, a generative adversarial network outputs the
high-quality tomography image $I_{HT}$, which has the minimum
perceptual difference from the ground truth image $I_{GT}$. The
whole process is shown in Fig. 1.
As Fig. 2 shows, the interpolation network is composed of four
blocks. The blocks use a skip-connection structure similar to that
of ResNet. Furthermore, the input data is added to the output of the
last layer, which eases training. To avoid losing information, the
network contains no pooling layers, and the ReLU activation layers
are placed inside the blocks. A bottleneck architecture is
introduced to reduce the computational complexity while maintaining
good performance: the first two and last two layers have 64 feature
maps, while the middle layers have 16. Following common practice in
CNNs for low-level computer vision problems, 3×3 kernels are used in
every convolutional layer.
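The saving from the bottleneck can be estimated with simple arithmetic. The sketch below counts weights and biases for a stack of 3×3 convolutions with feature-map widths 64, 64, 16, 16, 64, 64 and compares it with a stack that keeps 64 feature maps throughout. The 64-channel block input, the presence of bias terms, and the helper names are assumptions for illustration; the paper does not state them.

```python
def conv_params(c_in, c_out, kernel=3, bias=True):
    """Number of trainable parameters in one 2-D convolutional layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def stack_params(widths, c_in):
    """Total parameters of consecutive conv layers with the given output widths."""
    total = 0
    for c_out in widths:
        total += conv_params(c_in, c_out)
        c_in = c_out  # the next layer consumes this layer's feature maps
    return total


# Assumed block input of 64 channels (hypothetical; not given in the text).
bottleneck = stack_params([64, 64, 16, 16, 64, 64], c_in=64)
uniform = stack_params([64] * 6, c_in=64)
print(bottleneck, uniform, bottleneck / uniform)
```

Under these assumptions the bottleneck block holds roughly 60% of the parameters of the uniform-width block.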
The generator network has a structure similar to that of the
interpolation network, but it is trained to minimize the perceptual
loss function, of which the output of the discriminator network is
an essential part. The discriminator CNN D is trained to distinguish
generated images from ground truth images. As shown in Fig. 3, D has
4 convolutional blocks and 2 fully connected layers. Each
convolutional block consists of a convolutional layer, a batch
normalization layer, and a leaky ReLU activation layer. The kernel
size K of the convolutional layers is 3×3, and the number of filters
N increases from 64 to 256. A stride S of 2 is used to reduce the
image resolution when the number of features is doubled. A
single-output fully connected layer is applied to the output of the
preceding fully connected layer, which contains 1024 neurons, and
produces the probability that the input image is a noise-free image.
C. Loss Function
The aim of the interpolation network is to generate completed-view
projection data $P_c$ that is as similar as possible to the
full-view projection data $P_F$. We therefore train a convolutional
neural network to minimize the MSE loss between $P_c$ and $P_F$:
$$W, b = \arg\min_{W,b} \frac{1}{N} \left\| F_{W,b}(P_s) - P_F \right\|_2^2 \tag{1}$$
where $F_{W,b}$ is the network with weights $W$ and biases $b$, and
$N$ is the number of training pairs.
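As a worked instance of Eq. (1), the following NumPy sketch evaluates the objective for a batch of network outputs and full-view targets. The function name `projection_mse` and the (N, H, W) batch layout are assumptions for illustration; the optimization over $W$ and $b$ itself is left to the training framework.

```python
import numpy as np


def projection_mse(completed, full_view):
    """Evaluate (1/N) * ||F(P_s) - P_F||_2^2 over N sinogram pairs.

    completed, full_view: arrays of shape (N, H, W), the network outputs
    and the full-view targets (the layout is an assumption of the sketch).
    """
    completed = np.asarray(completed, dtype=np.float64)
    full_view = np.asarray(full_view, dtype=np.float64)
    if completed.shape != full_view.shape:
        raise ValueError("outputs and targets must have the same shape")
    n_pairs = completed.shape[0]
    # Squared L2 norm summed over all entries, averaged over the N pairs.
    return float(np.sum((completed - full_view) ** 2) / n_pairs)
```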
Considering the performance and the speed of the proposed method,
the reconstruction algorithm can be FBP or one of various iterative
methods. Because the MSE loss tends to over-smooth the details of
the output image, it is not a suitable loss function for this
problem. Motivated by the work of Ledig et al. [2] on
super-resolution, we propose to train a generative adversarial
network to minimize the perceptual loss function:
$$W, b = \arg\min_{W,b} \frac{1}{N} P\left(F_{W,b}(I^i_{NC}), I^i_{NF}\right) \tag{2}$$
where $P$ denotes the perceptual loss function, which consists of
four parts: MSE loss, adversarial loss, content loss, and total
variation (TV) loss. It can be written as:
$$P = L_{MSE} + L_{ADV} + L_{Content} + L_{TV} \tag{3}$$
Among them, the MSE loss function minimizes the pixel-wise
differences between the generated and label images, which constrains
the values of the generated image. It can be written as:
$$L_{MSE} = \frac{1}{N^2} \left\| F_{W,b}(I^i_{NC}) - I^i_{NF} \right\|^2 \tag{4}$$
The adversarial loss is defined based on the probabilities output by
a discriminator network D:
$$L_{ADV} = \sum_{i=0}^{N} \log D\left(F_{W,b}(I^i_{NC}), I^i_{NF}\right) \tag{5}$$
The discriminator network is trained to predict whether the input
images are ground truth, which forces the generator network to
generate high-frequency information to fool the discriminator
network. Thus, the generated images contain more texture details,
which makes them more perceptually satisfying. However, some details
produced by the adversarial loss do not exist in the original image.
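The generator-side adversarial term can be sketched in NumPy as follows. The sign convention is an assumption: the sketch uses the form $-\log D(\cdot)$ from the super-resolution work of Ledig et al. [2], so that minimizing the loss pushes the discriminator output for generated images toward 1. It also assumes the discriminator returns one probability per generated image; the function name and the clipping constant `eps` are illustrative.

```python
import numpy as np


def adversarial_loss(d_generated, eps=1e-12):
    """Sum of -log D(G(x)) over a batch of generated images.

    d_generated: discriminator probabilities for the generated images.
    The minus sign follows the convention of [2] and is an assumption here.
    """
    probs = np.clip(np.asarray(d_generated, dtype=np.float64), eps, 1.0)
    return float(-np.sum(np.log(probs)))
```

A generator that fools the discriminator (probabilities near 1) yields a loss near 0, while images that are confidently rejected yield a large loss.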
The content loss is based on the high-level features of the images
extracted by a pre-trained neural network, which
Fig. 3. GAN Architecture. [Figure: generator network mapping input images to output images through Blocks 1–4, with layer labels Conv 3×3×64 (×2), Conv 3×3×16 (×2), ReLU, Conv 3×3×64 (×2); discriminator network taking output images and ground truth through Blocks 1–6 (Conv, batch normalization, leaky ReLU; K=3 throughout; N and S per block: 64/1, 64/2, 128/1, 128/2, 256/1, 256/2), then FCN(1024), leaky ReLU, FCN(1), probability.]
helps to make images perceptually satisfying []. Thus, in this
study, we define the content loss based on the activation maps
produced by the ReLU layers of the pre-trained VGG-19 network. The
content loss is defined as the Euclidean distance between the
feature representations of the denoised images and the noise-free
images:
$$L_{Content} = \frac{1}{C_R H_R W_R} \left\| R\left(F_{W,b}(I^i_{NC})\right) - R\left(I^i_{NF}\right) \right\| \tag{6}$$
where $R$ denotes the feature maps obtained by VGG-19, and $C_R$,
$H_R$, and $W_R$ denote the number, height, and width of the
feature maps.
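Eq. (6) can be illustrated with a short NumPy sketch that operates on feature maps that have already been extracted. Running VGG-19 is outside the scope of the sketch, so the two inputs stand in for $R(F_{W,b}(I^i_{NC}))$ and $R(I^i_{NF})$; the function name and the (C, H, W) layout are assumptions.

```python
import numpy as np


def content_loss(feat_generated, feat_reference):
    """Euclidean distance between two feature stacks, divided by C*H*W.

    feat_generated, feat_reference: arrays of shape (C, H, W) holding the
    feature maps of the generated and the noise-free image. Extracting
    them (e.g. with a pre-trained VGG-19) is assumed to happen elsewhere.
    """
    fg = np.asarray(feat_generated, dtype=np.float64)
    fr = np.asarray(feat_reference, dtype=np.float64)
    if fg.ndim != 3 or fg.shape != fr.shape:
        raise ValueError("expected two (C, H, W) arrays of equal shape")
    c, h, w = fg.shape
    # Euclidean norm of the flattened difference, normalized by C*H*W.
    return float(np.linalg.norm((fg - fr).ravel()) / (c * h * w))
```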
The total variation (TV) loss is introduced to suppress the
artifacts that the adversarial loss causes in the generated images.
It is defined as:
$$L_{TV} = \frac{1}{HW} \left\| \nabla_x\left(F_{W,b}(I^i_{NC})\right) + \nabla_y\left(F_{W,b}(I^i_{NC})\right) \right\| \tag{7}$$
where $H$ and $W$ denote the height and width of the generated image
$F_{W,b}(I^i_{NC})$.
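A TV penalty of this kind can be sketched in NumPy as below. Eq. (7) does not specify the norm or the discretization of the gradients, so the sketch assumes the common anisotropic form: forward differences along each axis, absolute values summed, and the result divided by $HW$. The function name is illustrative.

```python
import numpy as np


def tv_loss(image):
    """Anisotropic total variation of a 2-D image, normalized by H*W.

    Uses forward differences and absolute values; this is an assumed,
    commonly used variant, since Eq. (7) leaves the norm unspecified.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    grad_x = np.diff(img, axis=1)  # horizontal differences, shape (H, W-1)
    grad_y = np.diff(img, axis=0)  # vertical differences, shape (H-1, W)
    return float((np.abs(grad_x).sum() + np.abs(grad_y).sum()) / (h * w))
```

A constant image has zero loss, while isolated spikes and streaks raise it, which is why the term discourages spurious high-frequency structure.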
III. DATASET AND EXPERIMENTS
A. Dataset
The dataset used in the experiment is downloaded from the Data
Science Bowl 2017. We use sample images in the dataset as the
original images and simulate the projection process with MATLAB to
generate the sparse-view and the original full-view sinograms with
Poisson noise. The sparse-view sinograms and the original full-view
sinograms are used as data pairs to train the CNN. Then, all
sparse-view sinograms are interpolated by the CNN to generated
full-view sinograms. The intermediate images are reconstructed from
these generated full-view sinograms and are used, together with the
original images, as the training dataset for the GAN.
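The data-pair generation can be sketched as follows, in Python rather than the MATLAB used in the paper. The sketch starts from a noise-free full-view sinogram of line integrals, adds Poisson noise through a Beer-Lambert photon-count model, and keeps evenly spaced views. The incident photon count `i0`, the even view spacing, the array layout, and the function name are all assumptions, since the text does not give the noise level or the view selection.

```python
import numpy as np


def simulate_sparse_view(full_sinogram, n_sparse_views, i0=1e5, seed=0):
    """Return (noisy full-view sinogram, sparse-view sinogram, view indices).

    full_sinogram: (n_detectors, n_views) array of non-negative line
    integrals. i0 is an assumed incident photon count per detector bin.
    """
    rng = np.random.default_rng(seed)
    sino = np.asarray(full_sinogram, dtype=np.float64)
    # Beer-Lambert law: expected detected counts are i0 * exp(-line integral).
    counts = rng.poisson(i0 * np.exp(-sino))
    counts = np.maximum(counts, 1)  # guard against log(0)
    noisy_full = -np.log(counts / i0)
    n_views = sino.shape[1]
    # Evenly spaced view indices, computed with exact integer arithmetic.
    idx = (np.arange(n_sparse_views) * n_views) // n_sparse_views
    return noisy_full, noisy_full[:, idx], idx
```

For example, keeping 60 of 360 simulated views selects every sixth projection.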
B. Experiment details
Because the sizes of $P_s$ and $P_c$ are inconsistent, $P_s$ is
resized to the same size as $P_c$ by bicubic interpolation.
Considering the GPU memory limitation and the depth of the network,
patches of size 64×64 are randomly cropped from the input data. As
a result, 57216 patches are produced for the interpolation network
and 19200 patches for the generative adversarial network. To
address reconstruction from different numbers of views, the
networks are trained separately on each dataset. The proposed model
is trained using the TensorFlow package on a workstation with an
AMD Ryzen 1700 CPU and a GTX 1070 GPU. The parameters of the
networks are optimized using the Adam [20] optimizer with
$\beta_1 = 0.9$, $\beta_2 = 0.9$, a learning rate of $10^{-4}$, and
a batch size of 4. The interpolation network is trained for 5000
epochs, while the generative adversarial network is initialized with
500 epochs of training with the MSE loss and stopped after 5000
epochs. The pre-trained VGG-19 model is fine-tuned on the same
dataset for 3000 epochs.
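The random patch extraction can be sketched in NumPy as below. The sketch assumes that the input and its label are cropped at the same location so that each pair stays aligned, which the text implies but does not state; the function name, the seed handling, and the 2-D input layout are illustrative.

```python
import numpy as np


def random_patch_pairs(inp, label, n_patches, patch=64, seed=0):
    """Crop n_patches aligned patch x patch windows from an input/label pair.

    inp, label: 2-D arrays of identical shape (e.g. an interpolated
    sparse-view sinogram and its full-view target).
    Returns two arrays of shape (n_patches, patch, patch).
    """
    inp = np.asarray(inp)
    label = np.asarray(label)
    if inp.shape != label.shape:
        raise ValueError("input and label must have the same shape")
    h, w = inp.shape
    if h < patch or w < patch:
        raise ValueError("image is smaller than the patch size")
    rng = np.random.default_rng(seed)
    # Top-left corners drawn uniformly so that every patch fits in the image.
    rows = rng.integers(0, h - patch + 1, size=n_patches)
    cols = rng.integers(0, w - patch + 1, size=n_patches)
    x = np.stack([inp[r:r + patch, c:c + patch] for r, c in zip(rows, cols)])
    y = np.stack([label[r:r + patch, c:c + patch] for r, c in zip(rows, cols)])
    return x, y
```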
IV. RESULT AND DISCUSSION
A. Reconstruction results and quantitative metrics
Most traditional image quality metrics, such as PSNR and SSIM, are
based on per-pixel measurements, which are too simple to account for
human perception. Thus, we also adopt the perceptual similarity
metric proposed by Zhang et al. [3], which is based on the
perceptual distance in deep feature space. The metric is the
perceptual distance from the reference image, which is called PDR
in the following.
To validate the advantage of the proposed method, the different
methods are evaluated on a test dataset using PSNR, SSIM, and PDR.
The computing time is also measured to evaluate the computing cost
of the different methods. The reconstruction results of the
different methods are shown in Fig. 4, and the quantitative metrics
are listed in Table I.
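For reference, PSNR can be computed with the short NumPy sketch below. The `data_range` argument (the span between the largest and smallest possible pixel values) is an assumption, because the paper does not state the intensity scaling used for its reported values.

```python
import numpy as np


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image.

    data_range is the assumed span of valid pixel values (1.0 for images
    scaled to [0, 1]).
    """
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Because the value depends only on the mean squared error, two reconstructions with very different visual character can share the same PSNR, which is the limitation that motivates the additional perceptual metric.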
As the results show, SVGAN achieves the best results in all the
quality metrics, which indicates that its reconstructions are the
most perceptually similar to the reference images. The same
conclusion can be drawn from the reconstructed images: SVGAN
suppresses the artifacts while preserving the most image details, so
its results are the most perceptually convincing. Although the CNN
performs well in SSIM and PSNR, its reconstructions are blurred and
over-smoothed compared with those of SVGAN and GAN. Meanwhile,
although the GAN reconstructs sharper results and clearer edges,
some artifacts are still retained in its images, which decreases the
PSNR and SSIM.
Fig. 4. Result Comparison. [Figure: reconstructions by FBP, ART, CNN, GAN, and SVGAN alongside the original image (columns) for 30, 60, 90, and 120 views (rows).]
TABLE I
PERFORMANCE COMPARISON
Metric    Views  ART    FBP    CNN    GAN    SVGAN
PSNR      30     21.98  18.11  24.96  21.55  26.14
          60     24.52  23.70  28.71  26.23  29.46
          90     23.39  24.42  30.15  27.85  30.54
          120    27.59  26.63  30.92  29.00  32.42
SSIM      30     0.38   0.24   0.68   0.44   0.75
          60     0.55   0.41   0.70   0.55   0.76
          90     0.44   0.47   0.83   0.66   0.86
          120    0.55   0.67   0.86   0.78   0.87
PDR       30     0.31   0.37   0.22   0.30   0.18
          60     0.21   0.30   0.19   0.22   0.16
          90     0.26   0.30   0.13   0.18   0.06
          120    0.21   0.27   0.11   0.16   0.05
Time (s)  30     308    0.7    3.3    4.1    6.9
          60     608
          90     903
          120    1205
(For FBP, CNN, GAN, and SVGAN, a single time value is listed for all view counts.)
B. Discussion
We present a deep-learning-based method to suppress the artifacts
of sparse-view CT images, which consists of two parts: a
convolutional neural network and a generative adversarial network.
The convolutional neural network is used to complete the projection
data, which removes the artifacts caused by insufficient data.
However, the high-frequency information of the original data may be
lost in the process. The results reconstructed with the CNN show
that the images are over-smoothed and details are blurred, although
the artifacts are removed well.
The generative adversarial network is used to restore the image
quality in the image domain, which preserves the image details and
keeps edges sharp. However, the GAN may treat artifacts as
structures of the image and retain them in the reconstructed images.
The results reconstructed with the GAN show that the images contain
many artifacts, while the image details are clear.
Thus, the proposed method combines the advantages of both networks
and obtains perceptually satisfying results. Meanwhile, because it
processes the original data and the images reconstructed by FBP,
the proposed method does not require iteration and is much faster
than ART. Although the proposed method outperforms the traditional
ART method in perceptual quality and reconstruction speed, the
image quality is still seriously degraded when the number of views
is lower than 60, which should be addressed in future work.
V. CONCLUSIONS
In this work, we proposed a deep learning method named SVGAN to
reduce the artifacts of sparse-view CT images. The proposed method
consists of an interpolation convolutional neural network in the
projection domain and an estimation generative adversarial network
in the image domain, both of which help to improve the quality of
the reconstructed images. Combining the advantages of those
networks, the method suppresses the artifacts while preserving the
image details.
As a result, the proposed method delivers superior performance in
both perceptual quality and computing speed compared with the ART
method. Both interpolation network in the projection domain and
estimation generative adversarial network.
REFERENCES
[1] K. Liang, H. Yang, and Y. Xing, “Comparision of projection domain,
image domain, and comprehensive deep learning for sparse-view x-ray ct
image reconstruction,” arXiv preprint arXiv:1804.04289, 2018.
[2] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta,
A. P. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single
image super-resolution using a generative adversarial network,” in CVPR,
vol. 2, no. 3, 2017, p. 4.
[3] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The
unreasonable effectiveness of deep features as a perceptual metric,” arXiv
preprint, 2018.
[4] Y. Han and J. C. Ye, “Framing u-net via deep convolutional framelets:
Application to sparse-view ct,” IEEE transactions on medical imaging,
vol. 37, no. 6, pp. 1418–1429, 2018.
[5] H. Lee, J. Lee, H. Kim, B. Cho, and S. Cho, “Deep-neural-network
based sinogram synthesis for sparse-view ct image reconstruction,” arXiv
preprint arXiv:1803.00694, 2018.
[6] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Generative
adversarial networks for noise reduction in low-dose ct,” IEEE transactions
on medical imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
... In recent years, the use of deep learning has emerged to address inverse problems. Zhao 18 and Liu 19 used generative adversarial networks (GANs) to complete the sparse-sampled sinogram for computed tomography (CT) reconstruction, and this approach showed great potential for radiation reduction. Li et al 20 proposed an end-to-end network that directly derives the tomography from the sinogram, which can reduce the error in the intermediate reconstruction process. ...
... 23 Zhang et al introduced deep learning to the sparse-view reconstruction of CT and proposed a dense convolutional and deconvolution-based network, which satisfactorily removed streaking artifacts and preserved the structure. 24 The deep learning methods developed to address the CT sparse-view problem can also be divided into two main categories: networks developed for projection/sinogram domain processing 18,19,25,26 and networks developed for image domain processing. 24,27 Common practice in projection domain methods is to use GANs to complete the sparse sinogram, and image domain methods are used to postprocess the tomographic image reconstructed by sparse-view projections. ...
... The processing methods of deep learning to address the sparse-view problem of CT are mainly divided into two categories: completion of the sparse sinogram (processing in the projection domain), 18,19,25,26 and processing the reconstructed image from the sparse-view projections (processing in the image domain). 24,27 Because the abundant CT data set, methods in both of the two categories perform exceptionally well. ...
Article
Full-text available
Background Magnetic particle imaging (MPI) is a novel tomographic imaging modality that scans the distribution of superparamagnetic iron oxide nanoparticles. However, it is time‐consuming to scan multiview two‐dimensional (2D) projections for three‐dimensional (3D) reconstruction in projection MPI, such as computed tomography (CT). An intuitive idea is to use the sparse‐view projections for reconstruction to improve the temporal resolution. Tremendous progress has been made toward addressing the sparse‐view problem in CT, because of the availability of large data sets. For the novel tomography of MPI, to the best of our knowledge, studies on the sparse‐view problem have not yet been reported. Purpose The acquisition of multiview projections for 3D MPI imaging is time‐consuming. Our goal is to only acquire sparse‐view projections for reconstruction to improve the 3D imaging temporal resolution of projection MPI. Methods We propose to address the sparse‐view problem in projection MPI by generating novel projections. The data set we constructed consists of three parts: simulation data set (including 3000 3D data), four phantoms data, and an in vivo mouse data. The simulation data set is used to train and validate the network, and the phantoms and in vivo mouse data are used to test the network. When the number of novel generated projections meets the requirements of filtered back projection, the streaking artifacts will be absent from MPI tomographic imaging. Specifically, we propose a projection generative network (PGNet), that combines an attention mechanism, adversarial training strategy, and a fusion loss function and can generate novel projections based on sparse‐view real projections. To the best of our knowledge, we are the first to propose a deep learning method to attempt to overcome the sparse‐view problem in projection MPI. 
Results We compare our method with several sparse‐view methods on phantoms and in vivo mouse data and validate the advantages and effectiveness of our proposed PGNet. Our proposed PGNet enables the 3D imaging temporal resolution of projection MPI to be improved by 6.6 times, while significantly suppressing the streaking artifacts. Conclusion We proposed a deep learning method operated in projection domain to address the sparse‐view reconstruction of MPI, and the data scarcity problem in projection MPI reconstruction is alleviated by constructing a sparse‐dense simulated projection data set. By our proposed method, the number of acquisitions of real projections can be reduced. The advantage of our method is that it prevents the generation of streaking artifacts at the source. Our proposed sparse‐view reconstruction method has great potential for application to time‐sensitive in vivo 3D MPI imaging.
... In recent years, generative adversarial networks (GAN) have been extensively developed in the field of low-dose CT reconstruction [71,73,109,118,123,127,132,134,[146][147][148][149][150][151][152][153][154][155]. In contrast to convolutional neural networks (CNNs) in patches, [147] proposed denoising networks which are FCN-based using images in full size for training, and because they reused the underlying feature maps, the computational efficiency was very high. ...
Article
Full-text available
Conventional reconstruction techniques, such as filtered back projection (FBP) and iterative reconstruction (IR), which have been utilised widely in the image reconstruction process of computed tomography (CT) are not suitable in the case of low-dose CT applications, because of the unsatisfying quality of the reconstructed image and inefficient reconstruction time. Therefore, as the demand for CT radiation dose reduction continues to increase, the use of artificial intelligence (AI) in image reconstruction has become a trend that attracts more and more attention. This systematic review examined various deep learning methods to determine their characteristics, availability, intended use and expected outputs concerning low-dose CT image reconstruction. Utilising the methodology of Kitchenham and Charter, we performed a systematic search of the literature from 2016 to 2021 in Springer, Science Direct, arXiv, PubMed, ACM, IEEE, and Scopus. This review showed that algorithms using deep learning technology are superior to traditional IR methods in noise suppression, artifact reduction and structure preservation, in terms of improving the image quality of low-dose reconstructed images. In conclusion, we provided an overview of the use of deep learning approaches in low-dose CT image reconstruction together with their benefits, limitations, and opportunities for improvement.
... Zhao et al. [52] proposed a GAN-based sinogram inpainting network, which achieved unsupervised training in a sinogram-image-sinogram closed loop. Zhao et al. [53] also proposed a two-stage method, firstly they use an interpolating convolutional network to obtain the full-view projection data, then use GAN to output high-quality CT images. In 2019, Lee et al. [54] proposed a deep learning model based on fully convolutional network and wavelet transform. ...
Article
Full-text available
Limited-view Computed Tomography (CT) can be used to efficaciously reduce radiation dose in clinical diagnosis, it is also adopted when encountering inevitable mechanical and physical limitation in industrial inspection. Nevertheless, limited-view CT leads to severe artifacts in its imaging, which turns out to be a major issue in the low dose protocol. Thus, how to exploit the limited prior information to obtain high-quality CT images becomes a crucial issue. We notice that almost all existing methods solely focus on a single CT image while neglecting the solid fact that, the scanned objects are always highly spatially correlated. Consequently, there lies bountiful spatial information between these acquired consecutive CT images, which is still largely left to be exploited. In this paper, we propose a novel hybrid-domain structure composed of fully convolutional networks that groundbreakingly explores the three-dimensional neighborhood and works in a “coarse-to-fine” manner. We first conduct data completion in the Radon domain, and transform the obtained full-view Radon data into images through FBP. Subsequently, we employ the spatial correlation between continuous CT images to productively restore them and then refine the image texture to finally receive the ideal high-quality CT images, achieving PSNR of 40.209 and SSIM of 0.943. Besides, unlike other current limited-view CT reconstruction methods, we adopt FBP (and implement it on GPUs) instead of SART-TV to significantly accelerate the overall procedure and realize it in an end-to-end manner.
Article
Background and objective: Neural network based image reconstruction methods are becoming increasingly popular. However, limited training data and the lack of theoretical guarantees for generalizability raised concerns, especially in biomedical imaging applications. These challenges are known to lead to an unstable reconstruction process that poses significant problems in biomedical image reconstruction. In this paper, we present a new framework that uses untrained generator networks to tackle this challenge, leveraging the structure of deep networks for regularizing solutions based on a technique known as Deep Image Prior (DIP). Methods: To achieve a high reconstruction accuracy, we propose a framework optimizing both the latent vector and the weights of a generator network during the reconstruction process. We also propose the corresponding reconstruction strategies to improve the stability and convergent performance of the proposed framework. Furthermore, instead of calculating forward projection in each iteration, we propose implementing its normal operator as a convolutional kernel under parallel beam geometry, thus greatly accelerating the calculation. Results: Our experiments show that the proposed framework has significant improvements over other state-of-the-art conventional, pre-trained, and untrained methods under sparse-view, limited-angle, and low-dose conditions. Conclusions: Applying to parallel beam X-ray imaging, our framework shows advantages in speed, accuracy, and stability of the reconstruction process. We also show that the proposed framework is compatible with all differentiable regularizations that are commonly used in biomedical image reconstruction literature. Our framework can also be used as a post-processing technique to further improve the reconstruction generated by any other reconstruction methods. Furthermore, the proposed framework requires no training data and can be adjusted on-demand to adapt to different conditions (e.g. 
noise level, geometry, and imaged object).
Chapter
Image artifact removal in computed tomography (CT) allows clinicians to make more accurate diagnoses. One method of artifact removal is iterative reconstruction. However, reconstructing large amounts of CT data using this method is tedious, which is why researchers have proposed using filtered back-projection paired with neural networks. The purpose of this paper is to compare the performances of various forms of training data for convolutional neural networks in three low level CT image processing tasks: sinogram completion, Poisson noise removal, and focal spot deblurring. Specifically, modified U-nets are trained with either CT sinogram data or reconstruction data for each of the tasks. Then, the predicted results of each model are compared in terms of image quality and viability in a clinical setting. The predictions show strong evidence of increased image quality when training models with reconstruction data, thus the reconstruction strategy possesses a clear edge in practicality over the sinogram strategy.
Article
4D-CBCT is a powerful tool to provide respiration-resolved images for the moving target localization. However, projections in each respiratory phase are intrinsically under-sampled under the clinical scanning time and imaging dose constraints. Images reconstructed by compressed sensing (CS)-based methods suffer from blurred edges. Introducing the average-4D-image constraint to the CS-based reconstruction, such as prior-image-constrained CS (PICCS), can improve the edge sharpness of the stable structures. However, PICCS can lead to motion artifacts in the moving regions. In this study, we proposed a dual-encoder convolutional neural network (DeCNN) to realize the average-imageconstrained 4D-CBCT reconstruction. The proposed DeCNN has two parallel encoders to extract features from both the under-sampled target phase images and the average images. The features are then concatenated and fed into the decoder for the highquality target phase image reconstruction. The reconstructed 4D-CBCT using of the proposed DeCNN from the real lung cancer patient data showed (1) qualitatively, clear and accurate edges for both stable and moving structures; (2) quantitatively, lowintensity errors, high peak signal-to-noise ratio, and high structural similarity compared to the ground truth images; and (3) superior quality to those reconstructed by several other state-of-the-art methods including the back-projection, CS totalvariation, PICCS, and the single-encoder CNN. Overall, the proposed DeCNN is effective in exploiting the average-image constraint to improve the 4DCBCT image quality.
Article
Full-text available
X-ray Computed Tomography (CT) imaging has been widely used in clinical diagnosis, non-destructive examination, and public safety inspection. Sparse-view (sparse view) CT has great potential in radiation dose reduction and scan acceleration. However, sparse view CT data is insufficient and traditional reconstruction results in severe streaking artifacts. In this work, based on deep learning, we compared image reconstruction performance for sparse view CT reconstruction with projection domain network, image domain network, and comprehensive network combining projection and image domains. Our study is executed with numerical simulated projection of CT images from real scans. Results demonstrated deep learning networks can effectively reconstruct rich high frequency structural information without streaking artefact commonly seen in sparse view CT. A comprehensive network combining deep learning in both projection domain and image domain can get best results.
Article
Recently, a number of approaches to low-dose computed tomography (CT) have been developed and deployed in commercialized CT scanners. Tube current reduction is perhaps the most actively explored technology with advanced image reconstruction algorithms. Sparse data sampling is another viable option for low-dose CT, and sparse-view CT has been of particular interest among researchers in the CT community. Since analytic image reconstruction algorithms would lead to severe image artifacts, various iterative algorithms have been developed for reconstructing images from sparsely view-sampled projection data. However, iterative algorithms take much longer computation time than the analytic algorithms, and images are usually prone to different types of image artifacts that heavily depend on the reconstruction parameters. Interpolation methods have also been utilized to fill the missing data in the sinogram of sparse-view CT, thus providing synthetically full data for analytic image reconstruction. In this work, we introduce a deep-neural-network-enabled sinogram synthesis method for sparse-view CT, and show that it outperforms the existing interpolation methods and also the iterative image reconstruction approach.
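The interpolation baseline mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a sinogram stored as a (views x detectors) array and fills the missing views by linear interpolation along the view axis; the function name `interpolate_sinogram` and the array layout are hypothetical choices made for this illustration, not the method of the cited work, and angular periodicity is ignored for simplicity.

```python
import numpy as np

def interpolate_sinogram(sparse_sino, sparse_angles, full_angles):
    """Linearly interpolate a sparse-view sinogram along the view axis.

    sparse_sino   : array of shape (n_sparse_views, n_detectors)
    sparse_angles : increasing view angles of the measured views
    full_angles   : view angles of the synthesized full sinogram
    """
    sparse_sino = np.asarray(sparse_sino, dtype=float)
    n_detectors = sparse_sino.shape[1]
    full_sino = np.empty((len(full_angles), n_detectors))
    # Interpolate each detector channel independently over the view angle.
    for d in range(n_detectors):
        full_sino[:, d] = np.interp(full_angles, sparse_angles, sparse_sino[:, d])
    return full_sino

# Three measured views, two detector bins, upsampled to five views.
sparse = np.array([[0.0, 1.0], [10.0, 3.0], [20.0, 5.0]])
full = interpolate_sinogram(sparse, [0.0, 10.0, 20.0], [0.0, 5.0, 10.0, 15.0, 20.0])
```

The synthesized sinogram could then be passed to an analytic reconstruction such as FBP; a learned sinogram synthesis replaces the linear rule with a trained network.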
Article
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
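To make concrete why PSNR is called a simple, shallow function in the abstract above, here is a minimal sketch of the standard definition, PSNR = 10 log10(R^2 / MSE), where R is the data range. The function name `psnr` and the default `data_range=1.0` are assumptions made for this illustration.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Mean squared error over all pixels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)  # uniform error of 0.1, so MSE = 0.01
value = psnr(reference, estimate)
```

Because the metric depends only on the pixel-wise squared error, two images with the same MSE receive the same score regardless of where or how the errors appear, which is the limitation the abstract points to.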
Article
X-ray computed tomography (CT) using sparse projection views is often used to reduce the radiation dose. However, due to the insufficient projection views, a reconstruction approach using the filtered back projection (FBP) produces severe streaking artifacts. Recently, deep learning approaches using large receptive field neural networks such as U-net have demonstrated impressive performance for sparse view CT reconstruction. However, theoretical justification is still lacking. The main goal of this paper is, therefore, to develop a mathematical theory and to discuss how to improve these algorithms. In particular, inspired by the recent theory of deep convolutional framelets, we show that the U-net relies on sub-optimal non-local bases that overly emphasize low frequency components. The discovery leads to dual frame and tight frame U-net architectures for effective recovery of directional image components.
Article
Noise is inherent to low-dose CT acquisition. We propose to train a convolutional neural network (CNN) jointly with an adversarial CNN to estimate routine-dose CT images from low-dose CT images and hence reduce noise. A generator CNN was trained to transform low-dose CT images into routine-dose CT images using voxel-wise loss minimization. An adversarial discriminator CNN was simultaneously trained to distinguish the output of the generator from routine-dose CT images. The performance of this discriminator was used as an adversarial loss for the generator. Experiments were performed using CT images of an anthropomorphic phantom containing calcium inserts, as well as patient non-contrast-enhanced cardiac CT images. The phantom and patients were scanned at 20% and 100% routine clinical dose. Three training strategies were compared: the first used only voxel-wise loss, the second combined voxel-wise loss and adversarial loss, and the third used only adversarial loss. The results showed that training with only voxel-wise loss resulted in the highest peak signal-to-noise ratio with respect to reference routine-dose images. However, the CNNs trained with adversarial loss captured image statistics of routine-dose images better. Noise reduction improved quantification of low-density calcified inserts in phantom CT images and allowed coronary calcium scoring in low-dose patient CT images with high noise levels. Testing took less than 10 seconds per CT volume. CNN-based low-dose CT noise reduction in the image domain is feasible. Training with an adversarial network improves the CNN’s ability to generate images with an appearance similar to that of reference routine-dose CT images.
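The three training strategies compared in the abstract above can be sketched as one weighted generator objective. The abstract does not give the exact loss forms, so this sketch assumes a mean-squared-error voxel-wise term and a log-loss adversarial term; the function name `generator_loss` and the weights `voxel_weight` and `adv_weight` are hypothetical.

```python
import numpy as np

def generator_loss(generated, target, disc_prob, voxel_weight=1.0, adv_weight=1.0):
    """Weighted sum of a voxel-wise loss and an adversarial loss.

    disc_prob is the discriminator's estimated probability that each
    generated image is a real routine-dose image.
    """
    generated = np.asarray(generated, dtype=float)
    target = np.asarray(target, dtype=float)
    # Clip to avoid log(0); the lower bound is an illustrative choice.
    p = np.clip(np.asarray(disc_prob, dtype=float), 1e-12, 1.0)
    voxel_loss = np.mean((generated - target) ** 2)  # assumed MSE form
    adv_loss = -np.mean(np.log(p))                   # assumed log-loss form
    return voxel_weight * voxel_loss + adv_weight * adv_loss

generated = np.zeros((2, 2))
target = np.ones((2, 2))
disc_prob = np.array([0.5, 0.5])

voxel_only = generator_loss(generated, target, disc_prob, adv_weight=0.0)
combined = generator_loss(generated, target, disc_prob)
adv_only = generator_loss(generated, target, disc_prob, voxel_weight=0.0)
```

Setting one weight to zero recovers the voxel-only or adversarial-only strategy, while nonzero values for both give the combined strategy.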