Sparse-View CT Reconstruction via Generative
Adversarial Networks
Zhongwei Zhao, Student Member, IEEE, Yuewen Sun, Student Member, IEEE, and Peng Cong*
Abstract—Low-dose and sparse-view CT are effective approaches to reducing the radiation dose and accelerating the scan. However, images reconstructed from the insufficient data acquired by low-dose and sparse-view CT suffer from severe streaking artifacts, so reducing the radiation dose degrades the imaging quality. Several attempts have been made to remove these artifacts using deep learning methods such as CNNs. Although deep learning methods for low-dose and sparse-view CT reconstruction have achieved impressive success, the reconstruction results are still over-smoothed. In this work, we propose an artifact reduction method for low-dose and sparse-view CT based on a single model trained with generative adversarial networks (GAN). Several numerical simulation experiments are conducted to test the performance of our network. The results show that our GAN significantly reduces the streaking artifacts compared with the FBP method, and preserves more detailed information than the CNN.
Index Terms—low dose, sparse view, GAN.
I. INTRODUCTION
COMPUTED tomography (CT) is an effective non-destructive testing method that is widely used in clinical diagnosis, security inspection, and industrial detection. Although x-rays enable us to inspect the internal structure of an object, they deliver a radiation dose that is harmful to patients in clinical diagnosis. Thus, decreasing the radiation dose is a perennial topic in medical CT. Reducing the number of views is an effective way to reduce the radiation dose while accelerating the scan, which is a great improvement for medical imaging. However, images reconstructed from such insufficient data are often associated with severe streak artifacts. To address this problem, optimization-based iterative methods, such as total variation minimization, low-rank models, and dictionary learning, have been researched comprehensively. However, iterative reconstruction algorithms require a large amount of time, which is unacceptable for quick checks.
Recently, neural networks, especially deep convolutional neural networks (CNNs), have achieved impressive success in computer vision tasks, and they have also been introduced to solve the low-dose and sparse-view CT reconstruction problem. Most researchers employ CNNs to reduce streaking artifacts, outperforming the traditional methods in reconstruction quality. However,
Manuscript received December 15, 2018; revised January 8, 2019. This work is under the support of China Nuclear Energy Development Project 2015460298.
Z. Zhao, Y. Sun, and P. Cong are with the Institute of Nuclear and New Energy Technology, Tsinghua University.
Fig. 1. Workflow of the proposed method: the sparse-view sinogram is completed by an estimation network (CNN) into a full-view sinogram, reconstructed by FBP/WLS, and refined by an estimation network (GAN) to produce the final prediction.
the target loss functions of those methods are MSE-based, which leads to over-smoothing and the loss of high-frequency details in the reconstructed images. Most recently, Liang et al. [1] proposed a generative adversarial network (GAN) based method, which achieves better perceptual performance.
Although both CNN- and GAN-based methods have been proposed to address the artifacts caused by insufficient data, few attempts have been made to combine their respective advantages. CNNs with MSE loss functions are trained to minimize the differences between outputs and label data, which makes them well suited to completing the missing projection data. GANs with a perceptual loss function produce images of high perceptual quality, which makes them well suited to restoring the reconstructed images.
In this paper, we take three steps to address the problem: 1) train a convolutional neural network in the projection domain to complete the data; 2) reconstruct the images from the completed projection data using the FBP or WLS method; 3) train a generative adversarial network with a perceptual loss function to produce the final outputs from the reconstructed images. The proposed method is evaluated on clinical images and delivers superior performance in both perceptual quality and computing speed compared with the traditional methods.
II. METHOD
A. CT reconstruction and artifact analysis
The FBP algorithm is used to reconstruct a slice image from the sinogram. The sparse-view sinogram is first interpolated to a full-view sinogram. Then, the Ram-Lak filter is applied in the frequency domain along the columns of the full-view sinogram. Finally, the full-view sinogram is weighted by pixel position, and backprojection is performed to obtain the reconstructed image. Due to the information lost during interpolation, the reconstructed image becomes blurry.
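A minimal sketch of this reconstruction pipeline is given below, assuming scikit-image's iradon (whose "ramp" filter corresponds to the Ram-Lak filter) in place of the authors' own implementation; the function name and the linear interpolation choice are illustrative:

```python
import numpy as np
from scipy.interpolate import interp1d
from skimage.transform import iradon

def fbp_from_sparse(sino_sparse, sparse_angles, full_angles):
    """Interpolate a sparse-view sinogram to full view, then apply FBP.

    sino_sparse: array of shape (n_detectors, n_sparse_views);
    angles are in degrees.
    """
    # Fill in the missing views by linear interpolation along the angle axis.
    interp = interp1d(sparse_angles, sino_sparse, axis=1,
                      kind="linear", fill_value="extrapolate")
    sino_full = interp(full_angles)
    # Ram-Lak ("ramp") filtering and weighted backprojection happen inside iradon.
    return iradon(sino_full, theta=full_angles, filter_name="ramp")

# Example: upsample 30 views to 180 views before reconstruction.
sparse_angles = np.linspace(0.0, 180.0, 30, endpoint=False)
full_angles = np.linspace(0.0, 180.0, 180, endpoint=False)
```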
To reduce the information loss and improve the image quality, a CNN is used for interpolation and a GAN is used to recover image details.

Fig. 2. Architecture of the interpolation network: four blocks of 3×3 convolutional layers with internal ReLU activations map the sparse-view data to completed-view data; the feature-map widths follow a 64-64-16-16-64-64 bottleneck.
B. Network Architectures
The aim of the sparse-view CT reconstruction problem is to generate a high-quality tomography image $I_{HT}$ from sparse-view projection data $P_s$. The ground truth image $I_{GT}$, reconstructed from the full-view projection data $P_F$, is the learning target for $I_{HT}$.
Two approaches can be taken to address the problem:
1) Interpolate the sparse-view projection data $P_s$ to completed-view projection data $P_c$, minimizing the difference between $P_c$ and $P_F$; then obtain the reconstructed image from $P_c$ by reconstruction algorithms such as FBP or iterative algorithms.
2) Reconstruct the image $I_R$ from the projection data, then suppress the artifacts of $I_R$ to obtain the tomography image.
In this paper, we combine the two approaches with deep learning. First, an interpolation convolutional network, trained to minimize the difference from $P_F$, is used to obtain the completed-view projection data $P_c$ from the sparse-view projection data $P_s$. Then, the reconstructed image $I_R$ is obtained using the reconstruction algorithms. Finally, a generative adversarial network is proposed to output the high-quality tomography image $I_{HT}$, which has the minimum perceptual difference from the ground truth image $I_{GT}$. The whole process is shown in Fig. 1.
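The three-stage workflow might be expressed schematically as follows; the function names are placeholders rather than the authors' code:

```python
def svgan_reconstruct(sino_sparse, interp_cnn, fbp, estimation_gan):
    """Three-stage pipeline of Fig. 1 (names are placeholders)."""
    # 1. Projection domain: complete the sparse-view sinogram with the CNN.
    sino_completed = interp_cnn(sino_sparse)
    # 2. Analytic reconstruction (FBP or WLS) of the intermediate image.
    image_intermediate = fbp(sino_completed)
    # 3. Image domain: restore perceptual detail with the GAN generator.
    return estimation_gan(image_intermediate)
```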
As Fig. 2 shows, the interpolation network is composed of four blocks. The blocks use skip connections similar to those of ResNet. Furthermore, the input data is added to the last layer, which helps to ease the difficulty of training. To avoid losing information, no pooling layers are used in the network, while ReLU activation layers are placed inside the blocks. A bottleneck architecture is introduced to reduce the computational complexity while maintaining good performance: the first and last two layers of each block have 64 feature maps, while the middle layers have 16. Following common practice in CNNs for low-level computer vision problems, 3×3 kernels are used in each convolutional layer.
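A minimal Keras sketch of such an interpolation network is given below; the exact placement of the 64-64-16-16-64-64 bottleneck within each block is our assumption, since the text leaves it open:

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x):
    """One block: 3x3 convs with 64-64-16-16-64-64 feature maps and a skip."""
    shortcut = x
    for filters in (64, 64, 16, 16, 64, 64):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.ReLU()(x)  # ReLU layers stay inside the block; no pooling
    return layers.Add()([x, shortcut])

def build_interpolation_network(input_shape=(None, None, 1)):
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    for _ in range(4):  # four blocks, as in Fig. 2
        x = bottleneck_block(x)
    x = layers.Conv2D(1, 3, padding="same")(x)
    # The input is added to the last layer to ease training.
    out = layers.Add()([x, inp])
    return tf.keras.Model(inp, out)
```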
Similar to the interpolation network, the generator network is trained to minimize the perceptual loss function, in which the output of the discriminator network is an essential part. The discriminator CNN $D$ is trained to discriminate between generated images and ground truth images. As shown in Fig. 3, $D$ has six convolutional blocks and two fully-connected layers. Each convolutional block consists of a convolutional layer, a batch normalization layer, and a leaky ReLU activation layer. The kernel size $K$ of the convolutional layers is 3×3, and the number of filters $N$ increases from 64 to 256. The stride $S$ of a convolutional layer is 2 whenever the image resolution is reduced as the number of features doubles. A single-output fully-connected layer is applied to the outputs of the last fully-connected layer of 1024 neurons and produces the probability that the input image is a noise-free image.
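The discriminator might be assembled as in the following Keras sketch; the leaky-ReLU slope of 0.2 is an assumed value not stated in the text:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(64, 64, 1)):
    inp = layers.Input(shape=input_shape)
    x = inp
    # Six blocks: conv -> batch norm -> leaky ReLU, filters 64 to 256,
    # with stride 2 halving the resolution after each filter count.
    for filters, stride in [(64, 1), (64, 2), (128, 1), (128, 2),
                            (256, 1), (256, 2)]:
        x = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)  # slope 0.2 is an assumed value
    x = layers.Flatten()(x)
    x = layers.Dense(1024)(x)
    x = layers.LeakyReLU(0.2)(x)
    # Single sigmoid output: probability that the input is a noise-free image.
    out = layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inp, out)
```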
C. Loss Function
The aim of the interpolation network is to generate completed-view projection data $P_c$ as similar as possible to the full-view projection data $P_F$. So we train a convolutional neural network to minimize the MSE loss between $P_c$ and $P_F$:

$$W, b = \arg\min_{W,b} \frac{1}{N}\left\|F_{W,b}(P_s) - P_F\right\|_2^2 \qquad (1)$$

where $W$ and $b$ denote the weights and biases of the network, and $N$ denotes the number of training pairs.
Considering the performance and speed of the proposed method, the reconstruction algorithm can be FBP or various iterative methods. Because the MSE loss tends to over-smooth the details of the output image, it is not a suitable loss function for this problem. Motivated by the work of Ledig et al. [2] on super-resolution, we propose to train a generative adversarial network to minimize the perceptual loss function:

$$W, b = \arg\min_{W,b} \frac{1}{N}\sum_{i} P\left(F_{W,b}(I^i_{NC}),\, I^i_{NF}\right) \qquad (2)$$

where $P$ denotes the perceptual loss function, which consists of four parts: MSE loss, adversarial loss, content loss, and total variation loss. It can be written as:

$$P = L_{MSE} + L_{ADV} + L_{Content} + L_{TV} \qquad (3)$$
Among them, the MSE loss minimizes the pixel-wise differences between the input and label images, acting as a value constraint on the generated image. It can be written as:

$$L_{MSE} = \frac{1}{N^2}\left\|F_{W,b}(I^i_{NC}) - I^i_{NF}\right\|^2 \qquad (4)$$
The adversarial loss is defined based on the probabilities given by a discriminator network $D$:

$$L_{ADV} = -\sum_{i=0}^{N} \log D\left(F_{W,b}(I^i_{NC})\right) \qquad (5)$$
The discriminator network is trained to predict whether the input images are ground truth, which forces the generator network to generate high-frequency information to fool the discriminator. Thus, the generated images contain more texture details, which makes them more perceptually satisfying. However, some details produced by the adversarial loss do not exist in the original image.
The content loss is based on the high-level features of the images extracted by a pre-trained neural network, which helps to make images perceptually satisfying [].

Fig. 3. GAN architecture. The generator mirrors the interpolation network: four blocks of 3×3 convolutions with a 64-64-16-16-64-64 bottleneck and internal ReLU activations. The discriminator consists of six convolutional blocks (K=3; N = 64, 64, 128, 128, 256, 256; S alternating between 1 and 2), each comprising a convolutional layer, batch normalization, and leaky ReLU, followed by FCN(1024) with leaky ReLU and FCN(1), which outputs the probability that the input is the ground truth.

Thus, in this study, we define the content loss based on the activation maps produced by the ReLU layers of the pre-trained VGG-19 network. The content loss is defined as the Euclidean distance between the feature representations of the denoised and noise-free images:
$$L_{Content} = \frac{1}{C_R H_R W_R}\left\|R\left(F_{W,b}(I^i_{NC})\right) - R\left(I^i_{NF}\right)\right\| \qquad (6)$$

where $R$ denotes the feature maps obtained by VGG-19, and $C_R$, $H_R$, $W_R$ denote the number, height, and width of the feature maps.
The total variation (TV) loss is introduced to suppress the artifacts of the generated images caused by the adversarial loss. It is defined as:

$$L_{TV} = \frac{1}{HW}\left\|\nabla_x F_{W,b}(I^i_{NC}) + \nabla_y F_{W,b}(I^i_{NC})\right\| \qquad (7)$$

where $H$ and $W$ denote the height and width of the generated image $F_{W,b}(I^i_{NC})$.
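The four loss terms of Eqs. (3)-(7) could be sketched in TensorFlow as below; the choice of VGG-19 feature layer, the omission of VGG input preprocessing, and the unweighted sum are our assumptions:

```python
import tensorflow as tf

# Feature extractor for the content loss; "block5_conv4" is an assumed choice
# of VGG-19 ReLU-activated layer (the paper does not name one).
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(vgg.input,
                                   vgg.get_layer("block5_conv4").output)

def mse_loss(generated, label):                       # Eq. (4)
    return tf.reduce_mean(tf.square(generated - label))

def adversarial_loss(d_of_generated):                 # Eq. (5)
    # d_of_generated: discriminator probabilities for the generated batch.
    return -tf.reduce_mean(tf.math.log(d_of_generated + 1e-8))

def content_loss(generated, label):                   # Eq. (6)
    # VGG-19 expects 3 channels; grayscale CT slices are tiled.
    g = feature_extractor(tf.image.grayscale_to_rgb(generated))
    l = feature_extractor(tf.image.grayscale_to_rgb(label))
    return tf.reduce_mean(tf.square(g - l))

def tv_loss(generated):                               # Eq. (7)
    return tf.reduce_mean(tf.image.total_variation(generated))

def perceptual_loss(generated, label, d_of_generated):  # Eq. (3)
    return (mse_loss(generated, label) + adversarial_loss(d_of_generated)
            + content_loss(generated, label) + tv_loss(generated))
```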
III. DATASET AND EXPERIMENTS
A. Dataset
The dataset used in the experiments is downloaded from the Data Science Bowl 2017. We use sample images from the dataset as the original images and simulate the projection process with Matlab to generate the sparse-view and original full-view sinograms with Poisson noise. The sparse-view and original full-view sinograms are used as the data pairs to train the CNN. Then, all sparse-view sinograms are interpolated to generated full-view sinograms. Intermediate images are reconstructed from these generated full-view sinograms and, together with the original images, are used as the training dataset for the GAN.
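A sketch of this simulation under simple assumptions is given below: scikit-image's radon stands in for the Matlab projection code, the incident photon count i0 is an assumed noise level, and the image is treated as attenuation values whose line integrals stay moderate:

```python
import numpy as np
from skimage.transform import radon

def make_sinogram_pair(image, n_sparse=30, n_full=360, i0=1e5):
    """Forward-project an image and return (sparse, full) noisy sinograms."""
    full_angles = np.linspace(0.0, 180.0, n_full, endpoint=False)
    sino_full = radon(image, theta=full_angles)
    # Poisson noise in the intensity domain: I ~ Poisson(i0 * exp(-p)).
    intensity = np.random.poisson(i0 * np.exp(-sino_full))
    sino_full_noisy = -np.log(np.maximum(intensity, 1) / i0)
    # Sparse view: keep every (n_full // n_sparse)-th projection.
    sino_sparse = sino_full_noisy[:, :: n_full // n_sparse]
    return sino_sparse, sino_full_noisy
```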
B. Experiment details
Because the sizes of $P_s$ and $P_c$ are inconsistent, $P_s$ is resized to the size of $P_c$ by bicubic interpolation. Considering the GPU memory limitation and the depth of the networks, patches of size 64×64 are randomly cropped from the input data. As a result, 57216 patches are produced for the interpolation network and 19200 patches for the generative adversarial networks. To address the reconstruction of different numbers of views, the networks are trained separately on each dataset. The proposed model is trained using the TensorFlow package on a workstation with an AMD Ryzen 1700 CPU and a GTX 1070 GPU. The parameters of the networks are optimized using the Adam [20] optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.9$, a learning rate of $10^{-4}$, and a batch size of 4. The interpolation network is trained for 5000 epochs, while the generative adversarial networks are initialized with 500 epochs of MSE-loss training and stopped after 5000 epochs. The pre-trained VGG-19 model is fine-tuned on the same dataset for 3000 epochs.
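The optimizer settings and patch cropping described above might look as follows in TensorFlow; the dataset plumbing is illustrative:

```python
import tensorflow as tf

# Adam with beta1 = beta2 = 0.9 and learning rate 1e-4, as stated in the text.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4,
                                     beta_1=0.9, beta_2=0.9)
BATCH_SIZE = 4

def random_patch_pair(inputs, labels, size=64):
    """Crop the same random 64x64 window from an input/label pair."""
    stacked = tf.stack([inputs, labels], axis=0)       # (2, H, W, 1)
    patches = tf.image.random_crop(stacked, size=[2, size, size, 1])
    return patches[0], patches[1]
```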
IV. RESULTS AND DISCUSSION
A. Reconstruction results and quantitative metrics
Most traditional image quality metrics, such as PSNR and SSIM, are based on per-pixel measurements, which are too simple to account for human perception. Thus, we adopt a perceptual similarity metric proposed by Zhang et al. [3], which is based on perceptual distance in a deep feature space. The metric measures the perceptual distance from the reference image and is called PDR in the following paragraphs.
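For reference, the deep-feature distance of Zhang et al. [3] is available today as the lpips PyTorch package; a usage sketch is shown below, though the paper's own PDR implementation may differ:

```python
import torch
import lpips

# AlexNet backbone, as in the original paper; inputs are NCHW tensors
# scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="alex")
img_recon = torch.rand(1, 3, 256, 256) * 2 - 1  # stand-in reconstruction
img_ref = torch.rand(1, 3, 256, 256) * 2 - 1    # stand-in reference image
distance = loss_fn(img_recon, img_ref)  # lower = perceptually closer
```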
To validate the advantages of the proposed method, the different methods are tested on a test dataset using the metrics of PSNR, SSIM, and PDR. The computing time is also measured to evaluate the computational cost of the different methods. The reconstruction results of the different methods are shown in Fig. 4, and the corresponding quantitative metrics are listed in Table I.
As the results show, SVGAN achieves the best results in all the quality metrics, which indicates that its reconstructions are the most perceptually similar to the reference images. The same conclusion can be drawn from the reconstruction results: SVGAN suppresses the artifacts while maintaining the most image details, which is the most perceptually convincing. Although the CNN performs well in SSIM and PSNR, its reconstructions are blurred and over-smoothed compared with SVGAN and GAN. Meanwhile, the GAN reconstructs sharper results and clearer edges, but some artifacts are retained in the images, which decreases the PSNR and SSIM.
Fig. 4. Comparison of reconstruction results (Original, FBP, ART, CNN, GAN, SVGAN) at 30, 60, 90, and 120 views.
TABLE I
PERFORMANCE COMPARISON

Metric    Views   ART     FBP    CNN    GAN    SVGAN
PSNR      30      21.98   18.11  24.96  21.55  26.14
          60      24.52   23.70  28.71  26.23  29.46
          90      23.39   24.42  30.15  27.85  30.54
          120     27.59   26.63  30.92  29.00  32.42
SSIM      30      0.38    0.24   0.68   0.44   0.75
          60      0.55    0.41   0.70   0.55   0.76
          90      0.44    0.47   0.83   0.66   0.86
          120     0.55    0.67   0.86   0.78   0.87
PDR       30      0.31    0.37   0.22   0.30   0.18
          60      0.21    0.30   0.19   0.22   0.16
          90      0.26    0.30   0.13   0.18   0.06
          120     0.21    0.27   0.11   0.16   0.05
Time (s)  30      308     0.7    3.3    4.1    6.9
          60      608     0.7    3.3    4.1    6.9
          90      903     0.7    3.3    4.1    6.9
          120     1205    0.7    3.3    4.1    6.9
B. Discussion
We present a deep learning based method to suppress the artifacts of sparse-view CT images, which consists of two parts: a convolutional neural network and generative adversarial networks.
The convolutional neural network is used to complete the projection data, which removes the artifacts caused by insufficient data. However, high-frequency information in the original data may be lost in the process. The reconstructed results show that although the artifacts are removed well, the images are over-smoothed and details are blurred.
The generative adversarial networks are used to restore the image quality in the image domain, which maintains the image details and keeps edges sharp. However, the GAN may treat artifacts as structures of the image and retain them in the reconstructed images. The reconstructed results show that the images keep clear details but contain many artifacts.
Thus, the proposed method combines the advantages of these methods and obtains perceptually satisfying results. Meanwhile, because it processes the original data and images reconstructed by FBP, the proposed method does not require iteration and is much faster than ART. Although the proposed method outperforms the traditional ART method in perceptual quality and reconstruction speed, the image quality still degrades seriously when the number of views is lower than 60, which should be addressed in future work.
V. CONCLUSIONS
In this work, we proposed a deep learning method named SVGAN to reduce the artifacts of sparse-view CT images. The proposed method consists of an interpolation convolutional neural network in the projection domain and an estimation generative adversarial network in the image domain, both of which help to improve the quality of the reconstructed images. Combining the advantages of these networks, the method suppresses the artifacts while maintaining the image details.
As a result, the proposed method delivers superior performance in both perceptual quality and computing speed compared with the ART method, with both the interpolation network in the projection domain and the estimation generative adversarial network in the image domain contributing to this improvement.
REFERENCES
[1] K. Liang, H. Yang, and Y. Xing, “Comparison of projection domain, image domain, and comprehensive deep learning for sparse-view x-ray CT image reconstruction,” arXiv preprint arXiv:1804.04289, 2018.
[2] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in CVPR, vol. 2, no. 3, 2017, p. 4.
[3] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” arXiv preprint, 2018.
[4] Y. Han and J. C. Ye, “Framing U-Net via deep convolutional framelets: Application to sparse-view CT,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1418–1429, 2018.
[5] H. Lee, J. Lee, H. Kim, B. Cho, and S. Cho, “Deep-neural-network based sinogram synthesis for sparse-view CT image reconstruction,” arXiv preprint arXiv:1803.00694, 2018.
[6] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Generative adversarial networks for noise reduction in low-dose CT,” IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.