A Study of the Feasibility of using slabbing to reduce Tomosynthesis
Review Time
Magnus Dustler1*, Martin Andersson1, Daniel Förnvik1, Pontus Timberg2, Anders Tingberg1
1Medical Radiation Physics, Department of Clinical Sciences Malmö, Faculty of Medicine, Lund
University, Skåne University Hospital, SE-205 02, Sweden
2 Diagnostic Radiology, Department of Clinical Sciences Malmö, Faculty of Medicine, Lund
University, Skåne University Hospital Malmö, SE-205 02, Sweden
ABSTRACT
This study aimed to investigate whether decreasing the number of slices in breast tomosynthesis (BT) image volumes
reduces reading time. BT slices were combined into so-called slabs by reconstructing thin slices and merging them into
thicker slabs. Sets of slabs were created from 35 clinical BT volumes with malignant or benign findings and from 50
BT volumes drawn from screening sets (without any prior review). The image sets were reviewed in two separate
sessions while the review time was recorded. A total of five experienced radiologists were employed for the image
review.
Additionally, a visual grading analysis (VGA) study was performed to compare slabbed images with the originals in order
to ensure that the image quality was not significantly degraded. One set of 27 pathological cases (13 masses and 14
microcalcification clusters) and one of 22 subtle lesions that had been missed on digital mammography but detected on BT
were presented to an experienced radiologist and two medical physicists, who rated the quality of the slabbed versions
relative to the originals.
The study could find no significant degradation in image quality when using 2 mm slabs instead of 1 mm slices. There
was no significant decrease in reading time on clinical cases (P = .133), but on screening images there was a significant
decrease of 7.7 ± 9.6 s from an average level of 32.2 ± 14.5 s (P < .0001). This suggests that increasing slab thickness
can reduce the time radiologists spend studying normal images by nearly 24%.
Keywords: Mammography, breast tomosynthesis, slab, thick slice, reading time
1. INTRODUCTION
Breast tomosynthesis (BT) is emerging as a potentially viable alternative to mammography for breast cancer screening.
This study addresses one of the major potential obstacles: the increased review time per case compared to
mammography [1-3]. Reducing the number of slices in the BT image volume by merging them into thicker slabs might be
effective in this regard, since the ability to review a case promptly is of critical importance, especially in a
screening modality.
This study is based in part on data collected during the spring of 2011 with the intention of possibly expanding them at a
later date. The data were reanalyzed and extended during 2012, when more advanced methods of producing image
slabs became available.
2. METHODS
2.1 Slice thickness
Increasing the thickness of BT slices can be done in two ways [4]. The most straightforward of these is to vary the thickness
during reconstruction, as the filtered back projection (FBP) algorithm used can handle arbitrary reconstruction
thicknesses, merely affecting the size of the reconstructed image voxels along the z-direction. To improve the quality of
the reconstructed volume and add to the 3D appearance by reducing the influence of out-of-plane artifacts, a slice
thickness filter tailored to a specific z-direction sampling interval (or slice thickness) is used. The application of the slice
thickness filter is intended to keep slice thickness constant, but it also reduces the sharpness of high-frequency image
components. As the Siemens MAMMOMAT Inspiration Tomo (Siemens AG, Erlangen, Germany) and its slice thickness
filter are optimized for a sampling interval of 1 mm in the z-direction, reconstructing thicker slices directly would
introduce an additional factor of uncertainty [5].
* Corresponding author: magnus.dustler@med.lu.se
+46 40 33 86 56
Medical Imaging 2013: Image Perception, Observer Performance, and Technology Assessment,
edited by Craig K. Abbey, Claudia R. Mello-Thoms, Proc. of SPIE Vol. 8673, 86731L
© 2013 SPIE · CCC code: 1605-7422/13/$18 · doi: 10.1117/12.2006987
2.2 Slabbing
The alternative to generating thicker slices is to combine several reconstructed slices into new, thicker ones, which are
called slabs (Figure 1). There are a number of possible approaches to this, most prominently setting each pixel in
the slab to either the average or the maximum value of the corresponding pixels in the constituent slices, known as
average-intensity projection (AIP) and maximum-intensity projection (MIP) slabs, respectively. These approaches do not
differ in the number or thickness of slabs they produce, but they affect the image information contained in the slices
in different ways. Averaging suppresses image noise, but also reduces the contrast of small features such as
microcalcifications, especially if such structures are included in only one slice. Selecting the maximum instead increases
the visibility of microcalcifications while also increasing the noise level. For this study, it was decided that the MIP
approach would be the most appropriate. Slabs were created either by importing and slabbing BT volumes using Matlab
(Mathworks, Natick, MA) or by employing a Siemens prototype system with a built-in option of reconstruction based on
statistical artifact reduction and superresolution, which dispenses with the slice thickness filter and instead
reconstructs slices with a thickness of 1/6 mm and combines them into slabs of the desired thickness [6]. The latter
method was not available during the initial phase of the study.
Figure 1: Illustration of the slabbing procedure. Each pixel of a slab takes the maximum value of the corresponding pixels in
a number of thinner slices.
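The two slabbing rules described above can be illustrated with a minimal sketch. This is not the Matlab code or the Siemens prototype used in the study; the function name `make_slabs`, the nested-list volume layout and the toy pixel values are assumptions made purely for illustration, and a real implementation would normally operate on image arrays.

```python
def average(values):
    """Arithmetic mean of a sequence of pixel values."""
    return sum(values) / len(values)


def make_slabs(volume, slices_per_slab, mode="mip"):
    """Merge consecutive slices into thicker slabs (hypothetical helper).

    volume: list of slices; each slice is a list of rows of pixel values.
    mode:   "mip" keeps the per-pixel maximum, "aip" the per-pixel average.
    If the slice count is not a multiple of slices_per_slab, the last slab
    is built from the remaining slices only.
    """
    if mode not in ("mip", "aip"):
        raise ValueError("mode must be 'mip' or 'aip'")
    combine = max if mode == "mip" else average
    slabs = []
    for start in range(0, len(volume), slices_per_slab):
        group = volume[start:start + slices_per_slab]
        # zip(*group) pairs up corresponding rows, zip(*rows) corresponding pixels.
        slab = [[combine(pixels) for pixels in zip(*rows)] for rows in zip(*group)]
        slabs.append(slab)
    return slabs


# Toy volume: four slices of 1 x 3 pixels, with one bright pixel
# (standing in for a microcalcification) present in slice 0 only.
volume = [
    [[10, 10, 90]],
    [[12, 10, 10]],
    [[11, 13, 10]],
    [[9, 11, 12]],
]
mip_slabs = make_slabs(volume, 2, mode="mip")
aip_slabs = make_slabs(volume, 2, mode="aip")
print(mip_slabs[0])  # [[12, 10, 90]]
print(aip_slabs[0])  # [[11.0, 10.0, 50.0]]
```

In this toy case the bright pixel keeps its full value of 90 in the MIP slab but drops to 50 in the AIP slab, which mirrors the contrast loss for structures confined to a single slice. With 1/6 mm slices, a 2 mm slab would correspond to `slices_per_slab=12`.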
In order not to complicate the study, it was decided to investigate only the reading time reduction obtained by halving
the number of slices in the image volumes. With 1 mm original slices, this meant a comparison with 2 mm slabs.
Two different methods were used to create the slabs. Firstly, each 2 mm MIP slab was built from two consecutive 1 mm
slices reconstructed with standard settings using the slice thickness filter. Secondly, the previously mentioned built-in
option of reconstruction based on statistical artifact reduction was used when it became available during 2012,
combining 12 slices of 1/6 mm into each 2 mm slab. These approaches are broadly similar in that the MIP slabs they
produce are somewhat noisier than 1 mm reconstructed slices but preserve (or improve) the visibility of high-frequency
components (such as microcalcifications).
2.3 Image Quality
To ensure that this procedure would not dramatically affect lesion detection, slabs of a set of 27 pathological structures
from clinical images (13 masses and 14 microcalcification clusters) were created. These were presented side by side with
the originals in a relative visual grading analysis (VGA) [7, 8]. One experienced radiologist and two medical physicists
were asked to rate the quality of the slabbed image relative to the original, using ImageJ software [9].
Later, 22 sets of images with subtle findings (defined as lesions that were detected on tomosynthesis but not on
mammography) were used to compare slabs of 2, 3 and 5 mm thickness with 1 mm originals. The slabs were created using the
by then available thin-slice superresolution reconstruction with statistical artifact reduction, which had the added
benefit that the images could easily be imported to standard mammography workstations, allowing their use for review.
The same radiologist and medical physicists ranked these images from best to worst. The criteria that the reviewers
were told to use in their rating and ranking of images were (if applicable): mass visibility, microcalcification
conspicuity and microcalcification continuity. Continuity was defined in this context as the degree to which structures
could be followed from one slice to the next.
2.4 Reading time
The reading time study was split into two parts. The first part was performed during 2011 using standard 2 mm slabs and
involved two experienced radiologists reviewing 35 clinical cases (30 with minor proven benign findings and 5 with
proven malignant findings) in a simulated screening setting using ViewDEX [10, 11]. The radiologists were told to
mark findings or to mark the case as normal while being timed with a chronometer. The enriched material was intended to
investigate the effect of slabbing on tomosynthesis cases with suspicious findings. The radiologists were presented with
the cases in two sessions separated by three weeks. Nineteen image stacks were randomly selected to be slabbed during
the first session, while the remaining 16 stacks were slabbed during the second session.
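The session design described above can be sketched as a simple randomized assignment. The paper does not describe how the random selection was carried out, so the function `assign_sessions`, the case labels and the seed below are hypothetical. The sketch also assumes that a stack not slabbed in a given session was shown as original 1 mm slices in that session, so that every case is read once in each format.

```python
import random


def assign_sessions(case_ids, n_slabbed_first, seed=None):
    """Randomly choose which cases are shown as slabs in session 1.

    Returns a dict mapping each case to (format in session 1, format in
    session 2). Cases slabbed in session 1 are shown as slices in
    session 2, and vice versa.
    """
    rng = random.Random(seed)
    slabbed_first = set(rng.sample(list(case_ids), n_slabbed_first))
    plan = {}
    for case in case_ids:
        if case in slabbed_first:
            plan[case] = ("slab", "slice")
        else:
            plan[case] = ("slice", "slab")
    return plan


# 35 clinical cases, 19 of them slabbed in the first session.
cases = [f"case{i:02d}" for i in range(1, 36)]
plan = assign_sessions(cases, 19, seed=2011)

n_first = sum(1 for formats in plan.values() if formats[0] == "slab")
n_second = sum(1 for formats in plan.values() if formats[1] == "slab")
print(n_first, n_second)  # 19 16
```

Fixing the seed only makes the illustrative assignment reproducible; it has no counterpart in the study description.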
The second part of the study involved slabbing 50 sets of image stacks taken directly from a BT screening study. These
image stacks were deliberately selected at random without any prior review, in order to represent the type of cases
commonly encountered in a screening setting, i.e. likely normal cases with no pathology. Four experienced radiologists
(one of whom also took part in the first, clinical part) reviewed these cases both with the standard 1 mm slices and with
2 mm superresolution MIP slabs under normal screening conditions using a standard mammography workstation. The
reviewers reviewed each case until they felt confident in assigning it a BI-RADS score [12]. This was done in two to
four sessions separated by between two days and three weeks, with the same image reviewed no more than once
during any session. The reading time of each individual case was recorded using a chronometer, with the radiologist
allowed to use all of the workstation's functions and told to review the mammograms as would normally be done for a
new screening patient with no prior history or images available.
3. RESULTS
3.1 Image quality
The results of both the relative VGA study and the image ranking study were analyzed using a non-parametric
Friedman test for multiple readers. The VGA study showed no significant difference in image quality between the original
images and the slabbed images (Figure 2). The ranking study (on the statistical artifact reduction MIP slabs) did not
show a significant difference between the original images and the 2 mm slabs, but did show significant
differences for the 3 and 5 mm slabs (Figure 3).
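As a rough illustration of the statistic behind a Friedman test, the sketch below computes the Friedman chi-square value and the mean rank of each image type from a table of rankings. It is not the analysis software used in the study: the rankings are invented, each row is assumed to be one reader's ranking of one case (1 = best) with no ties, and the confidence-interval comparison shown in the figures is not reproduced.

```python
def friedman_statistic(rank_rows):
    """Friedman chi-square statistic for untied rankings.

    rank_rows: list of rows, each holding the ranks 1..k given to the
               k image types within one block (here: one reader and case).
    Returns (statistic, mean rank per image type).
    """
    n = len(rank_rows)
    k = len(rank_rows[0])
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    statistic = (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
    mean_ranks = [r / n for r in rank_sums]
    return statistic, mean_ranks


# Hypothetical rankings; columns: 1 mm slices, 2 mm, 3 mm and 5 mm slabs.
example_ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 2, 4, 3],
    [2, 1, 3, 4],
    [1, 2, 3, 4],
    [1, 3, 2, 4],
]
q, mean_ranks = friedman_statistic(example_ranks)
print(round(q, 2), [round(m, 2) for m in mean_ranks])
```

With four image types the statistic is compared against a chi-square distribution with 3 degrees of freedom, whose 5% critical value is 7.815. Locating which thicknesses differ from the 1 mm slices requires a follow-up multiple-comparison step.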
[Figure 2 plot; axis: Friedman score]
3.2 Reading time
The first part of the reading time study, on clinical images, was inconclusive. One radiologist showed a significant
decrease in reading time from a mean of 69.6 s to 55.4 s (P = .039), while the other showed a very minor,
statistically non-significant decrease from 41.4 s to 40.4 s (P = .906). An ANOVA for multiple readers showed no
significant difference in review time (P = .133). The second part indicated a decrease in reading time on the unsorted
screening material of 7.7 ± 9.6 s from an average level of 32.2 ± 14.5 s. A two-way ANOVA performed on the whole
dataset showed that this decrease was significant (P < .0001), and indicated that there were significant differences in
reading time between the four reviewing radiologists. Furthermore, all four reviewers individually showed statistically
significant decreases in reading time (P < .0001) using the paired Student's t-test. Table 1 summarizes the data for each
reviewer.
Figure 2: Results of the relative VGA, using the Friedman test for multiple readers. There is no significant difference between group
1 (original images) and group 2 (2 mm MIP slabs), as the confidence intervals overlap.
              Reading time (s)
              1 mm slices     2 mm slabs      Difference
Reviewer 1    23.4 ± 6.6      14.1 ± 3.7      -9.4 ± 8.3
Reviewer 2    52.4 ± 12.6     41.2 ± 10.8     -11.2 ± 13.7
Reviewer 3    27.7 ± 8.1      22.4 ± 5.6      -5.2 ± 8.2
Reviewer 4    25.0 ± 4.7      20.2 ± 4.2      -4.8 ± 4.2
Total         32.2 ± 14.5     24.5 ± 12.1     -7.7 ± 9.6
Table 1: Summary of statistics from the reading time experiment on unsorted screening images.
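The summary values in Table 1 can be related to the paired t-test and to the relative time saving with a short calculation. The sketch below rests on two assumptions that the paper does not state explicitly: that the ± values in the Difference column are standard deviations of the paired differences, and that each reviewer read all 50 screening cases in both formats. The function names are illustrative only.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n_pairs):
    """Paired t statistic from the mean and SD of the paired differences."""
    return mean_diff / (sd_diff / math.sqrt(n_pairs))


def relative_change(mean_diff, baseline_mean):
    """Mean difference expressed as a fraction of the 1 mm reading time."""
    return mean_diff / baseline_mean


# (mean 1 mm reading time, mean difference, +/- of difference), from Table 1.
table1 = {
    "Reviewer 1": (23.4, -9.4, 8.3),
    "Reviewer 2": (52.4, -11.2, 13.7),
    "Reviewer 3": (27.7, -5.2, 8.2),
    "Reviewer 4": (25.0, -4.8, 4.2),
}
N_CASES = 50  # assumption: all 50 screening cases read twice by each reviewer

for name, (baseline, diff, sd) in table1.items():
    t = paired_t_from_summary(diff, sd, N_CASES)
    change = relative_change(diff, baseline)
    print(f"{name}: t = {t:.2f}, relative change = {change:.1%}")

overall = relative_change(-7.7, 32.2)
print(f"All reviewers: relative change = {overall:.1%}")
```

For the pooled data, 7.7 s out of 32.2 s corresponds to a reduction of about 23.9%.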
[Figure 3 plot; axis: Friedman score, 0 to 12]
Figure 3: Comparison of the ranking of original (1 mm) slices and 2, 3 and 5 mm slabs using the Friedman test for multiple readers.
There is a significant difference between the 3 and 5 mm slabs and the 1 mm slices as there is no overlap between their confidence
intervals.
4. DISCUSSION
The VGA and ranking studies did not indicate a significant decrease in image quality for 2 mm slabs, which opens the
possibility of using thick slices to shorten reading time. Although the study would have to be considerably larger to show
conclusively that there is no difference, it seems reasonable to assume that any such difference would be minor and could
likely be compensated for by optimizing reconstruction parameters for the new slice thickness. Still, there remains the
possibility that slabbing impairs image quality to an unacceptable degree, which would be a serious problem and could be
construed as a limitation of the study. However, the scope of this study is to quantify the effect of slabbing on reading
time. A larger image quality investigation is required before thicker slabs can be implemented in clinical practice, but
such a study is not pertinent if the gain in reading time is too small to be relevant. At what level a decrease in
workload becomes relevant is an issue that has to be considered by health economists.
For reasons similar to those mentioned above, no direct comparison between the quality of the two methods of
slabbing was performed.
Regarding the clinical images, there was a large variation between the two radiologists both in absolute reading time and
in the change in reading time, with one showing a significant reduction in reading time of 20% and the other showing no
appreciable difference. By comparison, the reading time difference for the unsorted images was very consistent,
with all reviewers showing a significant and substantial decrease. The average difference was 7.7 s, or nearly 24%,
though it should be noted that there are substantial differences in reading time between different radiologists.
To put this in perspective, review of digital mammography is roughly 25-50% faster than that of BT [1-3]. This implies that
there is a potential to speed up reading times by a meaningful amount. As noted above, there are individual variations in
reading time between readers, and we theorize that such variations become more pronounced in a “busy” breast with
suspicious-looking regions. Thus, even though the results for the clinical cases with suspicious findings were
inconclusive, the vast majority of screening images do not have any suspicious findings, and there was, critically, a
reduction in reading time on the unsorted screening cases. This supports the idea that it could be beneficial to implement
slabbing in a screening program, as it would speed up review of the majority of cases: those that show no suspicious
structures.
5. CONCLUSIONS
We conclude that it is feasible to use slabbing to reduce the review time of BT image sets to a level closer to that of
standard digital mammography. We cannot conclude whether slabbed images provide equivalent image quality (with
regard to lesion detection), but if not, the difference is likely minor. Thus, we consider slabbing to be a viable method to
reduce the workload of radiologists, though it requires further investigation before it can be used as a
standard.
ACKNOWLEDGEMENTS
We would like to acknowledge Siemens AG Healthcare for the opportunity to use the statistical artifact reduction and
superresolution reconstructions and for financial support. We would also like to acknowledge all participating
radiologists at Unilabs AB and SUS Malmö.
REFERENCES
[1] Zuley ML, Bandos AI, Abrams GS, Cohen C, Hakim CM, Sumkin JH, Drescher J, Rockette HE, Gur D,
“Time to diagnosis and performance levels during repeat interpretations of digital breast tomosynthesis:
Preliminary observations,” Acad Radiol 17, 450-455 (2010)
[2] Good WF, Abrams GS, Catullo VJ, Chough DM, Ganott MA, Hakim CM, Gur D, “Digital breast
tomosynthesis: A pilot observer study,” Am J Roentgenol 190, 865-869 (2008)
[3] Gur D, Abrams GS, Chough DM, Ganott MA, Hakim CM, Perrin RL, Rathfon GY, Sumkin JH, Zuley
ML, Bandos AI, “Digital breast tomosynthesis: Observer performance study,” Am J Roentgenol 193, 586-591
(2009)
[4] Diekmann F, et al., “Thick Slices from Tomosynthesis Data Sets: Phantom Study for the Evaluation of
Different Algorithms,” Journal of Digital Imaging 22(5), 519-526 (2009)
[5] Mertelmeier T, et al., “Optimizing filtered backprojection reconstruction for a breast tomosynthesis
prototype device,” Proc. SPIE 6142, 131-142 (2006)
[6] Abdurahman S, Jerebko A, Mertelmeier T, Lasser T, Navab N, “Out-of-plane artifact reduction in
tomosynthesis based on regression modeling and outlier detection,” IWDM’12 Proc. of the 11th international
conference on Breast Imaging, 729-736 (2012)
[7] Båth M and Månsson L.G, “Visual grading characteristics (VGC) analysis: a non-parametric rank-
invariant statistical method for image quality evaluation,” Br J Radiol 80(951), 169-76 (2007)
[8] Månsson L.G, “Methods for the evaluation of image quality: A review,” Radiat Prot Dosimetry 90, 89-99
(2000)
[9] Rasband W.S, ImageJ, U. S. National Institutes of Health, Bethesda, Maryland, USA,
http://imagej.nih.gov/ij/, 1997-2012
[10] Håkansson M, Svensson S, Båth M, Månsson L.G, “ViewDEX – A Java-based software for presentation
and evaluation of medical images in observer performance studies,” Proc. SPIE 6509, (2007)
[11] Börjesson S, Håkansson M, Båth M, Kheddache S, Svensson S, Tingberg A, Grahn A, Ruschin M,
Hemdal B, Mattsson S, Månsson L.G, “A software tool for increased efficiency in observer performance
studies in radiology,” Radiat. Prot. Dosimetry 114, 45-52 (2005)
[12] D’Orsi CJ, Mendelson EB, Ikeda DM, et al. [Breast Imaging Reporting and Data System: ACR BI-
RADS – Breast Imaging Atlas], American College of Radiology, Reston, (2003)