A 100 fps, Time-Correlated Single-Photon-Counting-Based Fluorescence-Lifetime Imager in 130 nm CMOS

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 49, NO. 4, APRIL 2014 867
Ryan M. Field, Member, IEEE, Simeon Realov, Member, IEEE, and Kenneth L. Shepard, Fellow, IEEE
Abstract—A fully-integrated single-photon avalanche diode
(SPAD) and time-to-digital converter (TDC) array for high-speed
fluorescence lifetime imaging microscopy (FLIM) in standard
130 nm CMOS is presented. This imager is comprised of an array
of 64-by-64 SPADs each with an independent TDC for performing
time-correlated single-photon counting (TCSPC) at each pixel.
The TDCs use a delay-locked-loop-based architecture and achieve
a 62.5 ps resolution with up to a 64 ns range. A data-compression
datapath is designed to transfer TDC data to off-chip buffers,
which can support a data rate of up to 42 Gbps. These features,
combined with a system implementation that leverages a x4
PCIe-cabled interface, allow for demonstrated FLIM imaging
rates at up to 100 frames per second.
Index Terms—Fluorescence lifetime imaging microscopy
(FLIM), imaging, single-photon avalanche diodes (SPADs),
time-correlated single-photon counting (TCSPC), time-to-digital
converter (TDC).
I. INTRODUCTION
FLUORESCENCE microscopy is a powerful imaging technique used in the biological sciences to identify labeled components of a sample with specificity. This is usually accomplished by labeling with fluorescent dyes and imaging these labels, isolating individual dyes by their spectral signatures with optical filters and determining signal from the intensity of the fluorescent response. Additional techniques, such as fluorescence resonance energy transfer (FRET), allow interactions between dyes to be monitored by measuring intensity ratios of the dyes' spectra [1]. Although these techniques are widely used, fluorescence intensity images can be negatively affected by intrinsic fluorescence of unlabeled molecules (autofluorescence), residual leakage of excitation illumination through the filters (bleedthrough), loss of fluorescence with continued illumination (photobleaching), and variations in fluorophore concentration [2].
Manuscript received August 21, 2013; revised November 13, 2013; accepted November 16, 2013. Date of publication January 02, 2014; date of current version March 24, 2014. This paper was approved by Guest Editor Jeffrey Gealow. This work was supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under contract number W911NF-12-1-0594 and by the National Science Foundation under grant 1063315.
R. M. Field is with Intel Corporation, Santa Clara, CA 95054 USA.
S. Realov is with Intel Corporation, Hillsboro, OR 97124 USA.
K. L. Shepard is with Columbia University, 1300 New York, NY 10027 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2013.2293777
Fluorophores have an associated characteristic lifetime, which defines the exponential fluorescent decay transient after the removal of the excitation source. These lifetimes, on the order of nanoseconds for organic dyes, are characteristic of the dye and its environment, including pH, local charge density, viscosity, and FRET interactions [3]–[5]. Consequently, the fluorophore lifetime can not only provide contrast in forming an image but can also serve as a sensing mechanism for the microenvironment of the fluorophore. FLIM is insensitive to fluorophore concentration and other factors that affect fluorescence intensity and has been used in applications as diverse as bacteria detection, in vivo metabolic state identification, and FRET studies [6]–[9].
The two most common techniques for measuring the fluorescence lifetime are the modulated frequency-domain technique and time-correlated single-photon counting (TCSPC) [10]. Typically, wide-field frequency-domain techniques can record a few frames per second but are limited by the inability to detect small changes in lifetime or to resolve multi-exponential decays when more than one fluorophore is present at the same location. TCSPC allows for high accuracy in measuring lifetime and for the extraction of complex lifetime waveforms, as is necessary in chemical characterization studies [4]. However, TCSPC traditionally can require tens of seconds to acquire a single FLIM image in typical laser scanning systems.
In commercial TCSPC systems [11], one detector, typically an avalanche photodiode (APD) or photomultiplier tube (PMT), with one time-to-digital converter (TDC) measurement channel is raster scanned across a sample. At each point in the image, a laser is pulsed and the arrival time of the first fluorescent photon relative to the laser pulse is measured. With repeated laser pulses, a histogram of these individual photon arrival times is collected and the lifetime is extracted from the exponential fit to the resulting distribution. In order for the histogram distribution to match the true fluorescence lifetime, the fluorescence intensity should be sufficiently low that a photon is detected from only around 1% of laser pulses [12]. For a typical 20 MHz laser repetition rate, a fluorescence intensity tuned for a 1% detection rate, an ideal scanning and detection system, and a minimum of 500 photon detections for lifetime extraction, it will usually take 250 µs to measure the lifetime at each pixel. Acquiring a 64-by-64 pixel image, therefore, requires at least 1 s.
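The histogram-and-fit step described above can be sketched in a few lines. This is only an illustration under stated assumptions: a mono-exponential decay with no background and no instrument response, 62.5 ps bins over a 64 ns window (the TDC figures quoted in the abstract), and a mean-arrival-time estimator standing in for the exponential fit. All function names are hypothetical.

```python
import math
import random

BIN_WIDTH_NS = 0.0625   # assumed bin width: 62.5 ps, the TDC resolution
N_BINS = 1024           # assumed window: 1024 bins x 62.5 ps = 64 ns


def build_histogram(arrival_times_ns, n_bins=N_BINS, bin_width_ns=BIN_WIDTH_NS):
    """Bin photon arrival times (relative to the laser pulse) into a histogram."""
    hist = [0] * n_bins
    for t in arrival_times_ns:
        idx = math.floor(t / bin_width_ns)
        if 0 <= idx < n_bins:      # photons outside the window are dropped
            hist[idx] += 1
    return hist


def estimate_lifetime_ns(hist, bin_width_ns=BIN_WIDTH_NS):
    """Estimate a mono-exponential lifetime as the mean arrival time.

    For a pure exponential decay observed over a window much longer than
    the lifetime, the mean arrival time is the maximum-likelihood estimate.
    Bin centers stand in for the individual arrival times.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    weighted = sum(c * (i + 0.5) * bin_width_ns for i, c in enumerate(hist))
    return weighted / total


def simulate_arrivals(lifetime_ns, n_photons, seed=0):
    """Draw synthetic arrival times from an exponential decay (illustrative)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / lifetime_ns) for _ in range(n_photons)]
```

A real analysis would typically fit the histogram while accounting for background counts and the detector impulse response; the mean-based estimator is used here only because it keeps the sketch short.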
TABLE I
A table showing the maximum theoretical frame rate for previous SPAD arrays with integrated TDCs. This assumes an event-driven readout scheme, which requires that each TDC data word also be tagged with the pixel location from which it was generated. This theoretical maximum assumes that 1000 photon events are needed to extract the lifetime and that photon event data is output on every possible I/O clock cycle.
Recent work has leveraged integrated arrays of CMOS single-photon avalanche diodes (SPADs) and TDCs to create parallelized TCSPC imaging systems [13]–[18]. Although improved imaging speeds have been demonstrated in some of these designs, the parallel acquisition channels generate off-chip data rates that limit the achievable frame rates. For a typical laser repetition rate of 20 MHz and a 64-by-64 array of pixels with 10-bit timing resolution and 12-bit position information, the required data rate reaches 1.8 Tbps. While event-driven readout approaches have reduced these data rates, previous SPAD array systems have still been limited in the number of parallel channels, frame rates, or number of acquired frames [19]. Table I lists published SPAD arrays with integrated TDCs and the theoretical data-bandwidth-limited maximum frame rate.
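The 1.8 Tbps figure follows directly from the numbers in the paragraph above. The short sketch below reproduces the arithmetic, assuming every pixel reports one 22-bit word (10 timing bits plus 12 position bits) on every laser pulse; the constant and function names are illustrative.

```python
LASER_RATE_HZ = 20_000_000   # 20 MHz laser repetition rate
N_PIXELS = 64 * 64           # 64-by-64 array
TIME_BITS = 10               # timing resolution per event
POSITION_BITS = 12           # pixel address per event


def raw_data_rate_bps(laser_rate_hz=LASER_RATE_HZ, n_pixels=N_PIXELS,
                      bits_per_word=TIME_BITS + POSITION_BITS):
    """Data rate if every pixel outputs one word on every laser pulse."""
    return laser_rate_hz * n_pixels * bits_per_word


rate = raw_data_rate_bps()
print(f"{rate / 1e12:.2f} Tbps")  # prints "1.80 Tbps"
```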
In this work, we present a FLIM imager containing a
64-by-64 array of SPADs in CMOS with per-pixel TDCs.
An event-driven high-speed datapath supports a maximum
imaging frame rate of 466 fps. The imager is designed using a
standard 130 nm CMOS process, with an associated board-level
data-handling system optimized for high-throughput operation.
Section II describes an overview of the imager chip architec-
ture, the detailed design of each on-chip component, and an
overview of the system-level considerations for high-speed
image acquisition. Section III presents measurement results
that highlight the SPAD capabilities, characterize the TDCs,
and demonstrate the high-speed FLIM performance of our
system. Section IV concludes.
II. TCSPC FLIM IMAGING SYSTEM
Fig. 1 shows a block diagram of the entire imaging system. At the core of the system is the FLIM imager chip. The data output from the imager chip consists of raw arrival time data, which is arranged into histograms for each pixel by the four field-programmable gate arrays (FPGAs). Each FPGA bins the arrival time data from 1024 pixels, which is then transmitted to a computer where it is saved to disk before subsequent data processing to extract the lifetime. We now consider the design aspects of each of the major system components.
A. Integrated Circuit Architecture
Fig. 2 shows a block diagram of the imager architecture. The entire imager chip is synchronized to a 20 MHz laser signal using a phase-locked loop (PLL), which generates a 1 GHz clock signal that is distributed to the delay-locked loops (DLLs) and to the datapath. A trigger input signal allows the imager to synchronize the datapath controller and TDC start signals with any frequency that is an integer fraction of the 20 MHz laser repetition rate, allowing for laser pulse picking.
Each pixel contains quench, reset, control, calibration, and output circuits. The output buffer for each pixel drives the stop signal for one of 4096 independent TDCs. The timing data recorded by the TDCs is shifted into a datapath for compression before passing to the chip periphery. Four banks of 22 LVDS buffers each output a clock, a one-bit valid flag, and 20 data bits, which consist of 10 bits of arrival time and 10 bits of position data, at up to 500 MHz.
Fig. 1. System level block diagram showing a high-level overview of the connections between the IC, FPGAs, and PC.
B. SPAD Array
The SPADs used in this design are the same as those reported by the authors in [20]. Each pixel contains one octagonal SPAD with a 5-µm-diagonal active area and the pixel-level circuitry for control of the SPAD (see Fig. 3). With this circuitry, the SPADs are spaced on a 48 µm pitch, resulting in a fill factor of 0.77%. The layout for a single pixel is presented in Fig. 4(a).
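The quoted fill factor can be checked from the pixel geometry. The sketch below assumes the active area is a regular octagon and that the 5 µm diagonal is measured vertex to vertex; neither detail is stated explicitly in the text, and the function names are illustrative.

```python
import math


def regular_octagon_area(vertex_to_vertex_diagonal):
    """Area of a regular octagon from its longest (vertex-to-vertex) diagonal."""
    r = vertex_to_vertex_diagonal / 2.0   # circumradius
    return 2.0 * math.sqrt(2.0) * r * r


def fill_factor(active_area, pitch):
    """Fraction of the square pixel cell covered by the active area."""
    return active_area / (pitch * pitch)


SPAD_DIAGONAL_UM = 5.0    # active-area diagonal from the text
PIXEL_PITCH_UM = 48.0     # pixel pitch from the text

ff = fill_factor(regular_octagon_area(SPAD_DIAGONAL_UM), PIXEL_PITCH_UM)
print(f"fill factor = {100 * ff:.2f}%")  # prints "fill factor = 0.77%"
```

Under these assumptions the active area is about 17.7 µm² in a 2304 µm² cell, which agrees with the 0.77% stated above.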
When the SPAD is triggered by a photon, a voltage equal to the overvoltage potential is applied across the gate of the PFET M4, causing it to turn on and trigger the output buffers. The threshold voltage of M4 is approximately 320 mV and the maximum gate voltage is 3.6 V, yielding a range of acceptable overvoltage values from 0.32 V to 3.6 V. The inverter U2 is connected to the core chip power supply of 1.5 V and level-shifts the output from M4, which runs from a separate supply, to this core logic voltage. Following a second inverter, U3, are two multiplexers for selecting among the SPAD output, an electrical calibration input (used for characterizing the TDCs), and a ground signal for turning off the output. The output from the multiplexers is then buffered to drive the stop signal of the TDC for this pixel.
Fig. 2. Block diagram showing the major components of the imager chip.
Fig. 3. Pixel circuit schematic that performs the quench, reset, TDC calibration, event output, and pixel control functions. Transistors M1–M9 and inverter U2 are designed using thick oxide devices.
Fig. 4. (a) The layout of a single pixel in the array, including SPAD and the pixel control circuit of Fig. 3. The pixel and circuitry occupy an area that is 48 µm × 48 µm, of which a considerable amount is white space due to the conservative SPAD structure and guard rings used. (b) The impulse response of the SPAD as recorded by the on-chip TDCs is 125 ps. Each bar in the histogram represents a 62.5 ps wide timing bin.
When a photon event triggers a SPAD avalanche, the avalanche current must be stopped, or quenched, so that the SPAD can be reset and used in subsequent detection windows. In order to quench the device, the voltage across the SPAD must be reduced to below its breakdown voltage. In this design, a PFET device, M1 in Fig. 3, is used as the quenching resistor. A tunable voltage is applied to the gate of M1, which allows the drain-to-source resistance of the device to be adjusted from 10 kΩ to several MΩ as the gate bias approaches the threshold voltage.
Fig. 5. Pixel circuit timing diagram showing a typical measurement and reset cycle. A pixel event occurs at the beginning of the cycle and triggers the output buffer. Immediately after the next laser pulse, the reset signal is asserted and the SPAD recharges.
Fig. 6. A block level overview of the time-to-digital converter used in this work. A delay-locked loop subdivides the reference clock into 16 evenly spaced phases. The reference clock also increments a coarse counter. The phase and counter outputs are buffered to flip-flops to be used in TDCs for groups of 128 pixels. The thermometer encoder converts the 16-bit thermometer code into a 4-bit value, which, along with 6 bits from the counter and a valid data flag, is clocked into a chain of shift flip-flops at the end of each measurement window. The shift-chain clock is a gated version of the datapath clock.
After the SPAD has been quenched, it must be reset before it can be used to detect another event. In an active quenching approach [21], [22], reset is performed by the wide-channel PFET device, M2, with a resistance between 1 kΩ and 400 Ω, depending on its bias. In addition, the NFET, M3, is used to hold the bias across the SPAD below breakdown and can be used to prevent the SPAD from resetting or to disable the SPAD completely. M2 and M3 are independently controlled to minimize the probability of afterpulsing.
The reset signal, triggered by the laser pulse, passes through an AND gate, which allows for the option of disabling the pixel, and through a level-shifter that brings the signal to the required supply level. Device M2 is then enabled and charges the cathode of the SPAD. When M5 turns on, it pulls the input to U2 low and causes U1 to pull down, turning off M3. The timing diagram for event detection and reset is shown in Fig. 5.
Both pixel_off_sc and pixel_off_ctrl can be used to disable a pixel. The pixel_off_sc signal is a configuration bit that can be used to completely disable the pixel during all measurements. Using this control signal to disable abnormally noisy pixels recovers the data bandwidth that their noise events would otherwise consume. The pixel_off_ctrl signal comes from a datapath controller and is used to disable the pixel at the end of a measurement window if no events have occurred. This feature reduces the impact of the dark count rate (DCR) on the lifetime measurement. A similar technique for defining the measurement window has recently been used to achieve extremely low noise levels in CMOS SPADs for measurements in which the photon arrival time is tightly constrained, such as 3-D imaging [23].
Fig. 7. Overview of delay-locked loop.
C. Time-to-Digital Converters
The TDC used in this work is based on a differential DLL architecture with a synchronous counter. This architecture provides a well-defined precision and dynamic range and fast conversion speed, and the DLL can be easily shared among groups of pixels in the array. In this design, the DLL and counter outputs are distributed to groups of 128 pixels, as shown in Fig. 6. The DLL [14] uses buffers as the delay element and includes a differential clock generator and charge pump (see Fig. 7).
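The resolution and range of this coarse-plus-fine TDC follow from the 1 GHz reference clock, the 16 DLL phases, and the 6-bit counter mentioned in the text. The sketch below shows the arithmetic and one way to combine the two fields into a 10-bit code. Placing the coarse count in the upper six bits is an assumption made for illustration, as are the function names.

```python
REF_CLOCK_HZ = 1e9    # 1 GHz clock distributed to the DLLs
DLL_PHASES = 16       # evenly spaced phases per reference period
COARSE_BITS = 6       # synchronous counter width
FINE_BITS = 4         # encoded DLL phase

LSB_PS = 1e12 / REF_CLOCK_HZ / DLL_PHASES            # 62.5 ps
RANGE_NS = (1 << COARSE_BITS) * 1e9 / REF_CLOCK_HZ   # 64 ns


def tdc_code(coarse, fine):
    """Combine counter and phase values into one 10-bit code (assumed layout)."""
    if not 0 <= coarse < (1 << COARSE_BITS):
        raise ValueError("coarse count out of range")
    if not 0 <= fine < (1 << FINE_BITS):
        raise ValueError("fine phase out of range")
    return (coarse << FINE_BITS) | fine


def code_to_time_ps(code):
    """Convert a 10-bit TDC code to a time interval in picoseconds."""
    return code * LSB_PS


print(LSB_PS, RANGE_NS)                    # 62.5 64.0
print(code_to_time_ps(tdc_code(63, 15)))   # 63937.5
```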
A key concern when designing a differential voltage-controlled delay line (VCDL) with cross-coupled inverters between each stage is the matching of the two complementary clock edges. A slight difference in the crossing points of these complementary edges results in systematic timing errors. A complementary clock generator is designed using a pass-transistor circuit to align the crossing points of the complementary clocks (see Fig. 8(a)). The input buffer is designed such that its rise and fall times are greater than those of the inverter it drives. Consequently, clk will be in the middle of its low-to-high transition when the signal that forms the complementary clock reaches the pass-transistor gate. As a result, this signal is able to pass through its transmission gate with a constant low resistance because clk has already switched these transmission gates before it arrives, producing well-aligned transitions for both clk and its complement. Simulation results including typical and skewed process corners are shown in Fig. 8(b)–(d).
The phase detector of the DLL generates equally sized UP
and DN pulses when the input phases are aligned, such that
any mismatch in the up and down currents of the charge pump
will result in a static phase offset. Process-voltage-temperature
(PVT) variations, in particular those that cause differences in the
relative strength of NFETs and PFETs, can produce such offsets,
necessitating calibration. The charge pump calibration control is
designed such that the total combined differential width of the
current mirror NFETs for the UP and DN currents can be ad-
justed in increments of 10 nm, corresponding to current steps
of approximately 3 nA. A subset of the associated calibration
coding scheme is shown in Table II.
Fig. 9 shows a schematic of the charge pump, which is a switched, low-headroom, self-biased, cascoded current mirror. This architecture provides a high output resistance and closely matches an ideal current source. Both the UP and DN switches are implemented using only NFETs to minimize variability due to NFET-PFET process skew. Calibration control is implemented with six additional NFET devices in parallel with each of the switches (M24–M35), allowing for fine current adjustments on the scale of 3 nA with minimum-length devices. Replica biasing devices (M39–M50) are also used to ensure that the UP and DN adjustment currents are well matched. The charge pump uses 2.5 V devices, allowing the charge pump voltage to span the entire operating supply range of the VCDL from 0.75 V to 1.6 V. An off-chip reference current is used to bias the current mirrors.
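The calibration coding can be explored numerically using the per-bit device widths listed in the Table II caption (520, 440, 400, 380, 370, and 360 nm from MSB to LSB). The sketch below assumes that the widths of the enabled parallel devices simply add and that the differential width is the UP total minus the DN total; the function names are hypothetical.

```python
from itertools import product

# Device widths in nm, MSB to LSB, as listed in the Table II caption.
BIT_WIDTHS_NM = (520, 440, 400, 380, 370, 360)


def code_width_nm(code):
    """Total enabled width for a 6-bit calibration code such as '010001'."""
    if len(code) != len(BIT_WIDTHS_NM) or set(code) - {"0", "1"}:
        raise ValueError("code must be a 6-character string of 0s and 1s")
    return sum(w for bit, w in zip(code, BIT_WIDTHS_NM) if bit == "1")


def differential_width_nm(up_code, dn_code):
    """Assumed definition: UP width minus DN width."""
    return code_width_nm(up_code) - code_width_nm(dn_code)


def achievable_differences_nm():
    """Every non-negative width difference reachable with two 6-bit codes."""
    widths = [code_width_nm("".join(bits)) for bits in product("01", repeat=6)]
    return sorted({abs(a - b) for a in widths for b in widths})


print(differential_width_nm("010000", "010001"))  # -360
print(max(achievable_differences_nm()))           # 2470
```

For the codes used in Fig. 17(a), UP = 010000 and DN = 010001, the DN side is wider by 360 nm, which is consistent with the DN current exceeding the UP current in that measurement. The largest difference, 2470 nm, matches the maximum quoted in the Table II caption.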
D. Data-Compression Datapath
If a 10-bit time value is output for every pixel after each laser repetition with a laser pulse rate of 20 MHz, an off-chip data rate of 1.8 Tbps would be required for the array. This data rate, however, does not reflect the sparseness of the data. In particular, TCSPC experiments typically record a photon hit for only 1–2% of laser repetitions. Through the use of an event-driven readout approach, sparseness is exploited in our design to reduce the average data rate to approximately 18 Gbps. To achieve this, the time data for each pixel are appended with a valid bit that indicates whether a pixel event has occurred. This valid bit is used to control the flow of data out of the array such that only data associated with pixel events are allowed to pass.
Fig. 8. (a) Schematic of the complementary clock generator. (b)–(d) Simulation results showing complementary clock edge alignment for nominal devices (b), skewed fast NFET, slow PFET devices (c), and skewed slow NFET, fast PFET devices (d).
TABLE II
Subset of calibration codes and combinations demonstrating the width tuning capabilities of the calibrated charge pump used in this design. The differential value of the codes continuously increases in increments of 10 nm from 0 to 630 nm. Additional differential widths beyond 630 nm can be generated from these codes with the maximum calibration difference at 2470 nm. The bits in the code correspond to the following device widths in order from MSB to LSB: 520 nm, 440 nm, 400 nm, 380 nm, 370 nm, 360 nm. The minimum device width for these transistors is 360 nm.
The pipelined datapath shown in Fig. 2 is used to perform
this data compression. At the end of a measurement window and
before the next laser pulse occurs, all time data and associated
valid bits are loaded into a set of registers, connected as shift
registers on a half-row basis. The 10-bit time data are shifted out
of each half-row into separate datapaths. A counter tracks the
pixel position from which the data originated, resulting in a 5-bit
position word appended to the time data, giving, together with the
valid bit, a combined 16-bit word (see Fig. 10). Within each datapath, the data for up
to eight events per row are shifted into a bank of eight 16-bit
registers as shown in Fig. 10. This happens within the 20 MHz
laser pulse period that follows the one in which the data were
captured.
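The first-stage behavior described above (drop pixels without an event, tag the rest with their position, keep at most eight events) can be modeled in software. This is a behavioral sketch only, not the on-chip shift-register implementation. The bit layout of the 16-bit word (valid flag, then 10 time bits, then 5 position bits) is assumed for illustration, and all names are hypothetical.

```python
TIME_BITS = 10
POSITION_BITS = 5
MAX_EVENTS = 8   # first-stage capacity per half-row, from the text


def pack_word(time_code, position):
    """Build a 16-bit word; assumed layout [valid | time | position]."""
    if not 0 <= time_code < (1 << TIME_BITS):
        raise ValueError("time code out of range")
    if not 0 <= position < (1 << POSITION_BITS):
        raise ValueError("position out of range")
    valid = 1 << (TIME_BITS + POSITION_BITS)
    return valid | (time_code << POSITION_BITS) | position


def unpack_word(word):
    """Return (valid, time_code, position) from a 16-bit word."""
    valid = (word >> (TIME_BITS + POSITION_BITS)) & 1
    time_code = (word >> POSITION_BITS) & ((1 << TIME_BITS) - 1)
    position = word & ((1 << POSITION_BITS) - 1)
    return valid, time_code, position


def compress_half_row(samples, max_events=MAX_EVENTS):
    """Keep only valid events from a half-row of (valid, time_code) samples.

    Returns the packed event words and a flag that is set when more
    events arrived than the stage can hold.
    """
    events = []
    overflow = False
    for position, (valid, time_code) in enumerate(samples):
        if not valid:
            continue            # non-event data is discarded
        if len(events) >= max_events:
            overflow = True     # stage is full; this event is lost
            continue
        events.append(pack_word(time_code, position))
    return events, overflow
```

With 32 pixels per half-row and a typical 1% event rate, most laser periods yield an empty or nearly empty event list, which is the sparseness the datapath exploits.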
On the rising edge of the next laser pulse trigger, the pixel event data is shifted in parallel into another set of registers and then shifted out as shown in Fig. 11. During this parallel shift operation an additional address bit is added to the data word in order to keep track of the row from which the data originated, increasing its length to 17 bits. A similar shifting process is repeated for each of the next four laser pulses (see Fig. 12). During each stage transition, an additional bit is added to the data word in order to indicate from which row it originated. Following Stage 5, the data are directly written into a first-in first-out (FIFO) buffer. At this point, the data words are 19 bits long with eight address bits. The diagram in Fig. 12 depicts the data for 8 half-rows of pixels. This identical datapath block is repeated 16 times on the chip.
The number of shift registers in each stage of the datapath is
chosen such that an average pixel event rate of 1% of all laser
pulses will result in a datapath overflow with a probability of
less than 10 . The datapath is designed to operate at a clock
frequency of 1 GHz and a laser pulse rate of 20 MHz, which
provides 50 clock cycles within the datapath to complete all of
the required shifting operations between stages.
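The sizing argument can be illustrated with a simple binomial model of the first stage alone. The sketch assumes 32 pixels per half-row that fire independently with probability 1% per laser pulse and a capacity of eight events. It is not the authors' overflow analysis, which covers all stages of the datapath, and the function name is hypothetical.

```python
from math import comb

DATAPATH_CLOCK_HZ = 1_000_000_000   # 1 GHz datapath clock
LASER_RATE_HZ = 20_000_000          # 20 MHz laser pulse rate

# Clock cycles available between laser pulses for shifting operations.
CLOCKS_PER_LASER_PERIOD = DATAPATH_CLOCK_HZ // LASER_RATE_HZ   # 50


def overflow_probability(n_pixels=32, p_event=0.01, capacity=8):
    """P(more than `capacity` of `n_pixels` independent pixels fire)."""
    return sum(
        comb(n_pixels, k) * p_event**k * (1.0 - p_event) ** (n_pixels - k)
        for k in range(capacity + 1, n_pixels + 1)
    )


print(CLOCKS_PER_LASER_PERIOD)            # 50
print(f"{overflow_probability():.1e}")
```

Fifty clock cycles per laser period is enough to shift out a 32-pixel half-row serially, which is consistent with the timing budget described in the text.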
E. Output Stage
Fig. 9. Calibrated charge pump schematic.
Fig. 10. The TDC data is shifted into a set of flip-flops at the end of each measurement window. This data is then shifted into the first stage of the datapath, which is sized to hold up to 8 pixel events per row.
Fig. 11. The first stage of the datapath captures data shifted out of the TDCs and collects all valid pixel events. The datapath controller checks the valid bit of the incoming data and organizes the data into the rightmost flip-flops. After the data input shift is complete, the next laser pulse triggers a parallel shift operation into the central flip-flops that are connected in a U-shape. During the next measurement window, the data in these flip-flops will be scanned into the next stage of the datapath. The number of data bits increases by one to 17 bits, with the 17th bit representing the input row of the data.
After the valid pixel data has reached the FIFO at the end of each datapath, groups of four FIFOs are combined and their data are transmitted over a bank of LVDS drivers, each designed to meet the TIA/EIA 644-A LVDS standard [24]. An output controller cycles between each FIFO in the group of four in a round-robin manner, adding two additional address bits that indicate the FIFO from which the data are retrieved. There are four banks of 22 LVDS buffers, each carrying a clock and 21 bits of data. These output drivers are capable of running at up to 500 MHz, providing a total output bandwidth of 42 Gbps.
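The output bandwidth and the final word width can be tallied from the figures in the text. The sketch assumes one 21-bit word per bank per clock cycle at the maximum 500 MHz rate; the names are illustrative.

```python
N_BANKS = 4
DATA_BITS_PER_BANK = 21          # 22 LVDS buffers minus the clock
MAX_IO_CLOCK_HZ = 500_000_000    # 500 MHz

# Word width: 19 bits leaving the FIFO plus 2 FIFO-select address bits.
FIFO_WORD_BITS = 19
FIFO_SELECT_BITS = 2
OUTPUT_WORD_BITS = FIFO_WORD_BITS + FIFO_SELECT_BITS


def bank_bandwidth_bps(clock_hz=MAX_IO_CLOCK_HZ):
    """Bandwidth of one LVDS bank, one word per clock cycle."""
    return DATA_BITS_PER_BANK * clock_hz


def total_bandwidth_bps(clock_hz=MAX_IO_CLOCK_HZ):
    """Combined bandwidth of all four banks."""
    return N_BANKS * bank_bandwidth_bps(clock_hz)


def max_event_rate_per_s(clock_hz=MAX_IO_CLOCK_HZ):
    """Upper bound on photon events per second across all banks."""
    return N_BANKS * clock_hz


print(total_bandwidth_bps() / 1e9)   # 42.0
print(bank_bandwidth_bps() / 1e9)    # 10.5
```

The per-bank figure of 10.5 Gbps is the same rate each FPGA is stated to handle in the system-level description.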
F. System-Level Considerations
Fig. 12. Diagram showing the movement of data through the datapath. Each of these blocks is repeated sixteen times on the imager chip. An example showing how data is compressed in each stage is shown for the top two rows. Incoming valid data (green) initially has non-event data (red) between it. Each incoming data word consists of 10 bits of timing information, 5 bits of position information, and a 1-bit valid flag. As data enters the datapath, the non-event data is discarded, resulting in the two valid data packets in the top row finishing in the rightmost flip-flops. In the second stage, the two rows of data are shifted in parallel into the U-shaped shift chain and then shifted clockwise with the non-event data being discarded once again. In this figure, all horizontal arrows represent serial data shifts while vertical arrows indicate a parallel data shift.
Fig. 13. Die photograph showing the major functional blocks of the FLIM imaging IC.
As shown in Fig. 1, each of the four LVDS banks communicates with a dedicated FPGA. In this design, four Virtex-6 XC6VLX130T-3 devices are used to capture the raw arrival time data and generate histograms of the arrival times for each pixel. The FPGA RAM is partitioned such that each 18-kb block RAM is configured as a true dual-port memory and stores 128-bin, 16-bit histogram information for two adjacent pixels in the array. Each pixel is allocated enough memory to record two histograms in the RAM such that one histogram can be read while the other is being captured, allowing for continuous recording of FLIM data. Each FPGA can process an incoming data stream at up to 10.5 Gbps.
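The double-buffered histogram scheme can be modeled in software as a pair of banks that swap roles at each frame boundary. This is a behavioral sketch, not the FPGA implementation. In particular, folding the 1024 TDC codes into 128 bins by dropping the three least significant bits is an assumption, as is saturating the 16-bit counters; the class and method names are hypothetical.

```python
class PingPongHistogram:
    """Two 128-bin histograms for one pixel: one captures, one is read out."""

    N_BINS = 128
    MAX_COUNT = (1 << 16) - 1   # 16-bit bins, saturating (assumed)
    CODE_SHIFT = 3              # assumed mapping: 1024 codes -> 128 bins

    def __init__(self):
        self._banks = [[0] * self.N_BINS, [0] * self.N_BINS]
        self._active = 0

    def add_event(self, time_code):
        """Increment the bin for a 10-bit TDC code in the capturing bank."""
        if not 0 <= time_code < 1024:
            raise ValueError("time code out of range")
        bank = self._banks[self._active]
        bin_index = time_code >> self.CODE_SHIFT
        if bank[bin_index] < self.MAX_COUNT:
            bank[bin_index] += 1

    def swap(self):
        """End the frame: return the finished histogram, capture into the other."""
        finished = self._banks[self._active]
        self._active ^= 1
        self._banks[self._active] = [0] * self.N_BINS
        return finished


# Memory used per block RAM: 2 pixels x 2 histograms x 128 bins x 16 bits.
BITS_USED_PER_BRAM = 2 * 2 * 128 * 16   # 8192 bits, within an 18-kb block
```

Calling `swap()` at each frame boundary hands the completed histogram to the readout path while new events continue to accumulate in the other bank.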
Once the histograms have been formed, the data rate requirements for subsequent processing drop significantly. Each frame of the histogram dataset is 8 Mb, and the data rate requirement for transfer from the FPGAs to a computer scales with the desired frame rate. At 100 fps, the data rate to the computer is 800 Mbps. In order to reliably transfer data at this rate, we use PCIe interfaces on the Virtex-6 devices to perform direct memory access (DMA) writes from each FPGA directly to system memory on the computer. Each of the four FPGAs is configured with a x1 PCIe Gen 2 interface, which connects to a PCIe switch that combines four x1 links into a single x4 link. The switch used in this design is the PLX Technology PEX8608. We connect this x4 link using a cabled PCIe interface and x4 cabled PCIe adapter card. Using this PCIe interface, the system can support frame transfer rates of up to 754 fps.
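The frame size and the data rate to the computer follow from the histogram dimensions given earlier (4096 pixels, 128 bins, 16 bits per bin). The sketch below reproduces that arithmetic; it ignores any PCIe protocol overhead, and the names are illustrative.

```python
N_PIXELS = 64 * 64
BINS_PER_HISTOGRAM = 128
BITS_PER_BIN = 16


def frame_bits():
    """Size of one full histogram frame in bits."""
    return N_PIXELS * BINS_PER_HISTOGRAM * BITS_PER_BIN


def histogram_rate_bps(frames_per_second):
    """Payload rate needed to move complete frames to the computer."""
    return frame_bits() * frames_per_second


print(frame_bits() / 2**20)              # 8.0 (megabits, counted as 2**20 bits)
print(histogram_rate_bps(100) / 2**20)   # 800.0
```

Counting a megabit as 2^20 bits gives exactly the 8 Mb per frame and 800 Mbps at 100 fps quoted in the text; in decimal units the 100 fps rate is about 839 Mbps.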
III. EXPERIMENTAL RESULTS
This design is fabricated in a standard 130 nm CMOS process, and a die photograph is presented in Fig. 13. Additionally, a printed circuit board (PCB) is designed with the appropriate high-speed PCIe interfaces and kernel driver modules to allow for DMA transfers between the FPGAs and system memory. Because of the complexity of the design, characterization is performed individually for each of the major system components.
Fig. 14. Plots showing afterpulsing probabilities at (a) 20 MHz reset rate and (b) 100 MHz reset rate. Both measurements show periodic spikes in the afterpulsing probability plots at multiples of the reset period. For both measurements, the overvoltage was set at 3.5 V.
A. Pixel Circuit Characterization
Located above the main SPAD array is an isolated pixel with the same circuitry as Fig. 3 but with its output connected directly to a pad for characterization. This pixel is used to evaluate the maximum count rate of our SPAD and the afterpulsing probability.
The maximum count rate for this device is evaluated using a bright, uncorrelated white light source with the standard reset rate of 20 MHz and a fast reset rate of 100 MHz. The test pixel is biased with an overvoltage of 3.5 V. At a reset rate of 100 MHz, the pixel dead time, quench time, and reset time sum to 10 ns. The maximum count rate observed is 89.2 MHz.
The afterpulsing for the pixel using the active quench and reset circuitry is also evaluated at both 20 MHz and 100 MHz reset rates using uncorrelated white light. Afterpulsing probability can be measured by recording signal traces of pixel output pulses and computing the autocorrelation of the traces [25], [26]. The autocorrelation at a given lag is given by
(1)
where the calculation uses a fixed total number of samples and the input is a discrete signal of pulse arrival times.
Fig. 14 shows the afterpulsing probabilities calculated from 3710 signal traces of 4000 ns with 800 ps precision for both 20 MHz and 100 MHz reset frequencies [26]. With either the 20 MHz or 100 MHz reset rate, afterpulsing probabilities are below 0.002 even with this SPAD biased with a relatively high overvoltage of 3.5 V. The correlograms at both 20 MHz and 100 MHz reset frequencies do show periodic spikes, which are due to the synchronous SPAD reset at 50 ns and 10 ns intervals, respectively.
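The autocorrelation analysis can be sketched for a sampled pulse train. The estimator below is the common biased form normalized by the total number of samples, applied to a binary signal that is 1 in samples where a pulse occurred. It is offered as an assumption and may differ in normalization from Eq. (1); the function name is hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate of a discrete signal.

    R[lag] = (1/N) * sum over n of x[n] * x[n + lag], for lag = 0..max_lag,
    where N is the total number of samples in x.
    """
    n = len(x)
    if n == 0:
        raise ValueError("signal must not be empty")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(x) - 1")
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


# A pulse every third sample produces peaks at lags 0, 3, 6, ...
trace = [1, 0, 0, 1, 0, 0, 1, 0, 0]
r = autocorrelation(trace, 4)
```

A periodic pulse train gives peaks at multiples of its period, which mirrors the periodic spikes attributed to the synchronous reset in the measured correlograms.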
Fig. 15. Semi-log plot showing a histogram of the inter-spike intervals measured with a SPAD overvoltage of 3.5 V and a reset rate of 20 MHz. The mono-exponential decay indicates that no afterpulsing is present. Spikes in the histogram are observed at multiples of the reset period.
Fig. 16. Diagrams showing the different charge pump mismatch states. (a) The DN current is stronger than the UP current, causing the VCDL to run slow and the TDC output to lag behind the input delay. This results in a vertical jump in the fine TDC transfer curve. (b) When the UP and DN currents are equal, the TDC output linearly tracks the delay input. (c) In the case when the UP current is stronger than the DN current, the VCDL runs fast and the TDC output leads the delay input. This causes a horizontal plateau in the fine TDC transfer curve. The charge pumps are calibrated by measuring this transfer curve and adjusting the UP and DN currents accordingly.
In addition to autocorrelation analysis, a histogram of the inter-spike interval (ISI) times can also be used to characterize afterpulsing. In a detector with afterpulsing, the histogram of the ISI times will show a bi-exponential decay, with a short decay time constant that is a consequence of afterpulsing and a long decay time constant that is related to the uncorrelated light source [27]. As seen in Fig. 15, the ISI histogram provides further evidence that afterpulsing is not significant for this detector.
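Building an ISI histogram from a list of pulse arrival times is straightforward. The sketch below is a minimal illustration with hypothetical function names; the time unit and bin width are left to the caller.

```python
import math


def inter_spike_intervals(pulse_times):
    """Differences between consecutive pulse arrival times."""
    ordered = sorted(pulse_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def isi_histogram(intervals, bin_width, n_bins):
    """Count intervals per bin; intervals beyond the last bin are ignored."""
    if bin_width <= 0 or n_bins <= 0:
        raise ValueError("bin_width and n_bins must be positive")
    hist = [0] * n_bins
    for interval in intervals:
        idx = math.floor(interval / bin_width)
        if 0 <= idx < n_bins:
            hist[idx] += 1
    return hist


isis = inter_spike_intervals([0, 10, 25, 45])
hist = isi_histogram(isis, bin_width=10, n_bins=3)
```

Plotting the logarithm of the bin counts against the interval then shows a single straight line for a mono-exponential distribution and two distinct slopes for a bi-exponential one.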
B. Time-to-Digital Converter Characterization
The TDCs are characterized and their charge pumps calibrated using the cal_stop signal in Fig. 3. To characterize the TDC, a 400 kHz reference signal is input into the trigger port of a Stanford Research Systems DG535 digital delay generator. Two outputs of the DG535 are used to generate the trigger signal and a tunable cal_stop signal, with the delay between the trigger and cal_stop signals swept to characterize TDC performance. The cal_stop signal is buffered on the PCB, which results in 200 ps of additional jitter in the measurement.
Fig. 17. (a) TDC output when the UP and DN charge pump calibration codes are 010000 and 010001, respectively. The dashed oval highlights the region of the transfer characteristic that indicates that the VCDL is running slowly (the DN current is stronger than the UP current). (b) The TDC output after charge pump adjustments are made. The UP calibration code is 000101 and the DN calibration code is 101000. In this hardware, the UP devices are stronger than the DN devices.
The charge pump in this design is manually calibrated by
setting 12 control bits (6 bits each for UP and DN currents).
Measurements of the fine TDC value while the start-stop delay
of the cal_stop signal from the pixel into the TDC is adjusted
are used to perform this calibration (see Fig. 16). Representative
measurement results showing the adjustment of the fine
TDC transfer characteristics through charge-pump calibration
are shown in Fig. 17. In Fig. 17(a) the DN calibration bits have
been set to 010001 and the UP calibration bits to 010000, creating
a mismatch with the total DN current exceeding the UP.
As a result, a vertical jump in the fine TDC transfer curve is
observed (as described in Fig. 16(a)). In Fig. 17(b) the DN calibration
bits are set to 101000 and the UP bits are set to 000101,
which leads to the DN current matching the UP current. The
TDC is also found to have a static delay offset of five fine-delay
increments (312.5 ps) between the input to the DLL and the start
of the coarse TDC counter increments, which is calibrated out
digitally.
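The digital offset correction described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `code_to_delay_ps` and the assumption that the offset is removed by subtracting five LSBs from a combined coarse-plus-fine code are hypothetical. Only the 62.5 ps resolution, the 64 ns range, and the five-increment offset come from the text.

```python
LSB_PS = 62.5      # fine TDC resolution stated in the text
RANGE_NS = 64.0    # full TDC range stated in the text
OFFSET_LSB = 5     # static offset: five fine-delay increments

# Number of distinct codes implied by the stated range and resolution.
NUM_CODES = int(RANGE_NS * 1000 / LSB_PS)


def code_to_delay_ps(raw_code: int) -> float:
    """Convert a raw TDC code to a delay in ps, removing the static offset.

    Assumption (hypothetical): raw_code is the combined coarse+fine value
    expressed in units of the fine LSB.
    """
    return (raw_code - OFFSET_LSB) * LSB_PS


if __name__ == "__main__":
    print(NUM_CODES)               # codes spanning the 64 ns range
    print(OFFSET_LSB * LSB_PS)     # static offset in ps
    print(code_to_delay_ps(21))    # delay for an example raw code
```

With these numbers the range corresponds to 1024 codes (10 bits), and five LSBs reproduce the 312.5 ps offset quoted above.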
In order to measure the TDC linearity, the cal_stop delay is
swept over the entire 64 ns range of the TDC. At each step of
10 ps, 10 samples are collected and averaged. The results of
these measured delay sweeps are shown in Fig. 18. Fig. 18(a)
shows the overall transfer curve for the TDC. The 200 ps measurement
jitter is more than three times the LSB of the TDC,
making accurate determination of the TDC linearity difficult.
Subtracting the jitter, the measured DNL is better than 4 LSB
(Fig. 18(d)). By using the code-density approach to determine
the DNL with stop times derived from uncorrelated dark counts
in the SPADs (eliminating the jitter from external measurement
electronics), the maximum DNL is less than 2.27 LSB.
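The code-density approach mentioned above can be illustrated with a short sketch. It assumes the textbook form of the method: when stop times are uniformly distributed, every code of an ideal TDC is hit equally often, so the relative excess or deficit of hits in each code estimates its DNL, and the running sum gives the INL. The function name and the example histogram are hypothetical, and the authors' exact processing (for example, the handling of end bins) is not stated in the text.

```python
def code_density_dnl_inl(hist):
    """Return (dnl, inl) in LSB from a histogram of TDC output codes.

    hist[k] is the number of events recorded with code k when the stop
    times are uniformly distributed (e.g. uncorrelated dark counts).
    """
    if not hist or sum(hist) == 0:
        raise ValueError("histogram must contain at least one event")
    ideal = sum(hist) / len(hist)           # expected hits per code, ideal TDC
    dnl = [h / ideal - 1.0 for h in hist]   # relative bin-width error in LSB
    inl, running = [], 0.0
    for d in dnl:                           # INL is the running sum of DNL
        running += d
        inl.append(running)
    return dnl, inl


if __name__ == "__main__":
    # Hypothetical 4-code histogram: code 2 is too wide, code 3 too narrow.
    dnl, inl = code_density_dnl_inl([100, 100, 150, 50])
    print(dnl)
    print(inl)
```

Because the stimulus is generated by the detector itself, this estimate is not affected by the jitter of external delay-generation equipment.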
TABLE III
SUMMARY OF IC CHARACTERISTICS. THE DETAIL FOR THE AVERAGE POWER
CONSUMPTION IS PROVIDED FOR EACH OF THE ON-CHIP SUPPLIES. […] IS
THE MAIN 1.5 V CORE SUPPLY AND POWERS ALL OF THE CONTROL LOGIC AND
THE DATAPATH, […] IS THE 2.5 V I/O POWER SUPPLY FOR THE FOUR LVDS
BANKS, […] IS THE 2.5 V POWER SUPPLY FOR THE VOLTAGE REGULATORS
IN THE DLL, […] IS THE NEGATIVE BIAS FOR THE SPADS AND VARIES
DEPENDING ON BIAS BUT IS TYPICALLY AROUND […], […] IS A 1.6 V
SUPPLY FOR THE PLL. THE POWER FOR BOTH […] AND […] VARY
WITH THE INCIDENT PHOTON FLUX AND THE VALUES OF THE TABLE ARE FOR
A ‘TYPICAL’ FLUX THAT RESULTS IN A HIT RATE OF APPROXIMATELY 1%
In the transfer curve in Fig. 18(a), periodic large spikes can
also be observed. By separating the fine and coarse components
of the TDC value in Fig. 18(d) and (e), it is clear that these spikes
in the transfer curve are contributed by the coarse counter. This
artifact is due to metastability brought on by the use of an asynchronous
stop signal to latch counter values into flip-flops, as
shown in Fig. 6. Because of these spikes, the INL is slightly
more than 8 LSB. This error could be corrected by simply synchronizing
the stop signal for the counter with the counter clock
as in Fig. 19. Lifetime measurements presented in Section III.C
show that this non-ideality does not significantly impact the
imaging performance.
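The caption of Fig. 18 states that the DNL and INL plots exclude coarse-counter spikes by removing TDC value changes greater than 1 ns between adjacent 10 ps input steps. One possible reading of that filter is sketched below. The function name, the choice to compare each sample against the last accepted sample, and the example data are assumptions made for illustration.

```python
def filter_spikes(values_ps, max_step_ps=1000.0):
    """Drop samples that jump by more than max_step_ps from the last kept one.

    values_ps holds the measured TDC value (in ps) at each input delay step.
    The first sample is assumed to be valid.
    """
    if not values_ps:
        return []
    kept = [values_ps[0]]
    for v in values_ps[1:]:
        # Compare against the last accepted sample so that an isolated
        # spike is removed but the sample that follows it is retained.
        if abs(v - kept[-1]) <= max_step_ps:
            kept.append(v)
    return kept


if __name__ == "__main__":
    sweep = [0.0, 10.0, 20.0, 8000.0, 40.0, 50.0]  # hypothetical sweep with one spike
    print(filter_spikes(sweep))
```

Comparing against the last accepted sample, rather than the immediately preceding raw sample, avoids discarding the valid point that follows each spike.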
The impulse response function (IRF) of the pixel and TDC
combined was measured by repeatedly triggering the SPADs
using a 500 nm band from a Fianium supercontinuum laser with
pulse width of 10 ps. The outputs of the TDC were collected and
formed into histograms, which resulted in a distribution with a
peak of only two LSB, or 125 ps (Fig. 4(b)).
C. Imaging Array Performance
The imager draws a total of 8.79 W when running at full-
speed and is water-cooled to avoid degraded performance of the
SPADs due to heating [28]. To achieve this cooling, a custom
BGA package with a copper core is directly soldered to the PCB,
which also has a copper core. The dark count rate (DCR) for the
system with cooling is 544 Hz, which is an improvement from
1036 Hz without cooling. Fig. 20 shows the distribution of DCR
throughout the array. The overvoltage for this measurement
is 2.5 V and is consistent with the measurements taken in [20].
Fig. 18. Plots showing the (a) transfer curve of the TDC, (b) DNL of the TDC and (c) INL of the TDC. The transfer curves of fine and coarse components of the
TDC value are presented in (d) fine and (e) coarse. In the DNL and INL plots, we have filtered the large spikes in the coarse TDC measurement by removing TDC
value changes greater than 1 ns between any two of the 10 ps input delay steps.
Fig. 19. TDC counter stop signal synchronizer.
From this DCR data, a clear pattern in the number of hits
recorded can be seen within groups of 8 rows of pixels, in which
the seventh and eighth rows record lower counts. We attribute
this to a voltage droop in the power distribution biasing the
SPADs. A similar pattern is observed in the lifetime images
and results in missing rows due to the pixels receiving an insufficient
number of hits for lifetime extraction. This voltage
droop problem does not affect the lifetime extraction for the
other pixels within the array.
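The row-periodic pattern described above can be exposed by averaging the dark counts by row position within each group of 8 rows. The sketch below assumes the counts are available as a list of rows, that groups are aligned to the first row, and that rows are indexed from zero (so the seventh and eighth rows are positions 6 and 7). The function name and the synthetic count values are hypothetical.

```python
def mean_by_row_position(counts, group=8):
    """Average count for each row position within groups of `group` rows.

    counts is a list of rows, each row a list of per-pixel dark counts.
    """
    sums = [0.0] * group
    pixels = [0] * group
    for r, row in enumerate(counts):
        pos = r % group               # position of this row inside its group
        sums[pos] += sum(row)
        pixels[pos] += len(row)
    return [s / n if n else float("nan") for s, n in zip(sums, pixels)]


def synthetic_counts(rows=64, cols=64, normal=600, low=300, group=8):
    """Build hypothetical data in which the last two rows of each group are low."""
    return [[low if r % group >= group - 2 else normal for _ in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    print(mean_by_row_position(synthetic_counts()))
```

A dip at the last two positions of the returned list would correspond to the lower counts in the seventh and eighth rows reported above.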
In preliminary testing of the imaging performance of the
array, we use a ceramic cover to mask a portion of the SPAD
array and then place a dish of fluorescein dye over the array
and image using a 488 nm excitation wavelength filtered from
a Fianium supercontinuum pulsed light source with a pulse
repetition rate of 4 MHz. A 550 nm emission filter is also
employed. Figs. 22 and 23 show the setup and the imaging
results, respectively. The resulting image matches the expected
fluorescence lifetime for fluorescein of 4–5 ns [29].
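To make the lifetime extraction step concrete, the sketch below estimates a single-exponential lifetime from a per-pixel TCSPC histogram using a log-linear least-squares fit. This is a stand-in technique chosen for brevity; the fitting method actually used for the images is not described in this section. The sketch assumes the histogram starts at the decay peak, has no background, and uses the 62.5 ps bin width of the TDC. All function names are hypothetical.

```python
import math

BIN_PS = 62.5  # TDC bin width from the text


def synthetic_decay(tau_ps, n_bins, peak=1000.0, bin_ps=BIN_PS):
    """Noise-free single-exponential histogram, used only to exercise the fit."""
    return [peak * math.exp(-(i * bin_ps) / tau_ps) for i in range(n_bins)]


def fit_lifetime_ps(hist, bin_ps=BIN_PS):
    """Estimate the lifetime (ps) by a straight-line fit to log(counts) vs time.

    Empty bins are skipped because log(0) is undefined.
    """
    pts = [(i * bin_ps, math.log(c)) for i, c in enumerate(hist) if c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx                 # slope of log(counts) is -1/tau
    if slope >= 0:
        raise ValueError("histogram does not decay")
    return -1.0 / slope


if __name__ == "__main__":
    hist = synthetic_decay(tau_ps=4000.0, n_bins=200)
    print(fit_lifetime_ps(hist) / 1000.0)  # recovered lifetime in ns
```

On real data with few photons per bin, a log-linear fit is biased by Poisson noise, so a weighted or maximum-likelihood estimator would normally be preferred.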
Fig. 20. (a) A plot showing the dark count rate distribution throughout the imaging array. (b) A cross-sectional view showing the periodic reduction in count rate
across the array.
Fig. 21. A photograph of the FLIM PCB with FPGAs and the cabled PCIe
interface. The liquid cooling system is in the center of the PCB with the imager
mounted on the bottom side of the board.
Fig. 22. A photograph showing the arrangement of the ceramic mask, band-pass
filter, and fluorescein dye used in the imaging experiments. The excitation
source is not shown but would be directed into the page over the sample.
In order to test the fast acquisition capabilities of our system,
we capture a total of 16 consecutive frames with each being acquired
in 10 ms for a frame rate of 100 fps. The sixteen frames
from this experiment are shown sequentially in Fig. 24. At this
high frame rate, there is more error in the lifetime estimation due
to the limited amount of data that is collected for each frame.
By increasing the image acquisition time, more accurate and
uniform lifetime estimation is possible, as in Fig. 23 where an
imaging rate of 50 fps is used. Currently the number of consecutive
frames that can be captured is limited by the kernel
driver we have written to handle the DMA transfer from the
FPGAs to the computer. The technique employed requires a co-
herent block of memory allocated on the computer for each of
the FPGAs that can hold 16 frames of FLIM data. The CPU
waits a predetermined amount of time that is based on the frame
rate before reading this block of memory and performing addi-
tional processing. In order to leverage the full potential of the
system, our driver must be modified to handle interrupt signals
over the PCIe interface from the FPGAs such that the CPU can
be instructed to process the data and free the memory so that
additional frames can be written.
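The trade-off between frame rate and lifetime accuracy noted in this paragraph can be illustrated with a shot-noise estimate. For an ideal mono-exponential decay with no background and a negligible instrument response, the standard deviation of the lifetime estimated from N photons is approximately τ/√N. The per-pixel photon rate used below is a hypothetical value for illustration and is not a measurement from this work; the 4 ns lifetime is taken from the fluorescein range quoted earlier.

```python
import math


def photons_per_frame(rate_hz, frame_rate_fps):
    """Photons collected by one pixel during a single frame."""
    return rate_hz / frame_rate_fps


def lifetime_sigma_ns(tau_ns, photons):
    """Shot-noise-limited standard deviation of the lifetime estimate (ideal case)."""
    return tau_ns / math.sqrt(photons)


if __name__ == "__main__":
    TAU_NS = 4.0        # within the 4-5 ns fluorescein range cited in the text
    RATE_HZ = 40000.0   # hypothetical detected photon rate per pixel
    for fps in (100, 50):
        n = photons_per_frame(RATE_HZ, fps)
        print(fps, n, lifetime_sigma_ns(TAU_NS, n))
```

Under these assumptions, halving the frame rate from 100 fps to 50 fps doubles the photons per frame and reduces the lifetime uncertainty by a factor of √2, independent of the assumed photon rate.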
IV. CONCLUSIONS
In this work, we have built the fastest published TCSPC-based
fluorescence lifetime imaging system, which is capable
of acquiring FLIM images at 100 fps. The system consists of a
FLIM-specific integrated circuit and a custom system architecture
that are optimized for high-speed imaging. The integrated
circuit contains 4096 SPADs with independent TDC channels
allowing for fully parallel time-of-arrival recording. The timing
resolution of the TDCs is 62.5 ps and they can record arrival
times for up to 64 ns after a triggering laser pulse, supporting a
range of possible fluorescence decay rates. A data-compression
datapath provides a mechanism for efficiently transmitting data
off-chip in an event-driven manner. The imager chip can sup-
port an output data bandwidth of up to 42 Gbps allowing for a
maximum FLIM imaging rate of over 400 fps. The total power
consumption of the IC is 8.79 W, or 2.15 mW/pixel, including
the TDCs and all data handling circuitry.
We designed an FPGA-based system to capture the raw ar-
rival time data from the imaging IC and generate a histogram for
each pixel in the image. This histogram data can be transferred
to a PC at over 750 fps. The entire system (including imager)
consumes 26.4 W. A summary of our FLIM imaging system is
presented in Table III. The unmatched performance of our design
is a direct result of the circuit optimizations made to efficiently
handle the FLIM data. Further improvements on this
system could improve the lifetime variability between pixels
and allow for a greater number of consecutive frames to be ac-
quired at full speed. The image acquisition time achieved in this
work has the potential to enable a wide range of FLIM applica-
tions involving dynamic samples.
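The headline figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
ARRAY_SIDE = 64          # 64-by-64 SPAD array
IMAGER_POWER_W = 8.79    # total IC power at full speed
SYSTEM_POWER_W = 26.4    # entire system, including the imager
FRAME_RATE_FPS = 100     # demonstrated FLIM frame rate

pixels = ARRAY_SIDE * ARRAY_SIDE                        # number of SPAD/TDC pixels
power_per_pixel_mw = IMAGER_POWER_W / pixels * 1000.0   # mW per pixel
imager_share = IMAGER_POWER_W / SYSTEM_POWER_W          # fraction of system power
frame_time_ms = 1000.0 / FRAME_RATE_FPS                 # acquisition time per frame

if __name__ == "__main__":
    print(pixels)
    print(round(power_per_pixel_mw, 2))
    print(round(imager_share, 3))
    print(frame_time_ms)
```

The results agree with the 4096 pixels, roughly 2.15 mW/pixel, and 10 ms per frame quoted above, and show that the imager accounts for about one third of the total system power.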
Fig. 23. (a) Intensity image from fluorescein dye measurement. (b) Lifetime image showing the well resolved mask. (c) A representative lifetime decay from
pixel (49,10) in the image.
Fig. 24. Sixteen consecutive frames captured using our lifetime imaging system
with an acquisition time of only 10 ms per frame. Color bar indicates the extracted
lifetime value.
REFERENCES
[1] A. Pietraszewska-Bogiel and T. W. J. Gadella, “FRET microscopy:
From principle to routine technology in cell biology,” J. Microscopy,
vol. 241, pp. 111–8, Feb. 2011.
[2] J. C. Waters, “Accuracy and precision in quantitative fluorescence
microscopy,” J. Cell Biol., vol. 185, pp. 1135–48, Jun. 2009.
[3] R. Sanders, A. Draaijer, H. Gerritsen, P. Houpt, and Y. Levine,
“Quantitative pH imaging in cells using confocal fluorescence lifetime
imaging microscopy,” Analyt. Biochem., vol. 227, no. 2, pp. 302–308,
1995.
[4] K. Suhling, P. M. W. French, and D. Phillips, “Time-resolved fluorescence
microscopy,” Photochem. Photobiolog. Sci., vol. 4, pp. 13–22,
Jan. 2005.
[5] D. Elson, J. Requejo-Isidro, I. Munro, F. Reavell, J. Siegel, K. Suhling,
P. Tadrous, R. Benninger, P. Lanigan, J. McGinty, C. Talbot, B. Tre-
anor, S. Webb, A. Sandison, A. Wallace, D. Davis, J. Lever, M. Neil,
D. Phillips, G. Stamp, and P. French, “Time-domain fluorescence lifetime
imaging applied to biological tissue,” Photochem. Photobiolog.
Sci., vol. 3, pp. 795–801, Aug. 2004.
[6] M. S. Kim, B.-K. Cho, A. M. Lefcourt, Y.-R. Chen, and S. Kang, “Multispectral
fluorescence lifetime imaging of feces-contaminated apples
by time-resolved laser-induced fluorescence imaging system with tunable
excitation wavelengths,” Appl. Opt., vol. 47, pp. 1608–1616, Mar.
2008.
[7] J. McGinty, N. P. Galletly, C. Dunsby, I. Munro, D. S. Elson, J. Re-
quejo-Isidro, P. Cohen, R. Ahmad, A. Forsyth, A. V. Thillainayagam,
M. A. A. Neil, P. M. W. French, and G. W. Stamp, “Wide-field fluorescence
lifetime imaging of cancer,” Biomed. Opt. Expr., vol. 1, pp.
627–640, Jan. 2010.
[8] V. V. Ghukasyan and F.-J. Kao, “Monitoring cellular metabolism with
fluorescence lifetime of reduced nicotinamide adenine dinucleotide,”
J. Phys. Chemist. C, vol. 113, pp. 11532–11540, Jul. 2009.
[9] J. W. Borst and A. J. W. G. Visser, “Fluorescence lifetime imaging mi-
croscopy in life sciences,” Measure. Sci. Technol., vol. 21, p. 102002,
Oct. 2010.
[10] X. F. Wang, A. Periasamy, B. Herman, and D. Coleman, “Fluores-
cence lifetime imaging microscopy (FLIM): Instrumentation and ap-
plications,” Critical Rev. Analyt. Chemist., vol. 23, no. 5, pp. 369–395,
1992.
[11] C. Chang, D. Sud, and M. Mycek, “Fluorescence lifetime imaging mi-
croscopy,” Methods Cell Biol., vol. 81, no. 06, pp. 495–524, 2007.
[12] C. Harris and B. Selinger, “Single-photon decay spectroscopy. II, The
Pile-up problem,” Australian J. Chemist., vol. 32, pp. 2111–2129,
1979.
[13] D. D.-U. Li, J. Arlt, D. Tyndall, R. Walker, J. Richardson, D. Stoppa,
E. Charbon, and R. K. Henderson, “Video-rate fluorescence lifetime
imaging camera with CMOS single-photon avalanche diode arrays
and high-speed imaging algorithm,” J. Biomed. Opt., vol. 16, no. 9, p.
096012, 2011.
[14] D. E. Schwartz, E. Charbon, and K. L. Shepard, “A single-photon
avalanche diode array for fluorescence lifetime imaging microscopy,”
IEEE J. Solid-State Circuits, vol. 43, pp. 2546–2557, Nov. 2008.
[15] L. Pancheri and D. Stoppa, “A SPAD-based pixel linear array for high-speed
time-gated fluorescence lifetime imaging,” in Proc. ESSCIRC,
Sep. 2009, pp. 428–431.
[16] E. Charbon, “Highly sensitive arrays of nano-sized single-photon
avalanche diodes for industrial and bio imaging,” in Proc. Nano-Net,
2009, pp. 161–168.
[17] M. Gersbach, Y. Maruyama, E. Labonne, J. Richardson, R. Walker,
L. Grant, R. Henderson, F. Borghetti, D. Stoppa, and E. Charbon,
“A parallel 32x32 time-to-digital converter array fabricated in a 130
nm imaging CMOS Technology,” Proc. ESSCIRC, pp. 196–199, Sep.
2009.
[18] C. Veerappan, J. Richardson, R. Walker, D.-U. Li, M. W. Fishburn, Y.
Maruyama, D. Stoppa, F. Borghetti, M. Gersbach, R. K. Henderson,
and E. Charbon, “A 160 × 128 single-photon image sensor with
on-pixel 55 ps 10b time-to-digital converter,” in IEEE Int. Solid-State
Circuits Conf. Dig., ISSCC, 2011, pp. 312–314.
[19] C. Niclass, M. Sergio, and E. Charbon, W. Becker, Ed., “A single
photon avalanche diode array fabricated in 0.35-μm CMOS and
based on an event-driven readout for TCSPC experiments,” in Proc.
SPIE: Advanced Photon Counting Techniq., Oct. 2006, vol. 6372, pp.
63720S–63720S-12.
[20] R. M. Field, J. Lary, J. Cohn, L. Paninski, and K. L. Shepard, “A
low-noise, single-photon avalanche diode in standard 0.13 μm complementary
metal-oxide-semiconductor process,” Appl. Phys. Lett., vol.
97, no. 21, p. 211111, 2010.
[21] N. S. Nightingale, “A new silicon avalanche photodiode photon
counting detector module for astronomy,” Experimental Astronomy,
vol. 1, no. 6, pp. 407–422, 1990.
[22] F. Zappa, A. Lotito, A. Giudice, S. Cova, and M. Ghioni, “Monolithic
active-quenching and active-reset circuit for single-photon avalanche
detectors,” IEEE J. Solid-State Circuits, vol. 38, pp. 1298–1301, Jul.
2003.
[23] E. Vilella and A. Diéguez, “A gated single-photon avalanche diode
array fabricated in a conventional CMOS process for triggered sys-
tems,” Sensors Actuators A: Phys., vol. 186, pp. 163–168, Oct. 2012.
[24] “Electrical Characteristics of Low Voltage Differential Signaling
(LVDS) Interface Circuits,” TIA/EIA-644-A, 2001.
[25] J. C. Jackson, D. Phelan, A. P. Morrison, R. M. Redfern, and A. Mathewson,
“Characterization of geiger mode avalanche photodiodes for
fluorescence decay measurements,” in Photodetector Materials and
Devices VII, May 2002, vol. 4650, pp. 55–66.
[26] R. G. Brown, R. Jones, J. G. Rarity, and K. D. Ridley, “Characterization
of silicon avalanche photodiodes for photon correlation measurements.
2: Active quenching,” Appl. Opt., vol. 26, pp. 2383–9, Jun. 1987.
[27] M. W. Fishburn, Fundamentals of CMOS Single-Photon Avalanche
Diodes. Delft, The Netherlands: TU Delft, 2012.
[28] S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, “Avalanche
photodiodes and quenching circuits for single-photon detection,” Appl.
Opt., 1996.
[29] D. Magde, G. E. Rojas, and P. G. Seybold, “Solvent dependence of
the fluorescence lifetimes of xanthene dyes,” Photochemist. Photobiol.,
vol. 70, no. 5, p. 737, 1999.
[30] M. Gersbach, Y. Maruyama, R. Trimananda, M. W. Fishburn, D.
Stoppa, J. A. Richardson, R. Walker, R. Henderson, and E. Charbon,
“A time-resolved, low-noise single-photon image sensor fabricated
in deep-submicron CMOS technology,” IEEE J. Solid-State Circuits,
vol. 47, pp. 1394–1407, Jun. 2012.
[31] J. Richardson, R. Walker, L. Grant, D. Stoppa, F. Borghetti, E.
Charbon, M. Gersbach, and R. K. Henderson, “A 32x32 50 ps resolu-
tion 10 bit time to digital converter array in 130 nm CMOS for time
correlated imaging,” in Proc. IEEE Custom Integrated Circuits Conf.,
CICC, 2009, pp. 77–80, 029217.
[32] C. Niclass, C. Favi, T. Kluter, M. Gersbach, and E. Charbon, “A
128x128 single-photon image sensor with column-level 10-bit
time-to-digital converter array,” IEEE J. Solid-State Circuits, vol. 43,
pp. 2977–2989, Dec. 2008.
Ryan M. Field (M’13) received the B.S. degree in
electrical engineering and the B.S. degree in physics
from North Carolina State University, Raleigh,
NC, in 2007 and the M.S. and Ph.D. degrees from
Columbia University, New York, NY in 2008 and
2013, respectively. His Ph.D. work focused on inte-
grated circuit and system-level design for improved
biomedical imaging techniques.
After graduating in 2013, he joined Intel Corpora-
tion, Santa Clara, CA, USA, as a research scientist
in the Integrated Biosystems Lab and is interested in
electronic detection of biological signals. He was the recipient of the Astronaut
Scholarship in 2006 and the National Science Foundation Graduate Research
Fellowship, the National Defense Science and Engineering Graduate Fellow-
ship, and the Columbia University Fu Foundation School of Engineering Pres-
idential Distinguished Fellowship in 2007. He was also an associate in the Co-
lumbia University BioIGERT Program.
Simeon Realov (M’12) received the B.S. degree in
engineering from Swarthmore College, Swarthmore,
PA, in 2006 and the M.S. and Ph.D. degrees in electrical
engineering from Columbia University, New
York, NY, in 2007 and 2012, respectively. During his
time at Columbia University, he worked on various
methods for detailed on-chip device characterization
and statistical modeling of device variability in ad-
vanced CMOS processes.
He held internship positions at IBM T. J. Watson
Research Center, Yorktown Heights, NY, where he
developed techniques for on-chip capacitance measurement, combined C-V/I-V
transistor characterization, as well as a fully-integrated SRAM fluctuations
monitoring circuit. He also held a summer internship position with Rambus,
Inc., Sunnyvale, CA, where he worked on the design of a high-bandwidth
on-chip supply noise monitoring system. After his graduation from Columbia
University in 2012, Dr. Realov joined Intel Corporation, Hillsboro, OR, where
he is currently a member of the Advanced Design Library group working on
digital standard cell design.
Kenneth L. Shepard (F’08) received the B.S.E. de-
gree from Princeton University, Princeton, NJ, USA,
in 1987 and the M.S. and Ph.D. degrees in electrical
engineering from Stanford University, Stanford, CA,
USA, in 1988 and 1992, respectively.
From 1992 to 1997, he was a Research Staff
Member and Manager with the VLSI Design Depart-
ment, IBM T. J. Watson Research Center, Yorktown
Heights, NY, USA, where he was responsible for
the design methodology for IBM’s G4 S/390 micro-
processors. Since 1997, he has been with Columbia
University, New York, NY, USA, where he is now Professor of Electrical Engineering
and Biomedical Engineering. He also was Chief Technology Officer
of CadMOS Design Technology, San Jose, CA, USA, until its acquisition by
Cadence Design Systems in 2001. His current research interests include power
electronics, carbon-based devices and circuits, and CMOS bioelectronics.
Dr. Shepard was Technical Program Chair and General Chair for the 2002
and 2003 International Conference on Computer Design, respectively. He
has served on the Program Committees for International Electron Devices
Meeting (IEDM), International Solid-State Circuits Conference (ISSCC), VLSI
Symposium, International Conference on Computer-Aided Design (ICCAD),
Design Automation Conference (DAC), International Symposium on Circuits
and Systems (ISCAS), International Symposium on Quality Electronic Design
(ISQED), Great Lakes Symposium on VLSI (GLS-VLSI), and International
Conference on Computer Design (ICCD). He received the Fannie and John
Hertz Foundation Doctoral Thesis Prize in 1992, a National Science Founda-
tion CAREER Award in 1998, and the 1999 Distinguished Faculty Teaching
Award from the Columbia Engineering School Alumni Association. He has
been an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE-SCALE
INTEGRATION (VLSI) SYSTEMS and is currently an Associate Editor for the
IEEE JOURNAL OF SOLID-STATE CIRCUITS and the IEEE TRANSACTIONS ON
BIOMEDICAL CIRCUITS AND SYSTEMS.
... L'échantillon, présentant alors un surplus d'énergie, émet des photons afin de retourner dans un état d'équilibre stable. La mesure du délai séparant le pulse laser des photons émis permet alors de remonter à la durée de la fluorescence [100] [101]. Un exemple de configuration expérimentale pour les mesures TCSPC est présenté sur la Figure 1.17a. ...
... Cette mesure est utilisée dans différents domaines comme par exemple la microscopie par imagerie de fluorescence résolue dans le temps (Fluorescence Lifetime Imaging Microscopy, FLIM) [100]- [109]. Il s'agit d'une méthode d'imagerie utilisée fréquemment dans le monde des sciences biologiques afin d'identifier certaines structures marquées par des fluorophores discriminables par leur temps de déclin [100] [105] [109]. Par le principe de la mesure TCSPC, des images basées sur la cartographie de ces temps de déclin comme illustré sur Figure 1.17b peuvent être obtenues. ...
Thesis
L'objectif de cette thèse concerne la simulation, la conception et la caractérisation de nouvelles structures de diodes à avalanche à photon unique (Single Photon Avalanche Diode - SPAD) implémentées dans la technologie CMOS FD-SOI (Fully Depleted Silicon On Insulator) 28nm de STMicroelectronics. Les photodétecteurs SPAD présentent une grande sensibilité de détection (associée à un temps de réponse très court) qui fait d’eux d’excellents candidats pour la mesure du temps de vol (Time Of Flight – ToF) dans des applications de télémétrie, de reconnaissance faciale et de LIDAR (Light Detection And Ranging) pour les voitures autonomes. L’intégration de la SPAD en CMOS FD-SOI permet de créer un pixel intrinsèquement 3D, i) en incorporant la SPAD au niveau de la jonction PW (P-Well) / DNW (Deep N-Well) dans le silicium bulk sous l’oxyde enterré (BOX) et ii) en utilisant le film silicium situé au-dessus du BOX pour intégrer l'électronique associée au détecteur (circuits d'étouffement et d'adressage), tout en optimisant le facteur de remplissage avec une approche BSI (back side illumination). Les SPAD réalisées dans la technologie native (avec respect des règles de dessin) ont mis en évidence plusieurs points faibles : un DCR (Dark Count Rate) élevé pour des tensions d'excès faibles (500Hz/µm2 à Vex = 0.5V pour une tension de claquage de 9.5V) ainsi qu'un claquage prédominant sur la périphérie de la zone active. Dans ce contexte, les travaux présentés dans cette thèse ont porté sur l'optimisation des performances électriques de la SPAD FD-SOI par des modifications de la structure respectant ou non le procédé de fabrication : adaptation des conditions d’implantation du caisson profond DNW, remaniement des tranchées STI (Shallow Trench Isolation) etc. Les structures SPAD-FD-SOI ainsi optimisées ont démontré expérimentalement un bien meilleur niveau de DCR (17Hz/µm2 à Vex = 1V pour une tension de claquage de 15.8V). 
Des caractérisations électro-optiques préliminaires ont été réalisées avec une probabilité de détection des photons de l’ordre de 7% à Vex = 1V et une longueur d’onde de 650nm. Même si ces travaux n’ont pas permis d’atteindre les performances des SPAD les plus performantes de l’état de l’art, ils ont exploré de nombreuses voies d’optimisation, certaines conduisant à une amélioration significative des performances des SPAD réalisées dans cette technologie. La poursuite de ces travaux (association de ces structures SPAD FD-SOI optimisées avec une électronique intégrée performante, amincissement des dispositifs pour opérer avec un éclairage par la face arrière etc.) devrait permettre de réaliser des pixels SPAD intrinsèquement 3D (sans recours à du collage de wafers) très performants dans le proche infrarouge pour les applications d’imagerie 3D embarquées.
... Thus, a tradeoff occurs between the array size and the tolerance of imaging distortion caused by multiple hits. For semiconductor single-photon detector arrays, the pixel-level readout was used to overcome this problem [18][19][20][21]. However, the power dissipation of the pixel-level readout is proportional to the size of the array, which would be a critical challenge for superconducting imagers in a cryogenic environment. ...
Article
Full-text available
Scaling up superconducting nanowire single-photon detectors (SNSPDs) into a large array for imaging applications is the current pursuit. Although various readout architectures have been proposed, they cannot resolve multiple-photon detections (MPDs) currently, which limits the operation of the SNSPD arrays at high photon flux. In this study, we focused on the readout ambiguity of a superconducting nanowire single-photon imager applying time-of-flight multiplexing readout. The results showed that image distortion depended on both the incident photon flux and the imaging object. By extracting multiple-photon detections on idle pixels, which were virtual because of the incorrect mapping from the ambiguous readout, a correction method was proposed. An improvement factor of 1.3~9.3 at a photon flux of µ = 5 photon/pulse was obtained, which indicated that joint development of the pixel design and restoration algorithm could compensate for the readout ambiguity and increase the dynamic range.
... Time-correlated single-photon counting (TCSPC) [1]- [5] has become the key functionality in a variety of emerging quantum technology, including quantum imaging/sensing [6]- [8], quantum-state preparations [9]- [12], quantum cryptography [13]- [15], positron emission tomography (PET) [16], [17], time-resolved spectroscopy [18], fluorescence-lifetime imaging (FLIM) [19], [20], molecular imaging, live-cell/tissue microscopy [2], free-space time-of-flight (TOF) measurements [21], and light detection-and-ranging (LiDAR) [6], [22]. One example of quantum-state detections shown in Fig. 1 exploits TCSPC to measure the quantum state of a light beam, which is a "vector" but very different from the vector defined in classical physics. ...
Article
Full-text available
An almost all-digital time-to-digital converter (TDC) possessing sub-picosecond resolutions, scalable dynamic ranges, high linearity, high noise-immunity, and moderate conversion-rates can be achieved by a random sampling-and-averaging (RSA) approach with the self-antithetic variance reduction (SAVR) technique for time-correlated single-photon counting (TCSPC) quantum measurements. This paper presents detailed theoretical analysis and behavior-model verifications of the SAVR technique to effectively enhance the conversion-rate of an asynchronous RSA-based TDC by more than 62× with 7% power overhead. In addition, the proposed performance estimation methodology for SAVR can greatly improve the computation efficiency during the system-level design and reduce the read-out circuit complexity in the silicon-photonics RSA-based TCSPC realization.
... 51 While it is possible to integrate a TDC with each pixel, this reduces the fill factor of the detector significantly, and it is important to optimize this trade-off. 105 Implementing the entire timing circuit on an FPGA enables greater flexibility, such as in case of the LinoSPAD detector for spectroscopy, where 64 TDCs are shared by all the pixels in a linear array. 106 Such an architecture is ideal for applications where only a few detectors register photons at any instant, such as spectroscopy of few-photon emitters. ...
... For proteomic assays, for example, the detection range can be over several orders of magnitude [11]. Additionally, SPAD detectors often need special cooling techniques to reduce noise in terms of dark count rate to guarantee highly sensitive single-photon detection [12] as well as expensive manufacturing processes that increase the overall costs of the system. For the detection of low fluorophore concentrations, state-of-the-art low-cost TRF readers still require optical filters and have a limited detection range [13]. ...
Article
Full-text available
We present a CMOS image sensor (CIS) based time-resolved fluorescence (TRF) measurement system for filter-less, highly sensitive readout of lateral-flow assay (LFA) test strips. The CIS contains a 256 × 128 lock-in pixel (LIP) sensor array. Each pixel has a size of $10 \ {\mu }\mathrm{m}$ × $10 \ {\mu }\mathrm{m}$ and includes a photodiode acting as signal transducer. The LIP CIS was designed in a standard 0.18 ${\mu }$ $\mathrm{m}$ CMOS technology specifically for TRF applications. The LIP architecture blocks interfering light when fluorophores are excited and accumulates the emitted fluorescence light to be measured over multiple cycles after excitation. This allows to detect even small amounts of fluorescence light over a wide analyte concentration range. The LIP CIS based TRF reader was characterized in terms of reproducible and uniform signal intensities with use of appropriate Europium(III) [Eu 3+] chelate particles as fluorescence standards. We measured different concentrations of Eu-based nanoparticles (NP) on test strips with the TRF reader. The sensor system shows 5.1 orders of magnitude of detection dynamic range (DDR) with a limit of detection (LoD) of $0.1 \ \text{ng/cm}$ . In addition, using human C-reactive protein (hCRP) as a model analyte, we compared the developed TRF reader with a commercial colorimetric LFA reader. For the quantification of CRP, the LIP CIS based TRF reader demonstrates a DDR of 3.6 orders of magnitude with an excellent LoD of $0.05 \ \text{ng/mL}$ , which is 14 times better than the LoD of the commercial LFA reader.
Article
Single-photon avalanche diodes (SPADs)-based depth imagers are vital components of direct time-of-flight (d-ToF) systems, known for their precision and high throughput. With growing demands for improved sensor temporal resolution and extended range, the chip’s output bandwidth becomes a bottleneck for the effective event rate of incoming photons due to the substantial increase in data volume. While conventional data compression techniques can enhance the event rate, they often introduce increased latency from photon-in to data-out. This paper introduces an on-chip data processor designed to prioritize low latency for continuous photon detection. It is characterized by a small photon cluster size and efficient computational and memory utilization. The processor utilizes a two-stage approach that combines delta encoding and entropy compression techniques. Our simulation results demonstrate a data compression ratio of up to 2.33, enabling efficient handling of up to 125 million 12-bit photon events per second within a one-gigabit output bandwidth. We validate the hardware resources requirement using a test chip featuring a 1 × 64 sensor array fabricated in 180 nm technology. This solution is well-prepared to meet the demands of high-speed Light Detection and Ranging (LiDAR) systems.
Article
This article presents a resolution-tunable time-to-digital converter (TDC) with a three-level structure, in which the low-level TDC employs an improved all digital delay-locked loop (ADDLL) based on a cyclic pulse ring oscillator (CPRO) and a digital controller for detection lock state (DLS). Specifically, a bidirectional bypass transmission delay unit (BBTDU) in CPRO provides adjustable resolution with 15 ps coarse delay step and 3.5 ps fine delay step. Secondly, a data processing approach is presented to open the window to extract a 4-bit binary data of the bidirectional serial shift register line (BSSRL) in ADDLL for DLS and update the values of BSSRL at the same position of windowing, which can reduce the number of manipulated registers in BSSRL by approximately 93.7%. Then, a methodology of detecting the locking pattern between lock and unlock and selecting the fixed optimal windowing sequence to update the controlling values of BSSRL is proposed to eliminate dithering in locked state, which can reduce the clock phase jitter when locked. Finally, the proposed TDC has been integrated into a system on chip (SoC) and fabricated in 65-nm CMOS technology. The measurement results implemented with a 30 MHz reference clock demonstrate that a resolution-tunable TDC with low power has been obtained. There are 6 configurable resolutions in all. When configured to a minimum resolution of 7.78 ps, the power consumption and the precision are 4.812 mW and 3.02 ps, respectively. While configured for a maximum resolution of 29.7 ps, the power consumption and the precision are 2.874 mW and 13.2 ps, respectively. The proposed TDC is extremely flexible and well suited for integration into other large-scale ASIC chips to extend the application range, especially for low frequency applications.
Article
An almost all-digital time-to-digital converter possessing sub-picosecond resolution, scalable dynamic range, calibratable linearity, high noise-immunity, and fast conversion-rates can be achieved by a stochastic random sampling-and-averaging approach with the proposed collaborative variance reduction (VR) technique for a wide range of time-correlated single-photon counting applications. This paper presents detailed theoretical analysis and behavior-model verifications of both self-antithetic and control-variate VR techniques to enhance the conversion-rate of an asynchronous RSA-based TDC up to 1.5 MHz with 12-ENOB accuracy, 0.36-pJ/step energy efficiency, and 23% power overhead. Also, the conversions of the mathematical closed-form expressions into digital signal-processing implementations are derived and demonstrated for the forthcoming silicon-photonics integrated-circuit realization.
Conference Paper
Wide-field fluorescence lifetime imaging (FLIM) is a promising technique for biomedical and clinical applications. Integration with CMOS single-photon avalanche diode (SPAD) sensor arrays can lead to cheaper, portable real-time FLIM systems. However, the FLIM data obtained by such sensor systems often have sophisticated noise features, and there is still a lack of fast tools to recover lifetime parameters efficiently from highly noise-corrupted fluorescence signals. This paper proposes a smart wide-field FLIM system containing a 192×128 CMOS SPAD sensor and a field-programmable gate array (FPGA)-embedded deep learning (DL) FLIM processor. The processor adopts a hardware-friendly, lightweight neural network for fluorescence lifetime analysis, showing the advantages of high accuracy against noise, fast speed, and low power consumption. Experimental results demonstrate the proposed system's superior and robust performance, which is promising for many FLIM applications such as FLIM-guided clinical surgery, cancer diagnosis, and biomedical imaging.
Article
A random sampling-and-averaging (RSA) technique based on stochastic Monte Carlo methods is described in this paper for enhancing the accuracy of single-photon arrival-time measurements down to sub-picosecond ranges in emerging quantum applications. The theoretical variances of both synchronous and asynchronous RSA techniques are presented in the mathematical formats and experimentally verified by the Monte Carlo simulations. Meanwhile, the methodology of converting the mathematical models into an almost all-digital low-power integrated-circuit is elaborated by a circuit-level example with the instruction of setting circuit parameters. Along with the superior measurement resolution, scalable dynamic ranges, high linearity, high noise immunity, and low power/area consumption, the primary limitation of the RSA techniques has also been addressed for the forthcoming conversion-rate enhancement techniques.
Article
Fluorescence lifetime imaging microscopy (FLIM) and fluorescence anisotropy imaging microscopy (FAIM) are versatile tools for the investigation of the molecular environment of fluorophores in living cells. Owing to nanometre-scale interactions via Förster resonance energy transfer (FRET), FLIM and FAIM are powerful microscopy methods for the detection of conformational changes and protein-protein interactions reflecting the biochemical status of live cells. This review provides an overview of recent advances in photonics techniques, quantitative data analysis methods and applications in the life sciences.
Article
We present the design and characterization of a single-photon avalanche diode (SPAD) fabricated with a standard 0.13 µm complementary metal-oxide-semiconductor process. We have developed a figure of merit for SPADs when these detectors are employed in high frame-rate fluorescent lifetime imaging microscopy, which allows us to specify an optimal bias point for the diode and compare our diode with other published devices. At its optimum bias point at room temperature, our SPAD achieves a photon detection probability of 29% while exhibiting a dark count rate of only 231 Hz and an impulse response of 198 ps.
Article
Fluorescence lifetimes of five representative xanthene dye species, namely the rhodamine B zwitterion (RB±), the rhodamine B cation (RB+), the rhodamine 6G cation (R6G+), the rhodamine 101 zwitterion (R101±) and the fluorescein dianion (F2-), were measured in H2O, D2O and in a series of alcohol solvents ranging from methanol to octanol. The lifetimes of both RB± and RB+ increased markedly as the solvent was varied from water to octanol. In contrast, the lifetimes of R6G+ and R101± decreased slightly over the alcohol series and that of F2- increased only slightly in the same series. For all the dyes studied the fluorescence lifetimes observed in D2O were slightly longer than those in H2O. Possible causes for the variations observed are discussed.
Article
When Rutherford and Geiger tested the independence of simultaneous α particle emissions their results showed only general agreement with expectation. The main failure in retrospect appears to be the neglect of dead time, the processing time of the ocular system for counting flashes, which we find from their results to be about 0.5 s. This paper deals with pile-up in photon counting and related fields. We deal with multichannel scaling and the measurement of time-dependent fluorescence processes for sources of various characteristics. Both mathematical and electronic methods of dealing with the problem are discussed.
Article
A bidimensional array based on single-photon avalanche diodes for triggered imaging systems is presented. The diodes are operated in the gated mode of acquisition to reduce the probability to detect noise counts interfering with photon arrival events. In addition, low reverse bias overvoltages are used to lessen the dark count rate. Experimental results demonstrate that the prototype fabricated with a standard HV-CMOS process gets rid of afterpulses and offers a reduced dark count probability by applying the proposed modes of operation. The detector exhibits a dynamic range of 15 bits with short gated ‘on’ periods of 10 ns and a reverse bias overvoltage of 1.0 V.
Article
We report on the design and characterization of a novel time-resolved image sensor fabricated in a 130 nm CMOS process. Each pixel within the 32×32 pixel array contains a low-noise single-photon detector and a high-precision time-to-digital converter (TDC). The 10-bit TDC exhibits a timing resolution of 119 ps with a timing uniformity across the entire array of less than 2 LSBs. The differential non-linearity (DNL) and integral non-linearity (INL) were measured at ±0.4 and ±1.2 LSBs, respectively. The pixel array was fabricated with a pitch of 50 µm in both directions and with a total TDC area of less than 2000 µm². The target application for this sensor is time-resolved imaging, in particular fluorescence lifetime imaging microscopy and 3D imaging. The characterization shows the suitability of the proposed sensor technology for these applications.
Article
Geiger mode avalanche photodiodes (APDs) can be biased above the breakdown voltage to allow detection of single photons. Because of their higher quantum efficiency, magnetic field immunity, robustness, longer operating lifetime and lower cost, solid-state detectors capable of single-photon detection at non-cryogenic temperatures are attractive alternatives to the photomultiplier tube (PMT). Shallow junction Geiger mode APD detectors make it possible to manufacture photon detectors and detector arrays with CMOS-compatible processing steps and allow the use of novel Silicon-on-Insulator (SoI) technology to provide future integrated sensing solutions. Previous work on Geiger mode APD detectors has focused on increasing the active area of the detector to make it more PMT-like, easing the integration of discrete reaction, detection and signal processing into laboratory experimental systems. This discrete model for single photon detection works well for laboratory-sized test and measurement equipment; however, the move towards microfluidics and systems on a chip requires integrated sensing solutions. As we move towards providing integrated functionality for increasingly nanoscopic-sized emissions, sensitive small-area single photon counting detectors and detector arrays that can be easily integrated into marketable systems will be needed. This paper demonstrates the 2-dimensional and 3-dimensional simulation of the optical coupling that occurs in Geiger mode APDs. Fabricated Geiger mode APD detectors optimized for fluorescence decay measurements were characterized, and preliminary results indicate that they are well suited for integration into fluorescence decay measurement systems.
Article
The novel techniques of fluorescence lifetime imaging (FLI) and fluorescence lifetime imaging microscopy (FLIM) provide the investigator with the capacity to quantitate two-dimensional fluorescence intensity distributions and lifetimes. The concept, theory, and instrumentation of FLI and FLIM are reviewed in this paper. The implementation of FLIM instrumentation with conventional and confocal microscopic systems is discussed. These instruments permit the quantitative measurement of molecular interactions and chemical environment from samples in biological, physical, and environmental sciences. Numerous applications in the biomedical sciences for FLIM instrumentation are also discussed. (Note: we refer to the measurement of fluorescence lifetime images for macrosamples, e.g., a cuvette, without use of a microscope as fluorescence lifetime imaging (FLI), whereas measurements obtained with a microscope are termed fluorescence lifetime imaging microscopy (FLIM).)
Article
Formulation of oxidative phosphorylation and its first observation by means of fluorescence spectroscopy in the 1960s led to the acceptance of bioenergetics as a new field of studies. The new discipline grew fast with the increasing number of papers, related to the energy generation in mitochondria, advancement of the instrumentation, and improvement of observation techniques. As such, fluorescence lifetime imaging microscopy (FLIM) has gained popularity as a sensitive technique to monitor the functional/conformational states of nicotinamide adenine dinucleotide reduced (NADH)—one of the main compounds of oxidative phosphorylation. We hereby review the development and current application of cellular metabolism observation via NADH FLIM, illustrating it with the examples of both physiological (cell density, apoptosis, necrosis) and pathological states (inhibition of the electron transfer chain).