# A Multichannel and Compact Time to Digital Converter for Time of Flight Positron Emission Tomography

Nahema Marino, Federico Baronti, *Member, IEEE*, Luca Fanucci, *Senior Member, IEEE*, Sergio Saponara, *Senior Member, IEEE*, Roberto Roncella, *Member, IEEE*, Maria G. Bisogni, and Alberto Del Guerra, *Fellow, IEEE* 

*Abstract*—This paper presents a novel multichannel time to digital converter (TDC) specifically designed for the digitization of photon time of flight (TOF) and energy in positron emission tomography (PET) scanners. A coarse-fine architecture based on a counter combined with a delay locked loop (DLL) is implemented using a fully synchronous approach exploiting the pipeline principle and dynamic logic. This makes the design particularly compact and suitable for multichannel applications. The converter is also able to reject the events generated by the dark noise of the photodetectors used in the PET modules. This significantly reduces the communication bandwidth required for reading the TDC outputs. The TDC has been designed in a 65 nm CMOS process and features 8 channels that provide the arrival time information of an event with an LSB of 102 ps. The core occupies an active area of 0.3 mm<sup>2</sup> and consumes 230 mW.

*Index Terms*—Time to digital converter, time of flight positron emission tomography, delay locked loop, pipeline processing.

## I. INTRODUCTION

In recent years, technological progress, increased levels of integration, and higher working speeds of integrated circuits (ICs) have led to considerable improvements in time measurement. Time to digital converters (TDCs) are used in several application fields, such as laser/radio ranging [1], time domain reflectometry [2], frequency synthesis [3], [4], on-chip jitter measurement [5], automatic test equipment [6], and high-energy physics [7], [8]. Biomedical imaging systems, especially time of flight (TOF) positron emission tomography (PET) scanners, are another important application of TDCs [9]–[12].

PET is a noninvasive imaging technique used to inspect the physiological processes that take place inside the body [13]. It is based on the detection of pairs of  $\gamma$  rays that result from the annihilation of an atomic electron with a positron. The latter is released after a  $\beta^+$  decay of a radiotracer injected into the tissues under investigation. The PET scanner uses a ring of detectors around the inspected part of the body. An image is obtained from the projection data associated with pairs of  $\gamma$  rays that satisfy the coincidence criterion. In TOF PET scanners, TDCs additionally measure the arrival times of the two photons. TOF PET scanners can provide significantly better image quality in terms of noise level, contrast, and clarity of detail than conventional PET scanners [14]. To achieve this goal, the time resolution of the TDC must not impair the intrinsic resolution of the detectors, which sets the best accuracy achievable in TOF measurements, typically in the order of a few hundred picoseconds [15]. As such, the timing accuracy required of the electronics is less stringent than that achieved by state-of-the-art TDCs [16], [17]. On the other hand, multichannel capability and compactness are critical figures of merit in PET applications, where pixelated photodetectors are often employed to improve the spatial resolution [18]. Given that a typical PET scanner may comprise thousands of pixelated detectors (with each pixel read out independently), a very large number of channels must be implemented in the readout electronics.

PET is often combined with another imaging modality that provides anatomical details, either computed tomography (CT) [19] or magnetic resonance imaging (MRI) [20]. Indeed, a hybrid modality improves diagnosis and shortens the overall scanning time. Combined PET/MRI is a relatively new technology that performs better than PET/CT in soft tissues, but requires novel scintillation crystals, photodetectors, and readout electronics [21]–[24].

This paper presents an innovative TDC architecture for TOF measurement to be used in the hybrid TOF PET/MRI prototype module developed within the 4DMPET project [25], [26]. A classical coarse-fine approach based on a counter and a delay locked loop (DLL) [27]–[30] is adopted and implemented with a novel fully synchronous pipeline architecture. Dynamic logic is employed to take full advantage of the working frequency offered by a deep submicrometer technology. A preliminary description of the TDC has been given in [31]. Here, the implementation of the converter is discussed in detail and the experimental results measured on TDC prototypes are presented.

This work has been partially supported by the EU within the framework of the funded project HadronPhysics3-HP3 283286.

Nahema Marino, Federico Baronti, Luca Fanucci, Sergio Saponara, and Roberto Roncella are with Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Italy (e-mail: federico.baronti@unipi.it).

Maria G. Bisogni and Alberto Del Guerra are with Dipartimento di Fisica, Università di Pisa and INFN Pisa, Italy.

This paper is organized as follows: Section II summarizes the TDC specifications derived from the TOF PET/MRI requirements, Section III describes the design concept and the circuit implementation, and Section IV discusses the most significant experimental results. Finally, Section V draws some conclusions.

## II. TDC SPECIFICATIONS

The TOF PET/MRI module includes a continuous LYSO scintillator crystal, which transforms the  $\gamma$  rays into light, and a silicon photomultiplier (SiPM) matrix that converts the light into an electrical signal. The latter is processed by a dedicated front-end ASIC that generates a digital pulse whose rising edge provides the arrival time information of the  $\gamma$  ray [32]. Furthermore, the pulse width is proportional to the energy of the  $\gamma$  ray, according to the time over threshold (TOT) technique [33]–[35]. Thus, both the rising and the falling edges of the pulse carry information and must be digitized by the TDC. Fig. 1 shows the signal processing flow from the SiPM matrix to the TDC.

The converter must have a time resolution  $\sigma_{LSB}$  small enough, so as not to impair the detector performance in TOF measurement. In this paper, we use the term time resolution to indicate the TDC measurement precision (i.e., the rms of the quantization error), as traditionally done in TDC performance characterization [36].

Recent LSO- and LYSO-based detectors have exhibited resolutions larger than 200 ps [37]–[39]. However, simulations have shown that a resolution of around 100 ps is achievable if both sides of a LYSO crystal are coupled to a SiPM matrix [40]. This demands a TDC  $\sigma_{LSB}$  smaller than 100 ps. On the other hand, the resolution required for the measurement of the pulse width is less demanding. In fact, the proportionality constant between the energy of the  $\gamma$  ray and the pulse width can be chosen in the front-end ASIC [32]. However, the larger the TOT bin size, the longer the pulses must be to preserve the energy information, and hence the longer the dead time of the detector. If the bin size LSB<sub>TOT</sub> of the TOT measurement is below 1 ns, the proportionality constant can be set so that the maximum energy corresponds to a width of a few hundred nanoseconds. In this case, the probability that a new event occurs during the processing time of the previous one is negligible, as the event rate per channel is around 9 kHz (see Table I).
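The "negligible" pile-up claim can be checked with a quick estimate. The sketch below assumes (our assumption, not stated in the text) that valid events arrive as a Poisson process and takes the ~400 ns TOT dynamic range of Table I as a worst-case busy window.

```python
import math

# Assumption (ours): valid events on one channel form a Poisson process.
EVENT_RATE_HZ = 9e3      # valid-event rate per channel (Table I)
BUSY_WINDOW_S = 400e-9   # assumed worst-case pulse width, ~TOT dynamic range


def pileup_probability(rate_hz: float, window_s: float) -> float:
    """Probability that at least one more event arrives within window_s."""
    return 1.0 - math.exp(-rate_hz * window_s)


p = pileup_probability(EVENT_RATE_HZ, BUSY_WINDOW_S)
print(f"pile-up probability ~ {p:.2%}")
```

Under these assumptions the probability is about 0.36%, consistent with the statement that pile-up can be neglected.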

Other requirements of the converter include multichannel capability, good linearity, and the ability to discard input pulses shorter than a programmable threshold. The last feature is crucial to reject the pulses that are triggered by the dark noise of the SiPM. Such pulses are forwarded by the front-end ASIC to the TDC, but have a width smaller than that associated with the arrival of  $\gamma$  rays. Since the SiPM dark noise rate (2 MHz/mm<sup>2</sup> for the SiPMs considered) is significantly larger than that of valid events (1 kHz/mm<sup>2</sup> in preclinical PET) [31], real time input validation significantly reduces the output data rate of the converter. As the area of a SiPM pixel is 9 mm<sup>2</sup>, the rate of valid events to be converted by the TDC is roughly 9 kHz. The TDC specifications discussed so far are summarized in Table I.
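The per-channel rates in Table I follow directly from the figures quoted above; the short calculation below reproduces them and shows by how much real-time validation can shrink the stream of conversions (the ratio is an upper bound that ignores readout overheads).

```python
SIPM_PIXEL_AREA_MM2 = 9.0
DARK_RATE_PER_MM2_HZ = 2e6     # SiPM dark-noise rate quoted in the text
VALID_RATE_PER_MM2_HZ = 1e3    # valid-event rate in preclinical PET

noise_rate = DARK_RATE_PER_MM2_HZ * SIPM_PIXEL_AREA_MM2    # per channel
event_rate = VALID_RATE_PER_MM2_HZ * SIPM_PIXEL_AREA_MM2   # per channel

# Without validation every pulse (noise + valid) would be converted and read out.
reduction = (noise_rate + event_rate) / event_rate
print(noise_rate, event_rate, round(reduction))
```

The result (18 MHz of noise versus 9 kHz of valid events) means that discarding dark-noise pulses on chip reduces the number of conversions to be read out by a factor of roughly 2000.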



Fig. 1. Signal processing sequence in the proposed TOF PET/MRI module. A valid SiPM signal associated with the impact of a  $\gamma$  ray on the scintillator crystal is translated into a digital pulse whose rising edge contains the arrival time information and whose width is proportional to the  $\gamma$ -ray energy. Events due to dark noise are converted by the front-end into narrow pulses, which are discarded by the TDC.

TABLE I TDC SPECIFICATIONS

| Parameter                  | Specification |
|----------------------------|---------------|
| $\sigma_{LSB}$             | < 100 ps      |
| LSB<sub>TOT</sub>          | < 1 ns        |
| TOT dynamic range          | ~400 ns       |
| Rejection threshold range  | 6–50 ns       |
| Noise rate/channel         | 18 MHz        |
| Event rate/channel         | 9 kHz         |



Fig. 2. Block diagram of the devised TDC architecture. Input *In* arrives from the front-end ASIC. TC is the timing core, NOCG is the non overlapping clock generator providing the two non overlapped phases  $CK_{m,s}$ , and SVC is the sampling and validation channel.

## III. FULLY SYNCHRONOUS COARSE-FINE PIPELINE TDC

The block diagram of the proposed multichannel TDC is illustrated in Fig. 2. The fundamental block is the timing core (TC), which generates the high-resolution time references that feed the sampling and validation channels (SVCs). Each channel contains two groups of hit registers that sample the arrival times of both the rising and falling edges of the channel input signal *In*. The time window counter (TWC) validates the input in parallel with the acquisition of its edge arrival times. This allows the channel to be freed within a few nanoseconds in case of an invalid event, thus significantly reducing the channel dead time due to dark noise events.

Fig. 3. Structure of the counter and hit registers. Signals are pipelined vertically and horizontally along the cells.

### A. Implementation Concept

The TC structure based on an *n*-bit counter coupled with a *k*-stage DLL is well suited to multichannel applications. Indeed, the output of both the counter and the DLL can be easily shared by two or more SVCs [28]. Moreover, this architecture offers large dynamic ranges and good resolution with low static power consumption. Each channel digitizes the arrival time of an event by sampling the counter output (coarse time) and the DLL phases (fine time). The time resolution of the delay line is a function of the clock period  $T_{CK}$  and the number of delay stages *k*, according to:

$$\sigma_{\rm DLL} = LSB_{\rm DLL} / \sqrt{6} = T_{\rm CK} / (k\sqrt{6}), \qquad (1)$$

where *LSB*<sub>DLL</sub> is the delay between two consecutive phases of the DLL and  $\sigma_{DLL}$  is the rms of the double-shot quantization error [41]. Although increasing the length of the delay line enhances the resolution, the number of DLL stages should be kept small to achieve a good linearity of the TDC [42]. Hence, it is advantageous to push the clock frequency  $f_{CK} = 1/T_{CK}$  to the limits of the available technology. However, the carry propagation delay in counters limits their maximum operating speed, especially when the number of bits increases to achieve a larger dynamic range [43]. Nevertheless, high frequencies up to a few gigahertz can be reached in deep submicrometer technologies, if a synchronous pipeline counter is realized with dynamic flip flops [44], [45]. As a consequence, just a few delay stages in the DLL are sufficient to reduce  $\sigma_{DLL}$  down to some tens of picoseconds while preserving the linearity.
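Equation (1) can be evaluated for the prototype values reported in Section IV (a 2.45 GHz clock and a 4-stage DLL). The sketch below also checks the  $1/\sqrt{6}$  factor numerically: a double-shot measurement is the difference of two independently quantized timestamps, so its error is the difference of two uniform errors. The Monte Carlo model (ideal grid, truncation, random edge phases) is our illustrative assumption and covers quantization only.

```python
import math
import random
from typing import Tuple


def dll_resolution(f_ck_hz: float, k: int) -> Tuple[float, float]:
    """Return (LSB_DLL, sigma_DLL) in seconds according to Eq. (1)."""
    lsb = 1.0 / (f_ck_hz * k)          # T_CK / k
    return lsb, lsb / math.sqrt(6)


def double_shot_rms_lsb(n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo rms error (in LSB) of an interval measured as the difference
    of two timestamps, each truncated to an ideal grid with LSB = 1."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        t0 = rng.uniform(0.0, 1000.0)  # start edge, random phase w.r.t. the grid
        dt = rng.uniform(0.0, 1000.0)  # true interval
        measured = math.floor(t0 + dt) - math.floor(t0)
        acc += (measured - dt) ** 2
    return math.sqrt(acc / n)


lsb, sigma = dll_resolution(2.45e9, 4)
print(f"LSB_DLL = {lsb * 1e12:.1f} ps, sigma_DLL = {sigma * 1e12:.1f} ps")
print(f"Monte Carlo: {double_shot_rms_lsb() * lsb * 1e12:.1f} ps")
```

With these values LSB<sub>DLL</sub> is about 102 ps and the quantization-limited  $\sigma_{DLL}$  is about 42 ps; the measured rms error reported later also includes other error sources.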

Once the coarse-fine scheme has been selected, the next step is the definition of the SVC structure. Fully synchronous designs are widely acknowledged as the best choice in digital ICs because the behavior of the circuit is predictable and robust against glitches [46]. However, a fully synchronous approach implies that the hit registers must also be clocked at the very high frequency  $f_{CK}$  of the pipeline counter. This problem is overcome by also realizing the SVCs in dynamic logic and by using only local communication between adjacent cells (i.e., pipelining [47]). Dynamic logic also mitigates the drawbacks of pipeline architectures, such as higher design complexity and larger area, because the memory stages are implemented with simple pass gates merged with the logic gates, as shown below.

The task of the hit registers is to store the counter/DLL bits when an event occurs on the relevant channel input. The DLL output phases  $\varphi_i$  clock static flip flops that continuously sample the input signal. The sampled values are synchronized with the system clock and sent to the hit registers, which store the counter/DLL bits on the 0-1 transition of the input signal. As shown in Fig. 2, the detection of the 0-1 transition generates the rise store trigger *StR*, whereas the detection of the 1-0 transition provides the fall store trigger *StF*. These trigger pulses are pipelined vertically along the SVCs.
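The sampling scheme above can be illustrated with a behavioral model: in clock cycle *n*, the flip flop clocked by  $\varphi_i$  sees the input at  $(n + i/k)\,T_{CK}$ , and the first 0-1 and 1-0 transitions in the resulting sample stream play the roles of *StR* and *StF*. The (coarse, fine) encoding and all function names are our assumptions for illustration; the paper does not specify how the fine code is encoded.

```python
from typing import Callable, List, Optional, Tuple

Code = Tuple[int, int]  # (coarse counter value, fine DLL phase index)


def sample_input(signal: Callable[[float], int], n_cycles: int, k: int,
                 t_ck: float) -> List[List[int]]:
    """Flip flop i (clocked by phi_i) samples In at t = (n + i/k) * T_CK in cycle n."""
    return [[signal((n + i / k) * t_ck) for i in range(k)] for n in range(n_cycles)]


def detect_edges(samples: List[List[int]]) -> Tuple[Optional[Code], Optional[Code]]:
    """Codes of the first 0->1 transition (StR) and the following 1->0 one (StF)."""
    rise: Optional[Code] = None
    fall: Optional[Code] = None
    prev = samples[0][0]
    for n, word in enumerate(samples):
        for i, bit in enumerate(word):
            if rise is None and prev == 0 and bit == 1:
                rise = (n, i)
            elif rise is not None and fall is None and prev == 1 and bit == 0:
                fall = (n, i)
            prev = bit
    return rise, fall


def code_to_time(code: Code, k: int, t_ck: float) -> float:
    coarse, fine = code
    return (coarse + fine / k) * t_ck


def pulse(t: float) -> int:
    """Test input, times in units of T_CK: high from 2.3 to 7.6."""
    return 1 if 2.3 <= t < 7.6 else 0


rise, fall = detect_edges(sample_input(pulse, 10, 4, 1.0))
print(rise, fall)  # (2, 2) (7, 3)
```

The rising edge at 2.3 is reported at 2.5, i.e., quantized to the next DLL phase. Since the FTHR keeps the counter bits only, only the coarse part of the falling-edge code would be retained in the actual chip.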

Fig. 3 shows the circuit implementation of the counter and the hit registers. Two hit registers (rise time hit register, RTHR, and fall time hit register, FTHR) are cascaded to the counter, whose bits are pipelined horizontally through the array of cells building up the SVC. Every cell makes use of master-slave dynamic flip flops. The latter are clocked by the two non overlapped phases  $CK_{m,s}$  that are derived from the system clock CK by means of the non overlapping clock generator (NOCG) shown in Fig. 2. The counter consists of a chain of *n* half adders, where the output carry *CO* is evaluated as the AND between the input carry *CI* and the previous value of the half adder output *b*. The latter is calculated as the EXOR between its previous value and *CI*.
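The counter rule above (registered  $b \leftarrow b \oplus CI$  and  $CO \leftarrow CI \cdot b$ , with each stage's carry input taken from the previous stage's registered carry) can be simulated at cycle level. The model shows that the carry advances by one stage per clock, so bit *i* lags by *i* cycles; reading bit *i* at cycle  $t+i$  (along a diagonal) recovers the count *t*. The text does not state how this skew is compensated; pipelining the store trigger along the cells ( $StI_i$  in (2)) would do exactly this, which is our interpretation.

```python
from typing import List


def run_pipeline_counter(n_bits: int, n_cycles: int) -> List[List[int]]:
    """Cycle-level model of the n-bit pipeline counter.

    Every stage registers b <- b XOR CI and CO <- CI AND b (old b); the CI of
    stage i is the registered CO of stage i-1. history[t-1] holds b after edge t.
    """
    b = [0] * n_bits
    co = [0] * n_bits
    history = []
    for _ in range(n_cycles):
        ci = [1] + co[:-1]  # stage 0 counts on every clock edge
        b, co = ([b[i] ^ ci[i] for i in range(n_bits)],
                 [ci[i] & b[i] for i in range(n_bits)])
        history.append(b)
    return history


def read_diagonal(history: List[List[int]], t: int) -> int:
    """Read bit i after edge t + i (t >= 1); returns the count t mod 2**n_bits."""
    n_bits = len(history[0])
    return sum(history[t + i - 1][i] << i for i in range(n_bits))


hist = run_pipeline_counter(4, 48)
print([read_diagonal(hist, t) for t in range(1, 20)])
```

Because every stage only exchanges signals with its neighbor, no carry ever ripples through more than one half adder per clock cycle, which is what allows the multi-gigahertz operation mentioned earlier.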

Fig. 4. DLL design. (a) Block diagram. (b) Architecture of the *i*-th stage of the delay line. It includes two inverters, each loaded with the same number *m* of MOS varactors to avoid the pulse-shrinking phenomenon. (c) Circuit realization of the *m*-bit thermometric pipeline up/down counter.

The RTHR and FTHR are made of identical memory cells that perform both the storage and the vertical shift that allows the stored values to be extracted from the TDC. It is worth noting that the FTHR samples the counter bits only, so that LSB<sub>TOT</sub> equals the clock period  $T_{CK}$ . Hence, the clock frequency must be pushed above one gigahertz to meet the required TOT resolution (LSB<sub>TOT</sub> < 1 ns). In more detail, each memory cell implements the following function:

$$Q_{i} = \left(Q_{i-1} \cdot ShI_{i} + Q_{i} \cdot \overline{ShI_{i}}\right) \cdot \overline{StI_{i}} + b_{i} \cdot StI_{i}, \qquad (2)$$

where  $b_i$  is the output of the *i*-th half adder,  $StI_i$  is the pipelined rise/fall store trigger,  $ShI_i$  is the pipelined shift command coming from outside,  $Q_i$  and  $Q_{i-1}$  are the outputs of the *i*-th and (i-1)-th cell of the hit register, respectively. We note that in (2) (and in the following equations), the expression on the right-hand side of the equal sign is evaluated before the clock edge and represents the new value of the left-hand side expression after the clock edge.
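A behavioral sketch of (2) for a whole register helps to see the priority of the two commands: a store trigger overrides the shift command, and without either command the cell holds its value. For simplicity, the model applies the same *StI* and *ShI* to all cells in the same cycle (the pipeline skew is ignored) and feeds a 0 into the first cell during shifting; both choices, and the function names, are our assumptions.

```python
from typing import List


def hit_register_step(q: List[int], b: List[int], st: int, sh: int,
                      shift_in: int = 0) -> List[int]:
    """One clock cycle of Eq. (2) applied to every cell of the register."""
    new_q = []
    for i in range(len(q)):
        q_prev = q[i - 1] if i > 0 else shift_in
        held_or_shifted = (q_prev & sh) | (q[i] & (1 - sh))
        new_q.append((held_or_shifted & (1 - st)) | (b[i] & st))  # store wins
    return new_q


def serial_readout(q: List[int]) -> List[int]:
    """Shift the register out through its last cell, one bit per clock cycle."""
    out = []
    for _ in range(len(q)):
        out.append(q[-1])
        q = hit_register_step(q, [0] * len(q), st=0, sh=1)
    return out


q = [0, 0, 0, 0]
q = hit_register_step(q, b=[1, 0, 1, 1], st=1, sh=0)  # store trigger: capture b
q = hit_register_step(q, b=[0, 1, 1, 0], st=0, sh=0)  # no command: hold
print(q, serial_readout(q))  # [1, 0, 1, 1] [1, 1, 0, 1]
```

The serial output appears starting from the last cell, so the stored word is read out in reverse cell order.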

### B. Delay Locked Loop Design

The delay element of the delay line consists of two cascaded inverters loaded with *m* identical MOS varactors in order to attain digital control [48], as depicted in Fig. 4b. Aiming at a fully synchronous design, a digital control loop based on the pipeline approach is also exploited to lock the delay line. Indeed, an *m*-bit thermometric pipeline up/down counter produces a thermometric code that switches the varactors on/off. The up/down signal is generated by the balanced two-state bang-bang phase detector (PD) based on cross-coupled SR latches reported in [49], and is synchronized with the system clock at the output of the PD, as shown in Fig. 4a. The thermometric counter is enabled every  $2^{r} T_{CK}$  by the overflow signal EnI of an *r*-bit pipeline counter, in order to reduce the bandwidth of the DLL control loop. In Fig. 4a, the UdI signal is the pipelined replica of the synchronized PD output and feeds the up/down input of the thermometric counter. The latter is implemented as a bidirectional shift register with enable, in which the up/down input controls the shifting direction. One end of the shift register shifts in a '1', whereas the other end shifts in a '0'.

Fig. 4c shows the pipeline implementation of the *i*-th stage of the thermometric counter. The output *Var<sub>i</sub>* controls the status of the *i*-th MOS varactor and is generated according to the following relationship:

$$
\begin{aligned}
Var_{i} &= \overline{EnI_{i}} \cdot Var_{i} + EnI_{i} \cdot \left(Inc_{i} \cdot UdI_{i} + Dec_{i} \cdot \overline{UdI_{i}}\right) \\
&= \overline{EnI_{i}} \cdot Var_{i} + EnI_{i} \cdot \left(Var_{i+1} \cdot UdI_{i} + Q_{i-1} \cdot \overline{UdI_{i}}\right),
\end{aligned} \tag{3}
$$

where  $UdI_i$  and  $EnI_i$  are the up/down (which determines the shifting direction) and enable inputs of the *i*-th stage, respectively ( $UdI_0$  and  $EnI_0$  coincide with the UdI and EnI signals of Fig. 4a). To perform the bidirectional shifting function, each stage has two additional inputs, *Inc* and *Dec*, which are connected to the *Var* output of the next stage and to the *Q* output of the preceding one, respectively ( $Inc_{m-1} = 1$  and  $Dec_0 = 0$ ). The *Q* output is the *Var* signal delayed by one clock cycle. The delayed output *Q* is needed to account for the pipeline propagation of the enable and up/down signals along the thermometric counter stages: when stage *i* takes its new value from stage *i*-1, the latter has already been updated one cycle earlier, so its previous value (i.e.,  $Q_{i-1}$ ) must be used.
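The following cycle-level sketch of (3) models the pipelined enable and up/down signals as reaching stage *i* exactly *i* cycles after stage 0. Commands are separated by *m* idle cycles, standing in for the  $2^r$ -cycle enable period, so that each shift completes before the next one starts. The function names and the command encoding are ours, for illustration only.

```python
from typing import List, Tuple


def run_thermo_counter(m: int, commands: List[Tuple[int, int]]) -> List[int]:
    """Cycle-level model of Eq. (3). commands holds the (EnI, UdI) pair fed to
    stage 0 in each clock cycle; stage i sees it i cycles later."""
    var = [0] * m  # Var_i: thermometric code driving the varactors
    q = [0] * m    # Q_i: Var_i delayed by one clock cycle
    en = [0] * m   # EnI_i seen by stage i in the current cycle
    ud = [0] * m   # UdI_i seen by stage i in the current cycle
    for en_in, ud_in in commands:
        en = [en_in] + en[:-1]
        ud = [ud_in] + ud[:-1]
        new_var = []
        for i in range(m):
            inc = var[i + 1] if i < m - 1 else 1  # Inc_{m-1} = 1
            dec = q[i - 1] if i > 0 else 0        # Dec_0 = 0
            shifted = (inc & ud[i]) | (dec & (1 - ud[i]))
            new_var.append(((1 - en[i]) & var[i]) | (en[i] & shifted))
        q, var = var, new_var                     # clock edge
    return var


def commands_for(ops: str, m: int) -> List[Tuple[int, int]]:
    """'u' / 'd' = one enabled up / down step, followed by m idle cycles."""
    cmds: List[Tuple[int, int]] = []
    for op in ops:
        cmds.append((1, 1 if op == "u" else 0))
        cmds.extend([(0, 0)] * m)
    return cmds


print(run_thermo_counter(4, commands_for("uud", 4)))  # [0, 0, 0, 1]
```

Replacing `q[i - 1]` with `var[i - 1]` in the model makes the "uud" sequence end at [0, 0, 0, 0] instead of [0, 0, 0, 1]: the down shift would then propagate already-updated values, which is precisely what the delayed output *Q* prevents.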

### C. System Clock Generation

The TDC system clock CK is generated by a free running ring oscillator (OSC in Fig. 2). Thus, the CK period is not fixed a priori, but varies from chip to chip and, within the same chip, with the voltage and temperature conditions. On the one hand, this requires an online calibration of the TDC measurements, as discussed below. On the other hand, the dependence of CK on the process, voltage and temperature (PVT) conditions simplifies the design of the pipeline cells used in the TDC. In fact, the ring oscillator can be designed with a number *k* of inverters whose cumulative delay is larger than the maximum propagation delay of the combinational logic between two consecutive pass gates. Besides, it is sufficient to verify this timing requirement in a single PVT condition, as the delay of the ring oscillator inverters and the propagation delay of the logic can reasonably be expected to track each other across PVT conditions. In other words, the synchronous logic of the TDC works independently of the actual PVT condition and the resulting oscillation frequency of the ring oscillator. To account for uncertainties in the simulation models and mismatches between the behavior of the combinational logic in the pipeline cells and the ring oscillator inverters, the delay of the latter is made controllable by loading the inverter outputs with *q* varactors.

Fig. 5. (a) Complete architecture of the sampling and validation channel; (b) programmable cell of the *s*-bit TWC pipeline counter.

Similarly, since the oscillation period of a ring oscillator is twice the sum of its stage delays, a delay line of *k* stages, each made of two inverters identical to those used in the ring oscillator (Fig. 4b), spans one clock period, so the locking condition of the DLL can be achieved directly in any PVT corner. This choice leads to a more reliable locking capability than that of conventional DLLs [50]. To account for possible mismatches between the ring oscillator and DLL inverters, the latter are loaded with *m* varactors, with *m* > *q*. The extra varactors also compensate for the delay of an additional NAND gate inserted in the ring oscillator to disable the internal clock generation for testing purposes. In this case, *k* must be even, so that the ring (NAND gate plus *k* inverters) contains an odd number of inverting stages and oscillates. The distribution of the non overlapping clock phases is not critical, as there is only local communication between adjacent pipeline cells. However, the channel input is sampled by *k* flip flops clocked by the DLL phases (see Fig. 2), whose outputs  $O_0$ - $O_{k-1}$  are inputs of the synchronous logic of the TDC. To guarantee the correct sampling of these signals, the phase of the system clock with respect to the DLL clock can be properly adjusted. Indeed, an erroneous sampling can be detected, as it causes  $O_0$ - $O_{k-1}$  not to form a thermometric code.
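A minimal sketch of the consistency check mentioned above, assuming input pulses longer than one clock period so that a correctly sampled word  $O_0$ - $O_{k-1}$  contains at most one 0/1 transition. The function name is hypothetical and the paper does not describe how the check is implemented.

```python
from typing import Sequence


def is_thermometric(o: Sequence[int]) -> bool:
    """Accept O_0..O_{k-1} only if it has at most one 0/1 transition
    (e.g. 0011 or 1100); a 'bubble' such as 0101 flags a sampling error."""
    return sum(a != b for a, b in zip(o, o[1:])) <= 1


for word in ([0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 1, 0]):
    print(word, is_thermometric(word))
```

A bubble in the code indicates that some flip-flop outputs were captured by the system clock in the wrong cycle, which is the condition that the system clock phase adjustment is meant to avoid.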

To obtain an absolute reference for the TOF measurement, a periodic signal with a precise and stable frequency can be sampled by the TDC, and the resulting data can be used to calibrate the chips. Calibration also accounts for the long-term jitter of the free running oscillator. This calibration signal can be either sent to a dedicated channel of the converter or superimposed on the SiPM pulses at the TDC inputs. We note that the calibration signal is similar to the reference clock needed by a phase locked loop (PLL) to generate a precise on-chip clock, as adopted in conventional TDC designs. The proposed solution has the benefit of a simpler design, higher yield, and higher working frequency, as the chip does not have to be designed to work at the frequency imposed by the PLL in every PVT corner. On the other hand, the TDC readouts must be processed to perform the *a posteriori* calibration using the acquired calibration events. However, this operation can easily be executed in the digital section of the chip.

For measurement purposes, the system clock is divided by a programmable pipeline counter and routed to the chip boundary. If  $T_{\text{DIV}} = pT_{\text{CK}}$  is the clock period after the division operation, the half adder cells of the counter can be initialized to a given configuration that is restored every *p* clock cycles [51].
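One way to obtain a period of  $pT_{CK}$  from an *n*-bit counter, shown here only as a behavioral sketch, is to preset the counter to  $2^n - p$  and restore that preset whenever it overflows. This particular preset rule and the function name are our assumptions; the text only states that a given configuration is restored every *p* clock cycles [51].

```python
from typing import List


def divided_clock(n_bits: int, p: int, n_cycles: int) -> List[int]:
    """Overflow pulses of an n-bit counter preset to 2**n_bits - p and reloaded on
    overflow (hypothetical preset rule): one pulse every p clock cycles."""
    if not 1 <= p <= (1 << n_bits):
        raise ValueError("p must lie in 1..2**n_bits")
    preset = (1 << n_bits) - p
    count = preset
    pulses = []
    for _ in range(n_cycles):
        count += 1
        if count == 1 << n_bits:  # overflow: emit a pulse and restore the preset
            pulses.append(1)
            count = preset
        else:
            pulses.append(0)
    return pulses


pulses = divided_clock(4, 5, 20)
print([i for i, v in enumerate(pulses) if v])  # [4, 9, 14, 19]
```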

### D. Input Validation

In order to filter out the SiPM dark noise, the TDC stores the arrival times of the input edges only if the pulse width is larger than a programmable threshold. This feature is realized using an *s*-bit pipeline counter (TWC) whose end-of-count value (i.e., the rejection threshold) is configurable. When a new event is detected, the TWC starts counting until either it reaches the end-of-count value or the input returns to zero. In the first case, the input pulse is longer than the threshold and a validation flag is sent to the hit registers, so that they keep the information related to the current event until the channel is read out. In the second case, the input is recognized as SiPM noise and is discarded, as its information is overwritten by the next event.

Given that it is not possible to know if an event is valid until its width is checked by the TWC, the RTHR always samples the arrival time of the input rising edge until it receives the validation flag. When this occurs, the RTHR status is frozen and the bits can be shifted along the register by the channel readout electronics. Otherwise, the sampled data are overwritten by the bits of a new event. The FTHR records the arrival time of the falling edge only if the event has been validated.
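The validation behavior described above can be summarized by a behavioral model (not a gate-level one) working on the input sampled once per clock cycle. It assumes that the channel is read out immediately after each validated event, whereas the real channel stays frozen until the readout electronics release it; the function name is ours.

```python
from typing import List, Tuple


def twc_filter(samples: List[int], threshold: int) -> List[Tuple[int, int]]:
    """samples: channel input In sampled once per clock cycle (1 = high).
    Returns (rise_cycle, fall_cycle) for every pulse lasting >= threshold cycles."""
    events: List[Tuple[int, int]] = []
    prev, count, rise, validated = 0, 0, 0, False
    for n, x in enumerate(samples):
        if x and not prev:                # rising edge: RTHR (re)samples, TWC restarts
            rise, count, validated = n, 0, False
        if x and not validated:
            count += 1
            validated = count == threshold  # end of count reached: RTHR frozen
        if prev and not x and validated:    # falling edge of a valid event: FTHR samples
            events.append((rise, n))
            validated = False               # assumption: immediate readout
        prev = x
    return events


# A 2-cycle noise pulse followed by a 6-cycle valid pulse, threshold = 5 cycles.
samples = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(twc_filter(samples, 5))  # [(8, 14)]
```

The rise time of the noise pulse is simply overwritten by the next rising edge, mirroring the RTHR behavior, while the FTHR value is recorded only for the validated pulse.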

The freezing of the hit registers is supervised by the TWC through the management and interface blocks shown in Fig. 5a. The pipeline cell of the TWC is shown in Fig. 5b. It includes a resettable half adder cell in its upper part (dashed box). In addition, it compares the *j*-th bit of the threshold  $Th_j$  with the sum bit  $S_j$  of the half adder. The result  $CmpO_j$  is forwarded to the next stage according to the following relationship:

$$CmpO_{j} = \overline{S_{j} \oplus Th_{j}} \cdot CmpO_{j-1}, \qquad (4)$$

where  $CmpO_{j-1}$  is the comparison result of the previous stage. When a valid input occurs, the rejection threshold is reached and an end-of-count (*EocI*/*EocO*) flag is pipelined back along the chain in order to stop the TWC. At the same time, the last comparison result  $CmpO_{s-1}$  is forwarded to the *CmpI* input of the management blocks through the interface blocks and the hit registers.
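Ignoring pipeline delays, rippling (4) through the *s* stages yields an equality comparator: each stage ANDs the XNOR of its two bits with the result of the previous stage. The sketch assumes the chain input  $CmpO_{-1}$  is tied to 1, which the text does not state explicitly; the helper names are ours.

```python
from typing import List


def to_bits(value: int, s: int) -> List[int]:
    """LSB-first list of the s bits of value."""
    return [(value >> j) & 1 for j in range(s)]


def compare_chain(s_bits: List[int], th_bits: List[int], cmp_in: int = 1) -> int:
    """Ripple Eq. (4) through all stages; cmp_in plays the role of CmpO_{-1}."""
    cmp = cmp_in
    for s_j, th_j in zip(s_bits, th_bits):
        cmp = (1 - (s_j ^ th_j)) & cmp  # XNOR of the two bits, ANDed with previous result
    return cmp


print(compare_chain(to_bits(13, 7), to_bits(13, 7)),
      compare_chain(to_bits(12, 7), to_bits(13, 7)))  # 1 0
```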

The management block for the RTHR masks the rise store trigger StR when CmpI = 1, until it is released by the RlsI signal. As depicted in Fig. 5b, the actual RTHR trigger StO is generated according to:

$$StO = StR \cdot \overline{YR}, \tag{6}$$

where YR is evaluated as:

$$YR = RlsI \cdot (CmpI + YR). \tag{7}$$

Fig. 6. Layout of the 8 channel TDC. The timing core and one sample and validation channel are highlighted.

On the other hand, the management block for the FTHR forwards the fall store trigger StF only when CmpI = 1 and is deactivated on the rising edge of the next event according to:

$$StO = StF \cdot \overline{YF},\tag{8}$$

where YF is:

$$YF = \overline{CmpI} \cdot (StR + YF). \tag{9}$$
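Equations (6)-(9) can be exercised with a few clock cycles. In the sketch below, StO and the state bits YR and YF are registered, following the convention stated after (2). Two points are our reading rather than the paper's: from (7), YR is cleared only when RlsI = 0, so RlsI behaves as an active-low release; and YF is assumed to start at 0, since the reset state is not given.

```python
from typing import Iterable, List, Tuple


def rthr_manager(cycles: Iterable[Tuple[int, int, int]]) -> List[int]:
    """Eqs. (6)-(7). Each tuple is (StR, CmpI, RlsI); returns StO after each edge."""
    yr, sto_out = 0, []
    for st_r, cmp_i, rls_i in cycles:
        sto = st_r & (1 - yr)       # Eq. (6): StR masked while YR = 1
        yr = rls_i & (cmp_i | yr)   # Eq. (7): set by CmpI, cleared when RlsI = 0
        sto_out.append(sto)
    return sto_out


def fthr_manager(cycles: Iterable[Tuple[int, int, int]]) -> List[int]:
    """Eqs. (8)-(9). Each tuple is (StF, StR, CmpI); returns StO after each edge."""
    yf, sto_out = 0, []
    for st_f, st_r, cmp_i in cycles:
        sto = st_f & (1 - yf)           # Eq. (8): StF blocked while YF = 1
        yf = (1 - cmp_i) & (st_r | yf)  # Eq. (9): set by StR, cleared by CmpI = 1
        sto_out.append(sto)
    return sto_out


# RTHR: trigger, validation, masked trigger, release, trigger passes again.
print(rthr_manager([(1, 0, 1), (0, 1, 1), (1, 0, 1), (0, 0, 0), (1, 0, 1)]))
# FTHR: falling edge of an unvalidated pulse blocked, passed after validation.
print(fthr_manager([(0, 1, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1)]))
```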

The management block for the TWC resets the relevant counter on the rising edge of the event and then enables the count function *CO* as long as the channel input *In* is high and the set threshold has not been reached. The interface blocks synchronize the rising edges of the readout signals *Sh*, *Rls* and *Ack*, which control the serial data shifting along the hit registers. In particular, *Sh* is the shifting command, whereas *Rls* and *Ack* reactivate the RTHR and the TWC, respectively.

Finally, each TWC stage also pipelines the TC counter bits from the FTHR to the next channel (cf. Fig. 2 and Fig. 5) to comply with the local communication constraint. It is worth noting that the validation process described so far is performed in real time, so that there is no impairment in the acquisition capability of the converter due to spurious events.

## IV. EXPERIMENTAL RESULTS

An 8 channel TDC based on the proposed architecture has been manufactured in a 65 nm CMOS process. The layout of the TDC is shown in Fig. 6, together with a microphotograph of the realized chip. The 8 channel TDC occupies an area of  $0.3 \text{ mm}^2$ , which confirms that the proposed architecture allows a very compact design. The readout signals of one SVC are routed to the chip boundary and have been used to characterize the TDC performance. The TC section of the prototype is based on a 10-bit pipeline counter and a 4-stage DLL. The inverters used in the DLL stages are loaded with 12 *n*-MOS varactors. The ring oscillator generating *CK* consists of one NAND gate (through which the OSC can be enabled by an external signal) cascaded with 4 inverters loaded with 8 *n*-MOS varactors that can be controlled by external signals. The time window counter in each channel is implemented as a 7-bit counter.



Fig. 8. Clock frequency variation (a) over temperature; (b) over supply voltage.

Fig. 9. Histogram of the double-shot conversion error measured from 800 events spaced by 600 ns.

Fig. 10. rms error as a function of the calibration frequency.

First, the oscillation frequency of the ring oscillator was measured on roughly 20 prototypes while varying the varactor configuration. For instance, Fig. 7 shows the dependence of the clock frequency on the number of activated varactors, relative to the value measured with 4 activated varactors (i.e., at half load). Each varactor changes the frequency by approximately 0.7%, with a linear behavior. The half-load oscillation frequency is 2.45 GHz for an external supply voltage of 2.4 V. We note that this supply voltage is higher than the 1.2 V nominal supply voltage of the technology, at which the full functionality of the TDC could not be verified. This discrepancy can reasonably be ascribed to a poor distribution of the supply voltage throughout the blocks of the TDC.

With a supply voltage of 2.4 V, the TDC operates correctly with a power consumption of 230 mW and provides a fine bin size LSB<sub>DLL</sub> of 102 ps, a TOT bin size LSB<sub>TOT</sub> of 408 ps, a TOT dynamic range of 418 ns, and an input rejection threshold programmable between 408 ps and 52.2 ns. As expected, the oscillation frequency is strongly affected by both temperature and voltage, as depicted in Fig. 8. In particular, the clock frequency varies linearly with voltage, with a slope of 1.2 MHz/mV. This feature could be exploited to control the system clock frequency, and thus the TDC LSB, by adjusting the supply voltage. The DLL locking state was achieved in all the tested prototypes with an appropriate configuration of the varactors in the ring oscillator.
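The figures above follow from the clock frequency and the counter sizes given earlier in this section. The short check below reproduces them; the assumption (ours) is that the rejection threshold spans 1 to  $2^7 = 128$  clock periods, which matches the quoted 408 ps to 52.2 ns range.

```python
F_CK_HZ = 2.45e9  # half-load oscillation frequency at 2.4 V
K_DLL = 4         # DLL stages
N_COUNTER = 10    # pipeline counter bits
S_TWC = 7         # TWC bits
N_CHANNELS = 8
POWER_W = 0.230

t_ck = 1 / F_CK_HZ
lsb_tot = t_ck                         # the FTHR samples the counter bits only
lsb_dll = t_ck / K_DLL
tot_range = (1 << N_COUNTER) * t_ck
thr_min, thr_max = t_ck, (1 << S_TWC) * t_ck  # assumed 1..2**s clock periods
power_per_channel = POWER_W / N_CHANNELS

print(f"LSB_DLL {lsb_dll * 1e12:.0f} ps, LSB_TOT {lsb_tot * 1e12:.0f} ps, "
      f"TOT range {tot_range * 1e9:.0f} ns, threshold {thr_min * 1e12:.0f} ps"
      f" to {thr_max * 1e9:.1f} ns, {power_per_channel * 1e3:.1f} mW/channel")
```

The per-channel power (about 29 mW) matches the value listed in Table II.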

Second, the validation mechanism of an event has been verified by monitoring the readout flag *valid* and varying the event width with a fixed TWC threshold. This procedure has been performed for various values of the threshold.

Third, preliminary tests have been carried out to assess the double-shot resolution achieved by the TDC in the measurement of a time interval. More specifically, 800 pulses with 150 ns duration, separated in time by 600 ns, were sent to one SVC. The pulses were generated by a Tektronix AFG3252 waveform generator. The arrival time of each pulse (rising edge) is acquired by the channel RTHR and the sampled counter and DLL bits are read out from the TDC chip. Converting the TDC digital readout into an absolute time value requires the system clock period  $T_{CK}$ , which in our approach is not known a priori and must be derived using a calibration signal. The latter is obtained by superimposing on the input signal a train of pulses spaced by a reference time (calibration period). From the arrival times of two consecutive calibration pulses, it is possible to compute the  $T_{\rm CK}$  used to convert the input events occurring between the two calibration pulses. Increasing the calibration frequency improves the resolution, but also reduces the time during which the channel is available for acquiring input events.
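The conversion described above can be sketched as follows: the known calibration period divided by the number of DLL LSBs between the two calibration readouts gives the calibrated LSB (and thus  $T_{CK}$ ), which is then applied to the events in between. The (coarse, fine) code format, the function name, and the numbers in the example are hypothetical. The sketch also assumes that the coarse values have already been unwrapped across counter rollovers, which the paper does not detail.

```python
from typing import List, Sequence, Tuple

Code = Tuple[int, int]  # (coarse count, fine DLL phase index)


def calibrate_and_convert(events: Sequence[Code], cal_start: Code, cal_end: Code,
                          t_cal: float, k: int) -> Tuple[float, List[float]]:
    """Estimate T_CK from two calibration pulses spaced by the known period t_cal
    and convert the events in between into times relative to cal_start.
    Assumes coarse values are already unwrapped (counter rollovers removed)."""
    def in_lsb(code: Code) -> int:
        coarse, fine = code
        return coarse * k + fine

    span = in_lsb(cal_end) - in_lsb(cal_start)
    if span <= 0:
        raise ValueError("calibration pulses must be in increasing time order")
    lsb = t_cal / span  # calibrated LSB_DLL
    times = [(in_lsb(c) - in_lsb(cal_start)) * lsb for c in events]
    return lsb * k, times  # (estimated T_CK, event times)


# Hypothetical example: calibration pulses 12 us apart, k = 4.
t_ck, times = calibrate_and_convert([(1500, 2)], (0, 0), (30000, 0), 12e-6, 4)
print(t_ck, times)
```

Because  $T_{CK}$  is re-estimated for every calibration period, slow drifts of the free running oscillator are tracked, which is consistent with the improvement at higher calibration frequencies reported below.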

TABLE II COMPARISON OF THE PERFORMANCE OF RECENT MULTICHANNEL TDCs

|                                | This work                 | [10]              | [52]                                  | [12]                | [1]                                    |
|--------------------------------|---------------------------|-------------------|---------------------------------------|---------------------|----------------------------------------|
| Architecture                   | Pipeline counter<br>+ DLL | ADLL              | Counter + DLL + passive interpolation | DLL + VDL           | Counter + MDLL<br>+ fine interpolation |
| Technology                     | 65 nm                     | 0.35 µm           | 130 nm                                | 0.25 μm             | 0.35 µm                                |
| Channels                       | 8                         | 3                 | 8                                     | 4 (+2) <sup>a</sup> | 7                                      |
| Active area [mm <sup>2</sup> ] | 0.3                       | 4.17 <sup>b</sup> | 1.29                                  | 4.82 <sup>b</sup>   | 4 <sup>b</sup>                         |
| Clock [MHz]                    | 2450<sup>c</sup>          | 100<sup>d</sup>   | 1562.5 <sup>d</sup>                   | 100 <sup>d</sup>    | 220 <sup>e</sup>                       |
| LSB [ps]                       | 102                       | 71                | 5                                     | 40                  | 8.88                                   |
| Resolution (rms) [ps]          | 95 <sup>f</sup>           | 21                | 3                                     | NA                  | 8.6                                    |
| Dynamic range [ns]             | 418                       | 10                | 0.64                                  | 10                  | 74000                                  |
| DNL [LSB]                      | 0.04                      | 0.58              | 0.9                                   | NA                  | 1                                      |
| Power/channel [mW]             | 29                        | 50                | 42                                    | 18                  | 12                                     |

<sup>a</sup>2 dummy channels; <sup>b</sup>Estimated from the micrograph; <sup>c</sup>Internal clock; <sup>d</sup>External clock; <sup>e</sup>External crystal at 20 MHz; <sup>f</sup>Estimated as the rms error at a calibration frequency of 167 kHz.

In our experiment, the calibration signal consists of a subset of the input events, since their arrival times are known. That is, one out of every given number of consecutive TDC digital readouts is used for the calibration procedure. Once the calibration events have been selected, $T_{\rm CK}$ is computed for each calibration period and used to convert the remaining events that occurred in the same interval. Fig. 9 shows the histogram of the TDC conversion error when only the first and last events are used for calibration (i.e., the calibration period equals the whole duration of the experiment) and the remaining 798 events are converted to a time value. The error lies within a range of $\pm 5$ LSB, and the rms error is 165 ps. This value improves markedly as the calibration frequency is increased, as shown in Fig. 10, and it falls below the target value of 100 ps for a calibration frequency of 167 kHz. Of course, the double-shot conversion error is likely to worsen if it is measured between two channels belonging to different chips (as is the case in TOF PET), because of the additional uncorrelated noise coming from the two independent oscillators. We note that the channel dead time related to the acquisition of a calibration event is on the order of a few hundred nanoseconds. Thus, a calibration signal at 167 kHz makes the channel unavailable for no more than a few percent of the time.
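The subset-based calibration can be reproduced on synthetic data with a short sketch. It does not reproduce the measurements of Fig. 9 or Fig. 10: the codes come from an ideal, jitter-free clock whose frequency (2.4437 GHz) is a hypothetical drifted value, counter roll-over is assumed already resolved, and the helper names are invented for illustration. It mainly shows the bookkeeping, and that taking one calibration event every 10 readouts of 600 ns-spaced events corresponds to a calibration frequency of about 167 kHz.

```python
import math

DLL_STAGES = 4  # 4-stage DLL: one clock period = 4 LSBs


def calibrated_errors(codes, spacing, every):
    """Conversion errors (s) when one event out of every `every` is used for calibration.

    codes   : unwrapped TDC codes (in LSBs) of events spaced by a known `spacing` (s)
    spacing : nominal time between consecutive input events
    every   : number of readouts per calibration period
    Events after the last calibration event are not converted.
    """
    errors = []
    cal = list(range(0, len(codes), every))
    for a, b in zip(cal, cal[1:]):
        # T_CK estimated from the two calibration events bounding this interval
        t_ck = DLL_STAGES * (b - a) * spacing / (codes[b] - codes[a])
        for i in range(a + 1, b):
            measured = (codes[i] - codes[a]) * t_ck / DLL_STAGES
            errors.append(measured - (i - a) * spacing)
    return errors


def calibration_frequency(spacing, every):
    """Rate (Hz) at which calibration events occur."""
    return 1.0 / (every * spacing)


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic, noise-free record: 800 events 600 ns apart (hypothetical clock value).
t_ck_true = 1 / 2.4437e9
lsb = t_ck_true / DLL_STAGES
codes = [math.floor((i * 600e-9 + 37e-12) / lsb) for i in range(800)]
errs = calibrated_errors(codes, 600e-9, every=10)
print(f"f_cal = {calibration_frequency(600e-9, 10) / 1e3:.1f} kHz")  # f_cal = 166.7 kHz
print(f"rms error = {rms(errs) * 1e12:.1f} ps over {len(errs)} events")
```

With ideal codes the error of each converted event stays within one LSB; real data add oscillator jitter on top of this quantization error. The dead-time argument is similar arithmetic: assuming, for example, 200 ns of dead time per calibration event, 166.7 kHz × 200 ns ≈ 3.3% of the channel time.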

Since the events sent to a channel are uncorrelated with the system clock, they can be used to assess the differential nonlinearity (DNL) of the TDC by means of the code density test (CDT). The CDT has been performed by building the histogram of the occurrences of the two least significant bits of the TDC readout, which correspond to the 4-stage DLL. As the number of events is 3267, the DNL is estimated with 5% accuracy at a 90% confidence level [41]. In the tested chip prototype, we obtained a maximum DNL of 0.04 LSB, which confirms the very good linearity that can be attained with a DLL consisting of a limited number of cells.
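A minimal sketch of the code density test restricted to the DLL bins is given below. It assumes the readout codes are plain integers whose two least significant bits are the decoded DLL phase; the function name is hypothetical, and the bin counts in the usage lines are made up for illustration, not the measured histogram.

```python
from collections import Counter

DLL_BITS = 2  # the two least significant bits of the readout come from the 4-stage DLL


def dnl_from_codes(codes):
    """Code density test on the DLL bins.

    With input events uncorrelated with the clock, each of the 2**DLL_BITS bins
    should collect the same number of hits, so DNL_k = n_k / n_ideal - 1 (in LSB).
    """
    n_bins = 1 << DLL_BITS
    hist = Counter(code & (n_bins - 1) for code in codes)  # keep only the DLL bits
    n_ideal = len(codes) / n_bins
    return [hist[k] / n_ideal - 1 for k in range(n_bins)]


# Illustrative histogram: 110, 90, 100 and 100 hits in the four DLL bins.
codes = [0] * 110 + [1] * 90 + [2] * 100 + [3] * 100
dnl = dnl_from_codes(codes)
print([round(d, 2) for d in dnl])  # [0.1, -0.1, 0.0, 0.0]
print(f"max |DNL| = {max(abs(d) for d in dnl):.2f} LSB")
```

The statistical accuracy of such an estimate depends on the number of hits per bin, which is why the event count is quoted together with the accuracy and confidence level.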

Table II compares some figures of merit of the proposed TDC with those of recent multichannel TDCs published in the literature. The comparison highlights that the proposed architecture can accommodate a large number of channels within a small area while offering good dynamic range and linearity. The LSB and the rms resolution are coarser than those of the other TDCs, but they fulfill the PET requirements. Finally, the power consumption per channel during acquisition is moderate, given the high clock rate of 2.45 GHz. No abnormal heating has been observed when monitoring the chip package with an IR thermal camera.
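The compactness claim can be made concrete by normalizing the Table II areas by channel count. This sketch only re-uses the table values: the areas marked b are micrograph estimates, only the 4 active channels of [12] are counted (an assumption, since the 2 dummy channels also occupy area), and the comparison is not normalized for technology node (65 nm versus 0.25 to 0.35 µm).

```python
# Values copied from Table II. Channel count for [12] excludes its 2 dummy channels
# (an assumption); areas for [10], [12] and [1] are micrograph estimates.
TDCS = {
    "This work": {"channels": 8, "area_mm2": 0.3, "power_mw_per_ch": 29},
    "[10]": {"channels": 3, "area_mm2": 4.17, "power_mw_per_ch": 50},
    "[52]": {"channels": 8, "area_mm2": 1.29, "power_mw_per_ch": 42},
    "[12]": {"channels": 4, "area_mm2": 4.82, "power_mw_per_ch": 18},
    "[1]": {"channels": 7, "area_mm2": 4.0, "power_mw_per_ch": 12},
}


def area_per_channel(entry):
    """Active area divided by the number of channels (mm^2 per channel)."""
    return entry["area_mm2"] / entry["channels"]


for name, entry in sorted(TDCS.items(), key=lambda kv: area_per_channel(kv[1])):
    print(f"{name:10s} {area_per_channel(entry):.3f} mm^2/channel")

# Cross-check of this work: 8 channels x 29 mW is about the 230 mW total quoted.
ours = TDCS["This work"]
print(f"total power = {ours['channels'] * ours['power_mw_per_ch']} mW")
```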

#### V. CONCLUSIONS

A novel coarse-fine TDC architecture has been specifically devised for evaluating the time of flight and photon energy in an innovative TOF PET/MRI detector. A key point of the proposed architecture is the use of the pipeline approach and of local communication between adjacent pipeline cells, so as to push the working frequency of a fully synchronous design up to the limits of a deep-submicrometer CMOS process. In this way, the required TDC resolution can be achieved by a synchronous counter combined with a DLL of just a few stages, which provides good TDC linearity. The counter and DLL outputs can be shared by a theoretically arbitrary number of sampling and validation channels, avoiding appreciable variability among them. This leads to a very compact structure, also thanks to the use of dynamic memory, which reduces the hardware cost of the pipeline stages. In addition, each channel can validate the input event in real time by checking its width against a programmable threshold. This allows the spurious events due to the dark noise of the SiPM matrix to be discarded, thus dramatically reducing the readout data rate required from the channel.
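The width-based validation rule can be illustrated with a behavioral software model. This is not the hardware implementation, which performs the check in real time inside each channel: the sketch assumes each event is described by hypothetical rising- and falling-edge codes, that the width is their difference in LSBs, and that a pulse exactly at threshold is accepted (the comparison convention is also an assumption).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    rise_code: int  # hypothetical TDC code of the rising edge (arrival time)
    fall_code: int  # hypothetical TDC code of the falling edge


def validate(events: List[Event], threshold_lsb: int) -> List[Event]:
    """Keep only events whose width (in LSBs) reaches the programmable threshold.

    Short pulses, such as those produced by SiPM dark counts, are discarded
    before readout, which is what reduces the output data rate.
    """
    return [e for e in events if e.fall_code - e.rise_code >= threshold_lsb]


events = [Event(0, 50), Event(100, 105), Event(200, 260)]  # illustrative values
kept = validate(events, threshold_lsb=20)
print(f"kept {len(kept)} of {len(events)} events")
```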

The devised architecture has been used to implement an 8-channel TDC prototype in 65 nm CMOS technology. The chip occupies an area of $0.3 \text{ mm}^2$ and consumes 230 mW at an oscillation frequency of 2.45 GHz. The power consumption is expected to decrease once the power supply section of the chip layout is revised so that the chip can be supplied at the nominal technology voltage of 1.2 V.

As the DLL consists of 4 stages, the TDC LSB is 102 ps. Measurements performed on the TDC prototypes show that a resolution well below 100 ps can be achieved by providing an external calibration signal of suitable frequency. The calibration signal is needed because the internal ring oscillator is free running, so its frequency varies with PVT conditions. Relying on a free-running oscillator significantly simplifies the design of the DLL and of the synchronous part of the TDC, because they are not required to operate at a fixed reference frequency in every PVT corner.
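For reference, the 102 ps LSB follows from the nominal clock frequency $f_{\rm CK}$ and the number of DLL stages $N_{\rm DLL}$. The second relation is only a consistency check on Table II: the 418 ns dynamic range spans about 4096 LSBs, which would match a 12-bit code whose two least significant bits come from the DLL; the implied 10-bit counter width is an inference, not a figure stated in this section. Because the ring oscillator is free running, $f_{\rm CK}$, and hence the LSB, drifts with PVT, which is why $T_{\rm CK}$ is obtained from the calibration signal rather than assumed.

$$
\mathrm{LSB} = \frac{T_{\rm CK}}{N_{\rm DLL}} = \frac{1}{f_{\rm CK}\,N_{\rm DLL}} = \frac{1}{(2.45\times 10^{9}\ \mathrm{Hz})\times 4} \approx 102\ \mathrm{ps}
$$

$$
\frac{418\ \mathrm{ns}}{102\ \mathrm{ps}} \approx 4096 = 2^{12}
$$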

#### References

- [1] J. P. Jansson, V. Koskinen, A. Mantyniemi, and J. Kostamovaara, "A multichannel high-precision CMOS time-to-digital converter for laserscanner-based perception systems," *IEEE Trans. Instrum. Meas.*, vol. 61, no. 9, pp. 2581-2590, Sep. 2012.
- [2] D. Lee, J. Sung, and J. Park, "A 16ps-resolution random equivalent sampling circuit for TDR utilizing a Vernier time delay generation," in *Proc. IEEE Nucl. Sci. Symp. Conf. Rec.*, vol. 2, 19–25 Oct. 2003, pp. 1219-1223.
- [3] J. Yu, F. F. Dai, and R. C. Jaeger, "A 12-bit Vernier ring time-to-digital converter in 0.13 µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 830-842, Apr. 2010.
- [4] M. Zanuso, P. Madoglio, S. Levantino, C. Samori, and A. Lacaita, "Time-to-digital converter for frequency synthesis based on a digital bang-bang DLL," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 3, pp. 548-555, Mar. 2010.
- [5] C.-C. Chung and W.-J. Chu, "An all-digital on-chip jitter measurement circuit in 65 nm CMOS technology," in *Int. Symp. VLSI Design, Automation and Test*, Hsinchu, 25-28 Apr. 2011, pp. 1-4.
- [6] J. Rivoir, "Fully-digital time-to-digital converter for ATE with autonomous calibration," in *IEEE Int. Test Conf.*, Santa Clara, CA, Oct. 2006, pp. 1-10.
- [7] K. Karadamoglou, N. P. Paschalidis, E. Sarris, N. Stamatopoulos, G. Kottaras, and V. Paschalidis, "An 11-bit high-resolution and adjustable-range CMOS time-to-digital converter for space science instruments," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 214-222, Jan. 2004.
- [8] L. Perktold and J. Christiansen, "A flexible 5 ps bin-width timing core for next generation high-energy-physics time-to-digital converter applications," in *Conf. Ph.D. Research in Microelectr. and Electr.*, Aachen, Germany, 12-15 Jun. 2012, pp. 1-4.
- [9] J. Torres, A. Aguilar, R. Garcia-Olcina, P. A. Martinez, J. Martos, J. Soret, J. M. Benlloch, P. Conde, A. J. Gonzalez, and F. Sanchez, "Time-to-Digital Converter Based on FPGA With Multiple Channel Capability," *IEEE Trans. Nucl. Sci.*, vol. 61, no. 1, pp. 107-114, Feb. 2014.
- [10] W. Gao, D. Gao, D. Brasse, C. Hu-Guo, and Y. Hu, "Precise multiphase clock generation using low-jitter delay-locked loop techniques for positron emission tomography imaging," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 3, pp. 1063-1070, Jun. 2009.
- [11] S. Mandai and E. Charbon, "A 128-channel, 8.9-ps LSB, column-parallel two-stage TDC based on time difference amplification for time-resolved imaging," *IEEE Trans. Nucl. Sci.*, vol. 59, no. 5, pp. 2463-2470, Oct. 2012.
- [12] Y. Li, H. Yu, J. Lai, J. Zhen, Z. Jun, N. Ming, and X. Peng, "A CMOS time-to-digital converter for multi-voltage threshold method in positron emission tomography," in *IEEE Int. Conf. of Electron Devices and Solid-State Circuits*, Hong Kong, 3-5 Jun. 2013, pp. 1-2.
- [13] D. L. Bailey, D. W. Townsend, P. E. Valk, and M. N. Maisey, *Positron Emission Tomography: Basic Sciences*. Springer, 2005, pp. 1-2.
- [14] M. Conti, "Why is TOF PET reconstruction a more robust method in the presence of inconsistent data?," *Phys. Med. Biol.*, vol. 56, pp. 55-168, Jan. 2011.
- [15] W. Moses, "Recent advances and future advances in time-of-flight PET," *Nucl. Instrum. Methods Phys. Res. A*, vol. 580, no. 2, pp. 919-924, Oct. 2007.
- [16] Z. Xu, M. Miyahara, and A. Matsuzawa, "Picosecond resolution time-to-digital converter using Gm-C integrator and SAR-ADC," *IEEE Trans. Nucl. Sci.*, vol. 61, no. 2, pp. 852-859, Apr. 2013.
- [17] Y.-H. Seo, J.-S. Kim, H.-J. Park, and J.-Y. Sim, "A 0.63ps resolution, 11b pipeline TDC in 0.13µm CMOS," in *Symp. on VLSI Circuits*, Honolulu, HI, 15-17 Jun. 2011, pp. 152-153.
- [18] H. Zaidi, *Quantitative Analysis in Nuclear Medicine Imaging*. Springer, 2006, p. 5.

- [19] T. Beyer, D. W. Townsend, J. Czernin, and L. S. Freudenberg, "The future of hybrid imaging-part 2: PET/CT," *Insights Imaging*, vol. 2, no. 3, pp. 225-234, Jun. 2011.
- [20] H. Zaidi and A. Del Guerra, "An outlook on future design of hybrid PET/MRI systems," *Med. Phys.*, vol. 38, no. 10, pp. 5667-5689, Oct. 2011.
- [21] C. Piemonte, R. Battiston, M. Boscardin, G. F. Dalla Betta, A. Del Guerra, N. Dinu, A. Pozza, and N. Zorzi, "Characterization of the first prototypes of silicon photomultipliers fabricated at ITC-irst," *IEEE Trans. Nucl. Sci.*, vol. 54, no. 1, pp. 236-244, Feb. 2007.
- [22] A. Del Guerra, N. Belcari, M. G. Bisogni, G. Llosa, S. Marcatili, G. Ambrosi, F. Corsi, C. Marzocca, G. Dalla Betta, and C. Piemonte, "Advantages and pitfalls of the silicon photomultiplier (SiPM) as photodetector for the next generation of PET scanners," *Nucl. Instrum. Methods Phys. Res. A*, vol. 617, pp. 223-226, May 2010.
- [23] G. Llosa, J. Barrio, J. Cabello, C. Lacasta, J. F. Oliver, M. Rafecas, C. Solaz, P. Barillon, C. de La Taille, M.G. Bisogni, A. Del Guerra, and C. Piemonte, "Development of a PET prototype with continuous LYSO crystals and monolithic SiPM matrices," in *IEEE Nucl. Sci. Symp. and Med. Imag. Conf.*, Valencia, Spain, 23-29 Oct. 2011, pp. 3631-3634.
- [24] F. Corsi, M. Foresta, C. Marzocca, G. Matarrese, and A. Del Guerra, "ASIC development for SiPM readout," *JINST*, vol. 4, pp. 1-10, Mar. 2009.
- [25] M. Morrocchi, S. Marcatili, N. Belcari, M. G. Bisogni, G. Collazzuol, G. Ambrosi, F. Corsi, M. Foresta, C. Marzocca, G. Matarrese, G. Sportelli, P. Guerra, A. Santos, and A. Del Guerra, "Characterization and test of a data acquisition system for PET," in *IEEE Nucl. Sci. Symp. and Med. Imag. Conf.*, Valencia, Spain, 23-29 Oct. 2011, pp. 621-625.
- [26] F. Pennazio, G. Ambrosi, M. G. Bisogni, P. Cerello, F. Corsi, A. Del Guerra, M. Ionica, N. Marino, C. Marzocca, M. Morrocchi, C. Peroni, G. Pirrone, C. Santoni, and R. Wheadon, "SiPM-based PET module with depth of interaction," in *IEEE Nucl. Sci. Symp. and Med. Imag. Conf.*, Anaheim, CA, 27 Oct.-3 Nov. 2012, pp. 3786-3789.
- [27] J.-P. Jansson, A. Mantyniemi, and J. Kostamovaara, "A CMOS time-to-digital converter with better than 10 ps single-shot precision," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1286-1296, Jun. 2006.
- [28] P. Andreani, F. Bigongiari, R. Roncella, R. Saletti, P. Terreni, A. Bigongiari, and M. Lippi, "Multihit multichannel time-to-digital converter with ±1% differential nonlinearity and near optimal time resolution," *IEEE J. Solid-State Circuits*, vol. 33, no. 4, pp. 650-656, Apr. 1998.
- [29] D. E. Schwartz, E. Charbon, and K. L. Shepard, "A single-photon avalanche diode imager for fluorescence lifetime applications," in *IEEE Symp. VLSI Circ.*, Kyoto, Japan, 14-16 Jun. 2007, pp. 144-145.
- [30] J. Guo and S. Sonkusale, "A 22-bit 110ps time-interpolated time-to-digital converter," in *Int. Symp. Circ. and Syst.*, Seoul, 20-23 May 2012, pp. 3166-3169.
- [31] N. Marino, F. Baronti, L. Fanucci, R. Roncella, S. Saponara, M. G. Bisogni, and A. Del Guerra, "A novel time to digital converter architecture for time of flight positron emission tomography," in *IEEE Nordic-Medit. Workshop on Time-to-Digital Converters*, Perugia, Italy, 3 Oct. 2013, pp. 1-4.
- [32] F. Licciulli, F. Ciciriello, F. Corsi, C. Marzocca, and M. G. Bisogni, "TOT\_AL: an ASIC for TOF and DOI measurement," in *IEEE Nucl. Sci. Symp. Conf. Rec.*, Seoul, Korea, 27 Oct.-2 Nov. 2013, pp. 1-6.
- [33] N. Marino, G. Ambrosi, F. Baronti, M. G. Bisogni, P. Cerello, F. Corsi, L. Fanucci, M. Ionica, C. Marzocca, F. Pennazio, R. Roncella, C. Santoni, S. Saponara, S. Tarantino, R. Wheadon, and A. Del Guerra, "An innovative detection module concept for PET," *JINST*, vol. 7, pp. C08003-C08012, Aug. 2012.
- [34] Y. Wang, X. Cheng, D. Li, W. Zhu, and C. Liu, "A Linear Time-Over-Threshold Digitizing Scheme and Its 64-channel DAQ Prototype Design on FPGA for a Continuous Crystal PET Detector," *IEEE Trans. Nucl. Sci.*, vol. 61, no. 1, pp. 99-106, Feb. 2014.
- [35] M. W. Ben Attouch, K. M. Koua, S. Panier, L. Arpin, L. Njejimana, H. Bouziri, M. Abidi, C. Paulin, R. Lecomte, J. Pratte, and R. Fontaine, "A fully integrated pulse charge generator embedded in a 64-channel readout ASIC dedicated to a PET/CT detector module," in *IEEE Int. Conf. Elec. Des., Syst. and App.*, Kuala Lumpur, 5-6 Nov. 2012, pp. 130-134.
- [36] F. Baronti, L. Fanucci, D. Lunardini, R. Roncella, and R. Saletti, "On the differential nonlinearity of time-to-digital converters based on delay-locked-loop delay lines," *IEEE Trans. Nucl. Sci.*, vol. 48, no. 6, pp. 2424-2431, 2001.
- [37] L. Cosentino, P. Finocchiaro, A. Pappalardo, and F. Garibali, "Assessment of a high-resolution candidate detector for prostate time-of-flight positron emission tomography," *Rev. Sci. Instrum.*, vol. 83, no. 11, pp. 114301-1-114301-9, Nov. 2012.
- [38] M. Ito, M. S. Lee, and J. S. Lee, "Continuous depth-of-interaction measurement in a single-layer pixelated crystal array using a single-ended readout," *Phys. Med. Biol.*, vol. 58, no. 5, pp. 1269-1282, Feb. 2013.
- [39] S. Gundacker, E. Auffray, B. Frisch, H. Hillemanns, P. Jarron, T. Meyer, K. Pauwels, and P. Lecoq, "A systematic study to optimize SiPM photodetectors for highest time resolution in PET," *IEEE Trans. Nucl. Sci.*, vol. 59, no. 5, pp. 1798-1804, Oct. 2012.
- [40] F. Pennazio, J. Barrio, M. G. Bisogni, P. Cerello, G. De Luca, A. Del Guerra, C. Lacasta, G. Llosá, G. Magazzu, S. Moehrs, C. Peroni, and R. Wheadon, "Simulations of the 4DMPET SiPM based PET module," in *IEEE Nucl. Sci. Symp. and Med. Imag. Conf.*, Valencia, Spain, 23-29 Oct. 2011, pp. 2316-2320.
- [41] L. Zaworski, D. Chaberski, M. Kowalski, and M. Zieliński, "Quantization error in time-to-digital converters," *Metr. and Meas. Syst.*, vol. 19, no. 1, pp. 115-122, Mar. 2012.
- [42] F. Baronti, D. Lunardini, R. Roncella, and R. Saletti, "A Self-Calibrating Delay-Locked Delay Line With Shunt-Capacitor Circuit Scheme," *IEEE J. Solid-State Circuits*, vol. 39, no. 2, pp. 384–387, Feb. 2004.
- [43] A. K. Maini, *Digital Electronics: Principles, Devices and Applications*. John Wiley & Sons, 2007, pp. 411-413.
- [44] R. Roncella and R. Saletti, "A VLSI systolic adder for digital filtering of delta-modulated signals," *IEEE Trans. Acoustics, Speech and Signal Process.*, vol. 37, no. 5, pp. 749-754, May 1989.

- [45] M. R. Stan, A. F. Tenca, and M. D. Ercegovac, "Long and fast up/down counters," *IEEE Trans. Computers*, vol. 47, no. 7, pp. 722-735, Jul. 1998.
- [46] P. Forshaw and R. Hahn, "Synchronous design: the right technique for digital ASICs," in *IEEE ASIC Seminar and Exhibit*, Rochester, NY, 17-21 Sep. 1990, pp. P6/1.1-P6/1.5.
- [47] K. T. Johnson, A.R. Hurson, and B. Shirazi, "General-purpose systolic arrays," *Computer*, vol. 26, no. 11, pp. 20-31, Nov. 1993.
- [48] J. Bremer, T. Peikert, and W. Mathis, "Analytical inversion-mode varactor modelling based on the EKV model and its application to RF VCO design," in *17th Int. Conf. on Mixed Design of Integ. Circ. and Syst.*, Warsaw, Poland, 24-26 Jun. 2010, pp. 64-69.
- [49] M. Mota and J. Christiansen, "A four channel, self-calibrating, high resolution, time to digital converter," in *IEEE Int. Conf. on Electr., Circ. and Syst.*, vol. 1, Lisboa, Portugal, 7-10 Sep. 1998, pp. 409-412.
- [50] Y.-H. Moon, I.-S. Kong, Y.-S. Ryu, and J.-K. Kang, "A 2.2mW, 20-135MHz, false-lock free DLL for display interface in 0.15µm CMOS," *IEEE Trans. on Circ. and Syst. II: Express Briefs*, vol. PP, no. 99, pp. 1-5, May 2014, doi: 10.1109/TCSII.2014.2327338.
- [51] K. Z. Pekmestzi and N. Thanasouras, "Systolic frequency dividers/counters," *IEEE Trans. Circ. Syst. II: Analog Digit. Signal Process.*, vol. 41, no. 11, pp. 775-776, Nov. 1994.
- [52] L. Perktold and J. Christiansen, "A fine time-resolution (<< 3 ps-rms) time-to-digital converter for highly integrated designs," in *IEEE Int. Instr. Meas. Tech. Conf.*, Minneapolis, MN, 6-9 May 2013, pp. 1092-1097.