# A 510-pW 32-kHz Crystal Oscillator With High Energy-to-Noise-Ratio Pulse Injection

Li Xu<sup>®</sup>, *Graduate Student Member, IEEE*, Taekwang Jang<sup>®</sup>, *Senior Member, IEEE*, Jongyup Lim<sup>®</sup>, *Graduate Student Member, IEEE*, Kyojin David Choo<sup>®</sup>, *Member, IEEE*, David Blaauw<sup>®</sup>, *Fellow, IEEE*, and Dennis Sylvester<sup>®</sup>, *Fellow, IEEE* 

Abstract-This article introduces a 32-kHz crystal oscillator (XO) with high energy-to-noise-ratio pulse injection at subharmonic frequency. A T/4-delay clock slicer is proposed to convert the sinusoidal crystal waveform into an output clock of 32 kHz and to introduce a delay of T/4, providing proper timing for energy injections. The output clock feeds frequency dividers and generates pulses to activate the proposed all-NMOS differential driver at 4 kHz. It enables two injections in eight periods at the peak and valley of the crystal oscillation, with the crystal running freely between injections. This configuration achieves a 2-ppb Allan deviation floor. The less frequent injections reduce the injection overhead, enabling the lowest reported power consumption of published nW XOs (0.51 nW). At 0.45 V, the proposed XO operates across a temperature range of -25 °C to 125 °C, the widest reported range for nW XOs. This design is fabricated in the 40-nm CMOS and occupies 0.02 mm<sup>2</sup>.

Index Terms—Crystal oscillator (XO), long-term stability, pulse injection, subharmonic injection, ultralow power.

## I. INTRODUCTION

CRYSTAL oscillators (XOs) play an important role in ultralow power systems. In these systems, functional blocks such as sensors or communication circuits can consume several $\mu$A to mA. A 32-kHz XO is used to duty cycle these circuits to reduce power, but the XO itself cannot be duty-cycled. This represents the first of three design challenges in achieving ultralow power consumption. Second, although the crystal is stable compared with on-chip passive components in CMOS processes, the drive circuit of an XO must work reliably under process, voltage, and temperature (PVT) variations. For radio applications that use 32-kHz XOs as wake-up timers, the RF blocks are turned on for short periods of time with guard bands to realize synchronization as shown in Fig. 1. Larger variations in the duty-cycled period require a longer guard band, which results in more power overhead [1]. Hence, as the third design challenge,

Manuscript received February 5, 2021; revised April 10, 2021 and May 21, 2021; accepted June 15, 2021. Date of publication July 8, 2021; date of current version January 28, 2022. This article was approved by Associate Editor Danielle Griffith. This work was supported in part by Semiconductor Research Corporation (SRC). (*Corresponding author: Li Xu.*)

Li Xu, Jongyup Lim, Kyojin Choo, David Blaauw, and Dennis Sylvester are with the Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: lxummad@umich.edu).

Taekwang Jang is with the Department of Information Technology and Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2021.3092424.

Digital Object Identifier 10.1109/JSSC.2021.3092424



Fig. 1. Simplified power profile of an ultralow power SoC with 32-kHz XO as wake-up timer.

good long-term frequency stability is required for XOs, which means a low Allan deviation at the duty cycle period is essential.

In ultralow-power XOs, the ideal circuit consumes minimum power while disturbing the oscillation as little as possible. To achieve sub-nW power consumption, there are three fundamental considerations: the loss in the crystal, the efficiency of energy injection, and the power required to extract oscillation frequency and phase as well as drive the injection. Crystal loss is a quadratic function of the oscillation amplitude $(V_{OSC})$ in series resonance and a quadratic function of both $V_{OSC}$ and the load capacitance $(C_{\rm L})$ in parallel resonance, as shown by (4) and (3), respectively, in Section II-B. Recent nW XOs use an amplitude control circuit [2] or lower voltage supply [1], [3]-[8] to reduce $V_{OSC}$, while [3] and [7] have no explicit capacitance on the crystal nodes to reduce $C_{\rm L}$. These techniques greatly reduce the crystal loss such that it no longer dominates the total power [2], [3]. A 0.55-nW 32-kHz XO operating in series resonance was proposed in [2] and [9]. It uses I/Q downconversion and upconversion to preserve the oscillation phase across the crystal to force the crystal into series resonance. A delay-locked loop (DLL) is used to generate I/Q signals from crystal oscillation. Because the delay between the I/Q signals and crystal oscillation does not affect the phase synchronization across the crystal, the phase-detecting circuit in the DLL does not require a fast response, which reduces the power consumption. In that design, the measured peak-to-peak oscillation amplitude at one side of the crystal is about 0.1 V. Because the phase shift across the crystal is close to zero in series resonance, the differential oscillation amplitude across the crystal is reported as 2 mV [9]. The choice of this low oscillation amplitude is important to reduce the crystal loss in series-mode resonance.
However, a low oscillation amplitude makes the oscillation more prone to noise from the driving circuit, which is a design challenge for all types of nW XOs.

As a result, nW XOs typically exhibit a higher Allan deviation floor, which indicates degraded long-term frequency stability.

Shrivastava et al. [6] introduce a design with an inverter-based Pierce structure with a duty-cycling scheme for the driver. Because the inverting amplifier operates in the subthreshold region with a $V_{DD}$ of 0.3 V, the bias current is sensitive to PVT variations, and thus this design requires calibration. An ultralow-voltage 32-kHz XO design operating with only 60 mV was presented in [4]. It uses a Schmitt trigger as the inverting amplifier to compensate for the crystal loss. Although it shows that a Schmitt trigger has much less variability in process corners than an inverter, this design has a limited $V_{DD}$ range of 0.06–0.1 V and was only tested from 5 °C to 62 °C due to the measurement setup. The designs of Siniscalchi et al. [4] and Shrivastava et al. [6], which use a conventional Pierce structure and inverting amplifiers, demonstrate the tradeoff between power and robustness to PVT variations. To avoid this tradeoff and sustain the oscillation despite PVT variations, an XO with a pulsed driver was proposed in [1] and [8]. The driver injects energy into the crystal with pulses only at the peak and valley of the crystal oscillation. During other parts of the oscillation period, the driver is turned off. With this configuration, the driver can be sized properly with margin for PVT variations while avoiding substantial static power consumption. By injecting energy at the peak and valley of the crystal oscillation, the pulsed driver also achieves high efficiency of energy injection, thanks to a small voltage drop from $V_{DD}$ or ground to the crystal during the injections.

One disadvantage of this pulsed driver is that it requires timing control for the pulsed injections to ensure that they happen close to the peak and valley of the crystal oscillation in the presence of PVT variation. Timing control using a DLL [1] or a phase-locking loop (PLL) [5] has been proposed. However, within a nW power budget, even detecting the phase of a sinusoidal crystal oscillation is not trivial. These timing control loops introduce large power and area overhead. In [7], an XO design with a pulsed driver and open-loop timing control was introduced. The injection timing relies on the intrinsic delay of the proposed low-power clock slicer that converts the sinusoidal crystal oscillation into a rail-to-rail output clock. This configuration simplifies the architecture and reduces power at the cost of uncontrolled injection timing with PVT variation. Because the pulsed driver only injects energy at the peak and valley of the crystal oscillation, the pulsed control signals typically require bootstrapping to turn on the driver strongly. These bootstrapping circuits also contribute to switching power overhead.

Fig. 2 presents the power consumption and Allan deviation floor of the nW XOs published from 2012 to 2020. Yoon *et al.* [8] from ISSCC 2012 presented the first 32-kHz XO that consumed less than 10 nW; now the lowest power consumption for an XO is close to 0.5 nW. Considering the three challenges discussed at the beginning of this article, we propose an XO design that consumes only 0.51 nW [10]. It can operate from -25 °C to 125 °C, and the measured Allan deviation floor is 2 ppb at 0.45 V power supply and 25 °C.

The proposed XO design uses frequency-divided (4 kHz), high energy-to-noise-ratio (ENR) injection



Fig. 2. Power and Allan deviation floor of state-of-the-art nW XOs.

oscillation (HERO). It injects high energy in short pulses at 4 kHz around the peak and valley of the crystal oscillation. By allowing the crystal to run freely for a longer period of time between injections, HERO achieves a 2-ppb Allan deviation floor, which is the lowest reported among the state-of-the-art nW XOs. The design with the second-best reported Allan deviation floor [4] consumes 2.26 nW. Furthermore, the less frequent injections in HERO significantly reduce the injection overhead, enabling the lowest reported power consumption (0.51 nW) to the best of the authors' knowledge. An integrated phase extraction and delay circuit achieves accurate injection alignment, resulting in stable operation from -25 °C to 125 °C, the widest reported temperature range among nW XOs.

This article is organized as follows. Section II discusses the design considerations required to achieve ultralow power including the choice of crystal and resonance mode. Section III introduces the proposed architecture and circuit implementations. The measurement results including power consumption, frequency variations in PVT, Allan deviation, and reliability tests are presented in Section IV. Finally, Section V summarizes the conclusions and discusses future work.

## **II. PROPOSED LOW-POWER TECHNIQUES**

In this section, we analyze which elements determine the lowest possible achievable power consumption of a 32-kHz XO. In other words, if we have an ideal circuit driving a real crystal, what are the fundamental requirements or functionality of this circuit to provide a 32-kHz output? First, it must extract



Fig. 3. Resonance modes of crystals: "parallel resonance" and "series resonance"; waveforms at  $V_1$  and  $V_2$  in these two modes and oscillation amplitudes across the crystal,  $V_{OSC}$ .

frequency and phase from the crystal waveform; second, it must inject energy into the crystal to compensate for the loss in the crystal; and third, it requires timing control so energy is injected at the right time. These requirements lead to the corresponding fundamental power consumptions: power for extraction, power for crystal loss, and power for timing. For example, in a conventional Pierce XO with an inverter, the inverter and the load capacitances must be carefully sized to meet the three requirements above. The inverter can convert the sinusoidal waveform from the crystal into the square wave clock output at the cost of short-circuit current. With resistor and load capacitors, the inverter works as a continuous amplifier that must satisfy the gain and phase to sustain the oscillation of the crystal [11]. However, even with an ideal circuit that can perfectly accomplish phase extraction, energy injection, and timing control, energy is still required to compensate for the loss in the crystal. Hence, the loss in the crystal determines the power limit.

### A. Choice of Crystal

The parameters of a crystal can determine the crystal loss. Typically, we use an RLC circuit to model the crystal as shown in Fig. 3. For simplicity, dc-biasing resistors are not shown. Because it has non-zero motional resistance  $R_S$ , the crystal has a quality factor Q, the ratio between stored and dissipated energy in one cycle [12], on the order of tens of thousands

$$Q = 2\pi \frac{E_{\text{Stored}}}{E_{\text{Loss},\text{T}}} = 2\pi \frac{0.5L_{\text{S}}I_{R_{\text{S}}}^2}{0.5I_{R_{\text{S}}}^2 R_{\text{S}}T_{\text{XO}}} = \frac{\omega_{\text{OSC}}L_{\text{S}}}{R_{\text{S}}}.$$
 (1)

$R_{\rm S}$ and Q determine both the power and noise performance of an XO. A higher Q means a smaller phase error of oscillation in the presence of a disturbance. When we choose a 32-kHz crystal for a nW XO, a crystal with a high Q is desired for low loss and good noise performance. As we will discuss in Section II-B, for an XO operating in "parallel resonance," a crystal with a low $R_{\rm S}$ is preferred for less crystal loss. For an XO operating in "series resonance," a crystal with a higher $R_{\rm S}$ is preferred for less crystal loss. Hence, the power and noise performance also depend on how the crystal is resonated.



Fig. 4. Simplified crystal model in parallel resonance for calculation of crystal loss.

### B. Resonance Mode

In "parallel resonance," inductor  $L_S$  resonates with  $C_S$  in series with load capacitance  $C_L$ , which is a combination of  $C_{\rm O}$  and  $C_{\rm P}$ . There is a phase shift of 180° across the crystal. One important feature of the parallel resonance mode is that if the driver is turned off, the crystal will stay in parallel resonance and continue to oscillate as the amplitude decays. In "series resonance," the inductor resonates with  $C_{\rm S}$  only. It requires a driver to maintain zero phase shift across the crystal. In contrast to parallel resonance, the driver must always be on to maintain series resonance. If the driver is turned off, the inductor current would instantaneously cause phase difference between  $V_1$  and  $V_2$  by pulling up one of  $V_1$ and  $V_2$  while pulling down the other one. So, once the driver for series resonance is turned off, the crystal would switch to parallel resonance. Fig. 3 shows the voltage waveform at  $V_1$ and  $V_2$  in these two modes. The oscillation amplitude across the crystal is  $V_{OSC}$ . Assuming a certain  $V_{OSC}$ , we can calculate the crystal loss in these two modes.

In parallel resonance, because  $V_1$  and  $V_2$  are 180° out of phase to each other, the crystal model can be simplified as shown in Fig. 4.  $C_L$  is the total load capacitance, and  $C_L = C_O + 0.5C_P$ . Because  $C_L$  and  $C_S$  form a capacitive voltage divider, and the oscillation amplitude across  $C_L$  is defined as  $V_{OSC}$  as shown in Fig. 3, the oscillation amplitude across  $R_S$ and  $L_S$  can be obtained as

$$V_{\text{OSC,INT}} = V_{\text{OSC}} \cdot \frac{C_{\text{S}} + C_{\text{L}}}{C_{\text{S}}}.$$
 (2)

Then, the crystal loss in parallel resonance can be obtained by calculating the power dissipated on  $R_S$  [1]

$$P_{\text{Loss},\text{PR}} \approx 0.5 R_{\text{S}} \cdot (V_{\text{OSC}} \cdot C_{\text{L}} \cdot \omega_{\text{S}})^2$$
(3)

where  $\omega_{\rm S}$  is the resonance frequency of  $L_{\rm S}$  and  $C_{\rm S}$ ,  $1/\sqrt{(L_{\rm S}C_{\rm S})}$ . Hence, the crystal loss in parallel resonance is proportional to  $R_{\rm S}$  and a quadratic function of both  $V_{\rm OSC}$  and  $C_{\rm L}$ .

In series resonance, because  $L_S$  resonates with  $C_S$ ,  $V_{OSC}$  is across  $R_S$ , which makes the crystal loss a quadratic function of  $V_{OSC}$  and inversely proportional to  $R_S$ :

$$P_{\rm Loss,SR} = P_{R_{\rm S}} = \frac{V_{\rm OSC,R_{\rm S}}^2}{2R_{\rm S}} = \frac{V_{\rm OSC}^2}{2R_{\rm S}}.$$
 (4)

From (3), to reduce the crystal loss in parallel resonance, we should choose a crystal with a small  $R_{\rm S}$  and reduce the oscillation amplitude,  $V_{\rm OSC}$ , and load capacitance,  $C_{\rm L}$ . For example, for a 32-kHz crystal model with  $R_{\rm S} = 50$  k $\Omega$ ,



Fig. 5. Circuit injects both energy and noise into the crystal.



Fig. 6. Proposed high ENR injection compared with continuous injection and pulsed injection.

 $L_{\rm S} = 17$  kH,  $C_{\rm S} = 1.39$  fF,  $C_{\rm O} = 1.35$  pF, and Q = 70000, assuming 100-mV  $V_{\rm OSC}$  and 1-pF  $C_{\rm P}$  (all the calculations in Section II use this crystal model), the crystal loss in parallel resonance is 36 pW. XOs in series resonance can generate about 100-mV<sub>peak-peak</sub> single-ended oscillation amplitude while keeping  $V_{\rm OSC} = 2$  mV across the crystal [9]. With the crystal model above and  $V_{\rm OSC} = 2$  mV, the crystal loss in series resonance is calculated as 40 pW using (4). The crystal loss of 36 pW in parallel resonance and 40 pW in series resonance sets the fundamental limit on the lowest possible power consumption under the assumptions of crystal parameters,  $V_{\rm OSC}$ , and  $C_{\rm P}$ .
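As a rough numerical check of (1), (3), and (4), the following Python sketch plugs in the crystal model quoted above. The function and variable names are illustrative assumptions, not part of the original design, and $\omega_{\rm S}$ is taken as $1/\sqrt{L_{\rm S}C_{\rm S}}$ as defined in the text.

```python
import math

# Crystal model quoted in Section II-B
R_S = 50e3       # motional resistance, ohms
L_S = 17e3       # motional inductance, henries
C_S = 1.39e-15   # motional capacitance, farads
C_O = 1.35e-12   # shunt capacitance, farads
C_P = 1.0e-12    # capacitance on each crystal node, farads


def quality_factor(l_s, c_s, r_s):
    """Eq. (1): Q = omega_S * L_S / R_S, with omega_S = 1/sqrt(L_S * C_S)."""
    omega_s = 1.0 / math.sqrt(l_s * c_s)
    return omega_s * l_s / r_s


def loss_parallel(v_osc, r_s, c_l, omega_s):
    """Eq. (3): crystal loss in parallel resonance, in watts."""
    return 0.5 * r_s * (v_osc * c_l * omega_s) ** 2


def loss_series(v_osc, r_s):
    """Eq. (4): crystal loss in series resonance, in watts."""
    return v_osc ** 2 / (2.0 * r_s)


omega_s = 1.0 / math.sqrt(L_S * C_S)
c_l = C_O + 0.5 * C_P                # total load capacitance from the text

q = quality_factor(L_S, C_S, R_S)
p_pr = loss_parallel(0.1, R_S, c_l, omega_s)   # 100-mV V_OSC
p_sr = loss_series(2e-3, R_S)                  # 2-mV V_OSC

print(f"Q = {q:.0f}")
print(f"parallel-resonance loss = {p_pr * 1e12:.1f} pW")
print(f"series-resonance loss   = {p_sr * 1e12:.1f} pW")
```

Under these assumptions the sketch should land close to the 36-pW and 40-pW figures quoted in the text, with Q near 70000.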

### C. How and When to Inject Energy Into Crystal

This fundamental limit on power consumption can only be achieved with an ideal XO circuit. For a realistic XO circuit, there are four fundamental challenges to achieving nW power levels: first, the power required to convert the sine wave from the crystal into a square-wave clock; second, the efficiency of energy injection; third, the power of the timing control for energy injection; and finally, the noise of the circuit itself, which is injected into the crystal along with the energy and disturbs the inherent oscillation. The first challenge relates to the power required for observation, while the other three challenges relate to injection. The injection involves "Energy" (*E*), "Noise of circuit" ($N_C$), and "Noise injected into crystal" ($N_{INJ}$) as shown in Fig. 5. Thus, the question of how and when to inject energy is central to the design of ultralow-power XO circuits.

As shown in Fig. 6, a conventional Pierce XO and series mode XOs continuously inject energy into the crystal. Because



Fig. 7. (a) Model of differential injection after the oscillation is started, (b) equivalent capacitance seen by one side of the differential driver during a differential injection, and (c) attenuation of the oscillation amplitude w/ and w/o pulsed injections.

they use Class A operation, the injection efficiency is low, and the noise from the circuit is continuously injected into the crystal. In addition, referring to the theory of phase noise [13], at the peak and the valley of the waveform, the oscillation phase has the minimum sensitivity to amplitude change or noise. These observations led to the pulsed driver design of the XO proposed in [1]. The injection efficiency can be close to 100% in theory. Because the injections happen at the peak and valley, the phase error of the crystal oscillation due to injections is minimized. Interestingly, we found that because the driver for injection is duty-cycled, the circuit noise is also sampled, reducing the noise injected into the crystal. Assuming the low-frequency noise from the driver and power supply of the driver is  $N_C$  as shown in Fig. 6, the averaged noise injected into the crystal related to  $N_C$  is

$$\bar{N}_{\rm INJ} = D \cdot \bar{N}_{\rm C} \tag{5}$$

$$D = \frac{2}{n} \times \frac{T_{\text{pulse}}}{T_{\text{XO}}} \tag{6}$$

where $T_{\text{pulse}}$ is the pulsewidth of the injections, $T_{\text{XO}}$ is the period of crystal oscillation, and 2/n means there are two injections in *n* periods (n = 1 for the pulse injection in Fig. 6). Since pulsed injections happen at every peak and valley and the control signals for drivers must be bootstrapped, the switching power related to the control signals to activate the injections is high. The motivation of the proposed HERO in Fig. 6 is based on these two observations: duty-cycling the driver samples the circuit noise, and injecting in every cycle incurs high switching power. If energy is injected at a lower frequency than the oscillation frequency, both lower power and better noise performance could be achieved. However, two questions must be answered before the proposed concept can be considered feasible: 1) is it possible to inject energy at a lower frequency than the oscillation frequency? and 2) how strong must these subharmonic pulsed injections be to compensate for the crystal loss?

To investigate these two questions, we use the model of differential pulsed driver shown in Fig. 7(a). Fig. 7(b) presents the equivalent capacitance seen by one side of the differential driver during a differential injection. All the following analyses in Section II assume ideal driver. Once crystal startup is achieved, and assuming  $V_{OSC} = 100$  mV, if we turn off the drivers and let the crystal run freely, we can calculate the amplitude attenuation. First, with the simplified model in Fig. 4, the energy stored in the crystal can be calculated as

$$\begin{aligned}
E_{\text{Stored}} &= 0.5 \times (C_{\text{S}} || C_{\text{L}}) \cdot V_{\text{OSC,INT}}^2 \\
&\approx 0.5 C_{\text{S}} \cdot V_{\text{OSC,INT}}^2.
\end{aligned} \tag{7}$$

Then, the amplitude attenuation of  $V_{\text{OSC,INT}}$  in Fig. 4 can be obtained

$$\begin{aligned}
\Delta V_{\text{OSC,INT}} &\approx V_{\text{OSC,INT}} - \sqrt{\frac{2 \times \left(E_{\text{Stored}} - E_{\text{Loss,T}}\right)}{C_{\text{S}}}} \\
&= V_{\text{OSC,INT}} - \sqrt{\frac{2E_{\text{Stored}} \cdot \left(1 - \frac{2\pi}{Q}\right)}{C_{\text{S}}}} \\
&= V_{\text{OSC,INT}} \cdot \left(1 - \sqrt{1 - \frac{2\pi}{Q}}\right) \approx V_{\text{OSC,INT}} \cdot \frac{\pi}{Q}.
\end{aligned} \tag{8}$$

Finally, because  $V_{\text{OSC}}$  has the same attenuation ratio as  $V_{\text{OSC,INT}}$ , the attenuation of  $V_{\text{OSC}}$  after one cycle due to the crystal loss can be estimated as

$$\Delta V_{\rm OSC} \approx V_{\rm OSC} \cdot \frac{\pi}{Q}.$$
(9)
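To get a feel for the size of this attenuation, the sketch below evaluates (9) alongside the unapproximated square-root form from (8), using Q = 70000 and $V_{\rm OSC} = 100$ mV from the crystal model of Section II-B. The helper names, and the free-running decay helper in particular, are illustrative assumptions; the decay helper simply applies the per-cycle energy loss factor $(1 - 2\pi/Q)$ repeatedly.

```python
import math

Q = 70000.0    # quality factor of the crystal model
V_OSC = 0.1    # oscillation amplitude, volts


def attenuation_exact(v_osc, q):
    """Per-cycle amplitude drop before the final approximation in (8)."""
    return v_osc * (1.0 - math.sqrt(1.0 - 2.0 * math.pi / q))


def attenuation_approx(v_osc, q):
    """Eq. (9): per-cycle amplitude drop, valid when Q >> 2*pi."""
    return v_osc * math.pi / q


def amplitude_after(v_osc, q, n_cycles):
    """Amplitude after n free-running cycles (hypothetical helper).

    Energy scales by (1 - 2*pi/Q) per cycle, so amplitude scales by its
    square root per cycle.
    """
    return v_osc * (1.0 - 2.0 * math.pi / q) ** (n_cycles / 2.0)


dv_exact = attenuation_exact(V_OSC, Q)
dv_approx = attenuation_approx(V_OSC, Q)
drop_8 = V_OSC - amplitude_after(V_OSC, Q, 8)

print(f"per-cycle drop (exact)  = {dv_exact * 1e6:.3f} uV")
print(f"per-cycle drop (approx) = {dv_approx * 1e6:.3f} uV")
print(f"drop after 8 cycles     = {drop_8 * 1e6:.1f} uV")
```

Both forms give a drop of a few microvolts per cycle, so the approximation in (9) costs essentially nothing at this Q.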

With  $V_{OSC} = 100$  mV, after one cycle, the oscillation amplitude attenuation,  $\Delta V_{OSC}$ , is only 4.5  $\mu V$  because of the high Q of the crystal as shown in Fig. 7(c). Hence, the XO circuit does not need to inject energy every cycle; it can continue to extract the clock even if the drivers are turned off for a period of time. Now we would like to derive the relationship between the injection step,  $V_{INJ}$ , and the injected energy. In XOs with pulsed driver, the energy is injected into the capacitive network formed by  $C_{\rm O}$  and  $C_{\rm P}$  through short pulses, and then part of this energy would move to the crystal through the inductor current. This perspective provides a way to calculate the energy injected into the crystal by checking the stored energy in the capacitive network before and after the injections [1]. Yoon et al. [1] show the derivation of injected energy and injection step in the case of single-side injection. The energy removed from the capacitive network due to the pull-down pulse can be calculated as

$$E_{\text{Down,S}} = 0.5C_{\text{Network,S}} \cdot V_{\text{INJ,S}}^2$$
(10)

where $C_{\text{Network,S}}$ is the equivalent capacitance seen by the single-side driver during a single-sided injection and $C_{\text{Network,S}} = C_{\text{P}} + (C_{\text{O}} || C_{\text{P}})$. The energy added to $C_{\text{Network,S}}$ because of the pull-up pulse can be calculated as

$$E_{\rm Up,S} = 0.5C_{\rm Network,S} \cdot V_{\rm DD}^2 - 0.5C_{\rm Network,S} \cdot (V_{\rm DD} - V_{\rm INJ,S})^2 = 0.5C_{\rm Network,S} \cdot V_{\rm INJ,S} \cdot (2V_{\rm DD} - V_{\rm INJ,S}).$$
(11)

Then, the energy added to the capacitive network after one pull-down pulse and one pull-up pulse can be calculated as

$$\begin{aligned}
E_{\rm INJ,S} &= E_{\rm Up,S} - E_{\rm Down,S} \\
&= C_{\rm Network,S} \cdot V_{\rm INJ,S} \cdot (V_{\rm DD} - V_{\rm INJ,S}) \\
&\approx C_{\rm Network,S} \cdot V_{\rm INJ,S} \cdot V_{\rm OSC}.
\end{aligned} \tag{12}$$

In the differential driver, when $V_1$ or $V_2$ is being pulled up, the other side of the crystal is being pulled down, so $C_{\text{Network}} = C_{\text{O}} + C_{\text{P}}$ can be obtained by considering the other side of the crystal as ac ground, as shown in Fig. 7(b). The energy injected after two injections (one pull-up and one pull-down) can be estimated as

$$\begin{aligned}
E_{\rm INJ,Diff} &= 2C_{\rm Network} \cdot (V_{\rm DDL} - V_{\rm INJ}) \cdot V_{\rm INJ} \\
&\approx 2 \times (C_{\rm O} + C_{\rm P}) \cdot V_{\rm OSC} \cdot V_{\rm INJ}.
\end{aligned} \tag{13}$$

The energy dissipated in the crystal during one period is

$$E_{\text{Loss},\text{T}} = P_{\text{Loss},\text{PR}} \cdot T_{\text{XO}} \approx 0.5 R_{\text{S}} \cdot (V_{\text{OSC}} \cdot C_{\text{L}} \cdot \omega_{\text{S}})^2 \cdot T_{\text{XO}}.$$
 (14)

The required injection step to compensate for the crystal loss over *n* periods can be obtained using (13) and (14)

$$E_{\text{INJ,Diff}} = n \cdot E_{\text{Loss,T}} \tag{15}$$

$$\begin{aligned}
V_{\text{INJ}} &\approx \frac{0.5n \cdot R_{\text{S}} \cdot (V_{\text{OSC}} \cdot C_{\text{L}} \cdot \omega_{\text{S}})^2 \cdot T_{\text{XO}}}{2 \times (C_{\text{O}} + C_{\text{P}}) \cdot V_{\text{OSC}}} \\
&\approx \frac{0.5n \cdot R_{\text{S}} \cdot V_{\text{OSC}} \cdot C_{\text{L}}^2 \cdot \omega_{\text{S}} \cdot (2\pi f_{\text{XO}})}{2 \times (C_{\text{O}} + C_{\text{P}}) \cdot f_{\text{XO}}} \\
&\approx \frac{0.5\pi \cdot n \cdot R_{\text{S}} \cdot V_{\text{OSC}} \cdot C_{\text{L}}^2 \cdot \omega_{\text{S}}}{(C_{\text{O}} + C_{\text{P}})} \\
&= \frac{0.5\pi \cdot n \cdot V_{\text{OSC}} \cdot C_{\text{L}}^2}{(C_{\text{O}} + C_{\text{P}}) \cdot Q \cdot C_{\text{S}}}.
\end{aligned} \tag{16}$$

With the crystal model (Section II-B) and 100-mV oscillation amplitude, an injection step  $V_{\text{INJ}}$  of 2 mV is required to compensate the crystal loss in one period (n = 1). Since the energy injected into the crystal is proportional to  $V_{\text{INJ}}$  (13), if we want to compensate for the loss in eight cycles, we will need a  $V_{\text{INJ}}$  of about 16 mV (n = 8). This represents a reasonably small magnitude compared with the 100-mV oscillation amplitude. Hence, it is feasible to perform injections at a much lower frequency than 32 kHz. This analysis leads to our proposed concept.
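The injection step in (16) can be evaluated directly. The sketch below is a minimal illustration using the crystal model of Section II-B and $C_{\rm L} = C_{\rm O} + 0.5C_{\rm P}$; the function name `injection_step` is an assumption made for this example. Without rounding, the last line of (16) gives roughly 2.35 mV per period, which matches the 0.0235 coefficient that appears later in (22); the values quoted in the text are rounded.

```python
import math

# Crystal model quoted in Section II-B
C_S = 1.39e-15   # motional capacitance, farads
C_O = 1.35e-12   # shunt capacitance, farads
C_P = 1.0e-12    # capacitance on each crystal node, farads
Q = 70000.0      # quality factor
C_L = C_O + 0.5 * C_P


def injection_step(v_osc, n):
    """Eq. (16), last line: step needed to offset the loss of n periods."""
    return 0.5 * math.pi * n * v_osc * C_L ** 2 / ((C_O + C_P) * Q * C_S)


v_inj_1 = injection_step(0.1, 1)
v_inj_8 = injection_step(0.1, 8)

for n, v in ((1, v_inj_1), (8, v_inj_8)):
    print(f"n = {n}: V_INJ = {v * 1e3:.2f} mV")
```

Because (16) is linear in *n*, the n = 8 step is exactly eight times the n = 1 step, and it remains small compared with the 100-mV oscillation amplitude.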

Instead of continuous injection or pulsed injection, we propose HERO with two injections in eight periods. With this configuration, the switching loss on average is reduced and less noise is injected into the crystal on average (n = 8 in (6)). In addition, because the injections happen around the peak and valley of the oscillation, the effect of injection and noise on the phase of crystal oscillation is minimized. Here, we discuss how the pulsed injection with energy and noise disturbs the crystal oscillation; we have not yet considered the implementation of the circuit that converts this crystal oscillation waveform into a rail-to-rail clock. We will continue this discussion in Section IV-C.

A tradeoff here is that it requires a bigger injection step to inject more energy, which reduces the injection efficiency.



Fig. 8. Injection efficiency for two injections across a varying number of periods and the corresponding power from  $V_{DDL}$  ( $V_{OSC} = 100 \text{ mV}$ ).

The injection efficiency can be estimated as [1]

$$\zeta = \frac{E_{\rm INJ}}{E_{V_{\rm DDL}}} \approx \frac{V_{\rm DDL} - V_{\rm INJ}}{V_{\rm DDL}} = \frac{V_{\rm OSC}}{V_{\rm DDL}}$$
(17)

where $V_{\text{DDL}} = V_{\text{OSC}} + V_{\text{INJ}}$. Referring to (16), for a given target $V_{\text{OSC}}$, $V_{\text{INJ}}$ increases linearly as *n* increases. With (3) and (17), the power from $V_{\text{DDL}}$ can be calculated as

$$P_{V_{\text{DDL}}} = \frac{P_{\text{INJ}}}{\zeta} = \frac{P_{\text{Loss,PR}}}{\zeta} = \frac{0.5R_{\text{S}} \cdot (V_{\text{OSC}} \cdot C_{\text{L}} \cdot \omega_{\text{S}})^2}{\zeta}.$$
 (18)

As shown in Fig. 8, assuming $V_{OSC} = 100$ mV, the injection efficiency in theory drops from 97.7% to 84.2% when we do two injections in eight periods instead of two injections in each oscillation period. Meanwhile, the estimated power from $V_{DDL}$ to compensate for the crystal loss increases from 37 to 43 pW. Hence, because the crystal loss is only tens of pW, the power overhead due to the lower injection efficiency of two injections in eight periods is small.
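The efficiency and supply-power figures above follow from (16), (17), and (18). The sketch below reproduces them under the crystal model of Section II-B, with $\omega_{\rm S} = 1/\sqrt{L_{\rm S}C_{\rm S}}$; all function names are illustrative assumptions rather than part of the original work.

```python
import math

# Crystal model quoted in Section II-B
R_S, L_S, C_S = 50e3, 17e3, 1.39e-15
C_O, C_P = 1.35e-12, 1.0e-12
Q = 70000.0
C_L = C_O + 0.5 * C_P
OMEGA_S = 1.0 / math.sqrt(L_S * C_S)


def injection_step(v_osc, n):
    """Eq. (16): injection step for two injections in n periods."""
    return 0.5 * math.pi * n * v_osc * C_L ** 2 / ((C_O + C_P) * Q * C_S)


def efficiency(v_osc, n):
    """Eq. (17): zeta = V_OSC / V_DDL, with V_DDL = V_OSC + V_INJ."""
    v_ddl = v_osc + injection_step(v_osc, n)
    return v_osc / v_ddl


def supply_power(v_osc, n):
    """Eq. (18): power drawn from V_DDL to offset the crystal loss."""
    p_loss = 0.5 * R_S * (v_osc * C_L * OMEGA_S) ** 2   # Eq. (3)
    return p_loss / efficiency(v_osc, n)


eff_1, eff_8 = efficiency(0.1, 1), efficiency(0.1, 8)
p_1, p_8 = supply_power(0.1, 1), supply_power(0.1, 8)

for n, eff, p in ((1, eff_1, p_1), (8, eff_8, p_8)):
    print(f"n = {n}: efficiency = {eff * 100:.1f} %, P_VDDL = {p * 1e12:.1f} pW")
```

The absolute overhead stays at a few pW because the crystal loss itself is only tens of pW.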

To analyze the noise performance of the proposed design, we introduce ENR and it can be calculated using (5) and (6) as

$$\text{ENR} = \frac{\bar{E}_{\text{INJ}}}{\bar{N}_{\text{INJ}}} = \frac{\bar{E}_{\text{INJ}}}{D \cdot \bar{N}_{\text{C}}} = \frac{n}{2} \times \frac{T_{\text{XO}}}{T_{\text{pulse}}} \cdot \frac{\bar{E}_{\text{INJ}}}{\bar{N}_{\text{C}}}.$$
 (19)

Both ENR and phase noise express the ratio between injected noise and power of the oscillation signal. Phase noise is a function of injected noise and power of the carrier [13]. For the crystal oscillation, the injected energy is proportional to the power of the carrier. When ENR increases, the ratio between the injected noise and power of the carrier decreases, which leads to better frequency stability or lower phase noise. To achieve the same oscillation amplitude $V_{OSC}$, the average injected energy in the conventional pulsed XO is the same as it is in the proposed HERO. With the same pulsewidth, the ENR in the proposed HERO (n = 8) will be eight times the ENR in the conventional pulsed XO, as shown in Fig. 6. We assume a constant $T_{\text{pulse}}$ for the varying *n* because there would be a fundamental limit on the minimum achievable pulsewidth for a given CMOS process. $N_{\rm C}$ reflects the low-frequency noise from the circuit, including noise in the circuit due to power supplies and environment, which should be independent of *n*, so we assume a constant $N_{\rm C}$ for the varying *n* in the analyses of ENR.
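The eightfold ENR improvement at fixed $V_{OSC}$ follows directly from (5), (6), and (19). The sketch below illustrates the scaling; the pulsewidth of 100 ns and the unit values for average injected energy and circuit noise are hypothetical placeholders chosen only to show the ratio, not values from the design.

```python
T_XO = 1.0 / 32768.0   # crystal oscillation period, seconds
T_PULSE = 100e-9       # hypothetical pulsewidth, seconds (assumption)
E_INJ_AVG = 1.0        # average injected energy, arbitrary units (assumption)
N_C = 1.0              # low-frequency circuit noise, arbitrary units (assumption)


def duty_factor(n, t_pulse, t_xo):
    """Eq. (6): fraction of time the driver is on, two injections in n periods."""
    return (2.0 / n) * (t_pulse / t_xo)


def injected_noise(n, t_pulse, t_xo, n_c):
    """Eq. (5): average noise injected into the crystal."""
    return duty_factor(n, t_pulse, t_xo) * n_c


def enr(n, t_pulse, t_xo, e_inj_avg, n_c):
    """Eq. (19): energy-to-noise ratio."""
    return e_inj_avg / injected_noise(n, t_pulse, t_xo, n_c)


enr_pulsed = enr(1, T_PULSE, T_XO, E_INJ_AVG, N_C)   # injection every period
enr_hero = enr(8, T_PULSE, T_XO, E_INJ_AVG, N_C)     # two injections in eight periods
ratio = enr_hero / enr_pulsed

print(f"ENR(n=8) / ENR(n=1) = {ratio:.1f}")
```

The ratio does not depend on the placeholder values, since the pulsewidth, injected energy, and circuit noise cancel when they are held constant across *n*.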

In the above analyses, we assume an oscillation amplitude to determine the crystal loss or injected energy on average. In a real implementation, if we fix the power supply voltage for the driver [$V_{DDL}$ in Fig. 7(a)], $V_{OSC}$ decreases as we reduce the injection rate because $V_{OSC} = V_{DDL} - V_{INJ}$ and a larger $V_{INJ}$ is required to compensate for the crystal loss. From (16), we can obtain

$$V_{\rm DDL} = V_{\rm OSC} + V_{\rm INJ} = V_{\rm OSC} + \frac{0.5\pi \cdot n \cdot V_{\rm OSC} \cdot C_{\rm L}^2}{(C_{\rm O} + C_{\rm P}) \cdot Q \cdot C_{\rm S}}.$$
 (20)

Then,  $V_{\text{OSC}}$  can be calculated as a function of n and  $V_{\text{DDL}}$ 

$$V_{\text{OSC}} = \frac{1}{1 + \frac{0.5\pi \cdot C_{\text{L}}^2}{(C_{\text{O}} + C_{\text{P}}) \cdot Q \cdot C_{\text{S}}} \cdot n} \cdot V_{\text{DDL}}.$$
 (21)

With the crystal model in Section II-B, we can estimate  $V_{\text{OSC}}$  as

$$V_{\rm OSC} \approx \frac{1}{1 + 0.0235 \times n} \cdot V_{\rm DDL}.$$
 (22)

Using (14), (19), and (21), we can obtain ENR as a function of n

$$\begin{aligned}
\operatorname{ENR} &= \frac{n}{2} \times \frac{T_{\mathrm{XO}}}{T_{\mathrm{pulse}}} \cdot \frac{E_{\mathrm{INJ}}}{\bar{N}_{\mathrm{C}}} = \frac{n}{2} \times \frac{T_{\mathrm{XO}}}{T_{\mathrm{pulse}}} \cdot \frac{E_{\mathrm{Loss},\mathrm{T}}}{\bar{N}_{\mathrm{C}}} \\
&\approx \frac{n}{2} \times \frac{T_{\mathrm{XO}}}{T_{\mathrm{pulse}}} \cdot \frac{0.5R_{\mathrm{S}} \cdot (V_{\mathrm{OSC}} \cdot C_{\mathrm{L}} \cdot \omega_{\mathrm{S}})^{2} \cdot T_{\mathrm{XO}}}{\bar{N}_{\mathrm{C}}} \\
&\approx n \cdot \frac{T_{\mathrm{XO}}}{T_{\mathrm{pulse}}} \cdot \frac{\pi^{2}R_{\mathrm{S}} \cdot C_{\mathrm{L}}^{2}}{T_{\mathrm{XO}} \cdot \bar{N}_{\mathrm{C}}} \cdot V_{\mathrm{OSC}}^{2} \\
&= \frac{\pi^{2}R_{\mathrm{S}} \cdot C_{\mathrm{L}}^{2} \cdot V_{\mathrm{DDL}}^{2}}{T_{\mathrm{pulse}} \cdot \bar{N}_{\mathrm{C}}} \cdot \frac{n}{\left(1 + \frac{0.5\pi \cdot C_{\mathrm{L}}^{2}}{(C_{\mathrm{O}} + C_{\mathrm{P}}) \cdot Q \cdot C_{\mathrm{S}}} \cdot n\right)^{2}}.
\end{aligned} \tag{23}$$

Referring to (21)–(23), with a fixed $V_{\text{DDL}}$, the oscillation amplitude decreases as *n* increases because a larger injection step, $V_{INJ} = V_{DDL} - V_{OSC}$, is required. The injected energy at steady state is equal to the crystal loss, which is a quadratic function of $V_{OSC}$. This means that the injected energy decreases as *n* increases when $V_{DDL}$ is fixed, whereas it does not change with *n* if we fix $V_{OSC}$ by raising $V_{\text{DDL}} = V_{\text{OSC}} + V_{\text{INJ}}$ as *n* increases. With the same $V_{\text{OSC}}$ at n = 1 and the same amount of noise at a given *n*, the ENR with fixed $V_{DDL}$ is lower than the ENR with fixed $V_{\text{OSC}}$ for n > 1. Using (23), we plot ENR (solid line) as a function of *n* in Fig. 9, normalized to the ENR at n = 1. The normalized ENR based on (19) with a fixed $V_{OSC}$ is also plotted as a dashed line in Fig. 9, which shows how a fixed $V_{\text{DDL}}$ limits the ENR improvement as *n* increases. Referring to (22), the ratio between $V_{OSC}$ and $V_{DDL}$ is also shown in Fig. 9. Compared with Fig. 8, which assumes $V_{\text{OSC}} = 100$ mV, Fig. 9 shows how $V_{OSC}$ shrinks as the injection rate decreases when $V_{\text{DDL}}$ is fixed.
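The ENR penalty of a fixed $V_{DDL}$ relative to a fixed $V_{OSC}$ can be sketched numerically. Because both cases share the same $V_{OSC}$ at n = 1 and the same noise at a given n, their ratio reduces to $(V_{OSC}(n)/V_{OSC}(1))^2$ from (22). The second function additionally assumes, for illustration only, that $\bar{N}_{\mathrm{C}}$ and $T_{\mathrm{pulse}}$ do not vary with n; that assumption and all names below are not taken from the design.

```python
K_INJ = 0.0235  # coefficient from (22) for the crystal model of Section II-B


def enr_penalty(n: float) -> float:
    """ENR(fixed V_DDL) / ENR(fixed V_OSC) at the same n.

    Equal V_OSC at n = 1 and equal noise at a given n are assumed, so the
    ratio is (V_OSC(n) / V_OSC(1))**2 with V_OSC taken from (22).
    """
    return ((1.0 + K_INJ) / (1.0 + K_INJ * n)) ** 2


def normalized_enr_fixed_vddl(n: float) -> float:
    """(23) normalized to n = 1.

    ASSUMPTION: the noise term and T_pulse are treated as independent of n,
    so only the n / (1 + K*n)^2 factor of (23) remains.
    """
    return n * enr_penalty(n)


for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"n={n:3d}  penalty={enr_penalty(n):.3f}  "
          f"normalized ENR (fixed V_DDL)={normalized_enr_fixed_vddl(n):.2f}")
```

Under these assumptions, the factor $n/(1 + 0.0235n)^2$ stops growing near $n \approx 1/0.0235 \approx 43$, which is consistent with the ENR improvement being limited when $V_{DDL}$ is fixed.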

Figs. 8 and 9 present the tradeoff between the injection efficiency $(V_{OSC}/V_{DDL})$ and ENR as we change the injection rate (or *n*). With a fixed $V_{DDL}$ and the assumed crystal model, n = 8, 16, or 32 provides a balance between the injection efficiency and ENR. We have not yet discussed the switching power needed to activate the driver, which changes with n and can be a large portion of the total power in a real implementation. The choice of n = 8, i.e., injection at the eighth subharmonic frequency as shown in Fig. 6, is therefore discussed in Section III-E after we introduce the implementation of the proposed concept.

Fig. 9. With fixed $V_{\text{DDL}}$, normalized ENR and $V_{\text{OSC}}/V_{\text{DDL}}$ versus injection rate. Dashed line is the normalized ENR with fixed $V_{\text{OSC}}$.

In summary, pulsed injections at the peak and valley of the oscillation waveform provide high injection efficiency compared with the continuous or Class A operation in conventional XOs. The proposed HERO, with pulsed injections at a subharmonic frequency of the oscillation, reduces the switching loss needed to activate the driver and achieves better noise performance, thanks to its high ENR.

We discuss the implementation of the proposed concept in Section III.

#### **III. ARCHITECTURE OF THE DESIGN**

Fig. 10 presents the signal waveforms and architecture of the proposed HERO. To extract the frequency and phase from the crystal waveform, we propose a T/4-delay slicer. It not only converts the sinusoidal crystal waveform,  $V_1$ , into an output clock of 32 kHz but also introduces a delay of T/4 to provide proper timing for energy injection. The clock edges align with the peak and valley of the crystal waveform to generate timing for the injections. The output clock goes to a frequency divider to generate two 4-kHz clocks. Clk\_4k with 50% duty goes to a current reference to control the delay in the slicer. Clk\_duty provides timing control of the injections. As shown in Fig. 10(a), in eight periods, there are two edges of Clk\_duty, which align with the peak and valley of the crystal waveform. Clk\_duty supports the pulse generation (PG) with bootstrapping, outputting pulses to activate the differential driver. We do two injections, one at the valley, followed by one at the peak, to avoid a dc shift of the waveform. There are two power supplies in this design,  $V_{DDL}$  for the driver and startup circuit and  $V_{DD}$  for the other blocks.  $V_{DDL}$  also acts as a knob to control the oscillation amplitude. In this design, there are only parasitic capacitances at the two nodes of the crystal to reduce the load capacitance.

#### A. T/4-Delay Slicer

Fig. 11 shows the proposed T/4-delay clock slicer, which is the first block in the signal path. In [1], the slicer is followed by a DLL to generate the T/4 delay, which requires the delay in the slicer to be far less than T/4. This requires high power across PVT variations. In the low-power slicer proposed in [7], the input PMOS and NMOS transistors are biased on the edge of conduction. When the crystal waveform goes up, it turns on the NMOS while turning off the PMOS. This Class B operation reduces short-circuit current. Although this structure can achieve low power consumption, the delay in the slicer varies considerably across PVT variations.

Fig. 10. (a) Signal waveforms and (b) architecture of the proposed HERO.

Fig. 11. Simplified schematic of the proposed T/4-delay slicer and simulated waveforms.

The proposed T/4-delay slicer not only generates a clock from the sinusoidal crystal waveform but also generates the T/4 delay in an open-loop manner for injection timing. This choice simplifies the architecture compared with the timing generation in previous works with a DLL [1] or PLL [5]. Fig. 11 also shows the simulated waveforms at $V_1$, $V_{GP}$, $V_{GN}$, $V_{ramp}$, and clk. $M_P$ and $M_N$ are biased with mirrored current from $I_{REF}$ using a diode-connected PMOS stack or NMOS stack [14], and their gate voltages $V_{GP}$ and $V_{GN}$ are ac-coupled to the crystal waveform, $V_1$. As $V_1$ rises from its valley to its peak, $V_{GP}$ and $V_{GN}$ rise from their valleys to their peaks. When $V_1$ reaches a value close to its dc value, $V_{ramp}$ starts decreasing. $I_{REF}$ and the mirror ratio are designed so that $V_{ramp}$ is discharged to about $V_{DD}/2$ when $V_1$ is close to its peak. This triggers the following inverter and generates the output clock, clk. The operation when $V_1$ decreases from its peak to its valley is similar. The delay is determined by the reference current $I_{\text{REF}}$, the capacitance at $V_{\text{ramp}}$, and $V_{\text{DD}}$, and it can be estimated to first order as

$$T_{\text{Delay,Slicer}} \approx \frac{0.5 V_{\text{DD}} \cdot C_{\text{PAR}}}{\alpha I_{\text{REF}}} \tag{24}$$

where $\alpha$ is the ratio between the equivalent charging/discharging current at $V_{\text{ramp}}$ during the rising/falling edges of $V_1$ and $I_{\text{REF}}$. This equivalent charging/discharging current at $V_{\text{ramp}}$ can be estimated with the dc current from $M_{\text{P}}/M_{\text{N}}$ when $V_{\text{GP}}/V_{\text{GN}}$ is equal to its dc value and $V_{\text{ramp}}$ is set to $0.5V_{\text{DD}}$, which is set by the value of $I_{\text{REF}}$ and the current mirror ratio. In the proposed design, $\alpha I_{\text{REF}}$ is designed to be about $15I_{\text{REF}} \approx 165$ pA, and $C_{\text{PAR}}$ is estimated as about 5 fF from post-layout (PEX) simulation, which leads to a slicer delay of about 7 $\mu$s at $V_{\text{DD}} = 0.45$ V. $I_{\text{REF}}$ comes from the current reference block, IREF, and it is designed to be proportional to $V_{\text{DD}}$.
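The first-order estimate in (24) can be evaluated with the design values quoted above and compared with the target quarter period. This is a minimal sketch; the nominal crystal frequency of 32.768 kHz is an assumption made here for illustration (the measured frequency reported later differs slightly), and the names are hypothetical.

```python
def slicer_delay(vdd: float, c_par: float, i_charge: float) -> float:
    """First-order slicer delay from (24): time to move V_ramp by 0.5*V_DD."""
    return 0.5 * vdd * c_par / i_charge


VDD = 0.45            # V, supply of the slicer
C_PAR = 5e-15         # F, estimated parasitic capacitance at V_ramp
I_CHARGE = 165e-12    # A, alpha * I_REF (about 15 * I_REF)
F_XO_NOMINAL = 32768.0  # Hz, ASSUMED nominal crystal frequency

delay = slicer_delay(VDD, C_PAR, I_CHARGE)
quarter_period = 1.0 / (4.0 * F_XO_NOMINAL)

print(f"slicer delay   = {delay * 1e6:.2f} us")
print(f"quarter period = {quarter_period * 1e6:.2f} us")
```

The estimate lands slightly below T/4, in line with the "about 7 $\mu$s" figure quoted for (24).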

Fig. 12 shows the simulation results of the crystal waveform, $V_1$, and output clock in five corners and at temperatures ranging from -40 °C to 100 °C with ideal bias current. The edges of the clock align with the peak or valley of the crystal waveform. Because the transistors $M_P$ and $M_N$ act as current sources mirrored from $I_{REF}$, the impedance looking from $V_{ramp}$ into the pull-up branch and the pull-down branch should be high to reduce the variation in the current caused by voltage changes at $V_{\rm ramp}$. Also, the capacitive coupling between $V_{\rm ramp}$ and $V_{\rm GP}/V_{\rm GN}$ would contribute to the delay variation due to capacitance and voltage variations at $V_{\rm ramp}$. With the cascode transistors $M_{\rm CP}$ and $M_{\rm CN}$, the variation in the delay is reduced due to higher impedance at $V_{\rm ramp}$ and less Miller effect or coupling between $V_{\rm GP}/V_{\rm GN}$ and $V_{\rm ramp}$.

Fig. 12. Simulated waveforms of the slicer outputs at 15 conditions (TT/FF/SS/FS/SF at -40/25/100 °C) and 100 Monte Carlo runs with mismatch.

Fig. 13. Design challenges in the proposed IREF structure with switched-capacitor resistance using single NMOS as switch.

To evaluate how local variation affects the delay in the slicer and the duty cycle of the output clock, a Monte Carlo simulation of the slicer with mismatch is performed, and the output clock waveforms of 100 runs are shown in Fig. 12. The distribution of the duty cycle over the 100 runs shows mean = 50.2% and sigma = 3%.

#### **B.** Reference Current Generation

The slicer delay is controlled by the reference current,  $I_{\text{REF}}$ . Fig. 13 presents the proposed structure for generating  $I_{\text{REF}}$ .



Fig. 14. Simplified schematic of the proposed IREF with switched-capacitor resistance using ultralow leakage composite switches.

We use a diode stack to generate a reference voltage of  $V_{DD}/5$ , and a negative feedback loop regulates the voltage  $V_{Reg}$  to be equal to  $V_{DD}/5$ . A switched-capacitor resistance is realized with a 60-fF MOM cap and 4-kHz nonoverlapping clocks. With a regulated voltage  $V_{Reg}$  and this resistance, a regulated current  $I_{Reg}$  can be obtained

$$\begin{aligned}
I_{\text{Reg}} = 2I_{\text{REF}} &= \frac{V_{\text{Reg}}}{R_{\text{SWCAP}}} \\
&\approx \frac{0.2V_{\text{DD}}}{\frac{1}{C_{\text{SW}}f_{\text{SW}}}} = 0.2V_{\text{DD}}C_{\text{SW}}f_{\text{SW}} \\
&\approx 0.2 \times 0.45 \text{ V} \times 60 \text{ fF} \times 4 \text{ kHz} \approx 22 \text{ pA}
\end{aligned} \tag{25}$$

which is proportional to  $V_{DD}$ .  $I_{REF}$  is mirrored from  $I_{Reg}$ .

Clk\_4k is divided down from the 32-kHz output clock, so the concept here is that the current reference is derived from the XO frequency itself, which is very stable to control the delay in the slicer.
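Combining (24) and (25) shows why a reference current proportional to $V_{DD}$ makes the slicer delay independent of $V_{DD}$ to first order: $V_{DD}$ appears in both the numerator of (24) and in $I_{REF}$, so it cancels. The sketch below uses the values quoted in the text (60 fF, 4 kHz, 5 fF, $\alpha \approx 15$) with ideal components; the function names are illustrative.

```python
def i_reg(vdd: float, c_sw: float = 60e-15, f_sw: float = 4e3) -> float:
    """Regulated current from (25): V_Reg / R_SWCAP with V_Reg = 0.2 * V_DD."""
    return 0.2 * vdd * c_sw * f_sw


def slicer_delay(vdd: float, c_par: float = 5e-15, alpha: float = 15.0) -> float:
    """Slicer delay from (24) with I_REF = I_Reg / 2 taken from (25)."""
    i_ref = i_reg(vdd) / 2.0
    return 0.5 * vdd * c_par / (alpha * i_ref)


for vdd in (0.45, 0.6, 0.9):
    print(f"V_DD={vdd:.2f} V  I_Reg={i_reg(vdd) * 1e12:.1f} pA  "
          f"delay={slicer_delay(vdd) * 1e6:.2f} us")
```

In this idealized form the delay reduces to $C_{\text{PAR}} / (0.2\,\alpha\, C_{\text{SW}} f_{\text{SW}})$, so it tracks the crystal-derived switching frequency rather than the supply. Leakage, clock feedthrough, and process variation are not modeled here.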

At 0.45 V  $V_{DD}$ , ideally,  $I_{Reg}$  is about 22 pA with a 4-kHz switching frequency and 60-fF capacitance. However, there are two main challenges to achieving this  $I_{Reg}$  value. As shown in Fig. 13, with PVT variation, especially FF corner and high temperature, the leakage of the "off" switches can be more than 5 pA even with long-channel transistors. Furthermore, because the voltage at  $V_{Cap}$  is only 90 mV at 0.45 V  $V_{DD}$ , it is sensitive to clock feedthrough. Together, these factors result in more than 30% variation in the reference current across PVT variation.

To deal with these challenges, we use composite switches to reduce the leakage of the "off" transistors and dummy switches and transmission gates to compensate for the clock feedthrough. Fig. 14 presents a simplified schematic of the proposed IREF. The ultralow leakage composite switch was proposed in [15], and it uses two transistors instead of one as a switch. The internal node of these two transistors is regulated by an extra buffer to reduce $V_{\rm DS}$ of the "off" transistor and thus its leakage. The simulated leakage of this composite switch is less than 50 fA at 100 °C and FF corner. During $\Phi_1$, the switching capacitor $C_{SW}$ is shorted to ground, and we would like to reduce leakage $(I_{\text{Leak},\Phi 1})$ from node $V_{\text{Reg}}$. So, the internal node, $V_{SW2}$, is driven to be close to $V_{Reg}$ by the buffer. Because $V_{\rm DS}$ of the top transistor is reduced to close to zero, the leakage is greatly reduced compared with the leakage observed with $V_{\rm DS} = V_{\rm Reg}$. Similarly, during $\Phi_2$, the leakage ($I_{\text{Leak},\Phi_2}$) from $V_{\text{Cap}}$ is reduced by driving $V_{\rm SW1}$ to be close to $V_{\rm Reg}$. The simulated $I_{\rm REF}$ variation from its nominal values at different $V_{\rm DD}$ is -2% to 10% across 0.45-0.9 V $V_{\rm DD}$ and 15 conditions (TT/FF/SS/FS/SF at -40/25/100 °C). This reference current generation makes the slicer delay less sensitive to temperature and $V_{DD}$, but it does not compensate for the variation in the slicer delay due to process variation in the slicer, and the -2% to 10% variation in IREF directly contributes to the variation in the slicer delay.

Fig. 15. Simplified schematic of the self-biased amplifier A1 in the IREF.

A self-biasing scheme [16] was used to generate bias current for the two amplifiers, A1 and A2, in the IREF. Fig. 15 shows a simplified schematic of the self-biased amplifier A1. The tail current of the differential input stage is mirrored from $I_{\text{Reg}}$. The PMOS providing $I_{\text{Reg}}$ would have different $|V_{\text{GS}}|$ across PVT, and it can vary by hundreds of millivolts. The A1 amplifier must provide a dc gain of >40 dB to suppress the difference between $V_{\text{REF}}$ and $V_{\text{Reg}}$ to several millivolts. In the proposed IREF design, a capacitor of 2 pF is used to reduce ripple due to the switched-capacitor resistance, so the bandwidth requirement of the A1 amplifier is relaxed. The pole at the A1 amplifier's output is the dominant pole with the help of $C_{\text{C}}$. Amplifier A2 in the extra buffer uses the same structure but consumes about one-fourth of the power of A1.

#### C. PG With Bootstrapping

As shown in Fig. 16(a), the 32-kHz output clock from the slicer goes to the frequency dividers to generate clk\_duty. The PG circuit with bootstrapping creates pulses to activate the differential driver. The pulsewidth is designed as 1 $\mu$s at 25 °C, TT. The current-starving structure with bias current mirrored from $I_{\text{REF}}$ is used to mitigate part of the delay variation due to PVT. The simulated pulsewidth varies from 0.5 to 2.5 $\mu$s across PVT. We chose this pulsewidth to make it less than 1/10 of the oscillation period (about 30.5 $\mu$s), while making sure that the driver has sufficient time to pull $V_1$ or $V_2$ in Fig. 16(a) to $V_{DDL}$ or $V_{SS}$. Fig. 16(b) presents the bootstrapping circuits, which use a conventional structure. Even with careful layout and dummy fill control, the parasitic capacitance remains between 20 and 30 fF in the pulse generator and the bootstrapped 0.9-V domain, which would result in ~600-pW power consumption at 32 kHz. With our proposed high ENR injection at 4 kHz, the switching loss is greatly reduced at the cost of frequency dividers as shown in Fig. 17.

Fig. 16. (a) Frequency divider and PG with bootstrapping in the proposed XO and (b) schematic of the bootstrapping circuit in PG.

Fig. 17. Measured power consumption with 32- and 4-kHz injections.
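The order of magnitude of the switching loss quoted above can be checked with the standard dynamic-power estimate $P \approx C V^2 f$. This relation is an assumption introduced here for illustration (one full charge and discharge of the parasitic capacitance per cycle); the text reports only the capacitance range, the 0.9-V domain, and the resulting ~600 pW.

```python
def switching_power(c_par: float, v_swing: float, f_switch: float) -> float:
    """Dynamic power estimate P = C * V^2 * f (ASSUMED model, not from the paper)."""
    return c_par * v_swing ** 2 * f_switch


V_BOOT = 0.9  # V, bootstrapped domain

for c_par in (20e-15, 25e-15, 30e-15):
    p_32k = switching_power(c_par, V_BOOT, 32e3)
    p_4k = switching_power(c_par, V_BOOT, 4e3)
    print(f"C={c_par * 1e15:.0f} fF  P@32kHz={p_32k * 1e12:.0f} pW  "
          f"P@4kHz={p_4k * 1e12:.0f} pW")
```

With 20 to 30 fF, the estimate at 32 kHz brackets the ~600 pW quoted in the text, and moving the injections to 4 kHz scales this term down by a factor of eight.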

#### D. All-NMOS Differential Driver

A differential driver was introduced in [7], and it uses PMOS as a pull-up device and NMOS as a pull-down device. The differential charging scheme achieves dc balancing between two nodes of the crystal, which is typically created by a large feedback resistor [7]. Hence, the differential driver eliminates the requirement of a feedback resistor and the loss it induces. For example, in a conventional Pierce XO and pulsed XO with a single-side driver, assuming an oscillation amplitude of 100 mV across the crystal and a feedback resistor of 100 M$\Omega$, the loss on the feedback resistor is about 70 pW. The diode-based pseudo resistor can achieve much higher resistance, but it is challenging to maintain its resistance across PVT. In the proposed design, a differential driver is chosen for reliability and simplicity. Because the differential driver activates injections at both sides of the crystal simultaneously, the switching loss at the control signals is twice that of a single-ended driver. As for the injection step, $V_{INJ}$, the differential injection requires a smaller $V_{INJ}$ compared with single-ended injection to compensate for a fixed amount of crystal loss, referring to (12) and (13). For future work, a single-ended driver will be investigated to further optimize power consumption.

For this XO, we propose an all-NMOS differential driver as shown in Fig. 16(a). First, compared with four bootstrapping circuits for a differential driver with both PMOS and NMOS, the all-NMOS differential driver requires only two bootstrapping circuits, saving power. Second, using NMOS as the pull-up device, when it is turned off with $V_G = 0$ V, it is in super-cutoff, reducing leakage from $V_{DDL}$ during the free-running phase.

#### E. Design Considerations of Injection Frequency

There are several design considerations in choosing the injection frequency. To reduce the switching loss and make full use of the high ENR injection, operating at a lower frequency is desired in theory. However, a lower injection frequency means that more energy and a bigger injection step [ $V_{INJ}$  in Figs. 7(c) and 10(a)] are required for each injection. As discussed in Section II-C, in theory, a bigger injection step reduces the injection efficiency, which means more energy is required to compensate for the crystal loss. In a real implementation, the driver has a finite conductance and there are parasitic capacitances at the nodes of the crystal. With the same pulsewidth or duration of each pulsed injection, the size of the driver must increase when we need higher energy for each injection. This results in more parasitic capacitances at the bootstrapped power domain and higher switching power. Furthermore, more 2-to-1 frequency divide stages are needed if we want to generate clocks at a lower frequency. The simulated power overhead of one extra 2-to-1 frequency divider is  $\sim 20$  pW.

As shown in Fig. 17, with the proposed injections at 4 kHz, the power consumption of the PG and the bootstrapping is greatly reduced compared with its value at 32-kHz injections. Operating at a lower frequency is feasible, but the overhead due to the extra frequency dividers and bigger drivers would be larger than the reduction in the switching loss. In this design, we consider power as the first priority and choose two injections in eight periods to optimize power consumption.

#### F. On-Chip Pierce Oscillator for Startup

An on-chip inverter [in gray in Fig. 10(b)] is designed as an inverting amplifier to start the crystal oscillation. A diode-based pseudo resistor is used as the dc-biasing resistor across the inverter, and once the startup completes, this pseudo resistor is also disconnected from the crystal. We use conventional transmission gates as switches in Fig. 10(b) to connect or disconnect the startup circuit including the dc-biasing resistor. Each "off" switch can provide more than 389 M$\Omega$ at FF 125 °C condition and 57 G$\Omega$ at TT 27 °C condition, which we consider sufficient to suppress the effect of the dc-biasing resistor. During the startup mode, PG is turned off. As the amplitude of crystal oscillation becomes sufficient for the clock slicer to generate a clock, IREF begins working. Once the reference current settles to its steady-state value, PG can be turned on, while all startup circuits including the on-chip inverter are turned off and disconnected from the crystal. The proposed oscillator then enters the injection mode, and energy is injected into the crystal through the differential driver. The maximum possible time between the end of the startup mode and the first injection is about seven periods of oscillation. Since 32-kHz XOs are always on after startup, the total startup time and energy are negligible.

## IV. MEASUREMENT RESULTS AND COMPARISON

The HERO design was fabricated in 40-nm CMOS. Fig. 18 presents the measured waveforms including the transition from the startup mode to the injection mode and the zoom-in of the injection mode. The measurements were performed at 25 °C, 0.45 V  $V_{DD}$ , and 0.15 V  $V_{DDL}$ . The crystal waveform at node  $V_2$  is monitored through an on-chip source follower. Fig. 18 shows the two injections in eight periods of the oscillation. During the seven periods between injections, the crystal freely runs with unobservable attenuation of the oscillation amplitude, and the common-mode voltage of the oscillation is stable. The oscillation amplitude is about 135 mV and the injection step is about 15 mV to compensate for the loss in eight cycles.
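The measured amplitude and injection step can be checked against the relation $V_{DDL} = V_{OSC} + V_{INJ}$ from (20). The sketch below also back-computes the per-n coefficient that the form of (21) would imply for this measurement at n = 8. That implied coefficient is a derived illustration only: the coefficient 0.0235 in (22) comes from the crystal model of Section II-B, whose parameters need not match the measured setup, so exact agreement is not expected.

```python
V_DDL = 0.150       # V, driver supply used in the measurement
V_OSC_MEAS = 0.135  # V, measured oscillation amplitude (about 135 mV)
V_INJ_MEAS = 0.015  # V, measured injection step (about 15 mV)
N = 8               # two injections in eight periods

# Consistency with V_DDL = V_OSC + V_INJ from (20).
residual = V_DDL - (V_OSC_MEAS + V_INJ_MEAS)

# Injection efficiency as defined in the text.
efficiency = V_OSC_MEAS / V_DDL

# Coefficient k such that V_OSC = V_DDL / (1 + k * n), i.e., the form of (21).
implied_k = (V_DDL / V_OSC_MEAS - 1.0) / N

print(f"residual    = {residual * 1e3:.3f} mV")
print(f"V_OSC/V_DDL = {efficiency:.3f}")
print(f"implied k   = {implied_k:.4f} (model value in (22): 0.0235)")
```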

A chip-on-board (COB) package is used to reduce the parasitic capacitance at the crystal nodes as shown in Fig. 19(a), and there is no external capacitor for the crystal. Fig. 19(b) shows the die micrograph. The measured output frequency is 32.788 kHz (ECS-2X6-FLX crystal), and the total $C_{\rm L}$ is estimated as <1.9 pF. With an oscillation amplitude of 135 mV and $C_{\rm L}$ < 1.9 pF, the power from $V_{\rm DDL}$ is only 55 pW (average of ten chips) to compensate for the crystal loss, which is 11% of the total power consumption as shown in Fig. 20. Generating the output clock and a delay of T/4 consumes 233 pW. The PG with bootstrapping consumes 75 pW. The average power consumed by the frequency divider in the ten chips is 141 pW at 0.45 V $V_{DD}$, which is 28% of the total 0.51 nW. In most highly duty-cycled systems, the frequency dividers are already included with the XO (e.g., to enable calendar functions), which alleviates this power overhead.

Ten chips were tested with ten ECS-2X6-FLX crystals from -25 °C to 85 °C. Fig. 21 shows the total power and frequency deviation across temperatures from -25 °C to 85 °C at $V_{DD} = 0.45$ V and $V_{DDL} = 0.15$ V. The frequency deviation versus temperature is mainly determined by the crystal itself and is a second-order function of temperature centered at around 25 °C. Because the analog blocks in the proposed XO are biased with the reference current, which is not sensitive to temperature, the power consumption of this design does not increase rapidly as the temperature goes up until the power of the digital circuits dominates. Fig. 22 shows the total power and frequency deviation with $V_{\text{DD}}$ swept from 0.4 to 0.9 V ( $V_{\text{DDL}} = 0.15$ V). The line sensitivity, averaged over ten samples, is 18 ppm/V. Because the reference current in this design is proportional to $V_{\text{DD}}$, the power increases linearly as $V_{\text{DD}}$ rises until the power of the digital circuits begins to dominate. Frequency sensitivity to $V_{\text{DDL}}$ is measured and shown in Fig. 23.

Fig. 18. Measured crystal waveforms at 0.45 V $V_{DD}$ and 0.15 V $V_{DDL}$ showing (top) transition from startup mode to injection mode and (bottom) two injections in eight periods.

Fig. 19. (a) COB package of the proposed XO with transparent epoxy and (b) die micrograph.

Fig. 20. Measured power consumption of each block averaged across ten chips at 0.45 V $V_{DD}$ and 0.15 V $V_{DDL}$.

Fig. 21. Measured frequency deviation and power consumption versus temperature (-25 °C to 85 °C) of ten samples with ECS-2X6-FLX crystals at $V_{DD} = 0.45$ V and $V_{DDL} = 0.15$ V.
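The block-level power figures reported above for the ECS-2X6-FLX measurements can be cross-checked with a few lines of arithmetic. The numbers are taken from the text; the block labels are shorthand chosen here for illustration.

```python
# Measured block powers at 0.45 V V_DD and 0.15 V V_DDL, in pW (from the text).
BLOCK_POWER_PW = {
    "driver (from V_DDL)": 55,
    "clock and T/4-delay generation": 233,
    "PG with bootstrapping": 75,
    "frequency divider": 141,
}
TOTAL_REPORTED_PW = 510  # reported total of 0.51 nW


def shares_percent(blocks: dict, total: float) -> dict:
    """Share of the reported total power for each block, in percent."""
    return {name: 100.0 * power / total for name, power in blocks.items()}


shares = shares_percent(BLOCK_POWER_PW, TOTAL_REPORTED_PW)
for name, share in shares.items():
    print(f"{name:32s} {BLOCK_POWER_PW[name]:4d} pW  {share:5.1f} %")
print(f"sum of listed blocks = {sum(BLOCK_POWER_PW.values())} pW")
```

The listed blocks sum to within a few picowatts of the reported 0.51 nW, and the driver and divider shares round to the 11% and 28% quoted in the text.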

#### A. Measurements With High-Temperature Crystals

Fig. 22. Measured frequency deviation and power consumption versus $V_{\text{DD}}$ (0.4–0.9 V) of ten samples with ECS-2X6-FLX crystals at 25 °C and $V_{\text{DDL}} = 0.15$ V.

Fig. 23. Measured frequency deviation versus $V_{\text{DDL}}$ (0.1–0.4 V) of ten samples with ECS-2X6-FLX crystals at 25 °C and $V_{\text{DD}}$ = 0.45 V.

As the ECS-2X6-FLX crystal has an operational temperature range of -40 °C to 85 °C, ten chips with an ECX-34Q-S crystal (-40 °C to 125 °C capable) are tested to show stable operation up to 125 °C. Figs. 24 and 25 present the measured power and frequency deviation across temperature and with $V_{DD}$ swept from 0.4 to 0.9 V ( $V_{DDL} = 0.15$ V), respectively. Since the XO is measured at temperatures up to 125 °C, the inherent crystal frequency deviation with temperature is much higher compared with measurements up to 85 °C. Because an ECX-34Q-S crystal has a lower *Q* than that of an ECS crystal, the line sensitivity is 29 ppm/V, which is higher than the 18 ppm/V obtained with the ECS crystal. Operation at temperatures lower than -25 °C can be achieved with a $V_{DD}$ higher than 0.45 V. Frequency sensitivity to $V_{DDL}$ is measured and shown in Fig. 26.



Fig. 24. Measured frequency deviation and power consumption versus temperature (-25 °C to 125 °C) of ten samples with ECX-34Q-S crystals at  $V_{DD} = 0.45$  V and  $V_{DDL} = 0.15$  V.



Fig. 25. Measured frequency deviation and power consumption versus  $V_{\text{DD}}$  (0.4–0.9 V) of ten samples with ECX-34Q-S crystals at 25 °C and  $V_{\text{DDL}} = 0.15$  V.

#### **B.** Allan Deviation Measurements

Fig. 26. Measured frequency deviation versus $V_{\text{DDL}}$ (0.1–0.4 V) of ten samples with ECX-34Q-S crystals at 25 °C and $V_{\text{DD}}$ = 0.45 V.

Fig. 27. Measured Allan deviation of HERO and three baseline approaches.

To evaluate long-term frequency stability, the Allan deviation is measured with three baselines in a temperature chamber at 25 °C, 0.45 V $V_{DD}$, and 0.15 V $V_{DDL}$. The first baseline is a conventional, high-power Pierce XO on a PCB with a discrete inverter as the inverting amplifier. The power supply is 1.1 V, and it consumes 1.9 $\mu$W. The Allan deviation is shown by the green line in Fig. 27. Compared with the result of the nW XO in [2] (ISSCC2019), it shows a difference in frequency stability because of better noise performance at both low and high frequencies. The second baseline is the on-chip Pierce oscillator for startup. It consumes about 2 nW and uses the same T/4-delay slicer to generate clock output from the sinusoidal crystal waveform. Its Allan deviation is shown by the yellow line in Fig. 27. The third baseline is the proposed architecture but with 32-kHz injections; the red line shows the measured Allan deviation.

The Allan deviation of the proposed HERO is shown by the blue line in Fig. 27. The floor of the Allan deviation is 2 ppb. Compared with the baselines in Fig. 27, the HERO demonstrates improved frequency stability due to the pulse injection and the high ENR injection. With a short averaging window (left part of Fig. 27), the Allan deviation evaluates the performance of high-frequency noise. The two on-chip baselines and the proposed XO use the same T/4-delay slicer to convert a sinusoidal crystal waveform into output clock. Because this low-power slicer determines the high-frequency noise performance, the yellow, red, and blue lines are close together on the far left side of Fig. 27. With a relatively longer averaging window (middle part of Fig. 27), the advantage of the pulsed injection and HERO with respect to noise becomes evident. When the averaging window is more than hundreds of seconds, the noise in the testing environment, including the nonideal temperature stability of the temperature chamber, dominates and determines the value of the Allan deviation.

As shown in Fig. 1, the 32-kHz XO is used as a wake-up timer in a low-power edge device to activate the power-hungry communication block (RX/TX). Depending on the long-term stability of the XO, the duty-cycled periods, $T_{\text{Duty}}$, would show variations, and the RF blocks in the system-on-chip (SoC) must wake up early (guard band) to accommodate this inaccuracy [1]. The duration of this guard band is proportional to the Allan deviation at an averaging window of $T_{\text{Duty}}$. For example, referring to Fig. 27, at $\tau = 100$ s, the Allan deviation of the proposed XO is 1/5 of the Allan deviation of the baseline with 32-kHz pulsed injections. For a duty-cycled edge device with $T_{\text{Duty}} = 100$ s, the required guard band with the proposed XO as wake-up timer can be 1/5 of the guard band with the baseline with 32-kHz pulsed injections. This means the power overhead due to the guard band is reduced to 1/5, thanks to the improvement in the Allan deviation with the proposed design.

Referring to the calculations and operation scenarios of wireless sensor nodes in [1], the Allan deviation of approximately 10 ppb over a 1000-s time window is already enough to make the power overhead due to the guard band negligible. In these scenarios, the proposed structure to reduce the Allan deviation floor to 2 ppb would not make a difference from the perspective of power reduction, but it provides a technique to improve the long-term frequency stability for the cases where the Allan deviation of the baseline design is not good enough due to various issues including lower oscillation amplitude, worse resonator, or noisier environment.
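The guard-band scaling discussed above can be illustrated with a first-order rule of thumb: the wake-up timing uncertainty after an interval $\tau$ is roughly the Allan deviation at $\tau$ multiplied by $\tau$. This rule and the function names below are assumptions made here for illustration; the detailed guard-band calculation is in [1] and is not reproduced.

```python
def timing_uncertainty(adev: float, tau_s: float) -> float:
    """First-order wake-up timing error in seconds: ADEV(tau) * tau (ASSUMED model)."""
    return adev * tau_s


def guard_band_ratio(adev_a: float, adev_b: float) -> float:
    """Ratio of required guard bands for two oscillators at the same T_Duty."""
    return adev_a / adev_b


# Example values from the text: about 10 ppb over a 1000-s window, and a 2-ppb floor.
for adev_ppb in (10.0, 2.0):
    err = timing_uncertainty(adev_ppb * 1e-9, 1000.0)
    print(f"ADEV={adev_ppb:4.1f} ppb over 1000 s -> timing error ~ {err * 1e6:.1f} us")

# An Allan deviation that is 1/5 of the baseline gives 1/5 of the guard band.
print(f"guard-band ratio = {guard_band_ratio(1.0, 5.0):.2f}")
```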

#### C. Spur and Noise Performance

Injecting energy at the eighth-order subharmonic frequency of the oscillation would introduce spurs at frequency

$$f_{\rm Spur} = m \cdot \frac{f_{\rm XO}}{8} \tag{26}$$

where m = 1, 2, ..., 7, 9, ... Fig. 28 shows the power spectral density (PSD) measured with an HP 35670A signal analyzer, in which the spurs are visible.
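The spur locations given by (26) can be listed with a short sketch. The frequency of 32.788 kHz is the measured output frequency quoted earlier; the function name and the choice of how many multiples to list are illustrative.

```python
from typing import List


def spur_frequencies(f_xo: float, m_max: int, divide_ratio: int = 8) -> List[float]:
    """Spur frequencies m * f_xo / divide_ratio from (26).

    Multiples of divide_ratio are skipped because they coincide with the
    oscillation frequency and its harmonics (m = 8, 16, ...).
    """
    step = f_xo / divide_ratio
    return [m * step for m in range(1, m_max + 1) if m % divide_ratio != 0]


F_XO = 32788.0  # Hz, measured output frequency (32.788 kHz)

for f in spur_frequencies(F_XO, 16):
    print(f"{f / 1e3:.4f} kHz")
```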

After each pull-down injection at  $V_1$  [Fig. 10(a)] or each pull-up injection at  $V_2$  (Fig. 18), the dc value or the zero-crossing of the oscillation waveform would be different from its value in the following seven periods by  $V_{INJ}$ . Though referring to the theory of phase noise in [13], this voltage step would cause minimum disturbance on the phase of the crystal oscillation, it would cause phase error and short-term frequency change at the output clock due to nonideal clock slicer. On average, this dc shift would not cause big frequency deviation, and we confirmed this by checking the output frequency of the proposed design (32.788556 kHz), the output frequency of the on-chip Pierce XO (32.788213 kHz), and the frequency of oscillation waveform  $V_2$  (32.788546 kHz).



Fig. 28. Measured PSD of the output clock after the clock buffer on PCB to show the spurs.



Fig. 29. Measured PSD versus offset frequency from the output clock frequency and comparison with baselines.

However, it would result in worse jitter performance. The measured rms period jitter of the proposed design is 230.8 ns$_{\rm RMS}$ (10000 samples with a Keysight 53230A frequency counter that has a single-shot time resolution of 20 ps and an input frequency range of 1 mHz to 350 MHz), while the measured rms period jitter of the three baselines (on-chip Pierce, XO with 32-kHz injection, and PCB Pierce) is 53.9 ns$_{\rm RMS}$, 52.8 ns$_{\rm RMS}$, and 0.5 ns$_{\rm RMS}$, respectively.

Considering the performances of spurs and jitter, for applications requiring low spur and low jitter, like communication circuits, the proposed design is not suitable as frequency reference.

Fig. 29 presents the measured PSD versus offset frequency from the oscillation frequency. At offset frequencies <1 Hz, the proposed design has lower PSD than the on-chip Pierce and the design with 32-kHz pulsed injection, and it is close to the performance of the PCB Pierce at an offset frequency of 0.1 Hz. This matches the Allan deviation plots in Fig. 27.

We would like to point out that though flicker noise from the transistors can be reduced using bigger sizes of devices, the lower noise in the proposed design from 0.1 to 1 Hz in Fig. 29 compared with the on-chip Pierce XO does not come from device sizing. The size of the on-chip inverter for startup is  $W/L = 56 \ \mu m/40$  nm for PMOS and  $W/L = 24 \ \mu m/40$  nm

| TABLE I                                                         |
|-----------------------------------------------------------------|
| PERFORMANCE SUMMARY AND COMPARISON WITH STATE-OF-THE-ART nW XOs |

|                                                                         | This work                        |                                     | JSSC'20 [3]         | TCASI'20 [4]         | JSSC'19 [9]         | VLSI'17 [5]     | JSSC'16 [6]   | JSSC'16 [1]             | ISSCC'14 [7]     |
|-------------------------------------------------------------------------|----------------------------------|-------------------------------------|---------------------|----------------------|---------------------|-----------------|---------------|-------------------------|------------------|
| Technology                                                              | 40nm                             |                                     | 55nm                | 130nm                | 65nm                | 55nm            | 130nm         | 180nm                   | 28nm             |
| Area [mm <sup>2</sup> ]                                                 | 0.02                             |                                     | 0.019               | 0.0033               | 0.027               | 0.16            | 0.062         | 0.3                     | 0.03             |
| VDDs [V]                                                                | 0.45 & 0.15                      |                                     | 0.3                 | 0.06                 | 0.5                 | 0.4 & 0.1       | 0.3           | 0.94                    | 0.15             |
| Resonance mode                                                          | Parallel                         |                                     | Parallel            | Parallel             | Series              | Parallel        | Parallel      | Parallel                | Parallel         |
| Crystal<br>Op. temperature                                              | ECS-2X6-FLX<br>-40~85°C          | ECX-34Q-S<br>-40~125°C              | ABS07W<br>-40~125°C | AB\$07W<br>-40~125°C | ECX-34Q<br>-40~85°C | -               | -             | ECS-2X6-FLX<br>-40~85°C | -                |
| Power@25°C [nW]                                                         | 0.51                             | 0.55                                | 0.74                | 2.26                 | 0.55                | 1.7             | 1.5           | 5.58                    | 1.89             |
| ADEV floor [ppb]                                                        | 2                                | 4                                   | 19                  | <3                   | 14                  | 25              | 70            | 10                      | 10               |
| # samples reported                                                      | 10                               | 10                                  | 1                   | 8                    | 20                  | 1               | 25            | 1                       | 1                |
| Output Freq.                                                            | 32.788kHz                        | 32.796kHz                           | 32.790kHz           | 32.763kHz            | -                   | -               | 32.768kHz     | 32.76783kHz             | -                |
| Vosc [mV]                                                               | 135                              | 130                                 | 144                 | <30                  | 2                   | 100             | 230           | 160                     | -                |
| C <sub>L</sub> [pF]                                                     | <1.9*                            | 1.55*                               | 0.93                | 3                    | -                   | -               | 6             | 10~20                   | Parasitic        |
| Temp. stability [ppm]<br>(Vari. due to crystal**)<br>Tested temp. range | 154<br>(<169)<br><b>-25~85°C</b> | 425<br>(225~441)<br><b>-25~125℃</b> | 152<br>-20~80°C     | 62<br>25~62°C        | 80<br>-20~80°C      | 109<br>-20~80°C | 150<br>0~80°C | 133<br>-20~80°C         | 48.8<br>-20~80°C |
| Line sensi. [ppm/V]                                                     | 18 (VDD)<br>5.7 (VDDL)           | 29 (VDD)<br>8.6 (VDDL)              | 101                 | 8.4                  | 13                  | 6.7             | 7             | 30.3                    | 85               |
| <b>Calibration Required?</b>                                            | NO                               |                                     | NO                  | NO                   | NO                  | NO              | YES           | YES                     | NO               |

\*Estimated with output frequency.

\*\*Intrinsic temperature stability of the crystal itself.

for NMOS, which is much larger than the proposed differential driver (W/L = 800 nm/300 nm for pull-up NMOS and W/L = 400 nm/300 nm for pull-down NMOS) and the proposed slicer ( $W/L = 1.6 \ \mu \text{m}/500 \text{ nm}$  for  $M_{\text{P}}$  and  $W/L = 800 \text{ nm}/2 \ \mu \text{m}$  for  $M_{\text{N}}$ ).

#### D. Comparison With State-of-the-Art

Table I summarizes the HERO performance and compares it with prior state-of-the-art nW XOs. We use a 0.45-V $V_{DD}$ and an extra 0.15-V power supply to control the oscillation amplitude. The proposed design achieves both the lowest power consumption and the lowest Allan deviation floor. Because the only capacitance at the two nodes of the crystal is parasitic, and it is far smaller than the load capacitance required to set the oscillation frequency to 32.768 kHz (12.5 pF for ECS-2X6-FLX and 6 pF for ECX-34Q-S), the output frequency is higher than the standard 32.768 kHz. It can be estimated with (27) and Fig. 30:

$$f_{\rm OSC} = \frac{1}{2\pi\sqrt{L_{\rm S} \cdot (C_{\rm S}||C_{\rm L})}} = \frac{1}{2\pi\sqrt{L_{\rm S} \cdot C_{\rm S}}} \cdot \sqrt{1 + \frac{C_{\rm S}}{C_{\rm L}}}.$$
 (27)
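As a rough numerical illustration of (27), the following Python sketch evaluates both forms of the equation and inverts it to estimate $C_L$ from a measured output frequency. The motional capacitance of 3.5 fF is an assumed, illustrative value (not a datasheet or measured number), $L_S$ is back-calculated so that the model returns 32.768 kHz at the 12.5-pF load quoted above, and the function names are hypothetical.

```python
import math

def f_osc(l_s, c_s, c_l):
    """First form of (27): L_S resonating with C_S in series with C_L."""
    c_eff = c_s * c_l / (c_s + c_l)  # "C_S || C_L" is the series combination
    return 1.0 / (2.0 * math.pi * math.sqrt(l_s * c_eff))

def f_osc_factored(l_s, c_s, c_l):
    """Second form of (27): series resonance scaled by sqrt(1 + C_S/C_L)."""
    f_series = 1.0 / (2.0 * math.pi * math.sqrt(l_s * c_s))
    return f_series * math.sqrt(1.0 + c_s / c_l)

def load_cap_from_freq(f_out, l_s, c_s):
    """Invert (27) to estimate C_L from a measured output frequency."""
    f_series = 1.0 / (2.0 * math.pi * math.sqrt(l_s * c_s))
    return c_s / ((f_out / f_series) ** 2 - 1.0)

C_S = 3.5e-15            # ASSUMED motional capacitance [F], illustration only
C_L_NOMINAL = 12.5e-12   # load capacitance quoted for ECS-2X6-FLX [F]
F_NOMINAL = 32768.0      # nominal frequency at C_L_NOMINAL [Hz]

# Back-calculate L_S so that the model hits F_NOMINAL at C_L_NOMINAL.
L_S = (1.0 + C_S / C_L_NOMINAL) / ((2.0 * math.pi * F_NOMINAL) ** 2 * C_S)

for c_l in (12.5e-12, 6e-12, 3e-12, 1.9e-12):
    print(f"C_L = {c_l * 1e12:5.2f} pF -> f_osc = {f_osc(L_S, C_S, c_l):.3f} Hz")
```

With these assumed parameters the model reproduces the trend described in the text: the smaller the load capacitance, the further the output frequency rises above 32.768 kHz. The absolute numbers depend on the assumed $C_S$ and should not be read as measured values.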

This is not an issue for wake-up timers, which do not require an exact frequency of 32.768 kHz. For real-time clock applications that generate 1 s from 32-kHz XOs, there are two ways for the proposed design to deal with this frequency deviation from 32.768 kHz. The first solution is to use the load capacitance specified by the crystal datasheet to set the output frequency to 32.768 kHz. For example, we can use a crystal optimized for low-power Internet-of-Things (IoT) applications [17] that requires  $C_L = 3$  pF to achieve 32.768 kHz. Because  $C_L = 3$  pF is larger than the load capacitance in the proposed design, more power is needed to compensate for the crystal loss, which increases the total power consumption to about 593 pW. The second solution is to adjust the output frequency in the digital domain using



Fig. 30. Calculated output frequency versus  $C_L$  with the model of ECX-34Q-S crystal.

fractional division, which would introduce power and area overhead compared with the conventional chain of 15 divide-by-2 frequency dividers to obtain 1 s from 32.768 kHz.
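To make the second solution concrete, the sketch below models one possible fractional divider as a behavioral phase accumulator. This is an assumed scheme chosen for illustration, not the implementation of any circuit in this article. The input frequency is written as a ratio `f_num / f_den` in Hz, using the 32.788978-kHz value of chip 1 in Table II; the function name is hypothetical.

```python
def one_second_ticks(num_clocks, f_num, f_den):
    """Behavioral model of an accumulator-based fractional divider.

    The input clock runs at f_num / f_den Hz. On every input clock edge the
    accumulator grows by f_den; when it reaches f_num, one 1-Hz tick is
    emitted and f_num is subtracted. Returns the 1-based clock indices at
    which ticks occur.
    """
    acc = 0
    ticks = []
    for n in range(1, num_clocks + 1):
        acc += f_den
        if acc >= f_num:
            acc -= f_num
            ticks.append(n)
    return ticks

F_NUM, F_DEN = 32_788_978, 1000      # 32788.978 Hz expressed as a ratio

# Simulate roughly 50 s of input clock edges.
ticks = one_second_ticks(50 * 32_789, F_NUM, F_DEN)
intervals = [b - a for a, b in zip([0] + ticks, ticks)]
print(len(ticks), sorted(set(intervals)))
```

In this model each 1-s interval spans either 32 788 or 32 789 input cycles, so the long-term average matches the input frequency while individual ticks carry up to one input period of timing error. A plain chain of 15 divide-by-2 stages, by contrast, always counts exactly $2^{15} = 32\,768$ cycles.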

This design was tested at temperatures ranging from -25 °C to 85 °C using an ECS crystal and up to 125 °C using a high-temperature ECX crystal. This is the widest tested temperature range reported for nW XOs. Because the frequency deviation is dominated by the crystal itself and our tested temperature ranges are the widest, the frequency deviations over temperature are higher than those of the other works in Table I. Calibration is not required for the proposed design.

#### E. Reliability Tests

In the proposed design, to reduce the crystal loss, there are only parasitic capacitances at the two nodes of the crystal, and the load capacitance  $C_L$  is estimated to be less than 1.9 pF. Manufacturers have recently started to optimize crystals for low-power IoT applications, reducing  $C_L$  to 3 pF [17], and several nW XO designs [3], [7] have also relied only on

TABLE II Nominal Frequency Across Different Parts With Different Crystals

| Chip# | Crystal#* | f<sub>out</sub> [kHz] | Δf** [ppm] |
|-------|-----------|------------|------------|
| 1     | 1         | 32.788978  | 4.3        |
| 2     | 2         | 32.788957  | 3.7        |
| 3     | 3         | 32.788563  | -8.3       |
| 4     | 4         | 32.788927  | 2.8        |
| 5     | 5         | 32.789083  | 7.5        |
| 6     | 6         | 32.788788  | -1.5       |
| 7     | 7         | 32.788424  | -12.6      |
| 8     | 8         | 32.788860  | 0.7        |
| 9     | 9         | 32.788693  | -4.4       |
| 10    | 10        | 32.789089  | 7.7        |

TABLE III Nominal Frequency Across Different Parts With One Crystal

| Chip# | Crystal#* | f <sub>out</sub> [kHz] | Δf** [ppm] |
|-------|-----------|------------------------|------------|
| 1     | 7         | 32.788402              | -0.13      |
| 2     | 7         | 32.788433              | 0.81       |
| 3     | 7         | 32.788422              | 0.48       |
| 4     | 7         | 32.788416              | 0.30       |
| 5     | 7         | 32.788492              | 2.61       |
| 6     | 7         | 32.788373              | -1.02      |
| 7     | 7         | 32.788424              | 0.54       |
| 8     | 7         | 32.788407              | 0.02       |
| 9     | 7         | 32.788333              | -2.24      |
| 10    | 7         | 32.788361              | -1.38      |

\*ECS-2X6-FLX crystals.

\*\*Frequency variation from the average of 10 output frequencies.

parasitic capacitance at two nodes of the crystals. However, there are still potential concerns about reliability with such low load capacitance.

Table II shows the measured nominal frequency across different parts in COB packages with different crystals. The frequency variation due to the COB package is evaluated to be within  $\pm 3$  ppm by re-soldering one crystal to all ten COBs, and Table III presents the measured nominal frequency across different parts in COB packages with the same crystal. This experiment shows that, for the proposed design, the crystal-to-crystal frequency variation, rather than the parasitics of the COB packages, dominates the variation in the nominal output frequency. Compared with designs that use the standard load capacitance, the oscillation frequency in the proposed design is more susceptible to PCB parasitics, and the absence of load capacitors also causes a frequency deviation from the conventional 32.768 kHz. For conventional packages and PCBs, we suggest that the proposed design use a one-point calibration for every board to account for the board-dependent frequency drift.
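The $\Delta f$ columns of Tables II and III follow directly from the footnote definition (deviation of each output frequency from the average of the ten). The short Python sketch below recomputes them from the tabulated frequencies. The helper name is hypothetical, and the frequency in row 10 of Table II is read as 32.789089 kHz.

```python
def ppm_from_mean(freqs_khz):
    """Deviation of each frequency from the mean of the list, in ppm."""
    mean = sum(freqs_khz) / len(freqs_khz)
    return [(f - mean) / mean * 1e6 for f in freqs_khz]

# Output frequencies in kHz, copied from Tables II and III.
TABLE_II_KHZ = [32.788978, 32.788957, 32.788563, 32.788927, 32.789083,
                32.788788, 32.788424, 32.788860, 32.788693, 32.789089]
TABLE_III_KHZ = [32.788402, 32.788433, 32.788422, 32.788416, 32.788492,
                 32.788373, 32.788424, 32.788407, 32.788333, 32.788361]

for name, freqs in (("Table II", TABLE_II_KHZ), ("Table III", TABLE_III_KHZ)):
    devs = ppm_from_mean(freqs)
    spread = max(devs) - min(devs)
    print(name, [round(d, 2) for d in devs], f"peak-to-peak = {spread:.1f} ppm")
```

The peak-to-peak spread is about 20 ppm with ten different crystals and about 5 ppm with a single crystal moved across the ten chips, which is the comparison behind the statement that crystal-to-crystal variation dominates.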

To evaluate the reliability of the proposed XO, five types of interference were applied to the COB (transparent epoxy). Fig. 31 shows the measured frequency variation across time in the presence of this interference. The frequency data were measured with a frequency counter with a gate time of 100 ms, and each interference was added at the 20-s time point and removed at 50 s. As a reference, if noise or interference causes one pulse to be swallowed or added to the output clock during a gate time of 100 ms, the measured output frequency during this



Fig. 31. Measured frequency variations to interferences.

gate time would show a deviation of about  $1/3279 \approx 305$  ppm. Fig. 31 shows that in the presence of interference, the proposed XO works reliably, and the frequency deviation is less than 2 ppm at a gate time of 100 ms.
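The 305-ppm reference point is simply one cycle divided by the number of cycles counted in one gate time. A minimal Python sketch of that arithmetic, with a hypothetical helper name and a nominal output frequency of 32.788 kHz, is:

```python
def miscount_error_ppm(f_out_hz, gate_time_s, miscounted_pulses=1):
    """Frequency error seen by a counter when pulses are swallowed or added."""
    expected_cycles = f_out_hz * gate_time_s   # about 3279 cycles in 100 ms
    return miscounted_pulses / expected_cycles * 1e6

print(round(miscount_error_ppm(32788.0, 0.1)))  # 305
```

Because a single miscounted pulse would appear as a step of roughly 305 ppm, the measured deviations below 2 ppm indicate that no pulses were lost or added during the interference tests.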

Clicking the COB introduces mechanical vibration, and the corresponding frequency variation is less than 0.5 ppm. A mobile phone with WiFi on and streaming video was moved to within 2 cm of the chip to test its response to RF signals, and there were frequency variations of around 0.5 ppm with a time constant of seconds. Sixty-Hertz noise from power supplies is another important interference source. In our measurement, when a power brick for a laptop was placed 2 cm above the chip, the proposed XO output frequency showed visible glitches (<1 ppm). By directly touching the crystal with a finger, both mechanical and electrical interferences were introduced to the XO. Each large glitch (in red) corresponds to one touch. As an ultralow-power design, there are many blocks with bias currents less than 10 pA in the proposed XO. For some SoCs targeting edge devices with small form factors, a conventional package is not applicable, and it is hard to achieve light shielding. Hence, light sensitivity would be a design concern for such ultralow-power designs. In our measurement setup, the chip was exposed to an ambient (indoor) light intensity of about 500 lux, and an LED lamp was turned on to apply 6k lux light at 20 s and turned off at 50 s. The frequency variation due to this change in light intensity is  $\sim 1.5$  ppm.

## V. CONCLUSION

We presented a 0.51-nW 32-kHz XO. By limiting the oscillation amplitude to 135 mV and reducing the load capacitance to less than 2 pF, we reduce the crystal loss to about 55 pW. By performing injections at 4 kHz, the switching loss is greatly reduced, and the power of this design without frequency divider is 0.37 nW. The circuit works across temperatures ranging from -25 °C to 125 °C.

With the proposed HERO, less noise is injected into the crystal on average, and the crystal runs freely for seven periods between injections. It achieves a 2-ppb Allan deviation floor.

This prototype design uses two power supplies,  $V_{DD}$  and  $V_{DDL}$ . For future work, an ultralow-power low-dropout regulator or a 3-to-1 switched-capacitor dc–dc converter can be implemented to allow this XO to work with a single power supply. Furthermore, it can be seen from Fig. 20 that the slicer and IREF consume 46% of the total power, which is more than four times the power needed to compensate for the crystal loss. We expect new architectures in the future to further reduce the power for phase extraction and timing generation, pushing the total power consumption closer to the fundamental limit.

## ACKNOWLEDGMENT

The authors would like to thank the TSMC University Shuttle Program for chip fabrication and Stefano Pietri and John Pigott from NXP for valuable technical discussions.

## REFERENCES

- [1] D. Yoon, T. Jang, D. Sylvester, and D. Blaauw, "A 5.58 nW crystal oscillator using pulsed driver for real-time clocks," *IEEE J. Solid-State Circuits*, vol. 51, no. 2, pp. 509–522, Feb. 2016.
- [2] H. Esmaeelzadeh and S. Pamarti, "18.4 A 0.55nW/0.5 V 32 kHz crystal oscillator based on a DC-only sustaining amplifier for IoT," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 300–301.
- [3] K.-M. Kim, S. Kim, K.-S. Choi, H. Jung, J. Ko, and S.-G. Lee, "A sub-nW single-supply 32-kHz sub-harmonic pulse injection crystal oscillator," *IEEE J. Solid-State Circuits*, vol. 56, no. 6, pp. 1849–1858, Jun. 2021.
- [4] M. Siniscalchi, F. Silveira, and C. Galup-Montoro, "Ultra-low-voltage CMOS crystal oscillators," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 6, pp. 1846–1856, Jun. 2020.
- [5] Y. Zeng, T. Jang, Q. Dong, M. Saligane, D. Sylvester, and D. Blaauw, "A 1.7nW PLL-assisted current injected 32KHz crystal oscillator for IoT," in *Proc. Symp. VLSI Circuits*, Jun. 2017, pp. C68–C69.
- [6] A. Shrivastava, D. A. Kamakshi, and B. H. Calhoun, "A 1.5 nW, 32.768 kHz XTAL oscillator operational from a 0.3 V supply," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 686–696, Mar. 2016.
- [7] K.-J. Hsiao, "17.7 A 1.89nW/0.15 V self-charged XO for real-time clock generation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 298–299.
- [8] D. Yoon, D. Sylvester, and D. Blaauw, "A 5.58nW 32.768 kHz DLL-assisted XO for real-time clocks in wireless sensing applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 366–368.
- [9] H. Esmaeelzadeh and S. Pamarti, "A sub-nW 32-kHz crystal oscillator architecture based on a DC-only sustaining amplifier," *IEEE J. Solid-State Circuits*, vol. 54, no. 12, pp. 3247–3256, Dec. 2019.
- [10] L. Xu, T. Jang, J. Lim, K. Choo, D. Blaauw, and D. Sylvester, "3.3 A 0.51nW 32 kHz crystal oscillator achieving 2ppb allan deviation floor using high-energy-to-noise-ratio pulse injection," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 62–64.
- [11] E. A. Vittoz, M. G. R. Degrauwe, and S. Bitz, "High-performance crystal oscillator circuits: Theory and application," *IEEE J. Solid-State Circuits*, vol. 23, no. 3, pp. 774–783, Jun. 1988.
- [12] B. Razavi, "A study of phase noise in CMOS oscillators," *IEEE J. Solid-State Circuits*, vol. 31, no. 3, pp. 331–343, Mar. 1996.
- [13] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," *IEEE J. Solid-State Circuits*, vol. 33, no. 2, pp. 179–194, Feb. 1998.
- [14] A. Arnaud, R. Fiorelli, and C. Galup-Montoro, "Nanowatt, sub-nS OTAs, with sub-10-mV input offset, using series-parallel current mirrors," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2009–2018, Sep. 2006.
- [15] M. O'Halloran and R. Sarpeshkar, "An analog storage cell with 5e<sup>-</sup>/sec leakage," in *Proc. IEEE Int. Symp. Circuits Syst.*, 2006, pp. 557–560.

- [16] T. Jang, M. Choi, S. Jeong, S. Bang, D. Sylvester, and D. Blaauw, "5.8 A 4.7nW 13.8ppm/°C self-biased wakeup timer using a switched-resistor scheme," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 102–103.
- [17] ABRACON. 32.768 kHz IoT Optimized SMD Crystal. Accessed: Dec. 16, 2020. [Online]. Available: https://abracon.com/Resonators/ABS07W.pdf



**Li Xu** (Graduate Student Member, IEEE) received the B.Eng. degree in automation from Tongji University, Shanghai, China, in 2009, and the M.S. degree in electrical and computer engineering from Northeastern University, Boston, MA, USA, in 2016. He is currently pursuing the Ph.D. degree with the University of Michigan, Ann Arbor, MI, USA.

From 2009 to 2011, he was an IC Design Engineer with Ricoh Electronic Devices Shanghai Co., Ltd., Shanghai, where he worked on LDO and dc/dc converter projects. During the summer of 2015, he was a Design Intern with Linear Technology Corporation, Colorado Springs, CO, USA. During the summer of 2020, he was a Research Intern with NVIDIA Corporation, Santa Clara, CA, USA. His current research interest is energy-efficient mixed-signal circuit design.



**Taekwang Jang** (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from KAIST, Daejeon, South Korea, in 2006 and 2008, respectively, and the Ph.D. degree from the University of Michigan, Ann Arbor, MI, USA, in 2017; his dissertation was titled "Circuit and System Designs for Millimeter-Scale IoT and Wireless Neural Recording."

From 2008 to 2013, he worked at Samsung Electronics Company Ltd., Yongin, South Korea, focusing on mixed-signal circuit design, including analog and all-digital phase-locked loops for communication systems and mobile processors. After working as a Post-Doctoral Research Fellow at the University of Michigan, he joined ETH Zürich, Zürich, Switzerland, in 2018 as an Assistant Professor and is leading the Energy-Efficient Circuits and IoT Systems group. His research interests include ultralow power systems, biomedical circuits, frequency synthesizers, and data converters.

Dr. Jang was a co-recipient of the IEEE Transactions on Circuits and Systems 2009 Guillemin-Cauer Best Paper Award. He is also a member of the Competence Center for Rehabilitation Engineering and Science and the Chair of the IEEE Solid-State Circuits Society, Switzerland chapter.



**Jongyup Lim** (Graduate Student Member, IEEE) received the B.S. degree (*summa cum laude*) in electrical and computer engineering from Seoul National University, Seoul, South Korea, in 2016, and the M.S. degree in electrical and computer engineering from the University of Michigan, Ann Arbor, MI, USA, in 2018, where he is currently pursuing the Ph.D. degree.

His research interests include wireless neural recording systems, energy-efficient deep learning hardware, clock generation, and ultralow-power sensor node design.

Dr. Lim was a recipient of the Doctoral Fellowship from Kwanjeong Educational Foundation in South Korea.



**Kyojin David Choo** (Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, South Korea, in 2007 and 2009, respectively, and the Ph.D. degree from the University of Michigan, Ann Arbor, MI, USA, in 2018.

From 2009 to 2013, he was with the Image Sensor Development Team of Samsung Electronics, South Korea, where he designed signal readout chains for mobile/DSLR image sensors. He is currently a Post-Doctoral Research Fellow with the University of Michigan. During his Ph.D., he interned with Apple, Cupertino, CA, USA, and was a consultant to several companies including Sony Electronics, San Jose, CA. He holds 17 U.S. patents and his research interests include charge-domain circuits, sensor interfaces, energy converters, high-speed links/timing generators, and millimeter-scale integrated systems.



**David Blaauw** (Fellow, IEEE) received the B.S. degree in physics and computer science from Duke University, Durham, NC, USA, in 1986, and the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 1991.

Until August 2001, he worked with Motorola, Inc., Austin, TX, USA, where he was the manager of the High Performance Design Technology group and won the Motorola Innovation award. Since August 2001, he has been a Faculty Member of the University of Michigan, where he is the Kensall D. Wise Collegiate Professor of EECS. He has published over 600 papers and holds 65 patents. He has researched ultralow-power wireless sensors using subthreshold operation and low-power analog circuit techniques for millimeter systems. This research was recognized by the MIT Technology Review as "one of the year's most significant innovations." His research group introduced so-called near-threshold computing, which has become a common concept in semiconductor design. Most recently, he has pursued research in cognitive computing using analog, in-memory neural networks for edge devices and genomics for precision health.

Dr. Blaauw has received numerous best paper awards. He was General Chair of the IEEE International Symposium on Low Power and a member of the IEEE International Solid-State Circuits Conference (ISSCC) analog program subcommittee. He received the 2016 SIA-SRC faculty award for lifetime research contributions to the U.S. semiconductor industry. He is the Director of the Michigan Integrated Circuits Lab.



**Dennis Sylvester** (Fellow, IEEE) received the Ph.D. degree in electrical engineering from the University of California, Berkeley, CA, USA, in 1999.

He is the Edward S. Davidson Collegiate Professor of Electrical and Computer Engineering at the University of Michigan, Ann Arbor, MI, USA. He held research staff positions at Synopsys, Mountain View, CA, USA, and Hewlett-Packard Laboratories, Palo Alto, CA, USA, as well as visiting professorships at the National University of Singapore, Singapore, and Nanyang Technological University, Singapore.

His main research interests are in the design of miniaturized ultralow power microsystems, touching on analog, mixed-signal, and digital circuits. He has published over 500 articles and holds more than 50 U.S. patents in these areas. His research has been commercialized via three major venture capital-funded startup companies: Ambiq Micro, Cubeworks, and Mythic.

Dr. Sylvester is currently a member of the Administrative Committee for IEEE Solid-State Circuits Society, an Associate Editor for IEEE JOURNAL OF SOLID-STATE CIRCUITS, and was previously an IEEE Solid-State Circuits Society Distinguished Lecturer. He has received 14 best paper awards and nominations and was named a Top Contributing Author at ISSCC and most prolific author at IEEE Symposium on Very Large Scale Integration (VLSI) Circuits.