# Alignment-Independent Chip-to-Chip Communication for Sensor Applications Using Passive Capacitive Signaling

Yu-Shiang Lin, Member, IEEE, Dennis Sylvester, Senior Member, IEEE, and David Blaauw, Senior Member, IEEE

Abstract—We propose a capacitive-coupling-based method for sensor data retrieval that can be easily integrated with miniature sensor nodes of sub-mm scale. To enable passive operation of the sensor chip, the data retrieval chip simultaneously sends power to and receives signals from the sensor chip. An alignment detection and pad reconfiguration mechanism is implemented to allow convenient read-out without precise positioning. A test chip was fabricated in 0.13 $\mu$m CMOS technology. Silicon measurement results demonstrate that the achievable data rate varies by less than 15% when the sensor chip is randomly dropped on the data retrieval chip, regardless of alignment.

Index Terms—Capacitive coupling, proximity communication.

## I. INTRODUCTION

Miniature self-sustaining sensor nodes have become a viable option with silicon technology scaling. Such a system can be easily attached to, or implanted into, various objects for applications such as periodic sensing and recording of temperature or bio-chemical data. With energy minimization techniques [1]–[3] and aggressive power gating, these systems can potentially operate using a micro-fabricated battery with comparable form factor over an extended period of time [4]. To maintain the form factor for such systems, data read-out requires low hardware overhead. Additionally, power consumption is a critical factor during read-out since it determines the size of the power sources. Passive radio-frequency identification (RFID) transponder techniques can be used to eliminate read-out energy dissipation for the sensor, but this generally requires an external coil on a centimeter scale, significantly limiting the application space [5].
Manuscript received August 25, 2008; revised November 11, 2008. Current version published March 25, 2009. This work was supported by the National Science Foundation (NSF) Engineering Research Center for Wireless Integrated Microsystems (WIMS), and by a Mediatek International Student Fellowship. The authors are with the University of Michigan, Ann Arbor, MI 48105 USA (e-mail: yushiang@umich.edu; dmcs@umich.edu; blaauw@umich.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2009.2014024

Near-field pulse signaling through inductive coupling has been reported to achieve high bandwidth using integrated inductors while also being energy efficient [6], [7]. However, the power required for sending data back from the sensor chip still needs to be supplied externally. Capacitive coupling is another favored candidate for near-field communication due to its high bandwidth and low energy consumption, with less than 0.1 pJ/b reported [8], [9]. Simultaneous data and power transmission has also been successfully tested for stacked silicon chips [10]. When chip stacking is applicable, this is an advantageous solution since all hardware can be integrated into a silicon chip. On the other hand, the signal strength of capacitive coupling is inversely proportional to the distance between the pads, which makes the robustness of such a scheme very susceptible to misalignment. Pad alignment of about 3 $\mu$m, achieved by markings on the edge of a scribe line, was reported [11]. Vernier bar patterns were also proposed to electrically detect the alignment between chips so that alignment errors down to 1.4 $\mu$m can be detected [12]. The accuracy of alignment can be further improved by dividing each transmit plate into smaller microplates so that mechanical misalignment of up to $\pm 25~\mu$m can be compensated [13].
Finer-grained alignment information can be provided by a capacitive sensor and alignment circuits. As reported in [14], the analog output of the alignment circuits is able to differentiate alignment errors down to 0.1 $\mu$m.

In this work, we propose a capacitive-coupling-based method where the communication module is fully integrated with the sensor node on a sub-mm scale [15]. The goal is to provide a convenient read-out mechanism without the aid of an optical microscope or positioning by micromanipulator. We use the terms sensor chip (SC) and data retrieval (DR) chip for the concepts referred to as the transponder and interrogator in RFID systems. By dividing the data retrieval pads into microplates, individual microplates can be grouped together to establish power and signal channels once the alignment is known.

In Section II, the geometric design issues associated with capacitive coupling are discussed. The proposed system architecture is shown in Section III along with the circuit blocks. Several design aspects are highlighted in Section IV with silicon measurement results. Section V concludes the work.

## II. GEOMETRY OPTIMIZATION

Since our goal is to achieve chip-to-chip communication without fine tuning the alignment, the pad pattern is designed considering the electric field in the *worst case* due to misalignment. Thus, we first seek the *worst case* scenario when stacking two chips face-to-face. There are two assumptions in the following analysis:

1) The data retrieval chip is composed of a large array of square pads so that the sensor chip is completely covered by the pad array.

2) The distance between the coupling pads is not a function of location, i.e., the thickness of the passivation layer is fixed.

The first assumption requires that the data retrieval array is large enough that the sensor chip can be easily dropped on top of it while the entire sensor chip is still within the boundary of the receiver array.
By designing the receiver array two times larger than the sensor chip, this assumption can be satisfied without fine positioning by micromanipulator. The second assumption relies on the uniformity of the final passivation, which can be affected by many issues such as dust on the surface of the chip. In general, this is not a deterministic process from the circuit designer's point of view. Therefore, it is reasonable to make this assumption at design time.

Both the power and signal channels must be established during communication. For the sensor chip, the power pads are allocated as much area as possible to maximize the charge that can be harvested. On the other hand, signal pad sizing presents a tradeoff between capacitive loading and coupling factor. A larger signal pad means greater energy consumption at each transition, while reducing the size decreases the voltage sensed by the data retrieval pad given fixed parasitic capacitances.

For the data retrieval array, the pads are placed as close together as possible so that the uncovered area is minimized. The spacing between pads is typically constrained by two DRC (Design Rule Check) rules in advanced VLSI technologies: the metal density allowed in the process and the minimum spacing between top metal layers. In the following analysis, the separation between data retrieval pads is fixed at 5 $\mu$m according to the CMOS process we use. Ideally, dividing the pads into smaller dimensions allows a finer configuration. In reality, however, the minimum size of the pads is determined by the area of the functional blocks associated with each pad.

### A. Sizing of the Sensor Pad

Fig. 1 illustrates the worst case condition given that all the signal pads are square in this work. $\theta$ is defined as the offset angle between the two chips.
Here $W_{\rm RX}$, $W_{\rm TX}$, and $W_{\rm sep}$ are the width of the data retrieval pads, the width of the sensor signal pad, and the separation between the data retrieval pads, respectively. Since the pads are all squares, by symmetry $\theta$ only needs to be considered over $[0,\pi/4]$. In this case $W_{\rm RX} \le \sqrt{2}W_{\rm TX}$, and polygon ABCD represents the area of interest, which is used to calculate the coupling capacitance of the pads. Line segments r1 through r4 represent the lengths of the sides. The coupling capacitance is the sum of the parallel plate capacitance and the fringing capacitance:

$$C_c = C_{pp} \cdot (\text{Area of polygon ABCD}) + C_{fr,TX} \cdot (\overline{BCD}) + C_{fr,RX} \cdot (\overline{DAB}) \tag{1}$$

where $C_{pp}$ is the parallel plate capacitance per unit area, $C_{fr,RX}$ is the fringing capacitance per unit length of the data retrieval pads, and $C_{fr,TX}$ is the fringing capacitance per unit length of the sensor pads. Using trigonometric functions, r1 through r4 can be written as

$$r1 = \frac{W_{\text{TX}}}{2} \cdot \sec \theta - \frac{W_{\text{sep}}}{2} \cdot (1 - \tan \theta) \tag{2}$$

$$r2 = \frac{W_{\text{TX}}}{2} \cdot (1 - \tan \theta) - \frac{W_{\text{sep}}}{2} \cdot \sec \theta \tag{3}$$

$$r3 = \frac{W_{\text{TX}}}{2} \cdot \sec \theta - \frac{W_{\text{sep}}}{2} \cdot (1 + \tan \theta) \tag{4}$$

$$r4 = \frac{W_{\text{TX}}}{2} \cdot (1 + \tan \theta) - \frac{W_{\text{sep}}}{2} \cdot \sec \theta \tag{5}$$

Fig. 1. The relative position of the receiver array and sensor signal pad when $W_{\rm RX} \le \sqrt{2}W_{\rm TX}$.

Note that the above expressions are physically meaningful only when vertex C is still inside the data retrieval pad. In other words, they are valid when $\theta \le \theta_t$, where $\theta_t$ is the angle at which vertex B and vertex C overlap.
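As a numerical sanity check on (2)-(5), the following minimal sketch evaluates the four segment lengths and the quadrilateral area taken as $(r1 \cdot r3 + r2 \cdot r4)/2$, then compares it with the closed form $\frac{1}{4}W_{\rm TX}^2 + \frac{1}{4}W_{\rm sep}^2 - \frac{1}{2}W_{\rm TX} W_{\rm sep}\sec\theta$ that follows algebraically from those expressions. The function names and the example dimensions are illustrative assumptions, and the sketch does not check the validity condition $\theta \le \theta_t$.

```python
import math


def overlap_segments(w_tx, w_sep, theta):
    """Return (r1, r2, r3, r4) from Eqs. (2)-(5); theta is in radians."""
    sec = 1.0 / math.cos(theta)
    tan = math.tan(theta)
    r1 = w_tx / 2 * sec - w_sep / 2 * (1 - tan)
    r2 = w_tx / 2 * (1 - tan) - w_sep / 2 * sec
    r3 = w_tx / 2 * sec - w_sep / 2 * (1 + tan)
    r4 = w_tx / 2 * (1 + tan) - w_sep / 2 * sec
    return r1, r2, r3, r4


def area_from_segments(w_tx, w_sep, theta):
    """Polygon area computed directly from the four segment lengths."""
    r1, r2, r3, r4 = overlap_segments(w_tx, w_sep, theta)
    return 0.5 * (r1 * r3 + r2 * r4)


def area_closed_form(w_tx, w_sep, theta):
    """Simplified area: W_TX^2/4 + W_sep^2/4 - W_TX*W_sep*sec(theta)/2."""
    sec = 1.0 / math.cos(theta)
    return 0.25 * w_tx**2 + 0.25 * w_sep**2 - 0.5 * w_tx * w_sep * sec


if __name__ == "__main__":
    # Illustrative dimensions in micrometers (assumed for this example only).
    for theta in (0.0, 0.05, 0.1):
        direct = area_from_segments(150.0, 5.0, theta)
        closed = area_closed_form(150.0, 5.0, theta)
        print(f"theta={theta:.2f} rad  direct={direct:.3f}  closed={closed:.3f}")
```

Because $\sec\theta$ grows with $\theta$ on $[0, \pi/4]$, the closed form makes it easy to see that the overlap area shrinks as the offset angle increases within this regime.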
Combining (2) through (5), the area of polygon ABCD can be obtained and simplified as

$$\begin{aligned} \text{Area of polygon ABCD} &= \frac{1}{2}(r1 \cdot r3 + r2 \cdot r4) \\ &= \frac{1}{4}W_{\text{TX}}^2 + \frac{1}{4}W_{\text{sep}}^2 - \frac{1}{2}W_{\text{TX}} \cdot W_{\text{sep}} \cdot \sec \theta \end{aligned} \tag{6}$$

while the lengths of $\overline{BCD}$ and $\overline{DAB}$ are $W_{\mathrm{TX}} - W_{\mathrm{sep}} \cdot \sec \theta$ and $W_{\mathrm{TX}} \cdot \sec \theta - W_{\mathrm{sep}}$, respectively. As mentioned before, $W_{\rm sep}$ is designed to be small enough to maximize the coupled area. Thus, it is reasonable to assume that $C_{fr,RX}$ is negligible compared to $C_{fr,TX}$ because the electric field lines from sidewall $\overline{DAB}$ mostly terminate at the neighboring receiver pads instead of the sensor pad. $C_c$ can then be rewritten as

$$C_c = C_{pp} \cdot \left[ \frac{1}{4} W_{\text{TX}}^2 + \frac{1}{4} W_{\text{sep}}^2 - \frac{1}{2} W_{\text{TX}} \cdot W_{\text{sep}} \cdot \sec \theta \right] + C_{fr,TX} \cdot (W_{\text{TX}} - W_{\text{sep}} \cdot \sec \theta). \tag{7}$$

Fig. 2. Differential signaling scheme. Pad A (square with slant lines) together with all the other pads in light gray are used to recover the signal from the sensor.

Similar derivations can also be applied to other cases such as $\theta > \theta_t$ or $W_{\rm RX} > \sqrt{2}W_{\rm TX}$. Based on the analysis, the worst case occurs when $\theta = \pi/4$. With the aid of 3-D field solver tools [16], the relationship between $W_{\rm TX}$ and the coupling capacitance can be determined. In the technology used in this work, the minimum size of the data retrieval pad is 50 $\mu$m due to the active circuit area. With $W_{\rm RX}$ and $W_{\rm sep}$ being 50 $\mu$m and 5 $\mu$m, the simulation results with respect to different outer dimensions of $W_{\rm TX}$ are plotted. The coupling capacitance gradually increases until about 150 $\mu$m.
At this point the sensor pad is large enough to cover at least one data retrieval pad no matter where it is located. Further simulation results show that the difference between the coupling capacitance at different orientations is within 1%, suggesting that a consistently good coupling ratio can be achieved at $W_{\rm TX}=150~\mu{\rm m}$. To sum up, the sensor pads are chosen to be about three times larger than the receiver pad to maximize coupling in the worst case condition.

### B. Single-Ended Versus Differential Signaling

In the previous section, only a single pad was considered to transmit a signal from the sensor. On the other hand, the signal strength can be doubled by implementing differential signaling. Consider the diagram shown in Fig. 2, assuming that the dimensions of the pads are the same as given in Section II-A. The figure illustrates one of the situations where differential signaling is applied. In this scheme, both Pads A and B are required to amplify the differential signal from the sensor pads. Since the sensor chip can land in any orientation, 15 DR pads along with Pad B have to be routed into Pad A to make sure that signals from both sensor pads can be picked up by the DR pad.

In a simplified analysis, the coupled voltage from the sensor pad to the receiver pad is proportional to the ratio of the coupling capacitance $(C_{\rm couple})$ and the ground capacitance $(C_{\rm gnd})$, where $C_{\rm gnd}$ already includes the input capacitance of the amplifier. The ground potentials of the data retrieval chip and the sensor chip are assumed to be identical since they are strongly coupled.
For a single-ended signaling scheme, the coupling coefficient can be written as

$$C_{c,\text{single}} = \frac{V_{\text{couple}}}{V_{\text{tran}}} = \frac{C_{\text{couple}}}{C_{\text{gnd}} + C_{\text{couple}}} \tag{8}$$

where $V_{\rm couple}$ and $V_{\rm tran}$ are the coupled voltage and the transmitted amplitude, respectively. For a differential signaling scheme, the coupling coefficient is given by

$$C_{c,\text{diff}} = \sum_{i=1}^{2} \frac{C_{\text{couple},i}}{C_{\text{gnd},i} + C_{\text{couple},i} + N \cdot C_{\text{sw}} + C_{\text{wire}}} \tag{9}$$

where $C_{\rm sw}$ is the device loading of the switches that control the destination of the coupled signal, $N$ is the number of other pads the pad has to connect to, and $C_{\rm wire}$ denotes the extra wire loading due to the differential signaling scheme. Assuming $C_{\rm couple,1} \approx C_{\rm couple,2} = C_{\rm couple}$ and $C_{\rm gnd,1} \approx C_{\rm gnd,2} = C_{\rm gnd}$, the difference between $C_{c,\rm single}$ and $C_{c,\rm diff}$ is

$$\begin{aligned} C_{c,\text{single}} - C_{c,\text{diff}} &= \frac{C_{\text{couple}}}{C_{\text{gnd}} + C_{\text{couple}}} - \frac{2C_{\text{couple}}}{C_{\text{gnd}} + C_{\text{couple}} + N \cdot C_{\text{sw}} + C_{\text{wire}}} \\ &= \left(\frac{C_{\text{couple}}}{C_{\text{gnd}} + C_{\text{couple}}}\right) \cdot \left(\frac{N \cdot C_{\text{sw}} + C_{\text{wire}} - C_{\text{gnd}} - C_{\text{couple}}}{C_{\text{gnd}} + C_{\text{couple}} + N \cdot C_{\text{sw}} + C_{\text{wire}}}\right). \end{aligned} \tag{10}$$

In other words, the differential scheme is better than the single-ended scheme only when the sum of $N \cdot C_{\text{sw}}$ and $C_{\text{wire}}$ is smaller than the sum of $C_{\text{gnd}}$ and $C_{\text{couple}}$. $C_{\text{gnd}}$ and $C_{\text{couple}}$ can be estimated from the process and geometry or, more precisely, through RC extraction tools. For a DR pad that is 50 $\mu$m on a side, $C_{\rm gnd}$ is 40–50 fF if the signal and power routing underneath it are restricted to metal 3 or below. $C_{\rm sw}$ can generally be ignored if, for example, a transmission gate that is four times as large as the minimum sized transistor is used. $C_{\rm wire}$ can be estimated from the wire length. If each of the 15 extra connections requires 150 $\mu$m of minimum-width metal wiring, the total wire loading is 150 fF assuming isolated wires. Unless $C_{\text{couple}}$ is more than two times larger than $C_{\text{gnd}}$, the differential signaling scheme will not offer any advantage over its single-ended counterpart. In addition, complex wiring in the differential signaling scheme will force wires to be routed at higher levels of metal and will increase $C_{\mathrm{gnd}}$ as a result. Therefore, single-ended signaling is implemented in this work.

The dimensions of the pads used in the data retrieval chip and the sensor chip are summarized in Table I. Due to fabrication constraints, the actual footprints of the pads are slightly different from the designed values. For example, the DR pad size is reduced from 50 $\mu$m to 48 $\mu$m on a side to comply with metal density rules.

TABLE I. SUMMARY OF PAD DIMENSIONS

| | Pad size (μm) | Pad spacing (μm) | Number of pads |
|---------------------|--------------------|------------------|----------------|
| Sensor chip | Power: 225 by 225 | ~20 | Power: 2 |
| | Signal: 150 by 150 | | Signal: 1 |
| Data retrieval chip | 48 by 48 | 5 | 400 |

Fig. 3. System architecture for the proposed data retrieval mechanism.

Fig. 4. Data retrieval array showing 20 by 20 cells and controller.

## III. SYSTEM ARCHITECTURE

Fig. 3 shows the proposed system diagram for sensor data retrieval. The data retrieval chip is responsible for sending power to and recovering data from the sensor chip at the same time.
Since there is no common reference for both chips, two power channels are required to send AC power differentially. An AC to DC converter at the sensor chip side is used to harvest the supply voltage for the sensor. The clock signal is modulated onto the power signals and can be demodulated by the sensor chip, so no additional channel is needed for synchronization. This also helps to precisely control the sensing window of the receiver circuit for better noise rejection. A single signal channel is used to transmit data back to the data retrieval chip, as suggested in the previous section.

### A. Data Retrieval Circuits Design

While the sensor chip has three pads dedicated to individual channels, the data retrieval chip contains an array of 20 by 20 cells, each of which can be assigned as the signal channel or clustered into a power channel as needed (Fig. 4). Each cell is tied to a corresponding DR pad, which serves as a communication channel that is reconfigurable based on alignment information. Each DR cell performs one of the following three functions at any given time.

1) Alignment detection: Alignment information is transformed to a digital output and can be scanned out for post-processing.

2) Power transmission: The pad is driven by level converters with elevated amplitude to strengthen the signal that reaches the sensor pads.

3) Signal recovery: The capacitively coupled signal is first amplified and then decoded by the DR controller.

Fig. 5. Alignment detector. (a) Block diagram. (b) Operation waveform.

Fig. 6. Data retrieval with capacitive coupled input and periodic precharge to sensitize the amplifier.

After the sensor chip is dropped on top of the data retrieval chip, the alignment detector shown in Fig. 5(a) is used to determine the best configuration. The alignment detector is essentially a ring-oscillator-based capacitance-to-digital converter that translates the capacitive loading of each DR pad.
The ring oscillator converts the capacitance into frequency information represented by *RING\_CLK*. *RING\_CLK* then increments the synchronous counter during a given period of time when *ENABLE* is high (defined by *SYS\_CLK*). The operation waveform is shown in Fig. 5(b). To accommodate the different ring oscillator speeds across the DR array, a one-time zero-calibration method needs to be implemented (Section IV-B). Although the output has to be limited to 9 bits to fit in the limited area under each DR cell, the circuits can be operated in cyclic mode. This means that the alignment information is maintained even though the counter overflows and the carry-out information is discarded. We will revisit the alignment detection issue in Section IV-B to explain how useful information can be extracted efficiently for the whole data retrieval array.

For the power transmission drivers, traditional DCVS (differential cascode voltage switch) type level converters are used. Such level converters can easily operate at an output amplitude that is three times higher than the nominal supply voltage within the carrier frequency range of interest, which is tens of MHz. The clock signal is globally distributed to every cell and is locally inverted if an out-of-phase signal is required. In an effort to reduce parasitic capacitance for the DR pads, we restrict the routing layers to metal 3 and below. Uniform clock wire routing is achieved throughout the DR array by placing the clock drivers all on one side of each row. This provides a feasible routing scheme compared to an H-tree type clock network, at the expense of larger clock skew.

Fig. 6 shows the data retrieval mechanism. Two differential amplifiers are used to detect both the rising and falling transitions. The input node $(V_{\rm in})$ is precharged high before the clock goes low to sensitize the amplifiers.
Immediately after the clock fires, either $V_{lh}$ or $V_{hl}$ will be pulled down depending on the direction of the coupled signal. The high-to-low transition triggers the 400-to-1 AND tree gate that simultaneously monitors all DR pads and results in an UP/DN signal for the one-bit saturation counter that determines the data output. The difference between $V_{dc}$ and $V_{dc1}/V_{dc2}$ is designed to be 50 mV to mitigate input offset voltage and the impact of noise. The operation of the receiver is synchronized to ext\_clk. The signal transition only happens after the negative edge of ext\_clk and is latched at the positive edge. In this scheme, the preset signal is used both to precharge $V_{\rm in}$ and to enable the decoder to detect switching events. In other words, the impact of noise on the floating node $V_{\rm in}$ can be minimized by properly controlling the pulse width of preset. The pulse width of the preset signal and its delay from ext\_clk are both programmable through delay lines.

Fig. 7. AC to DC conversion circuits for sensor chip power harvesting.

### B. Sensor Chip Circuit Design

The main building block of the sensor chip is the AC to DC conversion circuit shown in Fig. 7. The AC coupled inputs $V_{\rm in}$ and $V_{\rm inn}$ are rectified into DC supply voltages by cascading voltage doublers. Each voltage doubler contains a full-bridge hybrid cross-coupled pMOS rectifier. Transistors md1 and md2 set the lowest voltage of $V_{n1}$ and $V_{n2}$ to $V_{dc1}$. After each input transition at $V_{\rm in}$ and $V_{\rm inn}$, $V_{dc2}$ is charged to a potential equal to $V_{dc1}+\Delta V_{\rm in}$ by the cross-coupled pMOS transistors md3 and md4, where $\Delta V_{\rm in}$ is the coupled amplitude for the sensor chip. Although replacing md1 and md2 with cross-coupled nMOS transistors is advantageous in reducing the turn-on voltage at the first few stages, it is not feasible for stages with higher voltage inputs.
The reason is that without a triple well or deep NWELL process, the body effect can eventually result in a large nMOS threshold voltage. At the output of the 10th stage, a voltage limiter prevents the supply voltage from going above the operating range. The voltage efficiency of the voltage doubler is 74% using the same definition as in [17].

The design of the voltage limiter is shown in Fig. 8(a). The general concept is similar to the mode selector in [17]. In this work, a shunt transistor m10 is used to discharge current from $V_{\rm in}$ (VDD10 in Fig. 7) to ground when $V_{\rm in}$ is above a certain voltage level. To help explain how the voltage is set in hardware, the open-loop voltage transfer curve in Fig. 8(b) is used. Node n2 will remain close to VSS before $V_{\rm in}$ exceeds $2\Delta V$ (where $\Delta V$ is the turn-on voltage of the diode-connected transistors m5 and m6). When $V_{\rm in}$ increases beyond $2\Delta V$, the excess voltage drop will occur mainly across R1, and thus the voltage on n2 begins to track the supply voltage. On the other hand, the voltage on n1 will be limited to $2\Delta V$ once the supply voltage is higher than this value. By comparing n1 and n2, the amplifier output n3 will begin to turn on m10 strongly when the supply voltage is greater than 1.6 V.

Since each voltage doubler stage is identical, intermediate voltage levels VDD1 through VDD10 are inherently generated. In this work, we use VDD4 (0.65 V) to supply a 4-bit LFSR circuit that generates a data stream with low power consumption; the stream is then up-converted to VDD10 to increase signal strength before transmission. A power-on-reset circuit is usually required to avoid the deadlock situation in which all the register outputs are zero. For the LFSR circuit used in this work to represent logic, this is relatively easy, since the situation can be avoided by using a NAND4 gate to force the LFSR to advance its state if it starts in the deadlock state.
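The deadlock-escape idea can be illustrated with a short behavioral model. This is a minimal sketch under stated assumptions: the feedback taps (bits 3 and 2, corresponding to the primitive polynomial $x^4 + x^3 + 1$) are chosen for illustration because the text does not specify them, and the all-zero check in software merely stands in for the gate-level escape logic. The function names are hypothetical.

```python
def lfsr4_step(state: int) -> int:
    """Advance a 4-bit Fibonacci LFSR by one clock.

    Taps on bits 3 and 2 (x^4 + x^3 + 1) are an assumption for illustration.
    """
    state &= 0xF
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    if state == 0:
        # All-zero state would otherwise repeat forever; force a 1 into the
        # register so the LFSR leaves the deadlock state on the next clock.
        feedback = 1
    return ((state << 1) | feedback) & 0xF


def cycle_length(start: int, limit: int = 16):
    """Number of clocks until `start` recurs, or None if it never does."""
    start &= 0xF
    state = start
    for steps in range(1, limit + 1):
        state = lfsr4_step(state)
        if state == start:
            return steps
    return None


if __name__ == "__main__":
    print("period from state 1:", cycle_length(1))
    print("state after escaping deadlock:", lfsr4_step(0))
```

With a maximal-length tap choice such as this one, every nonzero state lies on a single cycle of 15 states, so a register that powers up at zero joins the normal sequence after one clock.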
For clock synchronization, the system clock is amplitude modulated with carrier frequency $f_c$ using the same power channels. An envelope detector is used to demodulate the clock signal. The differential AC input signal is first rectified and then filtered by an RC low-pass filter. Since the input amplitude varies due to several factors such as the transmitting amplitude and the distance between pads, a level converter is required so that the demodulated clock is able to drive the logic blocks at 0.65 V. For robust level conversion of subthreshold input voltages, a single-stage comparator is implemented. In this circuit, VDD1 and VDD2 from the voltage doubler stages are used as the reference voltage and bias voltage for the comparator, respectively. In this way, as long as the rectified voltage is higher than VDD1, the demodulator is able to work properly.

Fig. 8. Voltage limiter. (a) Circuit diagram. (b) Open-loop voltage transfer curve.

Fig. 9. Chip die photo.

## IV. CHIP MEASUREMENT

### A. Test Chip

A test chip was fabricated in 0.13 $\mu$m CMOS technology. The die photo is shown in Fig. 9. The active die area consumed by the sensor chip is $0.014~\text{mm}^2$. The size of the data retrieval array is 1.1 mm $\times$ 1.1 mm, while the total size of the DR controller and clock generator is $0.08~\text{mm}^2$. During measurement, the data retrieval chip is packaged and mounted on a PCB. The sensor chip is diced to 0.5 mm by 0.5 mm and is manually dropped on top of the data retrieval array without precise positioning. Once the two chips are stacked, we first perform alignment detection and scan out the information to be externally processed by a PC. The PC matches the data to a known pattern and determines the channel to which each pad of the DR array should be assigned. Alternatively, the computations can also be performed on chip if an ALU (Arithmetic Logic Unit) is available.
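The cyclic counter operation described in Section III-A suggests how PC-side post-processing could recover alignment information from wrapped 9-bit counts. The following is a minimal sketch, not the authors' software: it assumes that extra capacitive loading slows the ring oscillator so that the count drops relative to the zero-calibration value, and that the true drop is smaller than $2^9$. All names are hypothetical.

```python
COUNTER_BITS = 9
MODULUS = 1 << COUNTER_BITS  # a 9-bit counter wraps at 512


def wrap(count: int) -> int:
    """Model the on-chip counter: the carry-out is discarded."""
    return count % MODULUS


def count_drop(zero_cal: int, loaded: int) -> int:
    """Drop in count relative to zero calibration, computed modulo 2^9.

    Correct even if either stored value overflowed, provided the true
    difference is in the range [0, 2^9).
    """
    return (zero_cal - loaded) % MODULUS


def alignment_map(zero_table, loaded_table):
    """Element-wise modular difference of two 2-D tables of stored counts."""
    return [
        [count_drop(z, l) for z, l in zip(zero_row, loaded_row)]
        for zero_row, loaded_row in zip(zero_table, loaded_table)
    ]


if __name__ == "__main__":
    # Hypothetical raw counts: the second pad sees extra loading.
    zero = [[wrap(700), wrap(1030)]]
    loaded = [[wrap(700), wrap(1000)]]
    print(alignment_map(zero, loaded))
```

Modular subtraction is what allows the carry-out to be discarded on chip: only the difference between two counts matters, and that difference survives the wrap-around.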
The data clock $f_{\rm data}$ is generated externally by a function generator and sent along with the decoded data to a PC-based logic analyzer to compute the BER (Bit Error Rate).

Fig. 10. Parasitic components for the system of two chips in a stack.

### B. Alignment Detection

We have seen that alignment information can be obtained by using the ring oscillators to extract the different coupling capacitances seen by each DR pad. To reduce the conversion time, we would like to run as many alignment detectors in parallel as possible. However, activating all alignment detectors at the same time will yield results that do not contain any alignment information. This can be explained by Fig. 10, which shows the parasitic components of the system when two chips are put in a stack. For DR pads P1 through P5, the parasitic capacitors include coupling capacitors $C_{c1}$ through $C_{c5}$, ground capacitors $C_{g1}$ through $C_{g5}$, and capacitors $C_{p1}$ through $C_{p4}$ that exist between pads. If all the pads oscillate simultaneously, the coupling capacitances will be blocked from the AC ground and therefore the location of the sensor chip will not have any impact on the alignment detectors. In addition, since the impedance of $C_{p1}$ through $C_{p4}$ is low at high frequency, the whole system will oscillate at the same frequency. To solve this problem, at least one neighboring pad should be grounded for any given oscillating pad. For example, P2 and P4 are grounded when P1, P3, and P5 are running to provide a nearby AC return path to ground.

From this analysis, we can develop the alignment detection algorithm in a systematic way (Fig. 11). The DR array is first divided into four quadrants and only one quadrant is activated at a time. By repeating the capacitance-to-digital conversion four times, the results can be merged into a two-dimensional table.

Fig. 11. Procedures for alignment detection and pad reconfiguration.
The table represents a set of zero-calibration values for the specific data retrieval chip. The same procedure is repeated every time the sensor is dropped on top of the DR array to generate another 2-D table that represents the actual alignment. The 2-D contour plot shown on the bottom right of Fig. 11 can be obtained by simply subtracting the values of the two 2-D tables. Each pixel of the plot indicates the excess coupling capacitance due to the presence of the sensor chip. From the plot, both the outline of the sensor chip and the positions of the power pads and signal pad can be clearly seen. With the digitized alignment information, the clusters for the power pads and signal pad can be computed by comparing the results with a known pattern derived from the chip geometry. As a result, the channels for power transmission and signal reception can be identified and reconfigured properly every time, regardless of the position and orientation of the sensor chip.

### C. Measurement Results

Measured waveforms of the test chip are shown in Fig. 12. At a clock frequency of 1.1 MHz, the decoded output shows a data sequence that repeats every 15 cycles. We define the achievable operating frequency (or data rate, since there is only one serial data bit) of this system as the frequency at which no errors occur in $10^9$ cycles. The achievable data rate is measured with different transmitting amplitudes $(A_{\rm in})$ and carrier frequencies $f_c$. The results are shown in Fig. 13. I/O devices are used for power transmission, so $A_{\rm in}$ can be as high as 3.3 V in this 0.13 $\mu$m technology. The system starts successfully receiving the data sequence with a BER of less than $10^{-9}$ when $A_{\rm in}$ exceeds 1.8 V. The estimated working distance is also shown on the second x-axis of the $f_{\rm data}$ plot. Based on measurement data, I/O devices would not be needed if the passivation thickness were reduced by 1/3 from its 5.6 $\mu$m original value (e.g., by further polishing). Increasing $A_{\rm in}$ monotonically increases the data rate as expected. At 3 V, a data rate as high as 2.5 Mbps can be achieved with $f_c$ of 216 MHz. However, it is observed that raising $f_c$ above 150 MHz in fact reduces $f_{\rm data}$. The reason is that at higher frequencies the efficiency of the voltage doublers starts to decrease. From simulation results, the voltage efficiency decreases by 7% if $f_c$ is increased from 100 MHz to 400 MHz. Since targeted data working sets for sensor nodes are on the order of kb [18], [19], the achievable data rate is sufficient for complete data retrieval on the millisecond timescale.

Fig. 12. Decoded data waveform showing pseudo random bit sequences up to 15 unrepeated cycles.

Fig. 13. Operating frequency versus transmitting amplitude and carrier frequency, with estimated working distance shown on the second x-axis.

Fig. 14. Energy consumption versus transmitting amplitude and carrier frequency.

Energy numbers for the test chip are shown in Fig. 14. It is clear that increasing $f_c$ penalizes overall energy consumption since the data rate does not scale well with carrier frequency $f_c$. In this measurement, the transmitting amplitude for minimal energy is around 2.8 V. If operated above the minimal energy point, the junction will be slightly forward-biased after each transition for the rectifier circuit shown in Fig. 7. Therefore, the charge that can be harvested begins to saturate, which results in lower rectifier efficiency. The lowest energy achieved by the proposed system is 2 nJ/bit.

Fig. 15(a) shows the BER with respect to the window size $(T_w)$, which is related to the modulated clock for power transmission. $T_w$ is defined as the period during which the output $\mathrm{clk}_{\rm mod}$ [Fig. 15(b)] remains at 0.
The window $T_w$ is required for clock synchronization, as the sensor chip needs to demodulate the clock signal and send back the data within this window. The demodulator's response time therefore sets a lower bound on $T_w$. From Fig. 15(a), the bathtub shape of the BER curve suggests that there is also an upper bound on $T_w$. The reason is that, over a given period of time, the charge that the sensor chip can harvest decreases as $T_w$ increases. In general, $T_w$ needs to be fine-tuned within a range of tens of nanoseconds to reach higher data rates. On the other hand, since data rates approaching 1 Mbps may be excessive for the application, the design requirement for $T_w$ can be relaxed by simply reducing the transmitted data rate.

Fig. 15. (a) $T_w$ versus BER. (b) Clock modulation circuit that defines $T_w$.

Fig. 16 shows the data rate versus BER for 10 random locations at which the sensor was dropped. The alignment 2-D contour plots (8 out of 10 locations) are also shown for the corresponding BER curves. Some regions yield a lower data rate, mainly because the electric field between the pads is not as strong as in the others. The results are distributed into two distinct regions of the plot; however, there is no clear correlation between the position and the achievable data rate. A nonuniform passivation surface may be one cause of the discrepancy. These results verify that the proposed system adapts to different locations and orientations without the need for precise positioning.

## V. CONCLUSION

In this work, we presented a near-field data retrieval system using capacitive coupling. To alleviate the problem of chip misalignment, an alignment detection and pad reconfiguration method was proposed. The data retrieval pad is divided into an array of micropads, and each micropad can be assigned to send power or receive data depending on the alignment information.
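The alignment detection step, in which a zero-calibration table is subtracted from the table measured with the sensor in place, can be illustrated with a short sketch. This is an assumption-laden illustration and not the authors' implementation: the function names, the fixed threshold, and the 4-connected flood fill used to group micropads are all hypothetical choices, and the final matching of clusters against the known pad geometry (to label power versus signal pads) is not shown.

```python
def excess_capacitance(calibration, measured):
    """Subtract the zero-calibration table from the measured table, pixel by pixel."""
    return [
        [m - c for m, c in zip(measured_row, calibration_row)]
        for measured_row, calibration_row in zip(measured, calibration)
    ]


def find_clusters(diff, threshold):
    """Group 4-connected micropads whose excess coupling exceeds threshold.

    Thresholding plus flood fill is an assumed stand-in for the pattern
    comparison described in the text.
    """
    rows, cols = len(diff), len(diff[0])
    seen = set()
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if diff[r][c] <= threshold or (r, c) in seen:
                continue
            stack, cluster = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                cluster.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and diff[ny][nx] > threshold:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            clusters.append(sorted(cluster))
    return clusters


# Made-up readings in arbitrary units, for illustration only.
calibration = [[10, 10, 10, 10, 10],
               [10, 11, 10, 10, 10],
               [10, 10, 10, 10, 10],
               [10, 10, 10, 10, 10]]
measured = [[10, 10, 10, 10, 10],
            [10, 16, 15, 10, 10],
            [10, 10, 10, 10, 14],
            [10, 10, 10, 10, 15]]

diff = excess_capacitance(calibration, measured)
print(find_clusters(diff, threshold=2))
```

Each returned cluster is a list of (row, column) micropad coordinates. In a full flow, these clusters would then be compared with the known pad pattern of the sensor chip before the micropads are assigned to power transmission or signal reception.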
Chip measurements show that data rates higher than 900 kbps can be achieved across ten random positioning tests. For small form factor sensor systems, this work provides the advantage of little hardware overhead and a flexible operating frequency that is not limited by the dimensions of passive components.

Fig. 16. Data rate versus BER for 10 random position tests.

#### REFERENCES

- [1] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw, "Performance and variability optimization strategies in a sub-200 mV, 3.5 pJ/inst, 11 nW subthreshold processor," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2007, pp. 152–153.
- [2] Y. Ramadass and A. Chandrakasan, "Minimum energy tracking loop with embedded DC-DC converter delivering voltages down to 250 mV in 65 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 64–587.
- [3] M.-E. Hwang, A. Raychowdhury, K. Kim, and K. Roy, "A 85 mV 40 nW process-tolerant subthreshold 8 × 8 FIR filter in 130 nm technology," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2007, pp. 154–155.
- [4] Y.-S. Lin, S. Hanson, F. Albano, C. Tokunaga, R.-U. Haque, K. Wise, A. Sastry, D. Blaauw, and D. Sylvester, "Low-voltage circuit design for widespread sensing applications," in *Proc. IEEE Int. Symp. Circuits and Systems*, May 2008, pp. 2558–2561.
- [5] U. Karthaus and M. Fischer, "Fully integrated passive UHF RFID transponder IC with 16.7-μW minimum RF input power," *IEEE J. Solid-State Circuits*, vol. 38, no. 10, pp. 1602–1608, Oct. 2003.
- [6] N. Miura, D. Mizoguchi, M. Inoue, T. Sakurai, and T. Kuroda, "A 195-Gb/s 1.2-W inductive inter-chip wireless superconnect with transmit power control scheme for 3-D-stacked system in a package," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 23–34, Jan. 2006.
- [7] T. Kuroda, "Wireless proximity communications for 3-D system integration," in *IEEE Workshop on RFIT*, Dec. 2007, pp. 21–25.
- [8] R. Drost, R. Hopkins, and I. Sutherland, "Proximity communication," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sep. 2003, pp. 469–472.
- [9] A. Fazzi, R. Canegallo, L. Ciccarelli, L. Magagni, F. Natali, E. Jung, P. Rolandi, and R. Guerrieri, "3D capacitive interconnections with mono- and bidirectional capabilities," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 356–608.
- [10] E. Culurciello and A. G. Andreou, "Capacitive inter-chip data and power transfer for 3-D VLSI," *IEEE Trans. Circuits Syst. II*, vol. 53, no. 12, pp. 1348–1352, 2006.
- [11] K. Kanda, D. Antono, K. Ishida, H. Kawaguchi, T. Kuroda, and T. Sakurai, "1.27 Gb/s/pin 3 mW/pin wireless superconnect (WSC) interface scheme," in *IEEE ISSCC Dig. Tech. Papers*, 2003, vol. 1, pp. 186–487.
- [12] R. Drost, R. Hopkins, R. Ho, and I. Sutherland, "Proximity communication," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1529–1535, Sep. 2004.
- [13] R. Drost, R. Ho, D. Hopkins, and I. Sutherland, "Electronic alignment for proximity communication," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 144–518.
- [14] R. Canegallo, M. Mirandola, A. Fazzi, L. Magagni, R. Guerrieri, and K. Kaschlun, "Electrical measurement of alignment for 3-D stacked chips," *Proc. ESSCIRC*, pp. 347–350, Sep. 2005.
- [15] Y.-S. Lin, D. Sylvester, and D. Blaauw, "Sensor data retrieval using alignment independent capacitive signaling," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2008, pp. 66–67.
- [16] Raphael. Synopsys Inc., Mountain View, CA, 2005.
- [17] F. Kocer and M. Flynn, "An RF-powered, wireless CMOS temperature sensor," *IEEE Sensors J.*, vol. 6, no. 3, pp. 557–564, 2006.
- [18] L. Nazhandali, B. Zhai, A. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, T. Austin, and D. Blaauw, "Energy optimization of subthreshold-voltage sensor network processors," in *Proc. Int. Symp. Computer Architecture (ISCA)*, Jun. 2005, pp. 197–207.
- [19] M. Seok, S. Hanson, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "The Phoenix processor: A 30 pW platform for sensor applications," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2008, pp. 188–189.

**Yu-Shiang Lin** (S'04–M'08) received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2000 and 2002, respectively. In 2008, he received the Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor. Since 2008, he has been with IBM T. J. Watson Research Center, Yorktown Heights, NY, where he is a Postdoctoral Researcher. His research has focused on ultra-low-power VLSI circuit design.

**Dennis Sylvester** (S'95–M'00–SM'04) received the Ph.D. degree in electrical engineering from the University of California at Berkeley, where his dissertation research was recognized with the David J. Sakrison Memorial Prize as the most outstanding research in the UC Berkeley EECS Department. He is now an Associate Professor of electrical engineering and computer science at the University of Michigan, Ann Arbor. He previously held research staff positions in the Advanced Technology Group of Synopsys, Mountain View, CA, and Hewlett-Packard Laboratories, Palo Alto, CA, and a visiting professorship in electrical and computer engineering at the National University of Singapore. He has published numerous articles along with one book and several book chapters in his field of research, which includes low-power circuit design and design automation techniques, design-for-manufacturability, and interconnect modeling. He also serves as a consultant and technical advisory board member for several electronic design automation and semiconductor firms in these areas. Dr. Sylvester received a National Science Foundation (NSF) CAREER Award, the Beatrice Winner Award at ISSCC, an IBM Faculty Award, an SRC Inventor Recognition Award, and several best paper awards and nominations.
He is the recipient of the ACM SIGDA Outstanding New Faculty Award and the University of Michigan Henry Russel Award for distinguished scholarship. He has served on the technical program committee of numerous design automation and circuit design conferences, the steering committee of the ACM/IEEE International Symposium on Physical Design, and was general chair for the 2005 ACM/IEEE Workshop on Timing Issues in the Synthesis and Specification of Digital Systems (TAU). He is currently an Associate Editor for IEEE TRANSACTIONS ON CAD and previously served as Associate Editor for IEEE TRANSACTIONS ON VLSI SYSTEMS. He is a member of ACM and Eta Kappa Nu.

**David Blaauw** (M'94–SM'07) received the B.S. degree in physics and computer science from Duke University, Durham, NC, in 1986, and the Ph.D. degree in computer science from the University of Illinois, Urbana, in 1991. Until August 2001, he worked for Motorola, Inc., Austin, TX, where he was the manager of the High Performance Design Technology group. Since August 2001, he has been on the faculty at the University of Michigan, Ann Arbor, where he is a Professor. His work has focused on VLSI design with particular emphasis on ultra-low-power and high-performance design. Dr. Blaauw was the Technical Program Chair and General Chair for the International Symposium on Low Power Electronics and Design and was the Technical Program Co-Chair and a member of the Executive Committee of the ACM/IEEE Design Automation Conference. He is currently a member of the ISSCC Technical Program Committee.