Adaptive MRAM Write and Read with MTJ Variation Monitor

Shaodi Wang, Student Member, IEEE, Hochul Lee, Student Member, IEEE, Cecile Grezes, Member, IEEE, Pedram Khalili Amiri, Member, IEEE, Kang L. Wang, Fellow, IEEE, and Puneet Gupta, Senior Member, IEEE

Abstract—Temperature and wafer-level process variations significantly degrade the operation efficiency of spin-transfer torque random access memory (STT-MRAM) and magnetoelectric random access memory (MeRAM), and they exacerbate write and read reliability issues. We propose adaptive write and read schemes for highly efficient STT-MRAM and MeRAM programming and sensing that optimally select write and read pulses to overcome process and temperature variation. With adaptive write, the write latency of STT-MRAM and MeRAM caches is reduced by up to 17% and 59% respectively, and application run time is improved by up to 41%. With adaptive read, the sensing margin is improved by 1.4X while read disturbance remains correctable by error-correcting code (ECC). To further mitigate the impact of read disturbance on the memory system, an additional adaptive read scheme can dynamically lower the read voltage according to the output of the proposed monitor. It can extend memory service time by half a year to one year and reduce read-disturbance-induced memory failure by 59% to 84%. To support these schemes, we also propose, design, and evaluate a low-cost MTJ-based variation monitor, which precisely senses process and temperature variation. The monitor is over 10X faster, 5X more energy-efficient, and 20X smaller than conventional thermal monitors of similar accuracy.

Index Terms—MeRAM, STT-MRAM, adaptive write, adaptive read, thermal monitor, process variation, temperature variation, thermal activation, read disturbance, sensing margin

## 1 Introduction

Spin-transfer torque magnetoresistive random access memory (STT-MRAM) and magnetoelectric random access memory (MeRAM) are promising non-volatile memory technologies. STT-MRAM is built with STT magnetic tunnel junctions (STT-MTJs) [1, 2]. It provides high endurance and fast programming and access times, and it has been identified as a possible replacement for current memory technologies such as static RAM (SRAM) cache [3, 4] and dynamic RAM (DRAM) memory [5]. MeRAM, built with voltage-controlled MTJs (VC-MTJs) [6–9], is switched by the voltage-controlled magnetic anisotropy (VCMA) effect and promises higher programming speed, lower programming energy, and higher memory density [10, 11].

However, reliability issues, including write errors, read disturbance, and sensing errors, are the main challenges for both STT-MRAM and MeRAM. In an MRAM write operation, thermal fluctuation can cause a write error. To reduce the write error rate (WER) of STT-MRAM, the write pulse amplitude and duration are traditionally increased, but as a trade-off, write energy increases, memory density decreases due to larger access transistors, and write latency increases. For MeRAM, no existing technique avoids write errors [12]. As for read disturbance, an STT-MTJ may falsely switch during a read operation due to thermal activation, but MeRAM is free from this problem because its read current direction is opposite to the write current, which strengthens the VC-MTJ’s thermal stability. The high-to-low resistance difference in an MTJ is quantified by tunnel magnetoresistance (TMR, defined as \((R_H - R_L)/R_L\)), and both STT-MRAM and MeRAM have low TMR, leading to a narrow sensing margin and possible read errors.

Process and temperature variations further exacerbate these problems [10, 13–15]. Local variation, including etching-induced MTJ diameter variation and oxide tunnel barrier thickness variation, leads to resistance change or MTJ functional failure [16]. Wafer-level variations, including thickness variation of the free layer and the oxide tunnel barrier layer, affect MTJ performance more severely than local variation [17, 18]. The wafer-level free layer thickness variation can dramatically change the energy barrier and thermal stability, especially for out-of-plane MTJs. Temperature variation during operation also affects the energy barrier, the STT and VCMA effects, and MTJ resistance. Temperature and process variation together can change the energy barrier by 200%, indicating that extremely high write energy is required if STT-MRAM is designed for the worst process and temperature corner. Unlike STT-MRAM, MeRAM requires a precise voltage amplitude to achieve the lowest WER, but this voltage varies with the energy barrier and hence is sensitive to process and temperature variation. Temperature and process variations also change MRAM’s TMR dramatically [19]. For example, TMR drops from ~205% to ~140% as temperature rises from 200K to 300K [20]. The change mainly comes from the change in anti-parallel (AP) resistance. This indicates that the sensing margin in a read operation narrows at high temperature, which may result in read errors.

We designed an MTJ-based variation monitor utilizing thermal activation and the VCMA effect [21]. The monitor enables in-situ sensing of process and temperature variation, and it achieves remarkable area, power, and latency improvements compared with conventional on-chip thermal monitors. We proposed an adaptive write scheme that selects an optimized write pulse for STT-MRAM and MeRAM based on run-time variation sensing to achieve faster writes [21]. We also propose an adaptive read scheme, which selects the sensing voltage and sensing resistance to optimize the trade-off between read disturbance rate (RDR) and sensing error rate.

Our contributions are summarized as follows.

- We have designed an MTJ-based variation monitor to sense process and temperature variations. Compared with conventional thermal monitors, the monitor is 10X faster, 5X more energy-efficient, and 20X smaller. The monitor directly uses MTJs from the regular MRAM array and adds no fabrication cost overhead.
- We propose an adaptive write scheme that selects the write pulse according to ambient process and temperature variation to achieve fast writes. We evaluate the proposed method at both the circuit level and the system level. The write latency of MRAM-based caches is improved by up to 59%, and applications are sped up by up to 41%.
- We propose an adaptive read scheme to dynamically select read voltages and reference resistors to maintain read disturbance rate under control while improving sensing margin.

## 2 Background

### 2.1 STT-MTJ and VC-MTJ

STT-MTJ and VC-MTJ are resistive memory devices that share a similar device structure. Their resistance is determined by the relative magnetization of two ferromagnetic layers. One layer has a fixed magnetic direction (referred to as the reference layer), while the other has a switchable magnetic direction (referred to as the free layer). A low ("1") or high ("0") resistance is present when the magnetic directions are parallel (P state) or anti-parallel (AP state) respectively. The difference in resistance is quantified by tunnel magnetoresistance (TMR, defined as \((R_H - R_L)/R_L\)); a TMR of 180% [22] has been demonstrated in an 8Mb STT-MRAM chip. Based on the magnetization direction, MTJs are classified into in-plane and out-of-plane (perpendicularly magnetized) devices. In this paper, we consider out-of-plane MTJs, which offer more efficient writes, fewer fabrication challenges, and higher thermal stability (retention time) [23–25].

The two devices differ in how they are switched: STT-MTJ is switched by bidirectional current, while VC-MTJ is switched by a unidirectional current pulse. Fig. 1 shows the STT effect. Polarized electrons flowing from the fixed layer to the free layer switch the magnetization of the free layer to the P state; when electrons flow in the opposite direction, the reflected electrons from

![Fig. 1: Spin-transfer torque induced switching. Panel labels: fixed layer, MgO barrier, free layer; transmitted electrons with spins; reflected electrons with spins.](image)

**Fig. 2:** VCMA-induced precessional switching. A positive (negative) voltage on an MTJ reduces (increases) the energy barrier separating the two magnetization states. A positive voltage over \(V_C\) gives rise to a full energy barrier reduction and precessional switching.

## 3 Related Works

An MTJ-based sensor has been proposed in [26] to sense magnetic field attacks on STT-MRAM. This monitor uses MTJs that are smaller than the data MTJs, because small MTJs have lower retention time and switch earlier than bigger MTJs. However, the smaller MTJs may exhibit physical phenomena different from those of the data MTJs (e.g., single magnetic domain vs. multiple magnetic domains), and printing smaller monitor MTJs makes fabrication more challenging.

In [27], an early write termination methodology has been proposed to complete an STT-MRAM write upon MTJ switching by sensing the voltage change on bit-lines. However, modern STT-MTJs are designed with low resistance, leading to little voltage change on bit-lines during MTJ switching. Moreover, the scheme cannot assist MeRAM due to its long sensing latency of over 0.5ns. In [28, 29], a negative differential resistance (NDR)-assisted sensing scheme has been proposed to amplify sensing margin. The NDR’s lowest resistance should be designed to lie between the MTJ’s high and low resistance states. However, the MTJ resistance varies with temperature, hence the reliability of the NDR sensing scheme can be improved by the proposed method in Section 7 through designing adaptive NDR resistance.

In [30], an adaptive write scheme has been proposed for STT-MRAM. Slow-switching MTJ columns are marked and are written with a boosted current. However, temperature variation was not considered. In [31–33], several self-monitored programming schemes have been proposed, where the write current is terminated once MTJ switching is detected. Such schemes have two main drawbacks: 1) with a write current through an MTJ, its resistance oscillates more easily due to the stochastic switching behavior (i.e., fluctuation of magnetization), where a false switching (i.e., the resistance changes and then recovers) leads to a false resistance change detection, and then a false write termination and a write error; 2) the monitoring operation is performed on every write operation, adding to energy overhead. In [34], a current boosting scheme has been proposed. In this scheme, the write current is boosted if the MTJ state has not toggled after a certain write time.

In [35], a variation-tolerant sensing scheme has been proposed that uses the same sensing path to sense the data MTJ and the reference resistor, which eliminates the variation impact of CMOS transistors. However, read disturbance induced by large systematic variation (e.g., a 100 °C temperature change) was not handled; it can be handled by the monitor proposed in this work. In [36], to avoid read disturbance, one more terminal is added to the two-terminal MTJ. During a read operation, the net torque acting on the storage cell always acts in a direction that refreshes the data stored in the cell. However, three terminals make it difficult to access the MTJ and hurt cell density. Other recent works have approached read disturbance mitigation from different angles [37–41].

## 4 Write Error and Read Disturbance Rate under Variation

The switching behavior of STT-MRAM and MeRAM is affected by temperature and free layer thickness ($t_{FL}$) [14, 42]. We simulate the switching behavior of STT-MRAM and MeRAM under different $t_{FL}$ and temperature corners to obtain WER using an LLG-based numerical simulator that includes temperature dependence, the VCMA effect, the STT effect, and thermal fluctuation, and that has been verified against experimental data in [10]. In the simulations, the $t_{FL}$ variation is assumed to be within 5% across the wafer [18], the temperature varies from 270K to 370K, and the local variations, including resistance variation, are treated as random Gaussian variation together with the variation of access transistors [43] due to line edge roughness and random doping fluctuation.

The WER of STT-MRAM and MeRAM under different temperature and $t_{FL}$ corners is shown in Fig. 3. According to the simulation results, the variation can change WER by over 1,000X. The WER of STT-MRAM is affected mainly by temperature, while that of MeRAM is affected by both $t_{FL}$ and temperature. To reduce WER, adaptive write pulses should be chosen according to the temperature and process variation.

The read disturbance rate of STT-MRAM under variation is shown in Fig. 4. In STT-MRAM, P-to-AP is selected as the read current direction because it is more resistant to spin-polarized switching and hence results in a lower read disturbance rate than the AP-to-P direction. As expected, read disturbance increases with read voltage and temperature. A thicker free layer also increases read disturbance because of the reduced perpendicular magnetic anisotropy and hence reduced thermal stability [44]. Fortunately, MeRAM is free from read disturbance because its read current direction is opposite to the direction that can switch the MTJ. In fact, the read current strengthens data retention rather than destroying it. Overall, the variations can shift the read disturbance rate by over 10X.

## 5 MTJ-Based Variation Monitor

In this section, we propose an MTJ-based variation monitor that offers a cheaper solution for in-situ variation monitoring than exhaustive chip testing and expensive conventional thermal monitors. The monitor senses combined temperature and wafer-level $t_{FL}$ variation.

### 5.1 Sensing Principle

Monitoring variation through direct WER measurement is expensive because it requires a large number of writes and reads. The proposed monitor utilizes thermal activation and the VCMA effect to monitor variation indirectly, by sensing the thermal activation rate in MTJs under different stress voltages and currents.

\[
\begin{aligned}
t_{R,STT} &= \exp\left(\Delta \left(1 - I_{MTJ}/I_C(\Delta)\right)\right) \\
t_{R,VC} &= \exp\left(\Delta \left(1 - V_{MTJ}/V_C(\Delta)\right)\right)
\end{aligned}
\tag{1}
\]

As described by (1) [45, 46], the retention time (i.e., the mean of switching time under non-write state) of STT-MTJ \( t_{R,STT} \) and VC-MTJ \( t_{R,VC} \) exponentially depends on thermal stability \( \Delta \), proportional to energy barrier, critical current of STT-MTJs \( I_C(\Delta) \), and critical voltage of VC-MTJs \( V_C(\Delta) \). The current and voltage across STT-MTJ and VC-MTJ respectively can shorten retention time. The write pulse width and voltage that create instantaneous switching (<10ns) for STT-MRAM and MeRAM depend on \( I_C(\Delta) \) and \( V_C(\Delta) \), which also depend on \( \Delta \). This indicates that knowing the \( t_{R,STT} \) and \( t_{R,VC} \) changes due to temperature and process variation can predict the MRAM write behavior change.
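The exponential dependence in (1) can be explored numerically with a minimal sketch. Equation (1) is written without a time prefactor, so the attempt time `tau0` of 1 ns used below is an assumption introduced only to express the result in seconds, and the function name and the sample value of \( \Delta \) are hypothetical.

```python
import math


def retention_time(delta, stress_ratio, tau0=1e-9):
    """Mean thermal-activation switching time following the form of Eq. (1).

    delta        -- thermal stability factor (dimensionless)
    stress_ratio -- I_MTJ / I_C for an STT-MTJ, or V_MTJ / V_C for a VC-MTJ
    tau0         -- ASSUMED attempt time in seconds (not given in Eq. (1))
    """
    return tau0 * math.exp(delta * (1.0 - stress_ratio))


# Hypothetical thermal stability of 40: more stress shortens retention time.
for ratio in (0.0, 0.5, 0.9):
    print(f"stress ratio {ratio:.1f}: t_R = {retention_time(40.0, ratio):.3e} s")
```

The reduction factor between the unstressed and stressed cases is \( \exp(\Delta \cdot \text{ratio}) \), which does not depend on the assumed prefactor.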

### 5.2 Circuit Implementation and Simulation

The retention time of MTJs is too long to be measured directly. Fortunately, as Equation (1) shows, applying a current or voltage to an MTJ reduces its retention time exponentially. This is confirmed by experimental measurement: in Fig. 5, retention time decreases exponentially with increasing stress voltage due to the VCMA effect. Based on this observation, we introduce a stress operation in the proposed variation monitor. We apply a low voltage or current across the MTJs to reduce retention time and hence increase the thermal activation rate; we call these the stress voltage and stress current.

\[
\begin{aligned}
P_{SW,STT} &= 1 - \exp\left(-t_S/t_{R,STT}\right) \\
P_{SW,VC} &= 1 - \tfrac{1}{2} \exp\left(-t_S/t_{R,VC}\right)
\end{aligned}
\tag{2}
\]

When the retention time reduces to sub-\( \mu s \), the MTJ switching rate \( P_{SW} \) due to thermal activation during stress time \( t_S \) in tens of ns can be measured as explained in Eqn. (2). Then \( P_{SW} \) (correlated to \( t_{R,STT} \) and \( t_{R,VC} \)) inherently reflects the ambient variation.

We use the example in Fig. 7 to illustrate the proposed sensing principle. The top MTJ is assumed to have a retention time of 10 years, while the bottom one suffers from variation and has a retention time of only 10 hours under normal conditions. To sense the variation difference, we apply the same stress voltage across the two MTJs. As stated in this section, their retention times are exponentially reduced, to 100 \( \mu s \) and 10 ns respectively. During the voltage stress time of 20 ns, the bottom MTJ is likely to be thermally activated, while the top MTJ state most likely remains unchanged. The thermal activation switching rate can therefore be obtained by repeating such tests on a single MTJ or by running them on an MTJ array. We choose to run the tests simultaneously on an array to speed up the sensing operation. We set a threshold for the thermal activation rate. If the switching rate reaches the selected threshold after a stress operation, the stress level is output to reflect the ambient variation. Otherwise, the monitor continues with a higher stress level of voltage/current.
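The two-MTJ example can be checked with a short calculation. The sketch below is only illustrative: it assumes the simple exponential form of Eqn. (2) (the STT-MTJ expression) for both devices, and the function and variable names are hypothetical.

```python
import math


def switching_probability(t_stress, t_retention):
    """Probability of at least one thermal activation during t_stress.

    Uses P = 1 - exp(-t_S / t_R); expm1 keeps precision when the ratio is tiny.
    """
    return -math.expm1(-t_stress / t_retention)


T_STRESS = 20e-9  # 20 ns stress time from the example

p_top = switching_probability(T_STRESS, 100e-6)    # stressed retention 100 us
p_bottom = switching_probability(T_STRESS, 10e-9)  # stressed retention 10 ns

print(f"top MTJ:    {p_top:.2e}")
print(f"bottom MTJ: {p_bottom:.3f}")
```

Under this assumption the bottom MTJ switches with a probability of roughly 86%, while the top MTJ switches with a probability of roughly 0.02%, which is why a single stress level separates the two cases.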

The monitor design is shown in Fig. 6. To minimize the fluctuation of sensing results caused by stochastic MTJ switching, a number of MTJs are sensed simultaneously, so that the individual stochastic switching fluctuation is averaged out. In a stress operation, all MTJs in the monitor are initially in the AP state. The write control circuit applies a stress current (for STT-MRAM) or voltage (for MeRAM) simultaneously to all MTJs in the monitor array for 20ns. The stress current (for a 256-MTJ bit-line) ranges from 2.5mA to 10mA and is precisely controlled by the effective width of the transistors in the stress current selection array; the stress current variation is close to 0 because of the large transistor width, which guarantees monitor accuracy. The stress voltage on VC-MTJs is adjusted by dividing the voltage between the bit-line and resistors (varying from 200Ω to 700Ω). The stress voltage variation is also close to 0 because the equivalent parallel resistance of all VC-MTJs on a bit-line averages out individual MTJ resistance variation.
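The benefit of sensing many MTJs at once can be estimated with binomial statistics. This is a simplified sketch that assumes every MTJ in the array switches independently with the same probability; it therefore captures only the random thermal activation part of the spread and ignores MTJ-to-MTJ process variation. The function name is hypothetical.

```python
import math


def switching_rate_sigma(p_switch, n_mtj):
    """Standard deviation of the measured switching rate for n_mtj
    independent, identical MTJs (binomial assumption)."""
    return math.sqrt(p_switch * (1.0 - p_switch) / n_mtj)


for n in (1, 64, 256, 1024):
    print(f"{n:5d} MTJs: sigma = {switching_rate_sigma(0.2, n):.4f}")
```

For a 20% switching rate, a 256-MTJ array gives a spread of 2.5 percentage points under this assumption, and quadrupling the array size halves it.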

After a stress operation, the read control circuit selects the MTJs one by one and reads their states. In each read, the bit-line \( (BL) \) and the reference bit-line \( (BL_{ref}) \) are pre-charged and then pulled down by the MTJ being read and the reference resistor respectively. The difference between \( V_{sense} \) and \( V_{ref} \) creates an output to the S Latch. A switched MTJ changes the S Latch output from 1 to 0, so the XOR of the S Latch and the D Latch (whose output is constantly 1) creates a rising edge, which is counted by Counter2. Finally, a switched MTJ is reset by a write pulse for future stress operations.

We simulate the monitor design using a commercial 65nm cell library. Note that the simulation results in this section are compared with other works [47–50] designed in 65nm; in the following sections, designs are simulated with an advanced 32nm technology. The stress pulses are shown in Fig. 8 (a). The stress current has \(<0.3\%\) and \(<4.7\%\) variation due to temperature (\(27^\circ\mathrm{C}\) to \(100^\circ\mathrm{C}\)) and oxide thickness variation (9% resistance change) respectively, while the stress voltage has \(<1\%\) and \(<2\%\) variation accordingly. In addition, switched MTJs (e.g., 30%) during the stress time can cause up to 10% and 2% change in stress current and voltage respectively. The low variation demonstrates the accuracy of the proposed monitor.

Fig. 8 (b) shows the simulated waveforms of the read, counting, and reset operations. The first and third reads are performed on switched MTJs, where write pulses follow the reads to reset the MTJs, and the counter increases because of the detected MTJ switching. The second read is on a non-switched MTJ, and hence no action is taken after the read. If the counted number reaches the selected threshold (e.g., 64 out of 256 MTJs), the monitor sends out a completion signal and outputs the current stress level, which represents the ambient variation level. If the selected threshold is not reached after reading all MTJs, the counter is reset, and a higher stress level is selected in the next variation sensing cycle. We simulate the circuitry to obtain the switching rate and standard deviation ($\sigma$) of a 256-MTJ variation monitor with different stress levels and variation corners, as shown in Fig. 9. If we select a switching rate threshold of any value between 10% and 30%, the voltage levels needed to reach the threshold for different variation levels (10°C temperature difference between two consecutive curves) can be well differentiated: the dotted curves show that the standard deviation (accuracy of the monitor) is much smaller than the gaps between curves, so the variation levels can be determined. Hence, a temperature variation of 100°C can be distinguished with ten stress levels, achieving an accuracy of 10°C.

**Fig. 6:** The schematic of the STT-MRAM and MeRAM based variation monitor. Variation monitoring operations: 1) apply stress voltage/current on the MRAM array, controlled by the stress voltage/current selection circuitry; 2) select every MTJ one by one (controlled by the MTJ selection circuit) to read and count the MTJ switching rate (controlled by the sensing and switched MTJ counting circuit).

Above, we showed that the proposed monitor can sense thermal stability by appropriately selecting stress voltage levels. With the sensed thermal stability, the monitor can help optimize MRAM variability and reliability. However, selecting the stress voltage levels is not straightforward. We use one example application to show how these levels are selected. In this example, the monitor warns of a retention hazard when the MTJ's retention error rate reaches a threshold $E_r$. Though $E_r$ is too low to be sensed easily, we are able to find a stress voltage $V_s$ such that stressing such an MTJ for 20ns increases the switching rate to 20%. When the stress time and the stressed switching rate threshold are given, $V_s$ is determined only by $E_r$. The mapping between $V_s$ and $E_r$ can be extracted from chip tests. Therefore, in this application, the proposed monitor reaching the switching rate threshold with stress voltage $V_s$ indicates that the MRAM arrays have an average retention error rate over $E_r$. Multiple stress voltage levels may be introduced for other applications, such as the adaptive write in Section 6.
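The stress-level sweep used by the monitor (stress, count switched MTJs, compare against a threshold, otherwise move to a higher stress level) can be sketched as follows. This is an illustrative model rather than the circuit itself: the expected switched count comes from the forms of Eqns. (1) and (2) with an assumed 1 ns attempt time, the list of ten stress ratios is made up, and all names are hypothetical.

```python
import math


def sense_variation(stress_levels, count_switched, threshold=64):
    """Return the index of the first stress level whose switched-MTJ count
    reaches the threshold, or None if no level reaches it."""
    for index, level in enumerate(stress_levels):
        if count_switched(level) >= threshold:
            return index
    return None


def expected_switched(delta, ratio, n_mtj=256, t_stress=20e-9, tau0=1e-9):
    """Expected number of switched MTJs after one stress operation.

    tau0 is an ASSUMED attempt time; delta and ratio are illustrative inputs.
    """
    t_retention = tau0 * math.exp(delta * (1.0 - ratio))
    p_switch = 1.0 - math.exp(-t_stress / t_retention)
    return n_mtj * p_switch


# Ten hypothetical stress levels, expressed as a fraction of the critical value.
STRESS_RATIOS = [0.80 + 0.02 * i for i in range(10)]


def monitor_output(delta):
    """Stress level index reported for an array with thermal stability delta."""
    return sense_variation(STRESS_RATIOS, lambda r: expected_switched(delta, r))


print(monitor_output(40.0), monitor_output(50.0))
```

In this toy model an array with lower thermal stability reaches the 64-of-256 threshold at a lower stress level, so the reported level orders the variation corners.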

Table 1 shows the comparison between the proposed variation monitor and conventional thermal monitors. In conventional monitors, long latency and high energy are consumed by analog-to-digital blocks and sensing bipolar transistors. The proposed monitor is less accurate but faster, with lower energy per sample and smaller area. A larger sensing array can improve the accuracy by reducing the standard deviation ($\sigma$) (Fig. 9), allowing a finer granularity of stress levels at the expense of sensing energy and latency. In addition, the granularity of stress current/voltage is also constrained by the process variation of the CMOS circuit. Fortunately, the achieved accuracy is enough for selecting the optimal write pulse and a reliable read voltage for STT-MRAM and MeRAM (i.e., Sections 6.1 and 7 show that three stress levels are enough), indicating that the proposed monitor supports the proposed adaptive write and read schemes with less overhead. The area of the monitor is dominated by the 8-256 decoder (97.1% of total transistors). The area of the 8-256 decoder was estimated through synthesis, place, and route using a commercial 65nm library.

TABLE 1: Comparison between conventional thermal monitors and the proposed variation monitor. The proposed monitor uses 256 MTJs and 10 stress levels.

<table>
<thead>
<tr>
<th>Monitor</th>
<th>Latency</th>
<th>Accuracy</th>
<th>Energy</th>
<th>Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>S1 [47]</td>
<td>0.1ms</td>
<td>9°C</td>
<td>0.045μJ</td>
<td>0.115mm^2</td>
</tr>
<tr>
<td>S2 [48]</td>
<td>0.2ms</td>
<td>3°C</td>
<td>0.24μJ</td>
<td>0.05mm^2</td>
</tr>
<tr>
<td>S3 [49]</td>
<td>1ms</td>
<td>2°C</td>
<td>0.49μJ</td>
<td>0.07mm^2</td>
</tr>
<tr>
<td>S4 [50]</td>
<td>100ms</td>
<td>0.1°C</td>
<td>13.8μJ</td>
<td>0.4mm^2</td>
</tr>
<tr>
<td>this(STT)</td>
<td>1-10μs</td>
<td>10°C</td>
<td>0.12-1.2nJ</td>
<td>0.005mm^2</td>
</tr>
<tr>
<td>this(Me)</td>
<td>1-10μs</td>
<td>10°C</td>
<td>0.27-2.7nJ</td>
<td>0.005mm^2</td>
</tr>
</tbody>
</table>

![Fig. 7: The simplified illustration of the proposed sensing principle. Two MTJs with variations have different activation rates after voltage stressing.](image)

![Fig. 8: (a) Different stress current/voltage in the proposed monitor. (b) Simulated waveforms of read, reset and counting operations.](image)

![Fig. 9: Switching rate of (a) STT-MTJ- and (b) VC-MTJ-based variation monitor under different stress current and voltage respectively. The colored lines are the switching rate for temperature variation only (10°C interval). The dotted lines outline the standard deviations (σ) of the thermal activation rate (σ is caused by process variation and random thermal activation).](image)

## 6 Adaptive Write

### 6.1 Adaptive Write Scheme

The adaptive write scheme dynamically selects an optimized pulse width (voltage) for STT-MRAM (MeRAM) to minimize write latency according to ambient variation. Multiple pulse widths are implemented by delay circuits shared by multiple bit-lines, and hence introduce negligible overhead. Generating multiple write pulse voltages requires voltage regulators, which are also shared by the proposed variation monitor.

... the entire MRAM array. Local variations, like temperature variation over the MRAM array [15], can be captured by placing multiple proposed monitors. One such monitor uses only one bit-line, with an area overhead of <0.005% (i.e., adding monitor circuits at the MRAM boundary does not affect MRAM fabrication regularity). The monitor also consumes negligible power (i.e., 2.7nW for one variation sample per second) compared with the power of the MRAM array (>10 mW).

### 6.2 Adaptive Write using Variation Monitor

In this section, we evaluate the write scheme with the proposed variation monitor. The write circuit for MRAM is designed to enable program-and-verify [51], which performs a read check following a write (the data being written is pre-stored in the D Latch in Fig. 6); if a write error is detected, additional writes are performed to correct the error. With this, zero WER is guaranteed for MeRAM and STT-MRAM irrespective of the single write pulse voltage/width. For STT-MRAM, shortening the single write pulse reduces both the latency and the energy of a single write trial. As a trade-off, WER increases, as does the chance of additional writes, possibly adding to overall latency and energy. There is an optimal single write pulse that achieves the minimum expected write latency. Such an optimal pulse can reduce STT-MRAM's expected latency and energy by over 60% compared with the conventional write design [10]. The optimal pulse widths (voltages) for minimum expected latency (including the initial write, read checks, and additional writes) of STT-MRAM (MeRAM) are shown in Fig. 10. The pulse width for STT-MRAM spans from 4.25ns to 6.75ns and is mainly affected by temperature. The voltage range for MeRAM is from 1.05V to 1.75V and is affected by both temperature and t_FL.

![Fig. 10: Optimal write pulses for (a) STT-MRAM and (b) MeRAM under different t_FL and temperature corners.](image)

In the following evaluation, the combined temperature and t_FL corners are divided into groups based on the variation monitor’s outputs (stress levels reaching the P_{SW} threshold). Each group has an optimized write pulse minimizing the maximum write latency in the group. More write pulse choices (equal to the number of stress levels) result in shorter programming latency.

Our evaluation flow is illustrated in Fig. 11 (a). We simulate the peripheral circuit (see Fig. 6) with a bit-line size of 256 MTJs using a 32nm commercial library and simulate the WER of MTJs with the LLG-based numerical model. In the bit-line programming simulations, 10 temperature variation corners from 270K to 370K and three wafer-level free layer thickness variation corners of 0.06nm are enumerated. The 30 temperature-process variation-corner combinations are classified into groups according to the output levels from the proposed variation monitor. For each group of variation corners, the maximum write latency is minimized by selecting one optimal write pulse (pulse width for STT-MRAM and pulse voltage for MeRAM).

Bit-line-level results show that the write latency of STT-MRAM varies from 5.5 ns to 7.5 ns and that of MeRAM from 4 ns to 10.1 ns. With the bit-line results as inputs, we use NVSim [52] to obtain the latency and energy of the MRAM array (cache). In Fig. 13, the write latency of the STT-MRAM L2 cache with different $t_{FL}$ corners is shown to decrease with an increased number of pulse choices; each point is the maximum or average latency over 10 temperature corners from 270K to 370K. The maximum write latency of STT-MRAM is improved by up to 17%. The maximum latency for the $t_{FL}$ corner of 1.17nm does not improve because the corner with 1.17nm $t_{FL}$ and 270K is always the worst one to be optimized, no matter how many groups (pulse width choices) are used. MeRAM’s write latency reduction is up to 59%, but there is a latency jump for $t_{FL}$ of 1.19nm from one to two voltage choices. This is because when only one group (single write voltage) is used, the optimal voltage of the 1.19nm $t_{FL}$ corner is close to the optimal voltage for all corners (i.e., the voltage that minimizes WER for all corners in Fig. 3b), but when two groups are used, the optimal voltage for the 1.19nm corner is farther from those of both groups. As seen, three choices are sufficient for write latency improvement.
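The per-group pulse selection evaluated in this section, which picks for each group of variation corners the single pulse that minimizes the group's worst-case latency, can be sketched as below. The latency numbers are illustrative placeholders, not simulation results, and the corner names, function names, and the geometric retry model in `expected_latency` (independent trials under program-and-verify) are assumptions.

```python
def expected_latency(t_pulse, t_verify, wer):
    """Expected program-and-verify latency, ASSUMING independent trials:
    each trial costs one pulse plus one read check and succeeds with 1 - wer."""
    return (t_pulse + t_verify) / (1.0 - wer)


# Hypothetical expected latency (ns) per corner and candidate pulse width (ns).
LATENCY_NS = {
    "A": {4: 6.0, 5: 6.5, 6: 7.2},
    "B": {4: 7.5, 5: 6.4, 6: 7.1},
    "C": {4: 12.0, 5: 8.0, 6: 7.3},
}


def best_pulse(group, latency):
    """Pulse that minimizes the maximum latency over the corners in a group."""
    pulses = latency[group[0]].keys()
    return min(pulses, key=lambda p: max(latency[c][p] for c in group))


def plan(groups, latency=LATENCY_NS):
    """Map each group to its (chosen pulse, worst-case latency)."""
    result = {}
    for group in groups:
        pulse = best_pulse(group, latency)
        worst = max(latency[c][pulse] for c in group)
        result[tuple(group)] = (pulse, worst)
    return result


print(plan([["A", "B", "C"]]))      # one pulse choice for all corners
print(plan([["A", "B"], ["C"]]))    # two pulse choices
```

With these placeholder numbers, splitting into two groups lowers the worst-case latency of corners A and B while the overall maximum stays pinned by corner C, which mirrors the behavior described above for the worst corner.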

We modified gem5 [53] (the original gem5 supports only a fixed cache write time; we added support for varying cache write time, which is necessary for MRAM evaluation) to simulate two cases: 1) an x86 processor with one core and one single-level 8-MB MRAM data cache; 2) an x86 processor with two cores, two 1-Mb MRAM L2 caches, and one 16-MB MRAM L3 cache (L1 uses the default SRAM). We modified McPAT [54] to simulate processor power and used HotSpot [55] to simulate MRAM temperature with the structure shown in Fig. 11b.

We simulated one billion instructions of SPEC benchmarks using our evaluation flow. The application run time reduction with adaptive write is shown in Fig. 12. The processors with single-level MRAM see noticeable application speedup after using adaptive write, where up to 41% and 9% run time reduction are shown for MeRAM and STT-MRAM respectively. However, the improvement is much less for processors with MRAM L2 and L3 (up to 10% and 2% for MeRAM and STT-MRAM respectively), because cache write latency improvement is hidden by SRAM L1. This indicates that the adaptive write scheme may be more efficient for embedded applications with single-level MRAM cache. Compared with MeRAM, STT-MRAM write latency improvement is not significant.

### 6.3 Cache Power Saving

In the adaptive write proposed in this paper, we aim to improve write latency for both STT-MRAM and MeRAM regardless of power. Fortunately, cache power is also reduced with an increased number of write pulse choices, as an additional benefit of the adaptive write. For STT-MRAM, more pulse choices lead to shorter overall programming time and possibly less MTJ switching time, indicating energy reduction. This is because the driving current that switches the MTJ dominates power consumption, and less programming time usually leads to less energy. For MeRAM, the adaptive write chooses an appropriate write voltage to reduce WER, resulting in fewer additional program-and-verify cycles. The energy of MeRAM is dominated by repeated bit-line charging and discharging, and hence fewer cycles give rise to energy reduction. Fig. 14 shows that the maximum and average power of the L3 cache over different variation corners decrease with increased pulse choices. Again, the adaptive write in this paper is designed for latency reduction, but it can alternatively be designed for power reduction, which would achieve even more energy reduction than shown in Fig. 14.

### 7 Adaptive Read

To improve STT-MRAM read reliability and efficiency, the MTJ sensing margin should be maximized while keeping the read disturbance rate below the rate tolerable by the error-correcting code (ECC) [56]. This is non-trivial because of the tradeoff between sensing margin and read disturbance. Improving the sensing margin requires a large sensing current to create a larger voltage difference, which in turn increases the read disturbance rate. Moreover, the sensing margin and read disturbance rate are also severely affected by process and temperature variation. Simply designing for the worst variation corner will lead to insufficient reliability margin. To resolve this issue, we propose an adaptive read scheme that dynamically controls the sensing circuits according to process and temperature variations. This scheme can improve the sensing margin without increasing read disturbance.

The read disturbance rate depends on STT-MTJ thermal stability, which varies with sensing current amplitude, free layer thickness, and temperature. The sensing margin also depends on sensing current amplitude and temperature, because STT-MTJ resistance, which strongly affects the sensing margin, depends strongly on temperature, especially the AP resistance, as illustrated in Fig. 15. Temperature variation is therefore important to both read disturbance and sensing margin, and fortunately it can be monitored. Together with temperature, wafer-level free layer thickness variation, which affects read disturbance, can be monitored by the proposed variation monitor. According to the outputs of a conventional temperature monitor and the proposed variation monitor, the proposed adaptive read selects between two reference resistors and two read voltages to improve MRAM read reliability. Read voltage selection keeps the read disturbance rate within the ECC's capability [56]; the selection is based on the MRAM thermal stability sensed by the proposed monitor. Reference resistor selection improves the sensing margin by tracking the temperature-related MTJ resistance change, with the help of a conventional temperature monitor.

### 7.1 Adaptive Sensing Circuit using Multiple Reference Resistance

As stated in Section 4, STT-MTJs with low thermal stability are susceptible to read disturbance, and high read voltage, high temperature, and low free layer thickness usually result in low thermal stability. The levels of temperature and free layer variation are obtained using the proposed monitor (Section 5). To select a read voltage in the adaptive read, one threshold stress current level is set in the variation monitor: when the monitor's output (variation-induced thermal stability change) is below the threshold, a high read voltage is selected, and vice versa. Regarding the temperature dependence of STT-MTJ resistance, the AP resistance changes dramatically with temperature [20], and the change is approximately linear, while the P resistance is more stable. Therefore, TMR and sensing margin are low at high temperature. Experimental data [57] show that TMR drops from 192% at 4.2K to 90% at room temperature, and the TMR drops further at higher temperatures such as chip operating temperature (e.g., over 80°C). To improve the sensing margin, a low reference resistor is selected at high temperature and a high reference resistor at low temperature, as illustrated in Fig. 15. The resistor selection is controlled by an on-chip temperature monitor. Again, one threshold temperature is needed for the selection, which can be obtained from experimental data or from empirical models such as the one in [20].

The proposed sensing circuit is shown in Fig. 16. High and low read voltages are selected by the signal “Low ∆”, which is an output from the proposed variation monitor. Reference resistors are selected by the signal “Low temperature”, which is an output from an on-chip temperature monitor.
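The selection logic described above can be sketched as a small lookup driven by the two monitor signals. This is a behavioral illustration only, not the circuit of Fig. 16. It assumes that an asserted "Low ∆" signal means reduced thermal stability and therefore selects the low read voltage; the names `ReadConfig` and `select_read_config` are hypothetical; the voltages and resistances are the ones used in the example evaluation of this section.

```python
from dataclasses import dataclass

# Values used in the example evaluation of this section.
V_READ_LOW, V_READ_HIGH = 0.66, 0.78      # read voltages in volts
R_REF_LOW, R_REF_HIGH = 2850.0, 3250.0    # reference resistances in ohms


@dataclass(frozen=True)
class ReadConfig:
    read_voltage: float
    reference_resistance: float


def select_read_config(low_delta: bool, low_temperature: bool) -> ReadConfig:
    """Pick read voltage and reference resistor from the two monitor outputs.

    low_delta:       assumed True when the variation monitor reports low
                     thermal stability, so the low read voltage is used.
    low_temperature: True when the temperature monitor reports a temperature
                     below its threshold, so the high reference is used.
    """
    read_voltage = V_READ_LOW if low_delta else V_READ_HIGH
    reference = R_REF_HIGH if low_temperature else R_REF_LOW
    return ReadConfig(read_voltage, reference)


# Enumerate all four monitor states.
for low_delta in (False, True):
    for low_temperature in (False, True):
        cfg = select_read_config(low_delta, low_temperature)
        print(low_delta, low_temperature, cfg)
```

With both thresholds fixed at design time, the scheme needs only two one-bit signals, which keeps the added control logic small.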

We conducted an example evaluation. First, we fitted an MTJ resistance model from [20]. Then we simulated and fitted a read disturbance model based on an MTJ switching model from [10]. In the models, the MTJ AP resistance drops from 5 kΩ (15 °C) to 3.33 kΩ (120 °C), and the P resistance drops from 2 kΩ to 1.8 kΩ. In our sensing circuit, the read current direction was chosen to match the P-to-AP switching current direction, because P-to-AP switching is harder than AP-to-P switching; this choice gives a lower read disturbance rate. Therefore, only P MTJs can be disturbed in a read operation. In the evaluation, the read disturbance rate tolerable by ECC is \( 10^{-9} \) [56]. According to Fig. 4, we used 0.66 V and 0.78 V as the low and high read voltages, giving rise to 100 mV and 150 mV voltage drops across a P MTJ, respectively. In addition to temperature variation, we also considered another 10% resistance variation due to process variation, which is not monitored. Hence the reference resistance should be designed to distinguish 90% \( R_{AP}(T) \) from 110% \( R_{P}(T) \), where T is temperature. To maximize the sensing margin, 3.25 kΩ and 2.85 kΩ were chosen as the high and low reference resistances. The CMOS sensing circuits were simulated using SPICE with a Verilog-A MTJ model [10] and a 32nm commercial library (temperature models included).
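The constraint on the reference resistance can be checked at the two temperature endpoints quoted above. The sketch below computes the window between 110% \( R_{P}(T) \) and 90% \( R_{AP}(T) \) at 15 °C and 120 °C and tests where the two chosen references fall. It uses only the endpoint resistances from the text, makes no assumption about intermediate temperatures, and does not reproduce how the values were actually optimized; the helper names are illustrative.

```python
# Model endpoints quoted in the text (ohms, degrees Celsius).
T_COLD, T_HOT = 15.0, 120.0
R_AP = {T_COLD: 5000.0, T_HOT: 3330.0}
R_P = {T_COLD: 2000.0, T_HOT: 1800.0}
PROCESS_VARIATION = 0.10  # unmonitored resistance variation


def reference_window(temp: float) -> tuple[float, float]:
    """Range a reference resistance must fall in at `temp`:
    above 110% of R_P and below 90% of R_AP."""
    lower = (1.0 + PROCESS_VARIATION) * R_P[temp]
    upper = (1.0 - PROCESS_VARIATION) * R_AP[temp]
    return lower, upper


def common_window() -> tuple[float, float]:
    """Window that is valid at both temperature endpoints at once."""
    windows = [reference_window(t) for t in (T_COLD, T_HOT)]
    return max(w[0] for w in windows), min(w[1] for w in windows)


def inside(resistance: float, window: tuple[float, float]) -> bool:
    return window[0] < resistance < window[1]


for temp in (T_COLD, T_HOT):
    low, high = reference_window(temp)
    print(f"{temp:5.0f} C: reference must lie in ({low:.0f}, {high:.0f}) ohm")

for r_ref in (2850.0, 3250.0):
    print(r_ref, "cold:", inside(r_ref, reference_window(T_COLD)),
          "hot:", inside(r_ref, reference_window(T_HOT)))
```

At these endpoints the low reference fits both windows, while the high reference fits only the cold one, which is consistent with using the high reference only when the temperature monitor reports a low temperature.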

The STT-MRAM normally works at room temperature with low process variation, where the adaptive read scheme selects the high reference resistor and the high read voltage. In comparison, the conventional non-adaptive read has to be designed for the worst variation corner (high temperature and strong process variation), so it uses a low read voltage and a low reference resistor. We performed circuit simulations of the proposed adaptive read design and the conventional non-adaptive read design. The sensing waveforms are shown in Fig. 17. The sensing margin \( (V_{in} - V_{ref}) \) improves from 26.8 mV with non-adaptive read to 37.8 mV with adaptive read, an improvement of about 1.4X. Meanwhile, the read disturbance rates of both designs stay below \( 10^{-9} \) at all times.

### 7.2 Adaptive Read for Lower Disturbance Rate

At some temperature and process variation corners, an MTJ is easily disturbed by a high read current, which creates a high read disturbance rate. With the proposed monitor, an adaptive read scheme can dynamically lower the read voltage at such corners to reduce the read disturbance rate, leading to a longer service time before a failure. To evaluate its benefit for system reliability, we simulate the failure rates of MRAM systems with and without adaptive read using a memory reliability simulator, MEMRES [56]. The simulator enables fast memory reliability simulation with system-level reliability management including ECC, page retirement, memory mirroring, memory scrubbing, and rank sparing.

TABLE 2: Configuration of the tested 8-GB STT-MRAM.

<table>
<thead>
<tr>
<th>Ranks</th>
<th>Chips</th>
<th>Banks</th>
<th>Mats</th>
<th>Rows</th>
<th>Columns</th>
<th>Access-Rate</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>16+2</td>
<td>8</td>
<td>128</td>
<td>512</td>
<td>8192</td>
<td>1e12/hour</td>
</tr>
</tbody>
</table>

TABLE 3: Fault FIT rates for STT-MRAM. The read disturbance error rates are for STT-MRAM (\( t_{FL} = 1.2\,\text{nm} \)) at 320K and 370K using adaptive read and non-adaptive read.

<table>
<thead>
<tr>
<th>Fault types</th>
<th>Transient FIT</th>
<th>Permanent FIT</th>
<th>Cover-Rate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single-word</td>
<td>1.4</td>
<td>0.3</td>
<td>1</td>
</tr>
<tr>
<td>Single-column</td>
<td>1.4</td>
<td>3.6</td>
<td>1</td>
</tr>
<tr>
<td>Single-row</td>
<td>0.2</td>
<td>8.2</td>
<td>0.002</td>
</tr>
<tr>
<td>Single-bank</td>
<td>0.8</td>
<td>10</td>
<td>0.002</td>
</tr>
<tr>
<td>Multi-bank</td>
<td>0.3</td>
<td>1.4</td>
<td>0.002</td>
</tr>
<tr>
<td>Single-lane</td>
<td>0.9</td>
<td>2.8</td>
<td>0.002</td>
</tr>
<tr>
<td>Read disturbance error rate</td>
<td>non-adaptive read</td>
<td>5.3e-7(370K), 3.43e-8(320K)</td>
<td>adaptive read</td>
</tr>
</tbody>
</table>

The tested 8-GB STT-MRAM configuration is shown in Table 2. In the simulation, a single-error-correction and double-error-detection (SECDED) code [58] is enabled to correct any single-bit error in a 72-bit word (64 data bits and 8 parity bits). SECDED is very efficient at correcting read disturbance errors, because an MRAM read disturbance causes one bit flip in a word. The MRAM also enables a scrubbing function, which periodically scans the entire memory and fixes all detected soft errors, e.g., MRAM retention errors and read disturbance errors. These two methods are the most cost-effective for read disturbance errors. Table 3 shows the fault FITs injected in the simulation. All fault types and FITs except the read disturbance rate are obtained from DRAM field studies [59, 60], because MRAM and DRAM share similar peripheral circuits and those faults are mostly caused by peripheral circuit failures.
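The 8-GB figure can be cross-checked against the Table 2 entries. The sketch below multiplies the hierarchy counts together under three assumptions that the text does not state explicitly: "16+2" chips means 16 data chips plus 2 ECC chips, the row and column counts are per mat, and each cell stores one bit. The dictionary keys and the function name are illustrative.

```python
from math import prod

# Table 2 entries; the split of "16+2" into data and ECC chips is an assumption.
config = {
    "ranks": 1,
    "data_chips": 16,
    "ecc_chips": 2,
    "banks": 8,
    "mats": 128,
    "rows": 512,
    "columns": 8192,
}


def capacity_bits(cfg: dict, include_ecc: bool = False) -> int:
    """Total number of cells, assuming one bit per cell and that
    rows and columns are counted per mat."""
    chips = cfg["data_chips"] + (cfg["ecc_chips"] if include_ecc else 0)
    return prod((cfg["ranks"], chips, cfg["banks"],
                 cfg["mats"], cfg["rows"], cfg["columns"]))


data_bytes = capacity_bits(config) // 8
print(f"data capacity: {data_bytes / 2**30:.0f} GiB")

# The ECC overhead matches the 72-bit word with 64 data bits.
overhead = capacity_bits(config, include_ecc=True) / capacity_bits(config)
print(f"total-to-data ratio: {overhead:.3f} (72/64 = {72 / 64:.3f})")
```

Under these assumptions the data capacity comes out to 8 GiB, and the two extra chips provide exactly the 8 parity bits per 64 data bits that SECDED needs.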

![Graph showing failure rates of STT-MRAM in a 7-year operation.](image)

We performed 50,000 simulations of 7-year-long STT-MRAM operation. The MRAM system fails only when the ECC cannot correct the faults. Though a read disturbance error by itself can be corrected by SECDED, the coincidence of a read disturbance error and another fault, e.g., a column fault, in a single word can result in an ECC failure. The failure rates (accumulated failure probability) are plotted in Fig. 18. As seen from the results, adaptive read reduces system failure rates and extends memory service time by about half a year to one year. Among all failures, read disturbance accounts for about 22% to 24% of failures in the non-adaptive read cases, and 5% to 10% in the adaptive read cases. Focusing on read-disturbance-related failures, adaptive read reduces them by 84% and 59% for the 320K and 370K cases, respectively. This demonstrates the effectiveness of adaptive read.
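The percentages above can be related to each other with a short consistency check. The sketch below assumes that adaptive read leaves all failures unrelated to read disturbance unchanged, which is a simplification for illustration and not something the simulation results state. The text does not say which non-adaptive share (22% or 24%) belongs to which temperature, so all four combinations are evaluated rather than asserting a pairing.

```python
def rd_share_after(share_before: float, rd_reduction: float) -> float:
    """Share of read-disturbance failures after adaptive read.

    share_before: fraction of all failures caused by read disturbance
                  under non-adaptive read.
    rd_reduction: fractional reduction of read-disturbance failures.
    Assumes other failure types are unaffected by adaptive read.
    """
    remaining_rd = share_before * (1.0 - rd_reduction)
    remaining_total = 1.0 - share_before * rd_reduction
    return remaining_rd / remaining_total


def total_failure_reduction(share_before: float, rd_reduction: float) -> float:
    """Fractional reduction of all failures under the same assumption."""
    return share_before * rd_reduction


for share_before in (0.22, 0.24):
    for rd_reduction in (0.84, 0.59):
        after = rd_share_after(share_before, rd_reduction)
        total = total_failure_reduction(share_before, rd_reduction)
        print(f"share before={share_before:.0%}, RD reduction={rd_reduction:.0%} "
              f"-> share after={after:.1%}, total failures down {total:.1%}")
```

Under this assumption the resulting shares fall roughly in the 5% to 10% range reported for adaptive read, so the quoted figures are mutually consistent.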

### 8 Conclusion

We design an MTJ-based variation monitor to sense process and temperature variation. Compared with conventional thermal monitors of similar accuracy, the variation monitor is 20X smaller, over 10X faster, and 5X more energy-efficient. We propose an adaptive write scheme to minimize the write latency of STT-MRAM and MeRAM according to ambient process and temperature variation. The write latency of STT-MRAM and MeRAM cache is reduced by up to 17% and 59% respectively, while simulated application run time shows up to 1.7X speedup (a run time reduction of up to 41%). We also propose an adaptive read design to improve the sensing margin while keeping the read disturbance rate within the ECC's capability. It dynamically selects between two read voltages and two reference resistors according to chip temperature and process variations. This scheme improves the sensing margin by 1.4X over non-adaptive read. To further mitigate the impact of read disturbance on the memory system, adaptive read can dynamically lower the read voltage according to the proposed monitor's result. This can extend memory service time by half a year to one year, and reduce read-disturbance-induced memory failures by 59% to 84%.

References

Shaodi Wang (S'12) is currently a researcher in the NanoCAD lab at the Department of Electrical Engineering, UCLA. Shaodi received the Ph.D. degree in electrical engineering from UCLA in 2017, and the B.S. degree from Peking University in 2011.

Hochul Lee (S'13) received his B.S. in Electrical Engineering from Korea University, Seoul, South Korea in February 2005. In September 2005, he joined the Semiconductor Material Device Lab (SMDL) at Seoul National University (SNU) to pursue his M.S. degree. After graduation, he worked in the Samsung Electronics Flash memory circuit design team until July 2012. He joined UCLA DRL and is currently a Ph.D. candidate exploring MTJ-based hybrid CMOS circuits.

Cecile Grezes (M'15) received the B.Sc. degree in Physics and Mathematics from the Université Joseph Fourier, Grenoble, in 2008, the M.Sc. in Physics from the Ecole Normale Supérieure, Paris, in 2011, and the Ph.D. degree (cum laude) in physics from CEA Saclay/Université Pierre et Marie Curie, Paris in 2014.

Pedram Khalili Amiri (M'05) received the B.Sc. degree from the Sharif University of Technology in 2004 and the Ph.D. degree (cum laude) in electrical engineering from Delft University of Technology in 2008. He has been an Assistant Adjunct Professor at the EE Dept. of the University of California at Los Angeles since 2009.

Kang L. Wang (F'92) received the B.S. degree from the National Cheng Kung University, Taiwan, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, USA. He is currently a Distinguished Professor and holds the Raytheon Chair Professorship in physical science and electronics with the Electrical Engineering Department, University of California at Los Angeles, Los Angeles, CA, USA.

Puneet Gupta (M'07-SM'16) is currently a faculty member of the Electrical Engineering Department at UCLA. He received the B.Tech degree in Electrical Engineering from Indian Institute of Technology, Delhi in 2000 and Ph.D. in 2007 from University of California, San Diego. He co-founded Blaze DFM Inc. (acquired by Tela Inc.) in 2004 and served as its product architect till 2007.