Understand SSD overheating and what to do about it

May 27, 2023

Even though storage vendors like to position their products as "cool," the truth is that storage hardware generates heat -- a lot of it. Too much heat in an SSD can weaken its performance and endurance.

There are several reasons why an SSD overheats. A heat sink is just one way to mitigate the problem.

Most SSDs commonly used in commercial and consumer applications are at risk of overheating. They can get hot for a variety of reasons, but the root issue is electrical resistance, a ubiquitous problem in all electronics. SSDs are no exception.

Heat was not always a problem for SSDs. Earlier generations of the technology, such as simple, low-performance SATA SSDs, did not have much of a thermal problem. Today, when people talk about overheating SSDs, they are almost always referring to high-performance SSDs that use the NVMe interface specification. Current high-performance NVMe SSDs offer higher data transfer rates than their predecessors. They have far greater processing capability than what came before. All this additional, dense hardware and higher rates of storage activity translate into heat.

How hot is hot? A typical consumer-grade NAND memory chip is designed to function from 0 degrees Celsius up to a maximum of between 70 and 85 degrees Celsius (158 to 185 degrees Fahrenheit). Without a heat sink, a Gen3x4 SSD will reach 70 degrees Celsius within three minutes, assuming an ambient temperature of 25 degrees Celsius. A Gen4x4 SSD will hit 70 degrees in 40 seconds. When the chip reaches 70 degrees Celsius, problems start.
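The figures above can be turned into rough average heating rates. The following is a minimal sketch, not a thermal model: it assumes the drive starts at the 25 degree ambient temperature and heats linearly, and it treats "within three minutes" as exactly 180 seconds. The function names are illustrative only.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def average_heating_rate(start_c, end_c, seconds):
    """Average temperature rise in degrees Celsius per second (assumes linear heating)."""
    return (end_c - start_c) / seconds


AMBIENT_C = 25.0    # ambient temperature stated in the article
THRESHOLD_C = 70.0  # temperature at which problems start, per the article

# Assumption: "within three minutes" is treated as exactly 180 seconds.
gen3_rate = average_heating_rate(AMBIENT_C, THRESHOLD_C, 180)
gen4_rate = average_heating_rate(AMBIENT_C, THRESHOLD_C, 40)

print(f"70 C = {c_to_f(70):.0f} F, 85 C = {c_to_f(85):.0f} F")
print(f"Gen3x4: {gen3_rate:.3f} C/s, Gen4x4: {gen4_rate:.3f} C/s")
print(f"Gen4x4 heats about {gen4_rate / gen3_rate:.1f}x faster on average")
```

Under these assumptions, the Gen4x4 drive warms about 4.5 times faster than the Gen3x4 drive, which is why the margin for unassisted operation shrinks so quickly from one generation to the next.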

The issue grows more serious as SSDs get faster with each generation of PCIe technology, which is now heading to Gen5. The challenge for SSD makers is to keep increasing performance while dealing with the heat generated by the SSD controller and other components.

Electrical resistance is the primary reason SSDs overheat, and other factors can exacerbate its effects. An M.2 NVMe SSD can execute millions of processes simultaneously, and that number increases with each SSD generation.

Also, the NAND flash does not function in isolation. It typically shares the drive with a controller integrated circuit and other heat-generating electronics, all packed onto limited printed circuit board (PCB) space. The SSD may be designed with multiple dies stacked per chip. In some cases, the design is double-sided, which is good for space efficiency but insulates the PCB's interior copper layers like a sandwich.

If the SSD is housed in an enclosure that has limited to no airflow, the heat problem will get worse. If the platform is fanless, that will further compound the cooling challenges. The ambient temperature of the device containing the SSD, along with the temperature of the room where it is located, also contributes to SSD thermal issues. While this may be less of an issue in a well-cooled data center, if the SSD is running in a high-speed PC with other devices on the motherboard generating heat, the ambient environment can easily reach 50 degrees Celsius. At that temperature, the drive is on the verge of exceeding its heat limits even in the idle state.
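To see how close a drive is running to the 70 degree mark, its reported temperature can be read directly. The sketch below is one possible approach and rests on several assumptions: a Linux system whose kernel exposes NVMe sensors through the hwmon sysfs interface, with a `name` file containing `nvme` and a `temp1_input` file in millidegrees Celsius. The function names and the 70 degree default limit are illustrative choices, not part of any vendor tool.

```python
from pathlib import Path


def read_nvme_temps_c(hwmon_root="/sys/class/hwmon"):
    """Return {hwmon device name: temperature in Celsius} for NVMe sensors.

    Assumes the Linux hwmon layout: each device directory has a 'name' file
    and a 'temp1_input' file holding millidegrees Celsius.
    """
    temps = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return temps  # not Linux, or hwmon is not available
    for dev in sorted(root.iterdir()):
        try:
            if (dev / "name").read_text().strip() != "nvme":
                continue
            millidegrees = int((dev / "temp1_input").read_text().strip())
        except (OSError, ValueError):
            continue  # skip sensors that cannot be read or parsed
        temps[dev.name] = millidegrees / 1000.0
    return temps


def headroom_c(temp_c, limit_c=70.0):
    """Degrees Celsius remaining before the drive reaches the given limit."""
    return limit_c - temp_c


for device, temp in read_nvme_temps_c().items():
    print(f"{device}: {temp:.1f} C, headroom {headroom_c(temp):.1f} C")
```

On a system without these sysfs files, the function simply returns an empty dictionary. A drive idling in a 50 degree environment would show no more than 20 degrees of headroom before any workload is applied.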

Overheating degrades an M.2 NVMe SSD's performance and damages its data retention and endurance. SSDs retain data by trapping electrons in the transistor gate of each memory cell. By detecting the number of electrons, the SSD distinguishes between the zeros and ones that make up digital data.

Excessive heat increases the energy of the electrons in the drive's charge trap/floating gate, making it easier for them to escape, which means more bit errors. If bit errors exceed what the drive's error correction can handle, uncorrectable errors occur.

In addition, temperature changes during operation can lead to the "cross-temperature" effect, where the drive writes data at one temperature and reads it at a very different one, for example writing at a low temperature but reading at a high temperature. As the temperature moves from low to high or high to low, the threshold voltage shifts significantly, causing fail bits to occur.

To protect SSDs from poor data retention caused by overheating, controller firmware widely implements a thermal throttling mechanism. When the chip reaches 70 degrees Celsius, the SSD activates thermal throttling, which lowers performance to let the chips cool. This can improve data retention and endurance, but user experience suffers because the drive slows down.

However, a good thermal throttling design gives up as little performance as possible in exchange for as much cooling as possible.
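The idea of trading a little performance for cooling can be illustrated with a toy throttling policy. This is a minimal sketch and not how any particular controller firmware works: the 70 degree trigger comes from the article, while the 65 degree release point, the 25% performance steps and the 25% floor are hypothetical values chosen for illustration. The gap between the trigger and release points keeps the drive from flipping rapidly between throttled and unthrottled states.

```python
THROTTLE_ON_C = 70.0   # trigger point cited in the article
THROTTLE_OFF_C = 65.0  # hypothetical release point, chosen for illustration


class ThermalThrottle:
    """Toy policy: step performance down when hot, back up when cool."""

    def __init__(self, on_c=THROTTLE_ON_C, off_c=THROTTLE_OFF_C,
                 step=0.25, floor=0.25):
        self.on_c = on_c
        self.off_c = off_c
        self.step = step    # hypothetical fraction of performance per step
        self.floor = floor  # hypothetical minimum performance level
        self.level = 1.0    # 1.0 means full performance

    def update(self, temp_c):
        """Feed one temperature reading and return the new performance level."""
        if temp_c >= self.on_c:
            # Too hot: reduce performance by one step, but not below the floor.
            self.level = max(self.floor, self.level - self.step)
        elif temp_c <= self.off_c:
            # Cool enough: restore performance by one step, up to full speed.
            self.level = min(1.0, self.level + self.step)
        # Between the two thresholds, hold the current level.
        return self.level


readings_c = [60, 68, 71, 72, 69, 66, 64, 63]
throttle = ThermalThrottle()
levels = [throttle.update(t) for t in readings_c]
print(levels)  # [1.0, 1.0, 0.75, 0.5, 0.5, 0.5, 0.75, 1.0]
```

Reducing performance in small steps, rather than dropping straight to the floor, is one way to aim for the smallest slowdown that still lets the drive cool.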

Seeing that 80 degrees Celsius should be an SSD's upper limit for temperature, manufacturers need to supply a cooling mechanism for the SSD. Without one, the drive will quickly heat beyond 70 degrees Celsius, which diminishes data integrity and endurance. Several options are available.

In some cases, such as low-intensity use, airflow in the computer case or around the motherboard is enough for a drive to maintain acceptable temperatures. For more intensive, high-speed operations, a heat sink serves to dissipate heat from the drive.

Heat sinks come in two basic varieties. An active heat sink attaches directly to the SSD and uses fans to cool the drive down.

In contrast, a passive heat sink cools the SSD through heat transfer alone, for example, through a slab of conductive metal attached to the SSD. This setup continuously pulls heat away from the drive and dissipates it into the air. It is sometimes called a heat spreader.

Passive heat sinks offer several advantages over their active counterparts. They don't generate noise, they are not bulky and they also tend to be less expensive. The limitation of the passive heat sink is that it can't be switched into a higher rate of heat reduction if the SSD goes into a high-speed mode of operation. Its cooling capabilities are fixed.

One under-the-radar mechanism for heat dissipation is a metal foil label, which pulls heat from the chip. Copper is preferable to aluminum because it conducts heat better.

A combination of sufficient, consistent airflow and a heat sink is the best approach. This assumes, of course, that the input air temperature is low enough to reduce heat on the SSD.

Some motherboard vendors make integrated heat sinks out of solid blocks of aluminum for M.2 NVMe SSDs. These function more as heat buffers than as heat sinks: that much material can absorb a lot of heat but lacks sufficient surface area to dissipate it. This design gives the drive more time before it needs to throttle.
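The "more time before throttling" effect can be estimated from the block's heat capacity. The sketch below is a back-of-the-envelope calculation under loud assumptions: the 50 gram mass and 5 watts of drive heat are hypothetical figures, the block is assumed to stay at the same temperature as the drive, and no heat at all is dissipated to the air. The specific heat of aluminum, about 0.897 joules per gram per kelvin, is a standard physical value.

```python
AL_SPECIFIC_HEAT_J_PER_G_K = 0.897  # specific heat capacity of aluminum


def seconds_to_heat(mass_g, power_w, start_c, limit_c,
                    specific_heat=AL_SPECIFIC_HEAT_J_PER_G_K):
    """Time for a block to warm from start_c to limit_c while absorbing power_w.

    Uses energy = mass * specific heat * temperature rise, and assumes the
    block absorbs all of the heat and dissipates none of it.
    """
    energy_j = mass_g * specific_heat * (limit_c - start_c)
    return energy_j / power_w


# Hypothetical figures: a 50 g aluminum block absorbing 5 W of drive heat,
# starting at 25 C ambient and heating to the 70 C throttling point.
buffer_seconds = seconds_to_heat(mass_g=50, power_w=5, start_c=25, limit_c=70)
print(f"Extra time before throttling: about {buffer_seconds:.0f} seconds")
```

With these made-up figures, the block buys a little under seven minutes. Doubling the mass doubles the time, but once the block is saturated it offers no further relief unless the heat has somewhere to go.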

Motherboard manufacturers have taken a leading role, shipping boards from the factory with denser passive cooling and, in some cases, highly customized active coolers. Cheaper motherboards now often ship with passive coolers, and manufacturers offer an upgrade path to active cooling so that sustained workloads can run at high speeds uninterrupted.

Given the risks of overheating an SSD, cooling would seem to be a natural countermeasure. However, cooling comes with its own problems.

On the pro side, cooling eliminates thermal throttling, which enables sustained write and read speeds, a longer drive life span and better data retention. These benefits hold up even during periods of prolonged use.

Cons include costs and the space these cooling products can take up.

About the author

Rick Wang is a technical marketing engineer at Phison. His responsibilities include automotive solutions marketing, planning and business development. In addition, Wang is responsible for market analysis at Phison in the fields of embedded and emerging storage offerings that include industrial, smart TVs, drones, virtual/augmented reality and blockchain. Wang holds a master's degree in materials science and engineering from National Taiwan University.