High temperature when not in use (and after being ejected)

Hi, when I use the Plugable NVMe enclosure (the Realtek controller) on Ubuntu 20.04 (kernel 5.4), the drive heats up to the point where I cannot physically touch it, even when it is not in use. In this case, the drive is only plugged in and is not mounted by the system. This happens even after the drive is “ejected” and no longer recognized by the system (/dev/sda no longer exists). I don’t know the exact temperature since smartctl does not recognize the USB bridge, so I cannot get SMART statistics.

On the other hand, if I boot from this enclosure over USB rather than using it like a typical USB drive, it does not heat up when not in use. This makes me worry that the temperature control is not working properly when I’m using it as non-boot, regular USB storage.

Is there a way to fix or debug this heating problem? I could understand it becoming hot when in use, but in this case the drive is not even mounted.

# lshw
[...]
*-usb
    description: Mass storage device
    product: RTL9210
    vendor: Realtek
    physical id: 1
    bus info: usb@4:1
    logical name: scsi0
    version: 31.00
    serial: [snip]
    capabilities: usb-3.20 scsi
    configuration: driver=uas maxpower=896mA speed=10000Mbit/s
  *-disk
       description: SCSI Disk
       product: THNSN5256GPUK NV
       vendor: NVMe
       physical id: 0.0.0
       bus info: scsi@0:0.0.0
       logical name: /dev/sda
       version: 4103
       serial: [snip]
       size: 238GiB (256GB)
       capabilities: gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=[snip] logicalsectorsize=512 sectorsize=512

Hi,

Thanks for reaching out to us, I am sorry this is not working as expected and I am happy to help!

NVMe SSDs have built-in power management that should be entirely independent of the operating system (whether the drive is the boot drive or connected as a non-boot drive), so the drive should not draw significant power when idle, and certainly not enough to get hot. When idle and not mounted (for example, after using the safely disconnect option), it should be using less power than when it is the boot drive.

This likely indicates the operating system is accessing the drive for some reason. I have not been able to replicate this behavior on my Fedora 32 system: with the drive mounted but inactive, unmounted, or logically disconnected after selecting “Safely Remove Drive”, it does not get excessively warm to the touch.
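
One way to check whether the operating system really is touching the drive while it sits idle is to sample the kernel's per-device I/O counters twice and compare them. The sketch below is only an illustration: it assumes the drive shows up as sda, and the function names io_counts and idle_check are made up for this example. It reads /sys/block/<device>/stat, where the first field is the number of completed reads and the fifth is the number of completed writes.

```shell
#!/bin/sh
# Print the completed read and write request counts from a block device's
# sysfs stat file (field 1 = reads completed, field 5 = writes completed).
# $1: path to a stat file, e.g. /sys/block/sda/stat (device name is an assumption)
io_counts() {
    awk '{ print $1, $5 }' "$1"
}

# Sample the counters twice and report whether any I/O completed in between.
# $1: stat file path, $2: seconds to wait between samples (default 10)
idle_check() {
    before=$(io_counts "$1") || return 1
    sleep "${2:-10}"
    after=$(io_counts "$1") || return 1
    if [ "$before" = "$after" ]; then
        echo "idle: no reads or writes completed"
    else
        echo "active: counters went from '$before' to '$after'"
    fi
}

# Example (adjust the device name to your system):
# idle_check /sys/block/sda/stat 30
```

If the counters do not move while the drive keeps heating up, the host is probably not issuing I/O to it. This only works while the block device still exists, so it cannot be used after the drive has been ejected.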

The latest builds of the smartmontools package support the Realtek RTL9210 chipset in this enclosure and can be downloaded here ( https://builds.smartmontools.org/ ). For example, I downloaded “builds/smartmontools-linux-x86_64-static-7.2-r5076.tar.gz”; extracting the package creates a new directory structure containing “usr/local/sbin/smartctl”.

The SSD temperature can be read with the following command, which points to the smartctl extracted from the package above. Note the ./ at the beginning, which indicates the path is relative to the current directory and not the root directory. Replace /dev/sdn with the device node of your own drive:

sudo ./usr/local/sbin/smartctl -a /dev/sdn | grep -Ei "^Temperature: "
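
If you only want the number itself (for logging, say), the same filter can be wrapped in a small helper. This is a rough sketch and not part of smartmontools: extract_temp is a hypothetical name, and it assumes the NVMe-style "Temperature:" line that smartctl prints for this kind of drive.

```shell
#!/bin/sh
# Read `smartctl -a` output on stdin and print the composite temperature
# value in Celsius. Only the line that starts with "Temperature:" is used,
# so "Temperature Sensor 1:" lines are ignored.
extract_temp() {
    awk '/^Temperature:/ { print $2; exit }'
}

# Example (binary path and device node are placeholders, adjust both):
# sudo ./path/to/smartctl -a /dev/sdX | extract_temp
```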

Does the drive still get excessively warm when connected and the file system is mounted, or does this only happen after the drive has been ejected from the system and the drive letter is no longer assigned?

Thank you,

Pat
Plugable Technology
support@plugable.com

Hi Pat, thanks for your reply!

Does the drive still get excessively warm when connected and the file system is mounted, or does this only happen after the drive has been ejected from the system and the drive letter is no longer assigned?

The drive begins heating from the moment it’s plugged in, and doesn’t stop heating until it’s unplugged. For example if I plug it in and leave it alone, it will heat up. If I plug it in and immediately “Power off this disk” using gnome-disks, it will also continue to heat up.

This likely indicates the operating system is accessing the drive for some reason.

This would make sense, but it’s occurring even after ejecting the drive on a fresh install of Ubuntu. I don’t have another device to test it on at the moment (only this Dell XPS 9360). I also tried disabling Thunderbolt in the BIOS and it has not made a difference. I have also tried using the USB-A cable instead with a different port, and it also does not make a difference.

I built smartmontools from source. It prints output, then freezes for about a minute, then prints an error message. When this happens, it also “ejects” the drive (no longer available under /dev/sda), but the drive continues to heat up. The drive does not seem to be exceeding the 78 degree warning threshold though, which is a good sign.

# ./smartctl -a /dev/sda
smartctl 7.2 (build date Jul 28 2020) [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       THNSN5256GPUK NVMe TOSHIBA 256GB
Serial Number:                      [snip]
Firmware Version:                   5KDA4103
PCI Vendor/Subsystem ID:            0x1179
IEEE OUI Identifier:                [snip]
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            [snip]
Local Time is:                      Mon Jul 27 20:54:58 2020 [snip]
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Warning  Comp. Temp. Threshold:     78 Celsius
Critical Comp. Temp. Threshold:     82 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        0       0
 1 +     2.40W       -        -    1  1  1  1        0       0
 2 +     1.90W       -        -    2  2  2  2        0       0
 3 -   0.0120W       -        -    3  3  3  3     5000   25000
 4 -   0.0060W       -        -    4  4  4  4   100000   70000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        67 Celsius
Available Spare:                    100%
Available Spare Threshold:          50%
Percentage Used:                    19%
Data Units Read:                    24,274,927 [12.4 TB]
Data Units Written:                 20,339,692 [10.4 TB]
Host Read Commands:                 764,812,557
Host Write Commands:                593,173,812
Controller Busy Time:               2,439
Power Cycles:                       4,036
Power On Hours:                     9,375
Unsafe Shutdowns:                   421
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               67 Celsius

Read Error Information Log failed: Connection timed out

Edit: Here is the dmesg log if it gives you any useful information:

[ 3947.199911] usb 4-1: new SuperSpeedPlus Gen 2 USB device number 2 using xhci_hcd
[ 3947.233754] usb 4-1: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=31.00
[ 3947.233761] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3947.233766] usb 4-1: Product: RTL9210
[ 3947.233769] usb 4-1: Manufacturer: Realtek
[ 3947.233772] usb 4-1: SerialNumber: 012345678903
[ 3947.240217] usb 4-1: Enable of device-initiated U1 failed.
[ 3947.241930] usb 4-1: Enable of device-initiated U2 failed.
[ 3947.303202] usb 4-1: Enable of device-initiated U1 failed.
[ 3947.304837] usb 4-1: Enable of device-initiated U2 failed.
[ 3947.308115] scsi host0: uas
[ 3947.312469] scsi 0:0:0:0: Direct-Access     NVMe     THNSN5256GPUK NV 4103 PQ: 0 ANSI: 6
[ 3947.323391] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 3947.335215] sd 0:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB)
[ 3947.337717] sd 0:0:0:0: [sda] Write Protect is off
[ 3947.337722] sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
[ 3947.342689] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3947.346873] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes
[ 3947.396234]  sda: sda1 sda2 sda3
[ 3947.423610] sd 0:0:0:0: [sda] Attached SCSI disk

---- After starting smartctl ----

[ 4150.285057] sd 0:0:0:0: [sda] tag#12 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[ 4150.285061] sd 0:0:0:0: [sda] tag#12 CDB: opcode=0xe4 (vendor) e4 00 20 02 01 00 00 00 00 00 00 00 00 00 00 00
[ 4150.305057] scsi host0: uas_eh_device_reset_handler start
[ 4155.648989] xhci_hcd 0000:39:00.0: Timeout while waiting for setup device command
[ 4161.028989] xhci_hcd 0000:39:00.0: Timeout while waiting for setup device command
[ 4161.240989] usb 4-1: device not accepting address 2, error -62
[ 4166.656893] xhci_hcd 0000:39:00.0: Timeout while waiting for setup device command
[ 4166.871548] usb 4-1: Device not responding to setup address.
[ 4167.076840] usb 4-1: device not accepting address 2, error -71
[ 4167.287543] usb 4-1: Device not responding to setup address.
[ 4167.499592] usb 4-1: Device not responding to setup address.
[ 4167.704836] usb 4-1: device not accepting address 2, error -71
[ 4167.919551] usb 4-1: Device not responding to setup address.
[ 4168.131644] usb 4-1: Device not responding to setup address.
[ 4168.336814] usb 4-1: device not accepting address 2, error -71
[ 4168.352864] scsi host0: uas_eh_device_reset_handler FAILED err -19
[ 4168.352874] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 4168.352940] usb 4-1: USB disconnect, device number 2
[ 4168.359604] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 4168.596857] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 4168.855549] usb 4-1: Device not responding to setup address.
[ 4169.067527] usb 4-1: Device not responding to setup address.
[ 4169.272797] usb 4-1: device not accepting address 3, error -71
[ 4169.483549] usb 4-1: Device not responding to setup address.
[ 4169.695582] usb 4-1: Device not responding to setup address.
[ 4169.900984] usb 4-1: device not accepting address 4, error -71
[ 4169.909092] usb usb4-port1: attempt power cycle
[ 4170.843586] usb 4-1: Device not responding to setup address.
[ 4171.055785] usb 4-1: Device not responding to setup address.

---- After unplugging device ----

[ 4330.320805] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 4330.320821] pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 4330.320833] pcieport 0000:00:1c.0: AER:   device [8086:9d10] error status/mask=00000001/00002000
[ 4330.320840] pcieport 0000:00:1c.0: AER:    [ 0] RxErr
[ 4330.370162] xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
[ 4330.370190] xhci_hcd 0000:39:00.0: BAR 0: error updating (0xd9f00000 != 0xffffffff)
[ 4330.451804] xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
[ 4330.451825] xhci_hcd 0000:39:00.0: Controller not ready at resume -19
[ 4330.451827] xhci_hcd 0000:39:00.0: PCI post-resume error -19!
[ 4330.451830] xhci_hcd 0000:39:00.0: HC died; cleaning up
[ 4330.464131] xhci_hcd 0000:39:00.0: remove, state 4
[ 4330.464135] usb usb4: USB disconnect, device number 1
[ 4330.464773] xhci_hcd 0000:39:00.0: USB bus 4 deregistered
[ 4330.464780] xhci_hcd 0000:39:00.0: remove, state 4
[ 4330.464782] usb usb3: USB disconnect, device number 1
[ 4330.464970] xhci_hcd 0000:39:00.0: Host halt failed, -19
[ 4330.464973] xhci_hcd 0000:39:00.0: Host not accessible, reset failed.
[ 4330.465189] xhci_hcd 0000:39:00.0: USB bus 3 deregistered
[ 4330.467139] pcieport 0000:02:00.0: Refused to change power state, currently in D3
[ 4330.469839] pci_bus 0000:03: busn_res: [bus 03] is released
[ 4330.469956] pci_bus 0000:04: busn_res: [bus 04-38] is released
[ 4330.470184] pci_bus 0000:39: busn_res: [bus 39] is released
[ 4330.470300] pci_bus 0000:02: busn_res: [bus 02-39] is released

Hi,

Thanks for getting back to me with these additional details!

Based on this set of lines in the dmesg output, the system is unable to enable either the U1 or U2 low-power states for the enclosure (so the link defaults to U0, full power).

[ 3947.240217] usb 4-1: Enable of device-initiated U1 failed.
[ 3947.241930] usb 4-1: Enable of device-initiated U2 failed.
[ 3947.303202] usb 4-1: Enable of device-initiated U1 failed.
[ 3947.304837] usb 4-1: Enable of device-initiated U2 failed.
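
The kernel also exposes the resulting link power management state through sysfs, which can be checked without sending any commands to the drive itself. A minimal sketch, assuming the enclosure is at USB path 4-1 as in the log above and that the running kernel provides the usb3_hardware_lpm_u1 and usb3_hardware_lpm_u2 attributes (lpm_status is a made-up helper name):

```shell
#!/bin/sh
# Print the USB 3 hardware link power management (LPM) state of one device.
# $1: sysfs directory of the USB device, e.g. /sys/bus/usb/devices/4-1
#     (the 4-1 path is taken from the dmesg output and may differ per port)
lpm_status() {
    for state in u1 u2; do
        f="$1/power/usb3_hardware_lpm_$state"
        if [ -r "$f" ]; then
            # The attribute normally reads "enabled" or "disabled".
            printf '%s: %s\n' "$state" "$(cat "$f")"
        else
            # Older kernels or USB 2 links do not expose the attribute.
            printf '%s: not reported\n' "$state"
        fi
    done
}

# Example:
# lpm_status /sys/bus/usb/devices/4-1
```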

After smartctl runs, the USB connection is being dropped. That is definitely not the expected behavior, but it could be caused by improperly formatted commands. To help rule out a compilation error, could you try the smartmontools statically linked pre-compiled binaries here ( https://1002-105252244-gh.circle-artifacts.com/0/builds/smartmontools-linux-x86_64-static-7.2-r5076.tar.gz ) and let me know if this is similarly affected?

Additionally, when booting to the external enclosure, is this on the same computer using the same USB port? Also does the external drive have Ubuntu 20.04 installed with the latest updates?

I am not 100% sure if dmesg will capture the initial USB connection details for the operating system booted from the external enclosure but we can quickly search the dmesg output for ‘U1 failed’ and ‘U2 failed’ using the following:

dmesg | grep -Ei "U[12] failed"
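
If the early boot messages have already rotated out of the dmesg buffer, the same search can be run against the output of journalctl -k -b on systems that use the systemd journal. The sketch below, built around a hypothetical lpm_failures helper, reads a kernel log on stdin and counts the failures per device and state, which makes the booted and non-booted cases easier to compare:

```shell
#!/bin/sh
# Read kernel log lines on stdin and print how often each
# "Enable of device-initiated U1/U2 failed" message occurred.
lpm_failures() {
    awk '/Enable of device-initiated U[12] failed/ {
             # Drop the timestamp or journal prefix so repeats collapse together.
             sub(/^.*usb /, "usb ")
             count[$0]++
         }
         END { for (msg in count) print count[msg], msg }' | sort
}

# Examples:
# dmesg | lpm_failures
# journalctl -k -b | lpm_failures
```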

Running smartctl could cause the active SSD to disconnect when booted from the enclosure, which has the potential to cause data loss for any files not fully written to the drive. I would recommend not running smartctl while booted from the external drive unless it first completes successfully from the internal operating system without taking the USB enclosure offline.

Thank you,

Pat
Plugable Technology
support@plugable.com

Experiencing the same error - unable to set low power modes. Any ideas about how to resolve?

After smartctl runs, the USB connection is being dropped. That is definitely not the expected behavior, but it could be caused by improperly formatted commands. To help rule out a compilation error, could you try the smartmontools statically linked pre-compiled binaries here ( https://1002-105252244-gh.circle-artifacts.com/0/builds/smartmontools-linux-x86_64-static-7.2-r5076.tar.gz ) and let me know if this is similarly affected?

There is no difference running the pre-built binary, it also causes the drive to disconnect.

Additionally, when booting to the external enclosure, is this on the same computer using the same USB port? Also does the external drive have Ubuntu 20.04 installed with the latest updates?

It’s the same computer and USB port (USB 3 gen 2 + thunderbolt), but running Ubuntu 18.04 with kernel 5.4.0-40-generic (as opposed to 5.4.0-42-generic on 20.04):

Linux <name> 5.4.0-40-generic #44~18.04.1-Ubuntu SMP Wed Jun 24 23:13:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I am not 100% sure if dmesg will capture the initial USB connection details for the operating system booted from the external enclosure but we can quickly search the dmesg output for ‘U1 failed’ and ‘U2 failed’ using the following:

Here is the seemingly relevant dmesg output while booting from the Plugable drive:

[    1.842357] usb 4-1: new SuperSpeedPlus Gen 2 USB device number 2 using xhci_hcd
[    1.842778] nvme nvme0: missing or invalid SUBNQN field.
[    1.842801] nvme nvme0: Shutdown timeout set to 8 seconds
[    1.856570] nvme nvme0: 4/0/0 default/read/poll queues
[    1.863724]  nvme0n1: p1 p2 p3
[    1.875737] usb 4-1: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=31.00
[    1.875739] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    1.875741] usb 4-1: Product: RTL9210
[    1.875742] usb 4-1: Manufacturer: Realtek
[    1.875744] usb 4-1: SerialNumber: 012345678903
[    1.879250] usb 4-1: Enable of device-initiated U1 failed.
[    1.880945] usb 4-1: Enable of device-initiated U2 failed.
[    1.884029] usbcore: registered new interface driver usb-storage
[    1.886300] usb 1-1: New USB device found, idVendor=0424, idProduct=2742, bcdDevice=92.00
[    1.886302] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    1.886303] usb 1-1: Product: USB2742
[    1.886304] usb 1-1: Manufacturer: Microchip Tech
[    1.886793] hub 1-1:1.0: USB hub found
[    1.886812] hub 1-1:1.0: 2 ports detected
[    1.962611] usb 4-1: Enable of device-initiated U1 failed.
[    1.964232] usb 4-1: Enable of device-initiated U2 failed.
[    1.966778] scsi host0: uas
[    1.966886] usbcore: registered new interface driver uas
[    1.969902] scsi 0:0:0:0: Direct-Access     NVMe     THNSN5256GPUK NV 4103 PQ: 0 ANSI: 6
[    1.980315] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    1.993032] sd 0:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB)
[    1.995581] sd 0:0:0:0: [sda] Write Protect is off
[    1.995582] sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
[    2.000647] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.004904] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes
[    2.013920] usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[    2.034320] usb 2-1: New USB device found, idVendor=0424, idProduct=5742, bcdDevice=92.00
[    2.034322] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=0
[    2.034323] usb 2-1: Product: USB5742
[    2.034324] usb 2-1: Manufacturer: Microchip Tech
[    2.035141] hub 2-1:1.0: USB hub found
[    2.035317] hub 2-1:1.0: 2 ports detected
[    2.045898]  sda: sda1 sda2 sda3
[    2.073437] sd 0:0:0:0: [sda] Attached SCSI disk
[    2.161878] usb 1-3: new full-speed USB device number 3 using xhci_hcd
[    2.311803] usb 1-3: New USB device found, idVendor=8087, idProduct=0029, bcdDevice= 0.01
[    2.311804] usb 1-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0

In this case, even though there are “Enable of device-initiated U1 failed” errors, the drive does not heat up like it does when not booting from it.

Hi Gamma,

Thank you for the additional details. Based on this, I don’t think the U1/U2 power modes are causing the heat-related issues with the drive. I have left one drive connected to my Fedora 32 system overnight without the file system mounted, and the drive maintained the same idle temperature for as long as it was connected.

My next step is to start up a test computer with Ubuntu 20.04 installed to see if I can replicate similar results. For reference, could you let me know the manufacturer name and model number of the computer you are using and I will try to find the closest match in my available test computers?

Thank you,

Hi Hellloooooo,

Thank you for contacting us as well, could you also let me know if you are using Ubuntu 20.04 and let me know the manufacturer name and model number of the computer?

Thank you both for these additional details, I will try to get a similar system setup with Ubuntu 20.04 to test and see if this is a specific issue related to Ubuntu 20.04, Ubuntu 20.04 and the hardware, or something else entirely.

Pat
Plugable Technologies
support@plugable.com

Hi Pat, really appreciate the help, especially for Linux :)

I’m using the Dell XPS 9360 (only hardware changes were replacing the SSD and wireless card):

https://topics-cdn.dell.com/pdf/xps-13-9360-laptop_setup-guide_en-us.pdf
https://wiki.archlinux.org/index.php/Dell_XPS_13_(9360) — ubuntu works out of the box on this laptop, but there are some device details here