nvme0 error


Stefan Bethke-2
nvme0: async event occurred (type 0x1, info 0x00, page 0x02)
nvme0: device reliability degraded

Should I be concerned? I'm using this Samsung SSD as cache and log for ZFS on a 12-stable machine.

nvd0: <SAMSUNG MZVPW128HEGM-00000> NVMe namespace
nvd0: 122104MB (250069680 512 byte sectors)

# nvmecontrol logpage -p 2 nvme0
SMART/Health Information Log
============================
Critical Warning State:         0x04
 Available spare:               0
 Temperature:                   0
 Device reliability:            1
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    311 K, 37.85 C, 100.13 F
Available spare:                100
Available spare threshold:      10
Percentage used:                110
Data units (512,000 byte) read: 18417596
Data units written:             164091845
Host read commands:             499986873
Host write commands:            1491808067
Controller busy time (minutes): 48315
Power cycles:                   59
Power on hours:                 20432
Unsafe shutdowns:               26
Media errors:                   0
No. error info log entries:     22
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature Sensor 1:           311 K, 37.85 C, 100.13 F
Temperature Sensor 2:           330 K, 56.85 C, 134.33 F
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   0
Total Time For Temperature 2:   0


Stefan

--
Stefan Bethke <[hidden email]>   Fon +49 151 14070811



Re: nvme0 error

Warner Losh
On Thu, Apr 30, 2020 at 11:48 AM Stefan Bethke <[hidden email]> wrote:

> nvme0: async event occurred (type 0x1, info 0x00, page 0x02)
> nvme0: device reliability degraded
>

type 1: SMART event
info 0: reliability error
page 2: the SMART/Health log page (0x02) has the details

The NVMe 1.4 standard says:
NVM subsystem Reliability: NVM subsystem reliability has been compromised.
This may be due to significant media errors, an internal error, the media
being placed in read only mode, or a volatile memory backup device failing.
This status value shall not be used if the read-only condition on the media
is due to a change in the write protection state of a namespace (refer to
section 8.19.1).
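
As a rough illustration of the decoding above, here is a minimal Python sketch. It assumes the three numbers in the kernel message are the fields of completion Dword 0 of the Asynchronous Event Request as laid out in the NVMe specification (type in bits 2:0, info in bits 15:8, log page in bits 23:16). The function name `decode_async_event` and the table names are made up for this example and are not part of any driver or tool.

```python
# Asynchronous event types (bits 2:0 of completion Dword 0).
EVENT_TYPES = {
    0x0: "error status",
    0x1: "SMART / health status",
    0x2: "notice",
    0x6: "I/O command set specific",
    0x7: "vendor specific",
}

# Event information values defined for the SMART / health status type.
SMART_INFO = {
    0x00: "NVM subsystem reliability",
    0x01: "temperature threshold",
    0x02: "spare below threshold",
}


def decode_async_event(dw0):
    """Split completion Dword 0 into its type, info and log page fields.

    Hypothetical helper for illustration only.
    """
    event_type = dw0 & 0x7        # bits 2:0
    info = (dw0 >> 8) & 0xFF      # bits 15:8
    page = (dw0 >> 16) & 0xFF     # bits 23:16

    info_name = "unknown"
    if event_type == 0x1:
        info_name = SMART_INFO.get(info, "unknown")

    return {
        "type": event_type,
        "type_name": EVENT_TYPES.get(event_type, "reserved"),
        "info": info,
        "info_name": info_name,
        "page": page,
    }


if __name__ == "__main__":
    # type 0x1, info 0x00, page 0x02 packed into one Dword
    print(decode_async_event(0x00020001))
```

With the values from the message (type 0x1, info 0x00, page 0x02), this reports a SMART / health event about NVM subsystem reliability and points at log page 2.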

> Should I be concerned? I'm using this Samsung SSD as cache and log for ZFS
> on a 12-stable machine.
>
> nvd0: <SAMSUNG MZVPW128HEGM-00000> NVMe namespace
> nvd0: 122104MB (250069680 512 byte sectors)
>
> # nvmecontrol logpage -p 2 nvme0
> SMART/Health Information Log
> ============================
> Critical Warning State:         0x04
>  Available spare:               0
>  Temperature:                   0
>  Device reliability:            1
>  Read only:                     0
>  Volatile memory backup:        0
> Temperature:                    311 K, 37.85 C, 100.13 F
> Available spare:                100
> Available spare threshold:      10
> Percentage used:                110
> Data units (512,000 byte) read: 18417596
> Data units written:             164091845
> Host read commands:             499986873
> Host write commands:            1491808067
> Controller busy time (minutes): 48315
> Power cycles:                   59
> Power on hours:                 20432
> Unsafe shutdowns:               26
> Media errors:                   0
> No. error info log entries:     22
> Warning Temp Composite Time:    0
> Error Temp Composite Time:      0
> Temperature Sensor 1:           311 K, 37.85 C, 100.13 F
> Temperature Sensor 2:           330 K, 56.85 C, 134.33 F
> Temperature 1 Transition Count: 0
> Temperature 2 Transition Count: 0
> Total Time For Temperature 1:   0
> Total Time For Temperature 2:   0
>
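
The indented lines under "Critical Warning State" in the output above are the individual bits of that byte. A small Python sketch of the same breakdown follows, assuming the bit assignments from the NVMe specification's SMART/Health log page (bit 0 spare, bit 1 temperature, bit 2 reliability, bit 3 read only, bit 4 volatile memory backup). The names `CRITICAL_WARNING_BITS` and `decode_critical_warning` are invented for this example.

```python
# Bit positions within the Critical Warning byte of the SMART/Health log.
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "device reliability degraded",
    3: "media in read-only mode",
    4: "volatile memory backup failed",
}


def decode_critical_warning(value):
    """Return the descriptions of all warning bits set in `value`.

    Hypothetical helper for illustration only.
    """
    return [
        name
        for bit, name in CRITICAL_WARNING_BITS.items()
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    print(decode_critical_warning(0x04))
```

For 0x04 only bit 2 is set, which matches the "Device reliability: 1" line and the "device reliability degraded" kernel message.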

I'm thinking the percentage used value of 110 may be what it's alerting on.
The standard says:

Percentage Used: Contains a vendor specific estimate of the percentage of
NVM subsystem life used based on the actual usage and the manufacturer’s
prediction of NVM life. A value of 100 indicates that the estimated
endurance of the NVM in the NVM subsystem has been consumed, but may not
indicate an NVM subsystem failure. The value is allowed to exceed 100.
Percentages greater than 254 shall be represented as 255. This value shall
be updated once per power-on hour (when the controller is not in a sleep
state). Refer to the JEDEC JESD218A standard for SSD device life and
endurance measurement techniques.
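
To put the counters in perspective, here is a minimal Python sketch of the arithmetic, using only numbers from the log page and the nvd0 probe line quoted earlier. It assumes one data unit is 512,000 bytes, as the log output labels it, and a 512-byte sector. The helper names are made up for this example.

```python
DATA_UNIT_BYTES = 512_000   # unit size shown in the log page output
SECTOR_BYTES = 512          # sector size from the nvd0 probe line


def bytes_from_data_units(units):
    """Convert a SMART 'data units' counter to bytes."""
    return units * DATA_UNIT_BYTES


def full_drive_writes(units_written, sectors):
    """Average number of times the whole namespace has been written."""
    return bytes_from_data_units(units_written) / (sectors * SECTOR_BYTES)


def describe_percentage_used(value):
    """Interpret Percentage Used the way the quoted text describes it."""
    if value >= 255:
        return "255: saturated, actual value is above 254"
    if value >= 100:
        return f"{value}: estimated endurance consumed (not necessarily failed)"
    return f"{value}: within estimated endurance"


if __name__ == "__main__":
    written = bytes_from_data_units(164091845)
    print(f"written: {written / 1e12:.1f} TB")
    print(f"full drive writes: {full_drive_writes(164091845, 250069680):.0f}")
    print(describe_percentage_used(110))
```

With the values above this works out to roughly 84 TB written, or about 656 complete overwrites of the 128 GB namespace, which is consistent with a drive reporting that its estimated endurance has been used up.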

Warner


>
> Stefan
>
> --
> Stefan Bethke <[hidden email]>   Fon +49 151 14070811
>
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[hidden email]"

Re: nvme0 error

Stefan Bethke-2
On 30.04.2020 at 20:06, Warner Losh <[hidden email]> wrote:
>
> I'm thinking percent used 110 may be the thing it's alerting on, the standard says:

Thanks! I figured as much, but I wasn't sure how to interpret the data.

I've noticed that filesystem access appears to have slowed on that box. Its sister shows only 27 percentage used, and seems to work just as always.


Thanks,
Stefan

--
Stefan Bethke <[hidden email]>   Fon +49 151 14070811

