
  • Checking SSD health with ESXi 5.1

    A new feature in ESXi 5.1 is the ability to check SSD health from the command line. Once you have SSH’d into the ESXi host, you can check drive health with the following command:

     

    esxcli storage core device smart get -d [drive]

     

    …where [drive] takes the format of: t10.ATA?????????. You can find the right drive name by listing the device directory:

     

    ls -l /dev/disks/

     

    This returns output like the following:

     

    mpx.vmhba32:C0:T0:L0
    mpx.vmhba32:C0:T0:L0:1
    mpx.vmhba32:C0:T0:L0:5
    mpx.vmhba32:C0:T0:L0:6
    mpx.vmhba32:C0:T0:L0:7
    mpx.vmhba32:C0:T0:L0:8
    t10.ATA_____M42DCT064M4SSD2__________________________000000001147032121AB
    t10.ATA_____M42DCT064M4SSD2__________________________000000001147032121AB:1
    t10.ATA_____M42DCT064M4SSD2__________________________0000000011470321ADA4
    t10.ATA_____M42DCT064M4SSD2__________________________0000000011470321ADA4:1
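
    To pick out only the whole-device entries from a listing like this one (the names without a trailing :N partition suffix), a small filter along these lines may help. This is a rough sketch rather than anything official: list_whole_disks is a made-up helper name, and it assumes ls and grep behave in the ESXi shell as they do in any POSIX shell.

```shell
# Hypothetical helper (not part of ESXi): print the device names in a
# directory, skipping partition entries whose names end in :<number>.
# Defaults to /dev/disks when no directory is given.
list_whole_disks() {
    dir="${1:-/dev/disks}"
    # Names such as mpx.vmhba32:C0:T0:L0 end in ":L0", not ":<digits>",
    # so they are kept; only the ":1", ":5", ... partition entries are dropped.
    ls "$dir" | grep -v ':[0-9][0-9]*$'
}
```

    Piping the result through grep '^t10\.' should narrow it further to the t10 entries.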

     

    In this listing, the t10.xxx names without the :1 at the end are the two SSDs available; copy and paste the entire line as the [drive]. The command output should look like:

     

    ~ # esxcli storage core device smart get -d t10.ATA_____M42DCT064M4SSD2__________________________000000001147032121AB
    Parameter                     Value  Threshold  Worst
    ----------------------------  -----  ---------  -----
    Health Status                 OK     N/A        N/A
    Media Wearout Indicator       N/A    N/A        N/A
    Write Error Count             N/A    N/A        N/A
    Read Error Count              100    50         100
    Power-on Hours                100    1          100
    Power Cycle Count             100    1          100
    Reallocated Sector Count      100    10         100
    Raw Read Error Rate           100    50         100
    Drive Temperature             100    0          100
    Driver Rated Max Temperature  N/A    N/A        N/A
    Write Sectors TOT Count       100    1          100
    Read Sectors TOT Count        N/A    N/A        N/A
    Initial Bad Block Count       100    50         100
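
    A table like this can also be checked mechanically. The sketch below reads the command output on standard input and prints any row whose Value is at or below its Threshold, which is the usual SMART convention for a failing attribute. smart_at_risk is a hypothetical name, and the sketch assumes awk is available in your ESXi shell and that the column layout matches the output above.

```shell
# Hypothetical helper: reads SMART table text on stdin and reports every
# attribute whose numeric Value is less than or equal to its Threshold.
smart_at_risk() {
    awk '
        # The last three fields are Value, Threshold and Worst; everything
        # before them is the (possibly multi-word) parameter name.
        # Header, separator and N/A rows are skipped because their Value
        # or Threshold field is not a plain number.
        NF >= 4 && $(NF-2) ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ {
            if ($(NF-2) + 0 <= $(NF-1) + 0) {
                name = $1
                for (i = 2; i <= NF - 3; i++) name = name " " $i
                printf "%s: value %s, threshold %s\n", name, $(NF-2), $(NF-1)
            }
        }
    '
}
```

    It would be used as esxcli storage core device smart get -d [drive] | smart_at_risk, and prints nothing when no attribute has reached its threshold.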

    One figure to keep an eye on is the Reallocated Sector Count – this should be around 100, and diminishes as the SSD replaces bad sectors with spares from its reserve. The above statistics are updated every 30 minutes. As a point of interest, the Value column here does not appear to hold raw figures – the SSD doesn’t actually have exactly 100 power-on hours and a power cycle count of 100. These look like normalized SMART values, where 100 generally means healthy, rather than literal counts.
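
    To watch a single figure over time, the Value column for one row can be pulled out of the table. The following is only a sketch: smart_value is an invented helper name, it again assumes awk is present, and the parameter name must match the first column exactly as printed (for example "Reallocated Sector Count").

```shell
# Hypothetical helper: reads SMART table text on stdin and prints the
# Value column for the parameter named in $1 (first match only).
smart_value() {
    awk -v want="$1" '
        NF >= 4 && !found {
            # Rebuild the multi-word parameter name from all fields
            # except the last three (Value, Threshold, Worst).
            name = $1
            for (i = 2; i <= NF - 3; i++) name = name " " $i
            if (name == want) { print $(NF-2); found = 1 }
        }
    '
}
```

    For instance, esxcli storage core device smart get -d [drive] | smart_value "Reallocated Sector Count" would print just that row’s value, which could then be logged or compared against an earlier reading.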

    Assuming it works for your SSDs, this is quite a useful tool – knowing when a drive is likely to fail can give you the opportunity for early replacement and less downtime due to unexpected failures.