How to Check Hard Disk Health Without Downtime
If you run Windows Server in a production environment, you will want to check the health of your hard disks without downtime. Most servers are “always-on,” with only a few opportunities to take them offline. Fortunately, there are several ways to check your hard drives without shutting down the server or interrupting its services.
Step 1: Check event viewer logs for disk warnings
If serious disk issues are developing, you will often find disk warnings in the Event Viewer logs. The term ‘warning’ is unfortunate; ‘error’ would have been more fitting, since in our experience a logged disk ‘warning’ nearly always indicates a hardware fault, such as a bad sector. Note that many disk errors, including bad sectors, don’t show up in Windows until the disk has exhausted its ability to repair itself. Ideally you want to catch a failing drive long before that, and the SMART checks in Steps 3 and 4 do exactly that. Before we get to internal disk checks, however, we look at a more common issue: file system corruption.
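On a Core installation without Event Viewer, the same check can be scripted. The following is a minimal sketch, not a definitive implementation: it builds a wevtutil query for warnings and errors in the System log. The provider name 'disk' and the function name build_disk_event_query are our assumptions; adjust the provider if your storage driver logs under a different name.

```python
import subprocess
import sys


def build_disk_event_query(max_events=20):
    """Return a wevtutil command that lists recent disk warnings and errors.

    Assumption: the storage driver logs to the System log under the
    provider name 'disk'. Level 2 is Error, level 3 is Warning.
    """
    xpath = "*[System[Provider[@Name='disk'] and (Level=2 or Level=3)]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{max_events}",  # limit the number of events returned
        "/rd:true",          # newest events first
        "/f:text",           # human-readable output
    ]


if __name__ == "__main__":
    command = build_disk_event_query()
    if sys.platform == "win32":
        result = subprocess.run(command, capture_output=True, text=True)
        print(result.stdout or "No matching disk events found.")
    else:
        print("Not running on Windows; the command would be:", command)
```

An empty result does not prove the disk is healthy, for the reason given above: the drive may still be repairing bad sectors internally.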
Step 2: Check for file system corruption and inconsistencies
The good old chkdsk command was carried over to Windows from MS-DOS, and for good reason: it’s a very important tool. The file system may become corrupt for many reasons; the most common are a sudden loss of power, blue screens, and driver bugs. To check for file system consistency without downtime, use:
chkdsk C:
Run without options, chkdsk works in read-only mode and only reports problems. If you suspect the disk has bad sectors, use the following command instead. It requires the volume to be taken offline; for the system boot disk, the scan runs at the next restart, before Windows finishes booting:
chkdsk D: /b
The /b parameter tells chkdsk to scan and test every single disk sector. Note that chkdsk may report no errors even if there were bad sectors, because the drive repairs itself internally when sectors are accessed. Scanning every sector forces the drive to check each one, which normally does not happen. Therefore, even a successful /b scan with no bad sectors reported may hide the fact that some sectors were bad and were transparently replaced by the drive. Modern drives ship with spare space to accommodate a certain number of bad sectors. Once the spares are used up, however, the drive starts reporting read or write errors to the operating system; hence the need to check the Event Viewer logs.
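For scheduled checks, the read-only chkdsk run from this step can be wrapped in a small script. The sketch below interprets the exit code; the meanings are taken from Microsoft's chkdsk documentation to the best of our knowledge, and the function names are hypothetical. chkdsk normally needs an elevated prompt.

```python
import subprocess
import sys

# Exit codes as described in Microsoft's chkdsk documentation
# (to the best of our knowledge; verify against your Windows version).
CHKDSK_EXIT_CODES = {
    0: "No errors were found.",
    1: "Errors were found and fixed.",
    2: "Disk cleanup was performed, or skipped because /f was not specified.",
    3: "The disk could not be checked, or errors were left unfixed.",
}


def describe_chkdsk_exit_code(code):
    """Translate a chkdsk exit code into a short description."""
    return CHKDSK_EXIT_CODES.get(code, f"Unexpected exit code {code}.")


def run_read_only_check(volume="C:"):
    """Run chkdsk without options (read-only) and return (code, description)."""
    completed = subprocess.run(["chkdsk", volume], capture_output=True, text=True)
    return completed.returncode, describe_chkdsk_exit_code(completed.returncode)


if __name__ == "__main__":
    if sys.platform == "win32":
        code, description = run_read_only_check("C:")
        print(f"chkdsk exited with {code}: {description}")
    else:
        print("chkdsk is only available on Windows.")
```

Because the check runs without /f, a non-zero exit code only tells you that a repair pass is worth scheduling; nothing on the volume is changed.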
Step 3: Check internal hard drive error reports
A simple way to check all drives is to run this command:
wmic diskdrive get status
However, you will likely find it too primitive, as it only shows “OK” for each drive listed, without any additional information. A much more detailed report can be obtained by using the disk’s SMART mechanism.
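Even this simple status list can be monitored automatically. The following sketch parses the text printed by wmic diskdrive get status and returns every row that is not “OK”. The sample output and the function name find_unhealthy_drives are illustrative assumptions, and the row position is not necessarily the Windows disk number.

```python
def find_unhealthy_drives(wmic_output):
    """Return (row_position, status) for every drive whose status is not 'OK'.

    Expects the plain text printed by `wmic diskdrive get status`:
    a 'Status' header line followed by one status per drive.
    """
    lines = [line.strip() for line in wmic_output.splitlines() if line.strip()]
    if not lines or lines[0].lower() != "status":
        raise ValueError("expected a 'Status' header line")
    return [
        (position, status)
        for position, status in enumerate(lines[1:])
        if status.upper() != "OK"
    ]


# Hypothetical sample output with one drive predicting failure.
sample_output = "Status     \nOK         \nOK         \nPred Fail  \n"
print(find_unhealthy_drives(sample_output))
```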
Which hard drive is it?
Before we dig further, we need to know where (on which disk) each partition is stored. Using diskpart.exe, or if you have a full user interface, Windows Disk Management, you can check on each drive and see which partitions are stored on which drives. Ideally you will have tagged each disk in your server with the serial number so you know which one to pull out if need be.
Here’s an example. We run diskpart, select disk 0, and then look up the details:
DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online         3726 GB      0 B        *
  Disk 1    Online          447 GB      0 B
  Disk 2    Online          238 GB  1024 KB        *
  Disk 3    Online         2794 GB      0 B        *
  Disk 4    Online         7452 GB  1024 KB        *
  Disk 5    Online         7452 GB      0 B        *

DISKPART> select disk 0

Disk 0 is now the selected disk.

DISKPART> detail disk

TOSHIBA MG03ACA400 ATA Device
Disk ID: {0820091B-E651-405F-8CF8-F87426A34014}
Type   : SATA
Status : Online
Path   : 0
Target : 0
LUN ID : 0
Location Path : PCIROOT(0)#PCI(1100)#ATA(C00T00L00)
Current Read-only State : No
Read-only  : No
Boot Disk  : No
Pagefile Disk  : No
Hibernation File Disk  : No
Crashdump Disk  : No
Clustered Disk  : No

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     X   Toshiba4TB   NTFS   Partition   3725 GB  Healthy
The drive letter X and the volume label show up in the disk details, so we know which partition is on the disk. The drawback of diskpart is that it doesn’t report the disk’s serial number. If you have many “TOSHIBA MG03ACA400” drives in your server, which is common practice especially for RAID setups, it will be difficult to narrow down the affected drive.
That’s why we typically recommend using the Disk Information screen in BackupChain instead to obtain all the relevant disk information on one screen, including the disk’s serial number.
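As a command-line alternative, the WMI class behind wmic diskdrive also exposes the disk index and serial number. The sketch below assumes the output of wmic diskdrive get index,model,serialnumber,status /format:csv; the serial numbers in the sample and the function name parse_diskdrive_csv are hypothetical, and some controllers report an empty or vendor-encoded serial.

```python
import csv


def parse_diskdrive_csv(text):
    """Map disk index -> model, serial, and status from wmic CSV output.

    Assumes the text produced by:
      wmic diskdrive get index,model,serialnumber,status /format:csv
    which starts with a header line such as Node,Index,Model,SerialNumber,Status.
    """
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    drives = {}
    for row in csv.DictReader(rows):
        drives[int(row["Index"])] = {
            "model": row["Model"].strip(),
            "serial": row["SerialNumber"].strip(),
            "status": row["Status"].strip(),
        }
    return drives


# Hypothetical sample: two identical models told apart by serial number.
sample_csv = """
Node,Index,Model,SerialNumber,Status
SERVER1,0,TOSHIBA MG03ACA400,EXAMPLE0001,OK
SERVER1,1,TOSHIBA MG03ACA400,EXAMPLE0002,OK
"""

for index, info in sorted(parse_diskdrive_csv(sample_csv).items()):
    print(f"Disk {index}: {info['model']} serial {info['serial']} ({info['status']})")
```

The index matches the disk number shown by diskpart, so the serial can be compared against the label on the physical drive.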
Working with smartmontools
A useful, free tool for this purpose is smartmontools. It also works on Core and Hyper-V installations of Windows Server without a user interface, straight from the command line and without any dependencies. Installation is simple; for help, issue this command:
smartctl.exe -h
Below is a sample report for hard disk drive #9. The device name is sdj because drive #9 is the tenth drive when counting from 0. Letters and disk numbers map as follows:
a b c d e f g h i j
0 1 2 3 4 5 6 7 8 9
Note the numbering starts with 0 in Windows, hence disk #0 is sda. To get the report for drive #9, we use device sdj with parameter /dev/sdj as shown below.
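The letter mapping above can be expressed in a few lines. This is a small sketch with a hypothetical helper name; it only covers disks 0 through 25, which is the range a single letter can address.

```python
import string


def smartctl_device_for_disk(disk_number):
    """Return the /dev/sdX name smartctl uses for a Windows disk number.

    Disk 0 maps to sda, disk 1 to sdb, and so on. Only single-letter
    names (disks 0 to 25) are handled in this sketch.
    """
    if not 0 <= disk_number < len(string.ascii_lowercase):
        raise ValueError("this sketch only handles disk numbers 0 to 25")
    return "/dev/sd" + string.ascii_lowercase[disk_number]


print(smartctl_device_for_disk(0))  # /dev/sda
print(smartctl_device_for_disk(9))  # /dev/sdj
```

If in doubt, smartctl --scan lists the device names that smartctl itself detects.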
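For a quick look at a drive’s health, the most useful parts of the report are typically the overall-health self-assessment result, the reallocated, pending, and uncorrectable sector counts (attributes 5, 197, and 198), and the error and self-test logs: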
C:\Program Files\smartmontools\bin>smartctl.exe -a /dev/sdj
smartctl 6.6 2017-11-05 r4594 [x86_64-w64-mingw32-2016-1607] (sf-6.6-1)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST14000NM001G-2KJ103
Serial Number:    ZL2H****
LU WWN Device Id: 5 000c50 0db6cb3f3
Firmware Version: SN03
User Capacity:    14,000,519,643,136 bytes [14.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Sep 07 14:51:45 2021 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1234) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   044    Pre-fail  Always       -       160452326
  3 Spin_Up_Time            0x0003   091   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   045    Pre-fail  Always       -       33400174
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       532
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       15
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   045   040    Old_age   Always       -       33 (Min/Max 33/34)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       176
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       268 (195 211 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       74804549797
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2556893914

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       198         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
If the drive already logged errors at some point in the past, they will be listed in the above report. At that point, it’s best to replace the drive long before it fails.
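With many drives, reading each report by hand gets tedious. The following sketch parses the attribute table of a smartctl -a report and flags two conditions: a non-zero raw value on a few sector-related attributes, and a normalized value at or below its threshold. The set of watched attribute IDs is our choice, the function names are hypothetical, and the pending-sector count in the sample is invented for illustration.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table.
ATTRIBUTE_LINE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"
)

# Assumption: attributes where any non-zero raw count deserves a closer look.
WATCHED_IDS = {5, 187, 197, 198}


def parse_attributes(report):
    """Extract attribute rows from smartctl -a output, keyed by attribute ID."""
    attributes = {}
    for line in report.splitlines():
        match = ATTRIBUTE_LINE.match(line)
        if match:
            attributes[int(match["id"])] = {
                "name": match["name"],
                "value": int(match["value"]),
                "thresh": int(match["thresh"]),
                "raw": int(match["raw"]),
            }
    return attributes


def find_warning_signs(attributes):
    """Return human-readable warnings for suspicious attributes."""
    warnings = []
    for attr_id, attr in sorted(attributes.items()):
        if attr_id in WATCHED_IDS and attr["raw"] > 0:
            warnings.append(f"{attr_id} {attr['name']}: raw value {attr['raw']}")
        if attr["thresh"] > 0 and attr["value"] <= attr["thresh"]:
            warnings.append(f"{attr_id} {attr['name']}: value at or below threshold")
    return warnings


# Rows in the format of the report above; the raw value 8 is hypothetical.
sample_report = """
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   045   040    Old_age   Always       -       33 (Min/Max 33/34)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

for warning in find_warning_signs(parse_attributes(sample_report)):
    print(warning)
```

Raw values are vendor specific, so treat a hit as a prompt to inspect the full report, not as a verdict.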
Step 4: Run Internal Disk Tests Without Downtime
Unlike chkdsk, the drive’s internal firmware allows for short and long tests to be run while the disk is in use. To run a test use these commands:
smartctl -t short /dev/sda
smartctl -t long /dev/sda
The first command, with the parameter ‘short’, runs a very quick self-test without downtime. The ‘long’ parameter starts a much more comprehensive test. This is also the test that the hard drive’s manufacturer asks you to run if you suspect a disk issue and want to send the drive back for a replacement. Note that the example uses /dev/sda, which means we want drive #0 tested.
A long scan can take many hours to finish, but it won’t affect the services running on your server. To check on the scan, run this command; it reports what percentage of the test remains and whether any errors were found:
smartctl -a /dev/sda
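To poll a long test from a script, the remaining percentage can be extracted from the report text. This is a sketch under the assumption that smartctl prints a line of the form “90% of test remaining.” while a test is running; the sample text and the function name are illustrative.

```python
import re

REMAINING = re.compile(r"(\d{1,3})% of test remaining")


def self_test_remaining(report):
    """Return the percentage of the self-test still to run, or None.

    None means the report shows no test in progress, for example
    because the test has finished or was never started.
    """
    match = REMAINING.search(report)
    return int(match.group(1)) if match else None


# Hypothetical excerpt of smartctl -a output during a running test.
in_progress = (
    "Self-test execution status:      ( 249) Self-test routine in progress...\n"
    "                                        90% of test remaining.\n"
)
print(self_test_remaining(in_progress))
```

Once the function returns None, smartctl -l selftest /dev/sda shows the self-test log with the final result.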
Note that if the disk is in seriously bad shape, the scan may cause it to fail while it is being scanned. Sometimes the controller may freeze up and lose connectivity to the drive. In that case a reboot is needed; you should then immediately recover any files that are still recoverable and replace the drive as soon as possible.
Other Hints
With mechanical drives, if you have physical access to the server, you can sometimes spot a disk problem by listening. When a drive hits a bad sector, it goes into a cycle where it retries reading the sector many times over, and each retry can be heard as a click. Repeated clicks may indicate that the disk is about to fail, in which case it makes sense to run the above scans.
Don’t Wait for Hard Drives to Fail, Prevention is Always Better
Download BackupChain today and use the fully functional trial to take all backup features for a test drive. If you suspect a drive may be failing, we don’t recommend taking disk images, as imaging may place too much mechanical stress on the drive; file-level backup is generally the better option in that case. Apart from file server backup, disk imaging, and disk cloning, BackupChain offers a wide range of backup features, such as Virtual Machine Backup, VMware Backup, Hyper-V Backup, VirtualBox Backup, and Windows Server Backup.