Monitoring Windows software RAID
(While setting up this site in September 2020 I went looking for articles to add to it from my scattered archives. This is one that I came across that probably has little relevance any more but is perhaps an interesting bit of history.)
Microsoft’s “server class” operating systems have a useful software RAID feature that allows you to use multiple disk drives as a single volume in various ways to obtain protection from disk failure through redundancy (among other things). The problem is that there seems to be no way to know when a disk has failed except to log in and check the status manually. This means that when one drive in a redundancy group fails things carry on as they’re supposed to until the next drive fails and then you lose everything. So there’s not really much protection unless you frequently do manual checks of the status so that you can replace failed drives in a timely fashion. For a system administrator responsible for a large number of servers this is a real chore.
I devoted some effort to researching this and found that there is very little available in the way of useful solutions to this problem although there is a small number over-priced network management packages available, all laden with unwanted features. I therefore set about devising a solution of my own and soon discovered that this is easier said than done - not only do Microsoft (incredibly) not include any automatic RAID monitoring software in their “server class” operating systems, they also don’t apparently provide any easily accessible API that could be used for the purpose either. The sprawling WMI system describes such things but they seem not to be implemented. Given the supposed application of these systems I find this extraordinary and to this day I still have the uncomfortable feeling that I must be missing something glaringly obvious. However, after devoting more than enough time to trying to find a solution I decided that I had acquired enough information to be able to hack up something that would be adequate and would be more cost effective than further research.
My solution revolves around a command-line utility called diskpart.exe
. I have constructed VBScript and .BAT
wrappers for this utility which allow it to be used for automated RAID monitoring. At the time of writing I have seen this work with just one, artifically induced, RAID failure so for the time being I am continuing manual monitoring, though at a lower frequency, until I have more confidence.
My implementation of this solution is for Windows 2000. It should work for other versions of Windows that diskpart works on but the batch file may need tweaking because I have assumed the diskpart.exe
has been downloaded and installed in a directory along with the other files described here. This is what you need for Windows 2000:
diskpart.exe
- the core utility.dpscript.txt
- a simple script for the diskpart utility.dprun.bat
- a small batch file wrapper for the diskpart tool.volcheck.vbs
- the VBScript wrapper that calls dprun.bat and parses the output.
These files are described in detail below. You should assemble them all in a single directory, e.g. C:\RAIDCheck
, and then set up a scheduled task to call the VBScript wrapper on a regular and frequent basis (at least daily), like this: CScript /NoLogo "C:\RAIDCheck\volcheck.vbs"
Note that the script creates a text file from the output of the diskpart utility, which it then parses. This means that it will need write access to the directory in which all these files reside. I assume that you need administrator rights to run the diskpart utility in any case but I haven’t checked this. I would suggest using an administrator account to run the utility when setting up the scheduled task.
NB: in order to run diskpart, the built-in user SYSTEM must have access rights to some DCOM applications (I don’t know exactly which ones but they include the Logical Disk Manager). This caused me a problem where diskpart would fail with an error like DISKPART.EXE - Application Error : The instruction at "0x006a0183" referenced memory at "0x886a6b79". The memory could not be "written".
and the event log would show an access denied error from LDM. If you see this problem then run DCOMCnfg
and under the “Default Security” tab, click the “Edit Default…” button in the “Default Access Permissions” section. If the list of users and groups allowed access is non-blank and does not include the user SYSTEM then you must add it.
I suggest testing your initial installation by modifying the volcheck.vbs
script to change the word “Healthy” to something else. This should cause disk status reports to be emailed to you whenever the scheduled task runs, which verifies that everything is installed and configured properly. Once you’re happy with that, restore the script to its original state and your monitoring should be good. Ideally you should then force various failure modes of your RAID arrays to ensure that these will all be detected.
That’s it. No guarantees, as usual. Good luck!
Merlyn Kline
January 2006
diskpart.exe
This is the core utility. For Windows 2000 it needs to be downloaded from Microsoft’s web site. I found it at http://www.microsoft.com/windows2000/techinfo/reskit/tools/new/diskpart-o.asp but it may have moved since then. For simplicity I copied the executable into the directory with the other files described here but you could keep it elsewhere, in which case you will need to modify the dprun.bat file or just make sure that diskpart is in your system path when the scheduled task runs.
dpscript.exe
The diskpart utility needs a script so we don’t have to interact with it at the command line. This is just a text file containing a single line, as follows:
list volume
dprun.bat
I created this batch file wrapper for the diskpart utility because I couldn’t see how do do I/O redirection from within the VBScript wrapper and I lost patience trying to work it out. It’s very simple:
@echo off
%1
cd "%2"
diskpart /s dpscript.txt > dpout.txt
Note that it creates the file dpout.txt
in our working directory.
volcheck.vbs
This is the main script which calls the diskpart utiltity, parses the output and emails the administrators if a problem is detected. As implemented it requires access to an SMTP server to handle the email for it. You will need to set the two constants at the top of the script, one being the email address to send reports to and the other being the name of the SMTP server to use for this purpose. If you have a different email environment (e.g. MS Exchange) then you will need to change the SendEmail
subroutine. Alternatively you may wish to change it to perform some other type of notification if email is not appropriate for you.
' VBScript to check the health of drives.
'
' Merlyn Kline
' Jan 2006
'
Option Explicit
'----------------------------------------------------------------------------
' Global constants
Const strVersion = "1.0"
Const strEmail = "sysadmin@your.domain" ' Email address for reports
Const strSMTPHost = "some.smtp.server" ' The name of an SMTP server that will forward mail for you
Dim CRLF : CRLF = Chr(13) & Chr(10)
'----------------------------------------------------------------------------
' Utility subroutines
'-----------------------
' Send an email
Sub SendEmail(strTo, strSubj, strText)
Set message = CreateObject("CDO.Message")
message.subject = strSubj
message.from = strEmail & " (" & WScript.ScriptName & "v " & strVersion & " on " & ScriptComputer & ")"
message.to = strTo
message.textbody = strText
message.Configuration.Fields.Item("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
message.Configuration.Fields.Item("http://schemas.microsoft.com/cdo/configuration/smtpserver") = strSMTPHost
message.Configuration.Fields.Update
message.send
Set message = Nothing
End Sub
'-----------------------
' Get drive letter this script is on (returns blank if none is available, e.g. in a network path)
Function ScriptDrive
Dim strP, strRes
strP = WScript.ScriptFullName
If Mid(strP,2,1) = ":" Then strRes = Left(strP,1)
ScriptDrive = strRes
End Function
'-----------------------
' Get the path to directory this script is in (no drive letter or trailing \)
Function ScriptPath
Dim strRes
strRes = WScript.ScriptFullName
If Mid(strRes,2,1) = ":" Then strRes = Mid(strRes,3)
ScriptPath = Left(strRes,InStrRev(strRes,"\")-1)
End Function
'-----------------------
' Get the name of the computer this script is on
Function ScriptComputer
Dim strRes, objNet
Set objNet = WScript.CreateObject("WScript.Network")
strRes = objNet.UserDomain & "\" & objNet.ComputerName
ScriptComputer = strRes
End Function
'----------------------------------------------------------------------------
'----------------------------------------------------------------------------
'-----------------------
' Get and interpret a drive report. Return an emtpy string if no problems, otherwise return the whole report
Function CheckVolumes
Dim strRes, strCMD, objShell, iStatus, fso, strPath, textfile, bStarted, strStatus, strReport, bError
Set objShell = WScript.CreateObject("WScript.Shell")
strCMD = """" & ScriptDrive & ":" & ScriptPath & "\dprun.bat"" " & ScriptDrive & ": """ & ScriptPath & """"
iStatus = objShell.Run(strCMD,0,true)
If iStatus<>0 Then
strRes = "ERROR: diskpart.exe returned error code " & iStatus
Else
Set fso = CreateObject("Scripting.FileSystemObject")
strPath = ScriptDrive & ":" & ScriptPath & "\dpout.txt"
If Not fso.FileExists(strPath) Then
strRes = "diskpart output file '" & strPath & "' not found"
Else
Set textfile = fso.OpenTextFile(strPath ,1)
bStarted = False
bError = True
strReport = ""
Do While (Not textfile.AtEndOfStream)
strLine = textfile.ReadLine
strReport = strReport & strLine & CRLF
If bStarted Then
If strLine<>"" Then
strStatus = Mid(strLine,61,9)
If strStatus<>"Healthy " And strStatus<>" " Then bError = 1
End If
Else
If strLine = " ---------- --- ----------- ----- ---------- ------- --------- --------" Then bStarted = True : bError = False
End If
Loop
If bError Then strRes = strReport
If Not bStarted Then strRes = "**UNRECOGNISED REPORT FORMAT**" & CRLF & CRLF & strRes
End If
End If
CheckVolumes = strRes
End Function
Dim strReport
strReport = CheckVolumes
If strReport <> "" Then SendEmail strEmail, ScriptComputer & " drive PROBLEM report", strReport