In March 2011, Microsoft added asymmetric storage support for Windows Server 2008 and 2008 R2 failover clustering. This means that not all nodes in the cluster have to share the same storage; some nodes can use one set of storage, and other nodes can use another. A likely scenario for this is a multi-site cluster, where each site has its own storage array.
When adding nodes to a cluster that has asymmetric storage, you may see an error similar to “There was an error creating, configuring, or bringing online the Physical Disk resource (disk) ‘Cluster Disk 1’”. You may see this error in several different locations, but generally you will first see it on the report screen at the end of the Add Node wizard.
The reason for this is logical but not obvious, and neither is the solution. First, and most importantly, the nodes were added successfully and there is no problem – you just can’t use the disks yet. Do not remove the nodes and go through the Add Node process again.
Explanation
Starting with Windows Server 2008, when you expose disks to the cluster, those disks are placed in a hidden cluster resource group called “Available Storage”. Each of your “Services/Applications”, like SQL Server or MS-DTC, is also a resource group, but that has been abstracted away to make the interface friendlier and less technical. Every resource group is owned by a node, and the active owner changes when you fail over between nodes. If you click on any “Service/Application” in Failover Cluster Management, the top-right pane will show the current owner.
So what happened with your installation and why did it give you an error? Imagine that you have a single cluster with three local nodes (LocalNode1, LocalNode2, LocalNode3) and two remote nodes (RemoteNode1, RemoteNode2). The local nodes use one storage array, and the remote nodes use another storage array. You initially set up the cluster with the local nodes and everything works correctly. You complete the setup by installing SQL Server, or another application, using all of the disks in Available Storage.
At that point, the “Available Storage” resource group has no disks, but the resource group is still owned by one of the local nodes. Although you can’t manage Available Storage like other resource groups, behind the scenes it is a resource group just like any other.
You then use the Add Node wizard to add the remote nodes. Cluster validation runs and the nodes are added successfully, except that you get the error mentioned above. The reason for this is simple:
- All unused cluster disks are placed into the Available Storage resource group
- The Available Storage resource group is owned by a local node
- The possible owners of the new disks are only the remote nodes, because it is an asymmetric storage setup, and that storage array is only available to the remote nodes, not the local nodes.
Therefore, the disks are successfully added, but they cannot be brought online by the owner of Available Storage (a local node), because that node cannot be a possible owner of the remote disks. The error text in the wizard report says as much: “Resource for ‘Cluster Disk 1’ has been created but will not be brought online because the disk is not visible from the node which currently owns the “Available Storage” group. To bring this resource online, move that group to a node that can see the disk. The possible owners list for this disk resource has been set to the nodes that can host this resource.”
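To make the reasoning concrete, here is a minimal Python sketch of the rule described above. It is only a toy model, not how the cluster service is implemented: the function name disk_states and the dictionary layout are my own inventions for illustration, and the node and disk names are taken from the example scenario.

```python
# Toy model of the scenario above. Nothing here talks to a real cluster;
# disk_states() and the data layout are hypothetical, for illustration only.
LOCAL_NODES = {"LocalNode1", "LocalNode2", "LocalNode3"}
REMOTE_NODES = {"RemoteNode1", "RemoteNode2"}

# Possible owners of each unused disk: only the nodes that can see its array.
possible_owners = {
    "Cluster Disk 1": REMOTE_NODES,
}


def disk_states(group_owner, possible_owners):
    """Return {disk: "Online" or "Offline"} for the disks in Available Storage.

    A disk can come online only if the node that currently owns the
    Available Storage group is one of that disk's possible owners.
    """
    return {
        disk: "Online" if group_owner in owners else "Offline"
        for disk, owners in possible_owners.items()
    }


# Available Storage owned by a local node: the remote disk stays offline.
print(disk_states("LocalNode1", possible_owners))
# Available Storage owned by a remote node: the disk can come online.
print(disk_states("RemoteNode1", possible_owners))
```

The first call models the state right after the Add Node wizard finishes, and the second models the state once a remote node owns the group.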
The solution is simple – you just need to change the current owner of Available Storage to be one of the nodes that is a possible owner of the new disks, in this case RemoteNode1 or RemoteNode2. Unfortunately, there is currently no way to do this through the Failover Cluster Management GUI, and you must resort to cluster.exe:
C:\>cluster.exe group "available storage" /moveto:RemoteNode1
You will immediately see a message saying “Moving resource group ‘available storage’…”, followed by Available Storage with a Status of “Online” on the new node. If you watch Available Storage in Failover Cluster Management, you will see the disks come online as the move happens.
While this is not initially straightforward, it is very simple. In more complex setups, you may have unused cluster disks on several different asymmetric storage devices. In that case, you will always have at least some disks offline, because Available Storage can only be owned by one node at a time, and that node will not have access to the disks on the other storage devices.
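The multi-array case can be sketched the same way. Again, this is a toy Python model under assumed names (offline_disks and a made-up second disk, “Cluster Disk 2”, on the local array), not anything the cluster exposes. It simply checks every node as a candidate owner of Available Storage and lists the disks that would be left offline.

```python
# Hypothetical layout: unused disks on two arrays, each visible to one site only.
# "Cluster Disk 2" and offline_disks() are assumptions made for this sketch.
possible_owners = {
    "Cluster Disk 1": {"RemoteNode1", "RemoteNode2"},
    "Cluster Disk 2": {"LocalNode1", "LocalNode2", "LocalNode3"},
}


def offline_disks(group_owner, possible_owners):
    """Disks that stay offline if group_owner owns Available Storage."""
    return sorted(
        disk
        for disk, owners in possible_owners.items()
        if group_owner not in owners
    )


# Every node that can see at least one of the unused disks.
all_nodes = sorted(set().union(*possible_owners.values()))

for node in all_nodes:
    print(node, "leaves offline:", offline_disks(node, possible_owners))

# Nodes that could bring every unused disk online at once (none in this layout).
full_owners = [n for n in all_nodes if not offline_disks(n, possible_owners)]
print("Nodes that can see every disk:", full_owners)
```

Whichever node you move Available Storage to, the disks on the other array stay offline until you put them into a resource group whose owner can see them.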
The Windows Clustering team is aware of this issue, and I fully expect this to be addressed in a future release. For now, I am thankful that I have the ability to create an asymmetric storage cluster and will deal with some of the minor issues like this.