Cluster Fault Tolerance Part I
There are many ways to keep cluster nodes available and performing at an acceptable level for all groups and their resources. For example, making the network fault tolerant increases the availability of cluster nodes. A cluster is fault tolerant by definition; however, it is only as fault tolerant as its environment and setup allow.
You can implement a standby node, as in the hot spare cluster model, by keeping a physical computer that is not in use but is standing by in case one of the active nodes in the cluster has a hardware failure. A standby node is ready to assume another node's role in the event of hardware failure. If you do not want to have a computer sitting idle, another option is to keep spare hardware parts on hand for emergencies.
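The hot spare idea can be sketched as a small model. This is only an illustration, not how any cluster service actually implements failover: the `Node` class, the `fail_over` function, and the node and group names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node and the resource groups it hosts."""
    name: str
    healthy: bool = True
    groups: list = field(default_factory=list)


def fail_over(failed: Node, standby: Node) -> None:
    """Move every resource group from a failed node to the idle standby node."""
    if not standby.healthy:
        raise RuntimeError("standby node is unavailable")
    failed.healthy = False
    standby.groups.extend(failed.groups)
    failed.groups.clear()


# Assumed example names; the standby hosts nothing until a failure occurs.
active = Node("node-a", groups=["file-share", "database"])
spare = Node("node-spare")

fail_over(active, spare)
print(spare.groups)
```

The cost of this model shows up in the last few lines: the spare does no work until `fail_over` is called, which is the idle hardware the spare-parts alternative avoids.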
The hardware components to keep on hand include all the internal parts of the PC, including every expansion card and especially the SCSI adapter. The external parts include all the SCSI or Fibre Channel cabling and connectors.
You should implement hardware-level RAID for the external shared cluster disks. This allows your cluster to continue to function even after a shared disk fails. Spare external SCSI hard disks should also be kept on hand to replace any failed SCSI disks that are used as shared cluster disks.
When it comes to individual cluster nodes, the internal hard disk that contains the operating system files should be implemented as RAID level 1 (disk mirroring) with two hard disk controllers. This will enable a single node to continue to function if one internal hard disk in the mirror fails or if the controller that manages the hard drive hosting the operating system files fails.
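The reasoning behind mirroring across two controllers can be checked with a short sketch. It assumes each disk in the mirror is attached to its own controller, which the text implies but does not state outright; the function name `node_can_boot` is hypothetical.

```python
def node_can_boot(disk_ok, controller_ok):
    """Return True if at least one mirror half has a working disk AND controller.

    disk_ok and controller_ok each hold one boolean per mirror half
    (assumption: disk i is attached only to controller i).
    """
    return any(d and c for d, c in zip(disk_ok, controller_ok))


# One disk in the mirror has failed: the other half keeps the node running.
print(node_can_boot([False, True], [True, True]))

# One controller has failed: the disk on the other controller is still usable.
print(node_can_boot([True, True], [False, True]))

# A disk failure on one half plus a controller failure on the other: no half survives.
print(node_can_boot([False, True], [True, False]))
```

With a single shared controller, the second case would also take the node down, which is why the second controller is worth the extra hardware.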
Tags: microsoft, diana huggins, cluster node, scsi, cluster fault tolerance, fibre channel
