Cluster Fault Tolerance Part II

The network can become a single point of failure for your cluster: certain network problems can leave the cluster inaccessible to clients. To prevent this, you must make the network fault tolerant as well. This may require you to keep spare hardware, such as routers, switches, hubs, and network cards, on hand. If any piece of network hardware fails, it can then be replaced as soon as the fault is detected.

How you increase the fault tolerance of your cluster depends on the network configuration. On a single-subnet network, you do not have to worry about routers and switches. Configuring the cluster nodes with multiple network adapters and keeping spares on hand as replacements is sufficient.

However, in a network with multiple subnets there are additional considerations, such as the physical setup of the network. If the network has multiple routers, they should be placed in such a way that at least two paths are available to any given subnet. This way there is more than one route to any given network, and the network is still reachable if one router becomes unavailable.

For example, if the physical network consists of three networks connected by three routers, the routers should be set up to form a “loop”. If one router in the loop fails, all networks are still accessible through the remaining routers.
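To make the "loop" idea concrete, here is a minimal Python sketch that checks whether the loss of any single router cuts a network off. The network names (A, B, C), the router names (R1, R2, R3), and both helper functions are hypothetical and used only for illustration. The sketch also assumes each router joins exactly two networks, which real routers need not do.

```python
from collections import deque

def networks_connected(links, failed=None):
    """Return True if every network can still reach every other network
    when the router named `failed` is out of service."""
    adjacency = {}
    for router, (net_a, net_b) in links.items():
        # Register both networks even if this router is the failed one.
        adjacency.setdefault(net_a, set())
        adjacency.setdefault(net_b, set())
        if router != failed:
            adjacency[net_a].add(net_b)
            adjacency[net_b].add(net_a)
    if not adjacency:
        return True
    # Breadth-first search from an arbitrary network.
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == set(adjacency)

def single_points_of_failure(links):
    """List the routers whose failure would isolate at least one network."""
    return [r for r in links if not networks_connected(links, failed=r)]

# Hypothetical topologies: each router maps to the two networks it joins.
loop = {"R1": ("A", "B"), "R2": ("B", "C"), "R3": ("C", "A")}
chain = {"R1": ("A", "B"), "R2": ("B", "C")}

print(single_points_of_failure(loop))   # []
print(single_points_of_failure(chain))  # ['R1', 'R2']
```

In the loop, no router is a single point of failure. If the same three networks are joined in a chain by only two routers, losing either router isolates a network.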

How does this apply to your cluster? If the network your cluster sits on suddenly becomes inaccessible to clients because of a network hardware failure, the cluster is inaccessible too, no matter how fault tolerant the cluster nodes themselves are.
