Nexenta

Simple Failover

Plugin Details

Author Nexenta Systems, Inc
Version 2.0
License Commercial
Demo http://www.nexenta.com/demos/simple-failover-80.html

This pluggable extension introduces the 'simple-failover' group of appliances. A failover group contains two or more appliances that share certain common attributes (for instance, a so-called "failover IP address") and replicated storage. The plugin integrates with the Nexenta AutoCDP service to support failover of NexentaStor volumes block-mirrored over an IP network.

Please note that this plugin is installed on all appliances that are members of a 'simple-failover' group.

Definitions

A failover group contains two or more appliances. None of the appliances in the group is specifically designated as (what's often called) the "primary" or "active" appliance, as opposed to "secondary" or "passive". In fact, any appliance in the group may be replaced by (or more exactly, failed over to) any other appliance in the same group.
The plugin supports up to two failover IP interfaces (defined and illustrated below). The 'simple-failover' group of appliances can be deployed with or without AutoCDP - the existence of replicated volumes is detected at failover time and at appliance startup time.

 

Network interface configuration

Each appliance in the group must have a spare (logical or physical) network interface, also termed the "failover networking interface". Starting with version 2.0, the plugin supports up to two failover interfaces. At failover time, one of the selected failover networking interfaces performs an important function - it gets configured with the per-group static settings:

  • failover IP
  • failover mask

The idea (and the requirement for the simple-failover configuration to work) is that clients can access any of the appliances in the failover group via the configured failover IP and failover mask. The picture below illustrates the failover concept:

simple failover diagram

For this configuration to work, each of the designated (failover) network interfaces on the corresponding appliances - members of the group - must be programmable with the same static network interface parameters.
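As a rough illustration of the requirement above, the following Python sketch models a group's shared settings and checks that the failover IP and failover mask form a valid static configuration. The class and field names (`FailoverGroup`, `failover_interfaces`), the sample addresses, and the interface names are all hypothetical; none of them come from the plugin itself.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class FailoverGroup:
    """Hypothetical model of a simple-failover group (not the plugin's own format)."""
    name: str
    failover_ip: str
    failover_mask: str
    # appliance hostname -> name of its designated failover interface (assumed layout)
    failover_interfaces: dict = field(default_factory=dict)

    def static_config(self) -> ipaddress.IPv4Interface:
        # Raises ValueError if the IP or the mask is malformed.
        return ipaddress.IPv4Interface(f"{self.failover_ip}/{self.failover_mask}")

    def validate(self) -> None:
        self.static_config()
        if len(self.failover_interfaces) < 2:
            raise ValueError("a failover group needs at least two appliances")
        for host, nic in self.failover_interfaces.items():
            if not nic:
                raise ValueError(f"{host} has no failover interface")


# Sample values are made up for illustration.
group = FailoverGroup(
    name="grp1",
    failover_ip="192.168.10.50",
    failover_mask="255.255.255.0",
    failover_interfaces={"nodeA": "e1000g1", "nodeB": "e1000g1"},
)
group.validate()
print(group.static_config().network)
```

The point of the sketch is that the IP and mask belong to the group, not to any one appliance, while the interface that receives them is chosen per appliance.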

From creating the group to failing over

This is how it works. At group creation time you will be prompted to specify the failover network interface on each appliance in the group. In addition, you will be asked to specify the per-group settings {failover IP, failover mask}.

Later, at actual failover time, this static network interface configuration effectively "travels" to the failover appliance - that is, to the appliance that is used to manually perform failover, via:

nmc:/$ setup group simple-failover [group_name] failover

In other words, during the actual failover:

failed appliance => failover appliance

The previously selected failover interface of the "failover appliance" gets configured with the per-group attributes (failover IP and failover mask). Assuming - as illustrated in the picture above - that CIFS/NFS/iSCSI/etc. clients were using the failover IP and failover mask to access the failed appliance and the data volumes behind it, the failover remains transparent to these clients and applications: they can still access the same data volumes.

Upon "failing over", the appliance that is used to carry out the operation takes over the networking and storage functions previously performed by the failed appliance, and the entire simple-failover group of appliances transitions to a failover state. Failover is a compound operation that is either done as a whole, or not done. Once (and if) the failed appliance is brought back to service, the failover operation can be (again, atomically) undone.
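The state transition described above can be sketched as a small state machine. This is a minimal Python model under stated assumptions: the names `Group`, `GroupState`, `failover` and `undo_failover` are hypothetical, and the rule that a group already in a failover state rejects a second failover is an assumption made for the sketch, not documented plugin behavior.

```python
from enum import Enum


class GroupState(Enum):
    NORMAL = "normal"
    FAILOVER = "failover"


class Group:
    """Hypothetical in-memory model; the plugin's real bookkeeping is not shown here."""

    def __init__(self, members):
        self.members = set(members)
        self.state = GroupState.NORMAL
        self.failed = None   # appliance that was failed over
        self.acting = None   # appliance that took over its functions

    def failover(self, local, failed):
        # Assumption: only one failover can be in effect per group at a time.
        if self.state is not GroupState.NORMAL:
            raise RuntimeError("group is already in a failover state")
        if local == failed or not {local, failed} <= self.members:
            raise ValueError("both appliances must be distinct group members")
        # All fields change together, mirroring the all-or-nothing operation.
        self.state, self.failed, self.acting = GroupState.FAILOVER, failed, local

    def undo_failover(self, local):
        if self.state is not GroupState.FAILOVER:
            raise RuntimeError("nothing to undo")
        if local != self.acting:
            raise ValueError("undo-failover runs on the appliance that failed over")
        self.state, self.failed, self.acting = GroupState.NORMAL, None, None


g = Group(["nodeA", "nodeB"])
g.failover(local="nodeB", failed="nodeA")
print(g.state.value, g.acting)
g.undo_failover(local="nodeB")
print(g.state.value)
```

Any member can play either role, which reflects the absence of a fixed "primary" appliance in the group.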

Starting with version 3.0.4, the failover MAC address is assigned automatically. When one of the nodes in a cluster goes down, the second one comes up with the same IP address, MAC address, and subnet mask.
Advanced users can still specify the MAC address manually. While creating the simple-failover group, run:

 nmc:/$ setup group simple-failover [group_name] failover -m

What happens after failover?

Upon failover the specified group of appliances transitions to a failover state, with the local (failover) appliance taking over networking and storage functions previously performed by the failed appliance. Failover operation is a compound transaction that can be undone (via 'undo-failover' operation), and that includes the following steps:

  • making sure that the failed appliance in the given failover group is not reachable. Note that failover will fail if the failed appliance is still up;
  • validating the failed appliance's configuration. Note that appliances in the failover group periodically back up their respective configurations, with the backup destinations being the other members of the group.
  • creating rollback checkpoint. This checkpoint is later used to "undo" the effects of applying failed appliance's configuration. As such, the corresponding rollback is part of the 'undo-failover' operation.
  • configuring the specified (failover) networking interface. The corresponding parameters, as well as appliances - members of the failover group and other persistent parameters, are specified at failover group creation time.
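The steps above can be sketched as a generic "do in order, undo in reverse" transaction. The Python below is only an illustration of that pattern: `run_failover`, `build_steps`, and the step names are hypothetical, and the step bodies merely append to a log rather than touching any appliance.

```python
def run_failover(steps):
    """Run (name, do, undo) steps in order; on error, undo completed steps in reverse."""
    done = []
    try:
        for name, do, undo in steps:
            do()
            done.append((name, undo))
    except Exception:
        for _, undo in reversed(done):
            undo()
        raise
    return [name for name, _ in done]


log = []


def ensure_unreachable(is_reachable):
    # Failover must not proceed while the failed appliance is still up.
    if is_reachable:
        raise RuntimeError("failed appliance is still up; refusing to fail over")


def build_steps(is_reachable):
    noop = lambda: None
    return [
        ("check-unreachable", lambda: ensure_unreachable(is_reachable), noop),
        ("validate-config", lambda: log.append("validated"), noop),
        ("create-checkpoint",
         lambda: log.append("checkpoint"),
         lambda: log.append("checkpoint removed")),
        ("configure-interface",
         lambda: log.append("interface up"),
         lambda: log.append("interface down")),
    ]


print(run_failover(build_steps(is_reachable=False)))
```

Calling `run_failover(build_steps(is_reachable=True))` raises at the first step, so none of the later steps run and nothing needs to be rolled back.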

Successful failover operation results in:

  1. Re-importing (activating) of all volumes that were block-mirrored from the failed appliance => to the local (failover) appliance. For related information, please see "Continuous Data Protection" section of the NexentaStor User Guide, or search F.A.Q. pages for "auto-cdp", or review the corresponding pluggable extension web page.
  2. Re-importing (activating) of all iSCSI based volumes accessible from both the failed appliance and the failover appliance. (Note: It is important to ensure that at any given time only one appliance member of the group accesses truly iSCSI-shared volumes (as opposed to 'auto-cdp' replicated volumes). This can be done manually. However, as of now the corresponding functionality is NOT part of the simple-failover plugin.)
  3. Restoring NexentaStor auto-services (auto-sync, auto-tier, auto-snap, auto-scrub) defined on the shared and replicated volumes (see above).
  4. Finally, restoring all NFS and CIFS shares defined on all shared and replicated volumes.

How to activate the previously failed appliance?

Caution: For NexentaStor versions 2.0 or older, please make sure to physically disconnect the failover interface prior to powering up the previously failed appliance. This requirement/limitation is removed in NexentaStor v2.1.

To activate the previously failed appliance, use the undo-failover verb:

nmc:/$ setup group simple-failover [group_name] undo-failover

Both the 'failover' and 'undo-failover' operations are executed on the same local (the failed-over) appliance. For instance, if you have nodeA and nodeB, and nodeA failed, you then drive both 'failover' and 'undo-failover' operations on nodeB.

Note again that simple-failover (being simple and manually operated) does not guard against concurrent access to shared data.

It is still a group
Failover group of appliances IS a group, and therefore provides a superset of the corresponding "basic group" functionality. In particular, you could still use the generic 'switch' command, to switch Nexenta Management Console to operate in a group mode - that is, execute CLI commands on all appliances in the group.

For more information, please see NexentaStor User Guide, or the man page (-h) for the following command:
nmc:/$ switch group

 

Limitations

Shared storage is NOT supported. The simple-failover extension module does not employ SCSI PGR or any alternative mechanism to ensure that no two appliances in the cluster access the shared storage (if available) simultaneously. For shared storage support and automated failover, please check out another NexentaStor extension module: HA Cluster.

Getting started

To install this plugin, run:

nmc:/$ setup plugin install simple-failover

This pluggable module must be installed on each appliance that is a member of a simple-failover group of appliances.

You can use management console 'show' command to view already installed plugins, and plugins that can be installed:

nmc:/$ show plugin -h

Or, you can view, install and uninstall the NexentaStor extension modules using the appliance's web GUI.

To get started, and for the man page, run:

nmc:/$ create group simple-failover -h 


 

References

Live Demo

http://www.nexenta.com/demos/simple-failover-80.html

User reviews

Average user rating from 1 user: overall 3.7, usability 3.0, performance 5.0, stability 3.0.
 
 

It is here finally...

Reviewed by Erast
December 02, 2008
This is what I was waiting for years to come - simple HA service, which would handle my ZFS pools. Sun Cluster is too heavy, too complex. Usability could be improved over time I guess, but I pretty much happy with it, and it creates great addition to auto-cdp (block-level replication) service.

 
 