Resiliency - Technical Discussion

MOVEit DMZ Data Stores

MOVEit DMZ stores the data used by the application in one of four main locations: the registry, the Windows file system, the MySQL database, and the Windows certificate store.

On a standalone system, all four of these data stores are kept on a single machine.

[Figure: ResilOneDMZ.gif]

MOVEit DMZ Data Store Distribution

In a resilient system, MOVEit DMZ distributes its data stores among multiple machines. The primary job of the MOVEit DMZ resilient software is not only to detect component failure but also to keep the various data stores replicated, so that the overall system can survive the loss of any individual component.

[Figure: ResilComponents.gif]

Registry

Certain MOVEit DMZ settings are stored in the registry on each node. MOVEit DMZ Resiliency uses small files on the NAS server to help replicate registry settings from one MOVEit DMZ node to all other MOVEit DMZ nodes.

Windows FileSystem

MOVEit DMZ stores all its files, including its AES-encrypted data files, on a NAS. At first glance, this would appear to make the NAS a single point of failure, but there are several ways to address this risk.

Using a SAN as a NAS in MOVEit DMZ Resiliency

MOVEit DMZ Resiliency supports using a SAN to store the MOVEit DMZ AES-encrypted files.

[Figure: ResilComponentsSAN.gif]

Using a SAN requires using an intermediate machine configured to act as a NAS interface. For example, if a configuration calls for two MOVEit DMZ resilient nodes, and a fiber SAN attachment is available, then a third box should be set up to connect to the SAN (via fiber) and to share the SAN drive with MOVEit DMZ Primary and Secondary nodes. This enables the SAN to be used as if it were a NAS device.

WARNING: The system sharing the SAN drive should be equipped with resilient features like redundant power supplies and NICs, but may not need large local or RAID hard drives because it will only be a pass-through device.

Using Windows Cluster Distributed File System Services as NAS

MOVEit DMZ Resiliency generally supports the use of Windows Cluster Distributed File System services as a NAS, but all MOVEit DMZ nodes must be members of a Windows domain in this configuration.

MySQL Database ("DB")

MOVEit DMZ Resiliency mainly uses MySQL's replication facility to keep the databases on the Primary and Secondary nodes up to date. (Other database engines are not supported by MOVEit DMZ Resiliency.) Other "application" nodes play no part in MySQL database replication. What MOVEit DMZ provides above and beyond the MySQL replication facility is the ability to detect failures, notify administrators, and switch roles of the databases on the fly.
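As a rough illustration of what "detecting failures" on top of MySQL replication can involve, the sketch below evaluates one row of MySQL's SHOW SLAVE STATUS output. The column names (Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master) are MySQL's own. The function name, the dictionary input, and the 60-second lag threshold are assumptions made for this example, and this is not a description of how MOVEit DMZ itself performs the check.

```python
def replication_healthy(status, max_lag_seconds=60):
    """Return True if a SHOW SLAVE STATUS row (as a dict) looks healthy.

    `status` is assumed to be one result row keyed by column name,
    or None if the server returned no row at all.
    """
    if status is None:
        return False  # replication is not configured on this server
    io_ok = status.get("Slave_IO_Running") == "Yes"    # receiving events from the Master
    sql_ok = status.get("Slave_SQL_Running") == "Yes"  # applying the received events
    lag = status.get("Seconds_Behind_Master")          # NULL (None) when replication is broken
    return io_ok and sql_ok and lag is not None and int(lag) <= max_lag_seconds
```

A monitoring loop could call this periodically and notify an administrator as soon as it returns False.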

MySQL uses the terms "Master" and "Slave" to designate which database is actually performing queries and updates requested by all active MOVEit DMZ nodes (the Master) and which database is passively receiving database updates from the Master node (the Slave). By default, the Primary node is also the Master database and the Secondary node is also the Slave database. However, if the Master database goes down, the database on the Secondary node will automatically be promoted to Master status. A short period of downtime may be required to switch roles back (or resync the databases) after an unexpected failure, as a failed Master sometimes will not automatically become a Slave.
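The role switch described above can be modeled as a small state change. The following is a minimal sketch only: the Node class, its fields, the "offline" role, and promote_on_failure are hypothetical names chosen for illustration and do not reflect MOVEit DMZ internals.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    db_role: str                # "master", "slave", or "offline" (hypothetical labels)
    db_up: bool = True
    needs_resync: bool = False


def promote_on_failure(master: Node, slave: Node) -> Node:
    """Return the node whose database should act as Master."""
    if master.db_up:
        return master           # nothing to do
    if not slave.db_up:
        raise RuntimeError("both databases are down")
    slave.db_role = "master"    # the Slave is promoted automatically
    master.db_role = "offline"  # the failed Master is not assumed to rejoin as a Slave
    master.needs_resync = True  # an administrator must resync before switching back
    return slave
```

The needs_resync flag stands in for the manual step the text mentions: the failed Master is deliberately not turned into a Slave automatically.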

Windows Certificate Store

SSL client and server certificates must be available to the individual servers in the cluster. MOVEit DMZ Resiliency services meet this requirement automatically by replicating certificate information between nodes using signal and certificate files passed through the common NAS.
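The general pattern of passing data through a shared folder with a payload file plus a signal file can be sketched as follows. This is an illustration of the pattern only: the ".cert" and ".signal" file names, the function names, and the single-consumer simplification are all assumptions, since the actual file names and formats used by MOVEit DMZ are not described here.

```python
import tempfile
from pathlib import Path


def publish(share: Path, name: str, payload: bytes) -> None:
    """Write the payload first, then the signal file that announces it."""
    (share / f"{name}.cert").write_bytes(payload)
    (share / f"{name}.signal").touch()  # created last, so readers never see a partial payload


def collect(share: Path) -> dict:
    """Pick up every announced payload and clear its signal file."""
    received = {}
    for signal in sorted(share.glob("*.signal")):
        received[signal.stem] = signal.with_suffix(".cert").read_bytes()
        signal.unlink()                 # simplification: assumes one consuming node
    return received


def demo() -> dict:
    """Round-trip one payload through a temporary directory standing in for the NAS share."""
    with tempfile.TemporaryDirectory() as folder:
        share = Path(folder)
        publish(share, "server", b"CERT")
        return collect(share)
```

Writing the signal file last is the key design choice: a reader that sees the signal can rely on the payload already being complete.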

Time Synchronization

MOVEit DMZ Resiliency requires all nodes to keep accurate time. To accomplish this goal, the MOVEit DMZ resiliency installation involves pointing each node to a time server.

MOVEit DMZ currently supports two different configurations for time synchronization, depending on the operating system, because of time service bugs in some editions of Windows 2003.

Systems configured with Windows 2008 may use the Primary node as their time source. This "sync to Primary" configuration has the advantage of keeping all nodes in sync as long as the Primary node stays up, even if outside network connectivity fails.

Any system, regardless of operating system, may use the local firewall or an external time server as its time source, and this is the currently recommended configuration. This "sync to external" configuration has the advantage of keeping all nodes in sync even if the Primary node fails. To configure your firewall to access an external time source (usually over UDP port 123), consult the "Time Service" section of the "System Configuration - Firewall Configuration" documentation.
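To confirm that a node can actually reach a time source over UDP port 123, a simple SNTP query is often enough. The sketch below builds a standard 48-byte SNTP client request and decodes the transmit timestamp from the reply. The function names are chosen for this example, the server address is whatever time source your site uses, and this is a diagnostic aid rather than part of MOVEit DMZ.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (Unix epoch)


def build_sntp_request() -> bytes:
    """Return a 48-byte client request: LI=0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server's transmit timestamp as Unix time (seconds)."""
    if len(packet) < 48:
        raise ValueError("SNTP reply is too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_time(server: str, timeout: float = 2.0) -> float:
    """Ask `server` for the time over UDP port 123; raises on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)
```

Comparing the value returned by query_time with the node's own clock gives a rough idea of the current offset. A timeout usually points to a firewall rule blocking UDP port 123.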

See "Resiliency - Common Procedures - Change Time Server" for information about how to change your configured time server(s).