Resiliency - Configuration Utility - Resiliency Configuration

As of MOVEit DMZ 6.0, the MOVEit DMZ Config utility provides links to the older DMZ Config utility, which is still used to manage Resiliency. Formerly, these settings were reached directly from the Resiliency tab.

Resiliency Tab

The older DMZ configuration utility is used only for its "Resiliency" tab. All options on this tab are grayed out when Resiliency features are not enabled. The Resiliency tab contains settings that are entered during installation and are rarely changed.

Resiliency Timings

Clicking the Timings button opens a dialog that lets you choose how quickly the system fails over when a problem is detected.

Choose 30 seconds, 1 minute, or 2 minutes. A shorter failover period means your site is down for less time when a real failure occurs. However, it also makes it more likely that a transient problem, one that would have cleared on its own, triggers an unnecessary failover.
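The trade-off can be illustrated with a deliberately simplified model. This sketch assumes, hypothetically, that a failover fires once a node has been silent for at least the chosen timing; MOVEit DMZ's actual detection logic is not documented here, and the function names are illustrative only.

```python
"""Simplified, hypothetical model of the failover-timing trade-off."""

# The three choices offered by the Timings dialog, in seconds.
TIMING_CHOICES = (30, 60, 120)


def triggers_failover(silences, timeout_s):
    """Return True if any silence (seconds without a status update) reaches timeout_s.

    Assumption: failover fires as soon as one silence lasts timeout_s or longer.
    """
    return any(gap >= timeout_s for gap in silences)


def summarize(silences):
    """Map each timing choice to whether it would have caused a failover."""
    return {t: triggers_failover(silences, t) for t in TIMING_CHOICES}


if __name__ == "__main__":
    # A transient 45-second network hiccup between short, normal gaps.
    print(summarize([5, 45, 3]))  # {30: True, 60: False, 120: False}
```

Under this model, the 30-second setting fails over on a 45-second hiccup that the longer settings ride out, while a genuine outage of several minutes triggers all three.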

Advanced Resiliency Tasks Dialog

Click the Advanced button to open the Advanced Resiliency Tasks window. This window displays the current status of the Resilient MOVEit DMZ system and provides controls that affect its behavior and, in some cases, repair it.

The Advanced Resiliency Tasks dialog allows you to monitor and control resiliency operation. You may wish to leave this dialog open, as it shows the status of the various nodes in real time.

Hint: Use the following command as a shortcut target to launch the Configuration Utility and go directly to the Advanced Resiliency Tasks dialog. Enclose only the executable path in quotes, not the whole command:

"C:\Program Files\MOVEit\midmzcfg.exe" resil
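Because the install path contains a space, only the executable path should be quoted, with the `resil` argument after the closing quote; quoting the whole line makes Windows look for a program literally named `midmzcfg.exe resil`. The sketch below builds that shortcut target and, on Windows only, launches the utility. The helper name `shortcut_target` is hypothetical, and the path assumes the default location shown in the hint.

```python
import subprocess
import sys

# Install path from the hint; adjust if MOVEit DMZ is installed elsewhere.
CONFIG_EXE = r"C:\Program Files\MOVEit\midmzcfg.exe"


def shortcut_target(exe=CONFIG_EXE, arg="resil"):
    """Return a shortcut target string that quotes only the executable path."""
    return f'"{exe}" {arg}'


if __name__ == "__main__":
    print(shortcut_target())
    if sys.platform == "win32":
        # Passing a list lets Python quote the path containing a space for us.
        subprocess.Popen([CONFIG_EXE, "resil"])
```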

Controlling Resiliency

In normal use, MOVEit DMZ Resiliency will automatically deal with node outages. It will change database settings, start and stop services, and perform other resiliency-related tasks as needed. However, under some unusual circumstances, operator intervention is required. The Advanced Resiliency Tasks dialog allows an operator to change settings and take actions that are normally handled automatically.

Buttons in this dialog should be used with care, as misuse may result in loss of data. All buttons (except the stop / start services buttons) present an "Are you sure?" dialog before performing any dangerous action. The "Are you sure?" dialogs explain the actions that will be taken if the user chooses Yes.

Hint: The three most common procedures performed using these controls are software upgrades, database resynchronization and database node switching. Each of these procedures is carefully detailed on its own page; simply pressing the buttons on this dialog to "see what they do" could take down ALL nodes of your resilient system!

The following buttons are available:

Switch DB Node to Master

Switch DB Node to Slave

Propagate Data

Monitoring Resiliency

The Status portion of the Advanced Resiliency Tasks dialog shows the status of each MOVEit server and each database server. If, as is normally the case, a given computer is both a MOVEit server and a database server, this display shows one line for each of its roles.

The status display gets its information from the resiliency status file on the NAS. This file is normally named MOVEitDMZ\Resil\rsstatus.xml. During normal operation, each node updates its status every few seconds. If a node has not updated its status for a certain length of time, the resiliency system starts investigating and eventually takes corrective action. During that time, it sets appropriate status messages in the status display.
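The staleness idea can be sketched as a comparison between each node's last update time and the current time. This is an illustration only: the 30-second threshold and the function name `stale_nodes` are hypothetical, the real limit is internal to MOVEit DMZ Resiliency, and reading `rsstatus.xml` itself is not shown because its format is not documented here.

```python
from datetime import datetime, timedelta

# Hypothetical threshold for illustration; the real limit is not documented here.
STALE_AFTER = timedelta(seconds=30)


def stale_nodes(last_updates, now, stale_after=STALE_AFTER):
    """Return, sorted, the labels of nodes whose last update is older than stale_after.

    last_updates maps a label such as "SQL node 1" to the datetime of that
    node's most recent status update.
    """
    return sorted(
        label for label, updated in last_updates.items() if now - updated > stale_after
    )


if __name__ == "__main__":
    # Timestamps taken from the sample Status display later on this page.
    updates = {
        "SQL node 1": datetime(2003, 10, 20, 14, 54, 16),
        "SQL node 2": datetime(2003, 10, 21, 12, 36, 38),
        "DMZ node 1": datetime(2003, 10, 20, 14, 54, 15),
        "DMZ node 2": datetime(2003, 10, 21, 12, 36, 43),
    }
    now = datetime(2003, 10, 21, 12, 36, 45)
    print(stale_nodes(updates, now))  # ['DMZ node 1', 'SQL node 1']
```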

Status Message Details

SQL node 1/2 thinks it's slave/master - This message shows which role each database node believes it has ("master" = primary; "slave" = secondary). If both nodes think they are "master", or both think they are "slave", the system may be deadlocked; use the "Make me Master" and "Make me DB Slave" buttons to resolve this.
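The deadlock condition described in this item can be expressed as a small check over the roles the two database nodes report. This is a hypothetical helper, not part of MOVEit DMZ; it treats "every reporting node thinks it is slave" as the no-master case.

```python
def role_conflict(roles):
    """Describe a master/slave conflict, or return None if roles look consistent.

    roles maps a database node number (1 or 2) to the role it reports,
    either "master" or "slave".
    """
    reported = list(roles.values())
    if reported.count("master") > 1:
        return "more than one node thinks it is master"
    if reported and reported.count("slave") == len(reported):
        return "no node thinks it is master"
    return None


if __name__ == "__main__":
    print(role_conflict({1: "slave", 2: "master"}))   # None
    print(role_conflict({1: "master", 2: "master"}))  # more than one node thinks it is master
```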

SQL node 1/2 status: "OK" or an error - This message shows the reported health of a particular node's DB Resil service. "OK" is the best possible message; there are a variety of errors that may appear here instead. "SQL node" status messages are only reported by nodes 1 and 2, the database nodes.

DMZ node X status: "OK" or an error - This message shows the reported health of a particular node's DMZ (or "Web") Resil service. "OK" is the best possible message; there are a variety of errors that may appear here instead. "DMZ node" status messages are reported by all nodes in the MOVEit DMZ Resiliency cluster.

"Old:" Prefix - If all of the messages from a particular node show an "Old:" prefix, check whether that node's timestamps are updating. If the timestamps update regularly but the "Old:" prefixes remain, the clock on that node has often drifted behind the clocks on the other nodes. If the timestamps are not updating, the node is often offline or unable to access the NAS. If the "SQL node" messages show "Old:" prefixes while the "DMZ node" messages do not (or vice versa), the corresponding Resil service (DB Resil or Web Resil) on that server may be down.
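The rules of thumb in the "Old:" Prefix item can be written as a decision function. This is a hypothetical sketch for reasoning about the display; the inputs are observations an operator makes, and the returned strings are only suggestions, matching the hedged wording of the text.

```python
def diagnose_old(sql_old, dmz_old, timestamps_updating):
    """Suggest a likely cause for "Old:" prefixes on one node.

    sql_old / dmz_old: whether that node's "SQL node" / "DMZ node" lines show "Old:".
    timestamps_updating: whether that node's timestamps keep changing.
    """
    if sql_old and dmz_old:
        if timestamps_updating:
            return "clock on this node has probably drifted behind the other nodes"
        return "node is probably offline or cannot reach the NAS"
    if sql_old:
        return "DB Resil service on this node may be down"
    if dmz_old:
        return "Web Resil service on this node may be down"
    return "no Old: prefix; status is current"
```

For a node that is not a database node, `sql_old` is simply `False`, since only nodes 1 and 2 report "SQL node" messages.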

Example

Here is a sample Status display:

This DMZ node's ODBC DSN is using database node 2
Old: 2003-10-20 14:54:16.24  SQL node 1 thinks it's slave
     2003-10-21 12:36:38.01  SQL node 2 thinks it's master
Old: 2003-10-20 14:54:16.25  SQL node 1 status: All OK
     2003-10-21 12:36:38.01  SQL node 2 status: All OK
Old: 2003-10-20 14:54:15.35  DMZ node 1 status: Services are down.
     2003-10-21 12:36:43.01  DMZ node 2 status: All OK

In this example, the configuration program is being run on a computer acting as both database node 2 and MOVEit node 2. This node is acting as the database master. Node 1 is down, so its messages are old and cannot be trusted.
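To make the structure of the sample explicit, the sketch below parses each node line into its "Old:" flag, timestamp, role (SQL or DMZ), node number, and message. The regular expression is an assumption based only on the layout of the sample display; the real display may vary, and `parse_status` is a hypothetical helper.

```python
import re

# Matches one node line of the Status display, e.g.
# "Old: 2003-10-20 14:54:16.24  SQL node 1 thinks it's slave"
LINE_RE = re.compile(
    r"^(?P<old>Old:)?\s*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"(?P<role>SQL|DMZ) node (?P<node>\d+) (?P<msg>.*)$"
)

SAMPLE = """\
This DMZ node's ODBC DSN is using database node 2
Old: 2003-10-20 14:54:16.24  SQL node 1 thinks it's slave
     2003-10-21 12:36:38.01  SQL node 2 thinks it's master
Old: 2003-10-20 14:54:16.25  SQL node 1 status: All OK
     2003-10-21 12:36:38.01  SQL node 2 status: All OK
Old: 2003-10-20 14:54:15.35  DMZ node 1 status: Services are down.
     2003-10-21 12:36:43.01  DMZ node 2 status: All OK
"""


def parse_status(text):
    """Return one dict per node line; other lines (such as the DSN line) are skipped."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            records.append({
                "old": match.group("old") is not None,
                "timestamp": match.group("ts"),
                "role": match.group("role"),
                "node": int(match.group("node")),
                "message": match.group("msg"),
            })
    return records


if __name__ == "__main__":
    old = sorted({(r["role"], r["node"]) for r in parse_status(SAMPLE) if r["old"]})
    print(old)  # [('DMZ', 1), ('SQL', 1)]
```

Run against the sample, only node 1's entries carry the "Old:" flag, which matches the reading given in the explanation of the example.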

Controlling and Monitoring Services

In non-Resilient configurations, MOVEit DMZ's services all start automatically when a machine is rebooted. In a Resilient configuration, however, the only automatic MOVEit DMZ services are the Resiliency-specific services. Once these services have initialized and have found a sound system, they will bring up other MOVEit DMZ services such as FTP. However, there are times when you may wish to start and/or stop various MOVEit DMZ services, including the Resiliency services, by hand. (For example, most MOVEit DMZ services must be down during a database resynchronization.)

The Services portion of this dialog shows the status of each of the services related to Resiliency, and allows you to stop and start them. This duplicates the functionality of the Windows Administrative Tools / Services applet, and is provided for your convenience.

Web server - This is the Microsoft World Wide Web Publishing Service (IIS). It should normally be up on all MOVEit nodes.

DMZ FTP - This is the MOVEit DMZ FTP service, and should normally be up on all MOVEit nodes.

DMZ SSH - This is the MOVEit DMZ SSH service, and should normally be up on all MOVEit nodes.

Scheduler - This is the Microsoft Task Scheduler service, used to run periodic cleanup and email tasks for MOVEit DMZ. It should normally be up only on the master database node. (The dialog will prevent you from starting Scheduler on a secondary node.)

Helper - This is the MOVEit DMZ Helper service, used to work with web-based certificate requests and perform other tasks that other services cannot do by themselves. This service runs on all nodes.

MySQL - This is the MySQL database service, and should normally be up on all database nodes. Even though the slave node is not used by MOVEit DMZ, the database service should be running on the slave node so the slave database can be kept up to date.

DB Resil - This is the MOVEit DMZ Database Resiliency service that monitors and controls the database server, and should be up on all database nodes. This service automatically stops and starts the Scheduler, MySQL and SysStat services as necessary; when you stop this service you will be asked if you want to take down all those services as well.

Web Resil - This is the MOVEit DMZ Web Resiliency service that monitors and controls the web, FTP, and SSH servers, and should be up on all MOVEit nodes. This service automatically stops and starts the web, FTP, SSH and Helper services as necessary; when you stop this service you will be asked if you want to take down all those services as well.

SysStat - This is the MOVEit SysStat service that gathers statistics on system utilization. This service runs on all nodes.

WLBS - This is Windows Load Balancing Service, an optional Microsoft service that may run on web nodes. If the service is running, MOVEit DMZ resiliency automatically stops and starts it as MOVEit DMZ becomes unavailable or available. WLBS is not required; you may use an external third-party load balancer in its place.

Because the DB Resil and Web Resil services normally control the other services listed above, you should normally not have to stop or start those other services yourself.
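The control relationships described for the DB Resil, Web Resil, and Scheduler services can be summarized as data. The sketch below is hypothetical: the dictionary and function names are illustrative and not part of MOVEit DMZ, and `take_down_all` models the operator's answer when asked whether to stop the controlled services as well.

```python
# Services each Resil service starts and stops, per the service descriptions above.
CONTROLLED_BY = {
    "DB Resil": ["Scheduler", "MySQL", "SysStat"],
    "Web Resil": ["Web server", "DMZ FTP", "DMZ SSH", "Helper"],
}


def services_taken_down(resil_service, take_down_all):
    """Return the services stopped when an operator stops resil_service.

    take_down_all models answering Yes to "take down all those services as well?".
    """
    stopped = [resil_service]
    if take_down_all:
        stopped += CONTROLLED_BY[resil_service]
    return stopped


def may_start(service, is_master_db_node):
    """Mirror the dialog's rule that Scheduler may start only on the master DB node."""
    return service != "Scheduler" or is_master_db_node


if __name__ == "__main__":
    print(services_taken_down("DB Resil", take_down_all=True))
    print(may_start("Scheduler", is_master_db_node=False))  # False
```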